## In this exercise we will generate association rules with the Apriori algorithm ##

<br>
<div class="alert alert-block alert-info">


The <a href="https://es.wikipedia.org/wiki/Algoritmo_apriori" class="alert-link">Apriori algorithm</a> is used in unsupervised learning to find relationships among items. <br>

This kind of analysis is also known as market basket analysis.
<br>
It serves as a starting point for finding hidden patterns among the features.
</div>
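
To make the idea concrete, here is a minimal pure-Python sketch of the level-wise search behind Apriori. The function name `apriori_sketch` and the toy transactions are hypothetical and only illustrate the principle; this is not the implementation used by the library later in this notebook.

```python
from itertools import combinations


def apriori_sketch(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item of the itemset
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent single items
    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items}
    current = {s for s in current if support(s) >= min_support}

    frequent = {}
    k = 1
    while current:
        for s in current:
            frequent[s] = support(s)
        k += 1
        # Candidates of size k are unions of frequent itemsets of size k - 1
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Apriori property: every (k - 1)-subset of a frequent itemset is frequent
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
    return frequent


# Hypothetical toy data: one set of tags per user
transactions = [
    {"javascript", "css", "html"},
    {"javascript", "css"},
    {"python", "shell"},
    {"javascript", "python"},
]
frequent = apriori_sketch(transactions, min_support=0.5)
for itemset, s in sorted(frequent.items(), key=lambda kv: (-kv[1], sorted(kv[0]))):
    print(sorted(itemset), s)
```

The pruning step is what keeps the search tractable: an itemset is only counted if all of its smaller subsets already passed the support threshold.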


In this exercise we use a dataset that I built for the GitHub repository recommendation system. Each row is a user, and each column is a boolean indicating whether the user has that skill in their repos or in their favorites.

### Phases of the exercise ###

 - Download the dataset from GitHub and load it with pickle.
 - Preprocessing: remove redundant columns from the dataset.
 - Find frequent itemsets.
 - Find association rules.

### Download the dataset ###

In [1]:
# Download the dataset (skipped if the decompressed file already exists)
! [ ! -f datasets/Users_Tag_Matrix.data ] && \
wget https://raw.githubusercontent.com/jaimevalero/github-recommendation-engine/master/Users_Tag_Matrix.data.gz -O datasets/Users_Tag_Matrix.data.gz 
# and decompress it
! [ ! -f datasets/Users_Tag_Matrix.data ] && gunzip ./datasets/Users_Tag_Matrix.data.gz
    


### Load the dataset with pickle ###

In [2]:
import pandas as pd
import pickle

# Open the file in a context manager so it is closed after loading
with open("datasets/Users_Tag_Matrix.data", "rb") as f:
    df = pickle.load(f)


    


### Preprocessing: remove redundant columns from the dataset ###


In [3]:
# Delete the unwanted columns
COLUMNS_TO_DELETE = ["c", "resources", "examples", "components", "iphone", "awesome-lists",
                     "package-manager", "material-design", "systems", "slides", "language",
                     "programming"]
# errors="ignore" skips absent names instead of stopping at the first missing column
df = df.drop(columns=COLUMNS_TO_DELETE, errors="ignore")

# Convert to boolean
df = df.astype(bool)

for column in df.columns:
    print(column)

assembly
batchfile
c#
c++
clojure
coffeescript
css
elixir
emacs lisp
go
haskell
html
java
javascript
jupyter notebook
kotlin
lua
matlab
objective-c
objective-c++
ocaml
perl
php
powershell
purebasic
python
rascal
ruby
rust
scala
shell
swift
tex
typescript
vim script
vue
1-wire
2d
3d
3d-engine
3d-game-engine
accessibility
accordion
acme
acme-client
activejob
activerecord
activity
activity-stream
actor-model
adc
addons
admin
admin-dashboard
admin-template
admin-theme
admin-ui
ado-net
adobe
after-effects
ag
agc
agent
airbnb
airtable
akka
alarm
alerting
algorithm
algorithm-challenges
algorithm-competitions
alignment
amd
analytics
android
android-application
android-architecture
android-cleanarchitecture
android-development
android-interview-questions
android-library
android-testing
android-ui
angular
angular-2
angular-components
angular2
angular4
angularclass
angularjs
angularjs-interview-questions
animation
animation-library
anonymity
ansible
antd
anticensorship
anyconnect
aot
aot-compilat
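
The column removal step can be made tolerant of missing names with `DataFrame.drop(..., errors="ignore")`. The sketch below shows this on a hypothetical toy matrix; the user names and the `missing-tag` entry are invented for illustration.

```python
import pandas as pd

# Hypothetical toy matrix: one row per user, one 0/1 flag per tag
toy = pd.DataFrame(
    {"python": [1, 0, 1], "c": [0, 1, 1], "slides": [1, 0, 0]},
    index=["user_a.json", "user_b.json", "user_c.json"],
)

# "missing-tag" is not a column; errors="ignore" skips it and still drops the rest
to_delete = ["c", "missing-tag", "slides"]
cleaned = toy.drop(columns=to_delete, errors="ignore").astype(bool)

print(list(cleaned.columns))
print(cleaned["python"].tolist())
```

With a loop of `del` statements wrapped in a single `try`/`except`, the first missing name would end the loop and leave the remaining columns in place, which is why a single tolerant `drop` call is the safer choice here.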

In [4]:
# Show the dataset, ready to use
df.head()


Unnamed: 0,assembly,batchfile,c#,c++,clojure,coffeescript,css,elixir,emacs lisp,go,...,yeoman-generator,yii,yii2,youtube,zephir,zero-configuration,zeromq,zookeeper,zsh,zsh-configuration
007lva.json,False,False,False,False,False,False,False,False,False,True,...,False,False,False,False,False,False,False,False,False,False
06wj.json,False,False,True,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
0bserver07.json,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
0rca.json,False,False,False,False,True,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
0x00A.json,False,False,False,True,False,False,True,False,False,False,...,False,False,False,False,False,False,False,False,False,False


### Find frequent itemsets ###




In [8]:
from mlxtend.frequent_patterns import apriori
from mlxtend.frequent_patterns import association_rules
from IPython.display import display, HTML

# Generate the most frequent itemsets using the Apriori algorithm
frequent_itemsets = apriori(df, min_support=0.03, use_colnames=True)

metrics = { "support" : """The support metric is defined for itemsets, not association rules, and computes the proportion of transactions that contain the itemset"""  }



for key, value in metrics.items():
    s = f"""<h1> Metric : {key}</h1> <table>
<tr>
<th>Metric</th>
<th>Meaning</th>
</tr>
<tr>
<td>{key}</td>
<td>{value}</td>
</tr></table><br>The itemsets sorted by {key} are:"""
    display(HTML(s))
    display(HTML(frequent_itemsets.sort_values(key,ascending=False).head(20).to_html()))



Metric,Meaning
support,"The support metric is defined for itemsets, not association rules, and computes the proportion of transactions that contain the itemset"


Unnamed: 0,support,itemsets
11,0.796461,[javascript]
193,0.511826,[simple]
132,0.489605,[library]
17,0.459481,[python]
888,0.446366,"[javascript, simple]"
109,0.424088,[framework]
833,0.415268,"[javascript, library]"
4,0.412004,[css]
105,0.405131,[files]
21,0.402096,[shell]
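
Support can be checked by hand on a boolean matrix: it is the fraction of rows in which every column of the itemset is `True`. The following sketch uses a hypothetical five-user toy matrix, and `support` is an illustrative helper rather than a library function.

```python
import pandas as pd

# Hypothetical toy matrix with the same layout as df: users x boolean tags
toy = pd.DataFrame({
    "javascript": [True, True, True, False, True],
    "simple": [True, False, True, False, True],
    "python": [False, True, False, True, False],
})


def support(frame, itemset):
    """Fraction of rows in which every column of `itemset` is True."""
    return frame[list(itemset)].all(axis=1).mean()


print(support(toy, ["javascript"]))
print(support(toy, ["javascript", "simple"]))
```

The support of a combined itemset can never exceed the support of any of its parts. The table above follows this: `[javascript, simple]` has support 0.446366, below both `[javascript]` (0.796461) and `[simple]` (0.511826).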


### Find association rules ###


In [6]:
from mlxtend.frequent_patterns import apriori
from mlxtend.frequent_patterns import association_rules
from IPython.display import display, HTML

# Keep the rules whose lift is at least 1, sorted by confidence (highest first)
rules = association_rules(frequent_itemsets, metric="lift", min_threshold=1).sort_values("confidence",ascending=False)
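
Conceptually, rules are derived from frequent itemsets by splitting each itemset of two or more items into an antecedent and a consequent. The sketch below enumerates those splits for one itemset; `candidate_rules` is a hypothetical helper written for illustration and is not how the library is implemented.

```python
from itertools import combinations


def candidate_rules(itemset):
    """Yield every (antecedent, consequent) split of an itemset into two non-empty parts."""
    items = sorted(itemset)
    for size in range(1, len(items)):
        for antecedent in combinations(items, size):
            consequent = tuple(i for i in items if i not in antecedent)
            yield antecedent, consequent


splits = list(candidate_rules({"javascript", "rails", "ruby"}))
for antecedent, consequent in splits:
    print(antecedent, "->", consequent)
```

An itemset with n items yields 2^n - 2 candidate rules, so the number of rules grows much faster than the number of frequent itemsets. Each candidate is then scored and filtered by the chosen metric and threshold.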


### Sort the association rules by each metric ###



In [7]:
from IPython.display import display, HTML



metrics = {
            "confidence" :  """The confidence of a rule A->C is the probability of seeing the consequent in a transaction given that it also contains the antecedent. Note that the metric is directed, not symmetric; for instance, the confidence
for A->C is different from the confidence for C->A. """ ,
           "leverage" : """Leverage computes the difference between the observed frequency of A and C appearing together and the
frequency that would be expected if A and C were independent. A leverage value of 0 indicates independence. """ ,
           "lift" : """The lift metric is commonly used to measure how much more often the antecedent and consequent of a
rule A->C occur together than we would expect if they were statistically independent. """,
           "conviction" : """A high conviction value means that the consequent is highly dependent on the antecedent. For instance, in
the case of a perfect confidence score, the denominator becomes 0 (due to 1 - 1), for which the conviction score is defined as 'inf'. """ }

for key, value in metrics.items():
    s = f"""<h1> Metric : {key}</h1> <table>
<tr>
<th>Metric</th>
<th>Meaning</th>
</tr>
<tr>
<td>{key}</td>
<td>{value}</td>
</tr></table><br>The rules sorted by {key} are:"""
    display(HTML(s))
    display(HTML(rules.sort_values(key,ascending=False).head(20).to_html()))


Metric,Meaning
confidence,"The confidence of a rule A->C is the probability of seeing the consequent in a transaction given that it also contains the antecedent. Note that the metric is directed, not symmetric; for instance, the confidence for A->C is different from the confidence for C->A."


Unnamed: 0,antecedants,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
20488,"(browser, nodejs)",(javascript),0.036367,0.796461,0.036367,1.0,1.255555,0.007402,inf
167619,"(image, shell, javascript, docker-image)",(docker),0.032014,0.163851,0.032014,1.0,6.103111,0.026769,inf
126125,"(library, web-application, framework)",(web-app),0.031327,0.138709,0.031327,1.0,7.209331,0.026982,inf
1050,(javascript-library),(javascript),0.04994,0.796461,0.04994,1.0,1.255555,0.010165,inf
121799,"(build, simple, angularjs)",(angular),0.030525,0.164424,0.030525,1.0,6.081853,0.025506,inf
108777,"(server, javascript, web-application)",(web-app),0.030353,0.138709,0.030353,1.0,7.209331,0.026143,inf
35674,"(asynchronous, framework)",(async),0.035909,0.10177,0.035909,1.0,9.826111,0.032254,inf
120355,"(image, shell, docker-image)",(docker),0.035737,0.163851,0.035737,1.0,6.103111,0.029881,inf
88883,"(javascript, ruby, web-application)",(web-app),0.037684,0.138709,0.037684,1.0,7.209331,0.032457,inf
126293,"(library, simple, web-application)",(web-app),0.030869,0.138709,0.030869,1.0,7.209331,0.026587,inf
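
The asymmetry of confidence can be checked with the numbers in the rows above. The sketch below takes the supports from the `(javascript-library) -> (javascript)` row; `confidence` is an illustrative helper, and the reverse value is computed here rather than read from the table.

```python
def confidence(support_both, support_antecedent):
    """confidence(A -> C) = support(A and C) / support(A)."""
    return support_both / support_antecedent


# Supports copied from the (javascript-library) -> (javascript) row
support_both = 0.04994          # javascript-library and javascript together
support_js_library = 0.04994    # javascript-library alone
support_js = 0.796461           # javascript alone

forward = confidence(support_both, support_js_library)  # javascript-library -> javascript
reverse = confidence(support_both, support_js)          # javascript -> javascript-library
print(forward, round(reverse, 4))
```

Every user tagged `javascript-library` is also tagged `javascript`, so the forward rule has confidence 1.0, while only about 6% of `javascript` users carry the `javascript-library` tag.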


Metric,Meaning
leverage,Leverage computes the difference between the observed frequency of A and C appearing together and the frequency that would be expected if A and C were independent. A leverage value of 0 indicates independence.


Unnamed: 0,antecedants,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
732,(android),(java),0.19157,0.336521,0.154516,0.806577,2.396807,0.090048,3.430195
733,(java),(android),0.336521,0.19157,0.154516,0.459156,2.396807,0.090048,1.494756
1785,(rails),(ruby),0.161388,0.391501,0.149762,0.927963,2.37027,0.086579,8.447044
1784,(ruby),(rails),0.391501,0.161388,0.149762,0.382534,2.37027,0.086579,1.35815
18211,(rails),"(javascript, ruby)",0.161388,0.339041,0.135387,0.838893,2.474309,0.08067,4.102603
18206,"(javascript, ruby)",(rails),0.339041,0.161388,0.135387,0.399324,2.474309,0.08067,1.396114
2219,(angularjs),(angular),0.094897,0.164424,0.094611,0.996982,6.063501,0.079008,276.910028
2218,(angular),(angularjs),0.164424,0.094897,0.094611,0.575409,6.063501,0.079008,2.131706
18210,(ruby),"(javascript, rails)",0.391501,0.145925,0.135387,0.345816,2.369819,0.078258,1.305558
18207,"(javascript, rails)",(ruby),0.145925,0.391501,0.135387,0.927786,2.369819,0.078258,8.426388
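
Leverage can be recomputed from the three support columns. This sketch uses the `(android) -> (java)` row above; `leverage` is an illustrative helper, and small differences from the table come from the rounded inputs.

```python
def leverage(support_both, support_a, support_c):
    """leverage(A -> C) = support(A and C) - support(A) * support(C)."""
    return support_both - support_a * support_c


# Supports copied from the (android) -> (java) row
android_to_java = leverage(0.154516, 0.19157, 0.336521)
java_to_android = leverage(0.154516, 0.336521, 0.19157)
print(round(android_to_java, 6), round(java_to_android, 6))
```

Unlike confidence, leverage is symmetric, which is why each rule and its reverse appear as consecutive pairs with the same value in the table.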


Metric,Meaning
lift,The lift metric is commonly used to measure how much more often the antecedent and consequent of a rule A->C occur together than we would expect if they were statistically independent.


Unnamed: 0,antecedants,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
167618,"(image, shell, javascript, docker)",(docker-image),0.056411,0.042609,0.032014,0.567513,13.319004,0.029611,2.213685
167647,(docker-image),"(image, shell, javascript, docker)",0.042609,0.056411,0.032014,0.751344,13.319004,0.029611,3.794756
167626,"(image, shell, docker)","(javascript, docker-image)",0.063169,0.038142,0.032014,0.5068,13.28713,0.029605,1.950238
167639,"(javascript, docker-image)","(image, shell, docker)",0.038142,0.063169,0.032014,0.839339,13.28713,0.029605,5.831114
120354,"(image, shell, docker)",(docker-image),0.063169,0.042609,0.035737,0.56573,13.277162,0.033045,2.204597
120367,(docker-image),"(image, shell, docker)",0.042609,0.063169,0.035737,0.83871,13.277162,0.033045,5.80835
167636,"(shell, docker-image)","(image, javascript, docker)",0.036195,0.068152,0.032014,0.884494,12.978272,0.029547,8.067507
167629,"(image, javascript, docker)","(shell, docker-image)",0.068152,0.036195,0.032014,0.469748,12.978272,0.029547,1.817636
120362,"(shell, docker-image)","(image, docker)",0.036195,0.076112,0.035737,0.987342,12.97214,0.032982,72.987114
120359,"(image, docker)","(shell, docker-image)",0.076112,0.036195,0.035737,0.469526,12.97214,0.032982,1.816875
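
Lift is the ratio between the observed joint support and the support expected under independence. The sketch below recomputes it for the `(docker-image) -> (image, shell, docker)` row above; `lift` is an illustrative helper, and the result matches the table only up to the rounding of the inputs.

```python
def lift(support_both, support_a, support_c):
    """lift(A -> C) = support(A and C) / (support(A) * support(C))."""
    return support_both / (support_a * support_c)


# Supports copied from the (docker-image) -> (image, shell, docker) row
value = lift(0.035737, 0.042609, 0.063169)
print(round(value, 2))
```

A lift of about 13 means these tags appear together roughly 13 times more often than they would if they were independent. A lift of 1 would indicate independence, which is the threshold used when the rules were generated.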


Metric,Meaning
conviction,"A high conviction value means that the consequent is highly dependent on the antecedent. For instance, in the case of a perfect confidence score, the denominator becomes 0 (due to 1 - 1), for which the conviction score is defined as 'inf'."


Unnamed: 0,antecedants,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
20488,"(browser, nodejs)",(javascript),0.036367,0.796461,0.036367,1.0,1.255555,0.007402,inf
167619,"(image, shell, javascript, docker-image)",(docker),0.032014,0.163851,0.032014,1.0,6.103111,0.026769,inf
126125,"(library, web-application, framework)",(web-app),0.031327,0.138709,0.031327,1.0,7.209331,0.026982,inf
1050,(javascript-library),(javascript),0.04994,0.796461,0.04994,1.0,1.255555,0.010165,inf
121799,"(build, simple, angularjs)",(angular),0.030525,0.164424,0.030525,1.0,6.081853,0.025506,inf
108777,"(server, javascript, web-application)",(web-app),0.030353,0.138709,0.030353,1.0,7.209331,0.026143,inf
35674,"(asynchronous, framework)",(async),0.035909,0.10177,0.035909,1.0,9.826111,0.032254,inf
120355,"(image, shell, docker-image)",(docker),0.035737,0.163851,0.035737,1.0,6.103111,0.029881,inf
88883,"(javascript, ruby, web-application)",(web-app),0.037684,0.138709,0.037684,1.0,7.209331,0.032457,inf
126293,"(library, simple, web-application)",(web-app),0.030869,0.138709,0.030869,1.0,7.209331,0.026587,inf
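
The `inf` values above follow directly from the definition of conviction, (1 - support(C)) / (1 - confidence(A -> C)). The sketch below shows both cases; `conviction` is an illustrative helper, and the finite example uses the `(rails) -> (ruby)` values from the leverage table.

```python
import math


def conviction(confidence, support_c):
    """conviction(A -> C) = (1 - support(C)) / (1 - confidence(A -> C))."""
    if confidence == 1.0:
        # The denominator is 0, so the score is defined as infinity
        return math.inf
    return (1 - support_c) / (1 - confidence)


# (rails) -> (ruby): confidence 0.927963, consequent support 0.391501
finite = conviction(0.927963, 0.391501)
# (javascript-library) -> (javascript): confidence 1.0, consequent support 0.796461
perfect = conviction(1.0, 0.796461)
print(round(finite, 3), perfect)
```

Because every rule with confidence 1.0 gets the same infinite score, sorting by conviction returns the same top rows as sorting by confidence.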
