# Topic 8: Association Rules Exercise

PATTERN DISCOVERY USING ASSOCIATION RULES

Using the **IncomeESL** dataset included with the arules library (R), generate
association rules.

To do so, first clean the dataset. In particular:
- Check that there are no missing values.
- Transform the factors into numeric values. ← not necessary!
- Once the dataset is clean, create the transaction matrix using the
transactions function.

When running the algorithm to obtain the rules, remember to set the parameter
values of the apriori function and justify why you chose them.
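
The parameters in question are typically a minimum support and a minimum confidence (in mlxtend, `min_support` on `apriori` and `min_threshold` on `association_rules`). As a rough illustration of what those thresholds measure, the sketch below computes support, confidence and lift by hand on a few made-up transactions. The item names and the three helper functions are hypothetical and are not part of the exercise.

```python
# Toy transactions, made up for illustration (not taken from IncomeESL)
transactions = [
    {"sex=female", "home=house", "language=english"},
    {"sex=male", "home=house", "language=english"},
    {"sex=female", "home=apartment", "language=english"},
    {"sex=male", "home=house", "language=spanish"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


def lift(antecedent, consequent, transactions):
    """Confidence divided by the consequent's support (1.0 means independence)."""
    return confidence(antecedent, consequent, transactions) / support(
        consequent, transactions
    )


# Rule under test: {home=house} -> {language=english}
antecedent, consequent = {"home=house"}, {"language=english"}
rule_support = support(antecedent | consequent, transactions)
rule_confidence = confidence(antecedent, consequent, transactions)
rule_lift = lift(antecedent, consequent, transactions)

print(f"support={rule_support:.3f} confidence={rule_confidence:.3f} lift={rule_lift:.3f}")
```

A higher minimum support prunes rare itemsets and shortens the search, while the confidence threshold filters the rules built from the itemsets that survive. Lift helps to tell apart rules that are merely frequent from rules where the antecedent actually changes the odds of the consequent.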

Finally, write a brief report summarizing the rules obtained and
analyzing their meaning.


Import dependencies

In [3]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

from mlxtend.frequent_patterns import apriori
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import association_rules

## Step 1: import data

In [79]:
## import data
income_raw = pd.read_csv(r"./income_raw.csv",sep=',')

In [80]:
income_raw

Unnamed: 0,Unnamed,income,sex,marital status,age,education,occupation,years in bay area,dual incomes,number in household,number of children,householder status,type of home,ethnic classification,language in home
0,1,75+,female,married,45-54,college (1-3 years),homemaker,>10,no,3,0,own,house,white,
1,2,75+,male,married,45-54,college graduate,homemaker,>10,no,5,2,own,house,white,english
2,3,75+,female,married,25-34,college graduate,professional/managerial,>10,yes,3,1,rent,apartment,white,english
3,4,"[0,10)",female,single,14-17,grades 9-11,student,>10,not married,4,2,live with parents/family,house,white,english
4,5,"[0,10)",female,single,14-17,grades 9-11,student,4-6,not married,4,2,live with parents/family,house,white,english
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
8988,8989,"[0,10)",female,single,14-17,grade <9,sales,>10,not married,3,2,live with parents/family,house,white,english
8989,8990,"[10,15)",male,single,18-24,college (1-3 years),professional/managerial,>10,not married,4,0,live with parents/family,house,white,english
8990,8991,"[0,10)",female,single,14-17,grades 9-11,professional/managerial,>10,not married,3,2,live with parents/family,house,white,english
8991,8992,"[20,25)",male,married,55-64,college (1-3 years),laborer,>10,yes,3,1,rent,apartment,white,english


In [81]:
income_raw.describe(include='all')

Unnamed: 0,Unnamed,income,sex,marital status,age,education,occupation,years in bay area,dual incomes,number in household,number of children,householder status,type of home,ethnic classification,language in home
count,8993.0,8993,8993,8833,8993,8907,8857,8080,8993,8618.0,8993.0,8753,8636,8925,8634
unique,,9,2,5,7,6,9,5,3,9.0,10.0,3,5,8,3
top,,"[0,10)",female,single,25-34,college (1-3 years),professional/managerial,>10,not married,2.0,0.0,rent,house,white,english
freq,,1745,4918,3654,2249,3066,2820,5182,5438,2664.0,5724.0,3670,5073,5811,7794
mean,4497.0,,,,,,,,,,,,,,
std,2596.199819,,,,,,,,,,,,,,
min,1.0,,,,,,,,,,,,,,
25%,2249.0,,,,,,,,,,,,,,
50%,4497.0,,,,,,,,,,,,,,
75%,6745.0,,,,,,,,,,,,,,


## Step 2: explore and process data

We need to remove the records that are incomplete.

In [82]:
#remove first column
#income_raw.drop('Unnamed', axis=1, inplace=True)
#income_raw.head()

In [83]:
#rename first column
income_raw.rename(columns={'Unnamed':'id'}, inplace=True)
income_raw.head()

Unnamed: 0,id,income,sex,marital status,age,education,occupation,years in bay area,dual incomes,number in household,number of children,householder status,type of home,ethnic classification,language in home
0,1,75+,female,married,45-54,college (1-3 years),homemaker,>10,no,3,0,own,house,white,
1,2,75+,male,married,45-54,college graduate,homemaker,>10,no,5,2,own,house,white,english
2,3,75+,female,married,25-34,college graduate,professional/managerial,>10,yes,3,1,rent,apartment,white,english
3,4,"[0,10)",female,single,14-17,grades 9-11,student,>10,not married,4,2,live with parents/family,house,white,english
4,5,"[0,10)",female,single,14-17,grades 9-11,student,4-6,not married,4,2,live with parents/family,house,white,english


In [84]:
# drop every record that has at least one missing value
income_complete = income_raw.dropna(axis=0, inplace=False)

In [85]:
income_complete.describe(include='all')

Unnamed: 0,id,income,sex,marital status,age,education,occupation,years in bay area,dual incomes,number in household,number of children,householder status,type of home,ethnic classification,language in home
count,6876.0,6876,6876,6876,6876,6876,6876,6876,6876,6876.0,6876.0,6876,6876,6876,6876
unique,,9,2,5,7,6,9,5,3,9.0,10.0,3,5,8,3
top,,"[0,10)",female,single,25-34,college (1-3 years),professional/managerial,>10,not married,2.0,0.0,rent,house,white,english
freq,,1255,3809,2813,1768,2407,2333,4446,4114,2156.0,4276.0,2882,4102,4605,6277
mean,4515.674666,,,,,,,,,,,,,,
std,2570.738596,,,,,,,,,,,,,,
min,2.0,,,,,,,,,,,,,,
25%,2350.75,,,,,,,,,,,,,,
50%,4593.5,,,,,,,,,,,,,,
75%,6683.25,,,,,,,,,,,,,,
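
The two `describe` outputs are enough to check how much data `dropna` discarded and which columns are responsible. The sketch below redoes that arithmetic with the counts copied from the tables above; the variable names are illustrative only.

```python
# Row count and non-null counts copied from the first describe() output
total_rows = 8993
non_null = {
    "marital status": 8833,
    "education": 8907,
    "occupation": 8857,
    "years in bay area": 8080,
    "number in household": 8618,
    "householder status": 8753,
    "type of home": 8636,
    "ethnic classification": 8925,
    "language in home": 8634,
}

# Missing values per column (columns without gaps are omitted above)
missing = {col: total_rows - n for col, n in non_null.items()}

# Row count reported by describe() after dropna()
complete_rows = 6876
dropped_rows = total_rows - complete_rows
dropped_share = dropped_rows / total_rows

worst_column = max(missing, key=missing.get)

for col, n in sorted(missing.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{col:<22} {n:>4} missing")
print(f"dropped {dropped_rows} of {total_rows} rows ({dropped_share:.1%})")
```

The per-column gaps add up to more than the number of dropped rows, which indicates that some records are missing several fields at once. Since close to a quarter of the records are lost, it may be worth mentioning this in the final report.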


In [86]:
income_complete.head()

Unnamed: 0,id,income,sex,marital status,age,education,occupation,years in bay area,dual incomes,number in household,number of children,householder status,type of home,ethnic classification,language in home
1,2,75+,male,married,45-54,college graduate,homemaker,>10,no,5,2,own,house,white,english
2,3,75+,female,married,25-34,college graduate,professional/managerial,>10,yes,3,1,rent,apartment,white,english
3,4,"[0,10)",female,single,14-17,grades 9-11,student,>10,not married,4,2,live with parents/family,house,white,english
4,5,"[0,10)",female,single,14-17,grades 9-11,student,4-6,not married,4,2,live with parents/family,house,white,english
5,6,"[50,75)",male,married,55-64,college (1-3 years),retired,>10,no,2,0,own,house,white,english


In [93]:
# Number of distinct values in each column
for item in income_complete:
    print(f"Total items {item}: {income_complete[item].nunique()}")

Total items id: 6876
Total items income: 9
Total items sex: 2
Total items marital status: 5
Total items age: 7
Total items education: 6
Total items occupation: 9
Total items years in bay area: 5
Total items dual incomes: 3
Total items number in household: 9
Total items number of children: 10
Total items householder status: 3
Total items type of home: 5
Total items ethnic classification: 8
Total items language in home: 3
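
These counts give a quick estimate of the size of the transaction matrix. The sketch below assumes that `id` is excluded and that every `column=value` pair becomes one item, including the two numeric columns treated as categories; both are assumptions about the encoding, not something the notebook fixes.

```python
# Distinct values per column, copied from the output above (id excluded)
unique_per_column = {
    "income": 9,
    "sex": 2,
    "marital status": 5,
    "age": 7,
    "education": 6,
    "occupation": 9,
    "years in bay area": 5,
    "dual incomes": 3,
    "number in household": 9,
    "number of children": 10,
    "householder status": 3,
    "type of home": 5,
    "ethnic classification": 8,
    "language in home": 3,
}

n_transactions = 6876  # one transaction per complete record
n_items = sum(unique_per_column.values())

# Each record contributes exactly one item per column
items_per_transaction = len(unique_per_column)
density = items_per_transaction / n_items

print(f"{n_transactions} transactions x {n_items} items, density {density:.3f}")
```

Because every transaction holds exactly one item per column, two values of the same column can never appear together in an itemset, which is worth keeping in mind when reading the rules.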


In [94]:
income_complete.groupby('id')['income'].apply(list)


id
2           [75+]
3           [75+]
4        [[0,10)]
5        [[0,10)]
6       [[50,75)]
          ...    
8989     [[0,10)]
8990    [[10,15)]
8991     [[0,10)]
8992    [[20,25)]
8993    [[30,40)]
Name: income, Length: 6876, dtype: object

In [76]:
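encoder = TransactionEncoder()
# NOTE: a DataFrame is passed directly, so the encoder iterates over the column
# labels and treats each character of a label as an item (see the output below)
transaccitions = encoder.fit(income_complete).transform(income_complete)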

In [78]:
transaccitions_df = pd.DataFrame(transaccitions, columns=encoder.columns_)
transaccitions_df

Unnamed: 0,Unnamed: 1,a,b,c,d,e,f,g,h,i,...,m,n,o,p,r,s,t,u,x,y
0,False,False,False,True,False,True,False,False,False,True,...,True,True,True,False,False,False,False,False,False,False
1,False,False,False,False,False,True,False,False,False,False,...,False,False,False,False,False,True,False,False,True,False
2,True,True,False,False,False,False,False,False,False,True,...,True,False,False,False,True,True,True,True,False,False
3,False,True,False,False,False,True,False,True,False,False,...,False,False,False,False,False,False,False,False,False,False
4,False,True,False,True,True,True,False,False,False,True,...,False,True,True,False,False,False,True,True,False,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
6871,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
6872,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
6873,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
6874,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
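
The single-letter column names in this output suggest that the encoder did not receive item lists. Iterating over a DataFrame yields its column labels, and iterating over a label yields its characters, so each "transaction" became the letters of one column name: row 0 is True for c, e, i, m, n, o (the letters of `income`) and row 1 for e, s, x (`sex`). A minimal sketch of one possible fix follows, assuming each record should become a list of `column=value` items with `id` excluded. The two sample records are copied from the `head()` output above and trimmed to four fields, and `to_transaction` is a hypothetical helper.

```python
# Iterating over a string yields its characters, which reproduces the symptom
letters = sorted(set("income"))
print(letters)

# Two records from the head() output, trimmed to a few fields for brevity
records = [
    {"id": 2, "income": "75+", "sex": "male",
     "marital status": "married", "type of home": "house"},
    {"id": 3, "income": "75+", "sex": "female",
     "marital status": "married", "type of home": "apartment"},
]


def to_transaction(record, skip=("id",)):
    """Turn one record into a list of 'column=value' items."""
    return [f"{col}={val}" for col, val in record.items() if col not in skip]


# List of lists: one inner list of items per record
transactions = [to_transaction(r) for r in records]

# One-hot encode by hand to show the shape the encoder is meant to produce
items = sorted({item for t in transactions for item in t})
one_hot = [[item in t for item in items] for t in transactions]

for row in one_hot:
    print(dict(zip(items, row)))
```

With the real data, records of this shape can be obtained with `income_complete.drop(columns="id").to_dict("records")`, and the resulting list of lists is the kind of input `TransactionEncoder.fit` and `transform` are designed to take. Prefixing each value with its column name keeps values such as `2` in `number in household` and `2` in `number of children` from collapsing into the same item.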
