<a href="https://colab.research.google.com/github/facuhrodriguez/Inteligencia-Artificial/blob/master/Reglas_de_asociaci%C3%B3n.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Practical Assignment - Association Rules**

***Rodríguez, Facundo Hernán***

# Dataset description


The selected dataset (*mushrooms*) describes mushrooms in terms of their physical characteristics. Each sample is classified as either poisonous or edible.

The dataset includes descriptions of hypothetical samples corresponding to 23 species of Agaricales (gilled) mushrooms in the Agaricus and Lepiota families. Each species is identified as definitely edible, definitely poisonous, or of unknown edibility and not recommended; the dataset merges this last group into the poisonous class.

# Objective


The goal of applying association rules to this dataset is to find relationships among the characteristics of mushrooms classified as poisonous, and among those of mushrooms classified as edible.

We expect to find patterns among the characteristics that indicate, with a given level of confidence, which ones are associated with each class.

For example, if a mushroom has *stalk-surface-above-ring* equal to *smooth* and *gill-size* equal to *broad*, it is very likely to be edible.

On the other hand, if a mushroom has *bruises* equal to *f* (no bruises) and *gill-size* equal to *narrow*, it is very likely to be poisonous.


---


Note: *This analysis is based on the description of the original dataset (https://www.openml.org/d/24)*

# Data preprocessing

As a first preprocessing step, we load the dataset and define the structure of the DataFrame.


We install and import the library that gives access to the dataset.

In [117]:
# support for loading datasets from https://www.openml.org/
!pip install openml
import openml



We fetch the dataset and build the DataFrame.

In [4]:
import pandas as pd

# select the dataset to use, in this case number 24
dataset = openml.datasets.get_dataset(24)

# split the data stored in the dataset into features, target and metadata
X, y, categorical_indicator, attribute_names = dataset.get_data(
    dataset_format='dataframe',
    target=dataset.default_target_attribute
)

# concatenate features and target into a single DataFrame
df = pd.concat([X, y], axis=1)
df

Unnamed: 0,cap-shape,cap-surface,cap-color,bruises%3F,odor,gill-attachment,gill-spacing,gill-size,gill-color,stalk-shape,stalk-root,stalk-surface-above-ring,stalk-surface-below-ring,stalk-color-above-ring,stalk-color-below-ring,veil-type,veil-color,ring-number,ring-type,spore-print-color,population,habitat,class
0,x,s,n,t,p,f,c,n,k,e,e,s,s,w,w,p,w,o,p,k,s,u,p
1,x,s,y,t,a,f,c,b,k,e,c,s,s,w,w,p,w,o,p,n,n,g,e
2,b,s,w,t,l,f,c,b,n,e,c,s,s,w,w,p,w,o,p,n,n,m,e
3,x,y,w,t,p,f,c,n,n,e,e,s,s,w,w,p,w,o,p,k,s,u,p
4,x,s,g,f,n,f,w,b,k,t,e,s,s,w,w,p,w,o,e,n,a,g,e
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
8119,k,s,n,f,n,a,c,b,y,e,,s,s,o,o,p,o,o,p,b,c,l,e
8120,x,s,n,f,n,a,c,b,y,e,,s,s,o,o,p,n,o,p,b,v,l,e
8121,f,s,n,f,n,a,c,b,n,e,,s,s,o,o,p,o,o,p,b,c,l,e
8122,k,y,n,f,y,f,c,n,b,t,,s,k,w,w,p,w,o,e,w,v,l,p
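The last rows of the output above show an empty *stalk-root*, which suggests that this column has missing values. A minimal sketch, on a hypothetical toy frame rather than the real dataset, of how one might count them and see what pandas' `get_dummies()` (used in the next step) does with them by default:

```python
import numpy as np
import pandas as pd

# hypothetical toy frame; the real df has many more rows and columns
toy = pd.DataFrame({
    "stalk-root": ["e", "c", np.nan, np.nan],
    "class": ["p", "e", "e", "p"],
})

# number of missing entries per column
missing_per_column = toy.isna().sum()
print(missing_per_column)

# by default (dummy_na=False) a missing value gets no indicator column,
# so those rows are all zeros across the stalk-root_* columns
toy_dummies = (
    pd.get_dummies(toy, columns=["stalk-root"])
    .drop(columns="class")
    .astype(int)
)
print(toy_dummies)
```

Under this default, samples with a missing value simply do not count toward the support of any item of that feature.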


Because the dataset's attributes are nominal, we use the *get_dummies()* function from the pandas library to obtain a one-hot encoded format.




In [5]:
df_dummies = pd.get_dummies(df, columns=df.columns)
df_dummies

Unnamed: 0,cap-shape_b,cap-shape_c,cap-shape_f,cap-shape_k,cap-shape_s,cap-shape_x,cap-surface_f,cap-surface_g,cap-surface_s,cap-surface_y,cap-color_b,cap-color_c,cap-color_e,cap-color_g,cap-color_n,cap-color_p,cap-color_r,cap-color_u,cap-color_w,cap-color_y,bruises%3F_f,bruises%3F_t,odor_a,odor_c,odor_f,odor_l,odor_m,odor_n,odor_p,odor_s,odor_y,gill-attachment_a,gill-attachment_d,gill-attachment_f,gill-attachment_n,gill-spacing_c,gill-spacing_d,gill-spacing_w,gill-size_b,gill-size_n,...,veil-type_u,veil-color_n,veil-color_o,veil-color_w,veil-color_y,ring-number_n,ring-number_o,ring-number_t,ring-type_c,ring-type_e,ring-type_f,ring-type_l,ring-type_n,ring-type_p,ring-type_s,ring-type_z,spore-print-color_b,spore-print-color_h,spore-print-color_k,spore-print-color_n,spore-print-color_o,spore-print-color_r,spore-print-color_u,spore-print-color_w,spore-print-color_y,population_a,population_c,population_n,population_s,population_v,population_y,habitat_d,habitat_g,habitat_l,habitat_m,habitat_p,habitat_u,habitat_w,class_e,class_p
0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,1,0,1,0,0,0,1,...,0,0,0,1,0,0,1,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,1
1,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,1,1,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,1,0,...,0,0,0,1,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,1,0
2,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,1,0,0,0,0,0,0,0,1,0,1,0,0,1,0,...,0,0,0,1,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,1,0
3,0,0,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,1,0,0,0,0,1,0,1,0,0,0,1,...,0,0,0,1,0,0,1,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,1
4,0,0,0,0,0,1,0,0,1,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,1,1,0,...,0,0,0,1,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,1,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
8119,0,0,0,1,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,1,0,0,0,1,0,0,1,0,...,0,0,1,0,0,0,1,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,1,0
8120,0,0,0,0,0,1,0,0,1,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,1,0,0,0,1,0,0,1,0,...,0,1,0,0,0,0,1,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,1,0,0,0,0,1,0
8121,0,0,1,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,1,0,0,0,1,0,0,1,0,...,0,0,1,0,0,0,1,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,1,0
8122,0,0,0,1,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,1,0,1,0,0,0,1,...,0,0,0,1,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,1,0,0,0,0,0,1


As can be seen, the dataset now has many more columns: each column holds only binary values, whereas in the original dataset each column could take several different values.

For example, the *cap-shape* column of the original dataset could take the following values:


*   b : bell
*   c : conical
*   x : convex
*   f : flat
*   k : knobbed
*   s : sunken

After applying *get_dummies()*, the column is split into multiple columns, one for each value the original column could take.

Thus, *cap-shape* is split into the following columns, each holding a binary value (1 if the mushroom has the cap shape named by that column, 0 otherwise):

  * cap-shape_b
  * cap-shape_c
  * cap-shape_x
  * cap-shape_f
  * cap-shape_k
  * cap-shape_s

The process is repeated for every column of the original dataset.
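A minimal sketch of this expansion on a hypothetical three-row sample (the column name and codes follow the dataset, the rows are made up):

```python
import pandas as pd

# hypothetical three-row sample using the dataset's column name and codes
toy = pd.DataFrame({"cap-shape": ["x", "b", "x"]})

# one indicator column per observed value; astype(int) gives 0/1
# regardless of the default dtype of the installed pandas version
toy_dummies = pd.get_dummies(toy, columns=["cap-shape"]).astype(int)
print(toy_dummies)
```

Only values that actually occur in the data get a column, so this toy frame yields `cap-shape_b` and `cap-shape_x` and nothing else.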


## Frequent itemset extraction

With the dataset encoded, we proceed to find the frequent itemsets, that is, the itemsets whose support is at least the chosen minimum. The association rules are later generated from them.

Because the dataset is relatively large, we set a low minimum support (0.3), which still lets relationships among edible mushrooms and among poisonous mushrooms emerge (each class covers roughly half of the samples). The threshold discards infrequent itemsets, which would otherwise produce rules of little relevance to the problem.

In [63]:
from mlxtend.frequent_patterns import apriori

# keep the itemsets present in at least 30% of the samples
frequent_itemsets = apriori(df_dummies, min_support=0.3, use_colnames=True)
# number of items in each itemset
frequent_itemsets['length'] = frequent_itemsets['itemsets'].apply(len)

frequent_itemsets

Unnamed: 0,support,itemsets,length
0,0.387986,(cap-shape_f),1
1,0.450025,(cap-shape_x),1
2,0.314623,(cap-surface_s),1
3,0.399311,(cap-surface_y),1
4,0.584441,(bruises%3F_f),1
...,...,...,...
2728,0.311177,"(stalk-surface-above-ring_s, ring-type_p, ring...",8
2729,0.303299,"(gill-size_b, stalk-surface-below-ring_s, stal...",9
2730,0.317085,"(gill-size_b, stalk-surface-above-ring_s, ring...",9
2731,0.303299,"(gill-size_b, stalk-surface-above-ring_s, ring...",9
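The `support` column above is the fraction of samples that contain every item of the itemset. A minimal sketch of that computation in plain Python, on four hypothetical transactions (not rows from the real dataset):

```python
# hypothetical transactions: each mushroom is the set of its attribute values
transactions = [
    {"gill-size_b", "odor_n", "class_e"},
    {"gill-size_b", "odor_n", "class_e"},
    {"gill-size_n", "odor_f", "class_p"},
    {"gill-size_b", "odor_f", "class_p"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

print(support({"gill-size_b"}, transactions))             # 3 of 4 rows
print(support({"gill-size_b", "class_e"}, transactions))  # 2 of 4 rows
```

With a 0.3 threshold, both itemsets printed here would count as frequent, while `{odor_f, class_e}`, which appears in no transaction, would not.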


# Generating association rules

After obtaining the frequent itemsets, we derive association rules from them. These rules reveal which items (attribute values) in the frequent itemsets tend to occur together.

To do this, we use the *association_rules* function from the *mlxtend* library with confidence as the metric and a minimum threshold of 0.8.

Confidence measures how strongly the items are related: it is the fraction of samples containing the antecedent that also contain the consequent. We set the minimum to 0.8 because we want strong relationships; a lower value could admit rules whose antecedent and consequent are only weakly related.


In [90]:
from mlxtend.frequent_patterns import association_rules
rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=0.8)
rules


Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
0,(cap-shape_f),(gill-attachment_f),0.387986,0.974151,0.381339,0.982868,1.008949,0.003382,1.508835
1,(cap-shape_f),(gill-spacing_c),0.387986,0.838503,0.332349,0.856599,1.021581,0.007021,1.126190
2,(cap-shape_f),(veil-type_p),0.387986,1.000000,0.387986,1.000000,1.000000,0.000000,inf
3,(cap-shape_f),(veil-color_w),0.387986,0.975382,0.381832,0.984137,1.008976,0.003397,1.551945
4,(cap-shape_f),(ring-number_o),0.387986,0.921713,0.371738,0.958122,1.039501,0.014126,1.869388
...,...,...,...,...,...,...,...,...,...
43552,"(ring-number_o, gill-attachment_f, bruises%3F_t)","(stalk-surface-below-ring_s, stalk-surface-abo...",0.379124,0.352536,0.316100,0.833766,2.365055,0.182446,3.894902
43553,"(ring-number_o, gill-spacing_c, bruises%3F_t)","(stalk-surface-below-ring_s, stalk-surface-abo...",0.366322,0.386017,0.316100,0.862903,2.235404,0.174694,4.478466
43554,"(stalk-surface-below-ring_s, bruises%3F_t)","(stalk-surface-above-ring_s, ring-type_p, veil...",0.374200,0.372230,0.316100,0.844737,2.269392,0.176812,4.043262
43555,"(ring-type_p, bruises%3F_t)","(stalk-surface-below-ring_s, stalk-surface-abo...",0.391925,0.392910,0.316100,0.806533,2.052717,0.162109,3.137946
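Each `confidence` value in the table can be recomputed from the two support columns. A small check, using the figures of rule 0 copied from the output above:

```python
# values copied from rule 0 in the table above:
# (cap-shape_f) -> (gill-attachment_f)
antecedent_support = 0.387986   # support of {cap-shape_f}
rule_support = 0.381339         # support of {cap-shape_f, gill-attachment_f}

# confidence(A -> C) = support(A and C) / support(A)
confidence = rule_support / antecedent_support
print(round(confidence, 4))
```

The result agrees with the 0.982868 reported in the table, up to rounding of the inputs, and it clears the 0.8 threshold, so the rule is kept.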


# Post-processing the association rules

The next step is to analyze the generated association rules. The goal of this section is to find useful and interesting rules that contribute meaningful information to the problem.

Our initial objective was to find relationships among mushrooms classified as poisonous and among those classified as edible. For that reason we keep only the rules whose antecedent is *class_p* (poisonous) or *class_e* (edible).

In [108]:
# keep only the rules whose antecedent is exactly class_e or class_p;
# .copy() avoids pandas' SettingWithCopyWarning when the column is added below
rules_pe = rules[(rules['antecedents'] == {'class_e'}) |
                 (rules['antecedents'] == {'class_p'})].copy()
rules_pe['consequent_len'] = rules_pe['consequents'].apply(len)
rules_pe


Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction,consequent_len
21,(class_p),(bruises%3F_f),0.482029,0.584441,0.405219,0.840654,1.438389,0.123502,2.607898,1
41,(class_e),(odor_n),0.517971,0.434269,0.419498,0.809886,1.864941,0.194559,2.975746,1
63,(class_e),(gill-attachment_f),0.517971,0.974151,0.494338,0.954373,0.979697,-0.010244,0.566531,1
64,(class_p),(gill-attachment_f),0.482029,0.974151,0.479813,0.995403,1.021817,0.010244,5.623667,1
80,(class_p),(gill-spacing_c),0.482029,0.838503,0.468242,0.971399,1.158492,0.064060,5.646620,1
...,...,...,...,...,...,...,...,...,...,...
14057,(class_p),"(bruises%3F_f, veil-color_w, gill-spacing_c, g...",0.482029,0.405711,0.387986,0.804903,1.983930,0.192422,3.046118,5
14312,(class_p),"(bruises%3F_f, veil-color_w, veil-type_p, gill...",0.482029,0.517971,0.399803,0.829418,1.601281,0.150126,2.825784,5
14435,(class_p),"(bruises%3F_f, veil-color_w, gill-spacing_c, v...",0.482029,0.405711,0.387986,0.804903,1.983930,0.192422,3.046118,5
22843,(class_p),"(veil-color_w, gill-spacing_c, veil-type_p, gi...",0.482029,0.772033,0.454948,0.943820,1.222512,0.082806,4.057804,5
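mlxtend stores antecedents and consequents as `frozenset` objects, so the selection above is a set comparison. A minimal sketch of the same idea on a hypothetical hand-built rules frame (not the real `rules` output), taking an explicit copy before adding a column:

```python
import pandas as pd

# hypothetical stand-in for the `rules` DataFrame
toy_rules = pd.DataFrame({
    "antecedents": [frozenset({"class_e"}), frozenset({"cap-shape_f"}),
                    frozenset({"class_p"}), frozenset({"class_p", "odor_f"})],
    "consequents": [frozenset({"odor_n"}), frozenset({"veil-type_p"}),
                    frozenset({"bruises%3F_f", "gill-spacing_c"}),
                    frozenset({"veil-type_p"})],
})

class_only = [frozenset({"class_e"}), frozenset({"class_p"})]

# exact match: the antecedent must be the class item and nothing else
mask = toy_rules["antecedents"].apply(lambda a: a in class_only)

# .copy() makes the selection an independent frame, so adding a column is safe
toy_pe = toy_rules[mask].copy()
toy_pe["consequent_len"] = toy_pe["consequents"].apply(len)
print(toy_pe)
```

The rule with antecedent `{class_p, odor_f}` is excluded because the match is exact, and the original frame is left without the new column.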


We then split the post-processed rules into those for edible mushrooms (rules_e) and those for poisonous mushrooms (rules_p).

In [109]:
rules_e = rules_pe[(rules_pe['antecedents'].apply(lambda x : 'class_e' in x))]
rules_e

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction,consequent_len
41,(class_e),(odor_n),0.517971,0.434269,0.419498,0.809886,1.864941,0.194559,2.975746,1
63,(class_e),(gill-attachment_f),0.517971,0.974151,0.494338,0.954373,0.979697,-0.010244,0.566531,1
87,(class_e),(gill-size_b),0.517971,0.690793,0.482521,0.931559,1.348536,0.12471,4.517862,1
107,(class_e),(stalk-surface-above-ring_s),0.517971,0.637125,0.448055,0.865019,1.357692,0.118043,2.688345,1
112,(class_e),(stalk-surface-below-ring_s),0.517971,0.607582,0.418513,0.807985,1.329836,0.103803,2.043679,1
128,(class_e),(veil-type_p),0.517971,1.0,0.517971,1.0,1.0,0.0,inf,1
136,(class_e),(veil-color_w),0.517971,0.975382,0.494338,0.954373,0.978461,-0.010882,0.539554,1
142,(class_e),(ring-number_o),0.517971,0.921713,0.452979,0.874525,0.948803,-0.024442,0.62392,1
520,(class_e),"(odor_n, veil-type_p)",0.517971,0.434269,0.419498,0.809886,1.864941,0.194559,2.975746,2
601,(class_e),"(gill-size_b, gill-attachment_f)",0.517971,0.664943,0.458887,0.885932,1.332341,0.114466,2.93733,2


In [110]:
rules_p = rules_pe[(rules_pe['antecedents'].apply(lambda x : 'class_p' in x)) ]
rules_p

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction,consequent_len
21,(class_p),(bruises%3F_f),0.482029,0.584441,0.405219,0.840654,1.438389,0.123502,2.607898,1
64,(class_p),(gill-attachment_f),0.482029,0.974151,0.479813,0.995403,1.021817,0.010244,5.623667,1
80,(class_p),(gill-spacing_c),0.482029,0.838503,0.468242,0.971399,1.158492,0.064060,5.646620,1
129,(class_p),(veil-type_p),0.482029,1.000000,0.482029,1.000000,1.000000,0.000000,inf,1
137,(class_p),(veil-color_w),0.482029,0.975382,0.481044,0.997957,1.023145,0.010882,12.050714,1
...,...,...,...,...,...,...,...,...,...,...
14057,(class_p),"(bruises%3F_f, veil-color_w, gill-spacing_c, g...",0.482029,0.405711,0.387986,0.804903,1.983930,0.192422,3.046118,5
14312,(class_p),"(bruises%3F_f, veil-color_w, veil-type_p, gill...",0.482029,0.517971,0.399803,0.829418,1.601281,0.150126,2.825784,5
14435,(class_p),"(bruises%3F_f, veil-color_w, gill-spacing_c, v...",0.482029,0.405711,0.387986,0.804903,1.983930,0.192422,3.046118,5
22843,(class_p),"(veil-color_w, gill-spacing_c, veil-type_p, gi...",0.482029,0.772033,0.454948,0.943820,1.222512,0.082806,4.057804,5


However, this is not enough, because some rules still contribute no relevant information.

To refine the rules we use additional metrics provided by the *association_rules()* function. We use *lift*, the ratio between the observed support of an itemset and the support expected if its items were independent, and *conviction*, defined as (1 - consequent support) / (1 - confidence), which grows as the rule fails less often than independence would predict.

We therefore keep the rules with lift > 1 and conviction > 2 whose consequent contains more than one item.
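Both metrics can be recomputed from the `confidence` and `consequent support` columns. A short sketch with two hypothetical helper functions, checked against figures copied from rules 21 and 129 in the tables above:

```python
import math

def lift(confidence, consequent_support):
    """Observed confidence relative to what independence would give."""
    return confidence / consequent_support

def conviction(confidence, consequent_support):
    """(1 - support(C)) / (1 - confidence); infinite when the rule never fails."""
    if confidence >= 1.0:
        return math.inf
    return (1.0 - consequent_support) / (1.0 - confidence)

# values copied from rule 21 above: (class_p) -> (bruises%3F_f)
print(lift(0.840654, 0.584441))
print(conviction(0.840654, 0.584441))

# rule 129, (class_p) -> (veil-type_p): every sample has veil-type_p
print(lift(1.0, 1.0), conviction(1.0, 1.0))
```

A lift of exactly 1 with infinite conviction, as in rule 129, signals a consequent that is always present, so the rule says nothing specific about the class.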

In [111]:
rules_pp = rules_p[(rules_p['conviction'] > 2) &
                   (rules_p['lift'] > 1) &
                   (rules_p['consequent_len'] > 1)]
rules_pp

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction,consequent_len
254,(class_p),"(bruises%3F_f, gill-attachment_f)",0.482029,0.558592,0.403003,0.836057,1.496723,0.133746,2.692452,2
262,(class_p),"(gill-spacing_c, bruises%3F_f)",0.482029,0.435746,0.392418,0.814096,1.868281,0.182376,3.035191,2
283,(class_p),"(bruises%3F_f, veil-type_p)",0.482029,0.584441,0.405219,0.840654,1.438389,0.123502,2.607898,2
293,(class_p),"(bruises%3F_f, veil-color_w)",0.482029,0.559823,0.404234,0.838611,1.497993,0.134384,2.727427,2
300,(class_p),"(ring-number_o, bruises%3F_f)",0.482029,0.54259,0.400788,0.831461,1.532393,0.139244,2.713967,2
578,(class_p),"(gill-spacing_c, gill-attachment_f)",0.482029,0.812654,0.466027,0.966803,1.189686,0.074304,5.643442,2
736,(class_p),"(veil-type_p, gill-attachment_f)",0.482029,0.974151,0.479813,0.995403,1.021817,0.010244,5.623667,2
760,(class_p),"(gill-attachment_f, veil-color_w)",0.482029,0.973166,0.478828,0.993361,1.020751,0.009734,4.041624,2
778,(class_p),"(ring-number_o, gill-attachment_f)",0.482029,0.89808,0.468735,0.972421,1.082778,0.035835,3.695552,2
886,(class_p),"(gill-spacing_c, veil-type_p)",0.482029,0.838503,0.468242,0.971399,1.158492,0.06406,5.64662,2


In [112]:
rules_ppe = rules_e[(rules_e['conviction'] > 2 ) 
                    & (rules_e['lift'] > 1)
                    & (rules_e['consequent_len'] > 1)]
rules_ppe

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction,consequent_len
520,(class_e),"(odor_n, veil-type_p)",0.517971,0.434269,0.419498,0.809886,1.864941,0.194559,2.975746,2
601,(class_e),"(gill-size_b, gill-attachment_f)",0.517971,0.664943,0.458887,0.885932,1.332341,0.114466,2.93733,2
666,(class_e),"(stalk-surface-above-ring_s, gill-attachment_f)",0.517971,0.613491,0.424421,0.819392,1.335622,0.106651,2.14004,2
950,(class_e),"(stalk-surface-above-ring_s, gill-size_b)",0.517971,0.442147,0.415559,0.802281,1.814514,0.186539,2.82145,2
979,(class_e),"(gill-size_b, veil-type_p)",0.517971,0.690793,0.482521,0.931559,1.348536,0.12471,4.517862,2
987,(class_e),"(gill-size_b, veil-color_w)",0.517971,0.667159,0.458887,0.885932,1.327917,0.113318,2.917906,2
1096,(class_e),"(stalk-surface-above-ring_s, veil-type_p)",0.517971,0.637125,0.448055,0.865019,1.357692,0.118043,2.688345,2
1105,(class_e),"(stalk-surface-above-ring_s, veil-color_w)",0.517971,0.613491,0.424421,0.819392,1.335622,0.106651,2.14004,2
1131,(class_e),"(veil-type_p, stalk-surface-below-ring_s)",0.517971,0.607582,0.418513,0.807985,1.329836,0.103803,2.043679,2
3375,(class_e),"(gill-size_b, veil-type_p, gill-attachment_f)",0.517971,0.664943,0.458887,0.885932,1.332341,0.114466,2.93733,3


However, an inspection of the dataset (https://www.openml.org/d/24) shows that some features are not discriminative, because both classes (poisonous and edible) almost always share the same value.

For example, the *veil-color* feature has 4 possible values, but almost all samples have the value *white*. In detail, of the 4208 samples of class e, 4016 (approximately 95%) have the value white. Likewise, of the 3916 samples of class p, 3908 (more than 99%) have the value white.

The same happens with the *veil-type* feature, whose value is *p* in every sample (its support in the rule tables is 1.0).

So that the remaining rules carry relevant information, we filter out the rules in which these features appear. The filtering cells (rules_npe, rules_npp) also drop *gill-attachment_f*, whose support in the rule tables is about 0.97.
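Per-class shares like the ones quoted above (4016 of 4208, 3908 of 3916) are what a normalized cross-tabulation gives. A minimal sketch on a hypothetical toy sample, not the real data:

```python
import pandas as pd

# hypothetical toy sample; codes follow the dataset (w = white, n = brown)
toy = pd.DataFrame({
    "class":      ["e", "e", "e", "e", "p", "p", "p", "p"],
    "veil-color": ["w", "w", "w", "n", "w", "w", "w", "w"],
})

# share of each veil-color value within each class (each row sums to 1)
shares = pd.crosstab(toy["class"], toy["veil-color"], normalize="index")
print(shares)
```

A value whose share is close to 1 in both rows cannot help tell the classes apart, which is the reason for dropping it from the rules.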


In [113]:
rules_npe = rules_ppe[rules_ppe['consequents'].apply(lambda consequents: 'veil-type_p' not in consequents) &
          rules_ppe['consequents'].apply(lambda consequents: 'veil-color_w' not in consequents) &
          rules_ppe['consequents'].apply(lambda consequents: 'gill-attachment_f' not in consequents)]
rules_npe

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction,consequent_len
950,(class_e),"(stalk-surface-above-ring_s, gill-size_b)",0.517971,0.442147,0.415559,0.802281,1.814514,0.186539,2.82145,2


In [114]:
rules_npp = rules_pp[rules_pp['consequents'].apply(lambda consequents: 'veil-type_p' not in consequents) &
          rules_pp['consequents'].apply(lambda consequents: 'veil-color_w' not in consequents) &
          rules_pp['consequents'].apply(lambda consequents: 'gill-attachment_f' not in consequents)]
rules_npp

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction,consequent_len
262,(class_p),"(gill-spacing_c, bruises%3F_f)",0.482029,0.435746,0.392418,0.814096,1.868281,0.182376,3.035191,2
300,(class_p),"(ring-number_o, bruises%3F_f)",0.482029,0.54259,0.400788,0.831461,1.532393,0.139244,2.713967,2
917,(class_p),"(gill-spacing_c, ring-number_o)",0.482029,0.795667,0.454948,0.94382,1.1862,0.071414,3.637125,2
1620,(class_p),"(gill-spacing_c, bruises%3F_f, ring-number_o)",0.482029,0.429345,0.387986,0.804903,1.874722,0.18103,2.92498,3
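The two filtering cells apply the same three conditions to different frames. One way to express that once is a small helper based on `frozenset.isdisjoint`; the function name and the toy frame below are hypothetical, for illustration only:

```python
import pandas as pd

# items dropped by the filtering cells above
UNINFORMATIVE = {"veil-type_p", "veil-color_w", "gill-attachment_f"}

def drop_uninformative(rules_df, items=UNINFORMATIVE):
    """Keep rules whose consequents share no item with `items`."""
    keep = rules_df["consequents"].apply(lambda c: c.isdisjoint(items))
    return rules_df[keep]

# hypothetical hand-built rules frame for illustration
toy_rules = pd.DataFrame({
    "antecedents": [frozenset({"class_e"})] * 3,
    "consequents": [frozenset({"gill-size_b", "veil-type_p"}),
                    frozenset({"stalk-surface-above-ring_s", "gill-size_b"}),
                    frozenset({"gill-size_b", "gill-attachment_f"})],
})
print(drop_uninformative(toy_rules))
```

Only the middle rule survives, mirroring how rule 950 is the single edible rule left in the notebook.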


# Analysis of the resulting association rules

This stage analyzes the association rules produced in the previous step.

For the poisonous mushrooms, the following rules were obtained:

In [115]:
rules_npp

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction,consequent_len
262,(class_p),"(gill-spacing_c, bruises%3F_f)",0.482029,0.435746,0.392418,0.814096,1.868281,0.182376,3.035191,2
300,(class_p),"(ring-number_o, bruises%3F_f)",0.482029,0.54259,0.400788,0.831461,1.532393,0.139244,2.713967,2
917,(class_p),"(gill-spacing_c, ring-number_o)",0.482029,0.795667,0.454948,0.94382,1.1862,0.071414,3.637125,2
1620,(class_p),"(gill-spacing_c, bruises%3F_f, ring-number_o)",0.482029,0.429345,0.387986,0.804903,1.874722,0.18103,2.92498,3


And for the edible mushrooms:

In [116]:
rules_npe

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction,consequent_len
950,(class_e),"(stalk-surface-above-ring_s, gill-size_b)",0.517971,0.442147,0.415559,0.802281,1.814514,0.186539,2.82145,2


Analyzing these results, the generated rules indicate that:

*   A poisonous mushroom tends to have the following characteristics (confidence of about 0.80 for the three together):

      *   *gill-spacing* = close,
      *   *bruises%3F* = no,
      *   *ring-number* = one


Graphical examples:

 ![texto alternativo](https://image.slidesharecdn.com/mushroomtutorial-111116143440-phpapp02/95/mushroom-tutorial-httprjdataminingweeblycom-9-728.jpg?cb=1321475126)


![texto alternativo](https://image.slidesharecdn.com/mushroomtutorial-111116143440-phpapp02/95/mushroom-tutorial-httprjdataminingweeblycom-6-728.jpg?cb=1321475126)

*   An edible mushroom tends to have the following characteristics (confidence of about 0.80):
      *   *stalk-surface-above-ring* = smooth,
      *   *gill-size* = broad


Graphical examples:

![texto alternativo](https://image.slidesharecdn.com/mushroomtutorial-111116143440-phpapp02/95/mushroom-tutorial-httprjdataminingweeblycom-13-728.jpg?cb=1321475126)

![texto alternativo](https://image.slidesharecdn.com/mushroomtutorial-111116143440-phpapp02/95/mushroom-tutorial-httprjdataminingweeblycom-8-728.jpg?cb=1321475126)


# Future use


In the future, the association rules obtained could serve as a guide to avoid problems when choosing mushrooms to collect for later consumption.

The collector would compare each characteristic of a mushroom with the characteristics in each rule and choose according to the intended use.
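A rough sketch of that comparison in plain Python. The dictionary, the function name and the sample are hypothetical; the item names follow the one-hot columns used in this notebook and the two profiles are taken from the final rules:

```python
# hypothetical encoding of the two final rule sets as plain Python data
RULE_PROFILES = {
    "class_p": {"gill-spacing_c", "bruises%3F_f", "ring-number_o"},
    "class_e": {"stalk-surface-above-ring_s", "gill-size_b"},
}

def matching_profiles(sample_items):
    """Return the classes whose rule consequents are all present in the sample."""
    return [label for label, items in RULE_PROFILES.items()
            if items <= sample_items]

# hypothetical observed mushroom, described by its one-hot item names
sample = {"gill-spacing_c", "bruises%3F_f", "ring-number_o", "gill-size_n"}
print(matching_profiles(sample))
```

Because the rules run from class to characteristics with a confidence of about 0.80, a match is only an indication: a sample may match both profiles or neither.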



