<a href="https://colab.research.google.com/github/Zero1401/Supermarket-Sales-Tracker-Mx/blob/main/asociation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Business Case

A beverage company wants to know which products consumers most often buy together with a Coca-Cola brand soft drink. We therefore focus on orders that include a Coca-Cola product or derivative and whose size is above the average number of products purchased.
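The selection rule described above can be sketched without any library. This is only an illustration under stated assumptions: the rows are made up, the function name `select_orders` is hypothetical, "Coca-Cola product or derivative" is read as any brand starting with `COCA-COLA`, and "above the average" is read as more rows than the mean number of rows per order.

```python
from collections import defaultdict
from statistics import mean

# Made-up (pedido, marca) rows; not taken from the dataset.
rows = [
    (1, "COCA-COLA"), (1, "JOYA"), (1, "DEL VALLE"),
    (2, "COCA-COLA LIGHT"), (2, "JOYA"), (2, "JOYA"), (2, "DEL VALLE"),
    (3, "JOYA"),
    (4, "COCA-COLA"),
]

def select_orders(rows, brand_prefix="COCA-COLA"):
    """Return the orders that contain the brand and are larger than average."""
    by_order = defaultdict(list)
    for pedido, marca in rows:
        by_order[pedido].append(marca)

    # Mean number of rows (products) per order.
    avg_size = mean(len(marcas) for marcas in by_order.values())

    return sorted(
        pedido
        for pedido, marcas in by_order.items()
        if len(marcas) > avg_size
        and any(m.startswith(brand_prefix) for m in marcas)
    )

selected = select_orders(rows)
```

Order 4 contains Coca-Cola but is dropped because it has a single product, which is below the average order size.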

### Environment Setup

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from collections import defaultdict
from mlxtend.frequent_patterns import apriori, fpgrowth, association_rules



### Load the DataFrame

In [None]:
df = pd.read_csv('/data/ventas_bebidas.csv', encoding='latin-1')


### Data Cleaning and Transformation

In [None]:
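# Normalize column names to lowercase
df.columns = df.columns.str.lower()

# Drop the columns that are not used in the association analysis
columns_to_drop = ['producto', 'producto_desc', 'monto', 'precio_unidad', 'contador', 'segmento']
df.drop(columns=columns_to_drop, inplace=True)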


In [None]:
# Parse the order date (day-month-two-digit year)
df['fecha'] = pd.to_datetime(df['fecha'], format='%d-%m-%y')

def obtener_estacion(fecha):
    """Map a date to its season label."""
    mes = fecha.month
    dia = fecha.day

    # Winter: December 21 to March 20
    if (mes == 12 and dia >= 21) or mes in [1, 2] or (mes == 3 and dia <= 20):
        return 'Invierno'
    # Spring: March 21 to June 20
    elif (mes == 3 and dia >= 21) or mes in [4, 5] or (mes == 6 and dia <= 20):
        return 'Primavera'
    # Summer: June 21 to September 22
    elif (mes == 6 and dia >= 21) or mes in [7, 8] or (mes == 9 and dia <= 22):
        return 'Verano'
    # Autumn: September 23 to December 20
    elif (mes == 9 and dia >= 23) or mes in [10, 11] or (mes == 12 and dia <= 20):
        return 'Otoño'

df['estacion'] = df['fecha'].apply(obtener_estacion)

# The raw date is no longer needed once the season is derived
df.drop(columns=['fecha'], inplace=True)
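Because `obtener_estacion` has no final `else`, any date outside the four ranges would silently become `None`. The sketch below checks the ranges on every day of one year. It is only a sanity check under assumptions: the function is copied here so the block runs on its own, and 2024 is an arbitrary choice that includes February 29.

```python
from collections import Counter
from datetime import date, timedelta

def obtener_estacion(fecha):
    """Copy of the notebook's season function, repeated so this block is self-contained."""
    mes = fecha.month
    dia = fecha.day
    if (mes == 12 and dia >= 21) or mes in [1, 2] or (mes == 3 and dia <= 20):
        return 'Invierno'
    elif (mes == 3 and dia >= 21) or mes in [4, 5] or (mes == 6 and dia <= 20):
        return 'Primavera'
    elif (mes == 6 and dia >= 21) or mes in [7, 8] or (mes == 9 and dia <= 22):
        return 'Verano'
    elif (mes == 9 and dia >= 23) or mes in [10, 11] or (mes == 12 and dia <= 20):
        return 'Otoño'

# Every calendar day of 2024 (a leap year, so 366 days).
first_day = date(2024, 1, 1)
days = [first_day + timedelta(days=i) for i in range(366)]

# Count how many days fall in each season; a None key would reveal a gap.
season_counts = Counter(obtener_estacion(d) for d in days)
uncovered = [d for d in days if obtener_estacion(d) is None]
```

An empty `uncovered` list means the four ranges cover the whole calendar with no gaps.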


In [None]:
def categorizar_temperatura(temp):
    """Bucket the average temperature into four labels."""
    if temp < 15:
        return 'Frío'
    elif temp < 25:
        return 'Templado'
    elif temp < 35:
        return 'Cálido'
    else:
        return 'Muy Cálido'

df['clima'] = df['prom_temperatura'].apply(categorizar_temperatura)

# Keep only the categorical label
df.drop(columns=['prom_temperatura'], inplace=True)


In [None]:
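def categorizar_unidades(unidades):
    """Bucket the units sold into three volume labels."""
    if unidades <= 50:
        return 'poco'
    # Upper bound only, so fractional values between 50 and 51 do not fall through to 'mucho'
    elif unidades <= 200:
        return 'moderado'
    else:
        return 'mucho'

df['cat_unidades'] = df['unidades'].apply(categorizar_unidades)

# Keep only the categorical label
df.drop(columns=['unidades'], inplace=True)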


In [None]:
# Replace each store identifier with an anonymous label: 'Tienda 1', 'Tienda 2', ...
tienda_cat = {
    tienda: f'Tienda {i}'
    for i, tienda in enumerate(df['tienda'].unique(), start=1)
}

df['tienda'] = df['tienda'].replace(tienda_cat)


### Variable Selection

In [None]:
df.head()


| | pedido | tienda | sabor | marca | segmento_desc | municipio | estacion | clima | cat_unidades |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1000244507 | Tienda 1 | MANZANA | DEL VALLE | EXTRA GRANDE BAJO AUDITABLE | GUADALUPE | Verano | Cálido | poco |
| 1 | 1000242608 | Tienda 1 | MANGO | DEL VALLE | EXTRA GRANDE BAJO AUDITABLE | GUADALUPE | Verano | Cálido | poco |
| 2 | 1000250602 | Tienda 2 | MANGO | DEL VALLE | MEDIANO BAJO AUDITABLE | SAN NICOLAS DE LOS GZA | Verano | Cálido | poco |
| 3 | 1000239117 | Tienda 3 | MANGO | DEL VALLE | GRANDE BAJO AUDITABLE | SAN NICOLAS DE LOS GZA | Verano | Cálido | poco |
| 4 | 1000244507 | Tienda 1 | MANGO | DEL VALLE | EXTRA GRANDE BAJO AUDITABLE | GUADALUPE | Verano | Cálido | poco |


In [None]:
# Combine the brand with each context variable so every row yields one item label
df['marca_estacion'] = df['marca'] + ' - ' + df['estacion']
df['marca_clima'] = df['marca'] + ' - ' + df['clima']
df['marca_cat'] = df['marca'] + ' - ' + df['cat_unidades']
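Each combined column turns a row into a single item label such as `DEL VALLE - Verano`, and grouping those labels by `pedido` gives the transactions that are mined later. The library-free sketch below illustrates this with the five rows shown by `df.head()`; the variable names are illustrative and the notebook itself does this step with `pivot_table`.

```python
from collections import defaultdict

# (pedido, marca, estacion) taken from the five rows printed by df.head().
rows = [
    (1000244507, "DEL VALLE", "Verano"),
    (1000242608, "DEL VALLE", "Verano"),
    (1000250602, "DEL VALLE", "Verano"),
    (1000239117, "DEL VALLE", "Verano"),
    (1000244507, "DEL VALLE", "Verano"),
]

# One set of item labels per order; a set ignores repeated labels.
transactions = defaultdict(set)
for pedido, marca, estacion in rows:
    transactions[pedido].add(f"{marca} - {estacion}")

n_orders = len(transactions)
```

Order 1000244507 appears in two rows with the same label, so it contributes one item, not two. This is the same effect as turning the pivot counts into 0/1 flags with `> 0`.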


## Apriori


### Scenario 1

In [None]:
# Build a one-hot table: one row per order, one column per brand-season label
basket = df.pivot_table(index=['pedido'], columns='marca_estacion', aggfunc='size', fill_value=0)
basket = (basket > 0).astype(int)

# Apply the Apriori algorithm
frequent_itemsets = apriori(basket, min_support=0.21, use_colnames=True)
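The Apriori call keeps only the itemsets whose support (the share of orders containing every item in the set) is at least `min_support`. A minimal, library-free sketch of that level-wise search follows. The function name `apriori_sketch` and the four toy orders are illustrative assumptions; this is not mlxtend's implementation and the data does not come from the notebook.

```python
from itertools import combinations

def apriori_sketch(transactions, min_support):
    """Return {itemset: support} for every itemset that meets min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Share of transactions that contain every item of the itemset.
        return sum(itemset <= t for t in transactions) / n

    def keep_frequent(candidates):
        scored = {c: support(c) for c in candidates}
        return {c: s for c, s in scored.items() if s >= min_support}

    # Level 1: frequent single items.
    current = keep_frequent({frozenset([item]) for t in transactions for item in t})
    frequent = dict(current)

    k = 2
    while current:
        # Join: unions of two frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = keep_frequent(candidates)
        frequent.update(current)
        k += 1
    return frequent

# Toy orders, made up for illustration.
orders = [
    {"COCA-COLA", "JOYA"},
    {"COCA-COLA", "JOYA", "DEL VALLE"},
    {"COCA-COLA"},
    {"DEL VALLE"},
]
frequent = apriori_sketch(orders, min_support=0.5)
```

With a threshold of 0.5, the pair `COCA-COLA` and `JOYA` survives because it appears in two of the four orders, while any pair containing `DEL VALLE` appears in only one order and is discarded.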


In [None]:
# Generate the association rules
rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=0.5)


In [None]:
rules[:]


| | antecedents | consequents | antecedent support | consequent support | support | confidence | lift | leverage | conviction | zhangs_metric |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | (JOYA - Invierno) | (COCA-COLA - Invierno) | 0.223679 | 0.263806 | 0.219309 | 0.980462 | 3.7166 | 0.160301 | 37.679741 | 0.941539 |
| 1 | (COCA-COLA - Invierno) | (JOYA - Invierno) | 0.263806 | 0.223679 | 0.219309 | 0.831325 | 3.7166 | 0.160301 | 4.602475 | 0.992859 |
| 2 | (COCA-COLA - Primavera) | (COCA-COLA LIGHT - Primavera) | 0.275725 | 0.234803 | 0.229241 | 0.831412 | 3.540887 | 0.1645 | 4.538859 | 0.990763 |
| 3 | (COCA-COLA LIGHT - Primavera) | (COCA-COLA - Primavera) | 0.234803 | 0.275725 | 0.229241 | 0.976311 | 3.540887 | 0.1645 | 30.574749 | 0.937778 |
| 4 | (COCA-COLA - Primavera) | (JOYA - Primavera) | 0.275725 | 0.239968 | 0.235598 | 0.854467 | 3.56075 | 0.169433 | 5.222397 | 0.992938 |
| 5 | (JOYA - Primavera) | (COCA-COLA - Primavera) | 0.239968 | 0.275725 | 0.235598 | 0.981788 | 3.56075 | 0.169433 | 39.769278 | 0.946224 |
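The derived columns follow from the three support values in each row. As a worked check, the sketch below recomputes confidence, lift, leverage and conviction for rule 0 (JOYA - Invierno → COCA-COLA - Invierno) using the standard definitions. The function name `rule_metrics` is illustrative, and because the inputs are the rounded figures printed in the table, the results agree with it only up to rounding.

```python
def rule_metrics(support_a, support_c, support_ac):
    """Derive rule metrics from the antecedent, consequent and joint supports."""
    confidence = support_ac / support_a             # P(C | A)
    lift = confidence / support_c                   # > 1 means A and C co-occur more than by chance
    leverage = support_ac - support_a * support_c   # joint support minus the independent expectation
    conviction = (1 - support_c) / (1 - confidence)
    return confidence, lift, leverage, conviction

# Rule 0 of the table: antecedent support, consequent support, support.
confidence, lift, leverage, conviction = rule_metrics(0.223679, 0.263806, 0.219309)
```

A lift of about 3.7 means the two brands appear together in winter orders roughly 3.7 times more often than they would if they were bought independently.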


### Scenario 2

In [None]:
basket = df.pivot_table(index=['pedido'], columns='marca_clima', aggfunc='size', fill_value=0)
basket = (basket > 0).astype(int)

# Apply the Apriori algorithm
frequent_itemsets = apriori(basket, min_support=0.43, use_colnames=True)


In [None]:
frequent_itemsets


| | support | itemsets |
|---|---|---|
| 0 | 0.546683 | (COCA-COLA - Cálido) |
| 1 | 0.477155 | (COCA-COLA LIGHT - Cálido) |
| 2 | 0.490663 | (JOYA - Cálido) |
| 3 | 0.463647 | (COCA-COLA - Cálido, COCA-COLA LIGHT - Cálido) |
| 4 | 0.481128 | (JOYA - Cálido, COCA-COLA - Cálido) |


In [None]:
# Generate the association rules
rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=0.5)


In [None]:
rules


| | antecedents | consequents | antecedent support | consequent support | support | confidence | lift | leverage | conviction | zhangs_metric |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | (COCA-COLA - Cálido) | (COCA-COLA LIGHT - Cálido) | 0.546683 | 0.477155 | 0.463647 | 0.84811 | 1.777431 | 0.202795 | 3.442269 | 0.964865 |
| 1 | (COCA-COLA LIGHT - Cálido) | (COCA-COLA - Cálido) | 0.477155 | 0.546683 | 0.463647 | 0.97169 | 1.777431 | 0.202795 | 16.012772 | 0.836558 |
| 2 | (JOYA - Cálido) | (COCA-COLA - Cálido) | 0.490663 | 0.546683 | 0.481128 | 0.980567 | 1.793668 | 0.212891 | 23.32696 | 0.868744 |
| 3 | (COCA-COLA - Cálido) | (JOYA - Cálido) | 0.546683 | 0.490663 | 0.481128 | 0.880087 | 1.793668 | 0.212891 | 4.247558 | 0.9761 |


### Scenario 3

In [None]:
basket = df.pivot_table(index=['pedido'], columns='marca_cat', aggfunc='size', fill_value=0)
basket = (basket > 0).astype(int)

# Apply the Apriori algorithm
frequent_itemsets = apriori(basket, min_support=0.70, use_colnames=True)


In [None]:
frequent_itemsets


| | support | itemsets |
|---|---|---|
| 0 | 0.921335 | (COCA-COLA - poco) |
| 1 | 0.785856 | (COCA-COLA LIGHT - poco) |
| 2 | 0.811681 | (JOYA - poco) |
| 3 | 0.763607 | (COCA-COLA - poco, COCA-COLA LIGHT - poco) |
| 4 | 0.794597 | (COCA-COLA - poco, JOYA - poco) |


In [None]:
# Generate the association rules
rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=0.5)


In [None]:
rules


| | antecedents | consequents | antecedent support | consequent support | support | confidence | lift | leverage | conviction | zhangs_metric |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | (COCA-COLA - poco) | (COCA-COLA LIGHT - poco) | 0.921335 | 0.785856 | 0.763607 | 0.828806 | 1.054653 | 0.039571 | 1.25088 | 0.658752 |
| 1 | (COCA-COLA LIGHT - poco) | (COCA-COLA - poco) | 0.785856 | 0.921335 | 0.763607 | 0.971689 | 1.054653 | 0.039571 | 2.778563 | 0.241991 |
| 2 | (COCA-COLA - poco) | (JOYA - poco) | 0.921335 | 0.811681 | 0.794597 | 0.862441 | 1.062537 | 0.046767 | 1.369005 | 0.748189 |
| 3 | (JOYA - poco) | (COCA-COLA - poco) | 0.811681 | 0.921335 | 0.794597 | 0.978953 | 1.062537 | 0.046767 | 3.737506 | 0.312535 |


## FP-Growth

### Scenario 1

In [None]:
# Build the binary transaction table
basket2 = df.pivot_table(index=['pedido'], columns='marca_estacion', aggfunc='size', fill_value=0)
basket2 = (basket2 > 0).astype(int)

# Apply the FP-Growth algorithm
frequent_itemsets2 = fpgrowth(basket2, min_support=0.21, use_colnames=True)


In [None]:
rules2 = association_rules(frequent_itemsets2, metric="confidence", min_threshold=0.5)


In [None]:
rules2[:]


| | antecedents | consequents | antecedent support | consequent support | support | confidence | lift | leverage | conviction | zhangs_metric |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | (JOYA - Invierno) | (COCA-COLA - Invierno) | 0.223679 | 0.263806 | 0.219309 | 0.980462 | 3.7166 | 0.160301 | 37.679741 | 0.941539 |
| 1 | (COCA-COLA - Invierno) | (JOYA - Invierno) | 0.263806 | 0.223679 | 0.219309 | 0.831325 | 3.7166 | 0.160301 | 4.602475 | 0.992859 |
| 2 | (COCA-COLA - Primavera) | (JOYA - Primavera) | 0.275725 | 0.239968 | 0.235598 | 0.854467 | 3.56075 | 0.169433 | 5.222397 | 0.992938 |
| 3 | (JOYA - Primavera) | (COCA-COLA - Primavera) | 0.239968 | 0.275725 | 0.235598 | 0.981788 | 3.56075 | 0.169433 | 39.769278 | 0.946224 |
| 4 | (COCA-COLA - Primavera) | (COCA-COLA LIGHT - Primavera) | 0.275725 | 0.234803 | 0.229241 | 0.831412 | 3.540887 | 0.1645 | 4.538859 | 0.990763 |
| 5 | (COCA-COLA LIGHT - Primavera) | (COCA-COLA - Primavera) | 0.234803 | 0.275725 | 0.229241 | 0.976311 | 3.540887 | 0.1645 | 30.574749 | 0.937778 |


### Scenario 2

In [None]:
basket2 = df.pivot_table(index=['pedido'], columns='marca_clima', aggfunc='size', fill_value=0)
basket2 = (basket2 > 0).astype(int)

# Apply the FP-Growth algorithm
frequent_itemsets2 = fpgrowth(basket2, min_support=0.43, use_colnames=True)


In [None]:
frequent_itemsets2


| | support | itemsets |
|---|---|---|
| 0 | 0.546683 | (COCA-COLA - Cálido) |
| 1 | 0.490663 | (JOYA - Cálido) |
| 2 | 0.477155 | (COCA-COLA LIGHT - Cálido) |
| 3 | 0.481128 | (COCA-COLA - Cálido, JOYA - Cálido) |
| 4 | 0.463647 | (COCA-COLA - Cálido, COCA-COLA LIGHT - Cálido) |


In [None]:
rules2 = association_rules(frequent_itemsets2, metric="confidence", min_threshold=0.5)


In [None]:
rules2[:]


| | antecedents | consequents | antecedent support | consequent support | support | confidence | lift | leverage | conviction | zhangs_metric |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | (COCA-COLA - Cálido) | (JOYA - Cálido) | 0.546683 | 0.490663 | 0.481128 | 0.880087 | 1.793668 | 0.212891 | 4.247558 | 0.9761 |
| 1 | (JOYA - Cálido) | (COCA-COLA - Cálido) | 0.490663 | 0.546683 | 0.481128 | 0.980567 | 1.793668 | 0.212891 | 23.32696 | 0.868744 |
| 2 | (COCA-COLA - Cálido) | (COCA-COLA LIGHT - Cálido) | 0.546683 | 0.477155 | 0.463647 | 0.84811 | 1.777431 | 0.202795 | 3.442269 | 0.964865 |
| 3 | (COCA-COLA LIGHT - Cálido) | (COCA-COLA - Cálido) | 0.477155 | 0.546683 | 0.463647 | 0.97169 | 1.777431 | 0.202795 | 16.012772 | 0.836558 |


### Scenario 3

In [None]:
basket2 = df.pivot_table(index=['pedido'], columns='marca_cat', aggfunc='size', fill_value=0)
basket2 = (basket2 > 0).astype(int)

# Apply the FP-Growth algorithm
frequent_itemsets2 = fpgrowth(basket2, min_support=0.70, use_colnames=True)


In [None]:
frequent_itemsets2


| | support | itemsets |
|---|---|---|
| 0 | 0.921335 | (COCA-COLA - poco) |
| 1 | 0.785856 | (COCA-COLA LIGHT - poco) |
| 2 | 0.811681 | (JOYA - poco) |
| 3 | 0.763607 | (COCA-COLA - poco, COCA-COLA LIGHT - poco) |
| 4 | 0.794597 | (COCA-COLA - poco, JOYA - poco) |



In [None]:
rules2 = association_rules(frequent_itemsets2, metric="confidence", min_threshold=0.5)


In [None]:
rules2[:]


| | antecedents | consequents | antecedent support | consequent support | support | confidence | lift | leverage | conviction | zhangs_metric |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | (COCA-COLA - poco) | (COCA-COLA LIGHT - poco) | 0.921335 | 0.785856 | 0.763607 | 0.828806 | 1.054653 | 0.039571 | 1.25088 | 0.658752 |
| 1 | (COCA-COLA LIGHT - poco) | (COCA-COLA - poco) | 0.785856 | 0.921335 | 0.763607 | 0.971689 | 1.054653 | 0.039571 | 2.778563 | 0.241991 |
| 2 | (COCA-COLA - poco) | (JOYA - poco) | 0.921335 | 0.811681 | 0.794597 | 0.862441 | 1.062537 | 0.046767 | 1.369005 | 0.748189 |
| 3 | (JOYA - poco) | (COCA-COLA - poco) | 0.811681 | 0.921335 | 0.794597 | 0.978953 | 1.062537 | 0.046767 | 3.737506 | 0.312535 |


## Model Evaluation

Both algorithms return the same frequent itemsets and the same rules. The only differences are in the order of the rows; the value of every metric for a given rule is the same. Both algorithms therefore give reliable results, and neither is better than the other for this particular case.
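The order-insensitive comparison can be made explicit. The sketch below indexes each rule by its (antecedent, consequent) pair and compares the support, confidence and lift printed for Scenario 2 by both algorithms. The helper names `as_lookup` and `same_rules` are illustrative, and only three of the metric columns are included to keep the example short.

```python
# (antecedents, consequents, support, confidence, lift) from the Scenario 2 outputs.
apriori_rules = [
    ({"COCA-COLA - Cálido"}, {"COCA-COLA LIGHT - Cálido"}, 0.463647, 0.84811, 1.777431),
    ({"COCA-COLA LIGHT - Cálido"}, {"COCA-COLA - Cálido"}, 0.463647, 0.97169, 1.777431),
    ({"JOYA - Cálido"}, {"COCA-COLA - Cálido"}, 0.481128, 0.980567, 1.793668),
    ({"COCA-COLA - Cálido"}, {"JOYA - Cálido"}, 0.481128, 0.880087, 1.793668),
]
fpgrowth_rules = [
    ({"COCA-COLA - Cálido"}, {"JOYA - Cálido"}, 0.481128, 0.880087, 1.793668),
    ({"JOYA - Cálido"}, {"COCA-COLA - Cálido"}, 0.481128, 0.980567, 1.793668),
    ({"COCA-COLA - Cálido"}, {"COCA-COLA LIGHT - Cálido"}, 0.463647, 0.84811, 1.777431),
    ({"COCA-COLA LIGHT - Cálido"}, {"COCA-COLA - Cálido"}, 0.463647, 0.97169, 1.777431),
]

def as_lookup(rules):
    """Index the metrics by (antecedents, consequents) so row order no longer matters."""
    return {(frozenset(a), frozenset(c)): metrics for a, c, *metrics in rules}

def same_rules(left, right, tol=1e-9):
    """True when both rule lists hold the same rules with the same metric values."""
    left, right = as_lookup(left), as_lookup(right)
    if left.keys() != right.keys():
        return False
    return all(
        abs(x - y) <= tol
        for key in left
        for x, y in zip(left[key], right[key])
    )

identical = same_rules(apriori_rules, fpgrowth_rules)
```

The check passes for these rows, which matches the reading that the two outputs differ only in row order.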

## Evaluation of Results

The results indicate that buyers consume more in cold seasons such as winter and in warm seasons such as spring, when drinks help them cope with the temperature. We should therefore focus on the warm seasons, when the heat is fought with liquids, and on winter, when the number of festivities raises consumption.