**Scan the entire dataset to find associations between products**

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
# Display settings
pd.set_option("display.max_columns", None)
sns.set(style="whitegrid")

from mlxtend.frequent_patterns import apriori
from mlxtend.frequent_patterns import association_rules
from mlxtend.frequent_patterns import fpgrowth


In [21]:
order_products_total = pd.read_csv(
    r"C:\Users\olive\OneDrive\Documentos\Dio\Portfólio\analise-compras\data\raw\order_products_total.csv"
)
products = pd.read_csv(
    r"C:\Users\olive\OneDrive\Documentos\Dio\Portfólio\analise-compras\data\raw\products.csv"
)

In [19]:
order_products_total.head()

Unnamed: 0,order_id,product_id,add_to_cart_order,reordered
0,2,33120,1,1
1,2,28985,2,1
2,2,9327,3,0
3,2,45918,4,1
4,2,30035,5,0


In [20]:
order_products_total.product_id.nunique()

49685

Of the 49,685 distinct products, only the 100 most frequent are kept.

**Popularity analysis (volume ranking)**

The main goal is to identify the platform's best sellers, that is, the products that appear most often in customers' carts.

frequency = number of order lines containing the product, used here as a proxy for sales volume

In [22]:
product_counts = order_products_total.groupby('product_id')['order_id'].count().reset_index().rename(columns = {'order_id':'frequency'})
product_counts = product_counts.sort_values('frequency', ascending=False)[0:100].reset_index(drop = True)
product_counts = product_counts.merge(products, on = 'product_id', how = 'left')
product_counts.head(10)

Unnamed: 0,product_id,frequency,product_name,aisle_id,department_id
0,24852,491291,Banana,24,4
1,13176,394930,Bag of Organic Bananas,24,4
2,21137,275577,Organic Strawberries,24,4
3,21903,251705,Organic Baby Spinach,123,4
4,47209,220877,Organic Hass Avocado,24,4
5,47766,184224,Organic Avocado,24,4
6,47626,160792,Large Lemon,24,4
7,16797,149445,Strawberries,24,4
8,26209,146660,Limes,24,4
9,27845,142813,Organic Whole Milk,84,16
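As a side note, the same kind of ranking can be sketched with `value_counts`, which counts and sorts in a single step. The snippet below uses a small made-up frame (`toy` is a hypothetical stand-in for `order_products_total`), and it matches the `groupby(...).count()` approach only under the assumption that `order_id` has no missing values.

```python
import pandas as pd

# Toy order lines (illustrative data, not taken from the dataset).
toy = pd.DataFrame({
    "order_id":   [1, 1, 2, 2, 3, 3],
    "product_id": [24852, 13176, 24852, 13176, 24852, 21137],
})

# Count rows per product, keep the top 2, and name the columns explicitly
# so the result does not depend on the pandas version's default names.
top = (
    toy["product_id"]
    .value_counts()
    .head(2)
    .rename_axis("product_id")
    .reset_index(name="frequency")
)

print(top)
```

On the full data, `head(2)` would become `head(100)`, followed by the same merge with `products`.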


Keeping only the 100 most frequent items in the order_products_total dataframe

In [23]:
freq_products = list(product_counts.product_id)
freq_products[1:10]

[13176, 21137, 21903, 47209, 47766, 47626, 16797, 26209, 27845]

Confirming that the list contains the 100 most frequent products

In [24]:
len(freq_products)

100

Creating the dataframe with the 100 most frequent products.
To keep the affinity analysis computationally feasible, we apply an inclusion filter based on the list of frequent items.

In [29]:
order_products_total = order_products_total[order_products_total.product_id.isin(freq_products)]
order_products_total.shape

(7795471, 4)

In [30]:
order_products_total.order_id.nunique()

2444982

In [31]:
order_products_total = order_products_total.merge(products, on = 'product_id', how='left')
order_products_total.head()

Unnamed: 0,order_id,product_id,add_to_cart_order,reordered,product_name,aisle_id,department_id
0,2,28985,2,1,Michigan Organic Kale,83,4
1,2,17794,6,1,Carrots,83,4
2,3,24838,2,1,Unsweetened Almondmilk,91,16
3,3,21903,4,1,Organic Baby Spinach,123,4
4,3,46667,6,1,Organic Ginger Root,83,4


Structuring the data to feed the algorithm.
The strategic pivot (unstack)

For the Market Basket Analysis, the transactional data is reshaped into an incidence matrix. The unstack operation turns the sales log into a wide DataFrame with one row per order and one column per product, where each cell records in binary form whether the product appears in the order.

In [32]:
basket = order_products_total.groupby(['order_id', 'product_name'])['reordered'].count().unstack().reset_index().fillna(0).set_index('order_id')
basket.head()

order_id,100% Raw Coconut Water,100% Whole Wheat Bread,2% Reduced Fat Milk,Apple Honeycrisp Organic,Asparagus,Bag of Organic Bananas,Banana,Bartlett Pears,Blueberries,Boneless Skinless Chicken Breasts,Broccoli Crown,Bunched Cilantro,Carrots,"Clementines, Bag",Cucumber Kirby,Extra Virgin Olive Oil,Fresh Cauliflower,Garlic,Granny Smith Apples,Grape White/Green Seedless,Grated Parmesan,Green Bell Pepper,Half & Half,Hass Avocados,Honeycrisp Apple,Jalapeno Peppers,Large Alfresco Eggs,Large Lemon,Lime Sparkling Water,Limes,Michigan Organic Kale,Orange Bell Pepper,Organic Avocado,Organic Baby Arugula,Organic Baby Carrots,Organic Baby Spinach,Organic Bartlett Pear,Organic Black Beans,Organic Blackberries,Organic Blueberries,Organic Broccoli,Organic Broccoli Florets,Organic Carrot Bunch,Organic Cilantro,Organic Cucumber,Organic D'Anjou Pears,Organic Fuji Apple,Organic Gala Apples,Organic Garlic,Organic Garnet Sweet Potato (Yam),Organic Ginger Root,Organic Grade A Free Range Large Brown Eggs,Organic Granny Smith Apple,Organic Grape Tomatoes,Organic Half & Half,Organic Hass Avocado,Organic Italian Parsley Bunch,Organic Kiwi,Organic Lacinato (Dinosaur) Kale,Organic Large Extra Fancy Fuji Apple,Organic Lemon,Organic Navel Orange,Organic Peeled Whole Baby Carrots,Organic Raspberries,Organic Red Bell Pepper,Organic Red Onion,Organic Reduced Fat 2% Milk,Organic Reduced Fat Milk,Organic Romaine Lettuce,Organic Small Bunch Celery,Organic Sticks Low Moisture Part Skim Mozzarella String Cheese,Organic Strawberries,Organic Tomato Cluster,Organic Unsweetened Almond Milk,Organic Whole Milk,Organic Whole String Cheese,Organic Yellow Onion,Organic Zucchini,Original Hummus,Pure Irish Butter,Raspberries,Red Onion,Red Peppers,Red Vine Tomato,Roma Tomato,Seedless Red Grapes,Shredded Parmesan,Small Hass Avocado,Soda,Sparkling Lemon Water,Sparkling Natural Mineral Water,Sparkling Water Grapefruit,Spring Water,Strawberries,Uncured Genoa Salami,Unsalted Butter,Unsweetened Almondmilk,Unsweetened Original Almond Breeze Almond Milk,Whole Milk,Yellow Onions
1,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0
5,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
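To make the reshaping step easier to follow, here is a minimal sketch of the same `groupby` / `count` / `unstack` pattern on a tiny made-up frame. The names `lines`, `toy_basket` and `toy_crosstab` are hypothetical, and `pd.crosstab` is shown only as one possible alternative that builds the same order-by-product count table.

```python
import pandas as pd

# Toy order lines (illustrative data, not taken from the dataset).
lines = pd.DataFrame({
    "order_id":     [1, 1, 2, 3, 3, 3],
    "product_name": ["Banana", "Limes", "Banana", "Limes", "Large Lemon", "Banana"],
    "reordered":    [1, 0, 1, 0, 0, 1],
})

# Same pattern as the cell above: count rows per (order, product),
# then pivot the product level into columns, filling absent pairs with 0.
toy_basket = (
    lines.groupby(["order_id", "product_name"])["reordered"]
    .count()
    .unstack(fill_value=0)
)

# pd.crosstab counts co-occurrences of the two columns directly.
toy_crosstab = pd.crosstab(lines["order_id"], lines["product_name"])

print(toy_basket)
```

Passing `fill_value=0` to `unstack` keeps the cells as integers, so a separate `fillna(0)` is not needed in this variant.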


In [34]:
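def encode_units(x):
    # Collapse any positive count to 1 and everything else to 0.
    return 1 if x >= 1 else 0

# Element-wise DataFrame.map requires pandas >= 2.1 (older versions use applymap).
basket = basket.map(encode_units)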
basket.head()

order_id,100% Raw Coconut Water,100% Whole Wheat Bread,2% Reduced Fat Milk,Apple Honeycrisp Organic,Asparagus,Bag of Organic Bananas,Banana,Bartlett Pears,Blueberries,Boneless Skinless Chicken Breasts,Broccoli Crown,Bunched Cilantro,Carrots,"Clementines, Bag",Cucumber Kirby,Extra Virgin Olive Oil,Fresh Cauliflower,Garlic,Granny Smith Apples,Grape White/Green Seedless,Grated Parmesan,Green Bell Pepper,Half & Half,Hass Avocados,Honeycrisp Apple,Jalapeno Peppers,Large Alfresco Eggs,Large Lemon,Lime Sparkling Water,Limes,Michigan Organic Kale,Orange Bell Pepper,Organic Avocado,Organic Baby Arugula,Organic Baby Carrots,Organic Baby Spinach,Organic Bartlett Pear,Organic Black Beans,Organic Blackberries,Organic Blueberries,Organic Broccoli,Organic Broccoli Florets,Organic Carrot Bunch,Organic Cilantro,Organic Cucumber,Organic D'Anjou Pears,Organic Fuji Apple,Organic Gala Apples,Organic Garlic,Organic Garnet Sweet Potato (Yam),Organic Ginger Root,Organic Grade A Free Range Large Brown Eggs,Organic Granny Smith Apple,Organic Grape Tomatoes,Organic Half & Half,Organic Hass Avocado,Organic Italian Parsley Bunch,Organic Kiwi,Organic Lacinato (Dinosaur) Kale,Organic Large Extra Fancy Fuji Apple,Organic Lemon,Organic Navel Orange,Organic Peeled Whole Baby Carrots,Organic Raspberries,Organic Red Bell Pepper,Organic Red Onion,Organic Reduced Fat 2% Milk,Organic Reduced Fat Milk,Organic Romaine Lettuce,Organic Small Bunch Celery,Organic Sticks Low Moisture Part Skim Mozzarella String Cheese,Organic Strawberries,Organic Tomato Cluster,Organic Unsweetened Almond Milk,Organic Whole Milk,Organic Whole String Cheese,Organic Yellow Onion,Organic Zucchini,Original Hummus,Pure Irish Butter,Raspberries,Red Onion,Red Peppers,Red Vine Tomato,Roma Tomato,Seedless Red Grapes,Shredded Parmesan,Small Hass Avocado,Soda,Sparkling Lemon Water,Sparkling Natural Mineral Water,Sparkling Water Grapefruit,Spring Water,Strawberries,Uncured Genoa Salami,Unsalted Butter,Unsweetened Almondmilk,Unsweetened Original Almond Breeze Almond Milk,Whole Milk,Yellow Onions
1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0
5,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
9,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [35]:
basket.size

244498200

In [36]:
basket.shape

(2444982, 100)
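With roughly 244 million cells, the element type of the matrix matters for memory. The sketch below, on a made-up three-order frame (`toy` and `toy_bool` are hypothetical names), compares 64-bit integers with booleans and then scales the per-cell cost to the shape reported above. It assumes the 0/1 cells are stored as `int64`; the actual dtype of `basket` is not shown in the notebook. As far as I know, mlxtend also recommends boolean input for `apriori` and warns on non-boolean frames.

```python
import pandas as pd

# Small stand-in for `basket` (illustrative values): 0/1 integers per order and product.
toy = pd.DataFrame(
    {"Banana": [1, 0, 1], "Limes": [0, 1, 1]},
    index=pd.Index([1, 2, 3], name="order_id"),
    dtype="int64",
)

# Boolean cells carry the same information in 1 byte instead of 8.
toy_bool = toy.astype(bool)

int_bytes = toy.to_numpy().nbytes
bool_bytes = toy_bool.to_numpy().nbytes
print(int_bytes, bool_bytes)

# Scale the per-cell cost to the shape reported above: 2,444,982 orders x 100 products.
n_cells = 2_444_982 * 100
gb_int64 = n_cells * 8 / 1e9
gb_bool = n_cells * 1 / 1e9
print(f"{gb_int64:.2f} GB as int64, {gb_bool:.2f} GB as bool")
```

If the assumption holds, `basket.astype(bool)` before calling `apriori` would be one way to cut the footprint of the matrix by about a factor of eight.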

Creating frequent itemsets and rules

In [37]:
frequent_items = apriori(basket, min_support=0.01, use_colnames=True, low_memory=True)
frequent_items.head()



Unnamed: 0,support,itemsets
0,0.016062,frozenset({100% Raw Coconut Water})
1,0.025814,frozenset({100% Whole Wheat Bread})
2,0.0158,frozenset({2% Reduced Fat Milk})
3,0.035694,frozenset({Apple Honeycrisp Organic})
4,0.029101,frozenset({Asparagus})


In [38]:
frequent_items.tail()

Unnamed: 0,support,itemsets
124,0.010235,"frozenset({Organic Strawberries, Organic Blueb..."
125,0.010966,"frozenset({Organic Hass Avocado, Organic Raspb..."
126,0.017314,"frozenset({Organic Strawberries, Organic Hass ..."
127,0.014533,"frozenset({Organic Strawberries, Organic Raspb..."
128,0.01013,"frozenset({Organic Whole Milk, Organic Strawbe..."


In [39]:
frequent_items.shape

(129, 2)

Filtering the strongest connections between products, which finally lets us answer Marketing's question: "What do customers buy along with Bananas?"

In [40]:
rules = association_rules(frequent_items, metric="lift", min_threshold=1)
rules.sort_values('lift', ascending=False)

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,representativity,leverage,conviction,zhangs_metric,jaccard,certainty,kulczynski
35,frozenset({Large Lemon}),frozenset({Limes}),0.065764,0.059984,0.01186,0.180345,3.006544,1.0,0.007915,1.146843,0.714372,0.104139,0.128041,0.189034
34,frozenset({Limes}),frozenset({Large Lemon}),0.059984,0.065764,0.01186,0.197723,3.006544,1.0,0.007915,1.16448,0.70998,0.104139,0.141248,0.189034
52,frozenset({Organic Strawberries}),frozenset({Organic Raspberries}),0.112711,0.058325,0.014533,0.12894,2.210731,1.0,0.007959,1.081069,0.61723,0.092861,0.074989,0.189057
53,frozenset({Organic Raspberries}),frozenset({Organic Strawberries}),0.058325,0.112711,0.014533,0.249174,2.210731,1.0,0.007959,1.181751,0.581582,0.092861,0.153798,0.189057
36,frozenset({Organic Avocado}),frozenset({Large Lemon}),0.075348,0.065764,0.010538,0.139862,2.126728,1.0,0.005583,1.086147,0.572966,0.080708,0.079314,0.150053
37,frozenset({Large Lemon}),frozenset({Organic Avocado}),0.065764,0.075348,0.010538,0.160244,2.126728,1.0,0.005583,1.101097,0.567088,0.080708,0.091815,0.150053
46,frozenset({Organic Strawberries}),frozenset({Organic Blueberries}),0.112711,0.042956,0.010235,0.090809,2.114024,1.0,0.005394,1.052633,0.593909,0.070378,0.050002,0.164542
47,frozenset({Organic Blueberries}),frozenset({Organic Strawberries}),0.042956,0.112711,0.010235,0.238274,2.114024,1.0,0.005394,1.16484,0.550621,0.070378,0.141513,0.164542
49,frozenset({Organic Raspberries}),frozenset({Organic Hass Avocado}),0.058325,0.090339,0.010966,0.188018,2.081257,1.0,0.005697,1.120298,0.551699,0.079639,0.10738,0.154704
48,frozenset({Organic Hass Avocado}),frozenset({Organic Raspberries}),0.090339,0.058325,0.010966,0.121389,2.081257,1.0,0.005697,1.071777,0.571115,0.079639,0.06697,0.154704
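To read the table, it helps to recompute a few metrics by hand. The sketch below takes the three support values from the first row (Large Lemon -> Limes) and applies the standard definitions of confidence, lift and leverage. The inputs are the rounded values printed above, so the results agree with the table only up to rounding.

```python
# Values copied from the first row of the table above (Large Lemon -> Limes).
antecedent_support = 0.065764   # share of orders containing Large Lemon
consequent_support = 0.059984   # share of orders containing Limes
support = 0.01186               # share of orders containing both

# confidence: share of Large Lemon orders that also contain Limes
confidence = support / antecedent_support

# lift: how much more often the pair occurs than if the two items were independent
lift = confidence / consequent_support

# leverage: observed joint support minus the joint support expected under independence
leverage = support - antecedent_support * consequent_support

print(round(confidence, 4), round(lift, 3), round(leverage, 4))
```

A lift of about 3 means the two items appear together roughly three times as often as independence would predict.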


**A simpler option**

In [None]:
# Note: df_super is not defined in this notebook; it is assumed to hold one row
# per order line with order_id and product_name columns.

# 1. Identify the orders that contain bananas
ordens_com_banana = df_super[df_super['product_name'].str.contains('Banana', na=False)]['order_id'].unique()

# 2. Filter the original dataframe to get every item in those orders,
# but exclude the banana itself to see what comes along with it
df_acompanhamentos = df_super[df_super['order_id'].isin(ordens_com_banana)]
df_acompanhamentos = df_acompanhamentos[~df_acompanhamentos['product_name'].str.contains('Banana', na=False)]

# 3. Count how often each companion product appears
top_acompanhantes = df_acompanhamentos['product_name'].value_counts().head(10).reset_index()
top_acompanhantes.columns = ['produto_acompanhante', 'frequencia']

print("Customers who buy Banana also buy:")
print(top_acompanhantes)

**Frequency count**

In [None]:
# Note: df is not defined in this notebook; it is assumed to hold one row
# per order line with order_id and product_name columns.

# 1. Define the product names that count as "Banana" (including the organic variant)
termos_banana = ['Banana', 'Bag of Organic Bananas']

# 2. Identify the IDs of the orders that contain these products
pedidos_com_banana = df[df['product_name'].isin(termos_banana)]['order_id'].unique()

# 3. Filter the DataFrame to get the basket "companions"
df_acompanhantes = df[df['order_id'].isin(pedidos_com_banana)]
df_acompanhantes = df_acompanhantes[~df_acompanhantes['product_name'].isin(termos_banana)]

# 4. Rank the 10 most frequent products in these orders
ranking_afinidade = df_acompanhantes['product_name'].value_counts().head(10)

print("Products with the highest affinity with Bananas:")
print(ranking_afinidade)
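Raw counts like the ones above tend to favor products that are popular in every basket. One way to correct for that, sketched here on made-up order lines, is to divide each companion's share of banana orders by its share of all orders, which is the lift of the rule Banana -> product. The names `df_toy`, `target` and `ranking` are hypothetical, and the snippet uses pandas only.

```python
import pandas as pd

# Toy order lines (illustrative data, not taken from the dataset).
df_toy = pd.DataFrame({
    "order_id":     [1, 1, 2, 2, 3, 3, 4, 4, 5],
    "product_name": ["Banana", "Organic Whole Milk", "Banana", "Limes",
                     "Organic Whole Milk", "Limes", "Banana", "Organic Whole Milk", "Limes"],
})
target = "Banana"

n_orders = df_toy["order_id"].nunique()
orders_with_target = df_toy.loc[df_toy["product_name"] == target, "order_id"].unique()

# Support: share of all orders that contain each product.
support_all = df_toy.groupby("product_name")["order_id"].nunique() / n_orders

# Confidence of target -> product: share of target orders that contain each other product.
companions = df_toy[
    df_toy["order_id"].isin(orders_with_target) & (df_toy["product_name"] != target)
]
confidence = companions.groupby("product_name")["order_id"].nunique() / len(orders_with_target)

# Lift = confidence / support; the target itself has no confidence value and is dropped.
ranking = (confidence / support_all).dropna().sort_values(ascending=False).rename("lift")
print(ranking)
```

Values above 1 indicate products bought with the target more often than their overall popularity would suggest; values below 1 indicate the opposite.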