# 1.Contexto do Problema

Durante a pandemia os números de vendas da Razzle Dazzle (e-commerce de variedades) dispararam, bem como o da concorrência. Uma das maneiras de se destacar nesse mercado cada vez mais competitivo é oferecer o produto certo para a pessoa certa.

Então, você foi contratado desenvolver pelo menos um modelo de recomendação, que será disponibilizado no novo site da empresa. Além do modelo, é necessário informar qual o melhor momento para usá-lo, em propagandas, quando o cliente estiver fazendo uma pesquisa ou quando o cliente estiver vendo um produto. É fundamental que isso seja pensado no desenvolvimento do modelo.

# 2. Pacotes Python

In [115]:
import pandas  as pd
import numpy   as np
import seaborn as sns
from google.colab import drive
from sklearn.metrics.pairwise import cosine_similarity
import random

In [116]:
!pip freeze > /content/gdrive/MyDrive/Colab Notebooks/Day7/requirements.txt

In [117]:
drive.mount('/content/gdrive', force_remount=True)

Mounted at /content/gdrive


# 3. Leitura Data Sets

In [118]:
link1 = '/content/gdrive/MyDrive/Colab Notebooks/Day7/DNC_order_items_dataset.csv'
df1 = pd.read_csv(link1,index_col='Unnamed: 0')

In [119]:
df1.head()

Unnamed: 0,order_id,order_item_id,product_id,price
0,00010242fe8c5a6d1ba2dd792cb16214,1,4244733e06e7ecb4970a6e2683c13e61,58.9
1,00018f77f2f0320c557190d7a144bdd3,1,e5f2d52b802189ee658865ca93d83a8f,239.9
2,000229ec398224ef6ca0657da4fc703e,1,c777355d18b72b67abbeef9df44fd0fd,199.0
3,00024acbcdf0a6daa1e931b038114c75,1,7634da152a4610f1595efa32f14722fc,12.99
4,00042b26cf59d7ce69dfabb4e55b4fd9,1,ac6c3623068f30de03045865e4e10089,199.9


In [120]:
link2 = '/content/gdrive/MyDrive/Colab Notebooks/Day7/DNC_order_reviews_dataset.csv'
df2 = pd.read_csv(link2,index_col='Unnamed: 0')

In [121]:
df2.head()

Unnamed: 0,review_id,order_id,review_score
0,7bc2406110b926393aa56f80a40eba40,73fc7af87114b39712e6da79b0a377eb,4
1,80e641a11e56f04c1ad469d5645fdfde,a548910a1c6147796b98fdf73dbeba33,5
2,228ce5500dc1d8e020d8d1322874b6f0,f9e4b658b201a9f2ecdecbb34bed034b,5
3,e64fb393e7b32834bb789ff8bb30750e,658677c97b385a9be170737859d3511b,5
4,f7c4243c7fe1938f181bec41a392bdeb,8e6bfb81e283fa7e4f11123a3fb894f1,5


In [122]:
link3 = '/content/gdrive/MyDrive/Colab Notebooks/Day7/DNC_orders_dataset.csv'
df3 = pd.read_csv(link3,index_col='Unnamed: 0')

In [123]:
df3.head()

Unnamed: 0,order_id,customer_id,order_status
0,e481f51cbdc54678b7cc49136f2d6af7,9ef432eb6251297304e76186b10a928d,delivered
1,53cdb2fc8bc7dce0b6741e2150273451,b0830fb4747a6c6d20dea0b8c802d7ef,delivered
2,47770eb9100c2d0c44946d9cf07ec65d,41ce2a54c0b03bf3443c3d931a367089,delivered
3,949d5b44dbf5de918fe9c16f97b45f8a,f88197465ea7920adcdbec7375364d82,delivered
4,ad21c59c0840e6cb83a9ceb5573f8159,8ab97904e6daea8866dbdbc4fb7aad2c,delivered


In [124]:
link4 = '/content/gdrive/MyDrive/Colab Notebooks/Day7/DNC_products_dataset.csv'
df4 = pd.read_csv(link4,index_col='Unnamed: 0')

In [125]:
df4.head()

Unnamed: 0,product_id,product_category_name,product_name_lenght,product_description_lenght,product_weight_g,product_length_cm,product_height_cm,product_width_cm
0,1e9e8ef04dbcff4541ed26657ea517e5,perfumaria,40.0,287.0,225.0,16.0,10.0,14.0
1,3aa071139cb16b67ca9e5dea641aaa2f,artes,44.0,276.0,1000.0,30.0,18.0,20.0
2,96bd76ec8810374ed1b65e291975717f,esporte_lazer,46.0,250.0,154.0,18.0,9.0,15.0
3,cef67bcfe19066a932b7673e239eb23d,bebes,27.0,261.0,371.0,26.0,4.0,26.0
4,9dc1a7de274444849c219cff195d0b71,utilidades_domesticas,37.0,402.0,625.0,20.0,17.0,13.0


# 4. Concatenando Data Sets

In [126]:
df_concat = df1.merge(df2, on=['order_id'], how='inner')
df_concat = df_concat.merge(df3, on=['order_id'], how='inner')
df_concat = df_concat.merge(df4, on=['product_id'], how='inner')
df_concat.head()

Unnamed: 0,order_id,order_item_id,product_id,price,review_id,review_score,customer_id,order_status,product_category_name,product_name_lenght,product_description_lenght,product_weight_g,product_length_cm,product_height_cm,product_width_cm
0,00010242fe8c5a6d1ba2dd792cb16214,1,4244733e06e7ecb4970a6e2683c13e61,58.9,97ca439bc427b48bc1cd7177abe71365,5,3ce436f183e68e07877b285a838db11a,delivered,cool_stuff,58.0,598.0,650.0,28.0,9.0,14.0
1,130898c0987d1801452a8ed92a670612,1,4244733e06e7ecb4970a6e2683c13e61,55.9,b11cba360bbe71410c291b764753d37f,5,e6eecc5a77de221464d1c4eaff0a9b64,delivered,cool_stuff,58.0,598.0,650.0,28.0,9.0,14.0
2,532ed5e14e24ae1f0d735b91524b98b9,1,4244733e06e7ecb4970a6e2683c13e61,64.9,af01c4017c5ab46df6cc810e069e654a,4,4ef55bf80f711b372afebcb7c715344a,delivered,cool_stuff,58.0,598.0,650.0,28.0,9.0,14.0
3,6f8c31653edb8c83e1a739408b5ff750,1,4244733e06e7ecb4970a6e2683c13e61,58.9,8304ff37d8b16b57086fa283fe0c44f8,5,30407a72ad8b3f4df4d15369126b20c9,delivered,cool_stuff,58.0,598.0,650.0,28.0,9.0,14.0
4,7d19f4ef4d04461989632411b7e588b9,1,4244733e06e7ecb4970a6e2683c13e61,58.9,426f43a82185969503fb3c86241a9535,5,91a792fef70ecd8cc69d3c7feb3d12da,delivered,cool_stuff,58.0,598.0,650.0,28.0,9.0,14.0


# 5. Análise e Tratamento do Data Set Concatenado

In [127]:
df_concat.shape

(112372, 15)

In [128]:
df_concat.dtypes

order_id                       object
order_item_id                   int64
product_id                     object
price                         float64
review_id                      object
review_score                    int64
customer_id                    object
order_status                   object
product_category_name          object
product_name_lenght           float64
product_description_lenght    float64
product_weight_g              float64
product_length_cm             float64
product_height_cm             float64
product_width_cm              float64
dtype: object

In [129]:
df_concat['order_id'] = df_concat['order_id'].astype('string')
df_concat['customer_id'] = df_concat['customer_id'].astype('string')
df_concat['order_status'] = df_concat['order_status'].astype('string')
df_concat['product_category_name'] = df_concat['product_category_name'].astype('string')
df_concat['product_id'] = df_concat['product_id'].astype('string')
df_concat['review_id'] = df_concat['review_id'].astype('string')

In [130]:
df_concat.dtypes

order_id                       string
order_item_id                   int64
product_id                     string
price                         float64
review_id                      string
review_score                    int64
customer_id                    string
order_status                   string
product_category_name          string
product_name_lenght           float64
product_description_lenght    float64
product_weight_g              float64
product_length_cm             float64
product_height_cm             float64
product_width_cm              float64
dtype: object

In [131]:
df_concat.isna().sum()

order_id                         0
order_item_id                    0
product_id                       0
price                            0
review_id                        0
review_score                     0
customer_id                      0
order_status                     0
product_category_name         1598
product_name_lenght           1598
product_description_lenght    1598
product_weight_g                18
product_length_cm               18
product_height_cm               18
product_width_cm                18
dtype: int64

In [132]:
df_concat.dropna(how='any',axis=0,inplace=True)

In [133]:
df_concat.drop_duplicates(subset ="order_id",keep = False, inplace = True)

In [134]:
df = df_concat[['product_id','customer_id','review_score']]

In [135]:
df.describe()

Unnamed: 0,review_score
count,86532.0
mean,4.162587
std,1.281801
min,1.0
25%,4.0
50%,5.0
75%,5.0
max,5.0


# 6. Filtros Colaborativos

## 6.1 Itens Mais Populares -  Somatório de Avaliações


In [136]:
ratings_explicit = df[df['review_score'] != 0]
ratings_sum = pd.DataFrame(ratings_explicit.groupby(['product_id'])['review_score'].sum())
top10 = ratings_sum.sort_values('review_score', ascending = False).head(10)
top10.rename(columns={'review_score': 'review_score_sum'},inplace=True)
top10

Unnamed: 0_level_0,review_score_sum
product_id,Unnamed: 1_level_1
99a4788cb24856965c36a24e339b6058,1547
aca2eb7d00ea1a7b8ebd4e68314663af,1449
d1c427060a0f73f6b889a5c7c61f2ac4,1248
53b36df67ebb7c41585e8d54d6772e08,1237
422879e10f46682990de24d770e7f83d,1065
154e7e31ebfa092203795c972e5804a6,1053
3dd2a17168ec895c781a9191c1e95ad7,1036
2b4609f8948be18874494203496bc318,994
389d119b48cf3043d311335e499d9c6b,954
368c6c730842d78016ad823897a372db,906


## 6.2 Itens Mais Populares -  Melhor Média de Avaliação



In [137]:
ratings_explicit = df[df['review_score'] != 0]
ratings_sum = pd.DataFrame(ratings_explicit.groupby(['product_id'])['review_score'].mean())
top10 = ratings_sum.sort_values('review_score', ascending = False).head(10)
top10.rename(columns={'review_score': 'review_score_mean'},inplace=True)
top10

Unnamed: 0_level_0,review_score_mean
product_id,Unnamed: 1_level_1
00066f42aeeb9f3007548bb9d3f33c38,5.0
652c030867f364b558eb9f7dcbcf608d,5.0
cae58f36738671651f3d19fee286f556,5.0
cae2e38942c8489d9d7a87a3f525c06b,5.0
64f1126c9715d5394b7301934c6833f0,5.0
64fbadb8e3f6a0ac76c38ab230d661f9,5.0
cab49aa7c76189e7e6d55ad8c7f9eb91,5.0
6510b9320992123556a40f98806e512a,5.0
65194d9ad03e8206e3a9848f405942f1,5.0
6520088dce31a24d4fafaf79cfc10baa,5.0


## 6.3 Itens Mais Populares - Maior Quantidade de Usuários que Avaliaram



In [138]:
count = df.groupby(by='product_id', as_index=False) \
  .agg({'customer_id': pd.Series.nunique}) \
  .rename(columns={'customer_id': 'customer_id_count'})
buy_count = count.set_index('product_id')
top10 = buy_count.sort_values('customer_id_count', ascending = False).head(10)
top10

Unnamed: 0_level_0,customer_id_count
product_id,Unnamed: 1_level_1
99a4788cb24856965c36a24e339b6058,388
aca2eb7d00ea1a7b8ebd4e68314663af,345
d1c427060a0f73f6b889a5c7c61f2ac4,288
53b36df67ebb7c41585e8d54d6772e08,287
422879e10f46682990de24d770e7f83d,248
2b4609f8948be18874494203496bc318,246
154e7e31ebfa092203795c972e5804a6,243
3dd2a17168ec895c781a9191c1e95ad7,238
389d119b48cf3043d311335e499d9c6b,225
368c6c730842d78016ad823897a372db,205


## 6.4 Matriz de Recomendação

In [139]:
def start_pipeline(dataf):
    return dataf.copy()

def get_product_counts(dataf):
  return dataf.groupby(by='product_id', as_index=False) \
    .agg({'customer_id': pd.Series.nunique}) \
    .rename(columns={'customer_id': 'customer_id_count'}) \
    .set_index('product_id') \
    .sort_values('customer_id_count', ascending = False)

def get_avg_ratings(dataf):
  return dataf.groupby(by='product_id', as_index=False) \
    .agg({'review_score': np.mean}) \
    .rename(columns={'review_score': 'review_score_avg'}) \
    .set_index('product_id') \
    .sort_values('review_score_avg', ascending = False)

def get_n_top_values(dataf, n):
  return dataf.head(n) \
    .index \
    .tolist()

In [140]:
def recomendar1(id_produto):
  print(f"Produto de interesse: {id_produto}")
  try:
    produto_idx = matriz_de_relacionamento_top_product_ratings.index.tolist().index(id_produto)
    closest_10_products = np.argsort(-cos_products[produto_idx])[1:11]
    print(f"Retornando 10 Produtos mais próximos de: {id_produto}.")
    return matriz_de_relacionamento_top_product_ratings.index[closest_10_products].tolist()
  except ValueError as e:
    print(f"{id_produto}, não está incluso na matriz de recomendação. Retornando os 10 mais avaliados")
    return top_products_sellers

In [141]:
top_product_ratings = df \
  .pipe(start_pipeline) \
  .pipe(get_avg_ratings) \
  .pipe(get_n_top_values, n=1500)

top_products_sellers = df \
  .pipe(start_pipeline) \
  .pipe(get_product_counts) \
  .pipe(get_n_top_values, n=10)

top_product_ratings[:10], top_products_sellers

(['00066f42aeeb9f3007548bb9d3f33c38',
  '652c030867f364b558eb9f7dcbcf608d',
  'cae58f36738671651f3d19fee286f556',
  'cae2e38942c8489d9d7a87a3f525c06b',
  '64f1126c9715d5394b7301934c6833f0',
  '64fbadb8e3f6a0ac76c38ab230d661f9',
  'cab49aa7c76189e7e6d55ad8c7f9eb91',
  '6510b9320992123556a40f98806e512a',
  '65194d9ad03e8206e3a9848f405942f1',
  '6520088dce31a24d4fafaf79cfc10baa'],
 ['99a4788cb24856965c36a24e339b6058',
  'aca2eb7d00ea1a7b8ebd4e68314663af',
  'd1c427060a0f73f6b889a5c7c61f2ac4',
  '53b36df67ebb7c41585e8d54d6772e08',
  '422879e10f46682990de24d770e7f83d',
  '2b4609f8948be18874494203496bc318',
  '154e7e31ebfa092203795c972e5804a6',
  '3dd2a17168ec895c781a9191c1e95ad7',
  '389d119b48cf3043d311335e499d9c6b',
  '368c6c730842d78016ad823897a372db'])

In [142]:
matriz_de_relacionamento_top_product_ratings = df[df["product_id"].isin(top_product_ratings)] \
  .pivot_table(index="product_id", columns="customer_id", values="review_score").fillna(0)

matriz_de_relacionamento_top_product_ratings.shape

(1500, 1980)

### 6.4.1 Com Similaridade de Cossenos

In [143]:
cos_products = cosine_similarity(matriz_de_relacionamento_top_product_ratings)
print(cos_products)
cos_products.shape

[[1. 0. 0. ... 0. 0. 0.]
 [0. 1. 0. ... 0. 0. 0.]
 [0. 0. 1. ... 0. 0. 0.]
 ...
 [0. 0. 0. ... 1. 0. 0.]
 [0. 0. 0. ... 0. 1. 0.]
 [0. 0. 0. ... 0. 0. 1.]]


(1500, 1500)

In [144]:
id_livro_interesse = "00066f42aeeb9f3007548bb9d3f33c38"
ids_10_mais_proximos = recomendar1(id_livro_interesse)
ids_10_mais_proximos

Produto de interesse: 00066f42aeeb9f3007548bb9d3f33c38
Retornando 10 Produtos mais próximos de: 00066f42aeeb9f3007548bb9d3f33c38.


['c2c6d6cfda3171733ed7af121e46f6a9',
 'c2c503e76e239d461b3e92222f6b49c5',
 'c2c4115f38ec8f43e1052cf0735e289b',
 'c2bcdb759a32342591497db4153af052',
 '753531d6b13f62a1aae9baefa606a470',
 '75285f54a9546fb99e970a15b52e15b6',
 '7515ab3fc02c8f43b07e9451497fb13e',
 '750d224957fab8388b6b2c7432c00e35',
 '750a49a83f6ad13ccdf4a761309483f2',
 '7500a93e5485588cedc511badf55e56e']

In [145]:
id_produto_novo = "id_novo"
ids_10_mais_vendidos = recomendar1(id_produto_novo)
ids_10_mais_vendidos

Produto de interesse: id_novo
id_novo, não está incluso na matriz de recomendação. Retornando os 10 mais avaliados


['99a4788cb24856965c36a24e339b6058',
 'aca2eb7d00ea1a7b8ebd4e68314663af',
 'd1c427060a0f73f6b889a5c7c61f2ac4',
 '53b36df67ebb7c41585e8d54d6772e08',
 '422879e10f46682990de24d770e7f83d',
 '2b4609f8948be18874494203496bc318',
 '154e7e31ebfa092203795c972e5804a6',
 '3dd2a17168ec895c781a9191c1e95ad7',
 '389d119b48cf3043d311335e499d9c6b',
 '368c6c730842d78016ad823897a372db']

### 6.4.2 Com Correlação de Pearson

In [146]:
cos_products = np.corrcoef(matriz_de_relacionamento_top_product_ratings)
print(cos_products)
cos_products.shape

[[ 1.00000000e+00 -8.75657750e-04 -5.05305710e-04 ... -8.75657750e-04
  -5.05305710e-04 -5.05305710e-04]
 [-8.75657750e-04  1.00000000e+00 -8.75657750e-04 ... -1.51745068e-03
  -8.75657750e-04 -8.75657750e-04]
 [-5.05305710e-04 -8.75657750e-04  1.00000000e+00 ... -8.75657750e-04
  -5.05305710e-04 -5.05305710e-04]
 ...
 [-8.75657750e-04 -1.51745068e-03 -8.75657750e-04 ...  1.00000000e+00
  -8.75657750e-04 -8.75657750e-04]
 [-5.05305710e-04 -8.75657750e-04 -5.05305710e-04 ... -8.75657750e-04
   1.00000000e+00 -5.05305710e-04]
 [-5.05305710e-04 -8.75657750e-04 -5.05305710e-04 ... -8.75657750e-04
  -5.05305710e-04  1.00000000e+00]]


(1500, 1500)

In [147]:
id_livro_interesse = "00066f42aeeb9f3007548bb9d3f33c38"
ids_10_mais_proximos = recomendar1(id_livro_interesse)
ids_10_mais_proximos

Produto de interesse: 00066f42aeeb9f3007548bb9d3f33c38
Retornando 10 Produtos mais próximos de: 00066f42aeeb9f3007548bb9d3f33c38.


['701e076c7aea72b5f668183dde0afa9a',
 '64473a39b66923a81252bb7150b63663',
 '64693ed5472651f9ea64d52a689d9ea0',
 '646c629c40a590e38c21f33ef1aca36f',
 'c9341fe3a3cf071d813a92b6b012efde',
 'c93b1a2e204567ed8b8b59a99456f8c4',
 '680874c570dad71c0a2844cfbf417054',
 '5e14c2beea650eac6b94bc9d446cd71a',
 'cd8c7501d1e3a66f282dfed8dbd5ab9f',
 '746d236f81b4ae9d259030ace2833590']

In [148]:
id_produto_novo = "id_novo"
ids_10_mais_vendidos = recomendar1(id_produto_novo)
ids_10_mais_vendidos

Produto de interesse: id_novo
id_novo, não está incluso na matriz de recomendação. Retornando os 10 mais avaliados


['99a4788cb24856965c36a24e339b6058',
 'aca2eb7d00ea1a7b8ebd4e68314663af',
 'd1c427060a0f73f6b889a5c7c61f2ac4',
 '53b36df67ebb7c41585e8d54d6772e08',
 '422879e10f46682990de24d770e7f83d',
 '2b4609f8948be18874494203496bc318',
 '154e7e31ebfa092203795c972e5804a6',
 '3dd2a17168ec895c781a9191c1e95ad7',
 '389d119b48cf3043d311335e499d9c6b',
 '368c6c730842d78016ad823897a372db']

# 7. Analisando Novas Features - Adicional

In [149]:
df_esp = df_concat[['product_id','customer_id','review_score','product_category_name','price']]
df_esp.reset_index
df_esp

Unnamed: 0,product_id,customer_id,review_score,product_category_name,price
0,4244733e06e7ecb4970a6e2683c13e61,3ce436f183e68e07877b285a838db11a,5,cool_stuff,58.9
1,4244733e06e7ecb4970a6e2683c13e61,e6eecc5a77de221464d1c4eaff0a9b64,5,cool_stuff,55.9
2,4244733e06e7ecb4970a6e2683c13e61,4ef55bf80f711b372afebcb7c715344a,4,cool_stuff,64.9
3,4244733e06e7ecb4970a6e2683c13e61,30407a72ad8b3f4df4d15369126b20c9,5,cool_stuff,58.9
4,4244733e06e7ecb4970a6e2683c13e61,91a792fef70ecd8cc69d3c7feb3d12da,5,cool_stuff,58.9
...,...,...,...,...,...
112367,4cc4d02efc8f249c13355147fb44e34d,050309b91cc5e04e68841938e7984aaf,5,ferramentas_jardim,129.9
112368,b10ecf8e33aaaea419a9fa860ea80fb5,11e0f43ab4e2d2c48348dd9332c0ef80,4,moveis_decoracao,99.0
112369,dd469c03ad67e201bc2179ef077dcd48,dec8952e97ef6124259c56914fb3569c,5,relogios_presentes,736.0
112370,bbe7651fef80287a816ead73f065fc4b,a5201e1a6d71a8d21e869151bd5b4085,4,esporte_lazer,229.9


In [150]:
df_esp['product_category_name'] = df_esp['product_category_name'].astype('category')

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  """Entry point for launching an IPython kernel.


In [151]:
ratings_explicit = df_esp[df_esp['review_score'] != 0]
ratings_sum = pd.DataFrame(ratings_explicit.groupby(['product_category_name'])['review_score'].mean())
top10 = ratings_sum.sort_values('review_score', ascending = False).head(10)
top10.rename(columns={'review_score': 'review_score_mean'},inplace=True)
top10

Unnamed: 0_level_0,review_score_mean
product_category_name,Unnamed: 1_level_1
la_cuisine,4.9
cds_dvds_musicais,4.7
flores,4.521739
livros_interesse_geral,4.485169
livros_tecnicos,4.454918
construcao_ferramentas_jardim,4.418919
construcao_ferramentas_ferramentas,4.404494
alimentos_bebidas,4.397906
fashion_esporte,4.391304
musica,4.388889


In [152]:
ratings_explicit = df_esp[df_esp['price'] != 0]
ratings_sum = pd.DataFrame(ratings_explicit.groupby(['product_category_name'])['price'].mean())
top10 = ratings_sum.sort_values('price', ascending = False).head(10)
top10.rename(columns={'price': 'price'},inplace=True)
top10

Unnamed: 0_level_0,price
product_category_name,Unnamed: 1_level_1
pcs,1140.976605
portateis_casa_forno_e_cafe,628.266351
eletrodomesticos_2,498.065022
agro_industria_e_comercio,365.199939
instrumentos_musicais,306.13231
eletroportateis,303.855209
portateis_cozinha_e_preparadores_de_alimentos,302.574615
construcao_ferramentas_seguranca,226.963219
relogios_presentes,214.589837
moveis_quarto,190.13


In [153]:
df0 = pd.get_dummies(df_esp, prefix=['product_category_name'])
df0

Unnamed: 0,product_id,customer_id,review_score,price,product_category_name_agro_industria_e_comercio,product_category_name_alimentos,product_category_name_alimentos_bebidas,product_category_name_artes,product_category_name_artes_e_artesanato,product_category_name_artigos_de_festas,product_category_name_artigos_de_natal,product_category_name_audio,product_category_name_automotivo,product_category_name_bebes,product_category_name_bebidas,product_category_name_beleza_saude,product_category_name_brinquedos,product_category_name_cama_mesa_banho,product_category_name_casa_conforto,product_category_name_casa_conforto_2,product_category_name_casa_construcao,product_category_name_cds_dvds_musicais,product_category_name_cine_foto,product_category_name_climatizacao,product_category_name_consoles_games,product_category_name_construcao_ferramentas_construcao,product_category_name_construcao_ferramentas_ferramentas,product_category_name_construcao_ferramentas_iluminacao,product_category_name_construcao_ferramentas_jardim,product_category_name_construcao_ferramentas_seguranca,product_category_name_cool_stuff,product_category_name_dvds_blu_ray,product_category_name_eletrodomesticos,product_category_name_eletrodomesticos_2,product_category_name_eletronicos,product_category_name_eletroportateis,product_category_name_esporte_lazer,product_category_name_fashion_bolsas_e_acessorios,product_category_name_fashion_calcados,product_category_name_fashion_esporte,product_category_name_fashion_roupa_feminina,product_category_name_fashion_roupa_infanto_juvenil,product_category_name_fashion_roupa_masculina,product_category_name_fashion_underwear_e_moda_praia,product_category_name_ferramentas_jardim,product_category_name_flores,product_category_name_fraldas_higiene,product_category_name_industria_comercio_e_negocios,product_category_name_informatica_acessorios,product_category_name_instrumentos_musicais,product_category_name_la_cuisine,product_category_name_livros_importados,product_category_name_livros_interesse_geral,product_category_name_livros_tecnicos,product_category_name_malas_acessorios,product_category_name_market_place,product_category_name_moveis_colchao_e_estofado,product_category_name_moveis_cozinha_area_de_servico_jantar_e_jardim,product_category_name_moveis_decoracao,product_category_name_moveis_escritorio,product_category_name_moveis_quarto,product_category_name_moveis_sala,product_category_name_musica,product_category_name_papelaria,product_category_name_pc_gamer,product_category_name_pcs,product_category_name_perfumaria,product_category_name_pet_shop,product_category_name_portateis_casa_forno_e_cafe,product_category_name_portateis_cozinha_e_preparadores_de_alimentos,product_category_name_relogios_presentes,product_category_name_seguros_e_servicos,product_category_name_sinalizacao_e_seguranca,product_category_name_tablets_impressao_imagem,product_category_name_telefonia,product_category_name_telefonia_fixa,product_category_name_utilidades_domesticas
0,4244733e06e7ecb4970a6e2683c13e61,3ce436f183e68e07877b285a838db11a,5,58.9,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,4244733e06e7ecb4970a6e2683c13e61,e6eecc5a77de221464d1c4eaff0a9b64,5,55.9,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,4244733e06e7ecb4970a6e2683c13e61,4ef55bf80f711b372afebcb7c715344a,4,64.9,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,4244733e06e7ecb4970a6e2683c13e61,30407a72ad8b3f4df4d15369126b20c9,5,58.9,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,4244733e06e7ecb4970a6e2683c13e61,91a792fef70ecd8cc69d3c7feb3d12da,5,58.9,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
112367,4cc4d02efc8f249c13355147fb44e34d,050309b91cc5e04e68841938e7984aaf,5,129.9,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
112368,b10ecf8e33aaaea419a9fa860ea80fb5,11e0f43ab4e2d2c48348dd9332c0ef80,4,99.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
112369,dd469c03ad67e201bc2179ef077dcd48,dec8952e97ef6124259c56914fb3569c,5,736.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0
112370,bbe7651fef80287a816ead73f065fc4b,a5201e1a6d71a8d21e869151bd5b4085,4,229.9,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [154]:
product_category_name_dumies_costumer = df0.drop(['review_score','price','product_id'],axis=1).sample(1000)
product_category_name_dumies_product  = df0.drop(['review_score','price','customer_id'],axis=1).sample(1000)
product_category_name_dumies_costumer.set_index('customer_id',inplace=True)
product_category_name_dumies_product.set_index('product_id',inplace=True)

In [155]:
product_category_name_dumies_costumer_array = product_category_name_dumies_costumer.values
product_category_name_dumies_product_array = product_category_name_dumies_product.values

In [156]:
cos_category_costumer = cosine_similarity(product_category_name_dumies_costumer_array)
cos_category_product = cosine_similarity(product_category_name_dumies_product_array)

In [157]:
cliente_de_interesse = product_category_name_dumies_costumer.index[random.randint(0, product_category_name_dumies_costumer.shape[0])]

print(f"Cliente de interesse: {cliente_de_interesse}")

cliente_idx = product_category_name_dumies_costumer.index.tolist().index(cliente_de_interesse)
print(f"Cliente id: {cliente_de_interesse},tem índice {cliente_idx}")

closest_10_users = np.argsort(-cos_category_costumer[cliente_idx])[:10]

for i in zip(product_category_name_dumies_costumer.index[closest_10_users], cos_category_costumer[cliente_idx][closest_10_users]):
    print(f"Usuário {i[0]} tem similaridade {i[1]:.2f} com usuário {cliente_de_interesse}")

Cliente de interesse: 98980a9f930887243ad599ff2eae3609
Cliente id: 98980a9f930887243ad599ff2eae3609,tem índice 722
Usuário 5219321f4cfa568d79fce57113233d7c tem similaridade 1.00 com usuário 98980a9f930887243ad599ff2eae3609
Usuário f797b305ff4fc7689400509ea249468a tem similaridade 1.00 com usuário 98980a9f930887243ad599ff2eae3609
Usuário f2d7ef7f86d1ad58863453789ac79985 tem similaridade 1.00 com usuário 98980a9f930887243ad599ff2eae3609
Usuário 85b721db5eadffc2f1c2e40a3c1f6a53 tem similaridade 1.00 com usuário 98980a9f930887243ad599ff2eae3609
Usuário a79df1f329aed9365b676532f1597740 tem similaridade 1.00 com usuário 98980a9f930887243ad599ff2eae3609
Usuário f0507177bf9f1d580a1859b88681b5d9 tem similaridade 1.00 com usuário 98980a9f930887243ad599ff2eae3609
Usuário b77899db8038bdab58b749545d11fd88 tem similaridade 1.00 com usuário 98980a9f930887243ad599ff2eae3609
Usuário b0046bc2fbf6c21b9d50c60db587ca77 tem similaridade 1.00 com usuário 98980a9f930887243ad599ff2eae3609
Usuário 7f0fdc10daead

In [158]:
produto_de_interesse = product_category_name_dumies_product.index[random.randint(0, product_category_name_dumies_product.shape[0])]

print(f"Produto de interesse: {produto_de_interesse}")

produto_idx = product_category_name_dumies_product.index.tolist().index(produto_de_interesse)
print(f"Produto id: {produto_de_interesse},tem índice {produto_idx}")

closest_10_products = np.argsort(-cos_category_product[produto_idx])[:10]

for i in zip(product_category_name_dumies_product.index[closest_10_products], cos_category_product[cliente_idx][closest_10_products]):
    print(f"Produto {i[0]} tem similaridade {i[1]:.2f} com produto {produto_de_interesse}")

Produto de interesse: aadff88486740e0b0ebe2be6c09476ae
Produto id: aadff88486740e0b0ebe2be6c09476ae,tem índice 201
Produto e2b9252181ddde232e0bfb68df262610 tem similaridade 0.00 com produto aadff88486740e0b0ebe2be6c09476ae
Produto 89f055104adb9365d7f7b5c475f77742 tem similaridade 0.00 com produto aadff88486740e0b0ebe2be6c09476ae
Produto e0cf79767c5b016251fe139915c59a26 tem similaridade 0.00 com produto aadff88486740e0b0ebe2be6c09476ae
Produto a78b102a4520ca6cf50885443c44080b tem similaridade 0.00 com produto aadff88486740e0b0ebe2be6c09476ae
Produto 49d7eb00f7557973ece1df7ff7ed415b tem similaridade 0.00 com produto aadff88486740e0b0ebe2be6c09476ae
Produto b8b426747049f2d3d6e00b486d47dedb tem similaridade 0.00 com produto aadff88486740e0b0ebe2be6c09476ae
Produto 9e10ae46a3021a02f1692448ef9fa1db tem similaridade 0.00 com produto aadff88486740e0b0ebe2be6c09476ae
Produto 9453bde60c4ee52155c963641736cfc5 tem similaridade 0.00 com produto aadff88486740e0b0ebe2be6c09476ae
Produto 466bf8874eb69