# Technical Test - Solution

**Format**: Online problem + follow-up interview

**Deadline**: 7 days

**Description**: The task is to predict the rating that customers of an e-commerce website give in their reviews, based on the review data (train_df.csv).

**Deliverables**: link to a GitHub repository containing at least the following files:
- README.md
- solution notebook (`resolução_prova_tecnica.ipynb`)
- source code (`./src`)
- requirements.txt

**Data**: train_df.csv
- Table from an e-commerce company with more than 130 thousand customer reviews. The dataset includes information about the reviewer's profile, such as gender, age, and geographic location.
    - y label column: overall_rating


In [26]:
!pip install -q nltk
!pip install -q clean-text
!pip install -q gensim
!pip install -q seaborn
!pip install -q unidecode
!pip install -q eli5




In [81]:
import nltk
nltk.download('punkt')
nltk.download('stopwords')
nltk.download('rslp')

In [80]:
import pandas as pd
import re
from nltk.corpus import stopwords

In [315]:
df = pd.read_csv("data/train_df.csv")

In [151]:
df.head()

Unnamed: 0,index,submission_date,reviewer_id,product_id,product_name,product_brand,site_category_lv1,site_category_lv2,review_title,recommend_to_a_friend,review_text,reviewer_birth_year,reviewer_gender,reviewer_state,overall_rating
0,95851,2018-04-20 11:56:28,c951f3a4511b554a1f34330903c320f34cfccbdf8de357...,111586438,Depilador Elétrico Philips Satinelle HP6403/30,philips,Beleza e Perfumaria,Depilação,Depilador,Yes,Muito eficiente e prático! Depilação rápida em...,1978.0,F,ES,5
1,115536,2018-05-10 18:56:36,21da6d1c6d022a5c67da402d3082c7c438660f4252b7c3...,19399940,Hidratante Corporal Dior Addict Body Mist Femi...,,Beleza e Perfumaria,Tratamento de Pele,Cheiro de rica,Yes,"Adoro o perfume que fica na pele, ele não é um...",1986.0,F,SP,5
2,1254,2018-01-02 07:02:48,eaf2f059cbb702e377bf95ac998aa4365f851937a3b419...,22747780,Controle Com Fio Para Xbox 360 Slim / Fat E Pc...,,Games,Xbox 360,Bom produto,Yes,funciona o que é importante bom produto o text...,1978.0,M,ES,3
3,86792,2018-04-11 16:45:45,e5bb0709d14bc4a00aeaeb1f111616e69f57239dff7da6...,22857850,Kit Edredom + Lençol Aconchego Dupla Face Casa...,,"Cama, Mesa e Banho",Edredom,Nao recebi onprofuto e nem satisfacao,No,Gostaria de saber da minha entrega ate hoje na...,1969.0,F,RJ,1
4,35543,2018-02-02 16:30:23,420a3ab1adf3c6010d491c8def04e19b1439ed01df7038...,132207708,Smartphone Motorola Moto E4 Dual Chip Android ...,motorola,Celulares e Smartphones,Smartphone,Excelente aquisição!,Yes,Entrega super rápida! Quando da compra tive c...,1965.0,M,PE,4


In [152]:
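import plotly.express as px

# Mean rating per brand, sorted ascending (groupby drops rows with a missing brand)
grouped_df = df.groupby("product_brand")["overall_rating"].mean().reset_index().sort_values(by="overall_rating")
px.bar(grouped_df, x="product_brand", y="overall_rating")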

In [326]:
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from nltk.stem import RSLPStemmer
import unidecode

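# Built once so they are not recreated for every review
STOP_WORDS = set(stopwords.words('portuguese'))
STEMMER = RSLPStemmer()

def clean_text(text):
    """Lowercase, strip digits and accents, tokenize, drop stopwords and punctuation, then stem."""
    # Convert to lowercase
    text = text.lower()
    # Remove digits
    text = re.sub(r'\d+', '', text)
    # Remove accents
    text = unidecode.unidecode(text)

    # Tokenize
    words = word_tokenize(text, language='portuguese')

    # Drop stopwords and non-alphanumeric tokens (punctuation).
    # Accents are already stripped here, so accented stopwords such as "não"
    # no longer match the NLTK list and survive as "nao".
    words = [word for word in words if word.isalnum() and word not in STOP_WORDS]

    # Stem with the RSLP (Portuguese) stemmer
    words = [STEMMER.stem(word) for word in words]

    # Rebuild the text
    return ' '.join(words)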
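
To make the first normalization steps easy to check in isolation, here is a minimal sketch that needs no NLTK data. It is not the notebook's function: `basic_normalize` is a hypothetical helper, accents are stripped with the standard-library `unicodedata` module instead of `unidecode`, and stopword removal and stemming are left out on purpose.

```python
import re
import unicodedata

def basic_normalize(text):
    """Lowercase, drop digits, strip accents, and keep only alphabetic tokens."""
    text = text.lower()
    text = re.sub(r"\d+", "", text)
    # Decompose accented characters and drop the combining marks
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Keep runs of ASCII letters only, which also removes punctuation
    return " ".join(re.findall(r"[a-z]+", text))

example = basic_normalize("Depilação rápida em 10 minutos!")
print(example)  # depilacao rapida em minutos
```

Running a few reviews through a helper like this before adding stopword removal and stemming makes it easier to see which step changes which tokens.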


In [316]:
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn import metrics

# Logistic regression classifier for the binary rating label
model = LogisticRegression(random_state=42, max_iter=1000, solver='liblinear')

# Binary target: ratings 1-3 -> 0, ratings 4-5 -> 1
df["overall_rating_binary"] = df["overall_rating"].apply(lambda rating: 0 if rating <= 3 else 1)

# TF-IDF over unigrams and bigrams of the cleaned review text
tfidf_vectorizer = TfidfVectorizer(ngram_range=(1,2))

# Note: the vectorizer is fit on the full dataset before the split,
# so the vocabulary and IDF weights also reflect the test rows
X = tfidf_vectorizer.fit_transform(df['review_text'].apply(clean_text))
y = df["overall_rating_binary"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)


In [None]:
# Fit and predict
model.fit(X_train, y_train)
pip_pred = model.predict(X_test)

# Print classification report
print(metrics.classification_report(y_test, pip_pred))
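
One way to keep the TF-IDF statistics free of test data is to wrap the vectorizer and the classifier in a scikit-learn `Pipeline` and fit it on the training split only. The sketch below is an alternative arrangement, not the code that produced the results in this notebook; the texts and labels are made-up stand-ins for the cleaned reviews and the binary rating.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

# Hypothetical, already-stemmed toy reviews and binary labels
texts = ["otim produt", "excel entreg rap", "pessim nao receb", "ruim defeit",
         "ador recom", "nao func devolv", "perfeit satisfeit", "horrivel atras"] * 5
labels = [1, 1, 0, 0, 1, 0, 1, 0] * 5

# Split the raw text first, so the vectorizer never sees the test rows
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.2, random_state=42, stratify=labels
)

pipe = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2))),
    ("clf", LogisticRegression(random_state=42, max_iter=1000, solver="liblinear")),
])
pipe.fit(X_train, y_train)      # vocabulary and IDF come from the training split only
test_pred = pipe.predict(X_test)
```

With this layout the fitted vectorizer is available as `pipe.named_steps["tfidf"]` if it is needed later for inspecting feature weights.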

In [143]:
import eli5
eli5.explain_weights(model, vec=tfidf_vectorizer, feature_names=tfidf_vectorizer.get_feature_names_out(), top=30,)

Weight?,Feature
+7.024,excel
+6.786,otim
+4.131,recom
+3.952,perfeit
+3.263,ador
+3.086,sup
+2.892,ame
+2.611,lind
+2.601,rap
+2.493,satisfeit


## Recommendation System

In [205]:
df.head()

Unnamed: 0,index,submission_date,reviewer_id,product_id,product_name,product_brand,site_category_lv1,site_category_lv2,review_title,recommend_to_a_friend,review_text,reviewer_birth_year,reviewer_gender,reviewer_state,overall_rating,overall_rating_binary,review_text_process,topic
0,95851,2018-04-20 11:56:28,c951f3a4511b554a1f34330903c320f34cfccbdf8de357...,111586438,Depilador Elétrico Philips Satinelle HP6403/30,philips,Beleza e Perfumaria,Depilação,Depilador,Yes,Muito eficiente e prático! Depilação rápida em...,1978.0,F,ES,5,1,efici pra depilaca rap qualqu lug observaca me...,3
1,115536,2018-05-10 18:56:36,21da6d1c6d022a5c67da402d3082c7c438660f4252b7c3...,19399940,Hidratante Corporal Dior Addict Body Mist Femi...,,Beleza e Perfumaria,Tratamento de Pele,Cheiro de rica,Yes,"Adoro o perfume que fica na pele, ele não é um...",1986.0,F,SP,5,1,ador perfum fic pel nao produt hidrat costum d...,3
2,1254,2018-01-02 07:02:48,eaf2f059cbb702e377bf95ac998aa4365f851937a3b419...,22747780,Controle Com Fio Para Xbox 360 Slim / Fat E Pc...,,Games,Xbox 360,Bom produto,Yes,funciona o que é importante bom produto o text...,1978.0,M,ES,3,0,func import bom produt text analis curt quer e...,3
3,86792,2018-04-11 16:45:45,e5bb0709d14bc4a00aeaeb1f111616e69f57239dff7da6...,22857850,Kit Edredom + Lençol Aconchego Dupla Face Casa...,,"Cama, Mesa e Banho",Edredom,Nao recebi onprofuto e nem satisfacao,No,Gostaria de saber da minha entrega ate hoje na...,1969.0,F,RJ,1,0,gost sab entreg ate hoj nad aguard urgent retorn,4
4,35543,2018-02-02 16:30:23,420a3ab1adf3c6010d491c8def04e19b1439ed01df7038...,132207708,Smartphone Motorola Moto E4 Dual Chip Android ...,motorola,Celulares e Smartphones,Smartphone,Excelente aquisição!,Yes,Entrega super rápida! Quando da compra tive c...,1965.0,M,PE,4,1,entreg sup rap compr cert recei qual mot apes ...,0


In [209]:
counts_reviewer = df['reviewer_id'].value_counts() 
counts_products = df['product_id'].value_counts()

In [227]:
# Keep reviewers with at least 1 review; every reviewer satisfies this, so no rows are dropped
df_filtered = df[df['reviewer_id'].isin(counts_reviewer[counts_reviewer >= 1].index)]
df_filtered.shape

(10000, 18)

In [228]:
# Keep only products with at least 5 reviews
df_filtered = df_filtered[df_filtered['product_id'].isin(counts_products[counts_products >= 5].index)]
df_filtered.shape

(2441, 18)

In [239]:
# Preview only: the result is not assigned, so df_filtered itself is unchanged
df_filtered.drop_duplicates(subset="reviewer_id")

Unnamed: 0,index,submission_date,reviewer_id,product_id,product_name,product_brand,site_category_lv1,site_category_lv2,review_title,recommend_to_a_friend,review_text,reviewer_birth_year,reviewer_gender,reviewer_state,overall_rating,overall_rating_binary,review_text_process,topic
0,95851,2018-04-20 11:56:28,c951f3a4511b554a1f34330903c320f34cfccbdf8de357...,111586438,Depilador Elétrico Philips Satinelle HP6403/30,philips,Beleza e Perfumaria,Depilação,Depilador,Yes,Muito eficiente e prático! Depilação rápida em...,1978.0,F,ES,5,1,efici pra depilaca rap qualqu lug observaca me...,3
4,35543,2018-02-02 16:30:23,420a3ab1adf3c6010d491c8def04e19b1439ed01df7038...,132207708,Smartphone Motorola Moto E4 Dual Chip Android ...,motorola,Celulares e Smartphones,Smartphone,Excelente aquisição!,Yes,Entrega super rápida! Quando da compra tive c...,1965.0,M,PE,4,1,entreg sup rap compr cert recei qual mot apes ...,0
7,49682,2018-03-05 05:27:57,bc40d2473d17fd3b8f3ac4eb42dad1e2e26ec74f86739d...,128721018,Smartphone LG X Power Dual Chip Android 6.0 T...,lg,Celulares e Smartphones,Smartphone,Chegou tudo normal,Yes,Celular com custo beneficio bom... Design bem ...,1998.0,M,RN,3,0,celul cust benefici bom design bem produz,2
9,35481,2018-02-02 13:10:31,db9c66431666d38a8fa7ad230e1ddc4ae7d962932cfe8e...,125768030,Smartphone LG K10 Dual Chip Android 6.0 Marshm...,lg,Celulares e Smartphones,Smartphone,Veio com o SIM um com defeito.,Yes,"Tempo de recebimento do produto muito rápido, ...",1985.0,F,RJ,4,1,temp receb produt rap infeliz vei problem chip...,4
11,17169,2018-01-13 09:16:42,ec2eef77f9f103373ead652a3ce162529601a3e4964f23...,111903940,Aspirador Vertical Elétrico - AP-10 - Mondial,mondial,Eletroportáteis,Aspirador e Vassoura Elétrica,Gostei mas ....,Yes,Não sei se é defeito mas depois de 10 minutos ...,1968.0,F,SP,3,0,nao sei defeit minut lig deslig r nao lig so h...,4
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
9991,34015,2018-01-31 03:03:10,ed56917b061cbfcb70e2fb384047df2001afbb89147137...,29562349,Camera Lampada Led Wifi IP HD Panoramica Única...,,Casa e Construção,Segurança,nao funcionou,No,como não funcionou logo dei esta opinião pois ...,1973.0,M,SP,1,0,nao funcion log dei opinia poi ped troc devolu...,4
9993,59704,2018-03-15 10:38:02,38344bc79d1aa52b2f34f7bef075cb3f39889f90f34a87...,132444050,Smartphone Motorola Moto G 5S Dual Chip Androi...,,Celulares e Smartphones,Smartphone,Ótimo aparelho com ótimo preço,Yes,Bonito e muito funcional um aparelho barato pe...,1973.0,M,SP,4,1,bonit func aparelh barat otim desempenh gig me...,3
9994,37310,2018-02-07 08:24:07,2692fabed23ac684d2c2482656dbc2d28348c97ca53dc4...,132221770,Smartphone Motorola Moto Z2 Play Dual Chip And...,motorola,Celulares e Smartphones,Smartphone,Excelente aparelho por um preço acessível.,Yes,"O aparelho é muito bom, rápido, excelente câme...",1978.0,M,PE,5,1,aparelh bom rap excel cam muit memor armazen a...,2
9996,67000,2018-03-22 16:51:22,74c4b0d4bdf5fefd2d2142724a9a911b1354c85af82480...,18204207,Carregador Portátil Slim 10000mah Powerbank Pi...,,Celulares e Smartphones,Acessórios para Celular,Produto regular por não ter Manuel em português,Yes,Poderia ser melhor se tivesse manual em portug...,1970.0,M,SP,2,0,pod melhor man portug produt si parec bom,3


In [294]:
df_gp_reviewer = df_filtered.groupby(["reviewer_id", "product_id", "product_name"])["overall_rating"].mean().reset_index()

In [295]:
# Rating matrix with products as rows and reviewers as columns; unrated pairs are filled with 0
ratingsd = df_gp_reviewer.pivot(index='product_id', columns='reviewer_id', values='overall_rating').fillna(0)

In [296]:
from sklearn.model_selection import train_test_split
# The split is over rows of the matrix, so train and test hold disjoint sets of products
traind, testd = train_test_split(ratingsd, test_size=0.30, random_state=42)

In [297]:
train = traind.values
test = testd.values

In [298]:
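# Percentage of nonzero (observed) entries in the training matrix.
# Strictly this is the density (fill rate); the sparsity is 100 minus this value.
sparsity = float(len(train.nonzero()[0]))
sparsity /= (train.shape[0] * train.shape[1])
sparsity *= 100
print('Sparsity: {:5.2f}%'.format(sparsity))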

Sparsity:  0.46%
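
The figure printed above is the share of cells that hold a rating. As a small illustration of how that relates to sparsity, the sketch below computes both quantities on a made-up 3x4 matrix; the matrix and variable names are assumptions for the example only.

```python
import numpy as np

# Hypothetical rating matrix: 3 observed ratings out of 12 cells
ratings = np.array([
    [5, 0, 0, 0],
    [0, 3, 0, 0],
    [0, 0, 0, 1],
], dtype=float)

density = np.count_nonzero(ratings) / ratings.size * 100  # % of cells with a rating
sparsity = 100 - density                                  # % of empty cells

print('Density: {:5.2f}%  Sparsity: {:5.2f}%'.format(density, sparsity))
```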


In [299]:
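import numpy as np

def item_similarity(ratings, epsilon=1e-9):
    # Cosine similarity between the columns of `ratings`
    # (with products as rows, the columns here are reviewers).
    # epsilon avoids divide-by-zero errors for all-zero columns.
    sim = ratings.T.dot(ratings) + epsilon
    norms = np.array([np.sqrt(np.diagonal(sim))])
    return (sim / norms / norms.T)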

In [300]:
item_sim = item_similarity(train)

def predict_item(ratings, similarity):
    return ratings.dot(similarity) / np.array([np.abs(similarity).sum(axis=1)])

item_prediction = predict_item(train, item_sim)
item_prediction[:4, :4]

array([[1.82403591e-10, 4.15235122e-10, 1.15377693e-06, 1.15377693e-06],
       [5.20668131e-11, 1.18528201e-10, 3.29343779e-07, 3.29343779e-07],
       [6.75568856e-11, 1.53790786e-10, 4.27324790e-07, 4.27324790e-07],
       [4.05341314e-11, 9.22744715e-11, 2.56394874e-07, 2.56394874e-07]])
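
Which axis the similarity is computed over depends on the orientation of the matrix. The sketch below, on a made-up 3x3 matrix with products as rows and reviewers as columns, uses a hypothetical helper `cosine_similarity_rows` to show both directions: passing the matrix gives product-product similarity, passing its transpose gives reviewer-reviewer similarity.

```python
import numpy as np

def cosine_similarity_rows(matrix, epsilon=1e-9):
    """Cosine similarity between the rows of `matrix`."""
    sim = matrix.dot(matrix.T) + epsilon      # pairwise dot products between rows
    norms = np.sqrt(np.diagonal(sim))         # row norms (epsilon guards empty rows)
    return sim / norms[:, None] / norms[None, :]

# Hypothetical data: rows are products, columns are reviewers
ratings = np.array([
    [5.0, 4.0, 0.0],
    [4.0, 5.0, 0.0],
    [0.0, 0.0, 3.0],
])

product_sim = cosine_similarity_rows(ratings)      # product x product
reviewer_sim = cosine_similarity_rows(ratings.T)   # reviewer x reviewer
```

In this toy matrix the first two products are rated similarly by the same reviewers, so `product_sim[0, 1]` is close to 1 while `product_sim[0, 2]` is close to 0.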

In [301]:
from sklearn.metrics import mean_squared_error

def get_mse(pred, actual):
    # Keep only the positions where `actual` has a nonzero (observed) rating.
    pred = pred[actual.nonzero()].flatten()
    actual = actual[actual.nonzero()].flatten()
    return mean_squared_error(actual, pred)


print('Item-based CF MSE: ' + str(get_mse(item_prediction, test)))


from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

svd = TruncatedSVD(n_components=50, n_iter=7, random_state=42)
r_mat_tr=svd.fit_transform(traind) 
print(svd.explained_variance_ratio_)  
print(svd.explained_variance_ratio_.sum())

#pm=pd.DataFrame(cosine_similarity(r_mat_tr))
#pm.head()
ctrain = cosine_similarity(r_mat_tr)

Item-based CF MSE: 17.393814095271164
[0.04445366 0.02820365 0.02764927 0.02671377 0.02162047 0.01909114
 0.01871005 0.01860607 0.01576482 0.0151765  0.01517563 0.01507224
 0.01408673 0.01404735 0.01285444 0.01268031 0.0124384  0.01233387
 0.01137367 0.0114926  0.0108473  0.01039057 0.01007937 0.00949491
 0.0090749  0.00887005 0.00879785 0.008843   0.00871628 0.00868196
 0.00859358 0.00835986 0.00820608 0.00815772 0.00809016 0.00797728
 0.00799328 0.00774432 0.00763529 0.00758018 0.0075462  0.00755627
 0.00752452 0.00739649 0.00749508 0.00740386 0.00725227 0.00703296
 0.00693162 0.00680565]
0.6146235343721568
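
For an MSE on ratings to be comparable cell by cell, the train and test matrices need the same shape. A common arrangement, sketched here as an assumption rather than as what this notebook does, is to hide a fraction of the observed ratings and score predictions only on those hidden cells. `train_test_mask_split` and `masked_mse` are hypothetical helper names.

```python
import numpy as np

def train_test_mask_split(ratings, holdout_fraction=0.2, seed=42):
    """Move a random fraction of the observed ratings into a same-shaped test matrix."""
    rng = np.random.default_rng(seed)
    train = ratings.copy()
    test = np.zeros_like(ratings)
    rows, cols = ratings.nonzero()
    n_holdout = max(1, int(len(rows) * holdout_fraction))
    picked = rng.choice(len(rows), size=n_holdout, replace=False)
    test[rows[picked], cols[picked]] = ratings[rows[picked], cols[picked]]
    train[rows[picked], cols[picked]] = 0
    return train, test

def masked_mse(pred, actual):
    """MSE over the cells where `actual` holds an observed (nonzero) rating."""
    mask = actual != 0
    return float(np.mean((pred[mask] - actual[mask]) ** 2))

# Hypothetical rating matrix with 6 observed ratings
ratings = np.array([
    [5, 0, 3, 0],
    [0, 4, 0, 1],
    [2, 0, 0, 5],
], dtype=float)

train_m, test_m = train_test_mask_split(ratings, holdout_fraction=0.5)
```

Any predictor fitted on `train_m` can then be scored with `masked_mse(predictions, test_m)`, since both matrices index the same products and reviewers.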


In [302]:
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

svd = TruncatedSVD(n_components=50, n_iter=7, random_state=42)
r_mat_tr=svd.fit_transform(testd) 
print(svd.explained_variance_ratio_)  
print(svd.explained_variance_ratio_.sum())

#pmtt=pd.DataFrame(cosine_similarity(r_mat_tr))
#print (pmtt[:2])
#pmtt.head()
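ctest = cosine_similarity(r_mat_tr)
# Compares two product-product cosine-similarity matrices (train vs. test), not predicted vs. observed ratings
print(' CF MSE: ' + str(get_mse(ctrain, ctest)))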

[0.07024187 0.06297813 0.05803261 0.04837339 0.04288695 0.0394869
 0.03492775 0.02920949 0.02766401 0.0250367  0.02403215 0.02395487
 0.02302759 0.02163666 0.02117302 0.01916393 0.01676839 0.01607312
 0.01599559 0.01584113 0.01437286 0.01251845 0.01213233 0.01158939
 0.011435   0.01080757 0.01113677 0.01050374 0.01035512 0.01012221
 0.00973112 0.00966094 0.00957963 0.00950367 0.00919156 0.00912162
 0.00904615 0.00895911 0.00818925 0.00832063 0.00839373 0.00814676
 0.00789836 0.00787914 0.00779732 0.0074727  0.00767828 0.00726845
 0.00743414 0.00710613]
0.9198564106682292
 CF MSE: 0.11342396514114067


In [303]:
df_gp_reviewer = df_gp_reviewer.sort_values(by='overall_rating')
df_gp_reviewer = df_gp_reviewer.reset_index(drop=True)
count_users = df_gp_reviewer.groupby("reviewer_id", as_index=False).count()

# Average only the numeric rating column; product_name is text
count = df_gp_reviewer.groupby("product_id", as_index=False)["overall_rating"].mean()
items_df = count[['product_id']]
users_df = count_users[['reviewer_id']]


# Reviewer x product rating matrix; unrated pairs are filled with 0
df_clean_matrix = df_gp_reviewer.pivot(index='product_id', columns='reviewer_id', values='overall_rating').fillna(0)
df_clean_matrix = df_clean_matrix.T
R = (df_clean_matrix).values



In [304]:
user_ratings_mean = np.mean(R, axis = 1)
R_demeaned = R - user_ratings_mean.reshape(-1, 1)

In [305]:
from scipy.sparse.linalg import svds

# Truncated SVD of the mean-centered matrix (svds keeps k=6 singular values by default)
U, sigma, Vt = svds(R_demeaned)

sigma = np.diag(sigma)

# Rebuild the low-rank approximation and add each reviewer's mean back
all_user_predicted_ratings = np.dot(np.dot(U, sigma), Vt) + user_ratings_mean.reshape(-1, 1)
preds_df = pd.DataFrame(all_user_predicted_ratings, columns=df_clean_matrix.columns, index=df_clean_matrix.index)
preds_df.head()

product_id,108720288,110757661,111042393,111586438,111797770,111825515,111903940,111966764,113022329,113048617,...,6942227,7112342,7142158,7328501,7344160,7408885,7503466,8190307,8456579,9627515
reviewer_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
00021c52ce4c38dde0dea752609ec3242e7587bec1dfc5d923558bf36ef58c46,0.005825,0.002415,0.004755,0.001864,0.003889,0.004538,0.004437,0.003976,0.003127,-0.001711,...,0.002716,0.004105,0.00339,0.004964,0.006114,-0.000519,0.002779,0.004598,0.006067,0.005383
00164a832a4d2c40e1d85193c37b732a15d94c36e3cdc5870879798efe9e1d98,0.023284,0.023392,0.023318,0.023409,0.023346,0.023325,0.023328,0.023343,0.02337,0.023518,...,0.023382,0.023339,0.023362,0.023312,0.023274,0.023482,0.023381,0.023323,0.023276,0.023298
001b395bed37db76429433a6fda572002c991aad1e8629616f937a1cac21a449,0.018576,0.01865,0.018599,0.018661,0.018618,0.018604,0.018606,0.018616,0.018634,0.018736,...,0.018643,0.018613,0.018629,0.018595,0.018569,0.018711,0.018642,0.018603,0.01857,0.018585
001d4e96e29d2fef227dd6d4fab3f375abf3b2f7f85f0348845c6dcd6fd3b4ab,0.018579,0.018653,0.018602,0.018665,0.018621,0.018607,0.018609,0.018619,0.018638,0.01874,...,0.018646,0.018617,0.018632,0.018598,0.018572,0.018715,0.018645,0.018606,0.018573,0.018588
002a0bb116982e62957b8b76c80ecc3bc82dabeb8e07fb3e77b3674ae5282de0,0.023298,0.02341,0.023334,0.023428,0.023362,0.023341,0.023344,0.02336,0.023387,0.023541,...,0.0234,0.023355,0.023379,0.023327,0.023289,0.023504,0.023399,0.023339,0.02329,0.023313
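
The mean-centering, truncated SVD, and reconstruction steps can be checked on a small dense matrix. The sketch below is an illustration under stated assumptions: it uses `np.linalg.svd` instead of `scipy.sparse.linalg.svds`, the matrix is made up, and `low_rank_predictions` is a hypothetical helper name.

```python
import numpy as np

def low_rank_predictions(R, k):
    """Center each row, keep the top-k singular triplets, and add the row means back."""
    row_mean = R.mean(axis=1, keepdims=True)
    U, s, Vt = np.linalg.svd(R - row_mean, full_matrices=False)
    # np.linalg.svd returns singular values in descending order, so [:k] keeps the largest
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :] + row_mean

# Hypothetical reviewer x product matrix (0 = no rating)
R = np.array([
    [5.0, 4.0, 0.0, 1.0],
    [4.0, 5.0, 1.0, 0.0],
    [1.0, 0.0, 5.0, 4.0],
])

approx = low_rank_predictions(R, k=2)   # smoothed scores for every cell
exact = low_rank_predictions(R, k=3)    # full rank: reproduces R
```

Because the zeros are treated as actual ratings in this kind of factorization, very sparse matrices pull every reconstructed score toward zero, which is worth keeping in mind when reading predicted values.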


In [306]:
def recommend_it(predictions_df, itm_df, original_ratings_df, num_recommendations=10, ruserId='A100UD67AHFODS'):

    # Get and sort the user's predictions
    sorted_user_predictions = predictions_df.loc[ruserId].sort_values(ascending=False)

    # Get the user's data and merge in the item information.
    user_data = original_ratings_df[original_ratings_df.reviewer_id == ruserId]
    user_full = (
        user_data.merge(itm_df, how='left', on='product_id')
        .sort_values(['overall_rating'], ascending=False)
    )

    print('User {0} has already purchased {1} items.'.format(ruserId, user_full.shape[0]))
    print('Recommending the highest {0} predicted  items not already purchased.'.format(num_recommendations))

    # Recommend the highest predicted rating items that the user hasn't bought yet.
    recommendations = (
        itm_df[~itm_df['product_id'].isin(user_full['product_id'])]
        .merge(pd.DataFrame(sorted_user_predictions).reset_index(), how='left', on='product_id')
        .rename(columns={ruserId: 'Predictions'})
        .sort_values('Predictions', ascending=False)
        .iloc[:num_recommendations, :-1]
    )

    # Attach product names. `overall_rating` here is an observed rating taken from
    # original_ratings_df (first row kept per product), not the predicted score.
    topk = (
        recommendations.merge(original_ratings_df, on='product_id')
        .drop_duplicates(['product_id', 'product_name'])
        [['product_id', 'product_name', 'overall_rating']]
    )

    return topk

In [313]:
preds_df

product_id,108720288,110757661,111042393,111586438,111797770,111825515,111903940,111966764,113022329,113048617,...,6942227,7112342,7142158,7328501,7344160,7408885,7503466,8190307,8456579,9627515
reviewer_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
00021c52ce4c38dde0dea752609ec3242e7587bec1dfc5d923558bf36ef58c46,0.005825,0.002415,0.004755,0.001864,0.003889,0.004538,0.004437,0.003976,0.003127,-0.001711,...,0.002716,0.004105,0.003390,0.004964,0.006114,-0.000519,0.002779,0.004598,0.006067,0.005383
00164a832a4d2c40e1d85193c37b732a15d94c36e3cdc5870879798efe9e1d98,0.023284,0.023392,0.023318,0.023409,0.023346,0.023325,0.023328,0.023343,0.023370,0.023518,...,0.023382,0.023339,0.023362,0.023312,0.023274,0.023482,0.023381,0.023323,0.023276,0.023298
001b395bed37db76429433a6fda572002c991aad1e8629616f937a1cac21a449,0.018576,0.018650,0.018599,0.018661,0.018618,0.018604,0.018606,0.018616,0.018634,0.018736,...,0.018643,0.018613,0.018629,0.018595,0.018569,0.018711,0.018642,0.018603,0.018570,0.018585
001d4e96e29d2fef227dd6d4fab3f375abf3b2f7f85f0348845c6dcd6fd3b4ab,0.018579,0.018653,0.018602,0.018665,0.018621,0.018607,0.018609,0.018619,0.018638,0.018740,...,0.018646,0.018617,0.018632,0.018598,0.018572,0.018715,0.018645,0.018606,0.018573,0.018588
002a0bb116982e62957b8b76c80ecc3bc82dabeb8e07fb3e77b3674ae5282de0,0.023298,0.023410,0.023334,0.023428,0.023362,0.023341,0.023344,0.023360,0.023387,0.023541,...,0.023400,0.023355,0.023379,0.023327,0.023289,0.023504,0.023399,0.023339,0.023290,0.023313
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
ff6e2d09dcbf68708eb506f8a4e8ca7cada840f9b65643ad92a0032447b8b47f,0.013927,0.013981,0.013944,0.013990,0.013958,0.013948,0.013949,0.013957,0.013970,0.014044,...,0.013976,0.013955,0.013966,0.013941,0.013922,0.014026,0.013976,0.013947,0.013923,0.013934
ff788c0c887b490169f715aa03863cbba95d980d66b90b250812b46d4a73811e,0.018576,0.018650,0.018599,0.018661,0.018618,0.018604,0.018606,0.018616,0.018634,0.018736,...,0.018643,0.018613,0.018629,0.018595,0.018569,0.018711,0.018642,0.018603,0.018570,0.018585
ff963664ef150c49c3012c90ee5aaa48089aa53f5932a97a219596fa159dfb15,0.002558,0.000170,0.001757,-0.000174,0.001145,0.001601,0.001529,0.001205,0.000631,-0.002183,...,0.000406,0.001295,0.000806,0.001910,0.002783,-0.001553,0.000404,0.001643,0.002746,0.002220
ffb7f412943525aef85c7b0482ea16adcd5611625142a8f97da86a9c96f3580a,0.018582,0.018657,0.018606,0.018669,0.018625,0.018611,0.018613,0.018623,0.018642,0.018745,...,0.018651,0.018620,0.018636,0.018601,0.018575,0.018720,0.018650,0.018609,0.018576,0.018592


In [314]:
recommend_it(preds_df, items_df, df_gp_reviewer, 10, df_gp_reviewer["reviewer_id"][100])

User 7a708c30ddc360edf02333db9edbf41202cbfe877b82d5e5413a6a33e189b02c has already purchased 1 items.
Recommending the highest 10 predicted  items not already purchased.


Unnamed: 0,product_id,product_name,overall_rating
0,129543938,Smartphone Samsung Galaxy J7 Prime Dual Chip A...,1.0
43,132444738,Smartphone Motorola Moto G5S Plus Dual Chip An...,4.0
72,122701411,"Smart TV LED 32"" Samsung 32J4300 HD com Conver...",1.0
111,132380287,Smartphone Samsung Galaxy J5 Pro Dual Chip And...,1.0
140,132474081,Smartphone Moto G 5S Dual Chip Android 7.0 Tel...,1.0
173,132276640,Smartphone Samsung Galaxy J5 Prime Dual Chip A...,2.0
200,128011681,Smartphone Samsung Galaxy J7 Metal Dual Chip A...,1.0
229,132472827,Smart TV LED 32'' Semp Toshiba TCL 32L2600 HD ...,1.0
256,127005968,Pipoqueira Elétrica Britania Pop Time VM,2.0
279,129542708,Smartphone Samsung Galaxy J7 Prime Dual Chip A...,3.0
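
The core of the recommendation step is "rank by predicted score, skipping what the user already rated". A minimal, dependency-free sketch of that idea follows; the product labels, scores, and the helper name `top_n_unseen` are made up for illustration and do not come from the dataset.

```python
def top_n_unseen(predicted_scores, already_rated, n=10):
    """Return the n highest-scoring items that are not in `already_rated`."""
    unseen = {item: score for item, score in predicted_scores.items()
              if item not in already_rated}
    return sorted(unseen, key=unseen.get, reverse=True)[:n]

# Hypothetical predicted scores for one reviewer
scores = {"A": 0.9, "B": 0.7, "C": 0.8, "D": 0.1}
picks = top_n_unseen(scores, already_rated={"A"}, n=2)
print(picks)  # ['C', 'B']
```

Returning the predicted score alongside each pick makes it easier to tell it apart from any observed rating shown in the final table.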
