## Part 1: Simple recommendations

Top recipes / Most popular

A common recommendation strategy is to display the "Top recipes", that is, the recipes ranked as "most viewed" or "best". Each recipe is scored by a weighted sum of all its interactions, such as views and reviews. The system favors recent interactions over older ones and updates the scores each time a data feed is synchronized.
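The scoring described above can be sketched as follows. This is only an illustration under stated assumptions: the interaction weights, the 30-day half-life, and the names `WEIGHTS`, `HALF_LIFE_DAYS`, `popularity_scores` and `top_recipes` are hypothetical and do not come from the project.

```python
from datetime import datetime, timedelta

# Hypothetical weights per interaction type (assumed, not taken from the project)
WEIGHTS = {"view": 1.0, "review": 3.0}
# Assumed recency rule: an interaction loses half of its weight every 30 days
HALF_LIFE_DAYS = 30.0


def popularity_scores(interactions, now):
    """Return {recipe_id: score} from (recipe_id, kind, timestamp) tuples."""
    scores = {}
    for recipe_id, kind, timestamp in interactions:
        age_days = (now - timestamp).total_seconds() / 86400.0
        decay = 0.5 ** (age_days / HALF_LIFE_DAYS)  # recent interactions count more
        scores[recipe_id] = scores.get(recipe_id, 0.0) + WEIGHTS[kind] * decay
    return scores


def top_recipes(interactions, now, k=10):
    """Return the k recipe ids with the highest popularity score."""
    scores = popularity_scores(interactions, now)
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy data: two fresh views, one 30-day-old review, one 60-day-old view
now = datetime(2024, 1, 31)
interactions = [
    (1, "view", now),
    (1, "view", now),
    (2, "review", now - timedelta(days=30)),
    (3, "view", now - timedelta(days=60)),
]
print(top_recipes(interactions, now, k=3))
```

Recomputing the scores on every feed synchronization, as the text describes, would amount to calling `popularity_scores` again with the current time.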

Because this strategy generally does not rely on hyper-personalized user data, it is particularly useful when little or nothing is known about a user, or when a user's behavior suggests they are simply browsing the site. It is also well suited to promoting your most popular recipes, which helps your business stand out from the competition. It makes product discovery easier and helps you market your brand along with the popular products it offers.

Based on the recipe steps:

Collaborative filtering is a technique that filters the recipes a user might like based on the reactions of similar users.

It works by searching a large group of people and finding a smaller set of users whose tastes are similar to those of a particular user. It then looks at the items those users like and combines them into a ranked list of suggestions.

There are many ways to determine which users are similar and to combine their choices into a list of recommendations.
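One possible similarity measure, among the many mentioned above, is the cosine similarity between two users' rating vectors restricted to the recipes both have rated. The sketch below assumes a small dense user-by-recipe matrix where `NaN` marks a missing rating; the function names and the toy matrix are hypothetical.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine similarity of two rating vectors over co-rated items (NaN = unrated)."""
    both = ~np.isnan(a) & ~np.isnan(b)
    if not both.any():
        return 0.0
    a_c, b_c = a[both], b[both]
    denom = np.linalg.norm(a_c) * np.linalg.norm(b_c)
    return float(a_c @ b_c / denom) if denom > 0 else 0.0


def most_similar_users(ratings, user, k=2):
    """Return the k (other_user, similarity) pairs closest to `user`."""
    sims = [
        (other, cosine_similarity(ratings[user], ratings[other]))
        for other in range(len(ratings))
        if other != user
    ]
    sims.sort(key=lambda pair: pair[1], reverse=True)
    return sims[:k]


# Toy matrix: rows are users, columns are recipes
ratings = np.array([
    [5.0, 4.0, 1.0, np.nan],
    [5.0, 5.0, 1.0, 2.0],
    [1.0, 2.0, 5.0, 5.0],
])
print(most_similar_users(ratings, user=0, k=2))
```

Cosine on raw ratings is only one choice; mean-centered variants such as Pearson correlation are a common alternative when users rate on different scales.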

To build a system that automatically recommends recipes to users based on the preferences of other users, the first step is to find similar users or similar items. The second step is to predict the ratings of the items a user has not rated yet.

For a recipe I, a set of similar recipes is determined from rating vectors made up of the ratings each recipe received from users. The rating of a user U who has not rated I is then estimated by selecting, from that similarity list, N recipes that U has rated and computing the prediction from those N ratings.
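The two steps above can be illustrated with a small item-based sketch. It assumes a dense user-by-recipe matrix with `NaN` for missing ratings, cosine similarity between recipe columns, and a similarity-weighted average of the N neighbours; these choices and the names `item_similarity` and `predict_rating` are assumptions made for the example, not the project's implementation.

```python
import numpy as np


def item_similarity(ratings, i, j):
    """Cosine similarity between recipes i and j over users who rated both."""
    col_i, col_j = ratings[:, i], ratings[:, j]
    both = ~np.isnan(col_i) & ~np.isnan(col_j)
    if not both.any():
        return 0.0
    a, b = col_i[both], col_j[both]
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def predict_rating(ratings, user, item, n=2):
    """Predict ratings[user, item] from the n most similar recipes rated by user."""
    rated = [
        j for j in range(ratings.shape[1])
        if j != item and not np.isnan(ratings[user, j])
    ]
    # Keep the n recipes rated by the user that are most similar to `item`
    neighbours = sorted(
        ((item_similarity(ratings, item, j), j) for j in rated), reverse=True
    )[:n]
    weight = sum(sim for sim, _ in neighbours)
    if weight == 0:
        return None  # no usable neighbour
    return sum(sim * ratings[user, j] for sim, j in neighbours) / weight


# Toy matrix: rows are users U, columns are recipes I; user 0 has not rated recipe 3
R = np.array([
    [5.0, 4.0, 1.0, np.nan],
    [4.0, 5.0, 2.0, 1.0],
    [1.0, 2.0, 5.0, 5.0],
    [2.0, 1.0, 4.0, 5.0],
])
print(predict_rating(R, user=0, item=3, n=2))
```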

Based on the recipe description

A recommendation system based on recipe descriptions works with data the user provides, either explicitly (ratings) or implicitly (clicking a link), or with reviews of the recipes. From this data a user profile is built, which is then used to make suggestions to the user. As the user provides more input or acts on the recommendations, the engine becomes increasingly accurate.
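A minimal sketch of this idea, assuming the profile is simply the bag of words of the descriptions of recipes the user liked and that candidates are ranked by cosine similarity to that profile. The descriptions, the recipe ids and all function names are hypothetical; a real system would more likely use TF-IDF or learned embeddings.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Lowercased word counts of a description."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def build_profile(descriptions, liked_ids):
    """User profile: summed word counts of the recipes the user liked."""
    profile = Counter()
    for recipe_id in liked_ids:
        profile.update(bag_of_words(descriptions[recipe_id]))
    return profile


def recommend(descriptions, liked_ids, k=3):
    """Rank recipes the user has not liked yet by similarity to the profile."""
    profile = build_profile(descriptions, liked_ids)
    candidates = [rid for rid in descriptions if rid not in liked_ids]
    candidates.sort(
        key=lambda rid: cosine(profile, bag_of_words(descriptions[rid])),
        reverse=True,
    )
    return candidates[:k]


# Toy descriptions (invented for the example)
descriptions = {
    1: "spicy chicken curry with rice",
    2: "creamy chicken curry",
    3: "chocolate cake with cream",
    4: "vanilla sponge cake",
}
print(recommend(descriptions, liked_ids={1}, k=2))
```

Each new like adds words to the profile, which is how the recommendations sharpen as the user provides more input.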

## Read the data 

In [1]:
# import libraries
import numpy as np
import pandas as pd

In [2]:
#clone github repository
!git clone https://github.com/DavidBert/projet_AI_frameworks.git

Cloning into 'projet_AI_frameworks'...
remote: Enumerating objects: 9, done.
remote: Total 9 (delta 0), reused 0 (delta 0), pack-reused 9
Unpacking objects: 100% (9/9), done.


In [3]:
#read the data 
df = pd.read_csv('/content/projet_AI_frameworks/test_script.csv')
df.head()

Unnamed: 0,user_id,recipe_id,date,rating,u,i
0,76535,33627,2005-02-15,4.0,5,177317
1,160497,75307,2005-10-24,4.0,23,170785
2,930021,100961,2008-11-30,4.0,31,165555
3,58439,154105,2007-03-24,4.0,44,177453
4,628951,14525,2008-02-16,5.0,45,142367


In [4]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [5]:
!ls /content/drive/MyDrive/Projet

interactions_test.csv	    RAW_interactions.csv.zip
interactions_train.csv.zip  RAW_recipes.csv.zip


In [6]:
import os
os.chdir('/content/projet_AI_frameworks')  #change dir
!mkdir train  #create a directory named train/
!mkdir test  #create a directory named test/
!mkdir raws
!unzip -q /content/drive/MyDrive/Projet/interactions_train.csv.zip -d train/  #unzip data in train/
!cp /content/drive/MyDrive/Projet/interactions_test.csv  test/  #copy the test interactions into the test/ directory
!unzip -q /content/drive/MyDrive/Projet/RAW_interactions.csv.zip -d raws/  #unzip data in raws/ directory
!unzip -q /content/drive/MyDrive/Projet/RAW_recipes.csv.zip -d raws/ #unzip data in raws/ directory

In [7]:
# read the train and the test data
df_train = pd.read_csv('/content/projet_AI_frameworks/train/interactions_train.csv')
df_test = pd.read_csv('/content/projet_AI_frameworks/test/interactions_test.csv')
df_train.head()

Unnamed: 0,user_id,recipe_id,date,rating,u,i
0,2046,4684,2000-02-25,5.0,22095,44367
1,2046,517,2000-02-25,5.0,22095,87844
2,1773,7435,2000-03-13,5.0,24732,138181
3,1773,278,2000-03-13,4.0,24732,93054
4,2046,3431,2000-04-07,5.0,22095,101723


In [8]:
df_test.head()

Unnamed: 0,user_id,recipe_id,date,rating,u,i
0,8937,44551,2005-12-23,4.0,2,173538
1,56680,126118,2006-10-07,4.0,16,177847
2,349752,219596,2008-04-12,0.0,26,89896
3,628951,82783,2007-11-13,2.0,45,172637
4,92816,435013,2013-07-31,3.0,52,177935


In [9]:
# label a rating as positive if it is above 3, otherwise as negative
def var_sentiment(rate):
  if rate > 3:
    return "positve"
  else:
    return "negative"
# add a column named sentiment
df_train["sentiment"] = df_train["rating"].apply(var_sentiment)
df_test["sentiment"] = df_test["rating"].apply(var_sentiment)

In [10]:
df_test.head()

Unnamed: 0,user_id,recipe_id,date,rating,u,i,sentiment
0,8937,44551,2005-12-23,4.0,2,173538,positve
1,56680,126118,2006-10-07,4.0,16,177847,positve
2,349752,219596,2008-04-12,0.0,26,89896,negative
3,628951,82783,2007-11-13,2.0,45,172637,negative
4,92816,435013,2013-07-31,3.0,52,177935,negative


In [11]:
# read the raw user interactions and the raw recipes
df_R_int = pd.read_csv('/content/projet_AI_frameworks/raws/RAW_interactions.csv')
df_R_rec = pd.read_csv('/content/projet_AI_frameworks/raws/RAW_recipes.csv')

In [12]:
df_R_int.head()

Unnamed: 0,user_id,recipe_id,date,rating,review
0,38094,40893,2003-02-17,4,Great with a salad. Cooked on top of stove for...
1,1293707,40893,2011-12-21,5,"So simple, so delicious! Great for chilly fall..."
2,8937,44394,2002-12-01,4,This worked very well and is EASY. I used not...
3,126440,85009,2010-02-27,5,I made the Mexican topping and took it to bunk...
4,57222,85009,2011-10-01,5,"Made the cheddar bacon topping, adding a sprin..."


## Part 2: Sentiment analysis

In [13]:
# import libraries
import unicodedata 
import time
import pandas as pd
import numpy as np
import random
import nltk
import re 
import collections
import itertools
import pickle
import warnings
from tqdm import tqdm
import plotly.offline as pof
import plotly.graph_objects as go
warnings.filterwarnings("ignore")
import sklearn.metrics as smet

import matplotlib.pyplot as plt
import seaborn as sb
from scipy import sparse
sb.set_style("whitegrid")

import sklearn.model_selection as sms

In [14]:
nltk.download("stopwords")

[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Unzipping corpora/stopwords.zip.


True

In [15]:
french_stopwords = nltk.corpus.stopwords.words('french') 
english_stopwords = nltk.corpus.stopwords.words('english') 
pd.DataFrame([french_stopwords[:30], english_stopwords[:30]], index=["French", "English"]).T

# the reviews are written in English, so use the English stopwords and stemmer
stopwords = [unicodedata.normalize('NFD', sw).encode('ascii', 'ignore').decode("utf-8") for sw in english_stopwords]

stemmer=nltk.stem.SnowballStemmer('english')
stemmer.stem("")

''

In [16]:
from bs4 import BeautifulSoup #Nettoyage d'HTML
import re
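# function to clean the data: strips HTML, lowercases, removes accents and
# non-letters, drops stopwords, stems, and returns a space-joined string

def clean_text(string):
  # missing reviews are NaN (a float), so anything that is not a string yields ""
  if not isinstance(string, str):
    return ""
  txt = BeautifulSoup(string, "html.parser").get_text()
  txt = txt.lower()
  txt = unicodedata.normalize('NFD', txt).encode('ascii', 'ignore').decode("utf-8")
  txt = re.sub('[^a-z_]', ' ', txt)
  tokens = [w for w in txt.split() if (w not in stopwords)]
  tokens_stem = [stemmer.stem(token) for token in tokens]
  # CountVectorizer and LIME both expect a string, not a list of tokens
  return " ".join(tokens_stem)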



In [18]:
# set the index to userid and recipe id
df_R_int=df_R_int.set_index(['user_id','recipe_id'])
df_train=df_train.set_index(['user_id','recipe_id'])
df_test=df_test.set_index(['user_id','recipe_id'])

In [19]:
# the reviews from the interaction data are mapped to the train and the test
df_train['review'] = df_train.index.map(df_R_int['review'])
df_test['review'] = df_test.index.map(df_R_int['review'])

In [20]:
df_train.head()

Unnamed: 0_level_0,Unnamed: 1_level_0,date,rating,u,i,sentiment,review
user_id,recipe_id,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
2046,4684,2000-02-25,5.0,22095,44367,positve,this is absolutely delicious. i even served i...
2046,517,2000-02-25,5.0,22095,87844,positve,thought this was terrific!
1773,7435,2000-03-13,5.0,24732,138181,positve,easily the best i have ever had. juicy flavor...
1773,278,2000-03-13,4.0,24732,93054,positve,"a little greasy, but a huge hit with the guests."
2046,3431,2000-04-07,5.0,22095,101723,positve,leeks on a pizza?! it was really delicious. ...


In [21]:
# reset the index to its default (reset_index returns a new DataFrame, so reassign)
df_train = df_train.reset_index()
df_test = df_test.reset_index()
df_test

Unnamed: 0,user_id,recipe_id,date,rating,u,i,sentiment,review
0,8937,44551,2005-12-23,4.0,2,173538,positve,I made this and took it to several holiday fun...
1,56680,126118,2006-10-07,4.0,16,177847,positve,"This was really great, directions are right on..."
2,349752,219596,2008-04-12,0.0,26,89896,negative,Very easy and tasty! The eggs cooked up nicel...
3,628951,82783,2007-11-13,2.0,45,172637,negative,I would have enjoyed this more without the lem...
4,92816,435013,2013-07-31,3.0,52,177935,negative,These didn&#039;t turn out quite like I had ho...
...,...,...,...,...,...,...,...,...
12450,101053,179011,2009-01-03,5.0,25054,130258,positve,"These little gems are fantastic! For years, I..."
12451,252205,81398,2005-12-26,2.0,25055,152255,negative,"I hate to be the first to post a bad review, b..."
12452,624305,142984,2011-01-15,1.0,25057,139864,negative,I should have known a recipe submitted in 2005...
12453,173575,104842,2004-12-18,3.0,25059,140646,negative,i found the crab cakes to be very dry and the ...


In [None]:
# add a 'cleaned_text' column
df_train["cleaned_text"] = df_train["review"].apply(clean_text)
df_test["cleaned_text"] = df_test["review"].apply(clean_text)

In [None]:
# eliminating rows without a review (done before encoding so labels and texts stay aligned)
df_train = df_train.dropna()
df_test = df_test.dropna()

In [None]:
# transforming the sentiment to 0-1
from sklearn import preprocessing
le = preprocessing.LabelEncoder()
t=le.fit_transform(df_train['sentiment'].values)
# reuse the encoder fitted on the training labels
y=le.transform(df_test['sentiment'].values)

In [None]:
# train the model using a random forest and evaluate it
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.pipeline import make_pipeline

model = make_pipeline(
        CountVectorizer(max_df= 0.5, ngram_range= (1, 2)),
        TfidfTransformer(),
        RandomForestClassifier()
)

model.fit(df_train['cleaned_text'][:5000], t[:5000])
print(f"Model score: {model.score(df_test['cleaned_text'], y):.2f}")

In [None]:
!pip install lime > /dev/null 2>&1

In [None]:
class_names = ['negative', 'positive']
from lime.lime_text import LimeTextExplainer
explainer = LimeTextExplainer(class_names=class_names)

In [None]:
index = 11
exp = explainer.explain_instance(df_test['cleaned_text'].iloc[index], model.predict_proba, num_features=6)

prediction = model.predict_proba([df_test['cleaned_text'].iloc[index]])
class_predicted = class_names[prediction.argmax(1)[0]]
class_proba = prediction.max(1)[0]
true_class = class_names[y[index]]
print(f'Class predicted: {class_predicted} (p={class_proba})')
print(f'True class: {class_names[y[index]]}')

In [None]:
fig = exp.as_pyplot_figure()

In [None]:
exp.show_in_notebook(text=True)

In [None]:
index = 30
exp = explainer.explain_instance(df_test['cleaned_text'].iloc[index], model.predict_proba, num_features=6)

prediction = model.predict_proba([df_test['cleaned_text'].iloc[index]])
class_predicted = class_names[prediction.argmax(1)[0]]
class_proba = prediction.max(1)[0]
true_class = class_names[y[index]]
print(f'Class predicted: {class_predicted} (p={class_proba})')
print(f'True class: {class_names[y[index]]}')

In [None]:
fig = exp.as_pyplot_figure()

In [None]:
exp.show_in_notebook(text=True)

## Part 3: Neural Collaborative Filtering

In [None]:
df_train = pd.read_csv('/content/projet_AI_frameworks/train/interactions_train.csv')
df_test = pd.read_csv('/content/projet_AI_frameworks/test/interactions_test.csv')
df_train.head()

In [None]:
import torch
from torch.utils.data import Dataset, DataLoader
import torch.nn as nn

user_list = pd.concat([df_train, df_test, df]).user_id.unique()
item_list = pd.concat([df_train, df_test, df]).recipe_id.unique()
user2id = {w: i for i, w in enumerate(user_list)}
item2id = {w: i for i, w in enumerate(item_list)}


class Ratings_Datset(Dataset):
    def __init__(self, df):
        self.df = df.reset_index()

    def __len__(self):
        return len(self.df)
  
    def __getitem__(self, idx):
        user = user2id[self.df['user_id'][idx]]
        user = torch.tensor(user, dtype=torch.long) 
        item = item2id[self.df['recipe_id'][idx]]
        item = torch.tensor(item, dtype=torch.long)
        rating = torch.tensor(self.df['rating'][idx], dtype=torch.float)
        return user, item, rating


trainloader = DataLoader(Ratings_Datset(df_train), batch_size=512, shuffle=True ,num_workers=2)
testloader = DataLoader(Ratings_Datset(df_test), batch_size=64, num_workers=2)

In [None]:
from tqdm.notebook import tqdm
import torch

from statistics import mean


def train(model, optimizer, trainloader, epochs=30):
    criterion = nn.MSELoss(reduction='mean')
    model.train()
    t = tqdm(range(epochs))
    for epoch in t:
        train_loss = []
        for users, items, r in trainloader:
            users = users.cuda()
            items = items.cuda()
            # ratings are scaled to [0, 1] to match the sigmoid output of the model
            r = r.cuda() / 5
            y_hat = model(users, items)
            loss = criterion(y_hat, r.unsqueeze(1).float())
            train_loss.append(loss.item())
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            t.set_description(f"loss: {mean(train_loss)}")

In [None]:
import math

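def test(model, testloader):
    """Return the mean absolute error on the original 0-5 rating scale."""
    model.eval()
    running_mae = 0.0
    total = 0
    with torch.no_grad():
        for users, items, r in testloader:
            users = users.cuda()
            items = items.cuda()
            y = r.cuda() / 5
            y_hat = model(users, items).flatten()
            # accumulate the absolute error as a plain Python float
            error = torch.abs(y_hat - y).sum().item()

            running_mae += error
            total += y.size(0)

    mae = running_mae/total
    # predictions are in [0, 1], so rescale the error to the rating scale
    return mae * 5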
    

In [None]:
import torch
import torch.nn as nn
class NCF(nn.Module):
        
    def __init__(self, n_users, n_items, n_factors=8):
        super().__init__()
        self.user_embeddings = torch.nn.Embedding(n_users, n_factors)
        self.item_embeddings = torch.nn.Embedding(n_items, n_factors)
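        self.predictor = torch.nn.Sequential(
            nn.Linear(in_features=n_factors*2, out_features=64),
            nn.ReLU(),  # without a non-linearity the stacked linear layers collapse into one
            nn.Linear(in_features=64, out_features=32),
            nn.ReLU(),
            nn.Linear(in_features=32, out_features=1),
            nn.Sigmoid()  # output in [0, 1], matching ratings divided by 5
        )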
        
        
    def forward(self, user, item):
        

        u = self.user_embeddings(user)
        i = self.item_embeddings(item)

        # Concat the two embedding layers
        z = torch.cat([u, i], dim=-1)
        return self.predictor(z)

In [None]:
model = NCF(len(user_list), len(item_list)).cuda()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
train(model, optimizer, trainloader, epochs=5)

In [None]:
test(model, testloader)

In [None]:
users, items, r = next(iter(testloader))
users = users.cuda()
items = items.cuda()
r = r.cuda()

# rescale the predictions from [0, 1] back to the 0-5 rating scale
y = model(users, items)*5
print("ratings", r[:10].data)
print("predictions:", y.flatten()[:10].data)

In [None]:
torch.save(model, '/content/projet_AI_frameworks/weight.pth')

In [None]:
# main to test on test_script.csv
if __name__ == "__main__":
  df = pd.read_csv("/content/projet_AI_frameworks/test_script.csv")
  model = torch.load("/content/projet_AI_frameworks/weight.pth")
  model.eval()
  # named `loader` so it does not shadow the test() function defined above
  loader = DataLoader(Ratings_Datset(df), batch_size=10, shuffle=True ,num_workers=1)
  users, items, r = next(iter(loader))
  users = users.cuda()
  items = items.cuda()
  r = r.cuda()

  # rescale the predictions from [0, 1] back to the 0-5 rating scale
  y = model(users, items)*5
  print("ratings", r[:10].data)
  print("predictions:", y.flatten()[:10].data)