<a href="https://colab.research.google.com/github/mkjubran/Fundamentals-of-AI-and-Machine-Learning/blob/main/RECOMMENDATION_SYSTEMS_CONTENT_BASED.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## RECOMMENDATION SYSTEMS - CONTENT-BASED


In this notebook, we will demonstrate how to build a content-based Healthy Diet Recommender system. We will work with the Healthy Diet dataset from Kaggle (https://www.kaggle.com/code/dhyanidesai/healthy-diet-recommender/data).

# Import Libraries

First, we need to import the libraries used in this notebook: pandas for reading and encoding the data, and joblib for saving and loading the recommender model.

In [None]:
import pandas as pd
import joblib as jb

# Data Preparation

**Clone the dataset Repository**

The dataset can be cloned from the GitHub repository https://github.com/mkjubran/AIData.git as below

In [None]:
!rm -rf ./AIData
!git clone https://github.com/mkjubran/AIData.git

**Read the dataset**

The data is stored in the dataset.csv file. Read it into a dataframe using the Pandas library (https://pandas.pydata.org/).

In [None]:
df = pd.read_csv('/content/AIData/HealthyDietRecommender/dataset.csv')
df.head()

**Display Data Info**

Display some information about the dataset using the info() method

In [None]:
df.info()

The dataset contains 512 meal records with 8 features for each record.

# Clean Data

**Check Missing Values**

Check if there are any missing values in the dataset

In [None]:
df.isnull().sum()

Only one meal record has a missing description. We will keep this record so that it can still be recommended when the recommendation is based on the other features.

# Feature Selection and Encode Features

At this stage, we will consider the category, Veg, Nutrient, Disease, and Diet features for the recommendation system.

We have 78 categories in the category feature, 2 options in the Veg feature, 17 options in the Nutrient feature, 12 diseases in the Disease feature and 16 options in the Diet feature.

In [None]:
print('Number of categories in the category feature is {}'.format(df['catagory'].unique().size))
print('Number of options in the Veg feature is {}'.format(df['Veg_Non'].unique().size))
print('Number of options in the Nutrient feature is {}'.format(df['Nutrient'].unique().size))

# Disease and Diet hold several space-separated labels per record, so join
# all records with a space, split on whitespace, and count the distinct labels
Disease = sorted(set(' '.join(df['Disease'].dropna()).lower().split()))
print('Number of options in the Disease feature is {}'.format(len(Disease)))

Diet = sorted(set(' '.join(df['Diet'].dropna()).lower().split()))
print('Number of options in the Diet feature is {}'.format(len(Diet)))

We will encode all of these features using get_dummies() and store them in a separate dataframe

In [None]:
catagory_dummies = df.catagory.str.get_dummies()
Veg_Non_dummies = df.Veg_Non.str.get_dummies()
nutrient_dummies = df.Nutrient.str.get_dummies()
disease_dummies = df.Disease.str.get_dummies(sep=' ')
diet_dummies = df.Diet.str.get_dummies(sep=' ')

feature_df = pd.concat([catagory_dummies,Veg_Non_dummies,nutrient_dummies,disease_dummies,diet_dummies],axis=1)
feature_df.shape

The resulting dataframe has 125 features. We will build the recommender system based on these features.
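To make the encoding step concrete, here is a minimal sketch on a toy dataframe. The rows and values are made up for illustration and are not taken from dataset.csv; only the `str.get_dummies()` calls mirror the cell above.

```python
import pandas as pd

# Toy data (hypothetical values, not taken from dataset.csv)
toy = pd.DataFrame({
    'Veg_Non': ['veg', 'non-veg', 'veg'],
    'Disease': ['anemia diabeties', 'anemia', 'diabeties goitre'],
})

# Single-valued column: one indicator column per distinct value
veg_dummies = toy['Veg_Non'].str.get_dummies()

# Multi-valued column: split each cell on spaces, then one column per label
disease_dummies = toy['Disease'].str.get_dummies(sep=' ')

# Place the indicator columns side by side, one row per record
toy_features = pd.concat([veg_dummies, disease_dummies], axis=1)
print(toy_features)
```

Each record becomes a row of zeros and ones, which is the representation the nearest-neighbors search works on.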

# Train Unsupervised Nearest Neighbors Model

We will use the Unsupervised Nearest Neighbors algorithm from sklearn to find the meal records whose encoded features are closest to a given input.

In [None]:
from sklearn.neighbors import NearestNeighbors
model_NearestNeighbors = NearestNeighbors(n_neighbors=5,algorithm='ball_tree')
model_NearestNeighbors.fit(feature_df)
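As a rough illustration of what the fitted model does, the sketch below runs the same estimator on four made-up items with three binary features each. The array `X` and the query are assumptions chosen for the example, not values from the dataset.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Four hypothetical items described by three binary features
X = np.array([
    [1, 0, 0],
    [1, 1, 0],
    [0, 1, 1],
    [0, 0, 1],
])

nn = NearestNeighbors(n_neighbors=2, algorithm='ball_tree')
nn.fit(X)

# kneighbors() expects a 2D input: one row per query
query = [[1, 1, 0]]
distances, indices = nn.kneighbors(query)

# indices holds the row positions of the closest items, nearest first
print(distances, indices)
```

Here the query matches the second item exactly, so that item comes first with distance 0, followed by the first item, which differs in one feature.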

Now, we need to prepare the format of the input for the recommender system (Input_features). This will be a dictionary that contains the features after get_dummies as keys.

In [None]:
Input_features = dict()
for i in feature_df.columns:
    Input_features[i]= 0
print(Input_features)

# Saving and Loading Models

We will use the joblib library (see the scikit-learn model persistence guide, https://scikit-learn.org/stable/modules/model_persistence.html) to save and load the models. To save the model and the input template for the recommender system, we use the dump() method:

In [None]:
jb.dump(model_NearestNeighbors, './model_NearestNeighbors.joblib')
jb.dump(Input_features, './model_NearestNeighbors_Input_features.joblib')

And to load the recommender model and the input for the recommender system, we will use the load() method

In [None]:
model_NearestNeighbors_joblib = jb.load('./model_NearestNeighbors.joblib')
Input_features_joblib = jb.load('./model_NearestNeighbors_Input_features.joblib')
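The round trip can be checked on a small object. The sketch below uses a made-up dictionary and a temporary directory (both assumptions for the example) and confirms that the loaded dictionary has the same keys in the same order, which matters here because the input vector is later built from the dictionary's values in order.

```python
import os
import tempfile
import joblib

# Hypothetical feature template, standing in for Input_features
features = {'anemia': 0, 'calcium': 1, 'protien': 0}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'features.joblib')
    joblib.dump(features, path)      # write the object to disk
    restored = joblib.load(path)     # read it back

# Same contents and same key order as the original dictionary
print(restored == features, list(restored) == list(features))
```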

# Recommend Meals After Loading Models

To recommend a meal after loading the model, we need first to read the dataset.

In [None]:
df_ALoad = pd.read_csv('/content/AIData/HealthyDietRecommender/dataset.csv')
df_ALoad.head()

To use the loaded recommender model, we need to get the values of the features for any input. Next, we will assume a sample input, then we will map this input to the existing features in feature_df dataframe, and produce the output vector final_input. The value of every feature in the final_input vector will equal one if the feature is available in the sample input, otherwise, it equals zero.

In [None]:
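sample_input = ['high_protien_diet','gluten_free_diet','diabeties','anemia','calcium','protien']

# Set each requested feature to 1. Reject names that are not known features,
# since adding a new key would change the length of the input vector.
for i in sample_input:
    if i not in Input_features_joblib:
        raise KeyError('Unknown feature: {}'.format(i))
    Input_features_joblib[i] = 1

final_input = list(Input_features_joblib.values())
print(final_input)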
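The loop above modifies the loaded dictionary in place, so features set for one query stay set for the next one. One possible alternative, sketched here with a hypothetical helper `build_input_vector` and made-up feature names, builds a fresh vector for every query and leaves the template untouched.

```python
def build_input_vector(feature_names, selected):
    """Return a 0/1 list aligned with feature_names (hypothetical helper)."""
    unknown = [name for name in selected if name not in feature_names]
    if unknown:
        raise KeyError('Unknown features: {}'.format(unknown))
    chosen = set(selected)
    return [1 if name in chosen else 0 for name in feature_names]

# Made-up feature names standing in for feature_df.columns
feature_names = ['anemia', 'calcium', 'diabeties', 'protien']
vector = build_input_vector(feature_names, ['calcium', 'protien'])
print(vector)  # [0, 1, 0, 1]
```

In the notebook, `list(Input_features_joblib)` would supply the feature names in the order the model was trained on.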

To get the most recommended meals, we will apply the final_input vector to the loaded model

In [None]:
distances, indices = model_NearestNeighbors_joblib.kneighbors([final_input])

This returns the indices of the closest records and their distances from the sample input.

In [None]:
print(distances, indices)
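To help read the distances: the model was created without a metric argument, so it uses sklearn's default Euclidean distance. For 0/1 vectors this equals the square root of the number of features on which the input and a meal disagree. The sketch below checks this on two made-up vectors; `euclidean` is a hypothetical helper written for the example.

```python
import math

def euclidean(a, b):
    """Plain Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Made-up binary vectors: a query and one meal record
query = [1, 1, 0, 1]
meal = [1, 0, 0, 0]

# Count the positions where the two vectors differ
mismatches = sum(x != y for x, y in zip(query, meal))

# Both expressions give the same value for binary vectors
print(euclidean(query, meal), math.sqrt(mismatches))
```

A smaller distance therefore means fewer mismatched features, and a distance of 0 means the meal matches the input on every feature.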

Next, we will print the list of recommended records from the original dataset (before feature selection and encoding)

In [None]:
# kneighbors() returns positional row indices with shape (1, n_neighbors).
# DataFrame.append was removed in pandas 2.0, so select the rows directly.
df_results = df_ALoad.iloc[indices[0]]

df_results = df_results.filter(['Name','Nutrient','Veg_Non','Price','Review','Diet','Disease','description'])
df_results = df_results.drop_duplicates(subset=['Name'])
df_results = df_results.reset_index(drop=True)
df_results