# Flavor Recommender System

In this project, following the [Python Programming Course Bundle: Build 15 applications from Udemy](https://www.udemy.com/course/full-python-programming/), we'll create a model to predict a user's favourite flavour given their age and gender.

First, let's make preparations:

In [45]:
import pandas as pd
import numpy as np

It's time to read the data and have a look at it:

In [8]:
data = pd.read_csv('./flavour.csv')
print('Number of examples in the dataset: ' + str(len(data)))
data.head()

Number of examples in the dataset: 20


|   | Age | Gender | Flavour |
|---|-----|--------|---------|
| 0 | 6 | Male | Chocolate |
| 1 | 6 | Female | Strawberry |
| 2 | 7 | Male | Chocolate |
| 3 | 8 | Female | Strawberry |
| 4 | 11 | Male | Butterscotch |


With only 20 examples, our model will not be very accurate. However, as this is a learning project, let's build it anyway and train it on all 20 examples. Before creating the model, it's important to prepare the data.

As seen above, the value we want our model to predict is the flavour, so let's separate that column from the rest:

In [55]:
X = data.iloc[:, :-1].values
y = data.iloc[:, 2].values

flavours = list(dict.fromkeys(y))

print('The different flavours in our dataset are: ')
print(*flavours, sep = ', ')

The different flavours in our dataset are: 
Chocolate, Strawberry, Butterscotch, Vanilla, Mango, Almond & Chocolate, Coffe
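
Selecting columns by position works, but it silently breaks if the column order of the CSV changes. As a minimal sketch, the same split can be written with column names. The `toy` frame below is an assumption: it only reproduces the first four rows shown in the preview above, not the full `flavour.csv`.

```python
import pandas as pd

# Stand-in for flavour.csv: just the first four rows from the preview.
toy = pd.DataFrame({
    "Age": [6, 6, 7, 8],
    "Gender": ["Male", "Female", "Male", "Female"],
    "Flavour": ["Chocolate", "Strawberry", "Chocolate", "Strawberry"],
})

# Select inputs and target by name instead of by position.
features = toy[["Age", "Gender"]].values
target = toy["Flavour"].values

# Unique labels, in order of first appearance.
unique_flavours = toy["Flavour"].unique().tolist()
print(*unique_flavours, sep=", ")
```

`Series.unique()` keeps the order of first appearance, which is the same behaviour the `dict.fromkeys` trick relies on.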


The flavour column is an example of categorical data. It is necessary to encode this data before feeding it to our model:

In [100]:
from sklearn import preprocessing
le_Y = preprocessing.LabelEncoder()
y = le_Y.fit_transform(y)

flavour_encoding = dict(
                    zip(
                        list(dict.fromkeys(y)),
                        list(dict.fromkeys(data.iloc[:, 2].values))
                    )
                )

print('The encoding for the output will be the following:')
print(flavour_encoding)

The encoding for the output will be the following:
{2: 'Chocolate', 5: 'Strawberry', 1: 'Butterscotch', 6: 'Vanilla', 4: 'Mango', 0: 'Almond & Chocolate', 3: 'Coffe'}
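
The codes above follow alphabetical order because `LabelEncoder` stores the sorted unique labels in its `classes_` attribute, and each label's position in that array is its code. A minimal sketch of reading the mapping straight from the encoder, using made-up labels rather than the course dataset:

```python
from sklearn.preprocessing import LabelEncoder

# Toy labels for illustration only (not the course dataset).
labels = ["Chocolate", "Strawberry", "Chocolate", "Vanilla", "Butterscotch"]

encoder = LabelEncoder()
encoded = encoder.fit_transform(labels)

# classes_ holds the unique labels in sorted order; the index of each
# label in that array is its integer code.
code_to_label = {code: str(label) for code, label in enumerate(encoder.classes_)}

print(code_to_label)
```

Building the dictionary from `classes_` does not depend on the order in which labels appear in the data.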


The same process applies to the 'Gender' column of our model input X, which has only two possible values in the given data:

In [70]:
le_X = preprocessing.LabelEncoder()
X[:,1] = le_X.fit_transform(X[:,1])

# Pair each unique gender label with its code, both in order of first appearance
gender_encoding = dict(
                    zip(
                        list(dict.fromkeys(data.iloc[:, 1].values)),
                        list(dict.fromkeys(X[:,1]))
                    ))

print('The encoding for the gender column will be the following:')
print(gender_encoding)

The encoding for the gender column will be the following:
{'Male': 1, 'Female': 0}


Now our data is ready for the model. For this project, the course instructor uses a decision tree classifier from the scikit-learn (`sklearn`) library. Let's build and train the model:

In [68]:
from sklearn.tree import DecisionTreeClassifier
model = DecisionTreeClassifier()
model.fit(X, y)

DecisionTreeClassifier()
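
A fitted decision tree is a set of if/else rules on the input features, and scikit-learn can print those rules with `export_text`. The sketch below is not part of the course project: the samples are made up and already encoded (gender 1 = Male, 0 = Female), and `random_state=0` is only there to make the result repeatable.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Made-up, already-encoded samples: [age, gender].
X_toy = [[6, 1], [6, 0], [7, 1], [8, 0], [30, 1], [32, 0]]
y_toy = ["Chocolate", "Strawberry", "Chocolate", "Strawberry", "Vanilla", "Vanilla"]

tree = DecisionTreeClassifier(random_state=0)
tree.fit(X_toy, y_toy)

# Render the learned splits as indented text, one rule per line.
rules = export_text(tree, feature_names=["Age", "Gender"])
print(rules)
```

Reading the printed rules is a quick way to see which feature the tree splits on first.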

Let's try and predict what my favourite flavour would be, just for fun:

In [110]:
user = [21, gender_encoding.get('Male')]
encoded_output = model.predict([user])
prediction = flavour_encoding[encoded_output[0]]

print(prediction)

Coffe
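
The lookup dictionaries are not strictly needed for a prediction: a fitted `LabelEncoder` can encode new inputs with `transform` and decode outputs with `inverse_transform`. A minimal sketch on made-up rows, where all variable names are illustrative and not taken from the course:

```python
from sklearn.preprocessing import LabelEncoder
from sklearn.tree import DecisionTreeClassifier

# Made-up training rows for illustration only.
ages = [6, 6, 7, 8, 30, 32]
genders = ["Male", "Female", "Male", "Female", "Male", "Female"]
flavours = ["Chocolate", "Strawberry", "Chocolate", "Strawberry", "Vanilla", "Vanilla"]

gender_encoder = LabelEncoder()
flavour_encoder = LabelEncoder()

# Build [age, encoded gender] rows and integer-encoded targets.
X_toy = [[age, g] for age, g in zip(ages, gender_encoder.fit_transform(genders))]
y_toy = flavour_encoder.fit_transform(flavours)

clf = DecisionTreeClassifier(random_state=0).fit(X_toy, y_toy)

# Encode the new user's gender with the fitted encoder, then decode the
# predicted integer back into a flavour name.
user = [[7, gender_encoder.transform(["Male"])[0]]]
predicted = flavour_encoder.inverse_transform(clf.predict(user))[0]
print(predicted)
```

Because the encoders do the translation in both directions, the mapping cannot drift out of sync with the model.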


Not bad at all :coffee::clap:

Final comment: if more data were available, I would have liked to hold some of it out to test the model's performance. However, with only 20 examples, holding data out would leave too little to train on, and the model would lose accuracy.
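
One common option for very small datasets is leave-one-out cross-validation, which trains on all examples but one and tests on the one left out, repeating this for every example. This is not something the course does; the sketch below uses made-up, already-encoded samples and only shows the mechanics.

```python
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Made-up, already-encoded samples: [age, gender] with 1 = Male, 0 = Female.
X_toy = [[6, 1], [6, 0], [7, 1], [8, 0], [9, 1],
         [9, 0], [30, 1], [32, 0], [35, 1], [40, 0]]
y_toy = ["Chocolate", "Strawberry", "Chocolate", "Strawberry", "Chocolate",
         "Strawberry", "Vanilla", "Vanilla", "Vanilla", "Vanilla"]

# One fold per example: each score is 1.0 (correct) or 0.0 (wrong).
scores = cross_val_score(
    DecisionTreeClassifier(random_state=0), X_toy, y_toy, cv=LeaveOneOut()
)

print(f"Leave-one-out accuracy: {scores.mean():.2f} over {len(scores)} folds")
```

Every example is used for testing exactly once, so no data has to be permanently set aside.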