In [1]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from pandas.plotting import parallel_coordinates
from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn import metrics
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression

To load the dataset, we can use the read_csv function from pandas:

In [2]:
df = pd.read_csv('https://raw.githubusercontent.com/joek47/eatwhat/main/data.csv')

After loading the data, we can look at the first five rows with the head function:

In [3]:
df.head()

Unnamed: 0,horoscope,fengshui,luckycolour,hungry,food
0,5,3,1,0,chickenrice
1,4,3,1,0,chickenrice
2,4,3,1,0,chickenrice
3,4,3,1,0,chickenrice
4,5,3,1,0,chickenrice
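
The `Unnamed: 0` column in the output above usually appears when a CSV was saved together with its row index. The sketch below shows one way to avoid it, using a small in-memory copy of the rows shown above rather than the real file. It assumes the first header cell in the file is empty, which is what typically produces that column name; `csv_text`, `raw` and `clean` are illustrative names.

```python
import io
import pandas as pd

# In-memory stand-in for the first rows shown above (not the full file).
# Assumption: the first header cell is empty, as it is when an index is saved.
csv_text = """,horoscope,fengshui,luckycolour,hungry,food
0,5,3,1,0,chickenrice
1,4,3,1,0,chickenrice
2,4,3,1,0,chickenrice
"""

# Default read: the unnamed first column becomes a regular column "Unnamed: 0".
raw = pd.read_csv(io.StringIO(csv_text))

# index_col=0 tells pandas to use that first column as the row index instead.
clean = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(raw.columns.tolist())
print(clean.columns.tolist())
```

Since the later cells select the four feature columns by name, the extra column does not affect the models here; dropping it mainly keeps the table tidy.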


Each feature is encoded as a number. For example, the value under horoscope (5 in the first row) indicates how near the current date is to your horoscope month.

Presumably, a lower number means the current day is nearer to your horoscope month.

# Train-Test Split

Now we split the dataset into a training set and a test set. In general, we should also have a validation set to evaluate the performance of each classifier and fine-tune the model parameters in order to determine the best model. 

Due to the small size of this dataset, we can simplify this process by using the test set to serve the purpose of the validation set.

In [4]:
train, test = train_test_split(df, test_size = 0.3, stratify = df['food'], random_state = 22)
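
For a larger dataset, the three-way split mentioned above could be sketched by calling train_test_split twice. This is only an illustration: `demo` is a synthetic stand-in with made-up values that borrows the column names and two of the food labels from this notebook, and the 60/20/20 proportions are an assumption, not something this notebook uses.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in with the same column names as the dataset; values are made up.
demo = pd.DataFrame({
    'horoscope':   [i % 6 for i in range(100)],
    'fengshui':    [i % 4 for i in range(100)],
    'luckycolour': [i % 2 for i in range(100)],
    'hungry':      [i % 3 for i in range(100)],
    'food':        ['chickenrice'] * 50 + ['rotiprata'] * 50,
})

# Step 1: hold out 20% of the rows as the final test set.
train_val, test_demo = train_test_split(
    demo, test_size=0.2, stratify=demo['food'], random_state=22)

# Step 2: carve a validation set out of the remainder
# (0.25 of the remaining 80% is 20% of the whole).
train_demo, val_demo = train_test_split(
    train_val, test_size=0.25, stratify=train_val['food'], random_state=22)

print(len(train_demo), len(val_demo), len(test_demo))
print(val_demo['food'].value_counts())
```

Passing `stratify` at both steps keeps the class proportions roughly equal across all three subsets, which matters most when the dataset is small.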

# Build Classifiers
Now we are ready to build a classifier.
Let’s separate out the class label and the features first:

In [5]:
X_train = train[['horoscope','fengshui','luckycolour','hungry']]
y_train = train.food
X_test = test[['horoscope','fengshui','luckycolour','hungry']]
y_test = test.food

These are the ground-truth labels that the models will try to predict:

In [8]:
y_test.head()

44     chickenrice
148      rotiprata
24     chickenrice
29     chickenrice
119      rotiprata
Name: food, dtype: object

Let's try two models: Logistic Regression and SVC.

In [6]:
# Logistic regression
mod_lr = LogisticRegression(solver = 'newton-cg').fit(X_train, y_train)
prediction = mod_lr.predict(X_test)
# accuracy_score expects (y_true, y_pred); accuracy is symmetric, so the value is unchanged
print('The accuracy of the Logistic Regression is', "{:.3f}".format(metrics.accuracy_score(y_test, prediction)))

The accuracy of the Logistic Regression is 0.978


In [7]:
# SVC with linear kernel
linear_svc = SVC(kernel='linear').fit(X_train, y_train)
prediction = linear_svc.predict(X_test)
# accuracy_score expects (y_true, y_pred); accuracy is symmetric, so the value is unchanged
print('The accuracy of the linear SVC is', "{:.3f}".format(metrics.accuracy_score(y_test, prediction)))

The accuracy of the linear SVC is 0.978
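
Because both models report the same accuracy, a per-class breakdown may be more informative than a single number. The sketch below shows how a confusion matrix and classification report could be produced with sklearn.metrics. It runs on synthetic two-class data, not the food dataset, so `X_demo`, `y_demo` and the rule used to generate the labels are purely illustrative assumptions.

```python
import numpy as np
from sklearn import metrics
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic two-class data; NOT the food dataset. The labelling rule is made up.
rng = np.random.default_rng(22)
X_demo = rng.integers(0, 6, size=(120, 4))
y_demo = np.where(X_demo[:, 0] + X_demo[:, 3] > 5, 'rotiprata', 'chickenrice')

Xtr, Xte, ytr, yte = train_test_split(
    X_demo, y_demo, test_size=0.3, stratify=y_demo, random_state=22)

model = LogisticRegression(solver='newton-cg').fit(Xtr, ytr)
pred = model.predict(Xte)

# Rows are true classes, columns are predicted classes, in the order of `labels`.
labels = ['chickenrice', 'rotiprata']
cm = metrics.confusion_matrix(yte, pred, labels=labels)
print(cm)

# Precision, recall and F1 per class.
print(metrics.classification_report(yte, pred, labels=labels))
```

On the real data, the same two calls could be applied to `y_test` and each model's `prediction` to see which foods, if any, the models confuse.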
