#### Let us now fit several models to this data to find which one fits best. We will also use ensemble techniques such as Random Forest to see whether they fit well.


#### We first need to convert the data into a numerical format so that the algorithms can work on it. Then we split the data into training and testing sets, fit each model on the training set, and score it on the test set.


In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns


from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.metrics import mean_squared_error, mean_absolute_error
from sklearn.metrics import r2_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, classification_report
from sklearn.preprocessing import LabelEncoder


import joblib
from sklearn.metrics import confusion_matrix

#### Converting the data into a format that can be fed into machine learning algorithms

In [2]:
df = pd.read_csv(r"C:\Users\sajee\OneDrive\Desktop\Xtern AI- Sajeev Singh- 2024\Data.csv")
df2 = pd.read_csv(r"C:\Users\sajee\OneDrive\Desktop\Xtern AI- Sajeev Singh- 2024\Menu.csv")

In [3]:
copy = df.copy()
mappings = {
    'Year': {'Year 1': 1, 'Year 2': 2, 'Year 3': 3, 'Year 4': 4},
    'Major': {value: index for index, value in enumerate(df['Major'].unique())},
    'University': {value: index for index, value in enumerate(df['University'].unique())},
    'Order': {value: index for index, value in enumerate(df['Order'].unique())}
}
for column, mapping in mappings.items():
    copy[column] = copy[column].map(mapping)

print(copy.head())

   Year  Major  University  Time  Order
0     2      0           0    12      0
1     3      1           1    14      1
2     3      1           2    12      2
3     2      2           0    11      0
4     3      3           2    12      3
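Because the integer codes above come from `unique()`, it helps to keep an inverse mapping so that model predictions can be turned back into readable labels. A minimal sketch, assuming a small made-up frame (the dish names are placeholders, not values from Data.csv, and `order_to_code` / `code_to_order` are illustrative names):

```python
import pandas as pd

# Hypothetical toy data standing in for the real 'Order' column.
toy = pd.DataFrame({'Order': ['Dish A', 'Dish B', 'Dish A', 'Dish C']})

# Same encoding idea as above: label -> integer code, in order of first appearance.
order_to_code = {value: index for index, value in enumerate(toy['Order'].unique())}

# Inverse mapping: integer code -> label, used to decode predictions.
code_to_order = {index: value for value, index in order_to_code.items()}

encoded = toy['Order'].map(order_to_code)
decoded = encoded.map(code_to_order)

print(encoded.tolist())
print(decoded.tolist())
```

Applying `code_to_order` to the output of `predict` would give back the original order names.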


In [4]:
X = copy[['University', 'Major', 'Year', 'Time']] ### Setting X-variables
y = copy['Order'] ### Setting target variable as y

scaler = StandardScaler()
X = scaler.fit_transform(X)

In [5]:
### Splitting the data with sklearn's train_test_split: 80% training, 20% testing.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=20, shuffle=True)
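One caveat with the cells above: the scaler is fit on the full dataset before the split, so test-set statistics influence the training features. A sketch of one common alternative, assuming synthetic stand-in data rather than Data.csv, is to split first and wrap the scaler and classifier in a pipeline so that scaling is learned from the training rows only:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption): 200 rows, 4 integer-coded features, 3 classes.
rng = np.random.default_rng(0)
X_demo = rng.integers(0, 10, size=(200, 4)).astype(float)
y_demo = rng.integers(0, 3, size=200)

# Split first, so the test rows stay unseen during scaling.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=20, stratify=y_demo
)

# The pipeline fits StandardScaler on X_tr only, then trains the tree on the scaled data.
pipe = make_pipeline(StandardScaler(), DecisionTreeClassifier(random_state=0))
pipe.fit(X_tr, y_tr)

test_accuracy = pipe.score(X_te, y_te)
print(f"Test accuracy: {test_accuracy:.3f}")
```

Tree-based models are insensitive to feature scaling, so this matters mostly for KNN and SVC, but the pipeline keeps the comparison consistent across all four classifiers.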

#### Now we will train the following algorithms and compare them to find the best one:
#### 1. Random Forest Classifier
#### 2. K-Nearest Neighbours
#### 3. Support Vector Classifier (SVC)
#### 4. Decision Tree Classifier

In [6]:
classifiers = [
    ('Random Forest', RandomForestClassifier(random_state=0)),
    ('K-Nearest Neighbors', KNeighborsClassifier()),
    ('Support Vector Machine', SVC()),
    ('Decision Tree', DecisionTreeClassifier(random_state=0))
]

# Importing the remaining metrics used in the loop below
from sklearn.metrics import precision_score, recall_score, f1_score

# Initializing lists to store evaluation metrics
accuracy_scores = []
precision_scores = []
recall_scores = []
f1_scores = []

# Fitting each classifier on the training data and scoring it on the test data
for name, classifier in classifiers:
    classifier.fit(X_train, y_train)
    y_pred = classifier.predict(X_test)
    accuracy_scores.append(accuracy_score(y_test, y_pred))
    precision_scores.append(precision_score(y_test, y_pred, average='weighted'))  # Use weighted average for multiclass
    recall_scores.append(recall_score(y_test, y_pred, average='weighted'))  # Use weighted average for multiclass
    f1_scores.append(f1_score(y_test, y_pred, average='weighted'))  # Use weighted average for multiclass

# Collecting the metrics into one table, one row per classifier
metrics_df = pd.DataFrame({
    'Classifier': [name for name, _ in classifiers],
    'Accuracy': accuracy_scores,
    'Precision': precision_scores,
    'Recall': recall_scores,
    'F1-Score': f1_scores
})

### Creating a bar plot for metrics
plt.figure(figsize=(10, 6))
sns.barplot(x='Classifier', y='Accuracy', data=metrics_df, palette='viridis')
plt.title('Classifier Performance Comparison (Accuracy)')
plt.xlabel('Classifier')
plt.ylabel('Accuracy')
plt.xticks(rotation=45)
plt.show()

print(metrics_df)
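A single 80/20 split gives one estimate per model, which can be noisy on a small dataset. A sketch of a steadier comparison with `cross_val_score`, assuming synthetic data from `make_classification` in place of Data.csv (the dataset shape and the two models chosen are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption): 300 rows, 4 features, 3 classes.
X_demo, y_demo = make_classification(
    n_samples=300, n_features=4, n_informative=3, n_redundant=0,
    n_classes=3, n_clusters_per_class=1, random_state=0
)

models = {
    'Random Forest': RandomForestClassifier(random_state=0),
    'Decision Tree': DecisionTreeClassifier(random_state=0),
}

# 5-fold cross-validation: each model is trained and scored on 5 different splits.
cv_results = {}
for name, model in models.items():
    scores = cross_val_score(model, X_demo, y_demo, cv=5, scoring='accuracy')
    cv_results[name] = (scores.mean(), scores.std())
    print(f"{name}: mean accuracy {scores.mean():.3f} (std {scores.std():.3f})")
```

If the gap between two models is smaller than the spread across folds, the ranking between them is probably not reliable.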

#### Decision Tree and Random Forest work best in this case, so I choose the Decision Tree classifier as my final model because it is the less complex of the two.

#### We could also apply more advanced techniques, such as combining models or tuning hyperparameters, but these are unlikely to improve the results here because no clear patterns were found in the data.
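Whether tuning helps is something that can be checked rather than assumed. A minimal sketch using `GridSearchCV` on a decision tree, assuming synthetic stand-in data and a hypothetical parameter grid (the grid values are illustrative, not tuned for Data.csv):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption): 300 rows, 4 features, 3 classes.
X_demo, y_demo = make_classification(
    n_samples=300, n_features=4, n_informative=3, n_redundant=0,
    n_classes=3, n_clusters_per_class=1, random_state=0
)

# Hypothetical grid: 4 depths x 3 leaf sizes = 12 candidate settings.
param_grid = {
    'max_depth': [2, 4, 8, None],
    'min_samples_leaf': [1, 5, 10],
}

# Each candidate is scored with 5-fold cross-validation.
search = GridSearchCV(
    DecisionTreeClassifier(random_state=0), param_grid, cv=5, scoring='accuracy'
)
search.fit(X_demo, y_demo)

print("Best parameters:", search.best_params_)
print(f"Best cross-validated accuracy: {search.best_score_:.3f}")
```

If the best cross-validated score is no better than that of the default tree, that would support skipping the extra tuning.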

In [None]:
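### Fitting the chosen Decision Tree as the final model
d_tree = DecisionTreeClassifier(random_state=0)
d_tree.fit(X_train, y_train)

### Saving the model to disk and loading it back
joblib.dump(d_tree, 'decision_tree_model.pkl')
loaded_model = joblib.load('decision_tree_model.pkl')
### Now we can use 'loaded_model' for predictions
y_pred = loaded_model.predict(X_test)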

In [None]:
confusion_m = confusion_matrix(y_test, y_pred)
print(confusion_m)
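A raw confusion matrix is easier to read with labelled rows and columns. A small sketch, assuming made-up labels `A`, `B`, `C` in place of the real order classes:

```python
import pandas as pd
from sklearn.metrics import confusion_matrix

# Made-up true and predicted labels, for illustration only.
y_true_demo = ['A', 'B', 'A', 'C', 'B', 'A']
y_pred_demo = ['A', 'B', 'C', 'C', 'A', 'A']
labels = ['A', 'B', 'C']

# Rows are true classes, columns are predicted classes, in the order given by `labels`.
cm = confusion_matrix(y_true_demo, y_pred_demo, labels=labels)

cm_df = pd.DataFrame(
    cm,
    index=[f'true {label}' for label in labels],
    columns=[f'pred {label}' for label in labels],
)
print(cm_df)
```

The diagonal holds the correct predictions; off-diagonal cells show which classes are confused with each other.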

## Answer to the last question
#### To assess the viability of launching a FoodX food recommender AI, key considerations include data quality, model performance, user interest, and cost-effectiveness. Collecting and maintaining data can be resource-intensive, so ensuring data cleanliness and unbiased information is essential. The AI model should consistently outperform random guessing to justify its implementation. User engagement and interest in such a system should be evaluated. A cost-benefit analysis, comparing potential savings from offering discounts through AI predictions versus traditional methods, is crucial. Addressing the current lack of discernible patterns in the data involves gathering more data and expanding measured variables. Investing in stronger models, like XGBoost, can enhance predictive capabilities. Time sensitivity plays a role, as quick deployment may not align with data readiness. Balancing these factors is key to determining the feasibility of launching the FoodX AI recommender.
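The "outperform random guessing" criterion can be checked directly by scoring a chance-level baseline alongside the model. A minimal sketch with scikit-learn's `DummyClassifier`, assuming synthetic stand-in data (on the real data the gap may be much smaller or absent):

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption): 300 rows, 4 features, 3 well-separated classes.
X_demo, y_demo = make_classification(
    n_samples=300, n_features=4, n_informative=3, n_redundant=0,
    n_classes=3, n_clusters_per_class=1, class_sep=2.0, random_state=0
)

# Baseline that guesses a class uniformly at random, ignoring the features.
baseline = DummyClassifier(strategy='uniform', random_state=0)
model = DecisionTreeClassifier(random_state=0)

# Same 5-fold cross-validation for both, so the scores are comparable.
baseline_acc = cross_val_score(baseline, X_demo, y_demo, cv=5).mean()
model_acc = cross_val_score(model, X_demo, y_demo, cv=5).mean()

print(f"Random-guess baseline accuracy: {baseline_acc:.3f}")
print(f"Decision tree accuracy:         {model_acc:.3f}")
```

A model whose accuracy is not clearly above this baseline would not justify the cost of deployment.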