# ML Mini-Project

1. Import modules
2. Load the data
3. Transform the data
4. Divide data into training/testing set
5. Create empty model
6. Fit/train the model
7. Evaluate the model

### Problem formulation
We want to predict a pizza's type from the ingredients it contains. \
We may also want to predict a pizza's price from its ingredients and possibly some other metric.

In [436]:
# Importing necessary libraries
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import os
import re

from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression
from sklearn.svm import SVC
from sklearn.inspection import DecisionBoundaryDisplay
from sklearn.model_selection import train_test_split

### Load Files

In [398]:
DATA_PATH = f'{os.path.abspath("")}/Data files'

data_dictionary = pd.read_csv(f'{DATA_PATH}/data_dictionary.csv')
order_details = pd.read_csv(f'{DATA_PATH}/pizza_sales/order_details.csv')
orders = pd.read_csv(f'{DATA_PATH}/pizza_sales/orders.csv')
pizza_types = pd.read_csv(f'{DATA_PATH}/pizza_sales/pizza_types.csv')
pizzas = pd.read_csv(f'{DATA_PATH}/pizza_sales/pizzas.csv')
# print(pizza_types)



### Transform DataFrames

Collect each unique ingredient into an array, to be used as feature columns for our classification.

In [400]:
# Collect every ingredient except the first one listed for each pizza
# (index 0 of each ingredient list is skipped).
ingredient_list = []
for ingredient_string in pizza_types["ingredients"]:
    for ingredient in ingredient_string.split(',')[1:]:
        ingredient_list.append(ingredient.strip())

# Unique ingredient names, in order of first appearance
ingredients = pd.Series(ingredient_list).unique()

# Map each pizza name to its full ingredient list, with whitespace stripped
ingredient_by_pizza_dict = {
    name: [ingredient.strip() for ingredient in ingredient_string.split(',')]
    for name, ingredient_string in zip(pizza_types["name"], pizza_types["ingredients"])
}

In [401]:
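# Work on a copy so pizza_types itself is left unchanged
pizza_ingredient = pizza_types.copy()
pizza_ingredient[ingredients] = 0

# Split each ingredient string into names; iterating over the raw string
# would yield single characters, which never match a column.
for i, ingredient_string in enumerate(pizza_ingredient['ingredients']):
    for ingredient in ingredient_string.split(','):
        ingredient = ingredient.strip()
        if ingredient in pizza_ingredient.columns:
            pizza_ingredient.loc[i, ingredient] = 1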


Create a DataFrame with one row per pizza id and one column per ingredient, to be used for learning. \
If a pizza contains an ingredient, that column is set to 1; if it does not, the column is set to 0.
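
The 0/1 encoding described here can also be produced without an explicit loop. The following is a minimal sketch, assuming a comma-separated `ingredients` column like the one in `pizza_types`; the toy rows and the names `toy`, `clean` and `encoded` are made up for illustration. Unlike the loop in the cell above, it keeps every listed ingredient, including the first.

```python
import pandas as pd

# Hypothetical toy data standing in for pizza_types
toy = pd.DataFrame({
    "name": ["Pizza A", "Pizza B"],
    "ingredients": ["Chicken, Tomatoes", "Tomatoes, Garlic"],
})

# Remove the whitespace around commas so each token is a clean ingredient name
clean = toy["ingredients"].str.replace(r"\s*,\s*", ",", regex=True)

# One indicator column per ingredient: 1 if the pizza contains it, else 0
encoded = clean.str.get_dummies(sep=",")
encoded.insert(0, "name", toy["name"])

print(encoded)
```

The indicator columns come out in alphabetical order, so the column order may differ from the loop-based version.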

### Transform
Transform the data into a usable DataFrame, starting with the total quantity sold per pizza type.

In [404]:
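# Get the total number of pizzas sold per pizza_type_id
pizza_list = pizza_types[['pizza_type_id', 'name', 'category']].copy()

sold_pizzas = order_details[['pizza_id', 'quantity']].copy()
# Strip the size suffix (_s, _m, _l, _xl, _xxl) so pizza_id matches pizza_type_id
sold_pizzas['pizza_id'] = sold_pizzas['pizza_id'].str.replace(r"_(s|m|l|xl|xxl)$", "", regex=True)
# Sum the quantities per pizza type, then align on pizza_type_id instead of row position
quantity_by_type = sold_pizzas.groupby('pizza_id')['quantity'].sum()
pizza_list['quantity_sold'] = pizza_list['pizza_type_id'].map(quantity_by_type)

# Manually set the category for rows 10 and 28
pizza_list.at[10, 'category'] = 'Classic'
pizza_list.at[28, 'category'] = 'Veggie'

print(pizza_list)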

   pizza_type_id                                        name category  \
0        bbq_ckn                  The Barbecue Chicken Pizza  Chicken   
1       cali_ckn                The California Chicken Pizza  Chicken   
2    ckn_alfredo                   The Chicken Alfredo Pizza  Chicken   
3      ckn_pesto                     The Chicken Pesto Pizza  Chicken   
4     southw_ckn                 The Southwest Chicken Pizza  Chicken   
5       thai_ckn                      The Thai Chicken Pizza  Chicken   
6       big_meat                          The Big Meat Pizza  Classic   
7    classic_dlx                    The Classic Deluxe Pizza  Classic   
8       hawaiian                          The Hawaiian Pizza  Classic   
9    ital_cpcllo                 The Italian Capocollo Pizza  Classic   
10    napolitana                        The Napolitana Pizza  Classic   
11   pep_msh_pep  The Pepperoni, Mushroom, and Peppers Pizza  Classic   
12     pepperoni                         The Pepper

In [433]:
# Keep only the category label and the ingredient indicator columns
temp_df = pizza_ingredient.drop(["name", "pizza_type_id", "ingredients"], axis=1)
# Manually set the category for rows 10 and 28, as for pizza_list
temp_df.at[10, 'category'] = 'Classic'
temp_df.at[28, 'category'] = 'Veggie'

# Encode each category name as an integer label
temp_df.loc[temp_df['category'] == 'Chicken', 'category'] = 0
temp_df.loc[temp_df['category'] == 'Classic', 'category'] = 1
temp_df.loc[temp_df['category'] == 'Supreme', 'category'] = 2
temp_df.loc[temp_df['category'] == 'Veggie', 'category'] = 3
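
The same integer encoding can be expressed as a single dictionary lookup. This is a minimal sketch on a made-up `category` column; `toy` and `category_codes` are illustrative names, and the codes mirror the assignments in the cell above.

```python
import pandas as pd

# Hypothetical toy column standing in for temp_df["category"]
toy = pd.DataFrame({"category": ["Chicken", "Veggie", "Classic", "Supreme", "Chicken"]})

# Category name -> integer label
category_codes = {"Chicken": 0, "Classic": 1, "Supreme": 2, "Veggie": 3}

# map() looks each name up in the dictionary; a name missing from it would become NaN
toy["category_code"] = toy["category"].map(category_codes)

print(toy["category_code"].tolist())
```

When every name is found in the dictionary, the result is an integer column rather than an object column, which is the form scikit-learn classifiers generally expect for labels.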

In [438]:
print(temp_df.columns)

Index(['category', 'Red Peppers', 'Green Peppers', 'Tomatoes', 'Red Onions',
       'Barbecue Sauce', 'Artichoke', 'Spinach', 'Garlic', 'Jalapeno Peppers',
       'Fontina Cheese', 'Gouda Cheese', 'Mushrooms', 'Asiago Cheese',
       'Alfredo Sauce', 'Pesto Sauce', 'Corn', 'Cilantro', 'Chipotle Sauce',
       'Pineapple', 'Thai Sweet Chilli Sauce', 'Pepperoni', 'Italian Sausage',
       'Chorizo Sausage', 'Bacon', 'Mozzarella Cheese', 'Goat Cheese',
       'Oregano', 'Anchovies', 'Green Olives', 'Feta Cheese',
       'Beef Chuck Roast', 'Prosciutto', 'Caramelized Onions', 'Pears',
       'Thyme', 'Pancetta', 'Friggitello Peppers', 'Capocollo', 'Arugula',
       'Luganega Sausage', 'Onions', 'Artichokes', 'Peperoncini verdi',
       'Kalamata Olives', 'Provolone Cheese', 'Smoked Gouda Cheese',
       'Romano Cheese', 'Blue Cheese', 'Gorgonzola Piccante Cheese',
       'Parmigiano Reggiano Cheese', 'Zucchini', 'Sun-dried Tomatoes',
       'Plum Tomatoes'],
      dtype='object')


#### 4. Divide data into training/testing set

In [443]:
X = temp_df # Work through all the steps from the instructions in the markdown above
y = X.pop("category")

X_train, X_test, y_train, y_test = train_test_split(X,y)
y_test

6     1
14    2
8     1
27    3
20    2
0     0
29    3
11    1
Name: category, dtype: object
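
The split above is random on every run and drew category 1 three times but category 0 only once. A hedged sketch of a reproducible, stratified alternative on synthetic data follows; `X_toy`, `y_toy` and the seed `42` are arbitrary illustrative choices, not values from this project.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 32 samples, 4 balanced classes, 5 binary features
rng = np.random.default_rng(0)
X_toy = rng.integers(0, 2, size=(32, 5))
y_toy = np.repeat([0, 1, 2, 3], 8)

X_train, X_test, y_train, y_test = train_test_split(
    X_toy, y_toy,
    test_size=0.25,    # same fraction as the default
    stratify=y_toy,    # keep the class proportions equal in both parts
    random_state=42,   # arbitrary seed, makes the split repeatable
)

# Number of test samples per class
print(np.bincount(y_test))
```

With balanced classes, stratification gives each class the same number of test samples. For the real data, the equivalent call would pass `stratify=y`, assuming every category has at least two pizzas.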

#### 5. Create empty model
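
A possible starting point, assuming the `SVC` classifier imported at the top is the intended model. The linear kernel and `C=1.0` are illustrative choices, not settings taken from this project.

```python
from sklearn.svm import SVC

# An unfitted ("empty") support vector classifier
model = SVC(kernel="linear", C=1.0)

# Inspect the chosen hyperparameters before training
print(model.get_params()["kernel"])
```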

#### 6. Fit/train the model
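
A minimal sketch of the fitting step on a made-up 0/1 ingredient matrix; `X_toy`, `y_toy` and the three ingredient columns are hypothetical stand-ins for `X_train` and `y_train`. The labels are cast to `int` because the `category` column has object dtype after the `.loc` assignments, and scikit-learn classifiers generally expect numeric or string labels.

```python
import pandas as pd
from sklearn.svm import SVC

# Hypothetical ingredient indicators (columns) for six pizzas (rows)
X_toy = pd.DataFrame({
    "Tomatoes":  [1, 1, 0, 0, 1, 0],
    "Pepperoni": [1, 1, 1, 0, 0, 0],
    "Spinach":   [0, 0, 0, 1, 1, 1],
})
# Integer labels stored with object dtype, like the category column
y_toy = pd.Series([1, 1, 1, 3, 3, 3], dtype=object)

model = SVC(kernel="linear")
# Cast the labels to int before fitting
model.fit(X_toy, y_toy.astype(int))

# Classes the fitted model has learned
print(model.classes_)
```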

#### 7. Evaluate the model
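
One way to evaluate a fitted classifier is to compare its predictions on the held-out rows with the true labels. The sketch below uses synthetic, perfectly separable data, so its score says nothing about the pizza data; all names in it are illustrative. With only eight test rows, as in the split above, a single accuracy figure is noisy, so a confusion matrix is shown as well.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in: class 0 uses feature 0, class 1 uses feature 1, feature 2 is shared
y_toy = np.repeat([0, 1], 10)
X_toy = np.column_stack([1 - y_toy, y_toy, np.ones(20, dtype=int)])

X_train, X_test, y_train, y_test = train_test_split(
    X_toy, y_toy, stratify=y_toy, random_state=0
)

model = SVC(kernel="linear").fit(X_train, y_train)
y_pred = model.predict(X_test)

# Fraction of correct predictions on the held-out rows
accuracy = accuracy_score(y_test, y_pred)
# Rows are true classes, columns are predicted classes
matrix = confusion_matrix(y_test, y_pred)

print(accuracy)
print(matrix)
```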

In [None]:
def transform_data(ingredient_list):
    for item in ingredient_list:
        pass  # TODO: not yet implemented

Tables I think we want for classification on the `category` label:
1. pizza_category: labels for our ids, predicted by classification
2. pizza_ingredients_categories (name?): columns are pizza id, pizza category, and features (one column for each ingredient, maybe more to predict on)
3. pizza id by count (sold_pizza_id)
   1. Helen 2. Andreas 3. Marcus
