# ML Mini-Project

1. Import modules
2. Load the data
3. Divide data into training/testing set
4. Create empty model
5. Fit/train the model
6. Evaluate the model

### Problem formulation: 
We want to be able to predict pizza type based on the ingredients it contains \
We may also want to be able to predict a pizza price based on ingredients and possibly some other metric

In [135]:
# Importing necessary libraries
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import os
import re

from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression
from sklearn.svm import SVC
from sklearn.inspection import DecisionBoundaryDisplay#

### Load Files

In [88]:
DATA_PATH = f'{os.path.abspath("")}/Data files'

data_dictionary = pd.read_csv(f'{DATA_PATH}/data_dictionary.csv')
order_details = pd.read_csv(f'{DATA_PATH}/pizza_sales/order_details.csv')
orders = pd.read_csv(f'{DATA_PATH}/pizza_sales/orders.csv')
pizza_types = pd.read_csv(f'{DATA_PATH}/pizza_sales/pizza_types.csv')
pizzas = pd.read_csv(f'{DATA_PATH}/pizza_sales/pizzas.csv')
# print(pizza_types)



[' Red Peppers' ' Green Peppers' ' Tomatoes' ' Red Onions'
 ' Barbecue Sauce' ' Artichoke' ' Spinach' ' Garlic' ' Jalapeno Peppers'
 ' Fontina Cheese' ' Gouda Cheese' ' Mushrooms' ' Asiago Cheese'
 ' Alfredo Sauce' ' Pesto Sauce' ' Corn' ' Cilantro' ' Chipotle Sauce'
 ' Pineapple' ' Thai Sweet Chilli Sauce' ' Pepperoni' ' Italian Sausage'
 ' Chorizo Sausage' ' Bacon' ' Mozzarella Cheese' ' Goat Cheese'
 ' Oregano' ' Anchovies' ' Green Olives' ' Feta Cheese'
 ' Beef Chuck Roast' ' Prosciutto' ' Caramelized Onions' ' Pears' ' Thyme'
 ' Pancetta' ' Friggitello Peppers' ' Capocollo' ' Arugula'
 ' Luganega Sausage' ' Onions' ' Artichokes' ' Peperoncini verdi'
 ' Kalamata Olives' ' Provolone Cheese' ' Smoked Gouda Cheese'
 ' Romano Cheese' ' Blue Cheese' ' Gorgonzola Piccante Cheese'
 ' Parmigiano Reggiano Cheese' ' Zucchini' ' Sun-dried Tomatoes'
 ' Plum Tomatoes']


### Transform DataFrames

Create a dataframe of each unique ingredient to be used for our classifications.

In [141]:
ingredient_list = []
for i in range(len(pizza_types["ingredients"])):
    for n in range(len(pizza_types["ingredients"][i].split(','))):
        if n == 0: continue
        ingredient_list.append(pizza_types["ingredients"][i].split(',')[n].strip())

ingredients = pd.DataFrame(ingredient_list, columns=['ingredient'])["ingredient"].unique()


ingredient_by_pizza_dict = {}
for i in range(len(pizza_types["name"])):
    ingredient_by_pizza_dict[pizza_types["name"][i]] = pizza_types["ingredients"][i].split(',')
ingdf = pd.DataFrame(ingredient_by_pizza_dict)

{'The Barbecue Chicken Pizza': ['Barbecued Chicken',
  ' Red Peppers',
  ' Green Peppers',
  ' Tomatoes',
  ' Red Onions',
  ' Barbecue Sauce'],
 'The California Chicken Pizza': ['Chicken',
  ' Artichoke',
  ' Spinach',
  ' Garlic',
  ' Jalapeno Peppers',
  ' Fontina Cheese',
  ' Gouda Cheese'],
 'The Chicken Alfredo Pizza': ['Chicken',
  ' Red Onions',
  ' Red Peppers',
  ' Mushrooms',
  ' Asiago Cheese',
  ' Alfredo Sauce'],
 'The Chicken Pesto Pizza': ['Chicken',
  ' Tomatoes',
  ' Red Peppers',
  ' Spinach',
  ' Garlic',
  ' Pesto Sauce'],
 'The Southwest Chicken Pizza': ['Chicken',
  ' Tomatoes',
  ' Red Peppers',
  ' Red Onions',
  ' Jalapeno Peppers',
  ' Corn',
  ' Cilantro',
  ' Chipotle Sauce'],
 'The Thai Chicken Pizza': ['Chicken',
  ' Pineapple',
  ' Tomatoes',
  ' Red Peppers',
  ' Thai Sweet Chilli Sauce'],
 'The Big Meat Pizza': ['Bacon',
  ' Pepperoni',
  ' Italian Sausage',
  ' Chorizo Sausage'],
 'The Classic Deluxe Pizza': ['Pepperoni',
  ' Mushrooms',
  ' Red Onion

In [149]:
pizza_types.head(5)

Unnamed: 0,pizza_type_id,name,category,ingredients
0,bbq_ckn,The Barbecue Chicken Pizza,Chicken,"Barbecued Chicken, Red Peppers, Green Peppers,..."
1,cali_ckn,The California Chicken Pizza,Chicken,"Chicken, Artichoke, Spinach, Garlic, Jalapeno ..."
2,ckn_alfredo,The Chicken Alfredo Pizza,Chicken,"Chicken, Red Onions, Red Peppers, Mushrooms, A..."
3,ckn_pesto,The Chicken Pesto Pizza,Chicken,"Chicken, Tomatoes, Red Peppers, Spinach, Garli..."
4,southw_ckn,The Southwest Chicken Pizza,Chicken,"Chicken, Tomatoes, Red Peppers, Red Onions, Ja..."


In [310]:
pizza_ingredient = pizza_types
pizza_ingredient[ingredients] = 0

for i, ingredients in enumerate(pizza_ingredient['ingredients']):
    for ingredient in ingredients:
        if ingredient in pizza_ingredient.columns:
            pizza_ingredient.loc[i, ingredient] = 1


Unnamed: 0,Mushrooms,Tomatoes,Red Peppers,Green Peppers,Red Onions,Zucchini,Spinach,Garlic
0,0,1,1,1,1,0,0,0
1,0,0,0,0,0,0,1,1
2,0,0,1,0,1,0,0,0
3,0,1,1,0,0,0,1,1
4,0,1,1,0,1,0,0,0
5,0,1,1,0,0,0,0,0
6,0,0,0,0,0,0,0,0
7,0,0,1,0,1,0,0,0
8,0,0,0,0,0,0,0,0
9,0,1,1,0,0,0,0,1


Create a DataFrame of pizza id's and all ingredients as columns to be used for learning. \
If a pizza contains one ingredient, we give it the number 1. if it does not contain an ingredient we assign it a 0.

In [295]:
pizza_ingredient

Unnamed: 0,pizza_type_id,name,category,ingredients,Red Peppers,Green Peppers,Tomatoes,Red Onions,Barbecue Sauce,Artichoke,...,Kalamata Olives,Provolone Cheese,Smoked Gouda Cheese,Romano Cheese,Blue Cheese,Gorgonzola Piccante Cheese,Parmigiano Reggiano Cheese,Zucchini,Sun-dried Tomatoes,Plum Tomatoes
0,bbq_ckn,The Barbecue Chicken Pizza,Chicken,"[Barbecued Chicken, Red Peppers, Green Peppe...",0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,cali_ckn,The California Chicken Pizza,Chicken,"[Chicken, Artichoke, Spinach, Garlic, Jala...",0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,ckn_alfredo,The Chicken Alfredo Pizza,Chicken,"[Chicken, Red Onions, Red Peppers, Mushroom...",0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,ckn_pesto,The Chicken Pesto Pizza,Chicken,"[Chicken, Tomatoes, Red Peppers, Spinach, ...",0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,southw_ckn,The Southwest Chicken Pizza,Chicken,"[Chicken, Tomatoes, Red Peppers, Red Onions...",0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
5,thai_ckn,The Thai Chicken Pizza,Chicken,"[Chicken, Pineapple, Tomatoes, Red Peppers,...",0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
6,big_meat,The Big Meat Pizza,Classic,"[Bacon, Pepperoni, Italian Sausage, Chorizo...",0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
7,classic_dlx,The Classic Deluxe Pizza,Classic,"[Pepperoni, Mushrooms, Red Onions, Red Pepp...",0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
8,hawaiian,The Hawaiian Pizza,Classic,"[Sliced Ham, Pineapple, Mozzarella Cheese]",0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
9,ital_cpcllo,The Italian Capocollo Pizza,Classic,"[Capocollo, Red Peppers, Tomatoes, Goat Che...",0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


### Transform
Transform the Data into a viable dataframe.

In [299]:
# Get total count of sold pizza per pizza_type_id
pizza_list = pizza_types[['pizza_type_id', 'name', 'category']].copy()

sold_pizzas = order_details[['pizza_id', 'quantity']].copy()
sold_pizzas['pizza_id'] = sold_pizzas['pizza_id'].str.replace(r"(_s$)|(_m$)|(_l$)|(_xl$)|(_xxl$)|", "", regex=True)
pizza_list['quantity_sold'] = sold_pizzas.groupby(['pizza_id'])['quantity'].transform('count')

pizza_list.at[10, 'category'] = 'Classic'
pizza_list.at[28, 'category'] = 'Veggie'

print(pizza_list)

   pizza_type_id                                        name category  \
0        bbq_ckn                  The Barbecue Chicken Pizza  Chicken   
1       cali_ckn                The California Chicken Pizza  Chicken   
2    ckn_alfredo                   The Chicken Alfredo Pizza  Chicken   
3      ckn_pesto                     The Chicken Pesto Pizza  Chicken   
4     southw_ckn                 The Southwest Chicken Pizza  Chicken   
5       thai_ckn                      The Thai Chicken Pizza  Chicken   
6       big_meat                          The Big Meat Pizza  Classic   
7    classic_dlx                    The Classic Deluxe Pizza  Classic   
8       hawaiian                          The Hawaiian Pizza  Classic   
9    ital_cpcllo                 The Italian Capocollo Pizza  Classic   
10    napolitana                        The Napolitana Pizza  Classic   
11   pep_msh_pep  The Pepperoni, Mushroom, and Peppers Pizza  Classic   
12     pepperoni                         The Pepper

In [None]:
# pizza_type_id, pizza_category_id

In [320]:
cat = pd.Series(pizza_types['category'].unique()).map(lambda x: "Other" if x == 1 else x)
cat = pd.DataFrame(cat, columns=['category'])
print(cat)

  category
0  Chicken
1  Classic
2    Other
3  Supreme
4   Veggie


In [None]:
temp_df = pizza_ingredient.drop(["name","pizza_type_id", "ingredients"], axis=0)


def x(y):
    return y
temp_df["category"] = temp_df["category"].map(lambda x: )
class_y = pizza_ingredient.pop('category')

Tables I think we want for classification on label category: \
1. pizza_category - labels for our id's predicted by classification \
2. pizza_ingredients_categories(name?) - columns : pizza-id, pizza category, *features(one column for each ingredient, maybe more to predict on) \
3. pizza id by count(sold_pizza_id)
   1. helen 2. Andreas 3. Marcus
