# ML Mini-Project

1. Import modules
2. Load the data
3. Divide data into training/testing set
4. Create empty model
5. Fit/train the model
6. Evaluate the model
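The six steps above map onto a typical scikit-learn workflow. The sketch below runs them on a small hypothetical dataset. The ingredient columns, the `category` labels and the rows are invented for illustration and are not the pizza data. `SVC` is used because it is already imported in this notebook, but any classifier would fit the same steps.

```python
# 1. Import modules
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# 2. Load the data (hypothetical toy table: 1 = ingredient present, 0 = absent)
data = pd.DataFrame({
    "Chicken":   [1, 1, 1, 0, 0, 0, 0, 0],
    "Pepperoni": [0, 0, 0, 1, 1, 0, 0, 0],
    "Spinach":   [0, 0, 0, 0, 0, 1, 1, 1],
    "category":  ["Chicken", "Chicken", "Chicken", "Classic", "Classic",
                  "Veggie", "Veggie", "Veggie"],
})
X = data.drop(columns="category")
y = data["category"]

# 3. Divide data into training/testing sets (25% of rows held out)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# 4. Create an empty (unfitted) model
model = SVC(kernel="linear")

# 5. Fit/train the model on the training rows only
model.fit(X_train, y_train)

# 6. Evaluate the model on the held-out rows
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Test accuracy: {accuracy:.2f}")
```

With only eight rows the accuracy figure is not meaningful. The point is the order of the calls.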

### Problem formulation
We want to predict a pizza's type from the ingredients it contains. \
We may also want to predict a pizza's price from its ingredients and possibly other features.
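For the price question, one possible framing is ordinary least-squares regression, with the pizza size encoded as indicator columns next to the ingredient flags. The sketch uses `LinearRegression` (already imported in this notebook) on a hypothetical toy table. Treating size as the "other" feature is an assumption, and so are the column names `is_M`, `is_L` and `Chicken`. The toy prices are built to be exactly linear, so the fitted coefficients are easy to read back.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical toy data; prices are built as 12 + 4*is_M + 8*is_L + 0.75*Chicken
toy = pd.DataFrame({
    "size":    ["S", "M", "L", "S", "M", "L"],
    "Chicken": [0, 0, 0, 1, 1, 1],
    "price":   [12.00, 16.00, 20.00, 12.75, 16.75, 20.75],
})

# Indicator columns for size, with S as the baseline, plus the ingredient flag
X = pd.DataFrame({
    "is_M": (toy["size"] == "M").astype(int),
    "is_L": (toy["size"] == "L").astype(int),
    "Chicken": toy["Chicken"],
})
y = toy["price"]

reg = LinearRegression().fit(X, y)

# Predict a new medium pizza with chicken (columns in the same order as X)
new_pizza = pd.DataFrame({"is_M": [1], "is_L": [0], "Chicken": [1]})
predicted = reg.predict(new_pizza)[0]
print(reg.intercept_, dict(zip(X.columns, reg.coef_)), predicted)
```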

```python
# Importing necessary libraries
import os

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

from sklearn.cluster import KMeans
from sklearn.inspection import DecisionBoundaryDisplay
from sklearn.linear_model import LinearRegression
from sklearn.svm import SVC
```

### Load Files

```python
# CSV files live in the "Data files" folder under the current working directory
DATA_PATH = f'{os.path.abspath("")}/Data files'

# Load every table used in the project
data_dictionary = pd.read_csv(f'{DATA_PATH}/data_dictionary.csv')
order_details = pd.read_csv(f'{DATA_PATH}/pizza_sales/order_details.csv')
orders = pd.read_csv(f'{DATA_PATH}/pizza_sales/orders.csv')
pizza_types = pd.read_csv(f'{DATA_PATH}/pizza_sales/pizza_types.csv')
pizzas = pd.read_csv(f'{DATA_PATH}/pizza_sales/pizzas.csv')
# print(pizza_types)
```





### Transform DataFrames

Build a list of every unique ingredient. These become the feature columns for our classification.

```python
ingredient_list = []
for ingredient_str in pizza_types["ingredients"]:
    # Split the comma-separated string into individual ingredients.
    # Note: [1:] skips the first listed ingredient of every pizza.
    for ingredient in ingredient_str.split(",")[1:]:
        ingredient_list.append(ingredient.strip())

# Keep each ingredient once, in order of first appearance (returns a NumPy array)
ingredients = pd.DataFrame(ingredient_list, columns=["ingredient"])["ingredient"].unique()

print(ingredients)
```

```
['Red Peppers' 'Green Peppers' 'Tomatoes' 'Red Onions' 'Barbecue Sauce'
 'Artichoke' 'Spinach' 'Garlic' 'Jalapeno Peppers' 'Fontina Cheese'
 'Gouda Cheese' 'Mushrooms' 'Asiago Cheese' 'Alfredo Sauce' 'Pesto Sauce'
 'Corn' 'Cilantro' 'Chipotle Sauce' 'Pineapple' 'Thai Sweet Chilli Sauce'
 'Pepperoni' 'Italian Sausage' 'Chorizo Sausage' 'Bacon'
 'Mozzarella Cheese' 'Goat Cheese' 'Oregano' 'Anchovies' 'Green Olives'
 'Feta Cheese' 'Beef Chuck Roast' 'Prosciutto' 'Caramelized Onions'
 'Pears' 'Thyme' 'Pancetta' 'Friggitello Peppers' 'Capocollo' 'Arugula'
 'Luganega Sausage' 'Onions' 'Artichokes' 'Peperoncini verdi'
 'Kalamata Olives' 'Provolone Cheese' 'Smoked Gouda Cheese'
 'Romano Cheese' 'Blue Cheese' 'Gorgonzola Piccante Cheese'
 'Parmigiano Reggiano Cheese' 'Zucchini' 'Sun-dried Tomatoes'
 'Plum Tomatoes']
```
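The nested loop in the cell above can also be written with pandas string methods. The approach splits each comma-separated `ingredients` string, uses `explode` to produce one ingredient per row, strips the whitespace, and then calls `unique()`. This sketch runs on a two-row hypothetical stand-in for `pizza_types`. Unlike that loop, it keeps the first listed ingredient of each pizza.

```python
import pandas as pd

# Hypothetical stand-in for pizza_types (the real table comes from pizza_types.csv)
pizza_types = pd.DataFrame({
    "pizza_type_id": ["bbq_ckn", "veggie_veg"],
    "ingredients": [
        "Chicken, Red Peppers, Tomatoes",
        "Mushrooms, Tomatoes, Red Peppers",
    ],
})

# One ingredient per row, with surrounding whitespace removed
ingredient_series = (
    pizza_types["ingredients"]
    .str.split(",")
    .explode()
    .str.strip()
)

# Unique ingredients in order of first appearance (a NumPy array)
ingredients = ingredient_series.unique()
print(ingredients)
```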


| (index) | pizza_id | pizza_type_id | size | price |
|---|---|---|---|---|
| 0 | bbq_ckn_s | bbq_ckn | S | 12.75 |
| 1 | bbq_ckn_m | bbq_ckn | M | 16.75 |
| 2 | bbq_ckn_l | bbq_ckn | L | 20.75 |
| 3 | cali_ckn_s | cali_ckn | S | 12.75 |
| 4 | cali_ckn_m | cali_ckn | M | 16.75 |
| ... | ... | ... | ... | ... |
| 91 | spinach_fet_m | spinach_fet | M | 16.00 |
| 92 | spinach_fet_l | spinach_fet | L | 20.25 |
| 93 | veggie_veg_s | veggie_veg | S | 12.00 |
| 94 | veggie_veg_m | veggie_veg | M | 16.00 |


Create a DataFrame with one row per pizza ID and one column per ingredient, to be used for learning. \
If a pizza contains an ingredient, that cell is 1. If it does not, the cell is 0.
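One way to build this 1/0 table is to explode the ingredient lists into one row per (pizza, ingredient) pair and then cross-tabulate. The sketch below applies `pd.crosstab` to a hypothetical three-row stand-in for `pizza_types`. The column names `pizza_type_id` and `ingredients` match the notebook, but the rows are invented. `clip(upper=1)` keeps every cell at 0 or 1 even if an ingredient were listed twice for the same pizza.

```python
import pandas as pd

# Hypothetical stand-in for pizza_types
pizza_types = pd.DataFrame({
    "pizza_type_id": ["bbq_ckn", "hawaiian", "veggie_veg"],
    "ingredients": [
        "Chicken, Red Peppers, Barbecue Sauce",
        "Ham, Pineapple, Mozzarella Cheese",
        "Mushrooms, Red Peppers, Mozzarella Cheese",
    ],
})

# One row per (pizza, ingredient) pair, whitespace stripped
pairs = pizza_types.assign(
    ingredient=pizza_types["ingredients"].str.split(",")
).explode("ingredient", ignore_index=True)
pairs["ingredient"] = pairs["ingredient"].str.strip()

# Rows = pizza_type_id, columns = ingredients, cells = 1 if present else 0
features = pd.crosstab(pairs["pizza_type_id"], pairs["ingredient"]).clip(upper=1)
print(features)
```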

### Transform
Transform the data into a DataFrame suitable for training.

```python
# print(order_details)
# print(pizza_types.columns)

# One row per pizza type: its ID and display name
pizza_list = pizza_types[["pizza_type_id", "name"]].copy()

# TODO: add a 'sold_count' column (number of times each pizza was sold)
# Planned columns: pizza_id, name, ingredients
# sold_pizzas = order_details.groupby("pizza_id").size()
# pizza_id + count_sold_pizza
# Normalize count_sold_pizza

print(pizza_list)
# print(sold_pizzas)
```

```
   pizza_type_id                                        name
0        bbq_ckn                  The Barbecue Chicken Pizza
1       cali_ckn                The California Chicken Pizza
2    ckn_alfredo                   The Chicken Alfredo Pizza
3      ckn_pesto                     The Chicken Pesto Pizza
4     southw_ckn                 The Southwest Chicken Pizza
5       thai_ckn                      The Thai Chicken Pizza
6       big_meat                          The Big Meat Pizza
7    classic_dlx                    The Classic Deluxe Pizza
8       hawaiian                          The Hawaiian Pizza
9    ital_cpcllo                 The Italian Capocollo Pizza
10    napolitana                        The Napolitana Pizza
11   pep_msh_pep  The Pepperoni, Mushroom, and Peppers Pizza
12     pepperoni                         The Pepperoni Pizza
13     the_greek                             The Greek Pizza
14    brie_carre                        The Brie Carre Pizza
15     calabrese
```
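The commented plan in the cell above is to count sold pizzas per ID and then normalize. It could look like the sketch below. `order_details` is keyed by size-specific `pizza_id` values (e.g. `bbq_ckn_s`), while `pizza_list` is keyed by `pizza_type_id`. For that reason the orders are first mapped through the `pizzas` table. All three frames are hypothetical stand-ins. Counting rows follows the `count` idea in the comment. If `order_details` also records a quantity per line, summing it would count pizzas rather than order lines. Min-max scaling to [0, 1] is one assumed choice of normalization.

```python
import pandas as pd

# Hypothetical stand-ins for the notebook's tables
pizzas = pd.DataFrame({
    "pizza_id": ["bbq_ckn_s", "bbq_ckn_m", "hawaiian_l", "pepperoni_s"],
    "pizza_type_id": ["bbq_ckn", "bbq_ckn", "hawaiian", "pepperoni"],
})
order_details = pd.DataFrame({
    "pizza_id": ["bbq_ckn_s", "bbq_ckn_m", "bbq_ckn_s", "hawaiian_l"],
})
pizza_list = pd.DataFrame({
    "pizza_type_id": ["bbq_ckn", "hawaiian", "pepperoni"],
    "name": ["The Barbecue Chicken Pizza", "The Hawaiian Pizza", "The Pepperoni Pizza"],
})

# Map each ordered size-specific pizza_id to its pizza_type_id, then count order lines
sold = (
    order_details.merge(pizzas, on="pizza_id", how="left")
    .groupby("pizza_type_id")
    .size()
)

# Pizza types that never appear in the orders get 0
pizza_list["sold_count"] = pizza_list["pizza_type_id"].map(sold).fillna(0).astype(int)

# Min-max normalize to [0, 1]; fall back to 0.0 if all counts are equal
counts = pizza_list["sold_count"]
span = counts.max() - counts.min()
pizza_list["sold_norm"] = (counts - counts.min()) / span if span else 0.0
print(pizza_list)
```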

```python
# pizza_type_id, pizza_category_id
```

Tables I think we want for classification on the category label:
1. pizza_category: the category labels (IDs) that the classifier predicts
2. pizza_ingredients_categories (name TBD): columns pizza_id, pizza category, and features (one column per ingredient, possibly more to predict on)
3. Pizza ID by sold count, i.e. count(sold_pizza_id)
   - 1: Helen, 2: Andreas, 3: Marcus
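Tables 1 and 2 could be built roughly as follows. `pd.factorize` assigns an integer ID to each category name (table 1). Joining those IDs onto a 1/0 ingredient table keyed by `pizza_type_id` then gives the training table (table 2). The `category` column, the ingredient columns and all rows here are hypothetical stand-ins. The real labels would come from the pizza data.

```python
import pandas as pd

# Hypothetical inputs: one category per pizza type, plus a 1/0 ingredient table
pizza_types = pd.DataFrame({
    "pizza_type_id": ["bbq_ckn", "pepperoni", "veggie_veg"],
    "category": ["Chicken", "Classic", "Veggie"],
})
features = pd.DataFrame(
    {"Barbecue Sauce": [1, 0, 0], "Pepperoni": [0, 1, 0], "Mushrooms": [0, 0, 1]},
    index=pd.Index(["bbq_ckn", "pepperoni", "veggie_veg"], name="pizza_type_id"),
)

# Table 1: integer label ID for each category name
codes, names = pd.factorize(pizza_types["category"])
pizza_category = pd.DataFrame({"category_id": range(len(names)), "category": names})

# Table 2: pizza id, category label ID, and one feature column per ingredient
pizza_ingredients_categories = (
    pizza_types.assign(category_id=codes)[["pizza_type_id", "category_id"]]
    .join(features, on="pizza_type_id")
)
print(pizza_category)
print(pizza_ingredients_categories)
```

Keeping the label table separate means the classifier only ever sees integer IDs. `pizza_category` translates predictions back into names.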
