# ML Mini-Project

1. Import modules
2. Load the data
3. Transform Data
4. Divide data into training/testing set
5. Create empty model
6. Fit/train the model
7. Evaluate the model

### Problem formulation: 
We want to be able to predict pizza type based on the ingredients it contains \
We may also want to be able to predict a pizza price based on ingredients and possibly some other metric

In [1]:
# Importing necessary libraries
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import os

from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression
from sklearn.svm import SVC
from sklearn.inspection import DecisionBoundaryDisplay
from sklearn.model_selection import train_test_split

### Load Files

In [2]:
# Create path
DATA_PATH = f'{os.path.abspath("")}/Data files'

# Load data from files
data_dictionary = pd.read_csv(f'{DATA_PATH}/data_dictionary.csv')
order_details = pd.read_csv(f'{DATA_PATH}/pizza_sales/order_details.csv')
orders = pd.read_csv(f'{DATA_PATH}/pizza_sales/orders.csv')
pizza_types = pd.read_csv(f'{DATA_PATH}/pizza_sales/pizza_types.csv')
pizzas = pd.read_csv(f'{DATA_PATH}/pizza_sales/pizzas.csv')

### Transform DataFrames

Create a dataframe of each unique ingredient to be used for our classifications.

In [3]:
ingredient_list = []
for i in range(len(pizza_types["ingredients"])):
    for n in range(len(pizza_types["ingredients"][i].split(','))):
        if n == 0: continue
        ingredient_list.append(pizza_types["ingredients"][i].split(',')[n].strip())

ingredients = pd.DataFrame(ingredient_list, columns=['ingredient'])["ingredient"].unique()

In [4]:
pizza_ingredient = pizza_types
pizza_ingredient[ingredients] = 0

for i, ingredients in enumerate(pizza_ingredient['ingredients']):
    for ingredient in ingredients:
        if ingredient in pizza_ingredient.columns:
            pizza_ingredient.loc[i, ingredient] = 1


Create a DataFrame of pizza id's and all ingredients as columns to be used for learning. \
If a pizza contains one ingredient, we give it the number 1. if it does not contain an ingredient we assign it a 0.

### Transform
Transform the Data into a viable dataframe.

In [5]:
# Get total count of sold pizza per pizza_type_id, name, and category

# Copying wanted columns into a new DataFrame
pizza_sold_df = pizza_types[['pizza_type_id', 'name', 'category']].copy()

# Create a temporary DataFrame in order to remove sizes from pizza_id,
# and to count number of sold pizzas grouped by pizza_id.
# Adding counted solz pizzas to pizza_sold_df
sold_pizzas = order_details[['pizza_id', 'quantity']].copy()
sold_pizzas['pizza_id'] = sold_pizzas['pizza_id'].str.replace(r"(_s$)|(_m$)|(_l$)|(_xl$)|(_xxl$)|", "", regex=True)
pizza_sold_df['quantity_sold'] = sold_pizzas.groupby(['pizza_id'])['quantity'].transform('count')

# Checking DataFrame and controlling contents
pizza_sold_df

Unnamed: 0,pizza_type_id,name,category,quantity_sold
0,bbq_ckn,The Barbecue Chicken Pizza,Chicken,2370
1,cali_ckn,The California Chicken Pizza,Chicken,2416
2,ckn_alfredo,The Chicken Alfredo Pizza,Chicken,1359
3,ckn_pesto,The Chicken Pesto Pizza,Chicken,1849
4,southw_ckn,The Southwest Chicken Pizza,Chicken,1456
5,thai_ckn,The Thai Chicken Pizza,Chicken,2315
6,big_meat,The Big Meat Pizza,Classic,1849
7,classic_dlx,The Classic Deluxe Pizza,Classic,1428
8,hawaiian,The Hawaiian Pizza,Classic,1849
9,ital_cpcllo,The Italian Capocollo Pizza,Classic,1849


In [22]:
# Creating a DataFrame holding the categories and their ingredients per pizza
cat_ingred_df = pizza_ingredient.drop(["pizza_type_id", "ingredients"], axis=1)


# TODO Check this
# Alternative soloution 1
# cat = pd.Series(cat_ingred_df['category'].unique()).map(lambda x: "Other" if x == 1 else x)
# cat = pd.DataFrame(cat_ingred_df, columns=['category'])

# catdict = {cat["category"].values[i]:range(0,len(cat))[i] for i in range(len(cat["category"].values))}
# cat_ingred_df["category"] = cat_ingred_df["category"].map(lambda x: catdict[x])

# i = 0
# for key in catdict.keys():
#     catdict[key] = i
#     i +=1


# Alternative soloution 2

# cat_ingred_df.loc[cat_ingred_df["category"] == 'Chicken', 'category'] = 0
# cat_ingred_df.loc[cat_ingred_df["category"] == 'Classic', 'category'] = 1
# cat_ingred_df.loc[cat_ingred_df['category'] == 'Supreme', 'category'] = 2
# cat_ingred_df.loc[cat_ingred_df['category'] == 'Veggie', 'category'] = 3

# Alternative solution 3
cat_list = list()
for category in cat_ingred_df['category']:
    if category not in cat_list:
        cat_list.append(category)

for category in cat_list:
    cat_ingred_df=cat_ingred_df.replace(category, cat_list.index(category))

# Checking DataFrame and controlling contents
cat_ingred_df

['Chicken', 'Classic', 'Supreme', 'Veggie']


Unnamed: 0,name,category,Red Peppers,Green Peppers,Tomatoes,Red Onions,Barbecue Sauce,Artichoke,Spinach,Garlic,...,Kalamata Olives,Provolone Cheese,Smoked Gouda Cheese,Romano Cheese,Blue Cheese,Gorgonzola Piccante Cheese,Parmigiano Reggiano Cheese,Zucchini,Sun-dried Tomatoes,Plum Tomatoes
0,The Barbecue Chicken Pizza,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,The California Chicken Pizza,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,The Chicken Alfredo Pizza,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,The Chicken Pesto Pizza,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,The Southwest Chicken Pizza,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
5,The Thai Chicken Pizza,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
6,The Big Meat Pizza,1,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
7,The Classic Deluxe Pizza,1,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
8,The Hawaiian Pizza,1,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
9,The Italian Capocollo Pizza,1,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [25]:
print(cat_ingred_df.columns)

Index(['name', 'category', 'Red Peppers', 'Green Peppers', 'Tomatoes',
       'Red Onions', 'Barbecue Sauce', 'Artichoke', 'Spinach', 'Garlic',
       'Jalapeno Peppers', 'Fontina Cheese', 'Gouda Cheese', 'Mushrooms',
       'Asiago Cheese', 'Alfredo Sauce', 'Pesto Sauce', 'Corn', 'Cilantro',
       'Chipotle Sauce', 'Pineapple', 'Thai Sweet Chilli Sauce', 'Pepperoni',
       'Italian Sausage', 'Chorizo Sausage', 'Bacon', 'Mozzarella Cheese',
       'Goat Cheese', 'Oregano', 'Anchovies', 'Green Olives', 'Feta Cheese',
       'Beef Chuck Roast', 'Prosciutto', 'Caramelized Onions', 'Pears',
       'Thyme', 'Pancetta', 'Friggitello Peppers', 'Capocollo', 'Arugula',
       'Luganega Sausage', 'Onions', 'Artichokes', 'Peperoncini verdi',
       'Kalamata Olives', 'Provolone Cheese', 'Smoked Gouda Cheese',
       'Romano Cheese', 'Blue Cheese', 'Gorgonzola Piccante Cheese',
       'Parmigiano Reggiano Cheese', 'Zucchini', 'Sun-dried Tomatoes',
       'Plum Tomatoes'],
      dtype='object')


#### 4. Divide data into training/testing set

In [31]:
X = cat_ingred_df # Skriver in alla rader ifrån instruktionen tidigare i markdown
y = X[X.columns.difference(['name', 'category'])]
# TODO Kolla in här
#y = X.pop("name", "category") # Marcus lade till name /Drygt

X_train, X_test, y_train, y_test = train_test_split(X,y)
y_test

Unnamed: 0,Alfredo Sauce,Anchovies,Artichoke,Artichokes,Arugula,Asiago Cheese,Bacon,Barbecue Sauce,Beef Chuck Roast,Blue Cheese,...,Red Onions,Red Peppers,Romano Cheese,Smoked Gouda Cheese,Spinach,Sun-dried Tomatoes,Thai Sweet Chilli Sauce,Thyme,Tomatoes,Zucchini
8,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
18,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
27,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
6,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


#### 5. Create empty model

#### 6. Fit/train the model

Tables I think we want for classification on label category: \
1. pizza_category - labels for our id's predicted by classification \
2. pizza_ingredients_categories(name?) - columns : pizza-id, pizza category, *features(one column for each ingredient, maybe more to predict on) \
3. pizza id by count(sold_pizza_id)
   1. helen 2. Andreas 3. Marcus


#### 7. Evaluate the model

In [32]:
def transform_data(ingredients):
    df = ingredient.copy()
    print(df)
transform_data('')

AttributeError: 'str' object has no attribute 'copy'