# What's Cooking?
## W207 Final Project
## James Beck, Samir Datta, Chris Hipple  


![Kaggle link](https://www.kaggle.com/c/whats-cooking)

# Introduction


The goal of this competition is to correctly classfiy the cuisine of a recipe given its ingredients. There are 20 different cuisine types from around the world to classify.  It is hosted by the company Yummly, and all of the data provided is by them.  Yummly is a recipe aggregator website which delivers personalized recipe recommendations and searching.  The ability to more accurately classify recipes by their cuisine type would improve their product offering.


# The Data

## Description
All data is provided by Yummly, and per the rules of the competition, no extraneous data may be brought in for the model.

The data provided is a list of recipe ingredients with a label.  We do not have any metadata information such as the name of the recipe, where in the world it was submitted from, and all of the ingredients have been translated are in english. 

## Preperation

The first task was to read the json list into python and seperate into train and dev splits.  We explored several options for preprocessors and vectorizer options to further prepare the data before modelling.  Some basic cleaning, such as lowercasing the ingredients was included in the preprocessor.  Below we take an initial look at the data, first fitting an exploratory model to see where our challenges will lie, and then digging into the data to see what we can find which will help us to improve our model.

In [289]:
import pandas as pd
import json
import numpy as np
import re
from nltk import ngrams
from itertools import combinations

from sklearn.feature_extraction.text import *
from sklearn.model_selection import train_test_split
from sklearn import metrics
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import confusion_matrix
from sklearn.metrics import classification_report

from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import GradientBoostingClassifier

from sklearn.model_selection import GridSearchCV
from sklearn.decomposition import PCA

from sklearn.pipeline import Pipeline
from sklearn.pipeline import FeatureUnion

from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.svm import LinearSVC

%matplotlib inline

import matplotlib.pyplot as plt
plt.style.use("ggplot")

In [290]:
cats = ['brazilian', 'british', 'cajun_creole', 'chinese', 'filipino', 'french', 'greek', 'indian', 'irish', 'italian', 'jamaican','japanese', 'korean', 'mexican', 'moroccan', 'russian', 'southern_us', 'spanish', 'thai', 'vietnamese']

with open('train.json') as data_file:    
    data = json.load(data_file)

X = []
y = []
for item in data:
    X.append(', '.join(item['ingredients']))
    y.append(item['cuisine'])    

with open('train.json') as data_file:    
    test_data = json.load(data_file)

X_test = []
ID_test = []
for item in test_data:
    X_test.append(', '.join(item['ingredients']))
    ID_test.append(item['id'])    



X_train, X_dev, y_train, y_dev = train_test_split(X, y, random_state=2)

In [291]:
print("Number of recipes in training data: "+str(len(X_train)))
print("Number of recipes in development data: "+str(len(X_dev)))
print("Number of recipes in test data: " + str(len(X_test)))

Number of recipes in training data: 29830
Number of recipes in development data: 9944
Number of recipes in test data: 39774


# First attempt

Our first attempt to classify the recipes was to use the "bag of words" approach. We used the count vectorizer to create a sparse matrix of every word in the recipes, and fit a logistic regression model on the training data. We used this model to predict the development data.

In [292]:
cv = CountVectorizer()
tf_X_train = cv.fit_transform(X_train)
tf_X_dev = cv.transform(X_dev)

lr = LogisticRegression()
lr.fit(tf_X_train, y_train)
predictions = lr.predict(tf_X_dev)

print(classification_report(y_dev, predictions))
print("f1=score: "+str(metrics.f1_score(y_dev, predictions, average='weighted')))
print("Accuracy: "+str(np.mean(predictions==y_dev)))

              precision    recall  f1-score   support

   brazilian       0.76      0.55      0.63       121
     british       0.59      0.33      0.43       206
cajun_creole       0.78      0.70      0.74       376
     chinese       0.79      0.85      0.82       670
    filipino       0.71      0.54      0.61       190
      french       0.59      0.63      0.61       636
       greek       0.76      0.70      0.73       258
      indian       0.85      0.89      0.87       758
       irish       0.67      0.47      0.55       175
     italian       0.80      0.90      0.85      1963
    jamaican       0.81      0.70      0.75       123
    japanese       0.82      0.69      0.75       342
      korean       0.84      0.74      0.79       221
     mexican       0.91      0.92      0.91      1668
    moroccan       0.81      0.78      0.80       215
     russian       0.66      0.40      0.50       133
 southern_us       0.69      0.77      0.73      1056
     spanish       0.63    

Our simple approach to classification gave us an f1-score of .775 and an overall accuracy of .781.

One of the things that really sticks out for us in this first model is the very low recall score of British recipes.  Other cuisines also had low recall, as well.  

# Exploratory Data Analysis

In [293]:
print("Number of unique ingredients: "+str(len(cv.vocabulary_)))

Number of unique ingredients: 2849


In [294]:
print("Most important ingredients for each cuisine:\n")

largestWeightedWords = []
largestWeightedIndeces = []
cv_featurenames = cv.get_feature_names()

for cat in range(20):
    print(cats[cat])
    weightIndeces = np.argsort(abs(lr.coef_[cat]))[-5:]
    for index in weightIndeces:
        weight = lr.coef_[cat][index]
        
        print(cv_featurenames[index] + " " + str(weight))
    print('\n')

Most important ingredients for each cuisine:

brazilian
curry -2.29897910894
açai 2.42260280202
tapioca 2.6178603655
manioc 3.0491254874
cachaca 5.66428108383


british
worcestershire 2.2725271567
marmite 2.67150973274
mincemeat 2.70638328962
haddock 2.72889219389
stilton 4.81105108921


cajun_creole
mortadella 1.81332463325
jambalaya 1.81889115661
salami 1.99265353518
creole 3.20484077806
cajun 3.68107988305


chinese
mein 2.1481661817
kimchi -2.33025305444
mirin -2.65507624642
mandarin 2.771649239
szechwan 2.81858167377


filipino
dogs 2.34400654714
basil -2.3530993541
glutinous 2.35886364363
lumpia 2.83547473376
calamansi 3.45365349802


french
swiss 2.37624791163
niçoise 2.42009817368
crepes 2.47686289472
gruyère 2.48861302158
gruyere 2.86763157265


greek
tahini 2.69165878264
ouzo 2.84016869992
phyllo 3.14344962748
greek 3.33915676849
feta 4.26986694287


indian
cardamom 2.38706755008
yoghurt 2.45940754006
masala 2.5117701036
curry 2.91925573564
tandoori 3.78903276959


irish
stou

In [295]:
#Create list of most common ingedients based off simple text parser
ingredient_freq = []
for featurename in cv_featurenames:
    i = 0
    for recipe in X_train:
        if featurename in recipe:
            i +=1
    ingredient_freq.append((featurename, i))

In [296]:
ingredients_sorted_by_freq = sorted(ingredient_freq, key=lambda tup: tup[1], reverse=True)
ingredients_sorted_by_freq[0:50]

[(u'in', 22462),
 (u'on', 22015),
 (u'ic', 21720),
 (u'la', 19673),
 (u'ro', 18848),
 (u'salt', 18421),
 (u'an', 18202),
 (u'oil', 15985),
 (u'lo', 15483),
 (u'pepper', 15270),
 (u'garlic', 13600),
 (u'st', 13226),
 (u'onion', 13159),
 (u'or', 13064),
 (u'to', 12805),
 (u'mi', 11928),
 (u'ice', 11366),
 (u'el', 10789),
 (u'fresh', 10203),
 (u'round', 9822),
 (u'ground', 9763),
 (u'de', 9648),
 (u'au', 8960),
 (u'red', 8875),
 (u'onions', 8762),
 (u'oliv', 8417),
 (u'olive', 8416),
 (u'sugar', 8412),
 (u'mo', 8377),
 (u'sauc', 7699),
 (u'sauce', 7671),
 (u'it', 7660),
 (u'black', 7614),
 (u'tom', 7491),
 (u'tomato', 7294),
 (u'water', 7135),
 (u'chee', 6967),
 (u'chees', 6963),
 (u'chicken', 6942),
 (u'butt', 6920),
 (u'cheese', 6890),
 (u'egg', 6890),
 (u'butter', 6739),
 (u'no', 6688),
 (u'all', 6557),
 (u'cho', 6523),
 (u'flour', 6238),
 (u'tomatoes', 6197),
 (u'gin', 6077),
 (u'green', 6069)]

# Bigrams

Our next step was to have the vectorizer detect word pairs in addition to single words. 

In [297]:
cv_bigrams = CountVectorizer(ngram_range=(1,2))
tf_X_train_bigrams = cv_bigrams.fit_transform(X_train)
tf_X_dev_bigrams = cv_bigrams.transform(X_dev)

lr_bigrams = LogisticRegression()
lr_bigrams.fit(tf_X_train_bigrams, y_train)
predictions = lr_bigrams.predict(tf_X_dev_bigrams)

print("f1=score: "+str(metrics.f1_score(y_dev, predictions, average='weighted')))
print("Accuracy: "+str(np.mean(predictions==y_dev)))

f1=score: 0.77820171114
Accuracy: 0.78268302494


In [298]:
largestWeightedWords = []
largestWeightedIndeces = []
cv_bigram_featurenames = cv_bigrams.get_feature_names()


for cat in range(20):
    print(cats[cat])
    weightIndeces = np.argsort(abs(lr_bigrams.coef_[cat]))[-5:]
    for index in weightIndeces:
        weight = lr_bigrams.coef_[cat][index]
        
        print(cv_bigram_featurenames[index] + " " + str(weight))
    print('\n')

tilapia 1.38695353957
mirin -1.42973752698
basil -1.49832955074
calamansi 1.94421775293
lumpia 1.97018948175


french
snails 1.60193180004
grits -1.75813082953
duck 1.76790366795
pasta -1.77047745836
cognac 1.9911612899


greek
phyllo 2.25896983829
tahini 2.27594095227
feta cheese 2.56310462361
feta 2.63277686117
greek 2.91301982708


indian
curds 1.95553711706
tandoori 2.20226763025
masala 2.2340892319
curry 2.41600142299
yoghurt 2.55693242774


irish
brisket 1.62398881977
corned 1.64929876563
corned beef 1.64929876563
potatoes 2.08761701821
irish 3.42817575402


italian
gnocchi 2.38422502236
grits -2.53943645271
mascarpone 2.58577356428
spaghetti 2.91964769332
polenta 3.37721102056


jamaican
nutmeg 1.74171922718
rum 1.9900167825
allspice 2.84641407624
thyme 2.92833081638
jerk 3.95608925945


japanese
nori 2.29715673956
dashi 2.5207479706
sake 3.25246432334
mirin 3.28183409158
miso 3.53743722457


korean
eggs carrots 1.61012297257
gochujang base 1.8762577025
gochujang 1.8762577025
pi


kimchi -1.6236245228
mandarin 1.80026549182
sake -1.89346463843
mirin -2.62748693404


filipino


brazilian
manioc flour 1.81681702844
manioc 1.81681702844
black beans 1.89634732893
tapioca flour 1.94342987801
cachaca 4.42659009273


british
stilton cheese 1.58576911268
jam 1.60918448581
mincemeat 1.70117801594
marmite 2.14242286181
stilton 3.28159490129


cajun_creole
oil powdered 1.39533660205
powder dried 1.50187147299
cajun seasoning 1.67941291376
creole 2.32921401049
cajun 2.38734973434


chinese
szechwan peppercorns 1.58492262895

We notice that in our bigram model, many of the most important features for each cuisine are bigrams.  Even though the accuracy of our model was only slightly impacted, this tells us that including the extra bigram features is useful, and we should continue to move foward.  

## Custom tokenizer and preprocessor

Our next step was to attempt to build a bustom tokenizer and preprocessor to remove some noise and keep only the most important and informative features.

In [299]:
#Our custom preprocessor removes features that are uninformative - 
#like numbers, unnecessary spaces, and words that are two characters long or less.
def custom_preprocessor(ingredients):
    result = []
    for ingredient in ingredients.split(', '):
        temp = ingredient.lower()
        
        #remove numbers
        temp = re.sub(r'\d+|&', '', temp)
        #remove unnecessary spaces
        temp = re.sub(r' +', ' ', temp)
        #remove any words that are two characters or less
        temp = ' '.join(word for word in temp.split() if len(word)>2)
        
        result.append("".join(temp))
    
    return ", ".join(result)

#our custom_tokenizer retuns every combination of every word
#in the ingredinet list.
#this increases the number of features by a lot, but also improves
#accuracy.
#
#the logic behind this tokenizer is that two ingreidents may not be
#informative on their own, but if seen together they may help to predict
#a certain cuisine.


def custom_tokenizer(string):
    result = []
    
    #overall note: the point of sorting the ingredients before adding
    #them to the list is to prevent duplicates that are just flipped
    #like "unsalted butter" and "butter unsalted"
    
    #create an empty list where we're going to put the ngrams
    #where n = 1 so we can later create combinations of those
    single_grams = []
    
    
    for ingredient in string.split(', '):
        for n in range(1,len(ingredient.split())+1):
            grams = ngrams(ingredient.split(' '), n)
            for gram in grams:
                #if the length of the ngram we're looking at is 1,
                #add it to our single grams list.
                if n == 1:
                    single_grams.append(gram[0])
                result.append(" ".join(sorted(list(gram))))
    
    #finally add every combination of the n = 1 ngrams
    #so from ['unsalted butter', 'baking powder']
    #we should be adding: 'butter unsalted', 'baking powder',
    #'baking butter', 'baking unsalted', 'butter powder', 'powder unsalted'
    for combo in combinations(single_grams, 2):
        result.append(' '.join(sorted(list(combo))))
    
    #return the unique elements of this list
    #since there will be plenty of duplicates
    return list(set(result))

In [300]:
model = LogisticRegression(penalty="l2")
vectorizer = CountVectorizer(preprocessor = custom_preprocessor,
                             tokenizer = custom_tokenizer,
                             ngram_range = (0,2))
                             


pipe = Pipeline([("vectorize", vectorizer), ("model", model)])
pipe.fit(X_train, y_train)
preds = pipe.predict(X_dev)

print(metrics.f1_score(y_dev, preds, average='weighted'))
print(pipe.score(X_dev, y_dev))
print(classification_report(y_dev, preds))

0.800080450523
              precision    recall  f1-score   support

   brazilian       0.86      0.55      0.67       121
     british       0.67      0.42      0.52       206
cajun_creole       0.81      0.73      0.77       376
     chinese       0.83      0.87      0.85       670
    filipino       0.73      0.62      0.67       190
      french       0.64      0.66      0.65       636
       greek       0.80      0.69      0.74       258
      indian       0.86      0.91      0.89       758
       irish       0.72      0.49      0.58       175
     italian       0.80      0.91      0.85      1963
    jamaican       0.92      0.65      0.76       123
    japanese       0.83      0.71      0.77       342
      korean       0.87      0.75      0.80       221
     mexican       0.88      0.94      0.91      1668
    moroccan       0.87      0.79      0.82       215
     russian       0.75      0.38      0.50       133
 southern_us       0.70      0.80      0.75      1056
     spanish

0.794368143097


Again, with this improved model, we see the low recall for the British.  This is likely one of our large sources of issue in the performance of our model, and we'll see to address it.  Below we take a look at the confusion matrix to see where our British results are getting mixed up, and other patterns of missclassification which may emerge.

In [301]:
cm = confusion_matrix(y_dev, preds)
cmdf = pd.DataFrame(cm, index = cats, columns = cats)
print cmdf

              brazilian  british  cajun_creole  chinese  filipino  french  \
brazilian            67        0             2        1         2       3   
british               0       87             2        0         0      30   
cajun_creole          1        2           274        1         0       7   
chinese               0        1             2      583         8       2   
filipino              3        1             0       18       118       2   
french                0        7             9        2         1     420   
greek                 0        1             0        0         1       7   
indian                0        1             0        2         1       5   
irish                 2       15             0        1         0      12   
italian               0        2             3        4         2      60   
jamaican              1        1             2        1         2       0   
japanese              2        0             0       28         1      10   

From above in our confusion matrix, we also noticed that the british and irish often confused with eachother.  The other big take aways are that French and Southern_US cuisines are often misclassified as many other things.  We also noticed that in the classification report, these two lower support.  Hopefully by combining them, then classifying from the subgroup, it will improve our ability to predict both of them.

First we explore the British and Irish confusions to see if we can fit a model that can accurately differenitate between the two, if only those two are in the pool.

In [302]:
BI_indexes_train = [True if y == u"british" or y == u"irish" else False for y in y_train]
BI_indexes_train = pd.Series(BI_indexes_train)

BI_indexes_dev = [True if y == u"british" or y == u"irish" else False for y in y_dev]
BI_indexes_dev = pd.Series(BI_indexes_dev)

y_train_BI = pd.Series(y_train)[BI_indexes_train]
X_train_BI = pd.Series(X_train)[BI_indexes_train]

y_dev_BI = pd.Series(y_dev)[BI_indexes_dev]
X_dev_BI = pd.Series(X_dev)[BI_indexes_dev]


model = LogisticRegression(penalty = "l2", C = 1)

vectorizer = CountVectorizer(preprocessor = custom_preprocessor,
                             tokenizer = custom_tokenizer)

pipe_BI = Pipeline([("vectorize", vectorizer),
                  #("to_dense", DenseTransformer()),
                  ("model", model)])


gridsearch = False
if gridsearch is True:
    parameters = {"model__penalty": ["l1", "l2"],
              "model__C": np.linspace(0.25, 1.5, 10)}

    model_BI = GridSearchCV(pipe_BI, parameters)
    model_BI.fit(X_train_BI, y_train_BI)
    print model_BI.best_params_
else:
    model_BI = pipe_BI
    model_BI.fit(X_train_BI, y_train_BI)

preds_BI = model_BI.predict(X_dev_BI)

print(metrics.f1_score(y_dev_BI, preds_BI, average='weighted'))
print(GS.score(X_dev_BI, y_dev_BI))
print(classification_report(y_dev_BI, preds_BI))

0.782152230971
             precision    recall  f1-score   support

    british       0.80      0.83      0.82       206
      irish       0.80      0.76      0.78       175

avg / total       0.80      0.80      0.80       381



0.80009165521


The high f1 and scores here show that this approach has some promise.  To integrate this idea into a large model, we developed a script which allowed us to systematically create subgroups of classifications in an ad-hoc manner and see which subgroups improved the overall performance of the model.

In [308]:
def make_group_labels(label, group_name, group_cuisines):
    if label in group_cuisines:
        return group_name
    else:
        return label

# For example
BI_group = {"cuisines": [u"british", u"irish"], #which cuisines are included
            "group_name": u"BI", # what sub-group this called by base model
            "model": None} # The model which predicts the final cuisines


def train_ensemble_model(base_model, splits, X_train, y_train, X_dev, y_dev):
    """
    This function accepts a list of dictionaries which define which supgroup splits to make, splits the training
    data into those groups, and then makes sub-classifier models for each of them.  It then predicts against the
    dev data to return an overall score.
    """

    # First we must relabel the data to account for all of the splits
    y_train_grouped = y_train
    y_dev_grouped = y_dev
    for split in splits:
        y_train_grouped= pd.Series(y_train_grouped).map(lambda r:make_group_labels(r,
                                                                                   split["group_name"],
                                                                                   split["cuisines"]))

        y_dev_grouped= pd.Series(y_dev_grouped).map(lambda r:make_group_labels(r,
                                                                               split["group_name"],
                                                                               split["cuisines"]))

    # Next we train the base_model
    print "Fitting Base Model to classify into groups."
    base_model.fit(X_train, y_train_grouped)

    # Train the sub models for each group
    for split in splits:
        print("Training split model:", split["group_name"], "for cuisines:", split["cuisines"])
        # slice the data
        train_split_indicies = pd.Series([True if y == split["group_name"] else False for y in y_train_grouped])
        X_train_split = pd.Series(X_train)[train_split_indicies]
        y_train_split = pd.Series(y_train)[train_split_indicies]

        dev_split_indicies = pd.Series([True if y == split["group_name"] else False for y in y_dev_grouped])
        X_dev_split = pd.Series(X_dev)[dev_split_indicies]
        y_dev_split = pd.Series(y_dev)[dev_split_indicies]

        if split["model"] is None:
            vectorizer = CountVectorizer(preprocessor = custom_preprocessor,
                             tokenizer = custom_tokenizer)

            classifier = LogisticRegression(penalty = "l2")
            split_model = Pipeline([("vectorize", vectorizer),
                                    ("model", classifier)])
        else:
           split_model = split["model"]

        split_model.fit(X_train_split, y_train_split)
        split_predictions = split_model.predict(X_dev_split)
        split["model"] = split_model

        split["classification_report"] = classification_report(y_dev_split, split_predictions)

    print "Creating final predictions."
    preds = base_model.predict(X_dev)
    for split in splits:
        split_prediction_indicies = [True if y == split["group_name"] else False for y in preds]
        fill_in_predictions = split["model"].predict(pd.Series(X_dev)[split_prediction_indicies])
        preds = pd.Series(preds)
        preds[split_prediction_indicies] = fill_in_predictions

    print(metrics.f1_score(y_dev, preds, average="weighted"))
    print(classification_report(y_dev, preds))
    return base_model, splits, preds

Having defined the model, next we must apply it for splitting the British and Irish.

In [309]:
# Apply 
model = LogisticRegression(penalty = "l2", C = 1)

vectorizer = CountVectorizer(preprocessor = custom_preprocessor,
                             tokenizer = custom_tokenizer)
base_model = Pipeline([("vectorize", vectorizer),
                       ("model", model)])

BI_group = {"cuisines": [u"british", u"irish"], #which cuisines are included
            "group_name": u"BI", # what sub-group this called by base model
            "model": None} # The model which predicts the final cuisines

splits = [BI_group]

base_model, splits, preds = train_ensemble_model(base_model, splits, X_train, y_train, X_dev, y_dev)

for split in splits:
    print split["classification_report"]
    print "\n"



0.800207994669
              precision    recall  f1-score   support

   brazilian       0.83      0.61      0.70       121
     british       0.57      0.54      0.56       206
cajun_creole       0.79      0.74      0.76       376
     chinese       0.85      0.85      0.85       670
    filipino       0.73      0.65      0.69       190
      french       0.64      0.66      0.65       636
       greek       0.79      0.72      0.75       258
      indian       0.87      0.91      0.89       758
       irish       0.61      0.57      0.59       175
     italian       0.81      0.90      0.85      1963
    jamaican       0.88      0.68      0.77       123
    japanese       0.82      0.74      0.78       342
      korean       0.86      0.77      0.82       221
     mexican       0.91      0.93      0.92      1668
    moroccan       0.84      0.78      0.81       215
     russian       0.75      0.44      0.55       133
 southern_us       0.74      0.77      0.76      1056
     spanish

Creating final predictions.


('Training split model:', u'BI', 'for cuisines:', [u'british', u'irish'])


Fitting Base Model to classify into groups.


This model improves the accuracy and score on the dev data to above 0.8, which is a large milestone.

Due to the special nature of our model, we had to define another function for creating the predictions.

In [310]:
def predict_ensemble_model(base_model, splits, data):
    preds = base_model.predict(data)
    for split in splits:
        split_prediction_indexes = [True if y == split['group_name'] else False for y in preds]
        fill_in_split = split['model'].predict(pd.Series(data)[split_prediction_indexes])

        # Reconnect the predictions for the final model.
        preds= pd.Series(preds)
        preds[split_prediction_indexes] = fill_in_split
    return preds

dev_preds = predict_ensemble_model(base_model, [BI_group], X_dev)


print(metrics.f1_score(y_dev, dev_preds, average='weighted'))
print(classification_report(y_dev, dev_preds))


0.800207994669
              precision    recall  f1-score   support

   brazilian       0.83      0.61      0.70       121
     british       0.57      0.54      0.56       206
cajun_creole       0.79      0.74      0.76       376
     chinese       0.85      0.85      0.85       670
    filipino       0.73      0.65      0.69       190
      french       0.64      0.66      0.65       636
       greek       0.79      0.72      0.75       258
      indian       0.87      0.91      0.89       758
       irish       0.61      0.57      0.59       175
     italian       0.81      0.90      0.85      1963
    jamaican       0.88      0.68      0.77       123
    japanese       0.82      0.74      0.78       342
      korean       0.86      0.77      0.82       221
     mexican       0.91      0.93      0.92      1668
    moroccan       0.84      0.78      0.81       215
     russian       0.75      0.44      0.55       133
 southern_us       0.74      0.77      0.76      1056
     spanish

# Evaluation

Overal, an accuracy of about 80% is quite good and we are proud of this score.  Throughout the process we looked at what 

The last step required is to use our final model to predict against the test data and submit it.

In [311]:
test_predictions = predict_ensemble_model(base_model, [BI_group], X_test)
print len(test_predictions)
submission = pd.DataFrame({"id":ID_test, "cuisine":test_predictions})
cols = ["id", "cuisine"]
submission[cols].to_csv("submission.csv", index = False)

39774


# Conclusion

Multiclass prediction problems are very difficult.  In this report, we made many improvements to our model by following a structured processing and trying many things.  We still realize that there is room for improvement, particularly in the realm of feature engineering.  More groups can easily be defined with our model, but we did not find any which improved the dev set accuracy beyond only using the British and Irish grouping.  Our best model ranked us to where we would have been tied for 213th out of 1388, which is just about at the 85th percentile.  