# What's Cooking

This code is for the [What's Cooking](https://www.kaggle.com/c/whats-cooking) Kaggle competition. For now it is just a very simple Naive Bayes classifier that predicts a recipe's cuisine from its ingredients.

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import CountVectorizer
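# Note: sklearn.cross_validation was removed in scikit-learn 0.20; newer
# releases provide StratifiedKFold in sklearn.model_selection (different call signature).
from sklearn.cross_validation import StratifiedKFold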
from sklearn.metrics import roc_auc_score

np.random.seed(1337)

from matplotlib import style
style.use('ggplot')
%matplotlib inline

In [2]:
train = pd.read_json("./data/train.json")

In [3]:
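# Each recipe is a list of ingredient strings; join them into one
# space-separated string so CountVectorizer can tokenize it.
train["ingredients"] = train.ingredients.apply(" ".join)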

In [4]:
train.head()

|   | cuisine | id | ingredients |
|---|---|---|---|
| 0 | greek | 10259 | romaine lettuce black olives grape tomatoes ga... |
| 1 | southern_us | 25693 | plain flour ground pepper salt tomatoes ground... |
| 2 | filipino | 20130 | eggs pepper salt mayonaise cooking oil green c... |
| 3 | indian | 22213 | water vegetable oil wheat salt |
| 4 | indian | 13162 | black pepper shallots cornflour cayenne pepper... |


In [5]:
cv = CountVectorizer().fit(train.ingredients)

In [6]:
X = cv.transform(train.ingredients)
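To make the bag-of-words step concrete, here is a minimal sketch on two made-up recipes (the `toy_*` names and the recipes themselves are illustrative, not from the competition data). It shows that `CountVectorizer` with default settings counts individual words, so a multi-word ingredient such as "black olives" is split into separate tokens once the ingredient list has been joined with spaces.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Toy recipes (made up for illustration), already joined into one string each.
toy_recipes = [
    "black olives feta cheese",
    "black pepper olive oil",
]

toy_cv = CountVectorizer().fit(toy_recipes)

# The vocabulary is learned per word, so "black olives" and "black pepper"
# share the token "black".
vocabulary = sorted(toy_cv.vocabulary_)
counts = toy_cv.transform(toy_recipes).toarray()

print(vocabulary)
print(counts)
```

Each row of `counts` is one recipe and each column is one word of the vocabulary, which is the shape of input `MultinomialNB` is fitted on.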

In [7]:
y = train.cuisine

## Test out the model

In [8]:
skf = StratifiedKFold(y, n_folds=5)

In [9]:
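for train_index, test_index in skf:
    # Rows of the sparse matrix and the matching labels for this fold
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y.iloc[train_index], y.iloc[test_index]
    clf = MultinomialNB().fit(X_train, y_train)
    # Mean accuracy on the held-out fold
    print(clf.score(X_test, y_test))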

0.720798794273
0.731465192259
0.719582704877
0.720286756383
0.726060148484


About 72 to 73% accuracy on each of the five folds. Not great, but it _is_ a pretty simple model!
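The `sklearn.cross_validation` module imported at the top of this notebook is not available in scikit-learn 0.20 and later. As a rough sketch of how the same check might look with `sklearn.model_selection`, the hypothetical helper `cross_validate_nb` below wraps the vectorizer and classifier in a pipeline and scores it with `cross_val_score`. Note one difference from the loop above: the pipeline refits `CountVectorizer` on each training fold instead of once on all the data, so the scores may not match exactly. The toy data is made up and only shows the call shape.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline


def cross_validate_nb(texts, labels, n_splits=5):
    """Return per-fold accuracy for CountVectorizer + MultinomialNB.

    Hypothetical helper, not part of the original notebook.
    """
    model = make_pipeline(CountVectorizer(), MultinomialNB())
    folds = StratifiedKFold(n_splits=n_splits)
    return cross_val_score(model, texts, labels, cv=folds, scoring="accuracy")


# Tiny made-up data set, only to show how the helper is called.
toy_texts = [
    "soy sauce ginger",
    "soy sauce rice",
    "olive oil basil",
    "olive oil garlic",
]
toy_labels = ["chinese", "chinese", "italian", "italian"]

scores = cross_validate_nb(toy_texts, toy_labels, n_splits=2)
print(scores)
```

On the real data the call would be something like `cross_validate_nb(train.ingredients, train.cuisine)`.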

## Make a submission file

In [10]:
clf = MultinomialNB().fit(X, y)

In [11]:
test = pd.read_json("./data/test.json")
# Same preprocessing as for the training data
test["ingredients"] = test.ingredients.apply(" ".join)

In [12]:
test.head()

|   | id | ingredients |
|---|---|---|
| 0 | 18009 | baking powder eggs all-purpose flour raisins m... |
| 1 | 28583 | sugar egg yolks corn starch cream of tartar ba... |
| 2 | 41580 | sausage links fennel bulb fronds olive oil cub... |
| 3 | 29752 | meat cuts file powder smoked sausage okra shri... |
| 4 | 35687 | ground black pepper salt sausage casings leeks... |


In [13]:
X_test = cv.transform(test.ingredients)
test["cuisine"] = clf.predict(X_test)
submit = test[["id", "cuisine"]]

In [14]:
submit.head()

|   | id | cuisine |
|---|---|---|
| 0 | 18009 | southern_us |
| 1 | 28583 | southern_us |
| 2 | 41580 | spanish |
| 3 | 29752 | cajun_creole |
| 4 | 35687 | italian |


In [15]:
submit.to_csv("submit.csv", index=False)
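Before uploading, it can be worth running a quick sanity check on the frame that was written out. The sketch below uses a hypothetical helper, `check_submission`, and assumes the expected layout is exactly the two columns written above (`id`, `cuisine`) with one row per test recipe. The toy frames are made up for illustration.

```python
import pandas as pd


def check_submission(submit, test):
    """Basic sanity checks on a submission frame (hypothetical helper)."""
    # Assumed layout: exactly the two columns written by the notebook.
    assert list(submit.columns) == ["id", "cuisine"]
    # One prediction per test recipe, no duplicated ids, no missing labels.
    assert len(submit) == len(test)
    assert submit["id"].is_unique
    assert submit["cuisine"].notna().all()
    return True


# Made-up frames that mimic the shapes used in this notebook.
toy_test = pd.DataFrame({"id": [1, 2], "ingredients": ["salt water", "olive oil"]})
toy_submit = pd.DataFrame({"id": [1, 2], "cuisine": ["indian", "italian"]})

result = check_submission(toy_submit, toy_test)
print(result)
```

With the real data this would be called as `check_submission(submit, test)`.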