# What's Cooking

![image](Kaggle_Project.PNG)

## Proje adımları

**Bu projede tarifleri verilen yemeklerin hangi ülkenin yemeği olduğunu tahmin eden modeli oluşturup bunu arayüze sahip bir programa yerleştirdim.**

**Veri kümesinde tarifin ID'si , yerel mutfağı, ve tariflerin olduğu sütünlar var, bü sütunlardan tariflerin olduğu sütuna doğal dil işleme methodlarını kullanacağız ve işleme alacağız.**

**Daha sonrasında ise bunu bir uygulamaya yerleştirip görsellik katacağız**

## Kurulum

In [51]:
import pandas as pd

In [52]:
df = pd.read_json(open("train.json", "r", encoding="utf8"))

In [53]:
df['ingredients'][12]

['Italian parsley leaves',
 'walnuts',
 'hot red pepper flakes',
 'extra-virgin olive oil',
 'fresh lemon juice',
 'trout fillet',
 'garlic cloves',
 'chipotle chile',
 'fine sea salt',
 'flat leaf parsley']

In [54]:
df['ingredients'].head()

0    [romaine lettuce, black olives, grape tomatoes...
1    [plain flour, ground pepper, salt, tomatoes, g...
2    [eggs, pepper, salt, mayonaise, cooking oil, g...
3                  [water, vegetable oil, wheat, salt]
4    [black pepper, shallots, cornflour, cayenne pe...
Name: ingredients, dtype: object

**Burada ise tariflerde kaç adet farklı içerik olduğunu bulmuş olduk**

In [55]:
setim = set() # Burada set kullandım çünkü listede eğer aynı veriden varsa bunlardan tekini alacak ve bize 
              # farklı olanları verecek

for i in df['ingredients']: 
    for j in i:
        setim.add(j)
len(setim)        

6714

**6714 farklı malzeme olduğunu görmüş olduk**

In [56]:
setim

{'Bertolli® Arrabbiata Sauce',
 'mole poblano',
 'poultry seasoning',
 'chinese winter melon',
 'whole allspice',
 'swiss chard',
 'dried chives',
 'empanada wrappers',
 'thick-cut bacon',
 'Herdez Salsa Casera',
 'won ton wrappers',
 'taleggio',
 'ammonium bicarbonate',
 'crab sticks',
 'bone-in short ribs',
 'sourdough bread',
 'quinoa',
 'sausage meat',
 'Elmlea single',
 'wasabi',
 'green cardamom',
 'icing mix',
 'sunflower oil',
 'salted fish',
 'boneless skinless turkey breasts',
 'fresh ham',
 'compressed yeast',
 'marzipan',
 'ricotta',
 'apple puree',
 'veal shoulder',
 'mexicorn',
 'seasoned flour',
 'raw cane sugar',
 'pinhead oatmeal',
 'jambon de bayonne',
 'frozen chopped spinach',
 'cardamom pods',
 'vegetarian protein crumbles',
 'queso blanco',
 'Knorr® Vegetable recipe mix',
 'dried guajillo chiles',
 'extra',
 'rustic rub',
 'codfish',
 'new mexico chile pods',
 'canned jalapeno peppers',
 'jam',
 'fresh lime',
 'basil olive oil',
 'chicken wing drummettes',
 'fresh

## Veri İşlemleri

In [57]:
def clearingandconverting(text):
    
      
    text =" ".join(text)  # Virgüllerle ayrılmış listeyi join methodu ile bir cümle haline getirdim
    
    text=text.lower()                    # Buradan sonraki 4 satırd ise NLP methodlarını uygulayabilmek adına
                                         # bütün veriyi küçük harflere çevirdik ve içlerinden numerik 
                                         # verileri ve de sembolleri attık
    text=text.replace("[^\w\s]","") 
    text=text.replace("\d+","") 
    text=text.replace("\n"," ").replace("\r","") 
    
    return text

In [58]:
df['ingredients'] = df['ingredients'].apply(clearingandconverting)

In [59]:
df['ingredients'].head(10)

0    romaine lettuce black olives grape tomatoes ga...
1    plain flour ground pepper salt tomatoes ground...
2    eggs pepper salt mayonaise cooking oil green c...
3                       water vegetable oil wheat salt
4    black pepper shallots cornflour cayenne pepper...
5    plain flour sugar butter eggs fresh ginger roo...
6    olive oil salt medium shrimp pepper garlic cho...
7    sugar pistachio nuts white almond bark flour v...
8    olive oil purple onion fresh pineapple pork po...
9    chopped tomatoes fresh basil garlic extra-virg...
Name: ingredients, dtype: object

# Modelleme

In [60]:
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from textblob import TextBlob
from nltk.stem.snowball import SnowballStemmer

stemmer = SnowballStemmer('english') 

def split_into_lemmas(text):    # Stemma analiz methodunu tanımladık
    
    text = str(text).lower()   
    
    words = TextBlob(text).words
    
    return [stemmer.stem(word) for word in words]

## Verileri Ayırma ve Vektörize Etme İşlemleri

In [62]:
x,y=df['ingredients'],df['cuisine']

In [63]:
x_train,x_test,y_train,y_test=train_test_split(x,y,random_state=80)

In [64]:
vect=CountVectorizer(lowercase=True, stop_words='english', ngram_range=(1,2), analyzer=split_into_lemmas)
x_train_dtm=vect.fit_transform(x_train,y_train)
x_test_dtm=vect.transform(x_test)

In [65]:
x_train_dtm

<29830x2618 sparse matrix of type '<class 'numpy.int64'>'
	with 557553 stored elements in Compressed Sparse Row format>

In [66]:
from sklearn.naive_bayes import MultinomialNB
from sklearn.naive_bayes import BernoulliNB
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

In [67]:
b=MultinomialNB()
model=b.fit(x_train_dtm,y_train)
b_predict=b.predict(x_test_dtm)

In [68]:
accuracy_score(y_test,b_predict)

0.7247586484312148

In [69]:
def vectorizing(text):
    
    return vect.transform([text])


In [72]:
df2 = pd.read_json(open("train.json", "r", encoding="utf8"))

In [73]:
model.predict(vectorizing(clearingandconverting(df2['ingredients'][12])))

array(['italian'], dtype='<U12')

# Test Verilerini Oluşturulan Model İle Tahmin Etme Ve Kaggle İçin Hazırlama

In [74]:
test = pd.read_json('test.json')

In [75]:
test

Unnamed: 0,id,ingredients
0,18009,"[baking powder, eggs, all-purpose flour, raisi..."
1,28583,"[sugar, egg yolks, corn starch, cream of tarta..."
2,41580,"[sausage links, fennel bulb, fronds, olive oil..."
3,29752,"[meat cuts, file powder, smoked sausage, okra,..."
4,35687,"[ground black pepper, salt, sausage casings, l..."
...,...,...
9939,30246,"[large egg yolks, fresh lemon juice, sugar, bo..."
9940,36028,"[hot sauce, butter, sweet potatoes, adobo sauc..."
9941,22339,"[black pepper, salt, parmigiano reggiano chees..."
9942,42525,"[cheddar cheese, cayenne, paprika, plum tomato..."


In [77]:
test['ingredients'] = test['ingredients'].apply(clearingandconverting)

In [78]:
test

Unnamed: 0,id,ingredients
0,18009,baking powder eggs all-purpose flour raisins m...
1,28583,sugar egg yolks corn starch cream of tartar ba...
2,41580,sausage links fennel bulb fronds olive oil cub...
3,29752,meat cuts file powder smoked sausage okra shri...
4,35687,ground black pepper salt sausage casings leeks...
...,...,...
9939,30246,large egg yolks fresh lemon juice sugar bourbo...
9940,36028,hot sauce butter sweet potatoes adobo sauce salt
9941,22339,black pepper salt parmigiano reggiano cheese r...
9942,42525,cheddar cheese cayenne paprika plum tomatoes g...


In [79]:
vext_mod = test['ingredients'].apply(vectorizing)

In [80]:
list = []

In [81]:
for v in vext_mod:
    list.append(model.predict(v))
list

[array(['british'], dtype='<U12'),
 array(['southern_us'], dtype='<U12'),
 array(['italian'], dtype='<U12'),
 array(['cajun_creole'], dtype='<U12'),
 array(['italian'], dtype='<U12'),
 array(['southern_us'], dtype='<U12'),
 array(['french'], dtype='<U12'),
 array(['chinese'], dtype='<U12'),
 array(['mexican'], dtype='<U12'),
 array(['british'], dtype='<U12'),
 array(['italian'], dtype='<U12'),
 array(['greek'], dtype='<U12'),
 array(['indian'], dtype='<U12'),
 array(['italian'], dtype='<U12'),
 array(['british'], dtype='<U12'),
 array(['french'], dtype='<U12'),
 array(['southern_us'], dtype='<U12'),
 array(['southern_us'], dtype='<U12'),
 array(['mexican'], dtype='<U12'),
 array(['southern_us'], dtype='<U12'),
 array(['japanese'], dtype='<U12'),
 array(['indian'], dtype='<U12'),
 array(['french'], dtype='<U12'),
 array(['vietnamese'], dtype='<U12'),
 array(['italian'], dtype='<U12'),
 array(['southern_us'], dtype='<U12'),
 array(['vietnamese'], dtype='<U12'),
 array(['korean'], dtype='

In [82]:
df = pd.DataFrame(list)

In [83]:
df['cuisine']=df.iloc[:,0]

In [84]:
test

Unnamed: 0,id,ingredients
0,18009,baking powder eggs all-purpose flour raisins m...
1,28583,sugar egg yolks corn starch cream of tartar ba...
2,41580,sausage links fennel bulb fronds olive oil cub...
3,29752,meat cuts file powder smoked sausage okra shri...
4,35687,ground black pepper salt sausage casings leeks...
...,...,...
9939,30246,large egg yolks fresh lemon juice sugar bourbo...
9940,36028,hot sauce butter sweet potatoes adobo sauce salt
9941,22339,black pepper salt parmigiano reggiano cheese r...
9942,42525,cheddar cheese cayenne paprika plum tomatoes g...


In [85]:
df['0'] =test['id']

In [86]:
df.columns = ['0', 'cuisine', 'id']

In [87]:
df.drop('0', axis=1, inplace=True)

In [88]:
df

Unnamed: 0,cuisine,id
0,british,18009
1,southern_us,28583
2,italian,41580
3,cajun_creole,29752
4,italian,35687
...,...,...
9939,french,30246
9940,southern_us,36028
9941,italian,22339
9942,cajun_creole,42525


In [89]:
df.set_index('id', inplace=True)

In [90]:
df

Unnamed: 0_level_0,cuisine
id,Unnamed: 1_level_1
18009,british
28583,southern_us
41580,italian
29752,cajun_creole
35687,italian
...,...
30246,french
36028,southern_us
22339,italian
42525,cajun_creole


In [91]:
df.to_csv('submission.csv')

# Uygulama Oluşturma

In [None]:
import PySimpleGUI as sg
import time
from textblob import TextBlob


layout = [[sg.Text('Yemek Mutfağı Tahminleme', font=("Helvetica", 25))],
          [sg.Image(filename=r'C:\Users\MONSTERHAN\Desktop\nltk.png', size=(75, 81))],
              [sg.Text('Liste halinde malzemeleri giriniz')],
              [sg.In(justification='center', key='-IN-', enable_events=True, size=(60, 60))],
              [],
              [sg.Button('Analiz et'), sg.Text(size=(25,1), key='-OUTPUT-')],
              [sg.Button('Exit')],
              [sg.Text('Created by Erhan Namlı', font=("Helvetica", 8))]]

window = sg.Window('', layout, element_justification='c')

while True:  # Event Loop
    
    event, values = window.read()
    
    if event == sg.WIN_CLOSED or event == 'Exit':
        
        break
        
    if event == 'Analiz et':
        
        sonuc = model.predict(vectorizing(clearingandconverting(values['-IN-'])))
        
        window['-OUTPUT-'].update(sonuc)
    


![image](Program.PNG)