# How to learn from the Ingredient Graph

This notebook explains how to create an Ingredient Graph (IG) from several JSON input files.

### Importing several Python modules

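Reading the JSON files requires the `json` package, and the `locale` package sets the collation used when sorting node identifiers. The frequent-itemset and association-rule steps rely on `pandas` and `mlxtend`. The commented-out `pip` lines install any package that is missing.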

In [13]:
import sys
# Uncomment to install missing packages into the notebook's environment:
# !{sys.executable} -m pip install networkx
# !{sys.executable} -m pip install matplotlib
# !{sys.executable} -m pip install mlxtend
# !{sys.executable} -m pip install xlwt
# !{sys.executable} -m pip install openpyxl
# import networkx as nx
# G = nx.Graph()
import json
import locale
# German collation for sorting; the accepted locale name is platform-dependent
# (Linux and macOS typically expect 'de_DE.UTF-8').
locale.setlocale(locale.LC_ALL, 'de-DE.utf-8')
import matplotlib.pyplot as plt
import openpyxl  # Excel writer backend used by DataFrame.to_excel
import numpy as np
from mlxtend.plotting import heatmap
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori
from mlxtend.frequent_patterns import association_rules
%matplotlib inline
%config InlineBackend.figure_formats = {'svg',}
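The role of the collation setting is easiest to see on a few German ingredient names. The sketch below is only an illustration: the helper name `sort_german` and the locale name `de_DE.UTF-8` are assumptions, and the function falls back to plain code-point order when that locale is not installed.

```python
import locale


def sort_german(names):
    """Sort names with German collation, if that locale is available."""
    previous = locale.setlocale(locale.LC_COLLATE)  # remember the current setting
    try:
        # Assumed POSIX-style locale name; Windows may expect a different one.
        locale.setlocale(locale.LC_COLLATE, "de_DE.UTF-8")
    except locale.Error:
        # Locale not installed: fall back to code-point order.
        return sorted(names)
    try:
        # strxfrm turns each string into a key that compares by locale rules.
        return sorted(names, key=locale.strxfrm)
    finally:
        locale.setlocale(locale.LC_COLLATE, previous)


names = ["zwiebel", "öl", "apfel"]
print(sort_german(names))
```

With plain `sorted`, names starting with an umlaut end up after `z` because the comparison uses code points. A German collation is expected to place them next to their base letters instead.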

Read the JSON file with the recipes from **Yotam Ottolenghi's "Flavour"** and build a list of ingredient lists, one per recipe.

In [2]:
with open('recipes_YO_Flavour.json', encoding='utf-8') as file:
    data = json.load(file)
# One list of ingredient names per recipe
flv_arr = [rcp["ingredients"] for rcp in data]
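The loading code only relies on each recipe being an object with an `"ingredients"` list. The following sketch shows a minimal input of that shape. The `"title"` key and the two sample recipes are hypothetical, since the real files are not reproduced in this notebook.

```python
import json

# Hypothetical sample; only the "ingredients" key is taken from the notebook.
sample_json = """
[
  {"title": "hypothetical recipe A", "ingredients": ["knoblauch", "olivenöl", "limette"]},
  {"title": "hypothetical recipe B", "ingredients": ["zucker", "zimt"]}
]
"""

data_sample = json.loads(sample_json)
# Same extraction step as above: one ingredient list per recipe.
sample_arr = [rcp["ingredients"] for rcp in data_sample]
print(sample_arr)
```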

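One-hot encode the ingredient lists with `TransactionEncoder` and wrap the result in a pandas DataFrame: one row per recipe, one boolean column per ingredient.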

In [3]:
te = TransactionEncoder()
te_ary = te.fit(flv_arr).transform(flv_arr)
df = pd.DataFrame(te_ary, columns=te.columns_)
df

,ahornsirup,ancho,apfel,apfelessig,aprikose,auberginen,austernpilz,avocado,babyspinat,balsamico,...,weißweinessig,worcester,zatar,ziegenfrischkäse,zimt,zitrone,zucchini,zucker,zuckerschoten,zwiebel
0,True,False,False,True,False,False,False,False,False,False,...,False,False,False,False,False,True,False,False,False,False
1,True,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,True,False,False
2,False,False,False,False,False,False,False,False,True,False,...,False,False,False,False,False,False,False,False,False,False
3,False,False,False,False,False,True,False,False,True,False,...,False,False,False,False,True,True,False,False,False,True
4,True,False,False,False,True,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
87,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,True,False,False,False,False,True
88,True,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,True,False,False
89,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
90,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,True,False,True,False,False
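To make the encoding step concrete, the sketch below builds the same kind of boolean table with the standard library only. It is an illustration, not the mlxtend implementation: the function `one_hot` and the toy recipes are hypothetical, and it assumes the columns are the alphabetically sorted unique ingredients, as the table above suggests.

```python
def one_hot(transactions):
    """Return (columns, rows): sorted unique items and one boolean row per transaction."""
    columns = sorted({item for t in transactions for item in t})
    rows = []
    for t in transactions:
        present = set(t)
        # True where the recipe contains the ingredient of that column.
        rows.append([col in present for col in columns])
    return columns, rows


toy_recipes = [
    ["knoblauch", "olivenöl"],
    ["knoblauch", "limette"],
    ["zucker"],
]
columns, rows = one_hot(toy_recipes)
print(columns)
for row in rows:
    print(row)
```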


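Find the frequent itemsets, i.e. the ingredients and ingredient combinations that occur in at least 20% of the recipes (`min_support=0.2`), and export them to `fp_flv.xlsx`.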

In [4]:
frequent_itemsets = apriori(df, min_support=0.2, use_colnames=True)
frequent_itemsets['length'] = frequent_itemsets['itemsets'].apply(len)
frequent_itemsets.to_excel('fp_flv.xlsx')
frequent_itemsets

,support,itemsets,length
0,0.554348,(chilischote),1
1,0.23913,(cumin),1
2,0.228261,(ingwer),1
3,0.793478,(knoblauch),1
4,0.217391,(koriandergrün),1
5,0.206522,(lauchzwiebel),1
6,0.423913,(limette),1
7,0.76087,(olivenöl),1
8,0.206522,(petersilie),1
9,0.206522,(sesam),1
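The `support` column is the fraction of recipes that contain every item of an itemset. The sketch below computes it by brute force on hypothetical toy data. It enumerates all combinations up to a given size and does not implement the candidate pruning that makes Apriori efficient.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item of the itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def frequent_itemsets_bruteforce(transactions, min_support, max_len=2):
    """Enumerate all itemsets up to max_len and keep those with enough support."""
    items = sorted({item for t in transactions for item in t})
    result = {}
    for k in range(1, max_len + 1):
        for combo in combinations(items, k):
            s = support(combo, transactions)
            if s >= min_support:
                result[frozenset(combo)] = s
    return result


# Hypothetical toy recipes, not taken from the cookbook data.
toy_recipes = [
    ["knoblauch", "olivenöl", "chilischote"],
    ["knoblauch", "olivenöl"],
    ["knoblauch", "limette"],
    ["olivenöl", "zucker"],
    ["knoblauch", "chilischote"],
]
toy_frequent = frequent_itemsets_bruteforce(toy_recipes, min_support=0.4)
for itemset, s in toy_frequent.items():
    print(sorted(itemset), s)
```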


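Derive association rules from the frequent itemsets, keeping every rule whose lift is at least 0.7. A lift below 1 means that two ingredients co-occur less often than expected under independence, so this threshold also keeps weakly negative associations.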

In [5]:
association_rules(frequent_itemsets, metric="lift", min_threshold=0.7)

,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
0,(chilischote),(knoblauch),0.554348,0.793478,0.445652,0.803922,1.013161,0.005789,1.053261
1,(knoblauch),(chilischote),0.793478,0.554348,0.445652,0.561644,1.013161,0.005789,1.016644
2,(chilischote),(limette),0.554348,0.423913,0.271739,0.490196,1.15636,0.036744,1.130017
3,(limette),(chilischote),0.423913,0.554348,0.271739,0.641026,1.15636,0.036744,1.24146
4,(olivenöl),(chilischote),0.76087,0.554348,0.423913,0.557143,1.005042,0.002127,1.006311
5,(chilischote),(olivenöl),0.554348,0.76087,0.423913,0.764706,1.005042,0.002127,1.016304
6,(cumin),(olivenöl),0.23913,0.76087,0.217391,0.909091,1.194805,0.035444,2.630435
7,(olivenöl),(cumin),0.76087,0.23913,0.217391,0.285714,1.194805,0.035444,1.065217
8,(knoblauch),(limette),0.793478,0.423913,0.336957,0.424658,1.001756,0.000591,1.001294
9,(limette),(knoblauch),0.423913,0.793478,0.336957,0.794872,1.001756,0.000591,1.006793
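The metric columns all follow from three support values. As a check, the sketch below recomputes row 0 of the table above (chilischote → knoblauch) with the standard definitions of confidence, lift, leverage and conviction. The function name `rule_metrics` is hypothetical.

```python
def rule_metrics(support_ab, support_a, support_b):
    """Standard rule metrics for A -> B from the three support values."""
    confidence = support_ab / support_a
    lift = confidence / support_b
    leverage = support_ab - support_a * support_b
    # Conviction is unbounded when the rule always holds (confidence == 1).
    conviction = (1 - support_b) / (1 - confidence) if confidence < 1 else float("inf")
    return {
        "confidence": confidence,
        "lift": lift,
        "leverage": leverage,
        "conviction": conviction,
    }


# Support values copied from row 0 of the table above.
m = rule_metrics(support_ab=0.445652, support_a=0.554348, support_b=0.793478)
for name, value in m.items():
    print(f"{name}: {value:.6f}")
```

Up to rounding of the six-digit inputs, the results agree with the `confidence`, `lift`, `leverage` and `conviction` values in row 0.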


Read the JSON file with the recipes from the **Gemüse (vegetables) chapter of Henriette Davidis** and build a list of ingredient lists, one per recipe.

In [8]:
with open('recipes_HD_Gemüse.json', encoding='utf-8') as file:
    data = json.load(file)
# One list of ingredient names per recipe
hdg_arr = [rcp["ingredients"] for rcp in data]
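Since the same loading steps now appear for a second cookbook, they could be wrapped in a small helper. The sketch below is one possible shape. The name `load_ingredient_lists` and the temporary sample file are hypothetical, and it assumes every recipe object has an `"ingredients"` key, as in the cells above.

```python
import json
import tempfile
from pathlib import Path


def load_ingredient_lists(path):
    """Read a recipe JSON file and return one ingredient list per recipe."""
    with open(path, encoding="utf-8") as file:
        recipes = json.load(file)
    return [recipe["ingredients"] for recipe in recipes]


# Demonstration with a throwaway file instead of the real cookbook data.
sample = [{"ingredients": ["butter", "ei"]}, {"ingredients": ["butter", "muskat"]}]
with tempfile.TemporaryDirectory() as tmp:
    sample_path = Path(tmp) / "recipes_sample.json"
    sample_path.write_text(json.dumps(sample, ensure_ascii=False), encoding="utf-8")
    demo_arr = load_ingredient_lists(sample_path)
print(demo_arr)
```

With such a helper, both cookbooks would be loaded by calling the function once per file name.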

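One-hot encode the ingredient lists into a pandas DataFrame, as for the first cookbook. Note that this overwrites `te`, `te_ary` and `df`.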

In [9]:
te = TransactionEncoder()
te_ary = te.fit(hdg_arr).transform(hdg_arr)
df = pd.DataFrame(te_ary, columns=te.columns_)
df

,ackerbohne,ackerrettich,apfel,artischocken,austern,beeren,birne,blumenkohl,bratenjus,bratfett,...,weiße_bohnen,weißkohl,wirsing,wurzeln,zimt,zitrone,zucker,zuckerschoten,zwieback,zwiebel
0,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
1,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,True,False
2,False,False,False,False,False,False,False,False,False,False,...,False,False,False,True,False,False,True,False,False,False
3,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,True,False
4,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,True,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
90,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
91,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,True,False
92,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,True,False,False,False
93,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False


Find the frequent itemsets with the same minimum support of 20% and export them to `fp_hdg.xlsx`.

In [10]:
frequent_itemsets = apriori(df, min_support=0.2, use_colnames=True)
frequent_itemsets['length'] = frequent_itemsets['itemsets'].apply(len)
frequent_itemsets.to_excel('fp_hdg.xlsx')
frequent_itemsets

,support,itemsets,length
0,0.894737,(butter),1
1,0.315789,(ei),1
2,0.263158,(essig),1
3,0.305263,(fleischbrühe),1
4,0.368421,(kartoffel),1
5,0.4,(muskat),1
6,0.315789,(weizenmehl),1
7,0.252632,(zwieback),1
8,0.294737,"(ei, butter)",2
9,0.252632,"(essig, butter)",2


Derive the association rules, again keeping every rule whose lift is at least 0.7.

In [11]:
association_rules(frequent_itemsets, metric="lift", min_threshold=0.7)

,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
0,(ei),(butter),0.315789,0.894737,0.294737,0.933333,1.043137,0.012188,1.578947
1,(butter),(ei),0.894737,0.315789,0.294737,0.329412,1.043137,0.012188,1.020314
2,(essig),(butter),0.263158,0.894737,0.252632,0.96,1.072941,0.017175,2.631579
3,(butter),(essig),0.894737,0.263158,0.252632,0.282353,1.072941,0.017175,1.026747
4,(fleischbrühe),(butter),0.305263,0.894737,0.305263,1.0,1.117647,0.032133,inf
5,(butter),(fleischbrühe),0.894737,0.305263,0.305263,0.341176,1.117647,0.032133,1.054511
6,(kartoffel),(butter),0.368421,0.894737,0.284211,0.771429,0.862185,-0.045429,0.460526
7,(butter),(kartoffel),0.894737,0.368421,0.284211,0.317647,0.862185,-0.045429,0.92559
8,(muskat),(butter),0.4,0.894737,0.4,1.0,1.117647,0.042105,inf
9,(butter),(muskat),0.894737,0.4,0.4,0.447059,1.117647,0.042105,1.085106
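Two features of the rows shown above deserve a closer look: the kartoffel rules have a lift below 1, and two rules report a conviction of `inf`. Conviction is defined as (1 − consequent support) / (1 − confidence), so it is unbounded whenever the confidence is exactly 1. The sketch below picks out both cases from values copied from the table. The `Rule` tuple and the variable names are only illustrative.

```python
from collections import namedtuple

Rule = namedtuple("Rule", "antecedent consequent confidence lift")

# Values copied from the rows above that have butter as consequent.
rules = [
    Rule("ei", "butter", 0.933333, 1.043137),
    Rule("essig", "butter", 0.96, 1.072941),
    Rule("fleischbrühe", "butter", 1.0, 1.117647),
    Rule("kartoffel", "butter", 0.771429, 0.862185),
    Rule("muskat", "butter", 1.0, 1.117647),
]

# Lift < 1: the pair occurs together less often than independence would predict.
below_independence = [r.antecedent for r in rules if r.lift < 1]

# Confidence == 1: every recipe with the antecedent also contains butter,
# which is exactly the case where conviction is reported as inf.
always_with_butter = [r.antecedent for r in rules if r.confidence == 1.0]

print("lift < 1:", below_independence)
print("confidence = 1:", always_with_butter)
```

Read this way, fleischbrühe and muskat never appear without butter in this chapter, while kartoffel recipes use butter somewhat less often than the chapter average.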
