# Data Analysis: RecipeDB

## 4.1 Network Analysis - VoteRank Algorithm

### Ingredient - Ingredient Graph
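We now build a weighted, undirected graph over ingredients: two ingredients are connected if they appear together in at least one recipe, and the edge weight is the number of recipes they share. To compute the weights, we first build a dictionary that maps each ingredient to the recipes containing it.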
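
The weighting rule can be illustrated on a tiny example before running it on the full dataset. This is a minimal sketch, not the notebook's own method: the `rows` table is hypothetical toy data with the same two columns (`recipe_no`, `ingredient`), and the names `recipe_to_ingredients` and `pair_weights` are introduced only for this illustration.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical toy data: (recipe_no, ingredient) rows.
rows = [
    (1, "garlic"), (1, "onion"), (1, "salt"),
    (2, "garlic"), (2, "salt"),
    (3, "onion"), (3, "water"),
]

# Group the ingredients of each recipe into a set,
# so an ingredient listed twice in one recipe counts once.
recipe_to_ingredients = defaultdict(set)
for recipe_no, ingredient in rows:
    recipe_to_ingredients[recipe_no].add(ingredient)

# For every unordered ingredient pair, count the recipes that contain both.
pair_weights = Counter()
for ingredients in recipe_to_ingredients.values():
    for a, b in combinations(sorted(ingredients), 2):
        pair_weights[(a, b)] += 1

for pair, weight in sorted(pair_weights.items()):
    print(pair, weight)
```

In this toy table garlic and salt appear together in recipes 1 and 2, so their edge gets weight 2, while garlic and water never share a recipe and get no edge.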

In [1]:
#load libraries
import numpy as np
import pandas as pd 
import csv
import plotly
import plotly.express as px
import seaborn as sns
import matplotlib.pyplot as plt
import networkx as nx

In [2]:
#read the dataset (read_csv already returns a DataFrame)
data = pd.read_csv("datasets/dataset3.csv")
data_dict = data.to_dict()

In [3]:
#collect the unique ingredients and recipe numbers
ingr_arr = list(set(data_dict['ingredient'].values()))
reci_arr = list(set(data_dict['recipe_no'].values()))

#map each ingredient name to its integer index in ingr_arr
dict_ingrs = {ing: i for i, ing in enumerate(ingr_arr)}

f = data

#m is an upper bound on the ingredient indices; it sizes the square weight matrix below
#(only the first len(ingr_arr) rows and columns are actually filled)
m = max(len(reci_arr), len(ingr_arr))

#map each ingredient index to the set of recipes it appears in
source_dict = dict()
for ind in f.index:
    source = dict_ingrs[f['ingredient'][ind]]
    recipe_no = f["recipe_no"][ind]
    source_dict.setdefault(source, set()).add(recipe_no)

#square matrix of pair weights, initialised to zero
weights = [[0]*(m+1) for _ in range(m+1)]

#weights[a][b] = number of recipes that contain both ingredient a and ingredient b
for ingredient in source_dict:
    for source2 in source_dict:
        if ingredient != source2:
            weights[ingredient][source2] = len(source_dict[ingredient] & source_dict[source2])

#### Saving in a csv file
The file 'ingredient_weights.csv' contains the weighted graph as an edge list: each row holds two ingredients and the number of recipes they share.
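
To make the row layout concrete, here is a minimal sketch of such an edge list, assuming one row per ingredient pair in the order ingredient, ingredient, weight. The pair counts in `toy_weights` are hypothetical, and the rows go to an in-memory buffer rather than to the real file.

```python
import csv
import io

# Hypothetical pair weights: (ingredient, ingredient) -> number of shared recipes.
toy_weights = {
    ("garlic", "salt"): 2,
    ("garlic", "onion"): 1,
    ("onion", "water"): 1,
}

# Write one "ingredient,ingredient,weight" row per pair.
buffer = io.StringIO()
writer = csv.writer(buffer)
for (a, b), weight in toy_weights.items():
    writer.writerow([a, b, weight])

edge_lines = buffer.getvalue().splitlines()

# Read the rows back, converting the weight column to an integer.
edges = [(a, b, int(w)) for a, b, w in csv.reader(edge_lines)]
print(edge_lines)
```

Writing each pair once is enough for an undirected graph; writing it in both directions, as the loop over `i` and `j` does, describes the same set of edges.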

In [4]:
#save the graph as an edge list, in the output_files/ location the VoteRank cell reads from
with open("output_files/ingredient_weights.csv",'w',newline='') as csv_file:
    csv_writer = csv.writer(csv_file)

    #weights is indexed by ingredient, so both loops run over ingr_arr
    for i in range(len(ingr_arr)):
        for j in range(len(ingr_arr)):
            if(weights[i][j]>0 and i!=j):
                if(ingr_arr[i]!='nan' and ingr_arr[j]!='nan'):
                    csv_writer.writerow([ingr_arr[i],ingr_arr[j],weights[i][j]])

### VoteRank Algorithm
---
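The VoteRank algorithm identifies the most influential nodes in a network through iterative voting: in each round every node votes for its neighbours, the node with the most votes is elected, and the voting ability of the elected node's neighbours is reduced before the next round. As a result, the elected nodes tend to be spread across the network rather than clustered together. Note that the networkx `voterank` function takes no weight argument and ranks nodes from the graph topology alone, so the shared-recipe counts stored as edge weights do not affect the ranking.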

Top 5 influential/key ingredients in our network, in the order returned by VoteRank:
1. water
2. garlic
3. salt
4. onion
5. vegetable oil

<em>Reference</em>: Zhang, J.X., Chen, D.B., Dong, Q. and Zhao, Z.D., 2016. Identifying a set of influential spreaders in complex networks. Scientific Reports, 6, p.27823.
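
As a rough illustration of the voting scheme, the following is a simplified sketch for an undirected, unweighted graph. It is not the networkx implementation: the function name `voterank_sketch`, the toy edges and the tie-breaking by sorted node order are assumptions made for this example.

```python
from collections import defaultdict


def voterank_sketch(edges, k=None):
    """Simplified VoteRank on an undirected, unweighted graph given as edge pairs."""
    neighbors = defaultdict(set)
    for u, v in edges:
        if u != v:
            neighbors[u].add(v)
            neighbors[v].add(u)
    if not neighbors:
        return []

    nodes = sorted(neighbors)  # fixed order, so ties break deterministically
    avg_degree = sum(len(neighbors[n]) for n in nodes) / len(nodes)
    ability = {n: 1.0 for n in nodes}  # voting ability of each node
    elected = []
    limit = len(nodes) if k is None else min(k, len(nodes))

    while len(elected) < limit:
        # A candidate's score is the total voting ability of its neighbours.
        scores = {
            n: sum(ability[m] for m in neighbors[n])
            for n in nodes
            if n not in elected
        }
        best = max(scores, key=scores.get)
        if scores[best] == 0:
            break  # nobody left with any votes
        elected.append(best)
        # The elected node stops voting; its neighbours' votes are weakened.
        ability[best] = 0.0
        for m in neighbors[best]:
            ability[m] = max(ability[m] - 1.0 / avg_degree, 0.0)
    return elected


# Hypothetical toy graph of ingredients that share at least one recipe.
toy_edges = [
    ("garlic", "onion"), ("garlic", "salt"), ("garlic", "water"),
    ("salt", "water"), ("onion", "pepper"),
]
top_two = voterank_sketch(toy_edges, k=2)
print(top_two)  # ['garlic', 'onion']
```

After garlic is elected, salt and water lose voting ability because they are its neighbours, which is why onion, backed by the still full-strength pepper, is elected second.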

In [5]:
#read the weighted edge list into an undirected graph
G = nx.read_edgelist('output_files/ingredient_weights.csv', delimiter=',', encoding='latin1', create_using=nx.Graph(), nodetype=str, data=(('weight',int),))

#rank the nodes with the networkx VoteRank implementation
voteRankList = nx.algorithms.centrality.voterank(G)

#write the ranking, most influential first, one node per row
with open('output_files/voteRank_ingredient.csv','w',newline='') as csv_file:
    csv_writer = csv.writer(csv_file)
    csv_writer.writerow(['Nodes'])
    for i in voteRankList:
        csv_writer.writerow([i])

print(voteRankList[:5])

['water', 'garlic', 'salt', 'onion', 'vegetable oil']


The file 'voteRank_ingredient.csv' stores the ingredients in VoteRank order, most influential first, for the graph built from shared recipes.