# Food Dude
### Jaiden Gerig, Oron Hazi, Justin Katz, Kyle Wilson

## Importing Our Data
Luckily, this is all in a CSV file, so we can grab it easily with pandas.

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

foods = pd.read_csv('ABBREV.csv', index_col=0)
foods.head()

NDB_No,Shrt_Desc,Water_(g),Energ_Kcal,Protein_(g),Lipid_Tot_(g),Ash_(g),Carbohydrt_(g),Fiber_TD_(g),Sugar_Tot_(g),Calcium_(mg),...,Vit_K_(µg),FA_Sat_(g),FA_Mono_(g),FA_Poly_(g),Cholestrl_(mg),GmWt_1,GmWt_Desc1,GmWt_2,GmWt_Desc2,Refuse_Pct
1001,"BUTTER,WITH SALT",15.87,717,0.85,81.11,2.11,0.06,0.0,0.06,24.0,...,7.0,51.368,21.021,3.043,215.0,5.0,"1 pat, (1"" sq, 1/3"" high)",14.2,1 tbsp,0.0
1002,"BUTTER,WHIPPED,W/ SALT",16.72,718,0.49,78.3,1.62,2.87,0.0,0.06,23.0,...,4.6,45.39,19.874,3.331,225.0,3.8,"1 pat, (1"" sq, 1/3"" high)",9.4,1 tbsp,0.0
1003,"BUTTER OIL,ANHYDROUS",0.24,876,0.28,99.48,0.0,0.0,0.0,0.0,4.0,...,8.6,61.924,28.732,3.694,256.0,12.8,1 tbsp,205.0,1 cup,0.0
1004,"CHEESE,BLUE",42.41,353,21.4,28.74,5.11,2.34,0.0,0.5,528.0,...,2.4,18.669,7.778,0.8,75.0,28.35,1 oz,17.0,1 cubic inch,0.0
1005,"CHEESE,BRICK",41.11,371,23.24,29.68,3.18,2.79,0.0,0.51,674.0,...,2.5,18.764,8.598,0.784,94.0,132.0,"1 cup, diced",113.0,"1 cup, shredded",0.0


## Washing Our Food
To start off, we only wanted to look at the main nutritional values (protein, fat, sodium, etc.) of each food.

In [2]:
# Rename some of our columns to something a bit easier on the eyes
foods = foods.rename(index=str,columns={'Shrt_Desc':'Name','Protein_(g)':'Protein (g)','Lipid_Tot_(g)':'Total Fat(g)','Cholestrl_(mg)':'Cholesterol (mg)',
               'FA_Sat_(g)':'Saturated Fat (g)','Sodium_(mg)':'Sodium (mg)','Potassium_(mg)':'Potassium (mg)',
               'Carbohydrt_(g)':'Carbohydrates (g)','Fiber_TD_(g)':'Fiber (g)','GmWt_1':'Weight (g)'})
# Look at a specific subset of nutrients
nutrients = ['Name','Protein (g)','Total Fat(g)','Cholesterol (mg)',
               'Saturated Fat (g)','Sodium (mg)','Potassium (mg)',
               'Carbohydrates (g)','Fiber (g)','Weight (g)']
foods = foods[nutrients]
foods = foods.fillna(0)
# Get rid of foods we don't have serving sizes for
foods = foods[foods.apply(lambda x:x['Weight (g)'] > 0, axis=1)]
foods.head()

NDB_No,Name,Protein (g),Total Fat(g),Cholesterol (mg),Saturated Fat (g),Sodium (mg),Potassium (mg),Carbohydrates (g),Fiber (g),Weight (g)
1001,"BUTTER,WITH SALT",0.85,81.11,215.0,51.368,643.0,24.0,0.06,0.0,5.0
1002,"BUTTER,WHIPPED,W/ SALT",0.49,78.3,225.0,45.39,583.0,41.0,2.87,0.0,3.8
1003,"BUTTER OIL,ANHYDROUS",0.28,99.48,256.0,61.924,2.0,5.0,0.0,0.0,12.8
1004,"CHEESE,BLUE",21.4,28.74,75.0,18.669,1146.0,256.0,2.34,0.0,28.35
1005,"CHEESE,BRICK",23.24,29.68,94.0,18.764,560.0,136.0,2.79,0.0,132.0


We wanted to look at the nutritional value of each food regardless of serving size, so we normalized them to 1 gram.

In [3]:
# Normalizing all our foods to 1 g
def normalizeNutrients(x):
    ratio = x['Weight (g)']
    for nutrient in nutrients:
        if(type(x[nutrient]) is str):
            continue
        x[nutrient] = x[nutrient]/ratio
    return x
foods = foods.apply(normalizeNutrients,axis=1)
foods.head()

NDB_No,Name,Protein (g),Total Fat(g),Cholesterol (mg),Saturated Fat (g),Sodium (mg),Potassium (mg),Carbohydrates (g),Fiber (g),Weight (g)
1001,"BUTTER,WITH SALT",0.17,16.222,43.0,10.2736,128.6,4.8,0.012,0.0,1.0
1002,"BUTTER,WHIPPED,W/ SALT",0.128947,20.605263,59.210526,11.944737,153.421053,10.789474,0.755263,0.0,1.0
1003,"BUTTER OIL,ANHYDROUS",0.021875,7.771875,20.0,4.837813,0.15625,0.390625,0.0,0.0,1.0
1004,"CHEESE,BLUE",0.75485,1.013757,2.645503,0.658519,40.42328,9.029982,0.08254,0.0,1.0
1005,"CHEESE,BRICK",0.176061,0.224848,0.712121,0.142152,4.242424,1.030303,0.021136,0.0,1.0
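
The per-row loop in `normalizeNutrients` can also be expressed as one column-wise division. The following is only a sketch on a made-up two-row frame (the `toy` data and the `per_gram` name are hypothetical, not part of the notebook); it relies on `DataFrame.div` with `axis=0` to divide each row by its own weight.

```python
import pandas as pd

# Hypothetical toy data standing in for the cleaned `foods` frame
toy = pd.DataFrame({
    'Name': ['A', 'B'],
    'Protein (g)': [10.0, 6.0],
    'Sodium (mg)': [50.0, 30.0],
    'Weight (g)': [5.0, 2.0],
})

# Pick out the numeric columns so that 'Name' is left untouched
numeric_cols = toy.select_dtypes(include='number').columns

# Divide every numeric value in a row by that row's weight
per_gram = toy.copy()
per_gram[numeric_cols] = toy[numeric_cols].div(toy['Weight (g)'], axis=0)
print(per_gram)
```

As in the cell above, the weight column is divided by itself, so it ends up as 1.0 for every row.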


## Making a Meal
We started our food analysis by taking all the foods and scaling them up to meet the demands of an average 2000-calorie diet, according to [Netrition](http://www.netrition.com/rdi_page.html).

The first step we took was to exclude foods that didn't contain all the nutrients we were looking at.

In [4]:
# We only want foods that have a chance to sustain our needs
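def filterNutrients(x):
    for nutrient in nutrients:
        # Skip text columns such as 'Name'; only numeric values can be compared to 0
        if(type(x[nutrient]) is str):
            continue
        if(x[nutrient] <= 0):
            return False
    return True
foods = foods[foods.apply(filterNutrients, axis=1)]
print('Matching Foods: %d' % len(foods))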
foods.head()

Matching Foods: 1396


NDB_No,Name,Protein (g),Total Fat(g),Cholesterol (mg),Saturated Fat (g),Sodium (mg),Potassium (mg),Carbohydrates (g),Fiber (g),Weight (g)
1013,"CHEESE,COTTAGE,CRMD,W/FRUIT",0.094602,0.034071,0.115044,0.020451,3.044248,0.79646,0.040796,0.00177,1.0
1043,"CHEESE,PAST PROCESS,PIMENTO",0.158071,0.222857,0.671429,0.14045,6.535714,1.157143,0.012357,0.000714,1.0
1102,"MILK,CHOC,FLUID,COMM,WHL,W/ ADDED VIT A & VITA...",0.01268,0.01356,0.048,0.008416,0.24,0.668,0.04136,0.0032,1.0
1103,"MILK,CHOC,FLUID,COMM,RED FAT",0.01196,0.0076,0.032,0.004708,0.264,0.676,0.04852,0.0028,1.0
1104,"MILK,CHOC,LOWFAT,W/ ADDED VIT A & VITAMIN D",0.01384,0.004,0.02,0.002336,0.26,0.688,0.03944,0.0004,1.0
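
The same "every nutrient must be present" test can be written without a Python-level loop. This is a rough sketch on invented data (`toy`, `has_all` and `matching` are hypothetical names): compare all numeric columns to zero at once, then keep the rows where every comparison holds.

```python
import pandas as pd

# Hypothetical toy data; only food 'A' has a positive amount of every nutrient
toy = pd.DataFrame({
    'Name': ['A', 'B', 'C'],
    'Protein (g)': [0.2, 0.0, 0.1],
    'Fiber (g)': [0.01, 0.02, 0.0],
})

# Compare only the numeric columns, so 'Name' never enters the test
numeric_cols = toy.select_dtypes(include='number').columns

# True for rows in which every numeric column is strictly positive
has_all = toy[numeric_cols].gt(0).all(axis=1)

matching = toy[has_all]
print('Matching Foods: %d' % len(matching))
```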


Nice! That leaves us with almost 1400 foods to look at, not too shabby.
So let's take these foods and scale them up to see how many grams of each we would have to consume to fulfill our daily nutrition requirements.

In [28]:
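# Recommended daily values from http://www.netrition.com/rdi_page.html,
# listed in the same order as `nutrients`; -1 marks columns with no target (Name, Weight)
recommended = [-1,50,65,300,20,2400,3500,300,25,-1]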
def findSatisfyingWeight(food):
    for x in range(0,len(nutrients)):
        nutrient = nutrients[x]
        rec = recommended[x]
        if(rec == -1 or food[nutrient] >= rec):
            continue
        ratio = rec/food[nutrient]
        for y in nutrients:
            if(type(food[y]) is str):
                continue
            food[y] = food[y]*ratio
    return food   
weighted_foods = foods.apply(findSatisfyingWeight,axis=1)
display = ['Name','Weight (g)','Protein (g)','Total Fat(g)','Cholesterol (mg)',
               'Saturated Fat (g)','Sodium (mg)','Potassium (mg)',
               'Carbohydrates (g)','Fiber (g)']
weighted_foods[display].sort_values(by='Weight (g)').head(10)

NDB_No,Name,Weight (g),Protein (g),Total Fat(g),Cholesterol (mg),Saturated Fat (g),Sodium (mg),Potassium (mg),Carbohydrates (g),Fiber (g)
43260,"BEVERAGE,INST BRKFST PDR,CHOC,SUGAR-FREE,NOT R...",71.372549,456.27451,65.0,560.784314,27.554902,9138.235294,21730.392157,522.54902,25.490196
6959,"GRAVY,INST TURKEY,DRY",83.75,146.5,183.25,300.0,61.6375,51125.0,3825.0,719.5,47.5
21422,"KFC,POPCORN CHICK",160.0,441.75,543.5,1000.0,98.85,28500.0,7200.0,529.5,25.0
6957,"GRAVY,BROWN INST,DRY",167.5,213.25,296.25,300.0,146.65,126325.0,9175.0,1494.5,80.0
43078,"BEVERAGE,MILKSHAKE MIX,DRY,NOT CHOC",175.0,587.5,65.0,350.0,51.475,19500.0,55000.0,1322.5,40.0
1223,"PROTEIN SUPP,MILK BSD,MUSCLE MILK,PDR",178.378378,741.243243,277.945946,340.540541,25.183784,5335.135135,18308.108108,300.0,115.135135
6958,"GRAVY,INST BF,DRY",182.727273,267.272727,258.545455,300.0,133.090909,141900.0,12272.727273,1666.363636,117.272727
6124,"GRAVY,PORK,DRY,PDR",201.0,263.4,258.9,300.0,128.7,160680.0,7050.0,1907.1,72.0
19120,"CANDIES,MILK CHOC",212.658228,232.405063,901.063291,698.734177,562.298734,2400.0,11301.265823,1804.556962,103.291139
31019,"SEAWEED,CANADIAN CULTIVATED EMI-TSUNOMATA,DRY",233.81295,717.338129,65.0,1543.165468,21.043165,202528.776978,137669.064748,2162.302158,1716.18705
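
Because each step in `findSatisfyingWeight` only ever scales a food up, the net effect of the loop is a single multiplier: the largest ratio of recommended amount to per-gram amount, and never less than 1. The sketch below illustrates that arithmetic with invented per-gram values (the `scale_factor` helper, the three-nutrient subset and the numbers in `per_gram` are assumptions for illustration only).

```python
# Recommended daily amounts for a subset of the nutrients used above
recommended = {'Protein (g)': 50, 'Sodium (mg)': 2400, 'Fiber (g)': 25}

# Hypothetical amounts per gram of one food
per_gram = {'Protein (g)': 0.5, 'Sodium (mg)': 60.0, 'Fiber (g)': 0.125}

def scale_factor(food, targets):
    """Smallest multiplier (at least 1) that lifts every nutrient to its target."""
    return max(1.0, max(targets[n] / food[n] for n in targets))

# The scarcest nutrient (fiber here) decides how much must be eaten
factor = scale_factor(per_gram, recommended)

# Scaling every nutrient by the same factor keeps the food's proportions
scaled = {n: per_gram[n] * factor for n in per_gram}
print(factor)
print(scaled)
```

Since the amounts are per gram, the factor is also the number of grams needed. Every nutrient other than the scarcest one overshoots its target, which is why the table above is full of large excesses.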


Yikes, none of that seems very healthy at all

To put some of these foods into perspective:

* KFC Popcorn Chicken: 160 grams = 25 pieces
* Milkshake Mix: 175 grams = 25 tablespoons
* Milk Chocolate: 213 grams = 30 bars

Let's see what happens when we look at how far over the recommended nutritional values these foods go.

In [70]:
from bokeh.io import output_notebook, show
from bokeh.layouts import row
from bokeh.charts import Bar
from bokeh.charts.operations import blend
from bokeh.models import Range1d
from bokeh import palettes

# Load BokehJS first so every show() call below renders inline
output_notebook()

# Subtract the recommended amount from each nutrient to get the excess
def findOverages(food):
    for x in range(0,len(nutrients)):
        nutrient = nutrients[x]
        rec = recommended[x]
        if(rec == -1):
            continue
        food[nutrient] -= rec
    return food
overage_foods = weighted_foods.apply(findOverages,axis=1)
df = overage_foods.sort_values(by='Weight (g)').head(10)
a = Bar(df, 'Name', values='Protein (g)', title="Excess Protein",legend=False,width=450,continuous_range=Range1d(0,100))
b = Bar(df, 'Name', values='Total Fat(g)', title="Excess Total Fat",legend=False,width=450)
a.xaxis.axis_label = ""
b.xaxis.axis_label = ""
show(row(a,b))
a = Bar(df, 'Name', values='Cholesterol (mg)', title="Excess Cholesterol",legend=False,width=450)
b = Bar(df, 'Name', values='Saturated Fat (g)', title="Excess Saturated Fats",legend=False,width=450)
a.xaxis.axis_label = ""
b.xaxis.axis_label = ""
show(row(a,b))
a = Bar(df, 'Name', values='Sodium (mg)', title="Excess Sodium",legend=False,width=450)
b = Bar(df, 'Name', values='Potassium (mg)', title="Excess Potassium",legend=False,width=450)
a.xaxis.axis_label = ""
b.xaxis.axis_label = ""
show(row(a,b))
a = Bar(df, 'Name', values='Carbohydrates (g)', title="Excess Carbohydrates",legend=False,width=450)
b = Bar(df, 'Name', values='Fiber (g)', title="Excess Fiber",legend=False,width=450)
a.xaxis.axis_label = ""
b.xaxis.axis_label = ""
show(row(a,b))

So that gives us a general idea of which kinds of foods have common excesses, but it's hard to compare them to each other. Let's convert them to percentages over 100% of the 2000-calorie diet's recommended values.

In [29]:
def findPercentOverages(food):
    for x in range(0,len(nutrients)):
        nutrient = nutrients[x]
        rec = recommended[x]
        if(rec == -1):
            continue
        food[nutrient] = ((food[nutrient]/rec)*100)-100
    return food   
overage_foods = weighted_foods.apply(findPercentOverages,axis=1)
overage_foods[display].sort_values(by='Weight (g)').head(10).rename(index=str,columns={'Protein (g)':'Protein (%)','Total Fat(g)':'Total Fat(%)','Cholesterol (mg)':'Cholesterol (%)',
               'Saturated Fat (g)':'Saturated Fat (%)','Sodium (mg)':'Sodium (%)','Potassium (mg)':'Potassium (%)',
               'Carbohydrates (g)':'Carbohydrates (%)','Fiber (g)':'Fiber (%)'})

NDB_No,Name,Weight (g),Protein (%),Total Fat(%),Cholesterol (%),Saturated Fat (%),Sodium (%),Potassium (%),Carbohydrates (%),Fiber (%)
43260,"BEVERAGE,INST BRKFST PDR,CHOC,SUGAR-FREE,NOT R...",71.372549,812.54902,0.0,86.928105,37.77451,280.759804,520.868347,74.183007,1.960784
6959,"GRAVY,INST TURKEY,DRY",83.75,193.0,181.923077,0.0,208.1875,2030.208333,9.285714,139.833333,90.0
21422,"KFC,POPCORN CHICK",160.0,783.5,736.153846,233.333333,394.25,1087.5,105.714286,76.5,0.0
6957,"GRAVY,BROWN INST,DRY",167.5,326.5,355.769231,0.0,633.25,5163.541667,162.142857,398.166667,220.0
43078,"BEVERAGE,MILKSHAKE MIX,DRY,NOT CHOC",175.0,1075.0,0.0,16.666667,157.375,712.5,1471.428571,340.833333,60.0
1223,"PROTEIN SUPP,MILK BSD,MUSCLE MILK,PDR",178.378378,1382.486486,327.609148,13.513514,25.918919,122.297297,423.088803,0.0,360.540541
6958,"GRAVY,INST BF,DRY",182.727273,434.545455,297.762238,0.0,565.454545,5812.5,250.649351,455.454545,369.090909
6124,"GRAVY,PORK,DRY,PDR",201.0,426.8,298.307692,0.0,543.5,6595.0,101.428571,535.7,188.0
19120,"CANDIES,MILK CHOC",212.658228,364.810127,1286.251217,132.911392,2711.493671,0.0,222.893309,501.518987,313.164557
31019,"SEAWEED,CANADIAN CULTIVATED EMI-TSUNOMATA,DRY",233.81295,1334.676259,0.0,414.388489,5.215827,8338.699041,3833.40185,620.767386,6764.748201
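
The conversion in `findPercentOverages` is `(amount / recommended) * 100 - 100`, so a food that exactly meets a target shows 0 and one that doubles it shows 100. As a quick check, here is a vectorized sketch of the same formula using two rows and two columns copied from the scaled table earlier in this section (the variable names `scaled` and `percent_over` are hypothetical):

```python
import pandas as pd

# Recommended daily amounts for the two nutrients in this example
recommended = pd.Series({'Protein (g)': 50, 'Cholesterol (mg)': 300})

# Scaled amounts for two foods, taken from the scaled-foods table
scaled = pd.DataFrame(
    {'Protein (g)': [146.5, 441.75], 'Cholesterol (mg)': [300.0, 1000.0]},
    index=['GRAVY,INST TURKEY,DRY', 'KFC,POPCORN CHICK'])

# Divide each column by its own recommended amount, then express as % over 100%
percent_over = scaled.div(recommended, axis=1) * 100 - 100
print(percent_over.round(2))
```

The results line up with the corresponding protein and cholesterol entries in the percentage table above.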


And now let's chart them again with a uniform scale


In [60]:
df = overage_foods.sort_values(by='Weight (g)').head(10).rename(index=str,columns={'Protein (g)':'Protein (%)','Total Fat(g)':'Total Fat(%)','Cholesterol (mg)':'Cholesterol (%)',
               'Saturated Fat (g)':'Saturated Fat (%)','Sodium (mg)':'Sodium (%)','Potassium (mg)':'Potassium (%)',
               'Carbohydrates (g)':'Carbohydrates (%)','Fiber (g)':'Fiber (%)'})
# Group the bars by food, with one bar per nutrient
a = Bar(df, label='vars',group='Name',
        values=blend('Protein (%)', 'Total Fat(%)','Cholesterol (%)',
                     'Saturated Fat (%)','Sodium (%)','Potassium (%)',
                     'Carbohydrates (%)','Fiber (%)',name='values', labels_name='vars'),
        title="Excess Nutrients (% above recommended daily intake)",width=900)
a.xaxis.axis_label = ""
a.yaxis.axis_label = "% above recommended daily intake"
show(a)
output_notebook()

Holy Guacamole! Look at that sodium!

It looks like potassium and sodium are crazy high compared to the other nutrients. Units alone can't explain this, since every column is a percentage of its own recommended value (and cholesterol is measured in milligrams too), so let's see what the graph looks like without them to get a better understanding of the other nutrients.

In [79]:
df = overage_foods.sort_values(by='Weight (g)').head(10).rename(index=str,columns={'Protein (g)':'Protein (%)','Total Fat(g)':'Total Fat(%)','Cholesterol (mg)':'Cholesterol (%)',
               'Saturated Fat (g)':'Saturated Fat (%)','Sodium (mg)':'Sodium (%)','Potassium (mg)':'Potassium (%)',
               'Carbohydrates (g)':'Carbohydrates (%)','Fiber (g)':'Fiber (%)'})
# Same grouped chart as before, leaving out the sodium and potassium columns
a = Bar(df, label='vars',group='Name',
        values=blend('Protein (%)', 'Total Fat(%)','Cholesterol (%)',
                     'Saturated Fat (%)',
                     'Carbohydrates (%)','Fiber (%)',name='values', labels_name='vars'),
        title="Excess Nutrients (% above recommended daily intake) (Excluding Sodium & Potassium)",width=900,height=1000,palette=palettes.BrBG11)
a.xaxis.axis_label = ""
a.yaxis.axis_label = "% above recommended daily intake"
show(a)
output_notebook()
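
The `bokeh.charts` module used above was later split out of Bokeh into a separate package, so the `Bar` and `blend` calls may not run on a current Bokeh install. A rough equivalent of the grouped bar chart, assuming only pandas and matplotlib, might look like the sketch below. It uses two foods and three columns copied from the percentage table; `df_small` and `ax` are hypothetical names.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Percent-over-recommended values for two foods, copied from the percentage table
df_small = pd.DataFrame(
    {'Protein (%)': [193.0, 783.5],
     'Total Fat(%)': [181.923077, 736.153846],
     'Carbohydrates (%)': [139.833333, 76.5]},
    index=['GRAVY,INST TURKEY,DRY', 'KFC,POPCORN CHICK'])

# One group of bars per food, one bar per nutrient column
ax = df_small.plot.bar(figsize=(9, 5), rot=0)
ax.set_xlabel('')
ax.set_ylabel('% above recommended daily intake')
ax.set_title('Excess Nutrients (% above recommended daily intake)')
plt.tight_layout()
```

Dropping columns from `df_small` before plotting plays the same role as leaving them out of the `blend` call.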

So overall, it appears that these foods provide mainly carbs and sodium at the expense of other nutrients.