# Finding the right ingredients to provide

The dataframe below holds nutritional information for a wide variety of food products. For each product, the fields we are most interested in for our analysis are:
* **Food Group** - the general category the product belongs to
* **Food Name** - the name of the product itself
* **Protein (g)** - grams of protein in a 100g serving
* **Carbohydrates (g)** - grams of carbohydrates in a 100g serving
* **Fat (g)** - grams of fat in a 100g serving

In [82]:
import pandas as pd

# Load the first sheet of the USDA workbook and keep only the columns we analyze.
usda_foods = pd.read_excel("data/USDA-Food.xlsx", sheet_name=0)
test = usda_foods[['Food Group', 'Food Name', 'Protein (g)', 'Carbohydrates (g)', 'Fat (g)']].copy()

In [83]:
test.head()

Unnamed: 0,Food Group,Food Name,Protein (g),Carbohydrates (g),Fat (g)
0,Dairy and Egg Products,"Butter, salted",0.85,0.06,81.11
1,Dairy and Egg Products,"Butter, whipped, with salt",0.49,2.87,78.3
2,Dairy and Egg Products,"Butter oil, anhydrous",0.28,0.0,99.48
3,Dairy and Egg Products,"Cheese, blue",21.4,2.34,28.74
4,Dairy and Egg Products,"Cheese, brick",23.24,2.79,29.68


From the National Agricultural Library (https://www.nal.usda.gov/fnic/how-many-calories-are-one-gram-fat-carbohydrate-or-protein), we know that 1 gram of protein, fat and carbohydrates provides 4, 9 and 4 kcal, respectively.
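As a quick illustration of these conversion factors, the sketch below computes the energy of a 100g serving from its macronutrients. The helper name `total_kcal` and the `KCAL_PER_GRAM` dictionary are made up for this example; the input values are those of "Butter, salted" in the table above.

```python
# Energy per gram of each macronutrient, in kcal (values quoted in the text).
KCAL_PER_GRAM = {"protein": 4, "carbohydrates": 4, "fat": 9}

def total_kcal(protein_g, carbs_g, fat_g):
    """Return the energy (kcal) supplied by the given grams of macronutrients."""
    return (protein_g * KCAL_PER_GRAM["protein"]
            + carbs_g * KCAL_PER_GRAM["carbohydrates"]
            + fat_g * KCAL_PER_GRAM["fat"])

# "Butter, salted": 0.85 g protein, 0.06 g carbohydrates, 81.11 g fat per 100 g.
butter_kcal = total_kcal(0.85, 0.06, 81.11)
print(round(butter_kcal, 2))  # 3.4 + 0.24 + 729.99 = 733.63
```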

We therefore assume that each person's diet should draw its calories as follows (**REFERENCE**):
* 55% from proteins
* 25% from carbohydrates 
* 20% from fats.
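To make these shares concrete, here is a minimal sketch that converts them into grams for a given calorie total. The 2000 kcal figure is only an illustrative assumption, and `target_grams` is a hypothetical helper, not part of the analysis.

```python
# Target share of total calories per macronutrient (from the list above).
TARGET_SHARE = {"protein": 0.55, "carbohydrates": 0.25, "fat": 0.20}
# Energy per gram, in kcal.
KCAL_PER_GRAM = {"protein": 4, "carbohydrates": 4, "fat": 9}

def target_grams(total_kcal):
    """Return the grams of each macronutrient that match the target shares."""
    return {name: total_kcal * share / KCAL_PER_GRAM[name]
            for name, share in TARGET_SHARE.items()}

# Illustrative total only: 2000 kcal is an assumption made for this example.
targets = target_grams(2000)
for name, grams in targets.items():
    print(f"{name}: {grams:.1f} g")
```

With these shares, 2000 kcal corresponds to roughly 275 g of protein, 125 g of carbohydrates and 44.4 g of fat.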

To decide which products to provide to the countries in need, we apply a greedy ranking that looks for the products whose macronutrient composition most closely matches these percentages. A lower rank means a closer match.

In [84]:
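def rank_food(food):
    """Score how far a product is from the 55/25/20 calorie split.

    Lower is better; -1 flags products with no protein, carbohydrates or fat.
    """
    prot = food['Protein (g)']
    carb = food['Carbohydrates (g)']
    fat = food['Fat (g)']

    # No macronutrients at all: nothing to rank.
    if prot == 0 and carb == 0 and fat == 0:
        return -1

    # Total energy of a 100g serving, in kcal (4, 4 and 9 kcal per gram).
    tot = prot * 4 + carb * 4 + fat * 9

    # Gap, in grams, between the ideal and the actual amount of each
    # macronutrient for this calorie total, divided by the 100g serving size.
    err_prot = abs(tot*0.55/4 - prot) / 100
    err_carb = abs(tot*0.25/4 - carb) / 100
    err_fat = abs(tot*0.20/9 - fat) / 100

    # Mean of the three errors.
    avg_err = (err_prot + err_carb + err_fat)/3

    return avg_err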
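
To see how a rank is built up, the sketch below repeats the same arithmetic on plain numbers and returns the per-nutrient errors alongside their mean. `rank_breakdown` is a hypothetical helper written for this illustration; the input values are those of "Yogurt, Greek, plain, lowfat" (9.95 g protein, 3.94 g carbohydrates, 1.92 g fat per 100g).

```python
def rank_breakdown(prot, carb, fat):
    """Return (per-nutrient errors, mean error) using the rank_food arithmetic."""
    # Total energy of the serving, in kcal.
    tot = prot * 4 + carb * 4 + fat * 9
    errors = {
        "protein": abs(tot * 0.55 / 4 - prot) / 100,
        "carbohydrates": abs(tot * 0.25 / 4 - carb) / 100,
        "fat": abs(tot * 0.20 / 9 - fat) / 100,
    }
    average = sum(errors.values()) / 3
    return errors, average

errors, average = rank_breakdown(9.95, 3.94, 1.92)
for name, err in errors.items():
    print(f"{name}: {err:.6f}")
print(f"average: {average:.6f}")
```

For this product the carbohydrate term is the largest of the three, and the mean comes out at about 0.003264.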

In [85]:
test['rank'] = test.apply(rank_food, axis=1)

In [86]:
test = test.drop(test[test['rank'] < 0].index)

In [87]:
# For each food group, find the lowest rank, then merge back to recover the matching product(s).
test.groupby(['Food Group'], as_index=False)['rank'].min().merge(test)

Unnamed: 0,Food Group,rank,Food Name,Protein (g),Carbohydrates (g),Fat (g)
0,Baby Foods,0.011343,"Fluid replacement, electrolyte solution (inclu...",0.0,2.45,0.0
1,Baked Products,0.065894,"Leavening agents, yeast, baker's, compressed",8.4,18.1,1.9
2,Beef Products,0.021907,"Beef, variety meats and by-products, tripe, co...",11.71,1.99,4.05
3,Beverages,0.000139,"Beverages, water, bottled, non-carbonated, DAN...",0.0,0.03,0.0
4,Breakfast Cereals,0.304611,"Cereals ready-to-eat, POST GREAT GRAINS Banana...",9.8,70.9,8.8
5,Cereal Grains and Pasta,0.046066,"Oat bran, cooked",3.21,11.44,0.86
6,Dairy and Egg Products,0.003264,"Yogurt, Greek, plain, lowfat",9.95,3.94,1.92
7,Fats and Oils,0.016926,"Salad dressing, sweet and sour",0.1,3.7,0.0
8,Finfish and Shellfish Products,0.00055,"Mollusks, clam, mixed species, canned, liquid",0.4,0.1,0.02
9,Fruits and Fruit Juices,0.019102,"Rhubarb, raw",0.9,4.54,0.2
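
In the output above, bottled water and an electrolyte solution get the lowest rank in their groups. Because `rank_food` measures the error in grams, the error shrinks with the product's total calories, so products with very few calories score well whatever their composition. One possible alternative, which is not the approach used in this analysis, is to compare calorie shares instead of grams. The sketch below assumes that variant; `share_rank` is a hypothetical name.

```python
# Target calorie shares: protein, carbohydrates, fat (from the text above).
TARGET_SHARE = (0.55, 0.25, 0.20)

def share_rank(prot, carb, fat):
    """Mean absolute gap between actual and target calorie shares (lower is better)."""
    kcal = (prot * 4, carb * 4, fat * 9)
    tot = sum(kcal)
    if tot == 0:
        return -1  # same sentinel as rank_food for products without macronutrients
    return sum(abs(k / tot - t) for k, t in zip(kcal, TARGET_SHARE)) / 3

# Values taken from the output table above.
water = share_rank(0.0, 0.03, 0.0)      # bottled water: all calories from carbohydrates
yogurt = share_rank(9.95, 3.94, 1.92)   # Greek yogurt, plain, lowfat
print(f"water: {water:.4f}, yogurt: {yogurt:.4f}")
```

Under this variant the yogurt scores far better than the water, since its calorie split is close to the target while the water's calories come entirely from carbohydrates.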
