# **`Project 2: Team Thomas Allinson`**

### **Objective**: Analyze the comparative costs of a vegan diet versus an omnivorous diet within the American population, with a specific focus on their environmental impact.

#### Group Members:
> Johann: johann.dicken@berkeley.edu <br>
> Laure: laureho@berkeley.edu <br>
> Reily: reilyjean@berkeley.edu <br>
> Carmen: carmenvega@berkeley.edu <br>
> Steven: k1519632@berkeley.edu <br>

### **[A]: Description of population of interest**

...description here...

### **[A]: Dietary Reference Intakes**

In [2]:
import pandas as pd
import numpy as np

In [3]:
# Import Dietary Requirements spreadsheet data as a pd.DataFrame
diet_min = pd.read_csv('Dietary_Requirements.csv')
diet_min.head()

Unnamed: 0,Nutrition,Source,C 1-3,F 4-8,M 4-8,F 9-13,M 9-13,F 14-18,M 14-18,F 19-30,M 19-30,F 31-50,M 31-50,F 51+,M 51+
0,Energy,---,1000.0,1200.0,1400.0,1600.0,1800.0,1800.0,2200.0,2000.0,2400.0,1800.0,2200.0,1600.0,2000.0
1,Protein,RDA,13.0,19.0,19.0,34.0,34.0,46.0,52.0,46.0,56.0,46.0,56.0,46.0,56.0
2,"Fiber, total dietary",---,14.0,16.8,19.6,22.4,25.2,25.2,30.8,28.0,33.6,25.2,30.8,22.4,28.0
3,"Folate, DFE",RDA,150.0,200.0,200.0,300.0,300.0,400.0,400.0,400.0,400.0,400.0,400.0,400.0,400.0
4,"Calcium, Ca",RDA,700.0,1000.0,1000.0,1300.0,1300.0,1300.0,1300.0,1000.0,1000.0,1000.0,1000.0,1200.0,1000.0


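The `dietary_ref` function takes two arguments: `age`, a positive integer, and `sex`, a string (not case-sensitive) that must be `male`, `female`, or `child`. It returns the recommended intakes for the matching column of `diet_min`, or an error message string if the input is invalid.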

In [4]:
def dietary_ref(age, sex):

    # Validate age input
    if not isinstance(age, int) or age <= 0:
        return "Incorrect age input. Please enter a positive integer for the age."
    
    # Normalize and validate sex input
    sex = sex.lower()
    if sex not in ['male', 'female', 'child']:
        return "Incorrect sex input. Input must be Male, Female, or Child."
    
    # Determine the appropriate column based on age and sex
    if sex == 'child':
        if age <= 3:
            col_name = 'C 1-3'
        else:
            # The table has no child column past age 3 (no 'C 4-8'); ages 4+ are split by sex.
            return "Age out of range for child category. Use Male or Female for ages 4 and up."
    else:
        if age <= 8:
            col_name = f"{'F' if sex == 'female' else 'M'} 4-8"
        elif age <= 13:
            col_name = f"{'F' if sex == 'female' else 'M'} 9-13"
        elif age <= 18:
            col_name = f"{'F' if sex == 'female' else 'M'} 14-18"
        elif age <= 30:
            col_name = f"{'F' if sex == 'female' else 'M'} 19-30"
        elif age <= 50:
            col_name = f"{'F' if sex == 'female' else 'M'} 31-50"
        else:
            col_name = f"{'F' if sex == 'female' else 'M'} 51+"
    
    # Extract and return the relevant nutrient recommendations
    if col_name in diet_min.columns:
        return diet_min[['Nutrition', col_name]].set_index('Nutrition')[col_name]
    else:
        return "Matching column not found in DataFrame. Check the column names."

In [5]:
# Example usage
dietary_ref(15, 'Male')

Nutrition
Energy                            2200.0
Protein                             52.0
Fiber, total dietary                30.8
Folate, DFE                        400.0
Calcium, Ca                       1300.0
Carbohydrate, by difference        130.0
Iron, Fe                            11.0
Magnesium, Mg                      410.0
Niacin                              16.0
Phosphorus, P                     1250.0
Potassium, K                      4700.0
Riboflavin                           1.3
Thiamin                              1.2
Vitamin A, RAE                     900.0
Vitamin B-12                         2.4
Vitamin B-6                          1.3
Vitamin C, total ascorbic acid      75.0
Vitamin E (alpha-tocopherol)        15.0
Vitamin K (phylloquinone)           75.0
Zinc, Zn                            11.0
Name: M 14-18, dtype: float64
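The bracket lookup in `dietary_ref` can also be written as a table lookup. The following is a minimal sketch, not the notebook's implementation: `age_bracket` and `column_name` are hypothetical helper names, and unlike `dietary_ref` the sketch raises `ValueError` on bad input and assumes that every age from 1 to 3 should map to the `C 1-3` column regardless of `sex`.

```python
from bisect import bisect_left

# Upper age bound of each closed bracket; anything above the last bound is '51+'.
UPPER_BOUNDS = [3, 8, 13, 18, 30, 50]
LABELS = ['1-3', '4-8', '9-13', '14-18', '19-30', '31-50', '51+']
PREFIX = {'female': 'F', 'male': 'M'}


def age_bracket(age):
    """Return the age-bracket label (e.g. '14-18') for a positive integer age."""
    if not isinstance(age, int) or age <= 0:
        raise ValueError("age must be a positive integer")
    # bisect_left finds the first bracket whose upper bound is >= age.
    return LABELS[bisect_left(UPPER_BOUNDS, age)]


def column_name(age, sex):
    """Build a column label such as 'M 14-18' (assumption: ages 1-3 always use 'C 1-3')."""
    bracket = age_bracket(age)
    if bracket == '1-3':
        return 'C 1-3'
    sex = sex.lower()
    if sex not in PREFIX:
        raise ValueError("sex must be 'male' or 'female' for ages 4 and up")
    return f"{PREFIX[sex]} {bracket}"


print(column_name(15, 'Male'))  # M 14-18
```

Keeping the bounds and labels in two parallel lists means a new bracket only needs one entry in each list rather than another `elif` branch.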

### **[A]: Data on prices for different foods**

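Let's import our Google spreadsheet as a `pd.DataFrame` here! This cell also installs the project requirements and imports the `fooddatacentral` package used for nutrient lookups.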

In [6]:
# prices_df = pd.read_csv('file_name.csv')
# prices_df
apikey = "KNqUDtV7Kcktiuheo3EoNhB0zDlCevFAdqZrKgdj" 
%pip install -r requirements.txt --upgrade
import fooddatacentral as fdc

Collecting Pint>=0.8.1 (from -r requirements.txt (line 2))
  Using cached Pint-0.24.4-py3-none-any.whl.metadata (8.5 kB)
Collecting scipy>=1.1.0 (from -r requirements.txt (line 18))
  Using cached scipy-1.15.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (61 kB)
Collecting gspread (from -r requirements.txt (line 20))
  Using cached gspread-6.2.0-py3-none-any.whl.metadata (11 kB)
Collecting gspread_pandas (from -r requirements.txt (line 22))
  Using cached gspread_pandas-3.3.0-py2.py3-none-any.whl.metadata (10 kB)
Collecting bottleneck>=1.3.6 (from -r requirements.txt (line 24))
  Using cached Bottleneck-1.4.2-cp311-cp311-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (7.7 kB)
Collecting eep153_tools (from -r requirements.txt (line 26))
  Using cached eep153_tools-0.12.4-py2.py3-none-any.whl.metadata (363 bytes)
Collecting fooddatacentral (from -r requirements.txt (line 28))
  Using cached fooddatacentral-1.0.10-py3-
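API keys are usually kept out of notebook cells so they are not published with the notebook. One possible approach, sketched here under the assumption that the key is stored in an environment variable, is shown below; the variable name `FDC_API_KEY` and the helper `load_api_key` are hypothetical and not part of the project.

```python
import os


def load_api_key(var_name="FDC_API_KEY"):
    """Read the API key from an environment variable (the name is an assumption)."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {var_name} environment variable before running the notebook."
        )
    return key
```

With the variable set in the shell that launches Jupyter, the cell would then read `apikey = load_api_key()`.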

## OMNIVORE DIET

In [124]:

import re

# Load the CSV file into a DataFrame
df = pd.read_csv('food_and_prices.csv')

# Define a regex pattern for common animal products
animal_product_pattern = r'\b(butter|cheese|milk|kefir|whey|eggnog|mascarpone|mozzarella|stracchino|parmigiano|beef|turbot|cod|ricotta|chicken|carp|salmon|bacon|trout|mealworms|dulce|pork|egg|fish|lamb|turkey|turtle|breast|mollusks|frog|thigh|yogurt|honey|gelatin|cream|lard|sausage|anchovy|shellfish|shrimp|mayo|ham|meat)\b'

# Create a new column 'animal product' that marks items based on the pattern
df['animal product'] = df['Food commodity ITEM'].apply(
    lambda x: 'animal product' if re.search(animal_product_pattern, str(x), re.IGNORECASE) else 'plant-based'
)

# Rename the item column, drop the yeast row, and derive a per-100g price
df.rename(columns={'Food commodity ITEM': 'Food'}, inplace=True)
df = df[df['Food'] != 'YEAST COMPRESSED*'].copy()  # .copy() avoids SettingWithCopyWarning
df['Average Price per 100g (USD)'] = df['Average Price per kg (USD)'] / 10

# Separate animal products column
df_animal = df[df['animal product'] == 'animal product']

# Separate plant-based products
df_plant = df[df['animal product'] == 'plant-based']

df.set_index('Food', inplace=True)
df.head()

Food,Carbon Footprint kg CO2eq/kg or l of food ITEM,Water Footprint liters water/kg o liter of food ITEM,FDC ID,FDC Food Name,Average Price per kg (USD),animal product,Average Price per 100g (USD)
CHOCOLATE OR CREAM FILLED COOKIES**,1.53,2902.0,2707915,"Cookie, chocolate or fudge",10.8,animal product,1.08
SIMPLE COOKIES**,1.39,1723.0,2707964,"Cookie, shortbread",10.8,plant-based,1.08
BREAD MULTICEREAL**,0.7,771.0,2707777,"Bread, multigrain",4.24,plant-based,0.424
BREAD PLAIN**,0.89,1031.0,174929,"Bread, sticks, plain",4.24,plant-based,0.424
BREAD WHOLE**,0.77,887.0,2707709,"Bread, whole wheat",4.24,plant-based,0.424
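The word-boundary anchors (`\b`) in the pattern decide which items are flagged, so it is worth checking a few names by hand. The sketch below uses a shortened pattern and a small list of names (`EGGS` is a made-up entry added for illustration) with the vectorized `Series.str.contains` instead of `apply`; it is an illustration of the matching behavior, not the notebook's classification.

```python
import re

import numpy as np
import pandas as pd

# Shortened version of the pattern; (?:...) is a non-capturing group.
pattern = r'\b(?:milk|egg|cream|salmon)\b'

foods = pd.Series([
    'EGGPLANT',                             # 'egg' is followed by a letter: no match
    'COW MILK',
    'SOY BURGER',
    'CHOCOLATE OR CREAM FILLED COOKIES**',
    'EGGS',                                 # hypothetical entry: plural form is missed
    'SALMON',
])

is_animal = foods.str.contains(pattern, flags=re.IGNORECASE, regex=True)
labels = pd.Series(
    np.where(is_animal, 'animal product', 'plant-based'),
    index=foods.values,
)
print(labels)
```

The boundaries keep `EGGPLANT` out of the animal group, but they also miss plural forms such as `EGGS`, and any item whose name does not mention its animal ingredients is labeled plant-based.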


### **[A]: Nutritional content of different foods**

In [127]:
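import warnings

# Look up nutrient quantities for each food from FoodData Central
D = {}
for food in df.index:
    # .iloc[0] takes the first match by position (avoids the deprecated Series[0] lookup)
    FDC = df.loc[df.index == food, 'FDC ID'].iloc[0]
    try:
        D[food] = fdc.nutrients(apikey, FDC).Quantity
    except AttributeError:
        warnings.warn(f"Couldn't find FDC Code {FDC} for food {food}.")

# Rows are nutrients, columns are foods
D = pd.DataFrame(D, dtype=float)
D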



Unnamed: 0,CHOCOLATE OR CREAM FILLED COOKIES**,SIMPLE COOKIES**,BREAD MULTICEREAL**,BREAD PLAIN**,BREAD WHOLE**,FLAVORED CRACKERS**,PLAIN CRACKERS**,WHOLEGRAIN CRACKERS**,CRISPBREAD**,KETCHUP,...,EGGPLANT,PEPPER,PUMPKIN,TOMATO,ZUCCHINI,CARP,COD,SALMON,TROUT,TURBOT
Alanine,,,,0.395,,,,,,,...,,,,,,,,,,0.971
"Alcohol, ethyl",0.00,0.00,0.0,0.000,0.00,,0.00,0.00,,0.00,...,0.00,,,,,0.00,,,0.00,
Amino acids,,,,0.000,,,,,,,...,,,,,,,,,,0.000
Arginine,,,,0.432,,,,,,,...,,,,,,,,,,0.960
Ash,,,,3.900,,,,,,,...,,,,,,,,,,2.100
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Vitamin K (Dihydrophylloquinone),,,,,,,,,,,...,,,,,,,,,,
Vitamin K (phylloquinone),2.40,11.00,1.4,2.200,7.80,,69.30,36.00,,3.00,...,24.30,,,,,4.00,,,12.00,
Vitamins and Other Components,,,,0.000,,,,,,,...,,,,,,,,,,0.000
Water,4.50,3.60,36.9,6.100,38.70,,3.14,2.50,,68.50,...,70.40,,,,,60.50,,,53.20,76.950


### **[A]: Solution**

In [118]:
import warnings
import fooddatacentral as fdc

# Minimum requirements, indexed by nutrient (get_grub selects rows via bmin.index)
bmin = diet_min.drop(columns=['Unnamed: 0', 'Source'], errors='ignore').set_index('Nutrition')

# Maximum requirements, indexed the same way
bmax = pd.read_csv('diet_max.csv')
bmax = bmax.drop(columns=['Unnamed: 0', 'Source'], errors='ignore').set_index('Nutrition')

b = pd.concat([bmin, -bmax])  # Note sign change for max constraints

b


Nutrition,C 1-3,F 4-8,M 4-8,F 9-13,M 9-13,F 14-18,M 14-18,F 19-30,M 19-30,F 31-50,M 31-50,F 51+,M 51+
Energy,1000.0,1200.0,1400.0,1600.0,1800.0,1800.0,2200.0,2000.0,2400.0,1800.0,2200.0,1600.0,2000.0
Protein,13.0,19.0,19.0,34.0,34.0,46.0,52.0,46.0,56.0,46.0,56.0,46.0,56.0
"Fiber, total dietary",14.0,16.8,19.6,22.4,25.2,25.2,30.8,28.0,33.6,25.2,30.8,22.4,28.0
"Folate, DFE",150.0,200.0,200.0,300.0,300.0,400.0,400.0,400.0,400.0,400.0,400.0,400.0,400.0
"Calcium, Ca",700.0,1000.0,1000.0,1300.0,1300.0,1300.0,1300.0,1000.0,1000.0,1000.0,1000.0,1200.0,1000.0
"Carbohydrate, by difference",130.0,130.0,130.0,130.0,130.0,130.0,130.0,130.0,130.0,130.0,130.0,130.0,130.0
"Iron, Fe",7.0,10.0,10.0,8.0,8.0,15.0,11.0,18.0,8.0,18.0,8.0,8.0,8.0
"Magnesium, Mg",80.0,130.0,130.0,240.0,240.0,360.0,410.0,310.0,400.0,320.0,420.0,320.0,420.0
Niacin,6.0,8.0,8.0,12.0,12.0,14.0,16.0,14.0,16.0,14.0,16.0,14.0,16.0
"Phosphorus, P",460.0,500.0,500.0,1250.0,1250.0,1250.0,1250.0,700.0,700.0,700.0,700.0,700.0,700.0

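The sign change on the maximum constraints is what lets both kinds of requirement fit the single "less than or equal" form that `scipy.optimize.linprog` expects. A toy problem makes this concrete. All numbers and the names `food_1` and `food_2` below are invented for illustration and are not taken from the project data.

```python
import pandas as pd
from scipy.optimize import linprog

foods = ['food_1', 'food_2']                # hypothetical foods
p = pd.Series([1.0, 2.0], index=foods)      # hypothetical price per unit

# Nutrient content per unit of each food (rows: nutrients, columns: foods).
A_all = pd.DataFrame({
    'food_1': {'Energy': 100.0, 'Protein': 2.0},
    'food_2': {'Energy': 50.0, 'Protein': 10.0},
})

b_min = pd.Series({'Energy': 2000.0, 'Protein': 50.0})  # A x >= b_min
b_max = pd.Series({'Energy': 3000.0})                   # A x <= b_max

# Stack as "A x >= b": the max rows are negated so that -A x >= -b_max.
A = pd.concat([A_all.loc[b_min.index], -A_all.loc[b_max.index]])
b = pd.concat([b_min, -b_max])

# linprog wants A_ub x <= b_ub, so the whole system is negated once more.
result = linprog(p.to_numpy(), A_ub=-A.to_numpy(), b_ub=-b.to_numpy(), method='highs')

x = pd.Series(result.x, index=foods)
print(result.success, round(result.fun, 2))
print(x)
```

After the two negations, a minimum row reads `-A x <= -b_min` and a maximum row reads `A x <= b_max`, which is the pair of inequalities the diet problem needs.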

In [134]:
from scipy.optimize import linprog as lp
import numpy as np

def get_grub(sex_age_group, diet, df):
    # Restrict to plant-based foods for vegan diets
    if diet in ('vegan', 'plant-based'):
        df = df[df['animal product'] == 'plant-based']

    # Look up nutrient quantities for each food from FoodData Central
    D = {}
    for food in df.index:
        # .iloc[0] takes the first match by position (avoids the deprecated Series[0] lookup)
        FDC = df.loc[df.index == food, 'FDC ID'].iloc[0]
        try:
            D[food] = fdc.nutrients(apikey, FDC).Quantity
        except AttributeError:
            warnings.warn(f"Couldn't find FDC Code {FDC} for food {food}.")

    D = pd.DataFrame(D, dtype=float)

    # Cheapest price per food; drop foods with no price
    p = df.groupby('Food')['Average Price per 100g (USD)'].min().dropna()

    # Compile list that we have both prices and nutritional info for; drop if either missing
    use = p.index.intersection(D.columns)
    p = p[use]
    tol = 1e-6 # Numbers in solution smaller than this (in absolute value) treated as zeros

    Aall = D[p.index].fillna(0)

    # Drop rows of A that we don't have constraints for.
    Amin = Aall.loc[bmin.index]
    Amax = Aall.loc[bmax.index]

    # Maximum requirements involve multiplying constraint by -1 to make <=.
    A = pd.concat([Amin, -Amax])

    # Solve for the chosen sex/age group
    result = lp(p, -A, -b[sex_age_group], method='highs')
    # Use the function argument here, not the global `group`
    print(f"Cost of diet for {sex_age_group} is ${result.fun:.2f} per day.")

    # Put back into nice series (named so it does not shadow the `diet` argument)
    quantities = pd.Series(result.x, index=p.index)

    print("\nYou'll be eating (in 100s of grams or milliliters):")
    print(quantities[quantities >= tol])  # Drop items with quantities less than precision of calculation.
    return quantities

group = "F 19-30"
diet = 'not vegan'

solution = get_grub(group, diet, df)


Cost of diet for F 19-30 is $3.30 per day.

You'll be eating (in 100s of grams or milliliters):
CARROT           1.285283
COW MILK         3.514002
COWPEA           1.668872
LETTUCE          0.503611
OAT              0.763807
POTATO           3.113865
SOY BURGER       1.194030
SOYBEAN          1.389350
SUNFLOWER OIL    0.189099
dtype: float64
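A quick way to sanity-check a solution is to multiply the nutrient matrix by the chosen quantities and compare the totals with the bounds. The helper below is a hypothetical sketch (`nutrient_report` is not part of the notebook), demonstrated on invented toy numbers rather than on the actual diet.

```python
import pandas as pd


def nutrient_report(A, x, b_min, b_max, tol=1e-6):
    """Compare nutrient totals A @ x with minimum and maximum bounds.

    A: nutrients-by-foods DataFrame; x: quantities indexed by food.
    Nutrients without a bound get NaN and count as satisfied.
    """
    report = pd.DataFrame({'total': A.dot(x)})
    report['min'] = b_min  # aligned on the nutrient index
    report['max'] = b_max
    report['meets_min'] = report['min'].isna() | (report['total'] >= report['min'] - tol)
    report['meets_max'] = report['max'].isna() | (report['total'] <= report['max'] + tol)
    return report


# Toy data (invented for illustration)
A = pd.DataFrame({
    'food_1': {'Energy': 100.0, 'Protein': 2.0},
    'food_2': {'Energy': 50.0, 'Protein': 10.0},
})
x = pd.Series({'food_1': 20.0, 'food_2': 1.0})
b_min = pd.Series({'Energy': 2000.0, 'Protein': 50.0})
b_max = pd.Series({'Energy': 3000.0})

report = nutrient_report(A, x, b_min, b_max)
print(report)
```

A row with `meets_min` or `meets_max` equal to `False` would point to a constraint the solver did not satisfy, which is also worth checking through the solver's own success flag.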




### **[B]: Is your solution edible?**

...

### **[B]: What is total cost for population of interest?**

In [31]:
# Import wbdata
# Code function for total cost
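One way the total could be assembled, once a daily cost per sex/age group and a head count per group are available, is a weighted sum. This is only a sketch under stated assumptions: `total_cost` is a hypothetical helper, the population counts are invented placeholders, and only the $3.30 figure for `F 19-30` comes from the solution above (the `M 19-30` cost is made up).

```python
import pandas as pd


def total_cost(daily_cost, population, days=365):
    """Total cost over `days` for all groups present in both Series."""
    # Keep only groups that have both a cost and a head count.
    daily_cost, population = daily_cost.align(population, join='inner')
    return float((daily_cost * population).sum() * days)


daily_cost = pd.Series({'F 19-30': 3.30, 'M 19-30': 4.00})   # M 19-30 is a placeholder
population = pd.Series({'F 19-30': 1000, 'M 19-30': 1200})   # placeholder head counts

print(f"${total_cost(daily_cost, population):,.2f} per year")
```

The inner join silently drops groups that are missing from either input, so a real version would probably want to report those groups instead.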

### **[C]: Sensitivity of Solution**

In [30]:
# Code here
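A simple form of sensitivity analysis is to rescale one price and re-solve. The sketch below does this on an invented two-food problem (all numbers are illustrative assumptions, not project data); `min_cost` is a hypothetical helper name.

```python
import numpy as np
from scipy.optimize import linprog

# Toy minimum constraints A x >= b, written as -A x <= -b for linprog.
A_ub = -np.array([[100.0, 50.0],    # Energy per unit of food 1, food 2
                  [2.0, 10.0]])     # Protein per unit of food 1, food 2
b_ub = -np.array([2000.0, 50.0])
base_prices = np.array([1.0, 2.0])


def min_cost(prices):
    """Solve the toy diet problem and return (cost, quantities)."""
    result = linprog(prices, A_ub=A_ub, b_ub=b_ub, method='highs')
    if not result.success:
        raise RuntimeError(result.message)
    return result.fun, result.x


factors = [0.5, 1.0, 2.0, 4.0]
costs = []
for factor in factors:
    prices = base_prices.copy()
    prices[0] *= factor             # rescale the price of food 1 only
    cost, x = min_cost(prices)
    costs.append(cost)
    print(f"factor {factor}: cost {cost:.2f}, quantities {np.round(x, 2)}")
```

Raising a price can never lower the minimum cost, so the printed costs should be non-decreasing; the interesting part is the point at which the quantities switch from one food to the other.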