# **`Project 2: Team Thomas Allinson`**

### **Objective**: Analyze the comparative costs of a vegan diet versus an omnivorous diet within the American population, with a specific focus on their environmental impact.

#### Group Members:
> Johann: johann.dicken@berkeley.edu <br>
> Laure: laureho@berkeley.edu <br>
> Reily: reilyjean@berkeley.edu <br>
> Carmen: carmenvega@berkeley.edu <br>
> Steven: k1519632@berkeley.edu <br>

### **[A]: Description of population of interest**

...description here...

### **[A]: Dietary Reference Intakes**

In [6]:
import pandas as pd
import numpy as np

In [7]:
# Import Dietary Requirements spreadsheet data as a pd.DataFrame
df = pd.read_csv('Dietary_Requirements.csv')
df.head()

Unnamed: 0,Nutrition,Source,C 1-3,F 4-8,M 4-8,F 9-13,M 9-13,F 14-18,M 14-18,F 19-30,M 19-30,F 31-50,M 31-50,F 51+,M 51+
0,Energy,---,1000.0,1200.0,1400.0,1600.0,1800.0,1800.0,2200.0,2000.0,2400.0,1800.0,2200.0,1600.0,2000.0
1,Protein,RDA,13.0,19.0,19.0,34.0,34.0,46.0,52.0,46.0,56.0,46.0,56.0,46.0,56.0
2,"Fiber, total dietary",---,14.0,16.8,19.6,22.4,25.2,25.2,30.8,28.0,33.6,25.2,30.8,22.4,28.0
3,"Folate, DFE",RDA,150.0,200.0,200.0,300.0,300.0,400.0,400.0,400.0,400.0,400.0,400.0,400.0,400.0
4,"Calcium, Ca",RDA,700.0,1000.0,1000.0,1300.0,1300.0,1300.0,1300.0,1000.0,1000.0,1000.0,1000.0,1200.0,1000.0


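The `dietary_ref` function takes two arguments: `age`, a positive integer, and `sex`, a string (not case-sensitive) that must be `male`, `female`, or `child`. It returns the recommended intake of each nutrient for that group as a `pd.Series`; invalid input returns an error message string instead.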

In [8]:
def dietary_ref(age, sex):

    # Validate age input
    if not isinstance(age, int) or age <= 0:
        return "Incorrect age input. Please enter a positive integer for the age."
    
    # Normalize and validate sex input
    sex = sex.lower()
    if sex not in ['male', 'female', 'child']:
        return "Incorrect sex input. Input must be Male, Female, or Child."
    
    # Determine the appropriate column based on age and sex
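    if sex == 'child':
        # The table has a single child column, covering ages 1-3
        if age <= 3:
            col_name = 'C 1-3'
        else:
            return "Age out of range for child category. For ages 4 and up, enter Male or Female."
    else:
        if age <= 3:
            # Ages 1-3 share one column regardless of sex
            col_name = 'C 1-3'
        elif age <= 8:
            col_name = f"{'F' if sex == 'female' else 'M'} 4-8"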
        elif age <= 13:
            col_name = f"{'F' if sex == 'female' else 'M'} 9-13"
        elif age <= 18:
            col_name = f"{'F' if sex == 'female' else 'M'} 14-18"
        elif age <= 30:
            col_name = f"{'F' if sex == 'female' else 'M'} 19-30"
        elif age <= 50:
            col_name = f"{'F' if sex == 'female' else 'M'} 31-50"
        else:
            col_name = f"{'F' if sex == 'female' else 'M'} 51+"
    
    # Extract and return the relevant nutrient recommendations
    if col_name in df.columns:
        return df[['Nutrition', col_name]].set_index('Nutrition')[col_name]
    else:
        return "Matching column not found in DataFrame. Check the column names."

In [9]:
# Example usage
dietary_ref(15, 'Male')

Nutrition
Energy                            2200.0
Protein                             52.0
Fiber, total dietary                30.8
Folate, DFE                        400.0
Calcium, Ca                       1300.0
Carbohydrate, by difference        130.0
Iron, Fe                            11.0
Magnesium, Mg                      410.0
Niacin                              16.0
Phosphorus, P                     1250.0
Potassium, K                      4700.0
Riboflavin                           1.3
Thiamin                              1.2
Vitamin A, RAE                     900.0
Vitamin B-12                         2.4
Vitamin B-6                          1.3
Vitamin C, total ascorbic acid      75.0
Vitamin E (alpha-tocopherol)        15.0
Vitamin K (phylloquinone)           75.0
Zinc, Zn                            11.0
Name: M 14-18, dtype: float64
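
The chain of `elif` comparisons in `dietary_ref` can also be written as a table lookup. The sketch below is one possible alternative, not the notebook's method: `bracket_column` is a hypothetical helper that uses `bisect` to find the age bracket and builds a column label in the same `F 19-30` style as the spreadsheet. It assumes that ages 1-3 share the single `C 1-3` column regardless of sex.

```python
from bisect import bisect_left

# Upper age bound of each bracket and the matching label suffix.
# These mirror the column names in Dietary_Requirements.csv.
UPPER_BOUNDS = [3, 8, 13, 18, 30, 50]
LABELS = ["1-3", "4-8", "9-13", "14-18", "19-30", "31-50", "51+"]


def bracket_column(age, sex):
    """Return the requirements column name for an age and sex (hypothetical helper)."""
    if not isinstance(age, int) or age <= 0:
        raise ValueError("age must be a positive integer")
    sex = sex.lower()
    if sex not in ("male", "female"):
        raise ValueError("sex must be 'male' or 'female'")

    # bisect_left returns the index of the first bound >= age,
    # or len(UPPER_BOUNDS) when age exceeds every bound (the 51+ bracket).
    label = LABELS[bisect_left(UPPER_BOUNDS, age)]
    if label == "1-3":
        return "C 1-3"  # assumption: one shared column for young children
    return f"{'F' if sex == 'female' else 'M'} {label}"


print(bracket_column(15, "Male"))    # M 14-18
print(bracket_column(51, "female"))  # F 51+
```

Keeping the bounds in one list means a change to the age brackets only has to be made in one place.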

### **[A]: Data on prices for different foods**

Let's import our Google spreadsheet as a `pd.DataFrame` here!

In [11]:
# prices_df = pd.read_csv('file_name.csv')
# prices_df
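import os
# Assumption: the FoodData Central API key is stored in an environment variable
# (the name FDC_API_KEY is our choice) instead of being hard-coded in the notebook.
apikey = os.environ["FDC_API_KEY"]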
%pip install -r requirements.txt --upgrade
import fooddatacentral as fdc


In [12]:

import re

# Load the CSV file into its own DataFrame. Reusing the name `df` would
# overwrite the dietary requirements table that dietary_ref() reads.
prices_df = pd.read_csv('food_and_prices.csv')

# Define a regex pattern for common animal products (whole-word matches only)
animal_product_pattern = r'\b(butter|cheese|milk|kefir|whey|eggnog|beef|chicken|dulce|pork|egg|fish|lamb|turkey|turtle|breast|mollusks|frog|thigh|yogurt|honey|gelatin|cream|lard|sausage|anchovy|shellfish|shrimp|mayo|ham|meat)\b'

# Create a new column 'animal product' that marks items based on the pattern
prices_df['animal product'] = prices_df['Food commodity ITEM'].apply(
    lambda x: 'animal product' if re.search(animal_product_pattern, str(x), re.IGNORECASE) else 'plant-based'
)

# Display the first 100 plant-based rows of the updated DataFrame
prices_df[prices_df['animal product'] == 'plant-based'].head(100)


Unnamed: 0,Food commodity ITEM,Carbon Footprint kg CO2eq/kg or l of food ITEM,Water Footprint liters water/kg o liter of food ITEM,FDC ID,FDC Food Name,Average Price per kg (USD),animal product
1,SIMPLE COOKIES**,1.39,1723.0,2707964,"Cookie, shortbread",10.80,plant-based
2,BREAD MULTICEREAL**,0.70,771.0,2707777,"Bread, multigrain",4.24,plant-based
3,BREAD PLAIN**,0.89,1031.0,174929,"Bread, sticks, plain",4.24,plant-based
4,BREAD WHOLE**,0.77,887.0,2707709,"Bread, whole wheat",4.24,plant-based
5,FLAVORED CRACKERS**,0.93,1378.0,2055556,FLAVORED CRACKERS,8.80,plant-based
...,...,...,...,...,...,...,...
107,LETTUCE,0.41,237.0,2709789,"Lettuce, raw",1.50,plant-based
108,SPINACH,0.34,292.0,1905313,SPINACH,2.20,plant-based
109,CARROT,0.24,195.0,2709660,"Carrots, raw",1.50,plant-based
110,GARLIC,0.71,589.0,1662203,GARLIC,10.00,plant-based
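
Because the classifier relies on whole-word matches, it is worth checking how the pattern treats plurals and plant-based items whose names contain an animal-product word. The sketch below uses a shortened version of the pattern and made-up item names (not rows from `food_and_prices.csv`) to show both effects.

```python
import re

# Shortened version of the notebook's pattern, for illustration only
pattern = re.compile(r"\b(butter|milk|egg|chicken)\b", re.IGNORECASE)


def classify(name):
    """Label an item the same way the notebook's lambda does."""
    return "animal product" if pattern.search(name) else "plant-based"


# Hypothetical item names chosen to probe the word boundaries
samples = ["CHICKEN BREAST", "EGGS", "PEANUT BUTTER", "COCONUT MILK", "LETTUCE"]
for name in samples:
    print(f"{name}: {classify(name)}")
```

Under these assumptions `EGGS` is labeled plant-based, because `\b` requires a word boundary right after `egg`, while `PEANUT BUTTER` and `COCONUT MILK` are labeled animal products. If such items occur in the data, an optional plural (`eggs?`) and a short exclusion list would be one way to handle them.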


In [None]:
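# Build the nutrient matrix A (rows: nutrients, columns: foods)
import warnings
import fooddatacentral as fdc

# Single-ID lookup (the result is not used below)
fdc.nutrients(apikey, fdc_id=2705854)

# Assumption: the foods and their FDC IDs come from food_and_prices.csv.
# It is re-read here so this cell does not depend on earlier variable names.
foods = pd.read_csv('food_and_prices.csv')

D = {}
for food, fdc_id in zip(foods['Food commodity ITEM'], foods['FDC ID']):
    try:
        # Nutrient quantities for this FDC entry
        D[food] = fdc.nutrients(apikey, fdc_id=fdc_id).Quantity
        print(food)
    except AttributeError:
        warnings.warn(f"Couldn't find FDC Code {fdc_id} for food {food}.")

D = pd.DataFrame(D, dtype=float)

D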

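# Minimum requirements: nutrients as rows, one column per age-sex group.
# index_col=0 skips the saved row index ("Unnamed: 0") in the CSV.
bmin = pd.read_csv('Dietary_Requirements.csv', index_col=0).set_index('Nutrition')

# Drop string describing source
bmin = bmin.drop('Source', axis=1)

bmin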

#bmax = diet_mins['diet_maximums'].set_index('Nutrition')

# Drop string describing source
#bmax = bmax.drop('Source',axis=1)

#bmax
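
Before solving, the nutrient matrix and the requirement vector need the same nutrient labels in the same order. The following is a minimal sketch with made-up numbers, assuming both objects are indexed by nutrient name as in the cells above; `D_toy` and `bmin_toy` are hypothetical stand-ins for `D` and one column of `bmin`.

```python
import pandas as pd

# Toy nutrient matrix: rows are nutrients, columns are foods (made-up values)
D_toy = pd.DataFrame(
    {"FOOD A": [10.0, 1.0, 30.0], "FOOD B": [40.0, 2.0, 50.0]},
    index=["Energy", "Protein", "Calcium, Ca"],
)

# Toy minimum requirements for one age-sex group (made-up values)
bmin_toy = pd.Series({"Energy": 100.0, "Protein": 10.0, "Zinc, Zn": 5.0})

# Keep only nutrients present in both objects, in a common order
common = D_toy.index.intersection(bmin_toy.index)
A = D_toy.loc[common]
b = bmin_toy.loc[common]

print(sorted(common))  # ['Energy', 'Protein']
```

Nutrients that appear in only one of the two objects (here `Calcium, Ca` and `Zinc, Zn`) are dropped, so it is worth printing them to confirm that no requirement is silently lost to a naming mismatch.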

### **[A]: Nutritional content of different foods**

...

### **[A]: Solution**

I think it'd be cool to make a graph for this :) For example, an overlaid (or grouped) bar chart with age groups along the x-axis, minimum diet cost on the y-axis, and a different color for each sex.

In [29]:
# Code here
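
One common way to set up a minimum-cost diet is as a linear program: choose non-negative quantities of each food to minimize total price, subject to every nutrient meeting its minimum. The sketch below illustrates that setup with made-up numbers and is not the team's solution. It uses `scipy.optimize.linprog`, and the names `prices`, `A`, and `b_min` are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog

# Made-up example: 2 foods, 2 nutrients (not the notebook's data)
prices = np.array([2.0, 3.0])   # cost per kg of each food
A = np.array([[10.0, 5.0],      # nutrient 1 per kg of each food
              [2.0, 8.0]])      # nutrient 2 per kg of each food
b_min = np.array([20.0, 16.0])  # minimum required amount of each nutrient

# linprog expects constraints of the form A_ub @ x <= b_ub,
# so A @ x >= b_min is passed as -A @ x <= -b_min.
# The default bounds keep every quantity non-negative.
result = linprog(c=prices, A_ub=-A, b_ub=-b_min, method="highs")

print(result.x)    # optimal kg of each food
print(result.fun)  # minimum total cost
```

In the notebook, `A` would correspond to the nutrient matrix restricted to the chosen foods and `b_min` to one age-sex column of the requirements table.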

### **[B]: Is your solution edible?**

...

### **[B]: What is total cost for population of interest?**

In [31]:
# Import wbdata
# Code function for total cost
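
Once a per-person minimum cost is known for each age-sex group, the population total is a weighted sum: each group's cost times its head count. The sketch below uses placeholder numbers and assumes that costs are per person per day and that the group labels match the requirement columns. `total_cost` and both dictionaries are hypothetical; in the notebook the head counts would come from `wbdata`.

```python
def total_cost(cost_per_person, population):
    """Sum cost times head count over all groups; both mappings must share the same keys."""
    mismatched = set(cost_per_person) ^ set(population)
    if mismatched:
        raise KeyError(f"Groups missing from one input: {sorted(mismatched)}")
    return sum(cost_per_person[g] * population[g] for g in cost_per_person)


# Placeholder values, not real estimates
daily_cost = {"F 19-30": 3.0, "M 19-30": 4.0}  # USD per person per day
head_count = {"F 19-30": 100, "M 19-30": 200}  # people in each group

print(total_cost(daily_cost, head_count))  # 1100.0
```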

### **[C]: Sensitivity of Solution**

In [30]:
# Code here
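
One simple sensitivity check is to re-solve the cost minimization after scaling a single food's price and compare the resulting minimum costs. The sketch below does this on a made-up two-food problem using `scipy.optimize.linprog`; the data, the scaling factors, and the helper `min_cost` are all assumptions for illustration.

```python
import numpy as np
from scipy.optimize import linprog

# Made-up example: 2 foods, 2 nutrients (not the notebook's data)
base_prices = np.array([2.0, 3.0])
A = np.array([[10.0, 5.0],
              [2.0, 8.0]])
b_min = np.array([20.0, 16.0])


def min_cost(prices):
    """Minimum cost of meeting b_min at the given prices."""
    result = linprog(c=prices, A_ub=-A, b_ub=-b_min, method="highs")
    if not result.success:
        raise RuntimeError(result.message)
    return result.fun


# Scale the price of the first food and record the new minimum cost
costs = {}
for factor in (0.5, 1.0, 2.0):
    scaled = base_prices.copy()
    scaled[0] *= factor
    costs[factor] = min_cost(scaled)

for factor, cost in costs.items():
    print(f"price of food 1 x {factor}: minimum cost {cost:.2f}")
```

The same loop could be run over each food in turn, or over the nutrient minimums instead of the prices, to see which inputs the solution depends on most.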