# **`Project 2: Team Thomas Allinson`**

### **Objective**: Analyze the comparative costs of a vegan diet versus an omnivorous diet within the American population, with a specific focus on their environmental impact.

#### Group Members:
> Johann: johann.dicken@berkeley.edu <br>
> Laure: laureho@berkeley.edu <br>
> Reily: reilyjean@berkeley.edu <br>
> Carmen: carmenvega@berkeley.edu <br>
> Steven: k1519632@berkeley.edu <br>

### **[A]: Description of population of interest**

...description here...

### **[A]: Dietary Reference Intakes**

In [2]:
import pandas as pd
import numpy as np

In [3]:
# Import Dietary Requirements spreadsheet data as a pd.DataFrame
df = pd.read_csv('Dietary_Requirements.csv')
df.head()

Unnamed: 0,Nutrition,Source,C 1-3,F 4-8,M 4-8,F 9-13,M 9-13,F 14-18,M 14-18,F 19-30,M 19-30,F 31-50,M 31-50,F 51+,M 51+
0,Energy,---,1000.0,1200.0,1400.0,1600.0,1800.0,1800.0,2200.0,2000.0,2400.0,1800.0,2200.0,1600.0,2000.0
1,Protein,RDA,13.0,19.0,19.0,34.0,34.0,46.0,52.0,46.0,56.0,46.0,56.0,46.0,56.0
2,"Fiber, total dietary",---,14.0,16.8,19.6,22.4,25.2,25.2,30.8,28.0,33.6,25.2,30.8,22.4,28.0
3,"Folate, DFE",RDA,150.0,200.0,200.0,300.0,300.0,400.0,400.0,400.0,400.0,400.0,400.0,400.0,400.0
4,"Calcium, Ca",RDA,700.0,1000.0,1000.0,1300.0,1300.0,1300.0,1300.0,1000.0,1000.0,1000.0,1000.0,1200.0,1000.0


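The `dietary_ref` function takes two arguments: `age`, a positive integer, and `sex`, a string (not case-sensitive) that must be `male`, `female`, or `child`. It returns the recommended daily intakes for the matching age-sex group as a Series indexed by nutrient.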

In [4]:
def dietary_ref(age, sex):

    # Validate age input
    if not isinstance(age, int) or age <= 0:
        return "Incorrect age input. Please enter a positive integer for the age."
    
    # Normalize and validate sex input
    sex = sex.lower()
    if sex not in ['male', 'female', 'child']:
        return "Incorrect sex input. Input must be Male, Female, or Child."
    
    # Determine the appropriate column based on age and sex.
    # The table has one shared column for ages 1-3 ('C 1-3') and
    # sex-specific columns ('F ...' / 'M ...') from age 4 onward;
    # there is no 'C 4-8' column.
    if age <= 3:
        col_name = 'C 1-3'
    elif sex == 'child':
        return "Age out of range for child category. Use Male or Female for ages 4 and up."
    else:
        prefix = 'F' if sex == 'female' else 'M'
        if age <= 8:
            col_name = f"{prefix} 4-8"
        elif age <= 13:
            col_name = f"{prefix} 9-13"
        elif age <= 18:
            col_name = f"{prefix} 14-18"
        elif age <= 30:
            col_name = f"{prefix} 19-30"
        elif age <= 50:
            col_name = f"{prefix} 31-50"
        else:
            col_name = f"{prefix} 51+"
    
    # Extract and return the relevant nutrient recommendations
    if col_name in df.columns:
        return df[['Nutrition', col_name]].set_index('Nutrition')[col_name]
    else:
        return "Matching column not found in DataFrame. Check the column names."

In [5]:
# Example usage
dietary_ref(15, 'Male')

Nutrition
Energy                            2200.0
Protein                             52.0
Fiber, total dietary                30.8
Folate, DFE                        400.0
Calcium, Ca                       1300.0
Carbohydrate, by difference        130.0
Iron, Fe                            11.0
Magnesium, Mg                      410.0
Niacin                              16.0
Phosphorus, P                     1250.0
Potassium, K                      4700.0
Riboflavin                           1.3
Thiamin                              1.2
Vitamin A, RAE                     900.0
Vitamin B-12                         2.4
Vitamin B-6                          1.3
Vitamin C, total ascorbic acid      75.0
Vitamin E (alpha-tocopherol)        15.0
Vitamin K (phylloquinone)           75.0
Zinc, Zn                            11.0
Name: M 14-18, dtype: float64
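
The age-to-column mapping can also be written as a table lookup instead of an `if`/`elif` chain. The sketch below is one possible alternative, not the notebook's implementation: `age_band_column`, `BAND_UPPER`, and `BAND_LABEL` are hypothetical names, and the band edges are copied from the column headers of the requirements table shown above. It assumes `age` and `sex` have already been validated.

```python
from bisect import bisect_left

# Inclusive upper age of each band, and the label of every band.
# The last label ('51+') has no upper bound, so BAND_LABEL is one longer.
BAND_UPPER = [3, 8, 13, 18, 30, 50]
BAND_LABEL = ['1-3', '4-8', '9-13', '14-18', '19-30', '31-50', '51+']


def age_band_column(age, sex):
    """Return the requirements-table column name for an age/sex pair."""
    # bisect_left finds the first band whose upper bound is >= age.
    band = BAND_LABEL[bisect_left(BAND_UPPER, age)]
    if band == '1-3':
        return 'C 1-3'  # ages 1-3 share a single column in the table
    prefix = 'F' if sex.lower() == 'female' else 'M'
    return f"{prefix} {band}"


print(age_band_column(15, 'Male'))
```

For `age=15` and `sex='Male'` this gives `M 14-18`, the same column as the example output above. Keeping the band edges in one list makes it easier to check them against the spreadsheet headers.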

### **[A]: Data on prices for different foods**

Let's import our Google spreadsheet as a `pd.DataFrame` here!

In [6]:
# prices_df = pd.read_csv('file_name.csv')
# prices_df

In [13]:

import re

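# Load the CSV file into its own DataFrame. Using a separate name keeps `df`
# (the dietary requirements table that dietary_ref reads) from being overwritten.
nutrients_df = pd.read_csv('min_cost_data_nutrients.csv')

# Define a regex pattern for common animal products.
# Whole-word keyword matching is only a heuristic: for example, "Dulce de Leche"
# (a milk-based product) is not caught and ends up labeled plant-based.
animal_product_pattern = r'\b(butter|cheese|milk|kefir|whey|eggnog|beef|chicken|pork|egg|fish|lamb|yogurt|honey|gelatin|cream|lard|sausage|anchovy|shellfish|shrimp|mayo|ham|meat)\b'

# Create a new column 'animal product' that labels each item based on the pattern
nutrients_df['animal product'] = nutrients_df['Ingredient description'].apply(
    lambda x: 'animal product' if re.search(animal_product_pattern, str(x), re.IGNORECASE) else 'plant-based'
)

# Display the first rows labeled plant-based
nutrients_df[nutrients_df['animal product'] == 'plant-based'].head(30)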


Unnamed: 0,ingred_code,Ingredient description,Capric acid,Lauric acid,Myristic acid,Palmitic acid,Palmitoleic acid,Stearic acid,Oleic acid,Linoleic Acid,...,"Vitamin B-12, added",Vitamin B6,Vitamin C,Vitamin D,Vitamin E,"Vitamin E, added",Vitamin K,Water,Zinc,animal product
50,1073,"Dessert topping, semi solid, frozen",0.905,8.836,3.756,3.092,0.241,4.58,1.375,0.305,...,0.0,0.0,0.0,0.0,0.96,0.0,6.3,50.21,0.03,plant-based
114,1225,Dulce de Leche,0.207,0.211,0.75,2.007,0.163,0.805,1.895,0.288,...,0.0,0.016,2.6,0.2,0.2,0.0,1.3,28.71,0.79,plant-based
128,1250,Nutritional supplement for people with diabete...,0.003,0.0,0.0,0.15,0.003,0.059,2.306,0.392,...,0.66,0.22,26.4,1.1,1.46,1.46,8.8,79.74,1.65,plant-based
157,2003,"Spices, basil, dried",0.0,0.0,0.046,1.036,0.171,1.075,1.067,0.199,...,0.0,1.34,0.8,0.0,10.7,0.0,1714.5,10.35,7.1,plant-based
158,2005,"Spices, caraway seed",0.01,0.01,0.04,0.4,0.09,0.11,7.035,3.122,...,0.0,0.36,21.0,0.0,2.5,0.0,0.0,9.87,5.5,plant-based
159,2007,"Spices, celery seed",0.02,0.02,0.02,1.29,0.24,0.39,15.45,3.52,...,0.0,0.89,17.1,0.0,1.07,0.0,0.0,6.04,6.93,plant-based
160,2009,"Spices, chili powder",0.013,0.081,0.189,1.619,0.082,0.396,3.116,7.473,...,0.0,2.094,0.7,0.0,38.14,0.0,105.7,10.75,4.3,plant-based
161,2010,"Spices, cinnamon, ground",0.003,0.006,0.009,0.104,0.001,0.082,0.246,0.044,...,0.0,0.158,3.8,0.0,2.32,0.0,31.2,10.58,1.83,plant-based
162,2011,"Spices, cloves, ground",0.131,0.035,0.263,1.858,0.026,0.683,0.99,2.657,...,0.0,0.391,0.2,0.0,8.82,0.0,141.8,9.87,2.32,plant-based
163,2012,"Spices, coriander leaf, dried",0.0,0.0,0.003,0.096,0.016,0.005,2.216,0.328,...,0.0,0.61,566.7,0.0,1.03,0.0,1359.5,7.3,4.72,plant-based
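
The output above labels "Dulce de Leche" as plant-based, which shows the limits of keyword matching. The short check below illustrates how the `\b` word boundaries behave. It uses a shortened pattern and sample strings for illustration only: "Milk, whole" and "Eggs, scrambled" are made-up descriptions, and `label` and `label_plural` are hypothetical helpers, not part of the notebook.

```python
import re

# Shortened version of the keyword pattern, for illustration only.
strict = re.compile(r'\b(milk|egg|cheese)\b', re.IGNORECASE)
# Same keywords, but tolerating a trailing "s" (one possible tweak).
plural = re.compile(r'\b(milk|egg|cheese)s?\b', re.IGNORECASE)


def label(description, pattern=strict):
    """Label a description using whole-word keyword matching."""
    return 'animal product' if pattern.search(description) else 'plant-based'


def label_plural(description):
    """Same as label, but also matches simple plurals such as 'eggs'."""
    return label(description, pattern=plural)


for text in ['Milk, whole', 'Eggs, scrambled', 'Dulce de Leche', 'Spices, basil, dried']:
    print(f"{text!r}: strict={label(text)}, plural={label_plural(text)}")
```

With the strict pattern, "Eggs, scrambled" is labeled plant-based because `\begg\b` requires a word boundary right after "egg". Names that contain no listed keyword at all, such as "Dulce de Leche", slip through either way, so a manual review of the plant-based rows is probably still needed.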


### **[A]: Nutritional content of different foods**

...

### **[A]: Solution**

I think it'd be cool to make a graph for this :) For example, an overlaid bar chart with a different color for each sex, age groups along the x-axis, and minimum diet cost on the y-axis.

In [29]:
# Code here
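
One common way to pose a minimum-cost diet is as a linear program: choose non-negative food quantities that minimize total price while meeting every nutrient minimum. The sketch below assumes that formulation and uses `scipy.optimize.linprog`. The notebook does not say which solver the team intends to use, and all numbers and names here (`prices`, `nutrients`, `minimums`) are invented toy values, not project data.

```python
import numpy as np
from scipy.optimize import linprog

# Toy numbers, invented for illustration only (not from the project data).
prices = np.array([2.0, 3.0])        # cost per unit of food_a, food_b
nutrients = np.array([[1.0, 2.0],    # nutrient_1 per unit of each food
                      [3.0, 1.0]])   # nutrient_2 per unit of each food
minimums = np.array([4.0, 6.0])      # required amount of each nutrient

# linprog expects "<=" constraints, so "nutrients @ x >= minimums"
# is rewritten as "-nutrients @ x <= -minimums".
result = linprog(c=prices, A_ub=-nutrients, b_ub=-minimums, bounds=(0, None))

print(result.success)  # whether the solver found an optimum
print(result.x)        # quantity of each food in the cheapest diet
print(result.fun)      # total cost of that diet
```

For these toy numbers the cheapest diet uses about 1.6 units of `food_a` and 1.2 units of `food_b`, for a cost of about 6.8. With the real data, `prices` would come from the price sheet, `nutrients` from the nutrient table, and `minimums` from `dietary_ref` for one age-sex group.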

### **[B]: Is your solution edible?**

...

### **[B]: What is total cost for population of interest?**

In [31]:
# Import wbdata
# Code function for total cost
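
A minimal sketch of the total-cost function could weight each group's per-person diet cost by the number of people in that group. It assumes both inputs are dictionaries keyed by the same group labels as the requirements table. `total_population_cost` is a hypothetical name, the numbers are placeholders, and how the population counts would be pulled from `wbdata` is left open.

```python
def total_population_cost(cost_per_person, population):
    """Sum (cost per person) x (head count) over all population groups.

    Both arguments are dicts keyed by group label, e.g. 'F 19-30'.
    """
    missing = set(population) - set(cost_per_person)
    if missing:
        raise KeyError(f"No diet cost for groups: {sorted(missing)}")
    return sum(cost_per_person[group] * count
               for group, count in population.items())


# Placeholder values for illustration only (not real prices or counts).
daily_cost = {'F 19-30': 2.0, 'M 19-30': 2.5}
population = {'F 19-30': 100, 'M 19-30': 200}

print(total_population_cost(daily_cost, population))  # 700.0
```

Raising an error for a group without a cost avoids silently undercounting the total when the two data sources use different age bands.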

### **[C]: Sensitivity of Solution**

In [30]:
# Code here