---
title: Are Americans getting less healthy?
---

*By: Esther Aduamah and Brandon Spiller*


## Introduction

Americans are known for having an extremely unhealthy diet. In our Project We 

What is the question you are trying to answer or the story that you are trying to tell?
Why is this question or story important?
What were the main steps your project made towards answering the question or telling the story?

[Source](https://www.uab.edu/inquiro/issues/past-issues/volume-9/the-effects-of-an-american-diet-on-health)

## Methodology
The United States Department of Agriculture (USDA) keeps an extensive database of information about food. All of the data we used is hosted by the [USDA](https://www.usda.gov/). We used the following two datasets: the public-use data files and codebooks from the [FoodAPS National Household Food Acquisition and Purchase Survey](https://www.ers.usda.gov/data-products/foodaps-national-household-food-acquisition-and-purchase-survey) and [FoodData Central (FDC)](https://fdc.nal.usda.gov/).

The **FoodData Central** data was accessed programmatically through the USDA's API and stored as a JSON file.
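A minimal sketch of this access pattern, assuming the public FoodData Central search endpoint and an API key issued by the USDA. The helper names and the query parameters shown are illustrative assumptions, not a copy of our project code; only the `food_nutrition.json` filename comes from our notebook.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint: the public FoodData Central search API.
FDC_SEARCH_URL = "https://api.nal.usda.gov/fdc/v1/foods/search"


def build_search_url(query, api_key, page_size=25):
    """Build a search URL for one food query (parameter names are assumptions)."""
    params = {"api_key": api_key, "query": query, "pageSize": page_size}
    return f"{FDC_SEARCH_URL}?{urllib.parse.urlencode(params)}"


def fetch_foods(query, api_key):
    """Request one page of results and return the list stored under "foods"."""
    with urllib.request.urlopen(build_search_url(query, api_key)) as response:
        payload = json.load(response)
    return payload.get("foods", [])


def save_foods(foods, path="food_nutrition.json"):
    """Store the collected food records as a JSON list."""
    with open(path, "w", encoding="utf-8") as file:
        json.dump(foods, file, indent=2)
```

Under these assumptions, the records returned by `fetch_foods` for each food of interest would be collected into one list and written once with `save_foods`.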

The **FoodAPS National Household Food Acquisition and Purchase Survey** data was downloaded from the USDA website as CSV files and read into our code as pandas DataFrames.

Below is the specific data that we used from each of the datasets:

- **FoodAPS National Household Food Acquisition and Purchase Survey**
    - nutrient breakdown of each food item (`faps_fahnutrients`)
    - groceries and food items each household bought (`faps_fahitem_puf`)
    - date and details of each grocery shopping event (`faps_fahevent_puf`)
- **FoodData Central**
    - specific nutrient data (specifically the amount of each nutrient within each food) for many different foods, ingredients, and meals

We loaded and stored the data from the **FoodAPS National Household Food Acquisition and Purchase Survey** using pandas. First, we loaded the three FoodAPS CSV files into three separate DataFrames. Next, we used `merge` to link the food purchase data with the nutrient information and the date on which the household bought the food. Finally, we used a while loop to go through each purchased item, determine which month it falls in, and add its nutrient amounts to that month's totals.

We retrieved and stored the data from **FoodData Central** by populating a JSON file with the responses we received from the USDA FoodData Central API.
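The merge-and-aggregate steps for FoodAPS can also be written without an explicit loop. The sketch below is a vectorized alternative using `groupby`, not the while loop our notebook uses. The join keys (`hhnum`, `eventid`, `itemnum`) are assumptions about the FoodAPS column names, and the tiny tables are made up for illustration.

```python
import pandas as pd

NUTRIENTS = ["protein", "add_sugars", "carb", "totfat"]


def monthly_nutrient_totals(items, nutrients, events):
    """Merge the three tables and sum each nutrient per calendar month."""
    # Assumed join keys: items and nutrients share (hhnum, eventid, itemnum);
    # events are identified by (hhnum, eventid).
    merged = items.merge(nutrients, on=["hhnum", "eventid", "itemnum"], how="inner")
    merged = merged.merge(
        events[["hhnum", "eventid", "date"]], on=["hhnum", "eventid"], how="left"
    )
    merged["month"] = pd.to_datetime(merged["date"]).dt.to_period("M")
    # sum() skips NaN by default, so no explicit missing-value check is needed.
    return merged.groupby("month")[NUTRIENTS].sum()


# Tiny made-up tables, only to show the shape of the result.
items = pd.DataFrame({"hhnum": [1, 1, 2], "eventid": [10, 10, 20], "itemnum": [1, 2, 1]})
nutrients = pd.DataFrame({
    "hhnum": [1, 1, 2], "eventid": [10, 10, 20], "itemnum": [1, 2, 1],
    "protein": [5.0, None, 3.0], "add_sugars": [1.0, 2.0, 0.0],
    "carb": [10.0, 20.0, 5.0], "totfat": [2.0, 1.0, 4.0],
})
events = pd.DataFrame(
    {"hhnum": [1, 2], "eventid": [10, 20], "date": ["2012-05-03", "2012-06-11"]}
)

print(monthly_nutrient_totals(items, nutrients, events))
```

The result has one row per month and one column per nutrient, which is the same shape as the `nutrients_month_total` table built in the Results section.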


## Results
Our first v


In [6]:
import pandas as pd
%matplotlib inline
import matplotlib.pyplot as plt
from perhaps_interest import merge_items_with_nutrients

fah_items = pd.read_csv("csv_data/faps_fahitem_puf.csv", encoding="latin-1")
fah_nutrients = pd.read_csv(
    "csv_data/faps_fahnutrients.csv", encoding="latin-1"
)
fahevent = pd.read_csv("csv_data/faps_fahevent_puf.csv", encoding="latin-1")

merged_data = merge_items_with_nutrients(fah_items, fah_nutrients, fahevent)

# Create 'month' column and group by month
merged_data["month"] = merged_data["date"].dt.to_period("M")
merged_data = merged_data.sort_values(by="month").reset_index(drop=True)  # Sorting by month in asc order
nutrients_month_total = pd.DataFrame(columns=["protein", "add_sugars", "carb", "totfat"])  # Making a new df for month aggregate

i = 0
months = list(merged_data["month"].unique())
months.sort()
for month in months:  # Cycling through each month
    one_month_total = [0, 0, 0, 0]  # Running totals for this month: protein, add_sugars, carb, totfat
    while (i < merged_data.shape[0]) and (merged_data.loc[i, "month"] == month):  # While we are still in this month
        if not pd.isna(merged_data.loc[i, "protein"]):  # Checking for NaN because it messes up +=
            one_month_total[0] += merged_data.loc[i, "protein"]
        if not pd.isna(merged_data.loc[i, "add_sugars"]):
            one_month_total[1] += merged_data.loc[i, "add_sugars"]
        if not pd.isna(merged_data.loc[i, "carb"]):
            one_month_total[2] += merged_data.loc[i, "carb"]
        if not pd.isna(merged_data.loc[i, "totfat"]):
            one_month_total[3] += merged_data.loc[i, "totfat"]
        i += 1
    nutrients_month_total.loc[month, "protein"] = one_month_total[0]
    nutrients_month_total.loc[month, "add_sugars"] = one_month_total[1]
    nutrients_month_total.loc[month, "carb"] = one_month_total[2]
    nutrients_month_total.loc[month, "totfat"] = one_month_total[3]

# Plot nutrient trends
plt.figure(figsize=(12, 6))

str_month = [str(month) for month in months]  # Stringify months for plotting

for nutrient in ["protein", "add_sugars", "carb", "totfat"]:
    plt.plot(
        str_month, nutrients_month_total[nutrient], label=nutrient
    )

plt.title("Monthly Nutrient Consumption from Food-at-Home Purchases")
plt.xlabel("Month")
plt.ylabel("Total Nutrient Amount")
plt.legend()
plt.grid(True)
plt.tight_layout()
plt.show()


In [3]:
import pandas as pd
%matplotlib inline
import matplotlib.pyplot as plt
from food_aps_mani import get_household_totals, split_dv_levels

df = pd.read_csv("csv_data/faps_fahnutrients.csv", encoding="latin-1")

# FDA Daily Values for a 2,000-calorie diet (grams)
daily_values_dict = {"protein": 50, "totfat": 78, "carb": 275, "totsug": 50}

nutrient_labels = {
    "protein": "Protein",
    "totfat": "Fat",
    "carb": "Carbohydrates",
    "totsug": "Sugars",
}

nutrients = list(daily_values_dict.keys())

household_totals_df = get_household_totals(df, nutrients)
avg_grams_consumed = household_totals_df[nutrients].mean()

percent_dv_consumed = {
    nutrient: (avg_grams_consumed[nutrient] / dv) * 100
    for nutrient, dv in daily_values_dict.items()
}

labels, under_dv, over_dv = split_dv_levels(percent_dv_consumed, nutrient_labels)

# Plotting daily values
fig, ax = plt.subplots(figsize=(10, 6))

ax.bar(labels, under_dv, label="Consumed (within DV)", color="blue")
ax.bar(
    labels, over_dv, bottom=under_dv, label="Consumed (over DV)", color="purple"
)

# 100% DV Line
ax.axhline(100, color="red", linestyle="--", linewidth=1.2, label="100% DV")

ax.set_ylabel("Percent of Daily Value (%)")
ax.set_title("Average Nutrient Intake as % of FDA Daily Value")
ax.legend()
ax.grid(True)
plt.tight_layout()  # helps to avoid overlap
plt.show()
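
`split_dv_levels` is imported from our project module and its body is not shown here. A minimal sketch of the kind of split the stacked bars rely on, assuming the helper caps each percentage at 100 for the lower bar and returns the excess for the upper bar. The function name `split_percent_dv` and the example numbers are hypothetical.

```python
def split_percent_dv(percent_dv, labels):
    """Split each percentage into the part up to 100% and the part above it."""
    names, under, over = [], [], []
    for nutrient, percent in percent_dv.items():
        names.append(labels.get(nutrient, nutrient))
        under.append(min(percent, 100.0))        # height of the lower bar
        over.append(max(percent - 100.0, 0.0))   # height stacked above 100%
    return names, under, over


# Made-up percentages, only to illustrate the split.
example = {"protein": 80.0, "totsug": 150.0}
names, under, over = split_percent_dv(example, {"protein": "Protein", "totsug": "Sugars"})
print(names, under, over)
```

With this split, a nutrient below its Daily Value gets only a lower bar, while one above it gets a lower bar of exactly 100 plus a stacked bar for the excess.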


In [4]:
import json
import matplotlib.pyplot as plt
from datamanipulation import get_food_details

with open("food_nutrition.json", encoding="utf-8") as file:
    data = json.load(file)

# Top American foods according to Google, used as keys for matching USDA items
american_food_keys = [
    "Fast Food, Pizza",
    "Fried Chicken",
    "Hamburger",
    "Hot Dog",
    "Grilled Cheese Sandwich",
    "Spaghetti with Meatballs",
    "Taco",
    "Caesar Salad",
    "Pancakes",
]
# Preset every food to None until a matching USDA item is found
matched_food = {key: None for key in american_food_keys}

for food in data:
    description = food.get("description", "").lower()

    for american_food in american_food_keys:
        # Keep only the first USDA item whose description contains the food name
        if american_food.lower() in description and matched_food[american_food] is None:
            matched_food[american_food] = food

food_dict = {}
for food_name, usda_item in matched_food.items():
    if usda_item:  # Ensure the item is not None
        food_dict[food_name] = get_food_details(usda_item)
    else:
        food_dict[food_name] = [0] * 7


# Total of each nutrient across the selected foods
total_nutrients = []
for i in range(7):
    total_one_nutrient = sum(values[i] for values in food_dict.values())
    total_nutrients.append(total_one_nutrient)

# Order of values: protein, fat, carbs, sugars, fiber, sodium, water
nrf_score_list = []
for food, values in food_dict.items():
    nrf_score = ((values[0] / 50) + (values[4] / 28) + (values[6] / 500)) - (
        (values[1] / 20)
        + (values[3] / 50)
        + (values[5] / 2000)
        + (values[2] / 275)
    )
    nrf_score_list.append(nrf_score)

plt.bar(american_food_keys, nrf_score_list)
plt.title("NRF Score of Common American Foods")
plt.xlabel("Foods")
plt.ylabel("NRF Score")
plt.tight_layout()
plt.show()



[101.77, 39.36, 158.6, 32.99, 13.299999999999999, 4727, 438.6]
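
The score computed in the cell above can be read as a sum of encouraged nutrients minus a sum of nutrients to limit, each divided by a reference amount. A small sketch restating it with named arguments: the reference amounts are copied from the cell above, while the function name and the example values are illustrative and not taken from our data.

```python
def nrf_style_score(protein, fat, carbs, sugars, fiber, sodium, water):
    """Encouraged nutrients minus nutrients to limit, each scaled by a reference amount."""
    encouraged = protein / 50 + fiber / 28 + water / 500
    limited = fat / 20 + sugars / 50 + sodium / 2000 + carbs / 275
    return encouraged - limited


# Made-up values for one food, in the same order as the lists in the cell above:
# protein, fat, carbs, sugars, fiber, sodium, water.
example_values = [10.0, 10.0, 27.5, 5.0, 2.8, 400.0, 50.0]
print(round(nrf_style_score(*example_values), 3))
```

A positive score means the encouraged nutrients outweigh the limited ones under these reference amounts, and a negative score means the opposite.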


