# Introduction

This project analyzes over 350 food items from Stony Brook University's dining hall. The data was gathered from the university's dining hall website and organized into an Excel spreadsheet. Using Python (pandas, NumPy, Matplotlib, and seaborn), the analysis reads this data from the foods.csv file to help students make informed, healthy dietary choices that align with their nutritional goals. It also produces summary statistics and visualizations that highlight the health benefits and drawbacks of the foods available at the dining hall.

Important Note: Each food's nutritional value listed is based on a single serving.

## Food and Nutrition

Food and nutrition are vital to the body’s overall function, providing the energy and nutrients needed for growth, maintenance, and disease prevention. Macronutrients (carbohydrates, fats, and proteins) fuel bodily processes, build and repair tissues, and support brain health. The calories they supply provide the energy for every activity, from basic metabolic functions to complex cognitive tasks. Dietary fats and cholesterol play roles in hormone production, brain development, and protecting vital organs, while sodium helps regulate fluid balance, nerve function, and muscle contractions. Fiber, a type of carbohydrate the body cannot fully digest, aids digestion, supports a healthy gut, and is associated with a lower risk of heart disease and diabetes.

Balanced nutrition is key to preventing chronic diseases such as heart disease, diabetes, and obesity. Additionally, mindful eating—paying attention to what and how much we consume—promotes a healthy gut and mental well-being, which is particularly important for students engaged in rigorous academic studies. By understanding and practicing mindful eating, individuals can enhance both their physical health and peace of mind, leading to better overall performance in daily life.

## Importing Necessary Libraries

In [15]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

## Cleaning data

Creating a new variable to store the data from the foods CSV file.

In [16]:
food = pd.read_csv("foods.csv")
food

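|     | Name | Calories | Fat (g) | Cholesterol (mg) | Sodium (mg) | Carbohydrate (g) | Fiber (g) | Protein (g) | Type |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Scrambled eggs with cream and butter | 670 | 19.0 | 582 | 460 | 1 | 0 | 19 | Breakfast |
| 1 | Scrambled egg whites | 70 | 0.0 | 0 | 220 | 1 | 0 | 14 | Breakfast |
| 2 | Cheesy tofu fajita scramble | 180 | 12.0 | 2 | 330 | 6 | 4 | 14 | Breakfast |
| 3 | Pork bacon | 60 | 4.5 | 13 | 220 | 0 | 0 | 4 | Breakfast |
| 4 | Asparagus cheddar frittata | 240 | 17.0 | 271 | 510 | 6 | 2 | 16 | Breakfast |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 386 | Chicken noodle soup | 250 | 3.0 | 56 | 1500 | 39 | 4 | 16 | Lunch/Dinner |
| 387 | Broccoli and cheddar cheese soup | 220 | 17.0 | 41 | 720 | 13 | 1 | 5 | Lunch/Dinner |
| 388 | Tomato basil bisque soup | 160 | 7.0 | 25 | 250 | 20 | 1 | 4 | Lunch/Dinner |
| 389 | Fresh herb croutons soup | 110 | 6.0 | 0 | 160 | 12 | 1 | 2 | Lunch/Dinner |
| 390 | Cajun chicken penne | 420 | 16.0 | 22 | 1100 | 51 | 4 | 18 | Lunch/Dinner |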


Previewing the first and last five rows to confirm that the data loaded correctly.

In [17]:
food.head()

|     | Name | Calories | Fat (g) | Cholesterol (mg) | Sodium (mg) | Carbohydrate (g) | Fiber (g) | Protein (g) | Type |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Scrambled eggs with cream and butter | 670 | 19.0 | 582 | 460 | 1 | 0 | 19 | Breakfast |
| 1 | Scrambled egg whites | 70 | 0.0 | 0 | 220 | 1 | 0 | 14 | Breakfast |
| 2 | Cheesy tofu fajita scramble | 180 | 12.0 | 2 | 330 | 6 | 4 | 14 | Breakfast |
| 3 | Pork bacon | 60 | 4.5 | 13 | 220 | 0 | 0 | 4 | Breakfast |
| 4 | Asparagus cheddar frittata | 240 | 17.0 | 271 | 510 | 6 | 2 | 16 | Breakfast |


In [18]:
food.tail()

|     | Name | Calories | Fat (g) | Cholesterol (mg) | Sodium (mg) | Carbohydrate (g) | Fiber (g) | Protein (g) | Type |
|---|---|---|---|---|---|---|---|---|---|
| 386 | Chicken noodle soup | 250 | 3.0 | 56 | 1500 | 39 | 4 | 16 | Lunch/Dinner |
| 387 | Broccoli and cheddar cheese soup | 220 | 17.0 | 41 | 720 | 13 | 1 | 5 | Lunch/Dinner |
| 388 | Tomato basil bisque soup | 160 | 7.0 | 25 | 250 | 20 | 1 | 4 | Lunch/Dinner |
| 389 | Fresh herb croutons soup | 110 | 6.0 | 0 | 160 | 12 | 1 | 2 | Lunch/Dinner |
| 390 | Cajun chicken penne | 420 | 16.0 | 22 | 1100 | 51 | 4 | 18 | Lunch/Dinner |


Checking the dimensions of the data (rows, columns) and a summary of each column's data type and non-null count.

In [19]:
print(food.shape)  # (rows, columns)
food.info()        # info is a method: without the parentheses it only shows the bound method

The dataset contains 391 food entries (rows, indexed 0 to 390) and 9 columns: Name, Calories, Fat (g), Cholesterol (mg), Sodium (mg), Carbohydrate (g), Fiber (g), Protein (g), and Type.

Replacing the placeholder values 'n/a' and 'Na' with 0. By default, `pd.read_csv` already reads 'n/a' as a missing value (NaN), so those cells are handled as nulls by the `dropna` step further down rather than set to 0. 'Na' is not on pandas' default list of missing-value markers, so it stays a string; it is what made the numeric conversion fail in the original run (`ValueError: Unable to parse string "Na" at position 21`).

In [20]:
food = food.replace(["n/a", "Na"], 0)

Converting the nutrient columns (calories, fat, cholesterol, sodium, carbohydrates, fiber, and protein) to numeric types. Columns with decimal values, such as fat (e.g. 4.5 g for pork bacon), become floats rather than integers.

In [21]:
nutrient_cols = ["Calories", "Fat (g)", "Cholesterol (mg)", "Sodium (mg)",
                 "Carbohydrate (g)", "Fiber (g)", "Protein (g)"]
for col in nutrient_cols:
    food[col] = pd.to_numeric(food[col])  # raises ValueError if any string is still unparseable
food.dtypes
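Instead of discovering placeholder strings one `ValueError` at a time, the unparseable values can be listed up front. The sketch below relies on `pd.to_numeric(..., errors="coerce")`, which turns anything it cannot parse into NaN, and reports the values that were present but failed to parse. The helper `non_numeric_tokens` and the sample values are illustrative assumptions, not part of the original notebook.

```python
import pandas as pd

def non_numeric_tokens(df: pd.DataFrame, columns) -> dict:
    """Map each column to the sorted distinct values that pd.to_numeric cannot parse."""
    found = {}
    for col in columns:
        parsed = pd.to_numeric(df[col], errors="coerce")  # unparseable values become NaN
        # Keep values that were present (not already missing) but failed to parse.
        bad = df.loc[parsed.isna() & df[col].notna(), col]
        if not bad.empty:
            found[col] = sorted(bad.astype(str).unique())
    return found

# Hypothetical sample, built directly rather than with read_csv, so "n/a" stays a string here.
sample = pd.DataFrame({
    "Cholesterol (mg)": ["582", "n/a", "0"],
    "Fiber (g)": ["0", "Na", None],
    "Protein (g)": ["19", "14", "14"],
})
print(non_numeric_tokens(sample, sample.columns))
```

On this notebook's data it would be run on the freshly loaded `food` DataFrame with the seven nutrient column names, before any replacement.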

Assessing data quality to identify any remaining nulls or 'n/a' values.

In [None]:
print(food.isnull().any())
print(food.describe())

print(food[food["Cholesterol (mg)"] == "n/a"])
print(food[food["Fiber (g)"] == "n/a"])
#No more "n/a" values in columns.


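Removing rows with null values, which reduces the dataset to 362 rows. `dropna` returns a new DataFrame, so the result has to be assigned back to `food`.

In [None]:
food = food.dropna()
food.shape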
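A further plausibility check, not part of the original analysis, is to compare each listed calorie count with the estimate from the Atwater general factors (9 kcal per gram of fat, 4 per gram of carbohydrate, 4 per gram of protein). For the rows in the previews above, "Scrambled eggs with cream and butter" comes out at 19 × 9 + 1 × 4 + 19 × 4 = 251 kcal against the listed 670, while the other nine previewed rows agree within 20%. That gap may indicate a data-entry error worth re-checking against the dining hall website. The helper name and the 25% tolerance below are assumptions for illustration.

```python
import pandas as pd

# Atwater general factors, in kcal per gram. Fiber is treated like other carbohydrate
# and alcohol is ignored, so the estimate is approximate.
KCAL_PER_GRAM = {"Fat (g)": 9, "Carbohydrate (g)": 4, "Protein (g)": 4}

def flag_calorie_mismatches(df: pd.DataFrame, tolerance: float = 0.25) -> pd.DataFrame:
    """Rows whose listed calories differ from the macronutrient estimate by more
    than `tolerance`, measured as a fraction of the estimate."""
    estimated = sum(df[col] * factor for col, factor in KCAL_PER_GRAM.items())
    # A zero estimate becomes NaN instead of dividing by zero; NaN never exceeds the tolerance.
    gap = (df["Calories"] - estimated).abs() / estimated.where(estimated > 0)
    checked = df.assign(Estimated=estimated, Gap=gap)
    return checked[checked["Gap"] > tolerance]

# Three breakfast rows copied from the head() preview.
sample = pd.DataFrame({
    "Name": ["Scrambled eggs with cream and butter", "Scrambled egg whites",
             "Asparagus cheddar frittata"],
    "Calories": [670, 70, 240],
    "Fat (g)": [19.0, 0.0, 17.0],
    "Carbohydrate (g)": [1, 1, 6],
    "Protein (g)": [19, 14, 16],
})
print(flag_calorie_mismatches(sample)[["Name", "Calories", "Estimated"]])
```

After the numeric conversion, the same check would be applied to the full table as `flag_calorie_mismatches(food)`.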

## Data Visualization and Analysis

In [None]:
# Four bivariate KDE panels. Axes are not shared: each panel plots different variables,
# and calories span hundreds while the other nutrients span tens of grams.
f, axes = plt.subplots(2, 2, figsize=(10, 10))

cmap = sns.cubehelix_palette(start=0.0, light=1, as_cmap=True)
sns.kdeplot(x=food['Carbohydrate (g)'], y=food['Protein (g)'], cmap=cmap, fill=True, ax=axes[0, 0])
axes[0, 0].set(xlim=(-10, 50), ylim=(-30, 70), title='Carbs and Protein')

cmap = sns.cubehelix_palette(start=0.25, light=1, as_cmap=True)
sns.kdeplot(x=food['Fat (g)'], y=food['Carbohydrate (g)'], cmap=cmap, fill=True, ax=axes[0, 1])
axes[0, 1].set(xlim=(-10, 50), ylim=(-30, 70), title='Carbs and Fat')

cmap = sns.cubehelix_palette(start=0.45, light=1, as_cmap=True)
sns.kdeplot(x=food['Fiber (g)'], y=food['Fat (g)'], cmap=cmap, fill=True, ax=axes[1, 0])
axes[1, 0].set(xlim=(-10, 50), ylim=(-30, 70), title='Fiber and Fat')

cmap = sns.cubehelix_palette(start=0.56, light=1, as_cmap=True)
sns.kdeplot(x=food['Carbohydrate (g)'], y=food['Calories'], cmap=cmap, fill=True, ax=axes[1, 1])
# No fixed y-limit here: a 70-calorie cap would hide most of the distribution.
axes[1, 1].set(xlim=(-10, 100), title='Calories and Carbs')

f.tight_layout()
plt.show()
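The KDE panels show the shape of each relationship; a single number per panel can complement them. A minimal sketch computing the correlation coefficient for the same four pairs with pandas' `Series.corr` (Pearson by default; `method="spearman"` gives a rank-based alternative). The helper name is hypothetical, and the ten-row sample is just the rows from the head/tail previews, so its values are illustrative only.

```python
import pandas as pd

# The four (x, y) pairs drawn in the KDE grid.
PAIRS = [
    ("Carbohydrate (g)", "Protein (g)"),
    ("Fat (g)", "Carbohydrate (g)"),
    ("Fiber (g)", "Fat (g)"),
    ("Carbohydrate (g)", "Calories"),
]

def pair_correlations(df: pd.DataFrame, method: str = "pearson") -> pd.Series:
    """Correlation coefficient for each plotted pair, keyed as 'x vs y'."""
    return pd.Series({f"{x} vs {y}": df[x].corr(df[y], method=method) for x, y in PAIRS})

# (Calories, Fat, Carbohydrate, Fiber, Protein) for the ten previewed rows.
rows = [
    (670, 19.0, 1, 0, 19), (70, 0.0, 1, 0, 14), (180, 12.0, 6, 4, 14),
    (60, 4.5, 0, 0, 4), (240, 17.0, 6, 2, 16), (250, 3.0, 39, 4, 16),
    (220, 17.0, 13, 1, 5), (160, 7.0, 20, 1, 4), (110, 6.0, 12, 1, 2),
    (420, 16.0, 51, 4, 18),
]
sample = pd.DataFrame(rows, columns=["Calories", "Fat (g)", "Carbohydrate (g)",
                                     "Fiber (g)", "Protein (g)"])
print(pair_correlations(sample).round(2))
```

On the cleaned data it would be called as `pair_correlations(food)`.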


## Breakdown of Breakfast

Displaying all the different types of breakfast foods served in the SBU dining hall, along with their corresponding nutritional values.

In [None]:
breakfast = food.loc[food["Type"]=="Breakfast"]
breakfast

In [None]:
breakfast_sort1 = breakfast.sort_values("Protein (g)", ascending=False)
breakfast_sort1.head()

Top 5 breakfast foods with the highest amount of protein, in grams.

In [None]:
breakfast_sort1.tail()

The 5 breakfast foods with the least protein, in grams.
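Each sort, `head()`, and `tail()` combination in this section can also be expressed with pandas' `DataFrame.nlargest` and `DataFrame.nsmallest`. A sketch with a hypothetical helper `top_bottom`; note that `nsmallest` lists the lowest value first, whereas `tail()` of a descending sort lists it last.

```python
import pandas as pd

def top_bottom(df: pd.DataFrame, column: str, n: int = 5):
    """Return (n rows with the highest `column`, n rows with the lowest `column`)."""
    # nlargest lists the highest value first; nsmallest lists the lowest value first.
    return df.nlargest(n, column), df.nsmallest(n, column)

# The five breakfast rows from the head() preview.
sample = pd.DataFrame({
    "Name": ["Scrambled eggs with cream and butter", "Scrambled egg whites",
             "Cheesy tofu fajita scramble", "Pork bacon", "Asparagus cheddar frittata"],
    "Protein (g)": [19, 14, 14, 4, 16],
    "Carbohydrate (g)": [1, 1, 6, 0, 6],
})
high, low = top_bottom(sample, "Protein (g)", n=2)
print(high["Name"].tolist())
print(low["Name"].tolist())
```

In the notebook, `top_bottom(breakfast, "Protein (g)")` would return both tables for the protein comparison in one call.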

In [None]:
plt.bar(breakfast_sort1["Name"], breakfast_sort1["Protein (g)"], color = "g")
plt.xticks(rotation = 90)
plt.show()

Bar graph of breakfast foods sorted by protein content in descending order. Foods like {ham and cheddar scramble, scrambled eggs with cream and butter, and sausage egg and cheese croissant} appear to provide the most protein, while foods like {breakfast potatoes, tater tots, and blueberry compote} provide the least.

In [None]:
breakfast_sort2 = breakfast.sort_values("Carbohydrate (g)", ascending=False)
breakfast_sort2.head()

Top 5 breakfast foods with the most carbs, in grams.

In [None]:
breakfast_sort2.tail()

The 5 breakfast foods with the fewest carbs, in grams.

In [None]:
plt.plot(breakfast_sort2["Name"], breakfast_sort2["Carbohydrate (g)"])
plt.xticks(rotation = 90)
plt.show()

Line graph of breakfast foods sorted by carbohydrate content; the line slopes downward because the foods are sorted in descending order. Foods like {Belgian waffle, blueberry pancakes, and French toast sticks} appear to have the most carbs, while foods like {scrambled eggs with cream and butter, bacon, and chicken breakfast sausage patty} have the fewest.

In [None]:
breakfast_sort3 = breakfast.sort_values("Calories", ascending=False)
breakfast_sort3.head()

Top 5 breakfast foods with the highest amount of calories.

In [None]:
breakfast_sort3.tail()

The 5 breakfast foods with the fewest calories.

In [None]:
plt.scatter(breakfast_sort3["Name"], breakfast_sort3["Calories"])
plt.xticks(rotation = 90)
plt.show()

Scatter plot of breakfast foods sorted by calories, trending downward because of the descending sort. Foods like {sausage egg and cheese croissant, Belgian waffle, and ham and cheddar scramble} appear to have the most calories, while foods like {breakfast potatoes, country ham, and blueberry compote} have the fewest.

In [None]:
best_breakfast = food[(food["Type"]=="Breakfast") & (food["Carbohydrate (g)"]<10) & (food["Protein (g)"]>15)]
best_breakfast

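A healthy diet typically favors higher protein intake over carbohydrates. Among the breakfast foods that pass this filter (under 10 g of carbs and over 15 g of protein), the top 2 are scrambled eggs with cream and butter and the ham and cheddar scramble. A healthy breakfast is often around 300-400 calories. Both foods, however, are among the highest-calorie breakfast items: one serving of scrambled eggs with cream and butter is listed at 670 calories, already above that range, so a single serving (or less) of either is the practical limit for a good breakfast option.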
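The portion estimate above can be checked with simple division: servings = target calories / calories per serving. A small sketch with a hypothetical helper `servings_range`, applied to the 670-calorie serving of scrambled eggs with cream and butter and a 300-400 calorie breakfast target.

```python
def servings_range(calories_per_serving: float, low_kcal: float, high_kcal: float) -> tuple:
    """Servings that land exactly at the low and high ends of a calorie target."""
    if calories_per_serving <= 0:
        raise ValueError("calories_per_serving must be positive")
    return low_kcal / calories_per_serving, high_kcal / calories_per_serving

# Scrambled eggs with cream and butter: 670 calories per serving, 300-400 calorie target.
low, high = servings_range(670, 300, 400)
print(f"{low:.2f}-{high:.2f} servings")  # prints 0.45-0.60 servings
```

For the whole filtered table, the same ratios are `300 / best_breakfast["Calories"]` and `400 / best_breakfast["Calories"]`.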

In [None]:
lunch_dinner = food.loc[food["Type"]=="Lunch/Dinner"]
lunch_dinner

All the different types of lunch/dinner foods served in the SBU dining hall, with their corresponding nutritional values.

In [None]:
ld_sort1 = lunch_dinner.sort_values("Protein (g)", ascending=True)
ld_sort1.tail()

Top 5 lunch/dinner foods with the highest amount of protein, in grams.

In [None]:
ld_sort1.head()

The 5 lunch/dinner foods with the least protein, in grams.

In [None]:
ld_sort2 = lunch_dinner.sort_values("Protein (g)", ascending=True).tail(40)
plt.plot(ld_sort2["Name"], ld_sort2["Protein (g)"])
plt.xticks(rotation = 90)
plt.show()

Line graph of the 40 lunch/dinner foods with the most protein, rising because the data is sorted in ascending order. Within this subset, foods like {baked beef and cheese ziti, maple glazed pork, and chicken pot pie with potatoes} have the least protein, while foods like {chicken cordon bleu sandwich, grilled buffalo chicken sandwich, and katsu pork cutlet} have the most.
<br>
Note: The data is reduced to a 40-row subset (from the original 140 rows) so that every food name is legible.

In [None]:
ld_sort3 = lunch_dinner.sort_values("Protein (g)", ascending=True).head(40)
plt.plot(ld_sort3["Name"], ld_sort3["Protein (g)"])
plt.xticks(rotation = 90)
plt.show()

Line graph of the 40 lunch/dinner foods with the least protein, rising in a stair-like pattern because the data is sorted in ascending order and many foods likely share the same whole-gram protein value. Within this subset, foods like {apple compote, mango pico de gallo, and fried plantains} have the least protein, while foods like {broccoli rice casserole, yellow rice, and corn nuggets} have the most.
<br>
Note: The data is reduced to a 40-row subset (from the original 140 rows) so that every food name is legible.

In [None]:
ld_sort4 = lunch_dinner.sort_values("Carbohydrate (g)", ascending=True)
ld_sort4.tail()

Top 5 lunch/dinner foods with the highest amount of carbs, in grams.

In [None]:
ld_sort4.head()

The 5 lunch/dinner foods with the fewest carbs, in grams.

In [None]:
ld_sort5 = lunch_dinner.sort_values("Carbohydrate (g)", ascending=True).tail(40)
plt.bar(ld_sort5["Name"], ld_sort5["Carbohydrate (g)"])
plt.xticks(rotation = 90)
plt.show()

Bar graph of the 40 lunch/dinner foods with the most carbs, rising because the data is sorted in ascending order. Within this subset, foods like {creamy rigatoni alla vodka, Puerto Rican beef sancocho, and spinach and bacon alfredo pizza} have the fewest carbs, while foods like {Asian glazed tofu wrap, teriyaki black bean burger, and spicy black bean burger} have the most.
<br>
Note: The data is reduced to a 40-row subset (from the original 140 rows) so that every food name is legible.
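Long food names tend to be easier to read on a horizontal bar chart than as rotated x-tick labels, and scaling the figure height with the number of bars may keep all names legible. A sketch with a hypothetical helper `plot_top_n`; the single-letter names in the sample are placeholders, not dining hall data.

```python
import matplotlib.pyplot as plt
import pandas as pd

def plot_top_n(df: pd.DataFrame, column: str, n: int = 40, ax=None):
    """Horizontal bar chart of the n rows with the highest `column`, largest bar on top."""
    # barh draws the first row at the bottom, so reverse to put the largest value on top.
    subset = df.nlargest(n, column).iloc[::-1]
    if ax is None:
        # Scale the figure height with the number of bars so every label has room.
        _, ax = plt.subplots(figsize=(8, max(3, 0.25 * len(subset))))
    # Duplicate names would share one category position on the y-axis.
    ax.barh(subset["Name"], subset[column])
    ax.set_xlabel(column)
    ax.figure.tight_layout()
    return ax

sample = pd.DataFrame({"Name": ["A", "B", "C", "D", "E", "F"],
                       "Carbohydrate (g)": [5, 1, 9, 3, 7, 2]})
ax = plot_top_n(sample, "Carbohydrate (g)", n=3)
```

In the notebook this would be `plot_top_n(lunch_dinner, "Carbohydrate (g)")` followed by `plt.show()`.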

In [None]:
ld_sort6 = lunch_dinner.sort_values("Carbohydrate (g)", ascending=True).head(40)
plt.bar(ld_sort6["Name"], ld_sort6["Carbohydrate (g)"])
plt.xticks(rotation = 90)
plt.show()

Bar graph of the 40 lunch/dinner foods with the fewest carbs, rising because the data is sorted in ascending order. Within this subset, foods like {buffalo chicken breast, roast beef with Au Jus, and mustard and herb pork loin} have the fewest carbs, while foods like {pork medallions, potato kale soup, and broccoli cheese soup} have the most.
<br>
Note: The data is reduced to a 40-row subset (from the original 140 rows) so that every food name is legible.

In [None]:
ld_sort7 = lunch_dinner.sort_values("Calories", ascending=True)
ld_sort7.tail()

Top 5 lunch/dinner foods with the highest amount of calories.

In [None]:
ld_sort7.head()

The 5 lunch/dinner foods with the fewest calories.

In [None]:
ld_sort8 = lunch_dinner.sort_values("Calories", ascending=True).tail(40)
plt.scatter(ld_sort8["Name"], ld_sort8["Calories"])
plt.xticks(rotation = 90)
plt.show()

Scatter plot of the 40 highest-calorie lunch/dinner foods, trending upward because the data is sorted in ascending order. Within this subset, foods like {chicken tikka masala, pepper jack chicken mac & cheese, and baked beef and cheese ziti} have the fewest calories, while foods like {Korean fried chicken, fried chicken tenders, and grilled buffalo chicken sandwich} have the most.
<br>
Note: The data is reduced to a 40-row subset (from the original 140 rows) so that every food name is legible.

In [None]:
ld_sort9 = lunch_dinner.sort_values("Calories", ascending=True).head(40)
plt.scatter(ld_sort9["Name"], ld_sort9["Calories"])
plt.xticks(rotation = 90)
plt.show()

Scatter plot of the 40 lowest-calorie lunch/dinner foods, trending upward because the data is sorted in ascending order. Within this subset, foods like {steamed broccoli, mango pico de gallo, and green bean saute} have the fewest calories, while foods like {pepper and onion pizza, jasmine rice, and Dijon salmon} have the most.
<br>
Note: The data is reduced to a 40-row subset (from the original 140 rows) so that every food name is legible.

In [None]:
best_ld = food[(food["Type"]=="Lunch/Dinner") & (food["Carbohydrate (g)"]<30) & (food["Protein (g)"]>35)]
best_ld

A healthy diet typically favors higher protein intake over carbohydrates. Among the lunch/dinner foods that pass this filter (under 30 g of carbs and over 35 g of protein), the top 2 are roast beef with Au Jus and mustard and herb pork loin. A healthy lunch or dinner is often around 500-700 calories, so 2-3 servings of these two foods would suffice as good lunch/dinner options.

In [None]:
dessert = food.loc[food["Type"]=="Dessert"]
dessert

All the different types of desserts served in the SBU dining hall, with their corresponding nutritional values.
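With the meal types separated, a per-type summary lets them be compared side by side. A sketch with a hypothetical helper `summary_by_type`; it uses the median rather than the mean so that a single extreme entry (such as the 670-calorie scrambled eggs) does not dominate a group. The sample holds only the ten previewed rows, which include no desserts.

```python
import pandas as pd

NUTRIENTS = ["Calories", "Fat (g)", "Carbohydrate (g)", "Fiber (g)", "Protein (g)"]

def summary_by_type(df: pd.DataFrame) -> pd.DataFrame:
    """Median of each nutrient per meal type, plus the number of items in each type."""
    out = df.groupby("Type")[NUTRIENTS].median()
    out["Items"] = df.groupby("Type").size()  # aligned on the Type index
    return out

rows = [  # Name, Calories, Fat, Carbohydrate, Fiber, Protein, Type (previewed rows)
    ("Scrambled eggs with cream and butter", 670, 19.0, 1, 0, 19, "Breakfast"),
    ("Scrambled egg whites", 70, 0.0, 1, 0, 14, "Breakfast"),
    ("Cheesy tofu fajita scramble", 180, 12.0, 6, 4, 14, "Breakfast"),
    ("Pork bacon", 60, 4.5, 0, 0, 4, "Breakfast"),
    ("Asparagus cheddar frittata", 240, 17.0, 6, 2, 16, "Breakfast"),
    ("Chicken noodle soup", 250, 3.0, 39, 4, 16, "Lunch/Dinner"),
    ("Broccoli and cheddar cheese soup", 220, 17.0, 13, 1, 5, "Lunch/Dinner"),
    ("Tomato basil bisque soup", 160, 7.0, 20, 1, 4, "Lunch/Dinner"),
    ("Fresh herb croutons soup", 110, 6.0, 12, 1, 2, "Lunch/Dinner"),
    ("Cajun chicken penne", 420, 16.0, 51, 4, 18, "Lunch/Dinner"),
]
sample = pd.DataFrame(rows, columns=["Name"] + NUTRIENTS + ["Type"])
print(summary_by_type(sample))
```

On the full data, `summary_by_type(food)` would add a Dessert row.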

In [None]:
dessert_sort1 = dessert.sort_values("Protein (g)", ascending=False)
dessert_sort1.head()

Top 5 desserts with the highest amount of protein, in grams.

In [None]:
dessert_sort1.tail()

The 5 desserts with the least protein, in grams.

In [None]:
plt.bar(dessert_sort1["Name"], dessert_sort1["Protein (g)"], color = "r")
plt.xticks(rotation = 90)
plt.show()

Bar graph of desserts sorted by protein content in descending order. Foods like {plain English muffin, Thomas English muffin, and sesame mini bagel} appear to provide the most protein, while foods like {blueberry scone, cranberry scone, and apple turnover danish mini} provide the least.

In [None]:
dessert_sort2 = dessert.sort_values("Carbohydrate (g)", ascending=False)
dessert_sort2.head()

Top 5 desserts with the highest amount of carbs, in grams.

In [None]:
dessert_sort2.tail()

The 5 desserts with the fewest carbs, in grams.

In [None]:
plt.scatter(dessert_sort2["Name"], dessert_sort2["Carbohydrate (g)"])
plt.xticks(rotation = 90)
plt.show()

Scatter plot of desserts sorted by carbohydrate content, trending downward because of the descending sort. Foods like {blondie, cranberry mini muffin, and chocolate chip mini muffin} appear to provide the most carbs, while foods like {blueberry scone, cinnamon raisin scone, and apple turnover danish mini} provide the fewest.

In [None]:
dessert_sort3 = dessert.sort_values("Calories", ascending=False)
dessert_sort3.head()

Top 5 desserts with the highest amount of calories.

In [None]:
dessert_sort3.tail()

The 5 desserts with the fewest calories.

In [None]:
plt.plot(dessert_sort3["Name"], dessert_sort3["Calories"])
plt.xticks(rotation = 90)
plt.show()

Line graph of desserts sorted by calories, sloping downward because of the descending sort. Foods like {blondie, chocolate chip mini muffin, and cranberry mini muffin} appear to provide the most calories, while foods like {cinnamon raisin scone, cranberry scone, and blueberry scone} provide the fewest.

In [None]:
best_dessert = food[(food["Type"]=="Dessert") & (food["Carbohydrate (g)"]<26) & (food["Protein (g)"]>3)]
best_dessert

Desserts are generally not healthy and tend to be high in sugar. Even so, among the desserts that pass this filter (under 26 g of carbs and over 3 g of protein), the top 2 are poppy mini bagels and sesame mini bagels. A healthy amount of dessert is often around 100-200 calories, so 1-2 servings of these two foods would suffice as a decent dessert for anyone craving one.
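The three "best" filters in this notebook differ only in meal type and thresholds, so they could be driven by a single table of criteria. A sketch with a hypothetical helper `best_options`; the thresholds are copied from the three filters, and changing one then means editing a single dictionary entry. The sample uses previewed rows only, which contain no desserts.

```python
import pandas as pd

# (max carbohydrate in g, min protein in g) per meal type, copied from the three filters.
CRITERIA = {
    "Breakfast": (10, 15),
    "Lunch/Dinner": (30, 35),
    "Dessert": (26, 3),
}

def best_options(df: pd.DataFrame, criteria: dict = CRITERIA) -> pd.DataFrame:
    """Rows of each meal type with carbs below and protein above that type's thresholds."""
    frames = []
    for meal_type, (max_carbs, min_protein) in criteria.items():
        mask = (
            (df["Type"] == meal_type)
            & (df["Carbohydrate (g)"] < max_carbs)
            & (df["Protein (g)"] > min_protein)
        )
        frames.append(df[mask])
    return pd.concat(frames)

sample = pd.DataFrame({
    "Name": ["Scrambled eggs with cream and butter", "Scrambled egg whites",
             "Asparagus cheddar frittata", "Chicken noodle soup", "Cajun chicken penne"],
    "Carbohydrate (g)": [1, 1, 6, 39, 51],
    "Protein (g)": [19, 14, 16, 16, 18],
    "Type": ["Breakfast", "Breakfast", "Breakfast", "Lunch/Dinner", "Lunch/Dinner"],
})
print(best_options(sample)[["Name", "Type"]])
```

On the notebook's data it would be called as `best_options(food)`.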

Thoughts on future improvements:
1. Add another subcategory that determines whether each food is a {meat, vegetable, or soup}, to see which meats, vegetables, and soups are best for maintaining a healthy diet. An ideal meal would combine all three while still meeting the criteria of high protein, low carbs, and sufficient calorie intake.
2. Filter the data to find meal plans for vegetarians/vegans.
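For the first improvement, a crude starting point is to tag foods by keywords in their names; for the second, the same meat flag yields a list of meatless candidates to review by hand. This is only a sketch: the keyword lists and the helper names `word_pattern` and `tag_foods` are hypothetical, and a meatless dish is not necessarily vegetarian (it may use meat-based broth) or vegan (it may contain eggs or dairy).

```python
import re

import pandas as pd

# Hypothetical keyword lists, for illustration only. They are not derived from foods.csv
# and will misclassify some dishes (e.g. a vegetable soup made with chicken stock).
MEAT_WORDS = ["chicken", "beef", "pork", "bacon", "ham", "sausage", "salmon", "turkey"]
SOUP_WORDS = ["soup", "bisque", "chowder"]

def word_pattern(words):
    """Regex matching any listed word as a whole word, with an optional plural 's'."""
    return r"\b(?:" + "|".join(map(re.escape, words)) + r")s?\b"

def tag_foods(df: pd.DataFrame) -> pd.DataFrame:
    """Add a rough 'Category' (meat/soup/other) and a 'Meatless' flag from the Name text."""
    names = df["Name"].fillna("")
    is_meat = names.str.contains(word_pattern(MEAT_WORDS), case=False, regex=True)
    is_soup = names.str.contains(word_pattern(SOUP_WORDS), case=False, regex=True)
    # Soup takes precedence because many soups also name a meat.
    category = pd.Series("other", index=df.index).mask(is_meat, "meat").mask(is_soup, "soup")
    return df.assign(Category=category, Meatless=~is_meat)

sample = pd.DataFrame({"Name": ["Chicken noodle soup", "Pork bacon",
                                "Cheesy tofu fajita scramble", "Fresh herb croutons soup"]})
print(tag_foods(sample))
```

Whole-word matching keeps, for example, "ham" from matching inside "graham", but every tag would still need a manual check before being used for meal plans.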