![Food Photo by Ella Olsson](https://images.pexels.com/photos/1640777/pexels-photo-1640777.jpeg?cs=srgb&dl=pexels-ella-olsson-1640777.jpg&fm=jpg)

**INTRODUCTION**

This kernel explores and combines different visualizations that can be made from CSV files, all built with Plotly. It also offers valuable insights into the quality of the food we eat and the nutrients it carries.


**CONTEXT: FOOD & NUTRITION**

The nutrients in food are how we get fuel, providing energy for our bodies. We need to replenish the nutrients in our bodies with a new supply every day. Fats, proteins, and carbohydrates are all required. Nutrition is the science that interprets the nutrients and other substances in food in relation to the maintenance, growth, reproduction, health and disease of an organism. It includes ingestion, absorption, assimilation, biosynthesis, catabolism and excretion.

Counting calories and reducing fat intake is the most common advice given by dieticians and nutritionists. Knowing about nutrition is essential for eating mindfully, and it also brings peace of mind. Furthermore, a diet filled with vegetables, fruits and whole grains could help prevent major conditions such as stroke, diabetes and heart disease.


**CONTENT:**

**1. Data Cleaning**

**2. Data Visualization & Analysis**

* Group Metrics
* Food Myth

**REFERENCE:**
[Nutrition Details for Most Common Foods](https://www.kaggle.com/niharika41298/nutrition-details-for-most-common-foods)

In [None]:
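# Import pandas for data handling and Plotly for the charts
import pandas as pd

import plotly.express as px
from plotly.subplots import make_subplots
import plotly.graph_objects as go
# display() is built into Jupyter/Kaggle; importing it lets the code run elsewhere too
from IPython.display import display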

In [None]:
nutrients=pd.read_csv("/kaggle/input/nutrition-details-for-most-common-foods/nutrients_csvfile.csv")
nutrients.head()

# **1. Data Cleaning**

Although the data seems clean, some minor alterations are required. This step covers 3 concerns:
1. Convert
    * "t" and "t'" values into 0. These values indicate a minuscule amount of a nutrient in the food, so we treat them as 0.
    * Numbers written with commas (e.g. "1,000") into plain numerical data
    * Nutrient datatypes into int or float variables
    * The stray "-1" (Protein) and "a" (Fiber) values into blanks, which become null values
2. Delete foods with incomplete data (null values)
3. Simplify the food categories and check their distribution
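The cleaning steps above can be tried on a tiny, made-up DataFrame before running them on the real file. This is a minimal sketch that assumes only the column names used in this notebook (`Food`, `Grams`, `Protein`, `Fiber`); the rows and the helper `clean` are hypothetical. Unlike the cells below, it uses exact-match replacement for "-1" and "a", so an "a" inside a longer string is left alone.

```python
import pandas as pd

# Hypothetical toy rows that mimic the problems listed above (not real dataset values)
raw = pd.DataFrame({
    "Food": ["Food A", "Food B", "Food C"],
    "Grams": ["1,000", "100", "50"],
    "Protein": ["t", "-1", "5"],
    "Fiber": ["t'", "2", "a"],
})


def clean(df: pd.DataFrame) -> pd.DataFrame:
    # Trace amounts ("t" and "t'") become 0
    df = df.replace(["t", "t'"], 0)
    # Strip thousands separators so "1,000" parses as 1000
    df = df.replace(",", "", regex=True)
    # Stray values become empty strings, which pd.to_numeric turns into NaN
    df["Protein"] = df["Protein"].replace("-1", "")
    df["Fiber"] = df["Fiber"].replace("a", "")
    for col in ["Grams", "Protein", "Fiber"]:
        df[col] = pd.to_numeric(df[col])
    # Rows with any missing value are dropped
    return df.dropna()


cleaned = clean(raw)
print(cleaned)
```

Only `Food A` survives: `Food B` loses its Protein value and `Food C` its Fiber value, so `dropna()` removes both.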

In [None]:
# Replace "t" and "t'" with 0: both mark a minuscule (trace) amount of a nutrient in the food item
nutrients = nutrients.replace(["t", "t'"], 0)

nutrients.head()

In [None]:
#check the size of dataset
display(nutrients)

In [None]:
# Remove thousands separators (e.g. "1,000" -> "1000") so the values can be parsed as numbers
nutrients = nutrients.replace(",", "", regex=True)
# Blank out the stray "-1" (Protein) and "a" (Fiber) values; pd.to_numeric turns the blanks into NaN
nutrients['Protein'] = nutrients['Protein'].replace("-1", "", regex=True)
nutrients['Fiber'] = nutrients['Fiber'].replace("a", "", regex=True)
# Set row 91's calories to the average of 8 and 44 (.loc avoids chained assignment)
nutrients.loc[91, 'Calories'] = (8 + 44) / 2

# Convert grams, calories, protein, fat, saturated fat, fiber and carbs to numeric (int or float) dtypes
for col in ['Grams', 'Calories', 'Protein', 'Fat', 'Sat.Fat', 'Fiber', 'Carbs']:
    nutrients[col] = pd.to_numeric(nutrients[col])

# Check the resulting dtypes
nutrients.dtypes

In [None]:
#quick check data quality
print(nutrients.isnull().any())
print('-'*245)
print(nutrients.describe())
print('-'*245)

In [None]:
# Drop every row that contains a null value
nutrients = nutrients.dropna()

display(nutrients)

In [None]:
#Simplifying Categories
nutrients['Category'] = nutrients['Category'].replace('DrinksAlcohol Beverages', 'Drinks, Alcohol, Beverages', regex=True)
nutrients['Category'] = nutrients['Category'].replace('Fats Oils Shortenings', 'Fats, Oils, Shortenings', regex=True)
nutrients['Category'] = nutrients['Category'].replace('Fish Seafood', 'Fish, Seafood', regex=True)
nutrients['Category'] = nutrients['Category'].replace('Meat Poultry', 'Meat, Poultry', regex=True)
nutrients['Category'] = nutrients['Category'].replace(['Breads cereals fastfoodgrains', 'Seeds and Nuts'], 'Grains', regex=True)
nutrients['Category'] = nutrients['Category'].replace(['Desserts sweets', 'Jams Jellies'], 'Desserts', regex=True)
nutrients['Category'] = nutrients['Category'].replace(['Fruits A-F', 'Fruits G-P', 'Fruits R-Z'], 'Fruits', regex=True)
nutrients['Category'] = nutrients['Category'].replace(['Vegetables A-E', 'Vegetables F-P', 'Vegetables R-Z'], 'Vegetables', regex=True)

In [None]:
# Normalize calories, protein, fat, saturated fat, fiber and carbs to amounts per gram of food
# by dividing each by the serving weight in grams
for col in ['Calories', 'Protein', 'Fat', 'Sat.Fat', 'Fiber', 'Carbs']:
    nutrients[col] = nutrients[col] / nutrients['Grams']

In [None]:
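# Final check: mean per-gram nutrient values for each category
# numeric_only=True skips text columns such as Food and Measure (required in pandas >= 2.0)
category_dist = nutrients.groupby(['Category']).mean(numeric_only=True)
category_dist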

# **2. Data Visualization & Analysis**

**1. Group Metrics**

This part covers visualizations of the **category distribution across all metrics** and the **top 20 foods for each nutrient.**
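The pie percentages below are not raw gram amounts. As a minimal sketch with hypothetical numbers, this is how one slice is derived: divide each food's value by its weight in grams, average per category, then take each category's share of the summed means (Plotly's `go.Pie` performs that last step itself when it shows percentages).

```python
import pandas as pd

# Hypothetical per-serving values for three foods in two categories (illustrative only)
foods = pd.DataFrame({
    "Category": ["Grains", "Grains", "Fruits"],
    "Grams": [100, 50, 200],
    "Calories": [300, 100, 100],
})

# Step 1: calories per gram of food
foods["Calories"] = foods["Calories"] / foods["Grams"]
# Step 2: mean per-gram value for each category
category_mean = foods.groupby("Category")["Calories"].mean()
# Step 3: each pie slice is the category's share of the summed means, in percent
share_pct = 100 * category_mean / category_mean.sum()
print(share_pct.round(1))
```

Here Grains averages 2.5 kcal/g and Fruits 0.5 kcal/g, so Grains takes about 83.3% of the pie.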

In [None]:
#Category Distribution from All Metrics
metrics = ['Calories', 'Protein', 'Fat', 'Sat.Fat', 'Fiber', 'Carbs']
# 2 x 3 grid of "domain" cells, one per pie chart
fig = make_subplots(rows=2, cols=3,
                    specs=[[{"type": "domain"} for _ in range(3)] for _ in range(2)])

# Add one pie per metric, filling the grid row by row
for i, metric in enumerate(metrics):
    fig.add_trace(go.Pie(values=category_dist[metric].values, title=metric.upper(),
                         labels=category_dist.index,
                         marker=dict(colors=['#100b', '#f00560'], line=dict(color='#FFFFFF', width=2.5))),
                  row=i // 3 + 1, col=i % 3 + 1)

fig.update_layout(title_text="Category Distribution of All Metrics", height=700, width=1000)
fig.show()

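**Important Insights** (each percentage is a category's share of its pie, i.e. of the sum of the per-gram category means):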

1. Calories
    * **[Drinks, Alcohol, Beverages]** and **[Soups]** have the lowest calorie content (2.61% and 2.2%, respectively).
    * **[Grains]** has a larger calorie share (16.5%) than **[Meat, Poultry]** (13.8%) and **[Desserts]** (12.9%), while **[Fats, Oils, Shortenings]** has the largest share (28.3%).


2. Protein
    * The largest share of protein is in the **[Fish, Seafood]** category (28.2%), followed by **[Meat, Poultry]** (22.9%).
    * The **[Drinks, Alcohol, Beverages]** category contains no protein.


3. Fat
    * The **[Fats, Oils, Shortenings]** category has the largest fat share (53.7%), while **[Fruits]** and **[Vegetables]** each have less than 1%.
    * The **[Drinks, Alcohol, Beverages]** category contains no fat.


4. Sat.fat
    * The **[Drinks, Alcohol, Beverages]** category contains no saturated fat (0%), while the **[Fats, Oils, Shortenings]** category has the largest share (51%).
    * **[Fruits]** and **[Vegetables]** each have less than 1% of the saturated fat share.


5. Fiber
    * Surprisingly, the **[Fats, Oils, Shortenings]** category has the largest fiber share (58.4%), followed by **[Fish, Seafood]** (22.2%).
    * The gap in fiber content between these two categories and the rest is large.
    * **[Fruits]** and **[Vegetables]** each have less than 5% of the fiber share.
    * The **[Drinks, Alcohol, Beverages]** and **[Meat, Poultry]** categories contain no fiber.


6. Carbs
    * The **[Desserts]** category has the largest carbohydrate share (31.7%), followed by **[Grains]** (25%) and **[Fruits]** (11.6%).
    * **[Meat, Poultry]** has the lowest carbohydrate share (0.73%).


7. Conclusion 
    * The **[Fats, Oils, Shortenings]** category clearly has the largest share of calories, fat, saturated fat, and fiber.
    * The **[Vegetables]**, **[Soups]**, and **[Fruits]** categories mostly hold less than 10% of each nutrient, except for the carbohydrate share of **[Fruits]** (11.6%).
    * In fat, saturated fat, and fiber, **[Fats, Oils, Shortenings]** leads every other category by a wide margin.
    * The **[Drinks, Alcohol, Beverages]** category contains no protein, fat, saturated fat, or fiber.
    * The **[Meat, Poultry]** category contains no fiber.

In [None]:
#Finding Top 20 Food with Most Nutrients
# For each metric, sort foods by their per-gram value and plot the 20 highest
for metric in ['Calories', 'Protein', 'Fat', 'Sat.Fat', 'Fiber', 'Carbs']:
    top_20 = nutrients.sort_values(by=metric, ascending=False).head(20)
    fig = px.bar(top_20, x='Food', y=metric, color=metric,
                 title=f' Top 20 {metric} Rich Foods', template='plotly_white')
    fig.show()


Some inferences from the bar charts above:
1. **[Fats, Oils, Shortenings]** and **[Grains]** dominate the foods with the highest calorie, fat, and saturated fat content.
2. **Butter** and **Oyster** contain the most protein, far ahead of the other foods.
3. The **top 7 fat-rich foods** have the same fat content.
4. **Most foods** have a low fiber content (less than 0.5%), except for **Butter** and **Oyster.**
5. **Desserts** dominate the foods with the highest carbohydrate content.

**2. Food Myth**

In this section we'll check some food myths against the dataset. Here we go.

![Food Myths](https://static.toiimg.com/thumb/msid-69940117,width-1200,height-900,imgsize-691403/69940117.jpg)

> #  *A. Fat makes you fat. All fats are bad for you*.

Fat gets such a bad reputation that we often forget there are such things as healthy fats. Let's dive in and learn more about the relationship between fat and saturated fat in the most common foods.

In [None]:
#Relation Between Fat & Saturated Fat
# trendline='lowess' requires the statsmodels package; a numeric color column gives a continuous color scale
fig = px.scatter(nutrients, x='Fat', y='Sat.Fat', trendline='lowess', color='Fat',
                 hover_name='Food', template='plotly_white',
                 title='Relation Between Saturated Fat and Fat')
fig.show()

**Fat** and **Saturated Fat** mostly follow a linear trend, with saturated fat slightly lower than fat, which is expected since saturated fat is one component of total fat. So, do fat and saturated fat have the same impact?
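Before answering, one way to put a number on the linear trend is the Pearson correlation between the two columns, plus the saturated share of each food's fat. This sketch uses four hypothetical foods in a frame called `sample` rather than the dataset; on the notebook's data the same two computations would run against `nutrients`.

```python
import pandas as pd

# Hypothetical per-gram fat values for four foods (not taken from the dataset)
sample = pd.DataFrame({
    "Fat":     [0.10, 0.40, 0.80, 1.00],
    "Sat.Fat": [0.04, 0.20, 0.50, 0.60],
})

# A Pearson correlation close to 1 indicates a near-linear relationship
r = sample["Fat"].corr(sample["Sat.Fat"])
# Saturated fat is part of total fat, so this ratio should never exceed 1
sat_share = sample["Sat.Fat"] / sample["Fat"]
print(round(r, 3), sat_share.round(2).tolist())
```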

According to National Health Service (NHS) information, the average daily reference intakes for adults are **Fat (70 grams)** and **Saturated Fat (20 grams)**. That means you should normally stop at 20 grams of saturated fat, well before approaching 70 grams of fat. Still, fat is an essential component of our diet *(please keep your diet balanced)*. So it is clear that, gram for gram, *SATURATED FAT HAS A MORE DANGEROUS IMPACT ON THE BODY THAN FAT.*
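To see why the same gram counts for more when it is saturated, here is the reference-intake arithmetic as a small function. The 70 g and 20 g figures are the NHS values quoted above; the function name and the serving are hypothetical.

```python
# Daily reference intakes for adults quoted above (NHS): 70 g fat, 20 g saturated fat
FAT_RI_G = 70
SAT_FAT_RI_G = 20


def percent_of_reference_intake(fat_g: float, sat_fat_g: float) -> tuple[float, float]:
    """Return (fat %, saturated fat %) of the daily reference intakes."""
    return 100 * fat_g / FAT_RI_G, 100 * sat_fat_g / SAT_FAT_RI_G


# Hypothetical serving with 14 g fat, of which 8 g is saturated
fat_pct, sat_pct = percent_of_reference_intake(14, 8)
print(f"fat: {fat_pct:.0f}% of RI, saturated fat: {sat_pct:.0f}% of RI")
# → fat: 20% of RI, saturated fat: 40% of RI
```

The serving uses twice as large a share of the saturated-fat allowance as of the total-fat allowance, even though it holds fewer grams of saturated fat.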

> #  *B. You need to eat meat to get enough protein.*

To answer that statement, we compare the protein content of each food category. Let's see how much protein the **[Meat, Poultry]** category contains.

In [None]:
#Food Comparison based on Protein Content
fig = go.Figure(go.Pie(values=category_dist['Protein'].values, text=category_dist.index, labels=category_dist.index,
                marker=dict(colors=['#100b','#f00560'], line=dict(color='#FFFFFF', width=2.5))))
fig.update_layout(title_text="Food Comparison based on Protein Content",height=600, width=800)
fig.show()

The **[Fish, Seafood]** category has a higher protein share (28.2%) than **[Meat, Poultry]** (22.9%). Meanwhile, **[Vegetables], [Soups], and [Desserts]** have quite similar average protein content (around 2.5%), so *THERE IS NO MEANINGFUL DIFFERENCE IN PROTEIN INTAKE BETWEEN THEM*. And since **[Fish, Seafood]** can be included in the broad meat category in real life, we could also say *MEATS ARE THE LARGEST PROTEIN SOURCE AMONG THE MOST COMMON FOODS.*

Hey! Now that we know meat is a protein source, let's find the *Top 10 Meats with High Protein Content*, so you'll know which foods give you the most protein. Check it out!

In [None]:
#Top 10 Meats High Protein Content
meats = nutrients[nutrients['Category'].isin(['Fish, Seafood', 'Meat, Poultry'])]
meats_protein=meats.sort_values(by='Protein', ascending= False)
meats_protein=meats_protein.head(10)

fig = go.Figure(go.Pie(values=meats_protein['Protein'].values, text=meats_protein['Food'],
               marker = {"colors": ['#100b','#f00560'],"line": {"color": '#FFFFFF', "width" : 2.5}}))
fig.update_layout(title_text="Meat with High Protein Content",height=500, width=800)
fig.show()

Well, well, well. What can we say? Oysters really are a protein superfood, not gonna lie.

**Conclusion:**
Based on the data, we could say **THE STATEMENT IS HALF TRUE**.
It's true that **[Meat, Poultry]** has quite a high protein content, and **[Fish, Seafood]** belongs to the broad meat category in real life. But for most people the statement can be misleading, because they associate meat with livestock and think of seafood as fish rather than meat. On the other hand, the fact that the **[Fish, Seafood]** category *DOMINATES THE TOP 10 MEATS WITH HIGH PROTEIN CONTENT* can't easily be ignored.

> #  *C. Margarine is more calorie wise than butter.*

Since we want to compare margarine and butter, let's visualize the whole **[Fats, Oils, Shortenings]** category by calorie content at once.

In [None]:
# Keep only the Fats, Oils, Shortenings category
miscellaneous = nutrients[nutrients['Category'].isin(['Fats, Oils, Shortenings'])]

# Sort every item by calories per gram, highest first
miscellaneous_calories = miscellaneous.sort_values(by=['Calories'], ascending=False)

# Bubble chart: marker size grows with calories per gram
fig = go.Figure(data=[go.Scatter(x=miscellaneous_calories['Food'], y=miscellaneous_calories['Calories'],
                mode='markers', marker_color=["blue", "purple", "pink", "teal", "silver", "yellow", "lightsalmon", "tan", "teal", "silver"],
                marker_size=miscellaneous_calories['Calories']*5)])
fig.update_layout(title='Fats, Oils, Shortenings with High Calories Content')
fig.show()

Hm... that's definitely strange. Both butter and margarine appear with more than one value. Let's open the table for the **[Fats, Oils, Shortenings]** category to make sure.

In [None]:
display(miscellaneous)

Interesting: there are actually two reasons behind what we saw in the chart above. First, some foods appear under **two or three similar names, but with different weights (grams)**. Second, in the butter case, there are **3 different calorie values**.
> How could that happen? Short answer: we don't know. It could be input errors in the dataset, or that may simply be how the data was recorded. *(PS: if any of you are nutritionists, we'd be glad to hear from you.)*
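Before blaming the data, it helps to list every entry that shares a name fragment and compare their per-gram calories side by side. This sketch uses hypothetical item names and values; `entries_matching` is a helper made up for illustration. On the real table you could call it on `miscellaneous` with keywords such as "butter" or "margarine", where `Calories` is already stored per gram.

```python
import pandas as pd

# Hypothetical rows (illustrative values, not the dataset): similar names, different serving sizes
fats = pd.DataFrame({
    "Food": ["Spread A stick", "Spread A tub", "Spread B", "Spread B light"],
    "Grams": [14, 100, 14, 100],
    "Calories": [100, 700, 14, 720],  # per serving
})

# Per-gram calories, mirroring the notebook's normalization step
fats["Cal_per_g"] = fats["Calories"] / fats["Grams"]


def entries_matching(df: pd.DataFrame, keyword: str) -> pd.DataFrame:
    """Return every row whose Food name contains keyword (case-insensitive, plain text)."""
    return df[df["Food"].str.contains(keyword, case=False, regex=False)]


for keyword in ["spread a", "spread b"]:
    rows = entries_matching(fats, keyword)
    print(keyword, rows["Cal_per_g"].round(2).tolist())
```

The made-up "Spread B" shows the kind of outlier discussed above: one entry at 1 kcal/g next to one above 7 kcal/g.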

Well, it's already late, so we will only discuss the possible outcomes.
* Let's assume the dataset is right and there really is a butter out there with a calorie content of around 1 cal/gram. Then it is clear that ***MARGARINE IS WORSE THAN BUTTER.***
* Alternatively, let's assume the butter entry with about 1 calorie per gram is an input error and ignore it. Surprisingly, ***MARGARINE IS STILL SLIGHTLY WORSE THAN BUTTER.***
* Lard is still the champion, anyway.

*Note: based on National Health Service (NHS) information (again), the average daily reference intake is **2,500 calories for men** and **2,000 for women**.*
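The reference intakes in the note can be turned into a quick calculation: per-gram calories times serving size, divided by the daily figure. The 2,500 and 2,000 kcal values come from the note; the function name and the 10 g serving at 9 kcal per gram are hypothetical.

```python
# Daily calorie reference intakes quoted in the note above (NHS), in kcal
DAILY_KCAL = {"men": 2500, "women": 2000}


def share_of_daily_kcal(kcal_per_gram: float, grams: float, group: str) -> float:
    """Percentage of the daily calorie reference intake supplied by one serving."""
    return 100 * kcal_per_gram * grams / DAILY_KCAL[group]


# Hypothetical serving: 10 g of a food providing 9 kcal per gram
print(share_of_daily_kcal(9, 10, "women"))  # → 4.5
print(share_of_daily_kcal(9, 10, "men"))    # → 3.6
```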