# **The Healthy Breakfast Challenge**
 As a nutritionist, you have been hired by local schools to improve the breakfast habits of students. Recent studies have shown that having a nutritious breakfast greatly improves students' energy levels, concentration, and overall performance in school. However, the market is flooded with a variety of cereal brands, each claiming to be the best choice. With the goal of maximizing nutritional value, you need to make informed decisions on which cereal brand to recommend to the local schools.

 **Overview of your Mission**
 - **Milestone 1: Data Cleaning** <br> First up, you will dive into the cereal dataset. Just like detectives, you'll need to understand and clean this data. This means identifying and fixing any mistakes. This ensures you will be working with reliable data. <br>
 - **Milestone 2: Data Visualization** <br> With our data cleaned, it's time to create graphs to show the relationship between different nutritional elements in cereals.
 - **Milestone 3: Data Interpretation** <br> Finally, you need to interpret your findings. This is where you will make sense of your graphs and identify the best three cereals to recommend to local schools.

### **Milestone 1: Data Cleaning**

As a nutritionist, your first task is crucial. Before you can make any recommendations, you need to make sure you have a solid foundation, and that means starting with clean data. Look at the cereal dataset below and follow these instructions:

1. **Spot the clues:** First, scan through the cereal dataset for any bizarre entries. Are there any cereals with missing information? Are there numbers that just don't make sense?
2. **The Clean-up:** Once you identify the bizarre entries, you can start the clean-up process. This can include filling in missing data, correcting inaccuracies, or even removing entries that just don't fit. You may need to research the cereal brand to input the correct data.
3. **Final Inspection:** Take one last look at the cereal dataset so you don't miss anything.
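
If you would like to see what these three steps could look like in code, here is a minimal sketch using pandas. The table, the cereal names, and the 50 g plausibility limit are all made up for illustration; they are not taken from the real cereal dataset. The sketch removes the flagged rows, but filling in or correcting the values after some research is just as valid.

```python
import pandas as pd

# Hypothetical mini-dataset standing in for the cereal table (made-up values).
cereals = pd.DataFrame(
    {
        "name": ["Cereal A", "Cereal B", "Cereal C", "Cereal D"],
        "sugars": [6.0, None, -1.0, 12.0],  # None = missing, -1 = impossible
        "protein": [3.0, 2.0, 4.0, 250.0],  # 250 g per serving is not plausible
    }
)

# Step 1: Spot the clues (missing values and numbers that make no sense).
missing = cereals[cereals["sugars"].isna() | cereals["protein"].isna()]
negative = cereals[(cereals["sugars"] < 0) | (cereals["protein"] < 0)]
too_large = cereals[cereals["protein"] > 50]  # assumed plausibility limit

# Step 2: The clean-up (here: drop every flagged row).
flagged = pd.concat([missing, negative, too_large]).index.unique()
cleaned = cereals.drop(index=flagged)

# Step 3: Final inspection.
print(cleaned)
```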

![Data cleaning table should be displayed here](images/table1_cleaning.png)

### **Milestone 2: Data Visualization**
As you've seen, our dataset includes nutritional information for various cereals. For now, we are interested in two columns: **sugars**, measured in grams, and **vitamins**, given as a percentage of the recommended daily intake.

##### <ins>**Task 1:**</ins> Given the data table below, choose a type of graph that you think will best represent the relationship between **sugars** and **vitamins**.

![First 15 rows of the cereal data should be displayed here](images/table_tograph.png)


##### What type of graph did you choose?  ____________________________________________________

##### <ins>**Task 2:**</ins> Examine the three graphs below. Choose the graph that you think best represents the relationship between the sugar content and vitamin content in cereals. 

A) Scatter Plot <br>
![Graph 1: Scatter Plot Graph should be displayed here](images/scatter.png)<br>
B) Bar Graph<br>
![Graph 2: Bar Graph should be displayed here](images/bar.png)<br>
C) Line Graph <br>
![Graph 3: Line Graph should be displayed here](images/line.png)<br>


##### Which graph best represents the relationship between sugar and vitamin content? _________________



##### <ins>**Task 3:**</ins> Now it's time to code your own graph. Follow the steps below.

1. Run the set-up code below.

In [None]:
# Set-up Code
import plotly.express as px
import pandas as pd

# Load data
cereal_df = pd.read_csv("csv_files/cereal.csv")

2. Choose the type of graph you want to create. Here are your options:
    - For a <u>bar graph</u>: Write `px.bar`
    - For a <u>line graph</u>: Write `px.line`
    - For a <u>scatter plot</u>: Write `px.scatter`

In [None]:
# ** STUDENT SECTION BEGINS **
# Choose a type of graph. Example: chosen_graph = px.line

chosen_graph = px.bar  # The default is bar. Change as needed.

# ** STUDENT SECTION ENDS **

##### <ins>**Task 4:**</ins> Now that you know how to create different types of graphs, it's time to find cereals with high ratings but low sugar. To code this graph, follow the steps below.

1. Run the set-up code below.


In [None]:
# Set-up Code
import plotly.express as px
import pandas as pd

# Load data
cereal_df = pd.read_csv("csv_files/cereal.csv")

### Goal 1: Find the cereals with high ratings but low sugar


<span style="color:black"> **Task 1: Choose a type of graph.** </span>
Look at the cereal data, and decide what type of graph would best suit your needs: <br>

- Bar Graphs: Great for comparing things between different groups.
- Line Graphs: Best for showing changes over time. 
- Scatter Plots: Ideal for showing relationships between two things.<br>

<span style="color:black"> **Task 2: Choose two attributes to use in your graph.** </span>
Examples of attributes are protein, fat, sugars, etc. Look at the cereal table to determine which attributes to use.

- Which attribute is on the x-axis? Which attribute is on the y-axis?

<span style="color:black"> **Task 3: Analyze graph and make 3 Recommendations** </span>
Based on your graph, which three cereals should you recommend to the school board? Remember, the goal is to recommend three cereals that have high ratings but are low in sugar.
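
Once you have picked your three cereals from the graph, you could double-check them with a short pandas sketch like the one below. It assumes a `rating` column and a sugar limit of 5 g, and it uses a made-up table, so treat both the column name and the cut-off as assumptions to adjust for the real data.

```python
import pandas as pd

# Made-up values for illustration only (not the real cereal data).
sample_df = pd.DataFrame(
    {
        "name": ["Cereal A", "Cereal B", "Cereal C", "Cereal D", "Cereal E"],
        "sugars": [1, 12, 3, 0, 9],
        "rating": [60.0, 30.0, 55.0, 70.0, 40.0],
    }
)

SUGAR_LIMIT = 5  # assumed cut-off for "low sugar", in grams

# Keep only the low-sugar cereals, then rank them by rating (highest first).
low_sugar = sample_df[sample_df["sugars"] <= SUGAR_LIMIT]
top_three = low_sugar.sort_values("rating", ascending=False).head(3)

print(top_three[["name", "sugars", "rating"]])
```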

In [31]:
# ** STUDENT SECTION BEGINS **

# Task 1: Choose a type of graph
# options: px.bar, px.line, px.scatter (example: chosen_graph = px.line)
chosen_graph = px.line  # Currently set to line. Change as needed.


# Task 2: Choose two attributes you want to visualize. Be careful of spelling!
x_attribute = "sugars"
y_attribute = "vitamins"

# ** STUDENT SECTION ENDS **

In [None]:
# Create an interactive graph

# Optional: to plot only the first 30 cereals, uncomment the next line
# and pass sub_df instead of cereal_df below.
# sub_df = cereal_df.head(30)

fig = chosen_graph(
    cereal_df,
    x=x_attribute.lower(),  # Column names in the data are lowercase
    y=y_attribute.lower(),
    hover_name="name",  # Show cereal name on hover
    title=f"Relationship between {x_attribute.capitalize()} and {y_attribute.capitalize()}",
)

# Show the plot
fig.show()

# If the figure does not show, uncomment the next line.
# fig.write_html("graph1.html")  # saves the graph as an HTML file

### **Milestone 3: Data Interpretation**

### Task: Find cereals that are rich in fiber and protein but low in sugar and fat

<span style="color:black"> **Task 1: Run the code below to generate the graph** </span> <br>
A graph is already provided for you. Simply run the code below to see the graph.

<span style="color:black"> **Task 2: Explore the graph** </span>
- What observations can you make about this graph?
- What do you notice about the placement of each circle along the x- and y-axes?
- What do you notice about the size of each circle?
- What do you notice about the colour of each circle?

<span style="color:black"> **Task 3: Recommend 3 cereals that meet the criteria.** </span>
Use your mouse to hover over the points to see more information. Analyze the graph to determine which 3 cereals are high in fiber and protein but low in sugar and fat.
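
To cross-check what you read off the graph, one possible approach is to compare every cereal with the median of each column. The sketch below does this on a made-up table; the values are hypothetical, and using the median as the dividing line between "high" and "low" is an assumption, not a rule from the dataset.

```python
import pandas as pd

# Made-up values for illustration only (not the real cereal data).
sample_df = pd.DataFrame(
    {
        "name": ["Cereal A", "Cereal B", "Cereal C", "Cereal D", "Cereal E"],
        "fiber": [10, 1, 5, 0, 3],
        "protein": [4, 1, 3, 2, 6],
        "sugars": [2, 12, 3, 9, 11],
        "fat": [0, 3, 1, 2, 1],
    }
)

# Assumed dividing line between "high" and "low": the median of each column.
medians = sample_df[["fiber", "protein", "sugars", "fat"]].median()

# High fiber and protein, low sugar and fat.
meets_criteria = (
    (sample_df["fiber"] >= medians["fiber"])
    & (sample_df["protein"] >= medians["protein"])
    & (sample_df["sugars"] <= medians["sugars"])
    & (sample_df["fat"] <= medians["fat"])
)

matches = sample_df[meets_criteria]
print(matches["name"].tolist())
```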



In [4]:
# Graph
graph = px.scatter(
    cereal_df,
    x="fiber",  # Fiber on the x-axis
    y="protein",  # Protein on the y-axis
    size="fat",  # Represent fat content with the size of the marker
    color="sugars",  # Represent sugar level with color
    hover_name="name",  # Show cereal name when you hover over a point
    title="Cereals: Fiber & Protein vs. Sugar & Fat",
    labels={
        "fiber": "Fiber (g)",
        "protein": "Protein (g)",
        "sugars": "Sugar (g)",
        "fat": "Fat (g)",
    },
)

# Show graph
graph.show()