# Data Challenge: Creating Interactive Plotly Visuals

## Targeted KSBs (Knowledge, Skills, and Behaviors)

- **S6** – Demonstrates mastery in creating dynamic visualizations using Python (Plotly)
- **K10** – Applies chart selection principles based on data types, variables, and audience needs
- **S12** – Performs comprehensive data exploration to uncover patterns and relationships

---

## Dataset Description:

This dataset contains information about various Indian dishes, including their ingredients, preparation time, cook time, and flavor profile. You can read more about the data [here](https://www.kaggle.com/datasets/nehaprabhavalkar/indian-food-101?select=indian_food.csv).

---

## Task 1: Plot a Scatter Plot of Prep Time vs Cook Time
### Objective:
Create a scatter plot showing the relationship between **prep time** and **cook time** for each food item. Use Plotly to visualize these two variables and identify any patterns.

### Instructions:
1. **Load the dataset** using `pandas`.
2. Use **Plotly Express** to create a **scatterplot**.
3. Set **prep_time** on the x-axis and **cook_time** on the y-axis.
4. **Label** the axes appropriately and add a **title** for clarity.

In [8]:
#Run this cell without changes 
import pandas as pd 
import plotly.express as px

In [9]:
# Read in the data (data/indian_food.csv) -- Hint: use pandas to read in the CSV

df = pd.read_csv('/Users/Marcy_Student/Desktop/marcy/marcy-git/DA2025_Lectures/Mod2/data/indian_food.csv')
df

Unnamed: 0,name,ingredients,diet,prep_time,cook_time,flavor_profile,course,state,region
0,Balu shahi,"Maida flour, yogurt, oil, sugar",vegetarian,45,25,sweet,dessert,West Bengal,East
1,Boondi,"Gram flour, ghee, sugar",vegetarian,80,30,sweet,dessert,Rajasthan,West
2,Gajar ka halwa,"Carrots, milk, sugar, ghee, cashews, raisins",vegetarian,15,60,sweet,dessert,Punjab,North
3,Ghevar,"Flour, ghee, kewra, milk, clarified butter, su...",vegetarian,15,30,sweet,dessert,Rajasthan,West
4,Gulab jamun,"Milk powder, plain flour, baking powder, ghee,...",vegetarian,15,40,sweet,dessert,West Bengal,East
...,...,...,...,...,...,...,...,...,...
250,Til Pitha,"Glutinous rice, black sesame seeds, gur",vegetarian,5,30,sweet,dessert,Assam,North East
251,Bebinca,"Coconut milk, egg yolks, clarified butter, all...",vegetarian,20,60,sweet,dessert,Goa,West
252,Shufta,"Cottage cheese, dry dates, dried rose petals, ...",vegetarian,-1,-1,sweet,dessert,Jammu & Kashmir,North
253,Mawa Bati,"Milk powder, dry fruits, arrowroot powder, all...",vegetarian,20,45,sweet,dessert,Madhya Pradesh,Central


In [11]:
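# Create a scatterplot with prep_time on the x-axis and cook_time on the y-axis
# The labels argument replaces the raw column names with readable axis labels

fig = px.scatter(df, x='prep_time', y='cook_time', title='Relationship Between Prep Time & Cook Time',
                 labels={'prep_time': 'Prep Time', 'cook_time': 'Cook Time'})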

# Show the plot
fig.show()

### What insights did you get from Task 1? (Double-click to type answer)

- From the chart, dishes with a short prep time generally also have a short cook time.
- The prep-time outliers, which have very long prep times, still have short cook times.
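
One way to check these observations numerically is to compute the correlation between the two columns. The sketch below is only illustrative: it uses a small sample built from rows visible in the DataFrame preview above (in the notebook, the full `df` would be used instead), and it assumes that `-1` marks a missing value, as the Shufta row suggests.

```python
import pandas as pd

# Small sample copied from rows shown in the DataFrame preview;
# in the notebook, replace `sample` with the full `df`.
sample = pd.DataFrame({
    "name": ["Balu shahi", "Boondi", "Gajar ka halwa", "Ghevar", "Gulab jamun", "Shufta"],
    "prep_time": [45, 80, 15, 15, 15, -1],
    "cook_time": [25, 30, 60, 30, 40, -1],
})

# Assumption: -1 is a placeholder for missing data, so drop those rows
valid = sample[(sample["prep_time"] >= 0) & (sample["cook_time"] >= 0)]

# Pearson correlation between prep time and cook time
corr = valid["prep_time"].corr(valid["cook_time"])

print(f"rows kept: {len(valid)} of {len(sample)}")
print(f"correlation: {corr:.2f}")
```

A value near 0 would suggest little linear relationship, which is worth comparing against what the scatter plot appears to show.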

***

## Task 2: Bar Chart of Cook Time by Region

### Objective:
Create a bar chart that shows the average cook time for each region. This will help us understand how cook time varies across regions. **There is a "weird" bar in the chart. Why is that the case?**


### Instructions:
- Group the data by region. (Hint: you may need the `df.groupby()` method here!)

- Calculate the average cook time for each region.

- Create a bar chart using Plotly to show this average cook time for each region.

- Label the axes and title the chart.

In [12]:
# Task 2: Create a bar chart showing the average cook time by region
# Fill in the code to group by region and calculate the average cook time
df_region_avg = df.groupby('region')['cook_time'].mean().reset_index()

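# Create the bar chart, with readable axis labels in place of the raw column names
fig = px.bar(df_region_avg, x='region', y='cook_time', title="Average Cook Time by Region",
             labels={'region': 'Region', 'cook_time': 'Average Cook Time'})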

# Show the plot
fig.show()


In [18]:
df[['region', 'cook_time']].value_counts()

region      cook_time
West         30          21
South        20          19
North        30          15
North East  -1           12
South        30          10
                         ..
-1           2            1
North East   25           1
             35           1
             45           1
West         720          1
Name: count, Length: 67, dtype: int64

In [13]:
df_region_avg

Unnamed: 0,region,cook_time
0,-1,21.230769
1,Central,48.333333
2,East,37.483871
3,North,41.102041
4,North East,14.32
5,South,34.338983
6,West,37.824324


### "Weird" Bar Chart

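- I'm still not sure, but looking at the original df (filtered to the region and cook_time columns), some rows have a region value of -1, and that is what produces the extra bar. The -1 is most likely a placeholder for missing data rather than a real region, although it could also be an error in the data collection.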
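
If the `-1` values are treated as missing, the averages can be recomputed without them. The following is a minimal sketch under that assumption, not a definitive cleaning step. The sample rows come from the DataFrame preview, plus one row mirroring the `(-1, 2)` pair in the `value_counts()` output; the names `sample`, `cleaned`, and `region_avg` are illustrative.

```python
import pandas as pd

# Rows taken from the preview, plus one row with region "-1"
# mirroring the value_counts() output. Use the full `df` in the notebook.
sample = pd.DataFrame({
    "region": ["East", "West", "North", "West", "East",
               "North East", "West", "North", "Central", "-1"],
    "cook_time": [25, 30, 60, 30, 40, 30, 60, -1, 45, 2],
})

cleaned = sample.copy()

# Assumption: -1 is a placeholder for missing data in both columns.
# Comparing as strings covers the region being stored as "-1" or -1.
cleaned["region"] = cleaned["region"].mask(cleaned["region"].astype(str) == "-1")
cleaned["cook_time"] = cleaned["cook_time"].mask(cleaned["cook_time"] == -1)

# groupby skips missing group keys and mean() skips missing values by default
region_avg = cleaned.groupby("region")["cook_time"].mean().reset_index()
print(region_avg)
```

The resulting `region_avg` has the same shape as `df_region_avg`, so it could be passed to the same `px.bar` call to redraw the chart without the `-1` bar.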

### What insights did you get from Task 2? (Double-click to type answer)
- From the bar chart, we can see that the Central region has the highest average cook time (about 48.3) of all regions.
- The North East region has the lowest average cook time (about 14.3). However, 12 of its rows have a cook_time of -1, so this average is likely understated.

***

## Task 3: Pie Chart of Flavor Profile Distribution

### Objective:
Create a pie chart showing the distribution of flavor profiles (e.g., sweet, savory) across the dataset.

### Instructions:
- Use Plotly Express to create a pie chart.

- Plot the flavor_profile column, which will show the distribution of flavor types.

- Ensure the chart is labeled clearly.

In [31]:
# Task 3: Create a pie chart showing the flavor profile distribution
# Fill in the code to create the pie chart -- look up documentation if needed
# names= tells Plotly which column supplies the slice labels
fig = px.pie(df, names='flavor_profile', title="Flavor Profile Make-Up")

# Show the plot
fig.show()


### What insights did you get from Task 3? (Double-click to type answer)
- From the pie chart, we can see that spicy foods make up over 50% of the data, more than any other flavor profile.
- Sour is the least common flavor profile in the data.
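
The percentages read off the pie chart can be cross-checked with `value_counts(normalize=True)`. The sketch below uses hypothetical flavor values purely for illustration (they are not the dataset's real counts); in the notebook, the same call would be applied to `df['flavor_profile']`.

```python
import pandas as pd

# Hypothetical values for illustration only, not the real dataset
flavors = pd.Series(
    ["spicy", "spicy", "spicy", "sweet", "sweet", "bitter", "sour"],
    name="flavor_profile",
)

# Share of each flavor profile as a percentage of all rows
shares = flavors.value_counts(normalize=True).mul(100).round(1)
print(shares)

# The most common category, which should match the largest pie slice
print("most common:", shares.idxmax())
```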