## 📘 Data Challenge 11 – Dash App with a Pie or Scatter Plot

Assignment Type: Partner/Group 
Estimated Time: 45–60 minutes

---
### 🎯 Targeted KSBs (Knowledge, Skills, and Behaviors)
S6 – Creates dynamic visualizations using Python (Plotly)

S7 – Designs clear dashboards using Dash

K10 – Chooses appropriate chart types for data and audience

B6 – Applies structured thinking to turn analysis into a dashboard app

---

### 📊 Scenario:
You’re helping an athletic director create a quick dashboard for just **five sports**. She wants to better understand how men’s and women’s revenue compare across a few programs. Your job is to make a simple Dash app with a pie chart or scatter plot showing the difference.

---
### ✅ Your Task (Step-by-Step)

- Load the `sports.csv` file using pandas.

- Drop missing values in the `sports`, `rev_men`, and `rev_women` columns.

- Pick 5 unique sports (your choice!) and filter the DataFrame to only include those.

    - Example: "Basketball", "Tennis", "Soccer", "Volleyball", "Golf"

- Create a new column called "Total_Revenue" by adding men’s and women’s revenue.

- Create either a pie chart or a scatter plot.

In [2]:
# Import packages 

import dash
from dash import html, dcc
import pandas as pd
import plotly.express as px
import warnings

# Silence all warnings so they don't clutter the notebook output
warnings.simplefilter(action='ignore', category=Warning)

In [3]:
# Load the data, keep only the three columns we need, and drop rows with missing values
df = pd.read_csv('/Users/Marcy_Student/Desktop/marcy/marcy-git/DA2025_Lectures/Mod2/data/sports.csv')
df = df[["sports", "rev_men", "rev_women"]].dropna()
# Count how many rows each sport has
df['sports'].value_counts()

sports
Basketball                    9448
Soccer                        6657
Tennis                        4628
Golf                          4258
All Track Combined            3604
Track and Field, X-Country    3442
Track and Field, Outdoor      1972
Lacrosse                      1913
Swimming and Diving           1366
Swimming                      1117
Track and Field, Indoor       1112
Volleyball                    1016
Ice Hockey                     501
Bowling                        385
Water Polo                     378
Rowing                         336
Rodeo                          294
Skiing                         188
Other Sports                   186
Wrestling                      179
Fencing                        168
Squash                         141
Gymnastics                      64
Weight Lifting                  64
Beach Volleyball                61
Table Tennis                    49
Archery                         39
Sailing                         19
Rifle        

In [4]:
pd.unique(df['sports'])

array(['Basketball', 'All Track Combined', 'Tennis', 'Golf', 'Soccer',
       'Lacrosse', 'Swimming and Diving', 'Track and Field, X-Country',
       'Track and Field, Indoor', 'Track and Field, Outdoor', 'Skiing',
       'Rodeo', 'Volleyball', 'Archery', 'Wrestling', 'Swimming',
       'Other Sports', 'Water Polo', 'Fencing', 'Gymnastics', 'Rowing',
       'Sailing', 'Ice Hockey', 'Squash', 'Bowling', 'Table Tennis',
       'Equestrian', 'Diving', 'Rifle', 'Beach Volleyball',
       'Weight Lifting'], dtype=object)
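
To see what the `dropna()` step in the loading cell does, here is a minimal sketch on a tiny made-up DataFrame. The rows are illustrative only and are not taken from `sports.csv`; the only thing assumed from the challenge is the three column names.

```python
import numpy as np
import pandas as pd

# Made-up sample rows (illustrative values, not from sports.csv)
sample = pd.DataFrame({
    "sports": ["Basketball", "Tennis", "Golf", "Soccer"],
    "rev_men": [1000.0, np.nan, 300.0, 450.0],
    "rev_women": [800.0, 500.0, np.nan, 400.0],
})

# Keep only the three columns, then drop every row that has a missing value in any of them
clean = sample[["sports", "rev_men", "rev_women"]].dropna()

print(clean["sports"].tolist())  # ['Basketball', 'Soccer']
```

Tennis and Golf are removed because each is missing one of the two revenue values, so only rows with both men's and women's revenue remain.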

In [5]:
# Pick 5 sports (any five you like; they do not have to be the largest)
top5 = ['Basketball', 'Tennis', 'Soccer', 'Swimming', 'Rowing']
# Copy the filtered rows so that adding columns later does not touch the original DataFrame
df_5 = df[df["sports"].isin(top5)].copy()
pd.unique(df_5['sports'])

array(['Basketball', 'Tennis', 'Soccer', 'Swimming', 'Rowing'],
      dtype=object)
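
Because `isin()` silently ignores names that do not match, a misspelled sport simply disappears from the result. The sketch below shows one way to catch that before filtering. The sample data and the `chosen` list (with a deliberate typo) are hypothetical; in the notebook you would use `df` and your own list instead.

```python
import pandas as pd

# Made-up sample; in the notebook this would be the df loaded from sports.csv
sample = pd.DataFrame({"sports": ["Basketball", "Tennis", "Soccer", "Golf", "Tennis"]})

# Hypothetical selection with a deliberate typo ("Socer") to show the check
chosen = ["Basketball", "Tennis", "Socer"]

# Compare the chosen names against the names that actually occur in the data
available = set(sample["sports"].unique())
missing = [sport for sport in chosen if sport not in available]

# isin() keeps only the rows whose sport appears in the chosen list
filtered = sample[sample["sports"].isin(chosen)].copy()

print(missing)                       # ['Socer']
print(filtered["sports"].tolist())   # ['Basketball', 'Tennis', 'Tennis']
```

If `missing` is not empty, fix the spelling in your list before moving on.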

In [6]:
df_5

|  | sports | rev_men | rev_women |
|---|---|---|---|
| 1 | Basketball | 1211095.0 | 748833.0 |
| 7 | Tennis | 78274.0 | 131145.0 |
| 11 | Basketball | 4189826.0 | 1966556.0 |
| 15 | Soccer | 1062855.0 | 944819.0 |
| 17 | Tennis | 324418.0 | 443408.0 |
| ... | ... | ... | ... |
| 132299 | Tennis | 132323.0 | 141556.0 |
| 132317 | Basketball | 104775.0 | 62345.0 |
| 132318 | Basketball | 549740.0 | 547230.0 |
| 132322 | Soccer | 652950.0 | 357251.0 |


In [7]:
# Create a new column called Total_Revenue by adding the men's and women's revenue columns
df_5["Total_Revenue"] = df_5['rev_men'] + df_5['rev_women']
df_5

|  | sports | rev_men | rev_women | Total_Revenue |
|---|---|---|---|---|
| 1 | Basketball | 1211095.0 | 748833.0 | 1959928.0 |
| 7 | Tennis | 78274.0 | 131145.0 | 209419.0 |
| 11 | Basketball | 4189826.0 | 1966556.0 | 6156382.0 |
| 15 | Soccer | 1062855.0 | 944819.0 | 2007674.0 |
| 17 | Tennis | 324418.0 | 443408.0 | 767826.0 |
| ... | ... | ... | ... | ... |
| 132299 | Tennis | 132323.0 | 141556.0 | 273879.0 |
| 132317 | Basketball | 104775.0 | 62345.0 | 167120.0 |
| 132318 | Basketball | 549740.0 | 547230.0 | 1096970.0 |
| 132322 | Soccer | 652950.0 | 357251.0 | 1010201.0 |
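
Each sport appears in many rows, so a pie chart built on `Total_Revenue` shows the sum over all rows for each sport (Plotly sums the values of duplicated labels). The sketch below computes those per-sport totals and shares explicitly with `groupby`, which can help you sanity-check the chart. The sample numbers are made up for illustration.

```python
import pandas as pd

# Made-up sample rows (illustrative values, not from sports.csv)
sample = pd.DataFrame({
    "sports": ["Basketball", "Tennis", "Basketball", "Tennis"],
    "rev_men": [100.0, 20.0, 300.0, 40.0],
    "rev_women": [50.0, 30.0, 150.0, 60.0],
})
sample["Total_Revenue"] = sample["rev_men"] + sample["rev_women"]

# Sum the total revenue per sport, one row per sport
totals = sample.groupby("sports", as_index=False)["Total_Revenue"].sum()

# Each sport's share of the overall total, i.e. the size of its pie slice
totals["Share"] = totals["Total_Revenue"] / totals["Total_Revenue"].sum()

print(totals)
```

In this sample, Basketball totals 600 and Tennis totals 150, so the slices would be 80% and 20%.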


In [8]:
# Make your pie chart or scatter plot using Plotly

fig = px.pie(df_5, names='sports', values='Total_Revenue', title='Revenue Makeup of Sports')
fig.show()

In [9]:
# Build the app. DO NOT RUN THIS CELL YET: it may give you a "port already in use" error if you do.

app = dash.Dash(__name__)
app.title = 'Revenue Makeup of Sports'

app.layout = html.Div([
    html.H1("Revenue Analysis for 5 Sports", style={'textAlign': 'center'}),
    dcc.Graph(figure=fig)
])

if __name__ == '__main__':
    app.run(debug=True)

### Copy and paste the code in this notebook into a file called `app.py` and run that file (for example, with `python app.py`). Then go to your localhost address, http://localhost:8050/, to see the updated visual.
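
If you run into the "port already in use" error, a quick way to check whether port 8050 is free is to try binding to it yourself. The helper below is a hypothetical utility written for this note (it is not part of Dash) and uses only the Python standard library.

```python
import socket


def is_port_free(port, host="127.0.0.1"):
    """Return True if nothing is currently bound to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            # Binding fails with OSError when another process already holds the port
            sock.bind((host, port))
        except OSError:
            return False
    return True


# 8050 is the port used in the localhost address for this challenge
print(is_port_free(8050))
```

If this prints `False`, either stop the process that is using the port or start the app on another one, for example `app.run(debug=True, port=8051)`, and open http://localhost:8051/ instead. The value 8051 is just an example.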