# Python + Plotly Express Walkthrough Lab
## Goal: Learn how to use Python and the Plotly Express library to create three types of interactive data visualizations. This lab focuses on creating bar, line, and scatter plots that tell a clear and insightful story.

### 1. Setup and Imports - Start by ensuring the right packages are installed and imported. There is no need to install packages, as the ones we need for this lab and the assignment were installed during setup.

In [172]:
# import packages
import pandas as pd
import plotly.express as px

### 2. Example 1 - Bar Chart: Comparing Product Sales
Scenario: You are a data analyst tasked with creating a visual to help understand product sales trends in a business unit. You want to compare how each product performed in 2024.

The first thing we will do is create a dataset manually using the pandas `DataFrame` constructor. A pandas DataFrame is a 2-dimensional labeled data structure with columns and rows. You can think of it like a spreadsheet in Excel or a SQL table. It is generally the most commonly used pandas object. We will create a simple dataset with product names and their corresponding sales figures.

In [186]:
salecs_data = pd.DataFrame({
    "Product": ["A", "B", "C", "D", "E"],
    "Sales": [150, 200, 300, 250, 400]
})
print(salecs_data)

  Product  Sales
0       A    150
1       B    200
2       C    300
3       D    250
4       E    400
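
Before charting, it can help to confirm that the DataFrame has the shape and columns you expect. The sketch below rebuilds the same table under an illustrative name (`demo_sales`, which is not part of the lab) and inspects it with a few standard pandas attributes.

```python
import pandas as pd

# Illustrative copy of the lab's table; `demo_sales` is a name made up for this sketch
demo_sales = pd.DataFrame({
    "Product": ["A", "B", "C", "D", "E"],
    "Sales": [150, 200, 300, 250, 400],
})

n_rows, n_cols = demo_sales.shape          # (rows, columns)
column_names = list(demo_sales.columns)    # column labels in order
total_sales = int(demo_sales["Sales"].sum())

# Row label of the largest Sales value, then the matching product name
top_product = demo_sales.loc[demo_sales["Sales"].idxmax(), "Product"]

print(n_rows, n_cols, column_names, total_sales, top_product)
```

Checks like these are cheap and catch a mistyped column name before it turns into a confusing chart error.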


Now that the data has been created, we will build a bar chart using the Plotly Express `px.bar` constructor. Note how the `x` parameter specifies the column for the x-axis, the `y` parameter specifies the column for the y-axis, and the `title` parameter sets the title of the chart. Those are the most basic parameters you'd want to use. The code below also uses other optional parameters:
* The `text` parameter displays the sales values as a label on each bar.
* The `color` parameter controls bar color. Here a list with one entry per row is passed so that product C stands out from the rest.

Note also how the resulting bar chart is stored in the `fig` variable by using `fig = px...`.

Check out this link for detailed documentation of the Plotly Express bar constructor: https://plotly.github.io/plotly.py-docs/generated/plotly.express.bar.html

In [187]:
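fig = px.bar(
    salecs_data,
    x="Product",
    y="Sales",
    title="Product Sales in 2024",
    text="Sales",
    # One color name per row; crimson highlights product C
    color=["lightslategray", "lightslategray", "crimson", "lightslategray", "lightslategray"],
    # "identity" tells Plotly Express to use the names above as literal colors
    color_discrete_map="identity",
    # Keep bars in A-E order even though color splits them into separate traces
    category_orders={"Product": ["A", "B", "C", "D", "E"]},
)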

With our bar chart created, we will now adjust its layout. To modify the chart, we call a Plotly method on the variable where we stored the chart. We will start with three methods:
* `update_traces()` modifies the properties of one or more traces within a chart. It lets you update visual or data-related attributes of traces without recreating the entire chart. The parameters that can be modified for bar traces are documented here: https://plotly.com/python-api-reference/generated/plotly.graph_objects.Bar.html
* `update_layout()` updates multiple nested properties of a chart's layout. Below we update the template (which sets the background color), the y-axis title, and the x-axis title. More information on the parameters can be found here: https://plotly.com/python/reference/layout/
* `update_yaxes()` makes granular adjustments to the y-axis. Below we use it to set the axis range. See here for more details: https://plotly.com/python/reference/layout/yaxis/

In [188]:

fig.update_traces(textposition = 'outside', hovertemplate="Product %{x}: %{y}",showlegend=False)
fig.update_layout(template = "plotly_white", yaxis_title = "Sales ($)", xaxis_title="Product")
fig.update_yaxes(range=[0,1000])

### 3. Example 2 - Line Chart: Trends Over Time
We’ll query the Census ACS (American Community Survey) for Utah across multiple years and variables, collect results into a tidy DataFrame, and prep it for a future chart.
#### Imports, API key, and basics
We’ll use requests to call the API, pandas for tabular data, and the us package to get Utah’s FIPS code (the Census needs FIPS).

In [176]:
import requests
import os
from dotenv import load_dotenv
from us import states
load_dotenv()

API_KEY = os.getenv('API_KEY')
BASE = 'https://api.census.gov/data'
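
`os.getenv` returns `None` when a variable is missing, which would silently produce a URL ending in `key=None`. A small guard can fail early instead. `require_env` and `DEMO_CENSUS_KEY` are hypothetical names used only in this sketch.

```python
import os

def require_env(name: str) -> str:
    """Return the environment variable `name`, or raise if it is unset or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Environment variable {name!r} is not set; check your .env file")
    return value

# Demo only: set a placeholder so the sketch runs without a real key
os.environ.setdefault("DEMO_CENSUS_KEY", "placeholder-key")
demo_key = require_env("DEMO_CENSUS_KEY")
```

In the lab's setup the equivalent call would use the variable name stored in your `.env` file.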

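#### Pick survey, years, geography, and variables
The ACS 1-year estimates live under the `acs/acs1` path of the API URL. We’ll loop over 2014–2024 for Utah. Variables come from table B15001 (sex by age by educational attainment). The trailing `E` in each variable code marks it as an estimate. Note that in B15001 the bachelor's degree codes used here sit under the 18 to 24 age group, so the "With BS/BA" series count 18-to-24-year-olds who hold a bachelor's degree.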

In [185]:
SURVEY = 'acs/acs1'
YEARS = list(range(2014,2025)) #2014..2024 inclusive
STATE_FIPS = states.UT.fips

# Human-friendly labels and the corresponding ACS variable codes
VAR_CODES = ['B15001_003E','B15001_009E','B15001_044E','B15001_050E']
LABELS = ['18 to 24 Males','Males With BS/BA','18 to 24 Females','Females With BS/BA']

#### Understand the Census API URL and response
The API pattern is: `{BASE}/{YEAR}/{SURVEY}?get=NAME,<var1>,<var2>,...&for=state:<fips>&key=<API_KEY>`

The first row of the JSON is the header list (column names). The second row is the values.
Let’s test one year + one variable to see the shape:

In [178]:
def build_url(year,var_code):
    params = {
        "get":f"NAME,{var_code}",
        "for":f"state:{STATE_FIPS}",
        "key":API_KEY
    }
    return (
        f"{BASE}/{year}/{SURVEY}"
        f"?get={params['get']}&for={params['for']}&key={params['key']}"
    )
test_url = build_url(2021,VAR_CODES[0])
resp = requests.get(test_url)
print("Status:",resp.status_code)
print("Example payload",resp.json())


Status: 200
Example payload [['NAME', 'B15001_003E', 'state'], ['Utah', '194844', '49']]
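
The URL pattern above accepts several variables in one `get` list. Below is a sketch of a builder that takes a list of codes and lets `urllib.parse.urlencode` assemble the query string. `build_query_url`, the `DEMO_` constants, and the `YOUR_KEY` placeholder are illustrative and not part of the lab.

```python
from urllib.parse import urlencode

DEMO_BASE = "https://api.census.gov/data"
DEMO_SURVEY = "acs/acs1"

def build_query_url(year, var_codes, state_fips, api_key):
    """Build a Census API URL that requests one or more variables for one state."""
    query = urlencode(
        {
            "get": ",".join(["NAME", *var_codes]),
            "for": f"state:{state_fips}",
            "key": api_key,
        },
        safe=",:",  # keep commas and colons readable instead of percent-encoding them
    )
    return f"{DEMO_BASE}/{year}/{DEMO_SURVEY}?{query}"

demo_url = build_query_url(2021, ["B15001_003E", "B15001_044E"], "49", "YOUR_KEY")
print(demo_url)
```

Letting `urlencode` join the parameters avoids hand-placing `?` and `&`, and it escapes any characters in the key that are not URL-safe.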


#### Loop over years and variables; collect clean rows
Instead of appending to a DataFrame row-by-row (slow), collect rows in a list and build the DataFrame once. Also handle non-200 statuses gracefully: the loop logs the failure and moves on to the next request.

In [179]:
rows = []

for year in YEARS:
    print(f"Processing year: {year}")
    for label, var_code in zip(LABELS, VAR_CODES):
        url = build_url(year, var_code)
        r = requests.get(url)
        if r.status_code != 200:
            print(f" ⚠️ {year} {var_code} request failed: {r.status_code}")
            continue

        data = r.json()
        header, values = data[0], data[1]
        # header looks like ['NAME','B15001_003E','state']: locate the variable's index
        var_idx = header.index(var_code)

        rows.append({
            "Year": year,
            "Variable": label,
            "Population": values[var_idx],
            "StateName": values[header.index("NAME")],
            "StateFips": values[header.index("state")],
            "Var Code": var_code,
        })

# Build the DataFrame once, after all rows are collected
census_data = pd.DataFrame(rows)


Processing year: 2014
Processing year: 2015
Processing year: 2016
Processing year: 2017
Processing year: 2018
Processing year: 2019
Processing year: 2020
 ⚠️ 2020 B15001_003E request failed: 404
 ⚠️ 2020 B15001_009E request failed: 404
 ⚠️ 2020 B15001_044E request failed: 404
 ⚠️ 2020 B15001_050E request failed: 404
Processing year: 2021
Processing year: 2022
Processing year: 2023
Processing year: 2024


#### Now to make it chart-ready: types, sorting, and (optional) pivot
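Convert numbers to numeric, sort by year, and optionally pivot for multi-series charts. Because every 2020 request returned a 404 above, the data has no 2020 rows: the Census Bureau did not release standard ACS 1-year estimates for 2020.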

In [180]:
#clean types
census_data["Year"] = census_data["Year"].astype(int)
census_data["Population"] = pd.to_numeric(census_data["Population"],errors="coerce")

#sort for nice plotting
census_data = census_data.sort_values(["Variable","Year"]).reset_index(drop=True)

# Wide format (columns per variable) is handy for line charts
wide = census_data.pivot(index="Year",columns="Variable",values="Population").sort_index()

print("Long (tidy) data:\n", census_data.head(8))
print("\nWide Data for quick plotting:\n", wide.head())

Long (tidy) data:
    Year          Variable  Population StateName StateFips     Var Code
0  2014  18 to 24 Females      165938      Utah        49  B15001_044E
1  2015  18 to 24 Females      168679      Utah        49  B15001_044E
2  2016  18 to 24 Females      169213      Utah        49  B15001_044E
3  2017  18 to 24 Females      169883      Utah        49  B15001_044E
4  2018  18 to 24 Females      174265      Utah        49  B15001_044E
5  2019  18 to 24 Females      178772      Utah        49  B15001_044E
6  2021  18 to 24 Females      186947      Utah        49  B15001_044E
7  2022  18 to 24 Females      198328      Utah        49  B15001_044E

Wide Data for quick plotting:
 Variable  18 to 24 Females  18 to 24 Males  Females With BS/BA  \
Year                                                             
2014                165938          166681               15930   
2015                168679          172958               15594   
2016                169213          176913    
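
To see the cleaning and pivot steps in isolation, here is a small sketch on a handful of rows. The numeric strings are copied from the lab's output; the `"N/A"` entry is an invented bad value added only to show what `errors="coerce"` does. `demo_long` and `demo_wide` are illustrative names.

```python
import pandas as pd

# Values stored as strings, the way the API returns them;
# "N/A" is an invented bad value to show what errors="coerce" does
demo_long = pd.DataFrame({
    "Year": [2014, 2015, 2014, 2015, 2016],
    "Variable": ["18 to 24 Females", "18 to 24 Females",
                 "18 to 24 Males", "18 to 24 Males", "18 to 24 Males"],
    "Population": ["165938", "168679", "166681", "172958", "N/A"],
})

# Strings become numbers; anything unparseable becomes NaN instead of raising
demo_long["Population"] = pd.to_numeric(demo_long["Population"], errors="coerce")

# One row per year, one column per variable
demo_wide = demo_long.pivot(index="Year", columns="Variable", values="Population")
print(demo_wide)
```

Note that `pivot` raises an error if any (Year, Variable) pair appears more than once, which is a useful check that the loop did not collect duplicates.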

#### Build Line Chart
Let’s turn our tidy census_data into a few Plotly Express charts. We will build 3 progressively “smarter” visuals and explain why each design choice helps the story.
##### Setup
Plotly Express (px) is the “quick grammar” for Plotly. We’ll also set a default template and a helper subtitle noting ACS details.

In [181]:
px.defaults.template = "plotly_white"
px.defaults.width = 900
px.defaults.height = 500

SUBTITLE = "American Community Survey (ACS) 1-year estimates * Utah * Variables from table B15001"

##### Multi-series line chart (long/tidy data)
This uses our census_data (long format). Each variable is a colored line across years. Markers help with sparse year counts; hover labels are customized so they read like a sentence.

In [182]:
fig = px.line(
    census_data,
    x="Year",
    y="Population",
    color="Variable",
    markers=True,
    line_group="Variable",
    labels={
        "Population": "Population (estimate)",
        "Year": "Year",
        "Variable": "Category"
    },
    title=f"Education/Age by Sex Over Time<br><sup>{SUBTITLE}</sup>"
)

# Make the y-axis human friendly and hovers clean

fig.update_layout(
    legend_title_text="Series",
    hovermode="x unified",
    margin=dict(l=40,r=40,t=90,b=40)
)
fig.update_yaxes(separatethousands=True)

fig.update_traces(
    hovertemplate="<b>%{fullData.name}</b><br>Year: %{x}<br>Population: %{y:,}<extra></extra>"
)
fig.show()
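
Because 2020 has no rows, the line is drawn straight from 2019 to 2021, which can read as if 2020 had been measured. One option, sketched here as an assumption about what you may want rather than something the lab does, is to add explicit empty rows for missing years; Plotly line traces do not connect across missing values unless `connectgaps` is turned on. The two values are copied from the lab's output, and `demo_series` and `demo_gapped` are illustrative names.

```python
import pandas as pd

# Two values copied from the lab's output; 2020 is absent, as in census_data
demo_series = pd.DataFrame({
    "Year": [2019, 2021],
    "Variable": ["18 to 24 Females", "18 to 24 Females"],
    "Population": [178772, 186947],
})

all_years = range(demo_series["Year"].min(), demo_series["Year"].max() + 1)

# Reindex each variable onto the full year range; missing years become NaN rows
demo_gapped = (
    demo_series
    .pivot(index="Year", columns="Variable", values="Population")
    .reindex(all_years)
    .rename_axis("Year")
    .reset_index()
    .melt(id_vars="Year", var_name="Variable", value_name="Population")
)
print(demo_gapped)
```

Passing a frame like `demo_gapped` to `px.line` in place of the original long data should leave a visible break at 2020 instead of a connecting segment.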

##### Faceted view by Sex (small multiples)
Small multiples reduce categorical clutter. We’ll split Variable into two dimensions: Sex and Group (age or degree). Then facet by Sex, color by Group. Now each panel tells a simpler micro-story, which is easier to compare side-by-side.

In [183]:
# Map labels -> components (so we can facet)
label_parse = {
    "18 to 24 Males": ("Male", "Age 18-24"),
    "Males With BS/BA": ("Male", "BS/BA"),
    "18 to 24 Females": ("Female","Age 18-24"),
    "Females With BS/BA": ("Female", "BS/BA")
}
viz_df = census_data.copy()
viz_df[["Sex","Group"]] = viz_df["Variable"].apply(lambda s: pd.Series(label_parse[s]))

fig2 = px.line(
    viz_df,
    x="Year",
    y="Population",
    color="Group",
    facet_col="Sex",
    facet_col_spacing=0.08,
    markers=True,
    category_orders={"Sex":["Male","Female"],"Group":["Age 18-24","BS/BA"]},
    labels={
        "Population": "Population (estimate)",
        "Year":"Year",
        "Group":"Category"
    },
    title=f"Education/Age by Sex (Small Multiples)<br><sup>{SUBTITLE}</sup>"
)
fig2.update_layout(
    hovermode="x unified",
    margin=dict(l=40,r=40,t=90,b=40),
)

# Improve axis legibility and clean up the facet titles
fig2.for_each_yaxis(lambda a: a.update(separatethousands=True))
# Cleaner facet titles: "Sex=Male" becomes "Male"
fig2.for_each_annotation(lambda a: a.update(text=a.text.split("=")[-1]))

# %{fullData.name} is the trace name, which is the Group value here
fig2.update_traces(hovertemplate="Year: %{x}<br>%{fullData.name}: %{y:,}<extra></extra>")
fig2.show()
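
The split of `Variable` into `Sex` and `Group` can also be done without building a `pd.Series` for every row. Here is a sketch using `Series.map`, with a shortened lookup table and the made-up names `demo_parse`, `demo_df`, and `parts`. For a table this small either approach is fine.

```python
import pandas as pd

demo_parse = {
    "18 to 24 Males": ("Male", "Age 18-24"),
    "Females With BS/BA": ("Female", "BS/BA"),
}
demo_df = pd.DataFrame(
    {"Variable": ["18 to 24 Males", "Females With BS/BA", "18 to 24 Males"]}
)

# map() looks up each label; the resulting tuples are expanded into two columns
parts = pd.DataFrame(
    demo_df["Variable"].map(demo_parse).tolist(),
    columns=["Sex", "Group"],
    index=demo_df.index,
)
demo_df = demo_df.join(parts)
print(demo_df)
```

Passing `index=demo_df.index` keeps the new columns aligned with the original rows even if the frame has been filtered or re-sorted.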

##### “Latest year” bar chart (crisp takeaway)
Execs often want the most recent snapshot. Filter to the latest year (e.g., 2024) and show a simple bar chart. This is perfect for a one-sentence “so what?”.

In [184]:
latest_year = census_data["Year"].max()
latest = census_data.query("Year == @latest_year").copy()

fig3 = px.bar(
    latest,
    x="Variable",
    y="Population",
    text="Population",
    labels={"Population": "Population (estimate)", "Variable": "Category"},
    title=f"Latest Year Comparison: {latest_year}<br><sup>{SUBTITLE}</sup>"
)

fig3.update_traces(texttemplate="%{text:,}",textposition="outside")
fig3.update_layout(
    margin=dict(l=40,r=40,t=90,b=40),
    uniformtext_minsize=10,
    uniformtext_mode="hide"
)
fig3.update_yaxes(separatethousands=True)
fig3.show()
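
A snapshot is easier to turn into a one-sentence takeaway when it is paired with a change over time. The sketch below computes a percent change from two values copied from the lab's output for "18 to 24 Females" (2014 and 2022, the years visible in the printed table, not the latest year). `demo_points` and `takeaway` are illustrative names.

```python
import pandas as pd

# Two values copied from the lab's output for "18 to 24 Females"
demo_points = pd.DataFrame({
    "Year": [2014, 2022],
    "Population": [165938, 198328],
}).sort_values("Year")

first, last = demo_points.iloc[0], demo_points.iloc[-1]
pct_change = (last["Population"] - first["Population"]) / first["Population"] * 100

takeaway = (
    f"Utah women aged 18 to 24: {int(first['Population']):,} in {int(first['Year'])} "
    f"to {int(last['Population']):,} in {int(last['Year'])} ({pct_change:+.1f}%)"
)
print(takeaway)
```

A sentence like this can go straight into the chart title or subtitle so the reader sees the conclusion before the bars.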
