1. Background and Introduction:
Topic: The economic outcomes of college majors and their impact on job prospects and income inequality.
Goal: Advocate for informed decision-making in higher education and highlight the importance of supporting students in fields with lower economic returns.
Key Question: Which major categories yield the highest median earnings, and how do they compare in terms of unemployment rates?
2. Questions to Address:
Which major categories offer the highest and lowest median earnings?
How does the unemployment rate vary across different majors, and what are the trade-offs between earnings and job stability?
How do gender distributions within these majors correlate with median earnings?
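As a rough sketch of how these three questions could be answered numerically before any plotting, the snippet below ranks categories by median earnings, reports mean unemployment alongside, and computes a Pearson correlation between the share of women and median earnings. The column names match those used in the code cells of this notebook; the toy rows and the helper names `summarize_by_category` and `share_women_vs_earnings` are hypothetical and only illustrate the approach.

```python
import pandas as pd

# Hypothetical toy rows standing in for recent-grads.csv; the values are
# invented for illustration and are NOT real figures from the dataset.
toy = pd.DataFrame({
    "Major": ["A", "B", "C", "D"],
    "Major_category": ["Cat X", "Cat X", "Cat Y", "Cat Y"],
    "Median": [60000, 50000, 30000, 34000],
    "Unemployment_rate": [0.04, 0.06, 0.08, 0.10],
    "ShareWomen": [0.20, 0.30, 0.70, 0.80],
})


def summarize_by_category(df: pd.DataFrame) -> pd.DataFrame:
    """Per-category median earnings and mean unemployment, highest earnings first."""
    return (
        df.groupby("Major_category")
        .agg(
            median_earnings=("Median", "median"),
            mean_unemployment=("Unemployment_rate", "mean"),
        )
        .sort_values("median_earnings", ascending=False)
    )


def share_women_vs_earnings(df: pd.DataFrame) -> float:
    """Pearson correlation between share of women and median earnings per major."""
    return df["ShareWomen"].corr(df["Median"])


summary = summarize_by_category(toy)
correlation = share_women_vs_earnings(toy)
print(summary)
print(correlation)
```

A correlation computed this way describes an association across majors only; it says nothing about why earnings differ.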
3. Visualizations and Rationale:

Visualization 1: Animated Bar Chart for Median Earnings by Major Category.
Purpose: Show disparities in median earnings, emphasizing economic inequality between fields.
Design Decision: Use an animated bar chart to guide viewers through each category, allowing for a focused analysis of each group.

Visualization 2: Interactive Scatter Plot for Median Earnings vs. Unemployment Rate.
Purpose: Illustrate the trade-offs between potential earnings and job stability, advocating for informed career decisions.
Design Decision: Include hover and filter features to let users explore data for specific majors and categories interactively.
4. Implementation Plan:
Tools: Use Plotly for creating interactive and animated visualizations in Python.
Data Preparation:
Load the recent-grads.csv dataset.
Clean and preprocess the data as needed.
Code for Visualizations:
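Before the visualization cells, the data preparation step listed above could look roughly like the following. It applies the same filtering that the sunburst and treemap cells use (positive `Total`, non-missing `Total` and `Median`). The helper name `load_and_clean` and the in-memory sample CSV are assumptions made so the sketch runs without the real file.

```python
import io

import pandas as pd


def load_and_clean(source) -> pd.DataFrame:
    """Read the CSV and drop rows the charts cannot use."""
    df = pd.read_csv(source)
    # Rows without a headcount or an earnings figure cannot be sized or colored.
    df = df.dropna(subset=["Total", "Median"])
    # A major with zero graduates would produce an empty bar or block.
    df = df[df["Total"] > 0]
    return df.reset_index(drop=True)


# Hypothetical three-row stand-in for the real file; the values are invented.
sample_csv = io.StringIO(
    "Major,Major_category,Total,Median,Unemployment_rate\n"
    "A,Cat X,100,50000,0.05\n"
    "B,Cat X,,40000,0.07\n"
    "C,Cat Y,0,30000,0.09\n"
)
clean = load_and_clean(sample_csv)
print(len(clean))
```

With the real data, the call would instead be `load_and_clean('sample_data/recent-grads.csv')`, using the path from the cells in this notebook.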

In [1]:
pip install dash


In [2]:
!pip install jupyter-dash





Step 1: Code for Animated Bar Chart


In [3]:
import plotly.express as px
import pandas as pd

# Load the dataset
df = pd.read_csv('sample_data/recent-grads.csv')

# Use the major category as the animation frame, so each frame shows the majors of one category
df['Animation_Frame'] = df['Major_category']  # Replace with another column to change the sequence

# Create an animated bar chart for narrative storytelling
fig_bar = px.bar(
    df,
    x='Median',
    y='Major',
    color='Major_category',
    animation_frame='Animation_Frame',
    orientation='h',
    title='Median Earnings by Major (Animated Narrative)',
    labels={'Median': 'Median Earnings ($)', 'Major': 'Major'},
    text='Median'  # Display median earnings on bars
)

# Adjust animation settings for better pacing
fig_bar.layout.updatemenus[0].buttons[0].args[1]['frame']['duration'] = 2500  # Slow down to 2.5 seconds per frame

# Add annotations for key insights. Both use add_annotation, because passing
# annotations=[...] to update_layout would replace any annotation added earlier.
# The coordinates are placeholders: y must match a value in the 'Major' column.
fig_bar.add_annotation(
    x=80000,  # Example coordinate
    y='Engineering',  # Example placeholder
    text="Highest earning major category",
    showarrow=True,
    arrowhead=2
)
fig_bar.add_annotation(
    x=70000, y='Math & Statistics', text="High earning but fewer graduates",
    showarrow=True, arrowhead=2, ax=-50, ay=0
)

# Customize layout for storytelling
fig_bar.update_layout(
    xaxis_title='Median Earnings ($)',
    yaxis_title='Major'
)

# Show the figure with narrative features
fig_bar.show()


Step 2: Code for Interactive Scatter Plot

In [4]:
import plotly.express as px
import pandas as pd

# Load the dataset
df = pd.read_csv('sample_data/recent-grads.csv')

# Fill missing 'Total' values (used for marker size) with the column median.
# Assigning the result back avoids the chained-assignment warning that inplace=True triggers.
df['Total'] = df['Total'].fillna(df['Total'].median())

# Create an animated scatter plot for median earnings vs. unemployment rate
fig_scatter = px.scatter(
    df,
    x='Median',
    y='Unemployment_rate',
    size='Total',
    color='Major_category',
    animation_frame='Major_category',  # Adjust or use another column for the sequence
    hover_name='Major',
    title='Median Earnings vs. Unemployment Rate by Major (Animated)',
    labels={'Median': 'Median Earnings ($)', 'Unemployment_rate': 'Unemployment Rate'},
    size_max=40  # Adjust size for better visualization
)

# Adjust animation settings for a slower and clearer pace
fig_scatter.layout.updatemenus[0].buttons[0].args[1]['frame']['duration'] = 2500  # Slow down to 2.5 seconds per frame

# Add annotations to guide storytelling. Both use add_annotation, because passing
# annotations=[...] to update_layout would replace any annotation added earlier.
fig_scatter.add_annotation(
    x=80000,  # Example coordinate for annotation
    y=0.05,   # Example unemployment rate coordinate
    text="High-earning majors with low unemployment",
    showarrow=True,
    arrowhead=2
)
fig_scatter.add_annotation(
    x=40000, y=0.10, text="Lower earnings with higher unemployment",
    showarrow=True, arrowhead=2, ax=-50, ay=-50
)

# Update layout for enhanced storytelling and clarity
fig_scatter.update_layout(
    xaxis_title='Median Earnings ($)',
    yaxis_title='Unemployment Rate'
)

# Show the animated scatter plot
fig_scatter.show()








In [5]:
import pandas as pd
import plotly.express as px

# Load the data, then keep only majors with a positive headcount and known earnings
sunburst_data = pd.read_csv('sample_data/recent-grads.csv')
cleaned_data = sunburst_data[sunburst_data['Total'] > 0].dropna(subset=['Median', 'Total'])

# Create the Sunburst Chart
sunburst_fig = px.sunburst(
    cleaned_data,
    path=['Major_category', 'Major'],
    values='Total',
    color='Median',
    color_continuous_scale='Viridis',
    title='Economic Outcomes of College Majors',
    hover_data={'Median': True, 'ShareWomen': True}
)

# Update layout for better presentation
sunburst_fig.update_layout(
    margin=dict(t=50, l=25, r=25, b=25),
    coloraxis_colorbar=dict(title="Median Earnings ($)"),
    title_font_size=16
)

# Display the chart
sunburst_fig.show()


In [6]:
import pandas as pd
import dash
from dash import dcc, html, Input, Output
import plotly.express as px

# Load the dataset (same path as the earlier cells)
file_path = 'sample_data/recent-grads.csv'
data = pd.read_csv(file_path)

# Filter out rows with zero or missing values in 'Total'
data = data[data['Total'] > 0].dropna(subset=['Total', 'Median'])

# Prepare the dataset for the Treemap
treemap_data = data.groupby(['Major_category', 'Major']).agg({
    'Total': 'sum',
    'Median': 'mean'
}).reset_index()

# Initialize the Dash app
app = dash.Dash(__name__)

# Layout of the app
app.layout = html.Div([
    html.H1("Treemap of College Majors: Total Graduates vs. Median Earnings"),
    dcc.Dropdown(
        id='major-category-dropdown',
        options=[{'label': category, 'value': category} for category in treemap_data['Major_category'].unique()],
        placeholder="Select a Major Category",
        value=None,  # Default value (None shows all categories)
        multi=False
    ),
    dcc.Graph(id='treemap-chart')
])

# Callback to update the Treemap based on dropdown selection
@app.callback(
    Output('treemap-chart', 'figure'),
    [Input('major-category-dropdown', 'value')]
)
def update_treemap(selected_category):
    # Filter the data based on the selected Major Category
    filtered_data = treemap_data[treemap_data['Major_category'] == selected_category] if selected_category else treemap_data

    # Create the Treemap
    fig = px.treemap(
        filtered_data,
        path=['Major_category', 'Major'],  # Hierarchy levels
        values='Total',                   # Size of each block
        color='Median',                   # Color based on median earnings
        color_continuous_scale='Viridis', # Color scale
        title=f"Treemap: {selected_category if selected_category else 'All Majors'}",
        hover_data={'Median': True}       # Additional hover info
    )

    # Update layout for clarity
    fig.update_layout(
        margin=dict(t=50, l=25, r=25, b=25),
        coloraxis_colorbar=dict(title="Median Earnings ($)")
    )

    return fig

# Run the app (app.run is the current entry point; app.run_server is the older alias)
if __name__ == '__main__':
    app.run(debug=True)

