### U.S. College Enrollment Trends: 2019 - 2023
##### Purposes:
- Create an interactive Plotly Dash application framed from an explanatory perspective.
- Explore enrollment trends of US degree-granting institutions (colleges, universities, and technical and vocational institutions).
- Display enrollment changes across groups:
    - Education level: Undergraduate (UG) vs. Graduate (GR)
    - Gender: Women vs. Men
    - Study status: Full-Time (FT) vs. Part-Time (PT)

##### Data: 
The enrollment data are from the Integrated Postsecondary Education Data System (IPEDS) provided by the U.S. National Center for Education Statistics (NCES). Source: https://nces.ed.gov/ipeds/summarytables 
- Limited to all Title IV degree-granting institutions in the U.S.: N ≥ 3,838.
- Enrollment numbers are totals summed across all included institutions.
- Enrollment figures are for the fall terms of 2019 to 2023.
- Enrollment app: by default, the app includes enrollment for all groups; individual groups can be selected or deselected with the checkboxes.


In [168]:
import numpy as np
import pandas as pd
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
import plotly.io as pio
import dash
from dash import Dash, dcc, html, Output, Input, dash_table  # dcc - Dash Core Components
pio.renderers.default = "notebook_connected" 


In [169]:
# Read the datasets
df1_EnrolByGroups = pd.read_csv("Data-1EnrollmentBy3Groups.csv")

# Set display width for better readability  
pd.set_option('display.width', 600)   

# # Information about the first dataset
# print("Dataset 1: Enrollment by Groups " + str(df1_EnrolByGroups.shape))
print(df1_EnrolByGroups.head(), "\n")
# display(df1_EnrolByGroups.info())

# df1_EnrolByGroups.isnull().sum()  # Check for missing values
# df1_EnrolByGroups.dropna(inplace=True)  # Drop rows with missing values


   Year Education_Level Gender Study_Status Enrollment
0  2019  Undergraduates    Men    Full-Time  4,543,556
1  2020  Undergraduates    Men    Full-Time  4,265,661
2  2021  Undergraduates    Men    Full-Time  4,139,458
3  2022  Undergraduates    Men    Full-Time  4,139,295
4  2023  Undergraduates    Men    Full-Time  4,266,721 



In [170]:
# Convert the "Enrollment" column to integers
# Note: the "Enrollment" column contains commas, so values like "4,543,556" are stored as strings, not numbers.
df1_EnrolByGroups["Enrollment"] = (
    df1_EnrolByGroups["Enrollment"]
    .replace({",": ""}, regex=True)   # Remove commas
    .astype(int))                     # Convert to integer

print(df1_EnrolByGroups.head(), "\n")

print(df1_EnrolByGroups["Year"].unique())  # Years load as integers; they are converted to strings below for plotting
print(df1_EnrolByGroups["Education_Level"].unique())
print(df1_EnrolByGroups["Gender"].unique(), "\n")

df1_EnrolByGroups["Year"] = df1_EnrolByGroups["Year"].astype(str)  # Convert to string for plotting
print(df1_EnrolByGroups["Year"].unique()) 


   Year Education_Level Gender Study_Status  Enrollment
0  2019  Undergraduates    Men    Full-Time     4543556
1  2020  Undergraduates    Men    Full-Time     4265661
2  2021  Undergraduates    Men    Full-Time     4139458
3  2022  Undergraduates    Men    Full-Time     4139295
4  2023  Undergraduates    Men    Full-Time     4266721 

[2019 2020 2021 2022 2023]
['Undergraduates' 'Graduates']
['Men' 'Women'] 

['2019' '2020' '2021' '2022' '2023']
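A minimal alternative, assuming the CSV stores these values as quoted strings such as `"4,543,556"` (which is why they load as text): `pandas.read_csv` accepts a `thousands=","` argument that strips the grouping separator while parsing, and `dtype={"Year": str}` reads the years as strings up front. The sketch below feeds an inline sample of the rows printed above through `io.StringIO` in place of `Data-1EnrollmentBy3Groups.csv`; `sample_csv` is an illustrative name.

```python
import io

import pandas as pd

# A few rows copied from the output above; quoted because the values contain commas
sample_csv = '''Year,Education_Level,Gender,Study_Status,Enrollment
2019,Undergraduates,Men,Full-Time,"4,543,556"
2020,Undergraduates,Men,Full-Time,"4,265,661"
2021,Undergraduates,Men,Full-Time,"4,139,458"
'''

# thousands="," removes the grouping separator during parsing, so Enrollment is numeric;
# dtype={"Year": str} keeps the years as string labels for plotting
df = pd.read_csv(io.StringIO(sample_csv), thousands=",", dtype={"Year": str})

print(df.dtypes)
print(df)
```

With the real file, passing the same two arguments to `pd.read_csv("Data-1EnrollmentBy3Groups.csv", ...)` should make the separate `.replace(...).astype(int)` and `.astype(str)` conversions unnecessary.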


In [171]:
# Total enrollment by year
df2_total = pd.read_csv("Data-2TotalEnrollment.csv")
print(df2_total.head())


   Year N_Institutes Total-Enrollment Undergraduates (UG) Total   UG-Women UG-Women %     UG-Men UG-Men % Graduates (GR) Total   GR-Women GR-Women %     GR-Men GR-Men % Women-Total (UG + GR) Men-Total (UG + GR) Difference (W - M)
0  2019        3,838       19,630,178                16,557,539  9,408,089     56.82%  7,149,450   43.18%            3,072,639  1,858,200     60.48%  1,214,439   39.52%            11,266,289           8,363,889          2,902,400
1  2020        3,894       19,027,410                15,884,559  9,219,464     58.04%  6,665,095   41.96%            3,142,851  1,922,913     61.18%  1,219,938   38.82%            11,142,377           7,885,033          3,257,344
2  2021        3,893       18,658,756                15,447,557  8,923,678     57.77%  6,523,879   42.23%            3,211,199  1,967,212     61.26%  1,243,987   38.74%            10,890,890           7,767,866          3,123,024
3  2022        3,928       18,583,497                15,399,866  8,823,139     5
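The summary table keeps its counts and percentages as formatted text (for example `9,408,089` and `56.82%`). If numeric versions are needed, a sketch along these lines could parse them and cross-check each reported percentage against the underlying counts. The sample covers only a few columns and the Fall 2019 to Fall 2021 rows from the output above, and the `(recomputed)` column name is illustrative.

```python
import io

import pandas as pd

# Subset of the columns shown above (Fall 2019 to Fall 2021 rows only)
sample_csv = '''Year,Undergraduates (UG) Total,UG-Women,UG-Women %
2019,"16,557,539","9,408,089",56.82%
2020,"15,884,559","9,219,464",58.04%
2021,"15,447,557","8,923,678",57.77%
'''

# thousands="," turns the comma-grouped counts into integers
df = pd.read_csv(io.StringIO(sample_csv), thousands=",")

# Percent columns arrive as strings such as "56.82%"; strip the sign and convert
df["UG-Women %"] = df["UG-Women %"].str.rstrip("%").astype(float)

# Recompute the share from the counts and compare it with the reported value
df["UG-Women % (recomputed)"] = (100 * df["UG-Women"] / df["Undergraduates (UG) Total"]).round(2)
print(df[["Year", "UG-Women %", "UG-Women % (recomputed)"]])
```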

In [None]:
# enrollment_dash_app.py

df1 = df1_EnrolByGroups.copy()

# Initialize app
app = dash.Dash(__name__)
app.title = "U.S. College Enrollment Trends"

# App Layout
app.layout = html.Div([
    html.H1("U.S. College Enrollment Trends 2019 - 2023", style={"color": "#1f77b4",'textAlign':'center'}),
    html.P("Display enrollment changes of U.S. degree-granting institutions over the years "
           "by education level, gender, and study status.", style={'textAlign':'center','fontSize': 20}),
    html.P("Data source: The Integrated Postsecondary Education Data System (IPEDS) of the National Center for "
           "Education Statistics (NCES)", style={"marginLeft": '248px', "fontSize": 15, "fontStyle": "italic"}),

    html.Hr(),  # Section divider for selection filters
    html.Div([
        html.Div([
            html.Label("Year Selection:", style={"color": 'darkblue', 'fontSize': 14, 'fontWeight': 'bold', 
                        "marginLeft": '500px', "marginRight": '8px'}),
            dcc.Checklist(
                ['2019', '2020', '2021', '2022', '2023'],  # Strings
                ['2019', '2020', '2021', '2022', '2023'],  # Default to all years
                inline=True,
                id="year-checklist")
        ], style={"display": 'flex', "alignItems": 'center', "color": 'darkblue'})  
    ], style={"width": "100%", "textAlign": "left", "padding": "10px"}),  #"justifyContent": "center"

    html.Div([
        html.Div([
            html.Label("Gender: ", style={"color": 'darkblue', 'fontSize': 14, 'fontWeight': 'bold', 
                        "marginLeft": '500px', "marginRight": '8px'}),
            dcc.Checklist(
                ['Women', 'Men'],
                ['Women', 'Men'],  # Default 
                inline=True,
                id="gender-checklist")
        ], style={"display": 'flex', "alignItems": 'center', 'fontSize': 14})
    ], style={"width": "100%", "textAlign": "left", "padding": "10px"}), 

    html.Div([
        html.Div([
            html.Label("Study Status: ", style={"color": 'darkblue', 'fontSize': 14, 'fontWeight': 'bold', 
                        "marginLeft": '500px', "marginRight": '8px'}),
            dcc.Checklist(
                ['Full-Time', 'Part-Time'],
                ['Full-Time', 'Part-Time'],  # Default 
                inline=True,
                id="status-checklist")
        ], style={"display": 'flex', "alignItems": 'center', 'fontSize': 14})
    ], style={ "textAlign": "left", "padding": "10px"}),
    
    # Section of plots
    html.Hr(style={"marginLeft": '450px', "marginRight": '520px'}),
    html.Div(id="multi-plot-container"),  # Container for all plots

    # Section of table 
    html.Hr(style={"marginLeft": '430px', "marginRight": '430px'}),
    # html.Hr(),
    html.H3("Summary Table: Total Enrollment by Education Level and Gender", style={"textAlign": "center"}),

    dash_table.DataTable(
        id='summary-table',
        columns=[{"name": i, "id": i} for i in df2_total.columns],
        data=df2_total.to_dict("records"),
        style_table={"overflowX": "auto", "margin": "2px"},
        style_cell={"textAlign": "center", "padding": "2px", "fontFamily": "Arial", "fontSize": 11},
        style_header={"backgroundColor": "lightgrey", "fontWeight": "bold"}
    ), 
],style={"marginBottom": "40px"})

@app.callback(
    Output("multi-plot-container", "children"),
    Input("year-checklist", "value"),
    Input("gender-checklist", "value"),
    Input("status-checklist", "value")
)
def update_graphs(selected_years, selected_gender, selected_status):
    # Step 1: Filter df1
    filtered_df = df1[
        df1["Year"].isin(selected_years) &
        df1["Gender"].isin(selected_gender) &
        df1["Study_Status"].isin(selected_status)
    ]

    plots = []
    final_line_figs = []

    for level in ["Undergraduates", "Graduates"]:
        df_sub = filtered_df[filtered_df["Education_Level"] == level]

        # Step 2: Group data
        data1 = df_sub.groupby("Year")["Enrollment"].sum().reset_index()
        data2 = (df_sub.groupby(["Year", "Gender"])["Enrollment"]
            .sum().reset_index())
        # Percentage of women by year
        total_by_year = data2.groupby("Year")["Enrollment"].sum().reset_index(name="Total")
        women_by_year = data2[data2["Gender"] == "Women"].groupby("Year")["Enrollment"].sum().reset_index(name="Women")
        data3 = pd.merge(total_by_year, women_by_year, on="Year", how="left")
        data3["Pct_Women"] = 100 * data3["Women"] / data3["Total"]

        # Step 3: Create line plot
        line_fig = px.line(
            data1,
            x="Year",
            y="Enrollment",
            markers=True,
            title=f"{level} - Total Enrollment Over Years",
            text="Enrollment" # value to be used as number label
        )
        line_fig.update_traces(
            mode='lines+markers+text',      # ensure text is shown
            textposition='top center',      # adjust if needed (try bottom too)
            texttemplate='%{text:,}',       # format with comma
            textfont=dict(size=10)
        )
        # Calculate y-axis range with padding
        y_min = data1["Enrollment"].min() * 0.98
        y_max = data1["Enrollment"].max() * 1.03

        line_fig.update_layout(
            template="plotly_white",
            width=700, height=350,
            margin=dict(t=40, l=100),
            yaxis=dict(range=[y_min, y_max]),  # Set y-axis range
        )

        # Step 4: Create bar plot
        bar_fig = px.bar(
            data2,
            x="Year",
            y="Enrollment",
            color="Gender",
            barmode="group",
            title=f"{level} - Enrollment by Gender"
        )
        bar_fig.update_layout(
            template="plotly_white",
            width=600, height=350,
            margin=dict(t=40)
        )

        # Step 5: Create line plot - Percentage
        line_fig2_w = px.line(
            data3,
            x="Year",
            y="Pct_Women",
            markers=True,
            title=f"{level} - % Women Enrollment by Year",
            text="Pct_Women"  # value to be used as percentage label
        )
        line_fig2_w.update_traces(
            mode='lines+markers+text',      # ensure text is shown
            textposition='top center',      # adjust if needed (try bottom too)
            texttemplate='%{text:.1f}%',     # format as percentage with one decimal
            textfont=dict(size=10)
        )
        line_fig2_w.update_layout(
            yaxis=dict(range=[50, 66]),  # Set y-axis range for percentage
            yaxis_title="Percentage of women (%)")
        final_line_figs.append(line_fig2_w)  # Collect for final display

        # Step 6: Add both plots side by side in one row
        row1 = html.Div([
            dcc.Graph(figure=line_fig, style={"display": "inline-block", "width": "48%"}),
            dcc.Graph(figure=bar_fig, style={"display": "inline-block", "width": "48%"})
        ], style={"display": "flex", "justifyContent": "space-between", "marginBottom": "30px"})

        plots.append(row1)

    # Add a section title or horizontal line before percentage plots
    plots.append(html.Hr())
    plots.append(html.H3("Percentage of Female Students Over Years", style={"textAlign": "center", "color": "darkblue"}))

    # Step 7: Place the two percentage line plots side by side
    row_final = html.Div([
        dcc.Graph(figure=final_line_figs[0], style={"display": "inline-block", "width": "48%"}),
        dcc.Graph(figure=final_line_figs[1], style={"display": "inline-block", "width": "48%"})
    ], style={"display": "flex", "justifyContent": "space-between", "marginBottom": "40px"})

    plots.append(row_final)

    # Step 8: Return all rows (4+2 plots)
    return plots

# Run App
if __name__ == "__main__":
    app.run(debug=True)  # Use debug=True for development
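The filtering and grouping in Steps 1 and 2 of `update_graphs` can be pulled out into a plain function and checked without starting the Dash server. The sketch below uses a hypothetical helper, `summarize_level`, and a tiny made-up DataFrame (`toy`) with the same columns as `df1`; its numbers are illustrative, not IPEDS values. It also makes one behavior visible: the women percentage is computed only over the genders currently checked, so unchecking "Men" pushes it to 100%.

```python
import pandas as pd


def summarize_level(df, level, years, genders, statuses):
    """Hypothetical helper: Steps 1-2 of update_graphs for a single education level."""
    # Step 1: keep rows for this level that match every checked filter
    sub = df[
        df["Year"].isin(years)
        & df["Gender"].isin(genders)
        & df["Study_Status"].isin(statuses)
        & (df["Education_Level"] == level)
    ]

    # Step 2: totals per year (line plot) and per year and gender (bar plot)
    by_year = sub.groupby("Year")["Enrollment"].sum().reset_index()
    by_gender = sub.groupby(["Year", "Gender"])["Enrollment"].sum().reset_index()

    # Women's share of the *selected* enrollment, computed the same way as in the app
    total = by_gender.groupby("Year")["Enrollment"].sum().reset_index(name="Total")
    women = (by_gender[by_gender["Gender"] == "Women"]
             .groupby("Year")["Enrollment"].sum().reset_index(name="Women"))
    pct = pd.merge(total, women, on="Year", how="left")
    pct["Pct_Women"] = 100 * pct["Women"] / pct["Total"]
    return by_year, by_gender, pct


# Illustrative toy data with the same columns as df1 (not IPEDS figures)
toy = pd.DataFrame({
    "Year": ["2019"] * 4 + ["2020"] * 4,
    "Education_Level": ["Undergraduates"] * 8,
    "Gender": ["Women", "Women", "Men", "Men"] * 2,
    "Study_Status": ["Full-Time", "Part-Time"] * 4,
    "Enrollment": [450, 150, 300, 100, 500, 130, 200, 70],
})

all_years = ["2019", "2020"]
both_statuses = ["Full-Time", "Part-Time"]

by_year, _, pct = summarize_level(toy, "Undergraduates", all_years, ["Women", "Men"], both_statuses)
print(by_year)
print(pct[["Year", "Pct_Women"]])

# With "Men" unchecked, the share is taken over women only
_, _, pct_women_only = summarize_level(toy, "Undergraduates", all_years, ["Women"], both_statuses)
print(pct_women_only[["Year", "Pct_Women"]])
```

Because every figure in the callback is built from these three frames, a check like this exercises the data logic on its own. Note that the fixed `range=[50, 66]` on the percentage axis would place the 100% points of the second case outside the visible area.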


#### Summary of U.S. College Enrollment 2019 - 2023
- **Undergraduates:** Overall enrollment decreased continuously from Fall 2019 to Fall 2022. In Fall 2023, enrollment rebounded to approximately the 2020 level.
- **Graduates:** The total enrollment increased consistently from Fall 2019 to Fall 2021. Since then, it has fluctuated around 3.2 million students. 
- **By Gender:** Women made up roughly 57% or more of undergraduates (56.82% in Fall 2019, rising to 58.04% in Fall 2020) and approximately 61% of graduate students. Across all institutions (N ≥ 3,838), female enrollment exceeded male enrollment by at least 2.9 million students.
- **By Study Status:** Overall, more than 60% of enrolled students were full-time.
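The gender figures above can be recomputed from the summary table. The sketch below copies the Fall 2019 to Fall 2021 values from the `df2_total` output (the 2022 and 2023 rows are cut off in that printout, so they are left out) and derives each year's women share and women-minus-men gap in plain Python; `table` and `results` are illustrative names.

```python
# Fall figures copied from the df2_total output (2019-2021 rows only;
# the 2022 and 2023 rows are truncated in that printout)
table = {
    # year: (UG women, UG total, GR women, GR total, women total, men total)
    2019: (9_408_089, 16_557_539, 1_858_200, 3_072_639, 11_266_289, 8_363_889),
    2020: (9_219_464, 15_884_559, 1_922_913, 3_142_851, 11_142_377, 7_885_033),
    2021: (8_923_678, 15_447_557, 1_967_212, 3_211_199, 10_890_890, 7_767_866),
}

results = {}
for year, (ug_w, ug_t, gr_w, gr_t, women, men) in table.items():
    ug_pct = round(100 * ug_w / ug_t, 2)   # share of undergraduates who are women
    gr_pct = round(100 * gr_w / gr_t, 2)   # share of graduate students who are women
    gap = women - men                      # women minus men, UG + GR combined
    results[year] = (ug_pct, gr_pct, gap)
    print(f"{year}: UG women {ug_pct:.2f}%, GR women {gr_pct:.2f}%, W - M = {gap:,}")

print("Smallest gap:", f"{min(r[2] for r in results.values()):,}")
```

The recomputed values should line up with the table's reported percentage columns and its `Difference (W - M)` column for each year shown.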


