# **Azerbaijani School Graduate Enrollment Indicators (1995-2023)**


This dataset compiles key enrollment indicators for school graduates from 1995 to 2023 (the `year` column in the copy loaded below runs from 1995 to 2022). It covers their performance in the entrance exams and their subsequent admission to higher education institutions. Figures are reported separately for male and female graduates, so gender-specific trends can be compared. A generalized rating compares each school's graduates with the republic-wide average, giving a consistent yardstick for educational outcomes over nearly three decades.

The dataset covers all schools in Azerbaijan and has 82,868 rows. This makes it suitable for predictive analytics, such as forecasting future university acceptance rates from historical patterns. Each row represents one school in one academic year. Each column holds one recorded indicator: the number of graduates who sat the entrance exams (`attendance_*`), average scores, and the number accepted into different types of higher education institutions.
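Each row is described as one school in one academic year, so the pair (`school_code`, `year`) should never repeat. The minimal sketch below performs that check on a small hypothetical DataFrame rather than the real data. The toy values are illustrative only.

```python
import pandas as pd

# Hypothetical toy rows standing in for the real dataset (values are illustrative only)
df_toy = pd.DataFrame({
    "school_code": [12003, 12003, 12003, 71037],
    "year": [1995, 1996, 1997, 2021],
    "accepted_b": [9, 7, 33, 1],
})

# Count rows whose (school_code, year) pair already appeared earlier in the frame
dup_keys = df_toy.duplicated(subset=["school_code", "year"]).sum()
print(f"repeated school-year keys: {dup_keys}")

# Equivalent check: the pair is a unique key if the MultiIndex has no duplicates
is_unique = df_toy.set_index(["school_code", "year"]).index.is_unique
print(f"school-year key unique: {is_unique}")
```

The same calls can be applied to `df` once it is loaded. A non-zero count would mean that some school-year pairs appear more than once.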


[More information is available on the dataset's Hugging Face page.](https://huggingface.co/datasets/nijatzeynalov/az-school-graduate-enrollment)

# **Import Libraries**

In [2]:

import numpy as np
import pandas as pd

import seaborn as sns
import matplotlib.pyplot as plt
import plotly.express as px



# **Load and Check Data**

In [3]:
!pip3 install datasets

from datasets import load_dataset

# Download the dataset from the Hugging Face Hub (a DatasetDict with a single 'train' split)
dataset = load_dataset("nijatzeynalov/az-school-graduate-enrollment")



In [4]:
dataset

DatasetDict({
    train: Dataset({
        features: ['Unnamed: 0', 'school_name', 'region', 'school_code', 'year', 'rating_b', 'rating_g', 'attendance_b', 'attendance_g', 'attendance_mean_points_b', 'attendance_mean_points_g', 'accepted_mean_points_b', 'accepted_mean_points_g', 'accepted_scholarship_b', 'accepted_scholarship_g', 'accepted_tution_b', 'accepted_tution_g', 'accepted_private_b', 'accepted_private_g', 'accepted_b', 'accepted_g', 'advanced_graduate'],
        num_rows: 82868
    })
})

In [5]:
dataset.keys()

dict_keys(['train'])

In [6]:
df = pd.DataFrame(dataset['train'])

In [7]:
df.head()

| | Unnamed: 0 | school_name | region | school_code | year | rating_b | rating_g | attendance_b | attendance_g | attendance_mean_points_b | ... | accepted_mean_points_g | accepted_scholarship_b | accepted_scholarship_g | accepted_tution_b | accepted_tution_g | accepted_private_b | accepted_private_g | accepted_b | accepted_g | advanced_graduate |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 2 N-li orta məktəb | BAKI ŞƏHƏRİ BİNƏQƏDİ RAYONU | 12002 | 1997 | 1.899 | 0.736 | 29.0 | 45.0 | 206.666 | ... | 234.881 | 12.0 | 3.0 | 8.0 | 6.0 | 2.0 | 5.0 | 22.0 | 14.0 | NaN |
| 1 | 1 | 3 saylı orta məktəb | Bakı ş. Binəqədi rayonu | 12003 | 1995 | 1.261 | 1.042 | 18.0 | 51.0 | 162.94 | ... | 302.001 | 6.0 | 10.0 | 3.0 | 7.0 | 0.0 | 0.0 | 9.0 | 17.0 | 1.0 |
| 2 | 2 | 3 saylı orta məktəb | Bakı ş. Binəqədi rayonu | 12003 | 1996 | 0.852 | 1.101 | 23.0 | 49.0 | 153.986 | ... | 271.886 | 4.0 | 12.0 | 3.0 | 6.0 | 0.0 | 1.0 | 7.0 | 19.0 | 1.0 |
| 3 | 3 | 3 saylı orta məktəb | Bakı ş. Binəqədi rayonu | 12003 | 1997 | 1.603 | 0.637 | 49.0 | 56.0 | 177.551 | ... | 272.372 | 9.0 | 5.0 | 19.0 | 7.0 | 5.0 | 1.0 | 33.0 | 13.0 | 2.0 |
| 4 | 4 | 3 saylı orta məktəb | Bakı ş. Binəqədi rayonu | 12003 | 1998 | 2.34 | 1.154 | 27.0 | 42.0 | 248.414 | ... | 243.409 | 6.0 | 6.0 | 9.0 | 11.0 | 5.0 | 3.0 | 20.0 | 20.0 | 2.0 |


In [10]:
# Display every column; note that max_rows=None also prints every row of any full DataFrame shown
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)

In [11]:
df.head()

| | Unnamed: 0 | school_name | region | school_code | year | rating_b | rating_g | attendance_b | attendance_g | attendance_mean_points_b | attendance_mean_points_g | accepted_mean_points_b | accepted_mean_points_g | accepted_scholarship_b | accepted_scholarship_g | accepted_tution_b | accepted_tution_g | accepted_private_b | accepted_private_g | accepted_b | accepted_g | advanced_graduate |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 2 N-li orta məktəb | BAKI ŞƏHƏRİ BİNƏQƏDİ RAYONU | 12002 | 1997 | 1.899 | 0.736 | 29.0 | 45.0 | 206.666 | 148.907 | 248.523 | 234.881 | 12.0 | 3.0 | 8.0 | 6.0 | 2.0 | 5.0 | 22.0 | 14.0 | NaN |
| 1 | 1 | 3 saylı orta məktəb | Bakı ş. Binəqədi rayonu | 12003 | 1995 | 1.261 | 1.042 | 18.0 | 51.0 | 162.94 | 200.323 | 243.712 | 302.001 | 6.0 | 10.0 | 3.0 | 7.0 | 0.0 | 0.0 | 9.0 | 17.0 | 1.0 |
| 2 | 2 | 3 saylı orta məktəb | Bakı ş. Binəqədi rayonu | 12003 | 1996 | 0.852 | 1.101 | 23.0 | 49.0 | 153.986 | 198.486 | 268.215 | 271.886 | 4.0 | 12.0 | 3.0 | 6.0 | 0.0 | 1.0 | 7.0 | 19.0 | 1.0 |
| 3 | 3 | 3 saylı orta məktəb | Bakı ş. Binəqədi rayonu | 12003 | 1997 | 1.603 | 0.637 | 49.0 | 56.0 | 177.551 | 156.324 | 236.288 | 272.372 | 9.0 | 5.0 | 19.0 | 7.0 | 5.0 | 1.0 | 33.0 | 13.0 | 2.0 |
| 4 | 4 | 3 saylı orta məktəb | Bakı ş. Binəqədi rayonu | 12003 | 1998 | 2.34 | 1.154 | 27.0 | 42.0 | 248.414 | 175.572 | 317.317 | 243.409 | 6.0 | 6.0 | 9.0 | 11.0 | 5.0 | 3.0 | 20.0 | 20.0 | 2.0 |


In [12]:
# 'Unnamed: 0' duplicates the row index (see df.head() above), so it carries no information
df = df.drop(columns=['Unnamed: 0'])
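In the `df.head()` output above, `Unnamed: 0` holds the same values as the row index (0 to 4). This suggests it is a saved copy of an earlier index. The small sketch below verifies that before dropping the column, using a hypothetical three-row frame.

```python
import pandas as pd

# Hypothetical frame mimicking the loaded data: 'Unnamed: 0' is a saved copy of an old index
df_toy = pd.DataFrame({
    "Unnamed: 0": [0, 1, 2],
    "school_code": [12002, 12003, 12003],
    "year": [1997, 1995, 1996],
})

# Confirm the column matches the row positions exactly before discarding it
mirrors_index = (df_toy["Unnamed: 0"].to_numpy() == df_toy.index.to_numpy()).all()
if mirrors_index:
    df_toy = df_toy.drop(columns=["Unnamed: 0"])

print(mirrors_index, list(df_toy.columns))
```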

In [13]:
df.tail()

| | school_name | region | school_code | year | rating_b | rating_g | attendance_b | attendance_g | attendance_mean_points_b | attendance_mean_points_g | accepted_mean_points_b | accepted_mean_points_g | accepted_scholarship_b | accepted_scholarship_g | accepted_tution_b | accepted_tution_g | accepted_private_b | accepted_private_g | accepted_b | accepted_g | advanced_graduate |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 82863 | Şıxbağı kənd orta məktəbi | Zərdab rayonu | 71037 | 2021 | 0.373 | 1.124 | 3.0 | 2.0 | 116.267 | 283.5 | 230.9 | 463.6 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | NaN |
| 82864 | Şıxbağı kənd orta məktəbi | Zərdab rayonu | 71037 | 2022 | 0.0 | 1.195 | 2.0 | 3.0 | 226.6 | 283.667 | 0.0 | 374.3 | 0.0 | 2.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 2.0 | NaN |
| 82865 | Zərdab peşə liseyi | Zərdab rayonu | 71501 | 1998 | 1.452 | 0.0 | 1.0 | 0.0 | 145.835 | 0.0 | 145.835 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | NaN |
| 82866 | Zərdab peşə liseyi | Zərdab rayonu | 71501 | 2003 | 2.393 | 0.0 | 1.0 | 0.0 | 289.75 | 0.0 | 289.75 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | NaN |
| 82867 | Zərdab peşə liseyi | Zərdab rayonu | 71501 | 2005 | 0.0 | 0.0 | 1.0 | 0.0 | 24.893 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | NaN |


In [14]:
df.describe()

| | school_code | year | rating_b | rating_g | attendance_b | attendance_g | attendance_mean_points_b | attendance_mean_points_g | accepted_mean_points_b | accepted_mean_points_g | accepted_scholarship_b | accepted_scholarship_g | accepted_tution_b | accepted_tution_g | accepted_private_b | accepted_private_g | accepted_b | accepted_g | advanced_graduate |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 82868.0 | 82868.0 | 82868.0 | 82868.0 | 82868.0 | 82868.0 | 82868.0 | 82868.0 | 82868.0 | 82868.0 | 82868.0 | 82868.0 | 82868.0 | 82868.0 | 82868.0 | 82868.0 | 82868.0 | 82868.0 | 29726.0 |
| mean | 60278.638111 | 2009.38927 | 0.725867 | 0.68156 | 9.271154 | 8.71031 | 175.102391 | 179.725372 | 225.004205 | 217.353743 | 1.643469 | 1.596551 | 1.811821 | 1.603828 | 0.489163 | 0.408131 | 3.94513 | 3.608643 | 3.839904 |
| std | 24327.744593 | 7.802115 | 0.729381 | 0.739055 | 13.054346 | 13.18864 | 105.894708 | 123.538289 | 173.46124 | 191.206355 | 3.275764 | 3.283747 | 3.38756 | 3.178868 | 1.225231 | 1.116467 | 6.949755 | 6.757616 | 5.552227 |
| min | 11013.0 | 1995.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| 25% | 43008.0 | 2003.0 | 0.0 | 0.0 | 2.0 | 1.0 | 100.089 | 84.17325 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| 50% | 62022.0 | 2010.0 | 0.638 | 0.553 | 5.0 | 4.0 | 172.75 | 177.8145 | 270.704 | 272.0825 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 2.0 |
| 75% | 80064.0 | 2016.0 | 1.128 | 1.166 | 10.0 | 10.0 | 240.0 | 266.2845 | 360.6 | 384.94075 | 2.0 | 2.0 | 2.0 | 2.0 | 0.0 | 0.0 | 4.0 | 4.0 | 4.0 |
| max | 99022.0 | 2022.0 | 5.95 | 6.195 | 407.0 | 172.0 | 695.0 | 680.0 | 695.0 | 700.0 | 56.0 | 52.0 | 59.0 | 61.0 | 32.0 | 33.0 | 106.0 | 98.0 | 130.0 |


In [15]:
df.shape

(82868, 21)

# **Variable Description**

**school_name:** The official name of the school.

**region:** The geographical region or administrative area where the school is located.

**school_code:** A unique identifier assigned to each school for administrative and tracking purposes.

**year:** The academic year for which the data is reported.

**rating_b:** The generalized rating for male graduates based on their entrance exam results compared to the republic's average level.

**rating_g:** The generalized rating for female graduates based on their entrance exam results compared to the republic's average level.

**attendance_b:** The number of male graduates who attended the entrance exams.

**attendance_g:** The number of female graduates who attended the entrance exams.

**attendance_mean_points_b:** The average score obtained by male graduates in the entrance exams.

**attendance_mean_points_g:** The average score obtained by female graduates in the entrance exams.

**accepted_mean_points_b:** The average score of male graduates who were accepted into institutions.

**accepted_mean_points_g:** The average score of female graduates who were accepted into institutions.

**accepted_scholarship_b:** The number of male graduates accepted on a scholarship basis.

**accepted_scholarship_g:** The number of female graduates accepted on a scholarship basis.

**accepted_tution_b:** The number of male graduates accepted on a tuition-paying basis (the column names keep the dataset's spelling, `tution`).

**accepted_tution_g:** The number of female graduates accepted on a tuition-paying basis.

**accepted_private_b:** The number of male graduates accepted into private institutions.

**accepted_private_g:** The number of female graduates accepted into private institutions.

**accepted_b:** The total number of male graduates accepted into any form of higher education institution.

**accepted_g:** The total number of female graduates accepted into any form of higher education institution.

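**advanced_graduate:** The number of graduates who achieved notably high results, possibly including distinctions or honors. It is recorded for only 29,726 of the 82,868 rows (about 64% are missing), and where present it ranges from 1 to 130.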
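In every row shown by `df.head()` and `df.tail()` above, `accepted_b` equals `accepted_scholarship_b + accepted_tution_b + accepted_private_b`, and the same holds for the `_g` columns. The variable descriptions do not state this identity, so it is an observation to verify rather than a documented rule. The sketch below performs the check using the first two rows of `df.head()`. The helper `breakdown_matches_total` is a hypothetical name introduced here.

```python
import pandas as pd

# First two rows of df.head() above (acceptance columns only)
sample = pd.DataFrame({
    "accepted_scholarship_b": [12.0, 6.0], "accepted_tution_b": [8.0, 3.0],
    "accepted_private_b": [2.0, 0.0], "accepted_b": [22.0, 9.0],
    "accepted_scholarship_g": [3.0, 10.0], "accepted_tution_g": [6.0, 7.0],
    "accepted_private_g": [5.0, 0.0], "accepted_g": [14.0, 17.0],
})

def breakdown_matches_total(frame: pd.DataFrame, suffix: str) -> pd.Series:
    """Hypothetical helper: True per row where scholarship + tuition + private equals the total."""
    parts = (frame[f"accepted_scholarship_{suffix}"]
             + frame[f"accepted_tution_{suffix}"]  # dataset's own spelling of 'tuition'
             + frame[f"accepted_private_{suffix}"])
    return parts == frame[f"accepted_{suffix}"]

for suffix in ("b", "g"):
    print(suffix, breakdown_matches_total(sample, suffix).all())
```

On the full data, `breakdown_matches_total(df, 'b').mean()` would give the share of rows where the identity holds.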

In [16]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 82868 entries, 0 to 82867
Data columns (total 21 columns):
 #   Column                    Non-Null Count  Dtype  
---  ------                    --------------  -----  
 0   school_name               82868 non-null  object 
 1   region                    82868 non-null  object 
 2   school_code               82868 non-null  int64  
 3   year                      82868 non-null  int64  
 4   rating_b                  82868 non-null  float64
 5   rating_g                  82868 non-null  float64
 6   attendance_b              82868 non-null  float64
 7   attendance_g              82868 non-null  float64
 8   attendance_mean_points_b  82868 non-null  float64
 9   attendance_mean_points_g  82868 non-null  float64
 10  accepted_mean_points_b    82868 non-null  float64
 11  accepted_mean_points_g    82868 non-null  float64
 12  accepted_scholarship_b    82868 non-null  float64
 13  accepted_scholarship_g    82868 non-null  float64
 14  accept

In [17]:

# Percentage of missing values in each column
null_summary = df.isnull().sum() / len(df) * 100
null_summary.sort_values(ascending=False)


advanced_graduate           64.128494
accepted_mean_points_g       0.000000
accepted_g                   0.000000
accepted_b                   0.000000
accepted_private_g           0.000000
accepted_private_b           0.000000
accepted_tution_g            0.000000
accepted_tution_b            0.000000
accepted_scholarship_g       0.000000
accepted_scholarship_b       0.000000
school_name                  0.000000
region                       0.000000
attendance_mean_points_g     0.000000
attendance_mean_points_b     0.000000
attendance_g                 0.000000
attendance_b                 0.000000
rating_g                     0.000000
rating_b                     0.000000
year                         0.000000
school_code                  0.000000
accepted_mean_points_b       0.000000
dtype: float64

In [18]:

df.duplicated().sum()

0

In [19]:
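# Keep only rows where advanced_graduate is recorded. This removes about 64% of the data
# (82,868 -> 29,726 rows), and every later chart is computed on the remaining rows.
df.dropna(subset=['advanced_graduate'], inplace=True)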

In [20]:
df.isnull().sum()

school_name                 0
region                      0
school_code                 0
year                        0
rating_b                    0
rating_g                    0
attendance_b                0
attendance_g                0
attendance_mean_points_b    0
attendance_mean_points_g    0
accepted_mean_points_b      0
accepted_mean_points_g      0
accepted_scholarship_b      0
accepted_scholarship_g      0
accepted_tution_b           0
accepted_tution_g           0
accepted_private_b          0
accepted_private_g          0
accepted_b                  0
accepted_g                  0
advanced_graduate           0
dtype: int64

In [21]:
df.shape

(29726, 21)
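The `dropna` above keeps 29,726 of 82,868 rows (about 36%), so attendance and acceptance totals later on only reflect the retained school-years. Before committing to that, it can help to see how the recorded share of `advanced_graduate` varies by year. If coverage is uneven, yearly totals partly reflect which years were kept. The sketch below uses hypothetical rows and also shows the alternative of filtering only where `advanced_graduate` is analysed.

```python
import numpy as np
import pandas as pd

# Hypothetical rows: advanced_graduate is missing (NaN) for some school-years
df_toy = pd.DataFrame({
    "year": [1995, 1995, 1996, 1996, 1997],
    "accepted_b": [9, 7, 33, 20, 1],
    "advanced_graduate": [1.0, np.nan, 2.0, np.nan, np.nan],
})

# Share of rows per year that have advanced_graduate recorded
coverage = df_toy["advanced_graduate"].notna().groupby(df_toy["year"]).mean()
print(coverage)

# Alternative to dropping rows globally: keep the full frame and filter
# only for the analyses that actually use advanced_graduate
df_adv = df_toy.dropna(subset=["advanced_graduate"])
print(len(df_toy), len(df_adv))
```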

In [22]:
# df['school_code'].nunique()

In [23]:
# df['region'].nunique()

In [24]:
# df['year'].nunique()

In [25]:
# df['school_name'].nunique()

In [26]:
# df[['region','school_name']]



In [27]:
# df['region'].unique()

## **The `region` and `school_name` columns have many distinct values. For simplicity, the analysis below covers only Baku city rather than the whole country, so `df_baku` keeps only the rows for Baku's districts.**

In [28]:
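# Keep rows whose region contains 'Bakı'. The match is case-sensitive, so upper-case
# spellings such as 'BAKI ŞƏHƏRİ BİNƏQƏDİ RAYONU' (seen in df.head()) are not included.
df_baku = df[df['region'].str.contains('Bakı')]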


In [29]:
# df_baku['region'].unique()

In [30]:
# Drop rows labelled only with the city name, since they name no district
df_baku = df_baku[df_baku['region'] != 'Bakı şəhəri']



In [31]:
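# Strip the literal 'Bakı ş. ' prefix so that only the district name remains.
# Reassigning via .assign avoids chained in-place modification of a filtered slice.
df_baku = df_baku.assign(region=df_baku['region'].str.replace('Bakı ş. ', '', regex=False))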

In [32]:
# df_baku['region'].unique()
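The Baku filtering steps can be traced on a few hypothetical region labels modelled on the spellings in the `df.head()` output. The sketch shows that the case-sensitive match keeps `'Bakı ş. Binəqədi rayonu'` but drops the upper-case `'BAKI ŞƏHƏRİ BİNƏQƏDİ RAYONU'`. Any Baku rows stored in that spelling are therefore not part of `df_baku`. `regex=False` is used so that the `.` in the prefix is matched literally rather than as a regex wildcard.

```python
import pandas as pd

# Hypothetical region labels in the spellings seen in df.head() above
df_toy = pd.DataFrame({"region": [
    "Bakı ş. Binəqədi rayonu",
    "BAKI ŞƏHƏRİ BİNƏQƏDİ RAYONU",
    "Bakı şəhəri",
    "Zərdab rayonu",
]})

# Same steps as the cells above: case-sensitive match, drop the city-only label,
# then strip the literal "Bakı ş. " prefix
baku = df_toy[df_toy["region"].str.contains("Bakı", regex=False)]
baku = baku[baku["region"] != "Bakı şəhəri"]
baku = baku.assign(region=baku["region"].str.replace("Bakı ş. ", "", regex=False))

print(baku["region"].tolist())
```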

# **Exploratory Data Analysis (EDA) Using Visualizations**



In [34]:
# Melt the DataFrame to a long format for Plotly Express
df_melted_attendance = df_baku.melt(id_vars='region', value_vars=['attendance_b', 'attendance_g'],
                                               var_name='Gender', value_name='Attendance')

# Replace 'attendance_b' and 'attendance_g' with more descriptive labels
df_melted_attendance['Gender'] = df_melted_attendance['Gender'].replace({'attendance_b': 'Male', 'attendance_g': 'Female'})

# Sort the DataFrame by 'Attendance' in descending order
df_melted_attendance = df_melted_attendance.sort_values(by='Attendance', ascending=False)

# px.histogram sums 'Attendance' within each region; the Male and Female bars are stacked
fig = px.histogram(df_melted_attendance, x='region', y='Attendance', color='Gender',
             title='Attendance of Graduates by Region and Gender',
             labels={'region': 'Region', 'Attendance': 'Attendance', 'Gender': 'Gender'},
             color_discrete_map={'Male': '#FF5733', 'Female': '#33FF57'})

# Show the figure
fig.show()

*The chart shows that schools in Pirallahi district have the fewest graduates sitting the entrance exams, while schools in Khatai district have the most. These bars are totals summed over all schools and years in each district.*
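`px.histogram` sums the y column by default, so each attendance bar is the total over all schools and years in a district. A district with more schools, or more school-year rows kept after `dropna`, shows a taller bar even if each of its schools sends no more candidates. The sketch below separates the two views. It uses hypothetical districts and values.

```python
import pandas as pd

# Hypothetical school-year rows for two districts
df_toy = pd.DataFrame({
    "region": ["District A"] * 4 + ["District B"] * 2,
    "attendance_b": [10, 12, 8, 10, 9, 11],
    "attendance_g": [11, 9, 10, 10, 10, 12],
})
df_toy["attendance_total"] = df_toy["attendance_b"] + df_toy["attendance_g"]

# Row count, summed total, and per-row (per school-year) mean for each district
summary = df_toy.groupby("region").agg(
    rows=("attendance_total", "size"),
    total=("attendance_total", "sum"),
    per_row_mean=("attendance_total", "mean"),
)
print(summary)
```

Here District A has the larger total (80 vs 42) only because it has more rows. Its per-row mean (20.0) is below District B's (21.0).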

In [33]:

# Melt the DataFrame to a long format for Plotly Express
df_melted = df_baku.melt(id_vars='region', value_vars=['rating_b', 'rating_g'],
                         var_name='Gender', value_name='Rating')

# Replace 'rating_b' and 'rating_g' with more descriptive labels
df_melted['Gender'] = df_melted['Gender'].replace({'rating_b': 'Male', 'rating_g': 'Female'})

# Ratings are relative indices, so average them per region instead of summing (px.histogram's default for y)
fig = px.histogram(df_melted, x='region', y='Rating', color='Gender', histfunc='avg', barmode='group',
             title='Average Generalized Rating by Region and Gender',
             labels={'region': 'Region', 'Rating': 'Rating', 'Gender': 'Gender'},
             color_discrete_map={'Male': '#1f77b4', 'Female': '#ff7f0e'})

# Show the figure
fig.show()
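Ratings are relative indices, so a per-district average is easier to interpret than a sum (`px.histogram` sums y values unless `histfunc` is changed). When averaging, note that a rating of 0.0 can occur where no graduates of that gender sat the exam. For example, the 1998 Zərdab peşə liseyi row in the `df.tail()` output has `attendance_g = 0` and `rating_g = 0.0`. The sketch below treats such ratings as missing before averaging. It uses hypothetical values, and the masking rule is an assumption rather than something the dataset documents.

```python
import pandas as pd

# Hypothetical rows; the second one has no female candidates and a 0.0 rating
df_toy = pd.DataFrame({
    "region": ["District A", "District A", "District A", "District B"],
    "attendance_g": [5, 0, 4, 3],
    "rating_g": [1.2, 0.0, 0.8, 0.6],
})

# Naive mean includes the 0.0 rating from the row with no female candidates
naive = df_toy.groupby("region")["rating_g"].mean()

# Set ratings to NaN where attendance is zero; groupby mean skips NaN
masked = df_toy["rating_g"].where(df_toy["attendance_g"] > 0)
adjusted = masked.groupby(df_toy["region"]).mean()

print(pd.DataFrame({"naive": naive, "adjusted": adjusted}))
```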


In [35]:
df_melted_scholarship = df_baku.melt(id_vars='region', value_vars=['accepted_scholarship_b', 'accepted_scholarship_g'],
                                     var_name='Gender', value_name='Accepted')

df_melted_tuition = df_baku.melt(id_vars='region', value_vars=['accepted_tution_b', 'accepted_tution_g'],
                                 var_name='Gender', value_name='Accepted')

# Replace column names with more descriptive labels
df_melted_scholarship['Gender'] = df_melted_scholarship['Gender'].replace({
    'accepted_scholarship_b': 'Male (Scholarship)',
    'accepted_scholarship_g': 'Female (Scholarship)'
})

# Keys must match the dataset's column spelling ('tution'), otherwise no labels are replaced
df_melted_tuition['Gender'] = df_melted_tuition['Gender'].replace({
    'accepted_tution_b': 'Male (Tuition)',
    'accepted_tution_g': 'Female (Tuition)'
})

# Sort the melted DataFrames by 'Accepted' in descending order
df_melted_scholarship = df_melted_scholarship.sort_values(by='Accepted', ascending=False)
df_melted_tuition = df_melted_tuition.sort_values(by='Accepted', ascending=False)

# Create stacked bar charts
fig1 = px.histogram(df_melted_scholarship, x='region', y='Accepted', color='Gender',
              title='Accepted Male vs. Female Graduates on Scholarship Basis',
              labels={'region': 'Region', 'Accepted': 'Accepted', 'Gender': 'Gender'})

fig2 = px.histogram(df_melted_tuition, x='region', y='Accepted', color='Gender',
              title='Accepted Male vs. Female Graduates on Tuition-Paying Basis',
              labels={'region': 'Region', 'Accepted': 'Accepted', 'Gender': 'Gender'})

# Display the two charts one after the other
fig1.show()
fig2.show()
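Raw counts of scholarship admissions are hard to compare across districts of different size. One normalisation, which is an assumption here and not something the notebook computes, is the share of accepted graduates who were admitted on a scholarship basis. The sketch below sums the counts per district first and then divides. This gives a ratio of totals rather than a mean of per-row ratios, and it avoids dividing by zero for school-years with no acceptances. The values are hypothetical.

```python
import pandas as pd

# Hypothetical school-year rows
df_toy = pd.DataFrame({
    "region": ["District A", "District A", "District B"],
    "accepted_scholarship_b": [4, 2, 1],
    "accepted_b": [10, 5, 4],
    "accepted_scholarship_g": [6, 3, 2],
    "accepted_g": [9, 6, 2],
})

# Sum counts per district first, then divide (ratio of totals)
totals = df_toy.groupby("region").sum()
share = pd.DataFrame({
    "Male": totals["accepted_scholarship_b"] / totals["accepted_b"],
    "Female": totals["accepted_scholarship_g"] / totals["accepted_g"],
})
print(share.round(3))
```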

In [36]:
# Melt the DataFrame to a long format for Plotly Express
df_melted_private = df_baku.melt(id_vars='region', value_vars=['accepted_private_b', 'accepted_private_g'],
                                 var_name='Gender', value_name='Accepted')

df_melted_total = df_baku.melt(id_vars='region', value_vars=['accepted_b', 'accepted_g'],
                               var_name='Gender', value_name='Accepted')

# Replace column names with more descriptive labels
df_melted_private['Gender'] = df_melted_private['Gender'].replace({
    'accepted_private_b': 'Male',
    'accepted_private_g': 'Female'
})

df_melted_total['Gender'] = df_melted_total['Gender'].replace({
    'accepted_b': 'Male',
    'accepted_g': 'Female'
})

# Sort the melted DataFrames by 'Accepted' in descending order
df_melted_private = df_melted_private.sort_values(by='Accepted', ascending=False)
df_melted_total = df_melted_total.sort_values(by='Accepted', ascending=False)

# Create stacked bar charts (px.histogram sums 'Accepted' within each region)
fig1 = px.histogram(df_melted_private, x='region', y='Accepted', color='Gender',
              title='Accepted Male vs. Female Graduates into Private Institutions',
              labels={'region': 'Region', 'Accepted': 'Accepted', 'Gender': 'Gender'},
              color_discrete_map={'Male': '#4682B4', 'Female': '#FF69B4'})

fig2 = px.histogram(df_melted_total, x='region', y='Accepted', color='Gender',
              title='Total Accepted Male vs. Female Graduates into Higher Education',
              labels={'region': 'Region', 'Accepted': 'Accepted', 'Gender': 'Gender'},
              color_discrete_map={'Male': '#4682B4', 'Female': '#FF69B4'})

# Display the two charts one after the other
fig1.show()
fig2.show()
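The total-accepted chart counts admissions. Relating those admissions to the number of graduates who sat the exams gives an acceptance rate per district and gender. This rate is not part of the original analysis. It assumes that the `accepted_*` counts are a subset of the `attendance_*` counts for the same school and year, which the variable descriptions suggest but do not state. The sketch below uses hypothetical values.

```python
import pandas as pd

# Hypothetical school-year rows
df_toy = pd.DataFrame({
    "region": ["District A", "District A", "District B"],
    "attendance_b": [20, 10, 8],
    "accepted_b": [8, 4, 2],
    "attendance_g": [15, 15, 10],
    "accepted_g": [9, 6, 5],
})

# Ratio of summed counts per district: accepted / sat the exams
cols = ["attendance_b", "accepted_b", "attendance_g", "accepted_g"]
totals = df_toy.groupby("region")[cols].sum()
rate = pd.DataFrame({
    "Male": totals["accepted_b"] / totals["attendance_b"],
    "Female": totals["accepted_g"] / totals["attendance_g"],
})
print(rate)
```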

In [38]:

import plotly.graph_objects as go
from plotly.subplots import make_subplots
# Create subplots with 1 row and 2 columns
fig = make_subplots(rows=1, cols=2, specs=[[{'type':'domain'}, {'type':'domain'}]],
                    subplot_titles=['Average Scores of Accepted Male Graduates by Region',
                                    'Average Scores of Accepted Female Graduates by Region'])

# Add donut chart for accepted_mean_points_b with a color palette.
# go.Pie sums the values of repeated labels, so each slice is the sum of per-school averages for a region.
fig.add_trace(go.Pie(labels=df_baku['region'], values=df_baku['accepted_mean_points_b'],
                     marker=dict(colors=px.colors.sequential.Blues), hole=0.4), 1, 1)

# Add donut chart for accepted_mean_points_g with a color palette
fig.add_trace(go.Pie(labels=df_baku['region'], values=df_baku['accepted_mean_points_g'],
                     marker=dict(colors=px.colors.sequential.Pinkyl), hole=0.4), 1, 2)
# Use `hole` to create a donut-like pie chart
fig.update_traces(hole=.4, hoverinfo="label+percent+name")
fig.update_layout(title_text='Average Scores of Accepted Graduates by Region',
                  legend_title='Region',
                  legend=dict(yanchor="top", y=0.9, xanchor="right", x=0.58),
                  annotations=[
                      dict(text='Male', x=0.22, y=0.5, font_size=20, showarrow=False),
                      dict(text='Female', x=0.78, y=0.5, font_size=20, showarrow=False)
                  ])

# Show the figure
fig.show()

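*In this chart, Khatai district has the largest slice for both male and female accepted graduates. Because `go.Pie` adds up the values of repeated labels, each slice is a sum of per-school average scores. Its size therefore grows with the number of school-year rows in a district, and it does not show that Khatai's accepted graduates have the highest average score.*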
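A district-level average score for accepted graduates can be computed by weighting each school's `accepted_mean_points` by its number of accepted graduates. This also ignores rows where nobody was accepted. Those rows carry `accepted_mean_points = 0.0`, as in the 2022 Şıxbağı row in the `df.tail()` output. The weighting is one reasonable choice, assumed here rather than taken from the dataset documentation. The sketch below uses hypothetical values and shows the summed value a pie slice would represent alongside the weighted mean.

```python
import pandas as pd

# Hypothetical rows; the second one has no accepted graduates and a 0.0 mean score
df_toy = pd.DataFrame({
    "region": ["District A", "District A", "District A", "District B"],
    "accepted_b": [10, 0, 5, 4],
    "accepted_mean_points_b": [300.0, 0.0, 360.0, 250.0],
})

# Weighted mean: total points of accepted graduates / number of accepted graduates
points = (df_toy["accepted_mean_points_b"] * df_toy["accepted_b"]).groupby(df_toy["region"]).sum()
counts = df_toy.groupby("region")["accepted_b"].sum()
weighted_mean = points / counts  # NaN for a district with no acceptances at all

# For comparison: the summed value that a pie slice would represent
pie_sum = df_toy.groupby("region")["accepted_mean_points_b"].sum()

print(pd.DataFrame({"weighted_mean": weighted_mean, "pie_sum": pie_sum}))
```

A bar chart of `weighted_mean` per region (for example with `px.bar`) would then compare actual average scores.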

In [44]:
# Aggregate data by year
df_agg = df_baku.groupby('year').agg({'accepted_b':'sum', 'accepted_g':'sum'}).reset_index()

# Smooth the data using rolling average
df_agg['accepted_b_smooth'] = df_agg['accepted_b'].rolling(window=5, min_periods=1).mean()
df_agg['accepted_g_smooth'] = df_agg['accepted_g'].rolling(window=5, min_periods=1).mean()

# Create line graph with smoothed data (trailing 5-year rolling mean)
fig = px.line(df_agg, x='year', y=['accepted_b_smooth', 'accepted_g_smooth'],
              title='Accepted Graduates by Year (5-Year Rolling Mean)',
              labels={'year': 'Year', 'value': 'Number of Graduates', 'variable': 'Gender'},
              color_discrete_map={'accepted_b_smooth': 'blue', 'accepted_g_smooth': 'red'})

# Show the figure
fig.show()

*The smoothed number of accepted graduates is highest in 2021, and in that year more female than male graduates were accepted.*
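`rolling(window=5, min_periods=1).mean()` is a trailing average. The value plotted for year t is the mean of years t-4 through t, with fewer years at the start of the series. Peaks in the smoothed curve therefore lag and flatten peaks in the raw totals. The small sketch below compares trailing and centred windows on hypothetical yearly totals.

```python
import pandas as pd

# Hypothetical yearly totals
yearly = pd.Series([1, 2, 3, 4, 5, 6], index=range(2016, 2022), dtype=float)

# Trailing window (as in the notebook) vs a window centred on each year
trailing = yearly.rolling(window=5, min_periods=1).mean()
centered = yearly.rolling(window=5, min_periods=1, center=True).mean()

print(pd.DataFrame({"raw": yearly, "trailing": trailing, "centered": centered}))
```

For 2021 the trailing value is 4.0 (the mean of 2017 to 2021), while the raw value is 6.0. Plotting the raw `accepted_b` and `accepted_g` columns next to the smoothed ones would show whether the 2021 peak is also present before smoothing.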

In [35]:
# Aggregate data by year
df_agg = df_baku.groupby('year').agg({'advanced_graduate':'sum'}).reset_index()

# Smooth the data using rolling average
df_agg['advanced_graduate_smooth'] = df_agg['advanced_graduate'].rolling(window=5, min_periods=1).mean()

# Create line graph with smoothed data
fig = px.line(df_agg, x='year', y='advanced_graduate_smooth',
              title='Advanced Graduates by Year (Smoothed)',
              labels={'year': 'Year', 'advanced_graduate_smooth': 'Number of Advanced Graduates'},
              color_discrete_sequence=['#FF69B4'])

# Show the figure
fig.show()