# Interactive Dashboard with HoloViz

| Date | User | Change Type | Remarks |  
| ---- | ---- | ----------- | ------- |
| 20/03/2025   | Martin | Create  | Created notebook | 
| 21/03/2025   | Martin | Create  | Completed data cleaning process | 
| 22/03/2025   | Martin | Create  | Completed data cleaning process | 

# Content

* [Introduction](#introduction)
* [Setup](#setup)
* [The Data](#the-data)
* [The Dashboard](#the-dashboard)

# Introduction

This notebook accompanies the associated article on building interactive dashboards with components from the HoloViz suite, specifically Panel and hvPlot. Such dashboards are data visualisations that let users interact with the data and explore it in more depth.

In this tutorial, we first clean the data and then build visualisations to explore the Graduate Employment Survey (GES) conducted by universities in Singapore from 2013 to 2022.

# Setup

If you have `poetry` installed, you can install the dependencies directly after cloning the project:

```
poetry install
```

Otherwise, after creating a virtual environment, you can install the dependencies from the `requirements.txt` in the repo using `pip`:

```
pip install -r requirements.txt
```

# The Data

_More details about the data can be found in the article or from the [source](https://data.gov.sg/datasets/d_3c55210de27fcccda2ed0c63fdd2b352/view)_

The Graduate Employment Survey (GES) is conducted annually and jointly by NTU, NUS, SMU, SIT, SUTD and SUSS (universities in Singapore) to survey the employment conditions of graduates about six months after their final examinations. The results are published by the Ministry of Education (MOE).

Here we clean the data so that it is in a suitable format for the visualisations later.

In [17]:
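import pandas as pd
import numpy as np
import re
import hvplot.pandas  # registers the .hvplot plotting accessor on pandas objects
import panel as pn
from datetime import datetime

pn.extension()  # load Panel's JavaScript/CSS so widgets render in the notebook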

In [18]:
df = pd.read_csv("data/GraduateEmploymentSurveyNTUNUSSITSMUSUSSSUTD.csv")
df.head()

| | year | university | school | degree | employment_rate_overall | employment_rate_ft_perm | basic_monthly_mean | basic_monthly_median | gross_monthly_mean | gross_monthly_median | gross_mthly_25_percentile | gross_mthly_75_percentile |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 2013 | Nanyang Technological University | College of Business (Nanyang Business School) | Accountancy and Business | 97.4 | 96.1 | 3701 | 3200 | 3727 | 3350 | 2900 | 4000 |
| 1 | 2013 | Nanyang Technological University | College of Business (Nanyang Business School) | Accountancy (3-yr direct Honours Programme) | 97.1 | 95.7 | 2850 | 2700 | 2938 | 2700 | 2700 | 2900 |
| 2 | 2013 | Nanyang Technological University | College of Business (Nanyang Business School) | Business (3-yr direct Honours Programme) | 90.9 | 85.7 | 3053 | 3000 | 3214 | 3000 | 2700 | 3500 |
| 3 | 2013 | Nanyang Technological University | College of Business (Nanyang Business School) | Business and Computing | 87.5 | 87.5 | 3557 | 3400 | 3615 | 3400 | 3000 | 4100 |
| 4 | 2013 | Nanyang Technological University | College of Engineering | Aerospace Engineering | 95.3 | 95.3 | 3494 | 3500 | 3536 | 3500 | 3100 | 3816 |


First, we perform some basic data sanitisation: removing rows with `na` values and declaring the column data types.

In [19]:
# Remove the rows with na (str)
df = df[df['employment_rate_overall'] != 'na']

# Specify the datatypes of columns
cols_to_change = [col for col in df.columns if col not in 
                  ['year', 'university', 'school', 'degree']]
df = df.astype(dtype={
  col: 'float64' for col in cols_to_change
})
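The raw file marks missing values with the string `'na'`, which is why the cell above filters on `'na'` before casting to `float64`. A minimal alternative sketch, assuming the same `'na'` marker, is to let `read_csv` treat it as missing through `na_values` and then drop incomplete rows. Note the behavioural difference: this drops rows with a missing value in *any* of the listed numeric columns, not only `employment_rate_overall`. The tiny in-memory CSV is made up for illustration and is not the real dataset.

```python
import io

import pandas as pd

# Made-up sample in the same shape as the GES file; 'na' marks a missing value
raw_csv = """year,university,degree,employment_rate_overall,basic_monthly_mean
2013,Uni A,Business,97.4,3701
2013,Uni B,Law,na,na
2014,Uni A,Business,90.9,3053
"""

# 'na' (lowercase) is not in pandas' default NA strings, so declare it explicitly
df_demo = pd.read_csv(io.StringIO(raw_csv), na_values="na")

numeric_cols = ["employment_rate_overall", "basic_monthly_mean"]

# Drop rows where any numeric column is missing, then make the dtype explicit
df_demo = df_demo.dropna(subset=numeric_cols)
df_demo[numeric_cols] = df_demo[numeric_cols].astype("float64")

print(df_demo)
```

Because the columns are parsed as numbers from the start, no string comparison against `'na'` is needed afterwards.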

Next, note that each university may use different names for its schools and degrees, and that degree names also encode the level of degree obtained (Hons, Cum Laude, etc.). We therefore try to strip these distinctions from the names and record whether a degree is an honours or cum laude programme in a separate `advanced` column.

_Note: The cleaning might not be 100% accurate, so please excuse some missed distinctions_

In [20]:
# Change the datatype of year column
df['year'] = pd.to_datetime(df['year'], format="%Y")

In [21]:
# Cleaning the school column
# Remove any details from brackets
df['school'] = df['school'].str.replace(r'\(.*?\)', '', regex=True)

# Remove any special characters (*, \ and #)
df['school'] = df['school'].str.replace(r'[*\\#]+', '', regex=True)

# Remove the white space after dash
df['school'] = df['school'].str.replace(r'-\s', '-', regex=True)

# Remove leading and trailing whitespace
df['school'] = df['school'].str.strip()
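To check what the school-cleaning steps above do, the sketch below chains them into one hypothetical `clean_school` function and applies it to a few names. The first two names come from the raw data shown earlier; the last two are made up to exercise the special-character and dash rules. It uses the character class `[*\\#]` for the special characters.

```python
import pandas as pd


def clean_school(s: pd.Series) -> pd.Series:
    """Apply the school-name cleaning steps in order (hypothetical helper)."""
    return (
        s.str.replace(r"\(.*?\)", "", regex=True)  # drop bracketed details
        .str.replace(r"[*\\#]+", "", regex=True)   # drop *, \ and # markers
        .str.replace(r"-\s", "-", regex=True)      # remove the space after a dash
        .str.strip()                               # trim surrounding whitespace
    )


schools = pd.Series([
    "College of Business (Nanyang Business School)",  # from the raw data
    "College of Engineering",                         # from the raw data
    "School of Computing **",                         # made-up example
    "Faculty of Arts- Social Sciences",               # made-up example
])
print(clean_school(schools).tolist())
```

Chaining the `.str` calls is equivalent to the four separate assignments in the cell; it simply keeps the whole pipeline in one expression.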

In [22]:
# Cleaning the degree column
# Remove special characters (*, \, #, ^ and .)
df['degree'] = df['degree'].str.replace(r'[*\\#^.]+', '', regex=True)

# Flag honours and cum laude programmes, then remove that wording from the name
df['advanced'] = np.where(df['degree'].str.contains(r'Honours|\(Hons\)|Cum\s+Laude'), 1, 0)
remove_advanced = r'\s+with\s+Honours|\(Hons\)|\(?Cum\sLaude\sand\sabove\)?'
df['degree'] = df['degree'].str.replace(remove_advanced, '', regex=True)

# Remove bracketed details that contain a digit (e.g. the programme length)
df['degree'] = df['degree'].str.replace(r'\([^()]*\d[^()]*\)', '', regex=True)

# Remove non-degree related terms
df['degree'] = df['degree'].str.replace(r'\(LLB\)|\(MBBS\)|\(Land\)', '', regex=True)

# Some degree types are hidden between brackets, so we extract them
# (rows without a match keep their current value)
df['degree'] = df['degree'].str.extract(r'\(([^)]+)\)')[0].fillna(df['degree'])

# Some degrees are also only expressed after the word "in"
df['degree'] = df['degree'].str.extract(r'\bin\b\s+(.*?)$')[0].fillna(df['degree'])

# Remove term "Bachelor of"
df['degree'] = df['degree'].str.replace(r'Bachelor\sof\s?', '', regex=True, case=False)

# Replace some special characters with their word equivalents
df['degree'] = df['degree'].str.replace('&', 'and', regex=False)
df['degree'] = df['degree'].str.replace('/', ' and ', regex=False)
df['degree'] = df['degree'].str.replace('with', '', regex=False)
df['degree'] = df['degree'].str.replace(r'\s+', ' ', regex=True)

# Remove leading and trailing whitespace
df['degree'] = df['degree'].str.strip()

# Drop a trailing plural "s" to merge singular/plural variants of a name,
# but keep a double "s" (e.g. "Business") intact
df['degree'] = df['degree'].str.replace(r'(?<!s)s$', '', regex=True)

# Reset the index
df = df.reset_index(drop=True)

In [23]:
df.head()

| | year | university | school | degree | employment_rate_overall | employment_rate_ft_perm | basic_monthly_mean | basic_monthly_median | gross_monthly_mean | gross_monthly_median | gross_mthly_25_percentile | gross_mthly_75_percentile | advanced |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 2013-01-01 | Nanyang Technological University | College of Business | Accountancy and Business | 97.4 | 96.1 | 3701.0 | 3200.0 | 3727.0 | 3350.0 | 2900.0 | 4000.0 | 0 |
| 1 | 2013-01-01 | Nanyang Technological University | College of Business | Accountancy | 97.1 | 95.7 | 2850.0 | 2700.0 | 2938.0 | 2700.0 | 2700.0 | 2900.0 | 1 |
| 2 | 2013-01-01 | Nanyang Technological University | College of Business | Business | 90.9 | 85.7 | 3053.0 | 3000.0 | 3214.0 | 3000.0 | 2700.0 | 3500.0 | 1 |
| 3 | 2013-01-01 | Nanyang Technological University | College of Business | Business and Computing | 87.5 | 87.5 | 3557.0 | 3400.0 | 3615.0 | 3400.0 | 3000.0 | 4100.0 | 0 |
| 4 | 2013-01-01 | Nanyang Technological University | College of Engineering | Aerospace Engineering | 95.3 | 95.3 | 3494.0 | 3500.0 | 3536.0 | 3500.0 | 3100.0 | 3816.0 | 0 |

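The degree-cleaning cell is the most involved, so here is a sketch that applies its core steps (the honours flag, removal of bracketed details containing a digit, the two extraction rules and the "Bachelor of" prefix) to four strings. The first two come from the raw data shown earlier; the last two are made-up examples of the bracket and "in" patterns the comments describe. Rows without a match are kept unchanged with `fillna`.

```python
import numpy as np
import pandas as pd

degrees = pd.Series([
    "Accountancy and Business",                        # from the raw data
    "Accountancy (3-yr direct Honours Programme)",     # from the raw data
    "Bachelor of Engineering (Computer Engineering)",  # made-up example
    "Bachelor of Science in Data Science",             # made-up example
])

# 1. Flag honours / cum laude programmes before the wording is removed
advanced = np.where(degrees.str.contains(r"Honours|\(Hons\)|Cum\s+Laude"), 1, 0)

# 2. Remove bracketed details that contain a digit (e.g. programme length)
cleaned = degrees.str.replace(r"\([^()]*\d[^()]*\)", "", regex=True)

# 3. Prefer the text inside any remaining brackets
cleaned = cleaned.str.extract(r"\(([^)]+)\)")[0].fillna(cleaned)

# 4. Prefer the text after the word "in", if present
cleaned = cleaned.str.extract(r"\bin\b\s+(.*?)$")[0].fillna(cleaned)

# 5. Drop "Bachelor of" and trim whitespace
cleaned = cleaned.str.replace(r"Bachelor\sof\s?", "", regex=True, case=False).str.strip()

print(list(zip(cleaned, advanced)))
```

The `\bin\b` word boundaries matter: without them, "in" inside words such as "Engineering" or "Business" would also trigger the extraction.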

---

# The Dashboard

Now that the data is prepared, we'll create the individual plots, describing how each one can be used to interpret the data set, and then combine them into a complete dashboard.

In [24]:
# You can load the data using this function if you want to jump straight to the dashboard
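If you want to skip the cleaning steps on later runs, one option is to cache the cleaned frame on disk. The sketch below is hypothetical: `save_clean_data`, `load_clean_data` and the file name are illustrative, and CSV is just one possible format. Reading the file back with `parse_dates` restores the datetime `year` column that the widgets rely on.

```python
import tempfile
from pathlib import Path

import pandas as pd


def save_clean_data(df: pd.DataFrame, path: Path) -> None:
    """Write the cleaned survey data to CSV (hypothetical helper)."""
    df.to_csv(path, index=False)


def load_clean_data(path: Path) -> pd.DataFrame:
    """Read the cleaned CSV back, parsing 'year' as datetimes (hypothetical helper)."""
    return pd.read_csv(path, parse_dates=["year"])


# Round-trip a tiny made-up frame to check that the dtypes survive
sample = pd.DataFrame({
    "year": pd.to_datetime(["2013", "2014"], format="%Y"),
    "university": ["Uni A", "Uni B"],
    "advanced": [0, 1],
})
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "ges_clean.csv"
    save_clean_data(sample, path)
    loaded = load_clean_data(path)

print(loaded.dtypes)
```

In the notebook you would call `save_clean_data(df, ...)` once after the cleaning cells and `load_clean_data(...)` here instead of rerunning them.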

## Create the control scheme

We create widgets for several columns so that users can filter the data according to what they want to observe.

In [36]:
# Create the selection options for each widget
universities = list(df['university'].unique())
# Sort the years so that years[0] and years[-1] are the earliest and latest
years = sorted(df['year'].unique())
degrees = list(df['degree'].unique())

In [40]:
# Checkbox group for universities
## Allows users to select multiple universities at the same time to compare them
uni_checkbox_group = pn.widgets.CheckBoxGroup(
  name="Select Universities:",
  value=universities,
  options=universities,
)

## Allow users to select the range of years they want to see
year_slider = pn.widgets.DateRangeSlider(
  name='Graduate Date Range',
  start=years[0],
  end=years[-1],
  value=(years[0], years[-1]),
  step=1,
  format="%Y"
)

## Allow users to select multiple degrees at the same time
degree_multi_select = pn.widgets.MultiChoice(
  name='Degrees to Compare',
  options=degrees
)

pn.Column(
    pn.Row(uni_checkbox_group),
    pn.Row(year_slider, degree_multi_select)
)

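The widgets only hold selections; each plot still needs to filter the data with them. A minimal sketch of such a filter is shown below. `filter_ges` is a hypothetical helper, the data is made up, and treating an empty degree selection as "all degrees" is an assumption (the `MultiChoice` above starts with nothing selected).

```python
from datetime import datetime

import pandas as pd


def filter_ges(df, universities, year_range, degrees):
    """Return the rows matching the widget selections (hypothetical helper).

    universities: selected university names (CheckBoxGroup value)
    year_range:   (start, end) pair of dates (DateRangeSlider value)
    degrees:      selected degree names (MultiChoice value); empty means all
    """
    start, end = (pd.Timestamp(v) for v in year_range)
    mask = df["university"].isin(universities) & df["year"].between(start, end)
    if degrees:  # only filter on degree when something is selected
        mask &= df["degree"].isin(degrees)
    return df[mask]


# Made-up rows in the same shape as the cleaned GES data
demo = pd.DataFrame({
    "year": pd.to_datetime(["2013", "2014", "2015", "2014"], format="%Y"),
    "university": ["Uni A", "Uni A", "Uni A", "Uni B"],
    "degree": ["Business", "Business", "Law", "Business"],
    "employment_rate_overall": [90.0, 92.5, 95.1, 88.0],
})

uni_a_2013_2014 = filter_ges(demo, ["Uni A"], (datetime(2013, 1, 1), datetime(2014, 1, 1)), [])
law_only = filter_ges(demo, ["Uni A", "Uni B"], (datetime(2013, 1, 1), datetime(2015, 1, 1)), ["Law"])
print(uni_a_2013_2014)
print(law_only)
```

In the notebook, the widgets could be passed straight to such a function with `pn.bind(filter_ges, df, uni_checkbox_group, year_slider, degree_multi_select)`, so the filtered data is recomputed whenever a selection changes.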

## Plot 1: Line Plot 

We'll first build a line plot that shows how the numerical measures (employment rate, mean monthly income, etc.) change across the years. The plot should be filterable by school and degree.

This will allow users to filter and compare the employment outcomes of the same degree across different schools.
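A university may have several rows for the same degree in the same year (for example honours and non-honours variants, which after cleaning may differ only in `advanced`), and plotting those raw rows would make the lines zigzag. A minimal sketch, assuming such rows should simply be averaged, is to aggregate before plotting; `yearly_summary` is a hypothetical helper and the data is made up.

```python
import pandas as pd


def yearly_summary(df, degree, value="employment_rate_overall"):
    """Average `value` per year and university for one degree (hypothetical helper)."""
    subset = df[df["degree"] == degree]
    return (
        subset.groupby(["year", "university"], as_index=False)[value]
        .mean()
        .sort_values(["university", "year"], ignore_index=True)
    )


# Made-up rows; the two 2013 "Uni A" Business rows stand in for two variants
# of a degree that share a name after cleaning
demo = pd.DataFrame({
    "year": pd.to_datetime(["2013", "2013", "2013", "2014", "2014", "2013"], format="%Y"),
    "university": ["Uni A", "Uni A", "Uni B", "Uni A", "Uni B", "Uni A"],
    "degree": ["Business", "Business", "Business", "Business", "Business", "Law"],
    "employment_rate_overall": [90.0, 94.0, 88.0, 95.0, 91.0, 99.0],
})

summary = yearly_summary(demo, "Business")
print(summary)
```

The result could then be drawn with hvPlot as `summary.hvplot.line(x='year', y='employment_rate_overall', by='university')`, giving one line per university.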

In [None]:
df.hvplot(x='year')