# Service performance data

## Introduction - what is this for

This notebook will ingest and process service-related data into ready-to-use csv files for visualization purposes or further analysis.

The following datasets will be consulted:

**GC Service Inventory and Service Performance**: An inventory of Government of Canada services, their associated service standards and performance<br>
https://open.canada.ca/data/en/dataset/3ac0d080-6149-499a-8b06-7ce5f00ec56c

**Departmental Plans and Departmental Results Reports**: Expenditures and Full Time Equivalents (FTE) by Program and by Organization<br>
https://open.canada.ca/data/en/dataset/a35cf382-690c-4221-a971-cf0fd189a46f/resource/64774bc1-c90a-4ae2-a3ac-d9b50673a895

### Utilities built and shared specifically for this purpose:
https://github.com/gc-performance/utilities

**Department name variant list**: A list of every organization, department, and agency with its associated name variants, used to align each one to a single numeric ID.  

**Program-service id correspondence**: A mapping from the long-form program names in the 2018 service inventory to the program IDs used in the Departmental Plans and Departmental Results Reports.

### Utilities from elsewhere online
**Inventory of federal organisations and interests**: A tidy list of organisation names that includes a single numeric ID. It is the basis for the IDs in the variant list. Built for GC Infobase.<br>
English: https://open.canada.ca/data/en/dataset/a35cf382-690c-4221-a971-cf0fd189a46f/resource/7c131a87-7784-4208-8e5c-043451240d95

French: https://open.canada.ca/data/en/dataset/a35cf382-690c-4221-a971-cf0fd189a46f/resource/45069fe9-abe3-437f-97dd-3f64958bfa85

### Conventions

Whenever a 4-digit year represents a fiscal year, the 4-digit year is the calendar year during which the fiscal year **ended**.
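
As a small illustration of this convention, the sketch below extracts the 4-digit year from a fiscal-year label. It assumes labels of the form `2018-2019`, as they appear in the `fiscal_yr` column later in this notebook. The helper name `fiscal_year_end` is hypothetical and is not part of the notebook.

```python
def fiscal_year_end(fiscal_yr: str) -> int:
    """Return the calendar year in which a fiscal year such as '2018-2019' ended."""
    # The label is '<start year>-<end year>'; the convention keeps the end year
    return int(fiscal_yr.split('-')[1])


print(fiscal_year_end('2018-2019'))  # 2019
```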

## Setting up environment

In [1]:
import pandas as pd
pd.set_option("display.max_rows", 100)

import numpy as np
import re
import pytz
import os
from pathlib import Path

- Define the current date and time for use in output filename timestamps

In [2]:
# Specify date and time in correct timezone
timezone = pytz.timezone('America/Montreal')
current_datetime = pd.Timestamp.now(tz=timezone)
current_datetime_str = current_datetime.strftime("%Y-%m-%d_%H:%M:%S")
print(f'Current datetime: {current_datetime_str}')

Current datetime: 2025-01-17_09:52:49


Define some helper functions
- String cleaner for matching text values on to one another
- Split, uppercase, sort function for multiple choice columns that need to be normalized
- Clean percentage value

In [3]:
# Helper functions
# String cleaner function
def normalize_string(s):
    # Remove all non-alphanumeric characters (special characters and spaces)
    s = re.sub(r'[^A-Za-z0-9]', '', s)
    # Convert to uppercase
    return s.upper()

# Define a transformation function to split column on commas, convert to uppercase, return sorted to string
def split_and_uppercase_to_sorted_string(value):
    return ', '.join(sorted(val.replace(' ','').upper() for val in value.split(',')))

# Clean percentages and normalize from a 0-100 scale to a 0-1 fraction
def clean_percentage(value):
    try:
        # Remove non-numeric characters (e.g., '%') and convert to float
        numeric_value = float(str(value).replace('%', '').strip())
        return numeric_value / 100  # Normalize to 0-1
    except (ValueError, TypeError):
        return None  # Handle invalid entries

In [4]:
# Download utility tables
# Department variant list
org_var = pd.read_csv("https://raw.githubusercontent.com/gc-performance/utilities/master/goc-org-variants.csv").set_index('org_name_variant')

# Program-service id correspondence
serv_prog = pd.read_csv("https://raw.githubusercontent.com/gc-performance/utilities/master/goc-service-program.csv")

# Department name list (Inventory of federal organisations and interests)
ifoi_en = pd.read_csv('https://open.canada.ca/data/dataset/a35cf382-690c-4221-a971-cf0fd189a46f/resource/7c131a87-7784-4208-8e5c-043451240d95/download/ifoi_roif_en.csv')
ifoi_fr = pd.read_csv('https://open.canada.ca/data/dataset/a35cf382-690c-4221-a971-cf0fd189a46f/resource/45069fe9-abe3-437f-97dd-3f64958bfa85/download/ifoi_roif_fr.csv')

In [5]:
# Set up bilingual department name reference table

# Take copies so that adding a column does not trigger a SettingWithCopyWarning
dept_en = ifoi_en.iloc[:,:3].copy()
dept_en['department_en'] = dept_en.iloc[:,2].fillna(dept_en.iloc[:,1])

dept_fr = ifoi_fr.iloc[:,:3].copy()
dept_fr['department_fr'] = dept_fr.iloc[:,2].fillna(dept_fr.iloc[:,1])

dept = pd.merge(
    dept_en,
    dept_fr,
    on='OrgID',
)

dept = dept.loc[:, ['OrgID', 'department_en', 'department_fr']]
dept.rename(columns={'OrgID':'org_id'}, inplace=True)

## Combining historical and live service inventory and standard data

Data collection changed in 2024 to allow departments to publish their own datasets directly to Open Government. This change introduced some minor differences in format and content between the 2018-2023 historical dataset and the current dataset on Open Government. To use the full dataset across all years, the following script merges the historical and current service inventory and service standard datasets.
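
The general approach can be sketched on toy data: rename the historical columns to the current names, then concatenate. The frames, values and `program_id` code below are invented for illustration and do not reflect the real schemas. One thing the sketch shows is that `pd.concat` fills columns missing from one frame with `NaN` on its own.

```python
import pandas as pd

# Toy stand-ins for the historical and current datasets (invented values)
historical = pd.DataFrame({
    'fiscal_yr': ['2022-2023'],
    'service_id': ['1'],
    'e_application': ['Y'],
})
current = pd.DataFrame({
    'fiscal_yr': ['2023-2024'],
    'service_id': ['1'],
    'os_application': ['N'],
    'program_id': ['ABC01'],
})

# Align the historical column name to the current convention
historical = historical.rename(columns={'e_application': 'os_application'})

# Stack the two frames; columns absent from one frame are filled with NaN
combined = pd.concat([historical, current], ignore_index=True)
print(combined)
```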

### Service inventory

In [6]:
# Download service inventory datasets
si_2018 = pd.read_csv(
    "https://open.canada.ca/data/dataset/3ac0d080-6149-499a-8b06-7ce5f00ec56c/resource/3acf79c0-a5f5-4d9a-a30d-fb5ceba4b60a/download/service_inventory_2018-2023.csv", 
    na_values=[],
    keep_default_na=False
)

si_2024 = pd.read_csv(
    "https://open.canada.ca/data/dataset/3ac0d080-6149-499a-8b06-7ce5f00ec56c/resource/c0cf9766-b85b-48c3-b295-34f72305aaf6/download/service.csv", 
    na_values=[],
    keep_default_na=False
)

# Compare columns
si_2018_columns = set(si_2018.columns)
si_2024_columns = set(si_2024.columns)

# print('Columns only in 2018:', si_2018_columns-si_2024_columns)
# print('Columns only in 2024:', si_2024_columns-si_2018_columns)

In [7]:
#Rename columns in 2018 dataset to align to the 2024 dataset's conventions
rename_2018_si_columns = {
    'client_feedback':'client_feedback_channel',
    'e_registration':'os_account_registration',
    'e_authentication':'os_authentication',
    'e_application':'os_application',
    'e_decision':'os_decision',
    'e_issuance':'os_issuance',
    'e_feedback':'os_issue_resolution_feedback',
    'online_comments_en':'os_comments_client_interaction_en',
    'online_comments_fr':'os_comments_client_interaction_fr',
    'last_year_of_service_review':'last_service_review',
    'last_year_of_service_improvement_based_on_client_feedback':'last_service_improvement',
    'use_of_CRA_number':'cra_bn_identifier_usage',
    'use_of_SIN_number':'sin_usage',
    'calls_received':'num_phone_enquiries',
    'telephone_applications':'num_applications_by_phone',
    'web_visits':'num_website_visits',
    'online_applications':'num_applications_online',
    'in_person_applications':'num_applications_in_person',
    'postal_mail_applications':'num_applications_by_mail',
    'email_applications':'num_applications_by_email',
    'fax_applications':'num_applications_by_fax',
    'other_applications':'num_applications_by_other',
    'total_applications':'num_applications_total',
    'service_url_en':'service_uri_en',
    'service_url_fr':'service_uri_fr'
}

si_2018.rename(columns=rename_2018_si_columns, inplace=True)

si_2018_columns = set(si_2018.columns)
si_2024_columns = set(si_2024.columns)

# print('Columns only in 2018:', si_2018_columns-si_2024_columns)
# print('Columns only in 2024:', si_2024_columns-si_2018_columns)


In [8]:
# Add org_id to both datasets
si_2018_tidy = pd.merge(si_2018, org_var, left_on='department_name_en', right_on='org_name_variant')
si_2024_tidy = pd.merge(si_2024, org_var, left_on='owner_org', right_on='org_name_variant')

# Drop specific org name fields from both
si_2018_tidy = si_2018_tidy.drop(columns=['department_name_en', 'department_name_fr'])
si_2024_tidy = si_2024_tidy.drop(columns=['owner_org', 'owner_org_title'])

# Merge in en/fr department names from dept table
si_2018_tidy = pd.merge(
    si_2018_tidy,
    dept,
    on='org_id',
    how='left'
)

si_2024_tidy = pd.merge(
    si_2024_tidy,
    dept,
    on='org_id',
    how='left'
)

# Add program_id to 2018 dataset
# Collapse all the program id's in the serv_prog table for unique combinations of fiscal_yr and service_id
collapsed_serv_prog = (
    serv_prog.groupby(['fiscal_yr', 'service_id'], as_index=False)
    .agg({'program_id': lambda x: ','.join(sorted(x))})
)

# Merge the collapsed program id table into the 2018 dataset
si_2018_tidy = pd.merge(si_2018_tidy, collapsed_serv_prog, on=['fiscal_yr', 'service_id'], how='left')

# Add missing columns to both datasets
# Determine the set of columns for both datasets
si_2018_columns = set(si_2018_tidy.columns)
si_2024_columns = set(si_2024_tidy.columns)

# Loop through the columns that do not appear in the other dataset, create the relevant field
# Columns in 2018, but not in 2024
for col in si_2018_columns-si_2024_columns: 
    si_2024_tidy[col] = None

# Columns in 2024, but not in 2018
for col in si_2024_columns-si_2018_columns: 
    si_2018_tidy[col] = None


# Append / concatenate the datasets to one another
si = pd.concat([si_2018_tidy, si_2024_tidy], ignore_index=True)


# Normalize values across multiple-choice columns
# and the associated mapping of values from one dataset to the other (2018 : 2024)
replace_map_si = {
    'SOCIETAL':'SOCIETY', # Service recipient type    
    'PERSONS':'PERSON', # Client target groups
}

# Service type: Multiple values, split on comma, uppercase, sort
si['service_type'] = si['service_type'].apply(split_and_uppercase_to_sorted_string)

# Service recipient type: Single value, uppercase, replace values
si['service_recipient_type'] = si['service_recipient_type'].str.upper()
si['service_recipient_type'] = si['service_recipient_type'].replace(replace_map_si, regex=True)

# Service scope: Multiple values, split on comma, uppercase, sort, replace values
si['service_scope'] = si['service_scope'].apply(split_and_uppercase_to_sorted_string)
si['service_scope'] = si['service_scope'].replace(replace_map_si)

# Client target groups: Multiple values, split on comma, uppercase, sort, replace values
si['client_target_groups'] = si['client_target_groups'].apply(split_and_uppercase_to_sorted_string)
si['client_target_groups'] = si['client_target_groups'].replace(replace_map_si, regex=True)

# Client feedback channel: Multiple values, split on comma, uppercase, sort
si['client_feedback_channel'] = si['client_feedback_channel'].apply(split_and_uppercase_to_sorted_string)

# Service fee: Single value, uppercase
si['service_fee'] = si['service_fee'].str.upper()

# Last service review, improvement: Single values, replacing "NA", "N" and missing values with blanks 
si['last_service_review'] = si['last_service_review'].replace({np.nan: None, 'NA': None, 'N': None})
si['last_service_improvement'] = si['last_service_improvement'].replace({np.nan: None, 'NA': None, 'N': None})

si.info()



<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8911 entries, 0 to 8910
Data columns (total 51 columns):
 #   Column                                               Non-Null Count  Dtype 
---  ------                                               --------------  ----- 
 0   fiscal_yr                                            8911 non-null   object
 1   service_id                                           8911 non-null   object
 2   service_name_en                                      8911 non-null   object
 3   service_name_fr                                      8911 non-null   object
 4   service_description_en                               8911 non-null   object
 5   service_description_fr                               8911 non-null   object
 6   service_type                                         8911 non-null   object
 7   service_recipient_type                               8911 non-null   object
 8   service_scope                                        8911 non-null   object
 9

### Service standards

In [9]:
# Download service standard datasets
ss_2018 = pd.read_csv(
    "https://open.canada.ca/data/dataset/3ac0d080-6149-499a-8b06-7ce5f00ec56c/resource/272143a7-533e-42a1-b72d-622116474a21/download/service_standards_2018-2023.csv", 
    na_values=[],
    keep_default_na=False
)

ss_2024 = pd.read_csv(
    "https://open.canada.ca/data/dataset/3ac0d080-6149-499a-8b06-7ce5f00ec56c/resource/8736cd7e-9bf9-4a45-9eee-a6cb3c43c07e/download/service-std.csv", 
    na_values=[],
    keep_default_na=False
)

# Compare columns
ss_2018_columns = set(ss_2018.columns)
ss_2024_columns = set(ss_2024.columns)

# print('Columns only in 2018:', ss_2018_columns-ss_2024_columns)
# print('Columns only in 2024:', ss_2024_columns-ss_2018_columns)

In [10]:
# Rename columns in 2018 dataset to align to the 2024 dataset's conventions
rename_2018_ss_columns = {
    'service_std_id':'service_standard_id',
    'service_std_en':'service_standard_en',
    'service_std_fr':'service_standard_fr',
    'service_std_type':'type',    
    'standard_channel_comment_en':'channel_comments_en',
    'standard_channel_comment_fr':'channel_comments_fr',
    'service_std_target':'target',
    'standard_comment_en':'comments_en',
    'standard_comment_fr':'comments_fr',
    'service_std_url_en':'standards_targets_uri_en',
    'service_std_url_fr':'standards_targets_uri_fr',
    'realtime_result_url_en':'performance_results_uri_en',
    'realtime_result_url_fr':'performance_results_uri_fr'
}

ss_2018.rename(columns=rename_2018_ss_columns, inplace=True)

# Compare columns
ss_2018_columns = set(ss_2018.columns)
ss_2024_columns = set(ss_2024.columns)

print('Columns only in 2018:', ss_2018_columns-ss_2024_columns)
print('Columns only in 2024:', ss_2024_columns-ss_2018_columns)

Columns only in 2018: {'gcss_tool_fiscal_yr', 'department_name_fr', 'department_name_en', 'target_type'}
Columns only in 2024: {'owner_org', 'owner_org_title'}


In [11]:
# Add org_id to both datasets
ss_2018_tidy = pd.merge(ss_2018, org_var, left_on='department_name_en', right_on='org_name_variant')
ss_2024_tidy = pd.merge(ss_2024, org_var, left_on='owner_org', right_on='org_name_variant')

# Drop specific org name fields from both
ss_2018_tidy = ss_2018_tidy.drop(columns=['department_name_en', 'department_name_fr'])
ss_2024_tidy = ss_2024_tidy.drop(columns=['owner_org', 'owner_org_title'])

# Merge in en/fr department names from dept table
ss_2018_tidy = pd.merge(
    ss_2018_tidy,
    dept,
    on='org_id',
    how='left'
)

ss_2024_tidy = pd.merge(
    ss_2024_tidy,
    dept,
    on='org_id',
    how='left'
)

# Add missing columns to both datasets
# Determine the set of columns for both datasets
ss_2018_columns = set(ss_2018_tidy.columns)
ss_2024_columns = set(ss_2024_tidy.columns)

# Loop through the columns that do not appear in the other dataset, create the relevant field
# Columns in 2018, but not in 2024
for col in ss_2018_columns-ss_2024_columns: 
    ss_2024_tidy[col] = None

# Columns in 2024, but not in 2018
for col in ss_2024_columns-ss_2018_columns: 
    ss_2018_tidy[col] = None

# Service standard target & performance: the 2018 dataset stores these as percentages (0-100), the 2024 dataset as decimal fractions (0-1)
# Convert 2018 to decimal fractions using the percentage cleaner function defined above
ss_2018_tidy['target'] = ss_2018_tidy['target'].apply(clean_percentage)
ss_2018_tidy['performance'] = ss_2018_tidy['performance'].apply(clean_percentage)

# Append / concatenate the datasets to one another
ss = pd.concat([ss_2018_tidy, ss_2024_tidy], ignore_index=True)

# Normalize values across multiple-choice columns
# and the associated mapping of values from one dataset to the other (2018 : 2024)
replace_map_ss = {
    'Timeliness':'TML', # Service standard type    
    'Accuracy':'ACY', # Service standard type
    'Access':'ACS', # Service standard type
    'Other':'OTH', # Service standard type
}


# Service standard type: single value, replace values
ss['type'] = ss['type'].replace(replace_map_ss)

# Service standard channel: single value, uppercase
ss['channel'] = ss['channel'].str.upper()

ss.info()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12729 entries, 0 to 12728
Data columns (total 27 columns):
 #   Column                      Non-Null Count  Dtype 
---  ------                      --------------  ----- 
 0   fiscal_yr                   12729 non-null  object
 1   service_id                  12729 non-null  object
 2   service_name_en             12729 non-null  object
 3   service_name_fr             12729 non-null  object
 4   service_standard_id         12729 non-null  object
 5   service_standard_en         12729 non-null  object
 6   service_standard_fr         12729 non-null  object
 7   type                        12729 non-null  object
 8   gcss_tool_fiscal_yr         10425 non-null  object
 9   channel                     12729 non-null  object
 10  channel_comments_en         12729 non-null  object
 11  channel_comments_fr         12729 non-null  object
 12  target_type                 10425 non-null  object
 13  target                      12565 non-null  ob

## Download additional datasets for use with service inventory

In [12]:
# Download additional datasets
# Departmental Plans and Departmental Results Reports (Main estimates part III)
rbpo = pd.read_csv("https://open.canada.ca/data/dataset/a35cf382-690c-4221-a971-cf0fd189a46f/resource/64774bc1-c90a-4ae2-a3ac-d9b50673a895/download/rbpo_rppo_en.csv")

# Public Accounts: Operating costs by core responsibility
op_cost = pd.read_csv("https://donnees-data.tpsgc-pwgsc.gc.ca/ba1/respessentielles-coreresp/respessentielles-coreresp.csv")


## Tools and tables set up for later use

In [13]:
# Service ID list
service_id_list = si.loc[si.groupby('service_id')['fiscal_yr'].idxmax(), ['service_id', 'service_name_en', 'service_name_fr', 'fiscal_yr', 'department_en', 'department_fr']]

In [14]:
# Correspondence table between core responsibilities and program id
core_resp_program = rbpo.loc[:, ['organization_id', 'core_responsibility', 'program_id', 'program_name', 'fy_ef']]

In [15]:
# align fy format to service inventory, tidy up some tables
fy_cleanup = {'FY ': '', '-': '-20', '/':'-'}

rbpo['fy_ef'] = rbpo['fy_ef'].replace(fy_cleanup, regex=True)
rbpo.rename(columns={
    'fy_ef': 'fiscal_yr',
    'core_responsibility': 'core_responsibility_en'}, inplace=True)

core_resp_program['fy_ef'] = core_resp_program['fy_ef'].replace(fy_cleanup, regex=True)
core_resp_program = core_resp_program.rename(columns={'fy_ef': 'fiscal_yr'})

op_cost['FSCL_YR'] = op_cost['FSCL_YR'].replace(fy_cleanup, regex=True)

In [16]:
# get org id into op_cost table
op_cost = op_cost.set_index('DEPT_EN_DESC').join(org_var).reset_index()

op_cost.rename(columns={
    'FSCL_YR': 'fiscal_yr',
    'DEPT_EN_DESC': 'department_name_en',
    'CR_EN_NM': 'core_responsibility_en',
    'OP_ATHRTY_CY_AMT': 'operating_costs',
    'org_id': 'organization_id'}, inplace=True)

# get rid of extra cols
op_cost = op_cost.loc[:, ['fiscal_yr', 'department_name_en','organization_id', 'core_responsibility_en', 'operating_costs']]

## Tables for specific indicators

### Applications for service
Given a service, what is the volume of interactions (applications) by channel and fiscal year?

In [17]:
# Unpivot (i.e. melt) application volume columns

# list of columns that contain application / interaction volumes
# These also represent the channel through which the interaction took place

app_cols = [
    'num_applications_by_phone', 
    'num_applications_online', 
    'num_applications_in_person', 
    'num_applications_by_mail', 
    'num_applications_by_email', 
    'num_applications_by_fax', 
    'num_applications_by_other'
]

si_vol = pd.melt(si, id_vars=['fiscal_yr', 'service_id'], value_vars=app_cols, var_name='channel', value_name='volume')

# remove the "num_applications_" and "by_" prefixes from the channel column to get a clean channel name
si_vol['channel'] = si_vol['channel'].str.replace('num_applications_', '', regex=True).str.replace('by_', '', regex=True)

# remove missing values and the 'ND' / 'NA' markers in volume
si_vol = si_vol.dropna(subset=['volume'])
si_vol = si_vol[si_vol['volume'] != 'ND']
si_vol = si_vol[si_vol['volume'] != 'NA']

# only take entries where the volume is > 0
si_vol['volume'] = pd.to_numeric(si_vol['volume'])
si_vol = si_vol[si_vol['volume'] > 0]
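
The unpivot step above can be tried on a toy table. The sketch below uses invented values and only two channel columns. As an alternative to filtering the `'ND'` and `'NA'` markers one by one, it relies on `pd.to_numeric(..., errors='coerce')` to turn any non-numeric marker into `NaN`; that is an assumption about what is acceptable here, not what the cell above does.

```python
import pandas as pd

# Toy wide table: one row per service and fiscal year (values are invented)
wide = pd.DataFrame({
    'fiscal_yr': ['2022-2023', '2023-2024'],
    'service_id': ['S1', 'S1'],
    'num_applications_online': ['120', 'ND'],
    'num_applications_by_mail': ['30', '0'],
})

# Unpivot the channel columns into (channel, volume) rows
long = pd.melt(wide, id_vars=['fiscal_yr', 'service_id'], var_name='channel', value_name='volume')

# Strip the column-name prefixes to leave a clean channel name
long['channel'] = (
    long['channel']
    .str.replace('num_applications_', '', regex=False)
    .str.replace('by_', '', regex=False)
)

# Coerce non-numeric markers such as 'ND' to NaN, then keep positive volumes only
long['volume'] = pd.to_numeric(long['volume'], errors='coerce')
long = long[long['volume'] > 0].reset_index(drop=True)
print(long)
```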

### Online interaction points
Given a service, which online interaction points are activated as of the latest fiscal year those services reported?

In [18]:
# Unpivot (i.e. melt) online interaction point columns

# list of columns that represent online interaction point activation
oip_cols = [
    'os_account_registration', 
    'os_authentication', 
    'os_application', 
    'os_decision', 
    'os_issuance', 
    'os_issue_resolution_feedback', 
]

si_oip = pd.melt(si, id_vars=['fiscal_yr', 'service_id'], value_vars=oip_cols, var_name='online_interaction_point', value_name='activation')

# add a column to indicate the sort position of the online interaction point
si_oip['online_interaction_point_sort'] = si_oip['online_interaction_point'].apply(lambda x: oip_cols.index(x)+1)

# remove "os_" from the online interaction point column to get a clean name
si_oip['online_interaction_point'] = si_oip['online_interaction_point'].str.replace('os_', '')

# dump old years, only take latest year
si_oip = si_oip.loc[si_oip.groupby(['service_id', 'online_interaction_point'])['fiscal_yr'].idxmax()].sort_values(by=['service_id', 'online_interaction_point_sort'])
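
Selecting the latest year per service can also be done without `idxmax`, which may be useful if `fiscal_yr` is stored as text. The sketch below, on invented data, sorts by the fiscal-year label and keeps the last row per service. It assumes labels of the form `YYYY-YYYY`, which sort chronologically as plain strings.

```python
import pandas as pd

# Toy table with several reporting years per service (values are invented)
toy = pd.DataFrame({
    'service_id': ['S1', 'S1', 'S2'],
    'fiscal_yr': ['2022-2023', '2023-2024', '2021-2022'],
    'activation': ['N', 'Y', 'Y'],
})

# Sort chronologically, then keep only the most recent row for each service
latest = (
    toy.sort_values('fiscal_yr')
    .drop_duplicates(subset='service_id', keep='last')
    .sort_values('service_id')
    .reset_index(drop=True)
)
print(latest)
```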

### Timeliness service standard performance

Given a service, what is the volume of interactions that met the target vs not, by fiscal year?

In [19]:
# Filter the DataFrame for rows where (service standard type) 'type' is 'TML' (Timeliness), convert the volume
# columns to numeric, then group by 'fiscal_yr' and 'service_id' and sum the volumes.

ss_tml = ss.loc[ss['type'] == 'TML', ['fiscal_yr', 'service_id', 'volume_meeting_target', 'total_volume']].copy()

# Convert before summing: with keep_default_na=False these columns may be read as text, and summing text concatenates it
ss_tml['total_volume'] = pd.to_numeric(ss_tml['total_volume'], errors='coerce').fillna(0)
ss_tml['volume_meeting_target'] = pd.to_numeric(ss_tml['volume_meeting_target'], errors='coerce').fillna(0)

ss_tml_perf_vol = ss_tml.groupby(['fiscal_yr', 'service_id'])[['volume_meeting_target', 'total_volume']].sum().reset_index()


ss_tml_perf_vol['volume_not_meeting_target'] = ss_tml_perf_vol['total_volume']-ss_tml_perf_vol['volume_meeting_target']

### Total number of services

Given a fiscal year, how many services were reported?

In [20]:
si_fy_service_count = si.groupby(['fiscal_yr'])['service_id'].count().reset_index()

### Total number of service interactions
Given a fiscal year, how many service interactions were reported?

In [21]:
si_fy_interaction_sum = si_vol.groupby(['fiscal_yr'])['volume'].sum().reset_index()

## MAF score calculation for Client-centric service design and delivery
Determining the results of MAF scores

References to methodology can be found here
https://www.canada.ca/en/treasury-board-secretariat/services/management-accountability-framework/maf-methodologies/2022-2023-im-it.html#toc-1

In [23]:
# setting up the score bins and corresponding results for use with pd.cut
score_bins = [0, 50, 80, 101]
score_results = ['low', 'medium', 'high']
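
To make the bin edges concrete, the sketch below runs `pd.cut` on a few boundary scores with the same bins and labels, using `right=False` as in the cells that follow. The sample scores are invented.

```python
import pandas as pd

score_bins = [0, 50, 80, 101]
score_results = ['low', 'medium', 'high']

# right=False makes each bin closed on the left: [0, 50), [50, 80), [80, 101)
scores = pd.Series([0, 49.9, 50, 79.9, 80, 100])
results = pd.cut(scores, bins=score_bins, labels=score_results, right=False)
print(results.tolist())
```

The upper edge of 101 is what lets a score of exactly 100 fall into the `high` bin when the right edge is open.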

#### Question 1: Existence of service standards
As service standards are required under the Policy on Service and Digital, what is the percentage of services that have service standards?

In [36]:
maf1 = si.loc[:, ['fiscal_yr', 'service_id', 'department_en', 'org_id']]
maf1['service_std_tf'] = si[['fiscal_yr', 'service_id']].isin(ss[['fiscal_yr', 'service_id']].to_dict(orient='list')).all(axis=1)

# Denominator: number of services reported per department and fiscal year
maf1_denom = maf1.groupby(['fiscal_yr', 'department_en', 'org_id'])['service_id'].count().reset_index()
# Numerator: number of those services that have a service standard
maf1_num = maf1.groupby(['fiscal_yr', 'department_en','org_id'])['service_std_tf'].sum().reset_index()

maf1 = pd.merge(
    maf1_denom,
    maf1_num,
    on=['fiscal_yr', 'department_en', 'org_id'],
    how='left'
).rename(columns={'service_id':'service_count', 'service_std_tf':'service_with_std_count'})

maf1['maf1_score'] = (maf1['service_with_std_count']/maf1['service_count'])*100
maf1['maf1_result'] = pd.cut(maf1['maf1_score'], bins=score_bins, labels=score_results, right=False)

maf1

Unnamed: 0,fiscal_yr,department_en,org_id,service_count,service_with_std_count,maf1_score,maf1_result
0,2018-2019,Administrative Tribunals Support Service of Ca...,539,4,3,75.000000,medium
1,2018-2019,Agriculture and Agri-Food Canada,1,30,30,100.000000,high
2,2018-2019,Atlantic Canada Opportunities Agency,12,5,3,60.000000,medium
3,2018-2019,Canada Border Services Agency,26,42,27,64.285714,medium
4,2018-2019,Canada Economic Development for Quebec Regions,141,3,3,100.000000,high
...,...,...,...,...,...,...,...
451,2023-2024,Treasury Board of Canada Secretariat,326,33,29,87.878788,high
452,2023-2024,Veterans Affairs Canada,139,30,20,66.666667,medium
453,2023-2024,Veterans Review and Appeal Board,333,1,1,100.000000,high
454,2023-2024,Women and Gender Equality Canada,246,3,3,100.000000,high
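
One caveat with `DataFrame.isin` and a dict: each column is tested independently, so a row can be flagged `True` when its fiscal year and its service ID each appear somewhere in `ss` but never together. A pairwise check, sketched below on invented data, compares (`fiscal_yr`, `service_id`) combinations instead. This is a possible alternative, not what the cell above runs, and the names `si_toy` and `ss_toy` are hypothetical.

```python
import pandas as pd

# Toy inventory and standards tables (invented values)
si_toy = pd.DataFrame({'fiscal_yr': ['2022-2023', '2023-2024'], 'service_id': ['S1', 'S1']})
ss_toy = pd.DataFrame({'fiscal_yr': ['2022-2023', '2023-2024'], 'service_id': ['S1', 'S2']})

keys = ['fiscal_yr', 'service_id']

# Column-wise test: each column is checked on its own
columnwise = si_toy[keys].isin(ss_toy[keys].to_dict(orient='list')).all(axis=1)

# Pairwise test: the (fiscal_yr, service_id) combination must exist in ss_toy
pairwise = pd.MultiIndex.from_frame(si_toy[keys]).isin(pd.MultiIndex.from_frame(ss_toy[keys]))

print(columnwise.tolist())  # [True, True]
print(pairwise.tolist())    # [True, False]
```

In the toy data, service `S1` has no standard in 2023-2024, yet the column-wise test still reports `True` because `S1` and `2023-2024` each occur in `ss_toy`.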


#### Question 2: Service standard targets
What is the percentage of service standards that met their target?

In [37]:
maf2 = ss.loc[:, ['fiscal_yr', 'service_standard_id', 'department_en', 'org_id', 'target_met']].dropna()

maf2_num = maf2[maf2['target_met']=='Y'].groupby(['fiscal_yr', 'department_en', 'org_id'])['service_standard_id'].count().reset_index()
maf2_denom = maf2.groupby(['fiscal_yr', 'department_en', 'org_id'])['service_standard_id'].count().reset_index()

maf2 = pd.merge(
    maf2_num,
    maf2_denom,
    suffixes=['_met','_total'],
    on=['fiscal_yr', 'department_en', 'org_id'],
    how='left'
)

maf2['maf2_score'] = (maf2['service_standard_id_met']/maf2['service_standard_id_total'])*100
maf2['maf2_result'] = pd.cut(maf2['maf2_score'], bins=score_bins, labels=score_results, right=False)

maf2

Unnamed: 0,fiscal_yr,department_en,org_id,service_standard_id_met,service_standard_id_total,maf2_score,maf2_result
0,2018-2019,Administrative Tribunals Support Service of Ca...,539,1,2,50.000000,medium
1,2018-2019,Agriculture and Agri-Food Canada,1,88,123,71.544715,medium
2,2018-2019,Atlantic Canada Opportunities Agency,12,4,7,57.142857,medium
3,2018-2019,Canada Border Services Agency,26,26,43,60.465116,medium
4,2018-2019,Canada Economic Development for Quebec Regions,141,1,7,14.285714,low
...,...,...,...,...,...,...,...
330,2023-2024,Statistics Canada,313,34,37,91.891892,high
331,2023-2024,Transport Canada,138,108,282,38.297872,low
332,2023-2024,Treasury Board of Canada Secretariat,326,29,42,69.047619,medium
333,2023-2024,Veterans Affairs Canada,139,18,24,75.000000,medium


#### Question 3: Real-time performance for service standards

As real-time performance reporting is required under the Directive on Service and Digital, what is the extent to which real-time performance reporting for services is published?

Not calculated: the real-time URL data is unreliable.

#### Question 4: Service standards reviews

What is the percentage of service standards which have been reviewed?

Not calculated: the GCSS review field is no longer collected as of the 2023-24 dataset.

#### Question 5: Online end-to-end
As online end-to-end availability of services is required under the Policy on Service and Digital, what is the percentage of applicable services that can be completed online end-to-end?

In [38]:
oip_cols = [
    'os_account_registration', 
    'os_authentication', 
    'os_application', 
    'os_decision', 
    'os_issuance', 
    'os_issue_resolution_feedback', 
]

# Melt the DataFrame
maf5 = pd.melt(si, id_vars=['fiscal_yr', 'service_id', 'department_en', 'org_id'], value_vars=oip_cols, var_name='online_interaction_point', value_name='activation')

# Create boolean columns for activation states
maf5['activation_y'] = (maf5['activation'] == 'Y')
maf5['activation_n'] = (maf5['activation'] == 'N')
maf5['activation_na'] = (maf5['activation'] == 'NA')

# Group by and sum the activation columns
maf5 = maf5.groupby(['fiscal_yr', 'department_en', 'org_id', 'service_id'])[['activation_y', 'activation_n', 'activation_na']].sum().reset_index()

# Exclude services where all interaction points are 'NA' (not applicable)
maf5 = maf5[maf5['activation_na'] < len(oip_cols)].copy()

# A service is online end-to-end when none of its interaction points is 'N'
maf5['online_e2e'] = maf5['activation_n'] == 0

# Determine department-level counts for online e2e services and all services
maf5 = maf5.groupby(['fiscal_yr', 'department_en', 'org_id']).agg(
    online_e2e_count=('online_e2e', 'sum'), # summing a boolean column counts its True values
    service_count=('service_id', 'nunique')
).reset_index()

# Determine score and associated result
maf5['maf5_score'] = (maf5['online_e2e_count']/maf5['service_count'])*100
maf5['maf5_result'] = pd.cut(maf5['maf5_score'], bins=score_bins, labels=score_results, right=False)

maf5



Unnamed: 0,fiscal_yr,department_en,org_id,online_e2e_count,service_count,maf5_score,maf5_result
0,2018-2019,Administrative Tribunals Support Service of Ca...,539,0,4,0.000000,low
1,2018-2019,Agriculture and Agri-Food Canada,1,9,30,30.000000,low
2,2018-2019,Atlantic Canada Opportunities Agency,12,1,5,20.000000,low
3,2018-2019,Canada Border Services Agency,26,2,42,4.761905,low
4,2018-2019,Canada Economic Development for Quebec Regions,141,0,3,0.000000,low
...,...,...,...,...,...,...,...
451,2023-2024,Treasury Board of Canada Secretariat,326,21,33,63.636364,medium
452,2023-2024,Veterans Affairs Canada,139,16,30,53.333333,medium
453,2023-2024,Veterans Review and Appeal Board,333,0,1,0.000000,low
454,2023-2024,Women and Gender Equality Canada,246,3,3,100.000000,high
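
The `('online_e2e', 'sum')` entry in the cell above uses pandas named aggregation: the keyword is the output column name and the tuple is (input column, aggregation function). Summing a boolean column counts its `True` values, since `True` is treated as 1 and `False` as 0. A minimal sketch on invented data:

```python
import pandas as pd

# Toy service table with an end-to-end flag per service (values are invented)
toy = pd.DataFrame({
    'org_id': [1, 1, 1, 2],
    'service_id': ['S1', 'S2', 'S3', 'S4'],
    'online_e2e': [True, False, True, False],
})

# Named aggregation: output_column=(input_column, aggregation function)
summary = toy.groupby('org_id').agg(
    online_e2e_count=('online_e2e', 'sum'),   # True counts as 1, False as 0
    service_count=('service_id', 'nunique'),  # number of distinct services
).reset_index()
print(summary)
```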


#### Question 6: Online client interaction points
As online end-to-end availability of services is required under the Policy on Service and Digital, what is the percentage of client interaction points that are available online for services?

In [39]:
oip_cols = [
    'os_account_registration', 
    'os_authentication', 
    'os_application', 
    'os_decision', 
    'os_issuance', 
    'os_issue_resolution_feedback', 
]

# Melt the DataFrame
maf6 = pd.melt(si, id_vars=['fiscal_yr', 'service_id', 'department_en', 'org_id'], value_vars=oip_cols, var_name='online_interaction_point', value_name='activation').dropna()

maf6['activation'] = (maf6['activation'] == 'Y')

maf6 = maf6.groupby(['fiscal_yr', 'department_en', 'org_id']).agg(
    activated_point_count=('activation', 'sum'), # summing a boolean column counts its True values
    point_count=('service_id', 'count')
).reset_index()

# Determine score and associated result
maf6['maf6_score'] = (maf6['activated_point_count']/maf6['point_count'])*100
maf6['maf6_result'] = pd.cut(maf6['maf6_score'], bins=score_bins, labels=score_results, right=False)


maf6

Unnamed: 0,fiscal_yr,department_en,org_id,activated_point_count,point_count,maf6_score,maf6_result
0,2018-2019,Administrative Tribunals Support Service of Ca...,539,2,24,8.333333,low
1,2018-2019,Agriculture and Agri-Food Canada,1,107,180,59.444444,medium
2,2018-2019,Atlantic Canada Opportunities Agency,12,11,30,36.666667,low
3,2018-2019,Canada Border Services Agency,26,32,252,12.698413,low
4,2018-2019,Canada Economic Development for Quebec Regions,141,4,18,22.222222,low
...,...,...,...,...,...,...,...
451,2023-2024,Treasury Board of Canada Secretariat,326,122,198,61.616162,medium
452,2023-2024,Veterans Affairs Canada,139,123,180,68.333333,medium
453,2023-2024,Veterans Review and Appeal Board,333,5,6,83.333333,high
454,2023-2024,Women and Gender Equality Canada,246,18,18,100.000000,high


#### Question 7: ICT Accessibility
As accessibility is required under the Policy on Service and Digital, what is the percentage of services available online that have been assessed for ICT accessibility?

Not calculated: the accessibility data from the service inventory is unreliable and is no longer collected.

#### Question 8: Client feedback
As ensuring client feedback is used to inform continuous improvement of services is a requirement under the Directive on Service and Digital, what is the percentage of services which have used client feedback to improve services in the last year?

In [40]:
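# Work on an explicit copy so the column assignments below do not touch (or warn about) the original si
maf8 = si.loc[:,['fiscal_yr', 'service_id', 'org_id', 'department_en', 'last_service_review', 'last_service_improvement']].copy()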

maf8['report_yr'] = pd.to_numeric(maf8['fiscal_yr'].str.split('-').str[1], errors='coerce').astype(int)
maf8['last_service_improvement_yr'] = pd.to_numeric(maf8['last_service_improvement'].str.split('-').str[1], errors='coerce')

maf8['yrs_since_last_service_improvement'] = maf8['report_yr']-maf8['last_service_improvement_yr']
maf8['last_service_improvement_within_1_yr'] = maf8['yrs_since_last_service_improvement'] <= 1

maf8 = maf8.groupby(['fiscal_yr', 'department_en', 'org_id']).agg(
    improved_services_count=('last_service_improvement_within_1_yr', 'sum'),
    service_count=('service_id', 'nunique')
).reset_index()

# Determine score and associated result
maf8['maf8_score'] = (maf8['improved_services_count']/maf8['service_count'])*100
maf8['maf8_result'] = pd.cut(maf8['maf8_score'], bins=score_bins, labels=score_results, right=False)

maf8

Unnamed: 0,fiscal_yr,department_en,org_id,improved_services_count,service_count,maf8_score,maf8_result
0,2018-2019,Administrative Tribunals Support Service of Ca...,539,0,4,0.00000,low
1,2018-2019,Agriculture and Agri-Food Canada,1,0,30,0.00000,low
2,2018-2019,Atlantic Canada Opportunities Agency,12,0,5,0.00000,low
3,2018-2019,Canada Border Services Agency,26,0,42,0.00000,low
4,2018-2019,Canada Economic Development for Quebec Regions,141,0,3,0.00000,low
...,...,...,...,...,...,...,...
451,2023-2024,Treasury Board of Canada Secretariat,326,23,33,69.69697,medium
452,2023-2024,Veterans Affairs Canada,139,15,30,50.00000,medium
453,2023-2024,Veterans Review and Appeal Board,333,0,1,0.00000,low
454,2023-2024,Women and Gender Equality Canada,246,3,3,100.00000,high
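
The result column comes from `pd.cut` with `right=False`. The sketch below shows how that option treats boundary scores. The thresholds and labels here are hypothetical stand-ins chosen for illustration: the real `score_bins` and `score_results` are defined in an earlier cell and may differ.

```python
import pandas as pd

# Hypothetical thresholds for illustration only; the notebook defines the real
# score_bins and score_results elsewhere.
example_bins = [0, 50, 80, 101]
example_labels = ["low", "medium", "high"]

scores = pd.Series([0.0, 49.9, 50.0, 79.9, 80.0, 100.0])

# right=False makes each interval closed on the left and open on the right:
# [0, 50), [50, 80), [80, 101). The last edge must exceed 100 so that a
# perfect score still falls inside a bin instead of becoming NaN.
results = pd.cut(scores, bins=example_bins, labels=example_labels, right=False)

print(pd.DataFrame({"score": scores, "result": results}))
```

With left-closed bins, a score sitting exactly on a threshold is assigned to the higher category.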


### Service reviews
What fraction of services have met the requirement to be reviewed in the past 5 years?

In [22]:
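# Work on an explicit copy so the column assignments below do not touch (or warn about) the original si
si_reviews = si.loc[:,['fiscal_yr', 'service_id', 'org_id', 'last_service_review', 'last_service_improvement']].copy()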

si_reviews['report_yr'] = pd.to_numeric(si_reviews['fiscal_yr'].str.split('-').str[1], errors='coerce').astype(int)

si_reviews['last_service_review_yr'] = pd.to_numeric(si_reviews['last_service_review'].str.split('-').str[1], errors='coerce')
si_reviews['yrs_since_last_service_review'] = si_reviews['report_yr']-si_reviews['last_service_review_yr']
si_reviews['last_service_review_within_5_yrs'] = si_reviews['yrs_since_last_service_review'] <= 5

si_reviews['last_service_improvement_yr'] = pd.to_numeric(si_reviews['last_service_improvement'].str.split('-').str[1], errors='coerce')
si_reviews['yrs_since_last_service_improvement'] = si_reviews['report_yr']-si_reviews['last_service_improvement_yr']
si_reviews['last_service_improvement_within_5_yrs'] = si_reviews['yrs_since_last_service_improvement'] <= 5

si_reviews = si_reviews.groupby(['fiscal_yr']).agg(
    total_services = ('service_id', 'count'),
    services_reviewed_in_past_5_yrs = ('last_service_review_within_5_yrs', 'sum'),
    services_improved_in_past_5_yrs = ('last_service_improvement_within_5_yrs', 'sum')
    ).reset_index()
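
The cell above produces counts per fiscal year. A minimal sketch of turning those counts into the fraction asked about is shown below, using toy data with made-up values. The column name `fraction_reviewed_in_past_5_yrs` is an assumption for illustration and is not part of the notebook's output. The sketch also shows that a service with no recorded review date is counted as not reviewed, because a comparison against `NaN` evaluates to `False`.

```python
import pandas as pd

# Toy data (hypothetical values): four services reported in fiscal year 2022-2023
toy = pd.DataFrame({
    "fiscal_yr": ["2022-2023"] * 4,
    "service_id": ["1", "2", "3", "4"],
    "last_service_review": ["2021-2022", "2015-2016", None, "2017-2018"],
})

# Four-digit year in which the fiscal year ended, e.g. '2022-2023' -> 2023
toy["report_yr"] = toy["fiscal_yr"].str.split("-").str[1].astype(int)
toy["last_service_review_yr"] = pd.to_numeric(
    toy["last_service_review"].str.split("-").str[1], errors="coerce"
)

# A missing review year gives NaN, and NaN <= 5 is False,
# so services with no recorded review count as not reviewed.
toy["yrs_since_review"] = toy["report_yr"] - toy["last_service_review_yr"]
toy["reviewed_within_5_yrs"] = toy["yrs_since_review"] <= 5

by_year = toy.groupby("fiscal_yr").agg(
    total_services=("service_id", "count"),
    services_reviewed_in_past_5_yrs=("reviewed_within_5_yrs", "sum"),
).reset_index()

# Hypothetical extra column: share of services reviewed in the past 5 years
by_year["fraction_reviewed_in_past_5_yrs"] = (
    by_year["services_reviewed_in_past_5_yrs"] / by_year["total_services"]
)
print(by_year)
```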

## Combining other datasets with service inventory and service standards

### Spending and FTEs for programs responsible for service delivery

Given a service, what is the number of actual and planned FTEs by fiscal year for the program responsible for service delivery? What is the actual and planned spending?

In [28]:
# Reformat program data table to be easier to work with, filter out irrelevant information

# Define columns related to measures: spending and FTEs (planned and actual)
fte_spend_cols = [
    'planned_spending_1', 'actual_spending', 'planned_spending_2', 'planned_spending_3',
    'planned_ftes_1', 'actual_ftes', 'planned_ftes_2', 'planned_ftes_3'
]

# Melt (unpivot) the DataFrame to long format
rbpo_melted = pd.melt(
    rbpo, 
    id_vars=['fiscal_yr', 'organization_id', 'program_id', 'core_responsibility_en'], 
    value_vars=fte_spend_cols, 
    var_name='plan_actual_yr', 
    value_name='measure'
)

# Split 'plan_actual_yr' into separate columns for planned/actual, spending/FTEs, and year adjustment
rbpo_melted[['planned_actual', 'spending_fte', 'yr_adjust']] = rbpo_melted['plan_actual_yr'].str.split('_', expand=True)
rbpo_melted['yr_adjust'] = rbpo_melted['yr_adjust'].fillna('1').astype(int) - 1

# Calculate 'measure_yr' and 'report_yr' from 'fiscal_yr' and 'yr_adjust'
rbpo_melted['measure_yr'] = rbpo_melted['fiscal_yr'].str.split('-').str[1].astype(int) + rbpo_melted['yr_adjust']
rbpo_melted['report_yr'] = rbpo_melted['fiscal_yr'].str.split('-').str[1].astype(int)

# Get the latest fiscal year from the Service inventory (four digit fy, year of end of fy)
latest_si_fy = si['fiscal_yr'].str.split('-').str[1].astype(int).max()

# Separate actuals and future planned data (beyond the latest service fiscal year)
rbpo_melted_actuals = rbpo_melted[rbpo_melted['planned_actual'] == 'actual']
rbpo_melted_planned = rbpo_melted[
    (rbpo_melted['planned_actual'] == 'planned') & (rbpo_melted['report_yr'] > latest_si_fy)
]

# Sort and drop duplicate planned entries, keeping the latest by 'report_yr'
rbpo_melted_planned = rbpo_melted_planned.sort_values(
    by=['report_yr', 'organization_id', 'program_id', 'spending_fte'], 
    ascending=False
).drop_duplicates(subset=['measure_yr','organization_id', 'program_id', 'spending_fte'])

# Concatenate actuals and planned entries, drop any remaining NaNs
rbpo_melted = pd.concat([rbpo_melted_planned, rbpo_melted_actuals]).dropna()

# Pivot to get a wide format table with spending/FTE columns, aggregating with 'sum'
rbpo_melted = rbpo_melted.pivot_table(
    index=['organization_id', 'core_responsibility_en', 'program_id', 'report_yr', 'measure_yr', 'planned_actual'], 
    columns=['spending_fte'], 
    values='measure', 
    aggfunc='sum'
).sort_values(
    by=['organization_id', 'program_id', 'report_yr','measure_yr']
).reset_index()

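# Set up a fiscal year column for linking to the service inventory, so that years beyond the service inventory can be included when joining.
# If the measure year is later than the latest service inventory fiscal year, use the latest service inventory fiscal year instead.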

rbpo_melted.loc[rbpo_melted['measure_yr']>latest_si_fy, 'si_link_yr'] = latest_si_fy
rbpo_melted.loc[rbpo_melted['measure_yr']<=latest_si_fy, 'si_link_yr'] = rbpo_melted['measure_yr']

rbpo_melted['si_link_yr'] = rbpo_melted['si_link_yr'].astype(int) 
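
To make the column-name parsing in the cell above easier to follow, here is a small sketch that applies the same split to a handful of column names. The report year and latest service inventory year are hypothetical values. The last step uses `Series.clip(upper=...)` as one possible way to cap the link year, which should give the same result as the two `.loc` assignments above.

```python
import pandas as pd

cols = pd.Series([
    "planned_spending_1", "actual_spending", "planned_spending_2",
    "planned_ftes_3", "actual_ftes",
])

# 'planned_spending_2' -> ['planned', 'spending', '2'];
# 'actual_spending' has no numeric suffix, so its third part is missing
parts = cols.str.split("_", expand=True)
parts.columns = ["planned_actual", "spending_fte", "yr_adjust"]

# A missing suffix is treated as '1', then shifted so 0 means the report year itself
parts["yr_adjust"] = parts["yr_adjust"].fillna("1").astype(int) - 1

report_yr = 2023  # hypothetical: fiscal year 2022-2023
parts["measure_yr"] = report_yr + parts["yr_adjust"]

# Cap the linking year at the latest service inventory year (hypothetical value)
latest_si_fy = 2024
parts["si_link_yr"] = parts["measure_yr"].clip(upper=latest_si_fy)

print(parts)
```

Here `planned_ftes_3` refers to 2025, two years after the report year, but links to 2024 because that is the latest year assumed to exist in the service inventory.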

In [29]:
# Set new multi-index for service inventory, drop existing collapsed program id column (temp1)
temp1 = si.set_index(['fiscal_yr','service_id']).drop(columns='program_id')

# Get the program_id into the service inventory
# Set index for service-program correspondence table (temp2)
temp2 = serv_prog.set_index(['fiscal_yr', 'service_id'])

# Join the service inventory (temp1) and the program correspondence table (temp2) 
temp3 = temp1.join(temp2)

# then clean up this expanded service inventory (temp3) by resetting the index and dropping NaNs
temp3 = temp3[temp3['program_id'].notna()].reset_index()

# Generate a 4-digit year in the expanded service inventory (temp3) to link to the program data
temp3['si_link_yr'] = temp3['fiscal_yr'].str.split('-').str[1].astype(int)

# Set a new multi-index for the expanded service inventory (temp3) and rename org_id to align to the program table
temp3 = temp3.rename(columns={'org_id': 'organization_id'}).set_index(['si_link_yr', 'organization_id', 'program_id'])

# Set index for program data (temp4) 
temp4 = rbpo_melted.set_index(['si_link_yr', 'organization_id', 'program_id'])

# then join with expanded service inventory
service_fte_spending = temp3.join(temp4, lsuffix='_si', rsuffix='_program').reset_index()
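
`DataFrame.join` performs a left join by default, so services with no matching program row are kept and their spending and FTE columns are left empty. The sketch below illustrates this on toy frames, along with the effect of `lsuffix` and `rsuffix` on overlapping column names. All values, and the overlapping `report_yr` column on the service side, are hypothetical.

```python
import pandas as pd

keys = ["si_link_yr", "organization_id", "program_id"]

# Toy expanded service inventory (hypothetical values)
services = pd.DataFrame({
    "si_link_yr": [2023, 2023, 2023],
    "organization_id": [1, 1, 2],
    "program_id": ["AAA01", "AAA02", "BBB01"],
    "service_id": ["10", "11", "20"],
    "report_yr": [2023, 2023, 2023],  # overlaps with a program column on purpose
})

# Toy program data: no row for organization 2
programs = pd.DataFrame({
    "si_link_yr": [2023, 2023],
    "organization_id": [1, 1],
    "program_id": ["AAA01", "AAA02"],
    "report_yr": [2023, 2023],
    "ftes": [12.0, 30.5],
})

joined = (
    services.set_index(keys)
    .join(programs.set_index(keys), lsuffix="_si", rsuffix="_program")
    .reset_index()
)

# Left join: services without a matching program row are kept, with NaN measures
unmatched = joined[joined["ftes"].isna()]
print(joined)
print(unmatched[["organization_id", "program_id", "service_id"]])
```

Listing the unmatched rows this way is a quick check on how complete the program-service correspondence is before exporting.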

## Export data to CSV

In [30]:
# Define the DataFrames to export to csv and their corresponding names
csv_exports = {
    "si": si,
    "ss": ss,
    "si_vol": si_vol,
    "si_oip": si_oip,
    "ss_tml_perf_vol": ss_tml_perf_vol,
    "si_fy_interaction_sum": si_fy_interaction_sum,
    "si_fy_service_count": si_fy_service_count,
    "si_reviews": si_reviews,
    "service_fte_spending": service_fte_spending,
    "service_id_list": service_id_list
}

# Loop through the dictionary
for name, df in csv_exports.items():
    # Generate the filename using the key (string name)
    fn = f"{name}.csv"
    
    # Export the DataFrame to CSV
    df.to_csv(fn, index=False, sep=';')
    
    # Append the timestamp at the end of the file
    with open(fn, 'a') as timestamped_file:
        timestamped_file.write(f"\nTimestamp:{current_datetime_str}\n")
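
Because a timestamp line is appended after the data, the exported files are not plain CSV and `pd.read_csv` needs a little help to read them back. One possible approach, sketched below, splits off the trailing `Timestamp:` line before parsing. The file name, the data and the timestamp format are hypothetical, and the sketch assumes the text `Timestamp:` does not appear at the start of a line inside the data.

```python
import io
import tempfile
from pathlib import Path

import pandas as pd

df = pd.DataFrame({"service_id": ["1", "2"], "score": [50.0, 100.0]})

with tempfile.TemporaryDirectory() as tmp:
    fn = Path(tmp) / "example.csv"

    # Write the file the same way as the export loop above
    df.to_csv(fn, index=False, sep=";")
    with open(fn, "a") as f:
        f.write("\nTimestamp:2024-01-01_00:00:00\n")  # hypothetical timestamp format

    text = fn.read_text()

# Separate the data from the trailing timestamp line
body, _, stamp = text.rstrip("\n").rpartition("\nTimestamp:")

# Parse only the CSV part; keep identifiers as strings
restored = pd.read_csv(io.StringIO(body), sep=";", dtype={"service_id": str})

print(stamp)
print(restored)
```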
