# Service performance data

## Introduction - what is this for

This notebook will ingest and process service-related data into ready-to-use csv files for visualization purposes or further analysis.

The following datasets will be consulted:

**GC Service Inventory and Service Performance**: An inventory of Government of Canada services, their associated service standards and performance<br>
https://open.canada.ca/data/en/dataset/3ac0d080-6149-499a-8b06-7ce5f00ec56c

**Departmental Plans and Departmental Results Reports**: Expenditures and Full Time Equivalents (FTE) by Program and by Organization<br>
https://open.canada.ca/data/en/dataset/a35cf382-690c-4221-a971-cf0fd189a46f/resource/64774bc1-c90a-4ae2-a3ac-d9b50673a895


Utilities built and shared specifically for this purpose:<br>
https://github.com/gc-performance/utilities

**Department list**: A list of every organization, department, agency, with their various associated names in order to align to a single numeric ID per department.  
**Program-service id correspondence**: Converting the long-form program names in the service inventory to the program id's from the Departmental Plans, Departmental Results Reports.

### Conventions;

Whenever a 4-digit year represents a fiscal year, the 4-digit year is the calendar year during which the fiscal year **ended**

## Setting up environment

In [16]:
import pandas as pd
pd.set_option("display.max_rows", 100)

import numpy as np
import re
import pytz
import os

# Service inventory and service standards
si_2018 = pd.read_csv("https://open.canada.ca/data/dataset/3ac0d080-6149-499a-8b06-7ce5f00ec56c/resource/3acf79c0-a5f5-4d9a-a30d-fb5ceba4b60a/download/service_inventory_2018-2023.csv")
ss_2018 = pd.read_csv("https://open.canada.ca/data/dataset/3ac0d080-6149-499a-8b06-7ce5f00ec56c/resource/272143a7-533e-42a1-b72d-622116474a21/download/service_standards_2018-2023.csv")
si_2024 = pd.read_csv("https://open.canada.ca/data/dataset/3ac0d080-6149-499a-8b06-7ce5f00ec56c/resource/c0cf9766-b85b-48c3-b295-34f72305aaf6/download/service.csv")
ss_2024 = pd.read_csv("https://open.canada.ca/data/dataset/3ac0d080-6149-499a-8b06-7ce5f00ec56c/resource/8736cd7e-9bf9-4a45-9eee-a6cb3c43c07e/download/service-std.csv")

# Departmental Plans and Departmental Results Reports (Main estimates part III)
rbpo = pd.read_csv("https://open.canada.ca/data/dataset/a35cf382-690c-4221-a971-cf0fd189a46f/resource/64774bc1-c90a-4ae2-a3ac-d9b50673a895/download/rbpo_rppo_en.csv")

# Public Accounts: Operating costs by core responsibility
op_cost = pd.read_csv("https://donnees-data.tpsgc-pwgsc.gc.ca/ba1/respessentielles-coreresp/respessentielles-coreresp.csv")

# Utility tables
# Department list
org_var = pd.read_csv("https://raw.githubusercontent.com/gc-performance/utilities/master/goc-org-variants.csv").set_index('org_name_variant')

# Program-service id correspondence
serv_prog = pd.read_csv("https://raw.githubusercontent.com/gc-performance/utilities/master/goc-service-program.csv")

# define string cleaner function
def normalize_string(s):
    # Remove all non-alphanumeric characters (special characters and spaces)
    s = re.sub(r'[^A-Za-z0-9]', '', s)
    # Convert to uppercase
    return s.upper()



In [21]:
# Combining historical and live service inventory and standard data
si_2018.columns.tolist()

['fiscal_yr',
 'service_id',
 'department_name_en',
 'department_name_fr',
 'service_name_en',
 'service_name_fr',
 'service_description_en',
 'service_description_fr',
 'service_type',
 'service_recipient_type',
 'service_scope',
 'client_target_groups',
 'program_name_en',
 'program_name_fr',
 'client_feedback',
 'service_fee',
 'last_GBA',
 'ident_platform',
 'ident_platform_comments',
 'e_registration',
 'e_authentication',
 'e_application',
 'e_decision',
 'e_issuance',
 'e_feedback',
 'online_comments_en',
 'online_comments_fr',
 'how_has_the_service_been_assessed_for_accessibility',
 'last_year_of_service_review',
 'last_year_of_service_improvement_based_on_client_feedback',
 'use_of_CRA_number',
 'use_of_SIN_number',
 'calls_received',
 'telephone_applications',
 'web_visits',
 'online_applications',
 'in_person_applications',
 'postal_mail_applications',
 'email_applications',
 'fax_applications',
 'other_applications',
 'special_remarks_en',
 'special_remarks_fr',
 'service_u

In [None]:
# Some manipulations to use later
# Latest service names
latest_service_names = si.loc[si.groupby('service_id')['fiscal_yr'].idxmax(), ['service_id', 'service_name_en', 'service_name_fr']]

# Correspondence table between core responsibilities and program id
core_resp_program = rbpo.loc[:, ['organization_id', 'core_responsibility', 'program_id', 'program_name', 'fy_ef']]


# align fy format to service inventory, tidy up some tables
fy_cleanup = {'FY ': '', '-': '-20', '/':'-'}

rbpo['fy_ef'] = rbpo['fy_ef'].replace(fy_cleanup, regex=True)
rbpo.rename(columns={
    'fy_ef': 'fiscal_yr',
    'core_responsibility': 'core_responsibility_en'}, inplace=True)

core_resp_program['fy_ef'] = core_resp_program['fy_ef'].replace(fy_cleanup, regex=True)
core_resp_program = core_resp_program.rename(columns={'fy_ef': 'fiscal_yr'})

op_cost['FSCL_YR'] = op_cost['FSCL_YR'].replace(fy_cleanup, regex=True)



# get org id into op_cost table
op_cost = op_cost.set_index('DEPT_EN_DESC').join(org_var).reset_index()

op_cost.rename(columns={
    'FSCL_YR': 'fiscal_yr',
    'DEPT_EN_DESC': 'department_name_en',
    'CR_EN_NM': 'core_responsibility_en',
    'OP_ATHRTY_CY_AMT': 'operating_costs',
    'org_id': 'organization_id'}, inplace=True)

# get rid of extra cols
op_cost = op_cost.loc[:, ['fiscal_yr', 'department_name_en','organization_id', 'core_responsibility_en', 'operating_costs']]


In [2]:
# Specify timezone
timezone = pytz.timezone('America/Montreal')

# Get current date and time in the specified timezone
current_datetime = pd.Timestamp.now(tz=timezone)

# Convert to string
current_datetime_str = current_datetime.strftime("%Y-%m-%d_%H:%M:%S")
print(current_datetime_str)

2024-11-13_16:16:48


In [3]:
# Get the current working directory
cwd = os.getcwd()
print("Current working directory:", cwd)

# Confirm the working directory
os.chdir('/home/jovyan/shared/service-data')
cwd = os.getcwd()

print("Updated working directory:", cwd)

Current working directory: /home/jovyan/shared/service-data
Updated working directory: /home/jovyan/shared/service-data


### Applications for service
Given a service, what is the volume of interactions (applications) by channel and fiscal year?

In [4]:
# Unpivot (i.e. melt) application volume columnns

# list of columns that contain application / interaction volumes
# These also represent the channel through which the interaction took place

app_cols = [
    'telephone_applications', 
    'online_applications', 
    'in_person_applications', 
    'postal_mail_applications', 
    'email_applications', 
    'fax_applications', 
    'other_applications'
]

si_vol = pd.melt(si, id_vars=['fiscal_yr', 'service_id'], value_vars=app_cols, var_name='channel', value_name='volume')

# remove "_applications" from the channel column to get a clean channel name
si_vol['channel'] = si_vol['channel'].str.replace('_applications', '')

# remove 'NaN' values in volume
si_vol = si_vol.dropna(subset=['volume'])

# define filename with timestamp, export to csv
fn = 'si_vol_'+current_datetime_str+'.csv'
si_vol.to_csv(fn, index=False, header=True, sep=';')

### Online interaction points
Given a service, which online interaction points are activated?

In [5]:
# Unpivot (i.e. melt) online interaction point columns

# list of columns that represent online interaction point activation
oip_cols = [
    'e_registration', 
    'e_authentication', 
    'e_application', 
    'e_decision', 
    'e_issuance', 
    'e_feedback', 
]

si_oip = pd.melt(si, id_vars=['fiscal_yr', 'service_id'], value_vars=oip_cols, var_name='online_interaction_point', value_name='activation')

# add a column to indicate the sort position of the online interaction point
si_oip['online_interaction_point_sort'] = si_oip['online_interaction_point'].apply(lambda x: oip_cols.index(x)+1)

# remove "e_" from the online interaction point column to get a clean name
si_oip['online_interaction_point'] = si_oip['online_interaction_point'].str.replace('e_', '')

# might dump old years, only take latest year
si_oip = si_oip.loc[si_oip.groupby(['service_id', 'online_interaction_point'])['fiscal_yr'].idxmax()].sort_values(by=['service_id', 'online_interaction_point_sort'])

# define filename with timestamp, export to csv
fn = 'si_oip_'+current_datetime_str+'.csv'
si_oip.to_csv(fn, index=False, header=True, sep=';')

### Timeliness service standard performance

Given a service, what is the volume of interactions that met the target vs not, by fiscal year?

In [6]:
# Filter the DataFrame for rows where 'service_std_type' is 'Timeliness', group by 'fiscal_yr' 
# and 'service_id', sum the 'volume_meeting_target' and 'total_volume' columns, and reset the index.

ss_tml_perf_vol = ss.loc[ss['service_std_type'] == 'Timeliness'].groupby(['fiscal_yr', 'service_id'])[['volume_meeting_target', 'total_volume']].sum().reset_index()

ss_tml_perf_vol['volume_not_meeting_target'] = ss_tml_perf_vol['total_volume']-ss_tml_perf_vol['volume_meeting_target']

# define filename with timestamp, export to csv
fn = 'ss_tml_perf_vol_'+current_datetime_str+'.csv'
ss_tml_perf_vol.to_csv(fn, index=False, header=True, sep=';')

### MAF score calculation for Client-centric service design and delivery
Determining the results of MAF scores

References to methodology can be found here
https://www.canada.ca/en/treasury-board-secretariat/services/management-accountability-framework/maf-methodologies/2022-2023-im-it.html#toc-1

#### Question 1: Existence of service standards
As service standards are required under the Policy on Service and Digital, what is the percentage of services that have service standards?

In [7]:
# setting up the score bins and corresponding results for use with pd.cut
score_bins = [0, 50, 80, 101]
score_results = ['low', 'medium', 'high']

In [8]:
maf1 = si.loc[:, ['fiscal_yr', 'service_id', 'department_name_en']]
maf1['service_std_tf'] = si[['fiscal_yr', 'service_id']].isin(ss[['fiscal_yr', 'service_id']].to_dict(orient='list')).all(axis=1)

maf1_num = maf1.groupby(['fiscal_yr', 'department_name_en'])['service_id'].count().reset_index()
maf1_denom = maf1.groupby(['fiscal_yr', 'department_name_en'])['service_std_tf'].sum().reset_index()

maf1 = pd.merge(
    maf1_num,
    maf1_denom,
    on=['fiscal_yr', 'department_name_en'],
    how='left'
).rename(columns={'service_id':'service_count', 'service_std_tf':'service_with_std_count'})

maf1['maf1_score'] = (maf1['service_with_std_count']/maf1['service_count'])*100
maf1['maf1_result'] = pd.cut(maf1['maf1_score'], bins=score_bins, labels=score_results, right=False)

maf1

Unnamed: 0,fiscal_yr,department_name_en,service_count,service_with_std_count,maf1_score,maf1_result
0,2018-2019,Administrative Tribunals Support Service of Ca...,4,3,75.000000,medium
1,2018-2019,Agriculture and Agri-Food Canada,30,30,100.000000,high
2,2018-2019,Atlantic Canada Opportunities Agency,5,3,60.000000,medium
3,2018-2019,Canada Border Services Agency,42,27,64.285714,medium
4,2018-2019,Canada Economic Development for Quebec Regions,3,3,100.000000,high
...,...,...,...,...,...,...
373,2022-2023,Transportation Safety Board of Canada,4,0,0.000000,low
374,2022-2023,Treasury Board of Canada Secretariat,27,22,81.481481,high
375,2022-2023,Veterans Affairs Canada,30,20,66.666667,medium
376,2022-2023,Veterans Review and Appeal Board,1,1,100.000000,high


#### Question 2: Service standard targets
What is the percentage of service standards that met their target?

In [9]:
maf2 = ss.loc[:, ['fiscal_yr', 'service_std_id', 'department_name_en', 'target_met']].dropna()

maf2_num = maf2[maf2['target_met']=='Y'].groupby(['fiscal_yr', 'department_name_en'])['service_std_id'].count().reset_index()
maf2_denom = maf2.groupby(['fiscal_yr', 'department_name_en'])['service_std_id'].count().reset_index()

maf2 = pd.merge(
    maf2_num,
    maf2_denom,
    suffixes=['_met','_total'],
    on=['fiscal_yr', 'department_name_en'],
    how='left'
)

maf2['maf2_score'] = (maf2['service_std_id_met']/maf2['service_std_id_total'])*100
maf2['maf2_result'] = pd.cut(maf2['maf2_score'], bins=score_bins, labels=score_results, right=False)

maf2

Unnamed: 0,fiscal_yr,department_name_en,service_std_id_met,service_std_id_total,maf2_score,maf2_result
0,2018-2019,Administrative Tribunals Support Service of Ca...,1,2,50.000000,medium
1,2018-2019,Agriculture and Agri-Food Canada,88,112,78.571429,medium
2,2018-2019,Atlantic Canada Opportunities Agency,4,7,57.142857,medium
3,2018-2019,Canada Border Services Agency,26,43,60.465116,medium
4,2018-2019,Canada Economic Development for Quebec Regions,1,7,14.285714,low
...,...,...,...,...,...,...
275,2022-2023,Statistics Canada,36,37,97.297297,high
276,2022-2023,Transport Canada,112,204,54.901961,medium
277,2022-2023,Treasury Board of Canada Secretariat,29,30,96.666667,high
278,2022-2023,Veterans Affairs Canada,18,24,75.000000,medium


#### Question 3: Real-time performance for service standards

As real-time performance reporting is required under the Directive on Service and Digital, what is the extent to which real-time performance reporting for services is published?

Real-time URL data is unreliable

#### Question 4: Service standards reviews

What is the percentage of service standards which have been reviewed?

GCSS review field is no longer being collected as of 2023-24 dataset

#### Question 5: Online end-to-end
As online end-to-end availability of services is required under the Policy on Service and Digital, what is the percentage of applicable services that can be completed online end-to-end?

In [10]:
oip_cols = [
    'e_registration', 
    'e_authentication', 
    'e_application', 
    'e_decision', 
    'e_issuance', 
    'e_feedback'
]

# Melt the DataFrame
maf5 = pd.melt(si, id_vars=['fiscal_yr', 'service_id', 'department_name_en'], value_vars=oip_cols, var_name='online_interaction_point', value_name='activation')

# Create boolean columns for activation states
maf5['activation_y'] = (maf5['activation'] == 'Y')
maf5['activation_n'] = (maf5['activation'] == 'N')
maf5['activation_nan'] = maf5['activation'].isna()

# Group by and sum the activation columns
maf5 = maf5.groupby(['fiscal_yr', 'department_name_en', 'service_id'])[['activation_y', 'activation_n', 'activation_nan']].sum().reset_index()

# Determine conditions for online_e2e
conditions = [
    (maf5['activation_nan'] == 6),  # All interaction points are NaN
    (maf5['activation_n'] > 0)      # Some interaction points are 'N'
]
choices = [None, False]

maf5['online_e2e'] = np.select(conditions, choices, default=True).astype(bool)

# remove all Nan/Nones
maf5 = maf5.dropna(subset=['online_e2e'])

# Determine department-level counts for online e2e services and all services
maf5 = maf5.groupby(['fiscal_yr', 'department_name_en']).agg(
    online_e2e_count=('online_e2e', 'sum'), # this is wizardry to me... still not sure what is happening
    service_count=('service_id', 'nunique')
).reset_index()

# Determine score and associated result
maf5['maf5_score'] = (maf5['online_e2e_count']/maf5['service_count'])*100
maf5['maf5_result'] = pd.cut(maf5['maf5_score'], bins=score_bins, labels=score_results, right=False)

maf5



Unnamed: 0,fiscal_yr,department_name_en,online_e2e_count,service_count,maf5_score,maf5_result
0,2018-2019,Administrative Tribunals Support Service of Ca...,0,4,0.000000,low
1,2018-2019,Agriculture and Agri-Food Canada,9,30,30.000000,low
2,2018-2019,Atlantic Canada Opportunities Agency,1,5,20.000000,low
3,2018-2019,Canada Border Services Agency,2,42,4.761905,low
4,2018-2019,Canada Economic Development for Quebec Regions,0,3,0.000000,low
...,...,...,...,...,...,...
373,2022-2023,Transportation Safety Board of Canada,0,4,0.000000,low
374,2022-2023,Treasury Board of Canada Secretariat,19,27,70.370370,medium
375,2022-2023,Veterans Affairs Canada,19,30,63.333333,medium
376,2022-2023,Veterans Review and Appeal Board,0,1,0.000000,low


#### Question 6: Online client interaction points
As online end-to-end availability of services is required under the Policy on Service and Digital, what is the percentage of client interaction points that are available online for services?

In [11]:
oip_cols = [
    'e_registration', 
    'e_authentication', 
    'e_application', 
    'e_decision', 
    'e_issuance', 
    'e_feedback'
]

# Melt the DataFrame
maf6 = pd.melt(si, id_vars=['fiscal_yr', 'service_id', 'department_name_en'], value_vars=oip_cols, var_name='online_interaction_point', value_name='activation').dropna()

maf6['activation'] = (maf6['activation'] == 'Y')

maf6 = maf6.groupby(['fiscal_yr', 'department_name_en']).agg(
    activated_point_count=('activation', 'sum'), # this is wizardry to me... still not sure what is happening
    point_count=('service_id', 'count')
).reset_index()

# Determine score and associated result
maf6['maf6_score'] = (maf6['activated_point_count']/maf6['point_count'])*100
maf6['maf6_result'] = pd.cut(maf6['maf6_score'], bins=score_bins, labels=score_results, right=False)


maf6

Unnamed: 0,fiscal_yr,department_name_en,activated_point_count,point_count,maf6_score,maf6_result
0,2018-2019,Administrative Tribunals Support Service of Ca...,2,6,33.333333,low
1,2018-2019,Agriculture and Agri-Food Canada,107,159,67.295597,medium
2,2018-2019,Atlantic Canada Opportunities Agency,11,13,84.615385,high
3,2018-2019,Canada Border Services Agency,32,179,17.877095,low
4,2018-2019,Canada Economic Development for Quebec Regions,4,13,30.769231,low
...,...,...,...,...,...,...
360,2022-2023,Transportation Safety Board of Canada,0,24,0.000000,low
361,2022-2023,Treasury Board of Canada Secretariat,105,116,90.517241,high
362,2022-2023,Veterans Affairs Canada,123,141,87.234043,high
363,2022-2023,Veterans Review and Appeal Board,5,6,83.333333,high


#### Question 7: ICT Accessibility
As accessibility is required under the Policy on Service and Digital, what is the percentage of services available online that have been assessed for ICT accessibility?

Accessibility data is garbage, and we are no longer collecting it through the serice inventory

## Combining other datasets with service inventory and service standards

### Spending and FTEs for programs responsible for service delivery

Given a service, what are the number of actual and planned FTEs by fiscal year for the program responsible for service delivery? What is the actual and planned spending?

In [12]:
rbpo.head()

Unnamed: 0,fiscal_yr,organization_id,organization,core_responsibility_en,program_id,program_name,planned_spending_1,actual_spending,planned_spending_2,planned_spending_3,planned_ftes_1,actual_ftes,planned_ftes_2,planned_ftes_3,planning_explanation,variance_explanation
0,2018-2019,1,Department of Agriculture and Agri-Food,Domestic and International Markets,BWN01,Trade and Market Expansion,53105701.0,52360723.82,53014508.0,53014508.0,171.0,183.0,171.0,171.0,,
1,2018-2019,1,Department of Agriculture and Agri-Food,Domestic and International Markets,BWN02,Sector Engagement and Development,33331249.0,34247456.65,30455570.0,30455570.0,179.0,184.0,179.0,179.0,,
2,2018-2019,1,Department of Agriculture and Agri-Food,Domestic and International Markets,BWN03,Farm Products Council of Canada,3048578.0,2520779.52,3048552.0,3048552.0,23.0,17.0,23.0,23.0,,Actual spending was lower than planned spendin...
3,2018-2019,1,Department of Agriculture and Agri-Food,Domestic and International Markets,BWN04,Dairy Programs,94238832.0,99881679.91,83258832.0,78288832.0,36.0,46.0,37.0,37.0,,Actual full-time equivalents were higher than ...
4,2018-2019,1,Department of Agriculture and Agri-Food,Domestic and International Markets,BWN05,Canadian Pari-Mutuel Agency,0.0,-317141.62,-216000.0,-53000.0,31.0,31.0,31.0,31.0,,The Canadian Pari-Mutuel Agency is not funded ...


In [13]:

# Reformat program data table to be easier to work with, filter out irrelevant information

# Define columns related to measures: spending and FTEs (planned and actual)
fte_spend_cols = [
    'planned_spending_1', 'actual_spending', 'planned_spending_2', 'planned_spending_3',
    'planned_ftes_1', 'actual_ftes', 'planned_ftes_2', 'planned_ftes_3'
]

# Melt (unpivot) the DataFrame to long format
rbpo_melted = pd.melt(
    rbpo, 
    id_vars=['fiscal_yr', 'organization_id', 'program_id', 'core_responsibility_en'], 
    value_vars=fte_spend_cols, 
    var_name='plan_actual_yr', 
    value_name='measure'
)

# Split 'plan_actual_yr' into separate columns for planned/actual, spending/FTEs, and year adjustment
rbpo_melted[['planned_actual', 'spending_fte', 'yr_adjust']] = rbpo_melted['plan_actual_yr'].str.split('_', expand=True)
rbpo_melted['yr_adjust'] = rbpo_melted['yr_adjust'].fillna('1').astype(int) - 1

# Calculate 'measure_yr' and 'report_yr' from 'fiscal_yr' and 'yr_adjust'
rbpo_melted['measure_yr'] = rbpo_melted['fiscal_yr'].str.split('-').str[1].astype(int) + rbpo_melted['yr_adjust']
rbpo_melted['report_yr'] = rbpo_melted['fiscal_yr'].str.split('-').str[1].astype(int)

# Get the latest fiscal year from the Service inventory (four digit fy, year of end of fy)
latest_si_fy = si['fiscal_yr'].str.split('-').str[1].astype(int).max()

# Separate actuals and future planned data (beyond the latest service fiscal year)
rbpo_melted_actuals = rbpo_melted[rbpo_melted['planned_actual'] == 'actual']
rbpo_melted_planned = rbpo_melted[
    (rbpo_melted['planned_actual'] == 'planned') & (rbpo_melted['report_yr'] > latest_si_fy)
]

# Sort and drop duplicate planned entries, keeping the latest by 'report_yr'
rbpo_melted_planned = rbpo_melted_planned.sort_values(
    by=['report_yr', 'organization_id', 'program_id', 'spending_fte'], 
    ascending=False
).drop_duplicates(subset=['measure_yr','organization_id', 'program_id', 'spending_fte'])

# Concatenate actuals and planned entries, drop any remaining NaNs
rbpo_melted = pd.concat([rbpo_melted_planned, rbpo_melted_actuals]).dropna()

# Pivot to get a wide format table with spending/FTE columns, aggregating with 'sum'
rbpo_melted = rbpo_melted.pivot_table(
    index=['organization_id', 'core_responsibility_en', 'program_id', 'report_yr', 'measure_yr', 'planned_actual'], 
    columns=['spending_fte'], 
    values='measure', 
    aggfunc='sum'
).sort_values(
    by=['organization_id', 'program_id', 'report_yr','measure_yr']
).reset_index()

# Set up a fiscal year column  to be able to include years beyond the service inventory when joining.
# if measure year > latest service fy, = latest service fy

rbpo_melted.loc[rbpo_melted['measure_yr']>latest_si_fy, 'si_link_yr'] = latest_si_fy
rbpo_melted.loc[rbpo_melted['measure_yr']<=latest_si_fy, 'si_link_yr'] = rbpo_melted['measure_yr']

rbpo_melted['si_link_yr'] =rbpo_melted['si_link_yr'].astype(int) 

rbpo_melted.head()

spending_fte,organization_id,core_responsibility_en,program_id,report_yr,measure_yr,planned_actual,ftes,spending,si_link_yr
0,1,Domestic and International Markets,BWN01,2019,2019,actual,183.0,52360723.82,2019
1,1,Domestic and International Markets,BWN01,2020,2020,actual,191.0,58764607.74,2020
2,1,Domestic and International Markets,BWN01,2021,2021,actual,190.0,49132118.39,2021
3,1,Domestic and International Markets,BWN01,2022,2022,actual,181.0,47150140.18,2022
4,1,Domestic and International Markets,BWN01,2023,2023,actual,183.0,56134845.18,2023


In [14]:
# Get org_id into service inventory
# Set index for department and organization variant data
temp1 = si.set_index('department_name_en')
temp2 = org_var

# Join on the department name and include org_id, then set new multi-index
temp3 = temp1.join(temp2)[['service_id', 'fiscal_yr', 'org_id']].set_index(['service_id', 'fiscal_yr'])

# Get the program_id into the service inventory
# Set index for service-program correspondence table
temp4 = serv_prog.set_index(['fiscal_yr', 'service_id'])

# Join the service inventory and the program correspondence table then clean up by resetting the index and dropping NaNs
temp5 = temp3.join(temp4).reset_index().dropna()

# Generate a 4-digit year in the service inventory to link to the program data
temp5['si_link_yr'] = temp5['fiscal_yr'].str.split('-').str[1].astype(int)

# Set a new multi-index for the expanded service inventory and rename org_id to align to the program table
temp5 = temp5.rename(columns={'org_id': 'organization_id'}).set_index(['si_link_yr', 'organization_id', 'program_id'])

# Set index for program data and join with expanded service inventory
temp6 = rbpo_melted.set_index(['si_link_yr', 'organization_id', 'program_id'])

service_fte_spending = temp5.join(temp6, lsuffix='_si', rsuffix='_program').reset_index()

# define filename with timestamp, export to csv
fn = 'service_fte_spending_'+current_datetime_str+'.csv'
service_fte_spending.to_csv(fn, index=False, header=True, sep=';')
