# Ontario's public sector salary disclosure - 2019 EDA
### "The names, positions, salaries and total taxable benefits of public sector employees paid \\(\$\\)100,000 or more in a calendar year."

![](https://www.google.com/maps/vt/data=kS8SDz9HFAu62DnpOEORnLF2W6ROdr8py5iRqfV_mZYKO9gbhUTCvi-n4Wnc2LvCKNnysx_-9VKF04x4NRiC88Tkw8-qmWczhEh_WP-7z6edGCGHMEkNrMVCOL0L-tOl08D5jtElq-9rz51rl5wJyzjsqlcPdJYirf5P-Zo_u9UkGAOJhW_WewNTafh4nM951Q_y_KR_7uV8SF4uPup-hjn3f4mSmFpzGWbPN5QRfKNDII0)

<a id="table-of-contents"></a>

## Table of contents

<p style="line-height: 1.6em;">
    <a href="#background">1. Background and summary</a><br>
    <a href="#number-employees">2. Number of employees</a><br>
    <a href="#salaries">3. Salaries</a><br>
    <a href="#taxable-benefits">4. Taxable benefits</a><br>
    <a href="#salaries-and-taxable-benefits">5. Relationship between salaries and taxable benefits</a><br>
    <a href="#sectors">6. Sectors</a><br>
    <a href="#employers">7. Employers</a><br>
    <a href="#job-titles">8. Job titles</a><br>
    <a href="#names">9. Names</a><br>
    <a href="#misc">10. Miscellaneous</a><br>
    <a href="#references">11. References</a><br>
</p>

<a id="background"></a>
<!-- [Return to table of contents](#table-of-contents) -->

# Background and summary
> *The Public Sector Salary Disclosure Act, 1996* makes Ontario’s public sector more open and accountable to taxpayers. The act requires organizations that receive public funding from the Province of Ontario to make public, by March 31 each year, the names, positions, salaries and total taxable benefits of employees paid \\(\$\\)100,000 or more in the previous calendar year.

\- https://www.ontario.ca/page/public-sector-salary-disclosure

The public sector salary disclosure is also known as the Ontario [sunshine list](https://en.wikipedia.org/wiki/Sunshine_list) and is published by March 31st annually.<sup><a href="#references">[1]</a></sup> Note that this data is not an exhaustive list of all public sector employees who earn \\(\$\\)100,000 or more. In particular, not all organizations are covered by the Public Sector Salary Disclosure Act (PSSDA).

The following organizations are covered by the PSSDA:<sup><a href="#references">[2]</a></sup>
- The Government of Ontario, crown agencies, municipalities, hospitals, boards of public health, school boards, universities, colleges, Hydro One, and Ontario Power Generation.
- Non-profit organizations receiving \\(\$\\)1 million in funding or more.
- Organizations receiving between \\(\$\\)120,000 and \\(\$\\)1 million in funding are covered if the funding they receive is 10 per cent or more of their gross revenues.
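
The two funding thresholds above can be read as a simple decision rule. The sketch below is only an illustration: the function name `is_covered_by_funding`, the treatment of the boundaries as inclusive, and the sample figures are assumptions, and it ignores the organizations that the act names explicitly.

```python
def is_covered_by_funding(funding: float, gross_revenue: float) -> bool:
    """Rough reading of the PSSDA funding thresholds (illustrative only)."""
    if funding >= 1_000_000:
        # $1 million or more in funding: covered regardless of revenue.
        return True
    if funding >= 120_000:
        # Between $120,000 and $1 million: covered only if the funding is
        # at least 10 per cent of gross revenues.
        return funding >= 0.10 * gross_revenue
    # Below $120,000: not covered under these two rules.
    return False


print(is_covered_by_funding(2_000_000, 50_000_000))  # True
print(is_covered_by_funding(500_000, 4_000_000))     # True (12.5% of revenue)
print(is_covered_by_funding(500_000, 10_000_000))    # False (5% of revenue)
```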

Directly from the [Government of Ontario website](https://www.ontario.ca/page/public-sector-salary-disclosure-background-and-faq#section-8):
> The \\(\$\\)100,000 figure means salary before taxes, and does not include taxable benefits. However, for those who are paid \\(\$\\)100,000 or more, the total value of these taxable benefits must be disclosed.

A taxable benefit is a benefit from an employer to an employee. Examples of taxable benefits include:<sup><a href="#references">[3]</a><a href="#references">[4]</a></sup>
> - tips
> - boarding, lodging, rent-free or low-rent housing
> - travel expenses for personal travel
> - personal use of an employer’s automobile
> - gifts over \\(\$\\)500 per year
> - use of vacation property owned by the company
> - holiday trips
> - prizes and awards
> - life insurance premiums
> - costs of employer-paid courses for personal interest not related to work

In this notebook, I aim to explore the data and document my findings. To learn how the data was cleaned, see my other notebook [here](https://www.kaggle.com/sahidvelji/cleaning-the-ontario-sunshine-list-data).

### Summary

- There are 166,977 employees on the 2019 list.
- \\(\$\\)100,000 in 1996 is equivalent to \\(\$\\)153,325.82 in 2019. Only 23,646 employees would make the 2019 list at that threshold.
- Five of the top 10 highest paid public employees on the list are Ontario Power Generation executives.
- Former Ontario Power Generation President and CEO Jeffrey Lyash topped the list, earning \\(\$\\)938,846.
- Former President and CEO of The Hospital For Sick Children Michael Apkon earned \\(\$\\)333,918 in taxable benefits, more than 3 times as much as anyone else on the list.
- The top 10 highest paid employees are all men. 
- Only 3 of the top 20 highest paid employees are women. 
- At 14th on the list, Maureen Jensen is the highest paid woman.
- The employer with the most employees on the list is Ontario Power Generation at 8,043 employees.
- The most common job title on the list is "Professor".
- At least 10% of employees on the list are professors.

Here's a sample of the data. Each row refers to an employee who earned at least \\(\$\\)100,000 in 2019. I decided to explore only the 2019 data for now because of inconsistencies across calendar years. I discuss these inconsistencies in my other [notebook](https://www.kaggle.com/sahidvelji/cleaning-the-ontario-sunshine-list-data), where I cleaned the data.

In [None]:
import pandas as pd
import numpy as np
import scipy.stats  # import the submodule explicitly; scipy.stats.spearmanr is used later
import matplotlib.pyplot as plt
import plotly.express as px
from IPython.display import display, HTML

px.defaults.template = 'plotly_white'
px.defaults.color_discrete_sequence = ['steelblue']
MODE_BAR_BUTTONS = ['toImage', 'zoom2d', 'pan2d', 'select2d', 'lasso2d',
                    'zoomIn2d', 'zoomOut2d', 'autoScale2d', 'resetScale2d',
                    'toggleSpikelines', 'hoverClosestCartesian', 'hoverCompareCartesian']
CONFIG = {
    'modeBarButtonsToRemove': ['pan2d', 'select2d', 'lasso2d', 'toggleSpikelines']
}

pssd = pd.read_csv('/kaggle/input/the-ontario-sunshine-list/pssd.csv').convert_dtypes()
pss = {year: pssd[pssd['Calendar Year'] == year].copy() for year in range(1996, 2020)}

def display_html(df, cols=None, num_rows=0):
    """Render df as an HTML table; show only the first num_rows rows if nonzero."""
    df_to_display = df.head(num_rows) if num_rows != 0 else df
    df_html = df_to_display.to_html(columns=cols, index=False, na_rep='',
                                    escape=False, render_links=True)
    display(HTML(df_html))

In [None]:
display_html(pss[2019].sample(5, random_state=13))

<a id="number-employees"></a>
[Return to table of contents](#table-of-contents)

# Number of employees
Before diving into the 2019 data, we will take a look at the number of employees on the list for each calendar year since 1996.

In [None]:
num_employees = (pssd.groupby('Calendar Year')
                 .size()
                 .to_frame(name='NumEmployees')
                 .reset_index()
                )
fig = px.scatter(num_employees,
                 x='Calendar Year',
                 y='NumEmployees'
                )
fig.update_traces(mode='lines+markers',
                  hovertemplate=
                  '<b>%{x}</b><br>'+
                  'Number of employees: <b>%{y}</b>'
                 )
fig.update_layout(title='Number of employees on the sunshine list by calendar year',
                  xaxis_title='Calendar Year',
                  yaxis_title="Number of employees",
                  yaxis_tickformat=',',
                  hoverlabel_bgcolor="white",
                  hoverlabel_font_size=14,
                  hovermode="x",
                  yaxis_zerolinecolor='grey',
                  yaxis_zerolinewidth=1
                 )
fig.show(config=CONFIG)

In [None]:
pss[2019]['Salary Paid'].ge(153325.82).sum()

The number of Ontario public sector employees earning \\(\$\\)100,000 or more has grown quickly, from 4,501 employees in 1996
to 166,977 employees in 2019. The province of Alberta publishes a similar list, but with a threshold that is adjusted for inflation.<sup><a href="#references">[5]</a></sup>
By contrast, Ontario's \\(\$\\)100,000 threshold has remained in place since 1996.<sup><a href="#references">[1]</a><a href="#references">[2]</a></sup>
According to the Bank of Canada inflation calculator, \\(\$\\)100,000 in 1996 is equivalent to \\(\$\\)153,325.82 in 2019.<sup><a href="#references">[6]</a></sup>
With a threshold of \\(\$\\)153,325.82, only 23,646 employees would make the 2019 list.
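
As a quick arithmetic check on these figures, the sketch below derives the cumulative inflation factor implied by the Bank of Canada figure, the average annual rate that compounds to it over the 23 years from 1996 to 2019, and the share of the 2019 list that would remain under the adjusted threshold. The numbers come from the text; the variable names are my own.

```python
threshold_1996 = 100_000.00
threshold_2019 = 153_325.82   # the 1996 threshold expressed in 2019 dollars
years = 2019 - 1996

# Cumulative price increase implied by the two figures.
inflation_factor = threshold_2019 / threshold_1996

# Average annual rate that compounds to that factor.
annual_rate = inflation_factor ** (1 / years) - 1

# Share of the 2019 list at or above the adjusted threshold.
share_remaining = 23_646 / 166_977

print(f"Cumulative factor: {inflation_factor:.4f}")
print(f"Average annual inflation: {annual_rate:.2%}")
print(f"Share of list remaining: {share_remaining:.1%}")
```

Under these figures, the adjusted list would be roughly 14% of the size of the published one.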

<a id="salaries"></a>
[Return to table of contents](#table-of-contents)

# Salaries

In [None]:
px.defaults.color_discrete_sequence = ['lightseagreen']

In [None]:
pss[2019].insert(3, 'Name', pss[2019]['First Name'].str.cat(pss[2019]['Last Name'], sep=' '))

In [None]:
fig = px.histogram(pss[2019],
                   x='Salary Paid',
                   marginal='box'
                  )
fig.update_layout(title='Distribution of salary paid',
                  xaxis_title='Salary Paid',
                  yaxis_title="Count",
                  xaxis_tickformat='$,',
                  yaxis_tickformat=',',
                  hoverlabel_bgcolor="white",
                  hoverlabel_font_size=14
                 )
# Hist
fig.data[0].hovertemplate = 'Salary Paid: <b>%{x}</b><br>'+\
                            'Count: <b>%{y}</b>'
# Box
fig.data[1].hovertemplate = 'Salary Paid: <b>%{x:$,.0f}</b>'

fig.show(config=CONFIG)

In [None]:
f"Mean: ${pss[2019]['Salary Paid'].mean():,.2f}, Median: ${pss[2019]['Salary Paid'].median():,.2f}"

The distribution of salaries has a mean of \\(\$\\)127,388.59 and a median of \\(\$\\)115,279.21.
As expected, the distribution of salaries is highly right-skewed: the mean exceeds the median, and a long tail extends toward the highest salaries. We could visualize the data on a logarithmic scale,
but since a person can receive \\(\$\\)0.00 in taxable benefits, and the logarithm of zero is undefined, this approach won't work for the next section.
We will opt for a more general approach by plotting a cumulative histogram instead.<br><br>

There are two prominent peaks in the salary distribution beyond \\(\$\\)200k.
It is very likely that these are due to standard salaries put in place.
We will determine which salaries are most common and see if we can make sense of these two peaks.

In [None]:
n, bins, _ = plt.hist(pss[2019]['Salary Paid'], bins=500,
                      density=True,
                      histtype='step',
                      cumulative=True
                     )
plt.close()
fig = px.scatter(x=bins,
                 y=np.insert(n, 0, 0)
                )
fig.update_traces(mode='lines',
                  line_shape='hvh',
                  hovertemplate=
                  'Percentage of data: <b>%{y}</b><br>'+
                  'Salary Paid: <b>%{x}</b>'
                 )
fig.update_layout(title='Cumulative distribution of salary paid',
                  xaxis_title='Salary Paid',
                  yaxis_title='Percentage of data',
                  xaxis_tickformat='$,.0f',
                  yaxis_tickformat='%',
                  hoverlabel_bgcolor="white",
                  hoverlabel_font_size=14,
                  yaxis_zerolinecolor='grey',
                  yaxis_zerolinewidth=1
                 )
fig.show(config=CONFIG)

The cumulative histogram gives us the percentage of data that falls below the given salary. For instance, from the plot, we see that approximately 33% of employees on the list earn \\(\$\\)108,388 or less.
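
The same reading can be obtained without binning by evaluating the empirical cumulative distribution directly. The sketch below uses made-up salaries rather than the sunshine list data, and the helper name `share_at_or_below` is hypothetical.

```python
import numpy as np

def share_at_or_below(values, cutoff):
    """Fraction of observations less than or equal to `cutoff`."""
    values = np.sort(np.asarray(values, dtype=float))
    # side='right' counts every element <= cutoff.
    return np.searchsorted(values, cutoff, side='right') / values.size

# Illustrative salaries only, not taken from the dataset.
salaries = [101_000, 104_500, 108_388, 115_000, 127_000, 150_000]

print(share_at_or_below(salaries, 108_388))  # 0.5
```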

In [None]:
common_salaries = (pss[2019]['Salary Paid']
                   .value_counts()
                   .head(10)
                   # Name the index and the counts explicitly so the result does not
                   # depend on how the installed pandas labels value_counts() output.
                   .rename_axis('Salary Paid')
                   .reset_index(name='Count')
                  )
common_salaries['Salary Paid'] = common_salaries['Salary Paid'].apply(lambda salary: f'{salary:,.2f}')
fig = px.bar(common_salaries,
             x='Count',
             y='Salary Paid',
             orientation='h'
            )
fig.update_traces(textposition='outside',
                  texttemplate='%{x}',
                  cliponaxis=False,
                  hovertemplate=
                  'Count: <b>%{x}</b><br>'+
                  'Salary Paid: <b>%{y}</b>'
                 )
fig.update_layout(title='Top 10 most common salaries',
                  yaxis_title='Salary Paid',
                  xaxis_title='Count',
                  xaxis_tickformat=',',
                  yaxis_tickprefix='$',
                  yaxis_type='category',
                  yaxis_autorange='reversed',
                  height=700,
                  hoverlabel_bgcolor="white",
                  hoverlabel_font_size=14,
                  hovermode="y"
                 )
fig.show(config=CONFIG)

Earlier, we noticed two peaks in the salary distribution beyond \\(\$\\)200k. Here, we discover that the two salaries that contribute to those peaks are \\(\$\\)227,382.93 and \\(\$\\)306,475.79. It turns out that the latter salary is standard among judges at the Ontario Court Of Justice.

In [None]:
display_html(pss[2019][pss[2019]['Salary Paid'] == 306475.79].sample(10, random_state=13),
             cols=['Sector', 'Name', 'Salary Paid', 'Taxable Benefits', 'Employer', 'Job Title'])

The salary of \\(\$\\)227,382.93 seems to be standard among attorneys at the Ministry of the Attorney General.

In [None]:
display_html(pss[2019][pss[2019]['Salary Paid'] == 227382.93].sample(10, random_state=13),
            cols=['Sector', 'Name', 'Salary Paid', 'Taxable Benefits', 'Employer', 'Job Title'])
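
One way to check whether a recurring salary really belongs to a single role is to tabulate employer and job title for the matching rows, rather than relying on a random sample. The sketch below does this on a small made-up frame; the rows, the placeholder labels, and the helper name `profile_salary` are illustrative assumptions, not data from the list.

```python
import pandas as pd

# Made-up rows standing in for the 2019 data.
toy = pd.DataFrame({
    'Salary Paid': [227382.93, 227382.93, 227382.93, 306475.79, 120000.00],
    'Employer': ['Ministry X', 'Ministry X', 'Ministry X', 'Court Y', 'City Z'],
    'Job Title': ['Counsel A', 'Counsel A', 'Counsel B', 'Judge', 'Planner'],
})

def profile_salary(df, salary):
    """Count (Employer, Job Title) pairs among rows paid exactly `salary`."""
    matches = df[df['Salary Paid'] == salary]
    return (matches.groupby(['Employer', 'Job Title'])
                   .size()
                   .sort_values(ascending=False))

print(profile_salary(toy, 227382.93))
```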

Next, we will examine the top 20 highest paid public employees.

In [None]:
top_earners = pss[2019].nlargest(20, 'Salary Paid')
fig = px.bar(top_earners,
             x='Salary Paid',
             y='Name',
             orientation='h',
             custom_data=['Employer', 'Job Title']
            )
fig.update_traces(hovertemplate=
                  'Name:       <b>%{y}</b><br>'+
                  'Employer: <b>%{customdata[0]}</b><br>'+
                  'Job Title:   <b>%{customdata[1]}</b><br>'+
                  'Salary:      <b>%{x}</b>',
                  textposition='outside',
                  texttemplate='%{x}',
                  cliponaxis=False
                 )
fig.update_layout(title='Top 20 highest paid public employees',
                  yaxis_title='Name',
                  yaxis_autorange='reversed',
                  xaxis_title='Salary Paid',
                  xaxis_tickformat='$,.0f',
                  height=700,
                  hoverlabel_bgcolor="white",
                  hoverlabel_font_size=14,
                  hovermode="y"                  
                 )
fig.show(config=CONFIG)

Five of the top 10 highest paid public employees on the list are Ontario Power Generation executives. The top two earners on the list both held the title "President and Chief Executive Officer" at Ontario Power Generation in 2019. It turns out that Kenneth Hartwick took over from Jeffrey Lyash after Lyash left the company on 31 March 2019.<sup><a href="#references">[7]</a></sup>

Also worth noting is that the top 10 highest paid employees are all men. Only 3 of the top 20 highest paid employees are women. At 14th on the list, Maureen Jensen is the highest paid woman.

<a id="taxable-benefits"></a>
[Return to table of contents](#table-of-contents)

# Taxable benefits

In [None]:
px.defaults.color_discrete_sequence = ['indianred']

In [None]:
fig = px.histogram(pss[2019],
                   x='Taxable Benefits',
                   marginal='box'
                  )
fig.update_layout(title='Distribution of taxable benefits',
                  xaxis_title='Taxable Benefits',
                  yaxis_title="Count",
                  xaxis_tickformat='$,',
                  yaxis_tickformat=',',
                  hoverlabel_bgcolor="white",
                  hoverlabel_font_size=14,
                  xaxis_range=[0, 350000]
                 )
# Hist
fig.data[0].hovertemplate = 'Taxable Benefits: <b>%{x}</b><br>'+\
                            'Count: <b>%{y}</b>'
# Box
fig.data[1].hovertemplate = 'Taxable Benefits: <b>%{x:$,.0f}</b>'

fig.show(config=CONFIG)

In [None]:
f"Mean: ${pss[2019]['Taxable Benefits'].mean():,.2f}, Median: ${pss[2019]['Taxable Benefits'].median():,.2f}"

The distribution of taxable benefits has a mean of \\(\$\\)826.58 and a median of \\(\$\\)370.28. The taxable benefits distribution is even more skewed than the salary distribution. One employee earned \\(\$\\)333,918 in taxable benefits.

In [None]:
n, bins, _ = plt.hist(pss[2019]['Taxable Benefits'],
                      bins=500,
                      density=True,
                      histtype='step',
                      cumulative=True
                     )
plt.close()
fig = px.scatter(x=bins,
                 y=np.insert(n, 0, 0)
                )
fig.update_traces(mode='lines',
                  line_shape='hvh',
                  hovertemplate=
                  'Percentage of data: <b>%{y}</b><br>'+
                  'Taxable Benefits: <b>%{x}</b>'
                 )
fig.update_layout(title='Cumulative distribution of taxable benefits',
                  xaxis_title='Taxable Benefits',
                  yaxis_title='Percentage of data',
                  xaxis_tickformat='$,.0f',
                  yaxis_tickformat='%',
                  hoverlabel_bgcolor="white",
                  hoverlabel_font_size=14,
                  yaxis_zerolinecolor='grey',
                  yaxis_zerolinewidth=1
                 )
fig.show(config=CONFIG)

From the plot, we see that about 95% of employees on the list earn \\(\$\\)3,339 or less in taxable benefits. The outlier of \\(\$\\)333,918 is just over 100 times that amount.

In [None]:
common_taxable_benefits = (pss[2019]['Taxable Benefits']
                           .value_counts()
                           .head(10)
                           # Explicit names keep this independent of the pandas version.
                           .rename_axis('Taxable Benefits')
                           .reset_index(name='Count')
                          )
common_taxable_benefits['Taxable Benefits'] = common_taxable_benefits['Taxable Benefits'].apply(lambda tax_benefit: f'{tax_benefit:,.2f}')
fig = px.bar(common_taxable_benefits,
             x='Count',
             y='Taxable Benefits',
             orientation='h'
            )
fig.update_traces(textposition='outside',
                  texttemplate='%{x}',
                  cliponaxis=False,
                  hovertemplate=
                  'Count: <b>%{x}</b><br>'+
                  'Taxable Benefits: <b>%{y}</b>'
                 )
fig.update_layout(title='Top 10 most common taxable benefits amounts',
                  yaxis_title='Taxable Benefits',
                  xaxis_title='Count',
                  xaxis_tickformat=',',
                  yaxis_tickprefix='$',
                  yaxis_type='category',
                  yaxis_autorange='reversed',
                  height=700,
                  hoverlabel_bgcolor="white",
                  hoverlabel_font_size=14,
                  hovermode="y"
                 )
fig.show(config=CONFIG)

As expected, the most common taxable benefit received is \\(\$\\)0.00. The next most common amount is \\(\$\\)79.93, which appears to be common among Ontario school board employees.

In [None]:
display_html(pss[2019][pss[2019]['Taxable Benefits'] == 79.93].sample(10, random_state=13),
            cols=['Sector', 'Name', 'Salary Paid', 'Taxable Benefits', 'Employer', 'Job Title'])

In [None]:
top_taxable_benefits = pss[2019].nlargest(20, 'Taxable Benefits')
fig = px.bar(top_taxable_benefits,
             x='Taxable Benefits',
             y='Name',
             orientation='h',
             custom_data=['Employer', 'Job Title']
            )
fig.update_traces(hovertemplate=
                  'Name:                  <b>%{y}</b><br>'+
                  'Employer:            <b>%{customdata[0]}</b><br>'+
                  'Job Title:              <b>%{customdata[1]}</b><br>'+
                  'Taxable benefits: <b>%{x}</b>',
                  textposition='outside',
                  texttemplate='%{x}',
                  cliponaxis=False
                 )
fig.update_layout(title='Top 20 highest taxable benefit recipients',
                  yaxis_title='Name',
                  yaxis_autorange='reversed',
                  xaxis_title='Taxable Benefits',
                  xaxis_tickformat='$,.0f',
                  height=700,
                  hoverlabel_bgcolor="white",
                  hoverlabel_font_size=14,
                  hovermode='y'
                 )
fig.show(config=CONFIG)

Michael Apkon, the former president and CEO of the Hospital For Sick Children, received more than three times as much in taxable benefits as anyone else on the list. It's also worth noting that 5 of the top 10 are hospital executives.

<a id="salaries-and-taxable-benefits"></a>
[Return to table of contents](#table-of-contents)

# Relationship between salaries and taxable benefits

In [None]:
px.defaults.color_discrete_sequence = ['darkorange']

Here, we explore the relationship between salary paid and taxable benefits received.

In [None]:
fig = px.scatter(pss[2019],
                 x='Salary Paid',
                 y='Taxable Benefits',
                 trendline='ols'
                )

fig.data[0].hovertemplate=('Salary Paid:         <b>%{x}</b><br>'
                           'Taxable Benefits: <b>%{y}</b>'
                          )

fig.update_layout(title='Relationship between salary paid and taxable benefits',
                  yaxis_title='Taxable Benefits',
                  xaxis_title='Salary Paid',
                  xaxis_tickformat='$,.0f',
                  yaxis_tickformat='$,.0f',
                  hoverlabel_bgcolor="white",
                  hoverlabel_font_size=14
                 )
fig.show(config=CONFIG)

In [None]:
cor, p_val = scipy.stats.spearmanr(pss[2019]['Salary Paid'], pss[2019]['Taxable Benefits'])
cor, p_val

There are many outliers, and the $R^2$ value of the OLS trendline is very low at approximately 0.057. Based on the Spearman correlation coefficient of 0.3126, there is a weak positive monotonic correlation between salary paid and taxable benefits received.
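
The two statistics quoted here measure different things. $R^2$ from an ordinary least squares fit captures linear association and, for a single predictor, equals the squared Pearson correlation. Spearman's coefficient depends only on ranks, so it is less sensitive to extreme values. The sketch below illustrates this on synthetic data with one large outlier; the data and variable names are invented for illustration.

```python
import numpy as np
from scipy import stats

# Synthetic increasing relationship with one extreme value injected.
x = np.arange(1.0, 21.0)
y = x.copy()
y[0] = 500.0  # a single extreme observation at the low end of x

fit = stats.linregress(x, y)
pearson_r, _ = stats.pearsonr(x, y)
spearman_rho, _ = stats.spearmanr(x, y)

r_squared = fit.rvalue ** 2
print(f"R^2 from OLS:      {r_squared:.3f}")
print(f"Pearson r squared: {pearson_r ** 2:.3f}")
print(f"Spearman rho:      {spearman_rho:.3f}")
```

In this toy example the single outlier drags $R^2$ down, while the rank-based coefficient stays comparatively high.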

<a id="sectors"></a>
[Return to table of contents](#table-of-contents)

# Sectors

In [None]:
px.defaults.color_discrete_sequence = ['dodgerblue']

In [None]:
sector_employers = (pss[2019].loc[~pss[2019]['Sector'].str.contains('seconded', case=False), ['Sector', 'Employer']]
                    .drop_duplicates()
                    .reset_index(drop=True)
                   )
sector_employers_counts = (sector_employers['Sector']
                           .value_counts()
                           # Explicit names keep this independent of the pandas version.
                           .rename_axis('Sector')
                           .reset_index(name='NumEmployers')
                          )

fig = px.bar(sector_employers_counts,
             x='NumEmployers',
             y='Sector',
             orientation='h'
            )
fig.update_traces(hovertemplate=
                  'Number of employers: <b>%{x}</b><br>'+
                  'Sector: <b>%{y}</b>',
                  textposition='outside',
                  texttemplate='%{x}',
                  cliponaxis=False
                 )
fig.update_layout(title='Number of employers by sector',
                  xaxis_title='Number of employers',
                  yaxis_title='Sector',
                  yaxis_autorange='reversed',
                  hoverlabel_bgcolor="white",
                  hoverlabel_font_size=14,
                  hovermode="y",
                  xaxis_tickformat=','
                 )
fig.show(config=CONFIG)

In [None]:
f"Number of employers on the list: {pss[2019]['Employer'].nunique():,}"

For reference, there are 1,898 employers on the list. This means that more than half of employers on the list are part of the "Other Public Sector Employers" sector. Ontario Power Generation is a Crown corporation responsible for approximately half of the electricity generation in Ontario.<sup><a href="#references">[8]</a></sup> On this list, it is considered a sector of its own. The Judiciary sector has just two employers on the list: the Ontario Court Of Justice and the Superior Court Of Justice.

In [None]:
sector_employees = (pss[2019]['Sector']
                    .value_counts()
                    # Explicit names keep this independent of the pandas version.
                    .rename_axis('Sector')
                    .reset_index(name='NumEmployees')
                   )
seconded_sector_counts = sector_employees[sector_employees['Sector'].str.contains('seconded', case=False)]
sector_counts = sector_employees[~sector_employees['Sector'].str.contains('seconded', case=False)]

fig = px.bar(sector_counts,
             x='NumEmployees',
             y='Sector',
             orientation='h'
            )
fig.update_traces(hovertemplate=
                  'Number of employees: <b>%{x}</b><br>'+
                  'Sector: <b>%{y}</b>',
                  textposition='outside',
                  texttemplate='%{x}',
                  cliponaxis=False
                 )
fig.update_layout(title='Number of employees by sector',
                  xaxis_title='Number of employees',
                  yaxis_title='Sector',
                  yaxis_autorange='reversed',
                  hoverlabel_bgcolor="white",
                  hoverlabel_font_size=14,
                  hovermode="y",
                  xaxis_tickformat=','
                 )
fig.show(config=CONFIG)

By far, most employees on the list are employees of the "Municipalities &amp; Services" sector. There are almost three times as many employees in the Universities sector as in the Colleges sector on the list. Also, if we were to combine Universities, Colleges, and School Boards into a sector called "Education", it would be the largest sector in terms of number of employees on the list.
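
The combined "Education" comparison can be checked by mapping the three sectors onto one label before counting. The sketch below uses invented counts and a hypothetical mapping dictionary `EDUCATION_SECTORS`; only the sector names come from the text.

```python
import pandas as pd

# Invented employee counts per sector, for illustration only.
sector_counts = pd.Series({
    'Ontario Power Generation': 50,
    'School Boards': 25,
    'Universities': 21,
    'Colleges': 7,
    'Judiciary': 3,
})

EDUCATION_SECTORS = {
    'School Boards': 'Education',
    'Universities': 'Education',
    'Colleges': 'Education',
}

# Relabel the three sectors, then sum the counts that now share a label.
combined = (sector_counts
            .rename(index=EDUCATION_SECTORS)
            .groupby(level=0)
            .sum()
            .sort_values(ascending=False))

print(combined)
```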

In [None]:
not_seconded_sectors = pss[2019][~pss[2019]['Sector'].str.contains('seconded', case=False)]
fig = px.box(not_seconded_sectors,
             x='Sector',
             y='Salary Paid',
             log_y=True,
             height=700,
             custom_data=['Name', 'Employer', 'Job Title']
            )
fig.update_traces(boxmean=True,
                  hovertemplate=
                  'Name:          <b>%{customdata[0]}</b><br>'+
                  'Employer:    <b>%{customdata[1]}</b><br>'+
                  'Job Title:      <b>%{customdata[2]}</b><br>'+
                  'Salary Paid: <b>%{y}</b>',
                 )
fig.update_layout(title='Distribution of salary paid by sector',
                  xaxis_title='Sector',
                  yaxis_title='Salary Paid',
                  hoverlabel_bgcolor="white",
                  hoverlabel_font_size=14,
                  yaxis_tickformat='$,.0f'
                 )
fig.show(config=CONFIG)

The Judiciary sector is the only sector without outliers in the salary distribution. As we observed earlier, judges at the Ontario Court of Justice are paid a standard salary of \\(\$\\)306,475.79. The Judiciary sector also has the highest median salary, while the School Boards sector has the lowest median salary.
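
Box plots conventionally flag points lying more than 1.5 times the interquartile range beyond the quartiles as outliers. The sketch below applies that convention to invented salaries; the helper `iqr_outliers` is hypothetical, and quartile definitions differ slightly between libraries, so the exact points flagged may not match the plot.

```python
import numpy as np

def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return values[(values < lower) | (values > upper)]

# Invented examples: one group on a fixed pay scale, one with a long tail.
fixed_scale = [306_475.79] * 8
long_tail = [101_000, 105_000, 110_000, 112_000, 118_000, 125_000, 400_000]

print(iqr_outliers(fixed_scale))  # empty: every value equals the quartiles
print(iqr_outliers(long_tail))    # flags the 400,000 salary
```

A group paid on a fixed scale has an interquartile range of zero and no spread at all, which is one way a sector can end up with no flagged points.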

We will now examine the highest paid employees in a few sectors.

In [None]:
def plot_top_ten_sector(sector):
    top_ten_sector = not_seconded_sectors[not_seconded_sectors.Sector == sector].nlargest(10, 'Salary Paid')
    fig = px.bar(top_ten_sector,
                 x='Salary Paid',
                 y='Name',
                 orientation='h',
                 custom_data=['Employer', 'Job Title']
                )
    fig.update_traces(hovertemplate=
                      'Name:       <b>%{y}</b><br>'+
                      'Employer: <b>%{customdata[0]}</b><br>'+
                      'Job Title:   <b>%{customdata[1]}</b><br>'+
                      'Salary:      <b>%{x}</b>',
                      textposition='outside',
                      texttemplate='%{x}',
                      cliponaxis=False
                     )
    fig.update_layout(title='Top 10 highest paid employees in the {} sector'.format(sector),
                      yaxis_title='Name',
                      yaxis_autorange='reversed',
                      xaxis_title='Salary Paid',
                      xaxis_tickformat='$,.0f',
                      hoverlabel_bgcolor="white",
                      hoverlabel_font_size=14,
                      hovermode="y"                  
                     )
    fig.show(config=CONFIG)

In [None]:
plot_top_ten_sector('Universities')

Six of the top 10 highest paid employees in the Universities sector are University of Toronto employees. Interestingly, three of the top 10 are professors of strategic management at the University of Toronto.

In [None]:
plot_top_ten_sector('Ontario Power Generation')

As noted earlier, Ontario Power Generation is considered both an employer and a sector on this list. Four of the top 10 highest paid employees at Ontario Power Generation are responsible for nuclear operations. Barbara Keenan, 9th on the list, is the highest paid woman at Ontario Power Generation.

In [None]:
plot_top_ten_sector('Municipalities & Services')

Toronto police chief Mark Saunders is the highest paid public employee in the Municipalities &amp; Services sector, earning more than the CEO of the Toronto Transit Commission (TTC). Eight of the top 10 have "Chief" in their job titles. The other two are City Managers.

<a id="employers"></a>
[Return to table of contents](#table-of-contents)

# Employers

In [None]:
px.defaults.color_discrete_sequence = ['plum']

In [None]:
employer_counts = (pss[2019]['Employer']
                   .value_counts()
                   .head(20)
                   # Explicit names keep this independent of the pandas version.
                   .rename_axis('Employer')
                   .reset_index(name='NumEmployees')
                   .merge(sector_employers, on='Employer')
                  )

fig = px.bar(employer_counts,
             x="NumEmployees",
             y="Employer",
             orientation='h',
             custom_data=['Sector']
            )
fig.update_traces(hovertemplate=
                  'Employer: <b>%{y}</b><br>'+
                  'Number of employees: <b>%{x}</b><br>'+
                  'Sector: <b>%{customdata[0]}</b>',
                  texttemplate='%{x}',
                  textposition='outside',
                  cliponaxis=False
                 )
fig.update_layout(title='Top 20 employers by number of employees on the list',
                  xaxis_title='Number of employees',
                  yaxis_title='Employer',
                  yaxis_autorange='reversed',
                  height=700,
                  hoverlabel_bgcolor="white",
                  hoverlabel_font_size=14,
                  hovermode="y",
                  xaxis_tickformat=','
                 )
fig.show(config=CONFIG)

Toronto-based employers seem to have the most employees on the list. Ontario Power Generation tops the list with 8,043 employees earning \\(\$\\)100,000 or more.

In [None]:
employee_counts_employer = (pss[2019]['Employer']
                            .value_counts()
                            # Explicit names keep this independent of the pandas version.
                            .rename_axis('Employer')
                            .reset_index(name='NumEmployees')
                           )
top_employers_median_salary = (pss[2019]
                               .groupby('Employer')['Salary Paid']
                               .median()
                               .to_frame()
                               .reset_index()
                               .rename(columns={'Salary Paid': 'Median Salary Paid'})
                               # Keep only employers with at least 50 employees on the list.
                               .merge(employee_counts_employer[employee_counts_employer.NumEmployees.ge(50)],
                                      on='Employer')
                               .nlargest(20, 'Median Salary Paid')
                              )

fig = px.bar(top_employers_median_salary,
             x="Median Salary Paid",
             y="Employer",
             orientation='h',
             custom_data=['NumEmployees']
            )
fig.update_traces(hovertemplate=
                  'Median salary paid: <b>%{x}</b><br>'+
                  'Employer: <b>%{y}</b><br>'+
                  'Number of employees: <b>%{customdata[0]:,}</b>',
                  texttemplate='%{x}',
                  textposition='outside',
                  cliponaxis=False
                 )
fig.update_layout(title='Top 20 highest median salary paid by employers with <br>50 or more employees on the list',
                  xaxis_title='Median salary',
                  yaxis_title='Employer',
                  yaxis_autorange='reversed',
                  height=700,
                  hoverlabel_bgcolor="white",
                  hoverlabel_font_size=14,
                  hovermode="y",
                  xaxis_tickformat='$,.0f'
                 )
fig.show(config=CONFIG)

The differences are not large, but the Ministry of the Attorney General has the highest median salary among employers with 50 or more employees on the list. Note that 13 of the top 20 employers are universities.
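
The ranking above drops employers with fewer than 50 people on the list before taking the 20 largest medians. As a minimal sketch of the same idea, the median and the head count can also be computed in a single `groupby` with named aggregation, which avoids the separate `value_counts` and `merge` steps. The toy data, the name `toy`, and the threshold of 2 are assumptions for illustration only and are not part of the disclosure data.

```python
import pandas as pd

# Hypothetical toy data standing in for pss[2019]
toy = pd.DataFrame({
    'Employer': ['A', 'A', 'A', 'B', 'B', 'C'],
    'Salary Paid': [100_000, 120_000, 140_000, 150_000, 170_000, 300_000],
})

MIN_EMPLOYEES = 2  # the notebook uses 50; 2 keeps the toy example small

# One pass over the groups: median salary and number of rows per employer
summary = (toy
           .groupby('Employer')['Salary Paid']
           .agg(MedianSalaryPaid='median', NumEmployees='count')
           .reset_index()
          )

# Keep employers above the threshold, then rank by median salary
ranked = (summary[summary['NumEmployees'].ge(MIN_EMPLOYEES)]
          .nlargest(20, 'MedianSalaryPaid')
         )
print(ranked)
```

Employer `C` has the highest salary in the toy data but is excluded because it has only one row, which is the reason for the minimum head count: a single very high earner should not put a tiny employer at the top of a median ranking.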

<a id="job-titles"></a>
[Return to table of contents](#table-of-contents)

# Job titles

In [None]:
px.defaults.color_discrete_sequence = ['skyblue']

In [None]:
f"Unique job titles on the list: {pss[2019]['Job Title'].nunique():,}, Total number of employees on the list: {pss[2019].shape[0]:,}"

There are 32,868 unique job titles on the list. As a reminder, there are a total of 166,977 employees on the list.

In [None]:
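# rename_axis + reset_index(name=...) yields the same column names
# regardless of how the installed pandas version labels value_counts()
job_title_counts = (pss[2019]['Job Title']
                    .value_counts()
                    .head(20)
                    .rename_axis('Job Title')
                    .reset_index(name='NumEmployees')
                   )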
fig = px.bar(job_title_counts,
             x='NumEmployees',
             y='Job Title',
             orientation='h'
            )
fig.update_traces(hovertemplate=
                  'Job title: <b>%{y}</b><br>'+
                  'Count: <b>%{x}</b><br>',
                  texttemplate='%{x}',
                  textposition='outside',
                  cliponaxis=False
                 )
fig.update_layout(title='Top 20 most common job titles',
                  xaxis_title='Count',
                  yaxis_title='Job title',
                  yaxis_autorange='reversed',
                  height=700,
                  hoverlabel_bgcolor="white",
                  hoverlabel_font_size=14,
                  hovermode="y",
                  xaxis_tickformat=','
                 )
fig.show(config=CONFIG)

In [None]:
num_profs = pss[2019]['Job Title'].str.contains('professor|professeur', case=False).sum()
'Number of professors on the list: {:,}, Percentage of professors on the list: {:.1%}'.format(num_profs, num_profs / pss[2019].shape[0])

The most common job title on the list is "Professor", and "Associate Professor", "Assistant Professor", and "Professeur(e)" are also among the top 20. There are 17,911 employees whose job title contains "professor" or "professeur", so at least 10% of all employees on the list are professors. This is a lower bound, since professors could also be listed under job titles such as lecturer or instructor.
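
For reference, 17,911 out of 166,977 is roughly 10.7%. The sketch below shows the same case-insensitive match on a small, made-up set of titles, including a missing value and a "Lecturer" that the pattern does not catch. The series `toy_titles` and its contents are hypothetical, and `na=False` is an added assumption so that missing titles count as non-matches.

```python
import pandas as pd

# Hypothetical job titles; None stands in for a missing value
toy_titles = pd.Series([
    'Professor',
    'Associate Professor',
    'Professeur(e)',
    'Lecturer',
    'Registered Nurse',
    None,
], name='Job Title')

# Same pattern as the notebook; na=False treats missing titles as non-matches
pattern = 'professor|professeur'
is_prof = toy_titles.str.contains(pattern, case=False, na=False)

num_profs = int(is_prof.sum())
share = num_profs / len(toy_titles)
print(f'{num_profs} of {len(toy_titles)} titles match ({share:.1%})')
```

The "Lecturer" row is not counted even though it may describe teaching staff, which is why the figure in the text should be read as a lower bound.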

<a id="names"></a>
[Return to table of contents](#table-of-contents)

# Names

In [None]:
px.defaults.color_discrete_sequence = ['mediumseagreen']

In [None]:
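# rename_axis + reset_index(name=...) yields the same column names
# regardless of how the installed pandas version labels value_counts()
first_names = (pss[2019]['First Name']
               .value_counts()
               .head(10)
               .rename_axis('First Name')
               .reset_index(name='Count')
              )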

fig = px.bar(first_names,
             x='Count',
             y='First Name',
             orientation='h'
            )
fig.update_traces(hovertemplate=
                  'First name: <b>%{y}</b><br>'+
                  'Count: <b>%{x}</b><br>',
                  texttemplate='%{x}',
                  textposition='outside',
                  cliponaxis=False
                 )
fig.update_layout(title='Top 10 most common first names',
                  xaxis_title='Count',
                  yaxis_title='First Name',
                  yaxis_autorange='reversed',
                  height=700,
                  hoverlabel_bgcolor="white",
                  hoverlabel_font_size=14,
                  hovermode="y",
                  xaxis_tickformat=','
                 )
fig.show(config=CONFIG)

In [None]:
'Number of unique first names: {:,}'.format(pss[2019]['First Name'].nunique())

There are 27,006 unique first names on the list. The most common first name is Michael, appearing 3,029 times. Of the top 10 most common first names, Jennifer is the only traditionally female name.

In [None]:
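# rename_axis + reset_index(name=...) yields the same column names
# regardless of how the installed pandas version labels value_counts()
last_names = (pss[2019]['Last Name']
              .value_counts()
              .head(10)
              .rename_axis('Last Name')
              .reset_index(name='Count')
             )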

fig = px.bar(last_names,
             x='Count',
             y='Last Name',
             orientation='h'
            )
fig.update_traces(hovertemplate=
                  'Last name: <b>%{y}</b><br>'+
                  'Count: <b>%{x}</b><br>',
                  texttemplate='%{x}',
                  textposition='outside',
                  cliponaxis=False
                 )
fig.update_layout(title='Top 10 most common last names',
                  xaxis_title='Count',
                  yaxis_title='Last Name',
                  yaxis_autorange='reversed',
                  height=700,
                  hoverlabel_bgcolor="white",
                  hoverlabel_font_size=14,
                  hovermode="y",
                  xaxis_tickformat=','
                 )
fig.show(config=CONFIG)

In [None]:
'Number of unique last names: {:,}'.format(pss[2019]['Last Name'].nunique())

There are 59,590 unique last names on the list. Smith is by far the most common last name on the list.

In [None]:
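# rename_axis + reset_index(name=...) yields the same column names
# regardless of how the installed pandas version labels value_counts()
names = (pss[2019]['Name']
         .value_counts()
         .head(10)
         .rename_axis('Name')
         .reset_index(name='Count')
        )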

fig = px.bar(names,
             x='Count',
             y='Name',
             orientation='h'
            )
fig.update_traces(hovertemplate=
                  'Name: <b>%{y}</b><br>'+
                  'Count: <b>%{x}</b><br>',
                  texttemplate='%{x}',
                  textposition='outside',
                  cliponaxis=False
                 )
fig.update_layout(title='Top 10 most common names',
                  xaxis_title='Count',
                  yaxis_title='Name',
                  yaxis_autorange='reversed',
                  height=700,
                  hoverlabel_bgcolor="white",
                  hoverlabel_font_size=14,
                  hovermode="y",
                  xaxis_tickformat=','
                 )
fig.show(config=CONFIG)

In [None]:
'Number of unique names: {:,}'.format(pss[2019]['Name'].nunique())

There are 158,546 unique names on the list. David Smith is the most common name, and six of the top 10 most common names have a last name of Smith.
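
With 166,977 rows and 158,546 unique names, 166,977 − 158,546 = 8,431 rows repeat a name that already appears elsewhere on the list. The sketch below shows two ways of counting such repeats on a made-up column; `toy_names` is hypothetical. Note that a repeated name could be different people or the same person reported by more than one employer, and the names alone cannot distinguish the two.

```python
import pandas as pd

# Hypothetical names standing in for pss[2019]['Name']
toy_names = pd.Series(
    ['David Smith', 'David Smith', 'Jane Doe', 'David Smith', 'Ann Lee', 'Jane Doe'],
    name='Name',
)

num_rows = len(toy_names)
num_unique = toy_names.nunique()

# Rows beyond the first occurrence of each name
num_repeats = num_rows - num_unique

# Rows whose name appears more than once anywhere in the column
num_shared = int(toy_names.duplicated(keep=False).sum())

print(num_rows, num_unique, num_repeats, num_shared)
```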

<a id="misc"></a>
[Return to table of contents](#table-of-contents)

# Miscellaneous

In [None]:
px.defaults.color_discrete_sequence = ['lightcoral']

In [None]:
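# Job title is exactly "Mayor" (or contains "Waterloo Mayor"), within the municipal sector
is_mayor = pss[2019]['Job Title'].str.contains('^Mayor$|Waterloo Mayor', case=False)
is_municipal = pss[2019]['Sector'].str.contains('municipalities', case=False)
mayors = pss[2019][is_mayor & is_municipal].sort_values('Salary Paid')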
fig = px.bar(mayors,
             x='Salary Paid',
             y='Name',
             orientation='h',
             custom_data=['Employer']
            )
fig.update_traces(hovertemplate=
                  'Name:          <b>%{y}</b><br>'+
                  'Employer:    <b>%{customdata[0]}</b><br>'+
                  'Salary Paid: <b>%{x}</b>',                  
                  texttemplate='%{x}',
                  textposition='outside',
                  cliponaxis=False
                 )
fig.update_layout(title='Salaries of mayors in Ontario',
                  xaxis_title='Salary Paid',
                  yaxis_title='Name',
                  height=1000,
                  hoverlabel_bgcolor="white",
                  hoverlabel_font_size=14,
                  hovermode="y",
                  xaxis_tickformat='$,.0f'
                 )
fig.show(config=CONFIG)

Markham mayor Frank Scarpitti tops the list, while Toronto mayor John Tory is third.
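
The mayor filter relies on the pattern `^Mayor$|Waterloo Mayor`. The first alternative is anchored at both ends, so it only accepts titles that are exactly "Mayor", while the second is unanchored and matches anywhere in the title. The sketch below illustrates this with Python's `re` module, on the assumption that `str.contains` searches anywhere in the string much like `re.search`. The titles are made up for illustration.

```python
import re

# Same pattern as the notebook; IGNORECASE mirrors case=False
MAYOR_PATTERN = re.compile(r'^Mayor$|Waterloo Mayor', flags=re.IGNORECASE)

# Hypothetical job titles
titles = [
    'Mayor',
    'mayor',
    'Deputy Mayor',
    'Waterloo Mayor',
    'Mayor and Chief Executive Officer',
    'Executive Assistant to the Mayor',
]

# search() looks for the pattern anywhere; ^ and $ pin the first alternative
matched = [title for title in titles if MAYOR_PATTERN.search(title)]
print(matched)
```

Titles such as "Deputy Mayor" are excluded, but so is any mayor whose title carries extra words, so the chart may not include every mayor on the list.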

Since I attend the University of Toronto, let's take a look at its 20 highest-paid employees.

In [None]:
uoft = pss[2019][pss[2019]['Employer'] == 'University Of Toronto'].nlargest(20, 'Salary Paid')
fig = px.bar(uoft,
             x='Salary Paid',
             y='Name',
             orientation='h',
             custom_data=['Employer', 'Job Title']
            )
fig.update_traces(hovertemplate=
                  'Name:       <b>%{y}</b><br>'+
                  'Employer: <b>%{customdata[0]}</b><br>'+
                  'Job Title:   <b>%{customdata[1]}</b><br>'+
                  'Salary:      <b>%{x}</b>',
                  textposition='outside',
                  texttemplate='%{x}',
                  cliponaxis=False
                 )
fig.update_layout(title='Top 20 highest paid employees at University of Toronto',
                  yaxis_title='Name',
                  yaxis_autorange='reversed',
                  xaxis_title='Salary Paid',
                  xaxis_tickformat='$,.0f',
                  height=700,
                  hoverlabel_bgcolor="white",
                  hoverlabel_font_size=14,
                  hovermode="y"                  
                 )
fig.show(config=CONFIG)

Professors of strategic management, finance, and accounting make the top 20. The president, Meric Gertler, is only the 13th highest-paid employee of the University of Toronto.

Getting more specific, I am a student in the Department of Mathematical and Computational Sciences at the University of Toronto Mississauga (UTM). Below is a chart of the salaries of all employees earning \\(\$\\)100,000 or more whose job title mentions the Department of Mathematical and Computational Sciences.

In [None]:
# Employees whose job title mentions the department
is_mcs = pss[2019]['Job Title'].str.contains('Mathematical and Computational Sciences', case=False)
mcs_employees = pss[2019][is_mcs].sort_values('Salary Paid')
fig = px.bar(mcs_employees,
             x='Salary Paid',
             y='Name',
             orientation='h',
             custom_data=['Taxable Benefits']
            )
fig.update_traces(hovertemplate=
                  'Name:                   <b>%{y}</b><br>'+
                  'Salary Paid:          <b>%{x}</b><br>'+
                  'Taxable Benefits:  <b>%{customdata[0]:$,.0f}</b>',
                  texttemplate='%{x}',
                  textposition='outside',
                  cliponaxis=False
                 )
fig.update_layout(title='Salaries of employees at the department of Mathematical and <br>Computational Sciences at UTM',
                  xaxis_title='Salary Paid',
                  yaxis_title='Name',
                  height=1000,
                  hoverlabel_bgcolor="white",
                  hoverlabel_font_size=14,
                  hovermode="y",
                  xaxis_tickformat='$,.0f'
                 )
fig.show(config=CONFIG)

<a id="references"></a>
[Return to table of contents](#table-of-contents)

# References

1. https://www.ontario.ca/page/public-sector-salary-disclosure
1. https://www.ontario.ca/page/public-sector-salary-disclosure-background-and-faq
1. https://www.canada.ca/en/revenue-agency/services/forms-publications/publications/t4130/employers-guide-taxable-benefits-allowances.html
1. https://turbotax.intuit.ca/tips/common-taxable-benefits-in-canada-344
1. https://www.alberta.ca/public-sector-body-compensation-disclosure.aspx
1. https://www.bankofcanada.ca/rates/related/inflation-calculator/
1. https://www.opg.com/story/seasoned-energy-executive-ready-to-lead-opg/
1. https://www.opg.com/powering-ontario/our-generation/