## Business Understanding
The San Francisco Controller's Office has maintained an Employee Compensation database since fiscal year 2013.
The salary and benefits data, available as a CSV file, are summarized and presented in the Employee Compensation report at http://openbook.sfgov.org.
Compensation is central to any HR department's or city government's efforts to attract, maintain, and retain an effective workforce.
Tracking and measuring employee compensation helps ensure equity, cost control, and compliance with government regulations.
It also supports the design of compensation plans that compete with established labour markets, reward employee contributions
for desired results, promote the acquisition and upgrading of knowledge and skills, support teamwork, and increase
workforce engagement so that employees go the extra mile for the city organization.

In [2]:
# Load the Employee Compensation data into a DataFrame
import pandas as pd

In [4]:
data_df = pd.read_csv('/Users/mtran/Documents/SMUDataSciences/2019-Summer/data/Employee_Compensation.csv')



In [5]:
data_df.shape

(831992, 22)
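Department Code mixes alphabetic codes such as DPH with numeric ones such as 240658, so pandas may infer mixed types while reading the file. The following is a minimal sketch of one way to load it, assuming the code columns are best treated as strings. `load_compensation`, `CODE_COLUMNS`, and the two-row in-memory CSV are illustrative names and data, not part of the original notebook.

```python
import io
import pandas as pd

# Assumption: these columns hold identifiers, not quantities, so they are
# read as strings to avoid mixed int/str values such as "DPH" and "240658".
CODE_COLUMNS = ["Department Code", "Job Family Code", "Job Code"]

def load_compensation(source):
    """Read the compensation CSV with the code columns forced to str."""
    return pd.read_csv(source, dtype={col: str for col in CODE_COLUMNS})

# Small made-up stand-in for Employee_Compensation.csv.
sample_csv = io.StringIO(
    "Year Type,Year,Department Code,Job Family Code,Job Code,Salaries\n"
    "Calendar,2019,DPH,2300,P103,11091.57\n"
    "Calendar,2019,240658,0,420C,674.28\n"
)
sample_df = load_compensation(sample_csv)
print(sample_df.dtypes)
```

The same function could be pointed at the real file path instead of the in-memory sample.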

## Data Meaning Type

Reference: https://dev.socrata.com/foundry/data.sfgov.org/88g8-5mnd

- Year Type: Fiscal (July through June) or Calendar (January through December).
- Year: An accounting period of 12 months.
- Organization Group: A group of Departments.
- Department: The primary organizational unit used by the City and County of San Francisco.
- Union: Represents employees in collective bargaining agreements.
- Job Family: Combines similar Jobs into meaningful groups.
- Job: Defined by the Human Resources classification unit.
- Employee Identifier: Represents one employee.
- Salaries: Normal salaries paid to permanent or temporary City employees.
- Overtime: Amounts paid to City employees working in excess of 40 hours per week.
- Other Salaries: Various irregular payments made to City employees, including premium pay, incentive pay, or other one-time payments.
- Total Salary: The sum of all salaries paid to City employees.
- Retirement: City contributions to employee retirement plans.
- Health and Dental: Pro-rated citywide average premiums to health and dental insurance plans (not employee-specific).
- Other Benefits: Mandatory benefits paid on behalf of employees, such as Social Security (FICA and Medicare) contributions, unemployment insurance premiums, and minor discretionary benefits.
- Total Benefits: The sum of all benefits paid to City employees.
- Total Compensation: The sum of all salaries and benefits paid to City employees.
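The definitions of the three total columns suggest a simple consistency check. The sketch below assumes that Total Salary is the sum of Salaries, Overtime, and Other Salaries, and that Total Benefits is the sum of Retirement, Health and Dental, and Other Benefits; the data dictionary does not spell out these decompositions, so treat them as assumptions. `check_totals` and the tolerance `tol` are illustrative; the two rows are copied from the sample records shown in this notebook's profile output.

```python
import pandas as pd

# Assumed components of each total (not confirmed by the data dictionary).
SALARY_PARTS = ["Salaries", "Overtime", "Other Salaries"]
BENEFIT_PARTS = ["Retirement", "Health and Dental", "Other Benefits"]

def check_totals(df, tol=0.01):
    """Return a boolean frame: True where a stated total matches its parts."""
    salary_sum = df[SALARY_PARTS].sum(axis=1)
    benefit_sum = df[BENEFIT_PARTS].sum(axis=1)
    comp_sum = df["Total Salary"] + df["Total Benefits"]
    return pd.DataFrame({
        "salary_ok": (df["Total Salary"] - salary_sum).abs() <= tol,
        "benefits_ok": (df["Total Benefits"] - benefit_sum).abs() <= tol,
        "compensation_ok": (df["Total Compensation"] - comp_sum).abs() <= tol,
    })

# Two records taken from the sample rows in the profile report.
records = pd.DataFrame({
    "Salaries": [674.28, 47083.61],
    "Overtime": [0.0, 19066.4],
    "Other Salaries": [5.76, 0.0],
    "Total Salary": [680.04, 66150.01],
    "Retirement": [130.91, 10587.54],
    "Health and Dental": [0.0, 10754.78],
    "Other Benefits": [53.86, 5247.49],
    "Total Benefits": [184.77, 41248.23],
    "Total Compensation": [864.81, 107398.24],
})
checks = check_totals(records)
print(checks)
```

The first record passes all three checks. The second fails the benefits check: its three benefit columns sum to 26,589.81 against a stated Total Benefits of 41,248.23, so either the assumed decomposition is incomplete or the record needs review.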

In [None]:
data_df.dtypes

In [6]:
# Data Distribution
data_df.describe()

Unnamed: 0,Year,Organization Group Code,Union Code,Employee Identifier,Salaries,Overtime,Other Salaries,Total Salary,Retirement,Health and Dental,Other Benefits,Total Benefits,Total Compensation
count,831992.0,831992.0,831478.0,831992.0,831992.0,831992.0,831992.0,831992.0,831992.0,831992.0,831992.0,831992.0,831992.0
mean,2016.522378,3.883267,489.3665,4228317.0,51959.637518,4005.165054,2717.734969,58682.32924,10183.409841,9302.689867,3892.151278,23184.343952,81866.673192
std,1.880542,2.184496,331.603228,4275581.0,46842.544963,10917.224922,6300.872814,54150.859283,9738.314186,7441.110967,4023.968278,19837.415699,72569.541793
min,2013.0,1.0,1.0,1.0,-68771.78,-18458.15,-19131.1,-68771.78,-30621.43,-2940.47,-10636.5,-21295.15,-74082.61
25%,2015.0,2.0,250.0,27978.0,3850.23,0.0,0.0,4463.8,0.0,2162.4,302.58,1033.4775,6030.5575
50%,2017.0,4.0,535.0,56007.0,50464.435,0.0,134.34,54876.81,10132.48,10754.78,2958.58,24844.105,81447.9
75%,2018.0,6.0,790.0,8553170.0,82940.01,1838.2975,2554.8325,92894.3925,16833.685,13037.64,6257.01,39256.4075,132136.33
max,2028.0,7.0,990.0,10710140.0,631952.71,309897.2,342802.63,637457.58,118296.72,36609.5,37198.6,154070.45,790326.78


In [8]:
# Check Missing Data
total = data_df.isnull().sum().sort_values(ascending = False)
percent = (data_df.isnull().sum() / data_df.isnull().count() * 100).sort_values(ascending = False)
pd.concat([total, percent], axis = 1, keys = ['Total', 'Percent']).transpose()

Unnamed: 0,Department,Department Code,Union Code,Union,Job,Total Compensation,Job Family,Year,Organization Group Code,Organization Group,...,Total Benefits,Employee Identifier,Salaries,Overtime,Other Salaries,Total Salary,Retirement,Health and Dental,Other Benefits,Year Type
Total,408755.0,28944.0,514.0,514.0,3.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Percent,49.129679,3.47888,0.061779,0.061779,0.000361,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


## Data Quality

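- Decide whether to use the Calendar-year (CY) or Fiscal-year (FY) records; keeping one year type cuts the row count roughly in half (497,457 Calendar and 334,535 Fiscal rows).
- Department Code is inconsistent in the CY records and contains blanks regardless of year.
- Department is about 49% blank (408,755 rows) but can be filled in from the reference list at https://data.sfgov.org/City-Management-and-Ethics/Reference-Department-Code-List/j2hz-23ps/data
- Union descriptions are inconsistent (for example, "SEIU, Local 1021, Misc" and "SEIU - Miscellaneous, Local 1021").
- Job Family contains NULL and Unassigned values.
- Job is missing in 3 rows.
- Negative values appear in the salary and benefit columns (Salaries, Retirement, etc.); the summary statistics show a negative minimum for Health and Dental as well (-2940.47).
- Year 2028 appears in 3 rows and is presumably a typo.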
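Several of these issues can be handled in one cleaning pass. The sketch below is a possible approach, not the notebook's actual method: `clean_compensation`, the `max_year=2019` cutoff, the `has_negative` flag, and the one-entry `dept_lookup` dictionary are all assumptions for illustration. In practice the lookup would be built from the Reference Department Code List.

```python
import pandas as pd

MONEY_COLUMNS = [
    "Salaries", "Overtime", "Other Salaries", "Total Salary",
    "Retirement", "Health and Dental", "Other Benefits",
    "Total Benefits", "Total Compensation",
]

def clean_compensation(df, year_type="Fiscal", max_year=2019, dept_lookup=None):
    """Keep one year type, drop implausible years, fill Department, flag negatives."""
    out = df[df["Year Type"] == year_type].copy()
    # Assumption: years after max_year (such as 2028) are data-entry errors.
    out = out[out["Year"] <= max_year]
    if dept_lookup is not None:
        # Fill blank Department names from the code -> name mapping.
        filled = out["Department Code"].map(dept_lookup)
        out["Department"] = out["Department"].fillna(filled)
    present = [col for col in MONEY_COLUMNS if col in out.columns]
    # Flag rather than drop: negative amounts may be legitimate adjustments.
    out["has_negative"] = (out[present] < 0).any(axis=1)
    return out

# Made-up rows that exercise each rule.
toy = pd.DataFrame({
    "Year Type": ["Fiscal", "Fiscal", "Calendar", "Fiscal"],
    "Year": [2019, 2028, 2019, 2018],
    "Department Code": ["DPH", "CRT", "DPH", "MTA"],
    "Department": [None, None, "DPH Public Health", "MTA Municipal Transprtn Agncy"],
    "Salaries": [100.0, 674.28, 50.0, -25.0],
})
cleaned = clean_compensation(toy, dept_lookup={"DPH": "DPH Public Health"})
print(cleaned)
```

Negative amounts are flagged instead of removed because the source does not say whether they are errors or corrections to earlier payments.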

## Simple Statistics

In [10]:
import pandas_profiling
pandas_profiling.ProfileReport(data_df)

0,1
Number of variables,22
Number of observations,831992
Total Missing (%),2.4%
Total size in memory,139.6 MiB
Average record size in memory,176.0 B

0,1
Numeric,9
Categorical,9
Boolean,0
Date,0
Text (Unique),0
Rejected,4
Unsupported,0

### Department
0,1
Distinct count,57
Unique (%),0.0%
Missing (%),49.1%
Missing (n),408755

0,1
DPH Public Health,91972
MTA Municipal Transprtn Agncy,62214
HSA Human Services Agency,34435
Other values (53),234616
(Missing),408755

Value,Count,Frequency (%),Unnamed: 3
DPH Public Health,91972,0.0%,
MTA Municipal Transprtn Agncy,62214,0.0%,
HSA Human Services Agency,34435,0.0%,
POL Police,33548,0.0%,
REC Recreation & Park Commsn,23744,0.0%,
AIR Airport Commission,19721,0.0%,
FIR Fire Department,17680,0.0%,
DPW GSA - Public Works,16410,0.0%,
SHF Sheriff,11040,0.0%,
ADM Gen Svcs Agency-City Admin,10733,0.0%,

### Department Code
0,1
Distinct count,113
Unique (%),0.0%
Missing (%),3.5%
Missing (n),28944

0,1
DPH,140389
MTA,95416
POL,51284
Other values (109),515959

Value,Count,Frequency (%),Unnamed: 3
DPH,140389,0.0%,
MTA,95416,0.0%,
POL,51284,0.0%,
REC,35733,0.0%,
DSS,34435,0.0%,
AIR,29987,0.0%,
__NOT_APPLICABLE__,28940,0.0%,
240658,28450,0.0%,
FIR,27798,0.0%,
DPW,25830,0.0%,

### Employee Identifier
0,1
Distinct count,104180
Unique (%),0.0%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,4228300
Minimum,1
Maximum,10710137
Zeros (%),0.0%

0,1
Minimum,1
5-th percentile,5622
Q1,27978
Median,56007
Q3,8553200
95-th percentile,8598100
Maximum,10710137
Range,10710136
Interquartile range,8525200

0,1
Standard deviation,4275600
Coef of variation,1.0112
Kurtosis,-1.9934
Mean,4228300
MAD,4272900
Skewness,0.037932
Sum,3517925966428
Variance,1.8281e+13
Memory size,6.3 MiB

Value,Count,Frequency (%),Unnamed: 3
8526577,32,0.0%,
8526576,31,0.0%,
8503228,28,0.0%,
8526569,26,0.0%,
21259,24,0.0%,
22162,24,0.0%,
46461,24,0.0%,
538,23,0.0%,
51801,23,0.0%,
2026,23,0.0%,

Value,Count,Frequency (%),Unnamed: 3
1,10,0.0%,
2,12,0.0%,
3,7,0.0%,
4,10,0.0%,
5,4,0.0%,

Value,Count,Frequency (%),Unnamed: 3
10690144,3,0.0%,
10690145,3,0.0%,
10690146,3,0.0%,
10700146,3,0.0%,
10710137,3,0.0%,

### Health and Dental
0,1
Distinct count,169534
Unique (%),0.0%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,9302.7
Minimum,-2940.5
Maximum,36610
Zeros (%),0.0%

0,1
Minimum,-2940.5
5-th percentile,0.0
Q1,2162.4
Median,10755.0
Q3,13038.0
95-th percentile,28733.0
Maximum,36610.0
Range,39550.0
Interquartile range,10875.0

0,1
Standard deviation,7441.1
Coef of variation,0.79989
Kurtosis,1.0748
Mean,9302.7
MAD,5813.1
Skewness,0.83467
Sum,7739800000
Variance,55370000
Memory size,6.3 MiB

Value,Count,Frequency (%),Unnamed: 3
0.0,145325,0.0%,
7566.94,14913,0.0%,
12424.5,11315,0.0%,
13054.94,11096,0.0%,
13371.04,10962,0.0%,
12918.24,10024,0.0%,
10754.78,9094,0.0%,
12801.79,8672,0.0%,
13765.55,8483,0.0%,
10754.8,8244,0.0%,

Value,Count,Frequency (%),Unnamed: 3
-2940.47,1,0.0%,
-1427.89,1,0.0%,
-1245.26,1,0.0%,
-847.92,1,0.0%,
-563.31,1,0.0%,

Value,Count,Frequency (%),Unnamed: 3
36369.92,8,0.0%,
36369.94,10,0.0%,
36369.96,4,0.0%,
36369.98,2,0.0%,
36609.5,2,0.0%,

### Job
0,1
Distinct count,1321
Unique (%),0.0%
Missing (%),< 0.1%
Missing (n),3

0,1
Transit Operator,53155
Special Nurse,28420
Registered Nurse,25890
Other values (1317),724524

Value,Count,Frequency (%),Unnamed: 3
Transit Operator,53155,0.0%,
Special Nurse,28420,0.0%,
Registered Nurse,25890,0.0%,
Firefighter,16862,0.0%,
Custodian,16211,0.0%,
Police Officer 3,15365,0.0%,
Public Service Trainee,15262,0.0%,
Police Officer,13908,0.0%,
Recreation Leader,13125,0.0%,
Public Svc Aide-Public Works,11748,0.0%,

### Job Code
0,1
Distinct count,1185
Unique (%),0.0%
Missing (%),0.0%
Missing (n),0

0,1
9163,53155
P103,28420
2320,25890
Other values (1182),724527

Value,Count,Frequency (%),Unnamed: 3
9163,53155,0.0%,
P103,28420,0.0%,
2320,25890,0.0%,
H002,16862,0.0%,
2708,16211,0.0%,
Q004,15365,0.0%,
9910,15262,0.0%,
Q002,13908,0.0%,
3279,13125,0.0%,
9916,11748,0.0%,

### Job Family
0,1
Distinct count,59
Unique (%),0.0%
Missing (%),0.0%
Missing (n),0

0,1
Nursing,82598
Street Transit,70075
Police Services,52573
Other values (56),626746

Value,Count,Frequency (%),Unnamed: 3
Nursing,82598,0.0%,
Street Transit,70075,0.0%,
Police Services,52573,0.0%,
Journeyman Trade,49438,0.0%,
Human Services,43396,0.0%,
Public Service Aide,35469,0.0%,
"Clerical, Secretarial & Steno",35249,0.0%,
Fire Services,33920,0.0%,
Housekeeping & Laundry,26609,0.0%,
"Budget, Admn & Stats Analysis",26164,0.0%,

### Job Family Code
0,1
Distinct count,59
Unique (%),0.0%
Missing (%),0.0%
Missing (n),0

0,1
2300,82598
9100,70075
Q000,52573
Other values (56),626746

Value,Count,Frequency (%),Unnamed: 3
2300,82598,0.0%,
9100,70075,0.0%,
Q000,52573,0.0%,
7300,49438,0.0%,
2900,43396,0.0%,
9900,35469,0.0%,
1400,35249,0.0%,
H000,33920,0.0%,
2700,26609,0.0%,
1800,26164,0.0%,

### Organization Group
0,1
Distinct count,7
Unique (%),0.0%
Missing (%),0.0%
Missing (n),0

0,1
"Public Works, Transportation & Commerce",204082
General City Responsibilities,188665
Community Health,140389
Other values (4),298856

Value,Count,Frequency (%),Unnamed: 3
"Public Works, Transportation & Commerce",204082,0.0%,
General City Responsibilities,188665,0.0%,
Community Health,140389,0.0%,
Public Protection,124013,0.0%,
General Administration & Finance,58782,0.0%,
Human Welfare & Neighborhood Development,58269,0.0%,
Culture & Recreation,57792,0.0%,

### Organization Group Code
0,1
Distinct count,7
Unique (%),0.0%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,3.8833
Minimum,1
Maximum,7
Zeros (%),0.0%

0,1
Minimum,1
5-th percentile,1
Q1,2
Median,4
Q3,6
95-th percentile,7
Maximum,7
Range,6
Interquartile range,4

0,1
Standard deviation,2.1845
Coef of variation,0.56254
Kurtosis,-1.4036
Mean,3.8833
MAD,1.9072
Skewness,0.22765
Sum,3230847
Variance,4.772
Memory size,6.3 MiB

Value,Count,Frequency (%),Unnamed: 3
2,204082,0.0%,
7,188665,0.0%,
4,140389,0.0%,
1,124013,0.0%,
6,58782,0.0%,
3,58269,0.0%,
5,57792,0.0%,

Value,Count,Frequency (%),Unnamed: 3
1,124013,0.0%,
2,204082,0.0%,
3,58269,0.0%,
4,140389,0.0%,
5,57792,0.0%,

Value,Count,Frequency (%),Unnamed: 3
3,58269,0.0%,
4,140389,0.0%,
5,57792,0.0%,
6,58782,0.0%,
7,188665,0.0%,

### Other Benefits
0,1
Distinct count,406990
Unique (%),0.0%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,3892.2
Minimum,-10636
Maximum,37199
Zeros (%),0.0%

0,1
Minimum,-10636.0
5-th percentile,0.0
Q1,302.58
Median,2958.6
Q3,6257.0
95-th percentile,10130.0
Maximum,37199.0
Range,47835.0
Interquartile range,5954.4

0,1
Standard deviation,4024
Coef of variation,1.0339
Kurtosis,6.5465
Mean,3892.2
MAD,3147.8
Skewness,1.7813
Sum,3238200000
Variance,16192000
Memory size,6.3 MiB

Value,Count,Frequency (%),Unnamed: 3
0.0,151571,0.0%,
0.01,222,0.0%,
-0.01,144,0.0%,
0.02,127,0.0%,
0.04,95,0.0%,
0.06,91,0.0%,
-0.02,80,0.0%,
0.03,75,0.0%,
0.07,65,0.0%,
0.16,64,0.0%,

Value,Count,Frequency (%),Unnamed: 3
-10636.5,1,0.0%,
-10469.09,1,0.0%,
-9857.74,1,0.0%,
-9717.34,1,0.0%,
-9697.77,1,0.0%,

Value,Count,Frequency (%),Unnamed: 3
36587.58,1,0.0%,
36632.28,3,0.0%,
36669.61,1,0.0%,
36815.39,1,0.0%,
37198.6,3,0.0%,

### Other Salaries
0,1
Distinct count,248360
Unique (%),0.0%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,2717.7
Minimum,-19131
Maximum,342800
Zeros (%),0.0%

0,1
Minimum,-19131.0
5-th percentile,0.0
Q1,0.0
Median,134.34
Q3,2554.8
95-th percentile,14281.0
Maximum,342800.0
Range,361930.0
Interquartile range,2554.8

0,1
Standard deviation,6300.9
Coef of variation,2.3184
Kurtosis,129.72
Mean,2717.7
MAD,3555.5
Skewness,7.4911
Sum,2261100000
Variance,39701000
Memory size,6.3 MiB

Value,Count,Frequency (%),Unnamed: 3
0.0,365313,0.0%,
250.0,3709,0.0%,
600.0,3643,0.0%,
1500.0,2741,0.0%,
624.0,2561,0.0%,
528.0,2394,0.0%,
626.4,1749,0.0%,
2.4,1638,0.0%,
40.0,1313,0.0%,
11.54,1027,0.0%,

Value,Count,Frequency (%),Unnamed: 3
-19131.1,1,0.0%,
-7058.59,1,0.0%,
-6838.0,1,0.0%,
-4368.0,1,0.0%,
-4318.0,1,0.0%,

Value,Count,Frequency (%),Unnamed: 3
229344.52,1,0.0%,
239294.57,1,0.0%,
244899.02,1,0.0%,
336726.34,1,0.0%,
342802.63,1,0.0%,

### Overtime
0,1
Distinct count,203653
Unique (%),0.0%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,4005.2
Minimum,-18458
Maximum,309900
Zeros (%),0.0%

0,1
Minimum,-18458.0
5-th percentile,0.0
Q1,0.0
Median,0.0
Q3,1838.3
95-th percentile,24068.0
Maximum,309900.0
Range,328360.0
Interquartile range,1838.3

0,1
Standard deviation,10917
Coef of variation,2.7258
Kurtosis,42.271
Mean,4005.2
MAD,5978.4
Skewness,5.0723
Sum,3332300000
Variance,119190000
Memory size,6.3 MiB

Value,Count,Frequency (%),Unnamed: 3
0.0,519527,0.0%,
2.49,124,0.0%,
47.5,94,0.0%,
0.02,87,0.0%,
0.01,62,0.0%,
283.85,61,0.0%,
41.33,53,0.0%,
12.94,53,0.0%,
4.84,48,0.0%,
370.01,45,0.0%,

Value,Count,Frequency (%),Unnamed: 3
-18458.15,1,0.0%,
-12308.66,1,0.0%,
-1072.88,2,0.0%,
-611.75,1,0.0%,
-487.46,1,0.0%,

Value,Count,Frequency (%),Unnamed: 3
261717.31,1,0.0%,
264618.29,2,0.0%,
265139.1,1,0.0%,
304546.25,2,0.0%,
309897.2,1,0.0%,

### Retirement (rejected)
0,1
Correlation,0.94634

### Salaries
0,1
Distinct count,389663
Unique (%),0.0%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,51960
Minimum,-68772
Maximum,631950
Zeros (%),0.0%

0,1
Minimum,-68772.0
5-th percentile,0.0
Q1,3850.2
Median,50464.0
Q3,82940.0
95-th percentile,133480.0
Maximum,631950.0
Range,700720.0
Interquartile range,79090.0

0,1
Standard deviation,46843
Coef of variation,0.90152
Kurtosis,0.4563
Mean,51960
MAD,39054
Skewness,0.74139
Sum,43230000000
Variance,2194200000
Memory size,6.3 MiB

Value,Count,Frequency (%),Unnamed: 3
0.0,136959,0.0%,
59300.0,1103,0.0%,
80800.0,528,0.0%,
230.3,527,0.0%,
53667.9,520,0.0%,
32242.0,424,0.0%,
56531.0,388,0.0%,
53675.0,385,0.0%,
54703.0,339,0.0%,
404.6,332,0.0%,

Value,Count,Frequency (%),Unnamed: 3
-68771.78,1,0.0%,
-33808.2,1,0.0%,
-18437.73,1,0.0%,
-17635.32,1,0.0%,
-9942.33,2,0.0%,

Value,Count,Frequency (%),Unnamed: 3
527343.14,1,0.0%,
533985.94,1,0.0%,
537847.86,3,0.0%,
630751.46,2,0.0%,
631952.71,1,0.0%,

### Total Benefits (rejected)
0,1
Correlation,0.94109

### Total Compensation (rejected)
0,1
Correlation,0.94733

### Total Salary (rejected)
0,1
Correlation,0.96934

### Union
0,1
Distinct count,130
Unique (%),0.0%
Missing (%),0.1%
Missing (n),514

0,1
"SEIU, Local 1021, Misc",154919
"SEIU - Miscellaneous, Local 1021",119630
"Prof & Tech Eng, Local 21",61708
Other values (126),495221

Value,Count,Frequency (%),Unnamed: 3
"SEIU, Local 1021, Misc",154919,0.0%,
"SEIU - Miscellaneous, Local 1021",119630,0.0%,
"Prof & Tech Eng, Local 21",61708,0.0%,
"Prof & Tech Engineers - Miscellaneous, Local 21",54387,0.0%,
"SEIU - Staff and Per Diem Nurses, Local 1021",31916,0.0%,
"SEIU, Local 1021, RN",28847,0.0%,
Police Officers' Association,27406,0.0%,
"TWU, Local 250-A, TransitOpr",26898,0.0%,
"Transport Workers - Transit Operators, Local 250-A",26257,0.0%,
POA,25480,0.0%,

### Union Code
0,1
Distinct count,71
Unique (%),0.0%
Missing (%),0.1%
Missing (n),514
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,489.37
Minimum,1
Maximum,990
Zeros (%),0.0%

0,1
Minimum,1
5-th percentile,21
Q1,250
Median,535
Q3,790
95-th percentile,911
Maximum,990
Range,989
Interquartile range,540

0,1
Standard deviation,331.6
Coef of variation,0.67762
Kurtosis,-1.6068
Mean,489.37
MAD,310.22
Skewness,-0.24403
Sum,406900000
Variance,109960
Memory size,6.3 MiB

Value,Count,Frequency (%),Unnamed: 3
790.0,234079,0.0%,
21.0,110851,0.0%,
791.0,60763,0.0%,
253.0,53155,0.0%,
911.0,52886,0.0%,
250.0,45532,0.0%,
535.0,37013,0.0%,
798.0,32837,0.0%,
351.0,25745,0.0%,
261.0,24578,0.0%,

Value,Count,Frequency (%),Unnamed: 3
1.0,9050,0.0%,
2.0,1794,0.0%,
3.0,1233,0.0%,
4.0,2738,0.0%,
6.0,17595,0.0%,

Value,Count,Frequency (%),Unnamed: 3
930.0,1474,0.0%,
933.0,1035,0.0%,
965.0,534,0.0%,
969.0,26,0.0%,
990.0,44,0.0%,

### Year
0,1
Distinct count,8
Unique (%),0.0%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,2016.5
Minimum,2013
Maximum,2028
Zeros (%),0.0%

0,1
Minimum,2013
5-th percentile,2013
Q1,2015
Median,2017
Q3,2018
95-th percentile,2019
Maximum,2028
Range,15
Interquartile range,3

0,1
Standard deviation,1.8805
Coef of variation,0.00093257
Kurtosis,-0.90593
Mean,2016.5
MAD,1.5972
Skewness,-0.46249
Sum,1677730486
Variance,3.5364
Memory size,6.3 MiB

Value,Count,Frequency (%),Unnamed: 3
2017,198519,0.0%,
2018,166914,0.0%,
2019,130340,0.0%,
2016,88478,0.0%,
2015,86067,0.0%,
2014,82291,0.0%,
2013,79380,0.0%,
2028,3,0.0%,

Value,Count,Frequency (%),Unnamed: 3
2013,79380,0.0%,
2014,82291,0.0%,
2015,86067,0.0%,
2016,88478,0.0%,
2017,198519,0.0%,

Value,Count,Frequency (%),Unnamed: 3
2016,88478,0.0%,
2017,198519,0.0%,
2018,166914,0.0%,
2019,130340,0.0%,
2028,3,0.0%,

### Year Type
0,1
Distinct count,2
Unique (%),0.0%
Missing (%),0.0%
Missing (n),0

0,1
Calendar,497457
Fiscal,334535

Value,Count,Frequency (%),Unnamed: 3
Calendar,497457,0.0%,
Fiscal,334535,0.0%,

Unnamed: 0,Year Type,Year,Organization Group Code,Organization Group,Department Code,Department,Union Code,Union,Job Family Code,Job Family,Job Code,Job,Employee Identifier,Salaries,Overtime,Other Salaries,Total Salary,Retirement,Health and Dental,Other Benefits,Total Benefits,Total Compensation
0,Calendar,2028,1,Public Protection,CRT,,792.0,Utd Pub EmpL790 SEIU-Crt Clrks,0,Untitled,420C,Deputy Court Clerk II,8540990,674.28,0.0,5.76,680.04,130.91,0.0,53.86,184.77,864.81
1,Calendar,2028,7,General City Responsibilities,229259,,792.0,Utd Pub EmpL790 SEIU-Crt Clrks,0,Untitled,420C,Deputy Court Clerk II,8540990,674.28,0.0,5.76,680.04,130.91,0.0,53.86,184.77,864.81
2,Fiscal,2028,1,Public Protection,CRT,,792.0,Utd Pub EmpL790 SEIU-Crt Clrks,0,Untitled,420C,Deputy Court Clerk II,8540990,674.28,0.0,5.76,680.04,130.91,0.0,53.86,184.77,864.81
3,Calendar,2019,4,Community Health,DPH,,250.0,"SEIU, Local 1021, Misc",7500,Semi-Skilled & General Labor,7524,Institution Utility Worker,8507272,47083.61,19066.4,0.0,66150.01,10587.54,10754.78,5247.49,41248.23,107398.24
4,Calendar,2019,4,Community Health,DPH,,791.0,"SEIU, Local 1021, RN",2300,Nursing,P103,Special Nurse,8513113,11091.57,0.0,461.77,11553.34,0.0,0.0,689.13,689.13,12242.47


## Visualize Attributes
PCA scatterplots, box plots, and violin plots.
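As a starting point for the box plots, the sketch below tabulates the five-number summary per group, which is the information a box plot encodes (apart from whisker and outlier rules). `box_summary` and `draw_box` are hypothetical helper names, the toy values are made up, and `draw_box` assumes matplotlib is installed; it is imported lazily so the summary works without it.

```python
import pandas as pd

def box_summary(df, value_col, group_col):
    """Five-number summary (min, quartiles, max) of value_col per group."""
    grouped = df.groupby(group_col)[value_col]
    summary = grouped.quantile([0.0, 0.25, 0.5, 0.75, 1.0]).unstack()
    summary.columns = ["min", "q1", "median", "q3", "max"]
    return summary

def draw_box(df, value_col, group_col):
    """Draw one box per group; requires matplotlib (assumed available)."""
    import matplotlib.pyplot as plt
    fig, ax = plt.subplots(figsize=(8, 4))
    df.boxplot(column=value_col, by=group_col, ax=ax, rot=45)
    ax.set_ylabel(value_col)
    return fig

# Made-up values for two Organization Groups.
toy = pd.DataFrame({
    "Organization Group": ["Public Protection"] * 5 + ["Community Health"] * 3,
    "Total Compensation": [10.0, 20.0, 30.0, 40.0, 50.0, 1.0, 2.0, 3.0],
})
summary = box_summary(toy, "Total Compensation", "Organization Group")
print(summary)
```

On the full dataset, the same call with `group_col="Organization Group"` or `"Job Family"` would give a quick numeric preview before any plots are drawn.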

## Explore Joint Attributes
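- Retirement is highly correlated with Total Salary.
- Total Benefits is highly correlated with Retirement.
- Total Compensation is highly correlated with Total Benefits.
- Total Salary is highly correlated with Salaries.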
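Pairs like these can be listed directly from a Pearson correlation matrix. This is a minimal sketch: `high_correlations` is a hypothetical helper, the 0.9 cutoff is an assumed threshold, and the toy columns are made up.

```python
import pandas as pd

def high_correlations(df, threshold=0.9):
    """List numeric column pairs whose absolute Pearson r meets the threshold."""
    corr = df.select_dtypes("number").corr()
    cols = list(corr.columns)
    pairs = []
    for i, first in enumerate(cols):
        for second in cols[i + 1:]:
            r = float(corr.loc[first, second])
            if abs(r) >= threshold:
                pairs.append((first, second, round(r, 4)))
    # Strongest relationships first.
    return sorted(pairs, key=lambda pair: -abs(pair[2]))

# Made-up data: Total Salary tracks Salaries closely, Overtime does not.
toy = pd.DataFrame({
    "Salaries": [10.0, 20.0, 30.0, 40.0],
    "Total Salary": [11.0, 21.0, 32.0, 41.0],
    "Overtime": [5.0, 1.0, 4.0, 2.0],
})
pairs = high_correlations(toy)
print(pairs)
```

Identifier-like numeric columns (Employee Identifier, the code columns) would be worth dropping before running this on the real data, since their correlations carry no meaning.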


## Explore Attributes and Class
Look for interesting relationships between features, for example by Org Group or by Job, through 2028.
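One way to look for such relationships is a group-by-year table. The sketch below uses a hypothetical helper, `median_by_group_and_year`, on made-up numbers; the median is chosen as an assumption because the compensation columns are skewed.

```python
import pandas as pd

def median_by_group_and_year(df, group_col="Organization Group",
                             value_col="Total Compensation"):
    """Median of value_col for each group (rows) and Year (columns)."""
    return df.pivot_table(index=group_col, columns="Year",
                          values=value_col, aggfunc="median")

# Made-up rows for two Organization Groups over two years.
toy = pd.DataFrame({
    "Organization Group": ["Public Protection"] * 3 + ["Community Health"] * 3,
    "Year": [2018, 2018, 2019, 2018, 2019, 2019],
    "Total Compensation": [100.0, 300.0, 200.0, 50.0, 70.0, 90.0],
})
table = median_by_group_and_year(toy)
print(table)
```

Passing `group_col="Job"` would produce the same table per job title, although with more than a thousand distinct jobs it would likely need filtering first.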

## New Features

Clarify: Would "Unpaid Time Off Liability"


## Exceptional Work - Lei ??
PCA: provide additional analyses. One idea is to implement dimensionality reduction, then visualize and interpret the results.
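A minimal sketch of the dimensionality-reduction idea, using PCA computed with NumPy's SVD on standardized columns rather than a dedicated library class. `pca_svd` is a hypothetical helper and the synthetic data only mimics the kind of correlation seen between salary and retirement columns; it is not drawn from the dataset.

```python
import numpy as np
import pandas as pd

def pca_svd(df, columns, n_components=2):
    """PCA via SVD of the standardized data matrix.

    Returns (scores, components, explained_variance_ratio).
    Assumes every selected column has non-zero variance.
    """
    X = df[columns].to_numpy(dtype=float)
    X = (X - X.mean(axis=0)) / X.std(axis=0)   # z-score each column
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]   # projected rows
    explained = (S ** 2) / np.sum(S ** 2)             # variance share per PC
    return scores, Vt[:n_components], explained[:n_components]

# Synthetic data: Retirement follows Salaries, Overtime is independent.
rng = np.random.default_rng(0)
base = rng.normal(size=200)
toy = pd.DataFrame({
    "Salaries": 50000 + 20000 * base,
    "Retirement": 10000 + 4000 * base + rng.normal(scale=500, size=200),
    "Overtime": rng.normal(loc=4000, scale=1000, size=200),
})
scores, components, explained = pca_svd(toy, ["Salaries", "Retirement", "Overtime"])
print(explained)
```

The `scores` array can be scatter-plotted (first component against second) and colored by Organization Group to interpret the result; the rows of `components` show how much each original column contributes to each axis.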