# San Francisco Salaries (2011-2014)

In [None]:
import pandas as pd

** Reading in the data **

In [None]:
sal = pd.read_csv('../input/Salaries.csv')
# Convert the pay columns to numeric; values that cannot be parsed become NaN.
for col in ['BasePay', 'OvertimePay', 'OtherPay', 'Benefits']:
    sal[col] = pd.to_numeric(sal[col], errors='coerce')
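
The `errors='coerce'` argument matters if a pay column mixes numbers with text. A minimal sketch on a made-up Series (the values, including the placeholder string, are illustrative assumptions and are not read from Salaries.csv):

```python
import pandas as pd

# Hypothetical raw column: numeric strings mixed with a text placeholder.
raw = pd.Series(['167411.18', '0.00', 'Not Provided'])

# errors='coerce' turns anything unparseable into NaN instead of raising.
clean = pd.to_numeric(raw, errors='coerce')

print(clean.isna().sum())  # how many values could not be parsed
print(clean.mean())        # NaN values are skipped by default
```

Because `mean()` and `max()` skip NaN by default, the summary statistics further down are computed only over rows where the value could be parsed.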

** Checking the format of the data **

In [None]:
sal.head()

In [None]:
sal.info()

** Finding the average base pay **

In [None]:
print("Average Base Pay: ${}".format(round(sal['BasePay'].mean(), 2)))

** Finding the highest base pay **

In [None]:
print("The highest base pay is ${}".format(round(sal['BasePay'].max(), 2)))

** Finding the highest overtime pay **

In [None]:
sal['OvertimePay'].max()

** Data for the person with the highest total compensation **

In [None]:
sal[sal['TotalPayBenefits'] == sal['TotalPayBenefits'].max()]

** Data for the person with the lowest total compensation **

In [None]:
sal[sal['TotalPayBenefits'] == sal['TotalPayBenefits'].min()]

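This record shows no base pay, no overtime pay, and no benefits, and its total compensation is negative: -$618.13. The data does not explain the negative value; it could reflect a correction or a repayment.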
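
One way to see which component drives a negative total is to pull the lowest row and keep only its negative entries. A sketch on a made-up frame (the column names match the dataset, the values are hypothetical):

```python
import pandas as pd

# Hypothetical rows; only the column names come from the dataset.
toy = pd.DataFrame({
    'BasePay': [50000.0, 0.0],
    'OvertimePay': [1200.0, 0.0],
    'OtherPay': [300.0, -100.0],
    'Benefits': [15000.0, 0.0],
})
toy['TotalPayBenefits'] = toy.sum(axis=1)

# idxmin returns the index label of the row with the smallest total.
lowest = toy.loc[toy['TotalPayBenefits'].idxmin()]

# Keep only the pay components that are below zero.
negative_parts = lowest[lowest < 0].drop('TotalPayBenefits')
print(negative_parts)
```

Running the same two lines against `sal` (restricted to the pay columns) would show which field carries the negative amount.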

** Average base pay over the entire time horizon (2011-2014) as well as a per-year basis **

In [None]:
print("The average base pay from 2011-2014 was ${}".format(round(sal['BasePay'].mean(), 2)))

In [None]:
# Select the column before aggregating so non-numeric columns are never averaged.
sal.groupby('Year')['BasePay'].mean()

** Number of unique job titles **

In [None]:
print("There were {} unique job titles in this data set.".format(sal['JobTitle'].nunique()))

** Top 10 most common job titles **

In [None]:
sal['JobTitle'].value_counts()[:10]

** Job titles represented by only one person in 2013 **

In [None]:
len(sal[sal['Year'] == 2013]['JobTitle'].value_counts()[sal[sal['Year'] == 2013]['JobTitle'].value_counts() == 1])

In [None]:
sum(sal[sal['Year'] == 2013]['JobTitle'].value_counts() == 1)
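
Both cells above give the same count. A more readable variant computes the 2013 counts once and reuses them; here it is sketched on a made-up frame (the names `toy`, `counts_2013`, and `single_person_titles` are illustrative):

```python
import pandas as pd

# Hypothetical data: two clerks and one mayor in 2013, one librarian in 2014.
toy = pd.DataFrame({
    'Year': [2013, 2013, 2013, 2014],
    'JobTitle': ['Clerk', 'Clerk', 'Mayor', 'Librarian'],
})

# Count how often each title appears in 2013.
counts_2013 = toy.loc[toy['Year'] == 2013, 'JobTitle'].value_counts()

# A boolean Series sums to the number of True entries.
single_person_titles = int((counts_2013 == 1).sum())
print(single_person_titles)
```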

** Checking for correlations **

Title Length vs Base Pay

In [None]:
# .str.len() returns NaN for a missing title instead of raising an error.
sal['LenTitle'] = sal['JobTitle'].str.len()

sal[['LenTitle', 'BasePay']].corr()

Title Length vs Other Pay

In [None]:
# LenTitle was already created in the previous cell.
sal[['LenTitle', 'OtherPay']].corr()

There appears to be no meaningful linear correlation between title length and either base pay or other pay.
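
`DataFrame.corr()` returns a full matrix. When only one pair of columns is of interest, `Series.corr` gives the Pearson coefficient as a single number. A sketch on made-up data, chosen to be perfectly linear so the result is easy to check:

```python
import pandas as pd

# Hypothetical titles and pay; pay rises in step with title length.
toy = pd.DataFrame({
    'JobTitle': ['AB', 'ABCD', 'ABCDEF'],
    'BasePay': [10.0, 20.0, 30.0],
})
toy['LenTitle'] = toy['JobTitle'].str.len()

# Pearson correlation (the default method) between the two columns.
r = toy['LenTitle'].corr(toy['BasePay'])
print(round(r, 4))
```

A value near 0 means no linear relationship; it does not rule out other kinds of dependence.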

** Average Police total compensation vs Fire Department total compensation **

In [None]:
police_mean = sal[sal['JobTitle'].str.lower().str.contains('police')]['TotalPayBenefits'].mean()
fire_mean = sal[sal['JobTitle'].str.lower().str.contains('fire')]['TotalPayBenefits'].mean()
print("On average, people whose title includes 'police' make ${:,.2f} in total compensation.".format(police_mean))
print("On average, people whose title includes 'fire' make ${:,.2f} in total compensation.\n".format(fire_mean))

pct_diff = (fire_mean - police_mean) * 100 / police_mean
print("People whose title includes 'fire' have a {:.2f}% higher total compensation than those whose title includes 'police'.".format(pct_diff))
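
Substring matching is only a rough proxy for department, so it helps to look at which titles a pattern actually catches. A sketch with made-up titles (not taken from the dataset), using `case=False` in place of `.str.lower()`:

```python
import pandas as pd

# Hypothetical titles in mixed case.
titles = pd.Series(['FIREFIGHTER', 'Police Officer', 'Fire Alarm Dispatcher', 'Clerk'])

# case=False makes the match case-insensitive without lowercasing first.
is_fire = titles.str.contains('fire', case=False)
print(titles[is_fire].tolist())
```

Applying the same mask to `sal['JobTitle']` and calling `.unique()` on the result lists every title counted in each group.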