## **Exploration of San Francisco Salaries Dataset**
This notebook addresses the following questions about the data:
1. What is the average BasePay?
2. What is the highest amount of OvertimePay?
3. What is the name of the highest-paid person (including benefits)?
4. What is the name of the lowest-paid person (including benefits)?
5. What was the average (mean) BasePay of all employees per year (2011-2014)?
6. What are the top 5 most common jobs?
7. How many people have the word "Manager" in their job title?
8. Is there a correlation between the length of the job title string and salary?




In [None]:
import numpy as np
import pandas as pd

import warnings
warnings.filterwarnings('ignore')

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

In [None]:
sal = pd.read_csv("/kaggle/input/sf-salaries/Salaries.csv")

In [None]:
sal.head()

In [None]:
sal.info()

In [None]:
def str_to_numeric(x, col):
    '''
    Convert a value from column `col` to a number.
    "Not Provided" becomes NaN; Id and Year become int, all other columns float.
    '''
    if x == "Not Provided":
        return np.nan
    if col in ['Id', 'Year']:
        return int(x)
    return float(x)

In [None]:
for col in sal.columns:
    try:
        sal[col] = sal[col].apply(lambda x: str_to_numeric(x, col))
    except (ValueError, TypeError):
        # Non-numeric columns (e.g. EmployeeName, JobTitle) are left unchanged
        pass
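
As an alternative to the row-by-row conversion above, `pd.to_numeric` with `errors="coerce"` converts a whole column at once and turns unparseable strings such as "Not Provided" into NaN. The sketch below runs on a small made-up frame (`demo`), not the real dataset, and the `pay_cols` list is an assumption about which columns should be numeric.

```python
import pandas as pd

# Hypothetical toy frame standing in for the salary data
demo = pd.DataFrame({
    "EmployeeName": ["A", "B", "C"],
    "BasePay": ["100.5", "Not Provided", "300"],
    "OvertimePay": ["0.0", "25.25", "Not Provided"],
})

# Assumed list of columns that should hold numbers
pay_cols = ["BasePay", "OvertimePay"]

for col in pay_cols:
    # errors="coerce" replaces values that cannot be parsed with NaN
    demo[col] = pd.to_numeric(demo[col], errors="coerce")

print(demo.dtypes)
```

Listing the numeric columns explicitly avoids relying on an exception to skip text columns, at the cost of maintaining that list by hand.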

In [None]:
sal.info()

#### **Average BasePay**

In [None]:
sal['BasePay'].mean()

#### **Highest amount of OvertimePay**

In [None]:
sal["OvertimePay"].max()

#### **What is the name of the highest-paid person (including benefits)?**

In [None]:
highest_pay = sal["TotalPayBenefits"].max()
sal[sal["TotalPayBenefits"]==highest_pay][["EmployeeName","TotalPayBenefits"]]

#### **What is the name of the lowest-paid person (including benefits)?**

In [None]:
lowest_pay = sal["TotalPayBenefits"].min()
sal[sal["TotalPayBenefits"]==lowest_pay][["EmployeeName","TotalPayBenefits"]]
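
The two lookups above can also be written with `idxmax` and `idxmin`, which return the index label of the extreme row directly. This is a minimal sketch on a made-up frame (`demo`); the names and amounts are invented for illustration only.

```python
import pandas as pd

# Hypothetical toy frame; values are invented
demo = pd.DataFrame({
    "EmployeeName": ["A", "B", "C"],
    "TotalPayBenefits": [50000.0, 120000.0, -10.0],
})

# idxmax / idxmin give the index label of the first max / min row
top = demo.loc[demo["TotalPayBenefits"].idxmax(), "EmployeeName"]
bottom = demo.loc[demo["TotalPayBenefits"].idxmin(), "EmployeeName"]

print(top, bottom)
```

Note that `idxmax` and `idxmin` return only the first matching row, whereas the boolean-mask version used in this notebook returns every row tied for the extreme value.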

#### **What was the average (mean) BasePay of all employees per year (2011-2014)?**

In [None]:
# Select BasePay before aggregating so non-numeric columns are never averaged
sal.groupby("Year")["BasePay"].mean()

#### **What are the top 5 most common jobs?**

In [None]:
sal['JobTitle'].value_counts().head(5)

#### **How many people have the word Manager in their job title?**

In [None]:
def manager_string(title):
    # Case-insensitive check for the word "manager" in the job title
    return 'manager' in title.lower()

In [None]:
sal['JobTitle'].apply(manager_string).sum()
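
The same count can be obtained without a helper function by using the vectorized `Series.str.contains`. The sketch below uses a small made-up series (`titles`) rather than the real `JobTitle` column; `na=False` is included so that missing titles count as non-matches instead of raising or propagating NaN.

```python
import pandas as pd

# Hypothetical job titles, including a missing value
titles = pd.Series(["Project Manager", "MANAGER II", "Nurse", None])

# case=False makes the match case-insensitive; na=False treats missing titles as False
is_manager = titles.str.contains("manager", case=False, na=False)
manager_count = int(is_manager.sum())

print(manager_count)
```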

#### **Is there a correlation between the length of the job title string and salary?**

In [None]:
sal['title_len'] = sal['JobTitle'].apply(len)

In [None]:
sal[['title_len','TotalPayBenefits']].corr()
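
`DataFrame.corr` returns the full correlation matrix; when only the single coefficient is of interest, `Series.corr` gives it as a scalar (Pearson by default). The sketch below uses a made-up frame (`demo`) with invented titles and amounts, so the value it prints says nothing about the real dataset.

```python
import pandas as pd

# Hypothetical toy frame; titles and pay values are invented
demo = pd.DataFrame({
    "JobTitle": ["Clerk", "Senior Engineer", "Chief of Police", "Nurse"],
    "TotalPayBenefits": [40000.0, 150000.0, 300000.0, 90000.0],
})

# Vectorized string length, equivalent to apply(len) when no titles are missing
demo["title_len"] = demo["JobTitle"].str.len()

# Pearson correlation coefficient between title length and total pay
r = demo["title_len"].corr(demo["TotalPayBenefits"])

print(round(r, 3))
```

A coefficient close to 0 would indicate little linear relationship, while values near 1 or -1 would indicate a strong one.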