## Salary Analysis of Seattle's 2018 Wage Data  ##

## Abstract ##

***Introduction***
> In this lab, I use Seattle's 2018 wage data to answer a few questions. I want to find the average hourly wage of all of Seattle's city employees, as well as the lowest and highest wages. I am especially curious whether the mayor is Seattle's highest-paid employee, and whether the lowest-paid employee earns at least the legal minimum wage. I also want to see how much the city I live in spends on paying its employees. This matters because my family, along with many others, pays taxes, and everyone should be able to find out where their tax money goes.

## Dataset Preparation ##

***Data Location***

> This data was retrieved from Seattle's public database: [Data Link](https://data.seattle.gov/City-Business/City-of-Seattle-Wage-Data/2khk-5ukd)

> The data was not modified from the form stored on the server before being imported into Jupyter Notebook.


***Accessing the Data in Python***

> To access the data, I first had to open the file in Python. I opened it in read mode ('r') and stored its entire contents as a single string in a variable called 'content'. I used a "with" statement so that Python closes the file automatically once the indented block finishes, even if an error occurs inside it.

```python
with open('./City_of_Seattle_Wage_Data.csv', 'r') as data:
    content = data.read()
```
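> The claim that "with" closes the file can be checked directly. The sketch below writes a small hypothetical CSV to a temporary file (the helper name 'read_whole_file' and the sample rows are illustrative, not part of the Seattle data) and confirms that the file object reports itself as closed once the block finishes.

```python
import os
import tempfile

def read_whole_file(path):
    """Read a text file and return (contents, closed flag after the with-block)."""
    with open(path, 'r') as data:
        content = data.read()
    # The with-block has ended here, so the file object is closed even
    # though the name 'data' is still in scope.
    return content, data.closed

# Hypothetical two-line CSV written to a temporary file for the demo.
with tempfile.NamedTemporaryFile('w', suffix='.csv', delete=False) as tmp:
    tmp.write('Department,Last Name\nExample Dept,Doe\n')
    path = tmp.name

content, was_closed = read_whole_file(path)
os.remove(path)

print(was_closed)            # True
print(content.splitlines())  # ['Department,Last Name', 'Example Dept,Doe']
```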

***Cleaning the Data***

> Cleaning the data was fairly straightforward. The only aspect that would cause problems during analysis was that some job titles contain commas. This matters because values in a .csv file are normally separated by commas, so an unprotected comma would split one title into two columns. Luckily, the City of Seattle had anticipated this and surrounded those values with quotation marks.

> To handle the quoted values, I first split the entire 'content' string on quotation marks. Because each quoted value sits between an opening and a closing quote, the text that was inside quotes ends up at every odd index (1, 3, 5, ...) of the resulting list. I then looped over those odd-indexed elements and replaced each comma in them with a space. Next, I joined the pieces back together with commas. Since a comma is inserted wherever a quote mark used to be, this leaves doubled commas (',,') around the former quoted values, which the replace(',,', ',') call collapses back to single commas. Finally, I split the result into lines and dropped the first one (indicated by '[1:]'), because the first line holds the column titles. Note that the ',,' replacement would also collapse any genuinely empty field, so this approach assumes no row has a blank value.

```python
# Cleaning data *****
subQuotes = content.split('"')

# Quoted values land at the odd indices (1, 3, 5, ...) after splitting on '"'
for i in range(1, len(subQuotes), 2):
    subQuotes[i] = subQuotes[i].replace(',', ' ')

# Rejoin with commas, collapse the doubled commas left where the quotes were,
# split into rows, and drop the header row
content = ','.join(subQuotes).replace(',,', ',').splitlines()[1:]
# Done cleaning data *****
```
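> An alternative to hand-cleaning (not the method used above) is Python's standard-library csv module, whose reader already understands quoted fields. The sketch below assumes the same five-column layout as the Seattle file; the sample rows are made up. One difference from the approach above: the comma inside a quoted title is kept ("Analyst, Senior") rather than replaced with a space, and empty fields are preserved instead of collapsed.

```python
import csv
import io

# Hypothetical sample in the same layout (Department, Last Name,
# First Name, Job Title, Hourly Rate); the names and rates are invented.
sample = (
    'Department,Last Name,First Name,Job Title,Hourly Rate\n'
    'Example Dept,Doe,Jane,"Analyst, Senior",40.00\n'
    'Other Dept,Roe,Rich,Clerk,25.50\n'
)

def parse_rows(text):
    """Parse CSV text into lists of fields, skipping the header row.

    csv.reader treats a comma inside quotes as part of the value,
    so no manual comma replacement is needed.
    """
    reader = csv.reader(io.StringIO(text))
    next(reader)  # discard the column titles
    return [row for row in reader if row]  # ignore any blank lines

rows = parse_rows(sample)
print(rows[0])  # ['Example Dept', 'Doe', 'Jane', 'Analyst, Senior', '40.00']
```

> For the real file, the same reader can be pointed at the open file object, e.g. open('./City_of_Seattle_Wage_Data.csv', newline='') as the csv documentation recommends.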

***Storing the Data for Analysis***

> To store each column in its own list, I needed a way to loop through the master 'content' list. Python's list comprehensions make this concise. Each line below says, in effect, "for every row in 'content', split it on commas and keep the value at one column index." So 'department' collects the first value of every row, 'lastName' the second, and so on. The lists stay aligned: index i in every list refers to the same employee.

```python
department = [x.split(',')[0] for x in content]
lastName   = [x.split(',')[1] for x in content]
firstName  = [x.split(',')[2] for x in content]
title      = [x.split(',')[3] for x in content]
wage       = [float(x.split(',')[4]) for x in content]
# Wage is cast to float so that numerical comparisons (max, min, sum) work
```
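> The comprehensions above split every row five times, once per column. A sketch that splits each row only once and then regroups the fields into columns with the built-in zip; the two rows in 'content' are hypothetical stand-ins shaped like the output of the cleaning step. It assumes every row has exactly five fields (a shorter row would make the unpacking fail).

```python
# Hypothetical cleaned rows in the same layout the cleaning step produces.
content = [
    'Example Dept,Doe,Jane,Analyst  Senior,40.00',
    'Other Dept,Roe,Rich,Clerk,25.50',
]

# Split each row once; zip(*rows) then yields one tuple per column.
rows = [line.split(',') for line in content]
department, lastName, firstName, title, rawWage = zip(*rows)

# zip produces tuples, so convert to lists and cast wages to float as before.
department = list(department)
lastName = list(lastName)
firstName = list(firstName)
title = list(title)
wage = [float(w) for w in rawWage]
```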

## Data Modelling ##

***Finding the Highest-Paid Employee***
> To find the highest-paid employee, I found the largest value in the 'wage' list and then looked up its index. Because the lists are aligned (the same index refers to the same employee in every list), I could use that index to retrieve the employee's name, job title, and department. Finally, I printed that information in a readable format.

```python
# Index of the highest wage (index() returns the first match if there is a tie)
maxIdx = wage.index(max(wage))

print("Maximum Wage: $%.2f/hr" % wage[maxIdx])
print(firstName[maxIdx], lastName[maxIdx], "(%s %s)\n" % (title[maxIdx], department[maxIdx]))
```

```
Maximum Wage: $137.79/hr
Mami Hara (SPU General Mgr&CEO Seattle Public Utilities)
```
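> wage.index(max(wage)) walks the list twice: once to find the maximum and once to locate it. A one-pass alternative is to call max over the indices with a key function. The helper name 'argmax' and the sample wages are assumptions for illustration; like index(), it resolves ties to the first occurrence.

```python
def argmax(values):
    """Return the index of the largest value in a single pass.

    max() returns the first maximal item it sees, so ties resolve to the
    earliest index, matching values.index(max(values)).
    """
    return max(range(len(values)), key=values.__getitem__)

# Hypothetical wages, not the Seattle figures.
wage = [40.0, 137.0, 25.5, 137.0]
print(argmax(wage))  # 1
```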



***Finding the Lowest-Paid Employee***
> To find the lowest-paid employee, I found the smallest value in the 'wage' list and then looked up its index. Because the lists are aligned, I could use that index to retrieve the employee's name, job title, and department. Finally, I printed that information in a readable format.

```python
# Index of the lowest wage (index() returns the first match if there is a tie)
minIdx = wage.index(min(wage))

print("Minimum Wage: $%.2f/hr" % wage[minIdx])
print(firstName[minIdx], lastName[minIdx], "(%s %s)\n" % (title[minIdx], department[minIdx]))
```

```
Minimum Wage: $5.53/hr
Amy Bonfrisco (Civil Svc Commissioner Civil Service Commissions Dept)
```
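> list.index only reports the first match, so if several employees share the lowest rate (plausible for positions paid on the same schedule), only one of them is printed. A sketch that collects every tied index instead; 'indices_of_min' and the sample values are hypothetical.

```python
def indices_of_min(values):
    """Return every index whose value equals the minimum, not just the first."""
    lowest = min(values)
    return [i for i, v in enumerate(values) if v == lowest]

# Hypothetical wages with a tie at the bottom.
wage = [40.0, 5.5, 25.5, 5.5]
print(indices_of_min(wage))  # [1, 3]
```

> Exact equality is safe here because each value is compared against a minimum taken from the same list, not against a separately computed number.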



***Finding the Average Wage***
> To find the average wage of all of Seattle's employees, I added up the values in 'wage' with the built-in sum() function and divided by len(wage), the number of employees. This is the formula for the arithmetic mean, so the result is the average hourly wage. I cast the length to a float so that the division could not truncate the result; in Python 3 this cast is not strictly necessary, since the / operator always performs true division and the values in 'wage' are already floats. Finally, I printed the result in a readable format.

```python
# float() is redundant in Python 3 (/ is true division), but harmless
avgWage = sum(wage) / float(len(wage))

print("Average wage: $%.2f/hr" % avgWage)
```

```
Average wage: $39.59/hr
```
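> As a cross-check on the hand-written formula, the standard-library statistics.mean function should agree with sum(wage) / len(wage) up to floating-point rounding. The wages below are hypothetical stand-ins for the real list.

```python
import math
import statistics

# Hypothetical wages standing in for the real 'wage' list.
wage = [10.0, 20.0, 30.0, 44.0]

manual = sum(wage) / len(wage)    # / is true division in Python 3, no cast needed
library = statistics.mean(wage)   # standard-library mean for comparison

print("Average wage: $%.2f/hr" % manual)   # Average wage: $26.00/hr
# Compare with a tolerance, since float sums can differ in the last bits
print(math.isclose(manual, library))       # True
```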


## Data Analysis & Conclusion ##

***Conclusion***
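> After analyzing the data, I found that the lowest-paid employee, a Civil Service Commissioner, is paid $5.53/hr, far less than minimum wage, which is certainly shocking. Because the job title suggests a commission seat rather than a regular staff position, this rate may reflect a part-time or stipend arrangement, so it is worth checking before concluding that the pay is illegal. Conversely, the highest-paid employee is not the mayor but the General Manager and CEO of Seattle Public Utilities ($137.79/hr), which raises the question of why that position is paid so much and how vital it is to the City of Seattle.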
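
> Whether the lowest rate is a lone outlier or one of many can be checked by listing every employee below a chosen threshold. 'MIN_WAGE' below is a placeholder value, not a verified legal figure, and the helper name, names, and wages are made up for illustration.

```python
# MIN_WAGE is a placeholder: check it against Seattle's published 2018
# minimum-wage schedule (which varied by employer size) before relying on it.
MIN_WAGE = 15.00

def below_threshold(wages, names, threshold):
    """Return (name, wage) pairs for every wage strictly below threshold."""
    return [(n, w) for n, w in zip(names, wages) if w < threshold]

# Hypothetical data, not the Seattle records.
wage = [40.0, 5.5, 14.0, 15.0]
names = ['A. Doe', 'B. Roe', 'C. Poe', 'D. Moe']
flagged = below_threshold(wage, names, MIN_WAGE)
print(len(flagged), flagged)  # 2 [('B. Roe', 5.5), ('C. Poe', 14.0)]
```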

> I also found that the average wage is about $40 per hour ($39.59/hr), which did not surprise me. My impression is that government employees are often paid less than comparable private-sector workers, and an average of roughly $40/hr seems reasonable. This raises another question: how does Seattle's average employee wage compare to that of other cities?
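
> Because the dataset lists hourly rates, talking about "salary" needs a conversion. A rough sketch, assuming a full-time schedule of 40 hours per week for 52 weeks (2,080 hours); not every employee in the dataset necessarily works full-time, so for part-time staff this overstates actual yearly pay. The helper 'annualize' is hypothetical.

```python
def annualize(hourly, hours_per_week=40, weeks_per_year=52):
    """Convert an hourly rate to a yearly figure, assuming full-time work."""
    return hourly * hours_per_week * weeks_per_year

print("$%.2f/yr" % annualize(39.59))  # $82347.20/yr
```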

## Acknowledgements ##

***Ms. Sconyers:***
> I would like to thank Ms. Sconyers for suggesting both the dataset I should use and the questions I should ask.