### Dataset

1. **work_year**: The year the salary was paid.
2. **experience_level**: The experience level in the job during the year with the following possible values:

* EN = Entry-level/Junior
* MI = Mid-level/Intermediate
* SE = Senior-level/Expert
* EX = Executive-level/Director
3. **employment_type**: The type of employment for the role:

* PT = Part-time
* FT = Full-time
* CT = Contract
* FL = Freelance
4. **job_title**: The role worked in during the year.
5. **salary**: The total gross salary amount paid.
6. **salary_currency**: The currency of the salary paid as an ISO 4217 currency code.
7. **salary_in_usd**: The salary converted to USD.
8. **employee_residence**: The employee's primary country of residence during the work year, as an ISO 3166 country code.
9. **remote_ratio**: The overall amount of work done remotely, possible values are as follows:

* 0 = No remote work
* 50 = Partially remote
* 100 = Fully remote
10. **company_location**: The country of the employer's main office or contracting branch as an ISO 3166 country code.

11. **company_size**: The average number of people that worked for the company during the year:

* S = less than 50 employees (small)
* M = 50 to 250 employees (medium)
* L = more than 250 employees (large)
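The coded columns above lend themselves to a quick sanity check before any analysis. The following is a minimal sketch, not part of the original notebook: the helper `find_unexpected_codes`, the `ALLOWED_CODES` dictionary, and the sample rows are all hypothetical names and values introduced for illustration. It reports any value that falls outside the documented code sets.

```python
import pandas as pd

# Documented codes, copied from the column descriptions above.
ALLOWED_CODES = {
    "experience_level": {"EN", "MI", "SE", "EX"},
    "employment_type": {"PT", "FT", "CT", "FL"},
    "remote_ratio": {0, 50, 100},
    "company_size": {"S", "M", "L"},
}


def find_unexpected_codes(df):
    """Return {column: sorted list of values outside the documented code set}."""
    unexpected = {}
    for column, allowed in ALLOWED_CODES.items():
        if column not in df.columns:
            continue  # skip columns the frame does not have
        extra = set(df[column].dropna().unique()) - allowed
        if extra:
            unexpected[column] = sorted(extra)
    return unexpected


# Hypothetical rows, made up for illustration only ("XX" is deliberately invalid).
sample = pd.DataFrame({
    "experience_level": ["EN", "SE", "XX"],
    "employment_type": ["FT", "FT", "CT"],
    "remote_ratio": [0, 100, 50],
    "company_size": ["S", "M", "L"],
})
print(find_unexpected_codes(sample))
```

An empty dictionary means every coded column contains only documented values. The same helper could be called on the loaded `data` frame, provided it runs before the codes are replaced with readable labels.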





### Objectives

The purpose of this EDA is to analyze data scientist salaries in America by experience and employment type.
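As a rough sketch of what that objective could look like in code, the snippet below filters to one job title and one country, then summarizes `salary_in_usd` by experience level and employment type. Two assumptions are made here that the notebook does not state: "America" is read as `company_location == "US"`, and "data scientist" is read as the exact job title `"Data Scientist"`. The function name `us_data_scientist_salaries` and the sample rows are hypothetical.

```python
import pandas as pd


def us_data_scientist_salaries(df):
    """Count and median salary_in_usd for US-based Data Scientist rows,
    grouped by experience level and employment type."""
    # Assumption: "America" means company_location == "US" (ISO 3166 alpha-2).
    subset = df[(df["job_title"] == "Data Scientist") & (df["company_location"] == "US")]
    return (
        subset.groupby(["experience_level", "employment_type"])["salary_in_usd"]
        .agg(["count", "median"])
        .reset_index()
    )


# Hypothetical rows, made up for illustration only.
sample = pd.DataFrame({
    "job_title": ["Data Scientist", "Data Scientist", "Data Engineer", "Data Scientist"],
    "company_location": ["US", "US", "US", "DE"],
    "experience_level": ["SE", "SE", "MI", "EN"],
    "employment_type": ["FT", "FT", "FT", "FT"],
    "salary_in_usd": [100000, 120000, 90000, 50000],
})
summary = us_data_scientist_salaries(sample)
print(summary)
```

The median is used rather than the mean because salary data tends to be skewed by a few very high values. Swapping `"median"` for `"mean"` in `.agg` gives the mean instead.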

### Imports

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import plotly.express as px
import seaborn as sns
import datetime as dt
import plotly.graph_objects as go


### Data Loading

In [None]:
data = pd.read_csv('../input/data-science-job-salaries/ds_salaries.csv')

In [None]:
data.info()

### Data Cleaning

In [None]:
# Drop the unneeded index column 'Unnamed: 0'
data = data.drop('Unnamed: 0',axis=1)

In [None]:
# Replace coded values with readable labels (assign back rather than using inplace on a column selection)
data['experience_level'] = data['experience_level'].replace({'EN':'Entry-Level','MI':'Mid-Level','EX':'Executive Level','SE':'Senior'})
data['employment_type'] = data['employment_type'].replace({'PT':'Part-Time','FT':'Full-Time','CT':'Contract','FL':'Freelance'})

In [None]:
# Check for null values
data.isnull().sum()

### Exploratory Data Analysis

In [None]:
data.head()

In [None]:
data.describe()

In [None]:
# Mean salary in USD grouped by job title
z = data.groupby('job_title', as_index=False)[['salary_in_usd']].mean().rename({'salary_in_usd' : 'mean_salary_in_usd'}, axis=1).sort_values(by='mean_salary_in_usd',ascending=False)
print(z)

In [None]:
z['mean_salary_in_usd']=round(z['mean_salary_in_usd'],2)
fig=px.bar(z.head(10),x='job_title',y='mean_salary_in_usd',color='job_title',
           labels={'job_title':'job title','mean_salary_in_usd':'mean salary in usd'},text='mean_salary_in_usd',template='seaborn',title='<b> Top 10 Roles in Data Science based on Average Pay')
fig.show()

In [None]:
# Mean salary in USD grouped by job title and experience level
z = data.groupby(['job_title', 'experience_level'], as_index=False)[['salary_in_usd']].mean().rename({'salary_in_usd' : 'mean_salary_in_usd'}, axis=1).sort_values(by='mean_salary_in_usd',ascending=False)
print(z)

In [None]:
z['mean_salary_in_usd']=round(z['mean_salary_in_usd'],2)
z['job-experience'] = z['job_title'].map(str) + ' - ' + z['experience_level'].map(str)
fig=px.bar(z.head(10),x='job-experience',y='mean_salary_in_usd',color='job_title',
           labels={'job-experience':'job title - experience level','mean_salary_in_usd':'mean salary in usd'},text='mean_salary_in_usd',template='seaborn',title='<b> Top 10 average salary in USD grouped by job title and experience level')
fig.show()

In [None]:
# Max salary in USD grouped by job title, experience level and employment type
z = data.groupby(['job_title', 'experience_level', 'employment_type'], as_index=False)[['salary_in_usd']].max().rename({'salary_in_usd' : 'max_salary_in_usd'}, axis=1).sort_values(by='max_salary_in_usd',ascending=False)
print(z)

In [None]:
z['max_salary_in_usd']=round(z['max_salary_in_usd'],2)
z['job-experience-employment'] = z['job_title'].map(str) + ' - ' + z['experience_level'].map(str) + ' - ' + z['employment_type'].map(str)
fig=px.bar(z.head(10),x='job-experience-employment',y='max_salary_in_usd',color='job_title',
           labels={'job-experience-employment':'job title - experience level - employment type','max_salary_in_usd':'max salary in usd'},text='max_salary_in_usd',template='seaborn',title='<b> Top 10 salaries in USD grouped by job title, experience level and employment type')
fig.show()

In [None]:
# Count machine learning scientist jobs grouped by experience level and employment type
z = data.groupby(['job_title', 'experience_level', 'employment_type'], as_index=False)[['salary']].count().rename({'salary' : 'count'}, axis=1).sort_values(by='count',ascending=False)
z = z.loc[z['job_title'] == "Machine Learning Scientist"]
z['experience-employment'] = z['experience_level'].map(str) + ': ' + z['employment_type'].map(str)
print(z)

In [None]:
fig=px.pie(z ,names='experience-employment',values='count',color='experience-employment',hole=0.7,labels={'experience-employment':'Experience level','count':'count'}
,template='seaborn',title='<b> Total Machine Learning Scientist Jobs Based on Experience Level and Employment Type')
fig.update_layout(title_x=0.5)

In [None]:
px.histogram(data,x='salary_in_usd',marginal='rug',template='seaborn',labels={'salary_in_usd':'Salary in USD'},title='<b> Salary Distribution')

In [None]:
px.box(data.loc[data['job_title'] == "Machine Learning Scientist"],x='experience_level',y='salary_in_usd',color='experience_level',template='ggplot2'
,labels={'experience_level':'Experience Level','salary_in_usd':'salary in usd'},title='<b>Machine Learning Scientist Salaries by Experience')