# Daily Challenge 
## Your Task

- Download and import the Data Science Job Salary dataset.
- Normalize the ‘salary’ column using Min-Max normalization, which scales all salary values to the range [0, 1].
- Apply a dimensionality reduction technique such as Principal Component Analysis (PCA) or t-SNE to reduce the number of features (columns) in the dataset.
- Group the dataset by the ‘experience_level’ column and calculate the average and median salary for each experience level (in this dataset: Entry, Mid, Senior, Executive).
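For reference, Min-Max normalization maps each salary $x$ using the column's minimum and maximum:

$$
x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}
$$

The smallest salary becomes 0, the largest becomes 1, and the rest fall proportionally in between. Because this is an increasing affine map, it preserves the ordering of salaries, and the mean and median of the scaled values are exactly the scaled mean and median.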

In [1]:
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import MinMaxScaler

# Load the Data Science Job Salary dataset (CSV assumed to be in the working directory)
dataset = pd.read_csv("datascience_salaries.csv")
print(len(dataset))  # number of rows

1171
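The `head()` output further down shows two index-like columns, `Unnamed: 0.1` and `Unnamed: 0`, which look like row numbers left behind by an earlier `to_csv` call rather than real features. A minimal sketch, assuming that interpretation, of dropping them right after loading; the inline CSV holds the first three rows from that output and only stands in for the real file:

```python
import io

import pandas as pd

# First three rows from the head() output; a stand-in for datascience_salaries.csv.
csv_text = """Unnamed: 0.1,Unnamed: 0,job_title,job_type,experience_level,location,salary_currency,salary
0,0,Data scientist,Full Time,Senior,New York City,USD,149000
1,2,Data scientist,Full Time,Senior,Boston,USD,120000
2,3,Data scientist,Full Time,Senior,London,USD,68000
"""

dataset = pd.read_csv(io.StringIO(csv_text))

# Keep only columns whose names do not start with "Unnamed": those are leftover
# row numbers and would otherwise be treated as numeric features later on.
dataset = dataset.loc[:, ~dataset.columns.str.startswith("Unnamed")]

print(list(dataset.columns))
```

Filtering on the name prefix removes both leftover columns at once, whereas `index_col=0` in `read_csv` would only absorb the first one.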


In [2]:
dataset.head()

| Unnamed: 0.1 | Unnamed: 0 | job_title | job_type | experience_level | location | salary_currency | salary |
|---|---|---|---|---|---|---|---|
| 0 | 0 | Data scientist | Full Time | Senior | New York City | USD | 149000 |
| 1 | 2 | Data scientist | Full Time | Senior | Boston | USD | 120000 |
| 2 | 3 | Data scientist | Full Time | Senior | London | USD | 68000 |
| 3 | 4 | Data scientist | Full Time | Senior | Boston | USD | 120000 |
| 4 | 5 | Data scientist | Full Time | Senior | New York City | USD | 149000 |


In [3]:
# Min-Max scale 'salary' into [0, 1]; this overwrites the original USD values
scaler = MinMaxScaler()
dataset['salary'] = scaler.fit_transform(dataset[['salary']])
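To see that `MinMaxScaler` computes exactly $(x - x_{\min}) / (x_{\max} - x_{\min})$, here is a small check on the five salaries from the `head()` output above. Because only five rows are used, the minimum and maximum differ from the full column, so the numbers will not match the notebook's scaled values:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Salaries (USD) from the first five rows shown by head(); not the full column.
salaries = pd.DataFrame({"salary": [149000, 120000, 68000, 120000, 149000]})

scaler = MinMaxScaler()
scaled = scaler.fit_transform(salaries[["salary"]])

# The same mapping written out by hand: (x - min) / (max - min).
x = salaries["salary"].to_numpy(dtype=float)
manual = (x - x.min()) / (x.max() - x.min())
assert np.allclose(scaled.ravel(), manual)

# The fitted scaler stores the min and max, so the USD values can be recovered.
restored = scaler.inverse_transform(scaled)
print(scaled.ravel().round(4))
print(restored.ravel())
```

Keeping the fitted `scaler` object around is what makes `inverse_transform` possible later, for example to report grouped statistics in dollars.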

In [4]:
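pca = PCA(n_components=2)
# select_dtypes keeps every numeric column: the two index-like 'Unnamed' columns
# plus the scaled 'salary'. The 'Unnamed' columns have far larger variance, so
# PCA1 essentially tracks the row numbers and PCA2 is roughly the centered
# salary (compare the output below).
pca_components = pca.fit_transform(dataset.select_dtypes(include=['float64', 'int64']))

dataset["PCA1"] = pca_components[:, 0]
dataset["PCA2"] = pca_components[:, 1]
dataset.head()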

| Unnamed: 0.1 | Unnamed: 0 | job_title | job_type | experience_level | location | salary_currency | salary | PCA1 | PCA2 |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | Data scientist | Full Time | Senior | New York City | USD | 0.60101 | -931.620836 | 0.427553 |
| 1 | 2 | Data scientist | Full Time | Senior | Boston | USD | 0.454545 | -929.620836 | 0.281083 |
| 2 | 3 | Data scientist | Full Time | Senior | London | USD | 0.191919 | -928.620837 | 0.018454 |
| 3 | 4 | Data scientist | Full Time | Senior | Boston | USD | 0.454545 | -927.620836 | 0.281077 |
| 4 | 5 | Data scientist | Full Time | Senior | New York City | USD | 0.60101 | -926.620836 | 0.427539 |
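In the cell above, PCA only sees the row-number columns and `salary`, so the categorical columns never contribute. One possible alternative, sketched under the assumption that the `Unnamed` columns have already been dropped, is to one-hot encode the categorical columns with `pd.get_dummies` and run PCA on the result. The rows here are hypothetical and illustrative only:

```python
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import MinMaxScaler

# Hypothetical rows in the dataset's schema (illustrative values only),
# with the index-like "Unnamed" columns already removed.
df = pd.DataFrame({
    "job_type": ["Full Time"] * 6,
    "experience_level": ["Senior", "Senior", "Entry", "Mid", "Executive", "Mid"],
    "location": ["New York City", "Boston", "London", "Boston", "New York City", "London"],
    "salary": [149000, 120000, 40000, 90000, 200000, 75000],
})

# One-hot encode the categorical columns so they can take part in PCA.
features = pd.get_dummies(
    df, columns=["job_type", "experience_level", "location"], dtype=float
)

# Put salary on the same 0-1 scale as the dummy columns so it does not
# dominate the variance that PCA maximizes.
features["salary"] = MinMaxScaler().fit_transform(features[["salary"]]).ravel()

pca = PCA(n_components=2)
components = pca.fit_transform(features)

df["PCA1"] = components[:, 0]
df["PCA2"] = components[:, 1]
print(features.shape, "->", components.shape)
print(pca.explained_variance_ratio_)
```

Here nine encoded columns are reduced to two components; `explained_variance_ratio_` indicates how much of the total variance those two components retain.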

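t-SNE is the other technique the task lists; the notebook does not use it. A minimal sketch with scikit-learn's `TSNE`, assuming a numeric feature matrix like the one-hot encoded table described above. Random values stand in for that matrix here, and `perplexity=10` is an arbitrary choice for this small example (it must be smaller than the number of rows):

```python
import numpy as np
from sklearn.manifold import TSNE

# Hypothetical numeric features (e.g. one-hot columns plus scaled salary);
# random values stand in for the real encoded dataset.
rng = np.random.default_rng(0)
features = rng.random((40, 9))

# Embed the 9 columns into 2 dimensions for visualization.
tsne = TSNE(n_components=2, perplexity=10, random_state=0)
embedding = tsne.fit_transform(features)
print(embedding.shape)
```

Unlike `PCA`, scikit-learn's `TSNE` has no `transform` method for new rows, so it is mainly suited to visualizing the existing data rather than building reusable features.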

In [6]:
salary_grouped = dataset.groupby('experience_level')['salary'].agg(['mean', 'median'])
salary_grouped = salary_grouped.rename(columns={"mean": "Average Salary", "median": "Median Salary"})
# 'salary' was Min-Max scaled in In [3], so these figures are on the 0-1 scale, not USD
salary_grouped

| experience_level | Average Salary | Median Salary |
|---|---|---|
| Entry | 0.030864 | 0.0 |
| Executive | 0.232712 | 0.080808 |
| Mid | 0.110035 | 0.106061 |
| Senior | 0.227717 | 0.191919 |
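The table above is in normalized units because In [3] overwrote `salary`, and its rows are sorted alphabetically. A sketch, using hypothetical USD values, that aggregates the raw salaries and orders the levels by seniority with an ordered `pd.Categorical`:

```python
import pandas as pd

# Hypothetical rows; salaries in USD (illustrative values only).
df = pd.DataFrame({
    "experience_level": ["Senior", "Entry", "Mid", "Senior", "Executive", "Mid", "Entry"],
    "salary": [149000, 40000, 90000, 120000, 200000, 75000, 50000],
})

# Order the levels by seniority instead of alphabetically.
levels = ["Entry", "Mid", "Senior", "Executive"]
df["experience_level"] = pd.Categorical(df["experience_level"], categories=levels, ordered=True)

# Aggregate the raw USD column, i.e. before any Min-Max scaling.
summary = (
    df.groupby("experience_level", observed=True)["salary"]
    .agg(["mean", "median"])
    .rename(columns={"mean": "Average Salary", "median": "Median Salary"})
)
print(summary)
```

Since Min-Max scaling is an increasing affine map, the normalized averages and medians in the notebook could also be mapped back to USD with the fitted scaler's `inverse_transform`.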
