
# Exploratory Data Analysis: Data Science Salaries

This notebook explores a cleaned dataset of data science salaries. It looks at salaries in USD by experience level and employment type, the most common countries of employee residence, and how the average salary changes from one work year to the next.


In [None]:

# Import necessary libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load the cleaned dataset
data_path = 'data/ds_salaries_cleaned.csv'
df = pd.read_csv(data_path)

# Display dataset information
df.info()

# Display the first few rows
df.head()
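
Before relying on the dataset, it can help to confirm that the columns used later in this notebook are present and contain no missing values. The following is a minimal sketch, not part of the original analysis: the helper `check_columns` and the `sample` rows (including the category codes) are hypothetical and exist only so the snippet runs on its own. In the notebook, `df` would be passed instead.

```python
import pandas as pd

# Columns referenced by the later cells of this notebook.
REQUIRED_COLUMNS = [
    "work_year",
    "experience_level",
    "employment_type",
    "salary_in_usd",
    "employee_residence",
]


def check_columns(frame: pd.DataFrame) -> list:
    """Return the required columns that are missing from `frame` (hypothetical helper)."""
    return [col for col in REQUIRED_COLUMNS if col not in frame.columns]


# Hypothetical stand-in rows, not real data; replace `sample` with `df` in the notebook.
sample = pd.DataFrame({
    "work_year": [2022, 2023],
    "experience_level": ["SE", "MI"],
    "employment_type": ["FT", "FT"],
    "salary_in_usd": [150000, 100000],
    "employee_residence": ["US", "GB"],
})

missing = check_columns(sample)
null_counts = sample[REQUIRED_COLUMNS].isna().sum()  # missing values per column

print("Missing columns:", missing)
print(null_counts)
```

An empty list and all-zero counts suggest the later cells can run without `KeyError` or silently dropped rows.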


In [None]:

# Descriptive statistics (printed explicitly: a notebook cell only displays its last expression)
print(df.describe())

# Top 10 employee residences
top_residences = df['employee_residence'].value_counts().head(10)
print("Top 10 Employee Residences:")
print(top_residences)

# Salary trends over years
salary_trends = df.groupby('work_year')['salary_in_usd'].mean()
print("Average Salary Trends Over Years:")
print(salary_trends)
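
A yearly mean alone can hide outliers and uneven sample sizes. As a rough sketch of one way to qualify the trend, the snippet below adds the median, the row count, and the year-over-year percentage change of the mean. The `sample` values are invented for illustration and are not taken from the dataset; in the notebook, `df` would be used in place of `sample`.

```python
import pandas as pd

# Hypothetical rows for illustration only; not values from the real dataset.
sample = pd.DataFrame({
    "work_year": [2020, 2020, 2021, 2021, 2022, 2022],
    "salary_in_usd": [80000, 120000, 90000, 130000, 100000, 142000],
})

# Mean, median, and number of rows per year.
yearly = sample.groupby("work_year")["salary_in_usd"].agg(["mean", "median", "count"])

# Percentage change of the mean relative to the previous year (NaN for the first year).
yearly["mean_pct_change"] = yearly["mean"].pct_change() * 100

print(yearly.round(1))
```

If the mean and median move in different directions, or a year has very few rows, the trend for that year deserves a closer look.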


In [None]:

# Visualization 1: Salary by Experience Level and Employment Type
# Each point is one row of the dataset (individual salaries, not averages).
plt.figure(figsize=(10, 6))
sns.scatterplot(
    x="experience_level",
    y="salary_in_usd",
    hue="employment_type",
    data=df,
    s=100,
    palette="muted"
)
plt.title("Salary by Experience Level and Employment Type")
plt.xlabel("Experience Level")
plt.ylabel("Salary in USD")
plt.legend(title="Employment Type", bbox_to_anchor=(1.05, 1), loc="upper left")
plt.tight_layout()
plt.show()

# Visualization 2: Top 10 Employee Residences
plt.figure(figsize=(10, 6))
sns.barplot(
    x=top_residences.values,
    y=top_residences.index,
    palette="coolwarm"
)
plt.title("Top 10 Employee Residences")
plt.xlabel("Number of Employees")
plt.ylabel("Country")
plt.tight_layout()
plt.show()

# Visualization 3: Salary Trends Over Years
salary_trends_df = salary_trends.reset_index()
plt.figure(figsize=(10, 6))
sns.lineplot(
    x="work_year",
    y="salary_in_usd",
    data=salary_trends_df,
    marker="o",
    color="b"
)
plt.title("Salary Trends Over Years")
plt.xlabel("Year")
plt.ylabel("Average Salary in USD")
plt.tight_layout()
plt.show()
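
To compare average salaries across experience levels and employment types as numbers rather than points on a chart, a pivot table is one option. This is a sketch under assumptions: the `sample` rows and their category codes are made up so that the snippet is self-contained, and `df` would take their place in the notebook.

```python
import pandas as pd

# Hypothetical rows for illustration only; the codes and salaries are invented.
sample = pd.DataFrame({
    "experience_level": ["EN", "EN", "SE", "SE", "SE"],
    "employment_type": ["FT", "PT", "FT", "FT", "PT"],
    "salary_in_usd": [60000, 30000, 150000, 170000, 80000],
})

# Mean salary for each (experience level, employment type) combination.
avg_salary = sample.pivot_table(
    index="experience_level",
    columns="employment_type",
    values="salary_in_usd",
    aggfunc="mean",
)

print(avg_salary)
```

Combinations that never occur in the data appear as `NaN` in the table, which makes sparse groups easy to spot.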



## Conclusions

1. **Top Residences**: Most employees in the dataset reside in the United States, followed by countries such as Great Britain and India.
2. **Salary Trends**: The average salary in USD has increased steadily from one work year to the next.
3. **Experience and Employment Types**: Senior professionals and full-time employees tend to earn the highest salaries.

This analysis provides key insights into the data science job market.
