# Review of the Notebook:

Data scientists' pay varies substantially. Entry-level professionals with a few years of experience can earn between $60,000 and $90,000 annually, while mid-level professionals can make between $90,000 and $130,000. Senior data scientists, or those with substantial knowledge and experience, can make between $130,000 and well over $200,000 annually.

Location: Location significantly influences data scientists' salaries. Major tech hubs with strong demand for data scientists, such as San Francisco, New York City, Seattle, and Boston, typically pay more because of the high cost of living and competition for talent. In smaller towns, or in places where demand is lower, salaries may be lower.

Industry: Data scientists work in many industries, including technology, banking, healthcare, e-commerce, and consulting, and salary levels differ among them. For instance, data scientists in banking and technology frequently earn more than their counterparts in government or non-profit institutions.

Skills and Experience: Data scientists with specialized knowledge and advanced degrees, such as a Ph.D. in a related discipline, typically earn more. Expertise in statistical analysis, machine learning, and data visualization, as well as in programming languages such as Python and R, can also affect salary.

Benefits and Perks: On top of base salary, data scientists frequently receive health insurance, retirement plans, bonuses, stock options, flexible work schedules, and professional development opportunities. Larger organizations and big technology companies may offer more comprehensive benefit packages.

Supply and the Job Market: Growing reliance on data-driven decision-making across industries keeps data scientists in high demand, and the job market is very competitive.

Keep in mind that the figures above are only a broad outline; actual pay depends on a person's individual situation and on the requirements of each job. For a more precise picture of data scientist compensation, look at the most recent salary trends for your particular location and industry.

# Data Dictionary

<div style = "color: Black; display: fill;
              border-radius: 5px;
              background-color: #F9D371;
              font-size: 100%;
              font-family: Verdana">
    
<p style = "padding: 7px; color: Black;">
    <ul> 📌 <b>Work_year</b> - The year in which the salary was paid<br>
         📌 <b>Experience_level</b> - The level of experience, such as Junior, Senior, or Lead<br>
         📌 <b>Employment_type</b> - The type of employment, such as Full-time or Contract<br>
         📌 <b>Job_title</b> - The specific job title or role, such as Data Analyst or Data Scientist<br>
         📌 <b>Salary</b> - The salary amount for the given job, in the currency given by Salary_currency<br>
         📌 <b>Salary_currency</b> - The currency in which the salary is denominated<br>
         📌 <b>Salary_in_usd</b> - The equivalent salary amount converted to US dollars (USD) for comparison purposes<br>
         📌 <b>Employee_residence</b> - The country or region where the employee resides<br>
         📌 <b>Remote_ratio</b> - The percentage of remote work offered in the job<br>
         📌 <b>Company_location</b> - The location of the company or organization<br>
         📌 <b>Company_size</b> - The size of the company or organization<br>
    </ul>
    <p style = "padding: 3px; color: Black;"></p>
</div>

# Importing Necessary Libraries

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from IPython.display import Image
import plotly.express as px

# Basic statistics and Data Reading

In [None]:
# Path to the dataset inside the Kaggle environment.
DS_file_path = "/kaggle/input/data-science-salaries-2023/ds_salaries.csv"
Data_science = pd.read_csv(DS_file_path)

In [None]:
Data_science

# Exploratory Data Analysis (EDA)

In [None]:
Data_science.head()

In [None]:
Data_science.describe()

In [None]:
Data_science.columns

In [None]:
Data_science.info()

In [None]:
dtypes = pd.DataFrame(Data_science.dtypes, columns = ["DataTypes"])
dtypes

In [None]:
print(f"Shape of the dataset: {Data_science.shape[0]} rows and {Data_science.shape[1]} columns.")


# Checking whether any duplicate rows are present in the dataset

In [None]:
Data_science.duplicated().sum()
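
If the count above is non-zero, the repeated rows can be inspected before deciding whether to drop them. The following is only a minimal sketch: `sample` is a small hypothetical frame standing in for `Data_science`, and its values are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for Data_science; the values are illustrative only.
sample = pd.DataFrame({
    "job_title": ["Data Scientist", "Data Scientist", "Data Analyst"],
    "salary_in_usd": [120000, 120000, 80000],
})

# Count rows that exactly repeat an earlier row.
n_duplicates = sample.duplicated().sum()

# Show every copy of a repeated row, including its first occurrence.
repeated_rows = sample[sample.duplicated(keep=False)]

# Keep the first occurrence of each row and renumber the index.
deduplicated = sample.drop_duplicates().reset_index(drop=True)

print(n_duplicates, len(repeated_rows), len(deduplicated))
```

Whether to drop such rows is a judgment call: in a salary dataset, two different respondents can legitimately report identical values.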

# Checking the number of unique values of different features

In [None]:
Data_science["salary"].nunique()

In [None]:
Data_science["work_year"].nunique()

In [None]:
Data_science["salary_in_usd"].nunique()

In [None]:
Data_science["remote_ratio"].nunique()
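
The per-column checks above can also be done in a single call, since `DataFrame.nunique()` returns the count for every column at once. A minimal sketch, using a small hypothetical `sample` frame (made-up values) in place of `Data_science`:

```python
import pandas as pd

# Hypothetical stand-in for Data_science; the values are illustrative only.
sample = pd.DataFrame({
    "work_year": [2022, 2023, 2023, 2023],
    "salary_in_usd": [90000, 120000, 120000, 150000],
    "remote_ratio": [0, 100, 100, 50],
})

# Number of distinct values in every column, largest first.
unique_counts = sample.nunique().sort_values(ascending=False)
print(unique_counts)
```

Columns with very few distinct values are usually better treated as categories than as continuous numbers.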

# Checking for Missing Values with a Heatmap

In [None]:
plt.figure(figsize = (20, 5))
sns.heatmap(Data_science.isnull());
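
A heatmap gives a quick visual impression, but a small number of missing cells is easy to overlook in it. A numeric summary per column can complement the plot. This is a minimal sketch on a hypothetical `sample` frame with made-up values, not on the actual dataset:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for Data_science; the values are illustrative only.
sample = pd.DataFrame({
    "salary_in_usd": [90000, np.nan, 150000, 110000],
    "company_size": ["S", "M", None, None],
})

# Number of missing cells per column.
missing_count = sample.isnull().sum()

# Share of missing cells per column, as a percentage.
missing_percent = (sample.isnull().mean() * 100).round(1)

missing_summary = pd.DataFrame({"missing": missing_count, "percent": missing_percent})
print(missing_summary)
```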

# Checking Mean, Median, Maximum and Minimum salary

In [None]:
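# salary_in_usd is used because the raw salary column mixes currencies.
print("Mean salary (USD):", round(Data_science["salary_in_usd"].mean()))
print("Median salary (USD):", round(Data_science["salary_in_usd"].median()))
print("Highest salary (USD):", round(Data_science["salary_in_usd"].max()))
print("Lowest salary (USD):", round(Data_science["salary_in_usd"].min()))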
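
Overall figures hide the differences between groups that the introduction describes, such as experience level. The same statistics can be computed per group with `groupby`. A minimal sketch, assuming a hypothetical `sample` frame; the level labels `EN`, `MI`, and `SE` and all amounts are made up for illustration:

```python
import pandas as pd

# Hypothetical stand-in for Data_science; labels and amounts are illustrative only.
sample = pd.DataFrame({
    "experience_level": ["EN", "EN", "MI", "SE", "SE", "SE"],
    "salary_in_usd": [60000, 80000, 100000, 140000, 160000, 180000],
})

# Count, mean and median salary per experience level, lowest median first.
by_level = (
    sample.groupby("experience_level")["salary_in_usd"]
    .agg(["count", "mean", "median"])
    .sort_values("median")
)
print(by_level)
```

The median is reported next to the mean because salary distributions tend to be skewed by a few very high values.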

# Lowest 10 salaries present in the dataset

In [None]:
# Sorted on salary_in_usd so that amounts in different currencies are comparable.
Data_science["salary_in_usd"].sort_values()[:10]

# Highest 10 salaries present in the dataset

In [None]:
# Sorted on salary_in_usd so that amounts in different currencies are comparable.
Data_science["salary_in_usd"].sort_values(ascending = False)[:10]
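
Sorting a single column returns the amounts but not the rows they belong to. `DataFrame.nlargest` and `DataFrame.nsmallest` keep the other columns, which shows which job titles sit at each extreme. A minimal sketch on a hypothetical `sample` frame with made-up values:

```python
import pandas as pd

# Hypothetical stand-in for Data_science; the values are illustrative only.
sample = pd.DataFrame({
    "job_title": ["Data Analyst", "Data Scientist", "ML Engineer", "Data Engineer"],
    "salary_in_usd": [80000, 150000, 170000, 120000],
})

# Full rows for the two highest and the two lowest salaries.
top_two = sample.nlargest(2, "salary_in_usd")
bottom_two = sample.nsmallest(2, "salary_in_usd")

print(top_two)
print(bottom_two)
```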

# Bar plot of the 25 most frequent salary values

In [None]:
plt.figure(figsize = (20, 5))
ax = Data_science["salary"].value_counts()[:25].plot(kind = 'bar',
                                              color = "crimson")

for p in ax.patches:
    ax.annotate(int(p.get_height()), (p.get_x() + 0.25, p.get_height() + 1), ha = 'center', va = 'bottom', color = 'black')

# Distribution of company_size with the displot function (with KDE)

In [None]:
sns.displot(Data_science["company_size"], kde=True)
plt.show()

# Distribution of remote_ratio with the displot function (without KDE)

In [None]:
sns.displot(Data_science["remote_ratio"], color="r")
plt.show()

# Distribution of remote_ratio with the displot function (with KDE)

In [None]:
sns.displot(Data_science["remote_ratio"], kde=True)
plt.show()
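
If `remote_ratio` turns out to take only a handful of distinct values, a table of shares can be easier to read than a histogram or a density curve. A minimal sketch, assuming a hypothetical `sample` frame whose values are made up for illustration:

```python
import pandas as pd

# Hypothetical stand-in for Data_science; the values are illustrative only.
sample = pd.DataFrame({"remote_ratio": [0, 0, 50, 100, 100, 100, 100, 0]})

# Percentage of rows at each remote_ratio value, ordered by the value itself.
share = (
    sample["remote_ratio"]
    .value_counts(normalize=True)
    .sort_index()
    .mul(100)
    .round(1)
)
print(share)
```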

# Scatter plot of remote_ratio by experience_level and company_size

In [None]:
sns.scatterplot(
    data=Data_science,
    x="experience_level",
    y="remote_ratio",
    hue="company_size",    # colour, marker style and marker size all encode company_size
    style="company_size",
    size="company_size",
    sizes=(60, 40),
)
plt.show()

# Faceted plot of company_size and experience_level with the FacetGrid function

In [None]:
fg = sns.FacetGrid(Data_science, col="company_size")
fg.map(plt.scatter, "company_size", "experience_level").add_legend()
plt.show()
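
Because both `company_size` and `experience_level` are categorical, every row with the same pair of values is drawn on the same point, so a scatter plot cannot show how many rows each combination holds. A cross-tabulation gives those counts directly. A minimal sketch on a hypothetical `sample` frame; the category labels are made up for illustration:

```python
import pandas as pd

# Hypothetical stand-in for Data_science; the labels are illustrative only.
sample = pd.DataFrame({
    "company_size": ["S", "M", "M", "L", "L", "L"],
    "experience_level": ["EN", "MI", "SE", "SE", "SE", "EN"],
})

# Number of rows for every company_size / experience_level combination.
counts = pd.crosstab(sample["company_size"], sample["experience_level"])
print(counts)
```

The resulting table can be passed to `sns.heatmap(counts, annot=True)` if a visual form is preferred.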