# Salary Prediction Using Linear Regression

`Author:` [Syed Muhammad Ebad](https://www.kaggle.com/syedmuhammadebad)\
`Date:` 28-Sept-2024\
[Send me an email](mailto:mohammadebad1@hotmail.com)\
[Visit my GitHub profile](https://github.com/smebad)

## Introduction
In this notebook, we will explore a simple linear regression model for predicting salaries based on years of experience. The dataset consists of two columns: `YearsExperience` (the input feature) and `Salary` (the target we want to predict). This will demonstrate how to use machine learning techniques, specifically linear regression, to create a predictive model.

We will go through the following steps:
1. Importing the necessary libraries and loading the dataset.
2. Exploring and visualizing the data.
3. Training the linear regression model.
4. Making predictions based on user input.
5. Summarizing what we learned.

In [1]:
# importing the necessary libraries
import pandas as pd
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
import warnings
warnings.filterwarnings("ignore")

### Dataset Overview
The dataset used in this project contains information on employees' years of experience and their corresponding salaries. There are 30 entries, each representing one employee.

In [2]:
df = pd.read_csv("Salary_Data.csv")
df.head()

|   | YearsExperience | Salary  |
|---|-----------------|---------|
| 0 | 1.1             | 39343.0 |
| 1 | 1.3             | 46205.0 |
| 2 | 1.5             | 37731.0 |
| 3 | 2.0             | 43525.0 |
| 4 | 2.2             | 39891.0 |


### Checking for missing values in the dataset:

In [3]:
# checking out the null values
df.isnull().sum()

YearsExperience    0
Salary             0
dtype: int64

### Data Visualization
We will now visualize the relationship between `YearsExperience` and `Salary` to check if a linear pattern exists.

In [4]:
# Plotting the data; trendline="ols" overlays a least-squares fit (requires statsmodels)
fig = px.scatter(df, x="YearsExperience", y="Salary", trendline="ols")
fig.show()

### Model Training
Next, we will train a linear regression model to predict salaries based on the years of experience. We will split the data into training and testing sets so that the model can be checked against data it has not seen during training.
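
For reference, the model being fitted can be written as follows. The symbols are notation introduced here for illustration, not taken from the notebook: $x$ stands for `YearsExperience`, $\hat{y}$ for the predicted `Salary`, $\beta_0$ for the intercept, and $\beta_1$ for the slope.

$$
\hat{y} = \beta_0 + \beta_1 x
$$

`LinearRegression` fits ordinary least squares, so it chooses the two coefficients that minimize the sum of squared errors over the $n$ training rows:

$$
\min_{\beta_0,\, \beta_1} \sum_{i=1}^{n} \left( y_i - \beta_0 - \beta_1 x_i \right)^2
$$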

In [5]:
# Separating the feature (x) and the target (y)
x = df[["YearsExperience"]]
y = df[["Salary"]]

# Train-test split: hold out 20% of the rows for testing
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)
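
To see what the split does, here is a minimal sketch on a stand-in dataset. The frame `demo_df` and its values are hypothetical (generated here so the example runs without `Salary_Data.csv`); only the column names, the row count of 30, and the `train_test_split` arguments mirror the notebook.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for Salary_Data.csv: 30 rows, same column names
rng = np.random.default_rng(0)
years = np.round(np.linspace(1.0, 10.5, 30), 1)
demo_df = pd.DataFrame({
    "YearsExperience": years,
    "Salary": 25_000 + 9_500 * years + rng.normal(0, 5_000, size=30),
})

x_demo = demo_df[["YearsExperience"]]
y_demo = demo_df["Salary"]

# Same arguments as in the notebook cell
x_tr, x_te, y_tr, y_te = train_test_split(
    x_demo, y_demo, test_size=0.2, random_state=42
)

print(len(x_tr), len(x_te))
```

With 30 rows and `test_size=0.2`, this leaves 24 rows for training and 6 for testing, and no row appears in both sets.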

In [6]:
# model training
model = LinearRegression()
model.fit(x_train, y_train)
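
Once a model is fitted, it can help to look at the learned line and score it on the held-out rows. The sketch below is one way to do that, not part of the original notebook: it uses hypothetical synthetic data (`demo_df`) and passes the target as a Series so that `coef_` and `intercept_` are easy to read as scalars. In the notebook, `y` is a one-column DataFrame, so those attributes would carry an extra dimension.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data; the true relationship here is made up
rng = np.random.default_rng(0)
years = np.round(np.linspace(1.0, 10.5, 30), 1)
demo_df = pd.DataFrame({
    "YearsExperience": years,
    "Salary": 25_000 + 9_500 * years + rng.normal(0, 5_000, size=30),
})

x_demo = demo_df[["YearsExperience"]]
y_demo = demo_df["Salary"]  # Series, so coef_ has shape (1,)
x_tr, x_te, y_tr, y_te = train_test_split(
    x_demo, y_demo, test_size=0.2, random_state=42
)

demo_model = LinearRegression().fit(x_tr, y_tr)

# The fitted line: salary = intercept + slope * years
slope = float(demo_model.coef_[0])
intercept = float(demo_model.intercept_)

# Score the model on rows it did not see during training
y_pred = demo_model.predict(x_te)
r2 = r2_score(y_te, y_pred)
mae = mean_absolute_error(y_te, y_pred)

print(f"slope={slope:.2f}, intercept={intercept:.2f}")
print(f"test R^2={r2:.3f}, test MAE={mae:.2f}")
```

The same two metric calls should work on the notebook's own `x_test` and `y_test`, which the notebook creates but does not otherwise use.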

### Salary Prediction Function

We define a custom function `predict_salary()` to take the number of years of experience as input and return the corresponding predicted salary. This function uses the trained linear regression model to make predictions.

In [8]:
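def predict_salary(years_experience):
    # scikit-learn expects a 2-D array of shape (n_samples, n_features)
    features = np.array([[years_experience]])
    predicted_salary = model.predict(features)
    # y was a one-column DataFrame, so this is a one-element array, not a float
    return predicted_salary[0]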
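
As written, `predict_salary()` returns a one-element array. If a plain number is preferred, a variant along the following lines may be useful. This is a sketch under assumptions: the helper name `predict_salary_scalar`, the `toy` data, and `toy_model` are all hypothetical and chosen only so the example runs on its own.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical, noise-free toy data: salary = 1000 + 2000 * years
toy = pd.DataFrame({"YearsExperience": [1.0, 2.0, 3.0, 4.0]})
toy["Salary"] = 1000 + 2000 * toy["YearsExperience"]

# Target passed as a one-column DataFrame, as in the notebook
toy_model = LinearRegression().fit(toy[["YearsExperience"]], toy[["Salary"]])

def predict_salary_scalar(fitted_model, years_experience):
    """Return the prediction as a plain float (hypothetical helper)."""
    # Reuse the training column name so scikit-learn sees matching feature names
    features = pd.DataFrame({"YearsExperience": [years_experience]})
    # np.ravel flattens both (1,) and (1, 1) prediction arrays
    return float(np.ravel(fitted_model.predict(features))[0])

result = predict_salary_scalar(toy_model, 5)
print(f"Predicted salary: {result:,.2f}")
```

Passing a DataFrame with the original column name also avoids the feature-name warning that scikit-learn would otherwise raise when a model fitted on a DataFrame is given a bare NumPy array.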

### Making Predictions
Now that the model is trained, we can make predictions. For testing purposes, we'll predict the salary for an employee with 5 years of experience.

In [19]:
years_experience = 5  # Replace with any number of years for quick testing
predicted_salary = predict_salary(years_experience)
print(f"Predicted salary: {predicted_salary}")

Predicted salary: [72440.65962693]


### Summary
In this project, we built a simple linear regression model to predict salaries based on years of experience. We started by visualizing the data, which indicated a clear linear relationship. We then trained the model on 80% of the rows and used it to predict the salary for a new value of years of experience: roughly 72,441 for an employee with 5 years.