<a href="https://colab.research.google.com/github/itinasharma/MachineLearning/blob/main/SalaryPrediction.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Predicting Salary Based on Years of Experience

This notebook fits a simple linear regression model that predicts salary from years of experience. The dataset has two columns of interest: **YearsExperience** and **Salary**.
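
As a rough sketch of what "simple linear regression" means here: the model assumes salary is approximately a straight-line function of experience, and fitting picks the line with the smallest sum of squared errors. The symbols $m$ (slope) and $c$ (intercept) match the labels printed later in the notebook; $x_i$, $y_i$, $\hat{y}_i$, and $n$ are notation introduced only for this illustration, with $x_i$ the YearsExperience value and $y_i$ the Salary value of row $i$.

$$
\hat{y}_i = m\,x_i + c, \qquad (m, c) = \arg\min_{m,\,c} \sum_{i=1}^{n} \left(y_i - m\,x_i - c\right)^2
$$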

### Steps:
1. **Import Necessary Libraries**: Import pandas for data handling and scikit-learn for modeling.
2. **Load the Dataset**: Load the salary dataset from a CSV file.
3. **Preprocess the Data**: Separate the feature (X, years of experience) from the target variable (Y, salary).
4. **Train the Model**: Fit a linear regression model on the loaded data.
5. **Make Predictions**: Pass new values of years of experience to the model to predict the corresponding salaries.




In [10]:
# 1. Import necessary libraries
import pandas as pd

# 2. Load the data
file_path = '/content/sample_data/Salary_dataset.csv'

# Column names, used only if the CSV has no header row (see below)
columns = [
    'YearsExperience',      # Number of years of professional experience
    'Salary'                # Corresponding annual salary
]
df = pd.read_csv(file_path)

# If the file doesn't have headers, you can manually assign the column names:
# df = pd.read_csv(file_path, header=None, names=columns)

# Display the first few rows of the DataFrame.
# Note: the 'Unnamed: 0' column in the output appears to be a row index that was
# saved into the CSV; it is not a feature and is not used below.
print(df.head())

   Unnamed: 0  YearsExperience   Salary
0           0              1.2  39344.0
1           1              1.4  46206.0
2           2              1.6  37732.0
3           3              2.1  43526.0
4           4              2.3  39892.0
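
The `Unnamed: 0` column in the output above looks like a row index that was written into the CSV. A minimal sketch of one way to avoid it is shown here, using the first five rows from the output as inline data. The exact header line of the real file (a leading empty field) is an assumption, as are the names `csv_text`, `raw`, and `clean`.

```python
import io

import pandas as pd

# First five rows copied from the printed output; the leading empty header
# field is an assumption about how the real CSV stores the saved index.
csv_text = """,YearsExperience,Salary
0,1.2,39344.0
1,1.4,46206.0
2,1.6,37732.0
3,2.1,43526.0
4,2.3,39892.0
"""

# Default read: the unlabeled first column is named 'Unnamed: 0'
raw = pd.read_csv(io.StringIO(csv_text))

# index_col=0 tells pandas to use the first column as the row index instead
clean = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(raw.columns))
print(list(clean.columns))
```

With `index_col=0`, only the two columns of interest remain as data columns, so no later step has to remember to ignore the extra one.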


In [22]:
from sklearn.preprocessing import StandardScaler
# Initialize the scaler
scaler = StandardScaler()

# Standardize the relevant columns (z-score: zero mean, unit variance per column)
normalized_data = scaler.fit_transform(df[['YearsExperience', 'Salary']])

# Create a new DataFrame with the standardized data.
# Note: this is for inspection only; the model below is fitted on the unscaled df.
normalized_df = pd.DataFrame(normalized_data, columns=['YearsExperience', 'Salary'])

# Display the standardized DataFrame
print("\nNormalized Data:")
print(normalized_df)


Normalized Data:
    YearsExperience    Salary
0         -1.510053 -1.360113
1         -1.438373 -1.105527
2         -1.366693 -1.419919
3         -1.187494 -1.204957
4         -1.115814 -1.339781
5         -0.864935 -0.718307
6         -0.829096 -0.588158
7         -0.757416 -0.799817
8         -0.757416 -0.428810
9         -0.578216 -0.698013
10        -0.506537 -0.474333
11        -0.470697 -0.749769
12        -0.470697 -0.706620
13        -0.434857 -0.702020
14        -0.291498 -0.552504
15        -0.148138 -0.299217
16        -0.076458 -0.370043
17        -0.004779  0.262859
18         0.210261  0.198860
19         0.246100  0.665476
20         0.532819  0.583780
21         0.640339  0.826233
22         0.927058  0.938611
23         1.034577  1.402741
24         1.213777  1.240203
25         1.321296  1.097402
26         1.500496  1.519868
27         1.536336  1.359074
28         1.787215  1.721028
29         1.858894  1.701773
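
To make the scaling step concrete, here is a small sketch of the z-score computation that `StandardScaler` performs on each column: subtract the column mean and divide by the population standard deviation (`ddof=0`). It uses only the first five YearsExperience values from the dataset preview, so the numbers it prints will not match the table above, which was computed over all 30 rows. The variable names are illustrative.

```python
import numpy as np

# First five YearsExperience values from the dataset preview
years = np.array([1.2, 1.4, 1.6, 2.1, 2.3])

mean = years.mean()
std = years.std()  # NumPy's default ddof=0 (population std), as StandardScaler uses

# Standardize: the result has zero mean and unit variance
z = (years - mean) / std

# Invert the transform to recover the original values
recovered = z * std + mean

print(z)
print(recovered)
```

The inverse step matters if a model is ever trained on scaled salaries: its predictions would then need to be mapped back to the original units in the same way.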


In [17]:
X = df['YearsExperience']  # Feature: years of experience
Y = df['Salary']           # Target: annual salary

In [19]:
from sklearn.linear_model import LinearRegression

model = LinearRegression()
# scikit-learn expects a 2D feature array of shape (n_samples, n_features)
X = df['YearsExperience'].values.reshape(-1, 1)
model.fit(X, Y)
print("c ", model.intercept_)  # Intercept: predicted salary at 0 years of experience
print("m ", model.coef_)       # Slope: predicted salary increase per additional year

c  24848.203966523193
m  [9449.96232146]
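
For a single feature, the least-squares slope and intercept have a closed form, which can help demystify the values of `m` and `c` printed above. The sketch below computes them by hand on the first five rows from the dataset preview and cross-checks against `np.polyfit`. Because it uses five rows rather than the full dataset, its results will differ from the values printed above. The variable names are illustrative.

```python
import numpy as np

# First five rows from the dataset preview
x = np.array([1.2, 1.4, 1.6, 2.1, 2.3])                       # YearsExperience
y = np.array([39344.0, 46206.0, 37732.0, 43526.0, 39892.0])   # Salary

x_mean, y_mean = x.mean(), y.mean()

# Slope: sum of co-deviations of x and y divided by sum of squared deviations of x
m = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)

# Intercept: the fitted line passes through the point (x_mean, y_mean)
c = y_mean - m * x_mean

# Cross-check with NumPy's degree-1 polynomial fit (returns slope, then intercept)
m_check, c_check = np.polyfit(x, y, deg=1)

print("m ", m)
print("c ", c)
```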


In [21]:
# Predict the salary for 2 years of experience (the input must be a 2D array)
y_pred = model.predict([[2]])
print(y_pred)

[43748.12860943]
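
The prediction is simply the fitted line evaluated at 2 years. A short sketch that reproduces it from the intercept and slope printed earlier is shown below. The slope is copied from the rounded printout, so agreement is only expected to within rounding; the variable names are illustrative.

```python
import math

c = 24848.203966523193   # intercept printed by the fitted model
m = 9449.96232146        # slope printed by the fitted model (rounded in the printout)
years = 2

# Evaluate the fitted line: salary = c + m * years
predicted = c + m * years

print(predicted)
print(math.isclose(predicted, 43748.12860943, abs_tol=1e-6))
```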
