# LINEAR REGRESSION WITH PYTHON SCIKIT-LEARN

Simple linear regression predicts the value of a dependent variable (y) from a single independent variable (x) by fitting a straight line, $y = b_0 + b_1 x$, to a given set of data.
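
For a single feature, the least-squares line has a closed form: the slope is $b_1 = \sum (x_i - \bar{x})(y_i - \bar{y}) / \sum (x_i - \bar{x})^2$ and the intercept is $b_0 = \bar{y} - b_1 \bar{x}$. The sketch below computes both with NumPy on a small made-up dataset (the `hours` and `scores` values are illustrative, not the task data) and checks them against scikit-learn's `LinearRegression`, which for one feature should recover the same line up to floating-point error.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Illustrative data only (not the student_scores dataset)
hours = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
scores = np.array([12.0, 25.0, 31.0, 44.0, 52.0])

def fit_simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1 * x."""
    x_mean, y_mean = x.mean(), y.mean()
    # Slope: sum of cross-deviations divided by sum of squared x-deviations
    b1 = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # Intercept: the fitted line passes through the point (x_mean, y_mean)
    b0 = y_mean - b1 * x_mean
    return b0, b1

b0, b1 = fit_simple_ols(hours, scores)

# scikit-learn should find the same line (it expects a 2-D feature array)
reg = LinearRegression().fit(hours.reshape(-1, 1), scores)
print(b0, b1)
print(reg.intercept_, reg.coef_[0])
```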

# THE SPARKS FOUNDATION

# TASK 1: PREDICTION USING SUPERVISED ML

In this regression task, we will predict the percentage score a student is expected to obtain based on the number of hours they studied. This is a simple linear regression task because it involves just two variables: one feature (hours studied) and one target (percentage score).

In [None]:
#IMPORTING ALL THE NECESSARY LIBRARIES
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn import metrics

In [None]:
#LOADING THE DATA
data = pd.read_csv('https://raw.githubusercontent.com/AdiPersonalWorks/Random/master/student_scores%20-%20student_scores.csv')

In [None]:
#PRINTING THE FIRST FEW ENTRIES OF THE DATA
data.head()


# Visualizing the dataset using Matplotlib


In [None]:
#VISUALIZING THE DATA
plt.scatter(x=data['Hours'], y=data['Scores'])
plt.xlabel('Hours studied', fontsize=15)
plt.ylabel('Percentage score obtained', fontsize=15)
plt.title('Percentage score vs Hours studied', fontsize=15)
plt.show()

# Dividing the dataset into features and targets


In [None]:
#ASSIGNING VALUES TO INPUTS AND TARGETS
y = data['Scores']
x = data['Hours']

In [None]:
#RESHAPE THE VALUES OF x
x_matrix = x.values.reshape(-1,1)
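
scikit-learn estimators expect the feature matrix to be 2-D with shape `(n_samples, n_features)`, which is why `x` is reshaped: `reshape(-1, 1)` keeps every row (`-1` lets NumPy infer the count) and produces a single column. The sketch below illustrates this on a made-up stand-in for `data['Hours']`, and also shows that selecting with a list of column names (double brackets) yields an equivalent 2-D input as a DataFrame. One design note: if you fit on such a DataFrame, scikit-learn (1.0 and later) records the column name and warns when later given a plain list, so the reshape approach keeps the later `predict` calls warning-free.

```python
import pandas as pd

# Hypothetical stand-in for data['Hours']
hours = pd.Series([2.5, 5.1, 3.2, 8.5])

x_1d = hours.values            # shape (4,): 1-D, rejected by LinearRegression.fit
x_2d = x_1d.reshape(-1, 1)     # shape (4, 1): 4 samples, 1 feature

# Alternative: double brackets select a one-column DataFrame, also 2-D
frame = pd.DataFrame({'Hours': [2.5, 5.1, 3.2, 8.5]})
x_df = frame[['Hours']]

print(x_1d.shape, x_2d.shape, x_df.shape)
```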

In [None]:
#SPLITTING THE DATASET INTO TRAINING AND TESTING SETS
x_train, x_test, y_train, y_test = train_test_split(x_matrix, y, test_size=0.2, random_state=2021)
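
`test_size=0.2` holds out 20% of the rows for testing, and a fixed `random_state` makes the shuffle reproducible, so rerunning the notebook yields the same split. A quick check on synthetic data (25 rows, a size assumed here purely for illustration):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic data: 25 samples, 1 feature
X = np.arange(25, dtype=float).reshape(-1, 1)
y = 3.0 * X.ravel() + 1.0

# Same arguments as the notebook's split, called twice
X_tr1, X_te1, y_tr1, y_te1 = train_test_split(X, y, test_size=0.2, random_state=2021)
X_tr2, X_te2, y_tr2, y_te2 = train_test_split(X, y, test_size=0.2, random_state=2021)

print(len(X_tr1), len(X_te1))          # training rows, test rows
print(np.array_equal(X_te1, X_te2))    # same random_state -> identical split
```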

In [None]:
#PRINTING THE DIMENSIONS OF THE TRAINING AND TESTING SETS
print("Shape of training features:", x_train.shape)
print("Shape of testing features:", x_test.shape)
print("Shape of training targets:", y_train.shape)
print("Shape of testing targets:", y_test.shape)


# Training the Algorithm

In [None]:
#FITTING THE MODEL USING THE TRAINING DATASETS
reg = LinearRegression()
reg.fit(x_train,y_train)

In [None]:
intercept = reg.intercept_
coefficient = reg.coef_[0]  # coef_ is an array with one weight per feature

In [None]:
#PRINTING THE INTERCEPT AND COEFFICIENT VALUES
print('The intercept of this linear regression is', intercept)
print('The coefficient (slope) of this linear regression is', coefficient)
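
The fitted model is the line `score = intercept_ + coef_[0] * hours`, so `predict` amounts to applying that formula to each row. A sketch on synthetic data (not the task dataset) confirming this:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic, roughly linear data for illustration
rng = np.random.default_rng(0)
hours = rng.uniform(1, 9, size=20).reshape(-1, 1)
scores = 10 * hours.ravel() + 2 + rng.normal(0, 3, size=20)

model = LinearRegression().fit(hours, scores)

# coef_ holds one weight per feature; with a single feature, take element 0
manual = model.intercept_ + model.coef_[0] * hours.ravel()
print(np.allclose(manual, model.predict(hours)))
```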

In [None]:
#PLOTTING THE REGRESSION LINE OVER THE DATA
plt.scatter(x, y, label='data')
regression_line = reg.coef_[0] * x + reg.intercept_  # y = b0 + b1 * x
plt.plot(x, regression_line, lw=4, c='orange', label='regression line')
plt.xlabel('Hours studied', fontsize=15)
plt.ylabel('Percentage score obtained', fontsize=15)
plt.legend()
plt.show()

# Making Predictions

In [None]:
#PREDICTING THE TARGET VALUES USING THE TESTING SET
y_predicted = reg.predict(x_test)

In [None]:
#CREATING A DATAFRAME AND DISPLAYING THE ACTUAL AND THE PREDICTED VALUES OF THE TEST DATASET
test_data = pd.DataFrame({'Actual Marks': y_test,'Predicted Marks': y_predicted})
test_data

# Checking how well the model fits (R² score)

In [None]:
#COMPUTING THE R² SCORE ON THE TEST SET
#score() returns R², the coefficient of determination, not a classification accuracy
r2 = reg.score(x_test, y_test)
print('R² score on the test set:', r2)
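
`LinearRegression.score` returns the coefficient of determination, $R^2 = 1 - SS_{res} / SS_{tot}$: 1.0 is a perfect fit, 0.0 means the model does no better than always predicting the mean, and it can be negative for a model worse than that. A sketch computing it by hand on made-up values and comparing the result with `sklearn.metrics.r2_score`:

```python
import numpy as np
from sklearn.metrics import r2_score

# Made-up actual and predicted scores (illustrative only)
y_true = np.array([20.0, 47.0, 30.0, 85.0, 62.0])
y_pred = np.array([17.5, 50.2, 33.1, 80.9, 60.0])

ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares around the mean
r2_manual = 1 - ss_res / ss_tot

print(r2_manual, r2_score(y_true, y_pred))
```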


# What will be the predicted score if a student studies for 9.25 hours per day?

In [None]:
#PREDICTING THE SCORE OF A STUDENT WHO STUDIED FOR 9.25 HOURS
hours = 9.25
predicted_score = reg.predict([[hours]])[0]  # predict expects 2-D input: one row, one feature
print('The predicted score for', hours, 'hours of study is', predicted_score)


# Calculating the mean absolute error 


In [None]:
#CALCULATING THE MEAN ABSOLUTE ERROR
print('Mean absolute error on the test set:', metrics.mean_absolute_error(y_test, y_predicted))
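
Mean absolute error is the average of |actual - predicted|, expressed in the same units as the target (percentage points here). The sketch below computes it by hand on made-up values and checks it against `metrics.mean_absolute_error`; it also adds root mean squared error, an extra metric not used in the original notebook, which penalises large individual errors more heavily.

```python
import numpy as np
from sklearn import metrics

# Made-up actual and predicted scores (illustrative only)
y_true = np.array([20.0, 47.0, 30.0, 85.0, 62.0])
y_pred = np.array([17.5, 50.2, 33.1, 80.9, 60.0])

# MAE: mean of the absolute residuals
mae_manual = np.mean(np.abs(y_true - y_pred))
# RMSE: square root of the mean squared error
rmse = np.sqrt(metrics.mean_squared_error(y_true, y_pred))

print(mae_manual, metrics.mean_absolute_error(y_true, y_pred), rmse)
```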