# Prediction using Supervised Machine Learning
The task is to predict the percentage of marks a student is expected to score based on the number of hours they studied. Because there is one input variable (hours studied) and one output variable (score), this is a simple linear regression task.

Tools used: NumPy arrays, pandas, Matplotlib, and scikit-learn.

**Let's begin**

In [None]:
# Importing the required libraries. 

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

# First: Reading the data from source

In [None]:
dataset= pd.read_csv("../input/student-study-hour-v2/Student Study Hour V2.csv")
print("Dataset imported successfully.")

In [None]:
# Exploring the dataset.

dataset.shape

In [None]:
dataset.head()

In [None]:
dataset.describe()

# Second: Input data visualisation

In [None]:
# Plotting our data points on 2-D graph.

dataset.plot(x='Hours', y='Scores', style='o')
plt.title("Hours vs Percentage")
plt.xlabel("Hours Studied")
plt.ylabel("Percentage Score")
plt.show()

From the graph above, we can see a positive linear relationship between the number of hours studied and the percentage score.
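
A visual impression of linearity can be backed up with a number. The sketch below computes Pearson's correlation coefficient with NumPy; the `hours` and `scores` arrays are made-up illustrative values, not rows from the notebook's CSV.

```python
import numpy as np

# Illustrative values only (NOT taken from the notebook's dataset).
hours = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
scores = np.array([12.0, 25.0, 31.0, 44.0, 52.0])

# np.corrcoef returns the 2x2 correlation matrix;
# the off-diagonal entry is Pearson's r between the two arrays.
r = np.corrcoef(hours, scores)[0, 1]
print(f"Pearson r = {r:.3f}")
```

A value close to +1 indicates a strong positive linear relationship. On the real data, the same check would presumably be `dataset['Hours'].corr(dataset['Scores'])`, assuming the column names used in the plot above.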

# Third: Data preprocessing

This step divides the data into "attributes" (the inputs) and "labels" (the outputs to predict).

In [None]:
# Preparing the data.

x= dataset.iloc[:,:-1].values
y= dataset.iloc[:,1].values
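
The two `iloc` slices produce arrays of different shapes, which matters because scikit-learn expects a 2-D attribute matrix and a 1-D label vector. A minimal sketch on a hypothetical three-row frame (`demo`, `X_demo`, and `y_demo` are illustrative names and values, assuming the same two-column layout as the dataset):

```python
import pandas as pd

# Hypothetical stand-in for the notebook's dataset (made-up values).
demo = pd.DataFrame({"Hours": [1.5, 3.0, 4.5], "Scores": [20, 35, 50]})

X_demo = demo.iloc[:, :-1].values  # every column except the last: 2-D array
y_demo = demo.iloc[:, 1].values    # second column only: 1-D array

print(X_demo.shape, y_demo.shape)
```

Slicing with `:-1` keeps the column dimension, so the attributes stay 2-D even though there is only one feature.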

# Fourth: Model Training

This step splits the data into training and test sets using scikit-learn's built-in train_test_split() function and then trains the algorithm on the training set.

In [None]:
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=0)
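
To see what `test_size=0.2` does, here is a small sketch on ten synthetic samples (the `X_demo` and `y_demo` arrays are made up for illustration). It should hold out 20% of the rows for testing and keep each attribute paired with its label.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Ten synthetic samples following y = 2x + 1 (illustrative only).
X_demo = np.arange(10, dtype=float).reshape(-1, 1)
y_demo = 2.0 * X_demo.ravel() + 1.0

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0
)

print(len(X_tr), len(X_te))
# Rows are shuffled, but each x stays matched with its own y.
print(np.allclose(y_tr, 2.0 * X_tr.ravel() + 1.0))
```

Fixing `random_state` makes the shuffle reproducible, so repeated runs give the same split.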

In [None]:
# Training the algorithm.

from sklearn.linear_model import LinearRegression
regressor= LinearRegression()
regressor.fit(x_train, y_train)

print("Training complete.")

# Fifth: Plotting the line of regression

In [None]:
# To retrieve the intercept and slope.

print(regressor.intercept_)
print(regressor.coef_)
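
The intercept and slope define the fitted line `score = coef_ * hours + intercept_`. The sketch below fits a model on synthetic points that lie exactly on y = 2x + 3 (made-up data; `model` is an illustrative name) and checks that applying the equation by hand agrees with `predict()`.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic points lying exactly on y = 2x + 3 (illustrative only).
X_demo = np.array([[1.0], [2.0], [3.0], [4.0]])
y_demo = np.array([5.0, 7.0, 9.0, 11.0])

model = LinearRegression().fit(X_demo, y_demo)

# Apply the line equation by hand for x = 5 and compare with predict().
manual = model.coef_[0] * 5.0 + model.intercept_
predicted = model.predict(np.array([[5.0]]))[0]
print(manual, predicted)
```

Note that `coef_` is an array with one entry per feature, while `intercept_` is a scalar for a single-target fit like this one.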

In [None]:
# Plotting the line of regression.

line= regressor.coef_*x+regressor.intercept_

# Plotting the full dataset together with the fitted line.

plt.scatter(x, y)
plt.plot(x, line)
plt.show()

# Sixth: Making predictions

We have now trained our algorithm and it's time to make some predictions. To do so, we will use our test data.

In [None]:
# Making predictions.

print(x_test)
y_pred= regressor.predict(x_test)

# Seventh: Comparing actual result with the predicted model result

In [None]:
# Comparing actual values with predicted.

df= pd.DataFrame({'Actual': y_test, 'Predicted': y_pred})
df

In [None]:
# What will be the predicted score if a student studies for 9.25 hours/day?

hours = 9.25
test= np.array([hours])
test= test.reshape(-1,1)
own_pred = regressor.predict(test)
print("No of Hours = {}".format(hours))
print("Predicted Score = {}".format(own_pred[0]))

# Eighth: Evaluation of algorithm

In [None]:
# Evaluating the algorithm.

from sklearn import metrics
print("Mean Absolute Error= ", metrics.mean_absolute_error(y_test, y_pred))
print("Mean Squared Error= ", metrics.mean_squared_error(y_test, y_pred))
print("Root Mean Squared Error= ", np.sqrt(metrics.mean_squared_error(y_test, y_pred)))
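
For reference, the three metrics can be written out directly with NumPy. This is a sketch on made-up values (`y_true` and `y_hat` are illustrative, not the notebook's results) showing what each `metrics` call computes.

```python
import numpy as np

# Made-up actual and predicted values (illustrative only).
y_true = np.array([20.0, 30.0, 50.0])
y_hat = np.array([22.0, 27.0, 51.0])

errors = y_true - y_hat          # residuals: [-2, 3, -1]

mae = np.mean(np.abs(errors))    # mean absolute error: (2 + 3 + 1) / 3
mse = np.mean(errors ** 2)       # mean squared error: (4 + 9 + 1) / 3
rmse = np.sqrt(mse)              # root mean squared error, in score units

print(mae, mse, rmse)
```

MAE and RMSE are in the same units as the score, which makes them easier to interpret than MSE; RMSE penalises large errors more heavily than MAE.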

> ## **Let me know by e-mail if I made any mistakes, and leave an upvote if you like it**

## Thank you