# **Prediction using Supervised ML**

### **Task: Predict a student's percentage score from the number of hours studied.**

**Libraries Used:** NumPy, pandas, Matplotlib, scikit-learn

● Dataset can be found at http://bit.ly/w-data

● What will be the predicted score if a student studies for 9.25 hrs/day?

**Task completed for the Graduate Rotational Internship Program offered by The Sparks Foundation**

**Task Submitted By - Hardik Jain**

## Importing Libraries and Dataset

In [None]:
# Importing the required libraries

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

In [None]:
# Loading the dataset

dataset = pd.read_csv("http://bit.ly/w-data") 

In [None]:
# View dataset 

dataset.head()

In [None]:
# Analyze the dataset by plotting the graph

dataset.plot(x='Hours', y='Scores', style='o')  
plt.title('Hours vs Percentage')  
plt.xlabel('Hours Studied')  
plt.ylabel('Percentage Score')  
plt.show()

**The scatter plot above shows a positive, roughly linear relationship between the number of hours studied and the percentage score.**
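The visual impression of a linear relationship can also be checked numerically with Pearson's correlation coefficient. The sketch below uses a few made-up values (the `hours` and `scores` arrays are hypothetical, not rows from the actual dataset) only to show the calculation.

```python
import numpy as np

# Hypothetical sample values, NOT taken from the actual dataset
hours = np.array([1.5, 2.5, 3.5, 5.0, 6.5, 8.0])
scores = np.array([20.0, 28.0, 38.0, 50.0, 66.0, 80.0])

# np.corrcoef returns the 2x2 correlation matrix;
# the off-diagonal entry is Pearson's r between the two arrays
r = np.corrcoef(hours, scores)[0, 1]
print("Pearson correlation: {:.3f}".format(r))
```

A value close to +1 indicates a strong positive linear relationship. On the notebook's own data, `dataset['Hours'].corr(dataset['Scores'])` should give the equivalent figure.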


## Preparing the data

In [None]:
# Dividing the data into "attributes" (inputs, the hours) and "labels" (outputs, the scores)

X = dataset.iloc[:,0].values
y = dataset.iloc[:,-1].values
print(X)
print(y)

# Reshaping X into a 2-D array of shape (n_samples, 1), as scikit-learn expects

X = X.reshape((len(X),1))
print(X)

In [None]:
# Splitting the data into training (75%) and test (25%) sets

from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.25, random_state=0)
print(X_train)
print(y_train)
print(X_test)
print(y_test)
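To make the split less of a black box, here is a rough NumPy-only sketch of what a shuffled 75/25 split does. It is an illustration of the idea under simple assumptions, not scikit-learn's actual implementation, and the arrays are hypothetical. The seed used here will not reproduce the rows chosen by `train_test_split(..., random_state=0)`.

```python
import numpy as np

# Hypothetical feature/label arrays with 8 samples (illustration only)
X = np.arange(8, dtype=float).reshape(-1, 1)
y = 10.0 * np.arange(8, dtype=float)

rng = np.random.default_rng(0)       # fixed seed, so the shuffle is repeatable
indices = rng.permutation(len(X))    # shuffled row indices

n_test = int(np.ceil(0.25 * len(X)))  # number of rows held out for testing
test_idx, train_idx = indices[:n_test], indices[n_test:]

# Index X and y with the same indices so each input stays paired with its label
X_train, X_test = X[train_idx], X[test_idx]
y_train, y_test = y[train_idx], y[test_idx]

print(X_train.shape, X_test.shape)
```

The key point is that inputs and labels are shuffled together and that no row appears in both sets.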

## Training the Model

In [None]:
# We will fit a Linear Regression model to the training data

from sklearn.linear_model import LinearRegression
regress = LinearRegression()
regress.fit(X_train,y_train)
print("Training Completed")
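With a single input feature, ordinary least squares has a simple closed-form solution, which is what a linear regression fit amounts to in this case. The following sketch computes the slope and intercept by hand on hypothetical values (the `x` and `y` arrays are made up for illustration) and cross-checks them against `np.polyfit`.

```python
import numpy as np

# Hypothetical training values, for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([12.0, 21.0, 33.0, 41.0, 52.0])

# Ordinary least squares with one feature:
#   slope     = sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
#   intercept = mean(y) - slope * mean(x)
x_mean, y_mean = x.mean(), y.mean()
slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
intercept = y_mean - slope * x_mean

# Cross-check with NumPy's degree-1 polynomial fit (highest power first)
check_slope, check_intercept = np.polyfit(x, y, 1)

print("slope = {:.3f}, intercept = {:.3f}".format(slope, intercept))
```

In the notebook, the corresponding fitted values are available after training as `regress.coef_[0]` and `regress.intercept_`.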

In [None]:
# Computing the fitted regression line over all data points

line = regress.coef_*X+regress.intercept_

# Plotting the full dataset (train and test) together with the regression line
plt.scatter(X, y)
plt.plot(X, line)
plt.show()

## Making Predictions

In [None]:
# Making Predictions

y_pred = regress.predict(X_test)
print(y_pred)

In [None]:
# Comparing Actual vs Predicted Data

df = pd.DataFrame({'Actual': y_test, 'Predicted': y_pred})  
df 

In [None]:
# Predicting the score if a student studies for 9.25 hrs/day

hours = 9.25
x_new = np.array(hours).reshape((-1, 1))
new_pred = regress.predict(x_new)

print("No of Hours = {}".format(hours))
print("Predicted Score = {}".format(new_pred[0]))

## Evaluating the model
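The final step is to evaluate the performance of the model. Here we use the mean absolute error: the average absolute difference between the actual and predicted scores on the test set.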

In [None]:
from sklearn import metrics  

print('Mean Absolute Error:',metrics.mean_absolute_error(y_test, y_pred)) 
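Mean absolute error is only one view of model quality. As a hedged sketch, the block below computes MAE alongside two other common regression metrics, RMSE and R², directly with NumPy. The `y_true` and `y_hat` arrays are hypothetical values, not results from this notebook.

```python
import numpy as np

# Hypothetical actual and predicted scores, for illustration only
y_true = np.array([20.0, 30.0, 60.0, 70.0])
y_hat = np.array([18.0, 33.0, 58.0, 73.0])

errors = y_true - y_hat

mae = np.mean(np.abs(errors))         # mean absolute error
rmse = np.sqrt(np.mean(errors ** 2))  # root mean squared error

# R^2: share of the variance in y_true explained by the predictions
ss_res = np.sum(errors ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot

print("MAE  = {:.3f}".format(mae))
print("RMSE = {:.3f}".format(rmse))
print("R^2  = {:.3f}".format(r2))
```

In the notebook itself, `metrics.mean_squared_error(y_test, y_pred)` and `metrics.r2_score(y_test, y_pred)` from scikit-learn provide the squared-error and R² figures for the actual test set.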