# Prediction Using Supervised ML
### Author : Shubhranshu Shivam

#### In this task, we predict the percentage score a student achieves from the number of hours they studied.
##### This is a simple linear regression problem because it involves just two variables: one input (hours studied) and one output (percentage scored).
##### Dataset used - http://bit.ly/w-data


In [None]:
# importing all the required modules
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

In [None]:
# loading dataset from the given URL

student_data = pd.read_csv('https://raw.githubusercontent.com/AdiPersonalWorks/Random/master/student_scores%20-%20student_scores.csv')
print("Successfully loaded data...\n")
print(student_data.head())

#### Next, we create a 2-D scatter plot of the dataset to see whether a relationship between the two variables is visible by eye.

In [None]:
# plotting the dataset

student_data.plot(x='Hours', y='Scores', style='o')
plt.title('Hours vs Percentage')
plt.xlabel('Number of Hours studied')
plt.ylabel('Percentage Scored')
plt.show()

### Preparing the data
#### The next step is to divide the data into attributes (inputs) and labels (outputs).

In [None]:
# preparing the dataset

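X = student_data.iloc[:, :-1].values  # every column except the last (Hours), kept as a 2-D array
y = student_data.iloc[:, 1].values    # second column (Scores), as a 1-D array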
print(X)
print(y)
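
The difference in shape between the attributes and the labels matters, because scikit-learn estimators expect a 2-D feature array and a 1-D target array. The sketch below illustrates this on a small made-up DataFrame; the name `toy` and its values are hypothetical and are not taken from the real dataset.

```python
import pandas as pd

# Hypothetical stand-in for the real dataset (values are made up for illustration)
toy = pd.DataFrame({"Hours": [1.0, 2.0, 3.0, 4.0], "Scores": [12, 25, 31, 44]})

X_toy = toy.iloc[:, :-1].values  # slicing columns keeps a 2-D array: (n_samples, 1)
y_toy = toy.iloc[:, 1].values    # selecting one column gives a 1-D array: (n_samples,)

print(X_toy.shape)
print(y_toy.shape)
```

Selecting the attribute column with a slice (`:-1`) rather than a single index is what keeps `X` two-dimensional.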

#### Now that the data is divided into attributes and labels, the next step is to split it into training and validation sets. I am using the scikit-learn package to do this.

In [None]:
# importing and splitting of datasets

from sklearn.model_selection import train_test_split

# test_size is left at its default, so 25% of the rows are held out for validation
x_train, val_x, y_train, val_y = train_test_split(X, y, random_state=0)
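
As a rough illustration of how the split sizes work out, the sketch below splits 20 made-up samples with an explicit `test_size=0.25`, which matches the scikit-learn default. The arrays `X_demo` and `y_demo` are assumptions for the example, not the student data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# 20 made-up samples with a single feature (illustration only)
X_demo = np.arange(20, dtype=float).reshape(-1, 1)
y_demo = 2.0 * X_demo.ravel() + 1.0

# Hold out 25% of the rows; random_state makes the shuffle reproducible
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=0
)

print(len(X_tr), len(X_te))
```

Fixing `random_state` means the same rows land in the validation set on every run, which keeps the later error figures comparable between runs.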

### Training the Algorithm
#### Now that we have split our dataset, it's time to train our algorithm. I am using the scikit-learn package to do this.

In [None]:
# importing and training the linear regression model

from sklearn.linear_model import LinearRegression

student_model = LinearRegression()
student_model.fit(x_train, y_train)
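
After fitting, a `LinearRegression` model stores the learned slope in `coef_` and the learned offset in `intercept_`. The sketch below fits a separate model on made-up, noise-free data generated from y = 3x + 2, so the recovered parameters are easy to check. The names `demo_model`, `X_demo` and `y_demo` are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up, noise-free data following y = 3x + 2 (illustration only)
X_demo = np.array([[1.0], [2.0], [3.0], [4.0]])
y_demo = 3.0 * X_demo.ravel() + 2.0

demo_model = LinearRegression().fit(X_demo, y_demo)

slope = demo_model.coef_[0]        # one coefficient per feature
intercept = demo_model.intercept_  # scalar offset

print(slope, intercept)
```

On real, noisy data such as the student scores, the fitted values are the least-squares estimates rather than exact parameters.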

In [None]:
# Computing the regression line: y = coef * x + intercept

line = student_model.coef_*X + student_model.intercept_

# Plotting the full dataset together with the fitted line

plt.scatter(X,y)
plt.plot(X, line, color='red')
plt.show()
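
The line computed by hand from `coef_` and `intercept_` should be the same as what `predict` returns for the same inputs. The sketch below checks this on a small made-up dataset; all names and values here are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up toy data, not the student dataset
X_demo = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y_demo = np.array([11.0, 19.0, 32.0, 41.0, 48.0])

demo_model = LinearRegression().fit(X_demo, y_demo)

# Manual line: broadcasting a (1,) coefficient against (5, 1) inputs gives shape (5, 1)
manual_line = demo_model.coef_ * X_demo + demo_model.intercept_

# predict() returns a 1-D array of shape (5,)
predicted_line = demo_model.predict(X_demo)

same = np.allclose(manual_line.ravel(), predicted_line)
print(same)
```

Either form can be passed to `plt.plot`; calling `predict` has the advantage of still working if more features are added later.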

### Making Predictions

#### Now I am going to predict the scores for the validation set using the trained model.

In [None]:
# predicting the values

print(val_x)
y_pred = student_model.predict(val_x)
print(y_pred)

In [None]:
# comparing actual vs. predicted

df = pd.DataFrame({'Original': val_y, 'Predicted':y_pred})
df

In [None]:
# testing with our own data.
# 9.25 hours in this case, as given in the task.
hrs = [[9.25]]  # predict() expects a 2-D input: one row, one feature
user_pred = student_model.predict(hrs)
print("Number of hours =", hrs[0][0])
print("Percentage Scored =", user_pred[0])

### Evaluating the Model

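#### We calculate the Mean Absolute Error (MAE) on the validation set to measure the performance of the model. A lower MAE means the predictions are, on average, closer to the actual scores.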


In [None]:
# importing the mean_absolute_error function from Scikit Learn package and evaluating the performance

from sklearn.metrics import mean_absolute_error
print("The MAE is",mean_absolute_error(val_y,y_pred))
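
The MAE is the average of the absolute differences between the actual and predicted values. The sketch below computes it by hand with NumPy on three made-up value pairs and compares the result with `mean_absolute_error`. The arrays `actual` and `predicted` are hypothetical and are not results from the model above.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error

# Made-up values for illustration only
actual = np.array([20.0, 30.0, 50.0])
predicted = np.array([22.0, 27.0, 50.0])

# Absolute errors are 2, 3 and 0, so the mean is 5 / 3
mae_manual = np.mean(np.abs(actual - predicted))
mae_sklearn = mean_absolute_error(actual, predicted)

print(mae_manual, mae_sklearn)
```

Because MAE is expressed in the same units as the target, a value of about 1.67 here would read as "off by roughly 1.67 percentage points on average".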