# Author : Akash Kothare

Data Science & Business Analytics Intern (Batch - Dec'20)

## Task 1: Prediction using Supervised ML


In this task, we build a model that predicts the percentage score a student is expected to achieve from the number of hours they studied.
Because there is one input (hours studied) and one output (percentage score), the model is a simple linear regression on these two variables.
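
Simple linear regression fits a straight line, score = intercept + slope * hours, by minimizing the squared error between the line and the data. The closed-form least-squares solution can be sketched in plain Python as below. The function name `fit_simple_linear_regression` and the toy values are made up for illustration and are not taken from `student_scores.csv`.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations of x and y, and sum of squared deviations of x
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical toy data (not from the dataset): scores are exactly 10 * hours + 2
toy_hours = [1.0, 2.0, 3.0, 4.0, 5.0]
toy_scores = [12.0, 22.0, 32.0, 42.0, 52.0]

slope, intercept = fit_simple_linear_regression(toy_hours, toy_scores)
print("slope =", slope, "intercept =", intercept)
```

Since the toy points lie exactly on a line, the fit recovers that line's slope and intercept. Real data is noisy, so the fitted line will only approximate it.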

### Importing Libraries

In [None]:
import numpy as np
import pandas as pd
%matplotlib inline
import matplotlib.pyplot as plt

### Loading Dataset

In [None]:
df = pd.read_csv('../input/tsf-datasets/student_scores.csv')

In [None]:
df.head()

In [None]:
df.describe()

### Checking Null Values

In [None]:
df.isnull().sum()

No null values were found, so the data needs no cleaning.

### Plotting hours studied against scores

In [None]:
plt.scatter(df['Hours'], df['Scores'], color = 'red')
plt.title('Hours vs Percentage(%)')
plt.xlabel('Hours Studied')
plt.ylabel("Percentage Score(%)")
plt.show()

### Preparing the data

In [None]:
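# Features: every column except the last (Hours), kept 2-D for scikit-learn
x = df.iloc[:, :-1].values
# Target: the second column (Scores), as a 1-D array
y = df.iloc[:, 1].values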
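
The two slices above produce arrays of different shapes: the column-range slice keeps a 2-D feature matrix, while the single-column index gives a 1-D target vector, which is the layout scikit-learn estimators expect. A small sketch on a hypothetical three-row frame (`toy_df` and its values are made up, not the real dataset) shows the difference:

```python
import pandas as pd

# Hypothetical stand-in for the dataset, with the same two column names
toy_df = pd.DataFrame({'Hours': [1.5, 3.0, 4.5], 'Scores': [20, 35, 50]})

toy_x = toy_df.iloc[:, :-1].values  # slice of columns -> 2-D array
toy_y = toy_df.iloc[:, 1].values    # single column -> 1-D array

print(toy_x.shape)  # (3, 1)
print(toy_y.shape)  # (3,)
```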

### Splitting the data into Training and Testing Sets

In [None]:
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = 0.25, random_state = 0)
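
With `test_size = 0.25`, a quarter of the rows are held out for testing, and fixing `random_state` makes the shuffle reproducible. The sketch below illustrates both points on hypothetical data of 20 rows; the `toy_*` names and values are assumptions for illustration only.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 20 rows, one feature
toy_x = np.arange(20, dtype=float).reshape(-1, 1)
toy_y = 10.0 * toy_x.ravel() + 2.0

x_tr, x_te, y_tr, y_te = train_test_split(
    toy_x, toy_y, test_size=0.25, random_state=0
)
print(x_tr.shape, x_te.shape)  # 15 training rows and 5 test rows

# Repeating the call with the same random_state gives the same split
x_tr2, x_te2, _, _ = train_test_split(
    toy_x, toy_y, test_size=0.25, random_state=0
)
same_split = np.array_equal(x_te, x_te2)
print(same_split)
```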

### Training the model

In [None]:
from sklearn.linear_model import LinearRegression

In [None]:
model = LinearRegression()
model.fit(x_train, y_train)
print("Model Trained!")

### Plotting the Regression Line

In [None]:
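# Regression line from the fitted slope and intercept: y = coef * x + intercept
line = model.coef_ * x + model.intercept_
# Plot all data points and overlay the fitted line
plt.scatter(x, y, color = 'red')
plt.plot(x, line)
plt.show()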
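
The line built from `coef_` and `intercept_` is the same set of values that `predict` returns for the same inputs, only in a different shape. A minimal check on hypothetical toy data (the `toy_*` names and values are made up for illustration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical toy data: scores are exactly 10 * hours + 2
toy_x = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
toy_y = np.array([12.0, 22.0, 32.0, 42.0, 52.0])

toy_model = LinearRegression()
toy_model.fit(toy_x, toy_y)

# coef_ has shape (1,), so this broadcasts to a (5, 1) column of line values
manual_line = toy_model.coef_ * toy_x + toy_model.intercept_
# predict returns a flat array of shape (5,)
predicted_line = toy_model.predict(toy_x)

same = np.allclose(manual_line.ravel(), predicted_line)
print(same)
```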

### Making Predictions

In [None]:
print(x_test)
y_pred = model.predict(x_test)

### Comparing Actual vs Predicted

In [None]:
df1 = pd.DataFrame({'Actual' : y_test, 'Predicted' : y_pred})
df1

### Testing with custom data

In [None]:
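# scikit-learn expects a 2-D input: one row (sample) with one feature (hours)
hrs = [[2.9]]
predict = model.predict(hrs)
# Print the plain number rather than the nested list
print("No. of Hours = {}".format(hrs[0][0]))
print("Predicted Score = {}".format(predict[0]))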

### Evaluating the Model

In [None]:
from sklearn import metrics
print('Mean Absolute Error(MAE) :', metrics.mean_absolute_error(y_test, y_pred))
print('Mean Squared Error(MSE) :', metrics.mean_squared_error(y_test, y_pred))
print('Root Mean Squared Error(RMSE) :', np.sqrt(metrics.mean_squared_error(y_test, y_pred)))
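
To make the three metrics concrete, they can be computed by hand with NumPy. The sketch below uses made-up `actual` and `predicted` values, not results from this model: MAE averages the absolute errors, MSE averages the squared errors, and RMSE is the square root of MSE, which puts it back in the same units as the score.

```python
import numpy as np

# Hypothetical values for illustration only (not this model's output)
actual = np.array([20.0, 30.0, 60.0, 80.0])
predicted = np.array([22.0, 27.0, 62.0, 75.0])

errors = actual - predicted       # [-2, 3, -2, 5]

mae = np.mean(np.abs(errors))     # (2 + 3 + 2 + 5) / 4 = 3.0
mse = np.mean(errors ** 2)        # (4 + 9 + 4 + 25) / 4 = 10.5
rmse = np.sqrt(mse)               # square root of 10.5

print('MAE :', mae)
print('MSE :', mse)
print('RMSE :', rmse)
```

Because errors are squared before averaging, MSE and RMSE penalize the single large error (5) more heavily than MAE does.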