<a href="https://colab.research.google.com/github/pranavamshu4/Machine-learning/blob/master/Sol1_Linear_Regression_pranav.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## **Linear Regression with Python Scikit Learn**
In this section we will see how the Python scikit-learn machine learning library can be used to fit regression models. We will start with simple linear regression involving two variables.

### **Simple Linear Regression**
In this regression task we will predict the percentage of marks that a student is expected to score based upon the number of hours they studied. This is a simple linear regression task as it involves just two variables.
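
As a rough illustration of what "simple linear regression" computes, the sketch below fits a line y = w * x + b with the closed-form least-squares formulas. The `hours` and `scores` arrays are made-up toy values, not the dataset loaded later in this notebook.

```python
import numpy as np

# Hypothetical toy data (NOT the notebook's dataset): scores are exactly 10 * hours + 2
hours = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
scores = np.array([12.0, 22.0, 32.0, 42.0, 52.0])

# Closed-form least-squares estimates for the line y = w * x + b
x_mean, y_mean = hours.mean(), scores.mean()
w = np.sum((hours - x_mean) * (scores - y_mean)) / np.sum((hours - x_mean) ** 2)
b = y_mean - w * x_mean

print(f"slope w = {w:.3f}, intercept b = {b:.3f}")
```

Because the toy scores lie exactly on a line, the fit recovers that line. Real data will scatter around the fitted line instead.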

In [1]:
# Importing all libraries required in this notebook
import pandas as pd
import numpy as np  
import matplotlib.pyplot as plt  
%matplotlib inline

In [None]:
# Reading data from remote link
url = "http://bit.ly/w-data"
data = pd.read_csv(url)
print("Data imported successfully")
data.head(20)


In [None]:
# Summary of data
data.describe()

In [None]:
# Plotting hours studied against scores to check for a linear relationship
plt.scatter(data.Hours, data.Scores,  color='red') 
plt.title('Hours vs Percentage')  
plt.xlabel('Hours Studied')  
plt.ylabel('Percentage Score')  
plt.show()

In [5]:
#Dividing the data into "attributes" (inputs) and "labels" (outputs)
X = data.iloc[:, :-1].values  
y = data.iloc[:, 1].values  

In [None]:
print(X)

In [None]:
print(y)

In [8]:
#Splitting the Dataset into Training and Test Set
from sklearn.model_selection import train_test_split  
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0) 
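
To make the effect of `test_size=0.2` and `random_state=0` concrete, here is a small sketch on a made-up 25-row array (the row count and the `*_demo` names are assumptions for illustration only). It shows the resulting split sizes and that a fixed `random_state` reproduces the same split.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: 25 samples with a single feature
X_demo = np.arange(25, dtype=float).reshape(-1, 1)
y_demo = 2 * X_demo[:, 0]

# Same arguments as the notebook's call
X_a, X_b, y_a, y_b = train_test_split(X_demo, y_demo, test_size=0.2, random_state=0)

# Repeating the call with the same random_state yields the same partition
X_a2, X_b2, _, _ = train_test_split(X_demo, y_demo, test_size=0.2, random_state=0)

print("train shape:", X_a.shape, "test shape:", X_b.shape)
print("same test rows on repeat:", np.array_equal(X_b, X_b2))
```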


In [None]:
# Training the model
from sklearn.linear_model import LinearRegression  
regressor = LinearRegression()  
regressor.fit(X_train, y_train) 
print("Training complete.")
print('Linear model coeff (w): {}'
     .format(regressor.coef_))
print('Linear model intercept (b): {:.3f}'
     .format(regressor.intercept_))
print('R-squared score (training): {:.3f}'
     .format(regressor.score(X_train, y_train)))
print('R-squared score (test): {:.3f}'
     .format(regressor.score(X_test, y_test)))
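
The values printed above can be reproduced by hand. The following sketch, on made-up data (`X_demo`, `y_demo` and `model` are illustrative names, not part of this notebook), shows that a prediction is just `coef_ * x + intercept_` and that `score` matches R² = 1 - SS_res / SS_tot.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical toy data standing in for hours (X) and scores (y)
X_demo = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y_demo = np.array([15.0, 20.0, 33.0, 41.0, 52.0])

model = LinearRegression().fit(X_demo, y_demo)

# A prediction is the fitted line evaluated at x: w * x + b
manual_pred = model.coef_[0] * X_demo[:, 0] + model.intercept_

# R^2 compares the residual sum of squares with the total sum of squares
ss_res = np.sum((y_demo - manual_pred) ** 2)
ss_tot = np.sum((y_demo - y_demo.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

print("manual R^2:", r2_manual)
print("model.score:", model.score(X_demo, y_demo))
```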

In [None]:
# Plotting the Regression Line
plt.scatter(X_train,y_train)
plt.plot(X_train, regressor.predict(X_train),color='red')
plt.title('Hours vs Percentage')
plt.xlabel('Hours Studied')
plt.ylabel('Percentage Scored')
plt.show()

In [None]:
#Making predictions
print(X_test) # Testing data - In Hours
y_pred = regressor.predict(X_test) # Predicting the scores

In [None]:
# Comparing Actual vs Predicted
# (stored as `comparison` so the original `data` DataFrame is not overwritten)
comparison = pd.DataFrame({'Actual': y_test, 'Predicted': y_pred})
comparison

In [None]:
# You can also test with your own data

hours = np.array([[9.25]])  # 2-D array: one sample, one feature
own_pred = regressor.predict(hours)
print("Number of hours = {}".format(hours[0][0]))
print("Predicted score = {}".format(own_pred[0]))

In [None]:
#Evaluating the model
from sklearn import metrics  
print('Mean Absolute Error:', metrics.mean_absolute_error(y_test, y_pred))
print('Mean Squared Error:', metrics.mean_squared_error(y_test, y_pred))
print('Root Mean Squared Error:', np.sqrt(metrics.mean_squared_error(y_test, y_pred)))
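
For reference, the three error metrics can be written out directly with NumPy. This is a minimal sketch on made-up values (`y_true` and `y_hat` are hypothetical, not the notebook's `y_test` and `y_pred`).

```python
import numpy as np

# Hypothetical actual and predicted scores
y_true = np.array([20.0, 30.0, 60.0, 80.0])
y_hat = np.array([22.0, 27.0, 64.0, 80.0])

errors = y_true - y_hat

mae = np.mean(np.abs(errors))   # mean of absolute errors
mse = np.mean(errors ** 2)      # mean of squared errors
rmse = np.sqrt(mse)             # square root of MSE, in the same units as the scores

print("MAE:", mae, "MSE:", mse, "RMSE:", rmse)
```

MAE and RMSE are both in percentage points. RMSE weights large errors more heavily because the errors are squared before averaging.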

### **Extra: Polynomial Regression Model**

In [19]:
#Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train1, X_test1, y_train1, y_test1 = train_test_split(X, y, test_size = 0.2, random_state = 0)

In [None]:
#Training the Polynomial Regression model on the training set
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
poly_reg = PolynomialFeatures(degree = 4)
X_poly1 = poly_reg.fit_transform(X_train1)
regressor1 = LinearRegression()
regressor1.fit(X_poly1, y_train1)

In [None]:
#Predicting the Test set results (printed columns: predicted, actual)
y_pred1 = regressor1.predict(poly_reg.transform(X_test1))
np.set_printoptions(precision=2)
print(np.concatenate((y_pred1.reshape(len(y_pred1),1), y_test1.reshape(len(y_test1),1)),1))

In [None]:
#Visualising the Polynomial Regression results 
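X_grid = np.arange(X.min(), X.max(), 0.1)  # scalar bounds; min(X) on a 2-D array returns 1-element arrays
X_grid = X_grid.reshape(-1, 1)
plt.scatter(X_train1, y_train1, color = 'red')
# Reuse the already-fitted transformer: transform, not fit_transform
plt.plot(X_grid, regressor1.predict(poly_reg.transform(X_grid)), color = 'blue')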
plt.title('Hours vs Percentage')
plt.xlabel('Hours Studied')
plt.ylabel('Percentage Scored')
plt.show()

In [None]:
#Predicting a new result with Polynomial Regression
regressor1.predict(poly_reg.transform([[9.25]]))  # transform only; the transformer was fitted on the training set

In [None]:
#Evaluating the Model Performance
from sklearn.metrics import r2_score
r2_score(y_test1, y_pred1)
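
One way to avoid keeping the feature transformer and the regressor in sync by hand is scikit-learn's `make_pipeline`, which chains both steps into one estimator. The sketch below is an alternative arrangement, not what this notebook does, and it uses synthetic data (`X_demo`, `y_demo`) with degree 2 purely for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical synthetic data: y is exactly x squared
X_demo = np.linspace(0, 5, 20).reshape(-1, 1)
y_demo = X_demo[:, 0] ** 2

# The pipeline expands the features and fits the linear model in one fit() call
poly_model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
poly_model.fit(X_demo, y_demo)

# predict() applies the fitted feature expansion automatically
pred = poly_model.predict([[3.0]])
print("prediction at x = 3:", pred[0])
```

With a pipeline, new inputs are passed straight to `predict`, so there is no separate `transform` call to forget or to replace by `fit_transform` by mistake.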