## Using scikit-learn for Averaging Ensemble Learning

`Ensemble averaging` is the process of creating **multiple models** and combining their predictions into a single output, as opposed to relying on just one model.

An ensemble of models often performs better than any individual model, because the errors of the individual models tend to "average out."
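
The "averaging out" effect can be illustrated with a small simulation. This is a sketch under a strong assumption: every model's error is independent, zero-mean noise with the same spread. The values below (`true_value`, 10 models, a noise standard deviation of 5) are made up for illustration and are not taken from the height-weight data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: the true value is 100 and each of 10 models
# predicts it with independent, zero-mean Gaussian noise (std 5).
true_value = 100.0
n_models, n_trials = 10, 20_000
predictions = true_value + rng.normal(0.0, 5.0, size=(n_trials, n_models))

# Mean squared error of one model vs. the average of all ten models
single_mse = np.mean((predictions[:, 0] - true_value) ** 2)
ensemble_mse = np.mean((predictions.mean(axis=1) - true_value) ** 2)

print(f"single model MSE: {single_mse:.2f}")
print(f"ensemble MSE:     {ensemble_mse:.2f}")
```

With independent errors, the variance of the average shrinks roughly in proportion to the number of models. Real models trained on similar data usually have correlated errors, so the gain in practice is smaller.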

The scikit-learn library used in this notebook is:

* Simple and efficient tools for data mining and data analysis
* Accessible to everybody, and reusable in various contexts
* Built on NumPy, SciPy, and matplotlib
* Open source, commercially usable - BSD license

![](https://lh3.googleusercontent.com/-bLQJoRT38AU/Xw5y0s5qnSI/AAAAAAAApEg/1pu0ZlclFDIPcBfQL-BCxjKYS6CzxOFnACK8BGAsYHg/s0/2020-07-14.png)

### Predicting weight using linear regression on different data sizes

### Import the height-weight data CSV
[Data Sets](https://github.com/reddyprasade/Machine-Learning-Problems-DataSets/blob/master/Ensemble/height-weight.csv)

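In [None]:
import numpy
# Load a local copy of the CSV (two numeric columns: height, weight).
# Passing the filename lets loadtxt open and close the file itself.
filename = "height-weight.csv"
data = numpy.loadtxt(filename, delimiter=",")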

In [5]:
import pandas_profiling 
import pandas as pd 
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

In [6]:
path = "https://raw.githubusercontent.com/reddyprasade/Machine-Learning-Problems-DataSets/master/Ensemble/height-weight.csv"

In [7]:
df = pd.read_csv(path,header=None)
df.columns = ["Height",'Weight']

In [8]:
df.head()

| | Height | Weight |
|---|---|---|
| 0 | 73.847017 | 241.893563 |
| 1 | 68.781904 | 162.310473 |
| 2 | 74.110105 | 212.740856 |
| 3 | 71.730978 | 220.04247 |
| 4 | 69.881796 | 206.349801 |


In [9]:
df.tail()

| | Height | Weight |
|---|---|---|
| 4995 | 68.860062 | 177.131052 |
| 4996 | 68.973423 | 159.285228 |
| 4997 | 67.013795 | 199.1954 |
| 4998 | 71.557718 | 185.90591 |
| 4999 | 70.35188 | 198.903012 |


In [None]:
pandas_profiling.ProfileReport(df)

In [None]:
# We check the size of the data
data.shape

In [10]:
# We check the first 10 rows
data[:10]

array([[ 73.84701702, 241.8935632 ],
       [ 68.78190405, 162.3104725 ],
       [ 74.11010539, 212.7408556 ],
       [ 71.7309784 , 220.0424703 ],
       [ 69.88179586, 206.3498006 ],
       [ 67.25301569, 152.2121558 ],
       [ 68.78508125, 183.9278886 ],
       [ 68.34851551, 167.9711105 ],
       [ 67.01894966, 175.9294404 ],
       [ 63.45649398, 156.3996764 ]])

In [11]:
# Import the linear models and the mean squared error metric from sklearn
from sklearn import linear_model
from sklearn.metrics import mean_squared_error

In [None]:
#We separate out the independent variable height into X 
#and dependent variable weight into y
X=data[:,0]
y=data[:,1]

In [None]:
# Split the data into training/testing sets
X_train1=X[:2500]
X_train2=X[2500:4500]
X_test=X[4500:]

In [None]:
# Split the targets into training/testing sets
y_train1=y[:2500]
y_train2=y[2500:4500]
y_test=y[4500:]

In [None]:
# Reshape the data into the 2-D (n_samples, n_features) form sklearn expects
X_train1=X_train1.reshape(-1, 1)
X_train2=X_train2.reshape(-1, 1)
X_test=X_test.reshape(-1, 1)
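
The reshape step is needed because scikit-learn estimators expect `X` as a two-dimensional array of shape `(n_samples, n_features)`, while slicing a single column yields a one-dimensional array. A minimal sketch with made-up heights (the array `heights` is illustrative, not part of the notebook):

```python
import numpy as np

heights = np.array([73.8, 68.8, 74.1])  # shape (3,): one-dimensional
X_col = heights.reshape(-1, 1)          # shape (3, 1): 3 samples, 1 feature

print(heights.shape, X_col.shape)
```

The `-1` tells NumPy to infer the number of rows from the array's length, so the same call works for training and test slices of any size.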

### A simple average ensemble model

In [None]:
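# Two different regressors, each trained on its own slice of the data
model1 = linear_model.LinearRegression()
model2 = linear_model.ElasticNet()

model1.fit(X_train1,y_train1)  # first 2,500 rows
model2.fit(X_train2,y_train2)  # next 2,000 rows

# Both models predict the same held-out test set
pred1=model1.predict(X_test)
pred2=model2.predict(X_test)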

In [None]:
# score() returns R^2, so test scores must be computed against y_test.
# Scoring against the model's own predictions would always give 1.0.
print("Train Score of First Model is",model1.score(X_train1,y_train1))
print("Train Score of Second Model is",model2.score(X_train2,y_train2))
print('*'*30)
print("Test Score of First Model is",model1.score(X_test,y_test))
print("Test Score of Second Model is",model2.score(X_test,y_test))

In [None]:
finalpred=(pred1+pred2)/2
finalpred
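
One property of the simple average is worth knowing: because squared error is convex, the MSE of the averaged prediction can never exceed the mean of the individual models' MSEs. It is not guaranteed to beat the better of the two models. The sketch below checks this on synthetic data; `y_true`, `pred_a`, `pred_b`, and `mse` are illustrative stand-ins, not the notebook's variables.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic stand-ins for y_test, pred1 and pred2 (not the real data)
y_true = rng.normal(180.0, 20.0, size=500)
pred_a = y_true + rng.normal(3.0, 10.0, size=500)   # biased high, noisy
pred_b = y_true + rng.normal(-2.0, 12.0, size=500)  # biased low, noisier

def mse(y, p):
    """Mean squared error between targets y and predictions p."""
    return float(np.mean((y - p) ** 2))

avg_pred = (pred_a + pred_b) / 2
mse_a, mse_b, mse_avg = mse(y_true, pred_a), mse(y_true, pred_b), mse(y_true, avg_pred)

print(f"model A: {mse_a:.1f}  model B: {mse_b:.1f}  average: {mse_avg:.1f}")
```

Comparing the ensemble against each individual model on the same test set, as the later cells do, is therefore the meaningful check.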

In [None]:
y_test.shape

In [None]:
finalpred.shape

In [None]:
print(mean_squared_error(y_test, finalpred, sample_weight=None, multioutput='uniform_average'))

In [None]:
from sklearn.metrics import mean_squared_error,r2_score

In [None]:
mean_squared_error(y_test,finalpred)

In [None]:
r2_score(y_test,finalpred)
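
For regressors such as `LinearRegression` and `ElasticNet`, `score` reports the same R² quantity that `r2_score` computes: one minus the ratio of the residual sum of squares to the total sum of squares. The helper `r2_manual` below is a hypothetical re-implementation for illustration, using made-up weights in `y_demo`:

```python
import numpy as np

def r2_manual(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot, computed by hand."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total variation
    return 1.0 - ss_res / ss_tot

y_demo = np.array([150.0, 160.0, 170.0, 180.0])

perfect = r2_manual(y_demo, y_demo)                      # exact predictions
baseline = r2_manual(y_demo, np.full(4, y_demo.mean()))  # always predict the mean
reversed_order = r2_manual(y_demo, y_demo[::-1])         # worse than the mean

print(perfect, baseline, reversed_order)
```

A score of 1.0 means perfect predictions, 0.0 means no better than always predicting the mean, and negative values mean worse than that baseline.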

In [None]:
model1.score(X_train1,y_train1)

In [None]:
model1.score(X_test,y_test)

In [None]:
model2.score(X_train2,y_train2)

In [None]:
model2.score(X_test,y_test)