### How do I know which regression model to choose for a particular problem/dataset?
* try every candidate ML model and select the one with the best performance!
> performance is measured by the coefficient of determination, (adjusted) R-squared

* 5 ML models
> support vector, random forest, polynomial, multiple linear and decision tree regression

* the dataset has no categorical/string columns and no missing data
> NO NEED for encoding or imputation! (feature scaling is still applied for SVR)

* dependent variable : energy output - PE
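The notes above mention adjusted R-squared, while `r2_score` returns the plain R-squared. As a minimal sketch, the adjusted value can be derived from it with the usual formula `1 - (1 - R^2) * (n - 1) / (n - p - 1)`. The helper name `adjusted_r2` and the toy numbers are assumptions for illustration, not part of the original notebook.

```python
import numpy as np
from sklearn.metrics import r2_score

def adjusted_r2(y_true, y_pred, n_features):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    n = len(y_true)              # number of samples
    r2 = r2_score(y_true, y_pred)
    return 1 - (1 - r2) * (n - 1) / (n - n_features - 1)

# toy values, made up for illustration only
y_true = np.array([3.0, 5.0, 7.0, 9.0, 11.0, 13.0])
y_pred = np.array([2.8, 5.1, 7.2, 8.9, 11.3, 12.7])

r2 = r2_score(y_true, y_pred)
adj = adjusted_r2(y_true, y_pred, n_features=2)
print(r2, adj)
```

For the notebook's own data this would be called with the test-set targets and predictions and `n_features=x_test.shape[1]`.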


In [29]:
# import the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

In [30]:
# import the dataset
dataset = pd.read_csv('real world dataset.csv')
x = dataset.iloc[:, :-1].values
y = dataset.iloc[:, -1].values
dataset.head()

| | AT | V | AP | RH | PE |
|---|---|---|---|---|---|
| 0 | 14.96 | 41.76 | 1024.07 | 73.17 | 463.26 |
| 1 | 25.18 | 62.96 | 1020.04 | 59.08 | 444.37 |
| 2 | 5.11 | 39.4 | 1012.16 | 92.14 | 488.56 |
| 3 | 20.86 | 57.32 | 1010.24 | 76.64 | 446.48 |
| 4 | 10.82 | 37.5 | 1009.23 | 96.62 | 473.9 |
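The claim that the data has no missing values and no string columns can be checked rather than assumed. The sketch below runs the check on the five rows shown above, since the CSV file itself is not available here. The variable names `sample`, `missing_per_column` and `non_numeric` are illustrative.

```python
import pandas as pd

# the first five rows, copied from the dataset.head() output above
sample = pd.DataFrame({
    "AT": [14.96, 25.18, 5.11, 20.86, 10.82],
    "V": [41.76, 62.96, 39.4, 57.32, 37.5],
    "AP": [1024.07, 1020.04, 1012.16, 1010.24, 1009.23],
    "RH": [73.17, 59.08, 92.14, 76.64, 96.62],
    "PE": [463.26, 444.37, 488.56, 446.48, 473.9],
})

# count missing values per column
missing_per_column = sample.isna().sum()
# list any columns that are not numeric (these would need encoding)
non_numeric = sample.select_dtypes(exclude="number").columns.tolist()

print(missing_per_column)
print(non_numeric)
```

Running the same two checks on the full `dataset` would confirm the no-preprocessing assumption for all rows.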


### Splitting the dataset into the Training set and Test set
* hold out 20% of the data as a test set so that every model is compared on the same unseen samples

In [31]:
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x,y, test_size = 0.2, random_state = 0)
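To illustrate what `test_size = 0.2` and `random_state = 0` do, here is a small sketch on made-up arrays (`X_demo`, `y_demo` are assumptions, not the notebook's data). It shows the 80/20 split sizes and that a fixed `random_state` gives the same split on every call, which is what makes the later model comparison fair.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# toy data: 20 samples, 2 features
X_demo = np.arange(40).reshape(20, 2)
y_demo = np.arange(20)

a_train, a_test, b_train, b_test = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0
)
# repeat the split with the same seed
a_train2, a_test2, _, _ = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0
)

print(a_train.shape, a_test.shape)
print(np.array_equal(a_test, a_test2))
```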

## 1. Multiple Linear Regression


### Training the Multiple Linear Regression model on the Training set

In [32]:
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(x_train,y_train)

LinearRegression()

### Predicting the Test set result

In [33]:
y_pred = regressor.predict(x_test)
np.set_printoptions(precision=2)
print(np.concatenate((y_pred.reshape(len(y_pred),1), y_test.reshape(len(y_test),1)),1))

[[431.43 431.23]
 [458.56 460.01]
 [462.75 461.14]
 ...
 [469.52 473.26]
 [442.42 438.  ]
 [461.88 463.28]]


### Evaluating the Model Performance
* compare the real results with the predictions using the R-squared coefficient

In [35]:
from sklearn.metrics import r2_score
r2_score(y_test, y_pred)


0.9325315554761303
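For reference, R-squared is `1 - SS_res / SS_tot`. The sketch below computes it by hand on the six (prediction, actual) pairs printed above and checks it against `r2_score`. Because it uses only those six rows, the value will differ from the full test-set score. The names `y_hat` and `y_obs` are illustrative.

```python
import numpy as np
from sklearn.metrics import r2_score

# the six (prediction, actual) pairs printed in the cell above
y_hat = np.array([431.43, 458.56, 462.75, 469.52, 442.42, 461.88])
y_obs = np.array([431.23, 460.01, 461.14, 473.26, 438.00, 463.28])

ss_res = np.sum((y_obs - y_hat) ** 2)         # residual sum of squares
ss_tot = np.sum((y_obs - y_obs.mean()) ** 2)  # total sum of squares
r2_manual = 1 - ss_res / ss_tot

print(r2_manual, r2_score(y_obs, y_hat))
```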

## 2. Polynomial Linear Regression


### Training the Polynomial Regression model on the Training set

In [18]:
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
poly_reg = PolynomialFeatures(degree = 4)
x_poly = poly_reg.fit_transform(x_train)
regressor_2 = LinearRegression()
regressor_2.fit(x_poly, y_train)

LinearRegression()

### Predicting the Test set result

In [19]:
y_pred2 = regressor_2.predict(poly_reg.transform(x_test))
np.set_printoptions(precision=2)
print(np.concatenate((y_pred2.reshape(len(y_pred2),1), y_test.reshape(len(y_test),1)),1))

[[433.94 431.23]
 [457.9  460.01]
 [460.52 461.14]
 ...
 [469.53 473.26]
 [438.27 438.  ]
 [461.67 463.28]]


### Evaluating the Model Performance

In [36]:
from sklearn.metrics import r2_score
r2_score(y_test, y_pred2)

0.9458193300147094
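The two-step pattern above (expand features, then fit a linear model) can also be wrapped in a scikit-learn pipeline so that `transform` does not have to be called by hand at prediction time. This is an optional sketch on synthetic data. `x_demo`, `y_demo` and `poly_model` are assumed names, and the quadratic target is made up.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# synthetic one-feature data with a known quadratic relationship
x_demo = np.linspace(-2, 2, 50).reshape(-1, 1)
y_demo = 1 + 2 * x_demo[:, 0] - 0.5 * x_demo[:, 0] ** 2

# the pipeline applies PolynomialFeatures, then LinearRegression
poly_model = make_pipeline(PolynomialFeatures(degree=4), LinearRegression())
poly_model.fit(x_demo, y_demo)

demo_pred = poly_model.predict(x_demo)      # raw features go in directly
demo_r2 = poly_model.score(x_demo, y_demo)  # R-squared on the same data
print(demo_r2)
```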

## 3. Support Vector Regression

In [41]:
y = y.reshape(len(y),1)

In [42]:
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x,y, test_size = 0.2, random_state = 0)

### Feature Scaling

In [43]:
from sklearn.preprocessing import StandardScaler
sc_x = StandardScaler()
sc_y = StandardScaler()
x_train = sc_x.fit_transform(x_train)
y_train = sc_y.fit_transform(y_train)

### Training the SVR model on the Training set

In [44]:
from sklearn.svm import SVR
regressor_3 = SVR(kernel = 'rbf')
# SVR expects a 1-D target, so flatten the scaled column vector
regressor_3.fit(x_train, y_train.ravel())


SVR()

### Predicting the Test set result

In [45]:
y_pred3 = sc_y.inverse_transform(regressor_3.predict(sc_x.transform(x_test)).reshape(-1,1))
np.set_printoptions(precision=2)
print(np.concatenate((y_pred3.reshape(len(y_pred3),1), y_test.reshape(len(y_test),1)),1))

[[434.05 431.23]
 [457.94 460.01]
 [461.03 461.14]
 ...
 [470.6  473.26]
 [439.42 438.  ]
 [460.92 463.28]]


### Evaluating the Model Performance

In [46]:
from sklearn.metrics import r2_score
r2_score(y_test, y_pred3)

0.948078404998626
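The manual scaling and `inverse_transform` steps above can be bundled into one estimator. The sketch below is an alternative, not the notebook's method. It uses `TransformedTargetRegressor` to scale the target and a pipeline to scale the features, on synthetic data (`X_demo`, `y_demo` are made up).

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# synthetic data on roughly the same scale as PE (assumption for the demo)
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 10, size=(200, 2))
y_demo = 400 + 5 * X_demo[:, 0] - 3 * X_demo[:, 1]

svr_model = TransformedTargetRegressor(
    # features are standardized inside the pipeline before SVR sees them
    regressor=make_pipeline(StandardScaler(), SVR(kernel="rbf")),
    # the target is standardized for fitting and un-scaled on predict
    transformer=StandardScaler(),
)
svr_model.fit(X_demo[:150], y_demo[:150])

pred = svr_model.predict(X_demo[150:])  # already in original units
svr_r2 = svr_model.score(X_demo[150:], y_demo[150:])
print(svr_r2)
```

With this wrapper, `y` can stay one-dimensional and the train/test split does not need to be redone for SVR.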

## 4. Decision Tree Regression

In [52]:
# import the dataset
dataset = pd.read_csv('real world dataset.csv')
x = dataset.iloc[:, :-1].values
y = dataset.iloc[:, -1].values

In [53]:
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = 0.2, random_state = 0)

### Training the Decision Tree Regression model on the Training set

In [54]:
from sklearn.tree import DecisionTreeRegressor
regressor_4 = DecisionTreeRegressor(random_state = 0)
regressor_4.fit(x_train, y_train)

DecisionTreeRegressor(random_state=0)

### Predicting the Test set result

In [57]:
y_pred4 = regressor_4.predict(x_test)
np.set_printoptions(precision=2)
print(np.concatenate((y_pred4.reshape(len(y_pred4),1), y_test.reshape(len(y_test),1)),1))

[[431.28 431.23]
 [459.59 460.01]
 [460.06 461.14]
 ...
 [471.46 473.26]
 [437.76 438.  ]
 [462.74 463.28]]


### Evaluating the Model Performance

In [58]:
from sklearn.metrics import r2_score
r2_score(y_test, y_pred4)

0.922905874177941

## 5. Random Forest Regression

### Training the Random Forest Regression model on the Training set

In [59]:
from sklearn.ensemble import RandomForestRegressor
regressor_5 = RandomForestRegressor(n_estimators = 10, random_state = 0)
regressor_5.fit(x_train, y_train)

RandomForestRegressor(n_estimators=10, random_state=0)

### Predicting the Test set result

In [60]:
y_pred5 = regressor_5.predict(x_test)
np.set_printoptions(precision=2)
print(np.concatenate((y_pred5.reshape(len(y_pred5),1), y_test.reshape(len(y_test),1)),1))

[[434.05 431.23]
 [458.79 460.01]
 [463.02 461.14]
 ...
 [469.48 473.26]
 [439.57 438.  ]
 [460.38 463.28]]


### Evaluating the Model Performance

In [61]:
from sklearn.metrics import r2_score
r2_score(y_test, y_pred5)

0.9615908334363876

## Compare the r2_score 

1. Multiple Linear Regression
> 0.9325315554761303

2. Polynomial Linear Regression
> 0.9458193300147094

3. Support Vector Regression
> 0.948078404998626

4. Decision Tree Regression
> 0.922905874177941

5. Random Forest Regression
> 0.9615908334363876

### Conclusion
- the best model : Random Forest Regression, with the highest test-set R-squared (0.9616)
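The "try every model and keep the best" procedure can be condensed into one loop. The following is a sketch under stated assumptions: the data is synthetic (the CSV is not reproduced here), the `models` dictionary and its labels are illustrative, and SVR is wrapped so that scaling happens inside the estimator.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

# synthetic stand-in for the dataset: 4 features, 1 target
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(300, 4))
y = 450 - 2 * X[:, 0] + 0.3 * X[:, 1] ** 2 + rng.normal(0, 1, size=300)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

models = {
    "multiple linear": LinearRegression(),
    "polynomial": make_pipeline(PolynomialFeatures(degree=4), LinearRegression()),
    "support vector": TransformedTargetRegressor(
        regressor=make_pipeline(StandardScaler(), SVR(kernel="rbf")),
        transformer=StandardScaler(),
    ),
    "decision tree": DecisionTreeRegressor(random_state=0),
    "random forest": RandomForestRegressor(n_estimators=10, random_state=0),
}

# fit each model on the same split and record its test-set R-squared
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = r2_score(y_test, model.predict(X_test))

best = max(scores, key=scores.get)
for name, score in scores.items():
    print(f"{name}: {score:.4f}")
print("best:", best)
```

Which model wins depends on the data, so the ranking on this synthetic set need not match the ranking reported above.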