<div style="color:white;
           display:fill;
           border-radius:5px;
           background-color:#33ccff;
           font-size:110%;
           font-family:Verdana;
           letter-spacing:0.5px">
<p style="padding: 12px; color:white; text-align:center"><b>Highlights</b></p>
</div>
<div>
<p style="text-align:center">1. Custom scoring function for parameter tuning based on RMSE values.</p>
<p style="text-align:center">2. Extensive missing data analysis with detailed reasoning.</p>
<p style="text-align:center">3. Comparison amongst different regression algorithms.</p>
<p style="text-align:center">4. Novice attempt at the Stacking method.</p>
<p style="text-align:center">5. Prediction using ensemble method.</p>
</div>

In [None]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings("ignore")

train=pd.read_csv('../input/house-prices-advanced-regression-techniques/train.csv')
test=pd.read_csv('../input/house-prices-advanced-regression-techniques/test.csv')

# Exploratory Data Analysis

In [None]:
train.describe().T

In [None]:
train.describe(include='O').T

**Bivariate Analysis** 

**Let's plot the sale price against each numerical variable, colored by sale condition.**

In [None]:
sns.set_theme()
features=[i for i in (train.iloc[:,1:-1]).columns if train[i].nunique() > 25 ]

plt.style.use(plt.style.available[19])
i = 1
plt.figure(figsize=(20,25))
for j in features:
    plt.subplot(6, 3, i)
    sns.scatterplot(x=j,data=train, y=train['SalePrice'], hue='SaleCondition',palette='deep')
    plt.xlabel(j)
    i += 1
plt.show()

**Number of Houses sold in each neighborhood**

In [None]:
plt.figure(figsize=(22,6))
sns.countplot(data=train, x='Neighborhood')

**Let's look at the sale price against the year built, separately for each neighborhood and colored by the year sold.**

In [None]:
sns.set_theme()
j=1
plt.figure(figsize=(24,40))
for i in train['Neighborhood'].unique():
    plt.subplot(9,3, j)
    subset=train[train['Neighborhood']==i]  # rows of the current neighborhood only
    sns.scatterplot(x='YearBuilt', y='SalePrice', data=subset, hue='YrSold', palette='deep')
    plt.xlabel(i)
    j+=1

In [None]:
sns.set_theme()
fig=plt.figure(figsize=(20,40))
for i in range(len((train.select_dtypes(include='object')).columns)):
    fig.add_subplot(11,4,i+1)
    train.select_dtypes(include='object').iloc[:,i].value_counts().plot(kind="pie", subplots=True)

# Outlier Analysis

In [None]:
sns.set_theme()
for i in (train[['BsmtFinSF2', 'BsmtFinSF1', 'MasVnrArea', 'LotArea', 'LotFrontage']].columns):
    plt.figure(figsize=(15,6))
    sns.scatterplot(x=i,data=train,y=train['SalePrice'])
    plt.title('SalePrice against {}'.format(i))
    plt.show()

**Extreme outlier observations** are removed from the train dataset, using the thresholds in the following code.

In [None]:
outliers=((train['BsmtFinSF2']>1200) | (train['ScreenPorch']>350) | (train['GrLivArea']>4000) |
          (train['OpenPorchSF']>350) | (train['EnclosedPorch']>350) | (train['BsmtFinSF1']>3000) |
          (train['MasVnrArea']>1200) | (train['LotArea']>100000) | (train['LotFrontage']>200))
train=train[~outliers]

**Let's concatenate the train and test csv files so that missing values are handled in one place.**

This concatenation also keeps the column structure of the train and test datasets identical.

In [None]:
data=pd.concat([train, test], axis=0)

# Missing Data Analysis

In [None]:
df=pd.DataFrame({'Type': data.dtypes,
                  'Missing': data.isna().sum(),
                  'Size':data.shape[0],
                  'Unique': data.nunique()})
df['Missing_%']= (df.Missing/df.Size)*100
df[df['Missing']>0].sort_values(by=['Missing_%'], ascending=False)

## Missing values in Categorical Variables

**PoolQC**: A missing value indicates no pool, as few dwellings are likely to have one.

**MiscFeature**: The data description text file indicates that NA means None.

**Alley**: The data description text file indicates that NA means 'No alley access', so the missing values are most likely NA.

**Fence**: Same reasoning as for Alley.

**FireplaceQu**: A certain share of dwellings is likely to have no fireplace.

**GarageType**: Same reasoning as for FireplaceQu.

**GarageFinish**: If the GarageType is 'NA', then the missing 'GarageFinish' values should be denoted as 'NA' too.

**GarageQual**: Similarly, all the missing values in the GarageQual column have 'NA' as the garage type.

**GarageCond**: Same reasoning as for GarageQual.

**Basement Condition**: None of the recorded categories marks a dwelling without a basement. The same assumption as for GarageType is made: a certain share of dwellings is likely to have no basement.

**BasementExposure**: All 37 missing values in the BsmtExposure column belong to dwellings with no basement.

**BasementQuality** and **BasementFinType1**: Same reasoning as above.

**BasementFinType2**: All the missing values are assumed to be 'NA'.

**Masonry veneer type**: Observations with missing values in the MasVnrType column have missing values in MasVnrArea as well.

Let's replace the missing MasVnrType values with the mode (the mode is 'None').

**Electrical**: The missing observation is replaced by the most frequent category.

Regarding **MSZoning**, **Functional**, **Utilities**, **KitchenQual**, **Exterior2nd**, **Exterior1st** and **SaleType**: missing values are replaced with the most frequent category of the respective column.

In [None]:
data['PoolQC']=data['PoolQC'].fillna('NA')
data['MiscFeature']=data['MiscFeature'].fillna('NA')
data['Alley']=data['Alley'].fillna('NA')
data['Fence']=data['Fence'].fillna('NA')
data['FireplaceQu']=data['FireplaceQu'].fillna('NA')
data['GarageType']=data['GarageType'].fillna('NA')
data['GarageFinish']=data['GarageFinish'].fillna('NA')
data['BsmtCond']=data['BsmtCond'].fillna('NA')
data['BsmtExposure']=data['BsmtExposure'].fillna('NA')
data['BsmtQual']=data['BsmtQual'].fillna('NA')
data['BsmtFinType2']=data['BsmtFinType2'].fillna('NA')
data['Electrical']=data['Electrical'].fillna(data['Electrical'].mode()[0])
data['GarageCond']=data['GarageCond'].fillna('NA')
data['GarageQual']=data['GarageQual'].fillna('NA')
data['BsmtFinType1']=data['BsmtFinType1'].fillna('NA')
data['MasVnrType']=data['MasVnrType'].fillna('None')
data['MSZoning']=data['MSZoning'].fillna('RL')
data['Functional']=data['Functional'].fillna('Typ')
data['Utilities']=data['Utilities'].fillna('AllPub')
data['KitchenQual']=data['KitchenQual'].fillna('TA')
data['Exterior2nd']=data['Exterior2nd'].fillna('VinylSd')
data['Exterior1st']=data['Exterior1st'].fillna('VinylSd')
data['SaleType']=data['SaleType'].fillna('WD')
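
The co-occurrence claims above (for example, that GarageQual is only missing where GarageType is missing too) can be checked before filling. The following is only a sketch on a toy frame with made-up values. In the notebook, the same check would have to run on `data` before the `fillna` cell.

```python
import numpy as np
import pandas as pd

# Toy frame (hypothetical values) standing in for the garage columns of `data`.
toy = pd.DataFrame({
    "GarageType":   ["Attchd", np.nan, "Detchd", np.nan],
    "GarageFinish": ["RFn",    np.nan, "Unf",    np.nan],
    "GarageQual":   ["TA",     np.nan, "TA",     np.nan],
})

# For each dependent column, count rows where it is missing although GarageType is present.
# A count of zero supports filling that column with 'NA' (no garage).
type_missing = toy["GarageType"].isna()
mismatches = {
    col: int((toy[col].isna() & ~type_missing).sum())
    for col in ["GarageFinish", "GarageQual"]
}
print(mismatches)
```

A non-zero count would point to dwellings that do have a garage but lack a rating, which would call for a different fill value.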

**MSSubClass**: Let's make MSSubClass a categorical variable, as its numbers are class codes rather than quantities. **MoSold** is treated as a categorical variable as well.

In [None]:
data['MSSubClass']=data['MSSubClass'].astype(object)
data['MoSold']=data['MoSold'].astype(object)

## Missing values in Numerical Variables

In [None]:
df=pd.DataFrame({'Type': data.dtypes,
                  'Missing': data.isna().sum(),
                  'Size':data.shape[0],
                  'Unique': data.nunique()})
df['Missing_%']= (df.Missing/df.Size)*100
df[df['Missing']>0].sort_values(by=['Missing_%'], ascending=False)

**SalePrice**: The goal of this competition is to predict SalePrice values for the test dataset, so the missing values in this column are simply the test rows. When filling missing values with the median, it is important to skip this column.

In [None]:
for i in df[df['Missing']>0].index:
    if i=='SalePrice':
        continue
    else:
        data[i]=data[i].fillna(data[i].median())

Let's make sure that **SalePrice** remains unaffected.

In [None]:
df=pd.DataFrame({'Type': data.dtypes,
                  'Missing': data.isna().sum(),
                  'Size':  data.shape[0],
                  'Unique': data.nunique()})
df['Missing_%']= (df.Missing/df.Size)*100
df[df['Missing']>0].sort_values(by=['Missing_%'], ascending=False)

**Now the missing values issue has been addressed.**

# Variable Categorization

This competition provides a text file which explains the variables. All the variables are **categorized manually** using
the information provided in the text file.

In [None]:
categorical=['MSSubClass', 'MSZoning', 'LotConfig', 'Neighborhood', 'LandSlope', 'LandContour',
             'Condition1','Condition2','BldgType', 'HouseStyle', 'YearBuilt',
             'YearRemodAdd', 'RoofStyle', 'Exterior1st', 'Exterior2nd','RoofMatl',
            'MasVnrType', 'Foundation', 'Heating', 'Electrical', 'GarageType', 'Fence',
            'MiscFeature', 'MoSold' ,'YrSold', 'SaleType', 'PavedDrive','Alley','SaleCondition' ]

In [None]:
ordinal=['LotShape', 'ExterQual','ExterCond', 'BsmtQual', 'BsmtCond', 'BsmtExposure', 
         'BsmtFinType1', 'BsmtFinType2','HeatingQC','KitchenQual', 'Functional', 
        'FireplaceQu', 'GarageFinish','GarageQual', 'GarageCond', ]

In [None]:
numerical=['LotFrontage', 'LotArea', 'OverallQual', 'OverallCond', 'MasVnrArea',
          'BsmtFinSF1', 'BsmtFinSF2', 'BsmtUnfSF', 'TotalBsmtSF', '1stFlrSF', 
          '2ndFlrSF', 'LowQualFinSF', 'GrLivArea', 'BsmtFullBath','FullBath',
          'HalfBath', 'BedroomAbvGr', 'KitchenAbvGr', 'TotRmsAbvGrd', 'Fireplaces', 
           'GarageCars', 'GarageArea', 'WoodDeckSF','OpenPorchSF', 'EnclosedPorch',
          'ScreenPorch', 'MiscVal', 'GarageYrBlt' ]

Using **value_counts**, categories in a few ordinal columns are grouped together and given the same rating. This is done in order to reduce **the number of rating levels.**

The value_counts code is not included, because its output would make the notebook unnecessarily long.

In [None]:
ex_qu= { 'Po':0, 'Fa': 0, 'TA': 1, 'Gd': 2, 'Ex': 3 }
ex_cond={ 'Po':0, 'Fa': 1, 'TA': 2, 'Gd': 3, 'Ex': 4 }
Bsmt_Qual={"NA": 0, 'Po':1, 'Fa': 2, 'TA': 3, 'Gd': 4, 'Ex': 5 }
BsmtFinType1={ "NA": 0, 'Unf':0, 'LwQ': 1, 'Rec': 2, 'BLQ': 3, 'ALQ': 4, 'GLQ':5 }
Bsmt_Exposure={ "NA":0, "No": 0, "Mn": 1, "Av": 2, "Gd": 3}
garage_fin={'NA': 0, 'Unf': 1, 'RFn': 2, 'Fin': 3}
garage_qu= { "NA": 0, 'Po':0, 'Fa':1, 'TA': 2, 'Gd': 3, 'Ex': 4  }
LotShape={"IR3": 0, 'IR2':0, 'IR1': 1, 'Reg': 2}
Functional={"Sal": 0, 'Sev':1, 'Maj2': 2, 'Maj1': 3, 'Mod': 4, 'Min2':5, 'Min1':6, 'Typ':7}

data=data.replace({"LotShape": LotShape,
                    "ExterQual": ex_qu,
                   "ExterCond": ex_cond,
                   "BsmtQual": Bsmt_Qual,
                   "BsmtCond": Bsmt_Qual,
                   "BsmtExposure": Bsmt_Exposure, 
                   "BsmtFinType1": BsmtFinType1, 
                   "BsmtFinType2": BsmtFinType1,
                   "HeatingQC": ex_qu,
                   "KitchenQual": ex_qu,
                   "Functional": Functional,
                    "GarageFinish": garage_fin,
                    "GarageQual": garage_qu,
                    "GarageCond": garage_qu,
                    "FireplaceQu": garage_qu})
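
Since the value_counts step is not shown, here is a minimal sketch of how such an inspection and grouping could look. The rating series is made up. In the notebook it would be a column such as `data['ExterQual']`, inspected before the replacement above.

```python
import pandas as pd

# Hypothetical rating column, standing in for something like data['ExterQual'].
ratings = pd.Series(["TA", "TA", "Gd", "TA", "Ex", "Fa", "Gd", "Po", "TA"])

# Step 1: inspect how many observations fall in each level.
counts = ratings.value_counts()
print(counts)

# Step 2: group sparse levels by giving them the same code
# (here 'Po' and 'Fa' share 0, mirroring the ex_qu mapping).
mapping = {"Po": 0, "Fa": 0, "TA": 1, "Gd": 2, "Ex": 3}
encoded = ratings.map(mapping)
print(encoded.value_counts().sort_index())
```

Grouping two sparse levels keeps the ordering of the scale but leaves fewer levels with very few observations.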

In [None]:
X1=data[ordinal]
X2=pd.get_dummies(data[categorical], drop_first=True)
X3=data[numerical]

**To avoid data leakage from the test csv file into the train csv file, I split X3 into X3_train and X3_test, select the skewed columns using the training rows only, and then apply the Box-Cox transformation to both parts.**

In [None]:
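# .copy() so that the column assignments below modify independent frames, not views of X3
X3_train=X3.iloc[:len(train),:].copy()
X3_test=X3.iloc[len(train):,:].copy()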

#### Box-cox Transformation for the data in the Train csv file

In [None]:
skewed_columns=[]
for i in X3_train.columns:
    if abs(X3_train[i].skew())> 0.5:
        skewed_columns.append(i)

from scipy.special import boxcox1p
lam=0.15
for i in skewed_columns:
    X3_train[i]= boxcox1p(X3_train[i],lam)

#### Box-cox Transformation for the data in the Test csv file

In [None]:
from scipy.special import boxcox1p
lam=0.15
for i in skewed_columns:
    X3_test[i]= boxcox1p(X3_test[i],lam)
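
For reference, `boxcox1p(x, lam)` computes `((1 + x)**lam - 1) / lam` for a non-zero `lam`. Since `lam` is fixed at 0.15 here, the transformation itself does not use any statistic of the data, and only the list of skewed columns is derived from the training rows. A small sketch with made-up values:

```python
import numpy as np
from scipy.special import boxcox1p

lam = 0.15
x = np.array([0.0, 1.0, 10.0, 1000.0])    # hypothetical right-skewed values

transformed = boxcox1p(x, lam)
manual = ((1.0 + x) ** lam - 1.0) / lam   # closed form for lam != 0

print(np.allclose(transformed, manual))
```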

**Let's concatenate X3_train and X3_test again so that X3 has the same number of rows as X2 and X1.**

In [None]:
X3=pd.concat([X3_train,X3_test], axis=0)

In [None]:
dataset=(pd.concat([X2, X1, X3], axis=1))

In [None]:
X=dataset.iloc[:len(train),:].values
Y=train['SalePrice'].values  # 1-D target; avoids (n, 1) column-vector warnings in some estimators
Y=np.log1p(Y)
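
A note on undoing this transformation later: `np.expm1` is the exact inverse of `np.log1p`, whereas `np.exp` returns values that are 1 too large. A small sketch of the round trip, with made-up prices:

```python
import numpy as np

prices = np.array([100000.0, 250000.0])   # hypothetical sale prices
logged = np.log1p(prices)                 # log(1 + price)

recovered = np.expm1(logged)              # exp(value) - 1
offset = np.exp(logged) - prices          # error made by using exp instead

print(recovered)
print(offset)
```

For house prices the offset of 1 is tiny, but using the matching inverse keeps reported errors and submitted predictions on exactly the original scale.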

**Let's come back to this test dataset after tuning the models.**

In [None]:
test_dataset=dataset.iloc[len(train):,:]

## Splitting the dataset- Train and Test

In [None]:
from sklearn.model_selection import train_test_split
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size= 0.2, random_state=1)

In [None]:
from sklearn.preprocessing import RobustScaler
sc= RobustScaler()
X_train[:,(len(X1.columns)+len(X2.columns)):]= sc.fit_transform(X_train[:, (len(X1.columns)+len(X2.columns)):])
X_test[:,(len(X1.columns)+len(X2.columns)):]= sc.transform(X_test[:, (len(X1.columns)+len(X2.columns)):])
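
`RobustScaler` centers each column on its median and divides by the interquartile range, both learned from the data passed to `fit`. This is why it is fitted on `X_train` only and then reused for `X_test`. A minimal sketch with made-up numbers, including one outlier:

```python
import numpy as np
from sklearn.preprocessing import RobustScaler

train_col = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])  # hypothetical, with an outlier
test_col = np.array([[2.5], [50.0]])

scaler = RobustScaler().fit(train_col)     # statistics come from the training data only
scaled_test = scaler.transform(test_col)

# The same result by hand: (x - median) / (Q3 - Q1), using training statistics.
median = np.median(train_col)
q1, q3 = np.percentile(train_col, [25, 75])
manual = (test_col - median) / (q3 - q1)

print(scaled_test.ravel(), manual.ravel())
```

The outlier of 100 barely affects the median and the quartiles, which is the reason to prefer this scaler over mean/variance scaling for data like this.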

## Important Libraries

In [None]:
from sklearn.metrics import mean_absolute_error, mean_squared_error, mean_squared_log_error
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import make_scorer

# Regression Algorithms

## Lasso Regression

In [None]:
from sklearn.linear_model import Lasso
reg = Lasso(alpha=0.0008)
reg.fit(X_train, Y_train)
Y_pred = reg.predict(X_test)
print(reg.score(X_train,Y_train))
print("RMSE: ",round(np.sqrt(mean_squared_error(Y_test, Y_pred)), 4))
print("MAE: ", round(mean_absolute_error(np.expm1(Y_test), np.expm1(Y_pred)), 4))  # expm1 inverts log1p

## Custom GridSearch CV

I use a **custom scoring function** based on RMSE values for the grid search. More details on custom scorers are given at the following link.

Link: https://scikit-learn.org/stable/modules/model_evaluation.html#scoring-parameter

In [None]:
def custom_function(Y_true, Y_pred):
    # RMSE on the log scale; np.sqrt works regardless of the scikit-learn version
    RMSE=round(np.sqrt(mean_squared_error(Y_true, Y_pred)), 4)
    return RMSE

scorer=make_scorer(custom_function,greater_is_better=False)
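
With `greater_is_better=False`, `make_scorer` negates the value returned by the function, so that a larger score still means a better model for `GridSearchCV`. The sketch below shows this sign flip on a tiny made-up dataset. `DummyRegressor` and the names `rmse`, `X_demo` and `y_demo` are only used for this illustration.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.metrics import make_scorer, mean_squared_error

def rmse(y_true, y_pred):
    # Root mean squared error, written without version-specific arguments.
    return float(np.sqrt(mean_squared_error(y_true, y_pred)))

neg_rmse_scorer = make_scorer(rmse, greater_is_better=False)

X_demo = np.zeros((4, 1))                       # features are irrelevant for the dummy model
y_demo = np.array([1.0, 2.0, 3.0, 4.0])
model_demo = DummyRegressor(strategy="mean").fit(X_demo, y_demo)  # always predicts 2.5

plain = rmse(y_demo, model_demo.predict(X_demo))
scored = neg_rmse_scorer(model_demo, X_demo, y_demo)
print(plain, scored)
```

For the same reason, `best_score_` of a grid search that uses such a scorer is a negative RMSE. scikit-learn also ships a built-in `'neg_root_mean_squared_error'` scoring string that follows the same convention.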

In [None]:
parameters = [{ 'alpha': [0.0005,0.0006,0.0007,0.0008,0.001,0.002,0.003,0.004,0.005,0.006,0.007,0.008,0.009,0.01,0.011,0.012,0.1,0.2,0.3,0.4]
              }]
grid_search = GridSearchCV(estimator = Lasso(),
                           param_grid = parameters,
                           scoring = scorer,
                           cv = 10,
                           n_jobs = -1)
grid_search.fit(X_train, Y_train)

In [None]:
grid_search.best_params_

## Ridge Regression

In [None]:
from sklearn.linear_model import Ridge
reg = Ridge(alpha=1)
reg.fit(X_train, Y_train)
Y_pred = reg.predict(X_test)
print(reg.score(X_train,Y_train))
print("RMSE: ",round(np.sqrt(mean_squared_error(Y_test, Y_pred)), 4))
print("MAE: ", round(mean_absolute_error(np.expm1(Y_test), np.expm1(Y_pred)), 4))  # expm1 inverts log1p

In [None]:
parameters = [{ 'alpha': [0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1]
              }]
grid_search = GridSearchCV(estimator = Ridge(),
                           param_grid = parameters,
                           scoring = scorer,
                           cv = 10,
                           n_jobs = -1)
grid_search.fit(X_train, Y_train)

In [None]:
grid_search.best_params_

## RidgeCV Regression

In [None]:
from sklearn.linear_model import RidgeCV
reg = RidgeCV(alphas=(0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9, 1.0))
reg.fit(X_train, Y_train)
Y_pred = reg.predict(X_test)
print(reg.score(X_train,Y_train))
print("RMSE: ",round(np.sqrt(mean_squared_error(Y_test, Y_pred)), 4))
print("MAE: ", round(mean_absolute_error(np.expm1(Y_test), np.expm1(Y_pred)), 4))  # expm1 inverts log1p

## Random_Forest Regressor

In [None]:
from sklearn.ensemble import RandomForestRegressor
reg = RandomForestRegressor()
reg.fit(X_train, Y_train)
Y_pred = reg.predict(X_test)
print(reg.score(X_train,Y_train))
print("RMSE: ",round(np.sqrt(mean_squared_error(Y_test, Y_pred)), 4))
print("MAE: ", round(mean_absolute_error(np.expm1(Y_test), np.expm1(Y_pred)), 4))  # expm1 inverts log1p

## XG Boost Regressor

In [None]:
import xgboost as xgb
reg = xgb.XGBRegressor()
reg.fit(X_train, Y_train)
Y_pred = reg.predict(X_test)
print(reg.score(X_train,Y_train))
print("RMSE: ",round(np.sqrt(mean_squared_error(Y_test, Y_pred)), 4))
print("MAE: ", round(mean_absolute_error(np.expm1(Y_test), np.expm1(Y_pred)), 4))  # expm1 inverts log1p

## LGBM Regressor

In [None]:
from lightgbm import LGBMRegressor
reg = LGBMRegressor()
reg.fit(X_train, Y_train)
Y_pred = reg.predict(X_test)
print(reg.score(X_train,Y_train))
print("RMSE: ",round(np.sqrt(mean_squared_error(Y_test, Y_pred)), 4))
print("MAE: ", round(mean_absolute_error(np.expm1(Y_test), np.expm1(Y_pred)), 4))  # expm1 inverts log1p

**I ran GridSearchCV to find the best parameters for each model. The best parameters are used in the following results.**

<h4>Let's tabulate the results from different ML algorithms and compare the results using RMSE and MAE values.</h4>

In [None]:
models=['RidgeCV_Regression', 'Random_Forest_Regression', 'XG-Boost_Regression', 'Ridge_Regression', 'Lasso_Regression', 'LGBM_Regression']

regressor = [
             RidgeCV(alphas=(0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1.0)), 
             RandomForestRegressor(), 
             xgb.XGBRegressor(learning_rate=0.07, max_depth= 2, n_estimators= 900, reg_alpha= 0.5, reg_lambda=0.3), 
             Ridge(alpha=0.006),
             Lasso(alpha=0.0008),
             LGBMRegressor(learning_rate=0.07, max_depth= 2, n_estimators= 1100, reg_alpha= 0.4, reg_lambda=0.4)
            ]
scores=[]
RMSE=[]
MAE=[]

for i in regressor:
    i.fit(X_train, Y_train)
    Y_pred = i.predict(X_test)
    scores.append(i.score(X_train,Y_train))
    RMSE.append(round(np.sqrt(mean_squared_error(Y_test, Y_pred)), 4))
    MAE.append(round(mean_absolute_error(np.expm1(Y_test), np.expm1(Y_pred)), 4))  # expm1 inverts log1p

In [None]:
result=pd.DataFrame({'Model': models, 'Score': scores, 'RMSE': RMSE, 'MAE': MAE})
result.sort_values(by=['RMSE'], ascending=True)

# Ensemble prediction using base models

In [None]:
models=['Model-1', 'Model-2', 'Model-3', 'Model-4', 'Model-5']

regressor = [LGBMRegressor(learning_rate=0.07, max_depth= 2, n_estimators= 900, reg_alpha= 0.5, reg_lambda=0.3), 
             xgb.XGBRegressor(learning_rate=0.05, n_estimators=900, reg_alpha=0.3, reg_lambda=0.2, max_depth=4),
             Lasso(alpha=0.0008)]

weights1=[0.25,0.45,0.30]
weights2=[0.40,0.40,0.20]
weights3=[0.05,0.75,0.2]
weights4=[0.45,0.45,0.10]
weights5=[0.35,0.35,0.30]

RMSE=[]
MAE=[]
Y_tot1=0
Y_tot2=0
Y_tot3=0
Y_tot4=0
Y_tot5=0


for i in range(len(regressor)):
    regressor[i].fit(X_train, Y_train)
    Y_pred = regressor[i].predict(X_test)
    Y_tot1+=weights1[i]*Y_pred
    Y_tot2+=weights2[i]*Y_pred
    Y_tot3+=weights3[i]*Y_pred
    Y_tot4+=weights4[i]*Y_pred
    Y_tot5+=weights5[i]*Y_pred

# Evaluate each weighted blend: RMSE on the log scale, MAE on the original price scale
for Y_tot in [Y_tot1, Y_tot2, Y_tot3, Y_tot4, Y_tot5]:
    RMSE.append(round(np.sqrt(mean_squared_error(Y_test, Y_tot)), 4))
    MAE.append(round(mean_absolute_error(np.expm1(Y_test), np.expm1(Y_tot)), 3))  # expm1 inverts log1p

In [None]:
result=pd.DataFrame({'Model': models ,'RMSE': RMSE, 'MAE': MAE})
result.sort_values(by=['RMSE'], ascending=True)

**It is evident from the results that ensemble predictions have lower RMSE values compared to base model predictions.**
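
Each ensemble above is a weighted average of the base model predictions on the log scale, with weights that sum to 1. A compact sketch of the same idea, using made-up predictions instead of fitted models:

```python
import numpy as np

# Hypothetical log-price predictions: one row per base model, one column per house.
preds = np.array([
    [12.0, 11.5, 12.3, 11.9],
    [12.1, 11.4, 12.2, 12.0],
    [11.9, 11.6, 12.4, 11.8],
])
weights = np.array([0.25, 0.45, 0.30])     # same shape of weight set as weights1
assert np.isclose(weights.sum(), 1.0)      # otherwise the blend is biased up or down

blend = weights @ preds                    # weighted sum over the model axis
same = np.average(preds, axis=0, weights=weights)

print(blend)
```

Storing the predictions in one array makes it easy to try many weight sets without refitting the models.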

I was introduced to the following technique by other notebooks in this competition.

I thank the community for sharing their code.

I used the following link to build my first stacking regressor.

**Link**: https://machinelearningmastery.com/stacking-ensemble-machine-learning-with-python/

# Stacking Regressor

**This is the first time I am creating a stacking regressor.**

**Please comment if you see any mistakes in my structural approach.** 

In [None]:
from sklearn.ensemble import StackingRegressor

In [None]:
level_0 = list()
level_0.append(('Lasso',Lasso(alpha=0.0008)))
level_0.append(('LGBMRegressor', LGBMRegressor(learning_rate=0.07, max_depth= 2, n_estimators= 900, reg_alpha= 0.5, reg_lambda=0.3)))
level_0.append(('XGBoost', xgb.XGBRegressor(learning_rate=0.05, n_estimators=900, reg_alpha=0.3, reg_lambda=0.2, max_depth=4)))

level1 =RidgeCV()

In [None]:
model = StackingRegressor(estimators=level_0, final_estimator=level1, cv=10)
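
To make the structure easier to review, here is my understanding of what `StackingRegressor` does, written out by hand: the final estimator is trained on cross-validated (out-of-fold) predictions of the base models, and the base models are then refit on the full training data for prediction. This is only a sketch on synthetic data with two simple base models, not the models used in this notebook.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge, RidgeCV
from sklearn.model_selection import cross_val_predict

# Synthetic data, only for illustration.
X_demo, y_demo = make_regression(n_samples=200, n_features=10, noise=5.0, random_state=0)

base_models = [Lasso(alpha=0.1), Ridge(alpha=1.0)]

# Level 0: out-of-fold predictions, so the meta model never sees
# a prediction made by a model that was trained on that same row.
oof = np.column_stack([cross_val_predict(m, X_demo, y_demo, cv=5) for m in base_models])

# Level 1: the meta model learns how to combine the base predictions.
meta = RidgeCV().fit(oof, y_demo)

# For new data, the base models are refit on all training rows first.
fitted = [m.fit(X_demo, y_demo) for m in base_models]
new_features = np.column_stack([m.predict(X_demo[:5]) for m in fitted])
stacked_pred = meta.predict(new_features)
print(stacked_pred.shape)
```

In contrast to the fixed weights of the ensemble section, the combination weights here are learned by the meta model.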

In [None]:
model.fit(X_train, Y_train)

In [None]:
Y_pred = model.predict(X_test)
print(model.score(X_train,Y_train))
print("RMSE: ",round(np.sqrt(mean_squared_error(Y_test, Y_pred)), 4))
print("MAE: ", round(mean_absolute_error(np.expm1(Y_test), np.expm1(Y_pred)), 4))  # expm1 inverts log1p

# Ensemble model with a stacking regressor

In [None]:
models=['Model-1', 'Model-2', 'Model-3', 'Model-4', 'Model-5']

regressor = [LGBMRegressor(learning_rate=0.07, max_depth= 2, n_estimators= 900, reg_alpha= 0.5, reg_lambda=0.3), 
             xgb.XGBRegressor(learning_rate=0.05, n_estimators=900,reg_alpha=0.4, reg_lambda=0.2,max_depth=3), 
             Lasso(alpha=0.0008), model]

weights1=[0.2,0.2,0.3,0.3]
weights2=[0.1,0.1,0.4,0.4]
weights3=[0.1,0.2,0.2, 0.5]
weights4=[0.05,0.05,0.25, 0.65]
weights5=[0.1,0.1,0.05,0.75]

RMSE=[]
MAE=[]
Y_tot1=0
Y_tot2=0
Y_tot3=0
Y_tot4=0
Y_tot5=0

for i in range(len(regressor)):
    regressor[i].fit(X_train, Y_train)
    Y_pred = regressor[i].predict(X_test)
    Y_tot1+=weights1[i]*Y_pred
    Y_tot2+=weights2[i]*Y_pred
    Y_tot3+=weights3[i]*Y_pred
    Y_tot4+=weights4[i]*Y_pred
    Y_tot5+=weights5[i]*Y_pred

# Evaluate each weighted blend: RMSE on the log scale, MAE on the original price scale
for Y_tot in [Y_tot1, Y_tot2, Y_tot3, Y_tot4, Y_tot5]:
    RMSE.append(round(np.sqrt(mean_squared_error(Y_test, Y_tot)), 4))
    MAE.append(round(mean_absolute_error(np.expm1(Y_test), np.expm1(Y_tot)), 3))  # expm1 inverts log1p

In [None]:
result=pd.DataFrame({'Model': models, 'RMSE': RMSE, 'MAE': MAE})
result

## Predictions for the Test dataset

In [None]:
test=test_dataset.values

In [None]:
regressor = [LGBMRegressor(learning_rate=0.07, max_depth= 2, n_estimators= 900, reg_alpha= 0.5, reg_lambda=0.3), 
             xgb.XGBRegressor(learning_rate=0.05, n_estimators=900,reg_alpha=0.4, reg_lambda=0.2,max_depth=3), 
             Lasso(alpha=0.0008), model]

weights=[0.1,0.1,0.4,0.4]

In [None]:
test[:,(len(X1.columns)+len(X2.columns)):]= sc.transform(test[:, (len(X1.columns)+len(X2.columns)):])

In [None]:
Y_tot=0
for i in range(len(regressor)):
    regressor[i].fit(X_train, Y_train)
    Y_pred = regressor[i].predict(test)
    Y_tot+=weights[i]*Y_pred

In [None]:
Y_tot

In [None]:
Y_pred=pd.DataFrame(np.expm1(Y_tot))  # expm1 inverts the log1p applied to SalePrice

In [None]:
ID=(pd.read_csv('../input/house-prices-advanced-regression-techniques/test.csv')).iloc[:,0:1]

In [None]:
result=pd.concat([ID, Y_pred], axis=1)
result.columns=['Id', 'SalePrice']  # same header as the competition's sample submission

In [None]:
result

In [None]:
result.to_csv('prediction.csv', index=False)

**Possible improvements:** 
   1. Feature engineering (removing the least correlated features)
   2. Tuning the hyperparameters of the stacking regressor.

**Please upvote if you like this kernel.**