**Introduction**

PyCaret is an open-source, low-code machine learning library in Python that aims to reduce the cycle time from hypothesis to insights. It suits experienced data scientists who want to speed up their ML experiments, as well as citizen data scientists and newcomers to data science with little or no coding background. PyCaret lets you go from preparing your data to deploying your model with only a few lines of code, in the notebook environment of your choice.

Detailed information about PyCaret is available at https://pycaret.org/

In [None]:
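#Install the PyCaret library
#Make sure the Internet option is turned on in the settings panel!
#Note: the calls below (e.g. silent, blacklist, the 'Label' column) follow an early PyCaret API;
#later releases rename or drop some of them, so pin an older version if they fail.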
!pip install pycaret

In [None]:
#Import the necessary libraries
import pandas as pd
from pycaret.regression import *

In [None]:
#Read the data
train = pd.read_csv('../input/house-prices-advanced-regression-techniques/train.csv')
test = pd.read_csv('../input/house-prices-advanced-regression-techniques/test.csv')

PyCaret's setup function accepts many options that users can set.

**Default:**

setup(data, target, train_size = 0.7, sampling = True, sample_estimator = None, categorical_features = None, categorical_imputation = 'constant', ordinal_features = None, high_cardinality_features = None, high_cardinality_method = 'frequency', numeric_features = None, numeric_imputation = 'mean', date_features = None, ignore_features = None, normalize = False, normalize_method = 'zscore', transformation = False, transformation_method = 'yeo-johnson', handle_unknown_categorical = True, unknown_categorical_method = 'least_frequent', pca = False, pca_method = 'linear', pca_components = None, ignore_low_variance = False, combine_rare_levels = False, rare_level_threshold = 0.10, bin_numeric_features = None, remove_outliers = False, outliers_threshold = 0.05, remove_multicollinearity = False, multicollinearity_threshold = 0.9, create_clusters = False, cluster_iter = 20, polynomial_features = False, polynomial_degree = 2, trigonometry_features = False, polynomial_threshold = 0.1, group_features = None, group_names = None, feature_selection = False, feature_selection_threshold = 0.8, feature_interaction = False, feature_ratio = False, interaction_threshold = 0.01, transform_target = False, transform_target_method = 'box-cox', session_id = None, silent = False, profile = False)


In [None]:
#pycaret regression setup

target='SalePrice'

setup_reg = setup(
    data = train,
    target = target,
    train_size=0.8,
    numeric_imputation = 'mean',
    silent = True,
    remove_outliers=True,
    normalize=True
)
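
To make two of the options used above more concrete, here is a minimal sketch in plain Python of what mean imputation (`numeric_imputation = 'mean'`) and z-score scaling (`normalize = True` with the default `normalize_method = 'zscore'`) conceptually do to a single numeric column. This is not PyCaret's internal code; the helper names `impute_mean` and `zscore` and the toy values are hypothetical.

```python
from statistics import mean, pstdev

def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed ones."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

def zscore(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]

# Hypothetical lot areas with one missing entry
lot_area = [8000.0, None, 10000.0, 12000.0]
filled = impute_mean(lot_area)   # the gap is filled with the mean of the other three
scaled = zscore(filled)          # same column, now centred on 0 with unit spread
print(filled)
print(scaled)
```

In a real pipeline the mean and standard deviation would be learned on the training split only and then reused on the test data.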

The compare_models function compares the available algorithms on a chosen evaluation metric, using K-fold cross-validation.

**Default:**

compare_models(blacklist = None, fold = 10, round = 4, sort = 'R2', turbo = True)
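
The K-fold idea behind the `fold` argument can be sketched in a few lines: the rows are split into `k` folds, and each fold serves once as the validation set while the remaining rows are used for training. This is only an illustration in plain Python, not PyCaret's implementation; `kfold_indices` is a hypothetical helper, and it does not shuffle the rows.

```python
def kfold_indices(n_rows, k):
    """Yield (train_idx, valid_idx) pairs for k contiguous folds."""
    # Spread the remainder over the first folds so sizes differ by at most 1
    fold_sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        valid_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_rows))
        yield train_idx, valid_idx
        start += size

# Toy example: 10 rows split into 5 folds
folds = list(kfold_indices(n_rows=10, k=5))
for train_idx, valid_idx in folds:
    print(len(train_idx), "training rows, validation rows:", valid_idx)
```

A model is then fitted once per fold and the metric is averaged over the folds, which is what makes the comparison less dependent on a single train/validation split.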


In [None]:
#Compare the models; the optional blacklist argument excludes the listed models
#You can also set the number of folds, the sorting metric, etc.
#compare_models(blacklist = None, fold = 10, round = 4, sort = 'R2', turbo = True)

bl_models = ['ransac', 'tr', 'rf', 'et', 'ada', 'gbr']

result = compare_models(
    #blacklist = bl_models,  # uncomment to skip the models in bl_models
    fold = 5,
    sort = 'MAE',  # note: the competition itself is scored on RMSE of log prices
    turbo = True
)

In [None]:
result

In [None]:
#The available estimator IDs are listed at https://pycaret.org/regression/
#create_model trains the selected estimator and reports cross-validated MAE, MSE, RMSE, R2, RMSLE and MAPE.
cb = create_model(
    estimator='catboost',
    fold=5
)
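
For readers who want to see what these metrics measure, the sketch below computes them by hand from their standard definitions on three made-up prices. The function name `regression_metrics` and the numbers are hypothetical; RMSLE is written with `log1p`, and MAPE is returned as a fraction rather than a percentage, which is an assumption about presentation rather than a statement about PyCaret's output.

```python
import math

def regression_metrics(y_true, y_pred):
    """Compute common regression metrics from their textbook definitions."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1 - sum(e * e for e in errors) / ss_tot
    # RMSLE: RMSE computed on log(1 + value), so relative errors matter
    log_sq = [(math.log1p(p) - math.log1p(t)) ** 2 for t, p in zip(y_true, y_pred)]
    rmsle = math.sqrt(sum(log_sq) / n)
    mape = sum(abs(e / t) for e, t in zip(errors, y_true)) / n
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2, "RMSLE": rmsle, "MAPE": mape}

# Hypothetical sale prices and predictions
y_true = [100000.0, 200000.0, 300000.0]
y_pred = [110000.0, 190000.0, 330000.0]
metrics = regression_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(name, round(value, 4))
```

Because RMSLE works on logarithms, a 10% miss on a cheap house counts about as much as a 10% miss on an expensive one, whereas MAE and RMSE are dominated by the expensive houses.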

In [None]:
#PyCaret can also tune the selected model.
#tune_model tunes the model's hyperparameters and scores the result using K-fold cross-validation.
tuned_cb = tune_model(
    cb,
    fold=5
)

In [None]:
#Predict on the test set with the tuned model
predictions = predict_model(tuned_cb, data=test)
predictions.head()

In [None]:
#You can also ensemble a trained model (here the untuned cb)
ensembled_cb = ensemble_model(cb)
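
As a rough illustration of what ensembling a single model can mean, the sketch below implements bagging by hand: the same simple model is fitted on several bootstrap resamples of the data and the predictions are averaged. I am assuming here that `ensemble_model` is used with a bagging method; the one-feature least-squares line, the helper names and the toy data are all hypothetical stand-ins, not what PyCaret or CatBoost do internally.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x with one feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # If a resample contains a single distinct x, fall back to a flat line
    slope = sxy / sxx if sxx > 0 else 0.0
    intercept = mean_y - slope * mean_x
    return intercept, slope

def bagged_predict(xs, ys, x_new, n_estimators=10, seed=0):
    """Average the predictions of models fitted on bootstrap resamples."""
    rng = random.Random(seed)
    n = len(xs)
    preds = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        intercept, slope = fit_line([xs[i] for i in idx], [ys[i] for i in idx])
        preds.append(intercept + slope * x_new)
    return sum(preds) / len(preds)

# Hypothetical living areas (m^2) and sale prices
area = [50.0, 80.0, 100.0, 120.0, 150.0]
price = [110500.0, 168000.0, 212000.0, 248500.0, 311000.0]
bagged = bagged_predict(area, price, x_new=110.0)
print(bagged)
```

Averaging over resamples mainly reduces variance, so it tends to help most with models that change a lot when the training data changes slightly.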

In [None]:
#Predict on the test set with the ensembled model
predictions_ensemble = predict_model(ensembled_cb, data=test)
predictions_ensemble.head()

In [None]:
#prepare the submission file
predictions_ensemble.rename(columns={'Label':'SalePrice'}, inplace=True)
predictions_ensemble[['Id','SalePrice']].to_csv('submission_ensemble.csv', index=False)


In [None]:
#You can also blend several models.
#blend_models creates an ensemble meta-estimator that fits each base regressor on the whole dataset
#and then averages their predictions to form the final prediction.
#Create the models for blending
cb = create_model(estimator='catboost', fold=5)
par = create_model(estimator='par', fold=5)
hr = create_model(estimator='huber', fold=5)
#Blend the trained models
blend_specific = blend_models(estimator_list = [cb, par, hr])
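
The averaging step described in the comment above can be sketched as follows. The three prediction lists are made-up numbers standing in for the outputs of the three base models, and `blend_predictions` is a hypothetical helper, not a PyCaret function.

```python
def blend_predictions(predictions_per_model):
    """Average the predictions of several models row by row."""
    n_models = len(predictions_per_model)
    return [sum(row) / n_models for row in zip(*predictions_per_model)]

# Hypothetical predictions from three fitted models for the same three houses
catboost_pred = [200000.0, 150000.0, 310000.0]
par_pred = [190000.0, 160000.0, 290000.0]
huber_pred = [210000.0, 155000.0, 300000.0]

blended = blend_predictions([catboost_pred, par_pred, huber_pred])
print(blended)  # [200000.0, 155000.0, 300000.0]
```

Blending tends to pay off when the base models make different kinds of errors, since those errors partly cancel in the average.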

In [None]:
#Predict on the test set with the blended model
predictions_blended = predict_model(blend_specific, data=test)
predictions_blended.head()

In [None]:
#prepare the submission file
predictions_blended.rename(columns={'Label':'SalePrice'}, inplace=True)
predictions_blended[['Id','SalePrice']].to_csv('submission_blended.csv', index=False)

# ***Some last words***

> > I only wanted to show how to use the PyCaret functions for prediction, so I did not apply any preprocessing or feature engineering techniques beyond the options passed to setup. If you want to improve the results, you can apply some preprocessing or feature engineering before the prediction stage. If you have any feedback, please let me know in the comments, and if you liked my work, please don't forget to vote. Thank you!
