If you find this notebook useful, please do **UPVOTE** the notebook :)

## <p style="font-family:newtimeroman; font-size:100%; text-align:center">Introduction</p> <a href= '#Introduction'></a>

In this notebook, we compare several regression models on the TPS-3 dataset (Tabular Playground Series, March 2022). We'll see how each model performs and compare them using the mean absolute error (MAE).
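
As a quick reminder of what the comparison metric measures, here is a minimal sketch that computes MAE by hand and with scikit-learn. The arrays `y_true` and `y_pred` are made-up toy values for illustration, not data from the competition.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error

# Toy values for illustration only; they are not taken from the competition data.
y_true = np.array([40, 55, 62, 30])
y_pred = np.array([42, 50, 62, 37])

# MAE is the mean of the absolute differences between targets and predictions.
manual_mae = np.mean(np.abs(y_true - y_pred))
sklearn_mae = mean_absolute_error(y_true, y_pred)

print(manual_mae, sklearn_mae)
```

Both values agree, and a lower MAE means predictions are closer to the targets on average.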

In [None]:
import pandas as pd
import numpy as np
from sklearn.preprocessing import LabelEncoder,StandardScaler
from sklearn.preprocessing import PolynomialFeatures
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeRegressor
from sklearn.gaussian_process.kernels import RationalQuadratic
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.ensemble import RandomForestRegressor
from sklearn.svm import LinearSVR
from xgboost import XGBRegressor
from catboost import CatBoostRegressor
import lightgbm as lgb
from sklearn.metrics import mean_absolute_error

In [None]:
train = pd.read_csv("../input/tabular-playground-series-mar-2022/train.csv",parse_dates=["time"])
test = pd.read_csv("../input/tabular-playground-series-mar-2022/test.csv",parse_dates=["time"])

In [None]:
train.head()

In [None]:
test.head()

In [None]:
train.describe()

In [None]:
test.describe()

In [None]:
train.drop("row_id", axis=1, inplace=True)
test.drop("row_id", axis=1, inplace=True)

In [None]:
def addTimeFeature(df,time_col):
    df['weekday'] = df[time_col].dt.weekday
    df['hour'] = df[time_col].dt.hour
    df['minute'] = df[time_col].dt.minute 
    
    df = df.drop([time_col],axis=1)
    
    return df

train = addTimeFeature(train,"time")
test = addTimeFeature(test,"time")

In [None]:
# Collect the numeric feature columns (everything except strings and the target)
num_col = []
for col in train.columns:
    if train[col].dtypes != "object" and col != "congestion":
        num_col.append(col)
        
scaler = StandardScaler()
train[num_col] = scaler.fit_transform(train[num_col])
test[num_col] = scaler.transform(test[num_col])

In [None]:
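# Split the columns into string-valued and numeric ones
str_list = []
num_list = []
for colname, colvalue in train.items():  # iteritems() was removed in pandas 2.0
    if isinstance(colvalue.iloc[0], str):
        str_list.append(colname)
    else:
        num_list.append(colname)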
        
for col in str_list:
    encoder = LabelEncoder()
    encoder.fit(train[col])
    train[col] = encoder.transform(train[col])

    for label in np.unique(test[col]):
        if label not in encoder.classes_: 
            encoder.classes_ = np.append(encoder.classes_, label) 
    test[col] = encoder.transform(test[col])

In [None]:
train_X = train.drop('congestion', axis=1)
train_y = train['congestion']

In [None]:
X_train, X_test, y_train, y_test = train_test_split(train_X, train_y, test_size=0.22, random_state=2000)

## <p style="font-family:newtimeroman; font-size:100%; text-align:center">1. Pipeline of polynomial and linear regression</p> <a href= '#Pipeline of polynomial and linear regression'></a>
The purpose of a pipeline is to assemble several steps that can be cross-validated together while setting different parameters. The pipeline below generates polynomial features and then fits a linear regression on them.
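
To illustrate what "cross-validated together while setting different parameters" can look like, here is a minimal sketch using `GridSearchCV` on synthetic data. The data, the names `X_demo`, `y_demo` and `demo_pipeline`, and the candidate degrees are assumptions made for this example; the notebook itself fits a fixed degree-2 pipeline.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data (an assumption for illustration): a noisy quadratic in one feature.
rng = np.random.default_rng(0)
X_demo = rng.uniform(-3, 3, size=(200, 1))
y_demo = 1.5 * X_demo[:, 0] ** 2 - X_demo[:, 0] + rng.normal(0, 0.5, size=200)

demo_pipeline = Pipeline([
    ("poly", PolynomialFeatures()),
    ("linear", LinearRegression(fit_intercept=False)),
])

# "<step name>__<parameter>" addresses a parameter of one pipeline step,
# so both steps are refit together inside every cross-validation fold.
search = GridSearchCV(
    demo_pipeline,
    param_grid={"poly__degree": [1, 2, 3]},
    scoring="neg_mean_absolute_error",
    cv=5,
)
search.fit(X_demo, y_demo)

print(search.best_params_, -search.best_score_)
```

Because the target is quadratic, a degree of 1 underfits badly here, so the search settles on a higher degree.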

In [None]:
# define the pipeline and train model
linear_model = Pipeline([('poly', PolynomialFeatures(degree=2)),
                  ('linear', LinearRegression(fit_intercept=False))])
                  
linear_model.fit(X_train, y_train)

In [None]:
linear_preds_valid = linear_model.predict(X_test).astype('int')
linear_mae = mean_absolute_error(y_test,  linear_preds_valid)
print("MAE score for LR:", linear_mae)

## <p style="font-family:newtimeroman; font-size:100%; text-align:center">2. Decision Tree</p> <a href= '#Decision Tree'></a>

Decision Trees (DTs) are a non-parametric supervised learning method used for classification and regression. The goal is to create a model that predicts the value of a target variable by learning simple decision rules inferred from the data features.
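
The "simple decision rules" can be inspected directly. The sketch below fits a depth-1 tree on synthetic data and prints its rules with `export_text`; the `hours` and `target` arrays are invented for illustration and are not the competition data.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

# Synthetic data (an assumption for illustration): the target jumps when "hour" passes 12.
hours = np.arange(24).reshape(-1, 1)
target = np.where(hours[:, 0] < 12, 30.0, 60.0)

stump = DecisionTreeRegressor(max_depth=1).fit(hours, target)

# export_text prints the learned if/else rules in plain text.
rules = export_text(stump, feature_names=["hour"])
print(rules)
```

A tree with `max_depth=1` can only learn a single split, so it predicts one of two constant values.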

In [None]:
# Build decision tree
tree_model = DecisionTreeRegressor(max_depth=1)
tree_model.fit(X_train,y_train)

In [None]:
tree_preds_valid = tree_model.predict(X_test).astype('int')
tree_mae = mean_absolute_error(y_test,  tree_preds_valid)
print("MAE score for DTR:", tree_mae)

## <p style="font-family:newtimeroman; font-size:100%; text-align:center">3. Random Forest</p> <a href= '#Random Forest'></a>

A random forest is a meta estimator that fits a number of decision tree regressors on various sub-samples of the dataset and uses averaging to improve the predictive accuracy and control over-fitting.
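
The averaging step can be checked by hand. This is a small sketch on synthetic data (the arrays `X_demo` and `y_demo` and the choice of 20 trees are assumptions for the example) showing that the forest's prediction equals the mean of its individual trees' predictions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic regression data, used only for illustration.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(300, 3))
y_demo = X_demo[:, 0] * 2 + np.sin(X_demo[:, 1]) + rng.normal(0, 0.1, size=300)

forest = RandomForestRegressor(n_estimators=20, random_state=0).fit(X_demo, y_demo)

# Each fitted tree is available in estimators_; average their predictions by hand.
per_tree = np.stack([tree.predict(X_demo) for tree in forest.estimators_])
manual_average = per_tree.mean(axis=0)

matches = np.allclose(manual_average, forest.predict(X_demo))
print(matches)
```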

In [None]:
rf_model = RandomForestRegressor(random_state=57).fit(X_train, y_train)

In [None]:
rf_preds_valid = rf_model.predict(X_test).astype('int')
rf_mae = mean_absolute_error(y_test,  rf_preds_valid)
print("MAE score for RFR:", rf_mae)

## <p style="font-family:newtimeroman; font-size:100%; text-align:center">4. Support Vector Regressor</p> <a href= '#Support Vector Regressor'></a>

`LinearSVR` is similar to `SVR` with the parameter `kernel='linear'`, but it is implemented in terms of liblinear rather than libsvm, so it has more flexibility in the choice of penalties and loss functions and should scale better to large numbers of samples.
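
As a rough illustration of that similarity, the sketch below fits both estimators on a small synthetic linear problem. The data and the variable names are assumptions for this example; the two solvers optimize slightly different objectives (for instance, liblinear also regularizes the intercept), so the fitted coefficients are expected to be close rather than identical.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error
from sklearn.svm import SVR, LinearSVR

# Synthetic linear data, used only for illustration.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 2))
y_demo = 2.0 * X_demo[:, 0] - 1.0 * X_demo[:, 1] + rng.normal(0, 0.1, size=200)

# libsvm-based SVR with a linear kernel versus the liblinear-based LinearSVR.
kernel_svr = SVR(kernel="linear", C=1.0, epsilon=0.2).fit(X_demo, y_demo)
linear_svr = LinearSVR(C=1.0, epsilon=0.2, max_iter=10000, random_state=0).fit(X_demo, y_demo)

kernel_mae = mean_absolute_error(y_demo, kernel_svr.predict(X_demo))
linear_mae_demo = mean_absolute_error(y_demo, linear_svr.predict(X_demo))

print(kernel_svr.coef_.ravel(), linear_svr.coef_)
print(kernel_mae, linear_mae_demo)
```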

In [None]:
# Choose regression method and set hyperparameter
svr_model = LinearSVR(C = 1.0, epsilon = 0.2)
# Training of the regression model
svr_model.fit(X_train, y_train)

In [None]:
svr_preds_valid = svr_model.predict(X_test).astype('int')
svr_mae = mean_absolute_error(y_test,  svr_preds_valid)
print("MAE score for SVR:", svr_mae)

## <p style="font-family:newtimeroman; font-size:100%; text-align:center">5. CatBoost</p> <a href= '#CatBoost'></a>

CatBoost is an algorithm for gradient boosting on decision trees.
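
To make "gradient boosting on decision trees" concrete, here is a minimal hand-rolled sketch for squared error using scikit-learn trees. It shows the general idea only and is not CatBoost's actual algorithm; the synthetic data, the learning rate, the tree depth and the number of rounds are all assumptions chosen for the example.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic one-feature regression data, used only for illustration.
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 10, size=(300, 1))
y_demo = np.sin(X_demo[:, 0]) * 10 + 50 + rng.normal(0, 1.0, size=300)

learning_rate = 0.1
prediction = np.full_like(y_demo, y_demo.mean())  # start from a constant model
mse_per_round = [np.mean((y_demo - prediction) ** 2)]

for _ in range(50):
    # For squared error, the negative gradient is simply the residual.
    residuals = y_demo - prediction
    tree = DecisionTreeRegressor(max_depth=2).fit(X_demo, residuals)
    # Add a damped version of the new tree to the ensemble's prediction.
    prediction += learning_rate * tree.predict(X_demo)
    mse_per_round.append(np.mean((y_demo - prediction) ** 2))

print(mse_per_round[0], mse_per_round[-1])
```

Each round fits a small tree to what the current ensemble still gets wrong, so the training error shrinks as trees are added.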

In [None]:
cat_model = CatBoostRegressor(
    verbose=71,
    early_stopping_rounds=10,
    random_seed=2000,
    max_depth=14,
    task_type='GPU',
    learning_rate=0.025,
    iterations=10000,
    loss_function='MAE',
    eval_metric= 'MAE'
)

# Pass a validation set so that early_stopping_rounds can take effect
cat_model.fit(X_train, y_train, eval_set=(X_test, y_test))

In [None]:
cat_preds_valid = cat_model.predict(X_test).astype('int')
cat_mae = mean_absolute_error(y_test,  cat_preds_valid)
print("MAE score for CBR:", cat_mae)

## <p style="font-family:newtimeroman; font-size:100%; text-align:center">6. XGB Regressor</p> <a href= '#XGB Regressor'></a>

XGBoost is an implementation of gradient boosted decision trees designed for speed and performance. It is widely used in competitive machine learning.

In [None]:
xgb_model = XGBRegressor(
    max_depth=8,
    learning_rate=0.01,
    n_estimators=10000,
    verbosity=1,
    objective='reg:squarederror',  # 'reg:linear' is the deprecated name for this objective
    # GPU settings; XGBoost >= 2.0 expects tree_method='hist' with device='cuda' instead
    tree_method='gpu_hist',
    predictor='gpu_predictor',
    booster='gbtree',
    n_jobs=-1,
    gamma=1.0,
    min_child_weight=1,
    max_delta_step=0,
    subsample=0.7,
    colsample_bytree=1,
    colsample_bylevel=1,
    colsample_bynode=1,
    reg_alpha=20,
    reg_lambda=15,
    scale_pos_weight=1,
    base_score=0.5,
    random_state=0
)

xgb_model.fit(X_train, y_train, verbose=False)

In [None]:
xgb_preds_valid = xgb_model.predict(X_test).astype('int')
xgb_mae = mean_absolute_error(y_test,  xgb_preds_valid)
print("MAE score for XGBR:", xgb_mae)

## <p style="font-family:newtimeroman; font-size:100%; text-align:center">7. LGBM Regressor</p> <a href= '#LGBM Regressor'></a>

LightGBM is a gradient boosting framework that uses tree based learning algorithms. It is designed to have faster training speed and higher efficiency.

In [None]:
lgb_train = lgb.Dataset(X_train, y_train)
lgb_eval = lgb.Dataset(X_test, y_test, reference=lgb_train)

params = {
        'task': 'train',
        'boosting_type': 'gbdt',
        'objective': 'regression',
        'metric': 'mae',
        'learning_rate': 0.1,
        'num_leaves': 500,
        'max_bin': 50,
        'num_iterations': 10000,
        'verbosity': -1
}

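lgbm_model = lgb.train(
    params,
    train_set=lgb_train,
    valid_sets=[lgb_eval],
    # The early_stopping_rounds and verbose_eval keywords were removed in LightGBM 4.0;
    # the equivalent callbacks are used instead
    callbacks=[lgb.early_stopping(stopping_rounds=100), lgb.log_evaluation(period=100)]
)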

In [None]:
lgb_preds_valid = lgbm_model.predict(X_test).astype('int')
lgb_mae = mean_absolute_error(y_test,  lgb_preds_valid)
print("MAE score for LGBM:", lgb_mae)
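
Since the goal is to compare the models by MAE, it can help to collect the scores in one place. The helper `summarize_mae` below is a hypothetical utility written for this sketch, and the numbers passed to it are placeholders, not results from this notebook.

```python
import pandas as pd

def summarize_mae(scores):
    """Return the MAE scores as a Series sorted from best (lowest) to worst."""
    return pd.Series(scores, name="MAE").sort_values()

# Placeholder numbers for illustration only; they are NOT results from this notebook.
# In the notebook, pass the real variables instead, e.g. {"LR": linear_mae, "DTR": tree_mae}.
example_scores = {"model_a": 7.5, "model_b": 6.0, "model_c": 9.25}
summary = summarize_mae(example_scores)

print(summary)
```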

Thanks for your time, Kaggler. If you found this notebook useful, please **UPVOTE** it; that would be great motivation :)