<div style="border: 10px solid #4A148C; border-radius: 15px; padding: 20px 20px 20px 20px; box-shadow: 3px 3px 10px grey; background-color: #f4f4f4;">
    <h2 style="font-size: 28px; font-weight: bold; color: #006400; text-align: center; border-bottom: 4px ridge #FFC107; font-family: 'Georgia'; padding: 10px;">🌍 Business Problem</h2>
    <p style="font-size: 16px; font-family: 'Times New Roman'; padding: 10px;">In the financial services sector, especially in e-commerce, a technology company providing payment solutions to businesses of varying sizes aims to maintain its market leadership and enhance customer satisfaction by continually developing innovative solutions. This company strives to make shopping more accessible and secure for both buyers and sellers by simplifying payment processes. However, rapid changes in cust...
    <ul style="font-size: 16px; font-family: 'Times New Roman'; padding: 10px;">
        <li>üìä <strong>Machine Learning for Payment Systems Optimization:</strong> The company can better understand customer behaviors and payment preferences using machine learning models. This enables them to develop strategies for personalizing payment processes, strengthening fraud prevention systems, and improving the overall customer experience.</li>
        <li>üéØ <strong>Targeted Innovations:</strong> Entrepreneurs and investors developing innovative payment solutions can use these predictions to create products that better meet market needs.</li>
        <li>üå± <strong>Sustainable Growth:</strong> Accurate predictions of payment trends can help the company achieve its sustainable growth objectives while contributing to the strengthening of the e-commerce ecosystem.</li>
    </ul>
    <p style="font-size: 16px; font-family: 'Times New Roman'; padding: 10px;">This approach can provide valuable insights into shaping policies, directing investments, and adopting best practices in e-commerce payment systems.</p>
</div>


<div style="border: 10px solid #FFA500; border-radius: 15px; padding: 20px 20px 20px 20px; box-shadow: 3px 3px 10px grey; background-color: #f4f4f4;">
    <h2 style="font-size: 28px; font-weight: bold; color: #8B4513; text-align: center; border-bottom: 4px ridge #FFA500; font-family: 'Georgia'; padding: 10px;">📊 Dataset Story</h2>
    <p style="font-size: 16px; font-family: 'Times New Roman'; padding: 10px;">This dataset spans from the beginning of 2020 to September 2023, offering a monthly glimpse into the transaction counts of various businesses. While a test file has not been provided, participants are encouraged to segment the train dataset for testing as they see fit. The primary goal is to predict the monthly transaction counts (net_payment_count) for merchants for the last quarter of 2023 (October - December).</p>
    <ul style="font-size: 16px; font-family: 'Times New Roman'; padding: 10px;">
        <li><strong>merchant_id:</strong> Masked merchant ID, representing a unique identifier for each business.</li>
        <li><strong>month_id:</strong> The month of the transaction, formatted as YYYYMM.</li>
        <li><strong>merchant_source_name:</strong> The source through which the merchant joined the platform.</li>
        <li><strong>settlement_period:</strong> The frequency at which the merchant receives their settlements.</li>
        <li><strong>working_type:</strong> Indicates the type of the merchant's business.</li>
        <li><strong>mcc_id:</strong> Merchant category code that shows the category of sales made by the merchant.</li>
        <li><strong>merchant_segment:</strong> Indicates the segment within the platform in which the merchant is categorized.</li>
        <li><strong>net_payment_count:</strong> Represents the net number of transactions (payments - cancellations - returns) made by the merchant within the given month.</li>
    </ul>
</div>
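
Since no test file is provided, one option is a time-based holdout that mirrors the forecasting task: keep the most recent three months of the training data for validation and fit on everything earlier. The sketch below runs on a small invented frame; the helper `split_by_month` and the cutoff value are illustrative assumptions, not part of the competition material.

```python
import pandas as pd

def split_by_month(frame, cutoff_month):
    """Split on month_id (YYYYMM): rows before the cutoff train, the rest validate."""
    month = frame["month_id"].astype(int)
    return frame[month < cutoff_month], frame[month >= cutoff_month]

# Toy data standing in for train.csv (values are invented for illustration).
toy = pd.DataFrame({
    "merchant_id": ["merchant_1"] * 6,
    "month_id": [202304, 202305, 202306, 202307, 202308, 202309],
    "net_payment_count": [3, 5, 4, 6, 7, 5],
})

# Hold out July-September 2023, a three-month window like the Q4 target.
train_part, valid_part = split_by_month(toy, cutoff_month=202307)
```

A random `train_test_split` would mix future months into the training set, so a chronological cut is usually the more honest estimate for a forecasting problem.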

# <div style="padding: 30px; color:white; margin:10; font-size:75%; text-align:left; display:fill; border-radius:10px; background-color:#3b3745"><b><span style='color:#F1A424'>1 |</span></b> <b>Importing Libraries</b></div>

In [1]:
# Standard library imports
import warnings

# Third-party library imports
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import lightgbm as lgb
import statsmodels.api as sm
import statsmodels.tsa.api as smt
import xgboost as xgb
from xgboost import XGBRegressor
import catboost as cb
from sklearn.linear_model import LinearRegression, Ridge, Lasso, ElasticNet
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.tree import DecisionTreeRegressor, ExtraTreeRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score, mean_absolute_error
from sklearn.preprocessing import MinMaxScaler, StandardScaler, RobustScaler, normalize, scale
from sklearn.neighbors import KNeighborsRegressor
from lightgbm import LGBMRegressor
from sklearn import metrics
from IPython.display import HTML as html_print, display
from termcolor import colored
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.seasonal import seasonal_decompose
from statsmodels.tsa.holtwinters import SimpleExpSmoothing, ExponentialSmoothing
from statsmodels.tsa.statespace.sarimax import SARIMAX

# Suppress all warnings
warnings.filterwarnings('ignore')



# <div style="padding: 30px; color:white; margin:10; font-size:75%; text-align:left; display:fill; border-radius:10px; background-color:#3b3745"><b><span style='color:#F1A424'>2 |</span></b> <b>Adjusting Row & Column Settings</b></div>


In [2]:
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
pd.set_option('display.width', 500)
pd.set_option('display.float_format', lambda x: '%.3f' % x)

# <div style="padding: 30px; color:white; margin:10; font-size:75%; text-align:left; display:fill; border-radius:10px; background-color:#3b3745"><b><span style='color:#F1A424'>3 |</span></b> <b>Loading The Data Set</b></div>

In [3]:
submission_data = pd.read_csv('/kaggle/input/iyzico-datathon/sample_submission.csv')

In [4]:
submission_data.head()

Unnamed: 0,id,net_payment_count
0,202311merchant_36004,0
1,202312merchant_36004,0
2,202310merchant_36004,0
3,202311merchant_23099,0
4,202312merchant_23099,0


In [5]:
submission_data.shape

(78180, 2)

In [6]:
train_data = pd.read_csv('/kaggle/input/iyzico-datathon/train.csv')

In [7]:
train_data.head()

Unnamed: 0,merchant_id,month_id,merchant_source_name,settlement_period,working_type,mcc_id,merchant_segment,net_payment_count
0,merchant_43992,202307,Merchant Source - 3,Settlement Period - 3,Working Type - 2,mcc_197,Segment - 2,15106
1,merchant_43992,202301,Merchant Source - 3,Settlement Period - 3,Working Type - 2,mcc_197,Segment - 2,16918
2,merchant_43992,202305,Merchant Source - 3,Settlement Period - 3,Working Type - 2,mcc_197,Segment - 2,13452
3,merchant_43992,202308,Merchant Source - 3,Settlement Period - 3,Working Type - 2,mcc_197,Segment - 2,16787
4,merchant_43992,202302,Merchant Source - 3,Settlement Period - 3,Working Type - 2,mcc_197,Segment - 2,12428


In [8]:
train_data["merchant_id"].nunique()

26060

In [9]:
train_data.shape

(291142, 8)

In [10]:
test_data = submission_data.copy()

In [11]:
test_data.head()

Unnamed: 0,id,net_payment_count
0,202311merchant_36004,0
1,202312merchant_36004,0
2,202310merchant_36004,0
3,202311merchant_23099,0
4,202312merchant_23099,0


In [12]:
test_data["month_id"] = test_data["id"].apply(lambda x: x[:6])

In [13]:
test_data.head()

Unnamed: 0,id,net_payment_count,month_id
0,202311merchant_36004,0,202311
1,202312merchant_36004,0,202312
2,202310merchant_36004,0,202310
3,202311merchant_23099,0,202311
4,202312merchant_23099,0,202312


In [14]:
test_data["merchant_id"] = test_data["id"].apply(lambda x: x[6:])

In [15]:
test_data.head()

Unnamed: 0,id,net_payment_count,month_id,merchant_id
0,202311merchant_36004,0,202311,merchant_36004
1,202312merchant_36004,0,202312,merchant_36004
2,202310merchant_36004,0,202310,merchant_36004
3,202311merchant_23099,0,202311,merchant_23099
4,202312merchant_23099,0,202312,merchant_23099
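
The two `apply` calls above split each `id` at a fixed position. A minimal sketch of the same split with pandas' vectorized string accessor is shown here on an invented two-row frame; it assumes every `id` starts with a six-character `YYYYMM` prefix, as in the sample submission.

```python
import pandas as pd

ids = pd.DataFrame({"id": ["202311merchant_36004", "202310merchant_23099"]})

# The first six characters are the YYYYMM month; the remainder is the merchant ID.
ids["month_id"] = ids["id"].str[:6]
ids["merchant_id"] = ids["id"].str[6:]
```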


In [16]:
test_data["net_payment_count"] = np.nan

In [17]:
test_data.head()

Unnamed: 0,id,net_payment_count,month_id,merchant_id
0,202311merchant_36004,,202311,merchant_36004
1,202312merchant_36004,,202312,merchant_36004
2,202310merchant_36004,,202310,merchant_36004
3,202311merchant_23099,,202311,merchant_23099
4,202312merchant_23099,,202312,merchant_23099


In [18]:
test_data = test_data[["merchant_id", "month_id", "net_payment_count"]]

In [19]:
test_data.shape

(78180, 3)

In [20]:
test_data.isnull().sum()

merchant_id              0
month_id                 0
net_payment_count    78180
dtype: int64

In [21]:
train_data.isnull().sum()

merchant_id             0
month_id                0
merchant_source_name    0
settlement_period       0
working_type            0
mcc_id                  0
merchant_segment        0
net_payment_count       0
dtype: int64

In [22]:
test_data.head()

Unnamed: 0,merchant_id,month_id,net_payment_count
0,merchant_36004,202311,
1,merchant_36004,202312,
2,merchant_36004,202310,
3,merchant_23099,202311,
4,merchant_23099,202312,


In [23]:
test_data = test_data.merge(
    train_data[
        [
            "merchant_id",
            "merchant_source_name",
            "settlement_period",
            "working_type",
            "mcc_id",
            "merchant_segment",
        ]
    ].drop_duplicates(),
    on=["merchant_id"],
    how="left",
)
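
The merge above only keeps the row count at 78180 if each merchant has a single combination of attributes in the training data. One way to make that assumption explicit is the `validate` argument of `DataFrame.merge`. The sketch below uses invented toy frames (`test_toy`, `train_toy`) rather than the competition data.

```python
import pandas as pd

# Toy frames (invented values) standing in for test_data and train_data.
test_toy = pd.DataFrame({
    "merchant_id": ["merchant_1", "merchant_1", "merchant_10"],
    "month_id": ["202310", "202311", "202310"],
})
train_toy = pd.DataFrame({
    "merchant_id": ["merchant_1", "merchant_1", "merchant_10"],
    "mcc_id": ["mcc_128", "mcc_128", "mcc_42"],
})

attrs = train_toy.drop_duplicates()

# validate="many_to_one" raises pandas.errors.MergeError if any merchant_id
# appears more than once in attrs, which would otherwise duplicate test rows.
merged = test_toy.merge(attrs, on="merchant_id", how="left", validate="many_to_one")
assert len(merged) == len(test_toy)
```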

In [24]:
test_data.sort_values(by='merchant_id').head()

Unnamed: 0,merchant_id,month_id,net_payment_count,merchant_source_name,settlement_period,working_type,mcc_id,merchant_segment
57417,merchant_1,202311,,Merchant Source - 2,Settlement Period - 1,Working Type - 5,mcc_128,Segment - 4
57418,merchant_1,202312,,Merchant Source - 2,Settlement Period - 1,Working Type - 5,mcc_128,Segment - 4
57419,merchant_1,202310,,Merchant Source - 2,Settlement Period - 1,Working Type - 5,mcc_128,Segment - 4
58753,merchant_10,202312,,Merchant Source - 2,Settlement Period - 3,Working Type - 6,mcc_42,Segment - 4
58752,merchant_10,202311,,Merchant Source - 2,Settlement Period - 3,Working Type - 6,mcc_42,Segment - 4


In [25]:
test_data.shape

(78180, 8)

In [26]:
train_data.sort_values(by='merchant_id').head()

Unnamed: 0,merchant_id,month_id,merchant_source_name,settlement_period,working_type,mcc_id,merchant_segment,net_payment_count
234371,merchant_1,202210,Merchant Source - 2,Settlement Period - 1,Working Type - 5,mcc_128,Segment - 4,3
234370,merchant_1,202108,Merchant Source - 2,Settlement Period - 1,Working Type - 5,mcc_128,Segment - 4,3
249226,merchant_10,202112,Merchant Source - 2,Settlement Period - 3,Working Type - 6,mcc_42,Segment - 4,24
249227,merchant_10,202203,Merchant Source - 2,Settlement Period - 3,Working Type - 6,mcc_42,Segment - 4,9
249225,merchant_10,202202,Merchant Source - 2,Settlement Period - 3,Working Type - 6,mcc_42,Segment - 4,7


In [27]:
df = pd.concat([train_data, test_data], axis=0).reset_index(drop=True)

In [28]:
df.shape

(369322, 8)

In [29]:
df.head()

Unnamed: 0,merchant_id,month_id,merchant_source_name,settlement_period,working_type,mcc_id,merchant_segment,net_payment_count
0,merchant_43992,202307,Merchant Source - 3,Settlement Period - 3,Working Type - 2,mcc_197,Segment - 2,15106.0
1,merchant_43992,202301,Merchant Source - 3,Settlement Period - 3,Working Type - 2,mcc_197,Segment - 2,16918.0
2,merchant_43992,202305,Merchant Source - 3,Settlement Period - 3,Working Type - 2,mcc_197,Segment - 2,13452.0
3,merchant_43992,202308,Merchant Source - 3,Settlement Period - 3,Working Type - 2,mcc_197,Segment - 2,16787.0
4,merchant_43992,202302,Merchant Source - 3,Settlement Period - 3,Working Type - 2,mcc_197,Segment - 2,12428.0


In [30]:
df.tail()

Unnamed: 0,merchant_id,month_id,merchant_source_name,settlement_period,working_type,mcc_id,merchant_segment,net_payment_count
369317,merchant_35969,202312,Merchant Source - 1,Settlement Period - 1,Working Type - 6,mcc_110,Segment - 4,
369318,merchant_35969,202310,Merchant Source - 1,Settlement Period - 1,Working Type - 6,mcc_110,Segment - 4,
369319,merchant_8429,202311,Merchant Source - 1,Settlement Period - 1,Working Type - 5,mcc_42,Segment - 4,
369320,merchant_8429,202312,Merchant Source - 1,Settlement Period - 1,Working Type - 5,mcc_42,Segment - 4,
369321,merchant_8429,202310,Merchant Source - 1,Settlement Period - 1,Working Type - 5,mcc_42,Segment - 4,


# <div style="padding: 30px; color:white; margin:10; font-size:75%; text-align:left; display:fill; border-radius:10px; background-color:#3b3745"><b><span style='color:#F1A424'>4 |</span></b> <b>Exploratory Data Analysis</b></div>

In [31]:
df.dtypes

merchant_id              object
month_id                 object
merchant_source_name     object
settlement_period        object
working_type             object
mcc_id                   object
merchant_segment         object
net_payment_count       float64
dtype: object

In [32]:
df["date"] = df["month_id"].apply(lambda x: str(x)[:4] + "-" + str(x)[4:] + "-01")

In [33]:
df.head()

Unnamed: 0,merchant_id,month_id,merchant_source_name,settlement_period,working_type,mcc_id,merchant_segment,net_payment_count,date
0,merchant_43992,202307,Merchant Source - 3,Settlement Period - 3,Working Type - 2,mcc_197,Segment - 2,15106.0,2023-07-01
1,merchant_43992,202301,Merchant Source - 3,Settlement Period - 3,Working Type - 2,mcc_197,Segment - 2,16918.0,2023-01-01
2,merchant_43992,202305,Merchant Source - 3,Settlement Period - 3,Working Type - 2,mcc_197,Segment - 2,13452.0,2023-05-01
3,merchant_43992,202308,Merchant Source - 3,Settlement Period - 3,Working Type - 2,mcc_197,Segment - 2,16787.0,2023-08-01
4,merchant_43992,202302,Merchant Source - 3,Settlement Period - 3,Working Type - 2,mcc_197,Segment - 2,12428.0,2023-02-01


In [34]:
df = df.drop(columns=['month_id'])

In [35]:
df["date"] = pd.to_datetime(df["date"])
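
As an alternative to building a `YYYY-MM-01` string first, `pd.to_datetime` can parse the `YYYYMM` values directly with an explicit format. This is a sketch on an invented series; with the real column, `month_id` would first need `.astype(str)` because it mixes integers (train) and strings (test).

```python
import pandas as pd

month_ids = pd.Series(["202307", "202001", "202312"])

# %Y%m parses YYYYMM; the day defaults to the first of the month.
dates = pd.to_datetime(month_ids, format="%Y%m")
```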

In [36]:
df["date"].min()

Timestamp('2020-01-01 00:00:00')

In [37]:
df["date"].max()

Timestamp('2023-12-01 00:00:00')

In [38]:
df.head()

Unnamed: 0,merchant_id,merchant_source_name,settlement_period,working_type,mcc_id,merchant_segment,net_payment_count,date
0,merchant_43992,Merchant Source - 3,Settlement Period - 3,Working Type - 2,mcc_197,Segment - 2,15106.0,2023-07-01
1,merchant_43992,Merchant Source - 3,Settlement Period - 3,Working Type - 2,mcc_197,Segment - 2,16918.0,2023-01-01
2,merchant_43992,Merchant Source - 3,Settlement Period - 3,Working Type - 2,mcc_197,Segment - 2,13452.0,2023-05-01
3,merchant_43992,Merchant Source - 3,Settlement Period - 3,Working Type - 2,mcc_197,Segment - 2,16787.0,2023-08-01
4,merchant_43992,Merchant Source - 3,Settlement Period - 3,Working Type - 2,mcc_197,Segment - 2,12428.0,2023-02-01


In [39]:
columns = [ 'merchant_id','merchant_source_name', 'settlement_period', 'working_type', 'mcc_id', 'merchant_segment']
for column in columns:
    # Raw string avoids an invalid-escape warning; expand=False returns a Series
    df[column] = df[column].str.extract(r'(\d+)', expand=False).astype(int)

In [40]:
df.head()

Unnamed: 0,merchant_id,merchant_source_name,settlement_period,working_type,mcc_id,merchant_segment,net_payment_count,date
0,43992,3,3,2,197,2,15106.0,2023-07-01
1,43992,3,3,2,197,2,16918.0,2023-01-01
2,43992,3,3,2,197,2,13452.0,2023-05-01
3,43992,3,3,2,197,2,16787.0,2023-08-01
4,43992,3,3,2,197,2,12428.0,2023-02-01


# <div style="padding: 30px; color:white; margin:10; font-size:75%; text-align:left; display:fill; border-radius:10px; background-color:#3b3745"><b><span style='color:#F1A424'>5 |</span></b> <b>Feature Engineering</b></div>

<h2 style="background-color: #f2f2f2; padding: 10px; color: #0c5674;">Date Features</h2>

In [41]:
def create_date_features(df, date_column):
    df['month'] = df[date_column].dt.month
    df['year'] = df[date_column].dt.year
    df['quarter'] = df[date_column].dt.quarter
    return df

In [42]:
df = create_date_features(df, "date")

In [43]:
df.head()

Unnamed: 0,merchant_id,merchant_source_name,settlement_period,working_type,mcc_id,merchant_segment,net_payment_count,date,month,year,quarter
0,43992,3,3,2,197,2,15106.0,2023-07-01,7,2023,3
1,43992,3,3,2,197,2,16918.0,2023-01-01,1,2023,1
2,43992,3,3,2,197,2,13452.0,2023-05-01,5,2023,2
3,43992,3,3,2,197,2,16787.0,2023-08-01,8,2023,3
4,43992,3,3,2,197,2,12428.0,2023-02-01,2,2023,1


In [44]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 369322 entries, 0 to 369321
Data columns (total 11 columns):
 #   Column                Non-Null Count   Dtype         
---  ------                --------------   -----         
 0   merchant_id           369322 non-null  int64         
 1   merchant_source_name  369322 non-null  int64         
 2   settlement_period     369322 non-null  int64         
 3   working_type          369322 non-null  int64         
 4   mcc_id                369322 non-null  int64         
 5   merchant_segment      369322 non-null  int64         
 6   net_payment_count     291142 non-null  float64       
 7   date                  369322 non-null  datetime64[ns]
 8   month                 369322 non-null  int32         
 9   year                  369322 non-null  int32         
 10  quarter               369322 non-null  int32         
dtypes: datetime64[ns](1), float64(1), int32(3), int64(6)
memory usage: 26.8 MB


In [45]:
df.head()

Unnamed: 0,merchant_id,merchant_source_name,settlement_period,working_type,mcc_id,merchant_segment,net_payment_count,date,month,year,quarter
0,43992,3,3,2,197,2,15106.0,2023-07-01,7,2023,3
1,43992,3,3,2,197,2,16918.0,2023-01-01,1,2023,1
2,43992,3,3,2,197,2,13452.0,2023-05-01,5,2023,2
3,43992,3,3,2,197,2,16787.0,2023-08-01,8,2023,3
4,43992,3,3,2,197,2,12428.0,2023-02-01,2,2023,1


In [46]:
df.tail()

Unnamed: 0,merchant_id,merchant_source_name,settlement_period,working_type,mcc_id,merchant_segment,net_payment_count,date,month,year,quarter
369317,35969,1,1,6,110,4,,2023-12-01,12,2023,4
369318,35969,1,1,6,110,4,,2023-10-01,10,2023,4
369319,8429,1,1,5,42,4,,2023-11-01,11,2023,4
369320,8429,1,1,5,42,4,,2023-12-01,12,2023,4
369321,8429,1,1,5,42,4,,2023-10-01,10,2023,4


In [47]:
df['school holiday'] = 0

for year, month in [(2020, 11), (2021, 11), (2022, 2), (2022, 4), (2023, 11)]:
    mask = (df['year'] == year) & (df['month'] == month)
    df.loc[mask, 'school holiday'] = 1
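
The loop above sets the flag one (year, month) pair at a time. A minimal vectorized sketch of the same idea encodes each row as `YYYYMM` and checks membership in a set. The frame `toy` and the name `holiday_keys` are illustrative; the pairs are copied from the loop.

```python
import pandas as pd

toy = pd.DataFrame({"year": [2020, 2022, 2023, 2023], "month": [11, 4, 11, 10]})

# (year, month) pairs copied from the loop above.
holiday_months = [(2020, 11), (2021, 11), (2022, 2), (2022, 4), (2023, 11)]
holiday_keys = {year * 100 + month for year, month in holiday_months}

# Encode each row as YYYYMM and test membership in one vectorized step.
toy["school holiday"] = (toy["year"] * 100 + toy["month"]).isin(holiday_keys).astype(int)
```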


In [48]:
df.tail()

Unnamed: 0,merchant_id,merchant_source_name,settlement_period,working_type,mcc_id,merchant_segment,net_payment_count,date,month,year,quarter,school holiday
369317,35969,1,1,6,110,4,,2023-12-01,12,2023,4,0
369318,35969,1,1,6,110,4,,2023-10-01,10,2023,4,0
369319,8429,1,1,5,42,4,,2023-11-01,11,2023,4,1
369320,8429,1,1,5,42,4,,2023-12-01,12,2023,4,0
369321,8429,1,1,5,42,4,,2023-10-01,10,2023,4,0


In [49]:
df['other holiday'] = 0

mask = (df['month'] == 11) | (df['month'] == 12)
df.loc[mask, 'other holiday'] = 1

In [50]:
df.head()

Unnamed: 0,merchant_id,merchant_source_name,settlement_period,working_type,mcc_id,merchant_segment,net_payment_count,date,month,year,quarter,school holiday,other holiday
0,43992,3,3,2,197,2,15106.0,2023-07-01,7,2023,3,0,0
1,43992,3,3,2,197,2,16918.0,2023-01-01,1,2023,1,0,0
2,43992,3,3,2,197,2,13452.0,2023-05-01,5,2023,2,0,0
3,43992,3,3,2,197,2,16787.0,2023-08-01,8,2023,3,0,0
4,43992,3,3,2,197,2,12428.0,2023-02-01,2,2023,1,0,0


In [51]:
df = df.drop(columns=['date'])

In [52]:
df.tail()

Unnamed: 0,merchant_id,merchant_source_name,settlement_period,working_type,mcc_id,merchant_segment,net_payment_count,month,year,quarter,school holiday,other holiday
369317,35969,1,1,6,110,4,,12,2023,4,0,1
369318,35969,1,1,6,110,4,,10,2023,4,0,0
369319,8429,1,1,5,42,4,,11,2023,4,1,1
369320,8429,1,1,5,42,4,,12,2023,4,0,1
369321,8429,1,1,5,42,4,,10,2023,4,0,0


In [53]:
df.head()

Unnamed: 0,merchant_id,merchant_source_name,settlement_period,working_type,mcc_id,merchant_segment,net_payment_count,month,year,quarter,school holiday,other holiday
0,43992,3,3,2,197,2,15106.0,7,2023,3,0,0
1,43992,3,3,2,197,2,16918.0,1,2023,1,0,0
2,43992,3,3,2,197,2,13452.0,5,2023,2,0,0
3,43992,3,3,2,197,2,16787.0,8,2023,3,0,0
4,43992,3,3,2,197,2,12428.0,2,2023,1,0,0


In [54]:
df = pd.get_dummies(df, columns=['mcc_id', 'working_type', 'merchant_source_name', 'settlement_period','merchant_segment'])
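
Because `df` holds the train and test rows together, `get_dummies` produces one consistent set of indicator columns for both. The sketch below, on invented toy frames, shows what would happen if the two parts were encoded separately. Passing `dtype=int` is an optional choice here that yields 0/1 columns instead of booleans.

```python
import pandas as pd

train_toy = pd.DataFrame({"working_type": [1, 2, 2]})
test_toy = pd.DataFrame({"working_type": [2, 3]})

# Encoding separately yields different column sets for train and test.
separate_train = pd.get_dummies(train_toy, columns=["working_type"])
separate_test = pd.get_dummies(test_toy, columns=["working_type"])

# Encoding the concatenated frame gives every row the same columns.
combined = pd.concat([train_toy, test_toy], ignore_index=True)
encoded = pd.get_dummies(combined, columns=["working_type"], dtype=int)
```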

In [55]:
df.head()

Unnamed: 0,merchant_id,net_payment_count,month,year,quarter,school holiday,other holiday,mcc_id_1,mcc_id_2,mcc_id_3,mcc_id_4,mcc_id_5,mcc_id_6,mcc_id_7,mcc_id_9,mcc_id_10,mcc_id_11,mcc_id_12,mcc_id_13,mcc_id_14,mcc_id_15,mcc_id_16,mcc_id_18,mcc_id_19,mcc_id_20,mcc_id_21,mcc_id_22,mcc_id_23,mcc_id_24,mcc_id_25,mcc_id_26,mcc_id_27,mcc_id_28,mcc_id_29,mcc_id_30,mcc_id_31,mcc_id_33,mcc_id_34,mcc_id_35,mcc_id_36,mcc_id_37,mcc_id_38,mcc_id_39,mcc_id_40,mcc_id_42,mcc_id_43,mcc_id_44,mcc_id_45,mcc_id_46,mcc_id_47,mcc_id_48,mcc_id_49,mcc_id_50,mcc_id_51,mcc_id_52,mcc_id_53,mcc_id_54,mcc_id_55,mcc_id_56,mcc_id_57,mcc_id_58,mcc_id_59,mcc_id_60,mcc_id_61,mcc_id_63,mcc_id_64,mcc_id_65,mcc_id_66,mcc_id_67,mcc_id_68,mcc_id_69,mcc_id_70,mcc_id_71,mcc_id_72,mcc_id_73,mcc_id_74,mcc_id_76,mcc_id_77,mcc_id_78,mcc_id_79,mcc_id_80,mcc_id_81,mcc_id_82,mcc_id_83,mcc_id_84,mcc_id_85,mcc_id_86,mcc_id_87,mcc_id_88,mcc_id_89,mcc_id_90,mcc_id_92,mcc_id_93,mcc_id_94,mcc_id_95,mcc_id_96,mcc_id_98,mcc_id_100,mcc_id_101,mcc_id_102,mcc_id_104,mcc_id_106,mcc_id_107,mcc_id_108,mcc_id_109,mcc_id_110,mcc_id_112,mcc_id_113,mcc_id_114,mcc_id_115,mcc_id_116,mcc_id_117,mcc_id_118,mcc_id_120,mcc_id_121,mcc_id_122,mcc_id_124,mcc_id_125,mcc_id_126,mcc_id_127,mcc_id_128,mcc_id_130,mcc_id_131,mcc_id_132,mcc_id_133,mcc_id_134,mcc_id_135,mcc_id_137,mcc_id_138,mcc_id_141,mcc_id_143,mcc_id_144,mcc_id_145,mcc_id_147,mcc_id_148,mcc_id_149,mcc_id_150,mcc_id_151,mcc_id_152,mcc_id_153,mcc_id_154,mcc_id_155,mcc_id_156,mcc_id_157,mcc_id_160,mcc_id_161,mcc_id_162,mcc_id_163,mcc_id_164,mcc_id_165,mcc_id_166,mcc_id_167,mcc_id_168,mcc_id_169,mcc_id_170,mcc_id_171,mcc_id_172,mcc_id_173,mcc_id_174,mcc_id_175,mcc_id_176,mcc_id_177,mcc_id_178,mcc_id_179,mcc_id_180,mcc_id_183,mcc_id_184,mcc_id_185,mcc_id_186,mcc_id_187,mcc_id_188,mcc_id_189,mcc_id_190,mcc_id_192,mcc_id_193,mcc_id_194,mcc_id_195,mcc_id_196,mcc_id_197,working_type_1,working_type_2,working_type_3,working_type_4,working_type_5,working_type_6,merchant_source_name_1,merchant_source_name_2,merchant_source_name_3,settlement_period_1,settlement_period_2,settlement_period_3,merchant_segment_1,merchant_segment_2,merchant_segment_3,merchant_segment_4
0,43992,15106.0,7,2023,3,0,0,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,True,False,False,False,False,False,False,True,False,False,True,False,True,False,False
1,43992,16918.0,1,2023,1,0,0,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,True,False,False,False,False,False,False,True,False,False,True,False,True,False,False
2,43992,13452.0,5,2023,2,0,0,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,True,False,False,False,False,False,False,True,False,False,True,False,True,False,False
3,43992,16787.0,8,2023,3,0,0,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,True,False,False,False,False,False,False,True,False,False,True,False,True,False,False
4,43992,12428.0,2,2023,1,0,0,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,True,False,False,False,False,False,False,True,False,False,True,False,True,False,False


In [56]:
df.head()

Unnamed: 0,merchant_id,net_payment_count,month,year,quarter,school holiday,other holiday,mcc_id_1,mcc_id_2,mcc_id_3,mcc_id_4,mcc_id_5,mcc_id_6,mcc_id_7,mcc_id_9,mcc_id_10,mcc_id_11,mcc_id_12,mcc_id_13,mcc_id_14,mcc_id_15,mcc_id_16,mcc_id_18,mcc_id_19,mcc_id_20,mcc_id_21,mcc_id_22,mcc_id_23,mcc_id_24,mcc_id_25,mcc_id_26,mcc_id_27,mcc_id_28,mcc_id_29,mcc_id_30,mcc_id_31,mcc_id_33,mcc_id_34,mcc_id_35,mcc_id_36,mcc_id_37,mcc_id_38,mcc_id_39,mcc_id_40,mcc_id_42,mcc_id_43,mcc_id_44,mcc_id_45,mcc_id_46,mcc_id_47,mcc_id_48,mcc_id_49,mcc_id_50,mcc_id_51,mcc_id_52,mcc_id_53,mcc_id_54,mcc_id_55,mcc_id_56,mcc_id_57,mcc_id_58,mcc_id_59,mcc_id_60,mcc_id_61,mcc_id_63,mcc_id_64,mcc_id_65,mcc_id_66,mcc_id_67,mcc_id_68,mcc_id_69,mcc_id_70,mcc_id_71,mcc_id_72,mcc_id_73,mcc_id_74,mcc_id_76,mcc_id_77,mcc_id_78,mcc_id_79,mcc_id_80,mcc_id_81,mcc_id_82,mcc_id_83,mcc_id_84,mcc_id_85,mcc_id_86,mcc_id_87,mcc_id_88,mcc_id_89,mcc_id_90,mcc_id_92,mcc_id_93,mcc_id_94,mcc_id_95,mcc_id_96,mcc_id_98,mcc_id_100,mcc_id_101,mcc_id_102,mcc_id_104,mcc_id_106,mcc_id_107,mcc_id_108,mcc_id_109,mcc_id_110,mcc_id_112,mcc_id_113,mcc_id_114,mcc_id_115,mcc_id_116,mcc_id_117,mcc_id_118,mcc_id_120,mcc_id_121,mcc_id_122,mcc_id_124,mcc_id_125,mcc_id_126,mcc_id_127,mcc_id_128,mcc_id_130,mcc_id_131,mcc_id_132,mcc_id_133,mcc_id_134,mcc_id_135,mcc_id_137,mcc_id_138,mcc_id_141,mcc_id_143,mcc_id_144,mcc_id_145,mcc_id_147,mcc_id_148,mcc_id_149,mcc_id_150,mcc_id_151,mcc_id_152,mcc_id_153,mcc_id_154,mcc_id_155,mcc_id_156,mcc_id_157,mcc_id_160,mcc_id_161,mcc_id_162,mcc_id_163,mcc_id_164,mcc_id_165,mcc_id_166,mcc_id_167,mcc_id_168,mcc_id_169,mcc_id_170,mcc_id_171,mcc_id_172,mcc_id_173,mcc_id_174,mcc_id_175,mcc_id_176,mcc_id_177,mcc_id_178,mcc_id_179,mcc_id_180,mcc_id_183,mcc_id_184,mcc_id_185,mcc_id_186,mcc_id_187,mcc_id_188,mcc_id_189,mcc_id_190,mcc_id_192,mcc_id_193,mcc_id_194,mcc_id_195,mcc_id_196,mcc_id_197,working_type_1,working_type_2,working_type_3,working_type_4,working_type_5,working_type_6,merchant_source_name_1,merchant_source_name_2,merchant_source_name_3,settlement_period_1,settlement_period_2,settlement_period_3,merchant_segment_1,merchant_segment_2,merchant_segment_3,merchant_segment_4
0,43992,15106.0,7,2023,3,0,0,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,True,False,False,False,False,False,False,True,False,False,True,False,True,False,False
1,43992,16918.0,1,2023,1,0,0,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,True,False,False,False,False,False,False,True,False,False,True,False,True,False,False
2,43992,13452.0,5,2023,2,0,0,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,True,False,False,False,False,False,False,True,False,False,True,False,True,False,False
3,43992,16787.0,8,2023,3,0,0,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,True,False,False,False,False,False,False,True,False,False,True,False,True,False,False
4,43992,12428.0,2,2023,1,0,0,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,True,False,False,False,False,False,False,True,False,False,True,False,True,False,False


In [57]:
df.shape

(369322, 195)

# <div style="padding: 30px; color:white; margin:10; font-size:75%; text-align:left; display:fill; border-radius:10px; background-color:#3b3745"><b><span style='color:#F1A424'>5 |</span></b> <b>One-Hot Encoding</b></div>

In [58]:
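# Label of the first row with a missing target; rows from here on form the submission set.
# Assumes df has a default RangeIndex and that the missing-target rows sit in one block at the end.
split_index = df.index[df['net_payment_count'].isna()][0]

train_ = df.iloc[:split_index]
sub = df.iloc[split_index:]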

In [59]:
print(train_.shape)
print(sub.shape)

(291142, 195)
(78180, 195)
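
The split above depends on the rows with a missing target sitting in one contiguous block at the end of `df`. A minimal sketch of an equivalent split by boolean mask, which does not depend on row order, is shown below on a small made-up frame. The names `toy`, `toy_train` and `toy_sub` and all values are illustrative and not taken from the dataset.

```python
import numpy as np
import pandas as pd

# Toy stand-in for df: the target is missing for the rows to be predicted.
toy = pd.DataFrame({
    "merchant_id": [1, 1, 2, 2, 3],
    "month": [1, 2, 1, 2, 3],
    "net_payment_count": [10.0, 12.0, np.nan, 7.0, np.nan],
})

# Rows with a missing target form the submission set; the rest is training data.
is_sub = toy["net_payment_count"].isna()
toy_train = toy.loc[~is_sub].reset_index(drop=True)
toy_sub = toy.loc[is_sub].reset_index(drop=True)

print(toy_train.shape, toy_sub.shape)
```

With a mask, the two parts stay correct even if the missing-target rows are interleaved with labelled rows.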


In [60]:
train_.head()

Unnamed: 0,merchant_id,net_payment_count,month,year,quarter,school holiday,other holiday,mcc_id_1,mcc_id_2,mcc_id_3,mcc_id_4,mcc_id_5,mcc_id_6,mcc_id_7,mcc_id_9,mcc_id_10,mcc_id_11,mcc_id_12,mcc_id_13,mcc_id_14,mcc_id_15,mcc_id_16,mcc_id_18,mcc_id_19,mcc_id_20,mcc_id_21,mcc_id_22,mcc_id_23,mcc_id_24,mcc_id_25,mcc_id_26,mcc_id_27,mcc_id_28,mcc_id_29,mcc_id_30,mcc_id_31,mcc_id_33,mcc_id_34,mcc_id_35,mcc_id_36,mcc_id_37,mcc_id_38,mcc_id_39,mcc_id_40,mcc_id_42,mcc_id_43,mcc_id_44,mcc_id_45,mcc_id_46,mcc_id_47,mcc_id_48,mcc_id_49,mcc_id_50,mcc_id_51,mcc_id_52,mcc_id_53,mcc_id_54,mcc_id_55,mcc_id_56,mcc_id_57,mcc_id_58,mcc_id_59,mcc_id_60,mcc_id_61,mcc_id_63,mcc_id_64,mcc_id_65,mcc_id_66,mcc_id_67,mcc_id_68,mcc_id_69,mcc_id_70,mcc_id_71,mcc_id_72,mcc_id_73,mcc_id_74,mcc_id_76,mcc_id_77,mcc_id_78,mcc_id_79,mcc_id_80,mcc_id_81,mcc_id_82,mcc_id_83,mcc_id_84,mcc_id_85,mcc_id_86,mcc_id_87,mcc_id_88,mcc_id_89,mcc_id_90,mcc_id_92,mcc_id_93,mcc_id_94,mcc_id_95,mcc_id_96,mcc_id_98,mcc_id_100,mcc_id_101,mcc_id_102,mcc_id_104,mcc_id_106,mcc_id_107,mcc_id_108,mcc_id_109,mcc_id_110,mcc_id_112,mcc_id_113,mcc_id_114,mcc_id_115,mcc_id_116,mcc_id_117,mcc_id_118,mcc_id_120,mcc_id_121,mcc_id_122,mcc_id_124,mcc_id_125,mcc_id_126,mcc_id_127,mcc_id_128,mcc_id_130,mcc_id_131,mcc_id_132,mcc_id_133,mcc_id_134,mcc_id_135,mcc_id_137,mcc_id_138,mcc_id_141,mcc_id_143,mcc_id_144,mcc_id_145,mcc_id_147,mcc_id_148,mcc_id_149,mcc_id_150,mcc_id_151,mcc_id_152,mcc_id_153,mcc_id_154,mcc_id_155,mcc_id_156,mcc_id_157,mcc_id_160,mcc_id_161,mcc_id_162,mcc_id_163,mcc_id_164,mcc_id_165,mcc_id_166,mcc_id_167,mcc_id_168,mcc_id_169,mcc_id_170,mcc_id_171,mcc_id_172,mcc_id_173,mcc_id_174,mcc_id_175,mcc_id_176,mcc_id_177,mcc_id_178,mcc_id_179,mcc_id_180,mcc_id_183,mcc_id_184,mcc_id_185,mcc_id_186,mcc_id_187,mcc_id_188,mcc_id_189,mcc_id_190,mcc_id_192,mcc_id_193,mcc_id_194,mcc_id_195,mcc_id_196,mcc_id_197,working_type_1,working_type_2,working_type_3,working_type_4,working_type_5,working_type_6,merchant_source_name_1,merchant_source_name_2,merchant_source_name_3,settlement_period_1,settlement_period_2,settlement_period_3,merchant_segment_1,merchant_segment_2,merchant_segment_3,merchant_segment_4
0,43992,15106.0,7,2023,3,0,0,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,True,False,False,False,False,False,False,True,False,False,True,False,True,False,False
1,43992,16918.0,1,2023,1,0,0,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,True,False,False,False,False,False,False,True,False,False,True,False,True,False,False
2,43992,13452.0,5,2023,2,0,0,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,True,False,False,False,False,False,False,True,False,False,True,False,True,False,False
3,43992,16787.0,8,2023,3,0,0,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,True,False,False,False,False,False,False,True,False,False,True,False,True,False,False
4,43992,12428.0,2,2023,1,0,0,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,True,False,False,False,False,False,False,True,False,False,True,False,True,False,False


In [61]:
# Reassign rather than modify in place: sub is a slice of df.
sub = sub.reset_index(drop=True)

In [62]:
sub.head()

Unnamed: 0,merchant_id,net_payment_count,month,year,quarter,school holiday,other holiday,mcc_id_1,mcc_id_2,mcc_id_3,mcc_id_4,mcc_id_5,mcc_id_6,mcc_id_7,mcc_id_9,mcc_id_10,mcc_id_11,mcc_id_12,mcc_id_13,mcc_id_14,mcc_id_15,mcc_id_16,mcc_id_18,mcc_id_19,mcc_id_20,mcc_id_21,mcc_id_22,mcc_id_23,mcc_id_24,mcc_id_25,mcc_id_26,mcc_id_27,mcc_id_28,mcc_id_29,mcc_id_30,mcc_id_31,mcc_id_33,mcc_id_34,mcc_id_35,mcc_id_36,mcc_id_37,mcc_id_38,mcc_id_39,mcc_id_40,mcc_id_42,mcc_id_43,mcc_id_44,mcc_id_45,mcc_id_46,mcc_id_47,mcc_id_48,mcc_id_49,mcc_id_50,mcc_id_51,mcc_id_52,mcc_id_53,mcc_id_54,mcc_id_55,mcc_id_56,mcc_id_57,mcc_id_58,mcc_id_59,mcc_id_60,mcc_id_61,mcc_id_63,mcc_id_64,mcc_id_65,mcc_id_66,mcc_id_67,mcc_id_68,mcc_id_69,mcc_id_70,mcc_id_71,mcc_id_72,mcc_id_73,mcc_id_74,mcc_id_76,mcc_id_77,mcc_id_78,mcc_id_79,mcc_id_80,mcc_id_81,mcc_id_82,mcc_id_83,mcc_id_84,mcc_id_85,mcc_id_86,mcc_id_87,mcc_id_88,mcc_id_89,mcc_id_90,mcc_id_92,mcc_id_93,mcc_id_94,mcc_id_95,mcc_id_96,mcc_id_98,mcc_id_100,mcc_id_101,mcc_id_102,mcc_id_104,mcc_id_106,mcc_id_107,mcc_id_108,mcc_id_109,mcc_id_110,mcc_id_112,mcc_id_113,mcc_id_114,mcc_id_115,mcc_id_116,mcc_id_117,mcc_id_118,mcc_id_120,mcc_id_121,mcc_id_122,mcc_id_124,mcc_id_125,mcc_id_126,mcc_id_127,mcc_id_128,mcc_id_130,mcc_id_131,mcc_id_132,mcc_id_133,mcc_id_134,mcc_id_135,mcc_id_137,mcc_id_138,mcc_id_141,mcc_id_143,mcc_id_144,mcc_id_145,mcc_id_147,mcc_id_148,mcc_id_149,mcc_id_150,mcc_id_151,mcc_id_152,mcc_id_153,mcc_id_154,mcc_id_155,mcc_id_156,mcc_id_157,mcc_id_160,mcc_id_161,mcc_id_162,mcc_id_163,mcc_id_164,mcc_id_165,mcc_id_166,mcc_id_167,mcc_id_168,mcc_id_169,mcc_id_170,mcc_id_171,mcc_id_172,mcc_id_173,mcc_id_174,mcc_id_175,mcc_id_176,mcc_id_177,mcc_id_178,mcc_id_179,mcc_id_180,mcc_id_183,mcc_id_184,mcc_id_185,mcc_id_186,mcc_id_187,mcc_id_188,mcc_id_189,mcc_id_190,mcc_id_192,mcc_id_193,mcc_id_194,mcc_id_195,mcc_id_196,mcc_id_197,working_type_1,working_type_2,working_type_3,working_type_4,working_type_5,working_type_6,merchant_source_name_1,merchant_source_name_2,merchant_source_name_3,settlement_period_1,settlement_period_2,settlement_period_3,merchant_segment_1,merchant_segment_2,merchant_segment_3,merchant_segment_4
0,36004,,11,2023,4,1,1,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,True,False,False,True,False,False,False,False,False,True
1,36004,,12,2023,4,0,1,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,True,False,False,True,False,False,False,False,False,True
2,36004,,10,2023,4,0,0,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,True,False,False,True,False,False,False,False,False,True
3,23099,,11,2023,4,1,1,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,True,False,False,True,False,False,False,False,False,True
4,23099,,12,2023,4,0,1,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,True,False,False,True,False,False,False,False,False,True


In [63]:
sub['net_payment_count'].isna().sum()

78180

# <div style="padding: 30px; color:white; margin:10; font-size:75%; text-align:left; display:fill; border-radius:10px; background-color:#3b3745"><b><span style='color:#F1A424'>6 |</span></b> <b>Modelling</b></div>

In [64]:
# Target and feature matrix; any remaining missing feature values are filled with 0.
y = train_['net_payment_count']
X = train_.drop(columns=['net_payment_count'])

X = X.fillna(0)

In [65]:
def regression_analysis(x, y):
    """Fit several regressors on one 90/10 random split and rank them by MAE."""
    # All models use their default hyperparameters.
    # ExtraTreeRegressor is a single randomized tree, not the ExtraTrees ensemble.
    algorithms = {
        "LGBM": LGBMRegressor(),
        "ExtraTree": ExtraTreeRegressor(),
        "GradientBoosting": GradientBoostingRegressor(),
        "DecisionTree": DecisionTreeRegressor(),
        "XGB": XGBRegressor()
    }

    # Hold out 10% of the rows for evaluation.
    X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.1, random_state=42)

    results = pd.DataFrame(columns=['MAE'], index=algorithms.keys())

    for name, algo in algorithms.items():
        algo.fit(X_train, y_train)
        predictions = algo.predict(X_test)
        mae = mean_absolute_error(y_test, predictions)

        results.loc[name, 'MAE'] = mae

    # Lowest error first.
    return results.sort_values('MAE', ascending=True)

In [66]:
regression_analysis(X, y)

[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.041578 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 576
[LightGBM] [Info] Number of data points in the train set: 262027, number of used features: 153
[LightGBM] [Info] Start training from score 427.691051


Unnamed: 0,MAE
DecisionTree,94.399
ExtraTree,97.096
XGB,142.768
LGBM,222.808
GradientBoosting,342.912
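
The ranking above comes from a single random 90/10 split. Each merchant contributes several monthly rows, so a random split places rows from the same merchant on both sides, while the rows in `sub` shown earlier are all from later months (October to December 2023). A hedged sketch of an alternative check, holding out the most recent months by the `year` and `month` columns, follows. The helper `time_holdout` and the synthetic frame are hypothetical and only illustrate the idea; they are not part of the original notebook.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import mean_absolute_error
from sklearn.tree import DecisionTreeRegressor


def time_holdout(frame, n_months=2):
    """Split a frame into (earlier rows, rows from the last n_months periods)."""
    # Encode year and month as one monotonically increasing period number.
    period = frame["year"] * 12 + frame["month"]
    cutoff = np.sort(period.unique())[-n_months]
    return frame[period < cutoff], frame[period >= cutoff]


# Synthetic stand-in for train_: 20 merchants with 12 monthly rows each.
rng = np.random.default_rng(42)
toy = pd.DataFrame({
    "merchant_id": np.repeat(np.arange(20), 12),
    "year": 2023,
    "month": np.tile(np.arange(1, 13), 20),
})
toy["net_payment_count"] = 50.0 * toy["merchant_id"] + rng.normal(0, 5, len(toy))

fit_part, val_part = time_holdout(toy, n_months=2)
features = ["merchant_id", "year", "month"]

model = DecisionTreeRegressor(random_state=42)
model.fit(fit_part[features], fit_part["net_payment_count"])
val_mae = mean_absolute_error(
    val_part["net_payment_count"], model.predict(val_part[features])
)
print(f"MAE on the last 2 months: {val_mae:.3f}")
```

If the model ranking under such a split differs from the random-split ranking, the time-based figure is likely the closer proxy for the error on `sub`.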


In [67]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.01, random_state=42)

In [68]:
# DecisionTreeRegressor had the lowest MAE in the comparison above.
tree = DecisionTreeRegressor()
tree.fit(X_train, y_train)
tree_preds = tree.predict(X_test)

In [69]:
print('MAE:', metrics.mean_absolute_error(y_test, tree_preds))

MAE: 80.33482142857143
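
The notebook stops at the validation score. One way a fitted tree could be applied to `sub` is sketched below, assuming the submission rows go through the same feature columns and the same `fillna(0)` step as `X`. The toy frames and their values are made up for illustration, and the expected submission format is not specified in the source.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeRegressor

# Toy stand-ins for train_ and sub (values are made up for illustration).
toy_train = pd.DataFrame({
    "merchant_id": [1, 1, 2, 2],
    "month": [8, 9, 8, 9],
    "net_payment_count": [10.0, 12.0, 30.0, 33.0],
})
toy_sub = pd.DataFrame({
    "merchant_id": [1, 2],
    "month": [10, 10],
    "net_payment_count": [np.nan, np.nan],
})

target = "net_payment_count"
feature_cols = [c for c in toy_train.columns if c != target]

model = DecisionTreeRegressor(random_state=42)
model.fit(toy_train[feature_cols].fillna(0), toy_train[target])

# Use the same columns, in the same order, and the same fill value as in training.
toy_sub = toy_sub.copy()
toy_sub[target] = model.predict(toy_sub[feature_cols].fillna(0))
print(toy_sub)
```

Selecting `feature_cols` from the training frame keeps the column order identical between fitting and prediction, which scikit-learn estimators expect.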


<style>
  .link-card {
    background-color: #f9f9f9;
    border: 1px solid #e0e0e0;
    padding: 15px;
    border-radius: 8px;
    margin: 10px 0;
    display: flex;
    align-items: center;
    text-decoration: none;
    color: #333;
    transition: background-color 0.3s, transform 0.3s, box-shadow 0.3s;
  }
  
  .link-card:hover {
    background-color: #f0f0f0;
    transform: translateY(-5px);
    box-shadow: 0px 4px 8px rgba(0, 0, 0, 0.1);
  }
  
  .link-icon {
    font-size: 50px;
    margin-right: 15px;
  }
  
  .link-text {
    font-size: 20px;
    font-weight: bold;
  }
</style>

<a href="https://www.kaggle.com/mehmetisik/code" class="link-card">
  <span class="link-icon">📊</span>
  <span class="link-text">Mehmet ISIK's Notebook</span>
</a>
