### Hi everyone, this is a very basic starter notebook for this competition with LightGBM.

## Approach

1. Import libraries
2. Read the train, test and sample submission files
3. Basic data check (missing values and dtypes)
4. Check the target distribution
5. Baseline model: create stratified folds and fit a default LGBMClassifier
6. Create the submission file

## 1. Import libraries

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

from lightgbm import LGBMClassifier
from sklearn.model_selection import StratifiedKFold # For creating folds
from sklearn.metrics import log_loss # Evaluation metrics

## 2. Reading the train, test and sample submission files

In [None]:
df = pd.read_csv("/kaggle/input/tabular-playground-series-jun-2021/train.csv")
test = pd.read_csv("/kaggle/input/tabular-playground-series-jun-2021/test.csv")
ss = pd.read_csv("/kaggle/input/tabular-playground-series-jun-2021/sample_submission.csv")

In [None]:
print(f"Shape of train : {df.shape}")
print(f"Shape of test : {test.shape}")
print(f"Shape of sample submission : {ss.shape}")

## 3. Basic data check

In [None]:
df.head()

In [None]:
df.info()

In [None]:
test.info()

There are no missing values in either the train or the test dataset, and all feature columns are integers, so any categorical features may already be encoded.
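
The same check can be made explicit instead of being read off `info()`. Below is a minimal sketch on a small made-up frame standing in for the competition data; `toy` and `summarize_columns` are illustrative names that do not appear elsewhere in this notebook.

```python
import pandas as pd

def summarize_columns(frame: pd.DataFrame) -> pd.DataFrame:
    """Return per-column dtype, missing-value count and number of unique values."""
    return pd.DataFrame({
        "dtype": frame.dtypes.astype(str),
        "n_missing": frame.isnull().sum(),
        "n_unique": frame.nunique(),
    })

# Toy stand-in for the real train set (hypothetical values).
toy = pd.DataFrame({
    "feature_0": [0, 1, 0, 3],
    "feature_1": [2, 2, 5, 0],
    "target": ["Class_1", "Class_2", "Class_2", "Class_6"],
})

summary = summarize_columns(toy)
print(summary)

# Total number of missing cells across all columns
total_missing = int(summary["n_missing"].sum())
print("Total missing values:", total_missing)
```

On the real data, `summarize_columns(df)` and `summarize_columns(test)` would give one row per column, which is easier to scan than the `info()` output when there are many features.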

## 4. Checking target distribution

In [None]:
sns.countplot(x=df.target)

The target column is imbalanced, so I will use StratifiedKFold for cross-validation; it keeps the class proportions roughly the same in every fold.
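
To illustrate what stratification does, here is a small sketch on made-up data (an 80/20 two-class target, not the competition data). It measures the share of the minority class in each validation fold; `y_toy`, `X_toy` and `minority_share` are illustrative names.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Toy imbalanced target: 80 samples of class "a", 20 of class "b" (made-up data).
y_toy = np.array(["a"] * 80 + ["b"] * 20)
X_toy = np.zeros((len(y_toy), 1))  # features are irrelevant for the split itself

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
minority_share = []
for train_idx, valid_idx in skf.split(X_toy, y_toy):
    # Fraction of the minority class in this validation fold
    minority_share.append(float(np.mean(y_toy[valid_idx] == "b")))

print(minority_share)
```

Because both class counts are divisible by the number of folds here, every validation fold ends up with the same 20% minority share as the full data. With a plain `KFold` the share could drift from fold to fold.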

Since this is a baseline/starter model, I am skipping further EDA and moving directly to the model-building part.

## 5. Baseline model

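Creating folds for the train dataset, so that the model can be trained and validated once per fold. Averaging the score over the folds gives a more reliable estimate of out-of-sample performance than a single split and makes overfitting easier to spot.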

In [None]:
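df["kfold"] = -1
# Shuffle the rows; a fixed seed keeps the fold assignment reproducible
df = df.sample(frac=1, random_state=42).reset_index(drop=True)
y = df.target
kf = StratifiedKFold(n_splits=5)
for f, (t_, v_) in enumerate(kf.split(X=df, y=y)):
    df.loc[v_, "kfold"] = f  # v_ holds the validation row indices of fold f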

In [None]:
lgbm = LGBMClassifier(random_state=42)
logloss = []
lgbm_pred = 0
X_test = test.drop(["id"], axis=1)  # Test features are the same for every fold

for f in range(5):  # Loop over the 5 folds

    # Split the data into train and validation sets
    train = df[df.kfold != f].reset_index(drop=True)
    valid = df[df.kfold == f].reset_index(drop=True)

    # Create the feature matrices and targets
    X_train = train.drop(["id", "target", "kfold"], axis=1)
    y_train = train.target
    X_valid = valid.drop(["id", "target", "kfold"], axis=1)
    y_valid = valid.target

    # Fit the model (each fit() call retrains from scratch)
    lgbm.fit(X_train, y_train)

    # Predict for the validation set and accumulate the fold-averaged test prediction
    valid_preds = lgbm.predict_proba(X_valid)
    lgbm_pred += lgbm.predict_proba(X_test) / 5

    # Log loss on the held-out fold; labels pins the column order of valid_preds
    logloss.append(log_loss(y_valid, valid_preds, labels=lgbm.classes_))

print(logloss)
print(sum(logloss) / len(logloss))

**In this run, the average log loss across the 5 folds was 1.7560632816516921.**
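
To put that number in context, it helps to compare it with an uninformed prediction. Assuming 9 classes (as the `Class_1` to `Class_9` submission columns suggest), predicting a uniform 1/9 for every class gives a log loss of ln 9 ≈ 2.197, whatever the true labels are. The sketch below shows this with NumPy; `multiclass_log_loss` is an illustrative helper written for this example, not the scikit-learn `log_loss` used above, and the labels are made up.

```python
import numpy as np

def multiclass_log_loss(y_true_idx, proba, eps=1e-15):
    """Mean negative log-probability assigned to the true class."""
    proba = np.clip(np.asarray(proba, dtype=float), eps, 1.0)
    rows = np.arange(len(y_true_idx))
    return float(-np.mean(np.log(proba[rows, y_true_idx])))

n_classes = 9
y_true_idx = np.array([0, 3, 8, 3, 5])  # made-up integer class labels
uniform = np.full((len(y_true_idx), n_classes), 1.0 / n_classes)

# Log loss of the uniform prediction, next to ln(9) for comparison
baseline = multiclass_log_loss(y_true_idx, uniform)
print(baseline, np.log(n_classes))
```

The cross-validated score reported above is below this reference value, so the default model does better than a uniform guess.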

## 6. Creating submission file

In [None]:
# Column i of lgbm_pred corresponds to lgbm.classes_[i], i.e. Class_1 to Class_9 in order
for i in range(9):
    ss[f"Class_{i + 1}"] = lgbm_pred[:, i]
ss.to_csv("/kaggle/working/sub.csv", index=False)
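
Before uploading, it can be worth sanity-checking the file. The sketch below is one possible check, not a statement of the competition's rules: it assumes the nine `Class_*` columns should hold probabilities that sum to 1 per row, which is what averaged `predict_proba` outputs produce. `check_submission` and `toy_sub` are hypothetical names, and the ids and values are made up.

```python
import numpy as np
import pandas as pd

def check_submission(sub: pd.DataFrame, class_cols, atol: float = 1e-6) -> None:
    """Raise ValueError if the submission looks malformed."""
    missing = [c for c in class_cols if c not in sub.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    probs = sub[class_cols].to_numpy(dtype=float)
    if np.isnan(probs).any():
        raise ValueError("submission contains NaN")
    if (probs < 0).any() or (probs > 1).any():
        raise ValueError("probabilities outside [0, 1]")
    if not np.allclose(probs.sum(axis=1), 1.0, atol=atol):
        raise ValueError("rows do not sum to 1")

class_cols = [f"Class_{i}" for i in range(1, 10)]

# Toy submission with uniform probabilities (made-up ids and values).
toy_sub = pd.DataFrame(np.full((3, 9), 1 / 9), columns=class_cols)
toy_sub.insert(0, "id", [200000, 200001, 200002])

check_submission(toy_sub, class_cols)  # passes silently
```

In this notebook the equivalent call would be `check_submission(ss, class_cols)` just before `to_csv`.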

If you like the notebook, kindly upvote it. It will motivate me to write more notebooks. :)

### Thank you! 