# Tabular Playground Series - Sep 2021

A quick look at the data, followed by a prediction with LightAutoML.  
I've also made a prediction with PyCaret, but I may not publish that one...  
After last month, I'm a bit fed up with seeing another large pile of abstract data, but I'll pull myself together and do my best this month.  

What doesn't kill you makes you stronger.  


## Setup

In [None]:
!pip install -U lightautoml

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
import missingno as msno
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)

from lightautoml.automl.presets.tabular_presets import TabularAutoML, TabularUtilizedAutoML
from lightautoml.tasks import Task

In [None]:
df_train = pd.read_csv('../input/tabular-playground-series-sep-2021/train.csv')
df_test = pd.read_csv('../input/tabular-playground-series-sep-2021/test.csv')

## Overview

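There are missing values, but we will leave them as they are. The pipeline below only uses LightGBM models, which handle missing values natively. (In the PyCaret version, numeric features get the average value by default.)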
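
To see how much is actually missing in each column, a small helper along these lines could be used. This is only a sketch: `missing_summary` and the toy frame are made up for illustration and are not part of the original notebook; in practice you would pass `df_train` or `df_test`.

```python
import numpy as np
import pandas as pd


def missing_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Return per-column missing-value counts and ratios, largest first."""
    n_missing = df.isna().sum()
    summary = pd.DataFrame({
        "n_missing": n_missing,
        "ratio": n_missing / len(df),
    })
    # Keep only the columns that actually contain missing values.
    summary = summary[summary["n_missing"] > 0]
    return summary.sort_values("n_missing", ascending=False)


# Toy frame standing in for df_train (hypothetical column names and values).
toy = pd.DataFrame({
    "f1": [1.0, np.nan, 3.0, np.nan],
    "f2": [0.5, 0.1, np.nan, 0.7],
    "claim": [0, 1, 0, 1],
})
toy_summary = missing_summary(toy)
print(toy_summary)
```

The `missingno` package imported in the setup offers visual versions of the same information, for example `msno.matrix(df_train)`.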

In [None]:
df_train.head()

In [None]:
df_test.head()

In [None]:
df_train.info()

In [None]:
df_test.info()

In [None]:
cols = df_train.columns.values
# The 30 x 4 grid holds up to 120 plots, one histogram per column.
fig, ax = plt.subplots(30, 4, figsize=(16, 120))
for cnt, col in enumerate(cols):
    sns.histplot(df_train[col], ax=ax[cnt // 4, cnt % 4], color='lightskyblue')

plt.show()

## Predict with LightAutoML

In [None]:
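def do_automl(target, train, test):
    # LightGBM-only AutoML (default and tuned variants) for a binary target.
    laml = TabularUtilizedAutoML(
        task=Task('binary'),
        timeout=8 * 3600,  # total time budget in seconds (8 hours)
        cpu_limit=4,
        reader_params={'n_jobs': 4, 'cv': 5, 'random_state': 42},
        general_params={'use_algos': [['lgb', 'lgb_tuned']]},
    )
    # fit_predict returns out-of-fold predictions on train; they are not used here.
    laml.fit_predict(train_data=train, roles={'target': target})
    # Predicted probabilities for the test set as a flat array.
    pred = laml.predict(test).data.ravel()
    return pred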

In [None]:
pred = do_automl('claim', df_train.drop(['id'], axis=1), df_test.drop(['id'], axis=1))

In [None]:
pred
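
`fit_predict` returns out-of-fold predictions for the training set, which `do_automl` throws away. If you keep them, they give a rough local score before submitting. The sketch below assumes ROC AUC as the metric (a common choice for a binary target like `claim`) and uses scikit-learn, which the notebook itself does not import. `oof_auc` and the toy arrays are hypothetical, and the NaN mask is a defensive assumption for rows that never received an out-of-fold prediction.

```python
import numpy as np
from sklearn.metrics import roc_auc_score


def oof_auc(y_true, oof_pred) -> float:
    """ROC AUC of out-of-fold predictions, skipping rows without a prediction."""
    y_true = np.asarray(y_true).ravel()
    oof_pred = np.asarray(oof_pred, dtype=float).ravel()
    # Ignore rows whose out-of-fold prediction is NaN.
    mask = ~np.isnan(oof_pred)
    return float(roc_auc_score(y_true[mask], oof_pred[mask]))


# Toy labels and scores standing in for df_train['claim'] and the OOF output.
y_toy = np.array([0, 0, 1, 1, 0, 1])
p_toy = np.array([0.1, 0.4, 0.35, 0.8, np.nan, 0.9])
toy_auc = oof_auc(y_toy, p_toy)
print(f"OOF AUC: {toy_auc:.4f}")
```

With the names used in this notebook, the call would look like `oof_auc(df_train['claim'], oof.data)`, where `oof` is a hypothetical variable holding the return value of `laml.fit_predict(...)`.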

## Submission

In [None]:
sample_sub = pd.read_csv('../input/tabular-playground-series-sep-2021/sample_solution.csv')
submission = pd.DataFrame({'id': sample_sub.id, 'claim': pred })
submission

In [None]:
submission.to_csv('LightAutoML_sub.csv',index=False)
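
Before uploading, it can be worth checking that the file has the same layout as the sample solution. This is a minimal sketch, not part of the original notebook: `check_submission` is a hypothetical helper, the toy frames stand in for `sample_sub` and `submission`, and the `[0, 1]` range check assumes the predictions are probabilities.

```python
import pandas as pd


def check_submission(sub: pd.DataFrame, sample: pd.DataFrame, target: str = "claim") -> None:
    """Raise ValueError if `sub` does not match the layout of `sample`."""
    if list(sub.columns) != list(sample.columns):
        raise ValueError(f"columns differ: {list(sub.columns)} vs {list(sample.columns)}")
    if len(sub) != len(sample):
        raise ValueError(f"row count differs: {len(sub)} vs {len(sample)}")
    if not (sub["id"].to_numpy() == sample["id"].to_numpy()).all():
        raise ValueError("ids differ from the sample solution")
    if sub[target].isna().any():
        raise ValueError("predictions contain missing values")
    # Assumption: predictions are probabilities, so they should lie in [0, 1].
    if not sub[target].between(0, 1).all():
        raise ValueError("predictions fall outside [0, 1]")


# Toy frames standing in for sample_sub and submission (made-up values).
toy_sample = pd.DataFrame({"id": [10, 11, 12], "claim": [0.5, 0.5, 0.5]})
toy_sub = pd.DataFrame({"id": [10, 11, 12], "claim": [0.12, 0.87, 0.40]})
check_submission(toy_sub, toy_sample)  # passes silently
print("submission layout looks fine")
```

In the notebook this would be called as `check_submission(submission, sample_sub)` before `to_csv`.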