# Sweet Visualization with SweetViz & IPython (Tabular Playground Series - Feb 2021)


Hello! I am sharing this automatic visualization notebook to make it easy to understand the data with very little effort. I made the same kind of visualization for the [Jane Street Competition](https://www.kaggle.com/rapela/jane-street-sweet-visualization). I hope you like it! :)


![SweetViz](https://warehouse-camo.ingress.cmh1.psfhosted.org/a488ecd36ce307df2949bf839872d2fb3917f5cc/687474703a2f2f636f6f6c74696d696e672e636f6d2f53562f6c6f676f2e706e67)




EDA is essential for understanding the data. Visualizing the feature distributions gives the data scientist ideas for feature engineering, and this step can be crucial to building winning models.


> "In-depth EDA (target analysis, comparison, feature analysis, correlation) in two lines of code! Sweetviz is an open source Python library that generates beautiful, high-density visualizations to kickstart EDA (Exploratory Data Analysis) with just two lines of code. Output is a fully self-contained HTML application". 

For more information: https://pypi.org/project/sweetviz/


![SweetViz](https://warehouse-camo.ingress.cmh1.psfhosted.org/9eb20dbdb87a70b5e8e117088be7bd153cbc6b3d/687474703a2f2f636f6f6c74696d696e672e636f6d2f53562f66656174757265732e706e67)


## Install Sweetviz


In [None]:
!pip install sweetviz

## Import Libs

In [None]:
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
from sklearn.preprocessing import LabelEncoder

import sweetviz as sv # SweetViz 
from IPython.display import IFrame # Display SweetViz plots

## Parameters

In [None]:
TRAIN_PATH = "/kaggle/input/tabular-playground-series-feb-2021/train.csv"
TEST_PATH = "/kaggle/input/tabular-playground-series-feb-2021/test.csv"
SUBMISSION = "/kaggle/input/tabular-playground-series-feb-2021/sample_submission.csv"
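IFRAME_SRC = "test.html"  # HTML report file; reused (and overwritten) for both the train and test reports
IFRAME_WIDTH = 1920  # IFrame width in pixels
IFRAME_HEIGHT = 1080  # IFrame height in pixels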

## Read Data (Train & Test)

In [None]:
train = pd.read_csv(TRAIN_PATH, index_col='id')
train.head()

In [None]:
submission = pd.read_csv(SUBMISSION, index_col='id')
target = train.pop("target")
target = target.values

In [None]:
train.columns

In [None]:
test = pd.read_csv(TEST_PATH, index_col='id')

In [None]:
test.columns

## Label Encoding 

* Encode categorical data

In [None]:
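for c in train.columns:
    # Only string-typed (categorical) columns need encoding
    if train[c].dtype == 'object':
        lbl = LabelEncoder()
        # Fit on train + test together so categories seen in only one split still get a code
        lbl.fit(list(train[c].values) + list(test[c].values))

        train[c] = lbl.transform(train[c].values)
        test[c] = lbl.transform(test[c].values)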
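
To make the effect of the loop above easier to see, here is a minimal sketch on toy data. The frames `toy_train` and `toy_test` and their values are made up for illustration; only the column name `cat0` mimics the competition data. It shows why the encoder is fitted on both splits: the category `"C"` appears only in the toy test split and would otherwise raise an error at transform time.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy frames: "C" appears only in the test split.
toy_train = pd.DataFrame({"cat0": ["A", "B", "A"]})
toy_test = pd.DataFrame({"cat0": ["B", "C", "A"]})

encoder = LabelEncoder()
# Fit on the union of both splits so every category receives a code.
encoder.fit(pd.concat([toy_train["cat0"], toy_test["cat0"]]))

toy_train["cat0"] = encoder.transform(toy_train["cat0"])
toy_test["cat0"] = encoder.transform(toy_test["cat0"])

# LabelEncoder assigns codes in the sorted order of the labels.
mapping = {label: int(code) for code, label in enumerate(encoder.classes_)}
print(mapping)
print(toy_test["cat0"].tolist())
```

Keep in mind that these integer codes carry an arbitrary order, which tree-based models usually tolerate but linear models may misread as a magnitude.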

In [None]:
train.head()

In [None]:
test.head()

## SweetViz

### Train Data

In [None]:
train.shape

In [None]:
%%time
analyze = sv.analyze(train)
analyze.show_html(IFRAME_SRC)

In [None]:
IFrame(src=IFRAME_SRC, width=IFRAME_WIDTH, height=IFRAME_HEIGHT)

### Test Data

In [None]:
test.shape

In [None]:
%%time
analyze = sv.analyze(test)
analyze.show_html(IFRAME_SRC)

In [None]:
IFrame(src=IFRAME_SRC, width=IFRAME_WIDTH, height=IFRAME_HEIGHT)
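
Since the train and test reports are generated separately, a quick side-by-side table can help spot features whose distribution differs between the two splits. The sketch below uses plain pandas on made-up frames (`toy_train`, `toy_test` and their values are hypothetical stand-ins for `train` and `test`); it is a rough complement to the reports, not something SweetViz produces.

```python
import pandas as pd

# Hypothetical stand-ins for the notebook's `train` and `test` frames.
toy_train = pd.DataFrame({"cont0": [0.1, 0.4, 0.7], "cont1": [1.0, 2.0, 3.0]})
toy_test = pd.DataFrame({"cont0": [0.2, 0.5, 0.8], "cont1": [1.5, 2.5, 3.5]})

stats = ["mean", "std", "min", "max"]

# One row per feature, with the train and test statistics side by side.
summary = pd.concat(
    [toy_train.describe().T[stats], toy_test.describe().T[stats]],
    axis=1,
    keys=["train", "test"],
)

# A large difference in means hints at a shift between the two splits.
summary[("diff", "mean")] = summary[("test", "mean")] - summary[("train", "mean")]
print(summary)
```

With the real data, replacing the toy frames with `train` and `test` should give one row per feature.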

## Here I will test a model :) 

In [None]:
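from sklearn.ensemble import RandomForestRegressor

# Quick baseline: a small random forest trained on all encoded features
model = RandomForestRegressor(n_estimators=50, n_jobs=-1)
model.fit(train, target)

# Predict on the test set and save the submission file (the 'id' index is written as a column)
submission['target'] = model.predict(test)
submission.to_csv('random_forest.csv')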
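
The cell above trains on all rows, so it gives no estimate of how good the model is before submitting. A minimal sketch of a hold-out check follows. It runs on synthetic data (the arrays `X` and `y`, the 80/20 split, and the fixed seeds are assumptions for illustration) and scores with RMSE, which I believe is the metric of this competition.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the encoded `train` frame and `target` array.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=300)

# Hold out 20% of the rows for validation.
X_fit, X_val, y_fit, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

model = RandomForestRegressor(n_estimators=50, n_jobs=-1, random_state=0)
model.fit(X_fit, y_fit)

# RMSE of the model on the held-out rows.
rmse = float(np.sqrt(mean_squared_error(y_val, model.predict(X_val))))

# Reference point: always predicting the mean of the fitting targets.
mean_prediction = np.full_like(y_val, y_fit.mean())
baseline_rmse = float(np.sqrt(mean_squared_error(y_val, mean_prediction)))

print(f"model RMSE: {rmse:.3f} | mean-prediction RMSE: {baseline_rmse:.3f}")
```

On the real data, `X` and `y` would be `train` and `target`; the comparison against the mean prediction shows whether the forest learns anything useful.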

## If you like this interactive visualization, do not forget to like the notebook. Thanks! :)

* Click the "Associations" button to see the correlations between the features.
* Each interactive row describes one feature with many statistics!
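
If you want to cross-check the numeric correlations outside the report, pandas can compute a Pearson correlation matrix directly. This is only a rough sketch on made-up data (the frame `toy` and its values are hypothetical), and it does not reproduce how SweetViz treats categorical features.

```python
import pandas as pd

# Hypothetical numeric features.
toy = pd.DataFrame({
    "cont0": [1.0, 2.0, 3.0, 4.0],
    "cont1": [2.0, 4.0, 6.0, 8.0],  # moves exactly with cont0
    "cont2": [4.0, 3.0, 2.0, 1.0],  # moves exactly against cont0
})

# DataFrame.corr() uses the Pearson coefficient by default.
corr = toy.corr()
print(corr.round(2))
```

Calling `train.corr()` on the encoded frame works the same way, although the coefficients for label-encoded columns are hard to interpret.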