### Sweetviz
* Sweetviz is an open-source Python library that generates beautiful, high-density visualizations to kickstart EDA (Exploratory Data Analysis) with a single line of code. The output is a fully self-contained HTML application.
* Sweetviz is a powerful package that helps you visualize your data intuitively. It shows the plots for each column and also lists a statistical summary of each column.

In [1]:
import pandas as pd

In [2]:
# install library - pip install sweetviz
import sweetviz 

In [3]:
pd.__version__

'1.0.3'

In [4]:
!pip freeze | grep sweetviz

sweetviz==1.0a7


* At the time of writing (version 1.0a7, as shown above), Sweetviz is in its alpha testing phase.
* Sweetviz currently supports Python 3.6+ and pandas 0.25.3+. Reports are written using the base `os` module, so environments such as Google Colab, which require custom file operations, are not yet supported.
* Currently, the only supported rendering is a standalone HTML file.
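To confirm that an environment meets the minimums listed above (Python 3.6+, pandas 0.25.3+) before installing Sweetviz, a small check like the following can help. It is only a sketch: the helpers `version_tuple` and `meets_requirements` are hypothetical, and the parser keeps just the leading numeric components, so pre-release suffixes such as `a7` are ignored.

```python
import sys

# Minimums stated in the Sweetviz notes above
MIN_PYTHON = (3, 6)
MIN_PANDAS = (0, 25, 3)


def version_tuple(version):
    """Turn a version string into a tuple of ints, e.g. "1.0.3" -> (1, 0, 3).

    Parsing stops at the first non-numeric character, so "1.0a7" -> (1, 0).
    """
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
        if digits != piece:  # a suffix such as "a7" ends the numeric part
            break
    return tuple(parts)


def meets_requirements(pandas_version, python_version=None):
    """Return True if both Python and pandas meet the stated minimums."""
    if python_version is None:
        python_version = sys.version_info[:3]
    python_ok = tuple(python_version) >= MIN_PYTHON
    pandas_ok = version_tuple(pandas_version) >= MIN_PANDAS
    return python_ok and pandas_ok
```

In this notebook the call would be `meets_requirements(pd.__version__)`, which returns `True` for pandas 1.0.3 on Python 3.6 or newer.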

In [6]:
# load data
house_train = pd.read_csv(r'./data/house_train.csv')
house_test = pd.read_csv(r'./data/house_test.csv')

In [7]:
# check dimensions
house_train.shape, house_test.shape

((1460, 81), (1459, 80))

In [8]:
house_train.head(n=3)

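| | Id | MSSubClass | MSZoning | LotFrontage | LotArea | Street | Alley | LotShape | LandContour | Utilities | ... | PoolArea | PoolQC | Fence | MiscFeature | MiscVal | MoSold | YrSold | SaleType | SaleCondition | SalePrice |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 60 | RL | 65.0 | 8450 | Pave | NaN | Reg | Lvl | AllPub | ... | 0 | NaN | NaN | NaN | 0 | 2 | 2008 | WD | Normal | 208500 |
| 1 | 2 | 20 | RL | 80.0 | 9600 | Pave | NaN | Reg | Lvl | AllPub | ... | 0 | NaN | NaN | NaN | 0 | 5 | 2007 | WD | Normal | 181500 |
| 2 | 3 | 60 | RL | 68.0 | 11250 | Pave | NaN | IR1 | Lvl | AllPub | ... | 0 | NaN | NaN | NaN | 0 | 9 | 2008 | WD | Normal | 223500 |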


In this case we have used the house price prediction data. The training set has 81 columns (the test set has 80, since it lacks `SalePrice`). Data this wide can be hard for pandas-profiling to handle, but Sweetviz handles it with ease.
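The shapes above differ by one column. A quick way to see which columns appear in one dataset but not the other is to compare the column sets. This is a minimal sketch with a hypothetical helper `column_differences`, run on toy frames with made-up values rather than the real files.

```python
import pandas as pd


def column_differences(train, test):
    """Return (columns only in train, columns only in test) as sorted lists."""
    train_cols, test_cols = set(train.columns), set(test.columns)
    return sorted(train_cols - test_cols), sorted(test_cols - train_cols)


# Toy stand-ins for house_train / house_test (made-up values, not the real data)
train = pd.DataFrame({"Id": [1, 2], "LotArea": [100, 200], "SalePrice": [10, 20]})
test = pd.DataFrame({"Id": [3, 4], "LotArea": [300, 400]})

only_train, only_test = column_differences(train, test)
print(only_train, only_test)  # ['SalePrice'] []
```

On the real data, `column_differences(house_train, house_test)` should point to `SalePrice` as the column missing from the test set.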

### Visualization with respect to the target variable
Sweetviz can analyze the training data in relation to a target variable, which in this case is `SalePrice`.

In [10]:
report = sweetviz.analyze([house_train, 'Train'], target_feat='SalePrice')

Creating Associations graph... DONE!


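Here `'Train'` is just a display label for the training data in the Sweetviz report, and `target_feat` names the target feature, `SalePrice`. When executed, this code summarizes every column, processes pairwise features, and creates the associations graph. On a dataset this wide the per-column summary is the slow part: in this run it alone took more than half a minute, with the pairwise step adding several seconds more.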
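Sweetviz's README describes its associations as using Pearson's correlation for numerical pairs, the uncertainty coefficient for categorical pairs, and the correlation ratio for categorical-numerical pairs. To illustrate the last of these (this is not Sweetviz's own code), here is a sketch of the correlation ratio η between a categorical feature and a numeric target: the square root of the between-category sum of squares divided by the total sum of squares.

```python
import math
from collections import defaultdict


def correlation_ratio(categories, values):
    """Correlation ratio (eta) between a categorical and a numeric sequence.

    eta ranges from 0 (the category says nothing about the mean value)
    to 1 (the category fully determines the value).
    """
    if len(categories) != len(values):
        raise ValueError("inputs must have the same length")
    overall_mean = sum(values) / len(values)

    # Group the numeric values by category
    groups = defaultdict(list)
    for cat, val in zip(categories, values):
        groups[cat].append(val)

    # Between-group variation: size-weighted squared distance of group means from the overall mean
    between = sum(len(g) * (sum(g) / len(g) - overall_mean) ** 2 for g in groups.values())
    # Total variation of the numeric values around the overall mean
    total = sum((v - overall_mean) ** 2 for v in values)
    if total == 0:
        return 0.0  # constant target: nothing to explain
    return math.sqrt(between / total)
```

For example, `correlation_ratio(house_train['MSZoning'].tolist(), house_train['SalePrice'].tolist())` would measure how strongly zoning class relates to sale price, assuming the feature has no missing values.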

In [15]:
type(report)

sweetviz.dataframe_report.DataframeReport

In [11]:
# render the output as html
report.show_html(r'./data/output/House_train_report.html')

* Executing this cell generates the HTML report with Sweetviz's visualizations.
* The top section gives an overview of the training data.
* Each feature gets its own section showing its distribution and its relationship with the target variable.
  For example, take the feature `MSSubClass`: in the report, the blue lines above the bars represent the average `SalePrice` (the target variable).
  Because `MSSubClass` is stored as numbers, Sweetviz treats it as a numerical feature and provides its full statistical summary.
* For categorical features you will not find the full statistical summary, in the same way that `.describe()` does not compute numeric statistics for non-numeric columns.
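Both points in these bullets can be approximated in plain pandas: `.describe()` gives the full numeric summary only for numeric columns (for an object column it reports `count`, `unique`, `top`, and `freq`), and a `groupby` mean gives the average target per category, similar in spirit to the average-`SalePrice` line over the bars. The frame below is a toy stand-in with made-up values, not the real dataset.

```python
import pandas as pd

# Toy stand-in for a few columns of house_train (made-up values)
df = pd.DataFrame({
    "MSSubClass": [20, 60, 60, 20, 50],           # numeric codes -> numeric summary
    "MSZoning": ["RL", "RM", "RL", "RL", "RM"],   # strings -> categorical summary
    "SalePrice": [100, 200, 300, 200, 150],
})

# Numeric column: count, mean, std, min, quartiles, max
numeric_summary = df["MSSubClass"].describe()

# Object column: count, unique, top (most frequent value), freq
categorical_summary = df["MSZoning"].describe()

# Average target per category, comparable to the average-SalePrice overlay
mean_target = df.groupby("MSZoning")["SalePrice"].mean()
print(mean_target)  # RL -> 200.0, RM -> 175.0
```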

### Comparison of train and test datasets

Sweetviz can also compare the train and test datasets side by side, again with respect to the target variable.

In [13]:
train_test_comp_report = sweetviz.compare([house_train, 'Train'], [house_test, 'Test'],
                                         target_feat='SalePrice')

Creating Associations graph... DONE!


* `'Train'` and `'Test'` are just the display labels for the train and test datasets in the Sweetviz report.
* The target feature is `SalePrice`.
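Conceptually, the comparison report places each feature's distribution in the two datasets side by side. A rough pandas analogue for a single categorical feature normalizes the value counts in each dataset, so that sets of different sizes can be compared. The helper `compare_distributions` is hypothetical and the columns are toy data.

```python
import pandas as pd


def compare_distributions(train_col, test_col, labels=("Train", "Test")):
    """Share of each category in two datasets, side by side (absent -> 0)."""
    table = pd.concat(
        {
            labels[0]: train_col.value_counts(normalize=True),
            labels[1]: test_col.value_counts(normalize=True),
        },
        axis=1,
    )
    return table.fillna(0.0).sort_index()


# Toy columns standing in for house_train["MSZoning"] and house_test["MSZoning"]
train_zoning = pd.Series(["RL", "RL", "RM", "FV"])
test_zoning = pd.Series(["RL", "RM", "RM", "RM"])

print(compare_distributions(train_zoning, test_zoning))
```

Normalizing matters here because the real train and test sets have different row counts (1460 vs. 1459), and raw counts would be harder to compare for larger size gaps.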

In [16]:
train_test_comp_report.show_html(r'./data/House_train_test_comparison.html')

In the comparison report, the training data is displayed in blue and the test data in orange. The `SalePrice` section has no orange component because the test data contains no `SalePrice` column: it is the target variable to be predicted.