## Sweetviz Demo
Sweetviz is an open-source Python library that generates high-density visualizations to kickstart EDA (Exploratory Data Analysis) with just two lines of code. The output is a fully self-contained HTML application.

The library is built around quickly visualizing target values and comparing datasets. Its goal is to help with quick analysis of target characteristics, training vs. testing data, and other data characterization tasks.
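To make "visualizing target values" concrete: for a binary target such as Titanic's `Survived`, much of target analysis comes down to the target rate within each category of a feature. The sketch below computes that with pandas. `target_rate_by_category` is a hypothetical helper written for illustration, a rough analogue of what Sweetviz plots rather than its actual implementation, and the toy rows are made up.

```python
import pandas as pd


def target_rate_by_category(df: pd.DataFrame, feature: str, target: str) -> pd.Series:
    """Mean of a binary (0/1) target for each value of a categorical feature.

    Hypothetical helper: a rough pandas analogue of the target overlay
    Sweetviz draws for categorical features, not Sweetviz code.
    """
    # dropna=False keeps rows with a missing feature value as their own group
    return df.groupby(feature, dropna=False)[target].mean().sort_index()


# Toy rows shaped like the Titanic columns (made up, not the real data)
toy = pd.DataFrame({"Sex": ["male", "female", "female", "male"],
                    "Survived": [0, 1, 1, 1]})
print(target_rate_by_category(toy, "Sex", "Survived"))  # female -> 1.0, male -> 0.5
```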

**Installation:** `pip install sweetviz`

**Additional Info:**
 - [Sweetviz Repo](https://github.com/fbdesignpro/sweetviz)
 - [Article Demonstrating Usage](https://towardsdatascience.com/sweetviz-automated-eda-in-python-a97e4cabacde)
 - [EDA Article](https://towardsdatascience.com/powerful-eda-exploratory-data-analysis-in-just-two-lines-of-code-using-sweetviz-6c943d32f34)

In [1]:
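```python
import warnings
import numpy as np
import pandas as pd
import sweetviz as sv

# Silence library warnings so the notebook output stays readable
warnings.filterwarnings("ignore")

# Compatibility shim: recent NumPy releases no longer expose `np.warnings`,
# which some Sweetviz versions still reference; restore it only if missing
if not hasattr(np, "warnings"):
    np.warnings = warnings
```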

### Load and look at our data

For this demo, we will be using the [Titanic Survivor dataset](https://drive.google.com/file/d/11MgXvwics58UwiMjUSWJDAjn6hiflC-W/view), which has already been split for machine learning purposes: `train.csv` includes the `Survived` label, while `test.csv` does not.

In [2]:
```python
train = pd.read_csv("../data/Titanic_data/train.csv")
test = pd.read_csv("../data/Titanic_data/test.csv")
```

In [3]:
```python
train.head(10)
```

|   | PassengerId | Sex | Age | Name | Survived | Pclass | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | male | 22.0 | Braund, Mr. Owen Harris | 0 | 3 | 1 | 0 | A/5 21171 | 7.25 | NaN | S |
| 1 | 2 | female | 38.0 | Cumings, Mrs. John Bradley (Florence Briggs Th... | 1 | 1 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | female | 26.0 | Heikkinen, Miss. Laina | 1 | 3 | 0 | 0 | STON/O2. 3101282 | 7.925 | NaN | S |
| 3 | 4 | female | 35.0 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | 1 | 1 | 1 | 0 | 113803 | 53.1 | C123 | S |
| 4 | 5 | male | 35.0 | Allen, Mr. William Henry | 0 | 3 | 0 | 0 | 373450 | 8.05 | NaN | S |
| 5 | 6 | male | NaN | Moran, Mr. James | 0 | 3 | 0 | 0 | 330877 | 8.4583 | NaN | Q |
| 6 | 7 | male | 54.0 | McCarthy, Mr. Timothy J | 0 | 1 | 0 | 0 | 17463 | 51.8625 | E46 | S |
| 7 | 8 | male | 2.0 | Palsson, Master. Gosta Leonard | 0 | 3 | 3 | 1 | 349909 | 21.075 | NaN | S |
| 8 | 9 | female | 27.0 | Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg) | 1 | 3 | 0 | 2 | 347742 | 11.1333 | NaN | S |
| 9 | 10 | female | 14.0 | Nasser, Mrs. Nicholas (Adele Achem) | 1 | 2 | 1 | 0 | 237736 | 30.0708 | NaN | C |


In [4]:
```python
test.head(10)
```

|   | PassengerId | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 892 | 3 | Kelly, Mr. James | male | 34.5 | 0 | 0 | 330911 | 7.8292 | NaN | Q |
| 1 | 893 | 3 | Wilkes, Mrs. James (Ellen Needs) | female | 47.0 | 1 | 0 | 363272 | 7.0 | NaN | S |
| 2 | 894 | 2 | Myles, Mr. Thomas Francis | male | 62.0 | 0 | 0 | 240276 | 9.6875 | NaN | Q |
| 3 | 895 | 3 | Wirz, Mr. Albert | male | 27.0 | 0 | 0 | 315154 | 8.6625 | NaN | S |
| 4 | 896 | 3 | Hirvonen, Mrs. Alexander (Helga E Lindqvist) | female | 22.0 | 1 | 1 | 3101298 | 12.2875 | NaN | S |
| 5 | 897 | 3 | Svensson, Mr. Johan Cervin | male | 14.0 | 0 | 0 | 7538 | 9.225 | NaN | S |
| 6 | 898 | 3 | Connolly, Miss. Kate | female | 30.0 | 0 | 0 | 330972 | 7.6292 | NaN | Q |
| 7 | 899 | 2 | Caldwell, Mr. Albert Francis | male | 26.0 | 1 | 1 | 248738 | 29.0 | NaN | S |
| 8 | 900 | 3 | Abrahim, Mrs. Joseph (Sophie Halaut Easu) | female | 18.0 | 0 | 0 | 2657 | 7.2292 | NaN | C |
| 9 | 901 | 3 | Davies, Mr. John Samuel | male | 21.0 | 2 | 0 | A/4 48871 | 24.15 | NaN | S |


In [5]:
```python
train.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 12 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   PassengerId  891 non-null    int64  
 1   Sex          891 non-null    object 
 2   Age          714 non-null    float64
 3   Name         891 non-null    object 
 4   Survived     891 non-null    int64  
 5   Pclass       891 non-null    int64  
 6   SibSp        891 non-null    int64  
 7   Parch        891 non-null    int64  
 8   Ticket       891 non-null    object 
 9   Fare         891 non-null    float64
 10  Cabin        204 non-null    object 
 11  Embarked     889 non-null    object 
dtypes: float64(2), int64(5), object(5)
memory usage: 83.7+ KB
```


In [6]:
```python
test.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 418 entries, 0 to 417
Data columns (total 11 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   PassengerId  418 non-null    int64  
 1   Pclass       418 non-null    int64  
 2   Name         418 non-null    object 
 3   Sex          418 non-null    object 
 4   Age          332 non-null    float64
 5   SibSp        418 non-null    int64  
 6   Parch        418 non-null    int64  
 7   Ticket       418 non-null    object 
 8   Fare         417 non-null    float64
 9   Cabin        91 non-null     object 
 10  Embarked     418 non-null    object 
dtypes: float64(2), int64(4), object(5)
memory usage: 36.1+ KB
```
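The two `info()` outputs already hint at what Sweetviz will highlight: `Survived` exists only in `train`, and `Age` and `Cabin` have many missing values in both sets. A minimal pandas sketch of that side-by-side check follows; `compare_missing` and `column_differences` are hypothetical helpers written for illustration, not part of Sweetviz.

```python
import pandas as pd


def compare_missing(train: pd.DataFrame, test: pd.DataFrame) -> pd.DataFrame:
    """Percentage of missing values per column, side by side for two frames.

    Hypothetical helper. A column present in only one frame (e.g. the
    target `Survived`, absent from test.csv) shows NaN on the other side.
    """
    return pd.concat(
        {
            "train_%_missing": train.isna().mean() * 100,
            "test_%_missing": test.isna().mean() * 100,
        },
        axis=1,  # dict keys become column names; row index is the union of columns
    ).round(1)


def column_differences(train: pd.DataFrame, test: pd.DataFrame) -> dict:
    """Hypothetical helper: columns that appear in only one of the two frames."""
    return {
        "only_in_train": sorted(set(train.columns) - set(test.columns)),
        "only_in_test": sorted(set(test.columns) - set(train.columns)),
    }
```

Applied as `compare_missing(train, test)` to this notebook's frames, the non-null counts above imply roughly 77% missing `Cabin` in `train` (687 of 891 rows) and roughly 78% in `test` (327 of 418).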


### Sweetviz: Look at the individual dataframes

In [7]:
```python
train_report = sv.analyze(train)
train_report.show_html('train_df.html')
```


```
Report train_df.html was generated! NOTEBOOK/COLAB USERS: the web browser MAY not pop up, regardless, the report IS saved in your notebook/colab files.
```


In [8]:
```python
# Analyze the test set (the original cell mistakenly passed `train` here)
test_report = sv.analyze(test)
test_report.show_html('test_df.html')
```


```
Report test_df.html was generated! NOTEBOOK/COLAB USERS: the web browser MAY not pop up, regardless, the report IS saved in your notebook/colab files.
```


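### Compare the train and test dataframes, with Survived as the target

`Survived` exists only in `train`, so it is passed to `sv.compare` as the target feature rather than compared column-to-column.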

In [9]:
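```python
# Each dataset is passed as [DataFrame, display name]; 'Survived' is the target feature
comp_survived = sv.compare([train, 'train_df'], [test, 'test_df'], 'Survived')
comp_survived.show_html('comp_survived.html')
```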


```
Report comp_survived.html was generated! NOTEBOOK/COLAB USERS: the web browser MAY not pop up, regardless, the report IS saved in your notebook/colab files.
```
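The compare report places each feature's distribution in `train` next to its distribution in `test`. For a categorical column such as `Embarked`, the same idea can be sketched in pandas as below. `compare_category_shares` is a hypothetical helper and only a rough analogue of the report's side-by-side bars, not how Sweetviz computes them.

```python
import pandas as pd


def compare_category_shares(train: pd.DataFrame, test: pd.DataFrame, column: str) -> pd.DataFrame:
    """Share of each category of `column` in train vs. test, as fractions.

    Hypothetical helper for illustration; missing values are kept as their
    own category via dropna=False.
    """
    shares = pd.concat(
        {
            "train": train[column].value_counts(normalize=True, dropna=False),
            "test": test[column].value_counts(normalize=True, dropna=False),
        },
        axis=1,
    )
    # A category seen in only one frame gets a share of 0 in the other
    return shares.fillna(0.0)
```

For example, `compare_category_shares(train, test, "Embarked")` returns one row per port of embarkation with its share of passengers in each set, which makes a shift between the two splits easy to spot.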
