# Pandas Profiling
Pandas profiling is an open-source Python module that lets us run an exploratory data analysis with just a few lines of code. It also generates interactive reports in web (HTML) format that can be shared with anyone, including people who don't know programming.

In short, pandas profiling saves us the work of visualizing and understanding the distribution of each variable: it generates a report that puts all of this information in one place.

## Installing the library

In [34]:
pip install pandas-profiling

Note: you may need to restart the kernel to use updated packages.


## Features
For each column the following statistics - if relevant for the column type - are presented in an interactive HTML report:

1. Type inference: detect the types of columns in a dataframe.
2. Essentials: type, unique values, missing values
3. Quantile statistics like minimum value, Q1, median, Q3, maximum, range, interquartile range
4. Descriptive statistics like mean, mode, standard deviation, sum, median absolute deviation, coefficient of variation, kurtosis, skewness
5. Most frequent values
6. Histogram
7. Correlations: highlighting of highly correlated variables; Spearman, Pearson and Kendall matrices
8. Missing values: matrix, count, heatmap and dendrogram of missing values
9. Text analysis: learn about categories (Uppercase, Space), scripts (Latin, Cyrillic) and blocks (ASCII) of text data
10. File and image analysis: extract file sizes, creation dates and dimensions, and scan for truncated images or images containing EXIF information
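To make the numeric statistics above concrete, here is a minimal pandas sketch that computes several of them for a single column. It is not pandas-profiling's own code, and the library's exact definitions (for example of kurtosis or the median absolute deviation) may differ; the helper name `numeric_summary` and the sample values (the first five `Age` values plus one assumed missing entry) are illustrative.

```python
import pandas as pd


def numeric_summary(s: pd.Series) -> dict:
    """Hypothetical helper: a few of the per-column statistics listed above."""
    q1, median, q3 = s.quantile([0.25, 0.5, 0.75])  # quantiles skip NaN
    mean, std = s.mean(), s.std()
    return {
        "missing": int(s.isna().sum()),
        "distinct": int(s.nunique()),        # NaN is not counted as a value
        "min": s.min(),
        "Q1": q1,
        "median": median,
        "Q3": q3,
        "max": s.max(),
        "range": s.max() - s.min(),
        "IQR": q3 - q1,                       # interquartile range
        "mean": mean,
        "std": std,
        "CV": std / mean,                     # coefficient of variation
        "MAD": (s - median).abs().median(),   # median absolute deviation
        "skewness": s.skew(),
        "kurtosis": s.kurt(),                 # pandas reports excess kurtosis
    }


# First five Age values from train.csv plus one missing value (assumed)
ages = pd.Series([22.0, 38.0, 26.0, 35.0, 35.0, None])
summary = numeric_summary(ages)
for name, value in summary.items():
    print(f"{name:>9}: {value}")
```

The report computes these (and more) for every column at once, which is exactly the repetitive work it saves.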

## Import Libraries

In [40]:
import numpy as np
import pandas as pd 
import pandas_profiling as pp
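The pandas-profiling project has since been renamed to ydata-profiling, and new releases are published only under that name. A hedged way to write the import so the notebook works with either package is sketched below; it relies on both packages exposing `ProfileReport` at the top level.

```python
try:
    # Current name of the project: pip install ydata-profiling
    from ydata_profiling import ProfileReport
except ImportError:
    try:
        # Older name, as installed earlier in this notebook
        from pandas_profiling import ProfileReport
    except ImportError:
        ProfileReport = None  # neither package is available

if ProfileReport is None:
    print("No profiling package found; try: pip install ydata-profiling")
```

With this import, `ProfileReport(train)` takes the place of `pp.ProfileReport(train)`.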

## Load the data

In [37]:
train = pd.read_csv("train.csv")
test = pd.read_csv("test.csv")
train.head()

| Unnamed: 0 | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.25 | NaN | S |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.925 | NaN | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1 | C123 | S |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.05 | NaN | S |


## Analyze the data

In [41]:
pp.ProfileReport(train)  # rendered inline as the last expression of the cell



### Disadvantage of pandas profiling
Its main drawback is speed on large datasets: as the data grows, the time needed to generate the report grows considerably.
One workaround is to generate the report from only part of the data. The selected rows must be representative of the whole dataset; for example, the first X rows might contain data from only one category. In that case we should randomize the order of the rows and select a representative sample.
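The sampling idea above can be sketched in pandas as follows. This is a minimal illustration under stated assumptions: `representative_sample` is a hypothetical helper, the toy frame is deliberately sorted so that its first rows all belong to one category, and stratified sampling (the same fraction from every category) is one possible definition of "representative", not something pandas profiling does for you.

```python
from typing import Optional

import numpy as np
import pandas as pd


def representative_sample(df: pd.DataFrame, frac: float,
                          strata: Optional[str] = None, seed: int = 42) -> pd.DataFrame:
    """Hypothetical helper: pick the rows to profile instead of the whole frame."""
    if strata is None:
        # Random rows from anywhere in the frame, not just the top
        return df.sample(frac=frac, random_state=seed)
    # Stratified: the same fraction from each category of `strata`,
    # so every category keeps roughly its original share
    return df.groupby(strata).sample(frac=frac, random_state=seed)


# Toy data sorted by label: the first 900 rows all belong to category "a"
df = pd.DataFrame({"label": ["a"] * 900 + ["b"] * 100, "value": np.arange(1000)})

first_rows = df.head(100)                                         # only ever sees "a"
random_rows = representative_sample(df, frac=0.1)                 # about 10% "b" on average
stratified = representative_sample(df, frac=0.1, strata="label")  # 90 "a" and 10 "b"

print(first_rows["label"].value_counts())
print(stratified["label"].value_counts())
```

On the notebook's data this could look like `pp.ProfileReport(representative_sample(train, 0.5, strata="Survived"))`. Newer versions of the library also accept `minimal=True` in `ProfileReport`, which switches off several of the more expensive computations.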

# Sweetviz Library
Sweetviz is an open source Python library that generates beautiful, high-density visualizations to kickstart EDA (Exploratory Data Analysis) with a single line of code. Output is a fully self-contained HTML application.

The system is built around quickly visualizing target values and comparing datasets. Its goal is to help with quick analysis of target characteristics, training vs testing data, and other such data characterization tasks.

## Installing the library

In [1]:
pip install sweetviz



## Features


1. Target analysis:
   * How target values (boolean or numerical) relate to other features
2. Visualize and compare:
   * Distinct datasets (e.g. training vs test data)
   * Intra-set characteristics (e.g. male versus female)
3. Mixed-type associations:
   * Sweetviz integrates associations for numerical (Pearson's correlation), categorical (uncertainty coefficient) and categorical-numerical (correlation ratio) data types seamlessly, to provide maximum information for all data types.
4. Type inference: automatically detects numerical, categorical and text features, with optional manual overrides
5. Summary information:
   * Type, unique values, missing values, duplicate rows, most frequent values
   * Numerical analysis: min/max/range, quartiles, mean, mode, standard deviation, sum, median absolute deviation, coefficient of variation, kurtosis, skewness
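Pearson's correlation is available directly as `DataFrame.corr()`. For the other two association measures, a minimal sketch follows, assuming the textbook definitions of the correlation ratio (eta) and Theil's uncertainty coefficient. Sweetviz's own implementation may differ in details such as missing-value handling; the function names are hypothetical, and the sketch assumes both columns share the same index and contain no missing values.

```python
import numpy as np
import pandas as pd


def correlation_ratio(categories: pd.Series, values: pd.Series) -> float:
    """Correlation ratio (eta) of a numerical column given a categorical one.

    0: the category tells us nothing about the values;
    1: the values are fully determined by the category.
    """
    overall_mean = values.mean()
    groups = values.groupby(categories)
    # Spread of the per-category means around the overall mean, weighted by size
    between = (groups.count() * (groups.mean() - overall_mean) ** 2).sum()
    # Total spread of the individual values
    total = ((values - overall_mean) ** 2).sum()
    return float(np.sqrt(between / total)) if total > 0 else 0.0


def entropy(s: pd.Series) -> float:
    """Shannon entropy (natural log) of a categorical column."""
    p = s.value_counts(normalize=True)
    return float(-(p * np.log(p)).sum())


def uncertainty_coefficient(x: pd.Series, y: pd.Series) -> float:
    """Theil's U(x|y): share of the entropy of x explained by knowing y.

    Asymmetric: U(x|y) and U(y|x) are generally different.
    """
    h_x = entropy(x)
    if h_x == 0:
        return 1.0  # x is constant, so nothing is left to explain
    # Conditional entropy H(x|y) = sum over levels of y of p(y) * H(x | y)
    p_y = y.value_counts(normalize=True)
    h_x_given_y = sum(p * entropy(x[y == level]) for level, p in p_y.items())
    return (h_x - h_x_given_y) / h_x


# Sex, Survived and Fare from the five train rows shown earlier
sex = pd.Series(["male", "female", "female", "female", "male"])
survived = pd.Series([0, 1, 1, 1, 0])
fare = pd.Series([7.25, 71.2833, 7.925, 53.1, 8.05])

print("eta(Fare | Sex):", correlation_ratio(sex, fare))
print("U(Survived | Sex):", uncertainty_coefficient(survived, sex))
```

Because the uncertainty coefficient is asymmetric, a categorical-categorical association matrix built from it is not symmetric, unlike a Pearson matrix.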

## Import Library

In [1]:
import pandas as pd
import sweetviz

## Load the data

In [18]:
train = pd.read_csv("train.csv")
test = pd.read_csv("test.csv")

In [19]:
train.head()

| Unnamed: 0 | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.25 | NaN | S |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.925 | NaN | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1 | C123 | S |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.05 | NaN | S |


In [20]:
test.head()

| Unnamed: 0 | PassengerId | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 892 | 3 | Kelly, Mr. James | male | 34.5 | 0 | 0 | 330911 | 7.8292 | NaN | Q |
| 1 | 893 | 3 | Wilkes, Mrs. James (Ellen Needs) | female | 47.0 | 1 | 0 | 363272 | 7.0 | NaN | S |
| 2 | 894 | 2 | Myles, Mr. Thomas Francis | male | 62.0 | 0 | 0 | 240276 | 9.6875 | NaN | Q |
| 3 | 895 | 3 | Wirz, Mr. Albert | male | 27.0 | 0 | 0 | 315154 | 8.6625 | NaN | S |
| 4 | 896 | 3 | Hirvonen, Mrs. Alexander (Helga E Lindqvist) | female | 22.0 | 1 | 1 | 3101298 | 12.2875 | NaN | S |


## Analyze the train data separately

### Without a target feature

In [21]:
report = sweetviz.analyze([train, "train"])

Once the associations graph has been created, the final report can be shown as an HTML page.


### With a target feature

In [23]:
report = sweetviz.analyze([train, "train"], target_feat='Survived')


In [24]:
report.show_html('Report.html')   # save the report as Report.html and open it to check the details

In [None]:
# With default arguments, show_html() writes the report to "SWEETVIZ_REPORT.html"
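For a categorical feature and a 0/1 target such as Survived, the core question of target analysis is what fraction of each category has the target set to 1. A minimal pandas sketch of that per-category rate follows; it illustrates the idea rather than Sweetviz's internals, `target_rate_by_category` is a hypothetical helper, and the data are the five train rows shown earlier.

```python
import pandas as pd


def target_rate_by_category(df: pd.DataFrame, feature: str, target: str) -> pd.DataFrame:
    """Hypothetical helper: row count and mean of a 0/1 target per category."""
    return (
        df.groupby(feature)[target]
        .agg(count="count", target_rate="mean")  # mean of 0/1 = share of 1s
        .sort_values("target_rate", ascending=False)
    )


# Sex, Pclass and Survived from the five train rows shown earlier
rows = pd.DataFrame({
    "Sex": ["male", "female", "female", "female", "male"],
    "Pclass": [3, 1, 3, 1, 3],
    "Survived": [0, 1, 1, 1, 0],
})

print(target_rate_by_category(rows, "Sex", "Survived"))
print(target_rate_by_category(rows, "Pclass", "Survived"))
```

Five rows are far too few to draw conclusions; on the full train frame the same call gives the per-category survival rates.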

## Compare train and test
To compare two datasets, use the compare() function. Its parameters are the same as those of analyze(), with an extra second parameter for the comparison dataframe. Using the [dataframe, "name"] format for both parameters is recommended, so that the base and compared dataframes are clearly labeled in the report (e.g. [my_df, "Train"] rather than just my_df).

In [27]:
report1 = sweetviz.compare([train, "Train"], [test, "Test"], "Survived")


In [28]:
report1.show_html('Report1.html')
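compare() places the statistics of the two dataframes side by side for every shared feature. The core of that idea can be sketched in plain pandas, here only for the share of missing values and the mean; this is an illustration under assumptions, not what compare() computes internally, and `compare_frames` is a hypothetical helper.

```python
import pandas as pd


def compare_frames(base: pd.DataFrame, other: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: side-by-side missing share and mean for shared columns."""
    shared = [c for c in base.columns if c in other.columns]
    rows = {}
    for col in shared:
        row = {
            "missing_base": base[col].isna().mean(),
            "missing_other": other[col].isna().mean(),
        }
        # Means only make sense when the column is numeric in both frames
        if (pd.api.types.is_numeric_dtype(base[col])
                and pd.api.types.is_numeric_dtype(other[col])):
            row["mean_base"] = base[col].mean()
            row["mean_other"] = other[col].mean()
        rows[col] = row
    # Non-numeric columns get NaN in the mean columns
    return pd.DataFrame.from_dict(rows, orient="index")


# A few columns from the first five rows of train.csv and test.csv shown earlier
train_head = pd.DataFrame({
    "Survived": [0, 1, 1, 1, 0],
    "Sex": ["male", "female", "female", "female", "male"],
    "Age": [22.0, 38.0, 26.0, 35.0, 35.0],
    "Cabin": [None, "C85", None, "C123", None],
})
test_head = pd.DataFrame({
    "Sex": ["male", "female", "male", "male", "female"],
    "Age": [34.5, 47.0, 62.0, 27.0, 22.0],
    "Cabin": [None] * 5,
})

print(compare_frames(train_head, test_head))
```

As in the Sweetviz comparison, Survived appears only in the training data, so it has no counterpart to compare against.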

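FeatureConfig lets us skip features or force their type. Here PassengerId is skipped and Age is treated as a text feature:

In [29]:
feature_config = sweetviz.FeatureConfig(skip="PassengerId", force_text=["Age"])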

In [30]:
feature_config

<sweetviz.feature_config.FeatureConfig at 0x2c2085147f0>

In [None]:
# analyze() takes a single dataframe; comparing train and test requires compare()
my_report = sweetviz.compare([train, "Train"], [test, "Test Data"], "Survived", feature_config)

### For more details, refer to this article


https://towardsdatascience.com/powerful-eda-exploratory-data-analysis-in-just-two-lines-of-code-using-sweetviz-6c943d32f34
