Check `pandas-profiling` version 

In [None]:
! pip show pandas-profiling

### Have you ever wondered whether summary statistics could be more than just a simple summary?

## Introducing **pandas_profiling** for simple and fast exploratory data analysis of a Pandas DataFrame

Exploratory Data Analysis (EDA) plays a very important role in understanding a dataset. Whether you are going to build a machine learning model or simply want to draw insights from the data, EDA is the first task to perform. While EDA is undeniably important, the effort it takes grows with the number of columns in your dataset.

For example, assume you have a dataset with 10 rows x 2 columns. It is very simple to specify those two column names separately and draw all the plots required for EDA. If the dataset instead has 20 columns, you have to repeat the same exercise ten times over. There is also another layer of complexity: the visualization you choose for a `continuous variable` differs from the one you choose for a `categorical variable`, so the type of plot changes when the data type changes.

Given all these conditions, EDA sometimes becomes a tedious task. But remember that it is driven by a set of rules, such as plotting a `boxplot` and a `histogram` for a continuous variable, measuring `missing values`, and calculating `frequency` for a categorical variable, which gives us an opportunity to automate things. That is the basis of the Python module `pandas_profiling`, which helps automate the first level of EDA.
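To make the "set of rules" idea concrete, here is a minimal sketch of a rule-driven first pass written with plain pandas. It is not how `pandas_profiling` works internally; the function name `first_pass_eda`, the `max_categories` threshold, and the toy data are assumptions made for illustration, and the plotting step is left out to keep the example short.

```python
import pandas as pd

def first_pass_eda(df: pd.DataFrame, max_categories: int = 5) -> dict:
    """Apply simple per-column rules (hypothetical helper, for illustration only)."""
    report = {}
    for name in df.columns:
        col = df[name]
        # Rule 1: always measure missing values
        entry = {"missing": int(col.isna().sum())}
        # Rule 2 (an assumed heuristic): numeric columns with many distinct
        # values are treated as continuous, everything else as categorical
        if pd.api.types.is_numeric_dtype(col) and col.nunique() > max_categories:
            entry["kind"] = "continuous"
            entry["summary"] = col.describe().to_dict()
        else:
            entry["kind"] = "categorical"
            entry["frequency"] = col.value_counts().to_dict()
        report[name] = entry
    return report

# Tiny made-up frame, not the real dataset
toy = pd.DataFrame({
    "age": [63, 37, 41, 56, 57, 57, 56, 44, 52, 57, 54, None],
    "sex": [1, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 0],
})
report = first_pass_eda(toy)
print(report["age"]["kind"], report["sex"]["kind"])
```

Because each rule depends only on the column type, the same loop handles 2 columns or 20 without extra work, which is the part a profiling tool automates.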

From their GitHub page:

For each column the following statistics - if relevant for the column type - are presented in an interactive HTML report:

* **Essentials**:  type, unique values, missing values
* **Quantile statistics** like minimum value, Q1, median, Q3, maximum, range, interquartile range
* **Descriptive statistics** like mean, mode, standard deviation, sum, median absolute deviation, coefficient of variation, kurtosis, skewness
* **Most frequent values**
* **Histogram**
* **Correlations** highlighting of highly correlated variables, Spearman and Pearson matrixes
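For a sense of what the report computes per column, most of the statistics listed above can be reproduced by hand with pandas. The sketch below is an approximation under assumptions: `column_statistics` is a hypothetical helper, the sample values are made up, and the report's own definitions (for example of the median absolute deviation) may differ in detail.

```python
import pandas as pd

def column_statistics(col: pd.Series) -> dict:
    """Compute a few of the listed statistics for one numeric column (illustrative only)."""
    q1, median, q3 = col.quantile([0.25, 0.5, 0.75])
    mean = col.mean()
    std = col.std()
    return {
        # Essentials
        "unique": int(col.nunique()),
        "missing": int(col.isna().sum()),
        # Quantile statistics
        "minimum": col.min(),
        "Q1": q1,
        "median": median,
        "Q3": q3,
        "maximum": col.max(),
        "range": col.max() - col.min(),
        "iqr": q3 - q1,
        # Descriptive statistics
        "mean": mean,
        "mode": col.mode().iloc[0],
        "std": std,
        "sum": col.sum(),
        "mad": (col - median).abs().median(),  # median absolute deviation
        "cv": std / mean,                      # coefficient of variation
        "kurtosis": col.kurt(),
        "skewness": col.skew(),
    }

# Made-up sample values for illustration
stats = column_statistics(pd.Series([1, 2, 2, 3, 4]))
print(stats)
```

Writing this once per column type is manageable; repeating it, plus histograms and correlation matrices, for every column is the work the report takes over.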

In [None]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import pandas_profiling as pp

# Input data files are available in the "../input/" directory.
# For example, running this (by clicking run or pressing Shift+Enter) will list the files in the input directory

import os
print(os.listdir("../input"))

# Any results you write to the current directory are saved as output.

**Loading Training Dataset**

In [None]:
df = pd.read_csv("../input/heart-disease-uci/heart.csv", sep = ",")

In [None]:
df.head()

In [None]:
profile = pp.ProfileReport(df, title = "Heart Disease UCI")

**Exploring the Heart Disease Data**

In [None]:
profile.to_notebook_iframe()

Importing visualization libraries

In [None]:
# Static plotting
import matplotlib.pyplot as plt
import seaborn as sns
color = sns.color_palette()
get_ipython().run_line_magic('matplotlib', 'inline')  # render matplotlib figures inline

# Interactive plotting with Plotly
import plotly.offline as py
py.init_notebook_mode(connected=True)
import plotly.graph_objs as go
import plotly.tools as tls
import plotly.express as px

Produce a pie chart to visualize the distribution of patients by the `sex` column

In [None]:
# Count patients per value of the `sex` column
dist = df['sex'].value_counts()
colors = ['mediumturquoise', 'darkorange']
# One pie slice per category, sized by its count
trace = go.Pie(values=dist.values, labels=dist.index)
layout = go.Layout(title='Distribution of patient gender')
fig = go.Figure(data=[trace], layout=layout)
# Color the slices and outline them in black
fig.update_traces(marker=dict(colors=colors, line=dict(color='#000000', width=2)))
fig.show()

In [None]:
profile.to_file("output.html")


The primary objective of this Kernel is to introduce the amazing Python module `pandas_profiling`, which does an excellent job of helping you perform a quick, simple **EDA**. You can read more about this module on GitHub: [https://github.com/pandas-profiling/pandas-profiling](https://github.com/pandas-profiling/pandas-profiling)