In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

<h1 style="background-color:LimeGreen; font-family:newtimeroman; font-size:200%; text-align:center; border-radius: 15px 50px;"> 1) Introduction </h1>

<img src="https://drive.google.com/drive/u/0/folders/14_dydbYZxGsojzYhf0CaBnmUhuBiLDdY/AutoViz_01.png" width="600px">

- Data visualization presents data as graphs and plots. In data science, we use it to understand a dataset, find relationships between variables, and spot patterns that guide further analysis.

- Python has several libraries for **data visualization, such as Matplotlib, Seaborn, Plotly, etc.** With all of them, however, we have to specify the type of graph we want and the arguments to plot.

- In this article, we will learn about a Python library, **AutoViz**, which can **automate the whole process** of data visualization in just a **single line of code.**

- AutoViz automatically selects the most important features of a dataset and plots impactful visualizations using only those features. It is also designed to be fast, typically creating the visualizations within seconds.

 
 
       -------------------------------------------------
       
         pip install autoviz
         
       -------------------------------------------------
       
         from autoviz.AutoViz_Class import AutoViz_Class

         AV = AutoViz_Class()
      
      -----------------------------------------
      
         filename = ""
         sep = ","
         dft = AV.AutoViz(
             filename,
             sep=",",
             depVar="",
             dfte=None,
             header=0,
             verbose=0,
             lowess=False,
             chart_format="svg",
             max_rows_analyzed=150000,
             max_cols_analyzed=30,
             )

      -------------------------------------------
      
 **filename** - the name of the file

 **sep** - the separator used in the dataset, e.g. ',' for CSV files

 **depVar** - the target variable in the dataset
 
 **chart_format** - the format of the generated charts
 
  - chart_format ='png'

      -------------------------------------------
      
### filename:
- If there is no file associated with the data and you want to use a dataframe, set filename to an empty string ("") and pass the dataframe through dfte. Otherwise, fill in the file name and leave dfte empty (None in the call above). Only one of the two is needed to load the dataset.

### sep:
- This is the separator in the file. It can be a **comma, semicolon, tab, or any other value** that separates the columns in your file.
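
If you are not sure which separator a file uses, one option is to let Python guess it before calling AutoViz. The sketch below uses `csv.Sniffer` from the standard library; the helper name `detect_separator`, the candidate list, and the sample text are assumptions made for illustration, and sniffing can fail on unusual files.

```python
import csv


def detect_separator(sample: str, candidates: str = ",;\t|") -> str:
    """Guess the column separator from a short text sample of the file."""
    # Sniffer inspects the sample and picks the candidate character
    # that appears most consistently on every line.
    dialect = csv.Sniffer().sniff(sample, delimiters=candidates)
    return dialect.delimiter


# Made-up sample; in practice, read the first few lines of your own file.
sample = "id;weight;rating\n1;1200;4\n2;3100;2\n"
print(repr(detect_separator(sample)))
```

The detected value can then be passed as the sep argument.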

### depVar:
- **Target variable** in your dataset. You can leave it as **empty string** if you **don't have a target variable** in your data.

### dfte:
- This is the input dataframe in case you want to load a pandas dataframe to plot charts. In that case, leave filename as an empty string.
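
The rule that only one of filename and dfte should carry the data can be made explicit with a small wrapper. This is a minimal sketch: `autoviz_source_kwargs` is a hypothetical helper, not part of AutoViz, and it only builds the keyword arguments described above.

```python
def autoviz_source_kwargs(filename="", dfte=None, **options):
    """Return keyword arguments for AV.AutoViz with exactly one data source set.

    Hypothetical helper (not part of AutoViz): it only enforces the
    "filename or dfte, not both" rule described above.
    """
    has_file = bool(filename)
    has_frame = dfte is not None  # avoids truth-testing a DataFrame
    if has_file == has_frame:
        raise ValueError("Pass either a filename or a dataframe (dfte), but not both.")
    return {"filename": filename if has_file else "", "dfte": dfte, **options}


# Loading from a file: dfte stays None.
print(autoviz_source_kwargs(filename="Train.csv", sep=","))

# Loading from a dataframe would look like this (ECommerce is read later in the notebook):
# dftc = AV.AutoViz(**autoviz_source_kwargs(dfte=ECommerce, depVar="Reached.on.Time_Y.N"))
```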

### header:
- The row number of the header row in your file. If it is the first row, then this must be zero.

### verbose:

 - if **0**, **displays minimal information** and **displays charts** in your notebook

 - if **1**, prints **extra information** in the notebook and also **displays charts**

 - if **2**, **does not display any charts**; it simply saves them on your local machine under the **AutoViz_Plots directory**. Make sure you **delete this folder periodically**; otherwise, charts will pile up there if you use the verbose=2 option a lot.
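
The periodic cleanup mentioned for verbose=2 can be scripted. The sketch below assumes the AutoViz_Plots folder sits in the current working directory; adjust `base_dir` if your charts are saved elsewhere. The function name `clear_autoviz_plots` is made up for this example.

```python
import shutil
from pathlib import Path


def clear_autoviz_plots(base_dir=".", folder="AutoViz_Plots"):
    """Delete the saved-charts folder if it exists; return True if something was removed."""
    target = Path(base_dir) / folder  # assumed location of the saved charts
    if target.is_dir():
        shutil.rmtree(target)  # removes the folder and everything inside it
        return True
    return False


print(clear_autoviz_plots())
```

Note that this deletes every chart in the folder, so copy anything you want to keep first.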

### lowess:
- This option works well for small datasets, where you can see regression lines for each pair of continuous variables against the target variable. Don't use it for large datasets (over 100,000 rows).

### chart_format:
- This can be SVG, PNG or JPG. You will get charts generated and saved in this format if you used verbose=2 option. Very useful for generating charts and using them later.

### max_rows_analyzed:
- Limits the maximum number of rows used to display charts. If you have a very large dataset with millions of rows, use this option to limit the time it takes to generate charts. AutoViz will then take a statistically valid sample.
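
How AutoViz draws its sample is not described here. As a rough illustration of the idea only, the sketch below caps a dataframe at a maximum number of rows using a simple random sample from pandas. The function name `cap_rows`, the seed, and the toy data are assumptions.

```python
import pandas as pd


def cap_rows(df: pd.DataFrame, max_rows: int, seed: int = 42) -> pd.DataFrame:
    """Return df unchanged if it is small enough, else a random sample of max_rows rows."""
    if len(df) <= max_rows:
        return df
    # Simple random sample without replacement; the seed makes it repeatable.
    return df.sample(n=max_rows, random_state=seed)


# Toy data standing in for a large dataset.
big = pd.DataFrame({"x": range(200_000)})
print(len(cap_rows(big, max_rows=150_000)))
```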

### max_cols_analyzed:
- Limits the number of continuous variables that can be analyzed.

<h2 style=color:green align="left"> Reference: </h2>

 - https://github.com/AutoViML/AutoViz
 
 - https://www.kaggle.com/general/233832
 
 - https://topicplay.com/v/415840
 
 - https://towardsdatascience.com/autoviz-a-new-tool-for-automated-visualization-ec9c1744a6ad

<h2 style=color:green align="left"> Understanding Output </h2>

- **1st** plot gives us a **summary of the columns.**

 - Number of integer, category, string columns, etc.

 - AutoViz also tells us whether we have provided a target variable.


- **2nd** plot is a **pairwise scatter plot of all the continuous variables**. If there are multiple continuous variables, AutoViz creates multiple pairwise plots.

- **3rd** chart is the **distribution plot of the continuous variables**.

 - The plot lets us find **outliers using a boxplot**.
 
 - We can see whether the data is **skewed**.
 
- **4th** chart is a **violin plot of all the continuous variables**.

- **5th** chart is a **heatmap** of the correlations between all the variables. The correlation plot lets us find **collinearity** among the variables.

- **6th** chart visualizes the **continuous values grouped by each of the categorical variables**, which can provide meaningful insight.
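
To make the outlier reading of the boxplot in the 3rd chart concrete, the sketch below applies the conventional 1.5 × IQR rule that box plots commonly use for their whiskers. It uses only the standard library and made-up numbers; `iqr_outliers` is an illustrative name, and quartile conventions differ slightly between libraries, so the fences may not match a given plot exactly.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Made-up values with one obvious extreme point.
data = [1, 2, 3, 4, 5, 6, 7, 8, 100]
print(iqr_outliers(data))
```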

<h1 style="background-color:LimeGreen; font-family:newtimeroman; font-size:200%; text-align:center; border-radius: 15px 50px;"> 2) Load Required Libraries </h1>

In [None]:
import pandas as pd
import datetime as dt

import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns

import warnings
warnings.filterwarnings('ignore')

<h1 style="background-color:LimeGreen; font-family:newtimeroman; font-size:200%; text-align:center; border-radius: 15px 50px;"> 3) Read Data </h1>

In [None]:
ECommerce = pd.read_csv("/kaggle/input/customer-analytics/Train.csv")
ECommerce.head()

In [None]:
print('Shape of Dataset :', ECommerce.shape)

In [None]:
# Missing values
ECommerce.isnull().sum()

In [None]:
sns.countplot(x='Reached.on.Time_Y.N', data=ECommerce);

In [None]:
plt.figure(figsize=(9,7))
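# Correlate numeric columns only; recent pandas versions raise an error on text columns
sns.heatmap(ECommerce.select_dtypes(include='number').corr(), annot=True, square=True, fmt='.1f', cbar=False);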

<h1 style="background-color:LimeGreen; font-family:newtimeroman; font-size:200%; text-align:center; border-radius: 15px 50px;"> 4) AutoViz (AutoEDA) </h1>

In [None]:
!pip install AutoViz
!pip install xlrd

In [None]:
# Start of AutoViz process
start_time = dt.datetime.now()
print("Started at ", start_time)

In [None]:
#importing Autoviz class
from autoviz.AutoViz_Class import AutoViz_Class

#Instantiate the AutoViz class
AV = AutoViz_Class()

--

<h2 style=color:blue align="left"> Generate the Profiling Report in three ways </h2>

<h3 style=color:green align="left"> Method 1: </h3>

##### dftc = AV.AutoViz('/kaggle/input/customer-analytics/Train.csv')

--------------------------------------------------------------

<h3 style=color:green align="left"> Method 2: </h3>

##### argument depVar which is the dependent variable "Reached.on.Time_Y.N"
##### argument dfte which is the dataset "ECommerce"

    dftc = AV.AutoViz(filename='', 
                      sep ='' , 
                      depVar ='Reached.on.Time_Y.N', 
                      dfte = ECommerce, 
                      header = 0, 
                      verbose = 1, 
                      lowess = False, 
                      chart_format ='png', 
                      max_rows_analyzed = 15000, 
                      max_cols_analyzed = 30
    )

---------------------------------------------------------

<h3 style=color:green align="center"> (or) </h3>


##### dftc = AV.AutoViz('/kaggle/input/customer-analytics/Train.csv', depVar='Reached.on.Time_Y.N')

--------------------------------------------------------

<h3 style=color:green align="left">  Method 3: Welcome to AutoViz </h3>

https://autoviz.io/

#### Get Started for free

#### 1. Enter a Separator or a Delimiter that separates columns in your file...
 - Typical delimiters to enter are (no quotes required): Comma = , Tab = \t Colon = : Semi-Colon = ; Pipe = | Tilde = ~ 
 
#### 2. Upload your file...

#### 3. Enter your Email address...

--

<h1 style="background-color:orange; font-family:newtimeroman; font-size:200%; text-align:left;"> Method 1: </h1>

 - Visualization for **all variables**

In [None]:
dftc = AV.AutoViz('/kaggle/input/customer-analytics/Train.csv')

--

In [None]:
print('AutoViz finished!!')
finish_time = dt.datetime.now()
print("Finished at ", finish_time)
elapsed = finish_time - start_time
print("Elapsed time: ", elapsed)
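
The start and finish cells around the AutoViz call can also be folded into one reusable helper. The sketch below is one possible way to do it with a context manager; `timed` is a name made up for this example, and the `pass` line stands in for the real AutoViz call.

```python
import datetime as dt
from contextlib import contextmanager


@contextmanager
def timed(label):
    """Print start, finish, and elapsed time around a block of code."""
    timing = {"start": dt.datetime.now()}
    print(f"{label} started at", timing["start"])
    try:
        yield timing
    finally:
        timing["finish"] = dt.datetime.now()
        timing["elapsed"] = timing["finish"] - timing["start"]
        print(f"{label} finished at", timing["finish"])
        print("Elapsed time:", timing["elapsed"])


with timed("AutoViz") as timing:
    pass  # replace with the call to time, e.g. dftc = AV.AutoViz(...)
```

The elapsed time stays available afterwards as `timing["elapsed"]`, which makes it easy to compare the two methods.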

<h1 style="background-color:LimeGreen; font-family:newtimeroman; font-size:200%; text-align:left;"> Method 2: </h1>

#### Visualization for **independent vs. dependent variable**

In [None]:
# Start of AutoViz process
start_time = dt.datetime.now()
print("Started at ", start_time)

In [None]:
# argument depVar which is the dependent variable "Reached.on.Time_Y.N"
# argument dfte which is the dataset "ECommerce"

dftc = AV.AutoViz(filename='', 
                  sep ='' , 
                  depVar ='Reached.on.Time_Y.N', 
                  dfte = ECommerce, 
                  header = 0, 
                  verbose = 1,  # print extra information on the notebook and also display charts
                  lowess = False, 
                  chart_format ='png', 
                  max_rows_analyzed = 15000, 
                  max_cols_analyzed = 30
)

In [None]:
print('AutoViz finished!!')
finish_time = dt.datetime.now()
print("Finished at ", finish_time)
elapsed = finish_time - start_time
print("Elapsed time: ", elapsed)

<h1 style="background-color:LimeGreen; font-family:newtimeroman; font-size:200%; text-align:center; border-radius: 15px 50px;"> If you like the kernel... Don't forget to comment!!!!!!!!! </h1>