# Auto EDA
The decision to use auto EDA depends on:
- specific needs of the project
- size and complexity of the dataset
- expertise of the user

**Auto EDA** tools offer a quick and efficient way to get a broad overview of the data.
<br> **Manual EDA** allows for more nuanced and detailed analysis.
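
For comparison, a manual first pass often starts with a per-column summary. The sketch below is one minimal way to build it with plain pandas; the helper name `quick_overview` and the tiny example frame are illustrative assumptions, not part of any tool covered in this notebook.

```python
import pandas as pd

def quick_overview(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per column: dtype, missing values, and cardinality."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "missing": df.isna().sum(),
        "missing_pct": (df.isna().mean() * 100).round(1),
        "n_unique": df.nunique(),  # NaN is not counted as a value
    })

# Tiny illustrative frame; replace it with your own data.
example = pd.DataFrame({
    "age": [34, 51, None, 29],
    "city": ["Oslo", "Lima", "Lima", None],
})
overview = quick_overview(example)
print(overview)
```

The auto EDA tools in this notebook produce this kind of summary (and much more) without any hand-written code, which is the trade-off described above.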

In [None]:
#import these packages to start
import numpy as np
import pandas as pd

In [None]:
#import your data from XLSX or CSV (run only the line that matches your file)
data = pd.read_excel('your_dataset.xlsx') #xlsx (needs an Excel engine such as openpyxl)
#data = pd.read_csv('your_dataset.csv') #csv

#skip to the DataPrep section to see how to load a built-in test dataset
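
Rather than switching between the two readers by hand, one option is a small helper that picks the reader from the file extension. This is only a sketch: `load_table` is a hypothetical name, and it assumes the file extension matches the actual file format.

```python
from pathlib import Path

import pandas as pd

def load_table(path: str) -> pd.DataFrame:
    """Load a CSV or Excel file into a DataFrame based on its extension."""
    suffix = Path(path).suffix.lower()
    if suffix == ".csv":
        return pd.read_csv(path)
    if suffix in {".xlsx", ".xls"}:
        # read_excel needs an Excel engine (for example openpyxl) installed
        return pd.read_excel(path)
    raise ValueError(f"Unsupported file type: {suffix!r}")

# Usage (uncomment once the file exists):
# data = load_table('your_dataset.csv')
```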

### YData Profiling (Formerly Pandas Profiling)
Pros:
- Comprehensive EDA report with just a single line of code.
- Provides correlations, missing value statistics, and a lot more.
- Interactive and visually pleasing output.

Cons:
- Can be very slow for large datasets.
- Might produce extremely large reports for wide datasets.

Documentation: https://pypi.org/project/ydata-profiling/

In [None]:
pip install ydata-profiling

In [None]:
from ydata_profiling import ProfileReport

profile = ProfileReport(data, title="Profiling Report")
profile.to_notebook_iframe() #display within notebook
profile.to_file("your_report.html") #send to html file
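
Because profiling can be slow on large datasets, a common workaround is to profile a random sample of rows first. The sketch below uses only pandas; `sample_for_profiling`, the 10,000-row default, and the synthetic `big` frame are assumptions chosen for illustration.

```python
import pandas as pd

def sample_for_profiling(df: pd.DataFrame, max_rows: int = 10_000, seed: int = 0) -> pd.DataFrame:
    """Return df unchanged if it is small, otherwise a reproducible random sample."""
    if len(df) <= max_rows:
        return df
    return df.sample(n=max_rows, random_state=seed)

# Synthetic stand-in for a large dataset.
big = pd.DataFrame({"x": range(50_000)})
subset = sample_for_profiling(big, max_rows=1_000)
print(len(subset))
```

The sampled frame can then be passed to `ProfileReport` in place of `data`. The library also documents a `minimal=True` option on `ProfileReport` that switches off the most expensive computations; check the documentation for what it omits in your installed version.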

### Sweetviz
Pros:
- Generates visually appealing reports.
- Can compare two datasets (e.g., training vs testing).
- Handles large datasets better than YData Profiling.

Cons:
- Less detailed than YData Profiling.

Documentation: https://pypi.org/project/sweetviz/

In [None]:
pip install sweetviz

In [None]:
import sweetviz as sv

report = sv.analyze(data)
report.show_html('report.html') #will display in a separate browser tab
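
The dataset comparison mentioned in the pros can be sketched as follows. The split helper is plain pandas and assumes the DataFrame has a unique index; `split_train_test` and `compare_train_test` are hypothetical names, and the labels "Train" and "Test" are arbitrary. Sweetviz is imported inside the second function so the split helper works even where Sweetviz is not installed.

```python
import pandas as pd

def split_train_test(df: pd.DataFrame, test_fraction: float = 0.2, seed: int = 0):
    """Randomly split df into (train, test); assumes a unique index."""
    test = df.sample(frac=test_fraction, random_state=seed)
    train = df.drop(test.index)
    return train, test

def compare_train_test(train: pd.DataFrame, test: pd.DataFrame, path: str = "compare_report.html"):
    """Build a Sweetviz comparison report for two DataFrames."""
    import sweetviz as sv  # lazy import: only needed when a report is generated
    report = sv.compare([train, "Train"], [test, "Test"])
    report.show_html(path)

# Usage (uncomment once `data` is loaded and sweetviz is installed):
# train, test = split_train_test(data)
# compare_train_test(train, test)
```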

### DataPrep
Pros:
- Fast for large datasets.
- Interactive visualizations.
- Provides data cleaning and collection modules along with EDA.

Cons:
- Task-centric design may not offer the flexibility needed for all EDA tasks.
- The ease of use might come at the cost of detailed customization options.

Documentation: https://pypi.org/project/dataprep/

In [None]:
pip install -U dataprep

In [None]:
from dataprep.datasets import load_dataset #dataprep has a dataset repository you can test with
from dataprep.eda import create_report

data = load_dataset("titanic")

create_report(data).show_browser()

### D-Tale
Pros:
- Offers interactive UI with many functionalities (e.g., filtering, sorting, correlations).
- Allows for data transformations.

Cons:
- Web-based UI might not be everyone's preference.
- Has a steeper learning curve because of the breadth of its features.

Documentation: https://pypi.org/project/dtale/

In [None]:
pip install dtale

In [None]:
import dtale

d = dtale.show(data)
d.open_browser() #will open in browser tab

### AutoViz
Pros:
- Works on small and large datasets with just a few lines of code (large datasets are sampled before plotting; see `max_rows_analyzed` in the documentation).
- Automatically visualizes data, making it accessible and user-friendly.

Cons:
- Output is non-interactive unless you modify the code (see documentation).
- Geared toward more experienced Python users who want to customize or explore further.

Documentation: https://pypi.org/project/autoviz/

In [None]:
pip install autoviz

In [None]:
from autoviz import AutoViz_Class
AV = AutoViz_Class()

%matplotlib inline
#AutoViz does not display plots automatically.
#You must run %matplotlib inline just before you run AutoViz on your data.

# Generate visualizations
AV.AutoViz('synthetic_healthcare_dataset.csv') #loading a filename
#dft = AV.AutoViz('', dfte = data) #loading a dataframe

In [None]:
#data cleaning suggestions
from autoviz import data_cleaning_suggestions
data_cleaning_suggestions(data)