# DataPrep
**DataPrep.EDA is built on Dask, so it works with larger-than-memory datasets. Dask supports out-of-core and parallel processing, so computations on very large datasets can be evaluated efficiently.**
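
The statement above can be sketched as follows: a Dask DataFrame is passed to `plot` in place of a pandas one, so the file is read in partitions rather than all at once. This is a minimal sketch, not the library's prescribed workflow. The helper name `plot_large_csv`, the file name `large_file.csv`, and the `64MB` partition size are assumptions chosen for illustration, and it assumes `dask[dataframe]` and `dataprep` are installed.

```python
def plot_large_csv(csv_path, blocksize="64MB"):
    """Profile a CSV that may not fit in memory (hypothetical helper)."""
    # Imported inside the function so that defining it does not require
    # dask or dataprep to be installed.
    import dask.dataframe as dd
    from dataprep.eda import plot

    # read_csv is lazy: the file is split into partitions of roughly
    # `blocksize`, and nothing is loaded until a computation is requested.
    ddf = dd.read_csv(csv_path, blocksize=blocksize)

    # DataPrep.EDA accepts the Dask DataFrame directly.
    return plot(ddf)
```

In a notebook cell, `plot_large_csv("large_file.csv")` as the last expression would display the resulting report, assuming such a file exists.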


[Dataset](https://docs.dataprep.ai/user_guide/datasets/introduction.html#Datasets) 

> DataPrep provides a collection of datasets. You can load any of them with one line of code and explore DataPrep's functionality on them.

[Load Dataset](https://docs.dataprep.ai/user_guide/datasets/introduction.html#Load-Dataset) 

> Once you know the available dataset names from `get_dataset_names`, you can load a dataset by calling `load_dataset`.
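
The two calls named above might be combined like this. It is a small sketch: `pick_dataset` and `load_preferred` are hypothetical helper names, and falling back to the first listed dataset is an arbitrary choice made for the example.

```python
def pick_dataset(names, preferred="titanic"):
    """Return `preferred` if it is available, otherwise the first listed name."""
    return preferred if preferred in names else names[0]


def load_preferred(preferred="titanic"):
    """List the built-in datasets, then load one of them."""
    # Imported lazily so the helpers can be defined without dataprep installed.
    from dataprep.datasets import get_dataset_names, load_dataset

    names = get_dataset_names()        # names of the built-in datasets
    name = pick_dataset(names, preferred)
    return name, load_dataset(name)    # the dataset comes back as a DataFrame
```

Calling `name, df = load_preferred()` in a cell loads the Titanic data when it is listed, and `df.head()` then shows the first rows.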

[Analyze Dataset](https://docs.dataprep.ai/user_guide/datasets/introduction.html#Analyze-Dataset)

> After loading a dataset, you can explore it with DataPrep. For example, you may want to create a profiling report of the dataset using `dataprep.eda`.

[EDA (Exploratory Data Analysis)](https://docs.dataprep.ai/user_guide/eda/introduction.html#EDA)

> Section Contents
- plot(): analyze distributions
- plot_correlation(): analyze correlations
- plot_missing(): analyze missing values
- create_report(): create a profile report
- How-to guide: customize your output
- Parameter configurations: parameter summary settings
- Case study: Titanic
- Case study: House Prices
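
Each of the plotting functions listed above also accepts column names, which narrows the analysis to those columns. The sketch below collects four such calls in one hypothetical helper, `explore_columns`. The helper name and the dictionary keys are illustrative only, and `x` is assumed to be a numerical column so that the correlation call makes sense.

```python
def explore_columns(df, x, y):
    """Run column-level EDA on `x` (and on `x` vs `y`); hypothetical helper."""
    # Fail early with a clear message if a column name is misspelled.
    missing = [col for col in (x, y) if col not in df.columns]
    if missing:
        raise KeyError(f"columns not found in DataFrame: {missing}")

    # Imported lazily so the helper can be defined without dataprep installed.
    from dataprep.eda import plot, plot_correlation, plot_missing

    return {
        "univariate": plot(df, x),               # distribution of one column
        "bivariate": plot(df, x, y),             # relationship between two columns
        "correlation": plot_correlation(df, x),  # x against the other numerical columns
        "missing": plot_missing(df, x),          # effect of missing values in x
    }
```

With the Titanic data, and assuming it has `Age` and `Fare` columns, `explore_columns(df, "Age", "Fare")["bivariate"]` as the last expression in a cell would display the two-column report.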

[Clean](https://docs.dataprep.ai/user_guide/clean/introduction.html#Clean)

DataPrep.Clean provides functions for quickly and easily cleaning and validating your data.

> Section Contents
- Column Headers
- Country Names
- Email Addresses
- Geographic Coordinates
- IP Addresses
- Phone Numbers
- URLs
- US Street Addresses
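
As an illustration of two of the cleaners listed above, the sketch below runs `clean_email` and `clean_country` on a few made-up records. The sample values in `MESSY` and the helper name `clean_contacts` are hypothetical, and the sketch assumes `pandas` and `dataprep` are installed.

```python
# Hypothetical sample records with deliberately messy values.
MESSY = {
    "email": ["Alice@Example.com ", "bob(at)example.com", None],
    "country": ["USA", "canada", "Frnace"],
}


def clean_contacts(records=MESSY):
    """Clean the email and country columns of `records` (hypothetical helper)."""
    # Imported lazily so the sample data can be inspected without dataprep installed.
    import pandas as pd
    from dataprep.clean import clean_country, clean_email

    df = pd.DataFrame(records)
    df = clean_email(df, "email")      # adds an "email_clean" column
    df = clean_country(df, "country")  # adds a "country_clean" column
    return df
```

Comparing each original column with its `_clean` counterpart in the returned DataFrame shows which values were standardized and which could not be parsed.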

# Problems
**error: Microsoft Visual C++ 14.0 or greater is required**. Get it with "Microsoft C++ Build Tools": [Download](https://visualstudio.microsoft.com/visual-cpp-build-tools/). It requires about 30 MB of data.

[Microsoft blog post on installing Microsoft C++ Build Tools](https://docs.microsoft.com/en-us/answers/questions/136595/error-microsoft-visual-c-140-or-greater-is-require.html). It requires about 1.59 GB of data.

In [None]:
!pip install -U dataprep

[DataPrep Website](https://dataprep.ai/)

[Installation details on pypi.org](https://pypi.org/project/dataprep/)

In [None]:
# EDA (Exploratory Data Analysis)
# Create Profile Reports, Fast
from dataprep.datasets import load_dataset
from dataprep.eda import create_report
import warnings
warnings.filterwarnings('ignore')
df = load_dataset('titanic')
create_report(df)

In [None]:
create_report(df).show_browser()

In [None]:
# https://analyticsindiamag.com/tutorial-for-dataprep-a-python-library-to-prepare-your-data-before-training/
# importing the plot function which is used to visualize 
# the statistical plots and properties of the dataset

import plotly.express as px
from dataprep.eda import plot

[Get more details about the **Tips Dataset**](https://www.kaggle.com/sanjanabasu/tips-dataset/data)

In [None]:
# Loading the Dataset
# The dataset contains attributes related to
# restaurant bills and tips.
df = px.data.tips() 
df

In [None]:
# Exploratory Data Analysis
# We will start with statistical data exploration and analysis. 
# The plot function is used for preparing this statistical report. This single line of code will create the whole statistical analysis. 
plot(df)

In [None]:
from dataprep.eda import plot_missing
plot_missing(df)

In [None]:
from dataprep.eda import plot_correlation
plot_correlation(df)