# Ydata-Profiling (previously Pandas-Profiling)
The ydata-profiling (previously pandas-profiling) library is a versatile tool for generating comprehensive exploratory data analysis (EDA) reports from a Pandas DataFrame. It automates and simplifies the process of understanding the structure, quality, and statistics of a dataset, providing detailed insights into the data characteristics.

### Key Features of Ydata-Profiling:

#### 1. **Automated Report Generation:**
   - **Descriptive Statistics:** Summary statistics such as mean, median, standard deviation, quartiles, etc.
   - **Variable Types:** Identification and classification of data types (numeric, categorical, etc.).
   - **Missing Values:** Analysis of missing data patterns.
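
To make the three items above concrete, here is a minimal sketch of the same kind of summary computed by hand with plain pandas. It is not how ydata-profiling works internally; the small DataFrame and its column names are made up for illustration.

```python
import pandas as pd

# Tiny made-up frame, used only for illustration.
df = pd.DataFrame(
    {
        "age": [22.0, 38.0, None, 35.0],
        "sex": ["male", "female", "female", "male"],
    }
)

summary = df["age"].describe()  # count, mean, std, min, quartiles, max
dtypes = df.dtypes              # one dtype per column
missing = df.isna().sum()       # number of missing values per column

print(summary)
print(dtypes)
print(missing)
```

The report gathers these figures for every column at once, which is the main saving over writing such calls yourself.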

#### 2. **Visualization:**
   - **Histograms:** Visualizing distribution of numerical data.
   - **Bar Charts:** Representing categorical variables.
   - **Correlation Matrices:** Displaying correlations between variables.
   - **Scatterplots:** Visualizing relationships between pairs of variables.
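
As a rough illustration of what a correlation matrix contains, the sketch below computes one with plain pandas on a made-up frame. The column names and values are assumptions chosen for the example, and the report may use other correlation measures than the Pearson default shown here.

```python
import pandas as pd

# Made-up values, for illustration only.
df = pd.DataFrame(
    {
        "fare": [7.25, 71.28, 7.93, 53.10],
        "pclass": [3, 1, 3, 1],
        "sex": ["male", "female", "female", "female"],
    }
)

# Keep numeric columns only, then compute pairwise Pearson correlations.
corr = df.select_dtypes("number").corr()
print(corr)
```

Each cell holds the correlation between the row variable and the column variable, so the diagonal is always 1.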

#### 3. **Interactivity:**
   - **Interactive HTML Report:** Easily accessible and interactive report generated as an HTML file.
   - **Customization:** Configurable settings for report generation and visualization.
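
One common customization is the `minimal` flag of `ProfileReport`, which turns off the more expensive computations and is useful for large datasets. The sketch below wraps it in two helpers; `report_options` and `build_report` are hypothetical names introduced here and are not part of the library.

```python
def report_options(title: str, quick: bool = True) -> dict:
    """Collect keyword arguments for ProfileReport (hypothetical helper)."""
    # minimal=True asks ydata-profiling to skip its more expensive computations.
    return {"title": title, "minimal": quick}


def build_report(df, title: str, quick: bool = True):
    """Build a ProfileReport for df using the options above."""
    # Imported lazily so this module still loads where ydata-profiling
    # is not installed.
    from ydata_profiling import ProfileReport

    return ProfileReport(df, **report_options(title, quick))
```

With a DataFrame `df` in hand, `build_report(df, "My report").to_file("report.html")` would write the reduced report to disk.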

#### 4. **Data Quality Insights:**
   - **Duplicate Rows:** Identification of duplicated records.
   - **Cardinality:** Analysis of unique values in categorical variables.
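
Both checks can be approximated by hand with plain pandas, which helps when reading the corresponding sections of the report. This is a minimal sketch on a made-up frame, not the library's own implementation.

```python
import pandas as pd

# Made-up values, for illustration only.
df = pd.DataFrame(
    {
        "sex": ["male", "female", "female", "male", "male"],
        "embarked": ["S", "C", "S", "S", "S"],
    }
)

# Rows that repeat an earlier row exactly (the first occurrence is not counted).
n_duplicates = int(df.duplicated().sum())

# Number of distinct values per column.
cardinality = df.nunique()

print(n_duplicates)
print(cardinality)
```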

### How to Use Ydata-Profiling:

#### Installation:
You can install the ydata-profiling library via pip:
```bash
pip install ydata-profiling
```

#### Generating Reports:
```python
import pandas as pd
from ydata_profiling import ProfileReport

# Load your dataset as a Pandas DataFrame
data = pd.read_csv('your_dataset.csv')

# Generate the report
profile = ProfileReport(data, title="Your report title")
profile.to_file("your_report.html")
```

This code snippet loads a dataset, generates a ydata-profiling report, and saves it as an HTML file (`your_report.html`). You can then open this file in a web browser to explore the EDA report interactively.

### Usefulness:
- **Exploratory Data Analysis:** Provides a quick overview of the dataset.
- **Data Cleaning:** Flags missing values and duplicate rows, and surfaces extreme values that may indicate outliers.
- **Insights Generation:** Helps in understanding data distribution and relationships.

Ydata-Profiling serves as a powerful and convenient tool for data scientists and analysts to perform a thorough initial analysis of a dataset, enabling informed decisions about data cleaning, feature selection, and further analysis. Adjust settings and explore the generated report to suit specific analysis needs.

In [1]:
%pip install ydata-profiling

Collecting ydata-profiling
  Using cached ydata_profiling-4.6.3-py2.py3-none-any.whl.metadata (20 kB)
Collecting visions==0.7.5 (from visions[type_image_path]==0.7.5->ydata-profiling)
  Using cached visions-0.7.5-py3-none-any.whl (102 kB)
Collecting numpy<1.26,>=1.16.0 (from ydata-profiling)
  Using cached numpy-1.25.2-cp311-cp311-win_amd64.whl.metadata (5.7 kB)
Collecting seaborn<0.13,>=0.10.1 (from ydata-profiling)
  Using cached seaborn-0.12.2-py3-none-any.whl (293 kB)
Collecting statsmodels<1,>=0.13.2 (from ydata-profiling)
  Using cached statsmodels-0.14.1-cp311-cp311-win_amd64.whl.metadata (9.8 kB)
Collecting wordcloud>=1.9.1 (from ydata-profiling)
  Using cached wordcloud-1.9.3-cp311-cp311-win_amd64.whl.metadata (3.5 kB)
Collecting dacite>=1.8 (from ydata-profiling)
  Using cached dacite-1.8.1-py3-none-any.whl.metadata (15 kB)
Collecting numba<0.59.0,>=0.56.0 (from ydata-profiling)
  Using cached numba-0.58.1-cp311-cp311-win_amd64.whl.metadata (2.8 kB)
Collecting llvmlite<0.42,>

In [2]:
import seaborn as sns

In [3]:
titanic = sns.load_dataset('titanic')

In [4]:
titanic.head()

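|   | survived | pclass | sex | age | sibsp | parch | fare | embarked | class | who | adult_male | deck | embark_town | alive | alone |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 3 | male | 22.0 | 1 | 0 | 7.25 | S | Third | man | True |  | Southampton | no | False |
| 1 | 1 | 1 | female | 38.0 | 1 | 0 | 71.2833 | C | First | woman | False | C | Cherbourg | yes | False |
| 2 | 1 | 3 | female | 26.0 | 0 | 0 | 7.925 | S | Third | woman | False |  | Southampton | yes | True |
| 3 | 1 | 1 | female | 35.0 | 1 | 0 | 53.1 | S | First | woman | False | C | Southampton | yes | False |
| 4 | 0 | 3 | male | 35.0 | 0 | 0 | 8.05 | S | Third | man | True |  | Southampton | no | True |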


In [5]:
# Keep only the first eight columns (survived through embarked)
titanic = titanic.iloc[:, :8]
titanic.head()

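|   | survived | pclass | sex | age | sibsp | parch | fare | embarked |
|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 3 | male | 22.0 | 1 | 0 | 7.25 | S |
| 1 | 1 | 1 | female | 38.0 | 1 | 0 | 71.2833 | C |
| 2 | 1 | 3 | female | 26.0 | 0 | 0 | 7.925 | S |
| 3 | 1 | 1 | female | 35.0 | 1 | 0 | 53.1 | S |
| 4 | 0 | 3 | male | 35.0 | 0 | 0 | 8.05 | S |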


In [6]:
from ydata_profiling import ProfileReport
prof = ProfileReport(titanic, title='Titanic EDA')
prof.to_file(output_file='output.html')

  from .autonotebook import tqdm as notebook_tqdm
  is_valid_dtype = pdt.is_categorical_dtype(series) and not pdt.is_bool_dtype(
  not pdt.is_categorical_dtype(series)
  if pdt.is_categorical_dtype(series):
(using `df.profile_report(correlations={"auto": {"calculate": False}})`
If this is problematic for your use case, please report this as an issue:
https://github.com/y