# Pandas Profiling

- Pandas Profiling (now published as the `ydata-profiling` package) generates a comprehensive report for a DataFrame, including descriptive statistics, visualizations, and data-quality alerts.
- The report summarizes every feature in detail, which makes the data easier to understand before any analysis begins.

In [None]:
#!pip install pydantic-settings

In [None]:
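# Install ydata-profiling (the current name of the pandas-profiling package)
!pip install ydata-profiling
# Optional: uncomment these if the install above runs into dependency conflicts
#!pip install --upgrade typing-extensions
#!pip install --upgrade pydantic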

In [None]:
import pandas as pd
import seaborn as sns
from ydata_profiling import ProfileReport

In [None]:
# Load the Titanic dataset
titanic = sns.load_dataset('titanic')

In [None]:
# List the example datasets available through seaborn
sns.get_dataset_names()

### Why Pandas Profiling?

- Pandas Profiling helps to quickly understand the structure, quality, and distribution of the data.
- It saves time by automating the data exploration process, allowing you to focus on the analysis and modeling steps.
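
To make the time saving concrete, here is a minimal sketch of the kind of per-column checks you would otherwise write by hand. The helper name `manual_summary` and the toy DataFrame are hypothetical and only for illustration; the profiling report covers these checks and many more in a single call.

```python
import pandas as pd


def manual_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Hand-rolled per-column summary: dtype, missing values, distinct values."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "n_missing": df.isna().sum(),
        "pct_missing": (df.isna().mean() * 100).round(1),
        "n_unique": df.nunique(),  # NaN is not counted as a distinct value
    })


# Hypothetical toy data standing in for a real dataset
toy = pd.DataFrame({
    "age": [22.0, 38.0, None, 35.0],
    "sex": ["male", "female", "female", "male"],
    "fare": [7.25, 71.28, 7.92, 53.10],
})

summary = manual_summary(toy)
print(summary)
```

Each extra question (distributions, correlations, duplicates) would need more code like this, which is the repetitive work the report takes over.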

#### Pandas Profiling in Action

- Generate a Pandas Profiling report for the Titanic dataset, save it as an HTML file, and display it inside the notebook.

In [None]:
# Build the profile report; explorative=True turns on the more in-depth analysis settings
profile1 = ProfileReport(titanic, title='Titanic Dataset Profile', explorative=True)

In [None]:
# Save the report as an HTML file
profile1.to_file("titanic_profile_report.html")
# The profile report can be viewed in a web browser for an interactive analysis.

In [None]:
# Display the report inline in the notebook
profile1.to_notebook_iframe()

In [None]:
# Practice Questions:
# 1. Generate a Pandas Profiling report for the Diamonds dataset (sns.load_dataset('diamonds')) and save it as 'Diamonds_profile_report.html'.
# 2. Open the generated HTML report in a web browser and explore the summary provided.

### Pandas Profiling Overview

- The Pandas Profiling report is organized into sections such as:
1. Overview: Summary of the dataset, including the number of variables, observations, missing values, and memory usage.
2. Variables: Detailed analysis of each feature, including type, unique values, missing values, and descriptive statistics.
3. Interactions: Visualizations showing relationships between pairs of variables.
4. Correlations: Correlation matrix and heatmaps to identify relationships between variables.
5. Missing Values: Analysis of missing data, including a heatmap and bar charts.
6. Sample: A preview of a few rows from the dataset.
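
As a rough illustration of what the Overview and Correlations sections report, the sketch below computes comparable figures directly with pandas. The `overview` helper and the toy data are hypothetical, and the report's own calculations may differ in detail (for example, it offers several correlation measures, while this sketch uses only the pandas default, Pearson).

```python
import pandas as pd


def overview(df: pd.DataFrame) -> dict:
    """Dataset-level figures similar to those in the report's Overview section."""
    return {
        "n_variables": df.shape[1],
        "n_observations": df.shape[0],
        "missing_cells": int(df.isna().sum().sum()),
        "duplicate_rows": int(df.duplicated().sum()),
        "memory_bytes": int(df.memory_usage(deep=True).sum()),
    }


# Hypothetical toy data; the last row repeats the first one
toy = pd.DataFrame({
    "carat": [0.23, 0.21, 0.23, 0.29, 0.23],
    "price": [326.0, 326.0, 327.0, None, 326.0],
    "cut": ["Ideal", "Premium", "Good", "Premium", "Ideal"],
})

stats = overview(toy)
print(stats)

# Pearson correlation between the numeric columns only
corr = toy.select_dtypes("number").corr()
print(corr)
```

Comparing these numbers with the report's Overview tab is a quick sanity check that the report was built from the DataFrame you expected.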

---
_**Your Dataness**_,  
`Obinna Oliseneku` (_**Hybraid**_)  
**[LinkedIn](https://www.linkedin.com/in/obinnao/)** | **[GitHub](https://github.com/hybraid6)**  