                                   Pandas Profiling

Pandas Profiling (since renamed ydata-profiling) generates detailed reports on a dataset, making it easier to understand the data's characteristics and quality. It automates much of Exploratory Data Analysis (EDA) and summarizes the data's structure, missing values, correlations, and more.


📖 Explanation and Key Components
The profiling report includes the following sections:
✅ 1. Overview / Dataset Statistics
Shows the total number of variables (columns), observations (rows), missing values, duplicate entries, and memory usage.
Helps understand dataset size and potential quality issues.
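As a sanity check, the headline Overview figures can be reproduced with plain pandas. This is a minimal sketch on a small made-up DataFrame (not the Titanic file). The report's own definitions, for example of duplicates or memory usage, may differ slightly:

```python
import pandas as pd

# Made-up illustrative values (not the Titanic file)
df = pd.DataFrame({
    "age": [22.0, 38.0, None, 35.0, 35.0],
    "sex": ["male", "female", "female", "female", "female"],
    "fare": [7.25, 71.28, 7.93, 53.10, 53.10],
})

overview = {
    "n_variables": df.shape[1],                    # number of columns
    "n_observations": df.shape[0],                 # number of rows
    "missing_cells": int(df.isna().sum().sum()),   # total NaN/None cells
    "duplicate_rows": int(df.duplicated().sum()),  # rows repeating an earlier row
    "memory_bytes": int(df.memory_usage(deep=True).sum()),
}
print(overview)
```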

✅ 2. Variable-Level Analysis
For each column in the dataset:
Numeric variables: mean, std, min/max, skewness, histograms
Categorical variables: count of unique values, mode frequency, bar charts
Date/time variables: range, frequency patterns
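The per-column figures correspond roughly to standard pandas calls. This sketch assumes a tiny illustrative DataFrame whose `fare`, `embarked`, and `date` columns are made up for the example:

```python
import pandas as pd

# Made-up illustrative columns: numeric, categorical, and datetime
df = pd.DataFrame({
    "fare": [7.25, 71.28, 7.93, 53.10, 8.05],
    "embarked": ["S", "C", "S", "S", "S"],
    "date": pd.to_datetime(
        ["1912-04-10", "1912-04-10", "1912-04-11", "1912-04-11", "1912-04-12"]
    ),
})

# Numeric: count, mean, std, min, quartiles, max, plus skewness
numeric_summary = df["fare"].describe()
fare_skew = df["fare"].skew()

# Categorical: number of distinct values and the most frequent one (mode) with its count
counts = df["embarked"].value_counts()
categorical_summary = {
    "n_unique": df["embarked"].nunique(),
    "mode": counts.index[0],
    "mode_freq": int(counts.iloc[0]),
}

# Date/time: overall range and how many rows fall on each calendar day
dates = df["date"]
date_range = (dates.min(), dates.max())
rows_per_day = dates.dt.date.value_counts().sort_index()

print(numeric_summary, fare_skew, categorical_summary, date_range, rows_per_day, sep="\n\n")
```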

✅ 3. Missing Values Visualization
Heatmaps and matrices showing where missing data exists.
Helps in planning data cleaning or imputation strategies.
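The data behind these missing-value plots is a boolean mask of the DataFrame. Below is a minimal sketch with illustrative data. The nullity correlation is only a rough analogue of a missing-values heatmap, not necessarily the exact computation the report uses:

```python
import pandas as pd

# Made-up columns with gaps, loosely modeled on Titanic's Age/Cabin/Embarked
df = pd.DataFrame({
    "age": [22.0, None, 26.0, None, 35.0],
    "cabin": [None, "C85", None, "C123", None],
    "embarked": ["S", "C", "S", "S", None],
})

mask = df.isna()                             # True where a value is missing (the "matrix" view)
missing_count = mask.sum()                   # missing cells per column
missing_pct = (mask.mean() * 100).round(1)   # share of rows missing, in percent

# Correlation of the missingness indicators: +1 means two columns are missing
# together, -1 means one is missing exactly when the other is present
nullity_corr = mask.astype(int).corr()

print(missing_count, missing_pct, nullity_corr, sep="\n\n")
```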

✅ 4. Correlation Analysis
Computes correlations between numeric features using:
Pearson
Spearman
Kendall
Visualizes the correlation matrix and flags multicollinearity (pairs of features so strongly correlated that they are nearly redundant).
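These coefficients are also available directly through `DataFrame.corr`. The sketch uses made-up columns in which `x_cubed` rises monotonically but non-linearly with `x`. As a result, Spearman (rank-based) reports a perfect 1.0 while Pearson (linear) does not. The 0.9 cutoff for flagging pairs is an arbitrary example value:

```python
import pandas as pd

x = list(range(1, 11))
df = pd.DataFrame({
    "x": x,
    "x_cubed": [v ** 3 for v in x],           # monotonic but non-linear in x
    "noise": [3, 1, 4, 1, 5, 9, 2, 6, 5, 3],  # arbitrary values
})

pearson = df.corr(method="pearson")    # strength of a linear relationship
spearman = df.corr(method="spearman")  # Pearson on ranks: any monotonic relationship
# method="kendall" is also accepted; pandas relies on SciPy for it

# Flag strongly correlated pairs from the upper triangle (0.9 is an example cutoff)
threshold = 0.9
cols = list(pearson.columns)
high_pairs = [
    (a, b, round(float(pearson.loc[a, b]), 3))
    for i, a in enumerate(cols)
    for b in cols[i + 1:]
    if abs(pearson.loc[a, b]) > threshold
]

print(pearson.round(3), spearman.round(3), high_pairs, sep="\n\n")
```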

✅ 5. Alerts & Warnings
Automatically flags:
Constant columns (no variability)
High cardinality features
High correlation pairs
Columns with missing values or skewed distributions
These help the analyst quickly identify problems in the data.
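Several of these alerts can be approximated by hand. The helper `simple_alerts` and its thresholds are hypothetical and chosen only to illustrate the idea. ydata-profiling applies its own configurable rules:

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Hypothetical thresholds, for illustration only
HIGH_CARDINALITY_RATIO = 0.5  # distinct values / rows
HIGH_CORR = 0.9               # absolute Pearson correlation
SKEW_LIMIT = 1.0              # absolute sample skewness


def simple_alerts(df: pd.DataFrame) -> list:
    """Return human-readable warnings, loosely mimicking a profiling report."""
    alerts = []
    n_rows = len(df)
    for col in df.columns:
        s = df[col]
        if s.nunique(dropna=False) <= 1:
            alerts.append(f"{col}: constant")
        elif not is_numeric_dtype(s) and s.nunique() / n_rows > HIGH_CARDINALITY_RATIO:
            alerts.append(f"{col}: high cardinality")
        if s.isna().any():
            alerts.append(f"{col}: {s.isna().mean():.0%} missing")
        if is_numeric_dtype(s) and abs(s.skew()) > SKEW_LIMIT:
            alerts.append(f"{col}: skewed ({s.skew():.2f})")
    # Pairwise check on numeric columns only (upper triangle of the matrix)
    corr = df.select_dtypes("number").corr().abs()
    cols = list(corr.columns)
    for i, a in enumerate(cols):
        for b in cols[i + 1:]:
            if corr.loc[a, b] > HIGH_CORR:
                alerts.append(f"{a} ~ {b}: highly correlated ({corr.loc[a, b]:.2f})")
    return alerts


# Made-up example data
df = pd.DataFrame({
    "ticket": ["A1", "B2", "C3", "D4", "E5", "F6"],   # every value distinct
    "ship": ["Titanic"] * 6,                          # never changes
    "fare": [7.25, 7.90, 8.05, 8.46, 51.86, 263.00],  # long right tail
    "age": [22.0, None, 35.0, 54.0, 2.0, 27.0],       # one gap
})
df["fare_x2"] = df["fare"] * 2                        # redundant copy of fare

for alert in simple_alerts(df):
    print(alert)
```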

✅ 6. Interactive HTML Report
Outputs a professional-grade report in HTML format (report.html)
Fully interactive, so distributions, statistics, and charts are easy to explore
Useful for documentation, team collaboration, and client presentations




                    Why It Is Important in a Project

Speeds up the EDA phase of any data science or machine learning project.
Identifies data quality issues early (missing values, outliers, duplicates).
Helps make decisions about feature selection, data cleaning, and modeling strategies.
Saves hours of manual work and improves reproducibility.

In [None]:
%pip install pandas

In [None]:
import pandas as pd

In [None]:
df = pd.read_csv('c:\\Users\\MMP\\Desktop\\data_science\\data_sets\\Titanic-Dataset.csv')

In [None]:
df.head()

   PassengerId  Survived  Pclass  Name                                               Sex     Age   SibSp  Parch  Ticket            Fare     Cabin  Embarked
0  1            0         3       Braund, Mr. Owen Harris                            male    22.0  1      0      A/5 21171         7.25     NaN    S
1  2            1         1       Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0  1      0      PC 17599          71.2833  C85    C
2  3            1         3       Heikkinen, Miss. Laina                             female  26.0  0      0      STON/O2. 3101282  7.925    NaN    S
3  4            1         1       Futrelle, Mrs. Jacques Heath (Lily May Peel)       female  35.0  1      0      113803            53.1     C123   S
4  5            0         3       Allen, Mr. William Henry                           male    35.0  0      0      373450            8.05     NaN    S


In [None]:
# Legacy package name; the project is now published as ydata-profiling
%pip install pandas-profiling


Collecting pandas-profiling
  Using cached pandas_profiling-3.2.0-py2.py3-none-any.whl.metadata (21 kB)
Collecting joblib~=1.1.0 (from pandas-profiling)
  Using cached joblib-1.1.1-py2.py3-none-any.whl.metadata (5.2 kB)
Collecting scipy>=1.4.1 (from pandas-profiling)
  Downloading scipy-1.16.0-cp312-cp312-win_amd64.whl.metadata (60 kB)
Collecting matplotlib>=3.2.0 (from pandas-profiling)
  Downloading matplotlib-3.10.3-cp312-cp312-win_amd64.whl.metadata (11 kB)
Collecting pydantic>=1.8.1 (from pandas-profiling)
  Downloading pydantic-2.11.7-py3-none-any.whl.metadata (67 kB)
Collecting PyYAML>=5.0.0 (from pandas-profiling)
  Downloading PyYAML-6.0.2-cp312-cp312-win_amd64.whl.metadata (2.1 kB)
Collecting jinja2>=2.11.1 (from pandas-profiling)
  Downloading jinja2-3.1.6-py3-none-any.whl.metadata (2.9 kB)
Collecting markupsafe~=2.1.1 (from pandas-profiling)
  Downloading MarkupSafe-2.1.5-cp312-cp312-win_amd64.whl.metadata (3.1 kB)
Collecting visions==0.7.4 (from visions[type_image_path]==0


[notice] A new release of pip is available: 24.3.1 -> 25.1.1
[notice] To update, run: python.exe -m pip install --upgrade pip


In [None]:
# pandas-profiling 3.x does not work with pydantic 2 (pulled in above), so pin pydantic below 2
%pip install "pydantic<2"


Collecting pydantic<2
  Downloading pydantic-1.10.22-cp312-cp312-win_amd64.whl.metadata (155 kB)
Downloading pydantic-1.10.22-cp312-cp312-win_amd64.whl (2.2 MB)
   ---------------------------------------- 2.2/2.2 MB 2.5 MB/s eta 0:00:00
Installing collected packages: pydantic
  Attempting uninstall: pydantic
    Found existing installation: pydantic 2.11.7
    Uninstalling pydantic-2.11.7:
      Successfully uninstalled pydantic-2.11.7
Successfully installed pydantic-1.10.22
Note: you may need to restart the kernel to use updated packages.





In [None]:
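import sys
# Install into the interpreter running this kernel. Pinning "pydantic<2" alongside
# ydata-profiling cannot be resolved, so pip stops with ResolutionImpossible (output below).
!{sys.executable} -m pip install ydata-profiling "pydantic<2"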


Collecting ydata-profiling
  Downloading ydata_profiling-4.16.1-py2.py3-none-any.whl.metadata (22 kB)
Collecting scipy<1.16,>=1.4.1 (from ydata-profiling)
  Downloading scipy-1.15.3-cp312-cp312-win_amd64.whl.metadata (60 kB)
Collecting matplotlib<=3.10,>=3.5 (from ydata-profiling)
  Downloading matplotlib-3.10.0-cp312-cp312-win_amd64.whl.metadata (11 kB)
INFO: pip is looking at multiple versions of ydata-profiling to determine which version is compatible with other requirements. This could take a while.
Collecting ydata-profiling
  Downloading ydata_profiling-4.16.0-py2.py3-none-any.whl.metadata (22 kB)
  Downloading ydata_profiling-4.15.1-py2.py3-none-any.whl.metadata (22 kB)
  Downloading ydata_profiling-4.15.0-py2.py3-none-any.whl.metadata (22 kB)
  Downloading ydata_profiling-4.14.0-py2.py3-none-any.whl.metadata (22 kB)
  Downloading ydata_profiling-4.13.0-py2.py3-none-any.whl.metadata (22 kB)
Collecting scipy<1.14,>=1.4.1 (from ydata-profiling)
  Downloading scipy-1.13.1-cp312-cp3

ERROR: Cannot install pydantic<2, ydata-profiling==4.10.0, ydata-profiling==4.11.0, ydata-profiling==4.12.0, ydata-profiling==4.12.1, ydata-profiling==4.12.2, ydata-profiling==4.13.0, ydata-profiling==4.14.0, ydata-profiling==4.15.0, ydata-profiling==4.15.1, ydata-profiling==4.16.0, ydata-profiling==4.16.1, ydata-profiling==4.7.0, ydata-profiling==4.8.3 and ydata-profiling==4.9.0 because these package versions have conflicting dependencies.

ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/topics/dependency-resolution/#dealing-with-dependency-conflicts
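The resolver error means the two requirements in that command cannot be satisfied together, so at least one of them has to change. Before deciding which, it can help to check what the kernel's interpreter actually has installed. This is a minimal sketch using the standard library's `importlib.metadata`. The helper name `installed_versions` is hypothetical:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None  # not installed for this interpreter
    return found


# Run inside the notebook kernel to see what that interpreter actually has
print(installed_versions(["pydantic", "pandas-profiling", "ydata-profiling"]))
```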


In [None]:
from ydata_profiling import ProfileReport


In [2]:
import sys
print(sys.executable)


c:\Users\MMP\AppData\Local\Programs\Python\Python312\python.exe


In [2]:
from ydata_profiling import ProfileReport
import pandas as pd

df = pd.read_csv('c:\\Users\\MMP\\Desktop\\data_science\\data_sets\\Titanic-Dataset.csv')
profile = ProfileReport(df)
profile.to_file("report.html")


Summarize dataset: 100%|██████████| 47/47 [00:07<00:00,  6.42it/s, Completed]                       
Generate report structure: 100%|██████████| 1/1 [00:06<00:00,  6.41s/it]
Render HTML: 100%|██████████| 1/1 [00:02<00:00,  2.63s/it]
Export report to file: 100%|██████████| 1/1 [00:00<00:00, 86.82it/s]
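`ProfileReport` also accepts a few options that are useful in practice. This sketch uses made-up data and a hypothetical output filename `report_minimal.html`. It sets a report `title`, uses `minimal=True` to switch off some of the more expensive computations on larger tables, and renders the report to a string with `to_html()`. The import is guarded so that the cell still runs where ydata-profiling is not installed:

```python
import pandas as pd

try:
    from ydata_profiling import ProfileReport
except ImportError:  # ydata-profiling not installed in this environment
    ProfileReport = None

# Made-up illustrative data
df = pd.DataFrame({
    "age": [22.0, 38.0, 26.0, None, 35.0],
    "fare": [7.25, 71.28, 7.93, 53.10, 8.05],
})

html = None
if ProfileReport is not None:
    # minimal=True switches off several of the more expensive sections
    profile = ProfileReport(df, title="Titanic sample report", minimal=True)
    html = profile.to_html()                # whole report as one HTML string
    profile.to_file("report_minimal.html")  # hypothetical output filename
```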


In [3]:
import os
print(os.path.abspath("report.html"))


c:\Users\MMP\Desktop\data_science\Pandas_profilling_report\report.html


In [4]:
# Legacy import name for the same project; ydata_profiling is the maintained package
from pandas_profiling import ProfileReport
prof = ProfileReport(df)
prof.to_file('titanic_report.html')

  from pandas_profiling import ProfileReport
Summarize dataset: 100%|██████████| 47/47 [00:05<00:00,  8.77it/s, Completed]                       
Generate report structure: 100%|██████████| 1/1 [00:06<00:00,  6.63s/it]
Render HTML: 100%|██████████| 1/1 [00:01<00:00,  1.21s/it]
Export report to file: 100%|██████████| 1/1 [00:00<00:00, 99.03it/s]
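Because the project continues under the name ydata-profiling, a notebook that has to run in environments with either package installed could prefer the new import and fall back to the legacy one. Here is a minimal sketch. The `PROFILER` variable exists only to report which package was found:

```python
# Prefer the maintained package name and fall back to the legacy one
try:
    from ydata_profiling import ProfileReport
    PROFILER = "ydata_profiling"
except ImportError:
    try:
        from pandas_profiling import ProfileReport  # pandas-profiling 3.x and earlier
        PROFILER = "pandas_profiling"
    except ImportError:
        ProfileReport = None
        PROFILER = None  # neither package is importable

print("profiling package:", PROFILER)
```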
