# Features of Pandas Profiling

1. **Overview of Variables**:
   - Provides a detailed summary of each variable in the dataset.
   - Includes counts, mean, median, mode, minimum, maximum, standard deviation, and more.
   - Visualizes the distribution of numerical and categorical variables.

2. **Missing Values Analysis**:
   - Identifies missing values and calculates their percentage.
   - Visualizes missing value patterns with heatmaps and bar charts.

3. **Correlations**:
   - Computes and displays correlation coefficients such as Pearson, Spearman, and Kendall.
   - Highlights highly correlated features for easier identification of redundant variables.

4. **Data Quality Assessment**:
   - Detects potential issues like:
     - Constant values.
     - High cardinality variables (too many unique values in categorical columns).
     - Duplicate rows.
   - Flags potential data cleaning requirements.

5. **Interactive Visualizations**:
   - Offers visualizations such as:
     - Histograms and KDE plots for numerical distributions.
     - Box plots to identify outliers.
     - Pie charts for categorical data representation.
     - Heatmaps for correlations.

6. **Customizable Reports**:
   - Generates detailed, interactive reports in HTML or JSON format.
   - Reports can be shared as standalone files or generated as a step in a data pipeline.

7. **Descriptive Warnings**:
   - Highlights:
     - Columns with zero variance.
     - Skewed distributions.
     - Sparse data in columns.
   - Provides insights into potential preprocessing needs.

8. **Explorative Mode**:
   - Enabled with `explorative=True`; adds analyses beyond the default report (for example, more detailed statistics for text columns) for deeper exploratory data analysis (EDA).

9. **Support for Large Datasets**:
   - Offers configuration options, such as minimal mode (`minimal=True`), that turn off expensive computations to cut run time on large datasets.

10. **Integration with Pandas**:
    - Works seamlessly with pandas DataFrames, making it a convenient tool for data scientists and analysts.
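
The checks listed above run automatically inside ydata-profiling. To show roughly what a few of them compute, here is a minimal pandas-only sketch. It is not ydata-profiling's implementation, and the function name `quick_profile` and its thresholds (`corr_threshold`, `high_card_ratio`, `skew_threshold`) are hypothetical, illustrative choices.

```python
import pandas as pd


def quick_profile(df: pd.DataFrame, corr_method: str = "pearson",
                  corr_threshold: float = 0.9, high_card_ratio: float = 0.5,
                  skew_threshold: float = 1.0) -> dict:
    """Hand-rolled approximation of a few checks a profiling report performs.

    All thresholds are illustrative defaults for this sketch,
    not ydata-profiling's own settings.
    """
    numeric = df.select_dtypes(include="number")
    non_numeric = df.select_dtypes(exclude="number")

    # Overview of variables: count, mean, std, min, quartiles (50% = median), max
    summary = numeric.describe()

    # Missing values: percentage of NaN per column
    missing_pct = df.isna().mean() * 100

    # Correlations: pairwise matrix, then keep each strongly correlated pair once
    corr = numeric.corr(method=corr_method)
    cols = list(corr.columns)
    high_correlations = [
        (a, b, float(corr.loc[a, b]))
        for i, a in enumerate(cols)
        for b in cols[i + 1:]
        if abs(corr.loc[a, b]) >= corr_threshold  # NaN compares False, so it is skipped
    ]

    # Data quality: constant columns, high-cardinality text columns, duplicate rows
    constant = [c for c in df.columns if df[c].nunique(dropna=False) <= 1]
    high_cardinality = [
        c for c in non_numeric.columns
        if len(df) and df[c].nunique() / len(df) > high_card_ratio
    ]
    n_duplicates = int(df.duplicated().sum())

    # Warnings: strongly skewed numeric columns and share of zeros per column
    skew = numeric.skew()
    skewed = skew[skew.abs() > skew_threshold]
    zeros_pct = (numeric == 0).mean() * 100

    return {
        "summary": summary,
        "missing_pct": missing_pct,
        "high_correlations": high_correlations,
        "constant": constant,
        "high_cardinality": high_cardinality,
        "n_duplicates": n_duplicates,
        "skewed": skewed,
        "zeros_pct": zeros_pct,
    }
```

For example, `quick_profile(df)["high_correlations"]` lists column pairs whose absolute correlation is at least 0.9. The real report uses its own configurable thresholds and adds the visualizations that this sketch leaves out.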


In [2]:
# %pip installs into the environment of the running kernel
%pip install --user pandas

Collecting pandas
  Using cached pandas-2.0.3-cp38-cp38-win_amd64.whl.metadata (18 kB)
Collecting pytz>=2020.1 (from pandas)
  Using cached pytz-2025.2-py2.py3-none-any.whl.metadata (22 kB)
Collecting tzdata>=2022.1 (from pandas)
  Using cached tzdata-2025.2-py2.py3-none-any.whl.metadata (1.4 kB)
Collecting numpy>=1.20.3 (from pandas)
  Using cached numpy-1.24.4-cp38-cp38-win_amd64.whl.metadata (5.6 kB)
Using cached pandas-2.0.3-cp38-cp38-win_amd64.whl (10.8 MB)
Using cached numpy-1.24.4-cp38-cp38-win_amd64.whl (14.9 MB)
Using cached pytz-2025.2-py2.py3-none-any.whl (509 kB)
Using cached tzdata-2025.2-py2.py3-none-any.whl (347 kB)
Installing collected packages: pytz, tzdata, numpy, pandas
Successfully installed numpy-1.24.4 pandas-2.0.3 pytz-2025.2 tzdata-2025.2




In [4]:
# %pip installs into the environment of the running kernel
%pip install --user ydata_profiling

Collecting ydata_profiling
  Using cached ydata_profiling-4.16.1-py2.py3-none-any.whl.metadata (22 kB)
Collecting scipy<1.16,>=1.4.1 (from ydata_profiling)
  Using cached scipy-1.10.1-cp38-cp38-win_amd64.whl.metadata (58 kB)
Collecting matplotlib<=3.10,>=3.5 (from ydata_profiling)
  Using cached matplotlib-3.7.5-cp38-cp38-win_amd64.whl.metadata (5.8 kB)
Collecting pydantic>=2 (from ydata_profiling)
  Using cached pydantic-2.10.6-py3-none-any.whl.metadata (30 kB)
Collecting PyYAML<6.1,>=5.0.0 (from ydata_profiling)
  Using cached PyYAML-6.0.2-cp38-cp38-win_amd64.whl.metadata (2.1 kB)
Collecting jinja2<3.2,>=2.11.1 (from ydata_profiling)
  Using cached jinja2-3.1.6-py3-none-any.whl.metadata (2.9 kB)
Collecting visions<0.8.2,>=0.7.5 (from visions[type_image_path]<0.8.2,>=0.7.5->ydata_profiling)
  Using cached visions-0.8.1-py3-none-any.whl.metadata (11 kB)
Collecting htmlmin==0.1.12 (from ydata_profiling)
  Using cached htmlmin-0.1.12-py3-none-any.whl
Collecting phik<0.13,>=0.11.1 (from y



In [7]:
import pandas as pd
# from pandas_profiling import ProfileReport  # old package name; the project is now ydata-profiling
from ydata_profiling import ProfileReport

  from .autonotebook import tqdm as notebook_tqdm
Matplotlib is building the font cache; this may take a moment.


In [5]:
url = 'https://raw.githubusercontent.com/rashakil-ds/Public-Datasets/refs/heads/main/credit.csv'
df = pd.read_csv(url)
df.head()

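|   | LIMIT_BAL | SEX | EDUCATION | MARRIAGE | AGE | PAY_1 | PAY_2 | PAY_3 | PAY_4 | PAY_5 | ... | BILL_AMT4 | BILL_AMT5 | BILL_AMT6 | PAY_AMT1 | PAY_AMT2 | PAY_AMT3 | PAY_AMT4 | PAY_AMT5 | PAY_AMT6 | default |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 20000 | 2 | 2 | 1 | 24 | 2 | 2 | -1 | -1 | -2 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 689.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1 |
| 1 | 90000 | 2 | 2 | 2 | 34 | 0 | 0 | 0 | 0 | 0 | ... | 14331.0 | 14948.0 | 15549.0 | 1518.0 | 1500.0 | 1000.0 | 1000.0 | 1000.0 | 5000.0 | 0 |
| 2 | 50000 | 2 | 2 | 1 | 37 | 0 | 0 | 0 | 0 | 0 | ... | 28314.0 | 28959.0 | 29547.0 | 2000.0 | 2019.0 | 1200.0 | 1100.0 | 1069.0 | 1000.0 | 0 |
| 3 | 50000 | 1 | 2 | 1 | 57 | -1 | 0 | -1 | 0 | 0 | ... | 20940.0 | 19146.0 | 19131.0 | 2000.0 | 36681.0 | 10000.0 | 9000.0 | 689.0 | 679.0 | 0 |
| 4 | 50000 | 1 | 1 | 2 | 37 | 0 | 0 | 0 | 0 | 0 | ... | 19394.0 | 19619.0 | 20024.0 | 2500.0 | 1815.0 | 657.0 | 1000.0 | 1000.0 | 800.0 | 0 |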


In [6]:
df.shape

(24000, 24)

# Generate a profiling report

In [14]:
# from IPython.display import display
# from ydata_profiling.report.presentation.flavours.widget.notebook import (
#      get_notebook_iframe,
# )

In [20]:
profile = ProfileReport(df, title="Pandas Profiling Report", explorative=True)
# profile
# print(profile)
# To view the report inside Jupyter Notebook / VS Code Notebook
# (this needs the ipywidgets package installed in the kernel)
profile.to_notebook_iframe()

# Or, to save it only as an HTML file
profile.to_file("report.html")
print("✅ report.html has been created")

ModuleNotFoundError: No module named 'ipywidgets'
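
The `ModuleNotFoundError` above means the kernel has no `ipywidgets`, which the inline `to_notebook_iframe()` display depends on (the later cell that only calls `to_file` runs fine). The direct fix is to install it, for example with `%pip install ipywidgets`, and restart the kernel. If the notebook must also run where ipywidgets may be missing, a small guard can fall back to writing the HTML file instead. The helper names `has_ipywidgets` and `show_or_save` are hypothetical and not part of ydata-profiling.

```python
import importlib.util


def has_ipywidgets() -> bool:
    """Return True if the ipywidgets package can be found in this environment."""
    return importlib.util.find_spec("ipywidgets") is not None


def show_or_save(profile, path: str = "report.html") -> str:
    """Display the report inline when ipywidgets is available, else save it as HTML.

    `profile` is expected to be a ydata_profiling ProfileReport; any object with
    `to_notebook_iframe()` and `to_file(path)` methods works.
    Returns "iframe" or "file" to say which branch ran.
    """
    if has_ipywidgets():
        profile.to_notebook_iframe()
        return "iframe"
    profile.to_file(path)
    return "file"
```

With the `profile` from the cell above, `show_or_save(profile)` either renders the report in the notebook or writes `report.html`, without raising the import error.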

In [11]:
import pandas as pd
from ydata_profiling import ProfileReport

# df = pd.read_csv("your_file.csv")
profile = ProfileReport(df)
# profile.to_notebook_iframe()
profile.to_file("report.html")   # writes the report to report.html; it is not displayed inline

100%|██████████| 24/24 [00:00<00:00, 39.71it/s]
Summarize dataset: 100%|██████████| 474/474 [01:13<00:00,  6.43it/s, Completed]                   
Generate report structure: 100%|██████████| 1/1 [00:11<00:00, 11.17s/it]
Render HTML: 100%|██████████| 1/1 [00:15<00:00, 15.91s/it]
Export report to file: 100%|██████████| 1/1 [00:00<00:00,  1.70it/s]
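
In this run, the "Summarize dataset" step alone took about 1 minute 13 seconds for 24,000 rows and 24 columns. Two common ways to cut this on larger data are ydata-profiling's minimal mode (`ProfileReport(df, minimal=True)`), which per its documentation disables expensive computations such as correlations and duplicate-row detection, and profiling a random sample of rows. A minimal sketch of the sampling step, assuming a hypothetical helper `sample_for_profiling` and an arbitrary 10,000-row cap:

```python
import pandas as pd


def sample_for_profiling(df: pd.DataFrame, max_rows: int = 10_000,
                         seed: int = 42) -> pd.DataFrame:
    """Return df unchanged if it is small enough, otherwise a reproducible random sample.

    max_rows and seed are illustrative defaults, not ydata-profiling settings.
    """
    if len(df) <= max_rows:
        return df
    # A fixed random_state keeps the sample (and therefore the report) reproducible
    return df.sample(n=max_rows, random_state=seed)
```

`profile = ProfileReport(sample_for_profiling(df), minimal=True)` would then profile at most 10,000 rows. The trade-off is that every statistic describes the sample rather than the full dataset, so rare categories and duplicate rows can be missed or undercounted.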
