# Exploratory Data Analysis (EDA) Template

## Objective
To systematically analyze datasets and extract meaningful insights before modeling.

## Steps Covered
Setup (imports) comes first, followed by the numbered sections:

1. Data Loading
2. Data Overview
3. Missing Value Analysis
4. Univariate Analysis
5. Bivariate Analysis
6. Feature Engineering Ideas
7. Business Insights Summary

In [5]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

plt.style.use("ggplot")
%matplotlib inline

## 1. Data Loading

In [15]:
# Example: replace "data.csv" with the path to your dataset
df = pd.read_csv("data.csv")

## 2. Data Overview

In [None]:
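# A notebook cell only renders its last expression, so display each result explicitly
print(df.shape)                       # (rows, columns)
display(df.head())                    # first five rows
df.info()                             # dtypes and non-null counts (prints directly)
display(df.describe(include='all'))   # summary statistics for all columns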
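
The overview can also be condensed into one table per column, which is easier to scan on wide datasets. The following is a minimal sketch, not part of the original template: the `overview` helper and the toy `demo` frame (with its `age` and `city` columns) are hypothetical names used only for illustration.

```python
import numpy as np
import pandas as pd


def overview(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per column: dtype, non-null count, and unique count."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "non_null": df.notna().sum(),
        "n_unique": df.nunique(dropna=True),
    })


# Hypothetical toy data standing in for a real dataset
demo = pd.DataFrame({
    "age": [25, 32, np.nan, 41, 32],
    "city": ["Pune", "Delhi", "Pune", None, "Delhi"],
})

summary = overview(demo)
n_duplicates = int(demo.duplicated().sum())  # fully duplicated rows

print(summary)
print("duplicate rows:", n_duplicates)
```

Checking for duplicated rows at this stage is a cheap way to catch accidental double loads or bad joins before they distort later statistics.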

## 3. Missing Value Analysis

Goal: Identify missing data patterns and determine appropriate handling strategies.

In [None]:
# Count missing values
missing_counts = df.isnull().sum().sort_values(ascending=False)

# Percentage missing
missing_percent = (df.isnull().sum() / len(df)) * 100

missing_summary = pd.DataFrame({
    "Missing Count": missing_counts,
    "Missing %": missing_percent
}).sort_values(by="Missing %", ascending=False)

missing_summary

In [None]:
# Visualize missing values
sns.heatmap(df.isnull(), cbar=False)
plt.title("Missing Values Heatmap")
plt.show()
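
Once the missing-data pattern is known, one possible handling strategy is to drop very sparse columns and impute the rest. The sketch below assumes a 50% drop threshold, median imputation for numeric columns, and mode imputation for everything else; the `handle_missing` helper, the threshold, and the toy columns are all hypothetical choices, and the right strategy depends on why the data is missing.

```python
import numpy as np
import pandas as pd


def handle_missing(df: pd.DataFrame, drop_threshold: float = 0.5) -> pd.DataFrame:
    """Drop columns above the missing-fraction threshold, then impute the rest."""
    out = df.copy()

    # Fraction of missing values per column
    missing_frac = out.isnull().mean()
    out = out.drop(columns=missing_frac[missing_frac > drop_threshold].index)

    for col in out.columns:
        if out[col].isnull().any():
            if pd.api.types.is_numeric_dtype(out[col]):
                # Median is less sensitive to outliers than the mean
                out[col] = out[col].fillna(out[col].median())
            else:
                # Most frequent value for non-numeric columns
                out[col] = out[col].fillna(out[col].mode().iloc[0])
    return out


# Hypothetical toy data
demo = pd.DataFrame({
    "income": [40.0, np.nan, 55.0, 61.0],
    "segment": ["a", "b", None, "b"],
    "notes": [None, None, None, "x"],  # 75% missing, so it is dropped
})

cleaned = handle_missing(demo)
print(cleaned)
```

The function works on a copy, so the original frame stays available for comparing distributions before and after imputation.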

## 4. Univariate Analysis

Analyze numerical and categorical features separately.

In [None]:
numerical_cols = df.select_dtypes(include=np.number).columns

df[numerical_cols].hist(figsize=(12, 8))
plt.tight_layout()
plt.show()
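
Histograms are usefully complemented by a numeric summary of skewness and outliers. The following sketch uses the common 1.5 × IQR rule to count outliers per column; the rule, the `numeric_profile` helper, and the toy `price` and `qty` columns are illustrative assumptions rather than part of the original template.

```python
import numpy as np
import pandas as pd


def numeric_profile(df: pd.DataFrame) -> pd.DataFrame:
    """Return skewness and the count of 1.5*IQR outliers for each numeric column."""
    num = df.select_dtypes(include=np.number)
    q1 = num.quantile(0.25)
    q3 = num.quantile(0.75)
    iqr = q3 - q1

    # Boolean frame: True where a value falls outside the IQR fences
    outside = (num < q1 - 1.5 * iqr) | (num > q3 + 1.5 * iqr)

    return pd.DataFrame({"skew": num.skew(), "n_outliers": outside.sum()})


# Hypothetical toy data: "price" has one extreme value, "qty" is symmetric
demo = pd.DataFrame({
    "price": [10, 12, 11, 13, 12, 11, 95],
    "qty": [1, 2, 3, 4, 5, 6, 7],
})

profile = numeric_profile(demo)
print(profile)
```

Columns with a large positive skew are natural candidates for the log transformation considered later in the feature engineering step.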

In [None]:
categorical_cols = df.select_dtypes(exclude=np.number).columns

for col in categorical_cols:
    print(f"\nValue counts for {col}")
    print(df[col].value_counts())
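
Raw counts can be hard to compare across columns, so it may help to look at shares and cardinality as well. A minimal sketch, assuming a hypothetical `channel` column in a toy frame:

```python
import pandas as pd

# Hypothetical toy data with one missing value
demo = pd.DataFrame({"channel": ["web", "web", "store", "web", None, "app"]})

# Share of each level; dropna=False counts missing values as their own level
shares = demo["channel"].value_counts(normalize=True, dropna=False)

# Number of distinct non-missing levels
n_levels = demo["channel"].nunique()

print(shares)
print("distinct levels:", n_levels)
```

High-cardinality columns or levels with very small shares often need grouping before encoding.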

## 5. Bivariate Analysis

Examine relationships among features and between each feature and the target variable.

In [None]:
plt.figure(figsize=(10, 6))
sns.heatmap(df.corr(numeric_only=True), annot=True, cmap="coolwarm")
plt.title("Correlation Matrix")
plt.show()

In [None]:
# Example: numeric feature vs. categorical target
# ("target" and "feature_name" are placeholder column names)
sns.boxplot(x="target", y="feature_name", data=df)
plt.show()
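
The boxplot can be backed by numbers: per-class summary statistics and each feature's correlation with the target. The sketch below reuses the placeholder names `target` and `feature_name` on made-up values, and assumes a binary target encoded as 0/1.

```python
import pandas as pd

# Hypothetical toy data using the placeholder column names from the cell above
demo = pd.DataFrame({
    "target": [0, 0, 0, 1, 1, 1],
    "feature_name": [1.0, 2.0, 3.0, 4.0, 6.0, 8.0],
})

# Summary statistics of the feature within each target class
by_target = demo.groupby("target")["feature_name"].agg(["mean", "median", "std"])

# Pearson correlation of every numeric feature with the target
corr_with_target = demo.corr(numeric_only=True)["target"].drop("target")

print(by_target)
print(corr_with_target)
```

Pearson correlation only captures linear association, so a weak value here does not rule out a useful nonlinear relationship.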

## 6. Feature Engineering Ideas

- Consider log transformation for skewed variables.
- Create interaction terms.
- Generate ratios where relevant.
- Extract date/time components if applicable.
- Encode categorical variables appropriately.
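
The ideas above can be sketched on a small example. Everything here is hypothetical: the `revenue`, `visits`, `signup_date`, and `plan` columns are invented for illustration, and one-hot encoding is only one of several reasonable encodings.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data
demo = pd.DataFrame({
    "revenue": [100.0, 2500.0, 40.0, 900.0],
    "visits": [10, 50, 8, 30],
    "signup_date": pd.to_datetime(
        ["2023-01-15", "2023-06-03", "2024-02-29", "2024-11-20"]
    ),
    "plan": ["free", "pro", "free", "team"],
})

features = demo.copy()

# Log transformation for a skewed variable (log1p also handles zeros)
features["log_revenue"] = np.log1p(features["revenue"])

# Interaction term
features["revenue_x_visits"] = features["revenue"] * features["visits"]

# Ratio (a zero denominator would produce inf, so check for that on real data)
features["revenue_per_visit"] = features["revenue"] / features["visits"]

# Date/time components
features["signup_month"] = features["signup_date"].dt.month
features["signup_dayofweek"] = features["signup_date"].dt.dayofweek

# One-hot encoding of a categorical variable
features = pd.get_dummies(features, columns=["plan"], prefix="plan")

print(features.head())
```

Working on a copy keeps the raw columns intact, which makes it easy to compare engineered features against their sources.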

## 7. Business Insights Summary

Key findings:

1. 
2. 
3. 

Recommendations:

- 
- 
- 