## Intelligent Complaint Analysis

This system is designed to analyze customer complaints across the following financial product categories:

- **Credit Card**: Includes issues related to billing disputes, unauthorized transactions, APR confusion, and credit limit concerns.
- **Personal Loan**: Covers complaints such as hidden fees, unclear loan terms, payment difficulties, and debt collection practices.
- **Buy Now, Pay Later (BNPL)**: Focuses on complaints involving payment delays, refund issues, and deceptive interest disclosures.
- **Savings Account**: Analyzes concerns related to account freezes, unrecognized withdrawals, interest rate changes, and minimum balance requirements.
- **Money Transfer**: Investigates delays, transaction failures, incorrect account deposits, and excessive service charges.
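The raw `Product` values in a complaints file may not line up one-to-one with the five categories above. One option is keyword matching. In this minimal sketch, the `CATEGORY_KEYWORDS` table and the `map_product` helper are hypothetical. They tag each row with one of the five categories and drop everything else. The keywords and the toy product labels are illustrative assumptions, not values confirmed from the dataset.

```python
from typing import Optional

import pandas as pd

# Hypothetical keyword rules: the raw "Product" labels in a real complaints file may
# differ, so adjust these substrings to match the values actually present.
CATEGORY_KEYWORDS = {
    "Credit Card": ["credit card"],
    "Personal Loan": ["personal loan"],
    "Buy Now, Pay Later (BNPL)": ["buy now", "bnpl"],
    "Savings Account": ["savings"],
    "Money Transfer": ["money transfer"],
}


def map_product(raw: object) -> Optional[str]:
    """Return the first target category whose keyword appears in the raw label."""
    text = str(raw).lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return category
    return None  # product outside the five categories analyzed here


# Illustrative labels only, not taken from the dataset.
toy = pd.DataFrame({"Product": [
    "Credit card",
    "Checking or savings account",
    "Mortgage",
    "Money transfer, virtual currency, or money service",
]})
toy["category"] = toy["Product"].map(map_product)

# Keep only rows that fall into one of the five categories.
filtered = toy.dropna(subset=["category"])
print(filtered)
```

Keyword matching is easy to audit, but it depends on the exact wording of the product labels. Check the unmatched rows before relying on it.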


In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns


In [None]:
from pathlib import Path

# Resolve paths from the current working directory (run the notebook from the project root)
BASE_DIR = Path.cwd()
data_path = BASE_DIR / "data" / "raw" / "complaints.csv"

df = pd.read_csv(data_path)
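By default, `pd.read_csv` leaves date columns as plain strings. The sketch below assumes that the `Date received` and `Date sent to company` columns listed in the summary hold ISO-formatted dates. It parses them at load time with `parse_dates` and derives a hypothetical `days_to_forward` column. An in-memory CSV with made-up rows stands in for `complaints.csv` so the snippet runs on its own.

```python
import io

import pandas as pd

# Tiny in-memory stand-in for complaints.csv; the values are illustrative only.
csv_text = """Date received,Product,Date sent to company,Complaint ID
2023-01-05,Credit card,2023-01-05,1
2023-01-06,Checking or savings account,2023-01-09,2
"""

# parse_dates converts the listed columns to datetime64 while reading.
# low_memory=False makes the C parser infer each column's dtype from the whole
# file rather than chunk by chunk, which avoids mixed-type columns on large files.
sample = pd.read_csv(
    io.StringIO(csv_text),
    parse_dates=["Date received", "Date sent to company"],
    low_memory=False,
)

# Whole days between receiving a complaint and forwarding it to the company.
sample["days_to_forward"] = (
    sample["Date sent to company"] - sample["Date received"]
).dt.days

print(sample.dtypes)
print(sample[["Complaint ID", "days_to_forward"]])
```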

In [None]:
df.head()

In [None]:
df.describe()
# df.info()
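By default, `df.describe()` summarizes only numeric columns. Most fields listed in the summary below, such as `Product` and `Company`, are text. Passing `exclude=["number"]` to `DataFrame.describe` covers the non-numeric columns instead. For each one it reports the count, the number of unique values, the most frequent value (`top`), and that value's frequency (`freq`). The toy frame is illustrative. On the real data, the call would be `df.describe(exclude=["number"])`.

```python
import pandas as pd

# Illustrative rows only; not taken from the complaints file.
toy = pd.DataFrame({
    "Product": ["Credit card", "Credit card", "Money transfer"],
    "Submitted via": ["Web", "Phone", "Web"],
    "Complaint ID": [101, 102, 103],
})

# Default describe() keeps only numeric columns (here just "Complaint ID").
print(toy.describe())

# exclude=["number"] summarizes the text columns: count, unique, top, freq.
text_summary = toy.describe(exclude=["number"])
print(text_summary)
```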

In [None]:
# Count and percentage of missing (NaN) values per column

nan_info = pd.DataFrame({
    "missing_count": df.isna().sum(),
    "missing_percent": df.isna().mean() * 100
})

In [None]:
percentiles = df.describe(percentiles=[.25, .5, .75, .9, .95]).T
print("NaN count and Percent: \n", nan_info)
print("\nPercentiles: \n", percentiles)

### **Summary**
Missing values per column (count and percentage):

Columns with no missing values (0%):
- Date received
- Product
- Company
- Submitted via
- Date sent to company
- Timely response?
- Complaint ID

Columns with low missing (< 10%):
- Sub-product: 2.44%
- Issue: 0.00006%
- State: 0.56%
- ZIP code: 0.31%
- Company response to consumer: 0.0002%

Columns with moderate missing (10–50%):
- Consumer consent provided?: 17.16%
- Company public response: 49.64%

Columns with very high missing (> 50%):
- Consumer complaint narrative: 68.98%
- Tags: 93.45%
- Consumer disputed?: 92.00%
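The missing-value bands can be computed from a `nan_info` table with `pd.cut`. In this sketch, the bucket edges (exactly 0%, up to 10%, 10–50%, above 50%) and the toy frame are assumptions for illustration. Adjust the edges to whatever bands you want to report.

```python
import numpy as np
import pandas as pd

# Illustrative frame mixing complete and partially missing columns.
toy = pd.DataFrame({
    "Product": ["A", "B", "C", "D"],
    "State": ["CA", None, "NY", "TX"],
    "Tags": [None, None, None, "Older American"],
    "Company public response": [None, "x", None, "y"],
})

nan_info = pd.DataFrame({
    "missing_count": toy.isna().sum(),
    "missing_percent": toy.isna().mean() * 100,
})

# Assumed bands. Intervals are right-closed: (-inf, 0], (0, 10], (10, 50], (50, 100].
# The -inf first edge puts columns with exactly 0% missing in the "none" band.
nan_info["band"] = pd.cut(
    nan_info["missing_percent"],
    bins=[-np.inf, 0, 10, 50, 100],
    labels=["none", "low", "moderate", "very high"],
)
print(nan_info.sort_values("missing_percent", ascending=False))
```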

In [None]:
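# Visualize missing values: each row is a complaint, each column a field,
# and yellow cells mark missing entries (row labels are hidden as they carry no meaning)
plt.figure(figsize=(8, 8))
sns.heatmap(df.isna(), cbar=False, cmap="viridis", yticklabels=False)
plt.title("Missing values heatmap")
plt.tight_layout()
plt.show()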
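The heatmap draws one row per complaint, so it becomes dense on large files. A per-column bar chart of missing percentages is a compact complement. The sketch below uses a toy frame. On the real data, `df.isna().mean() * 100` (or `nan_info["missing_percent"]`) would take its place.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Illustrative frame only; not taken from the complaints file.
toy = pd.DataFrame({
    "Product": ["A", "B", "C", "D"],
    "State": ["CA", None, "NY", "TX"],
    "Tags": [None, None, None, "Older American"],
})

# Percentage of missing values per column, largest first.
missing_percent = toy.isna().mean().mul(100).sort_values(ascending=False)

fig, ax = plt.subplots(figsize=(8, 4))
missing_percent.plot.barh(ax=ax, color="steelblue")
ax.invert_yaxis()  # put the most-missing column at the top
ax.set_xlim(0, 100)
ax.set_xlabel("Missing values (%)")
ax.set_title("Share of missing values per column")
fig.tight_layout()
plt.show()
```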