# 03 Data Visualization

### Objective

This notebook uses the cleaned dataset to explore patterns, trends and relationships through visualizations. The goal is to transform cleaned data into clear, meaningful insights.

### Import Libraries

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

plt.style.use("seaborn-v0_8")
%matplotlib inline

### Load Cleaned Data

In [None]:
# Parse the date column on load so time-based plots get a real datetime axis
df = pd.read_csv("../data/processed/clean_dataset.csv", parse_dates=["date"])

df.head()

Confirmation: The data is loaded from the processed folder, which keeps it separate from the raw data.

### Overview of the Dataset

In [None]:
df.info()

Observation: All columns now have correct data types and minimal missing values, making the dataset suitable for analysis.
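The observation above can be checked more directly than by reading `df.info()`. The following is a minimal sketch, assuming a hypothetical helper named `summarize_quality`; the small frame is made up purely for illustration and is not the project dataset.

```python
import pandas as pd


def summarize_quality(frame: pd.DataFrame) -> pd.DataFrame:
    """Return the dtype, missing count, and missing percentage per column."""
    return pd.DataFrame({
        "dtype": frame.dtypes.astype(str),
        "missing": frame.isna().sum(),
        "missing_pct": (frame.isna().mean() * 100).round(2),
    })


# Made-up rows for illustration only (not the real data)
example = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-01", "2024-01-02", None]),
    "category": ["food", "travel", "food"],
    "amount": [12.5, None, 40.0],
})

quality = summarize_quality(example)
print(quality)
```

In the notebook itself, `summarize_quality(df)` would list any column that still has an unexpected type or remaining gaps.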

### Distribution of Transaction Amounts

In [None]:
plt.figure()
sns.histplot(df['amount'], kde=True)
plt.title("Distribution of Transaction Amounts")
plt.xlabel("Amount")
plt.ylabel("Frequency")
plt.show()

Insight: The distribution is right-skewed, indicating that most transactions are small, with a few high-value transactions.
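The skew seen in the histogram can also be expressed numerically: in a right-skewed distribution the mean typically sits above the median and the sample skewness is positive. This is a rough sketch with a hypothetical helper `describe_skew` and made-up amounts, not values from the dataset.

```python
import pandas as pd


def describe_skew(values: pd.Series) -> dict:
    """Return mean, median, and sample skewness, ignoring missing values."""
    values = values.dropna()
    return {
        "mean": values.mean(),
        "median": values.median(),
        "skewness": values.skew(),
    }


# Made-up amounts: many small values and one large one
example_amounts = pd.Series([5, 8, 10, 12, 15, 20, 250], dtype=float)
stats = describe_skew(example_amounts)
print(stats)
```

Calling `describe_skew(df['amount'])` in the notebook would give the corresponding figures for the real transactions.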

### Transaction Amount by Category

In [None]:
plt.figure()
sns.boxplot(x='category', y='amount', data=df)
plt.xticks(rotation=45)
plt.title("Transaction Amount by Category")
plt.show()

### Transactions Over Time

In [None]:
daily_totals = df.groupby('date')['amount'].sum().reset_index()

plt.figure()
plt.plot(daily_totals['date'], daily_totals['amount'])
plt.title("Total Transaction Amount Over Time")
plt.xlabel("Date")
plt.ylabel("Total Amount")
plt.show()

Insight:
Transaction totals fluctuate over time, with visible peaks that may correspond to specific events or periods.
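One possible way to make "visible peaks" concrete is to compare each daily total with a rolling median and flag days that exceed it by some factor. The helper `flag_peaks`, the 7-day window, and the 1.5 threshold are all assumptions chosen for illustration; the daily totals below are made up.

```python
import pandas as pd


def flag_peaks(daily: pd.Series, window: int = 7, threshold: float = 1.5) -> pd.DataFrame:
    """Flag days whose total exceeds `threshold` times the centered rolling median."""
    rolling_median = daily.rolling(window=window, center=True, min_periods=1).median()
    return pd.DataFrame({
        "total": daily,
        "rolling_median": rolling_median,
        "is_peak": daily > threshold * rolling_median,
    })


# Made-up daily totals with one obvious spike on the fifth day
dates = pd.date_range("2024-01-01", periods=10, freq="D")
example_daily = pd.Series(
    [100, 110, 95, 105, 400, 100, 98, 102, 107, 99], index=dates, dtype=float
)

peaks = flag_peaks(example_daily)
print(peaks[peaks["is_peak"]])
```

With the notebook data, `flag_peaks(daily_totals.set_index('date')['amount'])` would return the candidate peak days, which could then be checked against known events.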

### Category Contribution to Total Amount

In [None]:
category_totals = df.groupby('category')['amount'].sum().sort_values(ascending=False)

plt.figure()
category_totals.plot(kind='bar')
plt.title("Total Amount by Category")
plt.xlabel("Category")
plt.ylabel("Total Amount")
plt.show()

Insight:
A small number of categories contribute disproportionately to the total transaction amount.
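To quantify how concentrated the totals are, each category's share and the running cumulative share can be computed from the sorted totals. This is a minimal sketch; `cumulative_share` is a hypothetical helper and the category totals are invented for the example.

```python
import pandas as pd


def cumulative_share(totals: pd.Series) -> pd.DataFrame:
    """Return each category's share of the total and the cumulative share, largest first."""
    ordered = totals.sort_values(ascending=False)
    share = ordered / ordered.sum()
    return pd.DataFrame({
        "total": ordered,
        "share": share,
        "cumulative_share": share.cumsum(),
    })


# Made-up category totals for illustration only
example_totals = pd.Series(
    {"food": 1000.0, "rent": 6000.0, "misc": 500.0, "travel": 2500.0}
)

pareto = cumulative_share(example_totals)
print(pareto)
```

Applied as `cumulative_share(category_totals)`, the `cumulative_share` column shows how many categories are needed to reach a given fraction of the total amount.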

### Relationship Between Frequency and Value

In [None]:
transaction_counts = df['category'].value_counts()

summary = pd.DataFrame({
    'count': transaction_counts,
    'total_amount': category_totals
}).dropna()

plt.figure()
sns.scatterplot(x='count', y='total_amount', data=summary)
plt.title("Transaction Frequency vs Total Amount by Category")
plt.xlabel("Number of Transactions")
plt.ylabel("Total Amount")
plt.show()

Insight:
High transaction frequency does not always correspond to higher total value, indicating different category dynamics.
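The gap between frequency and value becomes easier to read when the average amount per transaction sits next to the count and total. The sketch below assumes the same `category` and `amount` column names used above; `frequency_vs_value` is a hypothetical helper and the rows are made up.

```python
import pandas as pd


def frequency_vs_value(
    frame: pd.DataFrame,
    category_col: str = "category",
    amount_col: str = "amount",
) -> pd.DataFrame:
    """Return count, total, and mean amount per category, sorted by total."""
    summary = frame.groupby(category_col)[amount_col].agg(
        count="count",
        total_amount="sum",
        mean_amount="mean",
    )
    return summary.sort_values("total_amount", ascending=False)


# Made-up rows: "food" is frequent but cheap, "travel" is rare but expensive
example = pd.DataFrame({
    "category": ["food"] * 5 + ["travel"] * 2,
    "amount": [10.0] * 5 + [300.0] * 2,
})

by_category = frequency_vs_value(example)
print(by_category)
```

Here the less frequent category has the larger total because its mean amount is much higher, which is the pattern the scatter plot hints at.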

### Key Findings Summary

- Most transactions are low-value, with a few significant outliers
- Spending patterns vary noticeably by category
- A small subset of categories drives most of the total value
- Temporal trends suggest periods of increased activity

### Business / Analytical Implications

These insights could inform:
- Resource allocation toward high-value categories
- Further investigation into peak transaction periods
- Targeted strategies based on category behavior

### Conclusion

This notebook demonstrates how cleaned data can be transformed into actionable insights through effective visualization. The project successfully moves from raw data to meaningful conclusions.