# Project: Data Cleaning and Visualization

### Objective
This notebook explores the raw dataset to understand its structure and identify data quality issues. At this stage, no cleaning or transformation is performed. The goal is to observe, document, and plan.

### Import Libraries

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

plt.style.use("seaborn-v0_8")
%matplotlib inline

### Load Raw Data

In [None]:
df = pd.read_csv("../data/raw/dataset.csv")
df.head()

Observation:
Each row represents a single record. The dataset appears to be transactional in nature.

### Dataset Size

In [None]:
df.shape

Observation:
The dataset contains several thousand records, making it suitable for exploratory analysis and visualization.

### Column Overview

In [None]:
df.columns

Observation:
Column names are understandable, but some use inconsistent formatting, which may affect readability and analysis.
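
One way to make the formatting inconsistency concrete is to list the names that deviate from a single convention. The sketch below assumes lowercase snake_case as the target convention, which is an assumption and not something the dataset dictates. The helper name `flag_inconsistent_columns` is hypothetical. It only reports names and renames nothing.

```python
import re

# Assumed target convention: lowercase snake_case (e.g. "order_id").
SNAKE_CASE = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")

def flag_inconsistent_columns(columns):
    """Return the column names that are not already lowercase snake_case."""
    return [name for name in columns if not SNAKE_CASE.match(str(name))]

# Usage in this notebook (read-only): flag_inconsistent_columns(df.columns)
```

The returned list can be copied into the cleaning plan as the set of columns to rename later.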

### Data Types and Missing Values

In [None]:
df.info()

Key Observations:
- Some numeric fields are stored as text
- Date fields are not yet in datetime format
- Several columns contain missing values
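
To document which text columns are likely numeric, a read-only probe such as the one below may help. It is a minimal sketch: the helper name `numeric_like_text_columns` is hypothetical, and the probe measures only how many non-null values would parse with `pd.to_numeric`, without converting anything in `df`.

```python
import pandas as pd

def numeric_like_text_columns(frame):
    """For each non-numeric, non-datetime column, return the share of
    non-null values that would parse as numbers (0.0 to 1.0)."""
    shares = {}
    for col in frame.columns:
        column = frame[col]
        # Skip columns that already have a numeric or datetime dtype.
        if pd.api.types.is_numeric_dtype(column):
            continue
        if pd.api.types.is_datetime64_any_dtype(column):
            continue
        values = column.dropna()
        if values.empty:
            shares[col] = 0.0
            continue
        # errors="coerce" turns unparseable entries into NaN instead of raising.
        parsed = pd.to_numeric(values, errors="coerce")
        shares[col] = float(parsed.notna().mean())
    return pd.Series(shares, dtype="float64")

# Usage in this notebook (read-only): numeric_like_text_columns(df)
```

A share close to 1.0 suggests a numeric field stored as text. A similar probe with `pd.to_datetime(..., errors="coerce")` could be tried on the date fields, with the caveat that bare numbers may also parse as dates.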

### Summary Statistics

In [None]:
df.describe(include='all')

Observation:
Numeric columns show wide ranges, suggesting the possible presence of outliers.
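
The outlier suspicion can be quantified without changing the data. The sketch below counts values outside the conventional 1.5 × IQR fences for each numeric column. The IQR rule and the default `k=1.5` are assumptions chosen for illustration, and the helper name `iqr_outlier_counts` is hypothetical.

```python
import pandas as pd

def iqr_outlier_counts(frame, k=1.5):
    """Count, per numeric column, the values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    numeric = frame.select_dtypes(include="number")
    q1 = numeric.quantile(0.25)
    q3 = numeric.quantile(0.75)
    iqr = q3 - q1
    # Comparisons align on column names; missing values compare as False.
    outside = (numeric < q1 - k * iqr) | (numeric > q3 + k * iqr)
    return outside.sum()

# Usage in this notebook (read-only): iqr_outlier_counts(df)
```

Numeric fields that are still stored as text are not covered by this count until they are converted during cleaning.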

### Missing Values Check

In [None]:
df.isnull().sum()

Observation:
Certain columns have a high proportion of missing values and will require careful handling during cleaning.
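
Because the observation is about proportions, it may help to show the missing share next to the raw counts. A minimal sketch follows. The helper name `missing_report` and its column labels are hypothetical.

```python
import pandas as pd

def missing_report(frame):
    """Return missing-value counts and shares per column, worst first."""
    report = pd.DataFrame({
        "missing_count": frame.isnull().sum(),
        # The mean of a boolean mask is the fraction of missing entries.
        "missing_share": frame.isnull().mean(),
    })
    return report.sort_values("missing_share", ascending=False)

# Usage in this notebook (read-only): missing_report(df)
```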