# Task 01 — Data Visualization (Histogram + Bar Chart)

**Goal:** Visualize the distribution of a numeric feature and communicate insights clearly.

This notebook reads a small local CSV file (`data/ages_sample.csv`), so it runs without network access.


In [None]:
import pandas as pd
import matplotlib.pyplot as plt

DATA_PATH = "../../data/ages_sample.csv"
df = pd.read_csv(DATA_PATH)

df.head()

In [None]:
df.describe()

## 1) Histogram — Age distribution

In [None]:
plt.figure(figsize=(8,4))
plt.hist(df["age"], bins=10)
plt.title("Age Distribution (sample)")
plt.xlabel("Age")
plt.ylabel("Count")
plt.tight_layout()
plt.show()

## 2) Bar chart — Age groups

Grouping ages into fixed-width, labeled ranges can make the overall pattern easier to read than individual values.
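
Labels such as "16-20" only describe the groups correctly if `pd.cut` treats the bin edges the way you expect. The sketch below uses a few made-up ages (not taken from `ages_sample.csv`) and hypothetical names (`edges`, `names`) to compare right-closed bins with and without `include_lowest=True`. It assumes integer ages.

```python
import pandas as pd

# Made-up ages chosen to sit on or next to the bin edges (illustration only).
ages = pd.Series([15, 16, 20, 21, 25])

edges = [15, 20, 25]        # right-closed intervals: (15, 20], (20, 25]
names = ["16-20", "21-25"]  # labels written for integer ages

# Default edge handling: the lowest edge (15) is excluded and becomes NaN.
default_groups = pd.cut(ages, bins=edges, labels=names, right=True)

# include_lowest=True closes the first interval on the left, so 15 is kept.
inclusive_groups = pd.cut(
    ages, bins=edges, labels=names, right=True, include_lowest=True
)

comparison = pd.DataFrame(
    {"age": ages, "default": default_groups, "include_lowest": inclusive_groups}
)
print(comparison)
```

With `include_lowest=True`, an age equal to the lowest edge is placed in the first group even though the label says "16-20". Without it, that age is left unassigned (NaN) and does not appear in `value_counts()`.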


In [None]:
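# Right-closed intervals (15, 20], (20, 25], ... match the labels for integer ages.
bins = [15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65]
labels = ["16-20", "21-25", "26-30", "31-35", "36-40", "41-45", "46-50", "51-55", "56-60", "61-65"]
# Ages outside 16-65 become NaN; include_lowest is omitted so that 15 is not labeled "16-20".
df["age_group"] = pd.cut(df["age"], bins=bins, labels=labels, right=True)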

age_counts = df["age_group"].value_counts().sort_index()
age_counts

In [None]:
plt.figure(figsize=(9,4))
plt.bar(age_counts.index.astype(str), age_counts.values)
plt.title("Age Groups (sample)")
plt.xlabel("Age group")
plt.ylabel("Count")
plt.xticks(rotation=45, ha="right")
plt.tight_layout()
plt.show()

## Insights (example)

- The majority of records fall in the **20–40** range.
- Very few records are above **55**, so any model or analysis that relies on that range is exposed to small-sample noise.
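
Statements like the ones above are easier to trust when backed by numbers. The following is a minimal sketch of one way to check them. The helper `summarize_age_ranges` and its parameters are hypothetical (not part of the original notebook), and the example values are made up. In the notebook you would pass `df["age"]` instead.

```python
import pandas as pd


def summarize_age_ranges(ages: pd.Series, low: int = 20, high: int = 40, tail: int = 55) -> dict:
    """Return the share of ages in [low, high] and the count of ages above tail."""
    ages = ages.dropna()                # ignore missing values
    in_range = ages.between(low, high)  # inclusive on both ends
    return {
        "n": int(len(ages)),
        "share_in_range": float(in_range.mean()),
        "n_above_tail": int((ages > tail).sum()),
    }


# Made-up values for illustration only; replace with df["age"] in the notebook.
example_ages = pd.Series([18, 22, 27, 31, 35, 38, 44, 52, 58, 63])
summary = summarize_age_ranges(example_ages)
print(summary)
```

A share above 0.5 supports the "majority" wording, and a small `n_above_tail` relative to `n` makes the small-sample caveat concrete.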
