
**Part 1: Age Data Binning**

Generate random age data for 100 samples and perform binning to categorize
ages into different groups (e.g., 0-18, 19-35, 36-50, 51+).

In [1]:
import numpy as np
import pandas as pd

# 1. Generate random age data for 100 samples
np.random.seed(42) # for reproducibility
ages = np.random.randint(0, 91, 100) # Ages from 0 to 90

# Create a DataFrame for the ages
age_df = pd.DataFrame({'Age': ages})

# 2. Define the age bins and labels
bins = [0, 18, 35, 50, np.inf] # np.inf for the upper bound of the last bin
labels = ['0-18', '19-35', '36-50', '51+']

# 3. Perform binning to categorize ages
age_df['Age_Category'] = pd.cut(age_df['Age'], bins=bins, labels=labels, right=True, include_lowest=True)

# 4. Display the distribution of ages within these bins
print("Distribution of Ages by Category:\n")
print(age_df['Age_Category'].value_counts().sort_index().to_markdown(numalign="left", stralign="left"))

# Display the first few rows of the DataFrame with raw and binned ages
print("\nFirst 10 rows of the generated age data with categories:\n")
print(age_df.head(10).to_markdown(index=False, numalign="left", stralign="left"))

Distribution of Ages by Category:

| Age_Category   | count   |
|:---------------|:--------|
| 0-18           | 21      |
| 19-35          | 14      |
| 36-50          | 15      |
| 51+            | 50      |

First 10 rows of the generated age data with categories:

| Age   | Age_Category   |
|:------|:---------------|
| 51    | 51+            |
| 14    | 0-18           |
| 71    | 51+            |
| 60    | 51+            |
| 20    | 19-35          |
| 82    | 51+            |
| 86    | 51+            |
| 74    | 51+            |
| 74    | 51+            |
| 87    | 51+            |
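
The bin edges above use `right=True`, so each interval is closed on the right: 18 falls in `0-18` and 19 in `19-35`. The sketch below checks this on a handful of boundary ages and shows what `include_lowest=True` contributes. The values in `edge_ages` and the names `with_lowest` and `without_lowest` are illustrative assumptions, not part of the exercise data.

```python
import numpy as np
import pandas as pd

# Hand-picked boundary ages (illustrative, not drawn from the random sample)
edge_ages = pd.Series([0, 18, 19, 35, 36, 50, 51, 90])

# Same bins and labels as in the exercise
bins = [0, 18, 35, 50, np.inf]
labels = ['0-18', '19-35', '36-50', '51+']

# With include_lowest=True the first interval becomes [0, 18], so age 0 is kept
with_lowest = pd.cut(edge_ages, bins=bins, labels=labels,
                     right=True, include_lowest=True)

# Without it the first interval is (0, 18], so age 0 falls outside every bin
without_lowest = pd.cut(edge_ages, bins=bins, labels=labels, right=True)

edge_check = pd.DataFrame({
    'Age': edge_ages,
    'with_lowest': with_lowest,
    'without_lowest': without_lowest,
})
print(edge_check)
```

Since `np.random.randint(0, 91, 100)` can return 0, dropping `include_lowest=True` would leave any such age uncategorized (NaN) rather than in `0-18`.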


**Part 2: Generate Random Data and Create Plots**

Create a dataset with 100 random samples and generate four different plots:
scatter plot, pie chart, histogram, and box plot.

Tools: numpy, pandas, matplotlib, seaborn

In [3]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# 1. Create a dataset with 100 random samples
np.random.seed(42) # for reproducibility

data = {
    'Feature_X': np.random.randint(0, 101, 100),
    'Feature_Y': np.random.randint(0, 101, 100),
    'Category': np.random.choice(['A', 'B', 'C', 'D'], 100),
    'Value_for_Histogram': np.random.normal(loc=50, scale=15, size=100) # Normal distribution around 50
}
df_random = pd.DataFrame(data)

print("First 5 rows of the generated random dataset:")
print(df_random.head().to_markdown(index=False, numalign="left", stralign="left"))

# Set Seaborn style for better aesthetics
sns.set_style("whitegrid")

# 2. Generate Scatter Plot
plt.figure(figsize=(8, 6))
sns.scatterplot(x='Feature_X', y='Feature_Y', data=df_random, hue='Category', palette='viridis')
plt.title('Scatter Plot of Feature_X vs Feature_Y')
plt.xlabel('Feature_X')
plt.ylabel('Feature_Y')
plt.grid(True)
plt.savefig('scatter_plot.png')
plt.close() # Close plot to free memory

# 3. Generate Pie Chart
plt.figure(figsize=(8, 8))
category_counts = df_random['Category'].value_counts()
plt.pie(category_counts, labels=category_counts.index, autopct='%1.1f%%', startangle=140, colors=sns.color_palette('pastel'))
plt.title('Distribution of Categories')
plt.axis('equal') # Equal aspect ratio ensures that pie is drawn as a circle.
plt.savefig('pie_chart.png')
plt.close()

# 4. Generate Histogram
plt.figure(figsize=(8, 6))
sns.histplot(df_random['Value_for_Histogram'], bins=10, kde=True, color='skyblue')
plt.title('Histogram of Value_for_Histogram')
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.grid(True)
plt.savefig('histogram_plot.png')
plt.close()

# 5. Generate Box Plot
plt.figure(figsize=(8, 6))
sns.boxplot(x='Category', y='Value_for_Histogram', data=df_random, hue='Category', palette='coolwarm', legend=False)
plt.title('Box Plot of Value_for_Histogram by Category')
plt.xlabel('Category')
plt.ylabel('Value_for_Histogram')
plt.grid(True)
plt.savefig('box_plot.png')
plt.close()

print("\nGenerated scatter_plot.png, pie_chart.png, histogram_plot.png, and box_plot.png.")

First 5 rows of the generated random dataset:
| Feature_X   | Feature_Y   | Category   | Value_for_Histogram   |
|:------------|:------------|:-----------|:----------------------|
| 51          | 23          | C          | 38.2512               |
| 92          | 25          | B          | 45.1691               |
| 14          | 88          | D          | 62.2028               |
| 71          | 59          | C          | 31.537                |
| 60          | 40          | A          | 53.4119               |

Generated scatter_plot.png, pie_chart.png, histogram_plot.png, and box_plot.png.
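
As an alternative to four separate image files, the same four plot types can be drawn on one figure with `plt.subplots`. This is a minimal sketch using matplotlib only; the DataFrame `df_demo`, the `Value` column, and the output name `combined_plots.png` are assumptions made for illustration, and the data is regenerated with `np.random.default_rng`, so it will not match the table above.

```python
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Illustrative data with the same shape as the exercise dataset (values differ)
rng = np.random.default_rng(42)
df_demo = pd.DataFrame({
    'Feature_X': rng.integers(0, 101, 100),
    'Feature_Y': rng.integers(0, 101, 100),
    'Category': rng.choice(['A', 'B', 'C', 'D'], 100),
    'Value': rng.normal(loc=50, scale=15, size=100),
})

fig, axes = plt.subplots(2, 2, figsize=(12, 10))

# Top-left: scatter plot
axes[0, 0].scatter(df_demo['Feature_X'], df_demo['Feature_Y'])
axes[0, 0].set_title('Scatter: Feature_X vs Feature_Y')

# Top-right: pie chart of category counts
counts = df_demo['Category'].value_counts().sort_index()
axes[0, 1].pie(counts, labels=counts.index, autopct='%1.1f%%')
axes[0, 1].set_title('Category share')

# Bottom-left: histogram
axes[1, 0].hist(df_demo['Value'], bins=10)
axes[1, 0].set_title('Histogram of Value')

# Bottom-right: one box per category
groups = [df_demo.loc[df_demo['Category'] == c, 'Value'].to_numpy()
          for c in counts.index]
axes[1, 1].boxplot(groups)
axes[1, 1].set_xticks(range(1, len(groups) + 1))
axes[1, 1].set_xticklabels(counts.index)
axes[1, 1].set_title('Value by Category')

fig.tight_layout()
out_path = Path('combined_plots.png')  # hypothetical output file name
fig.savefig(out_path)
plt.close(fig)  # release the figure's memory
```

Working with the `axes` objects rather than the global `plt` state keeps each panel's title and ticks tied to its own subplot, which tends to be easier to follow once a figure has more than one plot.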
