
# Assessment Module 1: Exploration & Analysis of Real-Time Data Modalities

## Chosen Modality: Image Data

**Objective:**  
Explore, preprocess, and perform basic exploratory data analysis (EDA) on a real-world image dataset.



## 1. Dataset Identification

- **Dataset Name:** CIFAR-10  
- **Source:** https://www.cs.toronto.edu/~kriz/cifar.html  
- **Type of Data:** Image Data  
- **Problem Domain:** Computer Vision / Object Recognition  
- **Number of Samples:**  
  - Training images: 50,000  
  - Test images: 10,000  
- **Image Size:** 32 × 32 pixels  
- **Color Channels:** RGB (3 channels)  
- **Classes:** 10 (airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck)


In [None]:

# Import required libraries (seaborn dropped: it is not used anywhere below)
import numpy as np
import matplotlib.pyplot as plt
from tensorflow.keras.datasets import cifar10



## 2. Load the Dataset


In [None]:

# Load CIFAR-10 dataset
(X_train, y_train), (X_test, y_test) = cifar10.load_data()

print("Training data shape:", X_train.shape)
print("Test data shape:", X_test.shape)
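Beyond the shapes printed above, it can help to confirm the pixel data type, value range, and label layout before going further. The following is a minimal sketch, not part of the original notebook: `summarize_split` is a hypothetical helper, and the demo runs on a small synthetic array that only mimics CIFAR-10's layout (assumed here to be `uint8` images of shape `(N, 32, 32, 3)` with labels of shape `(N, 1)`), so that it does not depend on downloading the dataset.

```python
import numpy as np


def summarize_split(images, labels):
    """Collect basic facts about one dataset split in a dict (hypothetical helper)."""
    return {
        "num_samples": images.shape[0],
        "image_shape": tuple(images.shape[1:]),
        "dtype": str(images.dtype),
        "min": int(images.min()),
        "max": int(images.max()),
        "label_shape": tuple(labels.shape),
    }


# Synthetic stand-in for the real arrays (assumption: same layout as CIFAR-10).
rng = np.random.default_rng(0)
demo_images = rng.integers(0, 256, size=(100, 32, 32, 3), dtype=np.uint8)
demo_labels = rng.integers(0, 10, size=(100, 1))

summary = summarize_split(demo_images, demo_labels)
for key, value in summary.items():
    print(f"{key}: {value}")
```

In the notebook itself, the same helper could be called as `summarize_split(X_train, y_train)` and `summarize_split(X_test, y_test)`.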



## 3. Sample Image Visualization


In [None]:

# Class labels
class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer',
               'dog', 'frog', 'horse', 'ship', 'truck']

# Display sample images
plt.figure(figsize=(10,5))
for i in range(10):
    plt.subplot(2,5,i+1)
    plt.imshow(X_train[i])
    plt.title(class_names[y_train[i][0]])
    plt.axis('off')
plt.tight_layout()
plt.show()
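The sample grid shows individual labels but not how the labels are spread across the 10 classes. A minimal sketch of a class-count check follows, assuming labels are non-negative integers stored in an array of shape `(N, 1)`, as implied by the `y_train[i][0]` indexing above. `class_counts` and the toy labels are illustrative names and data, not part of the original notebook.

```python
import numpy as np


def class_counts(labels, num_classes=10):
    """Count how many samples fall into each class index 0..num_classes-1."""
    flat = np.asarray(labels).ravel()  # (N, 1) -> (N,)
    return np.bincount(flat, minlength=num_classes)


# Toy labels in the same (N, 1) layout; the values are made up for illustration.
demo_labels = np.array([[0], [3], [3], [9], [0], [3]])
counts = class_counts(demo_labels)
print(counts)
```

Calling `class_counts(y_train)` on the real labels and pairing the result with `class_names` would show whether the training split is balanced, which is what the dataset page's description of 6,000 images per class suggests.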



## 4. Image Dimension & Channel Analysis


In [None]:

# Image dimensions
height, width, channels = X_train.shape[1:]
print(f"Image Height: {height}")
print(f"Image Width: {width}")
print(f"Color Channels: {channels}")



## 5. Pixel Intensity Distribution


In [None]:

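# Flatten to one row per pixel and one column per channel (R, G, B)
pixels = X_train.reshape(-1, 3)

# Plot pixel intensity distribution with one bin per intensity value (0-255),
# which avoids the uneven bin widths that 50 bins over 256 integer values produce
plt.figure(figsize=(8,5))
plt.hist(pixels[:,0], bins=256, range=(0, 256), color='red', alpha=0.5, label='Red')
plt.hist(pixels[:,1], bins=256, range=(0, 256), color='green', alpha=0.5, label='Green')
plt.hist(pixels[:,2], bins=256, range=(0, 256), color='blue', alpha=0.5, label='Blue')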
plt.xlabel("Pixel Intensity")
plt.ylabel("Frequency")
plt.legend()
plt.title("Pixel Intensity Distribution (RGB Channels)")
plt.show()
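If the histogram values are needed as numbers rather than as a plot, the counts can be computed directly. This is a minimal sketch under the assumption that images are `uint8` with channels last. `channel_histograms` is a hypothetical helper, and the tiny demo array is hand-built so the counts are easy to check by eye.

```python
import numpy as np


def channel_histograms(images):
    """Return an array of shape (channels, 256) with exact counts per intensity."""
    num_channels = images.shape[-1]
    pixels = images.reshape(-1, num_channels)  # one row per pixel
    return np.stack(
        [np.bincount(pixels[:, c], minlength=256) for c in range(num_channels)]
    )


# Hand-built 2x2 image: red fully on, one green pixel at 10, blue all zero.
demo = np.zeros((1, 2, 2, 3), dtype=np.uint8)
demo[..., 0] = 255
demo[0, 0, 0, 1] = 10

hist = channel_histograms(demo)
print(hist.shape)
print("red at 255:", hist[0, 255], "| green at 10:", hist[1, 10], "| blue at 0:", hist[2, 0])
```

Each row of the result sums to the total number of pixels, which gives a quick consistency check when the helper is applied to `X_train`.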



## 6. Noise / Distortion Observation

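- CIFAR-10 images are low resolution (32×32 pixels), so fine detail such as texture and small object parts is largely lost.
- Some images appear blurry or pixelated, especially when displayed at a larger size than their native resolution.
- Backgrounds are often cluttered, because the images are real-world photographs rather than isolated shots of the object.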
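These observations are qualitative. One common way to put a rough number on blur is the variance of the Laplacian: a low value indicates few sharp intensity changes. The sketch below is an illustrative technique added here, not something the original notebook does. It uses a plain channel average for grayscale and a 4-neighbour Laplacian on interior pixels, and `laplacian_variance` is a hypothetical helper name. A high score can reflect either genuine detail or noise, so it should be read as a relative indicator only.

```python
import numpy as np


def laplacian_variance(image):
    """Variance of a 4-neighbour Laplacian over the interior pixels of one image."""
    gray = image.astype(np.float64).mean(axis=-1)  # simple average of R, G, B
    laplacian = (
        gray[:-2, 1:-1]      # pixel above
        + gray[2:, 1:-1]     # pixel below
        + gray[1:-1, :-2]    # pixel to the left
        + gray[1:-1, 2:]     # pixel to the right
        - 4.0 * gray[1:-1, 1:-1]
    )
    return float(laplacian.var())


# Synthetic examples (assumption: 32x32x3 uint8, like CIFAR-10 images).
flat_image = np.full((32, 32, 3), 128, dtype=np.uint8)  # no edges at all
rng = np.random.default_rng(0)
noisy_image = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)

flat_score = laplacian_variance(flat_image)
noisy_score = laplacian_variance(noisy_image)
print("flat:", flat_score)
print("noisy:", noisy_score)
```

Applied to the dataset, something like `[laplacian_variance(img) for img in X_train[:1000]]` would allow the lowest-scoring images to be displayed and inspected for blur.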



## 7. Basic Statistical Analysis


In [None]:

# Basic statistics
mean_pixel = np.mean(X_train)
std_pixel = np.std(X_train)

print("Mean Pixel Value:", mean_pixel)
print("Standard Deviation of Pixel Values:", std_pixel)
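The statistics above pool all three channels on the raw 0-255 scale. For later preprocessing, per-channel statistics on a 0-1 scale are often more useful. The following is a minimal sketch, assuming `uint8` images with channels last and a non-zero standard deviation in every channel. `channel_stats` and `standardize` are hypothetical helpers, and the demo uses random data in place of `X_train`.

```python
import numpy as np


def channel_stats(images):
    """Per-channel mean and standard deviation after scaling pixels to [0, 1]."""
    scaled = images.astype(np.float32) / 255.0
    # Accumulate in float64 to limit rounding error over many pixels.
    mean = scaled.mean(axis=(0, 1, 2), dtype=np.float64)
    std = scaled.std(axis=(0, 1, 2), dtype=np.float64)
    return mean, std


def standardize(images, mean, std):
    """Scale to [0, 1], then subtract the channel mean and divide by the channel std."""
    scaled = images.astype(np.float32) / 255.0
    return (scaled - mean.astype(np.float32)) / std.astype(np.float32)


# Random stand-in for the training images (assumption: CIFAR-10-like layout).
rng = np.random.default_rng(0)
demo_images = rng.integers(0, 256, size=(200, 32, 32, 3), dtype=np.uint8)

mean, std = channel_stats(demo_images)
normalized = standardize(demo_images, mean, std)
print("channel means:", np.round(mean, 3))
print("channel stds:", np.round(std, 3))
```

When this is used for modelling, the mean and standard deviation would normally be computed on the training split only and then reused unchanged for the test split.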



## Conclusion

- Image data modality was explored using CIFAR-10.
- The EDA covered sample visualization, image dimension and channel checks, per-channel pixel intensity histograms, and the overall pixel mean and standard deviation.
- This dataset is suitable for learning foundational image processing and classification techniques.
