# PCA with Image Dataset CIFAR-10

The CIFAR-10 dataset consists of 60000 32x32 colour images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images.  

The dataset is divided into five training batches and one test batch, each with 10000 images. The test batch contains exactly 1000 randomly selected images from each class. The training batches contain the remaining images in random order, but some training batches may contain more images from one class than another. Between them, the training batches contain exactly 5000 images from each class.

The classes are: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck.
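
The labels returned by `tf.keras.datasets.cifar10.load_data()` are integers from 0 to 9 that follow the class order listed above. A minimal sketch for turning labels into class names and checking the per-class counts, assuming NumPy label arrays of shape `(N, 1)` as Keras returns them; `CIFAR10_CLASSES` and `class_counts` are illustrative names, not part of any library, and the example labels are synthetic.

```python
import numpy as np

# Class names in CIFAR-10 label order (label 0 = airplane, ..., label 9 = truck)
CIFAR10_CLASSES = [
    "airplane", "automobile", "bird", "cat", "deer",
    "dog", "frog", "horse", "ship", "truck",
]


def class_counts(labels):
    """Return {class_name: count} for an array of integer CIFAR-10 labels.

    Keras returns labels with shape (N, 1), so they are flattened first.
    """
    flat = np.asarray(labels).ravel()
    counts = np.bincount(flat, minlength=len(CIFAR10_CLASSES))
    return {name: int(n) for name, n in zip(CIFAR10_CLASSES, counts)}


# Small synthetic example: three airplanes, one cat, two trucks
example_labels = np.array([[0], [0], [3], [9], [0], [9]])
print(class_counts(example_labels))
```

Once the real data is loaded, `class_counts(test_labels)` should report 1000 images per class and `class_counts(train_labels)` 5000 per class, matching the description above.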

In [1]:
```python
import tensorflow as tf

# Load the CIFAR-10 dataset
(train_images, train_labels), (test_images, test_labels) = tf.keras.datasets.cifar10.load_data()

# The dataset is now loaded into memory and ready for use.
```

```
Downloading data from https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz
170498071/170498071 ━━━━━━━━━━━━━━━━━━━━ 5s 0us/step
```


### Reshape

In [10]:
```python
print(train_images.shape)
print(test_images.shape)
```

```
(50000, 32, 32, 3)
(10000, 32, 32, 3)
```


Each image in CIFAR-10 is a 3-dimensional array (32x32 pixels with 3 color channels), while PCA requires a 2-dimensional input (samples x features), so you will need to reshape each image array into a vector: a 32x32x3 array becomes a vector of 32 × 32 × 3 = 3072 values.

In [11]:
```python
# Reshape the training and testing image arrays to (num_samples, 3072)
train_images_flat = train_images.reshape((train_images.shape[0], -1))
test_images_flat = test_images.reshape((test_images.shape[0], -1))
```
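
NumPy's `reshape` uses row-major (C) order by default, so each flattened vector lists the three channel values of pixel (0, 0), then pixel (0, 1), and so on along each row. The sketch below checks this on a small synthetic batch (not the real CIFAR-10 images): pixel `(row, col, channel)` lands at feature index `(row * 32 + col) * 3 + channel`, and the flattening is exactly reversible. `feature_index` is a hypothetical helper introduced only for illustration.

```python
import numpy as np

HEIGHT, WIDTH, CHANNELS = 32, 32, 3


def feature_index(row, col, channel):
    """Position of pixel (row, col, channel) in a C-order flattened 32x32x3 image."""
    return (row * WIDTH + col) * CHANNELS + channel


# Synthetic stand-in for a batch of 4 uint8 images
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(4, HEIGHT, WIDTH, CHANNELS)).astype(np.uint8)

# Same flattening as above: keep the sample axis, merge the other three
flat = images.reshape((images.shape[0], -1))
print(flat.shape)  # (4, 3072)

# Pixel (5, 7), channel 2 of image 1, read from both layouts
assert flat[1, feature_index(5, 7, 2)] == images[1, 5, 7, 2]

# Reshaping back recovers the original images exactly
restored = flat.reshape((-1, HEIGHT, WIDTH, CHANNELS))
print(np.array_equal(restored, images))  # True
```

The same inverse reshape is what lets PCA outputs (principal axes, reconstructions) be viewed as 32x32x3 images again.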

### Normalize

In [12]:
```python
# Scale the uint8 pixel values from [0, 255] to float32 values in [0, 1]
train_images_flat = train_images_flat.astype('float32') / 255
test_images_flat = test_images_flat.astype('float32') / 255
```
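
Dividing every feature by the same constant (255) scales the covariance matrix by 1/255², so for PCA itself this step leaves the principal axes and the explained-variance ratios unchanged; its practical effect is to turn the `uint8` pixels into `float32` values in [0, 1]. A small check of that claim on synthetic data (not CIFAR-10), with PCA computed by eigendecomposition of the covariance matrix; `pca_eig` and the data generator are illustrative assumptions.

```python
import numpy as np


def pca_eig(X):
    """PCA via eigendecomposition of the feature covariance matrix.

    Returns (explained_variance_ratio, components) sorted by decreasing variance;
    rows of `components` are the principal axes.
    """
    cov = np.cov(X, rowvar=False)            # np.cov subtracts the mean itself
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    return eigvals / eigvals.sum(), eigvecs.T


# Synthetic "pixel" data in 0..255 with one dominant direction (shared brightness)
rng = np.random.default_rng(0)
brightness = rng.uniform(0, 200, size=(300, 1))
raw = np.clip(brightness + rng.normal(0, 10, size=(300, 8)), 0, 255).round()

ratio_raw, comp_raw = pca_eig(raw)
ratio_scaled, comp_scaled = pca_eig(raw / 255)

# Same explained-variance ratios, and the same first axis (up to sign)
print(np.allclose(ratio_raw, ratio_scaled))                     # True
print(np.isclose(abs(comp_raw[0] @ comp_scaled[0]), 1.0))      # True
```

Note that this uniform scaling is different from standardizing each feature to unit variance, which would change the principal axes.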

In [14]:
```python
# Check shapes and data ranges
print("Training images shape:", train_images_flat.shape)
print("Test images shape:", test_images_flat.shape)
print("Min and max pixel values:", train_images_flat.min(), train_images_flat.max())
```

```
Training images shape: (50000, 3072)
Test images shape: (10000, 3072)
Min and max pixel values: 0.0 1.0
```
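
With the data in a `(samples, 3072)` float array, PCA can be computed from a singular value decomposition of the mean-centered matrix. The sketch below is one way to do this with NumPy alone (scikit-learn's `PCA` is a common alternative); `pca_fit` and `n_components_for` are hypothetical helpers, the 95% variance threshold is an arbitrary example, and the input is synthetic. On the full training set the thin SVD's left singular vectors alone form a 50000 × 3072 matrix (about 614 MB in float32), so fitting on a random subset first is a reasonable experiment.

```python
import numpy as np


def pca_fit(X):
    """Fit PCA on X (n_samples, n_features) via SVD of the mean-centered data.

    Returns (mean, components, explained_variance_ratio); rows of
    `components` are the principal axes, ordered by decreasing variance.
    """
    mean = X.mean(axis=0)
    Xc = X - mean
    # Thin SVD: Xc = U @ diag(S) @ Vt, rows of Vt are the principal axes
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    variance = S**2 / (X.shape[0] - 1)
    return mean, Vt, variance / variance.sum()


def n_components_for(ratio, threshold=0.95):
    """Smallest number of components whose cumulative explained variance >= threshold."""
    cumulative = np.cumsum(ratio)
    k = int(np.searchsorted(cumulative, threshold)) + 1
    return min(k, len(ratio))


# Synthetic stand-in for train_images_flat: 500 samples of 3072 features in [0, 1]
rng = np.random.default_rng(0)
X = rng.random((500, 3072), dtype=np.float32)

mean, components, ratio = pca_fit(X)
k = n_components_for(ratio, threshold=0.95)

# Project onto the first k axes, then map back to pixel space
Z = (X - mean) @ components[:k].T          # shape (500, k)
X_approx = Z @ components[:k] + mean       # shape (500, 3072)
print(k, Z.shape, X_approx.shape)
```

To project the test set, reuse the training mean and axes, for example `(test_images_flat - mean) @ components[:k].T`, so that both sets share one coordinate system.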
