CIFAR - 10

Understanding the Data

The CIFAR-10 (Canadian Institute For Advanced Research) dataset consists of 60000 images each of 32x32x3 color images having ten classes, with 6000 images per category.

The dataset consists of 50000 training images and 10000 test images.

The classes in the dataset are airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck.

Data Exploration

The above line of code returns training and test images along with the labels.

Let's quickly print the shape of training and testing images shape.

Let's also find out the total number of labels and the various kinds of classes the data has.

Now, plot the CIFAR-10 images.

Feature Reduction Using Principal Component Analysis (PCA)

Now comes the most exciting part. We will see how PCA turn high-dimensional data into a low-dimensional principal components.

But, Why use PCA ?

PCA is applied on images for a number of reasons. They are:

Number of features: Images have huge number of features. Even a small image of 28 x 28 pixels will have 784 features to deal with.
Lot of Covariance: When you look at an image, it is easy to understand that if we have the values for one pixel 'p', the pixels in the vicinity of 'p' will generally have similar values.
Other fundamental reasons to apply PCA are that it reduces memory consumption and Speed Up the training models, reducing time taken.

Apply PCA

But before that, let's reshape the image dimensions from three to one (flatten the images).

Next, make the instance of the PCA model. And lets fit PCA on the whole training data.

So, currently our training data possess total of 3072 features. Now, we will decide the value of 'k' on the basis of amount of variance we want PCA to retain.

Thus, by keeping 217 components out of 3072, we shall retain 95% variance.

Finally, fit the PCA model again on the training data, but this time we'll also pass the optimal k (i.e. 217) as a parameter

We have reduced the number of components from 3072 to 217. Reproduced the Images, by getting the approximation from the reduced data.

This data will not be the same as the original data, as after applying PCA we lose some of the original data. But lets try and plot these images and see how much difference is achieved.

Next Step is to fit the same model to test data and transform it.

Apply Classification Models

Models that we're using :

Logistic Regression
Random Forest
Gaussian NaiveBayes
K Nearest Neighbor
Support Vector Machines

Lets fit all models on the training data and get the testing accuracy score.

Algorithms	Accuracy Score
Logistic Regression	0.4081
Random Forest	0.441
Gaussian NaiveBayes	0.3147
K Nearest Neighbor	0.2279
Support Vector Machines	0.5518

Looks like, we're achieving the maximum score with SVM ( i.e. 0.5518 ). Still its not a good score. But, the best that can be achieved right now!

Name		Name	Last commit message	Last commit date
Latest commit History 8 Commits
CIFAR-10		CIFAR-10
.DS_Store		.DS_Store
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

CIFAR-10

CIFAR-10

.DS_Store

.DS_Store

README.md

README.md

Repository files navigation

CIFAR - 10

Understanding the Data

Data Exploration

Feature Reduction Using Principal Component Analysis (PCA)

But, Why use PCA ?

Apply PCA

Apply Classification Models

About

Releases

Packages

Languages

savss624/CIFAR-10

Folders and files

Latest commit

History

Repository files navigation

CIFAR - 10

Understanding the Data

Data Exploration

Feature Reduction Using Principal Component Analysis (PCA)

But, Why use PCA ?

Apply PCA

Apply Classification Models

About

Resources

Stars

Watchers

Forks

Languages