Bag of words model image classifier with evaluation on MNIST-fashion data set

mwcoleman/BoVW-classifier

Bag of Visual Words for MNIST-Fashion dataset

A Clustering and KNN model for image classification

Extracts features from each image in the training set and generates a dictionary (bag of visual words) using k-means clustering with k-means++ centroid initialisation.

Representative cluster images are saved in './centroid_images'

Note: The algorithm runs three independent BoW extraction, clustering, and prediction pipelines, then uses MajorityVote() to obtain the final predictions. This improved accuracy from 79% to 82% in the final test.

The clustering and prediction (MatchHistogram) process takes considerable time (24 h+ on a Ryzen 5 3600 with 32 GB RAM), so by default the RunAll script loads the precomputed vocab and prediction files. Standard flow:

  • If the datasets exist, load them into memory and extract features; otherwise load the precomputed predictions.csv, compute the metrics directly, and quit.
  • If the vocab database exists, load the three BoW dictionaries; otherwise generate a BoW dictionary for each feature set.
  • If the prediction file exists, load it; otherwise make predictions (24+ h).
  • Display performance metrics.
  • End.
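The flow above can be sketched as a small decision function. This is a hypothetical helper, not the repo's code: the stage names and the vocab/dataset checks are illustrative, and only predictions.csv is named in the README.

```python
def plan_stages(have_data, have_vocab, have_preds):
    """Return the list of stages RunAll would execute, given which
    files already exist. Stage names are illustrative placeholders."""
    if not have_data:
        # No datasets: fall back to the precomputed predictions and quit
        # after the metrics.
        return ["load_predictions", "metrics"]
    stages = ["extract_features"]
    stages.append("load_vocab" if have_vocab else "build_vocab")
    stages.append("load_predictions" if have_preds else "predict")
    stages.append("metrics")
    return stages
```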

Feature extraction: The primary method computes a HOG descriptor and returns a flattened vector for each image. Optionally, each image is first split into 'patches' of a given size, with optional overlap (e.g. 8x8 patches with an overlap of 4), and HOG is computed on each patch. Three separate HOG descriptors / BoW clusters are used, with these parameters:

  • km=100, patch=8, stride=4, cells=4, blocks=2, orients=8
  • km=100, patch=14, stride=7, cells=7, blocks=2, orients=12
  • km=100, patch=28, stride=1, cells=9, blocks=2, orients=12

(km is the dictionary size in words, patch is the window size HOG is computed over, stride is the window stride, cells is pixels per cell for HOG, blocks is cells per block, and orients is the number of orientation bins for HOG.)
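The patch pre-processing step can be sketched in NumPy. This is a hypothetical extract_patches helper, not the repo's code; HOG itself would then be run on each returned patch (e.g. with skimage.feature.hog):

```python
import numpy as np

def extract_patches(image, patch=8, stride=4):
    """Slide a patch-sized window over the image with the given stride.
    A stride smaller than the patch size gives overlapping patches,
    e.g. 8x8 patches with stride 4 overlap by 4 pixels."""
    h, w = image.shape
    return np.array([image[r:r + patch, c:c + patch]
                     for r in range(0, h - patch + 1, stride)
                     for c in range(0, w - patch + 1, stride)])
```

On a 28x28 MNIST-Fashion image, patch=8 with stride=4 yields a 6x6 grid of 36 overlapping patches.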

Representing images as BoW histograms: For every image in the train and test sets, a histogram (attribution vector) is computed by soft-assigning its features to the dictionary words, with weights inversely proportional to the distance between each image feature vector and each BoW word vector.

Prediction: Test image labels are predicted by a KNN search through the feature space, with a default KNN size of 6.

Methods

CreateDictionary()

Inputs:
  • patches - ndarray(N, height, width) - a set of images
  • feat_vec - ndarray(N, length) - the corresponding set of feature vectors
  • k - int - desired size of the visual dictionary (i.e. number of cluster centroids)

Outputs: a tuple of (words, patches) - the words in the visual dictionary resulting from clustering

A k-means clustering algorithm that creates a dictionary of representative feature vectors from the input feat_vec.

Steps:

  1. Initialise k centroids using the 'kmeans++' method (http://ilpubs.stanford.edu:8090/778/1/2006-13.pdf). In a nutshell, it picks points from the feat_vec set with probability proportional to their distance from the already-chosen centroids, which usually gives a better spread of starting centroids than simple random initialisation.

  2. Iterate the following process until convergence:

  • Assign vectors to the nearest (Euclidean) centroid
  • Update centroid to be the mean of the cluster (least squared distance)

(The convergence threshold is a centroid update below a 1% shift in all directions.)
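The seeding and the two iterated steps can be sketched as follows. This is a minimal NumPy illustration, not the repo's implementation, and the 1% convergence test is approximated here with a relative tolerance:

```python
import numpy as np

def kmeans_pp_init(feat_vec, k, rng=None):
    """k-means++ seeding: each new centroid is drawn with probability
    proportional to its squared distance from the nearest chosen centroid."""
    rng = rng or np.random.default_rng(0)
    centroids = [feat_vec[rng.integers(len(feat_vec))]]
    for _ in range(k - 1):
        d2 = np.min([np.sum((feat_vec - c) ** 2, axis=1) for c in centroids],
                    axis=0)
        centroids.append(feat_vec[rng.choice(len(feat_vec), p=d2 / d2.sum())])
    return np.array(centroids)

def kmeans(feat_vec, k, tol=0.01, max_iter=100):
    """Lloyd iterations: assign to nearest centroid, then move each
    centroid to its cluster mean; stop when centroids barely move."""
    centroids = kmeans_pp_init(feat_vec, k)
    for _ in range(max_iter):
        labels = np.argmin(((feat_vec[:, None] - centroids) ** 2).sum(-1),
                           axis=1)
        new = np.array([feat_vec[labels == j].mean(axis=0)
                        if np.any(labels == j) else centroids[j]
                        for j in range(k)])
        if np.allclose(new, centroids, rtol=tol):
            centroids = new
            break
        centroids = new
    return centroids, labels
```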

ComputeHistogram()

Inputs:
  • feat_vec - (1 x length) - a single feature vector
  • vis_dict - the visual dictionary generated by CreateDictionary()

Outputs: a vector of length K, where K is the number of words in the dictionary.

Calculates a soft assignment of the feature vector to the visual dictionary. The assignment is dependent on the inverse square of the L2 norm to each word in the dictionary (closer vectors get a higher attribution).
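A minimal sketch of this inverse-square soft assignment. The code is hypothetical, and the final normalisation is an assumption (the README does not say whether the histogram is normalised at this stage):

```python
import numpy as np

def compute_histogram(feat_vec, vis_dict, eps=1e-8):
    """Soft-assign one feature vector to every dictionary word, with
    weight proportional to 1 / ||feat_vec - word||^2, so closer words
    get a higher attribution."""
    d2 = np.sum((vis_dict - feat_vec) ** 2, axis=1)  # squared L2 to each word
    weights = 1.0 / (d2 + eps)                       # inverse-square weighting
    return weights / weights.sum()                   # normalise (assumption)
```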

SumHistogram()

Inputs:
  • img_feat_set - (N x length) - the set of feature vectors representing a single image
  • vis_dict - the visual dictionary generated by CreateDictionary()

Outputs: The composite histogram for the image - a soft assignment to the words in the visual dictionary

Simply sums all of the individual histograms from ComputeHistogram() for a given image

MatchHistogram()

Inputs:
  • h1, h2 - composite assignment vectors produced by SumHistogram()
  • method - 'intersection' or 'chisquared'. Note that 'chisquared' seems to perform much better.

Outputs: A float, the distance between the two histograms/vectors using the chosen method
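Both distance measures could look like the following. These are the standard formulations; the repo's exact expressions may differ, and the intersection variant assumes normalised histograms:

```python
import numpy as np

def match_histogram(h1, h2, method="chisquared", eps=1e-10):
    """Distance between two histograms using the chosen method."""
    h1, h2 = np.asarray(h1, float), np.asarray(h2, float)
    if method == "chisquared":
        # Chi-squared distance: 0 for identical histograms, grows as
        # the bins diverge.
        return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))
    if method == "intersection":
        # Histogram intersection similarity turned into a distance
        # (assumes both histograms sum to 1).
        return 1.0 - np.sum(np.minimum(h1, h2))
    raise ValueError(method)
```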

MakePredictions()

Inputs:
  • test_hists, train_hists - Python lists of image histograms
  • knn - int - number of neighbours to consider in KNN

Outputs: a Python list of predicted classes for the test images

Finds the K nearest training images (by MatchHistogram() calculation) and performs a majority vote to get the predicted class of the test image
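A sketch of this KNN majority vote. The code is hypothetical: a chi-squared default distance stands in for the repo's MatchHistogram(), and the train_labels argument is added for self-containedness:

```python
import numpy as np
from collections import Counter

def make_predictions(test_hists, train_hists, train_labels, knn=6, dist=None):
    """For each test histogram, find the knn closest training histograms
    and take a majority vote over their labels."""
    if dist is None:
        # Chi-squared distance as a stand-in for MatchHistogram().
        dist = lambda a, b: 0.5 * np.sum((a - b) ** 2 / (a + b + 1e-10))
    preds = []
    for t in test_hists:
        d = [dist(t, tr) for tr in train_hists]
        nearest = np.argsort(d)[:knn]          # indices of the knn closest
        votes = Counter(train_labels[i] for i in nearest)
        preds.append(votes.most_common(1)[0][0])
    return preds
```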

GetMetrics()

Inputs: pred, label - the prediction list and the true labels

Outputs: {'accuracy', 'cw_accuracy', 'precision', 'recall'} - a dictionary of metrics

Calculates various metrics for the model
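The four metrics could be computed as follows. This is a sketch: the key names match the README, but the exact definitions of classwise accuracy and the macro averaging of precision/recall are assumptions:

```python
import numpy as np

def get_metrics(pred, label):
    """Overall accuracy, per-class ('classwise') accuracy, and
    macro-averaged precision and recall."""
    pred, label = np.asarray(pred), np.asarray(label)
    cw_acc, prec, rec = {}, [], []
    for c in np.unique(label):
        tp = np.sum((pred == c) & (label == c))
        fp = np.sum((pred == c) & (label != c))
        fn = np.sum((pred != c) & (label == c))
        cw_acc[c] = tp / max(tp + fn, 1)   # recall for class c
        prec.append(tp / max(tp + fp, 1))
        rec.append(tp / max(tp + fn, 1))
    return {"accuracy": float(np.mean(pred == label)),
            "cw_accuracy": cw_acc,
            "precision": float(np.mean(prec)),
            "recall": float(np.mean(rec))}
```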

MajorityVote()

As three different feature sets / BoW dictionaries / histogram sets / predictions are produced, MajorityVote() is used to obtain the majority prediction. Ties are broken by choosing the most accurate pipeline (iteration 3). To speed up prediction, disable this and use only one of the HOG pipelines.
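The elementwise vote with the iteration-3 tie-break can be sketched as (hypothetical code, not the repo's implementation):

```python
from collections import Counter

def majority_vote(preds_1, preds_2, preds_3):
    """Elementwise majority over three prediction lists; on a three-way
    tie fall back to preds_3, the most accurate pipeline."""
    final = []
    for a, b, c in zip(preds_1, preds_2, preds_3):
        label, n = Counter([a, b, c]).most_common(1)[0]
        final.append(label if n > 1 else c)
    return final
```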
