Bag of words model image classifier with evaluation on MNIST-fashion data set

mwcoleman/BoVW-classifier

Bag of Visual Words for MNIST-Fashion dataset

A Clustering and KNN model for image classification

Extracts features from each image in the training set and generates a dictionary (bag of visual words) using k-means clustering with k-means++ centroid initialisation.

Representative cluster images are saved in './centroid_images'

Note: The algorithm runs three independent BoW extraction, clustering, and prediction pipelines, then uses MajorityVote() to obtain the final predictions. This improved accuracy from 79% to 82% in the final test.

The clustering and prediction (MatchHistogram) process takes considerable time (24 h+ on a Ryzen 5 3600 with 32 GB RAM), so by default the RunAll script loads the precomputed vocab and prediction files. Standard flow:

  • If the datasets exist, load them into memory and extract features; otherwise load the precomputed predictions.csv, compute the metrics directly, and quit.
  • If the vocab database exists, load the three BoW dictionaries; otherwise generate a BoW dictionary for each feature set.
  • If the prediction file exists, load it; otherwise make predictions (24+ h).
  • Display performance metrics.
  • End.
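The flow above can be sketched as a small decision function. This is a hypothetical helper, not the repo's code: the stage names and the vocab/dataset checks are illustrative, and only predictions.csv is named in the README.

```python
def plan_stages(have_data, have_vocab, have_preds):
    """Return the list of stages RunAll would execute, given which
    files already exist. Stage names are illustrative placeholders."""
    if not have_data:
        # No datasets: fall back to the precomputed predictions and quit
        # after the metrics.
        return ["load_predictions", "metrics"]
    stages = ["extract_features"]
    stages.append("load_vocab" if have_vocab else "build_vocab")
    stages.append("load_predictions" if have_preds else "predict")
    stages.append("metrics")
    return stages
```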

Feature extraction: The primary method computes a HOG descriptor and returns a flattened vector for each image. Optionally, each image is first split into 'patches' of a given size, with optional overlap (e.g. 8x8 patches with an overlap of 4), and HOG is computed on each patch. Three separate HOG descriptors / BoW clusters are used, with these parameters:

  • km=100, patch=8, stride=4, cells=4, blocks=2, orients=8
  • km=100, patch=14, stride=7, cells=7, blocks=2, orients=12
  • km=100, patch=28, stride=1, cells=9, blocks=2, orients=12

(km is the dictionary size in words, patch is the window size HOG is computed over, stride is the window stride, cells is pixels per cell for HOG, blocks is cells per block, and orients is the number of orientation bins for HOG.)
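The patch pre-processing step can be sketched in NumPy. This is a hypothetical extract_patches helper, not the repo's code; HOG itself would then be run on each returned patch (e.g. with skimage.feature.hog):

```python
import numpy as np

def extract_patches(image, patch=8, stride=4):
    """Slide a patch-sized window over the image with the given stride.
    A stride smaller than the patch size gives overlapping patches,
    e.g. 8x8 patches with stride 4 overlap by 4 pixels."""
    h, w = image.shape
    return np.array([image[r:r + patch, c:c + patch]
                     for r in range(0, h - patch + 1, stride)
                     for c in range(0, w - patch + 1, stride)])
```

On a 28x28 MNIST-Fashion image, patch=8 with stride=4 yields a 6x6 grid of 36 overlapping patches.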

Representing images as BoW histograms: For every image in the train and test sets, a histogram (attribution vector) is computed by soft-assigning its features to the dictionary words, with weights inversely proportional to the distance between each image feature vector and each BoW word vector.

Prediction: Test image labels are predicted by a KNN search through the feature space, with a default KNN size of 6.

Methods

CreateDictionary()

Inputs:
  • patches - ndarray(N, height, width) - a set of images
  • feat_vec - ndarray(N, length) - the corresponding set of feature vectors
  • k - int - desired size of the visual dictionary (i.e. number of cluster centroids)

Outputs: a tuple of (words, patches) - the words in the visual dictionary resulting from clustering

A k-means clustering algorithm that creates a dictionary of representative feature vectors from the input feat_vec.

Steps:

  1. Initialise k centroids using the 'kmeans++' method (http://ilpubs.stanford.edu:8090/778/1/2006-13.pdf). In a nutshell, it picks points from the feat_vec set with probability proportional to their distance from the already-chosen centroids, which usually gives a better spread of starting centroids than simple random initialisation.

  2. Iterate the following process until convergence:

  • Assign vectors to the nearest (Euclidean) centroid
  • Update centroid to be the mean of the cluster (least squared distance)

(The convergence threshold is a centroid update below a 1% shift in all directions.)
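The seeding and the two iterated steps can be sketched as follows. This is a minimal NumPy illustration, not the repo's implementation, and the 1% convergence test is approximated here with a relative tolerance:

```python
import numpy as np

def kmeans_pp_init(feat_vec, k, rng=None):
    """k-means++ seeding: each new centroid is drawn with probability
    proportional to its squared distance from the nearest chosen centroid."""
    rng = rng or np.random.default_rng(0)
    centroids = [feat_vec[rng.integers(len(feat_vec))]]
    for _ in range(k - 1):
        d2 = np.min([np.sum((feat_vec - c) ** 2, axis=1) for c in centroids],
                    axis=0)
        centroids.append(feat_vec[rng.choice(len(feat_vec), p=d2 / d2.sum())])
    return np.array(centroids)

def kmeans(feat_vec, k, tol=0.01, max_iter=100):
    """Lloyd iterations: assign to nearest centroid, then move each
    centroid to its cluster mean; stop when centroids barely move."""
    centroids = kmeans_pp_init(feat_vec, k)
    for _ in range(max_iter):
        labels = np.argmin(((feat_vec[:, None] - centroids) ** 2).sum(-1),
                           axis=1)
        new = np.array([feat_vec[labels == j].mean(axis=0)
                        if np.any(labels == j) else centroids[j]
                        for j in range(k)])
        if np.allclose(new, centroids, rtol=tol):
            centroids = new
            break
        centroids = new
    return centroids, labels
```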

ComputeHistogram()

Inputs:
  • feat_vec - (1 x length) - a single feature vector
  • vis_dict - the visual dictionary generated by CreateDictionary()

Outputs: a vector of length K, where K is the number of words in the dictionary.

Calculates a soft assignment of the feature vector to the visual dictionary. The assignment is dependent on the inverse square of the L2 norm to each word in the dictionary (closer vectors get a higher attribution).
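A minimal sketch of this inverse-square soft assignment. The code is hypothetical, and the final normalisation is an assumption (the README does not say whether the histogram is normalised at this stage):

```python
import numpy as np

def compute_histogram(feat_vec, vis_dict, eps=1e-8):
    """Soft-assign one feature vector to every dictionary word, with
    weight proportional to 1 / ||feat_vec - word||^2, so closer words
    get a higher attribution."""
    d2 = np.sum((vis_dict - feat_vec) ** 2, axis=1)  # squared L2 to each word
    weights = 1.0 / (d2 + eps)                       # inverse-square weighting
    return weights / weights.sum()                   # normalise (assumption)
```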

SumHistogram()

Inputs:
  • img_feat_set - (N x length) - the set of feature vectors representing a single image
  • vis_dict - the visual dictionary generated by CreateDictionary()

Outputs: The composite histogram for the image - a soft assignment to the words in the visual dictionary

Simply sums all of the individual histograms from ComputeHistogram() for a given image

MatchHistogram()

Inputs:
  • h1, h2 - composite assignment vectors produced by SumHistogram()
  • method - 'intersection' or 'chisquared'. Note that 'chisquared' seems to perform much better.

Outputs: A float, the distance between the two histograms/vectors using the chosen method
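Both distance measures could look like the following. These are the standard formulations; the repo's exact expressions may differ, and the intersection variant assumes normalised histograms:

```python
import numpy as np

def match_histogram(h1, h2, method="chisquared", eps=1e-10):
    """Distance between two histograms using the chosen method."""
    h1, h2 = np.asarray(h1, float), np.asarray(h2, float)
    if method == "chisquared":
        # Chi-squared distance: 0 for identical histograms, grows as
        # the bins diverge.
        return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))
    if method == "intersection":
        # Histogram intersection similarity turned into a distance
        # (assumes both histograms sum to 1).
        return 1.0 - np.sum(np.minimum(h1, h2))
    raise ValueError(method)
```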

MakePredictions()

Inputs:
  • test_hists, train_hists - Python lists of image histograms
  • knn - int - number of neighbours to consider in KNN

Outputs: a Python list of predicted classes for the test images

Finds the K nearest training images (by MatchHistogram() calculation) and performs a majority vote to get the predicted class of the test image
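A sketch of this KNN majority vote. The code is hypothetical: a chi-squared default distance stands in for the repo's MatchHistogram(), and the train_labels argument is added for self-containedness:

```python
import numpy as np
from collections import Counter

def make_predictions(test_hists, train_hists, train_labels, knn=6, dist=None):
    """For each test histogram, find the knn closest training histograms
    and take a majority vote over their labels."""
    if dist is None:
        # Chi-squared distance as a stand-in for MatchHistogram().
        dist = lambda a, b: 0.5 * np.sum((a - b) ** 2 / (a + b + 1e-10))
    preds = []
    for t in test_hists:
        d = [dist(t, tr) for tr in train_hists]
        nearest = np.argsort(d)[:knn]          # indices of the knn closest
        votes = Counter(train_labels[i] for i in nearest)
        preds.append(votes.most_common(1)[0][0])
    return preds
```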

GetMetrics()

Inputs: pred, label - the prediction list and the true labels

Outputs: {'accuracy', 'cw_accuracy', 'precision', 'recall'} - a dictionary of metrics

Calculates various metrics for the model
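The four metrics could be computed as follows. This is a sketch: the key names match the README, but the exact definitions of classwise accuracy and the macro averaging of precision/recall are assumptions:

```python
import numpy as np

def get_metrics(pred, label):
    """Overall accuracy, per-class ('classwise') accuracy, and
    macro-averaged precision and recall."""
    pred, label = np.asarray(pred), np.asarray(label)
    cw_acc, prec, rec = {}, [], []
    for c in np.unique(label):
        tp = np.sum((pred == c) & (label == c))
        fp = np.sum((pred == c) & (label != c))
        fn = np.sum((pred != c) & (label == c))
        cw_acc[c] = tp / max(tp + fn, 1)   # recall for class c
        prec.append(tp / max(tp + fp, 1))
        rec.append(tp / max(tp + fn, 1))
    return {"accuracy": float(np.mean(pred == label)),
            "cw_accuracy": cw_acc,
            "precision": float(np.mean(prec)),
            "recall": float(np.mean(rec))}
```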

MajorityVote()

As three different feature sets / BoW dictionaries / histogram sets / predictions are produced, MajorityVote() is used to obtain the majority prediction. Ties are broken by choosing the most accurate pipeline (iteration 3). To speed up prediction, disable this and use only one of the HOG pipelines.
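The elementwise vote with the iteration-3 tie-break can be sketched as (hypothetical code, not the repo's implementation):

```python
from collections import Counter

def majority_vote(preds_1, preds_2, preds_3):
    """Elementwise majority over three prediction lists; on a three-way
    tie fall back to preds_3, the most accurate pipeline."""
    final = []
    for a, b, c in zip(preds_1, preds_2, preds_3):
        label, n = Counter([a, b, c]).most_common(1)[0]
        final.append(label if n > 1 else c)
    return final
```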
