A computer vision application that retrieves the video frames most similar to a selected image, object, or character

jschhie/Interactive-Image-Classifier

Image Similarity Search for Friends (TV Show)

MATLAB

Project Overview

This interactive image classifier lets users select either an entire image or a specific region/object within it. The program then retrieves the n = 5 video frames most similar to the selection.

Similarity scores are calculated using bag-of-words modeling and k-means clustering on a dataset of 6,600+ distinct video frames from the American TV series Friends.
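To make the scoring idea concrete, here is a minimal sketch in Python (not the repository's MATLAB code). It assumes a visual vocabulary of cluster centres is already available, assigns each descriptor to its nearest centre, and compares the resulting histograms with cosine similarity. The function names, the toy 2-D descriptors (real SIFT descriptors have 128 dimensions), and the choice of cosine similarity are illustrative assumptions, since the README does not state the exact measure used.

```python
import math

def nearest_word(descriptor, vocabulary):
    """Return the index of the vocabulary centre closest to the descriptor."""
    best_index, best_dist = 0, float("inf")
    for index, centre in enumerate(vocabulary):
        dist = sum((d - c) ** 2 for d, c in zip(descriptor, centre))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index

def bow_histogram(descriptors, vocabulary):
    """Count how many descriptors fall on each visual word."""
    hist = [0] * len(vocabulary)
    for descriptor in descriptors:
        hist[nearest_word(descriptor, vocabulary)] += 1
    return hist

def cosine_similarity(h1, h2):
    """Normalized dot product of two histograms (assumed similarity measure)."""
    dot = sum(a * b for a, b in zip(h1, h2))
    norm = math.sqrt(sum(a * a for a in h1)) * math.sqrt(sum(b * b for b in h2))
    return dot / norm if norm else 0.0

# Toy data: three visual words and three frames with a few descriptors each.
vocabulary = [(0.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
frame_a = [(0.5, 0.2), (9.0, 11.0), (9.5, 9.5)]
frame_b = [(0.1, 0.1), (10.2, 9.9), (10.0, 10.5)]
frame_c = [(0.2, 9.0), (1.0, 10.0)]

hist_a = bow_histogram(frame_a, vocabulary)
hist_b = bow_histogram(frame_b, vocabulary)
hist_c = bow_histogram(frame_c, vocabulary)

print(cosine_similarity(hist_a, hist_b))  # frames sharing the same words score high
print(cosine_similarity(hist_a, hist_c))  # frames with no shared words score zero
```

Frames whose descriptors land on the same visual words get nearly identical histograms, so their score approaches 1; frames with no words in common score 0.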

Note: The dataset is not included in this repository. Please see the visual demo for examples.

For a brief description of the terminology used, see the Terminology section.

Sample Results

This program uses Scale-Invariant Feature Transform (SIFT) descriptors, along with their associated images.

Below are sample results for both full-frame and region-based queries.

Example I: Full-Frame Query

Retrieves top n = 5 most similar video frames to the selected image.
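The retrieval step can be sketched as a ranking over per-frame histograms. The Python example below is a hypothetical illustration, not the repository's MATLAB implementation: it assumes each frame has already been reduced to a bag-of-words histogram, scores frames by normalized dot product, and keeps the n best. The frame names and histogram values are made up.

```python
import heapq
import math

def normalize(hist):
    """Scale a histogram to unit length so frame size does not dominate."""
    norm = math.sqrt(sum(v * v for v in hist))
    return [v / norm for v in hist] if norm else list(hist)

def top_matches(query_hist, frame_hists, n=5):
    """Return the names of the n frames whose histograms best match the query."""
    query = normalize(query_hist)
    scored = []
    for name, hist in frame_hists.items():
        score = sum(q * f for q, f in zip(query, normalize(hist)))
        scored.append((score, name))
    # nlargest sorts by score first, so the best match comes first.
    return [name for score, name in heapq.nlargest(n, scored)]

# Hypothetical database of frame histograms over a 3-word vocabulary.
frames = {
    "frame_001": [4, 0, 1],
    "frame_002": [0, 5, 0],
    "frame_003": [3, 1, 1],
    "frame_004": [0, 0, 6],
    "frame_005": [1, 1, 1],
    "frame_006": [2, 0, 0],
    "frame_007": [0, 2, 3],
}

results = top_matches([4, 0, 1], frames, n=5)
print(results)
```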

Example II: Region-Based Query

Retrieves top n = 5 most similar video frames containing the queried region/object (in this example, a kitchen table, outlined in blue).
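One plausible way to restrict a query to the selected region is to keep only the descriptors whose keypoints fall inside the outline. The Python sketch below assumes the selection is a polygon given as a list of (x, y) vertices and uses a standard ray-casting test; the function names, coordinates, and placeholder descriptor labels are hypothetical and do not come from the repository.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: count edge crossings of a ray going right from (x, y)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def descriptors_in_region(keypoints, descriptors, polygon):
    """Keep the descriptors whose keypoint position lies inside the polygon."""
    return [
        descriptor
        for (x, y), descriptor in zip(keypoints, descriptors)
        if point_in_polygon(x, y, polygon)
    ]

# Hypothetical rectangular selection and keypoint positions (pixel coordinates).
region = [(100, 50), (300, 50), (300, 200), (100, 200)]
keypoints = [(150, 100), (50, 60), (299, 199), (320, 120)]
descriptors = ["d0", "d1", "d2", "d3"]  # stand-ins for 128-D SIFT vectors

selected = descriptors_in_region(keypoints, descriptors, region)
print(selected)
```

The surviving descriptors would then be turned into a histogram and ranked against the dataset in the same way as a full-frame query.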


Please refer to the sample_outputs directory for additional examples. Its layout and contents are detailed in the following section.

Directory Layout and Contents

This section outlines the structure and contents of the sample_outputs directory, including its subdirectories.

| Subdirectory Name | Description of Contents |
| --- | --- |
| full_frames | Sample results based on full-frame queries. |
| full_frames_comparison | Visual comparison between AlexNet image classification and SIFT-based descriptors, illustrating the program's accuracy and effectiveness. |
| raw_matches | Sample queried region compared to computed SIFT descriptors. |
| region_based | Sample results based on region-based queries. |
| visual_vocab | Sample visual vocabulary (the set of visual words behind the bag-of-words model), where each image patch represents a "word". |
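A visual vocabulary of this kind is typically obtained by clustering descriptors so that each cluster centre becomes one "word". The following is a minimal k-means sketch in Python under that assumption; it is not the repository's MATLAB code, and the function names, seed, iteration count, and toy 2-D points (standing in for 128-D SIFT descriptors) are illustrative.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means: returns k cluster centres (the visual words)."""
    rng = random.Random(seed)
    centres = rng.sample(points, k)  # start from k distinct data points
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centre.
        clusters = [[] for _ in range(k)]
        for point in points:
            nearest = min(range(k), key=lambda i: squared_distance(point, centres[i]))
            clusters[nearest].append(point)
        # Update step: move each centre to the mean of its members.
        for i, members in enumerate(clusters):
            if members:
                centres[i] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centres

# Two well-separated groups of toy descriptors.
points = [
    (0, 0), (1, 0), (0, 1), (1, 1),
    (10, 10), (11, 10), (10, 11), (11, 11),
]

vocabulary = sorted(kmeans(points, k=2))
print(vocabulary)
```

With real data the number of clusters k sets the vocabulary size, which trades off how finely patches are distinguished against how sparse the histograms become.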

Terminology

| Terminology | Description |
| --- | --- |
| Bag-of-Words (BoW) Modeling | A histogram of visual image patches (or literal words) within a given image (or text), describing the frequency of each unique (visual) word. |
| SIFT (Scale-Invariant Feature Transform) | A method for detecting and describing local, distinctive features within images. |
| AlexNet | A well-known convolutional neural network for image classification, designed by Alex Krizhevsky with Ilya Sutskever and Geoffrey Hinton. |