Final Project Repo for CVIP


Project Summary

This project implements category-based scene recognition using the geometric correspondence of interest points detected in images with SIFT (Scale-Invariant Feature Transform). These interest points are derived from the gradients of the pixel intensities and are summarized into local keypoint descriptors for each image. We use a variant of SIFT known as Dense SIFT, which is reported to perform better for object categorization. The proposed system builds a bag-of-visual-words representation and uses an SVM classifier together with spatial pyramid matching to classify the landscape categories. For classification, we tried SVMs with different kernels as well as KNN, and achieved the highest accuracy with a linear SVM.
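A minimal sketch of how a bag-of-visual-words feature with a spatial pyramid could be assembled, assuming the descriptors have already been extracted and a vocabulary of cluster centres has been learned (for example by k-means). The function names, the toy 2-D descriptors, and the unweighted concatenation of pyramid levels are illustrative assumptions, not the repository's MATLAB/VLFeat code.

```python
import math

def nearest_word(descriptor, vocabulary):
    """Index of the visual word (cluster centre) closest to the descriptor."""
    return min(range(len(vocabulary)),
               key=lambda k: math.dist(descriptor, vocabulary[k]))

def bovw_histogram(descriptors, vocabulary):
    """L1-normalised histogram of visual-word occurrences."""
    counts = [0.0] * len(vocabulary)
    for d in descriptors:
        counts[nearest_word(d, vocabulary)] += 1.0
    total = sum(counts)
    return [c / total for c in counts] if total else counts

def spatial_pyramid(points, descriptors, vocabulary, width, height, levels=2):
    """Concatenate per-cell histograms over 1x1, 2x2, ... grids.

    Simplification: levels are concatenated without the per-level
    weights used in the original spatial pyramid matching paper.
    """
    feature = []
    for level in range(levels):
        cells = 2 ** level
        buckets = [[] for _ in range(cells * cells)]
        for (x, y), d in zip(points, descriptors):
            cx = min(int(x * cells / width), cells - 1)
            cy = min(int(y * cells / height), cells - 1)
            buckets[cy * cells + cx].append(d)
        for bucket in buckets:
            feature.extend(bovw_histogram(bucket, vocabulary))
    return feature

# Toy example: a 2-word vocabulary and four descriptors on an 8x8 image.
vocabulary = [(0, 0), (10, 10)]
descriptors = [(1, 1), (9, 9), (0, 2), (10, 8)]
points = [(1, 1), (7, 1), (1, 7), (7, 7)]
feature = spatial_pyramid(points, descriptors, vocabulary, width=8, height=8)
```

With a vocabulary of K words and L levels, the resulting vector has K × (1 + 4 + ... + 4^(L-1)) entries; this vector is what a classifier such as a linear SVM would consume.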


Project report can be found at report link


General flow of the implementation

Four Steps of Constructing the Visual Words, courtesy ISRN AI

Feature extraction using SIFT

Keypoint detection using SIFT


We used two image datasets. The first is already available on the web; the second was created by us from photographs taken at different places at SUNY-Buffalo:

  1. SUN Dataset: We worked with the SUN dataset, which consisted of 8 categories with a total of 1509 images. Of these, 160 images were used for testing and the remaining 1349 for training.

  2. University at Buffalo - SUNY Campus Pictures: We have a total of 532 images. Of these, 112 images were used for testing (16 images per category) and the remaining 420 for training.
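The per-category hold-out described for the campus dataset can be sketched as follows. This is an illustration only: the function name, the fixed seed, and the file names are hypothetical, and the even spread of 76 images over 7 categories is an assumption made so that the totals match (112 / 16 implies 7 categories; the actual per-category counts are not stated).

```python
import random

def split_per_category(images_by_category, test_per_category, seed=0):
    """Hold out a fixed number of images per category for testing."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, test = [], []
    for category in sorted(images_by_category):
        paths = sorted(images_by_category[category])
        rng.shuffle(paths)
        test += [(p, category) for p in paths[:test_per_category]]
        train += [(p, category) for p in paths[test_per_category:]]
    return train, test

# Hypothetical layout: 7 categories x 76 images = 532 images in total.
images_by_category = {
    f"category_{i}": [f"category_{i}/img_{j}.jpg" for j in range(76)]
    for i in range(7)
}
train, test = split_per_category(images_by_category, test_per_category=16)
```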

Comparison of accuracy between the methods used: (1) Spatial Pyramid Matching (Level 3) with K-Means Clustering (HW1) and (2) SIFT Descriptors with SVM (proposed method)

Scene Accuracy on UB Campus Data

Best Accuracy achieved for UB Campus Data

Overall result

Confusion matrix for the SUN dataset at K = 180 and Alpha = 120 (accuracy: 83.75%), using a Support Vector Machine with SIFT

Confusion matrix for the UB Campus dataset at K = 180 and Alpha = 500 (accuracy: 93.75%), using a Support Vector Machine with SIFT
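For reference, a small sketch of how a confusion matrix and the overall accuracy can be computed from true and predicted labels. The function names and the toy labels are illustrative assumptions, not taken from the repository.

```python
def confusion_matrix(true_labels, predicted_labels, categories):
    """Rows are true categories, columns are predicted categories."""
    index = {c: i for i, c in enumerate(categories)}
    matrix = [[0] * len(categories) for _ in categories]
    for t, p in zip(true_labels, predicted_labels):
        matrix[index[t]][index[p]] += 1
    return matrix

def accuracy(matrix):
    """Fraction of samples on the diagonal (correctly classified)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total if total else 0.0

# Toy example with three hypothetical categories.
categories = ["beach", "forest", "street"]
y_true = ["beach", "beach", "forest", "street", "street", "forest"]
y_pred = ["beach", "forest", "forest", "street", "beach", "forest"]
cm = confusion_matrix(y_true, y_pred, categories)
acc = accuracy(cm)
```

If accuracy is taken as the fraction of correctly classified test images, the reported figures are consistent with 134 of 160 SUN test images and 105 of 112 campus test images being classified correctly; these counts are inferred from the percentages, not stated in the source.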


In this project, we attempted to improve scene classification performance by using different methods for feature extraction and classification. For feature extraction, we used SIFT and SURF for keypoint detection. We found that SURF is more efficient than SIFT, but SIFT is more robust. Using SURF we obtained an accuracy of 74%, and using SIFT an accuracy of 86%. We also tried different classifiers, such as linear SVM and KNN; the linear SVM provided far better accuracy than KNN.
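To make the two classifier families concrete, here is a toy sketch of a k-nearest-neighbour vote and a binary linear SVM trained by subgradient descent on the hinge loss. It is an illustration under simplifying assumptions (two classes, no bias term, hand-picked learning rate, made-up 2-bin histograms); the project itself classifies multiple scene categories, and this is not its MATLAB implementation.

```python
import math

def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k nearest training vectors (Euclidean)."""
    order = sorted(range(len(train_x)),
                   key=lambda i: math.dist(train_x[i], query))
    votes = {}
    for i in order[:k]:
        votes[train_y[i]] = votes.get(train_y[i], 0) + 1
    return max(votes, key=votes.get)

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def train_linear_svm(xs, ys, lam=0.01, eta=0.1, epochs=200):
    """Binary linear SVM (labels +1/-1), no bias term.

    Minimises lam/2 * ||w||^2 + hinge loss by per-sample subgradient steps.
    """
    w = [0.0] * len(xs[0])
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            margin = y * dot(w, x)
            w = [wi * (1 - eta * lam) for wi in w]  # regulariser step
            if margin < 1:                           # hinge loss is active
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w

def svm_predict(w, x):
    return 1 if dot(w, x) >= 0 else -1

# Toy 2-bin histograms: class +1 favours bin 0, class -1 favours bin 1.
xs = [(0.8, 0.2), (0.9, 0.1), (0.7, 0.3), (0.2, 0.8), (0.1, 0.9), (0.3, 0.7)]
ys = [1, 1, 1, -1, -1, -1]
w = train_linear_svm(xs, ys)
```

KNN stores every training vector and compares against all of them at query time, whereas the linear SVM reduces the training set to a single weight vector; this sketch does not reproduce the accuracy gap reported above.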

Folder Tree

  • docs contains documentation and paper
  • release contains implementation code and libraries
  • data contains image data


  1. MATLAB R2017a
  2. vlfeat-0.9.20
  3. Intel i7, 16GB RAM, Windows 10


  1. The Pyramid Match Kernel: Discriminative Classification with Sets of Image Features, Kristen Grauman and Trevor Darrell
  2. Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories, Svetlana Lazebnik, Cordelia Schmid, Jean Ponce
  3. Distinctive Image Features from Scale-Invariant Keypoints, David G. Lowe
  4. VLFeat: An Open and Portable Library of Computer Vision Algorithms, Andrea Vedaldi, Brian Fulkerson