This project presents a detailed implementation of category-based scene recognition using geometric correspondence of interest points detected on images with SIFT (Scale-Invariant Feature Transform). These interest points are derived from the changing gradient of pixel intensities and are summarized into local keypoint descriptors for each image. We use a variant of SIFT known as Dense SIFT, which is reported to perform better for object categorization. Our proposed system builds a bag-of-visual-words representation and uses an SVM classifier together with spatial pyramid matching to classify landscape categories. For classification, we tried SVM with different kernels as well as KNN, and achieved the highest accuracy with a linear SVM.
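The bag-of-visual-words step above can be sketched in miniature: cluster descriptors into a codebook with k-means, then represent each image as a histogram of its nearest visual words. This is a hypothetical pure-Python toy (2-D "descriptors" instead of 128-D SIFT vectors; the actual project uses VLFeat in MATLAB), and the function names `kmeans` and `bow_histogram` are illustrative, not from the project code:

```python
import random
from math import dist

def kmeans(descriptors, k, iters=20, seed=0):
    """Cluster descriptors into k visual words (codebook centroids)."""
    rng = random.Random(seed)
    centroids = rng.sample(descriptors, k)
    for _ in range(iters):
        # Assign each descriptor to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for d in descriptors:
            idx = min(range(k), key=lambda i: dist(d, centroids[i]))
            clusters[idx].append(d)
        # Recompute each centroid as the mean of its cluster.
        for i, c in enumerate(clusters):
            if c:
                centroids[i] = tuple(sum(x) / len(c) for x in zip(*c))
    return centroids

def bow_histogram(descriptors, centroids):
    """Represent an image as counts of its nearest visual words."""
    hist = [0] * len(centroids)
    for d in descriptors:
        idx = min(range(len(centroids)), key=lambda i: dist(d, centroids[i]))
        hist[idx] += 1
    return hist

# Toy 2-D "descriptors" drawn from two well-separated blobs.
descs = [(0.1, 0.2), (0.0, 0.1), (0.2, 0.0), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
codebook = kmeans(descs, k=2)
print(bow_histogram(descs, codebook))  # counts per visual word
```

The resulting histograms are what the SVM (or KNN) classifier operates on, optionally after spatial pyramid pooling.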
The project report can be found at the report link
General flow of the implementation
Four Steps of Constructing the Visual Words, courtesy ISRN AI
Feature extraction using SIFT
Keypoint detection using SIFT
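At the heart of a SIFT descriptor cell is a histogram of gradient orientations, weighted by gradient magnitude. The following is a minimal hypothetical sketch of that idea for a single grayscale patch (pure Python, not VLFeat's actual implementation, which also adds scale-space detection, Gaussian weighting, and normalization):

```python
import math

def orientation_histogram(patch, bins=8):
    """Magnitude-weighted histogram of gradient orientations over a
    grayscale patch (the core idea behind one SIFT descriptor cell)."""
    h, w = len(patch), len(patch[0])
    hist = [0.0] * bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            dx = patch[y][x + 1] - patch[y][x - 1]   # horizontal gradient
            dy = patch[y + 1][x] - patch[y - 1][x]   # vertical gradient
            mag = math.hypot(dx, dy)
            angle = math.atan2(dy, dx) % (2 * math.pi)
            hist[int(angle / (2 * math.pi) * bins) % bins] += mag
    return hist

# Toy patch with a vertical edge: all gradients point horizontally,
# so the magnitude concentrates in the 0-radian bin.
patch = [[0, 0, 10, 10]] * 4
print(orientation_histogram(patch))
```

A full SIFT descriptor concatenates such histograms from a 4x4 grid of cells around the keypoint, giving the familiar 128-dimensional vector (4 x 4 x 8 bins).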
We used two datasets of images: the first is already available on the web, and the second was created by us from photographs taken at different locations at SUNY Buffalo:
- SUN Dataset: We worked with the SUN dataset, which consists of 8 categories with a total of 1509 images. Of these, 160 images were used for testing and the remaining 1349 for training.
Comparison of accuracy between the methods used: 1) Spatial Pyramid Matching (Level 3) with K-Means Clustering (HW1) and 2) SIFT descriptors with SVM (proposed method)
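Spatial pyramid matching augments the global bag-of-words histogram with per-cell histograms at increasingly fine grids, weighted so that matches in finer cells count more (Lazebnik et al.). A hypothetical pure-Python sketch, assuming each visual word comes with its normalized (x, y) position in the image (the `spatial_pyramid` function name and the tuple format are illustrative):

```python
def spatial_pyramid(words, vocab_size, levels=3):
    """Concatenate weighted per-cell histograms over a 3-level pyramid.
    `words` is a list of (x, y, word_id) with x, y in [0, 1)."""
    feature = []
    for l in range(levels):
        cells = 2 ** l  # 1x1, 2x2, 4x4 grids for levels 0, 1, 2
        # Pyramid-match weights: coarse levels count less.
        weight = 1.0 / 2 ** (levels - 1) if l == 0 else 1.0 / 2 ** (levels - l)
        hists = [[0.0] * vocab_size for _ in range(cells * cells)]
        for x, y, w in words:
            cx = min(int(x * cells), cells - 1)
            cy = min(int(y * cells), cells - 1)
            hists[cy * cells + cx][w] += weight
        for h in hists:
            feature.extend(h)
    return feature

words = [(0.1, 0.1, 0), (0.9, 0.9, 1), (0.6, 0.2, 0)]
feat = spatial_pyramid(words, vocab_size=2, levels=3)
print(len(feat))  # 2 * (1 + 4 + 16) = 42
```

For a 3-level pyramid the feature length is vocab_size x (1 + 4 + 16); this longer, spatially aware vector is what the SVM consumes.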
Scene Accuracy on UB Campus Data
Best Accuracy achieved for UB Campus Data
Confusion matrix plot for the SUN dataset at K = 180 and Alpha = 120; accuracy 83.75%, using an SVM classifier with SIFT
Confusion matrix plot for the UB Campus dataset at K = 180 and Alpha = 500; accuracy 93.75%, using an SVM classifier with SIFT
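The per-dataset accuracies above are read off confusion matrices. A minimal sketch of how such a matrix and its accuracy are computed (pure Python, with made-up category names; not the project's MATLAB plotting code):

```python
def confusion_matrix(y_true, y_pred, labels):
    """Rows = true class, columns = predicted class."""
    idx = {c: i for i, c in enumerate(labels)}
    m = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        m[idx[t]][idx[p]] += 1
    return m

def accuracy(m):
    """Fraction of samples on the matrix diagonal."""
    total = sum(sum(row) for row in m)
    correct = sum(m[i][i] for i in range(len(m)))
    return correct / total

labels = ["beach", "forest", "mountain"]
y_true = ["beach", "forest", "forest", "mountain"]
y_pred = ["beach", "forest", "mountain", "mountain"]
m = confusion_matrix(y_true, y_pred, labels)
print(accuracy(m))  # 0.75
```

Off-diagonal entries show which scene categories are confused with each other, which is what the plots for the SUN and UB Campus datasets visualize.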
In this project, we attempted to improve scene-classification performance using different methods for feature extraction and classification. For feature extraction, we used SIFT and SURF for keypoint detection on images. We found that SURF is more efficient than SIFT, but SIFT is more robust: using SURF we got an accuracy of 74%, while with SIFT we achieved 86%. We also tried different classifiers, such as linear SVM and KNN; linear SVM provided far better accuracy than KNN.
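The KNN baseline mentioned above can be sketched in a few lines: classify a feature vector by majority vote among its k nearest training examples. This is a hypothetical pure-Python illustration over toy bag-of-words histograms, not the project's MATLAB classifier:

```python
from collections import Counter
from math import dist

def knn_predict(train, query, k=3):
    """Majority vote of the k nearest training examples
    (Euclidean distance). `train` is a list of (features, label)."""
    neighbors = sorted(train, key=lambda ex: dist(ex[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Toy bag-of-words histograms over a hypothetical 3-word vocabulary.
train = [
    ((5, 1, 0), "forest"), ((4, 2, 0), "forest"), ((6, 0, 1), "forest"),
    ((0, 1, 5), "beach"), ((1, 0, 6), "beach"), ((0, 2, 4), "beach"),
]
print(knn_predict(train, (5, 1, 1)))  # "forest"
```

Unlike a linear SVM, KNN draws no explicit decision boundary and must keep all training histograms at test time, which is one reason it scaled and generalized worse in our experiments.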
- docs contains documentation and paper
- release contains implementation code and libraries
- data contains image data
- MATLAB R2017a
- Intel i7, 16GB RAM, Windows 10
- The Pyramid Match Kernel: Discriminative Classification with Sets of Image Features, Kristen Grauman and Trevor Darrell
- Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories, Svetlana Lazebnik, Cordelia Schmid, Jean Ponce
- Distinctive Image Features from Scale-Invariant Keypoints, David G. Lowe
- VLFeat: An Open and Portable Library of Computer Vision Algorithms, Andrea Vedaldi, Brian Fulkerson