Project work for the UNIBO Computer Vision M course. Group members:
For a detailed report, consult this document.
The aim of this project is to develop a Computer Vision algorithm for the recognition of cereal boxes on store shelves, given:
- a set of Scene images, which can be found in this repository at `.\images\scenes`, depicting store shelves with cereal boxes in different setups;
- a set of Model images, which can be found in this repository at `.\images\models`, representing various cereal boxes, which serve as the templates that the algorithm searches for in the scenes.
The scene images are categorized as easy, medium or hard, depending on the quality of the image, the number of objects represented and the presence of nuisances. Three separate pipelines were therefore developed, one for each difficulty level.
The first subset of scenes contains only a limited number of boxes, each appearing only once, at a high enough resolution. For this scenario, the pipeline is:
- SIFT feature detection and FLANN matching
- Match validation
The evaluation process is very efficient and does not have a significant impact on execution time.
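As an illustration of the easy pipeline, here is a minimal sketch that assumes OpenCV's Python bindings (`cv2`) are used for SIFT and FLANN. The report does not spell out how matches are validated, so the ratio test and the minimum match count below are assumptions, and `LOWE_RATIO`, `MIN_MATCH_COUNT` and all function names are hypothetical.

```python
from types import SimpleNamespace

LOWE_RATIO = 0.7        # hypothetical ratio-test threshold
MIN_MATCH_COUNT = 10    # hypothetical minimum number of good matches


def ratio_test(knn_matches, ratio=LOWE_RATIO):
    """Keep a match only if it is clearly better than the runner-up."""
    good = []
    for pair in knn_matches:
        if len(pair) < 2:
            continue  # knnMatch can return fewer than k neighbours
        best, second = pair[0], pair[1]
        if best.distance < ratio * second.distance:
            good.append(best)
    return good


def is_detected(good_matches, min_count=MIN_MATCH_COUNT):
    """Assumed validation rule: enough good matches means the box is present."""
    return len(good_matches) >= min_count


def match_model_to_scene(model_gray, scene_gray):
    """Detect SIFT features in both images and match them with FLANN."""
    import cv2  # imported here so the helpers above work without OpenCV

    sift = cv2.SIFT_create()
    _, des_model = sift.detectAndCompute(model_gray, None)
    _, des_scene = sift.detectAndCompute(scene_gray, None)
    if des_model is None or des_scene is None:
        return []
    # algorithm=1 selects the KD-tree index (FLANN_INDEX_KDTREE)
    flann = cv2.FlannBasedMatcher(dict(algorithm=1, trees=5), dict(checks=50))
    return ratio_test(flann.knnMatch(des_model, des_scene, k=2))


# Tiny demo with stand-in match objects (only `.distance` is needed).
demo_pairs = [
    (SimpleNamespace(distance=10.0), SimpleNamespace(distance=100.0)),  # kept
    (SimpleNamespace(distance=90.0), SimpleNamespace(distance=100.0)),  # ambiguous
    (SimpleNamespace(distance=5.0),),                                   # no runner-up
]
demo_good = ratio_test(demo_pairs)
```

In practice a geometric check, for example a homography estimated with RANSAC, is often layered on top of the match count, but whether this project does so is not stated here.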
The second subset of images contains a larger number of boxes, possibly with multiple instances of each box. The pipeline consists of:
- SIFT feature detection and FLANN matching
- Generalized Hough Transform
- Match validation
The adopted strategy yields good results, correctly finding all the cereal boxes in each scene.
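The voting step of a keypoint-based Generalized Hough Transform can be sketched as follows. This is a simplified illustration, not the project's actual code: keypoints are plain `(x, y, size, angle_in_degrees)` tuples standing in for OpenCV keypoints, and `bin_size` and `min_votes` are hypothetical parameters. Each match casts a vote for where the model's center would fall in the scene, so several instances of the same box produce several separate clusters of votes.

```python
import math
from collections import defaultdict


def vote_for_centers(matches, model_center, bin_size=20.0):
    """Accumulate votes for the model center position in the scene.

    matches: list of (model_kp, scene_kp), each kp = (x, y, size, angle_deg).
    Returns a dict mapping accumulator cell -> indices of the voting matches.
    """
    accumulator = defaultdict(list)
    cx, cy = model_center
    for idx, (model_kp, scene_kp) in enumerate(matches):
        mx, my, m_size, m_angle = model_kp
        sx, sy, s_size, s_angle = scene_kp
        scale = s_size / m_size
        theta = math.radians(s_angle - m_angle)
        # Joining vector from the model keypoint to the model center,
        # rotated and scaled to the scene keypoint's frame.
        vx, vy = cx - mx, cy - my
        px = sx + scale * (vx * math.cos(theta) - vy * math.sin(theta))
        py = sy + scale * (vx * math.sin(theta) + vy * math.cos(theta))
        cell = (int(px // bin_size), int(py // bin_size))
        accumulator[cell].append(idx)
    return accumulator


def find_instances(accumulator, min_votes=3):
    """Each sufficiently voted cell is taken as one instance of the model."""
    return [votes for votes in accumulator.values() if len(votes) >= min_votes]


# Demo: one model, two instances in the scene (one shifted, one shifted and 2x larger).
model_center = (50.0, 50.0)
model_kps = [(10.0, 10.0, 4.0, 0.0), (90.0, 10.0, 4.0, 0.0), (50.0, 90.0, 4.0, 0.0)]
scene_a = [(110.0, 10.0, 4.0, 0.0), (190.0, 10.0, 4.0, 0.0), (150.0, 90.0, 4.0, 0.0)]
scene_b = [(330.0, 30.0, 8.0, 0.0), (490.0, 30.0, 8.0, 0.0), (410.0, 190.0, 8.0, 0.0)]
demo_matches = list(zip(model_kps, scene_a)) + list(zip(model_kps, scene_b))
demo_instances = find_instances(vote_for_centers(demo_matches, model_center))
```

Hard binning is the simplest choice but can split a cluster that straddles a cell boundary; voting into neighbouring cells or clustering the raw votes are common alternatives.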
The last subset of images depicts a very large number of boxes, around 40, on multiple shelves, at low resolution and with distractor elements such as price tags. This last pipeline consists of:
- Shelf splitting
- Sub-scene processing
  - SIFT feature detection and FLANN matching
  - Generalized Hough Transform
  - Match validation
Despite some imperfections, the number of correctly labeled boxes is satisfactory overall.
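How the shelves are split is not described here, so the following is only one plausible sketch under stated assumptions: a per-row "activity" score (for instance edge energy or keypoint count per image row, both hypothetical choices) is thresholded to find horizontal bands, each band is processed as its own sub-scene, and the resulting boxes are shifted back into full-scene coordinates. The names `split_into_bands`, `process_sub_scenes` and the `detect` callable are illustrative.

```python
def split_into_bands(row_activity, threshold, min_height=1):
    """Return (top, bottom) row ranges, bottom exclusive, with activity >= threshold."""
    bands = []
    start = None
    for y, value in enumerate(row_activity):
        if value >= threshold and start is None:
            start = y                      # a new band begins
        elif value < threshold and start is not None:
            if y - start >= min_height:    # discard bands that are too thin
                bands.append((start, y))
            start = None
    if start is not None and len(row_activity) - start >= min_height:
        bands.append((start, len(row_activity)))
    return bands


def process_sub_scenes(scene_rows, bands, detect):
    """Run `detect` on each band and map its boxes back to the full scene.

    detect(sub_scene) must return (label, (x, y, w, h)) tuples with
    coordinates relative to the sub-scene.
    """
    results = []
    for top, bottom in bands:
        sub_scene = scene_rows[top:bottom]  # also works on NumPy image arrays
        for label, (x, y, w, h) in detect(sub_scene):
            results.append((label, (x, y + top, w, h)))
    return results


# Demo with a fake 11-row scene and a fake detector.
demo_activity = [0, 0, 5, 6, 7, 0, 0, 8, 9, 0, 4]
demo_bands = split_into_bands(demo_activity, threshold=3, min_height=2)
demo_scene = [[0] * 4 for _ in demo_activity]
demo_results = process_sub_scenes(
    demo_scene, demo_bands, lambda sub: [("box", (1, 0, 2, len(sub)))]
)
```

Processing each shelf separately keeps the number of candidate boxes per sub-scene closer to what the medium pipeline already handles.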
The pipelines can be executed on the corresponding subsets by using the options `-e` for the easy scenes, `-m` for the medium scenes, and `-h` for the hard scenes.
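For reference, here is a hedged sketch of how these three options could be parsed with Python's standard `argparse` module. Whether the project uses `argparse`, and whether the options are mutually exclusive, are assumptions; `build_parser` and the `difficulty` attribute are hypothetical names. One detail is worth noting: `argparse` reserves `-h` for help by default, so the built-in help flag has to be disabled before `-h` can mean "hard".

```python
import argparse


def build_parser():
    # add_help=False frees "-h", which argparse otherwise binds to --help.
    parser = argparse.ArgumentParser(
        description="Cereal box recognition on store shelves.",
        add_help=False,
    )
    # Assumption: exactly one difficulty level is selected per run.
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument("-e", dest="difficulty", action="store_const",
                       const="easy", help="run the pipeline on the easy scenes")
    group.add_argument("-m", dest="difficulty", action="store_const",
                       const="medium", help="run the pipeline on the medium scenes")
    group.add_argument("-h", dest="difficulty", action="store_const",
                       const="hard", help="run the pipeline on the hard scenes")
    # Help stays reachable through the long option only.
    parser.add_argument("--help", action="help",
                        help="show this help message and exit")
    return parser


# Parse an explicit argument list instead of sys.argv for the demo.
demo_args = build_parser().parse_args(["-h"])
```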