Leaflet-Product-Classification

This Git repository contains the code for the paper "Fine-Grained Product Classification on Leaflet Advertisements".

Abstract

We describe the first publicly available fine-grained product recognition dataset based on leaflet images. Using advertisement leaflets collected over several years from different European retailers, we provide a total of 41.6k manually annotated product images in 832 classes. Further, we investigate three approaches to this fine-grained product classification task: classification by image, by text, and by image and text combined. The "Classification by Text" approach uses text extracted directly from the leaflet product images. We show that combining image and text as input improves the classification of products that are visually hard to distinguish. The final model achieves an accuracy of 96.4% and a Top-3 accuracy of 99.2%.

The figure depicts an example of promotions for the same product in the leaflets of two different retailers. Price monitoring based on printed leaflets is a key data analysis task in retail, which can technically be framed as a fine-grained, multi-modal classification problem.
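The abstract reports two metrics: accuracy (the highest-scoring class must be the true class) and Top-3 accuracy (the true class only has to be among the three highest-scoring classes). A minimal sketch of how Top-k accuracy can be computed, in plain Python; the function name and the example scores are made up for illustration and are not taken from the repository or the paper.

```python
from typing import Sequence


def top_k_accuracy(scores: Sequence[Sequence[float]], labels: Sequence[int], k: int = 3) -> float:
    """Return the fraction of samples whose true label is among the k highest-scoring classes."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("at least one sample is required")
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices sorted by descending score; sorted() is stable, so ties keep index order.
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        if label in top_k:
            hits += 1
    return hits / len(labels)


# Made-up scores for 4 samples over 4 classes (not results from the paper).
scores = [
    [0.1, 0.6, 0.2, 0.1],
    [0.4, 0.3, 0.2, 0.1],
    [0.1, 0.2, 0.3, 0.4],
    [0.7, 0.1, 0.1, 0.1],
]
labels = [1, 1, 0, 0]
print(top_k_accuracy(scores, labels, k=1))  # 0.5
print(top_k_accuracy(scores, labels, k=3))  # 0.75
```

Accuracy is the special case k=1, so Top-3 accuracy can never be lower than plain accuracy, which is consistent with the 96.4% and 99.2% figures above.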

Data

The dataset can be found here: Products Leaflets Dataset

Paper

Accepted at the CVPR 2023 Workshop on Fine-Grained Visual Categorization.

A preprint is available on arXiv (2305.03706): Fine-Grained Product Classification on Leaflet Advertisements

Please cite the paper as:

@misc{ladwig2023finegrained,
      title={Fine-Grained Product Classification on Leaflet Advertisements}, 
      author={Daniel Ladwig and Bianca Lamm and Janis Keuper},
      year={2023},
      eprint={2305.03706},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Code

The code is written in Python, and the models are built with PyTorch. It covers image classification, text extraction, text classification, and the combination of the image and text models.
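This README does not state how the image and text models are combined. One common option is late fusion: each model outputs class scores, these are turned into probabilities, and a weighted average decides the final class. The sketch below shows that idea in plain Python; the function names and the `image_weight` parameter are hypothetical, and this is not necessarily the combination used in the paper.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: shift by the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def late_fusion(
    image_logits: Sequence[float],
    text_logits: Sequence[float],
    image_weight: float = 0.5,
) -> List[float]:
    """Weighted average of the class probabilities of an image and a text classifier.

    Hypothetical combination rule; image_weight is an assumed hyperparameter that
    would be tuned on a validation split.
    """
    if len(image_logits) != len(text_logits):
        raise ValueError("both models must score the same set of classes")
    if not 0.0 <= image_weight <= 1.0:
        raise ValueError("image_weight must lie in [0, 1]")
    p_image = softmax(image_logits)
    p_text = softmax(text_logits)
    return [image_weight * a + (1.0 - image_weight) * b for a, b in zip(p_image, p_text)]


def predict(probs: Sequence[float]) -> int:
    """Index of the most probable class (the first index wins on ties)."""
    return max(range(len(probs)), key=lambda i: probs[i])


# Made-up logits: the image model cannot separate classes 0 and 1,
# while the text model (e.g. from OCR'd package text) favors class 1.
image_logits = [2.0, 2.0, -1.0]
text_logits = [0.0, 3.0, 0.0]
print(predict(softmax(image_logits)))                   # 0 (tie resolved by index)
print(predict(late_fusion(image_logits, text_logits)))  # 1
```

The example mirrors the abstract's claim in miniature: when two products look alike, the text signal can break the tie. Feature-level fusion (concatenating image and text embeddings before a shared classifier head) is another common design; which one the repository uses should be checked in `src/models`.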

Installation

Linux:

pip install split-folders
apt install tesseract-ocr
apt-get install tesseract-ocr-deu
pip install pytesseract
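The commands above install the Tesseract OCR engine, its German language data (`tesseract-ocr-deu`), and the `pytesseract` wrapper. A minimal sketch of extracting text from a single product image with `pytesseract.image_to_string` and the `deu` language setting follows. The helper names `extract_text` and `normalize_ocr_text` are hypothetical, and the normalization rule (lowercase, collapse whitespace, drop tokens without letters or digits) is an assumption, not the repository's own preprocessing.

```python
def normalize_ocr_text(raw: str) -> str:
    """Lowercase OCR output, collapse whitespace, and drop tokens without letters or digits."""
    tokens = raw.lower().split()
    # Tokens such as "|" or "--" are typical OCR noise from borders and graphics.
    kept = [t for t in tokens if any(ch.isalnum() for ch in t)]
    return " ".join(kept)


def extract_text(image_path: str, lang: str = "deu") -> str:
    """Run Tesseract on one product image and return normalized text.

    lang="deu" selects the German language data installed by tesseract-ocr-deu.
    """
    # Imported lazily so normalize_ocr_text() can be used without Tesseract installed.
    import pytesseract
    from PIL import Image

    with Image.open(image_path) as image:
        raw = pytesseract.image_to_string(image, lang=lang)
    return normalize_ocr_text(raw)


# Made-up OCR output for illustration:
print(normalize_ocr_text("Bio  Vollmilch\n| 1 Liter\n\n-- 0,99"))
# bio vollmilch 1 liter 0,99
```

Digits and decimal commas are kept on purpose in this sketch, since quantities and prices printed next to a product may help distinguish otherwise similar items.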

Repository structure

The repository (ladwigd/Leaflet-Product-Classification) contains the following top-level entries:

data
notebooks
reports
src/models
README.md