epfl-dlab/wiki_image_classification

Automated Categorization of Wikipedia Images

Besides their regular text, Wikipedia articles are rich multimedia documents containing video, audio, and above all, images. The volume of Wikipedia images is very large: English Wikipedia articles alone contain more than 5 million unique images. As observed in recent work, images are an essential component of Wikipedia readers' experience and attract significant user engagement[1]. At the same time, from a data perspective, the geographical and cultural diversity of Wikipedia makes its image data unprecedented and extremely valuable for the research community. Despite this volume and value, navigating, retrieving, and re-using visual content on Wikipedia is hard because labels, categories, and metadata are lacking. Classifying this content for research and editing purposes is therefore becoming increasingly important. Unfortunately, the uniqueness of the data has a downside: common off-the-shelf classification models based on ImageNet give unsatisfactory results, so a custom solution is required.

This project is inspired by ORES, its textual counterpart, and has a two-fold goal: 1) develop a classification taxonomy to label images on Wikipedia, and 2) develop a model for image classification and embedding. The first part requires familiarity with semantic network data such as the Wikipedia/Commons category network, and aims to identify the best way to label images on Wikipedia based on existing metadata (e.g. Wikipedia/Commons templates, categories, and tags). The second part focuses on training and evaluating a deep learning model that predicts, for each label in a selected label set, whether that label applies to an image (binary relevance).

More info: https://meta.wikimedia.org/wiki/Research:Automated_Categorization_of_Wikipedia_Images

Taxonomy
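
As an illustration of how existing category metadata might be turned into labels, the sketch below walks up a small category graph from an image's categories and collects every taxonomy label reached within a depth limit. This is only one plausible approach, not necessarily the one used in this project. The graph, the category-to-label mapping, the function name `labels_for_image`, and the `max_depth` parameter are all hypothetical and chosen for the example.

```python
from collections import deque

# Hypothetical toy graph: category -> parent categories (not real Commons data).
PARENTS = {
    "Gothic cathedrals in France": ["Gothic cathedrals", "Cathedrals in France"],
    "Gothic cathedrals": ["Cathedrals"],
    "Cathedrals in France": ["Cathedrals", "Buildings in France"],
    "Cathedrals": ["Religious buildings"],
    "Religious buildings": ["Architecture"],
    "Buildings in France": ["Architecture", "France"],
}

# Hypothetical mapping from selected categories to taxonomy labels.
CATEGORY_TO_LABEL = {
    "Architecture": "Architecture",
    "France": "Places",
}


def labels_for_image(categories, parents, category_to_label, max_depth=4):
    """Breadth-first walk up the category graph, collecting taxonomy labels.

    The depth limit is an assumption: it keeps very distant ancestors from
    contributing labels, since category graphs tend to drift in meaning.
    """
    labels = set()
    seen = set(categories)
    queue = deque((category, 0) for category in categories)
    while queue:
        category, depth = queue.popleft()
        if category in category_to_label:
            labels.add(category_to_label[category])
        if depth == max_depth:
            continue  # do not expand beyond the depth limit
        for parent in parents.get(category, []):
            if parent not in seen:  # the graph may contain shared ancestors
                seen.add(parent)
                queue.append((parent, depth + 1))
    return labels
```

Calling `labels_for_image(["Gothic cathedrals in France"], PARENTS, CATEGORY_TO_LABEL)` on this toy graph collects both `"Architecture"` and `"Places"`, while a smaller `max_depth` stops before reaching any labeled ancestor.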

Classification
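
Binary relevance treats a multi-label problem as one independent yes/no decision per label. The sketch below shows the idea with plain Python: each label gets its own sigmoid score, the loss is the mean binary cross-entropy over labels, and a threshold turns scores into predicted labels. The label names, the function names, and the 0.5 threshold are assumptions made for illustration; a real model would produce the logits from image features with a deep learning framework.

```python
import math

# Hypothetical label set, for illustration only.
LABELS = ["Architecture", "Nature", "People"]


def sigmoid(x):
    """Map a raw score (logit) to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def binary_relevance_loss(logits, targets):
    """Mean binary cross-entropy, with each label scored independently."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for z, y in zip(logits, targets):
        p = sigmoid(z)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(logits)


def predict_labels(logits, labels=LABELS, threshold=0.5):
    """Keep every label whose independent probability reaches the threshold."""
    return [name for name, z in zip(labels, logits) if sigmoid(z) >= threshold]
```

For example, `predict_labels([2.0, -1.0, 0.3])` keeps `"Architecture"` and `"People"`, because only those two logits map to a probability of at least 0.5. Since labels are decided independently, an image can receive several labels or none.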
