Code for our COLING 2022 paper Multilingual and Multimodal Topic Modelling with Pretrained Embeddings
We present M3L-Contrast, a novel multimodal multilingual (M3L) neural topic model for comparable data that maps multilingual texts and images into a shared topic space using a contrastive objective. As a multilingual topic model, it produces aligned language-specific topics, and as a multimodal model, it infers textual representations of semantic concepts in images. We also show that our model performs almost as well on unaligned embeddings as it does on aligned embeddings.
Our proposed topic model is:
- multilingual
- multimodal (image-text)
- multimodal and multilingual (M3L)
Our model is based on the Contextualized Topic Model (Bianchi et al., 2021).
We use the PyTorch Metric Learning library for the InfoNCE/NT-Xent loss.
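To illustrate the contrastive objective that aligns paired text and image representations, here is a minimal NumPy sketch of a symmetric InfoNCE/NT-Xent loss. This is a self-contained illustration, not the repository's implementation (which uses the PyTorch Metric Learning library); the function name, temperature default, and NumPy formulation are our own choices for the example.

```python
import numpy as np

def info_nce(text_emb, image_emb, temperature=0.07):
    """Symmetric InfoNCE (NT-Xent) over paired vectors.

    Rows of text_emb and image_emb at the same index are treated as
    positive pairs; all other rows in the batch act as negatives.
    """
    # L2-normalize so dot products are cosine similarities
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    v = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    logits = t @ v.T / temperature      # (N, N); positives on the diagonal
    idx = np.arange(len(t))

    def xent(l):
        # row-wise softmax cross-entropy against the diagonal targets
        l = l - l.max(axis=1, keepdims=True)
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[idx, idx].mean()

    # average of text-to-image and image-to-text directions
    return (xent(logits) + xent(logits.T)) / 2
```

Minimizing this loss pulls each text embedding toward its paired image embedding while pushing it away from the other images in the batch, which is what maps the two modalities (and the different languages) into a shared topic space.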
Our dataset consists of:
- Aligned articles from the Wikipedia Comparable Corpora
- Images from the WIT dataset
- We will release the article titles and image URLs in the train and test sets (soon!)
We share some of the models we trained:
- M3L topic model trained with CLIP embeddings for texts and images
- M3L topic model trained with multilingual SBERT for text and CLIP for images
- M3L topic model trained with monolingual SBERT models for the English and German texts and CLIP for images
@inproceedings{zosa-pivovarova-2022-multilingual,
title = "Multilingual and Multimodal Topic Modelling with Pretrained Embeddings",
author = "Zosa, Elaine and Pivovarova, Lidia",
booktitle = "Proceedings of the 29th International Conference on Computational Linguistics",
month = oct,
year = "2022",
address = "Gyeongju, Republic of Korea",
publisher = "International Committee on Computational Linguistics",
url = "https://aclanthology.org/2022.coling-1.355",
pages = "4037--4048",
}