This repository has been archived by the owner on Feb 27, 2020. It is now read-only.

Implementation of joint text and image representations, using VSE++ losses, and implementation of t-SNE

swasun/Joint-Text-and-Image-Representation

Overview

This project contains an implementation of the VSE++ losses [Faghri, Fartash et al., 2017], a technique for learning visual-semantic embeddings for cross-modal retrieval, and an implementation of t-SNE [van der Maaten et al., 2008]. It was written as a school project for the Signal Learning and Multimedia class in 2019.

The implementation is applied to the MSCOCO image captioning dataset [Lin, Tsung-Yi et al., 2014], specifically the val2014 data, which contains a set of 40k images annotated with five captions each. We also used ResNet-50 features [He, Kaiming et al., 2016] and GloVe embeddings [Pennington, Jeffrey et al., 2014].

A good introduction to representation learning is [Bengio, Y. et al., 2013].
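
For readers new to VSE++, the max-of-hinges ranking loss described in the paper can be sketched roughly as follows. This is a minimal NumPy illustration, not the code of this repository: the function name `vsepp_loss`, its arguments, and the default margin of 0.2 are assumptions made for the example. It expects one caption embedding per image, with row i of each matrix forming a matching pair.

```python
import numpy as np


def vsepp_loss(image_emb, caption_emb, margin=0.2):
    """Hypothetical sketch of the VSE++ max-of-hinges loss for one batch.

    image_emb, caption_emb: arrays of shape (N, D); row i of each is a matching pair.
    margin: hinge margin (0.2 is an assumed default for this sketch).
    """
    # L2-normalize so that the dot product is a cosine similarity.
    image_emb = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    caption_emb = caption_emb / np.linalg.norm(caption_emb, axis=1, keepdims=True)

    # scores[i, j] = similarity between image i and caption j.
    scores = image_emb @ caption_emb.T
    positives = np.diag(scores)

    # Hinge cost of every negative caption for each image (row-wise),
    # and of every negative image for each caption (column-wise).
    cost_captions = np.maximum(0.0, margin + scores - positives[:, None])
    cost_images = np.maximum(0.0, margin + scores - positives[None, :])

    # Matching pairs are not negatives, so their cost is zeroed.
    np.fill_diagonal(cost_captions, 0.0)
    np.fill_diagonal(cost_images, 0.0)

    # VSE++ keeps only the hardest negative per query instead of summing all of them.
    return cost_captions.max(axis=1).sum() + cost_images.max(axis=0).sum()
```

Calling `vsepp_loss(images, captions)` on two `(N, D)` arrays returns a scalar. Replacing the two `max` reductions with sums over all negatives would give the sum-of-hinges loss that VSE++ is compared against.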

Features

Installation

The project requires python3, python3-pip, the packages listed in requirements.txt, and a recent version of git that supports git-lfs.

To install the required packages:

pip3 install -r requirements.txt

Usage

A notebook is available, and each feature is illustrated by an example in the test directory.

References

Authors

  • Charly Lamothe
  • Guillaume Ollier
  • Balthazar Casalé
