This repository has been archived by the owner on Dec 29, 2022. It is now read-only.

Support Image Captioning (and other tasks) #12

Closed
dennybritz opened this issue Mar 4, 2017 · 1 comment

dennybritz commented Mar 4, 2017

In theory, supporting Image Captioning should be as easy as swapping out the encoder for an image network such as ResNet or Inception (e.g. tensorflow.contrib.slim.python.slim.nets.inception_v3). In practice, a few things need to happen to support problems other than text-to-text.
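For concreteness, here is a rough sketch of what such an encoder swap might look like; the `encode_images` function and the flattening of the InceptionV3 feature grid into a 64-step "sequence" are illustrative assumptions, not existing code in this repo:

```python
# A sketch only: encode_images and the reshape convention are assumptions.
import tensorflow as tf
from tensorflow.contrib import slim
from tensorflow.contrib.slim.python.slim.nets import inception_v3


def encode_images(images):
  """Encodes a batch of images into feature vectors a decoder can attend to.

  Args:
    images: float tensor of shape [batch, 299, 299, 3].
  Returns:
    A [batch, 64, 2048] tensor: the 8x8 InceptionV3 feature grid,
    flattened so each spatial position acts like one encoder time step.
  """
  with slim.arg_scope(inception_v3.inception_v3_arg_scope()):
    net, _ = inception_v3.inception_v3_base(images)  # [batch, 8, 8, 2048]
  batch_size = tf.shape(images)[0]
  return tf.reshape(net, [batch_size, 64, 2048])
```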

  • Currently, the parameters to the train/inference scripts are specific to text sequence-to-sequence, e.g. source_vocabulary, source_delimiter, etc. We probably need another abstraction layer that defines what kind of task the user is solving and adjusts flags/parameters based on it. For example, I could imagine a Task class with TextToText, ImageToText, ... subclasses. The user would then pass the task type as part of the config, and the task class would be responsible for setting the appropriate parameters and creating the model (see the first sketch after this list).
  • Support for pre-trained networks. For example, when training image captioning models one typically initializes the encoder with the weights of a pre-trained image classification network. This can probably be done through some kind of SessionRunHook that loads a subset of the variables (see the second sketch below). In other words, the hooks used in the training script must be configurable.
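A minimal sketch of the proposed Task abstraction; every class, method, and config key below is hypothetical design discussion, not an existing seq2seq API:

```python
# All names below are hypothetical; this is a design sketch, not repo code.

class Task(object):
  """A task knows which parameters it needs and how to build its model."""

  def default_params(self):
    raise NotImplementedError()

  def create_model(self, params, mode):
    raise NotImplementedError()


class TextToText(Task):
  def default_params(self):
    # Text-specific flags move out of the training script and in here.
    return {"source_vocabulary": None, "source_delimiter": " "}


class ImageToText(Task):
  def default_params(self):
    # No source vocabulary or delimiter; image options instead.
    return {"image_size": 299, "encoder": "inception_v3"}


# The training script resolves the task from the user's config.
TASKS = {"text_to_text": TextToText, "image_to_text": ImageToText}

def make_task(config):
  return TASKS[config["task"]]()
```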
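And a sketch of the pre-trained-weights hook, built on the standard tf.train.SessionRunHook and tf.train.Saver APIs; the class name and the InceptionV3 scope are assumptions:

```python
import tensorflow as tf


class RestorePretrainedHook(tf.train.SessionRunHook):
  """Hypothetical hook that restores a subset of variables (e.g. the
  image encoder) from a pre-trained checkpoint after session creation."""

  def __init__(self, checkpoint_path, scope="InceptionV3"):
    self._checkpoint_path = checkpoint_path
    self._scope = scope
    self._saver = None

  def begin(self):
    # Only variables under the encoder scope are restored; everything
    # else keeps its default initialization.
    var_list = tf.get_collection(
        tf.GraphKeys.GLOBAL_VARIABLES, scope=self._scope)
    self._saver = tf.train.Saver(var_list=var_list)

  def after_create_session(self, session, coord):
    self._saver.restore(session, self._checkpoint_path)
```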
ayushidalmia commented

@dennybritz What is the status of this one?
