Merge pull request #5 from YknZhu/master
Adding CPU support
ryankiros committed Nov 14, 2015
2 parents 95f8359 + b626b22 commit 61e12a7
Showing 3 changed files with 54 additions and 17 deletions.
20 changes: 15 additions & 5 deletions README.md
@@ -59,14 +59,15 @@ This code is written in python. To use it you will need:
* [Lasagne](https://github.com/Lasagne/Lasagne)
* A version of Theano that Lasagne supports


Note that a GPU is required for the default Lasagne/Theano pipeline. For running on CPU, you will instead need to install [Caffe](http://caffe.berkeleyvision.org) and its Python interface.
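
For a quick sanity check that the Caffe Python interface is usable, something like the following should work (a minimal sketch; the pycaffe path is a placeholder for wherever Caffe was built, and it mirrors what `generate.load_all()` does in CPU mode further down):

    import sys
    sys.path.insert(0, '/path/to/caffe/python')  # placeholder, same role as paths['pycaffe'] in config.py
    import caffe                                 # should import cleanly once Caffe and pycaffe are installed
    caffe.set_mode_cpu()                         # the call generate.load_all() makes when FLAG_CPU_MODE is True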



## Getting started


You will first need to download some pre-trained models and style vectors. Most of the materials are available in a single compressed file, which you can obtain by running


wget http://www.cs.toronto.edu/~rkiros/neural_storyteller.zip

Included is a pre-trained decoder on romance novels, the decoder dictionary, caption and romance style vectors, MS COCO training captions and a pre-trained image-sentence embedding model.


Next, you need to obtain the pre-trained skip-thoughts encoder. Go [here](https://github.com/ryankiros/skip-thoughts) and follow the instructions on the main page to obtain the pre-trained model.
@@ -77,6 +78,15 @@ Finally, we need the VGG-19 ConvNet parameters. You can obtain them by running


Note that this model is for non-commercial use only. Once you have all the materials, open `config.py` and specify the locations of all of the models and style vectors that you downloaded.


For running on CPU, you will also need to download the VGG-19 prototxt and caffemodel by running:

wget http://www.robots.ox.ac.uk/~vgg/software/very_deep/caffe/VGG_ILSVRC_19_layers.caffemodel
wget https://gist.githubusercontent.com/ksimonyan/3785162f95cd2d5fee77/raw/bb2b4fe0a9bb0669211cf3d0bc949dfdda173e9e/VGG_ILSVRC_19_layers_deploy.prototxt

You also need to set the pycaffe and model paths in `config.py`, and set the flag on line 8 as follows:

FLAG_CPU_MODE = True
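
Taken together, the CPU-related entries in `config.py` end up looking roughly like this (a sketch; the paths are placeholders and should point at your own pycaffe build and the two files downloaded above):

    FLAG_CPU_MODE = True

    paths['pycaffe'] = '/path/to/caffe/python'
    paths['vgg_proto_caffe'] = '/path/to/VGG_ILSVRC_19_layers_deploy.prototxt'
    paths['vgg_model_caffe'] = '/path/to/VGG_ILSVRC_19_layers.caffemodel'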

## Generating a story


The images directory contains some sample images that you can try the model on. In order to generate a story, open IPython and run the following:
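
(The exact snippet is collapsed in this view; the sketch below is inferred from the `load_all` and `story` functions in `generate.py` further down.)

    import generate
    z = generate.load_all()                # loads skip-thoughts, decoder, embedding, ConvNet, captions and biases
    generate.story(z, './images/ex1.jpg')  # k and bw keep their defaults (k=100, bw=50)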
@@ -98,7 +108,7 @@ where k is the number of captions to condition on and bw is the beam width. Thes
If you bias by song lyrics, you can turn on the lyric flag, which splits the output on commas and prints it across multiple lines. `neural_storyteller.zip` contains an additional bias vector called `swift_style.npy`, which is the mean of skip-thought vectors across Taylor Swift lyrics. If you point `path_to_posbias` to this vector in `config.py`, you can generate captions in the style of Taylor Swift lyrics. For example:


generate.story(z, './images/ex1.jpg', lyric=True)

should output


You re the only person on the beach right now
@@ -107,7 +117,7 @@ should output
and when the sea breeze hits me
I thought
Hey

## Reference


This project does not have an associated paper. If you found this code useful, please consider citing:
@@ -120,7 +130,7 @@ Ryan Kiros, Yukun Zhu, Ruslan Salakhutdinov, Richard S. Zemel, Antonio Torralba,
journal={arXiv preprint arXiv:1506.06726},
year={2015}
}

If you also use the BookCorpus data for training new models, please consider citing:


Yukun Zhu, Ryan Kiros, Richard Zemel, Ruslan Salakhutdinov, Raquel Urtasun, Antonio Torralba, Sanja Fidler.
8 changes: 8 additions & 0 deletions config.py
@@ -2,6 +2,10 @@
Configuration for the generate module
"""


#-----------------------------------------------------------------------------#
# Flags for running on CPU
#-----------------------------------------------------------------------------#
FLAG_CPU_MODE = True


#-----------------------------------------------------------------------------#
# Paths to models and biases
@@ -21,6 +25,10 @@


# VGG-19 convnet
paths['vgg'] = '/ais/gobi3/u/rkiros/vgg/vgg19.pkl'
paths['pycaffe'] = '/u/yukun/Projects/caffe-run/python'
paths['vgg_proto_caffe'] = '/ais/guppy9/movie2text/neural-storyteller/models/VGG_ILSVRC_19_layers_deploy.prototxt'
paths['vgg_model_caffe'] = '/ais/guppy9/movie2text/neural-storyteller/models/VGG_ILSVRC_19_layers.caffemodel'



# COCO training captions
paths['captions'] = '/ais/gobi3/u/rkiros/storyteller/coco_train_caps.txt'
43 changes: 31 additions & 12 deletions generate.py
@@ -11,14 +11,15 @@
import decoder
import embedding


import config


import lasagne
from lasagne.layers import InputLayer, DenseLayer, NonlinearityLayer, DropoutLayer
from lasagne.layers import MaxPool2DLayer as PoolLayer
from lasagne.nonlinearities import softmax
from lasagne.utils import floatX
if not config.FLAG_CPU_MODE:
    from lasagne.layers.corrmm import Conv2DMMLayer as ConvLayer


from scipy import optimize, stats
from collections import OrderedDict, defaultdict, Counter
@@ -72,33 +73,44 @@ def story(z, image_loc, k=100, bw=50, lyric=False):
    else:
        print passage



def load_all():
    """
    Load everything we need for generating
    """
    print config.paths['decmodel']


    # Skip-thoughts
    print 'Loading skip-thoughts...'
    stv = skipthoughts.load_model(config.paths['skmodels'],
                                  config.paths['sktables'])


    # Decoder
    print 'Loading decoder...'
    dec = decoder.load_model(config.paths['decmodel'],
                             config.paths['dictionary'])


    # Image-sentence embedding
    print 'Loading image-sentence embedding...'
    vse = embedding.load_model(config.paths['vsemodel'])


    # VGG-19
    print 'Loading and initializing ConvNet...'
    if config.FLAG_CPU_MODE:
        sys.path.insert(0, config.paths['pycaffe'])
        import caffe
        caffe.set_mode_cpu()
        net = caffe.Net(config.paths['vgg_proto_caffe'],
                        config.paths['vgg_model_caffe'],
                        caffe.TEST)
    else:
        net = build_convnet(config.paths['vgg'])


    # Captions
    print 'Loading captions...'
    cap = []
    with open(config.paths['captions'], 'rb') as f:
        for line in f:
            cap.append(line.strip())


@@ -108,8 +120,8 @@ def load_all():


    # Biases
    print 'Loading biases...'
    bneg = numpy.load(config.paths['negbias'])
    bpos = numpy.load(config.paths['posbias'])


    # Pack up
    z = {}
@@ -161,7 +173,14 @@ def compute_features(net, im):
""" """
Compute fc7 features for im Compute fc7 features for im
""" """
    if config.FLAG_CPU_MODE:
        net.blobs['data'].reshape(* im.shape)
        net.blobs['data'].data[...] = im
        net.forward()
        fc7 = net.blobs['fc7'].data
    else:
        fc7 = numpy.array(lasagne.layers.get_output(net['fc7'], im,
                                                    deterministic=True).eval())
    return fc7


def build_convnet(path_to_vgg):
