image2speech

This repo is my attempt to reconstruct all of the stages of the system described in the paper http://www.isle.illinois.edu/sst/pubs/2018/hasegawajohnson_isga18.pdf

It is not yet complete. Currently it:
- downloads the image set, the captions, the speech files, and their forced alignments;
- generates CNN features (cnnfeats) from the images; and
- runs XNMT to train the image-to-phone transducer.

The phone-to-speech transducer is not implemented yet.
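The image-to-phone transducer needs a phone sequence for each spoken caption, and the forced alignments are a natural source for it. The sketch below shows one way to turn an alignment into a whitespace-separated, one-utterance-per-line target file of the kind XNMT can read as plain text. It is illustrative only and is not the code in this repo: the `(start, end, phone)` tuple format, the silence label set, and the choice to strip ARPAbet stress digits are all assumptions, and the function names are hypothetical.

```python
import re
from typing import Iterable, List, Tuple

# Hypothetical alignment format: one (start_sec, end_sec, phone_label) tuple per segment.
Alignment = List[Tuple[float, float, str]]

# Assumption: these are the silence/pause labels to drop; the real set depends on the aligner.
SILENCE = {"sil", "sp", "spn", ""}


def alignment_to_phones(alignment: Alignment, strip_stress: bool = True) -> List[str]:
    """Return the non-silence phone labels of one utterance, in time order."""
    phones = []
    for _start, _end, label in sorted(alignment, key=lambda seg: seg[0]):
        label = label.strip()
        if label.lower() in SILENCE:
            continue
        if strip_stress:
            # ARPAbet vowels carry a trailing stress digit (e.g. "AH0"); drop it.
            label = re.sub(r"\d$", "", label)
        phones.append(label)
    return phones


def write_phone_targets(alignments: Iterable[Alignment], path: str) -> None:
    """Write one space-separated phone sequence per line (a plain-text target file)."""
    with open(path, "w", encoding="utf-8") as f:
        for alignment in alignments:
            f.write(" ".join(alignment_to_phones(alignment)) + "\n")
```

For example, an alignment of "a dog" such as `[(0.0, 0.2, "sil"), (0.2, 0.3, "AH0"), (0.3, 0.45, "D"), (0.45, 0.6, "AO1"), (0.6, 0.7, "G"), (0.7, 0.9, "sp")]` gives `['AH', 'D', 'AO', 'G']`. Whether to keep stress markers is a design choice: stripping them shrinks the output vocabulary, but it discards information a phone-to-speech stage might want.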

Repository contents:
- flickr8k
- frossard_vgg16
- phones2words
- speechcoco
- .gitignore
- README.md
- file_listing.txt
- run.sh
Repository: jhasegaw/image2speech