speech-to-image-chatbot

A web application that generates an image from a textual description. The user enters a text description of a scene, and the application generates the image that best corresponds to it. The application uses Generative Adversarial Networks (GANs) trained on a large image dataset covering multiple everyday-object categories.

Explanation:

The input is a text description of a scene or object.
GANs accept input in the form of vector representations.
Hence, the text description must first be converted to embeddings, that is, vector representations of the text.
A Char CNN-RNN model is used for this conversion.
The resulting vector representations are then passed to the model through AJAX calls.
The model is a stacked architecture of Generative Adversarial Networks, organized in two stages.
The Stage-I GAN sketches the primitive shape and colors of the scene, producing a low-resolution image.
The Stage-II GAN adds finer details to the low-resolution image from Stage-I.
The final image generated by the model is passed back to the chatbot interface through AJAX calls.
The image corresponding to the text description is thus rendered in the chatbot interface itself.
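
The flow described above can be illustrated with a small, hedged sketch. It is not the code in this repository: every function name, the vector sizes, the image sizes, and the JSON field names are assumptions chosen for illustration, and the "generators" are trivial stand-ins with no trained model behind them. The sketch only shows the order of the steps: text to embedding, embedding plus noise to a coarse Stage-I image, Stage-I image to a larger Stage-II image, and the result returned as a JSON reply of the kind an AJAX call could consume.

```python
import base64
import json
import random
from typing import List

EMBEDDING_DIM = 8  # assumed size; the real encoder's dimension is not stated here
NOISE_DIM = 4      # assumed size of the random noise vector


def embed_text(description: str) -> List[float]:
    """Stand-in for the Char CNN-RNN encoder: text -> fixed-length vector."""
    rng = random.Random(description)  # seeded by the text, so output is repeatable
    return [rng.uniform(-1.0, 1.0) for _ in range(EMBEDDING_DIM)]


def stage1_generate(embedding: List[float], noise: List[float],
                    size: int = 4) -> List[List[int]]:
    """Stand-in for Stage-I: embedding + noise -> coarse size x size grayscale image."""
    values = embedding + noise
    return [
        [max(0, min(255, int(127.5 * (values[(r * size + c) % len(values)] + 1.0))))
         for c in range(size)]
        for r in range(size)
    ]


def stage2_generate(low_res: List[List[int]], scale: int = 4) -> List[List[int]]:
    """Stand-in for Stage-II: takes the Stage-I image and returns a larger one.

    A real Stage-II GAN would add learned detail; this only repeats pixels.
    """
    high_res = []
    for row in low_res:
        wide_row = [pixel for pixel in row for _ in range(scale)]
        high_res.extend([list(wide_row) for _ in range(scale)])
    return high_res


def handle_request(body: str) -> str:
    """Mimic the server side of one AJAX call: JSON request in, JSON reply out."""
    description = json.loads(body)["description"]
    embedding = embed_text(description)
    noise = [random.uniform(-1.0, 1.0) for _ in range(NOISE_DIM)]
    image = stage2_generate(stage1_generate(embedding, noise))
    raw = bytes(pixel for row in image for pixel in row)
    return json.dumps({
        "width": len(image[0]),
        "height": len(image),
        "pixels_b64": base64.b64encode(raw).decode("ascii"),
    })


if __name__ == "__main__":
    request = json.dumps({"description": "a red bird on a branch"})
    reply = json.loads(handle_request(request))
    print(reply["width"], reply["height"])
```

In this sketch the browser side would decode pixels_b64 and draw the pixels; a real deployment would more likely return an encoded image file, but that detail is not specified here.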

For more detailed explanations of the concepts involved, please read Project Report.pdf.

How to Run:

Clone or download the repository.
Open the repository folder in a terminal.
Run the command: python new_main.py
Open the link printed in the terminal in a web browser.
Use the application.

jmagdum7/speech-to-image-chatbot