
Conv_bot

This repo contains the work done for our summer project: building a conversational bot. :robot:

Group:

Varun Khatri, Prateek Jain, Adit Khokhar, Atharva Umbarkar, Ishir Roongta

(P.S. For checkpoints, contact any of the above mentioned 😃)


Installation and usage

The requirements are in the requirements.txt file.

  • pip install -r requirements.txt
  • python3 -m spacy download en

The main file to focus on is tf_attention_model.py.

To run the ChatBot -> python3 chatbot.py

To run the ConversationalBot -> python3 conversational_bot.py

A demo video: Watch the video


Project Brief

The main aim of this project was to build a conversational bot that takes audio input and produces a meaningful reply, taking into account factors such as the context and intent of the user's input.

The three main parts of this project were:

  1. Speech to text
  2. Topic attention (to generate a response)
  3. Text to speech
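Taken together, the three stages form a simple pipeline. A minimal sketch is shown below; the function bodies are placeholders standing in for the real CTC, encoder-decoder, and gTTS components, and the names are illustrative, not the repo's actual API:

```python
def speech_to_text(audio):
    # Placeholder for the CTC acoustic model (stage 1).
    return "hello bot"

def generate_reply(text):
    # Placeholder for the topic-aware encoder-decoder (stage 2).
    return "hello user"

def text_to_speech(text):
    # Placeholder for the gTTS call (stage 3); here we just tag the text.
    return f"<audio:{text}>"

def converse(audio):
    # Chain the three stages: audio in -> audio reply out.
    return text_to_speech(generate_reply(speech_to_text(audio)))
```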

CTC_MODEL

This model converts the user's audio messages into text.

Please see ctc_model for the full implementation.

Dataset used for training: LibriSpeech
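The defining step of CTC decoding is collapsing the per-frame model outputs: merge consecutive repeats, then drop the blank symbol. A minimal greedy-decode sketch in pure Python (the blank character and example strings are illustrative assumptions):

```python
BLANK = "_"  # illustrative stand-in for the CTC blank symbol

def ctc_collapse(frames):
    """Greedy CTC post-processing: merge repeated characters, drop blanks.

    `frames` is the per-frame argmax character sequence emitted by the
    acoustic model, e.g. "hh_e_ll_ll_oo" -> "hello".
    """
    out = []
    prev = None
    for ch in frames:
        # Keep a character only when it differs from the previous frame
        # (merging repeats) and is not the blank symbol.
        if ch != prev and ch != BLANK:
            out.append(ch)
        prev = ch
    return "".join(out)
```

Note that a blank between two identical characters keeps them distinct, which is how CTC represents genuine double letters like "ll".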

ENCODER - DECODER MODEL

This model is implemented to cover the response-generation part of the conversational bot. We trained it on the OpenSubtitles dataset.

LDA MODEL

This model is implemented to add topic awareness to the encoder-decoder model for better response generation, by focusing its "attention" on specific parts of the input rather than the whole sentence.

Optimal Number of Topics

This graph shows the optimal number of topics to set for the news-articles dataset.

Gensim LDA Model parameters

  • corpus — Stream of document vectors or a sparse matrix of shape (num_terms, num_documents).
  • id2word — Mapping from word IDs to words. It is used to determine the vocabulary size, as well as for debugging and topic printing.
  • num_topics — The number of requested latent topics to be extracted from the training corpus.
  • random_state — Either a RandomState object or a seed to generate one. Useful for reproducibility.
  • update_every — Number of documents to be iterated through for each update. Set to 0 for batch learning, > 1 for online iterative learning.
  • chunksize — Number of documents to be used in each training chunk.
  • passes — Number of passes through the corpus during training.
  • alpha — "auto" learns an asymmetric prior from the corpus.
  • per_word_topics — If True, the model also computes a list of topics, sorted in descending order of the most likely topics for each word, together with their phi values multiplied by the feature length (i.e. word count).

About pyLDAvis

  • The size of a bubble indicates how dominant that topic is across all the documents (our corpus).
  • The words on the right are the keywords driving that topic.
  • The closer two bubbles are, the more similar their topics; the farther apart, the less similar.
  • Ideally, we want non-overlapping bubbles spread as evenly as possible across the chart.

Text to Audio

gTTS, a Python library, was used to build a function that outputs audio for the generated responses.

