A Knowledge-based Telegram Bot. Final project for the NLP course (A.Y. 2016/2017) at Sapienza University.
For a better understanding, look at the assignment and at the report.
Unless otherwise specified, I assume that the terminal's working directory is the root of the project.
Install all the required packages from the requirements.txt file:
pip3 install -r requirements.txt
Then download the spaCy model:
python -m spacy download en_core_web_md
Then install Torch. The following commands worked for me:
pip3 install http://download.pytorch.org/whl/cu80/torch-0.2.0.post3-cp35-cp35m-manylinux1_x86_64.whl
pip3 install torchvision
You also need to install SQLite, since the database uses that engine:
sudo apt-get install sqlite3 libsqlite3-dev
I cannot guarantee that all the dependencies are covered: some packages may be machine-specific, and I may have missed something. Please be patient, I did my best.
First of all, get the following data and copy them into kbs/data:
Then execute the following commands (this takes 5-10 minutes):
. setup.sh
python make_database.py #answer "yes" to the question about dropping tables
Now you should be able to train all the models by yourself:
python make_models.py
To clean up:
. clean.sh
### Run
To run the project:
python main.py
The ID name of the Telegram bot is: MF_nlp_chatbot
- `kbs/`: the directory which contains the data provided by you, the mirror db and the modules needed to interact with it.
  - `ApiManager.py`: for interacting with the KB
  - `DBManager.py`: a singleton class that allows easy interaction with the db
  - `models.py`: ORM definitions (SQLAlchemy)
  - `kb_mirror.db`: the db; it can be overwritten and rebuilt with `/make_database.py`
- `logs/`: the most useful thing I did in my life. The software is highly auditable from that file. I suggest you check it.
- `models/`: the package which contains all the machine learning models of the system and some related utils.
- `workflow/`: the core modules that manage the bot workflow. The main classes are `WorkflowManager` and the abstract class `Job`, whose implementations represent each workflow step.
- `main.py`: the main script; execute it to start the bot.
- `make_database.py`: the script that builds all the needed data and populates the database.
- `config.py`: file for some global configuration (please do not change the paths lightly)
- `constant.py`: some global constants useful for the program
- `exceptions/`: single-module package which contains some user-defined exceptions.
- `utils/`: misc utils (actually, some very useful methods for the software are implemented there)
- `handlers/handles.py`: the module which contains the main handler for messages. It is only a dispatcher to the right chat_id

If you're lucky, you will find a README with more details in the package you're interested in.
- `main.py` initializes `handlers/handler.py`, which runs as a thread
- the handler dispatches the message, by chat_id, to the correct `WorkflowManager` object
- the Workflow manager, depending on the state in which it is:
  - dispatches the message to the scheduled `Job` object (defined in `workflow/jobs`);
  - waits for the end of the execution of the called `Job` (i.e. `Job.__call__()`), which returns the new state of the `WorkflowManager`.
- The abstract class `Job` overrides the method `__call__()`, which makes it callable. The execution consists of:
  - `validate_input(input_text)`, to validate the input of the Job (possibly overridden by the `Job` implementations)
  - `the_job(input_text)`, the actual job. Here the messages are processed, the models are called to predict something, etc.
  - `get_new_state()`, which has to be overridden in every Job implementation
- Sometimes there is the need to share parameters among Jobs: this is done with the `Context` class, instantiated in the Workflow manager.
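The flow above can be illustrated with a minimal, self-contained sketch. It is not the project's actual code: the state names, the `params` dictionary, the `handle` and `dispatch` helpers and the simplified jobs are assumptions made only for illustration; only the names `Job`, `WorkflowManager`, `Context`, `validate_input`, `the_job` and `get_new_state` come from the description above.

```python
from abc import ABC, abstractmethod


class Context:
    """Hypothetical bag of parameters shared among the Jobs of one chat."""

    def __init__(self):
        self.params = {}


class Job(ABC):
    """Abstract workflow step; calling it runs the three phases in order."""

    def __init__(self, context):
        self.context = context

    def __call__(self, input_text):
        self.validate_input(input_text)
        self.the_job(input_text)
        return self.get_new_state()

    def validate_input(self, input_text):
        # Default: accept anything; implementations may override this.
        pass

    @abstractmethod
    def the_job(self, input_text):
        """The actual work of the step."""

    @abstractmethod
    def get_new_state(self):
        """Return the next state of the WorkflowManager."""


class StartJob(Job):
    def validate_input(self, input_text):
        if input_text.strip() != "/start":
            raise ValueError("expected a /start message")

    def the_job(self, input_text):
        self.context.params["started"] = True

    def get_new_state(self):
        return "receive_domain"  # assumed state name


class ReceiveDomainJob(Job):
    def the_job(self, input_text):
        self.context.params["domain"] = input_text

    def get_new_state(self):
        # The real bot schedules further jobs here; the sketch loops back.
        return "start"


class WorkflowManager:
    def __init__(self):
        self.context = Context()
        self.state = "start"
        self.jobs = {
            "start": StartJob(self.context),
            "receive_domain": ReceiveDomainJob(self.context),
        }

    def handle(self, input_text):
        # Run the Job scheduled for the current state and store the new state.
        self.state = self.jobs[self.state](input_text)
        return self.state


managers = {}  # one WorkflowManager per chat_id


def dispatch(chat_id, text):
    """Route a message to the WorkflowManager of its chat."""
    if chat_id not in managers:
        managers[chat_id] = WorkflowManager()
    return managers[chat_id].handle(text)
```

For example, `dispatch(1, "/start")` followed by `dispatch(1, "Geography")` stores the domain in the `Context` of chat 1 without touching the state of any other chat.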
The Job classes are defined in workflow/jobs.
The workflow is as follows:
- StartJob: the entry point of the application. It expects a "/start" message to begin chatting.
- AskDomainJob: asks the user for the domain.
- ReceiveDomainJob: receives and stores the domain, then schedules the AskQueryEnrichSwitchJob.
- AskQueryEnrichSwitchJob: asks the user whether they want to ask a question or let the bot ask one.
- SwitcherQEJob: the job that dispatches to the correct Job according to the chosen mode:
  - if in Querying mode, the workflow is:
    - QueryProcessingJob:
      - Predict the relation (`models.relation_classifier`)
      - Detect the concepts in the question (`models.concept_recognizer`)
    - QueryAnswerGenerationJob:
      - Extract the correct concepts
      - Query the database to answer correctly (`utils/workflow/ConceptSelector`)
      - Call the AnswerGenerator (`models.answer_generation`)
  - if in Enriching mode, the workflow is:
    - EnrichingQueryGenerationJob:
      - Choose the concept we want to know more about;
      - Choose the relation which has the fewest occurrences for the given concept;
      - Retrieve a random pattern from the database and replace the concept mentions in the query;
    - EnrichingAnswerProcessingJob:
      - Detect the concepts in the answer (`models.answer_concept_recognizer`)
      - Update the knowledge base.
- FinishWorkflowJob: reschedule the AskDomainJob
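
The question generation of the Enriching mode (pick the least covered relation for a concept, then fill a random pattern) could be sketched as follows. This is only an illustration under stated assumptions: the triple layout `(subject, relation, object)`, the `X` placeholder in the patterns, and the function names `least_covered_relation` and `build_question` are hypothetical and are not taken from the project.

```python
import random
from collections import Counter


def least_covered_relation(triples, concept, relations):
    """Return the relation with the fewest triples mentioning `concept`.

    `triples` is an assumed list of (subject, relation, object) tuples.
    Ties are resolved in favour of the relation listed first.
    """
    counts = Counter({relation: 0 for relation in relations})
    for subject, relation, obj in triples:
        if concept in (subject, obj) and relation in counts:
            counts[relation] += 1
    return min(relations, key=lambda relation: counts[relation])


def build_question(patterns, concept, rng=random):
    """Pick a random pattern and replace the assumed `X` placeholder."""
    pattern = rng.choice(patterns)
    return pattern.replace("X", concept)


# Toy data, invented for the example.
triples = [
    ("Rome", "place", "Italy"),
    ("Rome", "place", "Europe"),
    ("Rome", "generalization", "city"),
]
relations = ["place", "generalization", "size"]

relation = least_covered_relation(triples, "Rome", relations)
question = build_question(["What is the size of X?"], "Rome")
```

With this toy data no triple uses `size` for `Rome`, so that relation is selected, and the only available pattern is filled with the concept name.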