PLACAT

PLACAT is a voice-based conversational agent built on the Google Home platform. Its goal is to combine the advantages of chatbots (user-friendly but not goal-oriented) with the capacities of question answering (QA) systems (which lack interactivity). A controller recognizes the dialogue act of each user input and directs it either to the chatbot or to the QA system. The result is a spoken QA chatbot over Wikipedia, implemented as a Google Home Action.
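To make the routing idea concrete, here is a toy sketch in shell. It is not PLACAT's controller, which is a trained model (the controller.pt file listed under Models) that recognizes dialogue acts. The hypothetical route function below swaps in a simple keyword heuristic: questions go to the QA system, everything else to the chatbot.

```shell
# Toy router (hypothetical): a keyword heuristic standing in for PLACAT's
# trained dialogue-act controller, for illustration only.
route() {
  # Lowercase the utterance so that keyword matching ignores case
  utterance=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$utterance" in
    *\?) echo qa ;;          # trailing question mark: send to the QA system
    what\ *|who\ *|where\ *|when\ *|which\ *|why\ *|how\ *) echo qa ;;  # leading wh-word
    *) echo chatbot ;;       # greetings, small talk, etc.: send to the chatbot
  esac
}

route "What is penicillin ?"     # prints: qa
route "Hello, nice to meet you"  # prints: chatbot
```

A learned controller is preferable in practice because many questions carry no question mark in speech transcripts and many wh-words open non-questions; the heuristic only shows where the routing decision sits.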

The development of PLACAT is supported by a grant from the HES-SO (AGP n. 82681). The main developer is Gabriel Luthier and the principal investigator is Andrei Popescu-Belis, both at HEIG-VD, Yverdon-les-Bains, Switzerland. The outcomes of the PLACAT project are summarized in the following article: Luthier G. and Popescu-Belis A., Chat or Learn: a Data-Driven Robust Question-Answering System, Proceedings of LREC 2020 (12th Language Resources and Evaluation Conference), Marseille, 11-16 May 2020, pp. 5474-5480.

Installation

To run this application, you first need an Elasticsearch index containing Wikipedia pages. Then you need to download the required models. This application has been tested with Python 3.7.3 (it appears to break with Python 3.8).
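Since only Python 3.7.3 has been tested, a quick check before installing anything can save time. This is a minimal sketch, assuming the interpreter is called python3 on your PATH; python_version_ok is a hypothetical helper that accepts only the 3.7 series.

```shell
# Hypothetical pre-flight check: accept only Python 3.7.x, the tested series.
python_version_ok() {
  # $1 is a version string such as "3.7.3"
  case "$1" in
    3.7|3.7.*) return 0 ;;
    *) return 1 ;;
  esac
}

# Ask the interpreter on PATH for its version (python3 is an assumption).
if command -v python3 >/dev/null 2>&1; then
  v=$(python3 -c 'import platform; print(platform.python_version())')
  if python_version_ok "$v"; then
    echo "Python $v: OK"
  else
    echo "Python $v: untested, PLACAT was tested with 3.7.3" >&2
  fi
fi
```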

Elasticsearch

  1. Download and install Elasticsearch (installation steps are listed at the bottom of the download page). We used version 6.3.1.
  2. Download a CirrusSearch dump of Wikipedia (a dump of Wikipedia pages in a format that Elasticsearch can index directly). The file named enwiki-20190114-cirrussearch-content.json.gz (or similar) is a dump of the English Wikipedia. For a smaller file, e.g. for testing, you can first try the Simple English dump named simplewiki-20190114-cirrussearch-content.json.gz.
  3. Start Elasticsearch: systemctl start elasticsearch.
  4. Create a new index: curl -X PUT "localhost:9200/enwiki".
  5. Split the dump into multiple files and upload them one by one (the Bulk API accepts mass uploads in ndjson format but cannot handle very large files):
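export dump=enwiki-20190114-cirrussearch-content.json.gz
export index=enwiki

mkdir chunks
cd chunks
# 500 lines = 250 (action, document) pairs per chunk; keep this number even
zcat "../$dump" | split -a 10 -l 500 - "$index"

for file in *; do
  echo -n "${file}:  "
  # Upload the chunk and extract the "took" value (ms) from the JSON response
  took=$(curl -s -H "Content-Type: application/x-ndjson" -XPOST "localhost:9200/$index/_bulk" --data-binary "@$file" |
    grep took | cut -d':' -f 2 | cut -d',' -f 1)
  printf '%7s\n' "$took"
  # Delete the chunk only if Elasticsearch answered, so failed chunks can be retried
  [ "x$took" = "x" ] || rm "$file"
done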
  6. You can now test the index by executing a simple search query:
curl -X GET "localhost:9200/$index/_search" -H 'Content-Type: application/json' -d'
{
  "query": {
    "simple_query_string" : {
        "query": "Switzerland",
        "fields": ["title"]
    }
  }
}
'
  7. [Optional] Download and install Kibana to visualize the data.
  8. [Optional] To keep only some attributes (here title and opening_text) in a new index:
curl -X POST "localhost:9200/_reindex" -H 'Content-Type: application/json' -d'
{
  "source": {
    "index": "enwiki",
    "_source": ["title", "opening_text"]
  },
  "dest": {
    "index": "enwiki_clean"
  }
}
'
  9. [Optional] To delete individual pages (which may only add noise to the QA system):
# Find the page's id (the _id field of the matching hit)
curl -X GET "localhost:9200/enwiki/_search" -H 'Content-Type: application/json' -d'
{
  "query": {
    "match_phrase": { "title": "where is where" }
  }
}
'

# Check that the id points to the right page
curl -X GET "localhost:9200/enwiki/_search" -H 'Content-Type: application/json' -d'
{
  "query": {
    "terms": { "_id": [ "36897462" ] }
  }
}
'

# Delete the page
curl -X DELETE "localhost:9200/enwiki/page/36897462"
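The upload loop in step 5 only checks that Elasticsearch answered, but a bulk response always contains "took", even when some documents were rejected. The sketch below adds two checks, assuming the dump uses the Bulk API layout of alternating action and document lines: chunk_has_pairs verifies that a chunk was not cut between an action line and its document, and bulk_ok accepts a chunk only if the response reports "errors":false. Both helper names are hypothetical.

```shell
# Hypothetical helpers to make the bulk upload of step 5 more robust.

# A bulk body is a sequence of (action line, document line) pairs, each line
# ending with a newline, so a well-formed chunk has an even number of lines.
chunk_has_pairs() {
  n=$(wc -l < "$1")
  [ $(( $n % 2 )) -eq 0 ]
}

# Per-document failures are reported with "errors":true in the response,
# so only treat the chunk as indexed when "errors":false is present.
bulk_ok() {
  # $1 is the JSON body returned by POST /<index>/_bulk
  printf '%s' "$1" | grep -q '"errors":false'
}

# Possible use inside the upload loop of step 5:
#   chunk_has_pairs "$file" || echo "odd line count: $file" >&2
#   resp=$(curl -s -H "Content-Type: application/x-ndjson" -XPOST \
#     "localhost:9200/$index/_bulk" --data-binary "@$file")
#   bulk_ok "$resp" && rm "$file"
```

Chunks that fail bulk_ok stay on disk, so a second run of the loop retries exactly those.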

[Optional] Google Action & Dialogflow (to use PLACAT on the Google Home smart speaker)

  1. Check that your Google account has the following permissions enabled in Activity Controls: Web & App Activity, Device Information, and Voice & Audio Activity.
  2. Create a new Google Action project at the Actions Console. Then create a new Action with a Custom intent.
  3. Once redirected to Dialogflow, create a new agent.
  4. In this agent, create a new intent called question.
  5. Under Action and parameters, add a new parameter named question with required checked, @sys.any as entity, $question as value, and is list unchecked. Write the prompt text that you wish to use.
  6. Under Training phrases, add any noun (for instance "banana") and double-click it to bind it to the @sys.any:question entity.
  7. Delete all text responses under Responses and check Enable webhook call for this intent under Fulfillment.
  8. In the agent's Fulfillment tab, enable the webhook and enter its URL. For testing purposes, you can expose a local server with ngrok.
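To check the webhook without a Google Home device, you can post a hand-made request to your server. This is a minimal sketch assuming the Dialogflow ES v2 webhook format (a session field plus a queryResult object holding the question parameter); the /webhook path, port 5000 default, and the dialogflow_request helper are hypothetical, so use the URL you entered in the Fulfillment tab.

```shell
# Assumed webhook URL; replace with the one configured in Dialogflow.
WEBHOOK_URL=${WEBHOOK_URL:-http://127.0.0.1:5000/webhook}

# Build a minimal v2-style request for the "question" intent.
# No JSON escaping: the question must not contain double quotes or backslashes.
dialogflow_request() {
  printf '{"session":"projects/test/agent/sessions/local","queryResult":{"queryText":"%s","parameters":{"question":"%s"},"intent":{"displayName":"question"}}}' "$1" "$1"
}

# Example, once the server is running:
#   curl -s -H "Content-Type: application/json" -X POST "$WEBHOOK_URL" \
#     --data "$(dialogflow_request 'What is penicillin?')"
# To make the local server reachable by Dialogflow: ngrok http 5000
```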

Models

  1. Download and unzip the chatbot model 8000_checkpoint.tar (488 MB) into the data/save/bnc_cornell/2-2_500/ folder. The model was trained on the Cornell Movie-Dialogs Corpus and the British National Corpus combined.
  2. Download and unzip the question-answering model pytorch_model.bin (387 MB) into the bert-model/ folder.
  3. Download the controller model controller.pt (132 KB) into the data/ folder.
  4. Install the dependencies, for instance into a virtual environment, with conda install --file requirements.txt. You might need to add the conda-forge channel: conda config --add channels conda-forge and then conda config --set channel_priority strict. You may also need to install some packages manually.
  5. Run python -m spacy download en_core_web_lg to download the model used by the neuralcoref module for pronoun resolution.
  6. Execute ./run_backend.sh to run PLACAT.
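Before running ./run_backend.sh, it can help to verify that the three models landed in the expected folders. A minimal sketch, to be run from the repository root; check_models is a hypothetical helper that only checks the file names listed in steps 1 to 3.

```shell
# Hypothetical check: report which of the downloaded models are in place.
check_models() {
  missing=0
  for f in \
    data/save/bnc_cornell/2-2_500/8000_checkpoint.tar \
    bert-model/pytorch_model.bin \
    data/controller.pt
  do
    if [ -f "$f" ]; then
      echo "found:   $f"
    else
      echo "missing: $f" >&2
      missing=$((missing + 1))
    fi
  done
  # Succeed only if every model file was found
  [ "$missing" -eq 0 ]
}
```

For example, check_models && ./run_backend.sh starts PLACAT only when all three files are present.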

Test the application

Use one of the following methods:

  1. Web interface at http://127.0.0.1:5000/chat once the server is up (adjust address and/or port depending on your server).
  2. qa.py script to test one question: python qa.py -q What is penicillin ?
  3. The Dialogflow simulator, if you completed the optional Google Action & Dialogflow setup.
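Loading the models can take a while, so the web interface may not answer right after ./run_backend.sh starts. A minimal sketch that polls the chat page with curl until it responds, assuming the default address above; wait_for_server and its retry settings are illustrative.

```shell
# Hypothetical helper: poll a URL until it answers or the attempts run out.
wait_for_server() {
  url=${1:-http://127.0.0.1:5000/chat}
  tries=${2:-30}
  i=0
  while [ "$i" -lt "$tries" ]; do
    # -s: no progress output, -f: fail on HTTP errors, --max-time: avoid hanging
    if curl -sf --max-time 5 -o /dev/null "$url"; then
      echo "server up: $url"
      return 0
    fi
    i=$((i + 1))
    # Pause between attempts, but not after the last one
    if [ "$i" -lt "$tries" ]; then sleep 2; fi
  done
  echo "no answer from $url after $tries attempts" >&2
  return 1
}
```

For example, wait_for_server http://127.0.0.1:5000/chat 60 waits roughly two minutes before giving up.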
