## WHO questions & answers on the new coronavirus (COVID-19)

Because the coronavirus raises so many questions, I have created a dataset of frequently asked questions and answers from the World Health Organization (WHO). The questions come from this website: https://www.who.int/news-room/q-a-detail/q-a-coronaviruses

On this website you can find an answer to a question about the coronavirus fairly quickly. A chatbot could make this faster still: you simply ask your question and immediately receive a matching answer.

In the code below, I first load the dataset, which contains the answers under "Answer" and the context of each answer under "Context". These columns should be interpreted as follows:
- `Answer` is a possible answer to a question.
- `Context` is the context in which the answer applies. In this dataset it is the FAQ question the answer belongs to.

In [6]:
import numpy as np
import pandas as pd

# Load the WHO coronavirus FAQ dataset
pd.set_option('display.max_colwidth', 100)  # Increase column width
# read_excel does not accept an `encoding` argument in recent pandas versions
data = pd.read_excel("WHO_FAQ.xlsx")
data.head()

Unnamed: 0,Context,Answer
0,What is a coronavirus?,Coronaviruses are a large family of viruses which may cause illness in animals or humans.
1,What is a coronavirus?,"In humans, several coronaviruses are known to cause respiratory infections ranging from the comm..."
2,What is COVID-19?,COVID-19 is the infectious disease caused by the most recently discovered coronavirus. This new ...
3,What are the symptoms of COVID-19?,"The most common symptoms of COVID-19 are fever, tiredness, and dry cough. Some patients may have..."
4,What are the symptoms of COVID-19?,Some people become infected but don’t develop any symptoms and don't feel unwell. Most people (a...


To use these answers for a chatbot, we will first calculate the corresponding answer encodings.

### Universal Sentence Encoder Multilingual

Now we can use the Universal Sentence Encoder (USE) to create sentence encodings for the possible answers (together with their context) and for the questions!

To do so, we load the module containing the USE, which you can find here: https://tfhub.dev/google/universal-sentence-encoder-multilingual-qa/3

In [2]:
# Use the pretrained USE model to extract response encodings.
import tensorflow as tf
import tensorflow_hub as hub
import tensorflow_text  # Not referenced directly; importing it registers the ops the model needs
import re

def preprocess_sentences(input_sentences):
    # Replace "COVID-19" and "covid" (case-insensitive) with "coronavirus"
    return [re.sub(r'(covid-19|covid)', 'coronavirus', input_sentence, flags=re.I)
            for input_sentence in input_sentences]

# Load module containing USE
module = hub.load('https://tfhub.dev/google/universal-sentence-encoder-multilingual-qa/3')

# Create response embeddings
response_encodings = module.signatures['response_encoder'](
        input=tf.constant(preprocess_sentences(data.Answer)),
        context=tf.constant(preprocess_sentences(data.Context)))['outputs']

Now it is time to test!

The sentence encodings for the questions are stored in the variable `question_encodings`. Based on these, we choose the most representative answer for each question.

As described in the blog, the most representative answer is the one whose embedding makes the smallest angle with the question embedding. Minimizing the angle is equivalent to maximizing its cosine, and since each embedding is already a vector of length 1, the cosine equals the inner product. We therefore only need to find the maximum inner product. In the code below, that's the line:

``np.argmax(np.inner(question_encodings, response_encodings), axis=1)``
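
To make the equivalence concrete, here is a minimal sketch with toy 3-dimensional vectors. The numbers are hypothetical, not real USE outputs, and `normalize` is a helper introduced only for this illustration. It checks that the cosine similarity of the raw vectors matches the inner product of their unit-length versions.

```python
import numpy as np

# Toy "embeddings" (hypothetical values, not real USE outputs).
question = np.array([[1.0, 2.0, 2.0]])
responses = np.array([
    [2.0, 0.0, 0.0],
    [1.0, 2.0, 2.0],
    [0.0, -1.0, 0.0],
])

def normalize(vectors):
    """Scale each row to unit length."""
    return vectors / np.linalg.norm(vectors, axis=1, keepdims=True)

# Cosine similarity computed from the raw vectors.
cosine = (question @ responses.T) / (
    np.linalg.norm(question, axis=1, keepdims=True)
    * np.linalg.norm(responses, axis=1)
)

# Inner product of the unit-length vectors.
inner = np.inner(normalize(question), normalize(responses))

# Index of the best-matching response for each question.
best = np.argmax(inner, axis=1)

print(np.allclose(cosine, inner))
print(best)
```

Because the two score matrices agree, `np.argmax` picks the same response either way; with unit-length embeddings the normalization step can simply be skipped.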

In [5]:
test_questions = [
    "What about pregnant women?",
    "Wat is de lengte van de incubatietijd?",  # Dutch: "What is the length of the incubation period?"
    "Are animals contagious COVID-19?",
    "Are there medicine against the coronavirus?",
    "Can I breastfead when I have COVID-19?",
    "Should I stay inside the house?",
    "Kann ich mit meinem Hund spazieren gehen?"  # German: "Can I go for a walk with my dog?" Other languages supported by use-multilingual work too.
]

# Create encodings for test questions
question_encodings = module.signatures['question_encoder'](
    tf.constant(preprocess_sentences(test_questions))
)['outputs']

# Get the responses
test_responses = data.Answer[np.argmax(np.inner(question_encodings, response_encodings), axis=1)]

# Show them in a dataframe
pd.DataFrame({'Test Questions': test_questions, 'Test Responses': test_responses})

Unnamed: 0,Test Questions,Test Responses
65,What about pregnant women?,"due to changes in their bodies and immune systems, we know that pregnant women can be badly affe..."
44,Wat is de lengte van de incubatietijd?,"Most estimates of the incubation period for COVID-19 range from 1-14 days, most commonly around ..."
45,Are animals contagious COVID-19?,"Coronaviruses are a large family of viruses that are common in animals. Occasionally, people get..."
32,Are there medicine against the coronavirus?,"Not yet. To date, there is no vaccine and no specific antiviral medicine to prevent or treat COV..."
72,Can I breastfead when I have COVID-19?,Yes. Women with COVID-19 can breastfeed if they wish to do so.
17,Should I stay inside the house?,"Stay home if you feel unwell. If you have a fever, cough and difficulty breathing, seek medical ..."
18,Kann ich mit meinem Hund spazieren gehen?,"If possible, avoid traveling to places – especially if you are an older person or have diabetes..."
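
The table shows only the single best match per question. To judge how close a match really is, it can help to look at the similarity scores and the runners-up. Below is a minimal sketch with toy unit-length vectors; `top_k_answers` and all `toy_*` names are hypothetical helpers for illustration and not part of the notebook.

```python
import numpy as np

def top_k_answers(question_encodings, response_encodings, answers, k=3):
    """Return, per question, the k best (answer, score) pairs by inner product."""
    scores = np.inner(question_encodings, response_encodings)  # (n_questions, n_answers)
    top = np.argsort(-scores, axis=1)[:, :k]  # indices of the highest scores first
    return [[(answers[j], float(scores[i, j])) for j in row]
            for i, row in enumerate(top)]

# Toy data: three unit-length "response" vectors and one unit-length "question" vector.
toy_answers = ["answer A", "answer B", "answer C"]
toy_responses = np.array([[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]])
toy_questions = np.array([[0.8, 0.6]])

ranked = top_k_answers(toy_questions, toy_responses, toy_answers, k=2)
for answer, score in ranked[0]:
    print(f"{score:.2f}  {answer}")
```

In the notebook, the same function could presumably be called with `question_encodings`, `response_encodings` and `data.Answer.tolist()`. A low top score would then hint that the dataset contains no good answer for that question.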


### Conclusion

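As you can see, the chatbot returns fitting answers to most of the test questions, across multiple languages! The weakest match is the German dog-walking question, which gets a general travel-advice answer. We replace COVID-19 with coronavirus in both the dataset and the questions because the model has never seen the word COVID-19 before (it did see coronavirus).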
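
As a small illustration of that replacement step, the sketch below applies the same regular expression as `preprocess_sentences` to two made-up example sentences (the sentences are assumptions for illustration only).

```python
import re

def preprocess_sentences(input_sentences):
    # Same substitution as in the notebook: map "COVID-19"/"covid" to "coronavirus".
    return [re.sub(r'(covid-19|covid)', 'coronavirus', sentence, flags=re.I)
            for sentence in input_sentences]

examples = ["What is COVID-19?", "Can I breastfeed when I have Covid?"]
cleaned = preprocess_sentences(examples)
print(cleaned)
# ['What is coronavirus?', 'Can I breastfeed when I have coronavirus?']
```

The longer alternative `covid-19` is listed first in the pattern, so the `-19` suffix is removed together with the word rather than left behind.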

Feel free to test more questions. You can do this by adding a question to `test_questions`.

For further questions, you can always [contact us directly](mailto:gunjanshah.254@gmail.com)!