## Practice: Question Answering with a Fine-Tuned BERT (and TTS example)
This notebook is based on the great [post and corresponding notebook](https://mccormickml.com/2020/03/10/question-answering-with-a-fine-tuned-BERT/) *by Chris McCormick*, with minor changes.

What does it mean for BERT to achieve "human-level performance on Question Answering"? Is BERT the greatest search engine ever, able to find the answer to any question we pose it?

In **Part 1** of this post / notebook, we will discuss what it really means to apply BERT to QA, and illustrate the details.

**Part 2** contains example code--we'll download a model that's *already been fine-tuned* for question answering and try it out on our own text!

For something like text classification, you definitely want to fine-tune BERT on your own dataset. For question answering, however, it seems like you may be able to get decent results using a model that's already been fine-tuned on the SQuAD benchmark. In this Notebook, we'll do exactly that, and see that it performs well on text that wasn't in the SQuAD dataset.

In **Part 3** we will apply the same approach to Russian, using a model trained on the SberQuAD dataset.

In **Part 4** we will generate the question and the answer as audio (English only for now).

**Links**

* The [video walkthrough](https://youtu.be/l8ZYCvgGu0o) on this topic. 
* The blog post version.
* The [original Colab Notebook](https://colab.research.google.com/drive/1uSlWtJdZmLrI3FCNIlUHFxwAJiSu2J0-).

*If running on Colab, uncomment the following cell.*

In [None]:
# !pip install numpy scipy librosa unidecode inflect transformers

In [None]:
import torch
if torch.cuda.is_available():
    device = torch.device('cuda:0')
    tacotron2 = torch.hub.load('nvidia/DeepLearningExamples:torchhub', 'nvidia_tacotron2', **{'map_location': device})
else:
    device = torch.device('cpu')
    tacotron2 = None
    print('Unfortunately, Tacotron2 by NVIDIA runs inference only on GPU, so Part 4 will not work on a CPU-only machine.')

## Part 1: How BERT is applied to Question Answering

### The SQuAD v1.1 Benchmark

When someone mentions "Question Answering" as an application of BERT, what they are really referring to is applying BERT to the Stanford Question Answering Dataset (SQuAD).

The task posed by the SQuAD benchmark is a little different than you might think. Given a question, and *a passage of text containing the answer*, BERT needs to highlight the "span" of text corresponding to the correct answer. 

The SQuAD homepage has a fantastic tool for exploring the questions and reference text for this dataset, and even shows the predictions made by top-performing models.

For example, here are some [interesting examples](https://rajpurkar.github.io/SQuAD-explorer/explore/1.1/dev/Super_Bowl_50.html?model=r-net+%20(ensemble)%20(Microsoft%20Research%20Asia)&version=1.1) on the topic of Super Bowl 50.
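
To make the task format concrete, here is a minimal sketch of a single record laid out in the style of the SQuAD v1.1 JSON (`context`, `question`, and `answers` with `text` and `answer_start`). The passage and question are hand-written for illustration, borrowed from the example used later in this notebook, and are not taken from the dataset; the name `gold_answer` is made up for this sketch.

```python
# A hand-written record in the style of SQuAD v1.1 (illustrative only).
context = ("BERT-large is really big... it has 24-layers and an embedding size "
           "of 1,024, for a total of 340M parameters!")
gold_answer = "340M"

record = {
    "context": context,
    "question": "How many parameters does BERT-large have?",
    # `answer_start` is the character offset of the answer inside the context.
    "answers": [{"text": gold_answer, "answer_start": context.find(gold_answer)}],
}

# The answer is always a contiguous span of the context.
start = record["answers"][0]["answer_start"]
end = start + len(record["answers"][0]["text"])
span = record["context"][start:end]
print(span)
```

The model never generates new text: all it has to do is point at where `span` begins and ends.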


### BERT Input Format

To feed a QA task into BERT, we pack both the question and the reference text into the input.

![Input format for QA](http://www.mccormickml.com/assets/BERT/SQuAD/input_formatting.png)

The two pieces of text are separated by the special `[SEP]` token. 

BERT also uses "Segment Embeddings" to differentiate the question from the reference text. These are simply two embeddings (for segments "A" and "B") that BERT learned, and which it adds to the token embeddings before feeding them into the input layer. 
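
The packing described above can be sketched in plain Python. This is only an illustration: `pack_pair` is a hypothetical helper, and the inputs are toy whitespace tokens rather than real WordPiece tokens. In practice the tokenizer does all of this for you.

```python
def pack_pair(question_tokens, passage_tokens):
    """Build the token sequence and segment ids for a question/passage pair."""
    tokens = ['[CLS]'] + question_tokens + ['[SEP]'] + passage_tokens + ['[SEP]']
    # Segment A covers [CLS], the question and the first [SEP]; segment B is the rest.
    num_seg_a = len(question_tokens) + 2
    segment_ids = [0] * num_seg_a + [1] * (len(tokens) - num_seg_a)
    return tokens, segment_ids

toy_tokens, toy_segments = pack_pair(['how', 'big', '?'], ['it', 'is', 'big', '.'])
print(toy_tokens)
print(toy_segments)
```

Every token gets exactly one segment id, so the two lists always have the same length.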

### Start & End Token Classifiers

BERT needs to highlight a "span" of text containing the answer--this is represented as simply predicting which token marks the start of the answer, and which token marks the end.

![Start token classification](http://www.mccormickml.com/assets/BERT/SQuAD/start_token_classification.png)

For every token in the text, we feed its final embedding into the start token classifier. The start token classifier only has a single set of weights (represented by the blue "start" rectangle in the above illustration) which it applies to every word.

After taking the dot product between the output embeddings and the 'start' weights, we apply the softmax activation to produce a probability distribution over all of the words. Whichever word has the highest probability of being the start token is the one that we pick.

We repeat this process for the end token--we have a separate weight vector for this.

![End token classification](http://www.mccormickml.com/assets/BERT/SQuAD/end_token_classification.png)
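
Here is a rough NumPy sketch of the two classifiers, under stated assumptions: the "final embeddings" and both weight vectors are random stand-ins (not real BERT outputs or learned weights), the sizes are tiny, and any bias term is omitted. It only illustrates the dot product, softmax, and argmax steps described above.

```python
import numpy as np

rng = np.random.default_rng(0)

seq_len, hidden_size = 6, 8  # toy sizes, far smaller than real BERT
final_embeddings = rng.normal(size=(seq_len, hidden_size))  # stand-in for BERT's outputs
start_weights = rng.normal(size=hidden_size)                # stand-in for learned weights
end_weights = rng.normal(size=hidden_size)

def softmax(x):
    x = x - x.max()  # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum()

# One score per token: dot product of its embedding with the shared weight vector.
start_probs = softmax(final_embeddings @ start_weights)
end_probs = softmax(final_embeddings @ end_weights)

toy_start = int(np.argmax(start_probs))
toy_end = int(np.argmax(end_probs))
print(toy_start, toy_end)
```

Since softmax preserves ordering, taking the argmax of the raw scores picks the same token as taking the argmax of the probabilities.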

## Part 2: Example Code

In the example code below, we'll download a model that's *already been fine-tuned* for question answering and try it out on our own text.

If you do want to fine-tune on your own dataset, it is possible to fine-tune BERT for question answering yourself. See [run_squad.py](https://github.com/huggingface/transformers/blob/master/examples/run_squad.py) in the `transformers` library. However, you may find that the "fine-tuned-on-squad" model below already does a good job, even if your text is from a different domain. 

> Note: The example code in this Notebook is a commented and expanded version of the short example provided in the `transformers` documentation [here](https://huggingface.co/transformers/model_doc/bert.html?highlight=bertforquestionanswering#transformers.BertForQuestionAnswering).

### 1. Install huggingface transformers library

This example uses the `transformers` [library](https://github.com/huggingface/transformers/) by huggingface. The package is installed by the `pip install` cell at the top of this notebook, so here we only import `torch`.

In [None]:
import torch

### 2. Load Fine-Tuned BERT-large

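The original post uses the `BertForQuestionAnswering` class from the `transformers` library with BERT-large, which is what the rest of this section describes. The code cells below load the smaller distilled checkpoint `distilbert-base-uncased-distilled-squad` through `DistilBertForQuestionAnswering` instead.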

This class supports fine-tuning, but for this example we will keep things simpler and load a BERT model that has already been fine-tuned for the SQuAD benchmark.

The `transformers` library has a large collection of pre-trained models which you can reference by name and load easily. The full list is in their documentation [here](https://huggingface.co/transformers/pretrained_models.html).

For Question Answering, they have a version of BERT-large that has already been fine-tuned for the SQuAD benchmark. 

BERT-large is really big... it has 24-layers and an embedding size of 1,024, for a total of 340M parameters! Altogether it is 1.34GB, so expect it to take a couple minutes to download to your Colab instance. 

(Note that this download is not using your own network bandwidth--it's between the Google instance and wherever the model is stored on the web).

Note: I believe this model was trained on version 1 of SQuAD, since it's not outputting whether the question is "impossible" to answer from the text (which is part of the task in v2 of SQuAD).


In [None]:
from transformers import DistilBertForQuestionAnswering

model = DistilBertForQuestionAnswering.from_pretrained('distilbert-base-uncased-distilled-squad')


Load the tokenizer as well. 

Side note: Apparently the vocabulary of this model is identical to the one in `bert-base-uncased`. You can load the tokenizer from `bert-base-uncased` and that works just as well.

In [None]:
from transformers import DistilBertTokenizer

tokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-uncased-distilled-squad')

### 3. Ask a Question

Now we're ready to feed in an example!

A QA example consists of a question and a passage of text containing the answer to that question.

Let's try an example using the text in this tutorial!

In [None]:
question = "How many parameters does BERT-large have?"
answer_text = "BERT-large is really big... it has 24-layers and an embedding size of 1,024, for a total of 340M parameters! Altogether it is 1.34GB, so expect it to take a couple minutes to download to your Colab instance."

We'll need to run the BERT tokenizer against both the `question` and the `answer_text`. To feed these into BERT, we actually concatenate them together and place the special [SEP] token in between.


In [None]:
# Apply the tokenizer to the input text, treating them as a text-pair.
input_ids = tokenizer.encode(question, answer_text)

print('The input has a total of {:} tokens.'.format(len(input_ids)))

Just to see exactly what the tokenizer is doing, let's print out the tokens with their IDs.

In [None]:
# BERT only needs the token IDs, but for the purpose of inspecting the 
# tokenizer's behavior, let's also get the token strings and display them.
tokens = tokenizer.convert_ids_to_tokens(input_ids)

# For each token and its ID...
# (named `token_id` to avoid shadowing Python's built-in `id`)
for token, token_id in zip(tokens, input_ids):
    
    # If this is the [SEP] token, add some space around it to make it stand out.
    if token_id == tokenizer.sep_token_id:
        print('')
    
    # Print the token string and its ID in two columns.
    print('{:<12} {:>6,}'.format(token, token_id))

    if token_id == tokenizer.sep_token_id:
        print('')
    

We've concatenated the `question` and `answer_text` together, but BERT still needs a way to distinguish them. BERT has two special "Segment" embeddings, one for segment "A" and one for segment "B". Before the word embeddings go into the BERT layers, the segment A embedding needs to be added to the `question` tokens, and the segment B embedding needs to be added to each of the `answer_text` tokens. 

These additions are handled for us by the `transformers` library, and all we need to do is specify a '0' or '1' for each token. 

Note: In the `transformers` library, huggingface likes to call these `token_type_ids`, but I'm going with `segment_ids` since this seems clearer, and is consistent with the BERT paper. The DistilBERT model loaded above has no segment embeddings and does not accept `token_type_ids`, which is why the cell below is left commented out.

In [None]:
# # Search the input_ids for the first instance of the `[SEP]` token.
# sep_index = input_ids.index(tokenizer.sep_token_id)

# # The number of segment A tokens includes the [SEP] token itself.
# num_seg_a = sep_index + 1

# # The remainder are segment B.
# num_seg_b = len(input_ids) - num_seg_a

# # Construct the list of 0s and 1s.
# segment_ids = [0]*num_seg_a + [1]*num_seg_b

# # There should be a segment_id for every input token.
# assert len(segment_ids) == len(input_ids)

>*Side Note: Where's the padding?*
>
> The original [example code](https://huggingface.co/transformers/model_doc/bert.html?highlight=bertforquestionanswering#transformers.BertForQuestionAnswering) does not perform any padding. I suspect that this is because we are only feeding in a *single example*. If we instead fed in a batch of examples, then we would need to pad or truncate all of the samples in the batch to a single length, and supply an attention mask to tell BERT to ignore the padding tokens. 
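
As a rough illustration of what batching would involve, here is a minimal pure-Python sketch of padding and attention masks. `pad_batch` is a hypothetical helper, the token IDs are made up, and `pad_id=0` is an assumption; in practice you would use `tokenizer.pad_token_id` or let the tokenizer do the padding.

```python
def pad_batch(batch_ids, pad_id=0):
    """Pad lists of token ids to a common length and build attention masks."""
    max_len = max(len(ids) for ids in batch_ids)
    padded, masks = [], []
    for ids in batch_ids:
        n_pad = max_len - len(ids)
        padded.append(ids + [pad_id] * n_pad)
        # 1 marks a real token, 0 marks padding the model should ignore.
        masks.append([1] * len(ids) + [0] * n_pad)
    return padded, masks

# Two toy "examples" of different lengths (illustrative ids only).
toy_padded, toy_masks = pad_batch([[101, 2129, 102], [101, 2129, 2116, 11709, 102]])
print(toy_padded)
print(toy_masks)
```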

We're ready to feed our example into the model!



In [None]:
# input_ids = torch.tensor(tokenizer.encode("Hello, my dog is cute", add_special_tokens=True)).unsqueeze(0)  # Batch size 1
# start_positions = torch.tensor([1])
# end_positions = torch.tensor([3])
# outputs = model(input_ids, start_positions=start_positions, end_positions=end_positions)

In [None]:
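# Run our example through the model.
# Index the outputs instead of unpacking them: newer `transformers` versions
# return a dict-like object, and unpacking that would yield its keys.
outputs = model(torch.tensor([input_ids]))  # The tokens representing our input text.
# (With a full BERT model you would also pass token_type_ids=torch.tensor([segment_ids]).)
start_scores, end_scores = outputs[0], outputs[1]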


Now we can highlight the answer just by looking at the most probable start and end words. 

In [None]:
# Find the tokens with the highest `start` and `end` scores.
answer_start = torch.argmax(start_scores)
answer_end = torch.argmax(end_scores)

# Combine the tokens in the answer and print it out.
answer = ' '.join(tokens[answer_start:answer_end+1])

print('Answer: "' + answer + '"')

It got it right! Awesome :)

> *Side Note: It's a little naive to pick the highest scores for start and end--what if it predicts an end word that's before the start word?! The correct implementation is to pick the highest total score for which end >= start.*
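
One way to implement the constraint from the side note is a brute-force search over all valid (start, end) pairs. The sketch below is an illustration rather than the library's own method; `best_span` and its `max_answer_len` cap are assumptions added here, and the scores are toy values chosen so that the naive argmax would give an end before the start.

```python
import numpy as np

def best_span(start_scores, end_scores, max_answer_len=30):
    """Return (start, end) maximizing start + end score subject to end >= start."""
    best, best_score = (0, 0), -np.inf
    for s, s_score in enumerate(start_scores):
        # Only consider end positions at or after the start, within a length cap.
        for e in range(s, min(s + max_answer_len, len(end_scores))):
            total = s_score + end_scores[e]
            if total > best_score:
                best, best_score = (s, e), total
    return best

# Naive argmax would pick start=2 and end=1 here, which is not a valid span.
toy_start_scores = np.array([0.1, 0.2, 5.0, 0.3])
toy_end_scores = np.array([0.0, 6.0, 0.5, 4.0])
print(best_span(toy_start_scores, toy_end_scores))
```

To use this with the model outputs above, you could pass the flattened score arrays, for example `start_scores.detach().numpy().flatten()`.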

With a little more effort, we can reconstruct any words that got broken down into subwords.

In [None]:
# Start with the first token.
answer = tokens[answer_start]

# Select the remaining answer tokens and join them with whitespace.
for i in range(answer_start + 1, answer_end + 1):
    
    # If it's a subword token, then recombine it with the previous token.
    if tokens[i][0:2] == '##':
        answer += tokens[i][2:]
    
    # Otherwise, add a space then the token.
    else:
        answer += ' ' + tokens[i]

print('Answer: "' + answer + '"')

### 4. Visualizing Scores

Let's see what the scores were for all of the words. The following cells generate bar plots showing the start and end scores for every word in the input.

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

# Use plot styling from seaborn.
sns.set(style='darkgrid')

# Increase the plot size and font size.
#sns.set(font_scale=1.5)
plt.rcParams["figure.figsize"] = (16,8)

Retrieve all of the start and end scores, and use all of the tokens as x-axis labels.

In [None]:
# Pull the scores out of PyTorch Tensors and convert them to 1D numpy arrays.
s_scores = start_scores.detach().numpy().flatten()
e_scores = end_scores.detach().numpy().flatten()

# We'll use the tokens as the x-axis labels. In order to do that, they all need
# to be unique, so we'll add the token index to the end of each one.
token_labels = []
for (i, token) in enumerate(tokens):
    token_labels.append('{:} - {:>2}'.format(token, i))


Create a bar plot showing the score for every input word being the "start" word.

In [None]:
import matplotlib

In [None]:
matplotlib.rcParams.update({'figure.figsize': (16, 12), 'font.size': 16})

In [None]:
# Create a barplot showing the start word score for all of the tokens.
ax = sns.barplot(x=token_labels, y=s_scores, ci=None)

# Turn the xlabels vertical.
ax.set_xticklabels(ax.get_xticklabels(), rotation=90, ha="center")

# Turn on the vertical grid to help align words to scores.
ax.grid(True)

plt.title('Start Word Scores')

plt.show()

Create a second bar plot showing the score for every input word being the "end" word.

In [None]:
# Create a barplot showing the end word score for all of the tokens.
ax = sns.barplot(x=token_labels, y=e_scores, ci=None)

# Turn the xlabels vertical.
ax.set_xticklabels(ax.get_xticklabels(), rotation=90, ha="center")

# Turn on the vertical grid to help align words to scores.
ax.grid(True)

plt.title('End Word Scores')

plt.show()

**Alternate View**

I also tried visualizing both the start and end scores on a single bar plot, but I think it may actually be more confusing than seeing them separately. 

In [None]:
import pandas as pd

# Store the tokens and scores in a DataFrame. 
# Each token will have two rows, one for its start score and one for its end
# score. The "marker" column will differentiate them. A little wacky, I know.
scores = []
for (i, token_label) in enumerate(token_labels):

    # Add the token's start score as one row.
    scores.append({'token_label': token_label, 
                   'score': s_scores[i],
                   'marker': 'start'})
    
    # Add  the token's end score as another row.
    scores.append({'token_label': token_label, 
                   'score': e_scores[i],
                   'marker': 'end'})
    
df = pd.DataFrame(scores)


In [None]:
sns.set(font_scale=1.8)

In [None]:
# Draw a grouped barplot to show start and end scores for each word.
# The "hue" parameter is where we tell it which datapoints belong to which
# of the two series.
g = sns.catplot(x="token_label", y="score", hue="marker", data=df,
                kind="bar", height=6, aspect=4)

# Turn the xlabels vertical.
g.set_xticklabels(g.ax.get_xticklabels(), rotation=90, ha="center")

# Turn on the vertical grid to help align words to scores.
g.ax.grid(True)


### 5. More Examples

Turn the QA process into a function so we can easily try out other examples.

In [None]:
def answer_question(question, answer_text):
    '''
    Takes a `question` string and an `answer_text` string (which contains the
    answer), and identifies the words within the `answer_text` that are the
    answer. Prints them out.
    '''
    # ======== Tokenize ========
    # Apply the tokenizer to the input text, treating them as a text-pair.
    input_ids = tokenizer.encode(question, answer_text)

    # Report how long the input sequence is.
    print('Query has {:,} tokens.\n'.format(len(input_ids)))

    # ======== Evaluate ========
    # Run our example question through the model.
    # Index the outputs instead of unpacking them: newer `transformers` versions
    # return a dict-like object, and unpacking that would yield its keys.
    outputs = model(torch.tensor([input_ids]))  # The tokens representing our input text.
    start_scores, end_scores = outputs[0], outputs[1]

    # ======== Reconstruct Answer ========
    # Find the tokens with the highest `start` and `end` scores.
    answer_start = torch.argmax(start_scores)
    answer_end = torch.argmax(end_scores)

    # Get the string versions of the input tokens.
    tokens = tokenizer.convert_ids_to_tokens(input_ids)

    # Start with the first token.
    answer = tokens[answer_start]

    # Select the remaining answer tokens and join them with whitespace.
    for i in range(answer_start + 1, answer_end + 1):
        
        # If it's a subword token, then recombine it with the previous token.
        if tokens[i][0:2] == '##':
            answer += tokens[i][2:]
        
        # Otherwise, add a space then the token.
        else:
            answer += ' ' + tokens[i]

    print('Answer: "' + answer + '"')
    return answer

As our reference text, we've taken the Abstract of the [BERT paper](https://arxiv.org/pdf/1810.04805.pdf).


In [None]:
import textwrap

# Wrap text to 80 characters.
wrapper = textwrap.TextWrapper(width=80) 

bert_abstract = "We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models (Peters et al., 2018a; Radford et al., 2018), BERT is designed to pretrain deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers. As a result, the pre-trained BERT model can be finetuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial taskspecific architecture modifications. BERT is conceptually simple and empirically powerful. It obtains new state-of-the-art results on eleven natural language processing tasks, including pushing the GLUE score to 80.5% (7.7% point absolute improvement), MultiNLI accuracy to 86.7% (4.6% absolute improvement), SQuAD v1.1 question answering Test F1 to 93.2 (1.5 point absolute improvement) and SQuAD v2.0 Test F1 to 83.1 (5.1 point absolute improvement)."

print(wrapper.fill(bert_abstract))

-----------------------------
Ask BERT what its name stands for (the answer is in the first sentence of the abstract).

In [None]:
question = "What does the 'B' in BERT stand for?"

ans = answer_question(question, bert_abstract)

---------------------
Ask BERT about example applications of itself :)

The answer to the question comes from this passage from the abstract: 

> "...BERT model can be finetuned with just one additional output
layer to create state-of-the-art models for **a wide range of tasks, such as
question answering and language inference,** without substantial taskspecific
architecture modifications."

In [None]:
question = "What are some example applications of BERT?"

ans = answer_question(question, bert_abstract)

## Part 3: RuBERT for Question Answering
Here we will use the model pre-trained on the SberQuAD dataset from the [SDSJ-2017 challenge problem B](https://github.com/sberbank-ai/data-science-journey-2017/tree/master/problem_B).

In [3]:
from deeppavlov import build_model, configs

In [4]:
model_ru = build_model(configs.squad.squad_ru_rubert, download=True)

2020-06-02 21:53:13.528 INFO in 'deeppavlov.download'['download'] at line 117: Skipped http://files.deeppavlov.ai/deeppavlov_data/squad_model_ru_rubert.tar.gz download because of matching hashes
2020-06-02 21:53:15.431 INFO in 'deeppavlov.download'['download'] at line 117: Skipped http://files.deeppavlov.ai/deeppavlov_data/bert/rubert_cased_L-12_H-768_A-12_v1.tar.gz download because of matching hashes
[nltk_data] Downloading package punkt to /Users/nexes/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package stopwords to /Users/nexes/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package perluniprops to
[nltk_data]     /Users/nexes/nltk_data...
[nltk_data]   Package perluniprops is already up-to-date!
[nltk_data] Downloading package nonbreaking_prefixes to
[nltk_data]     /Users/nexes/nltk_data...
[nltk_data]   Package nonbreaking_prefixes is already up-to-date!

2020-06-02 21:53:58.581 INFO in 'deeppavlov.core.models.tf_model'['tf_model'] at line 51: [loading model from /Users/nexes/.deeppavlov/models/squad_ru_bert/model_rubert]



INFO:tensorflow:Restoring parameters from /Users/nexes/.deeppavlov/models/squad_ru_bert/model_rubert


In [5]:
text = """Курорты Крыма смогут принимать туристов с 15 июня, сначала — только жителей 
этого региона. Основной поток внешних туристов ожидается в июле и августе, 
говорит Наталья Стамбульникова. 
Отели закупают рециркуляторы, маски, средства личной гигиены, отметила 
председатель АСМОК. В номерах будут размещаться антибактериальные гели, 
салфетки и кварцевые лампы.
20 мая Роспотребнадзор рекомендовал санаториям селить туристов по одному, 
если это не члены одной семьи, отменить групповые процедуры, обеззараживать 
медкабинеты после каждого посетителя и приносить еду в номер, если в столовой 
невозможно рассадить людей на дистанции. Также в требованиях указано, что 
сотрудники должны подтвердить отсутствие коронавирусной инфекции справкой и 
тестироваться не реже раза в неделю.
Глава Крыма Сергей Аксенов назвал некоторые из требований Роспотребнадзора 
невыполнимыми и попросил их смягчить. В частности, он выступил против 
необходимости тестировать всех сотрудников и против рекомендации 
переоборудовать отель в «обсерватор», если в нем будут выявлены больные 
коронавирусом. Также Аксенов считает необходимым смягчить требования к 
социальному дистанцированию на пляжах."""

In [6]:
# Question: "Who spoke out against the need to test all employees?"
model_ru([text], ['Кто выступил против необходимости тестировать всех сотрудников?'])

[['Сергей Аксенов'], [791], [5127675.0]]

In [7]:
# Question: "Which region is being discussed?"
model_ru([text], ['О каком регионе идет речь?'])

[['Крыма'], [8], [2516.33154296875]]

In [8]:
# Question: "Which time of year is being discussed?"
model_ru([text], ['О каком времени года идет речь?'])

[['в июле и августе'], [134], [1029.145751953125]]
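
The raw output is a little hard to read. Below is a small sketch of a helper that labels its three parts, assuming they are the answer strings, the character offsets where each answer starts in the text, and an unnormalized confidence score. That reading is an assumption based on the shape of the outputs above, and `unpack_squad_result` is a hypothetical name rather than part of DeepPavlov.

```python
def unpack_squad_result(result):
    """Turn [[answers], [start offsets], [scores]] into a list of dicts."""
    answers, starts, scores = result
    return [
        {'answer': a, 'start': s, 'score': sc}
        for a, s, sc in zip(answers, starts, scores)
    ]

# The literal output of the first query above.
raw = [['Сергей Аксенов'], [791], [5127675.0]]
print(unpack_squad_result(raw))
```

Because `model_ru` takes lists of texts and questions, the same helper would also cover a batch of several questions at once.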

In [9]:
text = """Первая многоразовая ступень ракеты-носителя Falcon 9 успешно отделилась через две с половиной минуты после старта и автоматически приземлилась на плавучую платформу Of Course I Still Love You у берегов Флориды. Через 12 минут после запуска космический корабль Crew Dragon вышел на расчетную орбиту и отделился от второй ступени ракеты. Сближение корабля Crew Dragon с Международной космической станцией запланировано на 31 мая. К стыковочному адаптеру на узловом модуле «Гармония» американского сегмента МКС Crew Dragon должен причалить в ручном или, при необходимости, в автоматическом режиме. Эта процедура запланирована на 10:29 по времени Восточного побережья США (17:29 по московскому времени). В испытательном полете DM2 астронавт Херли является командиром космического корабля (spacecraft commander), а его напарник Бенкен — командир по операциям стыковки и расстыковки (joint operations commander). Фактически это означает, что именно Херли управляет Crew Dragon в полете к МКС, к которой они должны пристыковаться в течение суток после старта. Херли и Бенкен также будут выполнять необходимые для сертификации НАСА проверки систем корабля в полете. Во время полета Херли и Бенкен провели небольшую экскурсию по Crew Dragon."""

In [14]:
text = """Первая многоразовая ступень ракеты-носителя Falcon 9 успешно отделилась через две с половиной минуты после старта и автоматически приземлилась на плавучую платформу Of Course I Still Love You у берегов Флориды. Через 12 минут после запуска космический корабль Crew Dragon вышел на расчетную орбиту и отделился от второй ступени ракеты.

Сближение корабля Crew Dragon с Международной космической станцией запланировано на 31 мая. К стыковочному адаптеру на узловом модуле «Гармония» американского сегмента МКС Crew Dragon должен причалить в ручном или, при необходимости, в автоматическом режиме. Эта процедура запланирована на 10:29 по времени Восточного побережья США (17:29 по московскому времени).

В испытательном полете DM2 астронавт Херли является командиром космического корабля (spacecraft commander), а его напарник Бенкен — командир по операциям стыковки и расстыковки (joint operations commander). Фактически это означает, что именно Херли управляет Crew Dragon в полете к МКС, к которой они должны пристыковаться в течение суток после старта. Херли и Бенкен также будут выполнять необходимые для сертификации НАСА проверки систем корабля в полете.

Во время полета Херли и Бенкен провели небольшую экскурсию по Crew Dragon."""

In [15]:
print(text)


In [16]:
# Question: "When did the first stage separate?"
model_ru([text], ['Когда отделилась первая ступень?'])

[['через две с половиной минуты после старта'], [72], [2055723.75]]

In [17]:
# Question: "What date is the docking scheduled for?"
model_ru([text], ['На какую дату запланирована стыковка?'])

[['на 31 мая'], [418], [31752.916015625]]

In [23]:
# Question: "Who is taking part in the flight?"
model_ru([text], ['Кто участвует в полете?'])

[['астронавт Херли'], [729], [139.6282958984375]]

In [24]:
# Question: "Who is taking part in the flight besides astronaut Hurley?"
model_ru([text], ['Кто участвует в полете кроме астронавта Херли?'])

[['Бенкен'], [1063], [13.483194351196289]]

In [27]:
# Question: "Which stage landed on the floating platform Of Course I Still Love You?"
model_ru([text], ['Какая ступень приземлилась на плавучую платформу Of Course I Still Love You?'])

[['Первая многоразовая ступень ракеты-носителя Falcon 9'], [0], [582402.8125]]

## Part 4: Text to Speech (with Tacotron 2)

In [None]:
import numpy as np
from scipy.io.wavfile import write

In [None]:
assert tacotron2 is not None, 'Tacotron2 by NVIDIA requires a CUDA-compatible GPU for inference'

In [None]:
tacotron2 = tacotron2.to(device)
tacotron2.eval()

In [None]:
waveglow = torch.hub.load('nvidia/DeepLearningExamples:torchhub', 'nvidia_waveglow')
waveglow = waveglow.remove_weightnorm(waveglow)
waveglow = waveglow.to('cuda')
waveglow.eval()

In [None]:
text = question

In [None]:
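def get_audio(text, audio_name='audio.wav'):
    """Synthesize `text` to speech, save it to `audio_name`, and return a playable widget."""
    from IPython.display import Audio

    # Preprocessing: convert the text to a batch (size 1) of symbol IDs.
    sequence = np.array(tacotron2.text_to_sequence(text, ['english_cleaners']))[None, :]
    sequence = torch.from_numpy(sequence).to(device='cuda', dtype=torch.int64)

    # Run the models: Tacotron 2 predicts a mel spectrogram,
    # and WaveGlow turns it into a waveform.
    with torch.no_grad():
        _, mel, _, _ = tacotron2.infer(sequence)
        audio = waveglow.infer(mel)
    audio_numpy = audio[0].data.cpu().numpy()

    # Sampling rate (Hz) used when writing and playing back the audio.
    rate = 22050
    write(audio_name, rate, audio_numpy)
    return Audio(audio_numpy, rate=rate)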

In [None]:
text = 'Your question is: {}\n Answer is: {}'.format(question, ans)

In [None]:
get_audio(text)