# DeepPavlov BERT Showcase


In [None]:
! pip install deeppavlov

## BERT for text classification

Install the requirements for the BERT-based classification model trained to detect insults on the Kaggle [Detecting Insults in Social Commentary](https://www.kaggle.com/c/detecting-insults-in-social-commentary) dataset:

In [None]:
! python -m deeppavlov install insults_kaggle_bert

Interact with the text classification model through the DeepPavlov Python API:

In [None]:
from deeppavlov import build_model, configs

model = build_model(configs.classifiers.insults_kaggle_bert, download=True) # download=True if model is not downloaded yet

2020-01-05 17:51:53.448 INFO in 'deeppavlov.download'['download'] at line 117: Skipped http://files.deeppavlov.ai/deeppavlov_data/bert/cased_L-12_H-768_A-12.zip download because of matching hashes
2020-01-05 17:51:53.810 INFO in 'deeppavlov.download'['download'] at line 117: Skipped http://files.deeppavlov.ai/datasets/insults_data.tar.gz download because of matching hashes
2020-01-05 17:52:02.904 INFO in 'deeppavlov.download'['download'] at line 117: Skipped http://files.deeppavlov.ai/deeppavlov_data/classifiers/insults_kaggle_v3.tar.gz download because of matching hashes
2020-01-05 17:52:02.988 INFO in 'deeppavlov.core.data.simple_vocab'['simple_vocab'] at line 115: [loading vocabulary from /root/.deeppavlov/models/classifiers/insults_kaggle_v3/classes.dict]
2020-01-05 17:52:22.193 INFO in 'deeppavlov.core.models.tf_model'['tf_model'] at line 51: [loading model from /root/.deeppavlov/models/classifiers/insults_kaggle_v3/model]


INFO:tensorflow:Restoring parameters from /root/.deeppavlov/models/classifiers/insults_kaggle_v3/model


In [None]:
model(['hey, how are you?', 'You are so stupid!'])

['Not Insult', 'Insult']

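**Training on your own dataset**

Start by inspecting the model config: `dataset_reader` names the data directory and the CSV columns used as text and label, and `metadata.variables` defines the path placeholders.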

In [None]:
import json
from pprint import pprint

# Load the config as a plain dict; the context manager closes the file afterwards
with open(configs.classifiers.insults_kaggle_bert) as f:
    model_config = json.load(f)

pprint(model_config['dataset_reader'])
pprint(model_config['metadata']['variables'])

{'class_name': 'basic_classification_reader',
 'data_path': '{DOWNLOADS_PATH}/insults_data',
 'x': 'Comment',
 'y': 'Class'}
{'DOWNLOADS_PATH': '{ROOT_PATH}/downloads',
 'MODELS_PATH': '{ROOT_PATH}/models',
 'MODEL_PATH': '{MODELS_PATH}/classifiers/insults_kaggle_v3',
 'ROOT_PATH': '~/.deeppavlov'}
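
The `{NAME}` placeholders in the config refer to entries in `metadata.variables`, which can themselves refer to other variables. The snippet below is a minimal sketch of how such nested placeholders expand into a concrete path. The `resolve` helper is hypothetical and written only for illustration; DeepPavlov resolves these values internally and its implementation may differ.

```python
import os

# Values copied from the config printed above.
variables = {
    "ROOT_PATH": "~/.deeppavlov",
    "DOWNLOADS_PATH": "{ROOT_PATH}/downloads",
    "MODELS_PATH": "{ROOT_PATH}/models",
    "MODEL_PATH": "{MODELS_PATH}/classifiers/insults_kaggle_v3",
}


def resolve(template: str, variables: dict) -> str:
    """Substitute {NAME} placeholders repeatedly until nothing changes.

    Hypothetical helper; assumes the variables contain no cyclic references.
    """
    previous = None
    while previous != template:
        previous = template
        for name, value in variables.items():
            template = template.replace("{" + name + "}", value)
    return template


data_path = resolve("{DOWNLOADS_PATH}/insults_data", variables)
print(data_path)                      # ~/.deeppavlov/downloads/insults_data
print(os.path.expanduser(data_path))  # same path with ~ expanded to the home directory
```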


Place your own `train.csv`, `valid.csv`, and `test.csv` files in this directory, using the same layout:

In [None]:
! ls ~/.deeppavlov/downloads/insults_data/

test.csv  train.csv  valid.csv
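
If your data lives elsewhere, one option is to write the three CSV files yourself and point `data_path` at them. The sketch below assumes the column names `Comment` and `Class` from the `dataset_reader` section printed earlier. The example rows and the `my_insults_data` directory are hypothetical, and the `dataset_reader` dict is a stand-in for `model_config['dataset_reader']`.

```python
import csv
import tempfile
from pathlib import Path

# Hypothetical examples; replace with your own labelled comments.
splits = {
    "train.csv": [("Thanks, that was helpful.", "Not Insult"),
                  ("You are an idiot.", "Insult")],
    "valid.csv": [("I disagree with this point.", "Not Insult")],
    "test.csv": [("Nobody wants to hear from you, moron.", "Insult")],
}

data_dir = Path(tempfile.mkdtemp()) / "my_insults_data"  # hypothetical location
data_dir.mkdir()

for filename, rows in splits.items():
    with open(data_dir / filename, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["Comment", "Class"])  # header names match 'x' and 'y'
        writer.writerows(rows)

# Stand-in for model_config['dataset_reader'] as printed earlier.
dataset_reader = {
    "class_name": "basic_classification_reader",
    "data_path": "{DOWNLOADS_PATH}/insults_data",
    "x": "Comment",
    "y": "Class",
}
dataset_reader["data_path"] = str(data_dir)  # point the reader at the new files

print(sorted(p.name for p in data_dir.iterdir()))
```

In the notebook itself, the same assignment would be applied to `model_config['dataset_reader']['data_path']` before training.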


In [None]:
! head ~/.deeppavlov/downloads/insults_data/train.csv

Comment,Class
"""You fuck your dad.""",Insult
"""i really don't understand your point.\xa0 It seems that you are mixing apples and oranges.""",Not Insult
"""A\\xc2\\xa0majority of Canadians can and has been wrong before now and will be again.\\n\\nUnless you're supportive of the idea that nothing is full proof or perfect so you take your chances and if we should inadvertently kill your son or daughter then them's the breaks and we can always regard you as collateral damage like in wartime - and sorry, but\\xc2\\xa0the cheques in the mail. """,Not Insult
"""listen if you dont wanna get married to a man or a women DONT DO IT. what would it bother you if gay people got married stay in your lane do you let them do them. And your god is so nice but quick to judg if your not like him, thought you wasnt suppose to judge people.""",Not Insult
"""C\xe1c b\u1ea1n xu\u1ed1ng \u0111\u01b0\u1eddng bi\u1ec3u t\xecnh 2011 c\xf3 \xf4n ho\xe0 kh\xf4ng ? \nC\xe1c ng\u01b0 d\xe2n ng\u1ed3i cu\xed \u0111\

In [None]:
from deeppavlov import train_model
model = train_model(model_config)




2020-01-05 14:20:59.435 INFO in 'deeppavlov.core.data.simple_vocab'['simple_vocab'] at line 115: [loading vocabulary from /root/.deeppavlov/models/classifiers/insults_kaggle_v3/classes.dict]
2020-01-05 14:20:59.442 INFO in 'deeppavlov.core.data.simple_vocab'['simple_vocab'] at line 101: [saving vocabulary to /root/.deeppavlov/models/classifiers/insults_kaggle_v3/classes.dict]
2020-01-05 14:21:20.125 INFO in 'deeppavlov.core.models.tf_model'['tf_model'] at line 51: [loading model from /root/.deeppavlov/models/classifiers/insults_kaggle_v3/model]


INFO:tensorflow:Restoring parameters from /root/.deeppavlov/models/classifiers/insults_kaggle_v3/model


2020-01-05 14:27:19.0 INFO in 'deeppavlov.core.trainers.nn_trainer'['nn_trainer'] at line 198: Initial best roc_auc of 0.9255


{"valid": {"eval_examples_count": 2647, "metrics": {"roc_auc": 0.9255, "accuracy": 0.8659, "f1_macro": 0.8075}, "time_spent": "0:05:58", "epochs_done": 0, "batches_seen": 0, "train_examples_seen": 0, "impatience": 0, "patience_limit": 5}}
{"train": {"eval_examples_count": 64, "metrics": {"roc_auc": 0.9989, "accuracy": 0.9688, "f1_macro": 0.9626}, "time_spent": "0:30:52", "epochs_done": 1, "batches_seen": 62, "train_examples_seen": 3947, "learning_rate": 1e-05, "loss": 0.17253795109929576}}


2020-01-05 14:58:08.576 INFO in 'deeppavlov.core.trainers.nn_trainer'['nn_trainer'] at line 211: Did not improve on the roc_auc of 0.9255


{"valid": {"eval_examples_count": 2647, "metrics": {"roc_auc": 0.9158, "accuracy": 0.8632, "f1_macro": 0.825}, "time_spent": "0:36:48", "epochs_done": 1, "batches_seen": 62, "train_examples_seen": 3947, "impatience": 1, "patience_limit": 5}}
{"train": {"eval_examples_count": 64, "metrics": {"roc_auc": 1.0, "accuracy": 1.0, "f1_macro": 1.0}, "time_spent": "1:01:30", "epochs_done": 2, "batches_seen": 124, "train_examples_seen": 7894, "learning_rate": 1e-05, "loss": 0.061773255557542844}}


2020-01-05 15:28:47.120 INFO in 'deeppavlov.core.trainers.nn_trainer'['nn_trainer'] at line 211: Did not improve on the roc_auc of 0.9255


{"valid": {"eval_examples_count": 2647, "metrics": {"roc_auc": 0.9098, "accuracy": 0.8655, "f1_macro": 0.8172}, "time_spent": "1:07:26", "epochs_done": 2, "batches_seen": 124, "train_examples_seen": 7894, "impatience": 2, "patience_limit": 5}}
{"train": {"eval_examples_count": 64, "metrics": {"roc_auc": 1.0, "accuracy": 1.0, "f1_macro": 1.0}, "time_spent": "1:32:04", "epochs_done": 3, "batches_seen": 186, "train_examples_seen": 11841, "learning_rate": 1e-05, "loss": 0.033550254579993985}}


2020-01-05 15:59:48.526 INFO in 'deeppavlov.core.trainers.nn_trainer'['nn_trainer'] at line 211: Did not improve on the roc_auc of 0.9255


{"valid": {"eval_examples_count": 2647, "metrics": {"roc_auc": 0.9102, "accuracy": 0.8674, "f1_macro": 0.83}, "time_spent": "1:38:28", "epochs_done": 3, "batches_seen": 186, "train_examples_seen": 11841, "impatience": 3, "patience_limit": 5}}
{"train": {"eval_examples_count": 64, "metrics": {"roc_auc": 0.9985, "accuracy": 0.9844, "f1_macro": 0.9765}, "time_spent": "2:03:09", "epochs_done": 4, "batches_seen": 248, "train_examples_seen": 15788, "learning_rate": 1e-05, "loss": 0.01995556393352848}}


2020-01-05 16:30:54.306 INFO in 'deeppavlov.core.trainers.nn_trainer'['nn_trainer'] at line 211: Did not improve on the roc_auc of 0.9255


{"valid": {"eval_examples_count": 2647, "metrics": {"roc_auc": 0.9063, "accuracy": 0.8663, "f1_macro": 0.8272}, "time_spent": "2:09:33", "epochs_done": 4, "batches_seen": 248, "train_examples_seen": 15788, "impatience": 4, "patience_limit": 5}}
{"train": {"eval_examples_count": 64, "metrics": {"roc_auc": 1.0, "accuracy": 1.0, "f1_macro": 1.0}, "time_spent": "2:34:46", "epochs_done": 5, "batches_seen": 310, "train_examples_seen": 19735, "learning_rate": 1e-05, "loss": 0.018101048049321698}}


2020-01-05 17:02:33.515 INFO in 'deeppavlov.core.trainers.nn_trainer'['nn_trainer'] at line 211: Did not improve on the roc_auc of 0.9255
2020-01-05 17:02:34.222 INFO in 'deeppavlov.core.models.lr_scheduled_model'['lr_scheduled_model'] at line 429: New learning rate dividor = 2.0


{"valid": {"eval_examples_count": 2647, "metrics": {"roc_auc": 0.9114, "accuracy": 0.8708, "f1_macro": 0.8265}, "time_spent": "2:41:13", "epochs_done": 5, "batches_seen": 310, "train_examples_seen": 19735, "impatience": 5, "patience_limit": 5}}


2020-01-05 17:02:34.920 INFO in 'deeppavlov.core.trainers.nn_trainer'['nn_trainer'] at line 328: Ran out of patience
2020-01-05 17:02:35.316 INFO in 'deeppavlov.core.data.simple_vocab'['simple_vocab'] at line 115: [loading vocabulary from /root/.deeppavlov/models/classifiers/insults_kaggle_v3/classes.dict]
2020-01-05 17:02:56.859 INFO in 'deeppavlov.core.models.tf_model'['tf_model'] at line 51: [loading model from /root/.deeppavlov/models/classifiers/insults_kaggle_v3/model]


INFO:tensorflow:Restoring parameters from /root/.deeppavlov/models/classifiers/insults_kaggle_v3/model
{"train": {"eval_examples_count": 3947, "metrics": {"roc_auc": 0.9905, "accuracy": 0.9572, "f1_macro": 0.9431}, "time_spent": "0:09:43"}}
{"valid": {"eval_examples_count": 2647, "metrics": {"roc_auc": 0.9255, "accuracy": 0.8659, "f1_macro": 0.8075}, "time_spent": "0:06:35"}}
{"test": {"eval_examples_count": 2235, "metrics": {"roc_auc": 0.8612, "accuracy": 0.74, "f1_macro": 0.7235}, "time_spent": "0:05:29"}}


2020-01-05 17:24:44.154 INFO in 'deeppavlov.core.data.simple_vocab'['simple_vocab'] at line 115: [loading vocabulary from /root/.deeppavlov/models/classifiers/insults_kaggle_v3/classes.dict]
2020-01-05 17:25:05.488 INFO in 'deeppavlov.core.models.tf_model'['tf_model'] at line 51: [loading model from /root/.deeppavlov/models/classifiers/insults_kaggle_v3/model]


INFO:tensorflow:Restoring parameters from /root/.deeppavlov/models/classifiers/insults_kaggle_v3/model
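
The log above follows a patience-based early-stopping pattern: each validation pass that fails to beat the best `roc_auc` increments `impatience`, and training stops once it reaches `patience_limit`. The sketch below replays the logged validation scores to show the rule. It illustrates the pattern only and is not DeepPavlov's trainer code.

```python
# Validation ROC-AUC values copied from the training log above.
initial_best = 0.9255
valid_roc_auc = [0.9158, 0.9098, 0.9102, 0.9063, 0.9114]
patience_limit = 5

best = initial_best
impatience = 0
stopped_after = None

for epoch, score in enumerate(valid_roc_auc, start=1):
    if score > best:
        best = score
        impatience = 0      # an improvement resets the counter
    else:
        impatience += 1     # no improvement on the best score so far
    if impatience >= patience_limit:
        stopped_after = epoch
        break

print(best, impatience, stopped_after)  # 0.9255 5 5
```

Because no epoch improved on the initial score, the final evaluation in the log reports the same validation `roc_auc` of 0.9255 that the pretrained weights started with, while the training metrics climbed to 1.0. That gap suggests the extra epochs mostly overfit the training set.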


In [None]:
model(['Your work was great', 'Your content is Horrible!!!'])

['Not Insult', 'Insult']

## BERT for tagging (Named Entity Recognition)

In [None]:
! python -m deeppavlov interact ner_ontonotes_bert -d

email-validator not installed, email fields will be treated as str.
To install, run: pip install email-validator
2020-01-05 17:25:09.346 INFO in 'deeppavlov.core.common.file'['file'] at line 30: Interpreting 'ner_ontonotes_bert' as '/usr/local/lib/python3.6/dist-packages/deeppavlov/configs/ner/ner_ontonotes_bert.json'
2020-01-05 17:25:10.120 INFO in 'deeppavlov.core.data.utils'['utils'] at line 80: Downloading from http://files.deeppavlov.ai/deeppavlov_data/ner_ontonotes_bert_v1.tar.gz?config=ner_ontonotes_bert to /root/.deeppavlov/ner_ontonotes_bert_v1.tar.gz
100% 805M/805M [02:11<00:00, 6.10MB/s]
2020-01-05 17:27:21.934 INFO in 'deeppavlov.core.data.utils'['utils'] at line 237: Extracting /root/.deeppavlov/ner_ontonotes_bert_v1.tar.gz archive into /root/.deeppavlov/models
2020-01-05 17:27:32.127 INFO in 'deeppavlov.download'['download'] at line 117: Skipped http://files.deeppavlov.ai/deeppavlov_data/bert/cased_L-12_H-768_A-12.zip?config=ner_ontonotes_bert download because of matching

In [None]:
from deeppavlov import build_model, configs

model = build_model(configs.ner.ner_ontonotes_bert_mult, download=True)

2020-01-05 20:57:52.217 INFO in 'deeppavlov.core.data.utils'['utils'] at line 80: Downloading from http://files.deeppavlov.ai/deeppavlov_data/bert/multi_cased_L-12_H-768_A-12.zip to /root/.deeppavlov/downloads/multi_cased_L-12_H-768_A-12.zip
100%|██████████| 663M/663M [03:03<00:00, 3.62MB/s]
2020-01-05 21:00:55.477 INFO in 'deeppavlov.core.data.utils'['utils'] at line 237: Extracting /root/.deeppavlov/downloads/multi_cased_L-12_H-768_A-12.zip archive into /root/.deeppavlov/downloads/bert_models
2020-01-05 21:01:02.384 INFO in 'deeppavlov.core.data.utils'['utils'] at line 80: Downloading from http://files.deeppavlov.ai/deeppavlov_data/ner_ontonotes_bert_mult_v1.tar.gz to /root/.deeppavlov/ner_ontonotes_bert_mult_v1.tar.gz
100%|██████████| 1.32G/1.32G [06:53<00:00, 3.19MB/s]
2020-01-05 21:07:55.756 INFO in 'deeppavlov.core.data.utils'['utils'] at line 237: Extracting /root/.deeppavlov/ner_ontonotes_bert_mult_v1.tar.gz archive into /root/.deeppavlov/models
[nltk_data] Downloading package 




2020-01-05 21:08:12.287 INFO in 'deeppavlov.core.data.simple_vocab'['simple_vocab'] at line 115: [loading vocabulary from /root/.deeppavlov/models/ner_ontonotes_bert_mult/tag.dict]











Using TensorFlow backend.



2020-01-05 21:08:45.466 INFO in 'deeppavlov.core.models.tf_model'['tf_model'] at line 51: [loading model from /root/.deeppavlov/models/ner_ontonotes_bert_mult/model]



INFO:tensorflow:Restoring parameters from /root/.deeppavlov/models/ner_ontonotes_bert_mult/model


In [None]:
model(['Curling World Championship will be held in Antananarivo'])

[[['Curling',
   'World',
   'Championship',
   'will',
   'be',
   'held',
   'in',
   'Antananarivo']],
 [['B-EVENT', 'I-EVENT', 'I-EVENT', 'O', 'O', 'O', 'O', 'B-GPE']]]
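
The model returns one tag per token in BIO notation: `B-` opens an entity, `I-` continues it, and `O` marks tokens outside any entity. A minimal sketch of merging those tags into entity spans follows. The `group_entities` helper is hypothetical, and the inputs are the first (and only) sentence from the output above.

```python
tokens = ["Curling", "World", "Championship", "will", "be", "held", "in", "Antananarivo"]
tags = ["B-EVENT", "I-EVENT", "I-EVENT", "O", "O", "O", "O", "B-GPE"]


def group_entities(tokens, tags):
    """Merge B-/I- tagged tokens into (text, label) entity spans."""
    entities = []
    current_tokens, current_label = [], None
    for token, tag in zip(tokens, tags):
        prefix, _, label = tag.partition("-")
        if prefix == "I" and label == current_label:
            current_tokens.append(token)          # continue the open entity
            continue
        if current_tokens:                        # close the open entity
            entities.append((" ".join(current_tokens), current_label))
        if prefix in ("B", "I"):                  # start a new entity
            current_tokens, current_label = [token], label
        else:                                     # "O": outside any entity
            current_tokens, current_label = [], None
    if current_tokens:                            # flush an entity that ends the sentence
        entities.append((" ".join(current_tokens), current_label))
    return entities


print(group_entities(tokens, tags))
# [('Curling World Championship', 'EVENT'), ('Antananarivo', 'GPE')]
```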

## BERT for Question Answering (Stanford Question Answering Dataset)

In [None]:
from deeppavlov import build_model, configs

model = build_model(configs.squad.squad_bert, download=True)

2020-01-05 21:14:22.951 INFO in 'deeppavlov.download'['download'] at line 117: Skipped http://files.deeppavlov.ai/deeppavlov_data/bert/cased_L-12_H-768_A-12.zip download because of matching hashes
2020-01-05 21:14:31.665 INFO in 'deeppavlov.download'['download'] at line 117: Skipped http://files.deeppavlov.ai/deeppavlov_data/squad_bert.tar.gz download because of matching hashes








2020-01-05 21:14:52.81 INFO in 'deeppavlov.core.models.tf_model'['tf_model'] at line 51: [loading model from /root/.deeppavlov/models/squad_bert/model]


INFO:tensorflow:Restoring parameters from /root/.deeppavlov/models/squad_bert/model


In [None]:
model(['In meteorology, precipitation is any product of the condensation of atmospheric water vapor that falls under gravity. The main forms of precipitation include drizzle, rain, sleet, snow, graupel and hail… Precipitation forms as smaller droplets coalesce via collision with other rain drops or ice crystals within a cloud. Short, intense periods of rain in scattered locations are called “showers”.'], 
      ['Where do water droplets collide with ice crystals to form precipitation?'])[0]

['within a cloud']
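
SQuAD-style question answering is extractive: the answer is a span copied from the context rather than newly generated text. A quick sanity check, sketched below with a shortened copy of the context, is to confirm that the returned answer occurs verbatim and to recover its character offsets.

```python
# One sentence of the context passed to the model above.
context = (
    "Precipitation forms as smaller droplets coalesce via collision with "
    "other rain drops or ice crystals within a cloud."
)
answer = "within a cloud"

start = context.find(answer)      # -1 would mean the answer is not a verbatim span
end = start + len(answer)

assert start != -1, "answer is not a substring of the context"
print(start, end, repr(context[start:end]))
```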

## BERT for paraphrase detection (do two texts say the same thing?)

In [None]:
! python -m deeppavlov install paraphraser_bert

2020-01-05 17:42:02.700 INFO in 'deeppavlov.core.common.file'['file'] at line 30: Interpreting 'paraphraser_bert' as '/usr/local/lib/python3.6/dist-packages/deeppavlov/configs/classifiers/paraphraser_bert.json'
Collecting git+https://github.com/deepmipt/bert.git@feat/multi_gpu
  Cloning https://github.com/deepmipt/bert.git (to revision feat/multi_gpu) to /tmp/pip-req-build-lnv0je0j
  Running command git clone -q https://github.com/deepmipt/bert.git /tmp/pip-req-build-lnv0je0j
  Running command git checkout -b feat/multi_gpu --track origin/feat/multi_gpu
  Switched to a new branch 'feat/multi_gpu'
  Branch 'feat/multi_gpu' set up to track remote branch 'feat/multi_gpu' from 'origin'.
Building wheels for collected packages: bert-dp
  Building wheel for bert-dp (setup.py) ... done
  Created wheel for bert-dp: filename=bert_dp-1.0-cp36-none-any.whl size=23580 sha256=

In [None]:
from deeppavlov import build_model, configs

model = build_model(configs.classifiers.paraphraser_bert, download=True) # download=True if model is not downloaded yet

2020-01-05 17:59:44.548 INFO in 'deeppavlov.download'['download'] at line 117: Skipped http://files.deeppavlov.ai/datasets/paraphraser.zip download because of matching hashes
2020-01-05 17:59:46.317 INFO in 'deeppavlov.download'['download'] at line 117: Skipped http://files.deeppavlov.ai/deeppavlov_data/bert/multi_cased_L-12_H-768_A-12.zip download because of matching hashes
2020-01-05 17:59:49.572 INFO in 'deeppavlov.download'['download'] at line 117: Skipped http://files.deeppavlov.ai/deeppavlov_data/classifiers/paraphraser_bert_v0.tar.gz download because of matching hashes

INFO:tensorflow:Restoring parameters from /root/.deeppavlov/models/paraphraser_bert_v0/model_multi




In [None]:
model(["I am great"], ["I am good"])

array([1])

In [None]:
model(["This is worse than I expected"], ["I thing this is great"])

array([0])
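
The model takes two parallel lists, so several pairs can be prepared by splitting them into a list of first texts and a list of second texts. The sketch below shows that split and a readable label for each result. The texts and outputs are copied verbatim from the two cells above, and the mapping of `1` to "paraphrase" is an assumption that matches those results.

```python
# Text pairs copied verbatim from the two calls above.
pairs = [
    ("I am great", "I am good"),
    ("This is worse than I expected", "I thing this is great"),
]

# Split the pairs into two parallel lists, one per argument of the call.
first_texts, second_texts = (list(column) for column in zip(*pairs))

# Outputs copied from the cells above; the label mapping is an assumption.
outputs = [1, 0]
labels = {1: "paraphrase", 0: "not a paraphrase"}

for (first, second), output in zip(pairs, outputs):
    print(f"{first!r} vs {second!r}: {labels[output]}")
```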

## Summarizer

In [None]:
! python -m deeppavlov install bert_as_summarizer

2020-01-05 17:56:00.910 INFO in 'deeppavlov.core.common.file'['file'] at line 30: Interpreting 'bert_as_summarizer' as '/usr/local/lib/python3.6/dist-packages/deeppavlov/configs/summarization/bert_as_summarizer.json'
Collecting git+https://github.com/deepmipt/bert.git@feat/multi_gpu
  Cloning https://github.com/deepmipt/bert.git (to revision feat/multi_gpu) to /tmp/pip-req-build-2f7kestp
  Running command git clone -q https://github.com/deepmipt/bert.git /tmp/pip-req-build-2f7kestp
  Running command git checkout -b feat/multi_gpu --track origin/feat/multi_gpu
  Switched to a new branch 'feat/multi_gpu'
  Branch 'feat/multi_gpu' set up to track remote branch 'feat/multi_gpu' from 'origin'.
Building wheels for collected packages: bert-dp
  Building wheel for bert-dp (setup.py) ... done
  Created wheel for bert-dp: filename=bert_dp-1.0-cp36-none-any.whl size=23580 s

In [None]:
from deeppavlov import build_model, configs

model = build_model(configs.summarization.bert_as_summarizer, download=True) # download=True if model is not downloaded yet

2020-01-05 17:56:09.577 INFO in 'deeppavlov.download'['download'] at line 117: Skipped http://files.deeppavlov.ai/deeppavlov_data/bert/rubert_cased_L-12_H-768_A-12_v2.tar.gz download because of matching hashes
2020-01-05 17:56:16.125 INFO in 'deeppavlov.models.bert.bert_as_summarizer'['bert_as_summarizer'] at line 102: [initializing model with Bert from /root/.deeppavlov/downloads/bert_models/rubert_cased_L-12_H-768_A-12_v2/bert_model.ckpt]


In [None]:
model(["The U.S. is ready to engage in talks about North Korea’s nuclear program even as it maintains pressure on Kim Jong Un’s regime, \
the Washington Post reported, citing an interview with Vice President Mike Pence. Pence and South Korea’s President Moon Jae-in agreed on \
a post-Olympics strategy during conversations at the Winter Olympics in the South Korean resort of Pyeongchang that Pence dubbed “maximum pressure \
and engagement at the same time.” Pence spoke in an interview on his way home from the Winter Olympics. “The point is, no pressure comes off until they \
are actually doing something that the alliance believes represents a meaningful step toward denuclearization,” the Post quoted Pence as saying. \
“So the maximum pressure campaign is going to continue and intensify. But if you want to talk, we’ll talk.”"])

[['The U.S. is ready to engage in talks about North Korea’s nuclear program even as it maintains pressure on Kim Jong Un’s regime, the Washington Post reported, citing an interview with Vice President Mike Pence.',
  '“The point is, no pressure comes off until they are actually doing something that the alliance believes represents a meaningful step toward denuclearization,” the Post quoted Pence as saying.']]

## Open Domain QA

In [None]:
!python -m deeppavlov install en_odqa_infer_wiki

2020-01-05 18:48:34.790 INFO in 'deeppavlov.core.common.file'['file'] at line 30: Interpreting 'en_odqa_infer_wiki' as '/usr/local/lib/python3.6/dist-packages/deeppavlov/configs/odqa/en_odqa_infer_wiki.json'


Training on your own data (the TF-IDF document ranker first, then the reader):

In [None]:
from deeppavlov import configs
from deeppavlov.core.commands.train import train_evaluate_model_from_config

train_evaluate_model_from_config(configs.doc_retrieval.en_ranker_tfidf_wiki, download=True)
train_evaluate_model_from_config(configs.squad.multi_squad_noans, download=True)

Build the model from the components trained above:

In [None]:
from deeppavlov import configs
from deeppavlov.core.commands.infer import build_model

odqa = build_model(configs.odqa.en_odqa_infer_wiki, load_trained=True)

Build the model from the available pretrained weights:

In [None]:
from deeppavlov import configs
from deeppavlov.core.commands.infer import build_model
odqa = build_model(configs.odqa.en_odqa_infer_wiki, download=True)

2020-01-05 19:23:30.608 INFO in 'deeppavlov.core.data.utils'['utils'] at line 80: Downloading from http://files.deeppavlov.ai/deeppavlov_data/multi_squad_model_noans_1.1.tar.gz to /root/.deeppavlov/multi_squad_model_noans_1.1.tar.gz
100%|██████████| 265M/265M [01:48<00:00, 2.44MB/s]
2020-01-05 19:25:19.82 INFO in 'deeppavlov.core.data.utils'['utils'] at line 237: Extracting /root/.deeppavlov/multi_squad_model_noans_1.1.tar.gz archive into /root/.deeppavlov/models
2020-01-05 19:25:26.175 INFO in 'deeppavlov.core.data.utils'['utils'] at line 80: Downloading from http://files.deeppavlov.ai/datasets/wikipedia/enwiki.tar.gz to /root/.deeppavlov/enwiki.tar.gz
100%|██████████| 4.81G/4.81G [20:40<00:00, 3.88MB/s]
2020-01-05 19:46:06.726 INFO in 'deeppavlov.core.data.utils'['utils'] at line 237: Extracting /root/.deeppavlov/enwiki.tar.gz archive into /root/.deeppavlov/downloads
2020-01-05 19:49:45.870 INFO in 'deeppavlov.core.data.utils'['utils'] at line 80: Downloading from http://files.deeppa








Using TensorFlow backend.







2020-01-05 20:25:36.164 INFO in 'deeppavlov.core.layers.tf_layers'['tf_layers'] at line 615: 




2020-01-05 20:25:51.395 INFO in 'deeppavlov.core.models.tf_model'['tf_model'] at line 51: [loading model from /root/.deeppavlov/models/multi_squad_model_noans/model]



INFO:tensorflow:Restoring parameters from /root/.deeppavlov/models/multi_squad_model_noans/model




['Argentina', '15-16 August 1952', '14 July 1789']

In [None]:
predicted_answers = odqa(["Where did guinea pigs originate?", "When did the Lynmouth floods happen?", "What is the name of Darth Vader's son?"])
real_answers = ["Argentina", "15-16 August 1952", "Luke Skywalker"]



In [None]:
# Print each prediction next to its reference answer
for predicted, real in zip(predicted_answers, real_answers):
    print(f"Predicted - {predicted} \nReal- {real}\n\n")

Predicted - Andes of South America 
Real- Argentina


Predicted - 1804 
Real- 15-16 August 1952


Predicted - Fox Mulder 
Real- Luke Skywalker
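
Comparing answers by eye does not scale, so a common approach is a SQuAD-style exact-match score: normalize both strings, then count identical pairs. The sketch below assumes a simple normalization (lowercasing, stripping punctuation and English articles) and uses the first two prediction/reference pairs from the output above. The helper names are hypothetical.

```python
import re
import string


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(predicted, references) -> float:
    """Fraction of predictions equal to their reference after normalization."""
    hits = sum(normalize(p) == normalize(r) for p, r in zip(predicted, references))
    return hits / len(references)


# First two pairs copied from the output above.
predicted = ["Andes of South America", "1804"]
references = ["Argentina", "15-16 August 1952"]

print(exact_match(predicted, references))  # 0.0
```

Exact match is strict: a prediction such as "Andes of South America" scores zero against "Argentina" even though it is a plausible answer, so a low score here calls for a manual look at the individual pairs.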




## Ranking database answers for a given question


In [None]:
!python -m deeppavlov install ranking_insurance

2020-01-05 12:49:37.414 INFO in 'deeppavlov.core.common.file'['file'] at line 30: Interpreting 'ranking_insurance' as '/usr/local/lib/python3.6/dist-packages/deeppavlov/configs/ranking/ranking_insurance.json'
Collecting pybind11==2.2.3
  Downloading https://files.pythonhosted.org/packages/12/90/0f92a575dc60c8fba6d0c91d6b45abdb1058da9ebed40400cbcfad2ac0a7/pybind11-2.2.3-py2.py3-none-any.whl (144kB)
Installing collected packages: pybind11
Successfully installed pybind11-2.2.3
Collecting fastText==0.8.22
  Cloning https://github.com/deepmipt/fastText.git to /tmp/pip-install-yw6j_u_0/fastText
  Running command git clone -q https://github.com/deepmipt/fastText.git /tmp/pip-install-yw6j_u_0/fastText
Building wheels for collected packages: fastText
  Building wheel for fastText (setup.py) ... done
  C

In [None]:
from deeppavlov import build_model, configs

rank_model = build_model(configs.ranking.ranking_insurance_interact, download=True)

2020-01-05 12:50:27.386 INFO in 'deeppavlov.core.data.utils'['utils'] at line 80: Downloading from http://files.deeppavlov.ai/datasets/insuranceQA-master.zip to /root/.deeppavlov/downloads/insuranceQA-master.zip
100%|██████████| 277M/277M [01:39<00:00, 2.79MB/s]
2020-01-05 12:52:06.609 INFO in 'deeppavlov.core.data.utils'['utils'] at line 237: Extracting /root/.deeppavlov/downloads/insuranceQA-master.zip archive into /root/.deeppavlov/downloads/insurance_data
2020-01-05 12:52:09.217 INFO in 'deeppavlov.core.data.utils'['utils'] at line 80: Downloading from http://files.deeppavlov.ai/deeppavlov_data/insurance_ranking.tar.gz to /root/.deeppavlov/insurance_ranking.tar.gz
100%|██████████| 8.49M/8.49M [00:04<00:00, 1.79MB/s]
2020-01-05 12:52:13.963 INFO in 'deeppavlov.core.data.utils'['utils'] at line 237: Extracting /root/.deeppavlov/insurance_ranking.tar.gz archive into /root/.deeppavlov/models
2020-01-05 12:52:15.297 INFO in 'deeppavlov.core.data.utils'['utils'] at line 80: Downloading f

In [None]:
rank_model(['how much to pay for auto insurance?'])

[['there be several carrier who will offer an 18 year old in Minnesota car insurance coverage what be probably important to you be who will give you the good coverage at the low price get this you shall seek out the help of an agent who be license sell auto insurance in your state who can show you various auto insurance quote',
  'cheap car insurance for young driver be available if you know where look many reputable insurer offer low down payment and cheap rate shopping for these good offer be best handle an experienced auto insurance broker with preferably 20 year or more experience not only do we help our client find the good deal but our website offer free information and advice that can often help reduce premium there be no fee or charge and take care of client be the top priority broker do not charge fee and we do the shopping so our customer save that be not a bad deal try us and let us find cheap auto insurance price for teenager',
  'you can get free auto insurance quote a num