<a href="https://colab.research.google.com/github/srmykola/qa-bert/blob/master/public_question_answering_with_bert_and_ktrain.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Build an Open-Domain Question-Answering System With BERT and `ktrain`

We first install `ktrain` and load a dataset into a Python list. We use the [20 Newsgroups dataset](https://scikit-learn.org/0.19/datasets/twenty_newsgroups.html) in this example.

In [1]:
!pip3 install -q ktrain


In [2]:
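# Load the 20 Newsgroups dataset into a Python list
from sklearn.datasets import fetch_20newsgroups
# Strip headers, footers, and quoted replies so only the message bodies remain
remove = ('headers', 'footers', 'quotes')
newsgroups_train = fetch_20newsgroups(subset='train', remove=remove)
newsgroups_test = fetch_20newsgroups(subset='test', remove=remove)
# Combine the train and test splits into a single list of document strings
docs = newsgroups_train.data + newsgroups_test.data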

Downloading 20news dataset. This may take a few minutes.
Downloading dataset from https://ndownloader.figshare.com/files/5975967 (14 MB)


Next, we will import `ktrain` modules and set the location of the search index.

In [0]:
import ktrain
from ktrain import text

In [0]:
INDEXDIR = '/tmp/myindex'

### STEP 1: Create a Search Index

In [5]:
text.SimpleQA.initialize_index(INDEXDIR)
text.SimpleQA.index_from_list(docs, INDEXDIR, commit_every=len(docs))
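
To give a feel for what a search index does in this step, here is a minimal sketch of a toy inverted index in plain Python. It is an illustration only and not how `ktrain` builds its index; the names `build_index`, `search`, and `sample_docs` are hypothetical, and the three sample strings are made up for the example.

```python
from collections import defaultdict

def build_index(documents):
    """Map each lowercase token to the set of list positions of documents containing it."""
    index = defaultdict(set)
    for doc_id, doc_text in enumerate(documents):
        for token in doc_text.lower().split():
            index[token].add(doc_id)
    return index

def search(index, query):
    """Return the sorted positions of documents that contain every query token."""
    tokens = query.lower().split()
    if not tokens:
        return []
    matches = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        matches &= index.get(token, set())
    return sorted(matches)

# Made-up sample documents (not taken from the dataset)
sample_docs = [
    "gamma correction makes images look right",
    "nasa selects astronaut candidates as needed",
    "linear images look too dark without gamma correction",
]
toy_index = build_index(sample_docs)
print(search(toy_index, "gamma correction"))  # [0, 2]
```

A real index also handles tokenization, ranking, and persistence to disk, which is why the tutorial delegates this step to the library.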

### STEP 2: Create a QA Instance

In [6]:
qa = text.SimpleQA(INDEXDIR)


### Ask Questions!

##### Space Question

In [10]:
answers = qa.ask('Are there astronauts in space now?')
qa.display_answers(answers[:5])

| | Candidate Answer | Context | Confidence | Document Reference |
|---|---|---|---|---|
| 0 | and the other nations have so few astronauts | we will assume you mean a nasa astronaut, since it ' s probably impossible for a non russian to get into the cosmonaut corps (paying passengers are not professional cosmonauts), and the other nations have so few astronauts (and fly even fewer) that you ' re better off hoping to win a lottery. | 0.541088 | 545 |
| 1 | nasa is now accepting on a continuous basis and plans to select astronaut candidates as needed. | nasa is now accepting on a continuous basis and plans to select astronaut candidates as needed. | 0.283745 | 545 |
| 2 | their astronauts travelled during the night | their astronauts travelled during the night ... | 0.069213 | 1571 |
| 3 | when nasa decides to select additional astronaut candidates | when nasa decides to select additional astronaut candidates , consideration will be given only to those applications on hand on the date of decision is made. | 0.020838 | 545 |
| 4 | there is now a museum with a space shop | there is now a museum with a space shop . | 0.017493 | 4282 |


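As shown above, none of the candidates actually answers this question. The top candidate, **and the other nations have so few astronauts**, is only topically related. The system extracts answer spans from the indexed documents, so it can only answer questions whose answers appear in the corpus.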
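
Because the top candidate is not always right, it can help to discard low-confidence candidates before showing them. The sketch below does this over a plain list of dictionaries. The rows are copied from the table above, but the keys `answer`, `confidence`, and `reference` are assumptions that mirror the table columns rather than a confirmed return format of `qa.ask`, and the 0.25 threshold and the helper name `filter_by_confidence` are arbitrary choices for illustration.

```python
def filter_by_confidence(candidates, min_confidence=0.25):
    """Keep only candidates whose confidence is at least min_confidence."""
    return [c for c in candidates if c["confidence"] >= min_confidence]

# Rows copied from the table above; the dict keys are illustrative assumptions.
sample_answers = [
    {"answer": "and the other nations have so few astronauts",
     "confidence": 0.541088, "reference": 545},
    {"answer": "nasa is now accepting on a continuous basis and plans "
               "to select astronaut candidates as needed.",
     "confidence": 0.283745, "reference": 545},
    {"answer": "their astronauts travelled during the night",
     "confidence": 0.069213, "reference": 1571},
    {"answer": "when nasa decides to select additional astronaut candidates",
     "confidence": 0.020838, "reference": 545},
    {"answer": "there is now a museum with a space shop",
     "confidence": 0.017493, "reference": 4282},
]

kept = filter_by_confidence(sample_answers)
for candidate in kept:
    print(f'{candidate["confidence"]:.3f}  {candidate["answer"]}')
```

A threshold removes weak candidates but cannot confirm that a surviving candidate is correct, as this question illustrates: the top candidate passes the filter yet does not answer the question.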

##### Technical Support Question

In [0]:
answers = qa.ask('What causes computer images to be too dark?')
qa.display_answers(answers[:5])

| | Candidate Answer | Context | Confidence | Document Reference |
|---|---|---|---|---|
| 0 | that not all display programs do gamma correction | the problem is that not all display programs do gamma correction . | 0.848455 | 13873 |
| 1 | if your viewer does not do gamma correction | if your viewer does not do gamma correction , then linear images will look too dark, and gamma corrected images will ok. | 0.042678 | 13873 |
| 2 | altering the intensity in the hsv controls | altering the intensity in the hsv controls does not do the right thing, as it fails to take account of the effect gamma has on h and s. | 0.040854 | 13873 |
| 3 | is gamma correction | this, is gamma correction (or the lack of it). | 0.019406 | 13873 |
| 4 | if your viewer does not do gamma correction | if your viewer does not do gamma correction , then left hand ramp will have a long dark part and a short white part, and the point of equal brightness will be above the center. | 0.013617 | 13873 |


It looks like a **lack of gamma correction** is a cause of this technical problem.

##### Religious Question

In [0]:
answers = qa.ask('Who was Jesus Christ?')
qa.display_answers(answers[:5])

| | Candidate Answer | Context | Confidence | Document Reference |
|---|---|---|---|---|
| 0 | is god incarnate | jesus isn ' t god ? when jesus returns some people may miss him ? what version of the bible do you read mike ? jesus is god incarnate (in flesh). | 0.569717 | 6356 |
| 1 | the incarnation of the son | jesus is the incarnation of the son . | 0.32892 | 11661 |
| 2 | is god ' s son | ) you seem to be suggesting the jesus is god ' s son in a physical sense, with the holy spirit as father and mary as mother. | 0.069267 | 11661 |
| 3 | was god ' s only begotten son | the fact that jesus was god ' s only begotten son does not seem to me to have much meaning since god can beget as many sons as he wants to. | 0.016456 | 11661 |
| 4 | jesus god only of the jews | which is more important : 1) the recorded word of jesus or 2) indications that you can deduce from the bible ? was jesus god only of the jews , or god of all humankind of all race and sex ? | 0.005702 | 7842 |


Here, we see different views on Jesus Christ buried within this dataset.
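
To see how these views are spread across postings, the candidates can be grouped by their Document Reference value. This is a minimal sketch over rows copied from the table above; the keys `answer` and `reference` and the helper name `group_by_reference` are assumptions for illustration, not a confirmed part of the `ktrain` API.

```python
from collections import defaultdict

def group_by_reference(candidates):
    """Group candidate answers by the document reference they came from."""
    grouped = defaultdict(list)
    for candidate in candidates:
        grouped[candidate["reference"]].append(candidate["answer"])
    return dict(grouped)

# Rows copied from the table above; the dict keys are illustrative assumptions.
jesus_answers = [
    {"answer": "is god incarnate", "reference": 6356},
    {"answer": "the incarnation of the son", "reference": 11661},
    {"answer": "is god ' s son", "reference": 11661},
    {"answer": "was god ' s only begotten son", "reference": 11661},
    {"answer": "jesus god only of the jews", "reference": 7842},
]

by_reference = group_by_reference(jesus_answers)
for reference, answers_for_doc in by_reference.items():
    print(reference, len(answers_for_doc))
```

Here three of the five candidates share one reference. If the reference is the position of the posting in `docs` (an assumption worth verifying in your own run), then `docs[reference]` would show the full posting for closer reading.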