# Teach a machine to understand human language
##### WeAreDevelopers World Congress 2019

## Part 3 - Machine Reading Comprehension of text snippets

**Goal**: We want to answer questions by giving the machine a text context that contains the answer.

**Step 1**: Find a matching text document in our database<br>
**Step 2**: Use Reading Comprehension models to derive a good answer
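Together, the two steps form a retrieve-then-read pipeline. The sketch below wires them together in plain Python to show the control flow. `answer_question`, `retrieve` and `read` are hypothetical names (in this notebook, `retrieve` roughly corresponds to `search_whoosh` and `read` to the AllenNLP predictor), and `read` is assumed to return an `(answer, score)` pair so that answers from several retrieved documents can be compared.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical interfaces (not part of Whoosh or AllenNLP):
#   retrieve(question, n) -> list of candidate passages
#   read(passage, question) -> (answer_text, confidence_score)
Retriever = Callable[[str, int], List[str]]
Reader = Callable[[str, str], Tuple[str, float]]


def answer_question(question: str, retrieve: Retriever, read: Reader,
                    n_docs: int = 3) -> Optional[str]:
    """Step 1: fetch candidate passages. Step 2: read each one, keep the best answer."""
    best_answer, best_score = None, float("-inf")
    for passage in retrieve(question, n_docs):
        answer, score = read(passage, question)
        if score > best_score:
            best_answer, best_score = answer, score
    return best_answer


# Toy stand-ins, only to exercise the control flow
docs = [
    "The Matrix is a 1999 science fiction action film.",
    "Beyonce released her debut solo album in 2003.",
]

def toy_retrieve(question: str, n: int) -> List[str]:
    # Rank passages by how many lowercase words they share with the question
    q = set(question.lower().split())
    return sorted(docs, key=lambda d: -len(q & set(d.lower().split())))[:n]

def toy_read(passage: str, question: str) -> Tuple[str, float]:
    # Pretend the whole passage is the answer; score it by word overlap
    q = set(question.lower().split())
    return passage, float(len(q & set(passage.lower().split())))

print(answer_question("When was The Matrix released?", toy_retrieve, toy_read))
# The Matrix is a 1999 science fiction action film.
```

Keeping retriever and reader as separate callables means either step can be swapped (for example a different search backend) without touching the other.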

### Find a matching text document in our database

```python
import json
import os
```

```python
# Load the SQuAD 2.0 training set; its paragraphs serve as our document collection
with open('/home/paul/Documents/wwc-demo/train-v2.0.json') as squad2_file:
    squad2_data = json.load(squad2_file)
```
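`train-v2.0.json` follows the SQuAD 2.0 layout: a list of articles under `data`, each with a `title` and a list of `paragraphs`, and each paragraph holds a `context` plus its questions in `qas`. The indexing loop further down walks exactly this nesting. The tiny inline sample below is made up for illustration; `iter_contexts` and `count_questions` are hypothetical helpers.

```python
import json

# A tiny, made-up example in the SQuAD 2.0 layout (the real train-v2.0.json is much larger)
sample = json.loads("""
{
  "version": "v2.0",
  "data": [
    {
      "title": "The_Matrix",
      "paragraphs": [
        {
          "context": "The Matrix is a 1999 science fiction action film.",
          "qas": [
            {"id": "q1", "question": "When was The Matrix released?",
             "answers": [{"text": "1999", "answer_start": 16}],
             "is_impossible": false},
            {"id": "q2", "question": "Who composed the score of The Matrix?",
             "answers": [],
             "is_impossible": true}
          ]
        }
      ]
    }
  ]
}
""")


def iter_contexts(squad):
    """Yield (article title, paragraph text) pairs: the unit we index for search."""
    for article in squad["data"]:
        for paragraph in article["paragraphs"]:
            yield article["title"], paragraph["context"]


def count_questions(squad):
    """Return (total questions, questions flagged as unanswerable via is_impossible)."""
    total = unanswerable = 0
    for article in squad["data"]:
        for paragraph in article["paragraphs"]:
            for qa in paragraph["qas"]:
                total += 1
                unanswerable += int(qa.get("is_impossible", False))
    return total, unanswerable


print(list(iter_contexts(sample)))  # [('The_Matrix', 'The Matrix is a 1999 science fiction action film.')]
print(count_questions(sample))      # (2, 1)
```

`answer_start` is a character offset into `context`, which is how SQuAD marks where the gold answer sits in the paragraph.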

```python
from whoosh import index
from whoosh.fields import TEXT, Schema
from whoosh.analysis import RegexTokenizer, StopFilter, CharsetFilter, LowercaseFilter
```

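```python
from whoosh.support.charset import accent_map

# Index-time analyzer: split text into word tokens, fold accents (é -> e),
# lowercase, then drop short tokens and Whoosh's default English stop words.
# Note: despite the name, no stemming filter is applied.
stem_ana = RegexTokenizer() | CharsetFilter(accent_map) | LowercaseFilter() | StopFilter()
schema = Schema(
    topic=TEXT(stored=True, analyzer=stem_ana),
    content=TEXT(stored=True, analyzer=stem_ana)
)
if not os.path.exists("indexdir"):
    os.mkdir("indexdir")

# Create a new index in ./indexdir using this schema
ix = index.create_in("indexdir", schema)
```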
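Each `|` in the analyzer chain feeds the token stream of one stage into the next. Below is a rough pure-Python approximation of what `stem_ana` does to a piece of text, for illustration only: the regex is assumed to mirror `RegexTokenizer`'s default pattern, `unicodedata` folding stands in for `accent_map`, and the tiny `STOP_WORDS` set is a placeholder for Whoosh's much larger built-in list.

```python
import re
import unicodedata

# Assumed equivalent of Whoosh's default RegexTokenizer pattern
TOKEN_RE = re.compile(r"\w+(\.?\w+)*")
# Tiny placeholder for Whoosh's built-in English stop-word list
STOP_WORDS = {"a", "an", "and", "at", "in", "is", "of", "on", "the", "to"}


def fold_accents(token: str) -> str:
    """Approximate CharsetFilter(accent_map): strip combining accents, e.g. 'é' -> 'e'."""
    decomposed = unicodedata.normalize("NFKD", token)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def analyze(text: str, minsize: int = 2):
    """Tokenize -> fold accents -> lowercase -> drop stop words and tokens shorter than minsize."""
    tokens = (m.group(0) for m in TOKEN_RE.finditer(text))
    tokens = (fold_accents(t) for t in tokens)
    tokens = (t.lower() for t in tokens)
    return [t for t in tokens if t not in STOP_WORDS and len(t) >= minsize]


print(analyze("On January 7, 2012, Beyoncé gave birth to a daughter"))
# ['january', '2012', 'beyonce', 'gave', 'birth', 'daughter']
```

Because the same analyzer runs on documents at index time and on queries at search time, a query for "beyonce" still matches text that spells it "Beyoncé".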

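```python
from whoosh import writing

# Remove any documents left over from a previous run:
# committing with mergetype=CLEAR discards all existing segments.
with ix.writer() as mywriter:
    mywriter.mergetype = writing.CLEAR
```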

```python
# Reopen the index and get a writer for bulk indexing
ix = index.open_dir("indexdir")
writer = ix.writer()
```

```python
# Add one search document per SQuAD paragraph, tagged with its article title
for topic in squad2_data['data']:
    topic_name = topic['title']
    for paragraph in topic['paragraphs']:
        doc = paragraph['context']
        writer.add_document(topic=topic_name, content=doc)
writer.commit()
```
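Conceptually, `add_document` fills an inverted index: for every term, the set of documents that contain it (Whoosh additionally stores term frequencies and, for `TEXT` fields, positions). A minimal in-memory sketch of that idea, assuming documents are already analyzed into token lists; `build_inverted_index` and `or_query` are hypothetical helpers, not Whoosh code.

```python
from collections import defaultdict
from typing import Dict, List, Set


def build_inverted_index(docs: List[List[str]]) -> Dict[str, Set[int]]:
    """Map each term to the set of document ids (list positions) that contain it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for doc_id, tokens in enumerate(docs):
        for token in tokens:
            index[token].add(doc_id)
    return dict(index)


def or_query(index: Dict[str, Set[int]], terms: List[str]) -> Set[int]:
    """Documents containing ANY of the terms, similar in spirit to Whoosh's OrGroup."""
    matches: Set[int] = set()
    for term in terms:
        matches |= index.get(term, set())
    return matches


docs = [
    ["beyonce", "gave", "birth", "daughter"],
    ["matrix", "science", "fiction", "film"],
    ["beyonce", "debut", "album"],
]
inv = build_inverted_index(docs)
print(inv["beyonce"])                     # {0, 2}
print(or_query(inv, ["old", "beyonce"]))  # {0, 2}
```

Looking up a term is a single dictionary access, which is why search stays fast even with one document per SQuAD paragraph.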

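```python
from whoosh.analysis import RegexTokenizer, StopFilter, CharsetFilter, LowercaseFilter
from whoosh.qparser import MultifieldParser, OrGroup
from whoosh.support.charset import accent_map

# Stop-word list taken from NLTK, stored one word per line
with open('/home/paul/Documents/wwc-demo/nltk_stopwords.txt') as sw_file:
    sw = [s.strip() for s in sw_file.readlines()]

def search_whoosh(question, n_results):
    """Return the stored fields of the top `n_results` documents matching `question`."""
    # Drop stop words ("how", "is", ...) and fold accents before building the query
    ana = RegexTokenizer() | LowercaseFilter() | StopFilter(stoplist=sw) | CharsetFilter(accent_map)
    question = ' '.join([t.text for t in ana(question, mode="index")])
    with ix.searcher() as searcher:
        # Search title and paragraph text; OrGroup matches documents containing ANY term
        parsed_query = MultifieldParser(
            ["topic", "content"], schema=ix.schema, group=OrGroup
        ).parse(question)
        hits = searcher.search(parsed_query, limit=n_results)
        # Copy the stored fields out while the searcher is still open
        results = [dict(hit.items()) for hit in hits]
    return results
```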
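`searcher.search` does more than collect matching documents: it ranks them, and Whoosh's default scorer is BM25F. The sketch below implements plain Okapi BM25 over pre-tokenized documents to show how term frequency, document frequency and document length interact. It is not Whoosh's exact formula (BM25F also weights fields separately); `bm25_scores` is a hypothetical helper, and `k1=1.2`, `b=0.75` are common default values assumed here.

```python
import math
from collections import Counter
from typing import List


def bm25_scores(query: List[str], docs: List[List[str]],
                k1: float = 1.2, b: float = 0.75) -> List[float]:
    """Okapi BM25 score of every document for a bag-of-words (OR-style) query."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents containing each term
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if tf[term] == 0:
                continue  # OR semantics: a missing term simply adds nothing
            # Smoothed IDF variant that stays positive: rare terms weigh more
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Length normalization: long documents need more matches for the same score
            norm = k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + norm)
        scores.append(score)
    return scores


docs = [
    ["beyonce", "gave", "birth", "daughter", "blue", "ivy"],
    ["matrix", "science", "fiction", "film"],
    ["beyonce", "singer", "songwriter", "beyonce", "album"],
]
scores = bm25_scores(["old", "beyonce"], docs)
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
print(ranking)  # [2, 0, 1]
```

This also explains the search result below: after stop-word removal the query is just "old beyonce", so any paragraph that mentions Beyoncé can rank first even if it says nothing about her age.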

```python
search_whoosh('How old is beyonce', 1)
```

```
[{'content': 'On January 7, 2012, Beyoncé gave birth to a daughter, Blue Ivy Carter, at Lenox Hill Hospital in New York under heavy security. Two days later, Jay Z released "Glory", a song dedicated to their child, on his website Lifeandtimes.com. The song detailed the couple\'s pregnancy struggles, including a miscarriage Beyoncé suffered before becoming pregnant with Blue Ivy. Blue Ivy\'s cries are included at the end of the song, and she was officially credited as "B.I.C." on it. At two days old, she became the youngest person ever to appear on a Billboard chart when "Glory" debuted on the Hot R&B/Hip-Hop Songs chart.',
  'topic': 'Beyoncé'}]
```

### Use Reading Comprehension models to derive a good answer

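```python
import spacy

# AllenNLP's default tokenizer is built on spaCy, so the small English model must be installed
spacy.load('en_core_web_sm')
```

```
<spacy.lang.en.English at 0x7f3f410baf60>
```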

```python
from allennlp.predictors.predictor import Predictor

# Pretrained BiDAF (Bidirectional Attention Flow) reading-comprehension model
# trained on SQuAD, downloaded from AllenNLP's model archive
predictor = Predictor.from_path("https://s3-us-west-2.amazonaws.com/allennlp/models/bidaf-model-2017.09.15-charpad.tar.gz")

passage = "The Matrix is a 1999 science fiction action film written and directed by The Wachowskis, starring Keanu Reeves, Laurence Fishburne, Carrie-Anne Moss, Hugo Weaving, and Joe Pantoliano."
question = "Who stars in The Matrix?"
result = predictor.predict(
    passage=passage,
    question=question
)
# The answer is a span copied verbatim from the passage
result['best_span_str']
```

```
'Keanu Reeves, Laurence Fishburne, Carrie-Anne Moss, Hugo Weaving, and Joe Pantoliano'
```
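BiDAF does not generate text: it scores every passage token as a possible answer start and as a possible answer end, and `best_span_str` is the passage slice between the best start/end pair with start ≤ end. Below is a pure-Python sketch of that decoding step using made-up scores. `best_span` is an illustrative name; as far as we know AllenNLP applies the same "maximize start score + end score" idea, but on tensors.

```python
from typing import List, Tuple


def best_span(start_logits: List[float], end_logits: List[float]) -> Tuple[int, int]:
    """Return (start, end) token indices maximizing start + end score, with start <= end."""
    best, best_score = (0, 0), float("-inf")
    best_start = 0  # index of the highest start score seen so far (always <= current end)
    for j in range(len(end_logits)):
        if start_logits[j] > start_logits[best_start]:
            best_start = j
        score = start_logits[best_start] + end_logits[j]
        if score > best_score:
            best, best_score = (best_start, j), score
    return best


# Made-up per-token scores for a short passage
tokens = "The Matrix stars Keanu Reeves and Laurence Fishburne .".split()
start = [0.1, 0.2, 0.1, 3.0, 0.5, 0.1, 1.0, 0.2, 0.0]
end = [0.0, 0.3, 0.1, 0.4, 1.5, 0.2, 0.3, 2.5, 0.1]
i, j = best_span(start, end)
print(" ".join(tokens[i:j + 1]))  # Keanu Reeves and Laurence Fishburne
```

Adding logits corresponds to multiplying start and end probabilities, and tracking the best start seen so far keeps the search linear instead of checking every (start, end) pair.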

Let's test the model on some real-world sample data: a Google Pixel help page about SIM cards.

```python
text = """
To connect your phone to a mobile network, you'll need an active SIM. Without one, you’ll see a “No SIM card” message.
Note: Some of these steps work only on Android 9 and up. Learn how to check your Android version.

Tap here to see an interactive tutorial
Get a SIM card

All Pixel phones can use nano SIM cards. Some Pixel phones can also use eSIM.

If you buy a Pixel 3 or Pixel 2 phone on the Google Store:

    In the U.S., you can pick no SIM card or a pre-inserted Verizon SIM card. If Verizon is your mobile carrier, activate your SIM card on their site (www.vzw.com/google-activate).
    In other countries, your phone comes with no SIM card.

If you need a SIM card

To get a nano SIM card, contact your mobile service provider. If you're asked for your phone's IMEI number, learn where to see your IMEI number.
If you have a SIM card

You can move your current phone's nano SIM card to your Pixel phone instead of getting a new one.
If you want to use eSIM

Some Pixel phones can use eSIM, depending on the device and mobile carrier. For details, check with your carrier. 

Carriers that work with eSIM on Pixel phones include: 

    Pixel 3a: Sprint, Telekom.de, or Google Fi
    Pixel 3a and 3 can't use eSIM if the phone was purchased in Japan, or if bought with Verizon or Charter service.
    Pixel 3: Sprint, Telekom.de, or Google Fi
    Pixel 2: Google Fi

Insert a SIM card

With your phone off:

    Into the small hole on the phone's left edge, insert the SIM ejection tool. Firmly but gently, push until the tray pops out.
    Note: On Pixel 3 (2018), the SIM card slot is on the phone's bottom edge.
    Remove the tray, and put the nano SIM card in the tray. Gently, push the tray back into its slot.

You could need to restart your phone to start getting mobile service. To restart a phone that's on, press the power button for about 3 seconds. Then tap Restart Restart.

Insert SIM card

Find your phone's IMEI number

You can find your phone's IMEI number:

    On your phone's box.
    On your phone's SIM card tray.
    In your phone's Settings app Settings app: Tap System and then About phone, then look for "IMEI."

Tip: Take a photo of your IMEI number or write it on your paper Quick Start Guide.
"""
```

```python
question = "How to change the SIM card in the Google Pixel"
result = predictor.predict(
    passage=text,
    question=question
)
result['best_span_str']
```

```
"Into the small hole on the phone's left edge"
```