<a href="https://colab.research.google.com/github/racheltlw/htx_qa_demo/blob/main/QA_Single_Demo_Colab.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Demo for a Question Answering System (Single Document)

This demo walks you through how to pull and use a simple question answering transformer model from the Hugging Face Hub.

#### 1. Setup

In [1]:
!pip install numpy
!pip install transformers

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting transformers
  Downloading transformers-4.21.1-py3-none-any.whl (4.7 MB)
Collecting tokenizers!=0.11.3,<0.13,>=0.11.1
  Downloading tokenizers-0.12.1-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (6.6 MB)
Collecting huggingface-hub<1.0,>=0.1.0
  Downloading huggingface_hub-0.9.0-py3-none-any.whl (120 kB)
Installing collected packages: tokenizers, huggingface-hub, transformers
Successfully installed huggingface-hub-0.9.0 tokenizers-0.12.1 transformers-4.21.1


Import the libraries:

In [2]:
from transformers import AutoTokenizer, AutoModelForQuestionAnswering, pipeline
import numpy as np

#### 2. Load the Model and Tokenizer from Hugging Face

In [3]:
transformer_name = 'deepset/roberta-base-squad2'

If needed, you can download the model and tokenizer and save them for local use (without internet access) by uncommenting the two cells below.

In [4]:
#tokenizer = AutoTokenizer.from_pretrained(transformer_name)
#model = AutoModelForQuestionAnswering.from_pretrained(transformer_name)

In [5]:
#model.save_pretrained('local_models/local-roberta-base-squad2/model')
#tokenizer.save_pretrained('local_models/local-roberta-base-squad2/tokenizer')
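
Once the two commented-out cells above have been run, the saved files can be loaded back without contacting the Hub. The following is a minimal sketch rather than part of the original demo: the helper names `local_paths` and `load_local_pipeline` are hypothetical, and the directory layout is assumed to match the paths used in the `save_pretrained` calls above.

```python
from pathlib import Path

# Assumed base directory, taken from the save_pretrained paths above.
LOCAL_DIR = Path("local_models/local-roberta-base-squad2")


def local_paths(base=LOCAL_DIR):
    """Return the (model, tokenizer) directories written by save_pretrained."""
    base = Path(base)
    return base / "model", base / "tokenizer"


def load_local_pipeline(base=LOCAL_DIR):
    """Build a question-answering pipeline from files saved on disk."""
    # Imported lazily so local_paths() works even where transformers is absent.
    from transformers import AutoModelForQuestionAnswering, AutoTokenizer, pipeline

    model_dir, tokenizer_dir = local_paths(base)
    if not (model_dir.is_dir() and tokenizer_dir.is_dir()):
        raise FileNotFoundError(
            f"Expected saved files under {base}; run the save_pretrained cells first."
        )
    # from_pretrained accepts a local directory in place of a Hub model name.
    model = AutoModelForQuestionAnswering.from_pretrained(str(model_dir))
    tokenizer = AutoTokenizer.from_pretrained(str(tokenizer_dir))
    return pipeline("question-answering", model=model, tokenizer=tokenizer)
```

Calling `load_local_pipeline()` should then give an object that can be used in place of the pipeline created in step 4.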

#### 3. Read in data

Upload the single_doc.txt file when prompted by the cell below.

In [6]:
from google.colab import files

uploaded = files.upload()

for fn in uploaded.keys():
  print('User uploaded file "{name}" with length {length} bytes'.format(
      name=fn, length=len(uploaded[fn])))

Saving single_doc.txt to single_doc.txt
User uploaded file "single_doc.txt" with length 1286 bytes


In [7]:
with open('single_doc.txt', encoding="utf8") as f:
    data = f.read().replace('\n', ' ')
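
Replacing each newline with a space can leave runs of two or more spaces wherever the file contains blank lines or trailing spaces. If that matters for your use case, one possible alternative is to collapse all whitespace. This is a sketch only; `normalize_whitespace` is a hypothetical helper, not part of the original notebook, and the character offsets returned by the model will then refer to the normalized text.

```python
def normalize_whitespace(text):
    """Collapse every run of whitespace (newlines, tabs, spaces) into one space."""
    # str.split() with no arguments splits on any whitespace and drops empty pieces.
    return " ".join(text.split())


# Small illustrative input with a blank line between two sentences.
raw = "on Thursday.\n\nPolice arrested the 56-year-old driver"
print(normalize_whitespace(raw))
# prints: on Thursday. Police arrested the 56-year-old driver
```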

In [8]:
print(data)

The driver of a tourist bus was left with serious head injuries after being beaten unconscious with a metal water bottle on a Hong Kong street on Thursday.  Police arrested the 56-year-old driver of a minibus on suspicion of assault in connection with the attack, which took place in Yue Man Square, Kwun Tong, soon after 7am.  His 42-year-old victim was taken to Queen Elizabeth Hospital in Yau Ma Tei, and was in serious condition in the intensive care unit, police said.  According to police, the driver of the tourist bus was standing in the road tidying up the vehicle’s luggage compartment when the minibus drove by and almost hit him.  During the resulting argument, a police source said the minibus driver was believed to have hit the victim several times in the head “using a metal water bottle”.  The minibus driver, who was arrested for assault, complained of feeling unwell after the incident and was taken to United Christian Hospital in Kwun Tong.  Earlier on Thursday, a motorist escap

#### 4. Create a pipeline and ask a question!

Note: If you need to reinstall PyTorch, run: pip3 install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu113

In [9]:
qa_pipeline = pipeline('question-answering', model=transformer_name, tokenizer=transformer_name)

In [10]:
QA_input = {
    'question': 'What is the weapon?',
    'context': data
}
result = qa_pipeline(QA_input)
result

{'score': 0.4121095538139343,
 'start': 102,
 'end': 120,
 'answer': 'metal water bottle'}
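
The `start` and `end` values are character offsets into the context string, so slicing the context with them recovers the answer. The sketch below illustrates this without loading the model: it reuses the first sentence of the document as the context and copies the result shown above. The helper `answer_with_context` and its `window` parameter are hypothetical additions for illustration.

```python
# First sentence of the document printed in step 3, used as a small context.
context = (
    "The driver of a tourist bus was left with serious head injuries "
    "after being beaten unconscious with a metal water bottle "
    "on a Hong Kong street on Thursday."
)

# Result copied from the pipeline output above.
result = {
    'score': 0.4121095538139343,
    'start': 102,
    'end': 120,
    'answer': 'metal water bottle',
}


def answer_with_context(context, result, window=30):
    """Return the answer plus up to `window` characters on either side."""
    left = max(0, result['start'] - window)
    right = min(len(context), result['end'] + window)
    return context[left:right]


# Slicing by the offsets reproduces the 'answer' field.
span = context[result['start']:result['end']]
print(span)  # prints: metal water bottle
print(answer_with_context(context, result))
```

Showing the surrounding text this way can help a reader judge whether the extracted span really answers the question.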

In [11]:
QA_input = {
    'question': 'Who was hurt?',
    'context': data
}
result = qa_pipeline(QA_input)
result

{'score': 0.35013526678085327,
 'start': 0,
 'end': 27,
 'answer': 'The driver of a tourist bus'}
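
Both answers above come with fairly modest scores (about 0.41 and 0.35), so an application may want to discard low-confidence answers. The following is a rough sketch of one way to do that: `MIN_SCORE` and `accept_answer` are hypothetical names, and the cut-off value is an arbitrary assumption that would need tuning on real data.

```python
# Arbitrary, illustrative cut-off; not a recommended value.
MIN_SCORE = 0.3

# Results copied from the two pipeline outputs above.
results = [
    {'score': 0.4121095538139343, 'start': 102, 'end': 120,
     'answer': 'metal water bottle'},
    {'score': 0.35013526678085327, 'start': 0, 'end': 27,
     'answer': 'The driver of a tourist bus'},
]


def accept_answer(result, min_score=MIN_SCORE):
    """Return the answer text if its score reaches min_score, else None."""
    return result['answer'] if result['score'] >= min_score else None


for r in results:
    print(accept_answer(r))
```

With the assumed cut-off of 0.3 both answers are kept; raising it to 0.4 would drop the second one.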