### Watson Discovery with PrimeQA Reader example
This notebook shows an example of a retriever-reader process that uses Watson Discovery to search a document collection and a PrimeQA Extractive Reader to find answers in the retrieved documents.
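Before the Watson-specific cells, the retriever-reader idea can be illustrated with a toy, dependency-free sketch. A retriever narrows a corpus to the top-k documents, and a reader then extracts an answer from only those documents. Both stages below are stand-ins that score text by keyword overlap. They are not how Watson Discovery ranks documents or how the PrimeQA reader extracts spans, and all function names are hypothetical.

```python
# Toy retriever-reader pipeline. Both stages are keyword-overlap stand-ins,
# NOT Watson Discovery or the PrimeQA ExtractiveReader.
import re
from typing import List, Tuple


def tokenize(text: str) -> set:
    """Lowercased alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def toy_retrieve(question: str, corpus: List[str], k: int) -> List[str]:
    """Retriever stand-in: rank documents by word overlap with the question."""
    q = tokenize(question)
    ranked = sorted(corpus, key=lambda doc: len(q & tokenize(doc)), reverse=True)
    return ranked[:k]


def toy_read(question: str, passages: List[str]) -> Tuple[str, int]:
    """Reader stand-in: return the sentence with the most question-word
    overlap, plus the index of the passage it came from."""
    q = tokenize(question)
    best, best_idx, best_score = "", -1, -1
    for idx, passage in enumerate(passages):
        for sentence in re.split(r"(?<=[.!?])\s+", passage):
            score = len(q & tokenize(sentence))
            if score > best_score:
                best, best_idx, best_score = sentence.strip(), idx, score
    return best, best_idx


def retrieve_then_read(question, corpus, k=5, retrieve=toy_retrieve, read=toy_read):
    """Stage 1 narrows the corpus to k documents; stage 2 extracts an answer."""
    passages = retrieve(question, corpus, k)
    return read(question, passages)


corpus = [
    "Collision coverage pays for damage to your car.",
    "Life insurance pays a death benefit.",
    "You can drop collision once the car is paid off. Check your loan terms.",
]
answer, idx = retrieve_then_read("When can I drop collision?", corpus, k=2)
print(idx, answer)  # prints: 0 You can drop collision once the car is paid off.
```

In the cells that follow, a Discovery query plays the role of `toy_retrieve` and `reader.predict` plays the role of `toy_read`.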

In [65]:
# Additional Dependencies - ibm_watson
! pip install ibm_watson


In [76]:
# Set these parameters to configure the Watson Discovery Search
endpoint="https://api.us-south.discovery.watson.cloud.ibm.com/instances/5c9b84e4-f2e1-4aff-854f-651771aaa464"
api_key="your-api-key"
project_id="8c50236a-d8db-4af4-b640-c40c2d3fe671"
collection_id="32fe88df-c5ce-4cd8-8b9b-5069c772b2ea"
index_name="32fe88df-c5ce-4cd8-8b9b-5069c772b2ea:passages"
max_num_documents=5
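The cell above uses the placeholder `api_key="your-api-key"`. One common alternative is to keep the key out of the notebook and read it from an environment variable. The sketch below assumes a hypothetical variable name, `WATSON_DISCOVERY_APIKEY`; use whatever name your environment defines.

```python
# Sketch: load the Discovery API key from an environment variable instead of
# hard-coding it. The default variable name is hypothetical.
import os


def load_api_key(var_name: str = "WATSON_DISCOVERY_APIKEY") -> str:
    """Return the value of `var_name`; raise if it is unset or empty."""
    key = os.environ.get(var_name)
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable to your Discovery API key")
    return key
```

With the variable exported, `api_key = load_api_key()` can replace the literal in the parameters cell.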

In [68]:
from ibm_watson import DiscoveryV2
from ibm_cloud_sdk_core.authenticators import (
    IAMAuthenticator,
    BearerTokenAuthenticator,
)
import pandas as pd
from IPython.display import display, HTML

In [69]:
# This example uses a collection of documents from the InsuranceLib corpus 
# Initialize Watson Discovery connection and obtain the collection id 
WDS = DiscoveryV2(version="2020-08-30", authenticator=IAMAuthenticator(apikey=api_key))
WDS.set_service_url(endpoint)
collections = WDS.list_collections(project_id=project_id).get_result()["collections"]
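for collection in collections:
    if collection['name'] == index_name:
        collection_id = collection['collection_id']
        break
else:
    # Runs only if no collection matched. Without this, the collection_id preset
    # in the parameters cell would be used silently instead of raising.
    raise RuntimeError(f"Index not found: {index_name}")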

print(f'collection_id: {collection_id}')


collection_id: 6c16e135-5086-950f-0000-017d2ea3a766


In [71]:
# Retrieve documents
question = "when can I drop collision on my auto policy ?"
hits = WDS.query(
        project_id=project_id,
        collection_ids=[collection_id],
        natural_language_query=question,
        count=max_num_documents).get_result()["results"]

print(f'Number of hits: {len(hits)}')

results = []
if hits:
    for i, hit in enumerate(hits):
        query_hits = {
            "rank": i,
            "score": hit['result_metadata']['confidence'],
            "doc_id": hit["document_id"] if "document_id" in hit else None,
            "text": hit["text"][0],
            "title": hit["title"] if "title" in hit else None
        }
        
        results.append(query_hits)

df = pd.DataFrame.from_records(results, columns=['rank','score','doc_id','title','text'])
print('======================================================================')
print(f'QUERY: {question}')
display( HTML(df.to_html()) )

                

Number of hits: 5
QUERY: when can I drop collision on my auto policy ?


index,rank,score,doc_id,title,text
0,0,0.13239,384517da-ffef-4a1a-814b-59a16e7306cb,6048,"the single best thing you can do is shop around with other auto insurance companies . each company is unique in that they have their own appetite for what they like and do n't like . you may fit right into some company 's `` sweet spot '' . but you will never know which companies can beat your current rates . as far as lowering premium on an active policy , you have to be careful . normally when drivers lower their coverages or cut them out altogether , they tend to make cuts in the wrong areas . it 's important to understand the difference between a controlled risk and an uncontrolled risk . a controlled risk is one where by lowering or dropping coverage , you know the exact dollar amount of the extra risk you are taking on . this includes raising your deductibles on comprehensive and collision coverage -lrb- or dropping it altogether -rrb- , dropping additional coverage such as rental and roadside assistance/towing . stay away from lowering uncontrolled risks ! namely liability coverage and uninsured motorist bodily injury coverage . this is where people get burned ! keep both of these as high as you can afford them ."
1,1,0.13107,234aff19-c002-48a9-baa2-4aea6fae6914,18434,i rarely recommend that any driver drop collision coverage . regardless of the premium there will be a sum recovered if the car is damaged in a collision and that might be extremely important to the customer . i do advise them that as the car loses value the amount they will recover drops as well and that for some older cars a minor collision would result in the car being declared a total .
2,2,0.12976,d686b2a5-12c6-4787-b00c-c672782b8e35,12610,dropping collision on your car depends on how much you wish to self insure . as long as you do not have a loan you can drop collision anytime . however one of the more common times to consider dropping collision is if your vehicle is worth $ 3500.00 or less . the reason for this thinking is this is the amount uninsured motorist property damage would pick up in the event you are hit by someone with no insurance . of course your still on your own if you cause the accident or if your involved in a hit and run type situation where you do n't know who caused the accident . you may want to look at what you actually are paying for collision coverage and what the vehicle is worth in a total loss since this is all your going to get .
3,3,0.09393,82c021bd-d613-4ce1-af4c-6e759a6a0f98,18084,"i understand it 's hard when you have a driver with , shall we say , a less than stellar driving record . most companies , however , will want your husband to be named on your auto insurance policy . over the years they 've realized that people drive the cars that they own whether they 're insured on a policy or not . it 's just the reality of the situation . however , that does n't mean that your rates have to be sky-high , as you put it . there are some pointers i can give you to help minimize the impact . your husband can get his own insurance separate from yours . if his record is bad enough that he has be placed with a non standard -lrb- high risk -rrb- company , you can still have a policy through a standard company . as long as he has insurance , he does n't have to be listed as a driver on your policy . whether you choose to have him get his own policy or list him on yours , list him on a car that does n't have physical damage coverage -lrb- comp and collision -rrb- . physical damage coverage on a non standard policy can cost big dollars , as much as 75 % of the total cost of the insurance . track his driving record . any violation will eventually drop off for rating purposes . most minor violations will drop off after two years , at fault accidents will usually drop off after three years , and major violations -lrb- dui etc. -rrb- will drop off after five years . as soon as he 's eligible to be insured through a standard carrier , list him . many people do n't realize this and continue to pay non standard rates even after they 're eligible . i 'll preface this last point by saying i do n't recommend it . some companies can exclude a driver . in so doing they 're not rated on the policy and they 're record does n't matter . but realize that there is no coverage if they get into an accident , even if they 're driving a car on the policy . once again , i do not recommend this . 
from time to time i see a driver that has a problem driving safely . they routinely get tickets and have accidents . some will even just stop carrying insurance because it 's so expensive but do n't seem to see the correlation between their driving and their insurance rates . if he will , get him into a safe driving program . if he wo n't , you 'll probably be dealing with this for a long time . good luck ."
4,4,0.09299,6ce4dec0-b137-4547-b817-62aa0ce75cd6,13033,based on the question i would assume that you are talking about collision coverage for you auto . collision coverage is typically paired with other than collision coverage in your auto policy and both have their own deductible levels . collision coverage typicallyprovides for payment of damage to your vehicle when you are involved in an accident even if you are deemed at fault . please read your policy completely to understand the coverage provided and any exclusions that ther may be or contact your local agent to have them go over the policy with you .
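The retrieval cell reads `hit["text"][0]` directly, so a hit without a `text` field would raise `KeyError`. The sketch below is a hypothetical helper that builds the same rows but skips such hits. It assumes each hit has the fields the cell above reads (`result_metadata.confidence`, optional `document_id` and `title`, and a `text` list), and it renumbers `rank` over the hits that are kept.

```python
# Sketch: turn Discovery query hits into rows for the reader and the table,
# skipping hits that carry no passage text. Field names follow the cell above.
from typing import Any, Dict, List


def hits_to_records(hits: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    records = []
    for hit in hits:
        texts = hit.get("text") or []
        if not texts:
            continue  # nothing for the reader to extract an answer from
        records.append({
            "rank": len(records),  # consecutive rank over kept hits
            "score": hit.get("result_metadata", {}).get("confidence"),
            "doc_id": hit.get("document_id"),
            "title": hit.get("title"),
            "text": texts[0],  # first text entry, as in the original cell
        })
    return records
```

Usage: `results = hits_to_records(hits)`, then build the DataFrame as before.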


In [72]:
# import the PrimeQA reader
from primeqa.components.reader.extractive import ExtractiveReader
import json

In [73]:
# Instantiate Reader
reader = ExtractiveReader()
reader.load()

loading configuration file config.json from cache at /u/bsiyer/.cache/huggingface/hub/models--PrimeQA--nq_tydi_sq1-reader-xlmr_large-20221110/snapshots/59c0ac1e8c43a3c7f6d5e26e4bf1e9c3c53b850c/config.json
Model config XLMRobertaConfig {
  "_name_or_path": "PrimeQA/nq_tydi_sq1-reader-xlmr_large-20221110",
  "architectures": [
    "XLMRobertaModelForDownstreamTasks"
  ],
  "attention_probs_dropout_prob": 0.1,
  "bos_token_id": 0,
  "classifier_dropout": null,
  "decoding_times_with_dropout": 5,
  "eos_token_id": 2,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 1024,
  "initializer_range": 0.02,
  "intermediate_size": 4096,
  "layer_norm_eps": 1e-05,
  "max_position_embeddings": 514,
  "model_type": "xlm-roberta",
  "num_attention_heads": 16,
  "num_hidden_layers": 24,
  "output_dropout_rate": 0.25,
  "output_past": true,
  "pad_token_id": 1,
  "position_embedding_type": "absolute",
  "sep_token_id": 2,
  "torch_dtype": "float32",
  "transformers_version": "4.24.0

{"time":"2023-03-06 18:57:23,225", "name": "ExtractiveQAHead", "level": "INFO", "message": "Loading dropout value 0.1 from config attribute 'hidden_dropout_prob'"}


All model checkpoint weights were used when initializing XLMRobertaModelForDownstreamTasks.

All the weights of XLMRobertaModelForDownstreamTasks were initialized from the model checkpoint at PrimeQA/nq_tydi_sq1-reader-xlmr_large-20221110.
If your task is similar to the task the model of the checkpoint was trained on, you can already use XLMRobertaModelForDownstreamTasks for predictions without further training.


{"time":"2023-03-06 18:57:23,739", "name": "XLMRobertaModelForDownstreamTasks", "level": "INFO", "message": "Setting task head for first time to 'None'"}
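The predict cell passes `[question]` together with `contexts`, a list that holds one list of passages. From that call, the two outer lists appear to run in parallel, with one passage list per question. This is an inference from this notebook, not something checked against the PrimeQA documentation. If you batch several questions, a small hypothetical helper can enforce that alignment before calling the reader:

```python
# Sketch: pair each question with its own list of candidate passages.
# The parallel-list layout is inferred from reader.predict([question], contexts).
from typing import List, Sequence, Tuple


def build_reader_inputs(
    questions: Sequence[str], passages_per_question: Sequence[Sequence[str]]
) -> Tuple[List[str], List[List[str]]]:
    if len(questions) != len(passages_per_question):
        raise ValueError(
            f"{len(questions)} questions but {len(passages_per_question)} passage lists"
        )
    return list(questions), [list(p) for p in passages_per_question]
```

For the single question here, this would be `questions, contexts = build_reader_inputs([question], [[r["text"] for r in results]])`, followed by `reader.predict(questions, contexts)`.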


In [74]:
# Predict answers
contexts = [[result['text'] for result in results]]
answers = reader.predict([question], contexts)

print(f"Question: {question}")
print("Answers:")
print(json.dumps(answers,indent=4))



***** Running Evaluation *****
  Num examples = 6
  Batch size = 8

Question: when can I drop collision on my auto policy ?
Answers:
{
    "0": [
        {
            "example_id": "0",
            "passage_index": 1,
            "span_answer_text": "as the car loses value",
            "span_answer": {
                "start_position": 232,
                "end_position": 254
            },
            "span_answer_score": 9.767881229519844,
            "confidence_score": 0.40340543499795295
        },
        {
            "example_id": "0",
            "passage_index": 1,
            "span_answer_text": "if the car is damaged in a collision",
            "span_answer": {
                "start_position": 117,
                "end_position": 153
            },
            "span_answer_score": 9.520724594593048,
            "confidence_score": 0.31506704689959253
        },
        {
            "example_id": "0",
            "passage_index": 1,
            "span_answer_text": "regardless of the premium there will be a sum recovered if the car is da
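In the output above, `start_position` and `end_position` line up with character offsets into the passage, with the end exclusive. For example, positions 117 to 153 in passage 1 give "if the car is damaged in a collision". The reader output also refers to passages only by `passage_index`. The sketch below is a hypothetical post-processing step for one question. It recovers each span from its passage, attaches the `doc_id` and `title` of the Discovery hit it came from, and sorts by `confidence_score`. It assumes the output shape printed above and that `passages` and `records` are in the same order as the contexts given to the reader.

```python
# Sketch: rank one question's answer candidates and map them back to documents.
# Assumes the reader output shape printed above (passage_index, span_answer
# offsets, confidence_score) and that passages/records share the reader's order.
from typing import Any, Dict, List


def rank_answers(
    candidates: List[Dict[str, Any]],
    passages: List[str],
    records: List[Dict[str, Any]],
) -> List[Dict[str, Any]]:
    ranked = []
    for cand in candidates:
        idx = cand["passage_index"]
        span = cand["span_answer"]
        ranked.append({
            # Slice the passage with the (end-exclusive) character offsets
            "answer": passages[idx][span["start_position"]:span["end_position"]],
            "confidence": cand["confidence_score"],
            "doc_id": records[idx].get("doc_id"),
            "title": records[idx].get("title"),
        })
    return sorted(ranked, key=lambda r: r["confidence"], reverse=True)
```

For this notebook, that would be `rank_answers(answers["0"], contexts[0], results)`.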


