# Question Answering using pre-trained Transformers

We compare three candidate question-answering models from the Hugging Face Hub:

- mrm8488/bert-tiny-5-finetuned-squadv2
- distilbert-base-cased-distilled-squad
- bert-large-uncased-whole-word-masking-finetuned-squad

License: open source (Apache 2.0). Model details: https://huggingface.co/distilbert-base-cased-distilled-squad#model-details

In [10]:
#!pip install transformers PyPDF2

In [6]:
import PyPDF2
from transformers import pipeline

# Load a PDF file and return its text
def load_pdf(file_path):
    text = ""
    with open(file_path, 'rb') as pdf_file:
        pdf_reader = PyPDF2.PdfReader(pdf_file)
        for page in pdf_reader.pages:
            # Separate pages with a newline so words on adjacent pages don't run together
            text += page.extract_text() + "\n"
    return text

# Define a function to ask and answer questions
def ask_question(question, text, qa):
    result = qa(question=question, context=text)
    answer = result["answer"]
    print(f"Q: {question}\nA: {answer}\n")

# Define a function to ask a list of questions
def question_list(questions, text, qa):
    for question in questions:
        ask_question(question, text, qa)

file_path = 'Male Demand.pdf'
text = load_pdf(file_path)

questions = [
    "What is the settlement agreement about?",
    "When was the Date of loss:?",
    "Who are Our Client?",
    "What is Our Client's Gender?",
    "What is Our Client's Date of Birth?",
    "Who are the NAME of patient in MEDICAL TREATMENT receipt?",
    "When was the earliest Date of Treatment?",
    "Did date of Treatments more than 30 days after Date of Loss?",
    "What are the most severe damage Our client's have?",
    "What are the mental, emotions, and other non-physical damage Our Client's have?"]
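Because the same questions are run through three models below, a helper that gathers every model's answer into one table can make the comparison easier to read than three separate listings. This is a minimal sketch; `compare_models`, `print_comparison` and the `make_qa` factory parameter are hypothetical names introduced here so the function can be exercised without downloading any model.

```python
def compare_models(model_names, questions, text, make_qa):
    """Return {question: {model_name: answer}} for side-by-side comparison.

    make_qa(model_name) must return a callable with the transformers QA
    pipeline interface, e.g. lambda name: pipeline("question-answering", model=name).
    """
    table = {question: {} for question in questions}
    for name in model_names:
        qa = make_qa(name)  # load each model once, then reuse it for every question
        for question in questions:
            result = qa(question=question, context=text)
            # Collapse line breaks that PDF extraction leaves inside answers
            table[question][name] = " ".join(result["answer"].split())
    return table


def print_comparison(table):
    """Print each question followed by every model's answer."""
    for question, answers in table.items():
        print(f"Q: {question}")
        for name, answer in answers.items():
            print(f"  {name}: {answer}")
        print()
```

In this notebook it could be called as `print_comparison(compare_models(model_names, questions, text, lambda name: pipeline("question-answering", model=name)))`, where `model_names` is a list of the three model IDs above.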

## distilbert-base-cased-distilled-squad

In [2]:
choose_model = "distilbert-base-cased-distilled-squad"

qa = pipeline("question-answering", model=choose_model)
question_list(questions, text, qa)

Q: What is the settlement agreement about?
A: Mr. Rolax ’s claim

Q: When was the Date of loss:?
A: July 30, 2021

Q: Who are Our Client?
A: Ronald  Rolax

Q: What is Our Client's Gender?
A: Male

Q: What is Our Client's Date of Birth?
A: 01/01/1970

Q: Who are the NAME of patient in MEDICAL TREATMENT receipt?
A: Carolyn Downs Family Medical Clinic

Q: When was the earliest Date of Treatment?
A: March 8, 2023

Q: Did date of Treatments more than 30 days after Date of Loss?
A: March 8, 2023

Q: What are the most severe damage Our client's have?
A: right -sided chest wall pain, left shoulder 
pain, and left flank pain

Q: What are the mental, emotions, and other non-physical damage Our Client's have?
A: anxiety and worry
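Some answers here are spans of the wrong type, for example a clinic name returned for the patient-name question. Passing `top_k` to the question-answering pipeline returns several candidate spans with their scores, which shows whether a better span was ranked just below the top one. With `top_k` greater than 1 the pipeline returns a list of dicts, but it can return a single dict when only one candidate survives, so the sketch normalizes both shapes. The helper name `top_answers` and the default `k=3` are illustrative choices.

```python
def top_answers(question, text, qa, k=3):
    """Return up to k (answer, score) pairs from a QA pipeline, best first.

    `qa` is a transformers question-answering pipeline (or any callable with
    the same interface). The result may be a list of dicts or a single dict,
    so both shapes are handled.
    """
    results = qa(question=question, context=text, top_k=k)
    if isinstance(results, dict):
        results = [results]
    ranked = sorted(results, key=lambda r: r["score"], reverse=True)
    return [(r["answer"], round(float(r["score"]), 3)) for r in ranked[:k]]
```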



## bert-large-uncased-whole-word-masking-finetuned-squad

In [3]:
choose_model = "bert-large-uncased-whole-word-masking-finetuned-squad"

qa = pipeline("question-answering", model=choose_model)
question_list(questions, text, qa)

Q: What is the settlement agreement about?
A: Mr. Rolax ’s claim

Q: When was the Date of loss:?
A: July 30, 2021

Q: Who are Our Client?
A: Ronald  Rolax

Q: What is Our Client's Gender?
A: Male

Q: What is Our Client's Date of Birth?
A: 01/01/1970

Q: Who are the NAME of patient in MEDICAL TREATMENT receipt?
A: Ronald  Rolax

Q: When was the earliest Date of Treatment?
A: July 30, 2021

Q: Did date of Treatments more than 30 days after Date of Loss?
A: 01/17/22 – 10/17/22

Q: What are the most severe damage Our client's have?
A: disc protrusion with annular tear

Q: What are the mental, emotions, and other non-physical damage Our Client's have?
A: damages for emotional distress and economic damages
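The 30-day question shows a limit of extractive QA: the model can only copy a span out of the context, so a yes/no question comes back as a date or a date range instead of a yes or no. One alternative is to extract the date of loss and the treatment date with separate questions and do the comparison in plain Python. This is a sketch, assuming the dates appear in one of the formats seen in these outputs (`July 30, 2021`, `01/01/1970`, `07/30/21`); `parse_date` and `treated_after_window` are hypothetical helpers.

```python
from datetime import datetime

# Date formats observed in the extracted answers; extend as needed
DATE_FORMATS = ("%B %d, %Y", "%m/%d/%Y", "%m/%d/%y")


def parse_date(value):
    """Parse a date string extracted by a QA model, trying each known format."""
    cleaned = " ".join(value.split())  # collapse line breaks and double spaces from PDF text
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date()
        except ValueError:
            continue
    raise ValueError(f"Unrecognized date format: {value!r}")


def treated_after_window(date_of_loss, date_of_treatment, window_days=30):
    """Return (days_elapsed, is_late); is_late is True when more than window_days passed."""
    days = (parse_date(date_of_treatment) - parse_date(date_of_loss)).days
    return days, days > window_days
```

For instance, `treated_after_window("July 30, 2021", "01/17/22")` returns `(171, True)`. The two strings are taken from answers above purely for illustration, not as verified treatment dates.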



## mrm8488/bert-tiny-5-finetuned-squadv2

In [8]:
choose_model = "mrm8488/bert-tiny-5-finetuned-squadv2"

qa = pipeline("question-answering", model=choose_model)
question_list(questions, text, qa)

Q: What is the settlement agreement about?
A: Mr. Rolax ’s claim

Q: When was the Date of loss:?
A: July 30, 2021

Q: Who are Our Client?
A: Ronald  Rolax

Q: What is Our Client's Gender?
A: cervical pain

Q: What is Our Client's Date of Birth?
A: October 
17, 2022

Q: Who are the NAME of patient in MEDICAL TREATMENT receipt?
A: annular tear

Q: When was the earliest Date of Treatment?
A: 07/30/21

Q: Did date of Treatments more than 30 days after Date of Loss?
A: July 30, 2021

Q: What are the most severe damage Our client's have?
A: proximately caused, as well as the 
damages documented herein

Q: What are the mental, emotions, and other non-physical damage Our Client's have?
A: compensable
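This model was fine-tuned on SQuAD v2, which includes unanswerable questions, yet here it returns a span for every question (for example `cervical pain` for the gender question). The question-answering pipeline accepts `handle_impossible_answer=True`, which lets it return an empty answer when the model's "no answer" prediction scores highest. A sketch of a wrapper that uses it; whether it would change any of the answers above has not been tested, and the fallback message is arbitrary.

```python
def ask_question_allow_no_answer(question, text, qa):
    """Ask a SQuAD v2-style model while allowing it to abstain.

    handle_impossible_answer=True is an argument of the transformers
    question-answering pipeline; an empty answer string means the model
    preferred the "no answer" option.
    """
    result = qa(question=question, context=text, handle_impossible_answer=True)
    answer = result["answer"].strip()
    return answer if answer else "No answer found in the document"
```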



# Deploy Transformer model to Streamlit Cloud
Live app: https://hennypur-qna-pdf-qa-yy18t4.streamlit.app/

In [35]:
%%writefile QA.py

import streamlit as st
from transformers import pipeline
from PyPDF2 import PdfReader

def load_pdf(file):
    # Extract the text of every page from an uploaded (file-like) PDF
    pdf_reader = PdfReader(file)
    text = ""
    for page in pdf_reader.pages:
        text += page.extract_text() + "\n"
    return text

def main():
    # Prompt user to upload a PDF file
    file = st.file_uploader("Upload a PDF file", type="pdf")

    # Answer questions once a PDF file is uploaded
    if file is not None:
        # Read PDF content
        with st.spinner('Extracting text from PDF...'):
            text = load_pdf(file)

        # Define question answering pipeline
        choose_model = "distilbert-base-cased-distilled-squad"
        with st.spinner('Loading model...'):
            qa = pipeline("question-answering", model=choose_model)

        # Define a function to ask and answer questions
        def ask_question(question, text, qa):
            result = qa(question=question, context=text)
            answer = result["answer"]
            return answer

        # Define a list of example questions
        questions = [
            "What are the most severe damage Our client's have?"
        ]

        # Display example questions and answers
        st.write("Here are some example QnA of the PDF content:")
        with st.spinner('Answering example questions...'):
            for question in questions:
                answer = ask_question(question, text, qa)
                st.write(f"Q: {question}")
                st.write(f"A: {answer}\n")

        # Allow the user to enter a custom question
        user_question = st.text_input("Enter your question below:")
        if user_question:
            with st.spinner('Answering your question...'):
                answer = ask_question(user_question, text, qa)
                st.write(f"Q: {user_question}")
                st.write(f"A: {answer}\n")


if __name__ == "__main__":
    st.set_page_config(page_title='PDF QnA', page_icon=':books:')
    st.title('PDF QnA')
    st.write('This app allows you to ask questions about a PDF document.')
    main()


Overwriting QA.py
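The app shows only the answer string, but each pipeline result is a dict with `score`, `start`, `end` and `answer` keys. A sketch of a helper that also returns the score and substitutes "I don't know" below a cutoff, which the app could display next to each answer; the name `ask_question_scored` and the `min_score=0.1` threshold are assumptions, not values tuned on this document.

```python
def ask_question_scored(question, text, qa, min_score=0.1):
    """Return (answer, score), replacing low-confidence answers with "I don't know".

    `qa` is any callable with the transformers question-answering pipeline
    interface: qa(question=..., context=...) -> {"answer", "score", "start", "end"}.
    The default min_score of 0.1 is a hypothetical cutoff.
    """
    result = qa(question=question, context=text)
    score = float(result["score"])
    answer = result["answer"] if score >= min_score else "I don't know"
    return answer, score
```

In the app, the returned score could be shown with something like `st.write(f"A: {answer} (score {score:.2f})")`.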


# Q&A using GPT-3 (requires a paid OpenAI API key)

In [10]:
import openai

# Legacy Completions model; openai.Completion requires the pre-1.0 openai package
COMPLETIONS_MODEL = "text-davinci-003"

# Read the API key from a local file instead of hard-coding it
with open('api_key.txt', 'r') as f:
    api_key = f.read().strip()
openai.api_key = api_key

# `text` holds the PDF content loaded in an earlier cell
prompt = text
question = input("Question:")

# Generate an answer if the user entered a question
if question:
    prompt_with_question = f"{prompt}\n\nQuestion: {question}\nA:"
    response = openai.Completion.create(
        prompt=prompt_with_question,
        temperature=0.5,
        max_tokens=512,
        top_p=1,
        frequency_penalty=0,
        presence_penalty=0,
        model=COMPLETIONS_MODEL,
        stop=["Q:"],
        timeout=60  # wait for 60 seconds before timing out
    )

    # Extract answer
    answer = response.choices[0].text.strip()
    print(answer)

Question:who is our client?
Our client is Ronald Rolax.
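This cell sends the entire PDF text as the prompt. `text-davinci-003` has a context window of 4,097 tokens shared between the prompt and `max_tokens`, so a longer document would be rejected by the API. A minimal sketch of splitting the text into overlapping chunks that could each be sent with the question; token counts are approximated with OpenAI's rough "100 tokens is about 75 words" rule of thumb, and the default sizes (2,500 tokens per chunk, 200 tokens of overlap) are assumptions chosen to leave room for the 512-token completion. An exact count would need a tokenizer such as `tiktoken`.

```python
def chunk_text(text, max_tokens=2500, overlap_tokens=200, words_per_token=0.75):
    """Split text into overlapping word-based chunks that should fit a token budget.

    Token counts are estimated as words / words_per_token, a heuristic only.
    Consecutive chunks share roughly overlap_tokens worth of words so that an
    answer spanning a chunk boundary is not lost.
    """
    words = text.split()
    max_words = max(1, int(max_tokens * words_per_token))
    overlap_words = int(overlap_tokens * words_per_token)
    if overlap_words >= max_words:
        raise ValueError("overlap must be smaller than the chunk size")
    step = max_words - overlap_words
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # this chunk already reaches the end of the text
    return chunks
```

Each chunk could then be used in place of `prompt` in the cell above, one request per chunk; how to merge the per-chunk answers is left open.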


In [142]:
import openai
import PyPDF2

# Load a PDF file and return its text
def load_pdf(file_path):
    text = ""
    with open(file_path, 'rb') as pdf_file:
        pdf_reader = PyPDF2.PdfReader(pdf_file)
        for page in pdf_reader.pages:
            # Separate pages with a newline so words on adjacent pages don't run together
            text += page.extract_text() + "\n"
    return text

# Load the PDF file and convert it to text
file_path = 'Male Demand.pdf'
text = load_pdf(file_path)

# Read the API key from a local file (keep api_key.txt out of version control)
with open('api_key.txt', 'r') as f:
    api_key = f.read().strip()
openai.api_key = api_key

# Define the prompt to be used
prompt = f"Answer the following questions as truthfully as possible using the provided text. If the answer is not contained within the text below, say 'I don't know':\n\n{text}\n\nQ: Who is Our Client's name, Gender, and date of birth? When was the Date of loss? Does the patient's name in Medical Treatmant the same name as Our Client? If different, answer including note as suspicious. Was date of Medical treatment still within 30 days after Date of Loss? If no, answer including note as suspicious. What is the diagnosis of our client? What is the prognosis of our client? What are the mental, emotions, and other non-physical damage Our Client's suffer? What else you find suspicious?"

# Set the questions to be asked
questions = [
    "Who is Our Client's name, Gender, and date of birth?",
    "When was the Date of loss?",
    "Who is the patient's name in Medical Treatment?",
    "Does the patient's name in Medical Treatment the same name as Our Client? If different, answer including note as suspicious.",
    "Was date of Medical treatment still within 30 days after Date of Loss? If no, answer including note as suspicious.",
    "What is the diagnosis of our client?",
    "What is the prognosis of our client?",
    "What are the mental, emotions, and other non-physical damage Our Client's suffer?",
    "What else you find suspicious?"
]

# Ask the questions using OpenAI's GPT-3 model
for question in questions:
    example_prompt_with_question = f"{prompt}\n\nQ: {question}\nA"
    example_response = openai.Completion.create(
        prompt=example_prompt_with_question,
        temperature=0.5,
        max_tokens=512,
        top_p=1,
        frequency_penalty=0,
        presence_penalty=0,
        model="text-davinci-003",
        stop=["Q:", "\n"],
    )

    answer = example_response.choices[0].text.strip()
    print(f"Q: {question}\nA: {answer}")

Q: Who is Our Client's name, Gender, and date of birth?
A: : Our Client's name is Ronald Rolax, Gender is Male, Date of Birth is 01/01/1970.
Q: When was the Date of loss?
A: : July 30, 2021.
Q: Who is the patient's name in Medical Treatment?
A: : Ms. Holmes
Q: Does the patient's name in Medical Treatment the same name as Our Client? If different, answer including note as suspicious.
A: : No, the patient's name in Medical Treatment is Ms. Holmes, which is different from Our Client's name, Ronald Rolax. Note: This is suspicious.
Q: Was date of Medical treatment still within 30 days after Date of Loss? If no, answer including note as suspicious.
A: : No, the date of medical treatment was not within 30 days after the Date of Loss. Note: This is suspicious.
Q: What is the diagnosis of our client?
A: : C3-C4 disc protrusion with annular tear (M50.80), tension headaches (R51), cervical sprain/strain (S13.4XXA), thoracic sprain/strain (S23.3XXA), lumbar sprain/strain (S33.5XXA), left shoulder/
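Every answer printed above starts with an extra colon (`A: : ...`). This is likely because the prompt ends with `\nA` rather than `\nA:`, so the model completes the label itself before answering. A small sketch of a prompt builder that ends with the full `A:` label, plus a cleanup step that strips any leading colon the model still emits; both function names are hypothetical.

```python
def build_prompt(instructions, context, question):
    """Assemble a completion prompt that ends with a complete "A:" label."""
    return f"{instructions}\n\n{context}\n\nQ: {question}\nA:"


def clean_answer(raw):
    """Strip surrounding whitespace and any leading colon from a completion."""
    return raw.strip().lstrip(":").strip()
```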

# Deploy QnA GPT-3 to Streamlit Cloud

Live app: https://hennypur-qa-plus-qa-gpt3-75yq4o.streamlit.app/

In [2]:
%%writefile QA_GPT.py

import openai
import streamlit as st
from PyPDF2 import PdfReader

def load_pdf(file):
    # Extract the text of every page from an uploaded (file-like) PDF
    pdf_reader = PdfReader(file)
    text = ""
    for page in pdf_reader.pages:
        text += page.extract_text() + "\n"
    return text

def main():
    # Read the API key from a local file (keep api_key.txt out of version control)
    with open('api_key.txt', 'r') as f:
        api_key = f.read().strip()
    openai.api_key = api_key

    # Prompt user to upload a PDF file
    file = st.file_uploader("Upload a PDF file", type="pdf")

    # Answer questions once a PDF file is uploaded
    if file is not None:
        # Read PDF content
        with st.spinner('Extracting text from PDF...'):
            text = load_pdf(file)

        # Assign PDF content to prompt var
        prompt = text

        # Set up the question section of the app
        st.title("Ask a Question about the PDF Document")

        # Prompt user to enter a question
        question = st.text_input("What do you want to know about it?")

        # Generate answer if user inputs a question
        if question:
            prompt = f"Answer the question as truthfully as possible using the provided text. If the answer is not contained within the text below, say 'I don't know'.\n\n{prompt}"
            # End with "A:" so the completion starts directly with the answer
            prompt_with_question = f"{prompt}\n\nQuestion: {question}\nA:"
            response = openai.Completion.create(
                prompt=prompt_with_question,
                temperature=0.0,
                max_tokens=1024,
                top_p=1,
                frequency_penalty=0,
                presence_penalty=0,
                model="text-davinci-003",
                stop=["Q:", "\n"],
                timeout=600  # wait for 600 seconds before timing out
            )

            # Extract answer from OpenAI API response
            answer = response.choices[0].text.strip()

            # Output answer or "I don't know" if answer is empty
            answer_output = answer if answer else "I don't know"
            st.write(f"Q: {question}")
            st.write(f"A: {answer_output}\n")
                    
if __name__ == "__main__":
    st.set_page_config(page_title='PDF Question and Answer', page_icon=':books:')
    st.title('PDF Question and Answer')
    st.write('App to answer your questions about PDF document.')
    main()


Writing QA_GPT.py
