<img align="right" width="400" src="https://www.fhnw.ch/de/++theme++web16theme/assets/media/img/fachhochschule-nordwestschweiz-fhnw-logo.svg" alt="FHNW Logo">


# Document Question Answering using Transformers

by Fabian Märki

## Summary
The aim of this notebook is to show how a Hugging Face `transformers` model can be used for document question answering, i.e. answering natural-language questions about document images such as invoices.


## Links
- [Notebooks](https://huggingface.co/docs/transformers/notebooks) on different topics (fine-tuning, translation, summarization, question answering, audio classification, image classification, etc.)
- [Enabling GPU on Google Colab](https://www.tutorialspoint.com/google_colab/google_colab_using_free_gpu.htm)

This notebook does not contain assignments: <font color='red'>Enjoy.</font>

<a href="https://colab.research.google.com/github/markif/2024_FS_CAS_NLP_LAB_Notebooks/blob/master/08_a_Document_Question_Answering_using_Transformers.ipynb">
  <img align="left" src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

In [1]:
%%capture

!pip install 'fhnw-nlp-utils>=0.8.0,<0.9.0'

**Make sure that a GPU is available (see [here](https://www.tutorialspoint.com/google_colab/google_colab_using_free_gpu.htm))!!!**

In [2]:
from fhnw.nlp.utils.system import set_log_level
from fhnw.nlp.utils.system import system_info

set_log_level()
print(system_info())

OS name: posix
Platform name: Linux
Platform release: 5.10.147+
Python version: 3.8.16
CPU cores: 1
RAM: 12.68GB total and 11.85GB available
Tensorflow version: 2.9.2
GPU is available


In [3]:
!pip install transformers
!apt-get install -y tesseract-ocr
!pip install Pillow pytesseract 

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting transformers
  Downloading transformers-4.25.1-py3-none-any.whl (5.8 MB)
Collecting huggingface-hub<1.0,>=0.10.0
  Downloading huggingface_hub-0.11.1-py3-none-any.whl (182 kB)
Collecting tokenizers!=0.11.3,<0.14,>=0.11.1
  Downloading tokenizers-0.13.2-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (7.6 MB)
Installing collected packages: tokenizers, huggingface-hub, transformers
Successfully installed huggingface-hub-0.11.1 tokenizers-0.13.2 transformers-4.25.1
Reading package lists... Done
Building dependency

**On Google colab you might need to press "RESTART RUNTIME" above!**
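Tesseract is installed because the pipeline runs OCR on the image to obtain the words and their bounding boxes, which LayoutLM uses as layout features. If OCR output is already available, the pipeline can also take it through its `word_boxes` argument, which expects boxes scaled to a 0 to 1000 range. The following is a minimal sketch of that scaling, assuming pixel boxes given as `(left, top, width, height)` in the column layout returned by `pytesseract.image_to_data(..., output_type=pytesseract.Output.DICT)`; `normalize_box` and `to_word_boxes` are hypothetical helpers, and the OCR numbers are made up.

```python
def normalize_box(left, top, width, height, image_width, image_height):
    """Scale a pixel box (left, top, width, height) to a 0-1000 grid.

    Returns [x0, y0, x1, y1] as integers, the box format LayoutLM works with.
    """
    return [
        int(1000 * left / image_width),
        int(1000 * top / image_height),
        int(1000 * (left + width) / image_width),
        int(1000 * (top + height) / image_height),
    ]


def to_word_boxes(ocr, image_width, image_height):
    """Turn Tesseract-style OCR columns into (word, box) tuples.

    `ocr` is assumed to be a dict with the keys 'text', 'left', 'top',
    'width' and 'height'. Empty or whitespace-only entries are skipped.
    """
    word_boxes = []
    for text, left, top, width, height in zip(
        ocr["text"], ocr["left"], ocr["top"], ocr["width"], ocr["height"]
    ):
        if not text.strip():
            continue
        box = normalize_box(left, top, width, height, image_width, image_height)
        word_boxes.append((text, box))
    return word_boxes


# Hypothetical OCR output for a 750 x 1000 pixel image (made-up numbers)
ocr = {
    "text": ["", "Invoice", "#", "INV-42"],
    "left": [0, 75, 150, 375],
    "top": [0, 100, 100, 100],
    "width": [750, 60, 10, 75],
    "height": [1000, 20, 20, 20],
}
print(to_word_boxes(ocr, image_width=750, image_height=1000))
```

With real OCR output, the resulting list could be passed as `qa_pipeline(image, question, word_boxes=word_boxes)` so that the pipeline skips its own OCR step.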

Download and use a pretrained QA model specialized for documents.

In [1]:
from transformers import pipeline

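qa_pipeline = pipeline(
    "document-question-answering",
    model="impira/layoutlm-document-qa",
    # without a `device` argument the pipeline runs on the CPU;
    # pass e.g. device=0 to run it on the first GPU
)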


Let's extract information from the following document...

In [2]:
image = "https://templates.invoicehome.com/invoice-template-us-neat-750px.png"

In [3]:
from IPython.display import Image

# display the invoice that the questions below refer to
Image(url=image)

In [4]:
%%time

qa_pipeline(
    image,
    # Note: the question says "invoice number" while the document label reads "Invoice #"
    "What is the invoice number?"
)

CPU times: user 523 ms, sys: 50.8 ms, total: 574 ms
Wall time: 2.16 s


[{'score': 0.8480660319328308, 'answer': 'us-001', 'start': 16, 'end': 16}]

In [5]:
%%time

qa_pipeline(
    image,
    "What is the due date?"
)

CPU times: user 505 ms, sys: 25.4 ms, total: 530 ms
Wall time: 1.87 s


[{'score': 0.9999326467514038, 'answer': '26/02/2019', 'start': 42, 'end': 42}]
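The answer is always a plain string. A minimal sketch of turning the due date into a proper date object, assuming a day/month/year layout (a day of 26 rules out month-first for this particular answer); `parse_due_date` is a hypothetical helper.

```python
import datetime


def parse_due_date(answer, fmt="%d/%m/%Y"):
    """Parse a date string returned by the QA pipeline.

    Returns None instead of raising if the string does not match `fmt`,
    since the model may return a span that is not a date at all.
    """
    try:
        return datetime.datetime.strptime(answer.strip(), fmt).date()
    except ValueError:
        return None


result = {"score": 0.9999326467514038, "answer": "26/02/2019", "start": 42, "end": 42}
print(parse_due_date(result["answer"]))  # 2019-02-26
print(parse_due_date("John Smith"))      # None
```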

In [6]:
%%time

qa_pipeline(
    image,
    "Who is the buyer?"
)

CPU times: user 490 ms, sys: 25.5 ms, total: 516 ms
Wall time: 1.76 s


[{'score': 0.06948953121900558,
  'answer': 'John Smith',
  'start': 17,
  'end': 18}]

In [7]:
%%time

qa_pipeline(
    image,
    "Who is the issuer?"
)

CPU times: user 481 ms, sys: 28.1 ms, total: 509 ms
Wall time: 1.64 s


[{'score': 0.7861327528953552,
  'answer': 'East Repair Inc.',
  'start': 1,
  'end': 3}]
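In these outputs, `start` and `end` appear to be inclusive indices into the list of OCR words rather than character offsets: "East Repair Inc." spans words 1 to 3, "John Smith" words 17 to 18, and every single-word answer has `start == end`. A minimal sketch of how an answer maps back onto such a word list, using a placeholder word list; `answer_from_span` and `span_word_count` are hypothetical helpers.

```python
def answer_from_span(words, start, end):
    """Join the OCR words from start to end (both inclusive)."""
    return " ".join(words[start:end + 1])


def span_word_count(result):
    """Number of OCR words covered by a pipeline result dict."""
    return result["end"] - result["start"] + 1


# Placeholder word list; only the positions matter for this illustration
words = ["w0", "East", "Repair", "Inc.", "w4"]
result = {"score": 0.7861327528953552, "answer": "East Repair Inc.", "start": 1, "end": 3}

print(answer_from_span(words, result["start"], result["end"]))  # East Repair Inc.
print(span_word_count(result))  # 3
```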

In [8]:
%%time

qa_pipeline(
    image,
    "What is the purchase amount?"
)

CPU times: user 499 ms, sys: 26.4 ms, total: 525 ms
Wall time: 1.76 s


[{'score': 0.016787059605121613, 'answer': '30.00', 'start': 62, 'end': 62}]
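The scores vary widely: about 0.99 for the due date, but only about 0.017 here and about 0.07 for the buyer. A minimal sketch of discarding answers below a confidence threshold; the threshold of 0.5 is an arbitrary assumption and `confident_answers` is a hypothetical helper. Note that a low score does not necessarily mean a wrong answer ("John Smith" would be dropped too), so a threshold trades recall for precision.

```python
def confident_answers(results, min_score=0.5):
    """Keep only pipeline results whose score reaches min_score."""
    return [r for r in results if r["score"] >= min_score]


# Scores copied from the outputs above
results = [
    {"score": 0.9999326467514038, "answer": "26/02/2019", "start": 42, "end": 42},
    {"score": 0.06948953121900558, "answer": "John Smith", "start": 17, "end": 18},
    {"score": 0.016787059605121613, "answer": "30.00", "start": 62, "end": 62},
]
print([r["answer"] for r in confident_answers(results)])  # ['26/02/2019']
```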

In [9]:
%%time

qa_pipeline(
    image,
    # multi-line answers seem to be an issue: the address spans several lines
    "What is the billing address?"
)

CPU times: user 515 ms, sys: 32.5 ms, total: 547 ms
Wall time: 1.78 s


[{'score': 0.803274393081665,
  'answer': '2 Court Square',
  'start': 24,
  'end': 26}]

In [10]:
%%time

qa_pipeline(
    image,
    # hm, not quite right: same answer (and same words 24-26) as for the billing address
    "What is the shipping address?"
)

CPU times: user 501 ms, sys: 27.7 ms, total: 529 ms
Wall time: 1.75 s


[{'score': 0.8589608669281006,
  'answer': '2 Court Square',
  'start': 24,
  'end': 26}]

In [11]:
%%time

qa_pipeline(
    image,
    # let's try wording that matches better
    "ship address?"
)

CPU times: user 488 ms, sys: 25.2 ms, total: 513 ms
Wall time: 1.69 s


[{'score': 0.31904807686805725,
  'answer': '1912 Harvest Lane',
  'start': 4,
  'end': 6}]
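As the last three questions show, the wording of a question changes both the answer and its score. A minimal sketch of asking several paraphrases and keeping the highest-scoring answer; `best_answer` and `fake_qa` are hypothetical, and the helper takes the pipeline as an argument so it can be tried with a stand-in that replays the recorded outputs. Applied to the two shipping-address phrasings, it picks the first one, which the comment above already flags as not quite right, so the highest score alone does not settle which paraphrase is correct.

```python
def best_answer(qa, image, questions):
    """Ask every question and return (question, result) with the top score.

    `qa` is any callable with the call shape of qa_pipeline(image, question)
    that returns a list of result dicts. Returns None if nothing is found.
    """
    best = None
    for question in questions:
        results = qa(image, question)
        if not results:
            continue
        top = max(results, key=lambda r: r["score"])
        if best is None or top["score"] > best[1]["score"]:
            best = (question, top)
    return best


# Stand-in for qa_pipeline that replays the recorded outputs above
recorded = {
    "What is the shipping address?": [
        {"score": 0.8589608669281006, "answer": "2 Court Square", "start": 24, "end": 26}
    ],
    "ship address?": [
        {"score": 0.31904807686805725, "answer": "1912 Harvest Lane", "start": 4, "end": 6}
    ],
}


def fake_qa(image, question):
    return recorded.get(question, [])


print(best_answer(fake_qa, None, list(recorded)))
```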

In [12]:
%%time

qa_pipeline(
    image,
    "What is the unit price for rear brake cables?"
)

CPU times: user 497 ms, sys: 26.2 ms, total: 524 ms
Wall time: 1.68 s


[{'score': 0.9934648871421814, 'answer': '100.00', 'start': 54, 'end': 54}]

In [13]:
%%time

qa_pipeline(
    image,
    "What is the unit price for labor?"
)

CPU times: user 472 ms, sys: 28.5 ms, total: 500 ms
Wall time: 1.78 s


[{'score': 0.9096618294715881, 'answer': '5.00', 'start': 66, 'end': 66}]

In [14]:
%%time

qa_pipeline(
    image,
    # hm, not quite right
    "What is the amount for labor?"
)

CPU times: user 483 ms, sys: 26.3 ms, total: 509 ms
Wall time: 1.68 s


[{'score': 0.9646313190460205, 'answer': '1.00', 'start': 67, 'end': 67}]

Let's try another document...

In [15]:
image = "https://www.accountingcoach.com/wp-content/uploads/2013/10/income-statement-example@2x.png"

In [16]:
Image(url=image) 

In [17]:
%%time

qa_pipeline(
    image,
    "What are the 2020 net sales?"
)

CPU times: user 345 ms, sys: 22.1 ms, total: 367 ms
Wall time: 2.16 s


[{'score': 0.9939389228820801, 'answer': '$ 3,980', 'start': 15, 'end': 16}]
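The answer spans two OCR words (15 to 16), so the currency sign comes back separated from the number as "$ 3,980". A minimal sketch of turning such an answer into a number, assuming "," is a thousands separator and "." the decimal point; `parse_amount` is a hypothetical helper.

```python
def parse_amount(answer):
    """Convert an amount string such as '$ 3,980' to a float.

    Strips the dollar sign, thousands separators and spaces.
    Returns None if what remains is not a number.
    """
    cleaned = answer.replace("$", "").replace(",", "").replace(" ", "")
    try:
        return float(cleaned)
    except ValueError:
        return None


print(parse_amount("$ 3,980"))           # 3980.0
print(parse_amount("30.00"))             # 30.0
print(parse_amount("Income Statement"))  # None
```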

In [18]:
qa_pipeline(
    image,
    "Issuer?"
)

[{'score': 0.9821509718894958,
  'answer': 'Example Corporation',
  'start': 0,
  'end': 1}]

In [19]:
qa_pipeline(
    image,
    "Document type?"
)

[{'score': 0.2799419164657593,
  'answer': 'Income Statement',
  'start': 2,
  'end': 3}]