<img align="right" width="400" src="https://www.fhnw.ch/de/++theme++web16theme/assets/media/img/fachhochschule-nordwestschweiz-fhnw-logo.svg" alt="FHNW Logo">


# Document Question Answering using Transformers

by Fabian Märki

## Summary
The aim of this notebook is to show how a Hugging Face model can be used for document question answering.


## Links
- [Notebooks](https://huggingface.co/docs/transformers/notebooks) on different topics (fine-tuning, translation, summarization, question answering, audio classification, image classification, etc.)
- [Enabling GPU on Google Colab](https://www.tutorialspoint.com/google_colab/google_colab_using_free_gpu.htm)

This notebook does not contain assignments: <font color='red'>Enjoy.</font>

<a href="https://colab.research.google.com/github/markif/2024_FS_CAS_NLP_LAB_Notebooks/blob/master/08_b_Document_Question_Answering_using_Transformers.ipynb">
  <img align="left" src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

In [1]:
%%capture

!pip install 'fhnw-nlp-utils>=0.8.0,<0.9.0'

**Make sure that a GPU is available (see [here](https://www.tutorialspoint.com/google_colab/google_colab_using_free_gpu.htm))!!!**

In [2]:
from fhnw.nlp.utils.system import set_log_level
from fhnw.nlp.utils.system import system_info

set_log_level()
print(system_info())

OS name: posix
Platform name: Linux
Platform release: 6.5.0-35-generic
Python version: 3.11.0rc1
CPU cores: 6
RAM: 31.1GB total and 23.16GB available
Tensorflow version: 2.16.1
GPU is available
GPU is a NVIDIA GeForce RTX 2070 with Max-Q Design with 8192MiB


In [3]:
!pip install transformers
!apt-get install -y tesseract-ocr
!pip install Pillow pytesseract 

Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following additional packages will be installed:
  fontconfig fontconfig-config fonts-dejavu-core libarchive13 libbsd0
  libcairo2 libdatrie1 libdeflate0 libfontconfig1 libfreetype6 libfribidi0
  libgif7 libgraphite2-3 libharfbuzz0b libjbig0 libjpeg-turbo8 libjpeg8
  liblept5 libmd0 libopenjp2-7 libpango-1.0-0 libpangocairo-1.0-0
  libpangoft2-1.0-0 libpixman-1-0 libpng16-16 libtesseract4 libthai-data
  libthai0 libtiff5 libwebp7 libwebpmux3 libx11-6 libx11-data libxau6
  libxcb-render0 libxcb-shm0 libxcb1 libxdmcp6 libxext6 libxrender1
  tesseract-ocr-eng tesseract-ocr-osd ucf
Suggested packages:
  lrzip
The following NEW packages will be installed:
  fontconfig fontconfig-config fonts-dejavu-core libarchive13 libbsd0
  libcairo2 libdatrie1 libdeflate0 libfontconfig1 libfreetype6 libfribidi0
  libgif7 libgraphite2-3 libharfbuzz0b libjbig0 libjpeg-turbo8 libjpeg8
  liblept5 libmd0 libop

**On Google Colab you might need to press "RESTART RUNTIME" above!**
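The pipeline relies on `pytesseract` for OCR when no word boxes are supplied, and `pytesseract` in turn needs the `tesseract` binary. As a minimal sketch, the check below confirms that the binary is visible after the install (and a possible restart). The helper name `tesseract_available` is made up for illustration.

```python
import shutil


def tesseract_available(binary: str = "tesseract") -> bool:
    """Return True if the given OCR binary can be found on the PATH."""
    return shutil.which(binary) is not None


if not tesseract_available():
    print("tesseract not found on PATH; re-run the install cell or restart the runtime")
```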

Download and use a pretrained QA model specialized for documents.

In [4]:
from transformers import pipeline

qa_pipeline = pipeline(
    "document-question-answering",
    model="impira/layoutlm-document-qa",
)

config.json:   0%|          | 0.00/789 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/511M [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/315 [00:00<?, ?B/s]

vocab.json:   0%|          | 0.00/798k [00:00<?, ?B/s]

merges.txt:   0%|          | 0.00/456k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/1.36M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/239 [00:00<?, ?B/s]
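The cell above does not pass a `device`, so whether the GPU is actually used may depend on the installed `transformers` version. One way to be explicit is sketched below. It assumes a PyTorch backend, and the helper `pick_device` is a hypothetical name, not part of the library.

```python
def pick_device() -> int:
    """Return 0 (first CUDA GPU) if PyTorch sees one, otherwise -1 (CPU)."""
    try:
        import torch
    except ImportError:
        # Assumption: without PyTorch we simply fall back to the CPU index.
        return -1
    return 0 if torch.cuda.is_available() else -1


device = pick_device()
print(f"pipeline device index: {device}")

# Possible usage (not part of the original cell):
# qa_pipeline = pipeline(
#     "document-question-answering",
#     model="impira/layoutlm-document-qa",
#     device=device,
# )
```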

Let's extract information from the following document...

In [5]:
image = "https://templates.invoicehome.com/invoice-template-us-neat-750px.png"

In [6]:
from IPython.display import Image

# Render the document image inline from its URL
Image(url=image)

In [7]:
%%time

qa_pipeline(
    image,
    # Note: the question says "invoice number", while the document labels the field "Invoice #"
    "What is the invoice number?"
)

CPU times: user 776 ms, sys: 312 µs, total: 777 ms
Wall time: 679 ms


[{'score': 0.4251529574394226, 'answer': 'us-001', 'start': 16, 'end': 16}]

In [8]:
%%time

qa_pipeline(
    image,
    "What is the due date?"
)

CPU times: user 1.08 s, sys: 0 ns, total: 1.08 s
Wall time: 678 ms


[{'score': 0.9999262094497681, 'answer': '26/02/2019', 'start': 42, 'end': 42}]
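The two results so far differ a lot in `score` (about 0.43 versus nearly 1.0). A simple way to act on this is to accept an answer only above a confidence threshold. The sketch below uses the outputs shown above. The helper `best_answer` and the threshold of 0.5 are assumptions chosen for illustration, not recommended values.

```python
from typing import Optional


def best_answer(results: list[dict], min_score: float = 0.5) -> Optional[str]:
    """Return the top-scoring answer, or None if it falls below min_score."""
    if not results:
        return None
    top = max(results, key=lambda r: r["score"])
    return top["answer"] if top["score"] >= min_score else None


# Results copied from the notebook outputs above.
invoice_number = [{"score": 0.4251529574394226, "answer": "us-001", "start": 16, "end": 16}]
due_date = [{"score": 0.9999262094497681, "answer": "26/02/2019", "start": 42, "end": 42}]

print(best_answer(invoice_number))  # None (0.43 is below the 0.5 threshold)
print(best_answer(due_date))        # 26/02/2019
```

Note that a threshold trades coverage for precision: here it would discard the invoice number answer purely because of its low score, so the cut-off has to be tuned per use case.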

In [9]:
%%time

qa_pipeline(
    image,
    "Who is the buyer?"
)

CPU times: user 1.05 s, sys: 0 ns, total: 1.05 s
Wall time: 663 ms


[{'score': 0.13438627123832703,
  'answer': 'John Smith',
  'start': 17,
  'end': 18}]

In [10]:
%%time

qa_pipeline(
    image,
    "Who is the issuer?"
)

CPU times: user 942 ms, sys: 0 ns, total: 942 ms
Wall time: 626 ms


[{'score': 0.8599024415016174,
  'answer': 'East Repair Inc.',
  'start': 1,
  'end': 3}]
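In this pipeline, `start` and `end` are word indices into the OCR'd word list rather than character offsets, and the outputs above suggest that `end` is inclusive ('East Repair Inc.' covers words 1 to 3). The sketch below illustrates the mapping. The word list is a made-up placeholder, since the real one comes from Tesseract and is not shown in this notebook.

```python
def span_text(words: list[str], start: int, end: int) -> str:
    """Join the OCR words from start to end, treating end as inclusive."""
    return " ".join(words[start : end + 1])


# Hypothetical OCR word list; only indices 1..3 mirror the output above.
words = ["<w0>", "East", "Repair", "Inc.", "<w4>", "<w5>"]

print(span_text(words, 1, 3))  # East Repair Inc.
```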

In [11]:
%%time

qa_pipeline(
    image,
    "What is the purchase amount?"
)

CPU times: user 703 ms, sys: 0 ns, total: 703 ms
Wall time: 566 ms


[{'score': 0.057942330837249756, 'answer': '100.00', 'start': 54, 'end': 54}]

In [12]:
%%time

qa_pipeline(
    image,
    # Multi-line answers seem to be an issue: only one address line is returned
    "What is the billing address?"
)

CPU times: user 781 ms, sys: 0 ns, total: 781 ms
Wall time: 645 ms


[{'score': 0.8556510806083679,
  'answer': '2 Court Square',
  'start': 24,
  'end': 26}]

In [13]:
%%time

qa_pipeline(
    image,
    # Hm, not quite right: same span as for the billing address
    "What is the shipping address?"
)

CPU times: user 721 ms, sys: 0 ns, total: 721 ms
Wall time: 599 ms


[{'score': 0.8990191221237183,
  'answer': '2 Court Square',
  'start': 24,
  'end': 26}]

In [14]:
%%time

qa_pipeline(
    image,
    # Let's try wording that matches the document more closely
    "ship address?"
)

CPU times: user 995 ms, sys: 0 ns, total: 995 ms
Wall time: 621 ms


[{'score': 0.3849318325519562,
  'answer': '2 Court Square',
  'start': 24,
  'end': 26}]
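Since the score changes noticeably with the wording of the question, one option is to ask several phrasings and keep the highest-scoring result. The following is a minimal sketch. `ask_variants` is a hypothetical helper, and the `canned` dictionary is a stub that stands in for `lambda q: qa_pipeline(image, q)` so the example runs without the model.

```python
from typing import Callable, Optional


def ask_variants(ask: Callable[[str], list[dict]], questions: list[str]) -> Optional[dict]:
    """Ask every phrasing and return the single highest-scoring result."""
    best = None
    for question in questions:
        for result in ask(question):
            if best is None or result["score"] > best["score"]:
                best = {**result, "question": question}
    return best


# Stub with the scores copied from the two cells above.
canned = {
    "What is the shipping address?": [
        {"score": 0.8990191221237183, "answer": "2 Court Square", "start": 24, "end": 26}
    ],
    "ship address?": [
        {"score": 0.3849318325519562, "answer": "2 Court Square", "start": 24, "end": 26}
    ],
}

best = ask_variants(lambda q: canned[q], list(canned))
print(best["question"], "->", best["answer"])
```

A higher score does not make an answer correct: both phrasings return the same span here, so rewording alone does not fix this case.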

In [15]:
%%time

qa_pipeline(
    image,
    "What is the unit price for rear brake cables?"
)

CPU times: user 734 ms, sys: 0 ns, total: 734 ms
Wall time: 566 ms


[{'score': 0.9839473366737366, 'answer': '100.00', 'start': 54, 'end': 54}]

In [16]:
%%time

qa_pipeline(
    image,
    "What is the unit price for labor?"
)

CPU times: user 736 ms, sys: 0 ns, total: 736 ms
Wall time: 571 ms


[{'score': 0.8177785873413086, 'answer': '100.00', 'start': 54, 'end': 54}]

In [17]:
%%time

qa_pipeline(
    image,
    # Hm, not quite right: the answer runs across several fields
    "What is the amount for labor?"
)

CPU times: user 1.09 s, sys: 0 ns, total: 1.09 s
Wall time: 679 ms


[{'score': 0.5073113441467285,
  'answer': '1.00 Subtotal 145.00',
  'start': 67,
  'end': 69}]
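The answer above contains two numbers and a label, which hints that the predicted span crossed field boundaries. A light post-processing check can flag such answers when a single amount is expected. This is a sketch under the assumption that amounts look like `1.00`, `145.00` or `3,980`. The regular expression and helper names are illustrative only.

```python
import re

# Matches number-like tokens such as 1.00, 145.00 or 3,980.
AMOUNT = re.compile(r"\d[\d,]*(?:\.\d+)?")


def amounts_in(answer: str) -> list[str]:
    """Return every number-like token found in an answer string."""
    return AMOUNT.findall(answer)


def is_single_amount(answer: str) -> bool:
    """True if the answer consists of exactly one number and nothing else."""
    return AMOUNT.fullmatch(answer.strip()) is not None


print(amounts_in("1.00 Subtotal 145.00"))        # ['1.00', '145.00']
print(is_single_amount("1.00 Subtotal 145.00"))  # False
print(is_single_amount("100.00"))                # True
```

The check only detects the problem. It cannot tell which of the extracted numbers, if any, is the labor amount.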

Let's try another document...

In [18]:
image = "https://www.accountingcoach.com/wp-content/uploads/2013/10/income-statement-example@2x.png"

In [19]:
Image(url=image) 

In [20]:
%%time

qa_pipeline(
    image,
    "What are the 2020 net sales?"
)



CPU times: user 552 ms, sys: 0 ns, total: 552 ms
Wall time: 890 ms


[{'score': 0.9726565480232239, 'answer': '$ 3,980', 'start': 11, 'end': 12}]
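The answer is returned as raw OCR text (`'$ 3,980'`), so it needs cleaning before it can be used in calculations. A minimal sketch follows, assuming a dollar sign and US-style thousands separators. `parse_amount` is a hypothetical helper.

```python
from decimal import Decimal


def parse_amount(text: str) -> Decimal:
    """Convert a string such as '$ 3,980' into a Decimal."""
    # Assumption: '$' is the only currency symbol and ',' separates thousands.
    cleaned = text.replace("$", "").replace(",", "").strip()
    return Decimal(cleaned)


print(parse_amount("$ 3,980"))  # 3980
```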

In [21]:
image = "https://www.accountingcoach.com/wp-content/uploads/2013/10/income-statement-example@2x.png"
qa_pipeline(
    image,
    "Issuer?"
)



[{'score': 0.9959858655929565,
  'answer': 'Example Corporation',
  'start': 0,
  'end': 1}]

In [22]:
image = "https://www.accountingcoach.com/wp-content/uploads/2013/10/income-statement-example@2x.png"
qa_pipeline(
    image,
    "Document type?"
)



[{'score': 0.4135207235813141,
  'answer': 'Example Corporation',
  'start': 0,
  'end': 1}]
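The last two questions return the same span ('Example Corporation', words 0 to 1), even though only one of them can plausibly be answered by it. Flagging questions that share a span within one document is a cheap sanity check. The sketch below uses the two results above, and `shared_spans` is a made-up helper name.

```python
from collections import defaultdict


def shared_spans(answers: dict[str, dict]) -> dict[tuple[int, int], list[str]]:
    """Group questions by (start, end) and keep spans hit by more than one question."""
    by_span = defaultdict(list)
    for question, result in answers.items():
        by_span[(result["start"], result["end"])].append(question)
    return {span: qs for span, qs in by_span.items() if len(qs) > 1}


# Top results copied from the two cells above.
answers = {
    "Issuer?": {"score": 0.9959858655929565, "answer": "Example Corporation", "start": 0, "end": 1},
    "Document type?": {"score": 0.4135207235813141, "answer": "Example Corporation", "start": 0, "end": 1},
}

print(shared_spans(answers))  # {(0, 1): ['Issuer?', 'Document type?']}
```

Spans are only comparable within the same document, so the check should be run per image. Combined with the score, it suggests that the lower-scoring "Document type?" answer is the less trustworthy one.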