<img align="right" width="400" src="https://www.fhnw.ch/de/++theme++web16theme/assets/media/img/fachhochschule-nordwestschweiz-fhnw-logo.svg" alt="FHNW Logo">


# Document Question Answering using Transformers

by Fabian Märki

## Summary
This notebook shows how a Hugging Face Transformers model can be used for document question answering, i.e. answering natural-language questions about the content of a document image such as an invoice.


## Links
- [Notebooks](https://huggingface.co/docs/transformers/notebooks) on different topics (fine-tuning, translation, summarization, question answering, audio classification, image classification, etc.)
- [Enabling GPU on Google Colab](https://www.tutorialspoint.com/google_colab/google_colab_using_free_gpu.htm)

<a href="https://colab.research.google.com/github/markif/2023_HS_DAS_NLP_Notebooks/blob/master/08_d_Transformers_Document_Question_Answering.ipynb">
  <img align="left" src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

In [1]:
%%capture

!pip install 'fhnw-nlp-utils>=0.8.0,<0.9.0'

**Make sure that a GPU is available (see [here](https://www.tutorialspoint.com/google_colab/google_colab_using_free_gpu.htm)).**

In [2]:
from fhnw.nlp.utils.system import set_log_level
from fhnw.nlp.utils.system import system_info

set_log_level()
print(system_info())

OS name: posix
Platform name: Linux
Platform release: 5.19.0-41-generic
Python version: 3.8.10
CPU cores: 6
RAM: 31.12GB total and 15.14GB available
Tensorflow version: 2.12.0
GPU is available
GPU is a NVIDIA GeForce RTX 2070 with Max-Q Design with 8192MiB
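
The output above confirms that a GPU is present. As a minimal sketch, assuming the pipeline runs on a PyTorch backend (the notebook does not state this explicitly), the helper below chooses a value for the `device` argument of `transformers.pipeline`: `0` for the first CUDA GPU and `-1` for the CPU. The name `pick_device` is hypothetical.

```python
def pick_device() -> int:
    """Return 0 if PyTorch sees a CUDA GPU, otherwise -1 (CPU)."""
    try:
        import torch  # assumption: the pipeline backend is PyTorch
    except ImportError:
        # Without PyTorch there is nothing to query, so fall back to CPU.
        return -1
    return 0 if torch.cuda.is_available() else -1


device = pick_device()
print(f"Using device index {device}")
```

The result could then be passed along when the pipeline is created, for example `pipeline("document-question-answering", model="impira/layoutlm-document-qa", device=device)`.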


In [3]:
%%capture

!pip install transformers
!apt-get update && apt-get install -y tesseract-ocr
!pip install Pillow pytesseract

In [4]:
from transformers import pipeline

qa_pipeline = pipeline(
    "document-question-answering",
    model="impira/layoutlm-document-qa",
)

In [5]:
%%time

image = "https://templates.invoicehome.com/invoice-template-us-neat-750px.png"
qa_pipeline(
    image,
    # Note: the question asks for the "invoice number", while the document labels the field "Invoice #"
    "What is the invoice number?"
)

CPU times: user 1.05 s, sys: 89.1 ms, total: 1.14 s
Wall time: 771 ms


[{'score': 0.4251469373703003, 'answer': 'us-001', 'start': 16, 'end': 16}]
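
In this result, `start` and `end` are word indices into the OCR'd text rather than character offsets, and both ends appear to be inclusive (a one-word answer has `start == end`). A minimal sketch of recovering an answer span from a word list follows. The helper name `answer_from_words`, the word list, and the sample result are made up for illustration and are not the actual OCR output for this invoice.

```python
from typing import Dict, List


def answer_from_words(words: List[str], result: Dict) -> str:
    """Join the words covered by an inclusive [start, end] word-index span."""
    return " ".join(words[result["start"]: result["end"] + 1])


# Hypothetical OCR word list and result (illustrative only).
words = ["INVOICE", "East", "Repair", "Inc.", "1912", "Harvest", "Lane"]
result = {"score": 0.86, "answer": "East Repair Inc.", "start": 1, "end": 3}

print(answer_from_words(words, result))  # East Repair Inc.
```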

In [6]:
from IPython.display import Image

# Display the document image that was queried above
Image(url=image)

In [7]:
%%time

image = "https://templates.invoicehome.com/invoice-template-us-neat-750px.png"
qa_pipeline(
    image,
    "What is the due date?"
)

CPU times: user 559 ms, sys: 16.8 ms, total: 576 ms
Wall time: 531 ms


[{'score': 0.9999262094497681, 'answer': '26/02/2019', 'start': 42, 'end': 42}]

In [8]:
%%time

image = "https://templates.invoicehome.com/invoice-template-us-neat-750px.png"
qa_pipeline(
    image,
    "Who is the buyer?"
)

CPU times: user 736 ms, sys: 24.8 ms, total: 760 ms
Wall time: 579 ms


[{'score': 0.13438594341278076,
  'answer': 'John Smith',
  'start': 17,
  'end': 18}]
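
The score of about 0.13 here is far lower than the score for the due date (about 0.9999), which suggests treating low-scoring answers with caution. A minimal sketch of such a filter follows. The helper name `best_answer` and the threshold of 0.5 are arbitrary assumptions, not recommendations from the model authors.

```python
from typing import Dict, List, Optional


def best_answer(results: List[Dict], min_score: float = 0.5) -> Optional[str]:
    """Return the highest-scoring answer, or None if it falls below min_score."""
    if not results:
        return None
    top = max(results, key=lambda r: r["score"])
    return top["answer"] if top["score"] >= min_score else None


# Outputs copied from the cells above.
due_date = [{"score": 0.9999262094497681, "answer": "26/02/2019", "start": 42, "end": 42}]
buyer = [{"score": 0.13438594341278076, "answer": "John Smith", "start": 17, "end": 18}]

print(best_answer(due_date))  # 26/02/2019
print(best_answer(buyer))     # None
```

Whether a rejected answer is actually wrong still has to be checked against the document; a low score only signals that the model is unsure.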

In [9]:
%%time

image = "https://templates.invoicehome.com/invoice-template-us-neat-750px.png"
qa_pipeline(
    image,
    "Who is the issuer?"
)

CPU times: user 747 ms, sys: 36.3 ms, total: 784 ms
Wall time: 583 ms


[{'score': 0.8599027395248413,
  'answer': 'East Repair Inc.',
  'start': 1,
  'end': 3}]

In [10]:
%%time

image = "https://miro.medium.com/max/787/1*iECQRIiOGTmEFLdWkVIH2g.jpeg"
qa_pipeline(
    image,
    "What is the purchase amount?"
)

CPU times: user 1.26 s, sys: 58.1 ms, total: 1.32 s
Wall time: 1.19 s


[{'score': 0.999853253364563,
  'answer': '$1,000,000,000',
  'start': 97,
  'end': 97}]

In [11]:
Image(url=image) 

In [12]:
%%time

image = "https://www.accountingcoach.com/wp-content/uploads/2013/10/income-statement-example@2x.png"
qa_pipeline(
    image,
    "What are the 2020 net sales?"
)

CPU times: user 590 ms, sys: 19.6 ms, total: 610 ms
Wall time: 879 ms


[{'score': 0.9780113101005554, 'answer': '$ 3,980', 'start': 15, 'end': 16}]

In [13]:
Image(url=image) 

In [14]:
image = "https://www.accountingcoach.com/wp-content/uploads/2013/10/income-statement-example@2x.png"
qa_pipeline(
    image,
    "Issuer?"
)

[{'score': 0.9570097923278809,
  'answer': 'Example Corporation',
  'start': 0,
  'end': 1}]

In [15]:
image = "https://www.accountingcoach.com/wp-content/uploads/2013/10/income-statement-example@2x.png"
qa_pipeline(
    image,
    "Document type?"
)

[{'score': 0.28877872228622437,
  'answer': 'Example Corporation',
  'start': 0,
  'end': 1}]
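
The last two cells return the same span for two different questions, and the low score for "Document type?" is consistent with a model that can only return text found in the document (an interpretation, not something the notebook states). Since the cells above repeat the same call for each question, the questions can also be collected in a list and asked in a loop. In the sketch below, `ask_all` is a hypothetical helper that accepts any callable with the `(image, question)` signature used above, and `fake_pipeline` is a stand-in that replays two recorded results so the example runs without downloading the model.

```python
from typing import Callable, Dict, List


def ask_all(qa: Callable[[str, str], List[Dict]], image: str, questions: List[str]) -> Dict[str, Dict]:
    """Ask every question about one image and keep the top result for each."""
    answers = {}
    for question in questions:
        results = qa(image, question)
        if results:  # skip questions for which nothing was returned
            answers[question] = max(results, key=lambda r: r["score"])
    return answers


def fake_pipeline(image: str, question: str) -> List[Dict]:
    """Stand-in for qa_pipeline that replays two results recorded above."""
    recorded = {
        "Issuer?": [{"score": 0.9570097923278809, "answer": "Example Corporation", "start": 0, "end": 1}],
        "Document type?": [{"score": 0.28877872228622437, "answer": "Example Corporation", "start": 0, "end": 1}],
    }
    return recorded.get(question, [])


image = "https://www.accountingcoach.com/wp-content/uploads/2013/10/income-statement-example@2x.png"
answers = ask_all(fake_pipeline, image, ["Issuer?", "Document type?"])
for question, result in answers.items():
    print(f"{question!r}: {result['answer']} (score {result['score']:.2f})")
```

With the real model, `qa_pipeline` would be passed in place of `fake_pipeline`.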