## Image question answering
  * from: https://huggingface.co/docs/transformers/pipeline_tutorial
  
The `pipeline()` function supports more than one modality. For example, a visual question answering (VQA) task combines text and an image. You can use any image link and any question you want to ask about the image; the image can be given as a URL or as a local file path.

In [1]:
image_invoice = 'https://huggingface.co/spaces/impira/docquery/resolve/2359223c1837a7587402bda0f2643382a6eefeab/invoice.png'

In [2]:
from IPython.display import Image

# Render the invoice inline in the notebook
Image(url=image_invoice)

In [3]:
from transformers import pipeline



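## Visual question answering (VQA)
  * `impira/layoutlm-document-qa` is a document question answering model. It runs OCR on the image first, so it needs the `pytesseract` Python package and the Tesseract binary installed.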
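Before building the pipeline, it can help to confirm that both OCR pieces are present. The sketch below is a hypothetical helper (`ocr_dependencies_available` is not part of `transformers`). It only checks that the `pytesseract` module is importable and that a `tesseract` executable is on `PATH`, which is where `pytesseract` looks by default.

```python
import importlib.util
import shutil
from typing import Dict


def ocr_dependencies_available() -> Dict[str, bool]:
    """Report whether the OCR pieces used by document QA appear to be installed."""
    return {
        # The Python wrapper around Tesseract
        "pytesseract": importlib.util.find_spec("pytesseract") is not None,
        # The Tesseract executable itself, looked up on PATH
        "tesseract_binary": shutil.which("tesseract") is not None,
    }


status = ocr_dependencies_available()
print(status)
```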

In [4]:
# The task (document question answering) is inferred from the model on the Hub
vqa = pipeline(model="impira/layoutlm-document-qa")

In [5]:
vqa(
    image=image_invoice,
    question="What is the invoice number?",
)

[{'score': 0.9998127222061157, 'answer': 'us-001', 'start': 15, 'end': 15}]
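Each call returns a list of candidate answers. `score` is the model's confidence, `answer` is the extracted text, and `start`/`end` are word indices into the OCR'd words (not character offsets), which is why the single-word `'us-001'` has `start == end`. A minimal sketch for picking the top candidate follows; `best_answer` and `DocQAAnswer` are illustrative names, not library APIs.

```python
from typing import List, Optional, TypedDict


class DocQAAnswer(TypedDict):
    score: float
    answer: str
    start: int  # index of the first OCR'd word of the answer
    end: int    # index of the last OCR'd word of the answer


def best_answer(results: List[DocQAAnswer]) -> Optional[DocQAAnswer]:
    """Return the highest-scoring candidate, or None if there are none."""
    if not results:
        return None
    return max(results, key=lambda r: r["score"])


# Output copied from the "invoice number" cell above
results = [{'score': 0.9998127222061157, 'answer': 'us-001', 'start': 15, 'end': 15}]
top = best_answer(results)
print(top["answer"], top["score"])
```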

In [6]:
vqa(
    image=image_invoice,
    question="What is the total price?",
)

[{'score': 0.995294988155365, 'answer': '$154.06', 'start': 74, 'end': 74}]
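Answers come back as plain strings, so amounts such as `'$154.06'` need parsing before any arithmetic. This is a hedged sketch that assumes a `.` decimal separator and a `,` thousands separator (other locales would need a different pattern); `parse_amount` is a hypothetical helper.

```python
import re
from decimal import Decimal
from typing import Optional

# Digits, optional thousands commas, optional fractional part (assumed format)
AMOUNT_PATTERN = re.compile(r"\d[\d,]*(?:\.\d+)?")


def parse_amount(answer: str) -> Optional[Decimal]:
    """Turn an answer such as '$154.06' or '1,200.00' into a Decimal, else None."""
    match = AMOUNT_PATTERN.search(answer)
    if match is None:
        return None
    # Drop thousands separators before converting
    return Decimal(match.group().replace(",", ""))


print(parse_amount("$154.06"), parse_amount("100.00"))
```

`Decimal` is used rather than `float` so that currency values keep their exact cents.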

In [7]:
vqa(
    image=image_invoice,
    question="When is the payment due?",
)

[{'score': 0.999400794506073,
  'answer': 'within 15 days.',
  'start': 84,
  'end': 86}]

In [8]:
vqa(
    image=image_invoice,
    question="Who is this bill from?",
)

[{'score': 0.931580662727356, 'answer': 'John Smith', 'start': 16, 'end': 17}]

In [9]:
vqa(
    image=image_invoice,
    question="What was the most expensive item on this invoice?",
)

[{'score': 0.592963457107544,
  'answer': 'brake cables',
  'start': 51,
  'end': 52}]

In [10]:
vqa(
    image=image_invoice,
    question="What was the cost of the most expensive item on this invoice?",
)

[{'score': 0.9887761473655701, 'answer': '100.00', 'start': 53, 'end': 53}]
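Because every question is a separate pipeline call, a small loop keeps the notebook shorter. This is a sketch: `ask_all` and the offline `fake_qa` stand-in are hypothetical, included only so the example runs without downloading the model. To use the real pipeline, pass `vqa` and `image_invoice` instead. The helper also accepts a single dict in case a call returns one result unwrapped.

```python
from typing import Any, Callable, Dict, List, Optional


def ask_all(qa: Callable[..., Any], image: str, questions: List[str]) -> Dict[str, Optional[str]]:
    """Ask every question about one image and keep the top-scoring answer text."""
    answers: Dict[str, Optional[str]] = {}
    for question in questions:
        results = qa(image=image, question=question)
        # Normalise a single dict into a list so both shapes are handled
        if isinstance(results, dict):
            results = [results]
        best = max(results, key=lambda r: r["score"], default=None)
        answers[question] = best["answer"] if best is not None else None
    return answers


# Hypothetical offline stand-in with the same keyword arguments as the pipeline
def fake_qa(image: str, question: str) -> list:
    return [{"score": 0.9, "answer": f"stub for {question!r}", "start": 0, "end": 0}]


questions = ["What is the invoice number?", "What is the total price?"]
answers = ask_all(fake_qa, "invoice.png", questions)
for question, answer in answers.items():
    print(f"{question} -> {answer}")
```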

In [11]:
invoice2 = 'https://www.zoho.com/invoice/images/invoice-templates/sales-invoice-template/sales-invoice-template-2x.jpg'

In [12]:
Image(url=invoice2)

In [13]:
vqa(
    image=invoice2,
    question="What is the sub total?",
)

[{'score': 0.3901703953742981, 'answer': '61.97', 'start': 129, 'end': 129}]

In [14]:
vqa(
    image=invoice2,
    question="When was the invoice issued?",
)

[{'score': 0.9991297125816345,
  'answer': '18 May 2023',
  'start': 40,
  'end': 42}]
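Dates are likewise returned as text. The sketch below tries a few formats with `datetime.strptime`. The format list is an assumption covering strings like `'18 May 2023'`, and month names are matched in English, assuming the process has not changed its `LC_TIME` locale. `parse_invoice_date` is a hypothetical helper.

```python
from datetime import date, datetime
from typing import Optional, Tuple

# Candidate formats (an assumption): "18 May 2023", "18 Sep 2023", "2023-05-18"
DATE_FORMATS: Tuple[str, ...] = ("%d %B %Y", "%d %b %Y", "%Y-%m-%d")


def parse_invoice_date(answer: str, formats: Tuple[str, ...] = DATE_FORMATS) -> Optional[date]:
    """Return the first successful parse of `answer`, or None if no format fits."""
    text = answer.strip().rstrip(".")  # answers may carry a trailing period
    for fmt in formats:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue  # try the next format
    return None


issued = parse_invoice_date("18 May 2023")
print(issued)  # prints 2023-05-18
```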

## Wrong answers

In [15]:
vqa(
    image=invoice2,
    question="What was the tax rate?",
)

[{'score': 0.5143300294876099, 'answer': '61.97', 'start': 129, 'end': 129}]

In [16]:
vqa(
    image=invoice2,
    question="What was the least expensive item?",
)

[{'score': 0.0004851834091823548, 'answer': '61.97', 'start': 129, 'end': 129}]

In [17]:
vqa(
    image=invoice2,
    question="What was the cheapest item?",
)

[{'score': 0.8215174674987793, 'answer': '$65.06', 'start': 131, 'end': 131}]
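The score alone does not separate right from wrong answers: the tax-rate answer above scored 0.51 and the cheapest-item answer 0.82. Another warning sign is that three different questions returned the same span (`'61.97'` at word 129). The sketch below flags an answer for manual review when its score is below a threshold or when the same span answers more than one question. `flag_suspect_answers` and its 0.9 default are hypothetical choices to tune, not library behavior.

```python
from collections import Counter
from typing import Dict, List


def flag_suspect_answers(results_by_question: Dict[str, dict], min_score: float = 0.9) -> List[str]:
    """Return questions whose top answer looks unreliable.

    Flagged when the score is below `min_score` (a hypothetical threshold)
    or when the exact same word span was returned for more than one question.
    """
    span_counts = Counter((r["start"], r["end"]) for r in results_by_question.values())
    return [
        question
        for question, r in results_by_question.items()
        if r["score"] < min_score or span_counts[(r["start"], r["end"])] > 1
    ]


# Top answers for the second invoice, copied from the cells above
invoice2_results = {
    "What is the sub total?": {"score": 0.3901703953742981, "answer": "61.97", "start": 129, "end": 129},
    "When was the invoice issued?": {"score": 0.9991297125816345, "answer": "18 May 2023", "start": 40, "end": 42},
    "What was the tax rate?": {"score": 0.5143300294876099, "answer": "61.97", "start": 129, "end": 129},
    "What was the least expensive item?": {"score": 0.0004851834091823548, "answer": "61.97", "start": 129, "end": 129},
    "What was the cheapest item?": {"score": 0.8215174674987793, "answer": "$65.06", "start": 131, "end": 131},
}

for question in flag_suspect_answers(invoice2_results):
    print("check manually:", question)
```

With `min_score=0.5` the cheapest-item answer is no longer flagged, so a threshold reduces wrong answers but does not eliminate them.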
