# **Chetan Khadke**

LinkedIn: https://www.linkedin.com/in/khadke-chetan/


## Set-up environment

Install 🤗 Transformers, datasets and SentencePiece.

Note: Enable the GPU runtime before running the cells below. The `nvidia-smi` call confirms that a GPU is attached.

In [1]:
!nvidia-smi

Sun Aug 28 08:24:52 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla P100-PCIE...  Off  | 00000000:00:04.0 Off |                    0 |
| N/A   42C    P0    26W / 250W |      0MiB / 16280MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

In [2]:
!pip install -q git+https://github.com/huggingface/transformers.git

In [None]:
!pip install -q datasets sentencepiece

## Load image

In [4]:
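# Download the sample payslip and open it with Pillow
import urllib.request
from PIL import Image

# The source file is a JPEG, so keep the .jpg extension locally
urllib.request.urlretrieve(
    "https://paysliper.com/assets/templates/image/list1.jpg",
    "sample.jpg",
)

# Make sure the image has three colour channels before preprocessing
image = Image.open("sample.jpg").convert("RGB")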

## Load model and processor

In [None]:
from transformers import DonutProcessor, VisionEncoderDecoderModel

processor = DonutProcessor.from_pretrained("naver-clova-ix/donut-base-finetuned-docvqa")
model = VisionEncoderDecoderModel.from_pretrained("naver-clova-ix/donut-base-finetuned-docvqa")

## Prepare using processor

Prepare the image for the model using **`DonutProcessor`**, which resizes and normalizes it into a tensor of pixel values.

In [9]:
pixel_values = processor(image, return_tensors="pt").pixel_values
print(pixel_values.shape)

torch.Size([1, 3, 2560, 1920])
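
The printed shape reads as (batch, channels, height, width): one image, three colour channels, 2560 by 1920 pixels. As a rough sanity check, the arithmetic below estimates how much memory that tensor occupies. It is only a sketch: it assumes 4-byte `float32` values (the dtype is not shown in the output above), and the variable names are illustrative.

```python
# Shape printed above: (batch, channels, height, width)
shape = (1, 3, 2560, 1920)

# Assumption: float32 storage, i.e. 4 bytes per element
BYTES_PER_ELEMENT = 4

# Multiply the dimensions to get the element count
num_elements = 1
for dim in shape:
    num_elements *= dim

# Convert bytes to MiB (1 MiB = 2**20 bytes)
size_mib = num_elements * BYTES_PER_ELEMENT / 2**20
print(f"{num_elements:,} elements, about {size_mib:.2f} MiB")
```

Under that assumption, a single page is small next to the 16 GB reported by `nvidia-smi`; most GPU memory goes to the model weights and to activations during generation.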


## Prediction
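Generate an answer for each question. Every question is wrapped in the DocVQA task prompt and passed to the decoder together with the image, and the decoder then generates the answer tokens.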
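
The prompt is plain string templating, which can be tried on its own without the model. The sketch below reuses the template string from the prediction cell; `build_prompt` is a hypothetical helper name, not part of 🤗 Transformers. The prompt deliberately ends with an open `<s_answer>` tag so that the decoder continues from there.

```python
# Template copied from the prediction cell; "{user_input}" is a plain-text placeholder
TASK_PROMPT = "<s_docvqa><s_question>{user_input}</s_question><s_answer>"


def build_prompt(question: str) -> str:
    """Fill the DocVQA template with one question (hypothetical helper)."""
    return TASK_PROMPT.replace("{user_input}", question.strip())


prompt = build_prompt("How many working days?")
print(prompt)
```

Using `str.replace` rather than `str.format` mirrors the notebook and avoids surprises if a question happens to contain curly braces.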

In [13]:
import torch

# Run on the GPU when one is available
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

questions = [
    "what is the employee name?",
    "How many working days?",
    "What is the final net amount?",
    "How many total deduction?",
]

# DocVQA prompt: the decoder continues generating after <s_answer>
task_prompt = "<s_docvqa><s_question>{user_input}</s_question><s_answer>"

for question in questions:
    prompt = task_prompt.replace("{user_input}", question)
    decoder_input_ids = processor.tokenizer(
        prompt, add_special_tokens=False, return_tensors="pt"
    )["input_ids"]

    # Greedy decoding (num_beams=1); the <unk> token is blocked
    outputs = model.generate(
        pixel_values.to(device),
        decoder_input_ids=decoder_input_ids.to(device),
        max_length=model.decoder.config.max_position_embeddings,
        pad_token_id=processor.tokenizer.pad_token_id,
        eos_token_id=processor.tokenizer.eos_token_id,
        use_cache=True,
        num_beams=1,
        bad_words_ids=[[processor.tokenizer.unk_token_id]],
        return_dict_in_generate=True,
        output_scores=True,
    )

    # Decode the token ids, then convert the tagged string into a dict
    seq = processor.batch_decode(outputs.sequences)[0]
    print(processor.token2json(seq))

{'question': 'what is the employee name?', 'answer': 'sally harley'}
{'question': 'How many working days?', 'answer': '26'}
{'question': 'What is the final net amount?', 'answer': '9500'}
{'question': 'How many total deduction?', 'answer': '2100'}
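
To see how a tagged string becomes the dictionaries printed above, the sketch below pulls `<s_key>value</s_key>` pairs out with a regular expression. It is a simplified stand-in and not the library's `token2json` implementation: `parse_flat_fields` is a hypothetical name, it handles only flat (non-nested) fields, and the sample string is an assumed example shaped like the prompt followed by a generated answer.

```python
import re


def parse_flat_fields(sequence: str) -> dict:
    """Collect <s_key>value</s_key> pairs from a decoded string.

    Simplified illustration: flat fields only, not the real token2json.
    """
    fields = {}
    # \1 forces the closing tag to carry the same key as the opening tag
    pattern = r"<s_(\w+?)>(.*?)</s_\1>"
    for key, value in re.findall(pattern, sequence, flags=re.DOTALL):
        fields[key] = value.strip()
    return fields


# Assumed example of a decoded sequence (not captured from the model)
sample = (
    "<s_docvqa><s_question> How many working days?</s_question>"
    "<s_answer> 26</s_answer></s>"
)
print(parse_flat_fields(sample))
```

The opening `<s_docvqa>` tag has no matching close in the sample, so it is skipped and only the `question` and `answer` fields are returned.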
