## Loading the Model from Hugging Face Transformers

In [1]:
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('ucaslcl/GOT-OCR2_0', trust_remote_code=True)
model = AutoModel.from_pretrained('ucaslcl/GOT-OCR2_0', trust_remote_code=True, low_cpu_mem_usage=True, device_map='cuda', use_safetensors=True, pad_token_id=tokenizer.eos_token_id)
model = model.eval().cuda()

## Usage

```
# input your test image
image_file = 'xxx.jpg'

# plain texts OCR
res = model.chat(tokenizer, image_file, ocr_type='ocr')

# format texts OCR:
# res = model.chat(tokenizer, image_file, ocr_type='format')

# fine-grained OCR:
# res = model.chat(tokenizer, image_file, ocr_type='ocr', ocr_box='')
# res = model.chat(tokenizer, image_file, ocr_type='format', ocr_box='')
# res = model.chat(tokenizer, image_file, ocr_type='ocr', ocr_color='')
# res = model.chat(tokenizer, image_file, ocr_type='format', ocr_color='')

# multi-crop OCR:
# res = model.chat_crop(tokenizer, image_file, ocr_type='ocr')
# res = model.chat_crop(tokenizer, image_file, ocr_type='format')

# render the formatted OCR results:
# res = model.chat(tokenizer, image_file, ocr_type='format', render=True, save_render_file = './demo.html')

print(res)

```
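
The usage block above switches between OCR modes by commenting lines in and out. As a minimal sketch, the choice could instead be made explicit with a small dispatcher. The helper name `run_ocr` and the mode names are hypothetical (they are not part of the model's API); the helper only forwards to `model.chat` and `model.chat_crop` with the `ocr_type` argument shown above.

```python
def run_ocr(model, tokenizer, image_file, mode="plain"):
    """Call the OCR method matching a mode name (hypothetical helper).

    The mode names are illustrative; each one maps to a call that
    appears in the usage block above.
    """
    # (method name, ocr_type) pairs taken from the usage block
    modes = {
        "plain": ("chat", "ocr"),
        "format": ("chat", "format"),
        "crop_plain": ("chat_crop", "ocr"),
        "crop_format": ("chat_crop", "format"),
    }
    if mode not in modes:
        raise ValueError(f"unknown mode {mode!r}; expected one of {sorted(modes)}")
    method_name, ocr_type = modes[mode]
    # Look up the method on the model and forward the arguments unchanged
    return getattr(model, method_name)(tokenizer, image_file, ocr_type=ocr_type)
```

With the model and tokenizer loaded as in the first cell, `res = run_ocr(model, tokenizer, image_file, mode="format")` would be equivalent to the commented-out formatted OCR call.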

## Test of PDF OCR

Check the online [demo](https://huggingface.co/spaces/stepfun-ai/GOT_official_online_demo) for more details.

In [17]:
import numpy as np
import cv2
from pdf2image import convert_from_path
from datetime import datetime as dt

# Convert each PDF page to a PIL image, then save it as a JPEG
images = convert_from_path("test_pdf_ocr/sample_file.pdf")
images_paths = []
for i, image in enumerate(images):
    images_paths.append(f"test_pdf_ocr/image-{i}.jpg")
    # pdf2image returns RGB images, but cv2.imwrite expects BGR channel order
    cv2.imwrite(images_paths[-1], cv2.cvtColor(np.array(image), cv2.COLOR_RGB2BGR))

In [19]:
# Run the OCR model on each page image; the inference time in seconds goes into the .txt filename
for path in images_paths:
    a = dt.now()
    res = model.chat(tokenizer, path, ocr_type='format', render=True, save_render_file = path.replace('.jpg','.html'))
    b = dt.now()
    with open(path.replace(".jpg",f"_{round((b-a).total_seconds(),2)}.txt"),"w") as f:
        f.write(res)

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:151643 for open-end generation.
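
The loop above times each page with `datetime.now()` and encodes the elapsed seconds in the name of the `.txt` file. As a sketch of the same idea, `time.perf_counter()` is a monotonic clock intended for measuring intervals, and `pathlib` can build the output name without string replacement. The helper names `timed` and `timed_output_path` are hypothetical and only illustrate the pattern.

```python
import time
from pathlib import Path


def timed(fn, *args, **kwargs):
    """Run fn and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def timed_output_path(image_path, seconds):
    """Build the .txt path with the rounded duration, as in the loop above."""
    path = Path(image_path)
    # e.g. image-0.jpg with 1.234 s becomes image-0_1.23.txt in the same folder
    return path.with_name(f"{path.stem}_{round(seconds, 2)}.txt")
```

Inside the loop, this would read `res, seconds = timed(model.chat, tokenizer, path, ocr_type='format')` followed by writing `res` to `timed_output_path(path, seconds)`.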




## Test of PDF OCR on a Formatted Paper

The test document is the paper cited below. Any other paper in PDF form can be used instead.

```
@conference{visapp23,
author={Emmanuel Morán. and Boris Vintimilla. and Miguel Realpe.},
title={Towards a Robust Solution for the Supermarket Shelf Audit Problem},
booktitle={Proceedings of the 18th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2023) - Volume 4: VISAPP},
year={2023},
pages={912-919},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0011747000003417},
isbn={978-989-758-634-7},
issn={2184-4321},
}
```

In [20]:
import numpy as np
import cv2
from pdf2image import convert_from_path
from datetime import datetime as dt

# Convert each PDF page to a PIL image, then save it as a JPEG
images = convert_from_path("test_pdf_paper_format_ocr/117470.pdf")
images_paths = []
for i, image in enumerate(images):
    images_paths.append(f"test_pdf_paper_format_ocr/image-{i}.jpg")
    # pdf2image returns RGB images, but cv2.imwrite expects BGR channel order
    cv2.imwrite(images_paths[-1], cv2.cvtColor(np.array(image), cv2.COLOR_RGB2BGR))

In [22]:
# Run the OCR model on each page image; the inference time in seconds goes into the .txt filename
for path in images_paths:
    a = dt.now()
    res = model.chat(tokenizer, path, ocr_type='format', render=True, save_render_file = path.replace('.jpg','.html'))
    b = dt.now()
    with open(path.replace(".jpg",f"_{round((b-a).total_seconds(),2)}.txt"),"w") as f:
        f.write(res)

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:151643 for open-end generation.


