# Installation

Install all the dependencies to make the most out of docTR. The project provides two main [installation](https://mindee.github.io/doctr/latest/installing.html) streams: one for the stable release (updated every 45 days on average), and one for developer mode.

## Latest stable release

This installs the latest stable release published by our team on PyPI. It is expected to provide a clean and reliable experience for all users.

In [None]:
# TensorFlow
# !pip install python-doctr[tf]
# PyTorch
!pip install python-doctr[torch]
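To check which docTR version pip actually installed (useful later, when the source install replaces the PyPI one), the standard library's `importlib.metadata` is enough. This is a small sketch: the helper name `installed_version` is ours, and it assumes the distribution name `python-doctr` used in the pip commands above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "python-doctr") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


print("python-doctr:", installed_version() or "not installed")
```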

## From source

Before a stable release is staged, we constantly iterate on community feedback to improve the library. Bug fixes and performance improvements are regularly pushed to the project's Git repository. With this installation method, you get access to all the latest features that have not yet made their way into a PyPI release!

In [None]:
# Skip this cell once you have restarted the runtime after the first install
# Install the most up-to-date version from GitHub
# TensorFlow
# !pip install -e git+https://github.com/mindee/doctr.git#egg=python-doctr[tf]
# PyTorch
!pip install -e git+https://github.com/mindee/doctr.git#egg=python-doctr[torch]

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Obtaining python-doctr[torch] from git+https://github.com/mindee/doctr.git#egg=python-doctr[torch]
  Updating ./src/python-doctr clone
  Running command git fetch -q --tags
  Running command git reset --hard -q acb9f64b11ebad8e53ac60737fcde8dbd3158a22
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
    Preparing wheel metadata ... done
Installing collected packages: python-doctr
  Attempting uninstall: python-doctr
    Found existing installation: python-doctr 0.6.0
    Uninstalling python-doctr-0.6.0:
      Successfully uninstalled python-doctr-0.6.0
  Running setup.py develop for python-doctr
Successfully installed python-doctr


Now go to `Runtime/Restart runtime` for your changes to take effect!

# Basic usage

We're going to review the main features of docTR 🎁
To get a proper overview of its capabilities, we first need some free fonts to render the output visualizations:

In [None]:
# Install some free fonts for result rendering
!sudo apt-get install fonts-freefont-ttf -y

Reading package lists... Done
Building dependency tree       
Reading state information... Done
fonts-freefont-ttf is already the newest version (20120503-7).
The following package was automatically installed and is no longer required:
  libnvidia-common-460
Use 'sudo apt autoremove' to remove it.
0 upgraded, 0 newly installed, 0 to remove and 5 not upgraded.


Let's take care of all the imports right away.

In [None]:
%matplotlib inline
import os

# Let's pick the desired backend
# os.environ['USE_TF'] = '1'
os.environ['USE_TORCH'] = '1'

import matplotlib.pyplot as plt

from doctr.io import DocumentFile
from doctr.models import ocr_predictor
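Since the backend is chosen through the `USE_TF` / `USE_TORCH` environment variables, they have to be set before `doctr` is imported. The hypothetical helper below (not part of docTR) sketches one way to pick the backend automatically, assuming you prefer PyTorch when both frameworks are installed. It only checks which packages are importable and does not import either framework.

```python
import importlib.util
import os


def select_doctr_backend(prefer: str = "torch") -> str:
    """Hypothetical helper: set USE_TORCH or USE_TF before importing doctr.

    Only checks which frameworks are importable; neither is actually imported.
    """
    available = {
        "torch": importlib.util.find_spec("torch") is not None,
        "tf": importlib.util.find_spec("tensorflow") is not None,
    }
    # Try the preferred framework first, then fall back to the other one
    order = [prefer] + [name for name in ("torch", "tf") if name != prefer]
    for name in order:
        if available.get(name, False):
            os.environ["USE_TORCH" if name == "torch" else "USE_TF"] = "1"
            return name
    raise ImportError("Neither PyTorch nor TensorFlow is installed")
```

It would replace the manual `os.environ[...]` lines and must run before `from doctr.io import DocumentFile`.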

For the next steps, we need a PDF document to showcase the library's features.

In [None]:
# Download a sample
!wget https://eforms.com/download/2019/01/Cash-Payment-Receipt-Template.pdf
# Read the file
doc = DocumentFile.from_pdf("Cash-Payment-Receipt-Template.pdf")
print(f"Number of pages: {len(doc)}")

--2022-11-11 14:45:02--  https://eforms.com/download/2019/01/Cash-Payment-Receipt-Template.pdf
Resolving eforms.com (eforms.com)... 52.206.2.160
Connecting to eforms.com (eforms.com)|52.206.2.160|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 16080 (16K) [application/pdf]
Saving to: ‘Cash-Payment-Receipt-Template.pdf’


2022-11-11 14:45:02 (203 MB/s) - ‘Cash-Payment-Receipt-Template.pdf’ saved [16080/16080]

Number of pages: 1


Under the hood, docTR runs Deep Learning models to perform the different tasks it supports. Those models were built and trained with very popular frameworks for maximum compatibility (you will be pleased to know that you can switch from [PyTorch](https://pytorch.org/) to [TensorFlow](https://www.tensorflow.org/) without noticing any difference). By default, our high-level API sets sensible default values so that you get high-performing models without having to know anything about them. All of this is wrapped in a `Predictor` object, which takes care of pre-processing, model inference and post-processing for you ⚡

Let's instantiate one!

In [None]:
# Instantiate a pretrained model
predictor = ocr_predictor(pretrained=True)



Downloading https://doctr-static.mindee.com/models?id=v0.3.1/db_resnet50-ac60cadc.pt&src=0 to /root/.cache/doctr/models/db_resnet50-ac60cadc.pt



Downloading https://doctr-static.mindee.com/models?id=v0.3.1/crnn_vgg16_bn-9762b0b0.pt&src=0 to /root/.cache/doctr/models/crnn_vgg16_bn-9762b0b0.pt



By default, PyTorch models provide a nice visual description of their architecture when printed, which is handy for debugging or for knowing what you just created. We also added a similar feature for the TensorFlow backend so that you don't miss out on this nice assistance.

Let's dive into this model 🕵

In [None]:
# Display the architecture
print(predictor)

Here we are inspecting the most complex (and high-level) object of the docTR API: an OCR predictor. Since docTR achieves Optical Character Recognition by first localizing textual elements (Text Detection), then extracting the corresponding text from each location (Text Recognition), the OCR predictor wraps two sub-predictors: one for text detection, and the other for text recognition.
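Conceptually, that two-stage flow can be sketched in a few lines of plain Python. This is a simplified illustration, not docTR's implementation: `detect`, `crop` and `recognize` are hypothetical callables standing in for the detection predictor, the cropping step and the recognition predictor, and boxes are assumed to be relative `(xmin, ymin, xmax, ymax)` coordinates.

```python
from typing import Any, Callable, List, Sequence, Tuple

# Hypothetical relative box format: (xmin, ymin, xmax, ymax) in [0, 1]
Box = Tuple[float, float, float, float]


def two_stage_ocr(
    page: Any,
    detect: Callable[[Any], List[Box]],
    crop: Callable[[Any, Box], Any],
    recognize: Callable[[Sequence[Any]], List[Tuple[str, float]]],
) -> List[Tuple[Box, str, float]]:
    """Run text detection, crop each detected region, then recognize all crops."""
    # Stage 1: localize textual elements on the page
    boxes = detect(page)
    # Crop every detected region so the recognizer only sees text snippets
    crops = [crop(page, box) for box in boxes]
    # Stage 2: recognize all crops in one batch, one (text, confidence) pair per crop
    predictions = recognize(crops) if crops else []
    return [(box, text, conf) for box, (text, conf) in zip(boxes, predictions)]


# Toy usage with stand-in functions
words = two_stage_ocr(
    page="dummy page",
    detect=lambda page: [(0.31, 0.06, 0.40, 0.08), (0.41, 0.06, 0.55, 0.08)],
    crop=lambda page, box: box,
    recognize=lambda crops: [("CASH", 0.99), ("PAYMENT", 0.98)][: len(crops)],
)
print(words)
```

Batching all crops into a single recognition call is the design choice that matters most for speed, which the benchmarks later in this notebook also hint at.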

## Basic inference

It looks quite complex, doesn't it?
Well, that will not prevent you from easily getting nice results. See for yourself:

In [None]:
result = predictor(doc)

## Prediction visualization

If you prefer to see the results with your own eyes, docTR includes a few visualization features. We will first overlay our predictions on the original document:

In [None]:
result.show(doc)

Looks accurate!
But we can go further: if the extracted information is correctly structured, we should be able to recreate the page entirely. So let's do this 🎨

In [None]:
synthetic_pages = result.synthesize()
plt.imshow(synthetic_pages[0])
plt.axis('off')
plt.show()

## Exporting results

OK, so the predictions are relevant, but how would you integrate this into your own document processing pipeline? Perhaps you're not using Python at all?

Well, if you happen to be using JSON or XML exports, they are already supported 🤗

In [None]:
# JSON export
json_export = result.export()
print(json_export)

{'pages': [{'page_idx': 0, 'dimensions': (1584, 1224), 'orientation': {'value': None, 'confidence': None}, 'language': {'value': None, 'confidence': None}, 'blocks': [{'geometry': ((0.314453125, 0.0634765625), (0.6875, 0.083984375)), 'lines': [{'geometry': ((0.314453125, 0.0634765625), (0.6875, 0.083984375)), 'words': [{'value': 'CASH', 'confidence': 0.9990693926811218, 'geometry': ((0.314453125, 0.064453125), (0.3984375, 0.0830078125))}, {'value': 'PAYMENT', 'confidence': 0.9999754428863525, 'geometry': ((0.40625, 0.0634765625), (0.5517578125, 0.083984375))}, {'value': 'RECEIPT', 'confidence': 0.9987125396728516, 'geometry': ((0.5595703125, 0.0634765625), (0.6875, 0.083984375))}]}], 'artefacts': []}, {'geometry': ((0.1162109375, 0.11328125), (0.2783203125, 0.3037109375)), 'lines': [{'geometry': ((0.1181640625, 0.11328125), (0.2783203125, 0.1328125)), 'words': [{'value': 'Company', 'confidence': 0.9992720484733582, 'geometry': ((0.1181640625, 0.1142578125), (0.2099609375, 0.1328125))},
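The printed structure nests `pages` → `blocks` → `lines` → `words`, each word carrying `value`, `confidence` and `geometry`. Judging from the sample, `geometry` appears to be relative `((xmin, ymin), (xmax, ymax))` and `dimensions` appears to be `(height, width)`: scaling the first word, `CASH`, by a 1224 x 1584 page gives `385 102 488 131`, the same bbox as in the XML export below. A minimal sketch built on those assumptions (the helper names are ours):

```python
from typing import Any, Dict, List, Sequence, Tuple


def export_to_lines(export: Dict[str, Any], min_confidence: float = 0.0) -> List[str]:
    """Rebuild plain-text lines from the dictionary returned by result.export()."""
    lines = []
    for page in export["pages"]:
        for block in page["blocks"]:
            for line in block["lines"]:
                # Keep only words whose recognition confidence passes the threshold
                kept = [w["value"] for w in line["words"] if w["confidence"] >= min_confidence]
                if kept:
                    lines.append(" ".join(kept))
    return lines


def to_pixels(
    geometry: Sequence[Sequence[float]], dimensions: Tuple[int, int]
) -> Tuple[int, int, int, int]:
    """Convert relative ((xmin, ymin), (xmax, ymax)) to absolute pixel coordinates.

    Assumes `dimensions` is (height, width), as the sample output suggests.
    """
    (xmin, ymin), (xmax, ymax) = geometry
    height, width = dimensions
    return round(xmin * width), round(ymin * height), round(xmax * width), round(ymax * height)
```

With the export above, `print("\n".join(export_to_lines(json_export)))` would print the page text line by line.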

In [None]:
# XML export
xml_output = result.export_as_xml()
print(xml_output[0][0])

b'<html xml:lang="en" xmlns="http://www.w3.org/1999/xhtml"><head><title>docTR - XML export (hOCR)</title><meta content="text/html; charset=utf-8" http-equiv="Content-Type" /><meta content="python-doctr 0.4.1a0" name="ocr-system" /><meta content="ocr_page ocr_carea ocr_par ocr_line ocrx_word" name="ocr-capabilities" /></head><body><div class="ocr_page" id="page_1" title="image; bbox 0 0 1224 1584; ppageno 0" /><div class="ocr_carea" id="block_1" title="bbox 385 101                     842 133"><p class="ocr_par" id="par_1" title="bbox 385 101                     842 133"><span class="ocr_line" id="line_1" title="bbox 385 101                         842 133;                         baseline 0 0; x_size 0; x_descenders 0; x_ascenders 0"><span class="ocrx_word" id="word_1" title="bbox 385 102                             488 131;                             x_wconf 100">CASH</span><span class="ocrx_word" id="word_2" title="bbox 497 101                             675 133;                   
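The printed value starts with `b'<html ...`, so `xml_output[0][0]` appears to be the hOCR page as raw bytes. Each recognized word is a `span` with class `ocrx_word` whose `title` starts with `bbox x0 y0 x1 y1`. A minimal stdlib sketch for pulling words and boxes back out (the helper name `hocr_words` is ours; the XHTML namespace is the one declared in the output):

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Namespace declared on the <html> root of the exported page
XHTML = "{http://www.w3.org/1999/xhtml}"


def hocr_words(hocr: bytes) -> List[Tuple[str, Tuple[int, int, int, int]]]:
    """Extract (text, (x0, y0, x1, y1)) for every ocrx_word span in an hOCR page."""
    root = ET.fromstring(hocr)
    words = []
    for span in root.iter(f"{XHTML}span"):
        if span.get("class") != "ocrx_word":
            continue
        # The title looks like "bbox 385 102 488 131; x_wconf 100"
        bbox_part = span.get("title", "").split(";")[0].split()
        if not bbox_part or bbox_part[0] != "bbox":
            continue
        x0, y0, x1, y1 = (int(v) for v in bbox_part[1:5])
        words.append((span.text or "", (x0, y0, x1, y1)))
    return words
```

In this notebook, `hocr_words(xml_output[0][0])` should start with `('CASH', (385, 102, 488, 131))`.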

In [None]:
model = ocr_predictor(reco_arch="crnn_vgg16_bn", pretrained=True)
# eval() switches the recognition model to inference mode in place
model.reco_predictor.model.eval()



In [None]:
# !pip install onnxoptimizer
# !pip install onnxruntime
# !pip install onnx
# !pip uninstall -y pillow
# !pip install "pillow<7"
# !pip install tf2onnx
# !pip install tensorflow-addons

In [None]:
import time

import numpy as np
import onnx
import onnxoptimizer
import onnxruntime
import torch
import torch.onnx
from torch.cuda.amp import autocast

from doctr.models import ocr_predictor

In [None]:
# Dummy inputs: a single 3x32x128 crop and a batch of 49 such crops
input = torch.randn(1, 3, 32, 128)
input2 = torch.randn(49, 3, 32, 128)

In [None]:
input = input.type(torch.FloatTensor).to("cuda")
model = model.type(torch.FloatTensor).to("cuda")

In [None]:
start = time.time()
valid_pred = model.reco_predictor.model(input.to("cuda"))
print("fp32 first pred time single sample", time.time() - start)
# 100 forward passes on the single-sample input; the printed value is the total time
start = time.time()
for i in range(100):
    pred = model.reco_predictor.model(input.cuda())
print("fp32 gpu pytorch time single sample", time.time() - start)

fp32 first pred time single sample 0.7706351280212402
fp32 gpu pytorch time single sample 0.4912567138671875
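The first call is much slower than the following ones, and CUDA kernels are launched asynchronously, so reading the clock without `torch.cuda.synchronize()` may not capture all of the GPU work. Below is a minimal, framework-agnostic timing sketch that warms up first and accepts an optional `sync` hook; `benchmark` is a hypothetical helper, not part of docTR or PyTorch.

```python
import time
from typing import Callable, Optional


def benchmark(
    fn: Callable[[], object],
    n_iters: int = 100,
    warmup: int = 3,
    sync: Optional[Callable[[], None]] = None,
) -> float:
    """Return the mean wall-clock time per call of `fn`, in seconds."""
    # Warm-up calls absorb one-off costs (lazy initialization, caching, autotuning)
    for _ in range(warmup):
        fn()
    if sync is not None:
        sync()  # drain queued asynchronous work before starting the clock
    start = time.perf_counter()
    for _ in range(n_iters):
        fn()
    if sync is not None:
        sync()  # make sure the timed work has actually finished
    return (time.perf_counter() - start) / n_iters
```

Here that could look like `with torch.no_grad(): t = benchmark(lambda: model.reco_predictor.model(input), sync=torch.cuda.synchronize)`, where `torch.no_grad()` also skips autograd bookkeeping that inference does not need.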


In [None]:
start = time.time()
for i in range(100):
    pred = model.reco_predictor.model(input2.cuda())
print("fp32 gpu pytorch time big batch", time.time() - start)

fp32 gpu pytorch time big batch 3.6229426860809326
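Normalizing the two recorded totals per crop makes the effect of batching easier to read. This is only arithmetic on the numbers printed above, so it inherits their caveats (a single unsynchronized run).

```python
# Totals recorded in the two cells above; each covers 100 forward passes
n_iters = 100
single_total_s = 0.4912567138671875  # batch of 1 crop
batch_total_s = 3.6229426860809326   # batch of 49 crops
batch_size = 49

per_crop_single_ms = single_total_s / n_iters * 1000
per_crop_batch_ms = batch_total_s / (n_iters * batch_size) * 1000
print(f"{per_crop_single_ms:.2f} ms/crop unbatched, {per_crop_batch_ms:.2f} ms/crop batched")
# prints: 4.91 ms/crop unbatched, 0.74 ms/crop batched
```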


In [None]:
# Half precision: autocast runs eligible ops in float16 (mixed precision)
start = time.time()
with autocast():
    pred = model.reco_predictor.model(input.to("cuda"))
print("fp16 first pred time single sample", time.time() - start)
start = time.time()
with autocast():
    for i in range(100):
        pred = model.reco_predictor.model(input.cuda())
print("fp16 gpu pytorch time single sample", time.time() - start)

fp16 first pred time single sample 0.02126455307006836
fp16 gpu pytorch time single sample 0.515648603439331


In [None]:
# Check cuda version
!nvcc --version

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Sun_Feb_14_21:12:58_PST_2021
Cuda compilation tools, release 11.2, V11.2.152
Build cuda_11.2.r11.2/compiler.29618528_0


In [None]:
# !pip3 install --pre torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/nightly/cu112
!pip3 install numpy --pre torch[dynamo] --force-reinstall --extra-index-url https://download.pytorch.org/whl/nightly/cu117

In [None]:
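# `accelerator` is never defined in this notebook, so IPython leaves `{accelerator}` unexpanded (see the URL in the output below)
!pip install torch -f https://download.pytorch.org/whl/nightly/{accelerator}/torch_nightly.html
!pip install --pre torch torchvision -f https://download.pytorch.org/whl/nightly/cu102/torch_nightly.html -U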

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Looking in links: https://download.pytorch.org/whl/nightly/{accelerator}/torch_nightly.html
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Looking in links: https://download.pytorch.org/whl/nightly/cu102/torch_nightly.html
Collecting torchvision
  Downloading torchvision-0.14.0-cp37-cp37m-manylinux1_x86_64.whl (24.3 MB)
     |████████████████████████████████| 24.3 MB 1.2 MB/s
Collecting torch
  Downloading torch-1.13.0-cp37-cp37m-manylinux1_x86_64.whl (890.2 MB)
     |██████████████████████████████  | 834.1 MB 1.2 MB/s eta 0:00:48
tcmalloc: large alloc 1147494400 bytes == 0x233e000

In [None]:
!git clone https://github.com/pytorch/functorch
!pip install functorch

Cloning into 'functorch'...
remote: Enumerating objects: 16561, done.
remote: Counting objects: 100% (16560/16560), done.
remote: Compressing objects: 100% (4205/4205), done.
remote: Total 16561 (delta 12227), reused 16345 (delta 12073), pack-reused 1
Receiving objects: 100% (16561/16561), 9.33 MiB | 15.52 MiB/s, done.
Resolving deltas: 100% (12227/12227), done.
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting functorch
  Downloading functorch-1.13.0-py2.py3-none-any.whl (2.1 kB)
Installing collected packages: functorch
Successfully installed functorch-1.13.0


In [None]:
print("--> Restarting colab instance") 
get_ipython().kernel.do_shutdown(True)

--> Restarting colab instance


{'status': 'ok', 'restart': True}

In [None]:
import torch
from functorch import vmap, grad

x = torch.randn(3)
y = vmap(torch.sin)(x)
assert torch.allclose(y, x.sin())

x = torch.randn([])
y = grad(torch.sin)(x)
assert torch.allclose(y, x.cos())

In [None]:
# !git clone https://github.com/pytorch/torchdynamo
# cd torchdynamo
# !pip install -r requirements.txt
# !pip install -e .

In [None]:
!pip install torchdynamo

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting torchdynamo
  Downloading torchdynamo-1.13.0.tar.gz (274 kB)
     |████████████████████████████████| 274 kB 5.2 MB/s
Building wheels for collected packages: torchdynamo
  Building wheel for torchdynamo (setup.py) ... [?25l[?25hdone
  Created wheel for torchdynamo: filename=torchdynamo-1.13.0-cp37-cp37m-linux_x86_64.whl size=2700366 sha256=418ac2c149c3c51bb010610a2442d05ac6c151d092122de1fc1c524698858b3c
  Stored in directory: /root/.cache/pip/wheels/e2/2d/e2/f0a4a0070b1c5037a796cb199eba4546582d684ac717b30a2b
Successfully built torchdynamo
Installing collected packages: torchdynamo
Successfully installed torchdynamo-1.13.0


In [None]:
!pip3 install git+https://github.com/pytorch/torchdynamo.git@ee1b62f4947a94a87c288f38711de76bfaa2ffd4 --no-cache-dir
!pip install matplotlib seaborn

In [None]:
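import torch
# torch._dynamo is only bundled with PyTorch 2.0 and later; with torch 1.13 it is the separate `torchdynamo` package, hence the error below
torch._dynamo.list_backends()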

AttributeError: ignored
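The `AttributeError` is consistent with `torch._dynamo` only being bundled with PyTorch from 2.0 onwards, while the torch 1.13 installed above relies on the standalone `torchdynamo` package. A small, hypothetical loader (`load_dynamo` is our name) that tries both locations without assuming which one is present:

```python
import importlib
import importlib.util
from types import ModuleType
from typing import Optional


def load_dynamo() -> Optional[ModuleType]:
    """Return torch._dynamo (PyTorch >= 2.0) or the standalone torchdynamo package, if importable."""
    for name in ("torch._dynamo", "torchdynamo"):
        try:
            if importlib.util.find_spec(name) is not None:
                return importlib.import_module(name)
        except ImportError:
            # Raised when the parent package (torch) is missing or the import itself fails
            continue
    return None
```

Both modules expose `optimize`, as used in the cells below; other helpers such as `list_backends` may differ between the two, so check with `hasattr` before calling them.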

In [None]:
@torch._dynamo.optimize()
def toy_example(a, b):
    print("ff")

In [None]:
import torch

In [None]:
import torchdynamo
# with torchdynamo.optimize():

In [None]:
import torchdynamo
from torchdynamo.optimizations.training import aot_autograd_speedup_strategy
# with torchdynamo.optimize(aot_autograd_speedup_strategy):
  # print("ff")

In [None]:
start = time.time()
with autocast():
    for i in range(100):
        pred = model.reco_predictor.model(input2.cuda())
print("fp16 gpu pytorch time big batch", time.time() - start)
# print(np.testing.assert_allclose(valid_pred.detach().cpu().numpy(), pred.detach().cpu().numpy(), rtol=1e-3, atol=1e-5))
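The commented-out check compares `valid_pred` (computed on the single-sample `input`) with `pred` (computed on the 49-sample `input2`), so the shapes cannot match: both runs need the same input. Assuming you have extracted the raw score arrays from an fp32 and an fp16 run (how to get them depends on the model's forward signature in your docTR version), a hypothetical helper like this summarizes the drift. The argmax agreement matters if decoding is greedy, as best-path CTC decoding is.

```python
import numpy as np


def precision_drift(reference, candidate) -> dict:
    """Summarize how far a reduced-precision output drifts from a full-precision one."""
    reference = np.asarray(reference, dtype=np.float64)
    candidate = np.asarray(candidate, dtype=np.float64)
    if reference.shape != candidate.shape:
        raise ValueError(f"shape mismatch: {reference.shape} vs {candidate.shape}")
    abs_diff = np.abs(reference - candidate)
    return {
        "max_abs": float(abs_diff.max()),
        "mean_abs": float(abs_diff.mean()),
        # Fraction of positions where the top-scoring class (last axis) is unchanged
        "argmax_agreement": float(np.mean(reference.argmax(-1) == candidate.argmax(-1))),
    }
```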


In [None]:
# fp32 ONNX conversion
input = input.to(torch.float32)
input2 = input2.to(torch.float32)
torch.onnx.export(model.reco_predictor.model.to(torch.float32),
                  input.cuda(),
                  "rec.onnx",
                  export_params=True,
                  opset_version=11,
                  do_constant_folding=True,
                  input_names=["input"],
                  output_names=["output"],
                  dynamic_axes={"input": {0: "batch_size"},
                                "output": {0: "batch_size"}})

In [None]:

ort_session = onnxruntime.InferenceSession("rec.onnx", providers=["CUDAExecutionProvider"])
# onnxruntime takes NumPy arrays, and a CUDA tensor must be moved to the CPU before .numpy()
ort_inputs = {"input": input.cpu().numpy()}
start = time.time()
ort_outs = ort_session.run(None, ort_inputs)
print("first sample time onnx fp32 cuda", time.time() - start)
start = time.time()
for i in range(100):
    ort_outs = ort_session.run(None, ort_inputs)
print("fp32 onnx time single sample cuda", time.time() - start)
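The `onnxruntime` wheel installed earlier is the CPU build; the CUDA execution provider ships with the separate `onnxruntime-gpu` package. Checking `onnxruntime.get_available_providers()` before creating the session avoids timing a CPU run under a "cuda" label. A small sketch (the helper name `pick_providers` is ours):

```python
from typing import List, Sequence


def pick_providers(
    available: Sequence[str],
    preferred: Sequence[str] = ("CUDAExecutionProvider", "CPUExecutionProvider"),
) -> List[str]:
    """Keep the preferred ONNX Runtime providers that are actually available, in order."""
    chosen = [name for name in preferred if name in available]
    if not chosen:
        raise RuntimeError(f"None of {list(preferred)} is available; got {list(available)}")
    return chosen
```

For example, `onnxruntime.InferenceSession("rec.onnx", providers=pick_providers(onnxruntime.get_available_providers()))`, followed by `ort_session.get_providers()` to confirm which provider is in use.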
