In this notebook, we are going to fine-tune `LayoutLMv2ForTokenClassification` on the [FUNSD](https://guillaumejaume.github.io/FUNSD/) dataset. The goal for the model is to label words appearing in scanned documents appropriately. This task is treated as an NER problem (sequence labeling). However, compared to BERT, LayoutLMv2 also incorporates visual and layout information about the tokens when encoding them into vectors, which makes it well suited to document understanding tasks.

LayoutLMv2 is itself an upgrade of LayoutLM. The main novelty of LayoutLMv2 is that it also pre-trains visual embeddings, whereas the original LayoutLM only adds visual embeddings during fine-tuning.

* Paper: https://arxiv.org/abs/2012.14740
* Original repo: https://github.com/microsoft/unilm/tree/master/layoutlmv2

## Install dependencies

First, we install the required libraries:
* Transformers (for the LayoutLMv2 model)
* Datasets (for data preprocessing)
* Seqeval (for metrics)
* Detectron2 (which LayoutLMv2 requires for its visual backbone).



In [2]:
#!pip install -q git+https://github.com/huggingface/transformers.git


In [3]:
#!pip install -q datasets seqeval

In [5]:
#!pip install -q pyyaml==5.1
# workaround: install old version of pytorch since detectron2 hasn't released packages for pytorch 1.9 (issue: https://github.com/facebookresearch/detectron2/issues/3158)
!pip install -q torch==1.8.0+cu101 torchvision==0.9.0+cu101 -f https://download.pytorch.org/whl/torch_stable.html


[notice] A new release of pip is available: 23.0.1 -> 23.1
[notice] To update, run: python.exe -m pip install --upgrade pip
ERROR: Can not perform a '--user' install. User site-packages are not visible in this virtualenv.


In [1]:
# install detectron2 that matches pytorch 1.8
# See https://detectron2.readthedocs.io/tutorials/install.html for instructions
#!pip install -q detectron2 -f https://dl.fbaipublicfiles.com/detectron2/wheels/cu101/torch1.8/index.html
!python -m pip install git+https://github.com/facebookresearch/detectron2.git

Collecting git+https://github.com/facebookresearch/detectron2.git
  Cloning https://github.com/facebookresearch/detectron2.git to c:\users\joaoo\appdata\local\temp\pip-req-build-rbk8fgw0
  Resolved https://github.com/facebookresearch/detectron2.git to commit 4e447553eb32b6e3784df0b8fca286935107b2fd
  Preparing metadata (setup.py): started
  Preparing metadata (setup.py): finished with status 'done'
Collecting pycocotools>=2.0.2
  Using cached pycocotools-2.0.6.tar.gz (24 kB)
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
  Preparing metadata (pyproject.toml): started
  Preparing metadata (pyproject.toml): finished with status 'done'
Collecting termcolor>=1.1
  Using cached termcolor-2.2.0-py3-none-any.whl (6.6 kB)
Collecting yacs>=0.1.8
  Using cached yacs-0.1.8-py3-none-any.whl (14 kB)
Collecting tabulate
  Using cache

  Running command git clone --filter=blob:none --quiet https://github.com/facebookresearch/detectron2.git 'C:\Users\joaoo\AppData\Local\Temp\pip-req-build-rbk8fgw0'
  error: subprocess-exited-with-error
  
  × python setup.py bdist_wheel did not run successfully.
  │ exit code: 1
  ╰─> [370 lines of output]
      running bdist_wheel
      running build
      running build_py
      creating build
      creating build\lib.win-amd64-cpython-39
      creating build\lib.win-amd64-cpython-39\detectron2
      copying detectron2\__init__.py -> build\lib.win-amd64-cpython-39\detectron2
      creating build\lib.win-amd64-cpython-39\tools
      copying tools\analyze_model.py -> build\lib.win-amd64-cpython-39\tools
      copying tools\benchmark.py -> build\lib.win-amd64-cpython-39\tools
      copying tools\convert-torchvision-to-d2.py -> build\lib.win-amd64-cpython-39\tools
      copying tools\lazyconfig_train_net.py -> build\lib.win-amd64-cpython-39\tools
      copying tools\lightning_train_net.p

To be able to share your model with the community on the HuggingFace hub, there are a few more steps to follow.

First, create an access token on the Hugging Face website (sign up [here](https://huggingface.co/welcome) if you haven't already!), then run the following cell and paste the token when prompted (this only works on Colab; in a regular notebook, you need to do this in a terminal):

In [5]:
!huggingface-cli login


    _|    _|  _|    _|    _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|_|_|_|    _|_|      _|_|_|  _|_|_|_|
    _|    _|  _|    _|  _|        _|          _|    _|_|    _|  _|            _|        _|    _|  _|        _|
    _|_|_|_|  _|    _|  _|  _|_|  _|  _|_|    _|    _|  _|  _|  _|  _|_|      _|_|_|    _|_|_|_|  _|        _|_|_|
    _|    _|  _|    _|  _|    _|  _|    _|    _|    _|    _|_|  _|    _|      _|        _|    _|  _|        _|
    _|    _|    _|_|      _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|        _|    _|    _|_|_|  _|_|_|_|
    
    To login, `huggingface_hub` requires a token generated from https://huggingface.co/settings/tokens .
Token: 
Add token as git credential? (Y/n) n
Token is valid.
Your token has been saved to /root/.cache/huggingface/token
Login successful


Then you need to install Git-LFS (which is used by the hub) and set up Git if you haven't already. Uncomment the following instructions and adapt them with your name and email:

In [None]:
#!apt install git-lfs
#!git config --global user.email "example@gmail.com"
#!git config --global user.name "your name"

## Prepare the data

Let's load the dataset from the Hugging Face hub. Note that the cell below loads `jfecunha/arquivo_news` rather than FUNSD itself.

In [1]:
from datasets import load_dataset

datasets = load_dataset("jfecunha/arquivo_news")




As we can see, it contains a training and test split. Each example consists of an id, words, bounding boxes, per-word labels, and a document image (stored as raw bytes), plus the image path and source. Note that the words are not yet model tokens: we need to convert them to actual tokens (word pieces) using the tokenizer.

In [2]:
datasets

DatasetDict({
    train: Dataset({
        features: ['id', 'words', 'bboxes', 'labels', 'image', 'image_path', 'source', '__index_level_0__'],
        num_rows: 320
    })
    test: Dataset({
        features: ['id', 'words', 'bboxes', 'labels', 'image', 'image_path', 'source', '__index_level_0__'],
        num_rows: 80
    })
})

In [3]:
datasets['train'].features

{'id': Value(dtype='int64', id=None),
 'words': Sequence(feature=Value(dtype='string', id=None), length=-1, id=None),
 'bboxes': Sequence(feature=Sequence(feature=Value(dtype='int64', id=None), length=-1, id=None), length=-1, id=None),
 'labels': Sequence(feature=Value(dtype='float64', id=None), length=-1, id=None),
 'image': Value(dtype='binary', id=None),
 'image_path': Value(dtype='string', id=None),
 'source': Value(dtype='string', id=None),
 '__index_level_0__': Value(dtype='int64', id=None)}
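
LayoutLMv2-style models expect each bounding box as `(x0, y0, x1, y1)` on a 0-1000 scale relative to the page size. This notebook does not show whether the `bboxes` column is already normalized, so the following is only a sketch of the usual normalization. The helper name `normalize_bbox` and the page dimensions are assumptions made for illustration:

```python
def normalize_bbox(bbox, width, height):
    """Scale a pixel-space (x0, y0, x1, y1) box to the 0-1000 range."""
    x0, y0, x1, y1 = bbox
    return [
        int(1000 * x0 / width),
        int(1000 * y0 / height),
        int(1000 * x1 / width),
        int(1000 * y1 / height),
    ]

# Hypothetical example: one word box on an 800x1200 pixel page.
example_box = normalize_bbox([80, 120, 400, 180], width=800, height=1200)
print(example_box)
```

If the boxes in your own data are in pixel coordinates, a step like this would go before the call to the processor.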

## Preprocess data

First, let's create dictionaries that let us map from labels to integer indices and vice versa. The latter will be useful when evaluating the model.

In [4]:
label2id = {
    'None': 0,
    'Title': 1,
    'SubTitle': 2,
    'Category': 3
}

label2id

{'None': 0, 'Title': 1, 'SubTitle': 2, 'Category': 3}

In [5]:
id2label = {idx: label for label, idx in label2id.items()}
id2label

{0: 'None', 1: 'Title', 2: 'SubTitle', 3: 'Category'}

Next, let's use `LayoutLMv2Processor` to prepare the data for the model.

In [6]:
import io

from PIL import Image
from transformers import LayoutLMv2Processor, LayoutXLMProcessor
from datasets import Features, Sequence, ClassLabel, Value, Array2D, Array3D

# apply_ocr=False: we pass our own words and boxes instead of running OCR
#processor = LayoutLMv2Processor.from_pretrained("microsoft/layoutlmv2-base-uncased", revision="no_ocr")
processor = LayoutXLMProcessor.from_pretrained("microsoft/layoutxlm-base", apply_ocr=False)

# we need to define custom features
features = Features({
    'image': Array3D(dtype="int64", shape=(3, 224, 224)),
    'input_ids': Sequence(feature=Value(dtype='int64')),
    'attention_mask': Sequence(Value(dtype='int64')),
    'token_type_ids': Sequence(Value(dtype='int64')),
    'bbox': Array2D(dtype="int64", shape=(512, 4)),
    'labels': Sequence(ClassLabel(names=list(label2id.keys()))),
})

def process_image(image_bytes):
  """Decode byte image."""
  with io.BytesIO(image_bytes) as f:
    img = Image.open(f).convert("RGB")
    return img 
    
def preprocess_data(examples):
  images = [process_image(img) for img in examples['image']]
  words = examples['words']
  boxes = examples['bboxes']
  word_labels =  examples['labels']

  # Debug output: show the first example of each batch
  print(word_labels[0])
  print(boxes[0])
  print(words[0])
  
  encoded_inputs = processor(images, words, boxes=boxes, word_labels=word_labels,
                             padding="max_length", truncation=True, return_token_type_ids=True, max_length=512)
  
  return encoded_inputs

train_dataset = datasets['train'].map(preprocess_data, batched=True, remove_columns=datasets['train'].column_names,
                                      features=features)
test_dataset = datasets['test'].map(preprocess_data, batched=True, remove_columns=datasets['test'].column_names,
                                      features=features)



Map:   0%|          | 0/320 [00:00<?, ? examples/s]

[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 1.0, 1.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0,

Map:   0%|          | 0/80 [00:00<?, ? examples/s]

[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 3.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 3.0, 1.0, 1.0, 1.0, 1.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.0, 2.0, 2.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.0, 2.0, 2.0, 0.0, 0.0, 2.0, 2.0, 2.0, 2.0, 2.0, 3.0, 3.0, 3.0, 1.0, 1.0, 1.0, 1.0, 1.0, 3.0, 3.0, 3.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 1.0, 1.0, 1.0, 2.0, 1.0, 3.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 1.0, 1.0, 1.0, 1.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0,

In [7]:
features

{'image': Array3D(shape=(3, 224, 224), dtype='int64', id=None),
 'input_ids': Sequence(feature=Value(dtype='int64', id=None), length=-1, id=None),
 'attention_mask': Sequence(feature=Value(dtype='int64', id=None), length=-1, id=None),
 'token_type_ids': Sequence(feature=Value(dtype='int64', id=None), length=-1, id=None),
 'bbox': Array2D(shape=(512, 4), dtype='int64', id=None),
 'labels': Sequence(feature=ClassLabel(names=['None', 'Title', 'SubTitle', 'Category'], id=None), length=-1, id=None)}

In [8]:
train_dataset

Dataset({
    features: ['image', 'input_ids', 'attention_mask', 'token_type_ids', 'bbox', 'labels'],
    num_rows: 320
})

In [9]:
len(train_dataset['image'][0])

3

Let's verify the first example:

In [10]:
processor.tokenizer.decode(train_dataset['input_ids'][0])

"<s> = Ss XDFESSO. D Excwusio + t — lle Daniel Oli ‘ira al m 2G —~ ~ Excwusivo ps Coutinho 4_Louwenco Pereira ure aed i Orcamento.do Stado aprovado E na generalidade ILIANA COELHO, MARIAN) old WA CUNHAEMIGUI éaimperturbdvel a VISLOS xr063S GO negécio alma do do tado do amento py i apr generalidade votos eira na com do da esquerda dos or eis ncao e deputad partid adireita D- A. C lo de onade ram ongo uma m ace ae de Ita = GH re deb siaxteéden direita (e te Pedro uma perdida) uma obsessao. A opgao E 09.01.2020 por como a esquerda aceita Orcamento adireita adorava um que terfeito nao (dentro jd C 505 enteno ou, ft ou 20 ‘V) onga? 40 te debate acabou Costasentado O 01.2020 que com ao abutres Q.som fazem que os de pecialistas Irao ucranianos do acesso as Calxas hegras aviao que Teerao caiu em ha minutos LUSA j ‘Anos: equipa ev dis de imprensa, ik nferéncia Va Pr numa m uc dos da Ne mini 10 ral 10! nia Adi Adiada sobre homicidio idio a sentenca devi ia a de triatleta devido “alteracao a do’ 

In [11]:
print(train_dataset['labels'][0])

[-100, 0, 0, -100, 0, -100, -100, -100, -100, 0, 0, -100, -100, 0, 0, 0, 0, 0, 0, 0, -100, 0, 0, 0, -100, 0, -100, 0, 0, -100, -100, -100, 0, 0, -100, -100, 0, -100, -100, -100, -100, 0, 0, 0, -100, 0, 0, -100, -100, -100, -100, 0, -100, 0, 0, 0, 0, -100, 0, 0, -100, 0, -100, -100, -100, 0, -100, -100, 0, 0, 0, -100, -100, -100, -100, -100, 0, -100, -100, -100, -100, -100, 0, 0, -100, 0, -100, -100, -100, -100, 0, 0, -100, -100, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -100, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -100, 0, 0, -100, -100, 0, 0, -100, 0, -100, 0, -100, 0, 0, 0, 0, -100, 0, 0, 0, 0, 0, 0, -100, 0, -100, 0, 0, 0, 0, 0, 0, 0, 0, -100, -100, -100, -100, 0, -100, 0, -100, 0, 0, 0, 0, -100, 0, 0, -100, -100, -100, 0, 0, -100, -100, 0, 0, -100, -100, 0, 0, 0, 0, 0, 0, -100, -100, 0, -100, 0, -100, 0, 0, 0, -100, 0, -100, 0, -100, -100, 0, 0, 0, 0, -100, 0, -100, 0, -100, 0, 0, 0, 0, 0, -100, -100, 0, -100, -100, 0, 0, 0, 0, 0, -100, -100, 0, 0, -100, 0, 0, 0, 0, -100, 0, -100, -100, 
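
The `-100` entries above mark positions that the loss ignores: special tokens, padding, and, by default, every word piece after the first one of a word. The sketch below imitates that alignment on toy input. `align_labels` and the word-piece splits are hypothetical, and the real processor does this internally:

```python
IGNORE_INDEX = -100  # default ignore_index of PyTorch's cross-entropy loss

def align_labels(word_pieces, word_labels):
    """Spread word-level labels over word pieces.

    word_pieces: one list of sub-tokens per word (toy input).
    Only the first piece of each word keeps the word's label.
    """
    token_labels = [IGNORE_INDEX]  # leading special token, e.g. <s>
    for pieces, label in zip(word_pieces, word_labels):
        token_labels.append(label)
        token_labels.extend([IGNORE_INDEX] * (len(pieces) - 1))
    token_labels.append(IGNORE_INDEX)  # trailing special token
    return token_labels

# Made-up word-piece splits for three words that all carry label 1.
pieces = [["Or", "camento"], ["do"], ["E", "sta", "do"]]
labels = [1, 1, 1]
aligned = align_labels(pieces, labels)
print(aligned)
```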

Finally, let's set the format to PyTorch.

In [12]:
train_dataset.set_format(type="torch")
test_dataset.set_format(type="torch")

In [13]:
train_dataset.features.keys()

dict_keys(['image', 'input_ids', 'attention_mask', 'token_type_ids', 'bbox', 'labels'])

Next, we create corresponding dataloaders.

In [14]:
from torch.utils.data import DataLoader

train_dataloader = DataLoader(train_dataset, batch_size=4, shuffle=True)
test_dataloader = DataLoader(test_dataset, batch_size=2)

Let's verify a batch:

In [15]:
batch = next(iter(train_dataloader))

for k,v in batch.items():
  print(k, v.shape)

image torch.Size([4, 3, 224, 224])
input_ids torch.Size([4, 512])
attention_mask torch.Size([4, 512])
token_type_ids torch.Size([4, 512])
bbox torch.Size([4, 512, 4])
labels torch.Size([4, 512])


## Train the model

Here we train the model using Hugging Face's Trainer. We need to override a few methods, namely those that return the PyTorch dataloaders, as we defined custom dataloaders above.

We can initialize a `Trainer` by passing our model as well as `TrainingArguments`. See the [docs](https://huggingface.co/transformers/main_classes/trainer.html) for all possible arguments.

In [16]:
from transformers import LayoutLMv2ForTokenClassification, TrainingArguments, Trainer
from datasets import load_metric
import numpy as np

model = LayoutLMv2ForTokenClassification.from_pretrained('microsoft/layoutxlm-base',
                                                                      num_labels=len(label2id))

# Set id2label and label2id 
model.config.id2label = id2label
model.config.label2id = label2id

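# Metrics
# Note: newer releases of `datasets` no longer ship `load_metric`;
# there, `evaluate.load("seqeval")` from the `evaluate` library replaces it.
metric = load_metric("seqeval")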
return_entity_level_metrics = True

def compute_metrics(p):
    predictions, labels = p
    predictions = np.argmax(predictions, axis=2)

    # Remove ignored index (special tokens)
    true_predictions = [
        [id2label[p] for (p, l) in zip(prediction, label) if l != -100]
        for prediction, label in zip(predictions, labels)
    ]
    true_labels = [
        [id2label[l] for (p, l) in zip(prediction, label) if l != -100]
        for prediction, label in zip(predictions, labels)
    ]

    results = metric.compute(predictions=true_predictions, references=true_labels)
    if return_entity_level_metrics:
        # Unpack nested dictionaries
        final_results = {}
        for key, value in results.items():
            if isinstance(value, dict):
                for n, v in value.items():
                    final_results[f"{key}_{n}"] = v
            else:
                final_results[key] = value
        return final_results
    else:
        return {
            "precision": results["overall_precision"],
            "recall": results["overall_recall"],
            "f1": results["overall_f1"],
            "accuracy": results["overall_accuracy"],
        }

class FunsdTrainer(Trainer):
    def get_train_dataloader(self):
      return train_dataloader

    def get_test_dataloader(self, test_dataset):
      return test_dataloader

args = TrainingArguments(
    output_dir="layoutlmv2-finetuned-funsd-v2", # name of directory to store the checkpoints
    max_steps=1000, # we train for a maximum of 1,000 batches
    warmup_ratio=0.1, # we warmup a bit
    fp16=True, # we use mixed precision (less memory consumption)
    push_to_hub=False, # set to True to push the model to the hub as part of training
    push_to_hub_model_id="layoutlmv2-finetuned-funsd-test", # this is the name we'll use for our model on the hub
)

# Initialize our Trainer
trainer = FunsdTrainer(
    model=model,
    args=args,
    compute_metrics=compute_metrics,
)

Some weights of the model checkpoint at microsoft/layoutxlm-base were not used when initializing LayoutLMv2ForTokenClassification: ['layoutlmv2.visual.backbone.bottom_up.res4.20.conv1.norm.num_batches_tracked', 'layoutlmv2.visual.backbone.bottom_up.res3.3.conv3.norm.num_batches_tracked', 'layoutlmv2.visual.backbone.bottom_up.res4.0.conv2.norm.num_batches_tracked', 'layoutlmv2.visual.backbone.bottom_up.res2.0.conv2.norm.num_batches_tracked', 'layoutlmv2.visual.backbone.bottom_up.res4.18.conv1.norm.num_batches_tracked', 'layoutlmv2.visual.backbone.bottom_up.res4.5.conv3.norm.num_batches_tracked', 'layoutlmv2.visual.backbone.bottom_up.res4.8.conv1.norm.num_batches_tracked', 'layoutlmv2.visual.backbone.bottom_up.res4.11.conv2.norm.num_batches_tracked', 'layoutlmv2.visual.backbone.bottom_up.res4.15.conv1.norm.num_batches_tracked', 'layoutlmv2.visual.backbone.bottom_up.res4.12.conv3.norm.num_batches_tracked', 'layoutlmv2.visual.backbone.bottom_up.res4.17.conv3.norm.num_batches_tracked', 'lay
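
To make the filtering step in `compute_metrics` concrete, here is a small pure-Python sketch of the same idea on toy data: take the argmax over the class scores, skip positions whose label is `-100`, and map the remaining ids to label names. The logits and labels below are invented for illustration, and `filter_and_decode` is a hypothetical helper:

```python
id2label = {0: "None", 1: "Title", 2: "SubTitle", 3: "Category"}

def argmax(row):
    """Index of the largest score in a list."""
    return max(range(len(row)), key=row.__getitem__)

def filter_and_decode(logits, labels, ignore_index=-100):
    """Drop ignored positions and return (predicted, reference) label names."""
    preds, refs = [], []
    for row, label in zip(logits, labels):
        if label == ignore_index:
            continue  # special token, padding, or non-first word piece
        preds.append(id2label[argmax(row)])
        refs.append(id2label[label])
    return preds, refs

# Toy sequence of four token positions (made-up scores).
toy_logits = [
    [0.1, 0.2, 0.3, 0.4],
    [5.0, 1.0, 0.0, -1.0],
    [0.0, 3.0, 1.0, 0.5],
    [0.2, 0.1, 4.0, 0.0],
]
toy_labels = [-100, 0, 1, 3]
preds, refs = filter_and_decode(toy_logits, toy_labels)
print(preds, refs)
```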

Let's train the model! By default, the Trainer saves checkpoints every 500 steps.

In [17]:
trainer.train()



Step,Training Loss
500,0.3365
1000,0.0424


TrainOutput(global_step=1000, training_loss=0.1894197406768799, metrics={'train_runtime': 666.6626, 'train_samples_per_second': 12.0, 'train_steps_per_second': 1.5, 'total_flos': 2159409414144000.0, 'train_loss': 0.1894197406768799, 'epoch': 12.5})
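
The `'epoch': 12.5` value in the output follows from the numbers used in this notebook: 320 training examples, a dataloader batch size of 4, and `max_steps=1000`. A quick check of that arithmetic:

```python
import math

num_train_examples = 320  # rows in train_dataset
batch_size = 4            # batch size of train_dataloader
max_steps = 1000          # from TrainingArguments

steps_per_epoch = math.ceil(num_train_examples / batch_size)
epochs = max_steps / steps_per_epoch
print(steps_per_epoch, epochs)
```

With a dataset this small, 1,000 steps means the model sees every training page more than twelve times, so it is worth comparing training and test loss for signs of overfitting.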

In [18]:
args

TrainingArguments(
_n_gpu=1,
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
auto_find_batch_size=False,
bf16=False,
bf16_full_eval=False,
data_seed=None,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_pin_memory=True,
ddp_bucket_cap_mb=None,
ddp_find_unused_parameters=None,
ddp_timeout=1800,
debug=[],
deepspeed=None,
disable_tqdm=False,
do_eval=False,
do_predict=False,
do_train=False,
eval_accumulation_steps=None,
eval_delay=0,
eval_steps=None,
evaluation_strategy=no,
fp16=True,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
fsdp=[],
fsdp_config={'fsdp_min_num_params': 0, 'xla': False, 'xla_fsdp_grad_ckpt': False},
fsdp_min_num_params=0,
fsdp_transformer_layer_cls_to_wrap=None,
full_determinism=False,
gradient_accumulation_steps=1,
gradient_checkpointing=False,
greater_is_better=None,
group_by_length=False,
half_precision_backend=cuda_amp,
hub_model_id=jfecunha/layoutlmv2-finetuned-funsd-test,
hub_private_repo=False,
hub_strategy=e

To compute metrics on the test set, we can run `trainer.predict()`. We get predictions, labels, and metrics back.

In [19]:
predictions, labels, metrics = trainer.predict(test_dataset)



In [20]:
print(metrics)

{'test_loss': 0.3373465836048126, 'test_ategory_precision': 0.7613636363636364, 'test_ategory_recall': 0.7745664739884393, 'test_ategory_f1': 0.7679083094555875, 'test_ategory_number': 173, 'test_itle_precision': 0.8196864111498258, 'test_itle_recall': 0.8554545454545455, 'test_itle_f1': 0.8371886120996441, 'test_itle_number': 1100, 'test_one_precision': 0.8487467588591184, 'test_one_recall': 0.8472821397756687, 'test_one_f1': 0.8480138169257341, 'test_one_number': 1159, 'test_ubTitle_precision': 0.9475677250218468, 'test_ubTitle_recall': 0.9474297364205622, 'test_ubTitle_f1': 0.9474987256972257, 'test_ubTitle_number': 6867, 'test_overall_precision': 0.9161228201561998, 'test_overall_recall': 0.9208517044843532, 'test_overall_f1': 0.9184811755872574, 'test_overall_accuracy': 0.9513093828135281, 'test_runtime': 9.6897, 'test_samples_per_second': 8.256, 'test_steps_per_second': 1.032}
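
The odd metric names above (`test_ategory_*`, `test_itle_*`, `test_one_*`, `test_ubTitle_*`) appear because seqeval assumes IOB-style tags such as `B-Title` and treats the first character of each tag as the prefix. The sketch below is a simplified imitation of that parsing, not seqeval's actual code, and `split_tag` is a hypothetical name:

```python
def split_tag(tag):
    """Split a tag into (prefix, entity type), IOB-style.

    The first character is taken as the prefix; the rest, minus an
    optional leading hyphen part, is the entity type.
    """
    prefix = tag[0]
    entity_type = tag[1:].split("-", maxsplit=1)[-1] or "_"
    return prefix, entity_type

for tag in ["None", "Title", "SubTitle", "Category", "B-Title", "O"]:
    print(tag, "->", split_tag(tag))
```

Since the labels in this notebook are not in IOB format, the entity-level scores should be read with care. One option (an assumption, not something this notebook does) is to rename the labels to IOB-style names before evaluation, or to use token-level metrics instead.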


In [21]:
predictions[0]

array([[ 8.03  , -2.838 , -3.068 , -2.398 ],
       [ 7.996 , -2.865 , -2.932 , -2.38  ],
       [ 7.875 , -3.008 , -2.928 , -2.363 ],
       ...,
       [-1.343 ,  6.402 , -1.897 , -3.51  ],
       [ 4.668 ,  0.3877, -1.639 , -3.182 ],
       [ 3.238 , -0.2375, -0.2727, -1.741 ]], dtype=float16)
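
Each row of `predictions[0]` holds one raw score (logit) per class for one token position. As a rough sketch, a softmax turns a row into probabilities and the argmax gives the predicted label. The rows below are copied from the output above; keep in mind that some positions may correspond to ignored (`-100`) tokens:

```python
import math

id2label = {0: "None", 1: "Title", 2: "SubTitle", 3: "Category"}

def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

rows = [
    [8.03, -2.838, -3.068, -2.398],
    [-1.343, 6.402, -1.897, -3.51],
    [3.238, -0.2375, -0.2727, -1.741],
]

decoded = []
for row in rows:
    probs = softmax(row)
    best = max(range(len(probs)), key=probs.__getitem__)
    decoded.append((id2label[best], probs[best]))

for label, prob in decoded:
    print(f"{label}: {prob:.3f}")
```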

## Share model on the hub

Finally, we can easily push our model to the hub as follows:

In [None]:
trainer.push_to_hub()