
Roadmap


🎯 (in no particular order)

Improve half-precision support

Variable-input-height models use special operators (from nnutils) that do not yet support half-precision tensors. There is an open PR to add support: https://github.com/jpuigcerver/nnutils/pull/5
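Once the operators support it, enabling 16-bit training should be a one-flag change through Lightning. A minimal sketch, assuming Lightning 1.x (the `model` to fit is left out):

```python
import pytorch_lightning as pl

# Lightning's native AMP: a single flag on the Trainer.
trainer = pl.Trainer(precision=16, gpus=1)
# trainer.fit(model)  # currently fails inside the nnutils operators
```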

Improve segmentation output

The (greedy) probability of each segment is not included in the segmentation output. https://github.com/carmocca/PyLaia/commit/04bc75ce84d84a24666702cd0dfc28b808602b58 contains part of what is required.

Another improvement would be to estimate word probabilities by aggregating the character probabilities with one of {sum, avg, mult, softmax}. This requires some research; the WER can be used to evaluate each option.
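To illustrate the aggregation options, here is a toy sketch (the character probabilities are made up) comparing the word-level confidences each reducer produces; the softmax variant is omitted since it would normalize across competing hypotheses:

```python
from functools import reduce
from operator import mul

# Hypothetical greedy character probabilities for one segmented word.
char_probs = [0.9, 0.8, 0.95, 0.7]

word_confidence = {
    "sum": sum(char_probs),
    "avg": sum(char_probs) / len(char_probs),
    "mult": reduce(mul, char_probs),
}
print(word_confidence)  # sum=3.35, avg=0.8375, mult≈0.479
```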

Support auto learning rate finder

Use https://pytorch-lightning.readthedocs.io/en/stable/api/pytorch_lightning.tuner.lr_finder.html#module-pytorch_lightning.tuner.lr_finder

The implementation should be straightforward. It could be exposed in a new script (pylaia-htr-tune).
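A minimal sketch of what the script could do, assuming Lightning 1.x; `DummyModule` is a hypothetical stand-in for PyLaia's actual LightningModule:

```python
import torch
import pytorch_lightning as pl
from torch.utils.data import DataLoader, TensorDataset


class DummyModule(pl.LightningModule):
    """Stand-in for PyLaia's LightningModule."""

    def __init__(self, learning_rate: float = 1e-3, batch_size: int = 16):
        super().__init__()
        self.learning_rate = learning_rate  # updated in place by lr_find
        self.batch_size = batch_size        # used by the batch size finder below
        self.layer = torch.nn.Linear(32, 10)

    def training_step(self, batch, batch_idx):
        x, y = batch
        return torch.nn.functional.cross_entropy(self.layer(x), y)

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=self.learning_rate)

    def train_dataloader(self):
        dataset = TensorDataset(torch.randn(256, 32), torch.randint(0, 10, (256,)))
        return DataLoader(dataset, batch_size=self.batch_size)


trainer = pl.Trainer(max_epochs=1)
lr_finder = trainer.tuner.lr_find(DummyModule())
print("suggested lr:", lr_finder.suggestion())
```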

Support auto batch size finder

Use https://pytorch-lightning.readthedocs.io/en/stable/api/pytorch_lightning.tuner.batch_size_scaling.html#module-pytorch_lightning.tuner.batch_size_scaling

Using this is much more complicated: the scaler algorithm assumes that all batches occupy the same amount of memory. This is not the case for PyLaia, where even though a fixed batch size B is used, each image can have a drastically different size (H×W).

Since images are collated (to make use of efficient batching), a batch of B items occupies the memory of its largest item times B. For this reason, the scaler algorithm would have to be run with batches that occupy as much memory as the batch containing the largest image in the dataset. This is related to the bucketing item below.
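For reference, this is how the scaler would be invoked (reusing the hypothetical `DummyModule` from the LR-finder sketch above); for PyLaia, the dataloader would have to yield those worst-case batches for the result to be a safe upper bound:

```python
import pytorch_lightning as pl

trainer = pl.Trainer(max_epochs=1)
# "power" mode doubles the batch size until a trial batch runs out of memory.
new_batch_size = trainer.tuner.scale_batch_size(DummyModule(), mode="power")
print("largest batch size that fit:", new_batch_size)
```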

Reduce memory footprint by bucketing input samples

During training, the dataset images are shuffled and then sampled into batches of size B. Since images are collated, a batch of B items occupies the memory of the largest item times B.
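A toy collate function illustrating this claim: every image is zero-padded to the largest height and width in the batch, so the batch costs roughly B times the memory of its biggest member. PyLaia's real collation differs in details; this is only a sketch:

```python
import torch


def pad_collate(images):
    """Zero-pad a list of (C, H, W) tensors to a common (C, max_H, max_W)."""
    max_h = max(img.shape[1] for img in images)
    max_w = max(img.shape[2] for img in images)
    batch = images[0].new_zeros(len(images), images[0].shape[0], max_h, max_w)
    for i, img in enumerate(images):
        batch[i, :, : img.shape[1], : img.shape[2]] = img
    return batch


images = [torch.rand(1, 64, w) for w in (120, 500, 2000)]
print(pad_collate(images).shape)  # torch.Size([3, 1, 64, 2000]): padded to the widest
```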

If we sorted the dataset by image size, we could batch (bucket) input samples efficiently by size. Two possibilities:

  • Split the dataset into batches of size B where each batch contains images of very similar size. This minimizes the padding data (see the sampler sketch after this list). The downside is that if image sizes vary widely, the GPU memory will be underutilized during a large part of training.
  • Split the dataset into batches of any size that maximizes the available GPU memory. This approach would greatly improve training speed; however, further research would be required to check how training would be impacted. Some batches could contain many very small images, which can be problematic for learning. A different optimization algorithm might also be needed, as well as an adaptive learning-rate schedule based on the current batch size. An upper bound on the batch size might help.

Bucketing might also be problematic when using distributed training (DDP) or automatic mixed precision (AMP).
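A minimal sketch of the first option: a batch sampler that sorts indices by image width and emits fixed-size batches of similarly sized images. All names here are hypothetical, and the samplers branch linked below may differ:

```python
import random
from torch.utils.data import Sampler


class BucketBatchSampler(Sampler):
    """Yield fixed-size batches of indices whose images have similar widths."""

    def __init__(self, widths, batch_size):
        self.widths = widths          # widths[i] = width of dataset image i
        self.batch_size = batch_size

    def __iter__(self):
        order = sorted(range(len(self.widths)), key=lambda i: self.widths[i])
        batches = [
            order[i : i + self.batch_size]
            for i in range(0, len(order), self.batch_size)
        ]
        random.shuffle(batches)  # keep some inter-batch randomness between epochs
        yield from batches

    def __len__(self):
        return (len(self.widths) + self.batch_size - 1) // self.batch_size


# Usage: DataLoader(dataset, batch_sampler=BucketBatchSampler(widths, 16))
```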

Some work has been done on this, available at https://github.com/carmocca/PyLaia/tree/samplers

Also, references to similar ideas/implementations:

Improve Kaldi integration

pylaia-htr-netout could use a library like PyKaldi to (for example) directly generate .scp files.
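A hedged sketch of what that could look like, assuming PyKaldi's table writers and that its `Matrix` type can wrap a NumPy array (the exact conversion should be checked against the PyKaldi docs):

```python
import numpy as np
from kaldi.matrix import Matrix
from kaldi.util.table import MatrixWriter

# Hypothetical per-utterance network outputs (e.g. log-posteriors).
outputs = {"utt1": np.random.rand(120, 80).astype(np.float32)}

# The "ark,scp:..." wspecifier writes the binary archive and its .scp index in one pass.
with MatrixWriter("ark,scp:netout.ark,netout.scp") as writer:
    for utt_id, mat in outputs.items():
        writer[utt_id] = Matrix(mat)  # assumption: Matrix accepts a NumPy array
```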

Support fine-tuning

A pylaia-htr-tune script could be created to support fine-tuning only the last linear layer of the model. This is very interesting for transfer-learning tasks or to allow changing the vocabulary used.
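A minimal sketch of the freezing logic such a script could apply; the toy `Sequential` below stands in for a real PyLaia model whose last child is assumed to be the output layer over the vocabulary:

```python
import torch


def freeze_all_but_last_linear(model: torch.nn.Module) -> None:
    """Freeze every parameter, then unfreeze the final child module."""
    for p in model.parameters():
        p.requires_grad = False
    classifier = list(model.children())[-1]  # assumed: last child is the classifier
    for p in classifier.parameters():
        p.requires_grad = True


model = torch.nn.Sequential(
    torch.nn.Linear(32, 64),
    torch.nn.ReLU(),
    torch.nn.Linear(64, 40),  # stand-in for the layer mapping to the vocabulary
)
freeze_all_but_last_linear(model)
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4
)
```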

Add language modelling support (pylaia-htr-beam-decode)