
[references] Add FP16 support for training #263

Closed · 5 tasks done · Tracked by #791

fg-mindee opened this issue May 17, 2021 · 2 comments · Fixed by #682
Labels: ext: references (Related to references folder) · framework: pytorch (Related to PyTorch backend) · framework: tensorflow (Related to TensorFlow backend) · help wanted (Extra attention is needed)

fg-mindee (Contributor) commented May 17, 2021

The reference training script should have an option to switch from FP32 training to FP16 training.
This raises a question: how do we harmonize model loading across both precisions?

A few suggestions:

  • simply cast the FP16 checkpoint to FP32 before hashing and uploading it
  • tag each checkpoint of a given model with its precision, and let the user choose how the model is instantiated: the output precision defaults to FP32, but the user can change it. Depending on the checkpoint's precision, it is cast to the requested precision at load time.

The second option would be cleaner and would avoid unnecessarily large checkpoints when they can be kept in FP16.

Here is a proposition:
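A minimal sketch of how the second option could look in PyTorch — the `load_pretrained_params` helper and the `fp16` metadata key below are hypothetical, for illustration only, not doctr's actual API:

```python
import torch

def load_pretrained_params(
    model: torch.nn.Module,
    ckpt_path: str,
    target_dtype: torch.dtype = torch.float32,
) -> torch.nn.Module:
    """Load a checkpoint tagged with its precision, casting to the requested dtype."""
    ckpt = torch.load(ckpt_path, map_location="cpu")
    # Hypothetical metadata: the checkpoint records which precision it was saved in
    ckpt_dtype = torch.float16 if ckpt.get("fp16", False) else torch.float32
    state_dict = ckpt["state_dict"]
    if ckpt_dtype != target_dtype:
        # Only floating-point tensors are cast; integer buffers are left untouched
        state_dict = {
            k: v.to(target_dtype) if v.is_floating_point() else v
            for k, v in state_dict.items()
        }
    model.load_state_dict(state_dict)
    # Module.to(dtype) casts only floating-point parameters and buffers
    return model.to(target_dtype)
```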

fg-mindee added the help wanted and ext: references labels on May 17, 2021
fg-mindee added this to the 0.3.0 milestone on May 17, 2021
fg-mindee modified the milestones: 0.3.0 → 0.3.1 on Jul 1, 2021
fg-mindee self-assigned this on Jul 1, 2021
fg-mindee added the framework: pytorch and framework: tensorflow labels on Jul 6, 2021
fg-mindee modified the milestones: 0.3.1 → 0.4.0 on Aug 26, 2021
fg-mindee modified the milestones: 0.4.0 → 0.4.1 on Sep 20, 2021
fg-mindee (Contributor, Author) commented Nov 10, 2021

Quick update: this design was good on paper, but ML frameworks actually use AMP (automatic mixed precision) rather than full FP16 training. Other data types also cause issues in post-processing (among other things, cv2 doesn't handle arrays that aren't in FP32).
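For reference, a minimal AMP training step in PyTorch looks like this — generic `torch.cuda.amp` usage, not doctr-specific code:

```python
import torch
from torch.cuda.amp import GradScaler, autocast

model = torch.nn.Linear(512, 10).cuda()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scaler = GradScaler()  # rescales the loss to avoid FP16 gradient underflow

def train_step(images: torch.Tensor, targets: torch.Tensor) -> float:
    optimizer.zero_grad()
    with autocast():  # runs eligible ops in FP16, keeps the rest in FP32
        logits = model(images)
        loss = torch.nn.functional.cross_entropy(logits, targets)
    scaler.scale(loss).backward()  # backward pass on the scaled loss
    scaler.step(optimizer)         # unscales gradients, then steps
    scaler.update()                # adjusts the scale factor for the next step
    return loss.item()
```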

fg-mindee (Contributor, Author) commented:

Moving this to a later release, pending TensorFlow support for AMP.
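On the TensorFlow side, the equivalent mechanism is Keras mixed precision — again a generic sketch under the standard `tf.keras.mixed_precision` API, not doctr code:

```python
import tensorflow as tf
from tensorflow.keras import mixed_precision

# Run eligible ops in FP16 while keeping variables in FP32
mixed_precision.set_global_policy("mixed_float16")

model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu", input_shape=(512,)),
    tf.keras.layers.Dense(10),
    # Keep the final softmax in FP32 for numerical stability
    tf.keras.layers.Activation("softmax", dtype="float32"),
])

# Under the mixed_float16 policy, compile() applies loss scaling automatically,
# playing the same role as GradScaler in PyTorch
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```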
