training code #14

Open
an1018 opened this issue Oct 18, 2022 · 12 comments

an1018 commented Oct 18, 2022

Hi, thanks for your great work. When will you release the training code?

@fh2019ustc (Owner)

Hi, thanks for your attention to our work. We will release the training code after the acceptance of our work DocScanner.

an1018 (Author) commented Oct 19, 2022

Thanks for your reply. Could you tell us your training environment (such as the number and model of GPUs) and the training time of the geometric unwarping transformer and the illumination correction transformer?

@fh2019ustc (Owner)

For geometric unwarping, we use 4 GPUs for training. The training takes about 3 days.
For illumination correction, we use 2 GPUs for training. The training takes about 1 day.
In fact, we did not conduct hyper-parameter tuning experiments on the batch size, learning rate, or number of GPUs.

an1018 (Author) commented Oct 25, 2022

Thanks for your detailed explanation. The training GPUs of DocScanner are NVIDIA RTX 2080 Ti and NVIDIA GTX 1080 Ti; which one is used for DocTr?

@fh2019ustc (Owner)

Hi, for DocTr we use 1080 Ti GPUs. In our experience, the type of GPU does not seem to affect the performance of our method.

an1018 (Author) commented Nov 8, 2022

While writing the training code, I have some confusion.
1) Before training the GeoTr module, the background needs to be removed. Is this handled by the pre-trained model of the segmentation module?
2) And after removing the background, should the result look like the image on the right?
3) But in DocScanner, is the ground-truth mask the output of the document localization module? If yes, why is it called ground truth?

@fh2019ustc (Owner)

Thanks for your attention to our work.

  1. To train the segmentation module, we remove the noisy backgrounds using the GT masks rather than the pre-trained segmentation module. This is the same for our DocTr and DocScanner.
  2. You can also upsample the mask to the same resolution as the input image and then multiply them at that resolution (see the sketch below).
  3. See answer 1.

Hope this helps.
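
A minimal sketch of that mask-based background removal, assuming the distorted image and its GT mask are already loaded as PyTorch tensors (the function name and tensor shapes are illustrative, not the released training code):

```python
import torch
import torch.nn.functional as F

def remove_background(img: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """Zero out the non-document region of `img` using its GT mask.

    img:  (3, H, W) float tensor, the distorted document photo.
    mask: (1, h, w) float tensor in {0, 1}, possibly at a lower resolution.
    """
    # Upsample the mask to the image resolution, then multiply.
    mask = F.interpolate(mask.unsqueeze(0), size=img.shape[-2:], mode="nearest").squeeze(0)
    return img * mask
```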

an1018 (Author) commented Nov 9, 2022

Is there any reference code? And what do the GT masks correspond to in the doc3d dataset?

@fh2019ustc (Owner)

In fact, it is easy to extract the GT mask of the document image from the other annotations. For example, in the UV map, the values of the background region are 0.
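
For example, a rough sketch of deriving the mask from a doc3d UV map (the file path is hypothetical, and reading .exr files with OpenCV may require enabling OpenEXR support, e.g. via the OPENCV_IO_ENABLE_OPENEXR environment variable):

```python
import cv2
import numpy as np

# Hypothetical path; adjust to your local doc3d layout.
uv_path = "doc3d/uv/1/1_0001.exr"

# The UV annotation is a floating-point EXR image.
uv = cv2.imread(uv_path, cv2.IMREAD_ANYCOLOR | cv2.IMREAD_ANYDEPTH)

# Background pixels are 0 in every channel, so any non-zero channel
# marks the document region.
mask = (np.abs(uv).sum(axis=2) > 0).astype(np.float32)  # (H, W), values in {0, 1}
```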

an1018 (Author) commented Nov 16, 2022

@fh2019ustc I've written the training code, but the model does not converge. I've sent the code to your email (haof@mail.ustc.edu.cn); could you take a look at it? Thanks very much.

@Aiden0609

@an1018 So, have you reproduced it successfully with your own training code?

@minhduc01168

@an1018 So have you successfully written your own training code?
