training code #14

Open
an1018 opened this issue Oct 18, 2022 · 12 comments

an1018 commented Oct 18, 2022

Hi, thanks for your great work. When will you release the training code?

@fh2019ustc (Owner)

Hi, thanks for your attention to our work. We will release the training code after the acceptance of our work DocScanner.

an1018 (Author) commented Oct 19, 2022

Thanks for your reply. Could you tell us your training environment (such as the number and model of GPUs) and the training time of the geometric unwarping transformer and the illumination correction transformer?

@fh2019ustc (Owner)

For geometric unwarping, we use 4 GPUs for training. The training takes about 3 days.
For illumination correction, we use 2 GPUs for training. The training takes about 1 day.
In fact, we did not conduct hyper-parameter tuning experiments on the batch size, learning rate, or number of GPUs.

an1018 (Author) commented Oct 25, 2022

Thanks for your detailed explanation. The training GPUs of DocScanner are NVIDIA RTX 2080 Ti and NVIDIA GTX 1080 Ti; which one is used for DocTr?

@fh2019ustc (Owner)

Hi, for DocTr we use 1080 Ti GPUs. In our experience, the type of GPU does not seem to affect the performance of our method.

an1018 (Author) commented Nov 8, 2022

While writing the training code, I have some confusion.
1) Before training the GeoTr module, the background needs to be removed. Is this handled by the pre-trained model of the segmentation module?
2) And after removing the background, should the result look like the image on the right?
3) But in DocScanner, is the ground-truth mask the output of the document localization module? If yes, why is it called ground truth?

@fh2019ustc (Owner)

Thanks for your attention to our work.

  1. To train the segmentation module, we remove the noisy backgrounds using the GT masks rather than the pre-trained segmentation module. This is the same for our DocTr and DocScanner.
  2. You can also upsample the mask to the same resolution as the input image and then multiply them at that resolution (see the sketch below).
  3. See answer 1.

Hope this helps.
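
A minimal sketch of that mask-based background removal, assuming the distorted image and its GT mask are already loaded as PyTorch tensors (the function name and tensor shapes are illustrative, not the released training code):

```python
import torch
import torch.nn.functional as F

def remove_background(img: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """Zero out the non-document region of `img` using its GT mask.

    img:  (3, H, W) float tensor, the distorted document photo.
    mask: (1, h, w) float tensor in {0, 1}, possibly at a lower resolution.
    """
    # Upsample the mask to the image resolution, then multiply.
    mask = F.interpolate(mask.unsqueeze(0), size=img.shape[-2:], mode="nearest").squeeze(0)
    return img * mask
```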

an1018 (Author) commented Nov 9, 2022

Is there any reference code? And what do the GT masks correspond to in the doc3d dataset?

@fh2019ustc (Owner)

In fact, it is easy to extract the GT mask of the document image from the other annotations. For example, in the UV map, the values of the background region are 0.
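
For example, a rough sketch of deriving the mask from a doc3d UV map (the file path is hypothetical, and reading .exr files with OpenCV may require enabling OpenEXR support, e.g. via the OPENCV_IO_ENABLE_OPENEXR environment variable):

```python
import cv2
import numpy as np

# Hypothetical path; adjust to your local doc3d layout.
uv_path = "doc3d/uv/1/1_0001.exr"

# The UV annotation is a floating-point EXR image.
uv = cv2.imread(uv_path, cv2.IMREAD_ANYCOLOR | cv2.IMREAD_ANYDEPTH)

# Background pixels are 0 in every channel, so any non-zero channel
# marks the document region.
mask = (np.abs(uv).sum(axis=2) > 0).astype(np.float32)  # (H, W), values in {0, 1}
```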

an1018 (Author) commented Nov 16, 2022

@fh2019ustc I've written the training code, but the model does not converge. I've sent the code to your email (haof@mail.ustc.edu.cn); could you take a look at it? Thanks very much.

@Aiden0609

@an1018 So, have you reproduced it successfully with your own training code?

@minhduc01168

@an1018 So have you successfully written your own training code?
