Using pre-trained models #46

Open · HoomanKhosravi opened this issue Nov 16, 2022 · 17 comments

@HoomanKhosravi

Hello, thank you for the great work.
I used this script to run pre-training for the MLM task: https://github.com/shabie/docformer/blob/master/examples/docformer_pl/3_Pre_training_DocFormer_Task_MLM_Task.ipynb
Afterwards, I used the resulting model in the token-classification task (using load_from_checkpoint, which copies all the weights except the linear layer, which has a different shape).

The problem is that no matter how long I run the pre-training, I always get the same metrics on the token-classification task (using that pre-trained model as a starting point).

I even tried the model from the document-classification task as a base for token classification, and I still get the exact same metrics as I was getting with the MLM-pretrained model.

Any suggestions on how to properly use the pre-trained models?

@uakarsh (Collaborator) commented Nov 16, 2022

Not sure how to go about it, because I tried with just a few sample documents. I think there is something that saturates the entire training; I will look into it in a few days.

@HoomanKhosravi (Author)

Thanks for the quick response! OK.

@uakarsh (Collaborator) commented Nov 16, 2022

Can you let me know what approach you tried? I guess it would be helpful for me as well to conduct the experiment.

@HoomanKhosravi (Author)

First, I ran the MLM task and got the pre-trained model. In the token_cls task, after model init (docformer = DocFormer(config)), I used load_from_checkpoint to load the pre-trained model (I had to override on_load_checkpoint to skip copying layers with mismatched sizes, namely the weights and biases of the linear layer). Then I started training the model.
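
Roughly, the override looks like this (a minimal sketch, not the exact code; the class body, config keys, and checkpoint path are illustrative):

```python
import torch
import pytorch_lightning as pl


class DocFormer(pl.LightningModule):
    def __init__(self, config):
        super().__init__()
        # ... encoder omitted; only the shape-mismatched head matters here ...
        self.linear = torch.nn.Linear(config["hidden_size"], config["num_labels"])

    def on_load_checkpoint(self, checkpoint):
        # Drop pre-trained weights whose shape differs from this model's:
        # the MLM head projects to the vocabulary size, while the
        # token-classification head projects to the number of labels.
        state_dict = checkpoint["state_dict"]
        own_state = self.state_dict()
        for key in list(state_dict):
            if key in own_state and state_dict[key].shape != own_state[key].shape:
                print(f"skipping {key}: shape mismatch")
                del state_dict[key]


# strict=False tells Lightning to tolerate the keys deleted above;
# the skipped layers keep their fresh random initialization.
docformer = DocFormer.load_from_checkpoint(
    "mlm_pretrained.ckpt",  # illustrative path
    strict=False,
    config={"hidden_size": 768, "num_labels": 7},
)
```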

@uakarsh (Collaborator) commented Nov 16, 2022

Got it. Thanks; I will update here as soon as possible.

@uakarsh (Collaborator) commented Nov 24, 2022

Hi @HoomanKhosravi, I did try to train DocFormer on the FUNSD dataset from scratch.

Here is the script (Kaggle notebook), and here are the results (Weights & Biases report).

Looking at the results, the model is overfitting the training data: it can reach 100 percent on each of those metrics on train, but only around 30 percent on the validation set (refer to the Kaggle notebook's last section). To get good metrics on the validation set, I think some tricks to prevent overfitting would help; a sketch is below. Were you getting similar results?
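
As a starting point, a hedged sketch of the usual levers in a Lightning setup (the values are illustrative, not tuned, and "val_loss" assumes the validation step logs a metric under that name):

```python
import torch
import pytorch_lightning as pl
from pytorch_lightning.callbacks import EarlyStopping


class DocFormerTokenCls(pl.LightningModule):
    # ... model and step definitions as in the repo's notebooks ...

    def configure_optimizers(self):
        # Weight decay is the simplest first lever against overfitting.
        return torch.optim.AdamW(self.parameters(), lr=5e-5, weight_decay=0.01)


# Early stopping halts training once the monitored validation metric
# plateaus, instead of letting the model memorize the small train set.
trainer = pl.Trainer(
    max_epochs=100,
    callbacks=[EarlyStopping(monitor="val_loss", mode="min", patience=5)],
)
```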

Regards,
Akarsh

@HoomanKhosravi (Author)

@uakarsh Sorry for the late reply. Yes, I get similar results; however, this is without any pre-training. My original issue was with loading the pre-trained checkpoints. Have you tried using the pre-training?

@uakarsh (Collaborator) commented Dec 13, 2022

Hi, nope, I haven't tried to pre-train and then fine-tune it, but I am planning to do so in a few days and then observe the results. I have been able to write scripts for pre-training, so I will get back here as soon as possible.

Regards,
Akarsh

@riteshKumarUMass

Hi,
I am also facing the exact same issue, where the model is overfitting on the training dataset. Did you have any luck?

@uakarsh (Collaborator) commented Dec 19, 2022

I guess pre-training would help. I have managed to prepare a small dataset for pre-training; it is actually the IDL dataset (https://github.com/furkanbiten/idl_data), and in some time I will train and get the results.

@uakarsh (Collaborator) commented Dec 19, 2022

I am looking at it this way: if the model can overfit, the implementation is going in the right direction. The idea of pre-training came up because, in the paper as well, the authors mentioned that pre-training on a large dataset gave them good results. That is why pre-training came to mind.

@uakarsh (Collaborator) commented Dec 19, 2022

Although, I didn't get, when you say "there is no attempt to save the embedding features or visual features"

@riteshKumarUMass

My bad. I misunderstood your earlier comment.
If you see any improvement with the pre-training, would you be able to share the pre-trained weights?

@uakarsh (Collaborator) commented Dec 20, 2022

For sure

@riteshKumarUMass

@uakarsh, any luck with the pre-training?

@uakarsh (Collaborator) commented Dec 24, 2022

Not yet (I have been busy with one of my research projects).

@uakarsh (Collaborator) commented Feb 13, 2023

Hi @riteshKumarUMass @HoomanKhosravi, can you try the fine-tuning again using the pre-trained weights? (I have attached them in the README.)
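
In case it helps, a minimal sketch of loading those weights for fine-tuning (the file name is hypothetical, DocFormer / config are built as in the repo's notebooks, and the released file may be a raw state dict or a full Lightning checkpoint):

```python
import torch

# Hypothetical file name; use whichever file the README links to.
ckpt = torch.load("docformer_pretrained.pth", map_location="cpu")
state_dict = ckpt.get("state_dict", ckpt)  # raw state dict or Lightning checkpoint

model = DocFormer(config)  # model and config built as in the fine-tuning notebooks
own_state = model.state_dict()

# Keep only tensors that exist in this model with a matching shape;
# load_state_dict(strict=False) alone still raises on shape mismatches.
filtered = {k: v for k, v in state_dict.items()
            if k in own_state and v.shape == own_state[k].shape}
missing, unexpected = model.load_state_dict(filtered, strict=False)
print(f"loaded {len(filtered)} tensors; {len(missing)} keys left at random init")
```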
