Using pre-trained models #46

Open · HoomanKhosravi opened this issue Nov 16, 2022 · 17 comments

@HoomanKhosravi

Hello, thank you for the great work.
I used this script to run pre-training for the MLM task: https://github.com/shabie/docformer/blob/master/examples/docformer_pl/3_Pre_training_DocFormer_Task_MLM_Task.ipynb
Afterwards, I used the resulting model in the token-classification task (using load_from_checkpoint, which copies all the weights except the linear layer, which has a different shape).

The problem is that no matter how long I run the pre-training, I always get the same metrics on the token-classification task (using that pre-trained model as a starting point).

I even tried the model from the document-classification task as a base for token classification, and I still get the exact same metrics as I was getting with the MLM-pretrained model.

Any suggestions on how to properly use the pre-trained models?

@uakarsh (Collaborator) commented Nov 16, 2022

Not sure how to go about it, because I tried with just a few sample documents. I think there is something that saturates the entire training; I will look into it in a few days.

@HoomanKhosravi (Author)

Thanks for the quick response! OK.

@uakarsh (Collaborator) commented Nov 16, 2022

Can you let me know what approach you tried? I guess it would be helpful for me as well to conduct the experiment.

@HoomanKhosravi (Author)

First, I ran the MLM task and got the pre-trained model. In the token_cls task, after model init (docformer = DocFormer(config)), I used load_from_checkpoint to load the pre-trained model (I had to override on_load_checkpoint to skip copying layers with mismatched sizes, namely the weights and biases of the linear layer). Then I started training the model.
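
Roughly, the override looks like this (a minimal sketch, not the exact code; the class body, config keys, and checkpoint path are illustrative):

```python
import torch
import pytorch_lightning as pl


class DocFormer(pl.LightningModule):
    def __init__(self, config):
        super().__init__()
        # ... encoder omitted; only the shape-mismatched head matters here ...
        self.linear = torch.nn.Linear(config["hidden_size"], config["num_labels"])

    def on_load_checkpoint(self, checkpoint):
        # Drop pre-trained weights whose shape differs from this model's:
        # the MLM head projects to the vocabulary size, while the
        # token-classification head projects to the number of labels.
        state_dict = checkpoint["state_dict"]
        own_state = self.state_dict()
        for key in list(state_dict):
            if key in own_state and state_dict[key].shape != own_state[key].shape:
                print(f"skipping {key}: shape mismatch")
                del state_dict[key]


# strict=False tells Lightning to tolerate the keys deleted above;
# the skipped layers keep their fresh random initialization.
docformer = DocFormer.load_from_checkpoint(
    "mlm_pretrained.ckpt",  # illustrative path
    strict=False,
    config={"hidden_size": 768, "num_labels": 7},
)
```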

@uakarsh (Collaborator) commented Nov 16, 2022

Got it. Thanks; I will update here as soon as possible.

@uakarsh (Collaborator) commented Nov 24, 2022

Hi @HoomanKhosravi, I did try to train DocFormer on the FUNSD dataset from scratch.

Here is the script (Kaggle notebook), and here are the results (Weights & Biases report).

Looking at the results, the model is overfitting the training data: it can reach 100 percent on each of those metrics on train, but only around 30 percent on the validation set (refer to the Kaggle notebook's last section). To get good metrics on the validation set, I think some tricks to prevent overfitting would help; a sketch is below. Were you getting similar results?
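
As a starting point, a hedged sketch of the usual levers in a Lightning setup (the values are illustrative, not tuned, and "val_loss" assumes the validation step logs a metric under that name):

```python
import torch
import pytorch_lightning as pl
from pytorch_lightning.callbacks import EarlyStopping


class DocFormerTokenCls(pl.LightningModule):
    # ... model and step definitions as in the repo's notebooks ...

    def configure_optimizers(self):
        # Weight decay is the simplest first lever against overfitting.
        return torch.optim.AdamW(self.parameters(), lr=5e-5, weight_decay=0.01)


# Early stopping halts training once the monitored validation metric
# plateaus, instead of letting the model memorize the small train set.
trainer = pl.Trainer(
    max_epochs=100,
    callbacks=[EarlyStopping(monitor="val_loss", mode="min", patience=5)],
)
```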

Regards,
Akarsh

@HoomanKhosravi (Author)

@uakarsh Sorry for the late reply. Yes, I get similar results; however, this is without any pre-training. My original issue was with loading the pre-trained checkpoints. Have you tried using the pre-training?

@uakarsh (Collaborator) commented Dec 13, 2022

Hi, nope, I haven't tried to pre-train and then fine-tune it, but I am planning to do so in a few days and then observe the results. I have been able to write scripts for pre-training, so I will get back here as soon as possible.

Regards,
Akarsh

@riteshKumarUMass

Hi,
I am also facing the exact same issue, where the model is overfitting on the training dataset. Did you have any luck?

@uakarsh (Collaborator) commented Dec 19, 2022

I guess pre-training would help. I have managed to prepare a small dataset for pre-training; it is actually the IDL dataset (https://github.com/furkanbiten/idl_data), and in some time I will train and get the results.

@uakarsh (Collaborator) commented Dec 19, 2022

I am looking at it this way: if the model can overfit, the implementation is going in the right direction. The idea of pre-training came up because, in the paper as well, the authors mentioned that pre-training on a large dataset gave them good results. That is why pre-training came to mind.

@uakarsh (Collaborator) commented Dec 19, 2022

Although, I didn't get, when you say "there is no attempt to save the embedding features or visual features"

@riteshKumarUMass

My bad. I misunderstood your earlier comment.
If you see any improvement with the pre-training, would you be able to share the pre-trained weights?

@uakarsh (Collaborator) commented Dec 20, 2022

For sure

@riteshKumarUMass

@uakarsh, any luck with the pre-training?

@uakarsh (Collaborator) commented Dec 24, 2022

Not yet (I have been busy with one of my research projects).

@uakarsh (Collaborator) commented Feb 13, 2023

Hi @riteshKumarUMass @HoomanKhosravi, can you try the fine-tuning again using the pre-trained weights? (I have attached them in the README.)
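
In case it helps, a minimal sketch of loading those weights for fine-tuning (the file name is hypothetical, DocFormer / config are built as in the repo's notebooks, and the released file may be a raw state dict or a full Lightning checkpoint):

```python
import torch

# Hypothetical file name; use whichever file the README links to.
ckpt = torch.load("docformer_pretrained.pth", map_location="cpu")
state_dict = ckpt.get("state_dict", ckpt)  # raw state dict or Lightning checkpoint

model = DocFormer(config)  # model and config built as in the fine-tuning notebooks
own_state = model.state_dict()

# Keep only tensors that exist in this model with a matching shape;
# load_state_dict(strict=False) alone still raises on shape mismatches.
filtered = {k: v for k, v in state_dict.items()
            if k in own_state and v.shape == own_state[k].shape}
missing, unexpected = model.load_state_dict(filtered, strict=False)
print(f"loaded {len(filtered)} tensors; {len(missing)} keys left at random init")
```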
