Using pre-trained models #46
Not sure how to go about it, since I've only tried a few document samples. I think something is saturating the entire training; I'll look into it in a few days.
Thanks for the quick response! OK.
Can you let me know what approach you tried? I think it would be helpful for me as well when conducting the experiment.
First, I ran the MLM task and got the pre-trained model. In the token_cls task, after initializing the model (`docformer = DocFormer(config)`), I used `load_from_checkpoint` to load the pre-trained model (I had to override `on_load_checkpoint` to skip copying layers with mismatched sizes, namely the weights and biases of the final linear layer). Then I started training the model.
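A minimal sketch of that "skip mismatched layers" step, assuming plain PyTorch (model and layer names here are placeholders, not the actual DocFormer code; in a Lightning module this filtering would live inside `on_load_checkpoint(checkpoint)`):

```python
# Sketch of filtering a checkpoint before loading it into a model whose
# task head has a different shape. Placeholder models, not DocFormer itself.
import torch
import torch.nn as nn


def filter_mismatched(model, ckpt_state):
    """Drop checkpoint tensors whose shape disagrees with the current model,
    and back-fill the dropped keys with the model's own (freshly initialized)
    values so a strict load_state_dict still succeeds."""
    model_state = model.state_dict()
    filtered = {
        k: v for k, v in ckpt_state.items()
        if k in model_state and v.shape == model_state[k].shape
    }
    for k, v in model_state.items():
        filtered.setdefault(k, v)  # keep current init for skipped/missing keys
    return filtered


# Toy example: MLM head (vocab-sized) vs. token-classification head (7 labels)
mlm_model = nn.Sequential(nn.Linear(16, 16), nn.Linear(16, 100))
cls_model = nn.Sequential(nn.Linear(16, 16), nn.Linear(16, 7))

state = filter_mismatched(cls_model, mlm_model.state_dict())
cls_model.load_state_dict(state)  # encoder weights copied, head left as-is
```

The back-filling step is what lets the strict (default) `load_state_dict` pass even though the head weights were intentionally skipped.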
Got it. Thanks; I'll post an update here as soon as possible.
Hi @HoomanKhosravi, I did try training DocFormer on the FUNSD dataset from scratch. Here is the script (Kaggle Notebook), and here are the results (Weights & Biases report). Looking at the results, the model is overfitting the training data: it can reach 100 percent on each of those metrics on the train set but only around 30 percent on the validation set (see the Kaggle notebook's last section). To get good validation metrics, I think some tricks to prevent overfitting would help. Were you getting similar results? Regards,
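One standard guard against that kind of overfitting (my suggestion, not something proposed in the thread) is early stopping on the validation loss; the core logic is just a patience counter:

```python
# Minimal early-stopping logic: stop once the best validation loss has not
# improved for `patience` consecutive epochs. Values here are illustrative.
def should_stop(val_losses, patience=3, min_delta=0.0):
    """Return True when the last `patience` epochs brought no improvement."""
    if len(val_losses) <= patience:
        return False
    best_before = min(val_losses[:-patience])
    recent_best = min(val_losses[-patience:])
    return recent_best > best_before - min_delta


# Validation loss flattened out -> stop; still falling -> keep training
should_stop([1.0, 0.9, 0.95, 0.96, 0.97], patience=3)  # True
should_stop([1.0, 0.9, 0.8, 0.7, 0.6], patience=3)     # False
```

In PyTorch Lightning the same behavior comes from the built-in `EarlyStopping` callback (`monitor="val_loss"`, `patience=...`) passed to the `Trainer`, so no hand-rolled loop is needed there.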
@uakarsh, sorry for the late reply. Yes, I get similar results, but without using any pre-training. My original issue was that I couldn't load checkpoints from pre-training. Have you tried pre-training?
Hi, no, I haven't tried pre-training and then fine-tuning, but I am planning to do it in a few days and then observe the results. I have been able to write scripts for pre-training, so I'll report back here as soon as possible. Regards,
Hi,
I think pre-training will help. I have managed to prepare a small pre-training dataset, the IDL dataset (https://github.com/furkanbiten/idl_data), and will shortly train on it and report the results.
My reasoning is this: if the model can overfit, the implementation is probably heading in the right direction. The idea of pre-training came up because the authors also mention in the paper that pre-training on large data gave them good results.
Although I didn't follow what you meant by "there is no attempt to save the embedding features or visual features".
My bad. I misunderstood your earlier comment. |
For sure |
@uakarsh, any luck with the pre-training?
Not yet (I've been busy with one of my research projects).
Hi @riteshKumarUMass @HoomanKhosravi, could you try fine-tuning again using the pre-trained weights? (I have attached them in the README.)
Hello, thank you for the great work.
I used this script to run pre-training for the MLM task: https://github.com/shabie/docformer/blob/master/examples/docformer_pl/3_Pre_training_DocFormer_Task_MLM_Task.ipynb
Afterwards, I used the resulting model in the token-classification task (using `load_from_checkpoint`, which copies all the weights except the linear layer, which has a different shape).
The problem is that no matter how long I run the pre-training, I always get the same metrics in the token-classification task (using that pre-trained model as a starting point).
I even tried the model from the document-classification task as a base for token classification, and I still get exactly the same metrics as with the MLM-pretrained model.
Any suggestions on how to properly use the pre-trained models?
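Identical metrics regardless of the starting checkpoint often mean the pre-trained weights were never actually copied in. A quick sanity check (a hypothetical helper, not part of the repo) is to diff the state dict before and after loading:

```python
# Diff two state dicts to see how many tensors actually changed value.
# A fraction near 0.0 means the checkpoint load was effectively a no-op.
import torch
import torch.nn as nn


def loaded_fraction(before, after):
    """Fraction of same-shaped tensors that differ between two state dicts."""
    comparable = [
        k for k in before
        if k in after and before[k].shape == after[k].shape
    ]
    changed = sum(1 for k in comparable if not torch.equal(before[k], after[k]))
    return changed / max(len(comparable), 1)


# Toy demonstration with a single linear layer
model = nn.Linear(8, 8)
before = {k: v.clone() for k, v in model.state_dict().items()}

# Simulate a load that silently did nothing
after_noop = {k: v.clone() for k, v in model.state_dict().items()}

# Simulate a load that really replaced the weight (but not the bias)
with torch.no_grad():
    model.weight.add_(1.0)
after_real = model.state_dict()
```

Snapshotting the state dict right before calling `load_from_checkpoint` and comparing right after would quickly tell you whether the pre-trained weights are reaching the model at all.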