Some problem while fine-tuning on Paraphrase Dataset #6
Comments
Hi~ Actually, we didn't use the training script we provide here, since we only had 4 small GPUs (11 GB each), so we used model parallelism to shard the model across the 4 GPUs during training. I think one V100-32G is enough to train BART with a small micro batchsize (maybe 1 or 2); you can set the gradient accumulation steps to adjust the effective batchsize (= gradient accumulation steps × micro batchsize). For reproducing, you should mostly follow the settings in our paper. Also, note that we only fine-tuned on a subset of ParaBank containing 30,000 examples.
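The accumulation arithmetic above (effective batchsize = accumulation steps × micro batchsize) can be sketched as below. This is a minimal PyTorch sketch with a toy model standing in for BART, not the repo's actual training loop:

```python
import torch
import torch.nn as nn

# Toy model in place of BART; the accumulation pattern is the same.
model = nn.Linear(8, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

micro_batchsize = 2   # what fits on the GPU
accum_steps = 10      # effective batchsize = 2 * 10 = 20
data = [torch.randn(micro_batchsize, 8) for _ in range(accum_steps)]

optimizer.zero_grad()
updates = 0
for step, x in enumerate(data, start=1):
    loss = model(x).pow(2).mean()
    # Scale the loss so gradients average over the accumulated micro-batches.
    (loss / accum_steps).backward()
    if step % accum_steps == 0:
        # One optimizer update per effective batch of 20.
        optimizer.step()
        optimizer.zero_grad()
        updates += 1
```

With this pattern, lowering the micro batchsize trades speed for memory while keeping the effective batchsize (and thus the optimization behavior) roughly unchanged.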
Thank you for your reply~
Hello!
Hi~ After such a long time, did you successfully reproduce the results in the paper?
I have reproduced the evaluation results of the released trained model, like what analysis.ipynb does. But I cannot get access to the training data files, let alone reproduce the training process :-(
Same situation as you.
Sorry for not paying attention to this closed issue. We have added our training script inside the |
Thanks a lot! I tried to fix this error with the following code ''' Is there any convenient and feasible way to use the fine-tuned model?
Well, I think you should modify save_model in bart.py. I modified the code and solved the problem.
Previous:
After:
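The Previous/After snippets aren't visible in the thread, so the following is only a hypothetical reconstruction of the kind of change described: a `save_model` that pickles the whole module (tying the checkpoint to the exact class layout and device placement) replaced by one that saves the `state_dict`. The unwrapping of `DataParallel` and the `load_model` helper are assumptions, not the repo's actual code:

```python
import torch
import torch.nn as nn

def save_model(model: nn.Module, path: str) -> None:
    # Unwrap DataParallel/DistributedDataParallel if present, then save only
    # the parameters rather than the whole pickled module object.
    to_save = model.module if hasattr(model, "module") else model
    torch.save(to_save.state_dict(), path)

def load_model(model: nn.Module, path: str) -> nn.Module:
    # Load onto CPU first so checkpoints trained on GPU stay portable.
    model.load_state_dict(torch.load(path, map_location="cpu"))
    return model
```

A state_dict checkpoint can be loaded into any freshly constructed model of the same architecture, which tends to avoid the loading errors described above.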
I'm sorry to trouble you; I am interested in the paraphrase fine-tuning part.
I used the concise code you provide in train/bart.py.
However, when I ran the command "python bart.py", I encountered an OOM problem (on a single V100-32GB).
I reread your paper and noticed that you could even use batchsize=20 to fine-tune for one epoch within an hour (impressive speed~).
I also checked train/args.py in this repo: the batchsize is 8 and max-epoch is 3, which is not consistent with the setting in the paper (batchsize=20, epoch=1).
Can you release the settings/code you used for fine-tuning? I want to check what causes the OOM (maybe the larger max_length? or the larger batchsize?). Currently I have cut TrainBatchSize to 4 (EvalBatchSize to 2) and the model fine-tunes very, very slowly....
Or can you give me some reproducing advice? I am not sure whether a lower batchsize or a shorter sequence length can achieve the same precision as the paper's results.
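For the sequence-length side of the memory question above, truncating inputs to a shorter max_length is a cheap way to cut activation memory, since attention cost grows with sequence length. A minimal sketch on raw token-id lists; the helper name, the `eos_id` argument, and keeping room for a trailing EOS token are all assumptions for illustration, not this repo's API:

```python
def truncate_ids(token_ids, max_length, eos_id):
    """Cut a token-id sequence down to max_length, preserving a final EOS."""
    if len(token_ids) <= max_length:
        return token_ids
    # Keep max_length - 1 content tokens and re-append the EOS the model expects.
    return token_ids[: max_length - 1] + [eos_id]
```

Whether a shorter max_length hurts paraphrase quality depends on how many ParaBank sentences actually exceed it, so checking the length distribution of the training subset first would be worthwhile.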
Thank you very much; any suggestions are greatly appreciated.
Sorry to trouble you QAQ