
Some problems while fine-tuning on Paraphrase Dataset #6

Closed
Ricardokevins opened this issue Dec 3, 2021 · 9 comments

Comments

@Ricardokevins

Sorry to trouble you, but I am interested in the paraphrase fine-tuning part.
I used the concise code you provide in train/bart.py.
However, when I run the command "python bart.py", I encounter an OOM error (on a single V100-32GB).
I reread your paper and noticed that you even use batch size 20 and fine-tune for one epoch within an hour (impressive speed~).

I also checked train/args.py in this repo: the batch size is 8 and the max epoch is 3, which is not consistent with the setting in the paper (batch size = 20, epochs = 1).

Could you release the settings/code you used for fine-tuning? I want to find out what causes the OOM (maybe the larger max_length, or the larger batch size?). Currently I have cut TrainBatchSize to 4 (and EvalBatchSize to 2), and the model fine-tunes very, very slowly...

Or could you give me some advice on reproducing? I am not sure whether a lower batch size or a shorter sequence length can reach the same precision as the paper's results.

Thank you very much; any suggestions are greatly appreciated.
Sorry to trouble you QAQ

@yyy-Apple
Collaborator

Hi~
Thanks for your interest.

Actually, we didn't use the training script we provide here, since we only have 4 small GPUs (11G each). So we used model parallelism to shard the model across 4 GPUs during training. I think one V100-32G is enough to train BART with a small micro batch size (maybe 1 or 2); you can set the gradient accumulation steps to adjust the effective batch size (= gradient accumulation steps * micro batch size).

For reproducing, you should mostly follow the settings in our paper. Also, we only fine-tuned on a subset of ParaBank, which contains 30,000 examples.
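
For reference, a minimal sketch of the gradient-accumulation idea described above, assuming the Hugging Face BartForConditionalGeneration API; the checkpoint name, learning rate, and train_loader are placeholders, not the repo's actual settings:

import torch
from transformers import BartForConditionalGeneration, BartTokenizer

model = BartForConditionalGeneration.from_pretrained('facebook/bart-large-cnn')  # placeholder base checkpoint
tokenizer = BartTokenizer.from_pretrained('facebook/bart-large-cnn')
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)

accum_steps = 10  # 10 accumulation steps * micro batch of 2 = effective batch size 20
model.train()
for step, (src_texts, tgt_texts) in enumerate(train_loader):  # train_loader: assumed iterator over paraphrase pairs
    batch = tokenizer(src_texts, padding=True, truncation=True, return_tensors='pt')
    labels = tokenizer(tgt_texts, padding=True, truncation=True, return_tensors='pt').input_ids
    labels[labels == tokenizer.pad_token_id] = -100   # ignore padding positions in the loss
    loss = model(**batch, labels=labels).loss
    (loss / accum_steps).backward()                   # scale so gradients average over the effective batch
    if (step + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()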

@Ricardokevins
Author

Thank you for your reply~
I will try more settings to get the result :D

@ZacharyChenpk

Hello!
I'm interested in the fine-tuning part of your work too. Where can I get the data files (they seem to be data/parabank2.json and data/eval.json) to reproduce the results in the paper?

@Ricardokevins
Author

Hi~ after such a long time, did you successfully reproduce the results in the paper?

@ZacharyChenpk

I have reproduced the evaluation results of the released trained model, as analysis.ipynb does. But I cannot get access to the training data files, let alone reproduce the training process :-(

@Ricardokevins
Author

Ricardokevins commented Feb 22, 2022

Same situation as you.
Did you try fine-tuning the model on your own data, and did you observe an improved result?

@yyy-Apple
Collaborator

Sorry for not paying attention to this closed issue. We have added our training script inside the train folder, as well as instructions for preparing the data.

@Ricardokevins
Author

Ricardokevins commented Mar 3, 2022

Thanks a lot!
I tried the code and training scripts.
I noticed that, because of the limited GPUs, you define a ShardedBART and place the model on different GPUs.
That leads to an error in load_state_dict in bart_score.py
(the normal loading path loads a BartForConditionalGeneration, while the fine-tuning script saves the whole ShardedBART).

I tried to fix this error with the following code,
but I encounter another error: "decode_attention_mask is None".

# In bart_score.py: load the ShardedBART wrapper instead of the plain BartForConditionalGeneration.
import torch
from bart_utils import ShardedBART

self.bart = ShardedBART(self.checkpoint)
self.bart.load_state_dict(torch.load('xxxxxxxxx/bart_3000.pth', map_location=self.device))
self.model = self.bart

Is there a convenient and feasible way to use the fine-tuned model?
(The fine-tuning script runs smoothly, and thanks for open-sourcing it!)

@Ricardokevins Ricardokevins reopened this Mar 3, 2022
@Ricardokevins
Author

Ricardokevins commented Mar 3, 2022

Well, I think you should probably modify save_model in bart.py.

I modified the code and solved the problem.

Previous

def save_model(self, path):
    torch.save(self.bart.state_dict(), path)
    print(f'Model saved in {path}.')

def load_model(self, path):
    self.bart.load_state_dict(torch.load(path, map_location=self.device))
    print(f'Model {path} loaded.')

After

# Save and load only the inner BART model, so the checkpoint matches what
# the standard loading path in bart_score.py expects.
def save_model(self, path):
    torch.save(self.bart.model.state_dict(), path)
    print(f'Model saved in {path}.')

def load_model(self, path):
    self.bart.model.load_state_dict(torch.load(path, map_location=self.device))
    print(f'Model {path} loaded.')
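
A minimal sketch of loading a checkpoint re-saved this way back into a plain BartForConditionalGeneration, assuming the inner model attribute is the standard Hugging Face model; the base checkpoint name and file path are placeholders:

import torch
from transformers import BartForConditionalGeneration

base_checkpoint = 'facebook/bart-large-cnn'  # placeholder: the base checkpoint fine-tuning started from
model = BartForConditionalGeneration.from_pretrained(base_checkpoint)
state_dict = torch.load('path/to/bart_3000.pth', map_location='cpu')  # placeholder path to the re-saved weights
model.load_state_dict(state_dict)
model.eval()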
