How to use fine-tuned BART for prediction? #3853
Comments
Facing a similar type of issue for T5. @sshleifer
The last ckpt file should be loaded into a LightningModule. There is a bug on master that messes up the loading, but it's fixed in #3866. To use that code immediately, you can run from that branch; then your same command should work. Would love to know if that works! cc @ethanjperez
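For anyone following along, the loading step being described looks roughly like this (a sketch: SummarizationTrainer is the LightningModule defined in examples/summarization/bart/finetune.py, and the checkpoint path is illustrative):

```python
# Sketch: load the last Lightning checkpoint back into the training module.
from finetune import SummarizationTrainer  # examples/summarization/bart/finetune.py

model = SummarizationTrainer.load_from_checkpoint("bart_sum/checkpointepoch=2.ckpt")
model.eval()  # disable dropout before generating predictions
```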
Change is on master, let me know if this solves the problem!
config.json is still not generated during training.
```python
import json
import os

import pytorch_lightning as pl


def log_hyperparams(model: pl.LightningModule):
    # Save the model config so the fine-tuned weights can later be reloaded
    # with from_pretrained().
    model.config.save_pretrained(model.hparams.output_dir)
    # Open for writing ("w"); json.dump fails on a file opened for reading.
    with open(os.path.join(model.hparams.output_dir, "hparam.json"), "w") as f:
        json.dump(vars(model.hparams), f)  # hparams is a Namespace; vars() makes it serializable
```

You can call this somewhere in your code, if that's helpful.
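For example, a plausible call site (an assumption, not from the original comment) is right after training completes:

```python
# Hypothetical usage: 'trainer' and 'model' as assembled in transformer_base.py.
trainer.fit(model)
log_hyperparams(model)  # writes config.json and hparam.json to output_dir
```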
@sshleifer, thank you - I can run ./run_train.sh with the --predict() option successfully. Regarding my original question, could you please specify how to load the checkpoint into the LightningModule? After inspecting transformer_base.py, I think hparams is equivalent to the arguments provided in run_train.sh, so a separate hparams.json file does not need to be generated. Please correct me if I'm wrong. I am receiving the following error with my current code:
I've been using the following code, based on the discussion in Lightning-AI/pytorch-lightning#525 and https://pytorch-lightning.readthedocs.io/en/latest/weights_loading.html:
Seems close to correct. transformers/examples/summarization/bart/finetune.py, lines 164 to 175 (at commit 7d40901), is how we do it, @riacheruvu.
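For readers without the repo open, that region of finetune.py implemented the test step roughly as follows (a paraphrase from memory, not a verbatim copy; the exact generation arguments may differ at that commit):

```python
# Rough paraphrase of finetune.py's test step: generate, then decode to text.
def test_step(self, batch, batch_idx):
    generated_ids = self.model.generate(
        input_ids=batch["source_ids"],
        attention_mask=batch["source_mask"],
        num_beams=1,
        max_length=80,
        early_stopping=True,
    )
    preds = [
        self.tokenizer.decode(g, skip_special_tokens=True, clean_up_tokenization_spaces=True)
        for g in generated_ids
    ]
    return {"preds": preds}
```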
@sshleifer, thank you. I've got to the point where I can load the model and generate outputs using the forward() function, but I can't decode the outputs - using tokenizer.decode() results in an error. Should I be using model.generate() instead of model.forward()? If so, it seems SummarizationTrainer does not support model.generate()? Revised code:
The error I'm encountering:
I found a solution. The model.generate() function is necessary to extract the predictions. I defined a separate function in the SummarizationTrainer class to use self.model.generate(), and was able to use tokenizer.decode() on the outputs. I was encountering issues when using self.tokenizer, so I assume using the 'bart-large-cnn' tokenizer for similar custom summarization datasets is okay. @prabalbansal, I'm not sure if the same method will apply to T5, but it could work for predicting a single instance, per one of your questions. My code is below:
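(The snippet referred to above would be along these lines - a sketch of a generate-and-decode helper, with names and generation parameters as illustrative assumptions; "facebook/bart-large-cnn" is the current identifier for the 'bart-large-cnn' tokenizer:)

```python
from transformers import BartTokenizer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-large-cnn")

def generate_summary(module, text, num_beams=4, max_length=142):
    """Summarize one document with a fine-tuned SummarizationTrainer ('module')."""
    batch = tokenizer([text], return_tensors="pt", max_length=1024, truncation=True)
    # Call generate() on the wrapped BART model, not on the LightningModule itself.
    summary_ids = module.model.generate(
        batch["input_ids"],
        attention_mask=batch["attention_mask"],
        num_beams=num_beams,
        max_length=max_length,
        early_stopping=True,
    )
    return tokenizer.decode(summary_ids[0], skip_special_tokens=True)
```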
Thank you for the help, @sshleifer!
@riacheruvu Thank you. It works for T5 also.
I followed the steps given in this thread and am still facing an issue when I try to use my fine-tuned model for prediction:

OSError: Can't load '/home/bart/bart_1/checkpointepoch=3.ckpt'. Make sure that:
- '/home/bart/bart_1/checkpointepoch=3.ckpt' is a correct model identifier listed on 'https://huggingface.co/models'
- or '/home/bart/bart_1/checkpointepoch=3.ckpt' is the correct path to a directory containing a config.json file
@sangeethabal15, with my model, files were only generated up to the 2nd epoch. Just to confirm, do you have a checkpointepoch=3.ckpt file? Are you using the load_from_checkpoint() function?
@riacheruvu Yes, I do have a checkpointepoch=3.ckpt file - I set my own number of epochs instead of the default 3. And yes, I am using the load_from_checkpoint() function.
Ok. Could you share your code here, @sangeethabal15? It might be easier to help debug.
@riacheruvu This is my modified code:
Thank you, @sangeethabal15. From the error message you posted earlier, it seems load_from_checkpoint() is expecting a config.json file in the specified directory. I have a few more debug questions:
@sangeethabal15 Could you add this at the end of transformer_base.py? It works for me.
@prabalbansal this is for when I am training my model. Since I have already fine-tuned my model, is there a workaround at test time, when I am trying to predict my outputs?
@riacheruvu I am currently working on a text summarization problem, and I have collected a small dataset of my own. Implementing BART is very easy, and I can generate a great summary, but I want to know how to train the BART model on my own custom dataset. Can you please help me with this? I have browsed the internet, but I cannot find any helpful resources, as BART is relatively new compared to other transfer-learning models.
@murugeshmanthiramoorthi you can just use run_train.sh in the bart folder, where you pass in your parameters to run the finetune.py file.
@sangeethabal15 Thank you so much for your reply, ma'am. I am completely new to transfer learning and can't quite follow what you mean. Could you kindly explain in more detail, or share a resource I can follow up with?
@riacheruvu Thank you so much for your help. But when I proceeded with those steps, I get an error (Traceback (most recent call last): ...). Do you have any idea how to solve this issue?
@murugeshmanthiramoorthi Follow the steps below and you should be able to run your code. Important: to run the latest versions of the examples, you have to install from source and install some example-specific requirements. Execute the following steps in a new virtual environment, per the README: `git clone https://github.com/huggingface/transformers`, `cd transformers`, `pip install .`, then `pip install -r ./examples/requirements.txt`. You can find these steps in the readme section of https://github.com/huggingface/transformers/tree/cbbb3c43c55d2d93a156fc80bd12f31ecbac8520/examples
@murugeshmanthiramoorthi, I agree with @sangeethabal15; I followed the same steps as well. After installing the dependencies, the code should run without errors about transformer_base - I believe the following line in run_train.sh ensures that:
`export PYTHONPATH="../../":"${PYTHONPATH}"`
@sshleifer @riacheruvu I keep running into an error every time I change the beam size or define min_length, skip_ngram, or length_penalty at decoding time. Here is a snippet of the error:

The function where I have defined all of this:

Any idea how to go about this?
@sangeethabal15, I have two ideas. Try explicitly setting use_cache=True in the generate() function to see if it resolves the error. If that does not work, could you try specifying the attention_mask parameter? I'm looking at modeling_utils.py and modeling_bart.py, and I think these are the two parameters linked to this issue. Edit: It also seems evaluate_cnn.py demonstrates a similar configuration for the generate() function, although the parameters are slightly different. If the two ideas above don't work, you could try specifying those parameters to confirm it's not an issue with the chosen values.
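Concretely, the suggestion amounts to a call like this (input_ids and attention_mask come from the tokenizer; the values shown are only illustrative, CNN/DailyMail-style settings):

```python
# Illustrative generate() call combining both ideas: an explicit attention_mask
# and an explicit use_cache flag. All parameter values are examples.
generated_ids = model.generate(
    input_ids,
    attention_mask=attention_mask,
    use_cache=True,
    num_beams=4,
    min_length=56,
    max_length=142,
    no_repeat_ngram_size=3,
    length_penalty=2.0,
    early_stopping=True,
)
```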
Thank you so much @sangeethabal15 @riacheruvu, I got it. Thanks a ton for your help.
@sshleifer When I use the exact same parameters as in the evaluate_cnn.py code, I still get the exact same error as below. There seems to be an issue even with the values specified for these parameters in evaluate_cnn.py. @riacheruvu I have tried the parameters you specified - same issue.
Try passing use_cache=True.
@sshleifer use_cache is already set to True by default in modeling_utils.py. But when I specify the parameter in my function and run the code, it throws the following error:
This isn't enough information for me to diagnose. My guess, with the limited info I have, is that you didn't run pip install -e . What does pip freeze | grep transformers output?
@sshleifer I did run pip install -e . Here is the output of pip freeze | grep transformers:
Ok, the output should look like an editable install pointing at your local clone (something like an -e git+https://github.com/huggingface/transformers... line). Try running git pull and then pip install -e . again. You should probably also upgrade pip, though that shouldn't matter much.
Hello @riacheruvu, do you get <extra_id_0> in your generation output?
@ArijRB, hi - I don't remember seeing that in the output of the model.
@ArijRB I'm also getting <extra_id_0> in my generated output.
@riacheruvu How did you load the model in the line model.load_from_checkpoint(checkpoints[-1]) of the code you posted?
Is 'model' an instance of pl.LightningModule? I still get the error message that you got in the previous post.
@claudiatin, model should be defined as an instance of the SummarizationTrainer class. You will need the following code, which is already under main() in finetune.py:
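(Paraphrased, not verbatim, the relevant part of main() looked roughly like this:)

```python
import glob
import os

model = SummarizationTrainer(args)    # args: the argparse namespace from run_train.sh
trainer = generic_train(model, args)  # generic_train comes from transformer_base.py

if args.do_predict:
    # Pick up the newest Lightning checkpoint written during training...
    checkpoints = sorted(glob.glob(os.path.join(args.output_dir, "checkpointepoch=*.ckpt")))
    # ...and load it back into the same LightningModule class before testing.
    model = model.load_from_checkpoint(checkpoints[-1])
    trainer.test(model)
```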
I am wondering if there is an easier way to go about generating the predictions, though. I've tried calling SummarizationTrainer from another Python file so I could separate my prediction and training code, but ran into some issues, so I needed to stick with using a modified version of finetune.py running inside a clone of the repo. If anyone finds an easier way of accomplishing this, or if the HuggingFace team can build this functionality in, that would be great.
@riacheruvu Thank you so much for your answer. I did the same as you, and then saved the .bin file and config.json so I can use BartForConditionalGeneration.from_pretrained(). I don't know if it is the best way, actually.
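In code, that export step presumably looked something like this (a sketch; the checkpoint filename and output path are assumptions):

```python
# Load the Lightning checkpoint, then save the wrapped BART model in the
# standard Hugging Face format (pytorch_model.bin + config.json).
model = SummarizationTrainer.load_from_checkpoint("bart_sum/checkpointepoch=2.ckpt")
model.model.save_pretrained("./fine_tuned_bart")
```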
@claudiatin, thank you! Edit: Please ignore my previous response to your newest reply. I just went through the code again, and I was wrong about the inputs to the from_pretrained() function. I apologize for that. I'll try using the code block you provided!
I tried applying the code provided for T5 (I haven't tried it with BART, but I think it'll work per @claudiatin's response). I am including the results here for documentation, in case anyone knows the solution:
I run into the error:
I've tried importing T5WithLMHeadModel using from transformers import T5WithLMHeadModel and encounter an import error.
@riacheruvu, don't worry about the previous answer. For the sake of completeness, 'bart_sum' is just the default name of the folder where the checkpoints are saved (set by the output_dir line in the example script).
In another notebook:
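(The notebook code would be along these lines, assuming the export above; the paths are assumptions:)

```python
from transformers import BartForConditionalGeneration, BartTokenizer, pipeline

# Reload the exported fine-tuned model and wrap it in a summarization pipeline.
model = BartForConditionalGeneration.from_pretrained("./fine_tuned_bart")
tokenizer = BartTokenizer.from_pretrained("facebook/bart-large-cnn")
summarizer = pipeline("summarization", model=model, tokenizer=tokenizer)

print(summarizer("Long input document goes here...", min_length=10, max_length=60))
```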
The code works, but the performance is not good. I think this is because of my dataset. :)
Thank you, @claudiatin, and thank you for sharing your code!
@claudiatin thanks for providing your code. I was able to load a finetuned version of facebook/bart-large-cnn into a pipeline, originally in a far hackier way, as well as via your method. The problem I'm running into, which it sounds like maybe you were as well, is that the predictions from the pipeline after finetuning come out as pure gibberish, so something is being lost in translation. Example below:
I used the finetune.py script on the cnn tiny dataset, via the tiny version of the bash script in the examples folder. I even attempted the finetuning with a nearly-zero (1e-10) learning rate, so that I knew I wasn't significantly changing the model; this still led to gibberish predictions. I also tried a version where I loaded the pretrained model into the pipeline, saved it using pipeline.model.save_pretrained("path/to/dir"), and, in a new session, reloaded it using the second portion of the code provided by @claudiatin. This worked correctly on predictions; however, I did notice a significant change in inference time on the same article I tested (~3 seconds vs ~20 seconds). The only difference I could see between the config.json and pytorch_model.bin that came out of save_pretrained() and the finetune.py checkpoint is that the save_pretrained() config.json contains an added key:value entry. @sshleifer, any ideas?
@gmlander, yes, I have the same gibberish issue, and it's not clear to me how to solve it. It would be nice to know what is going wrong.
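One way to narrow this down - since a near-zero learning rate should leave the weights essentially untouched - is to compare a few tensors between the original and the reloaded model; large differences would mean the fine-tuned weights were never actually loaded (a sketch, with a hypothetical path):

```python
import torch
from transformers import BartForConditionalGeneration

original = BartForConditionalGeneration.from_pretrained("facebook/bart-large-cnn")
reloaded = BartForConditionalGeneration.from_pretrained("./fine_tuned_bart")  # hypothetical path

orig_params = dict(original.named_parameters())
for name, param in list(reloaded.named_parameters())[:5]:
    # After a near-zero-LR run these should match almost exactly; random-looking
    # differences point to the checkpoint weights not being loaded at all.
    print(name, torch.allclose(param, orig_params[name], atol=1e-4))
```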
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
Hi @riacheruvu, I am facing a similar issue while tokenizing a piece of text in the QAGS repo. Line 133 in https://github.com/W4ngatang/qags/blob/master/qg_utils.py gives me the same error, which is due to:
Hi @mriganktiwari - in my case, I needed to use model.generate() together with tokenizer.decode(), as described earlier in this thread. You could consider first trying that approach and checking whether the same error appears.
❓ Questions & Help
Details
I fine-tuned the BART model on a custom summarization dataset using the transformers/examples/summarization/bart/finetune.py and transformers/examples/summarization/bart/run_train.sh files in the repository for training (which generated three checkpointepoch=*.ckpt files) and prediction (which generated a .txt file with the test loss scores).
I have two questions on using this model for prediction:
How can I modify finetune.py to generate predictions for the test set, in addition to the loss scores? I see some test functions in finetune.py, but I'm not sure how to use these for generating a .txt file with the predictions.
How can I load the generated .ckpt files into BartForConditionalGeneration()? A config.json file was not generated along with the checkpoint files; there doesn't seem to be a TFBartForConditionalGeneration; and the convert_tf_checkpoint_to_pytorch.py script in the repo doesn't seem to support BART yet.
Thank you for your time!
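(For readers landing on this thread: the discussion above converges on roughly the following recipe for the second question - a sketch, where the "model." key prefix is an assumption about how the LightningModule names its inner BART model:)

```python
import torch
from transformers import BartForConditionalGeneration

# Convert a Lightning .ckpt into a directory that from_pretrained() can load.
ckpt = torch.load("bart_sum/checkpointepoch=2.ckpt", map_location="cpu")
state_dict = {
    k[len("model."):]: v for k, v in ckpt["state_dict"].items() if k.startswith("model.")
}

model = BartForConditionalGeneration.from_pretrained("facebook/bart-large-cnn", state_dict=state_dict)
model.save_pretrained("./fine_tuned_bart")  # writes pytorch_model.bin + config.json
```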