ROUGE scores calculated using pretrained model is too low #163
Comments
Hi, I'm not sure if this past issue is of any help. I have another issue with summarizing long text. Could you show the command line that you used?
Hi! Thanks for mentioning that issue, it does seem similar to mine. I did manage to fix the ROUGE problem, at least for the extractive case. Right now I'm training the abstractive case again from scratch. I don't know what the problem was, but deleting the GitHub repo and cloning it again fixed it. The command that I'm using is simply the one provided in the README file:

```
python train.py -task abs -mode train -bert_data_path BERT_DATA_PATH -dec_dropout 0.2 -model_path MODEL_PATH -sep_optim true -lr_bert 0.002 -lr_dec 0.2 -save_checkpoint_steps 2000 -batch_size 140 -train_steps 200000 -report_every 50 -accum_count 5 -use_bert_emb true -use_interval true -warmup_steps_bert 20000 -warmup_steps_dec 10000 -max_pos 512 -visible_gpus 0,1,2,3 -log_file ../logs/abs_bert_cnndm
```

I'll update this GitHub issue once I have the results for TransformerAbs and BertAbs.
Thank you for the reply! The command line you gave is for training, right? Thanks a lot!
Ah yeah, I copy-pasted the wrong command, haha. And no, I didn't use the pretrained model; I trained the extractive one from scratch on my own. When I used the pretrained model the performance was also super bad, so I'm going to try and see if training it from scratch helps. Not sure how long the training will take. Right now I'm training BertSumAbs and TransformerAbs; it'll probably be a few more hours. And no, I didn't use the dev branch.
Performance for TransformerAbs isn't as reported (it's around 0.09, 0.002, and 0.08), but BertAbs is alright (0.40, 0.18, 0.37). I'm running TransformerAbs again, just to be sure that I did things properly. Please keep in mind that these models were trained by me; I'm not using the pretrained models provided.

Edit: The pretrained TransformerAbs model provides good performance. Not sure what the problem is exactly with the ones that I trained. I'll close this issue for now and open another one if I find a specific cause.
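When scores come out implausibly low, one useful sanity check is to recompute ROUGE outside the repo's evaluation pipeline on a few candidate/reference pairs. This is a minimal pure-Python ROUGE-N sketch for spot-checking (it is not the Perl/`pyrouge` scorer this repo uses, so the numbers won't match exactly, but near-zero overlap here points to a data problem rather than a metric problem):

```python
from collections import Counter

def rouge_n(candidate, reference, n=1):
    """Compute ROUGE-N precision, recall, and F1 from n-gram overlap."""
    def ngrams(tokens, n):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    overlap = sum((cand & ref).values())  # clipped n-gram matches
    p = overlap / max(sum(cand.values()), 1)
    r = overlap / max(sum(ref.values()), 1)
    f = 2 * p * r / (p + r) if (p + r) else 0.0
    return p, r, f

p, r, f = rouge_n("the cat was sitting on the mat", "the cat sat on the mat")
print(f"ROUGE-1 P={p:.3f} R={r:.3f} F={f:.3f}")
```

If a handful of generated summaries score near zero even with this rough scorer, the likely culprit is a mismatch between candidate and reference files (ordering, tokenization, or empty outputs), not the evaluation script.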
Hi, actually I am also facing this issue. I also encountered low ROUGE values on my own dataset.
No, using my own dataset resulted in extremely low ROUGE results, similar to yours. I had to use data either provided or preprocessed according to the repository. If you don't mind me asking, where did you get your data from? |
I have some legal documents (not a publicly available dataset).
Ah then this is a different case. I'm not that surprised if the model doesn't work for a dataset other than the CNN/DM. What I was saying is that I downloaded a CNN/DM dataset and preprocessed it accordingly, but for some reason the scores were too low, which is a bit strange to me. Just making sure, you trained the model on your documents, right? The problem with legal documents may be that there are too many out-of-vocabulary (OOV) words. A member at the lab I'm at tried to do something similar but the legalese was a bit difficult for conventional models to use. This isn't a problem if you have a lot of legal document data since you can just train your model accordingly, but this usually isn't the case since legal information is often confidential. |
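One quick way to quantify the OOV concern is to measure what fraction of a corpus's tokens fall outside the model's vocabulary. A minimal sketch (the toy vocabulary and the two mock legal sentences below are made up purely for illustration):

```python
from collections import Counter

def oov_rate(documents, vocab):
    """Return the fraction of corpus tokens not covered by `vocab`."""
    counts = Counter(tok for doc in documents for tok in doc.lower().split())
    total = sum(counts.values())
    oov = sum(c for tok, c in counts.items() if tok not in vocab)
    return oov / total if total else 0.0

# Toy example: a tiny "vocabulary" and two mock legal sentences.
vocab = {"the", "court", "finds", "that", "party", "shall", "pay"}
docs = [
    "The court finds that the respondent shall pay restitution",
    "The appellant waives subrogation under the indemnity clause",
]
print(f"OOV rate: {oov_rate(docs, vocab):.2%}")
```

For a BERT-based model the real check would be against the WordPiece vocabulary (where rare legalese gets split into many subword pieces rather than being strictly OOV), but a high rate on a word-level check like this is still a useful warning sign.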
Oh sorry, I misunderstood your question. |
I agree about the OOV problem and the scarcity of legal documents.
You could always try to perform data augmentation on the data that you have. It'll take a lot of time and effort, but if it's something you really want to do it may be your only choice. Especially considering that the data is confidential and you probably can't outsource it. |
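For text, a common low-cost starting point is EDA-style word-level perturbation (random swaps and deletions). A minimal sketch, purely illustrative and not from this repo:

```python
import random

def random_swap(tokens, n_swaps=1, rng=None):
    """Swap two random token positions n_swaps times (EDA-style)."""
    rng = rng or random.Random()
    tokens = list(tokens)
    for _ in range(n_swaps):
        if len(tokens) < 2:
            break
        i, j = rng.sample(range(len(tokens)), 2)
        tokens[i], tokens[j] = tokens[j], tokens[i]
    return tokens

def random_deletion(tokens, p=0.1, rng=None):
    """Drop each token with probability p; never return an empty list."""
    rng = rng or random.Random()
    kept = [t for t in tokens if rng.random() > p]
    return kept or [rng.choice(list(tokens))]

rng = random.Random(0)
sent = "the parties agree to arbitrate all disputes".split()
print(" ".join(random_swap(sent, n_swaps=2, rng=rng)))
print(" ".join(random_deletion(sent, p=0.2, rng=rng)))
```

For summarization specifically, perturbations like these should only be applied to source documents, not to reference summaries, since the references are the evaluation target.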
Thank You, I will look into this :)) |
@seanswyi Hi, when I train the TransformerAbs model I run into the same problem as you. Have you solved it? I used the same data and the same model settings as the paper reports. When I load the checkpoint from model_path/model_step_166000.pt, I get this result:

1 ROUGE-1 Average_R: 0.32458 (95%-conf.int. 0.31799 - 0.33093)
Not sure if anyone else has encountered this problem, but when I download the pretrained model and use it to evaluate the data, the scores that I get are abysmally low. It's something like:
This is weird considering I used the same data and the same model. Does anyone know what some causes might be? I've been trying to get this code to work properly for a while now and would appreciate any tips. Thanks.