Reproduce Evaluation Results #4

@Kosei1227

Description

Hi,

I am trying to reproduce the evaluation results, specifically for:

  • Table 3: Headline generation results for the three baseline models trained in the five data settings: English only (en), Hindi only (hi), Latin transliterated data (latin), Devanagari transliterated data (dvn.), and original script data (all). Only ROUGE-L scores are shown here. (See Appendix A.1 for more details.)
  • Table 13: Zero-shot performance of the best mT5 and Varta-T5 models on XL-Sum headline generation and abstractive summarization.

However, I consistently get slightly worse results than those reported in the paper, even though I have tried to follow the inference and evaluation specifications as stated.
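In case it helps pinpoint the discrepancy, this is roughly how I compute ROUGE-L — a minimal self-contained sketch using plain whitespace tokenization and sentence-level F1 (function names are my own). If your pipeline uses a different tokenizer, stemming, or multi-reference aggregation (e.g. for the Hindi/Devanagari settings), that alone could account for the gap:

```python
# Minimal ROUGE-L (sentence-level F1) sketch, assuming whitespace
# tokenization. A different tokenizer or stemmer in the official
# pipeline could shift scores by a point or more.

def lcs_length(a, b):
    """Length of the longest common subsequence of token lists a and b."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, tok_a in enumerate(a, 1):
        for j, tok_b in enumerate(b, 1):
            if tok_a == tok_b:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[-1][-1]

def rouge_l_f1(hypothesis, reference):
    """ROUGE-L F1 between one hypothesis and one reference string."""
    hyp, ref = hypothesis.split(), reference.split()
    lcs = lcs_length(hyp, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(hyp)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)
```

Knowing whether the reported numbers use this kind of plain LCS-based F1 or a library-specific variant (e.g. with stemming or language-specific tokenization) would already help a lot.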

Could you please release the code pipeline used to produce the results in the paper?

Thank you!
