Reproduce Evaluation Results #4

@Kosei1227

Description

Hi,

I am trying to reproduce the evaluation results, specifically for:

  • Table 3: Headline generation results for the three baseline models trained in the five data settings: English only (en), Hindi only (hi), Latin transliterated data (latin), Devanagari transliterated data (dvn.), and original script data (all). Only ROUGE-L scores are shown here. (See Appendix A.1 for more details.)
  • Table 13: Zero-shot performance of the best mT5 and Varta-T5 models on XL-Sum headline generation and abstractive summarization.

However, I consistently get slightly worse results than those reported in the paper, even though I have tried to follow the inference and evaluation specifications as stated.
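In case it helps pinpoint the discrepancy, this is roughly how I compute ROUGE-L — a minimal self-contained sketch using plain whitespace tokenization and sentence-level F1 (function names are my own). If your pipeline uses a different tokenizer, stemming, or multi-reference aggregation (e.g. for the Hindi/Devanagari settings), that alone could account for the gap:

```python
# Minimal ROUGE-L (sentence-level F1) sketch, assuming whitespace
# tokenization. A different tokenizer or stemmer in the official
# pipeline could shift scores by a point or more.

def lcs_length(a, b):
    """Length of the longest common subsequence of token lists a and b."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, tok_a in enumerate(a, 1):
        for j, tok_b in enumerate(b, 1):
            if tok_a == tok_b:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[-1][-1]

def rouge_l_f1(hypothesis, reference):
    """ROUGE-L F1 between one hypothesis and one reference string."""
    hyp, ref = hypothesis.split(), reference.split()
    lcs = lcs_length(hyp, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(hyp)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)
```

Knowing whether the reported numbers use this kind of plain LCS-based F1 or a library-specific variant (e.g. with stemming or language-specific tokenization) would already help a lot.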

Could you please release the code pipeline used to produce the results in the paper?

Thank you!
