Skip to content

fix spbleu ram issue; loosen check for TestConfidenceFile#960

Merged
mshannon-sil merged 1 commit intomasterfrom
bug_fixes
Mar 6, 2026
Merged

fix spbleu ram issue; loosen check for TestConfidenceFile#960
mshannon-sil merged 1 commit intomasterfrom
bug_fixes

Conversation

@mshannon-sil
Copy link
Copy Markdown
Collaborator

@mshannon-sil mshannon-sil commented Mar 6, 2026

This PR fixes two bugs.

The first is #940, where the spbleu metric was reinitializing the Flores200 tokenizer on every iteration in the loop and consequently eating up all the RAM to the point of failure. I have moved intialization outside the loop to prevent that. The memory issue has gone away, and it has also had the side effect of drastically increasing the speed of the test step when spbleu is included.

The second is #947, where multilingual drafting was not compatible with saving confidence scores. This was only due to the fact that the code to check for a valid filename for a TestConfidenceFile was too restrictive and did not account for filenames where the lang id is inserted toward the beginning. I have loosened the check just to look for the presence of "trg-predictions" in the trg draft file name and confirmed multilingual drafting experiments now complete successfully.


This change is Reviewable

@mshannon-sil mshannon-sil self-assigned this Mar 6, 2026
@TaperChipmunk32
Copy link
Copy Markdown
Collaborator

silnlp/nmt/test.py line 267 at r1 (raw file):

    scorers = scorers.intersection(SUPPORTED_SENTENCE_SCORERS)
    other_scores = {k: v for k, v in other_scores.items() if k.lower() in scorers}
    other_verse_scores: Dict[str, float] = {}

Why was this removed? It seems to still be needed later in this function.

@mshannon-sil
Copy link
Copy Markdown
Collaborator Author

silnlp/nmt/test.py line 267 at r1 (raw file):

Previously, TaperChipmunk32 (Matthew Beech) wrote…

Why was this removed? It seems to still be needed later in this function.

That's because for some reason there was a duplicate definition. I removed this definition, and kept the one inside the loop some lines later in line 298.

Copy link
Copy Markdown
Collaborator

@TaperChipmunk32 TaperChipmunk32 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

@TaperChipmunk32 made 1 comment.
Reviewable status: 0 of 2 files reviewed, 1 unresolved discussion (waiting on mshannon-sil).

@TaperChipmunk32
Copy link
Copy Markdown
Collaborator

silnlp/nmt/test.py line 300 at r1 (raw file):

            other_verse_scores: Dict[str, float] = {}
            if "chrf3" in scorers:
                chrf3_verse_score = sacrebleu.sentence_chrf(

Why was this changed?

@mshannon-sil
Copy link
Copy Markdown
Collaborator Author

silnlp/nmt/test.py line 300 at r1 (raw file):

Previously, TaperChipmunk32 (Matthew Beech) wrote…

Why was this changed?

When I was making the fixes, I noticed that a couple scorers were still using corpus level metrics, even though these are sentence level scores. So I adjusted those scorers to use their sentence-level rather than corpus-level scorers.

Copy link
Copy Markdown
Collaborator

@TaperChipmunk32 TaperChipmunk32 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

@TaperChipmunk32 made 1 comment and resolved 2 discussions.
Reviewable status: 0 of 2 files reviewed, all discussions resolved (waiting on mshannon-sil).

@mshannon-sil mshannon-sil merged commit 6503ead into master Mar 6, 2026
1 check was pending
@mshannon-sil mshannon-sil deleted the bug_fixes branch March 6, 2026 22:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

experiment.py: --save-confidences does not support multilingual drafting configurations High RAM usage during test step

2 participants