
feat: per-item HM IoU and quality_percentile in evaluation#23

Merged
ziv-lazarov-nagish merged 1 commit into nagish from feat/per-video-eval on Apr 18, 2026

Conversation

@ziv-lazarov-nagish
Contributor

Summary

  • Compute HM IoU per batch item (average of per-item harmonic means), matching the training validation_step calculation. Previously, HM IoU was computed as the harmonic mean of the already-averaged sign/sentence IoUs.
  • Add --quality_percentile CLI arg to filter platform dataset videos by quality score during evaluation.
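
The distinction in the first bullet matters because the two aggregation orders generally disagree. A minimal sketch (function names are illustrative, not the PR's actual evaluate.py code):

```python
# Illustrative sketch of the two aggregation orders; names are hypothetical,
# not the actual evaluate.py code.
def harmonic_mean(a: float, b: float) -> float:
    return 2 * a * b / (a + b) if (a + b) > 0 else 0.0

def hm_iou_per_item(sign_ious, sentence_ious):
    # New behavior: harmonic mean per batch item, then average.
    hms = [harmonic_mean(s, t) for s, t in zip(sign_ious, sentence_ious)]
    return sum(hms) / len(hms)

def hm_iou_of_averages(sign_ious, sentence_ious):
    # Old behavior: average each IoU across the batch first, then one harmonic mean.
    return harmonic_mean(sum(sign_ious) / len(sign_ious),
                         sum(sentence_ious) / len(sentence_ious))

sign, sentence = [0.9, 0.1], [0.8, 0.7]
print(hm_iou_per_item(sign, sentence))     # ≈ 0.511
print(hm_iou_of_averages(sign, sentence))  # 0.6
```

With an item that scores well on one metric and poorly on the other, averaging first hides the imbalance (0.6), while the per-item harmonic mean penalizes it (≈ 0.511).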

Files changed

  • sign_language_segmentation/evaluate.py — per-item HM IoU, quality_percentile arg

Test plan

  • ruff check . — all checks passed
  • pytest — 61 passed
  • python -m sign_language_segmentation.evaluate --checkpoint ... --datasets dgs --split test --device cuda — HM IoU prints correctly
  • python -m sign_language_segmentation.evaluate --checkpoint ... --datasets platform --split test --device cuda --quality_percentile 0.8 — quality filtering works

Contributor

@AmitMY AmitMY left a comment


i'm confused by quality_percentile

```python
                    help="drop predicted segments shorter than this many frames (0=off)")
parser.add_argument("--merge_gap", type=int, default=0,
                    help="merge predicted segments separated by ≤ this many frames (0=off)")
parser.add_argument("--quality_percentile", type=float, default=1.0,
```
Contributor


unused?

Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

used in build_datasets when passing eval_args (acts like the parser variable).

Contributor


can you link it? i can't see it used - it is added in this PR, but not used in this PR?

Contributor Author


this is being used here when cls is AnnotationPlatformSegmentationDataset
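
For readers following along without the diff: one plausible reading of the flag (the actual AnnotationPlatformSegmentationDataset logic may differ) is keeping the top `quality_percentile` fraction of videos ranked by quality score, so the default of 1.0 is a no-op:

```python
# Hypothetical sketch of percentile-based quality filtering; the real
# AnnotationPlatformSegmentationDataset implementation may differ.
def filter_by_quality(videos, quality_percentile: float = 1.0):
    """Keep the top `quality_percentile` fraction of videos by quality score.

    With the default of 1.0 every video is kept; e.g. 0.8 keeps the
    best-scoring 80% of videos.
    """
    ranked = sorted(videos, key=lambda v: v["quality"], reverse=True)
    keep = max(1, round(len(ranked) * quality_percentile))
    return ranked[:keep]

videos = [{"id": i, "quality": q} for i, q in enumerate([0.2, 0.9, 0.5, 0.7, 0.4])]
top = filter_by_quality(videos, 0.8)  # keeps 4 of the 5 videos
```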

Contributor Author


on second thought, i could read the quality_percentile used in training from the splits_manifest.json file (written when the manifest is created) in the checkpoint's directory and remove the argument, but that would prevent us from using a different quality_percentile in evaluation. what do you think?

Contributor


no need, now i get it

- compute HM IoU per batch item (matching training validation_step)
  instead of HM of averaged IoUs
- add --quality_percentile arg for platform dataset filtering

@ziv-lazarov-nagish ziv-lazarov-nagish merged commit 85fb616 into nagish Apr 18, 2026
2 checks passed
ziv-lazarov-nagish added a commit that referenced this pull request Apr 18, 2026
…_model

evaluate_model now returns hm_IoU computed per item (nagish PR #23).
Wrapping it with _add_hm_iou would overwrite that with the less accurate
average-of-averages metric.
ziv-lazarov-nagish added a commit that referenced this pull request Apr 19, 2026
…_model

evaluate_model now returns hm_IoU computed per item (nagish PR #23).
Wrapping it with _add_hm_iou would overwrite that with the less accurate
average-of-averages metric.