feat: per-item HM IoU and quality_percentile in evaluation #23
ziv-lazarov-nagish merged 1 commit into nagish
Conversation
AmitMY left a comment
i'm confused by quality_percentile
| help="drop predicted segments shorter than this many frames (0=off)") | ||
| parser.add_argument("--merge_gap", type=int, default=0, | ||
| help="merge predicted segments separated by ≤ this many frames (0=off)") | ||
| parser.add_argument("--quality_percentile", type=float, default=1.0, |
used in build_datasets when passing eval_args (acts like the parser variable).
can you link it? i can't see it used - it is added in this PR, but not used in this PR?
on second thought, i can read the quality_percentile that was used in training from the splits_manifest.json file (it's being written when manifest is created) in the checkpoint's directory and remove the argument, but that prevents us from using a different quality_percentile in evaluation. what do you think?
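A rough sketch of that fallback, assuming splits_manifest.json sits next to the checkpoint and stores the value under a quality_percentile key (both are assumptions, not verified against the repo):

```python
# Sketch only: prefer an explicit CLI value, otherwise fall back to the
# quality_percentile recorded in splits_manifest.json at training time.
# The manifest location and key name are assumptions.
import json
from pathlib import Path
from typing import Optional


def resolve_quality_percentile(checkpoint: str, cli_value: Optional[float]) -> float:
    if cli_value is not None:
        return cli_value
    manifest_path = Path(checkpoint).parent / "splits_manifest.json"
    with manifest_path.open() as f:
        manifest = json.load(f)
    return float(manifest["quality_percentile"])
```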
| help="drop predicted segments shorter than this many frames (0=off)") | ||
| parser.add_argument("--merge_gap", type=int, default=0, | ||
| help="merge predicted segments separated by ≤ this many frames (0=off)") | ||
| parser.add_argument("--quality_percentile", type=float, default=1.0, |
There was a problem hiding this comment.
can you link it? i can't see it used - it is added in this PR, but not used in this PR?
- compute HM IoU per batch item (matching training validation_step) instead of HM of averaged IoUs
- add --quality_percentile arg for platform dataset filtering
fcec4f8 to 8e2a358
| help="drop predicted segments shorter than this many frames (0=off)") | ||
| parser.add_argument("--merge_gap", type=int, default=0, | ||
| help="merge predicted segments separated by ≤ this many frames (0=off)") | ||
| parser.add_argument("--quality_percentile", type=float, default=1.0, |
…_model evaluate_model now returns hm_IoU computed per item (nagish PR #23). Wrapping it with _add_hm_iou would overwrite that with the less accurate average-of-averages metric.
Summary
- Compute HM IoU per batch item, matching the training validation_step calculation. Previously HM was computed as the harmonic mean of the already-averaged sign/sentence IoUs (see the sketch below).
- Add --quality_percentile CLI arg to filter platform dataset videos by quality score during evaluation.
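A minimal sketch of the difference the first bullet describes, using made-up IoU values; the actual aggregation in evaluate.py may differ in detail:

```python
# Illustrative contrast between the two aggregation orders (not the PR's exact code).
from statistics import harmonic_mean, mean

# Hypothetical per-item (sign IoU, sentence IoU) pairs for one evaluation run.
ious = [(0.9, 0.2), (0.8, 0.7), (0.6, 0.9)]

# New behaviour (matches validation_step): harmonic mean per item, then average.
per_item_hm = mean(harmonic_mean(pair) for pair in ious)

# Old behaviour: average each IoU type over all items first, then take one harmonic mean.
hm_of_averages = harmonic_mean([mean(s for s, _ in ious), mean(t for _, t in ious)])

print(f"per-item HM: {per_item_hm:.3f}, HM of averages: {hm_of_averages:.3f}")
# The two generally differ; the per-item version penalises items where one of the
# two IoUs is low, even when the dataset-level averages look balanced.
```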
Files changed
- sign_language_segmentation/evaluate.py — per-item HM IoU, quality_percentile arg

Test plan
- ruff check . — all checks passed
- pytest — 61 passed
- python -m sign_language_segmentation.evaluate --checkpoint ... --datasets dgs --split test --device cuda — HM IoU prints correctly
- python -m sign_language_segmentation.evaluate --checkpoint ... --datasets platform --split test --device cuda --quality_percentile 0.8 — quality filtering works