
Conversation


@regexer regexer commented Sep 25, 2025


@jrobble jrobble left a comment


@jrobble reviewed 3 of 3 files at r1, all commit messages.
Reviewable status: all files reviewed, 7 unresolved discussions (waiting on @regexer)


python/LlamaVideoSummarization/plugin-files/descriptor/descriptor.json line 74 at r1 (raw file):
Update to:

timeline within the "desired" range of TIMELINE_CHECK_TARGET_THRESHOLD


python/LlamaVideoSummarization/tests/test_llama_video_summarization.py line 416 at r1 (raw file):

        component = LlamaVideoSummarizationComponent()

        drone_timeline_segment_1 = copy.deepcopy(DRONE_TIMELINE_SEGMENT_1)

Good catch on needing the copy so as not to modify the original dicts.


python/LlamaVideoSummarization/tests/test_llama_video_summarization.py line 441 at r1 (raw file):

        self.assertEqual(mpf.DetectionError.DETECTION_FAILED, cm.exception.error_code)
        self.assertIn("Timeline event start time occurs too soon before segment start time. (179.9798 - 0.0) > 20.", str(cm.exception))

Why remove this check?


python/LlamaVideoSummarization/tests/test_llama_video_summarization.py line 449 at r1 (raw file):

        self.assertEqual(mpf.DetectionError.DETECTION_FAILED, cm.exception.error_code)
        self.assertIn("Timeline event end time occurs too late after segment stop time. (381.17 - 299.96633333333335) > 20.", str(cm.exception))

Why remove this check?


python/LlamaVideoSummarization/llama_video_summarization_component/__init__.py line 222 at r1 (raw file):

                         f'abs({segment_start_time} - {min_event_start}) > {target_threshold}.'))
                if abs(segment_start_time - min_event_start) > accept_threshold:
                    acceptable_checks['after_seg_start'] = False

A better name than "after_seg_start" would be "near_seg_start" since we're comparing against an absolute value.


python/LlamaVideoSummarization/llama_video_summarization_component/__init__.py line 231 at r1 (raw file):

                            f'abs({max_event_end} - {segment_stop_time}) > {target_threshold}.'))
                if abs(max_event_end - segment_stop_time) > accept_threshold:
                    acceptable_checks['before_seg_stop'] = False

A better name than "before_seg_stop" would be "near_seg_stop" since we're comparing against an absolute value.


python/LlamaVideoSummarization/llama_video_summarization_component/__init__.py line 234 at r1 (raw file):

        if not hard_error:
            if list({v for v in acceptable_checks.values()}) == [True]:

According to AI this can be simplified using all(). See first example:

# Example dictionary with all truthy values
dict1 = {"key1": True, "key2": 1, "key3": "hello"}
result1 = all(dict1.values())
print(f"Are all values in dict1 truthy? {result1}")

# Example dictionary with at least one falsy value
dict2 = {"keyA": True, "keyB": 0, "keyC": "world"}
result2 = all(dict2.values())
print(f"Are all values in dict2 truthy? {result2}")

# Example dictionary with only falsy values
dict3 = {"keyX": False, "keyY": 0, "keyZ": ""}
result3 = all(dict3.values())
print(f"Are all values in dict3 truthy? {result3}")

# Example of an empty dictionary (returns True)
dict4 = {}
result4 = all(dict4.values())
print(f"Are all values in dict4 truthy? {result4}")

python/LlamaVideoSummarization/llama_video_summarization_component/__init__.py line 235 at r1 (raw file):

        if not hard_error:
            if list({v for v in acceptable_checks.values()}) == [True]:
                self._store_acceptable_response(response_json)

It's a little odd that this is stored as a class variable. Consider instead returning a second value from this method, a boolean that indicates if the response was acceptable or not. Use that to keep track of the acceptable response json in the calling method:

response_json = {}
acceptable_response_json = {}  # NEW
error = None
while max(attempts.values()) < max_attempts:


@jrobble jrobble left a comment


Reviewable status: all files reviewed, 8 unresolved discussions (waiting on @regexer)


python/LlamaVideoSummarization/llama_video_summarization_component/__init__.py line 121 at r1 (raw file):

            acceptable_timeline = self._fetch_acceptable_response()
            if acceptable_timeline is not None:
                return acceptable_timeline

Log a warning here saying the target criteria couldn't be reached so falling back to acceptable criteria.


jrobble commented Sep 26, 2025

python/LlamaVideoSummarization/llama_video_summarization_component/__init__.py line 121 at r1 (raw file):

Previously, jrobble (Jeff Robble) wrote…

Log a warning here saying the target criteria couldn't be reached so falling back to acceptable criteria.

Actually, make that an info line.
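A minimal sketch of the fallback logging this suggests, assuming a module-level logger (the function and variable names are illustrative, not the component's actual code):

```python
import logging

logger = logging.getLogger('LlamaVideoSummarization')

def choose_timeline(target_timeline, acceptable_timeline):
    # Prefer a timeline that met the target criteria; otherwise note the
    # fallback at info level and use the acceptable one.
    if target_timeline is not None:
        return target_timeline
    if acceptable_timeline is not None:
        logger.info('Target timeline criteria could not be met; '
                    'falling back to acceptable criteria.')
        return acceptable_timeline
    return None
```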


@regexer regexer left a comment


@regexer reviewed 3 of 3 files at r1, all commit messages.
Reviewable status: all files reviewed, 8 unresolved discussions (waiting on @jrobble)


python/LlamaVideoSummarization/tests/test_llama_video_summarization.py line 441 at r1 (raw file):

Previously, jrobble (Jeff Robble) wrote…

Why remove this check?

The new logic handles acceptable thresholds, soft errors (which fall through to the acceptable timelines), and hard errors (which always fail). With these changes to timeline validation, the previous test timeline entries were no longer caught by the previously expected error.

I was unable to create a test timeline that would trigger these specific errors, but I did not want to remove the error logic.


python/LlamaVideoSummarization/tests/test_llama_video_summarization.py line 449 at r1 (raw file):

Previously, jrobble (Jeff Robble) wrote…

Why remove this check?

See above.


jrobble commented Sep 26, 2025

python/LlamaVideoSummarization/llama_video_summarization_component/__init__.py line 207 at r1 (raw file):

            if (timestamp_end - segment_stop_time) > target_threshold:
                soft_error = (f'Timeline event end time occurs too late after segment stop time. '

After further discussion, this check may no longer be needed. It seems to be covered by:

"Max timeline event end time not close enough to segment stop time."


jrobble commented Sep 26, 2025

python/LlamaVideoSummarization/llama_video_summarization_component/__init__.py line 200 at r1 (raw file):

            
            if (segment_start_time - timestamp_start) > target_threshold:
                soft_error = (f'Timeline event start time occurs too soon before segment start time. '

After further discussion, this check may no longer be needed. It seems to be covered by:

"Min timeline event start time not close enough to segment start time."


jrobble commented Sep 26, 2025

python/LlamaVideoSummarization/llama_video_summarization_component/__init__.py line 200 at r1 (raw file):

Previously, jrobble (Jeff Robble) wrote…

After further discussion, this check may no longer be needed. It seems to be covered by:

"Min timeline event start time not close enough to segment start time."

The new code returns hard errors first. The old code could return soft errors first. Whenever this soft error would have occurred the hard error would have also occurred. It's just that the old code never got there.


jrobble commented Sep 26, 2025

python/LlamaVideoSummarization/llama_video_summarization_component/__init__.py line 200 at r1 (raw file):

Previously, jrobble (Jeff Robble) wrote…

The new code returns hard errors first. The old code could return soft errors first. Whenever this soft error would have occurred the hard error would have also occurred. It's just that the old code never got there.

Scratch that. This isn't related to hard vs. soft. It has more to do with this code:

if len(minmax_errors) > 0:
    soft_error = minmax_errors.pop()

The minmax errors are returned over all other soft errors.

Still though, a minmax error should always occur if a "Timeline event start time occurs too soon before segment start time." is to occur.


jrobble commented Sep 29, 2025

python/LlamaVideoSummarization/llama_video_summarization_component/__init__.py line 222 at r1 (raw file):

Previously, jrobble (Jeff Robble) wrote…

A better name than "after_seg_start" would be "near_seg_start" since we're comparing against an absolute value.

After further discussion, we don't need this check. "Min timeline event start time not close enough to segment start time."


jrobble commented Oct 1, 2025

python/LlamaVideoSummarization/llama_video_summarization_component/__init__.py line 121 at r1 (raw file):

Previously, jrobble (Jeff Robble) wrote…

Actually, make that an info line.

I pushed this change.


@jrobble jrobble left a comment


@jrobble reviewed 2 of 2 files at r2, all commit messages.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @regexer)


python/LlamaVideoSummarization/llama_video_summarization_component/__init__.py line 200 at r1 (raw file):

Previously, jrobble (Jeff Robble) wrote…

Scratch that. This isn't related to hard vs. soft. It has more to do with this code:

if len(minmax_errors) > 0:
    soft_error = minmax_errors.pop()

The minmax errors are returned over all other soft errors.

Still though, a minmax error should always occur if a "Timeline event start time occurs too soon before segment start time." is to occur.

I addressed this in the changes I pushed.


python/LlamaVideoSummarization/llama_video_summarization_component/__init__.py line 207 at r1 (raw file):

Previously, jrobble (Jeff Robble) wrote…

After further discussion, this check may no longer be needed. It seems to be covered by:

"Max timeline event end time not close enough to segment stop time."

I pushed this change.


python/LlamaVideoSummarization/llama_video_summarization_component/__init__.py line 222 at r1 (raw file):

Previously, jrobble (Jeff Robble) wrote…

After further discussion, we don't need this check. "Min timeline event start time not close enough to segment start time."

I pushed this change.


python/LlamaVideoSummarization/llama_video_summarization_component/__init__.py line 231 at r1 (raw file):

Previously, jrobble (Jeff Robble) wrote…

A better name than "before_seg_stop" would be "near_seg_stop since we're comparing against an absolute value.

I pushed this change.


python/LlamaVideoSummarization/llama_video_summarization_component/__init__.py line 234 at r1 (raw file):

Previously, jrobble (Jeff Robble) wrote…

According to AI this can be simplified using all(). See first example:

# Example dictionary with all truthy values
dict1 = {"key1": True, "key2": 1, "key3": "hello"}
result1 = all(dict1.values())
print(f"Are all values in dict1 truthy? {result1}")

# Example dictionary with at least one falsy value
dict2 = {"keyA": True, "keyB": 0, "keyC": "world"}
result2 = all(dict2.values())
print(f"Are all values in dict2 truthy? {result2}")

# Example dictionary with only falsy values
dict3 = {"keyX": False, "keyY": 0, "keyZ": ""}
result3 = all(dict3.values())
print(f"Are all values in dict3 truthy? {result3}")

# Example of an empty dictionary (returns True)
dict4 = {}
result4 = all(dict4.values())
print(f"Are all values in dict4 truthy? {result4}")

I pushed this change.


python/LlamaVideoSummarization/llama_video_summarization_component/__init__.py line 235 at r1 (raw file):

Previously, jrobble (Jeff Robble) wrote…

It's a little odd that this is stored as a class variable. Consider instead returning a second value from this method, a boolean that indicates if the response was acceptable or not. Use that to keep track of the acceptable response json in the calling method:

response_json = {}
acceptable_response_json = {}  # NEW
error = None
while max(attempts.values()) < max_attempts:

I pushed this change.


python/LlamaVideoSummarization/plugin-files/descriptor/descriptor.json line 74 at r1 (raw file):

Previously, jrobble (Jeff Robble) wrote…

Update to:

timeline within the "desired" range of TIMELINE_CHECK_TARGET_THRESHOLD

I pushed this change.


python/LlamaVideoSummarization/llama_video_summarization_component/__init__.py line 174 at r1 (raw file):

        # start with passing checks, then fail as secondary checks are conducted
        acceptable_checks = dict(

I changed the logic so these are initialized to false and must be explicitly set to true. That way we avoid potential logic errors where one of these checks is never evaluated and the default assumes the response is acceptable.
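As a sketch of the fail-closed pattern described here (the key names and threshold handling are illustrative):

```python
def evaluate_checks(diff_start, diff_stop, accept_threshold):
    # Fail closed: every check starts False and must be explicitly proven
    # True, so a check that is never evaluated can never pass by default.
    acceptable_checks = dict(
        near_seg_start=False,
        near_seg_stop=False,
    )
    if abs(diff_start) <= accept_threshold:
        acceptable_checks['near_seg_start'] = True
    if abs(diff_stop) <= accept_threshold:
        acceptable_checks['near_seg_stop'] = True
    return acceptable_checks
```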


python/LlamaVideoSummarization/llama_video_summarization_component/__init__.py line 384 at r1 (raw file):

def _get_timestamp_value(seconds: Any) -> float:
    if isinstance(seconds, str):
        if re.match(r"\s*\d+(\.\d*)?\s*[Ss]?", seconds):

This regex doesn't work for "7:12", which is used by one of the unit tests. The reason is that the leading "7" by itself is detected as a valid match at the start of the string. To get around this, I had to use the "^" and "$" symbols at the beginning and end of the regex to ensure it matches the entire string.
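A small sketch of the difference, using the pattern from the diff above (the variable names are illustrative):

```python
import re

# Unanchored: re.match() accepts "7:12" because the leading "7" alone is a
# valid match at the start of the string.
loose = re.compile(r"\s*\d+(\.\d*)?\s*[Ss]?")
assert loose.match("7:12") is not None

# Anchored with ^ and $: the pattern must now consume the entire string, so
# "7:12" is rejected while plain seconds like "7.5s" still pass.
strict = re.compile(r"^\s*\d+(\.\d*)?\s*[Ss]?$")
assert strict.match("7:12") is None
assert strict.match("7.5s") is not None
```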

@jrobble jrobble merged commit 89d79b0 into master Oct 1, 2025
2 checks passed
@jrobble jrobble deleted the hf/llama-acceptable-threshold branch October 1, 2025 21:34
hhuangMITRE pushed a commit that referenced this pull request Oct 14, 2025
…HOLD (#409)

* Validate timestamps.

---------

Co-authored-by: jrobble <jrobble@mitre.org>