fix a small issue in OWSM decode_long #5703

jctian98 · 2024-03-15T03:10:13Z

Why?

The last timestamp predicted in the final chunk can be slightly smaller than the total speech length, which leads to an extra chunk that usually has fewer than 1000 samples. Decoding this very short chunk behaves unpredictably.

Remove one redundant line.

codecov · 2024-03-15T03:25:31Z

Codecov Report

Attention: Patch coverage is 0%, with 7 lines in your changes missing coverage. Please review.

Project coverage is 76.60%. Comparing base (d004740) to head (d52dcb3).
Report is 63 commits behind head on master.

Files	Patch %	Lines
espnet2/bin/s2t_inference.py	0.00%	7 Missing ⚠️

Additional details and impacted files

@@             Coverage Diff             @@
##           master    #5703       +/-   ##
===========================================
+ Coverage   23.30%   76.60%   +53.30%     
===========================================
  Files         746      761       +15     
  Lines       69369    69880      +511     
===========================================
+ Hits        16163    53534    +37371     
+ Misses      53206    16346    -36860

Flag	Coverage Δ
test_configuration_espnet2	`∅ <ø> (∅)`
test_integration_espnet1	`62.92% <ø> (ø)`
test_integration_espnet2	`48.84% <0.00%> (?)`
test_integration_espnetez	`27.98% <ø> (?)`
test_python_espnet1	`18.20% <0.00%> (-0.12%)`	⬇️
test_python_espnet2	`52.41% <0.00%> (?)`
test_python_espnetez	`13.95% <0.00%> (?)`
test_utils	`20.91% <ø> (?)`


jctian98 · 2024-03-15T04:52:27Z

Codecov fails because we don't have a test case for decode_long.

It may be hard to write a valid test case for decode_long, since the model has to predict the timestamps properly, and that is not easy to achieve in a test case. What do you think? @pyf98

sw005320 · 2024-03-15T11:15:43Z

OK regarding the test code coverage.
If @pyf98 is OK with this change, I can merge it.

Where does 0.2 come from?
It would be safer to make it a variable.

pyf98 · 2024-03-15T19:18:16Z

espnet2/bin/s2t_inference.py

@@ -575,6 +575,12 @@ def decode_long(
        text_prev = init_text
        while offset < len(speech):
            logging.info(f"Current start time in seconds: {offset / fs:.2f}")
+            if offset + segment_len > len(speech) and len(segment) / fs < 0.2:


Should this be placed after segment = speech[offset : offset + segment_len]? Otherwise, segment is the previous one.
Also, we can make 0.2 an argument of this function.
I think we do not need offset + segment_len > len(speech). If segment is shorter than the segment length, this condition must be true.

This is a mistake during copy-paste. Thanks!

pyf98 · 2024-03-15T19:19:39Z

espnet2/bin/s2t_inference.py

@@ -575,6 +575,12 @@ def decode_long(
        text_prev = init_text
        while offset < len(speech):
            logging.info(f"Current start time in seconds: {offset / fs:.2f}")
+            if offset + segment_len > len(speech) and len(segment) / fs < 0.2:
+                logging.warning(
+                    f"Skip the last clip as it's too short: {len(segment)/ fs:.2f}s"


Keep the style consistent: there is currently a space after / but none before it.

We can change "clip" to "segment" or "chunk"

pyf98 · 2024-03-15T19:24:57Z

The issue addressed here is:
When processing the last segment of a long audio, the predicted end timestamp can be slightly shorter or longer than the actual speech length due to discretization (timestamps are multiples of 20 ms).
If the prediction is longer, decoding terminates and there is no problem.
If the prediction is shorter, we need to decode one more segment, which can be very short. We can skip it.
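
The behavior described above can be illustrated with a minimal sketch. This is not the ESPnet implementation: `plan_segments` and `snap_to_grid` are hypothetical names, and the model's timestamp prediction is replaced by a stand-in that rounds the segment length down to a 20 ms grid. Only the loop shape and the `skip_last_chunk_threshold` name come from the diff under review. Stopping the loop with `break` is also an assumption of the sketch.

```python
from typing import Callable, List, Sequence, Tuple


def snap_to_grid(num_samples: int, fs: int = 16000, step_ms: int = 20) -> int:
    """Stand-in for the predicted end timestamp: round down to the 20 ms grid."""
    step = fs * step_ms // 1000
    # Always advance by at least one grid step so the caller's loop terminates.
    return max(step, num_samples // step * step)


def plan_segments(
    speech: Sequence[float],
    fs: int,
    segment_len: int,
    predict_end: Callable[[int], int],
    skip_last_chunk_threshold: float = 0.2,
) -> List[Tuple[int, int]]:
    """Return the (start, end) sample ranges that would be decoded."""
    decoded = []
    offset = 0
    while offset < len(speech):
        # Slice first, then test the length of the *current* segment.
        segment = speech[offset : offset + segment_len]
        if len(segment) / fs < skip_last_chunk_threshold:
            # The leftover tail is shorter than the threshold (in seconds): skip it.
            break
        decoded.append((offset, offset + len(segment)))
        # Advance by the (stand-in) predicted end, which may fall short of the segment end.
        offset += predict_end(len(segment))
    return decoded


if __name__ == "__main__":
    fs = 16000
    speech = [0.0] * 500100  # a little over 31 seconds of audio
    print(plan_segments(speech, fs, 30 * fs, snap_to_grid))
    print(plan_segments(speech, fs, 30 * fs, snap_to_grid, skip_last_chunk_threshold=0.0))
```

With the threshold set to 0.0, the sketch schedules a third segment of only 260 samples, which is the kind of very short chunk this PR avoids decoding.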

jctian98 · 2024-03-17T21:54:04Z

It should be ready for review again.

pyf98 · 2024-03-19T22:12:26Z

espnet2/bin/s2t_inference.py

            segment = speech[offset : offset + segment_len]
+            if (
+                offset + segment_len > len(speech)


Do we need this condition? This is equivalent to len(segment) < segment_len, where segment_len is much larger than our threshold in the next line. So, I think this condition will be satisfied automatically if the next condition is true.

That's true. Didn't notice that if len(segment) < segment_len, that's the last chunk.
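
The redundancy argument can be checked by brute force on small numbers. The sketch below is illustrative only: `conditions_agree` is a hypothetical helper, and the sizes used are toy values rather than the real sampling rate or segment length. It compares the combined condition from the diff with the length check alone.

```python
def conditions_agree(fs: int, segment_len: int, threshold: float, max_len: int) -> bool:
    """Check whether `is_last and is_short` always equals `is_short` alone."""
    for total_len in range(1, max_len + 1):
        for offset in range(total_len):
            # Same length as len(speech[offset : offset + segment_len]).
            seg_len = min(segment_len, total_len - offset)
            is_last = offset + segment_len > total_len
            is_short = seg_len / fs < threshold
            if (is_last and is_short) != is_short:
                return False
    return True


if __name__ == "__main__":
    # Threshold (0.2 s = 2 samples) far below the segment length (30 samples).
    print(conditions_agree(fs=10, segment_len=30, threshold=0.2, max_len=100))
    # Threshold (5 s = 50 samples) above the segment length: the checks differ.
    print(conditions_agree(fs=10, segment_len=30, threshold=5.0, max_len=100))
```

The two conditions agree as long as the threshold, expressed in samples, does not exceed the segment length. A threshold larger than the segment length would make full-length segments count as "short", and only then would the extra condition change the result.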

pyf98 · 2024-03-19T22:15:10Z

espnet2/bin/s2t_inference.py

@@ -188,6 +188,7 @@ def __init__(
        lang_sym: str = "<eng>",
        task_sym: str = "<asr>",
        predict_time: bool = False,
+        skip_last_chunk_threshold: float = 0.2,


I was thinking of adding this argument to decode_long only, since it is specific to long-form decoding and never used in the short-form __call__. If we remove it from __init__, we can also remove the argument from inference and the argparser.

pyf98 · 2024-03-19T22:16:59Z

Thanks @jctian98!
Sorry for asking for more, but I think we can add the argument only in decode_long. This is more consistent with the design of the other arguments.

jctian98 · 2024-03-20T01:37:06Z

fixed. Please review.

pyf98 · 2024-03-20T16:59:10Z

espnet2/bin/s2t_inference.py

@@ -531,6 +531,7 @@ def decode_long(
        end_time_threshold: str = "<29.00>",
        lang_sym: Optional[str] = None,
        task_sym: Optional[str] = None,
+        skip_last_chunk_threshold=0.2,


One final suggestion: let's add a type hint here: float = 0.2

I remember it exists in previous versions

pyf98 · 2024-03-21T14:47:48Z

LGTM

sw005320 · 2024-03-21T14:53:22Z

OK!
Does it come from this PR?
https://github.com/espnet/espnet/actions/runs/8364098601/job/22898621380?pr=5703#step:8:811

jctian98 · 2024-03-21T20:53:02Z

It fails on test-shell but this PR doesn't change shell scripts.
Can we re-run the CI test?

jctian98 · 2024-03-21T22:18:05Z

CI fails again due to a time-out error.

sw005320 · 2024-03-22T00:19:37Z

Thanks, @jctian98!

fix a small issue in OWSM decode_long

c351239

mergify bot added the ESPnet2 label Mar 15, 2024

sw005320 requested a review from pyf98 March 15, 2024 03:12

sw005320 added the OWSM Open Whisper-style Speech Model label Mar 15, 2024

sw005320 added this to the v.202405 milestone Mar 15, 2024

sw005320 added the Bugfix label Mar 15, 2024

pyf98 reviewed Mar 15, 2024

jctian98 added 2 commits March 15, 2024 16:54

format update

98b91f4

change clip to chunk

5cd368d

pyf98 reviewed Mar 19, 2024

jctian98 added 2 commits March 19, 2024 17:26

change parameter format

b70eed4

change parameter style2

6c2df0d

pyf98 reviewed Mar 20, 2024

update

d52dcb3

sw005320 merged commit dbd73dd into espnet:master Mar 22, 2024
34 of 35 checks passed
