@TnR2 TnR2 commented Sep 23, 2025

Description:

Fix IndexError when processing empty strings after event removal in rich_transcription_postprocess(s).

Problem:

After the event character is removed from s_list[i] (when get_event(s_list[i]) == cur_ent_event), the string can become empty (''). The empty string is then passed to get_event(s), which accesses s[0] and raises IndexError: string index out of range.
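
The failure can be reproduced in isolation. The sketch below assumes that get_event returns the first character of its argument when that character is in a set of event emoji, and None otherwise. The set contents and the exact body of get_event are assumptions for illustration, not copied from the library.

```python
# Illustrative stand-in for the library's event set (assumption).
EVENT_SET = {"🎼", "👏"}


def get_event(s):
    # Indexes s[0] unconditionally, so an empty string raises IndexError.
    return s[0] if s[0] in EVENT_SET else None


segment = "🎼"         # a segment that holds only an event marker
segment = segment[1:]  # event removal leaves the empty string ''

try:
    get_event(segment)
    raised = False
except IndexError:
    raised = True

print(raised)
```

Any segment that consists solely of the repeated event marker ends up in this state, which is why the check has to come after the removal and before the next get_event call.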

Example:

Given this input to rich_transcription_postprocess(s):

<|lang|><|EMO_UNKNOWN|><|Event_UNK|><|woitn|> <|lang|><|EMO_UNKNOWN|><|BGM|><|woitn|>语音内容1 <|lang|><|EMO_UNKNOWN|><|BGM|><|woitn|> <|lang|><|EMO_UNKNOWN|><|BGM|><|woitn|>语音内容2

When the loop for i in range(1, len(s_list)) reaches i == 2, s_list[2] is '🎼' (the BGM event). Event removal, s_list[2] = s_list[2][1:], leaves s_list[2] as '', which is then passed to get_event(s_list[i]) and raises IndexError.

Solution:

Added an empty-string check after event removal.
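
A minimal sketch of the guarded loop follows. The helper name strip_repeated_events, the guard parameter, the event set, and the sample segments are all hypothetical and exist only to show the effect of the check; the sketch also assumes every segment is non-empty on entry.

```python
# Illustrative stand-in for the library's event set (assumption).
EVENT_SET = {"🎼", "👏"}


def get_event(s):
    # Assumed shape of the helper: fails on '' because of s[0].
    return s[0] if s[0] in EVENT_SET else None


def strip_repeated_events(s_list, guard=True):
    """Hypothetical helper mirroring the loop described in this PR."""
    cur_ent_event = get_event(s_list[0])
    for i in range(1, len(s_list)):
        if get_event(s_list[i]) == cur_ent_event and get_event(s_list[i]) is not None:
            s_list[i] = s_list[i][1:]
            # The added check: skip segments emptied by event removal.
            if guard and len(s_list[i]) == 0:
                continue
        cur_ent_event = get_event(s_list[i])
    return s_list


result = strip_repeated_events(["🎼first", "🎼", "🎼second"])
print(result)  # ['🎼first', '', 'second']
```

Calling the same helper with guard=False on this input raises IndexError at the second segment, which matches the behavior reported above.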

…sing (def rich_transcription_postprocess(s))
Summary of Changes

Hello @TnR2, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request fixes an IndexError in the rich_transcription_postprocess function. Stripping an event character from a segment could leave an empty string, which was then passed to a later call that indexes its first character and raised the error. The fix adds a check so that such empty strings are skipped, which prevents the crash in the transcription post-processing step.

Highlights

  • Bug Fix: Resolves an IndexError that occurred in the rich_transcription_postprocess function when processing strings after event removal.
  • Root Cause: The error was triggered when removing an event character from a string resulted in an empty string, which was then passed to get_event(s) causing an IndexError due to attempting to access s[0].
  • Solution: A check for empty strings (len(s_list[i]) == 0) has been added immediately after event character removal to prevent the IndexError.

@TnR2 TnR2 changed the title from "fix: avoid IndexError by handling empty strings after event removal in rich transcription processing (rich_transcription_postprocess(s))" to "fix: avoid IndexError by handling empty strings after event removal in rich transcription processing" on Sep 23, 2025

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request correctly addresses a potential IndexError in rich_transcription_postprocess by adding a check for empty strings. The fix is effective. My review includes a suggestion to refactor the related code for improved readability and efficiency by avoiding redundant function calls.

Comment on lines 415 to 420
if get_event(s_list[i]) == cur_ent_event and get_event(s_list[i]) != None:
    s_list[i] = s_list[i][1:]
    if len(s_list[i]) == 0:
        continue
# else:
cur_ent_event = get_event(s_list[i])

medium

While the added check correctly prevents the IndexError, this block of code can be refactored for better readability and efficiency.

Currently, get_event(s_list[i]) is called multiple times. You can store its result in a variable to avoid these redundant calls. This also allows for restructuring the logic to make the update to cur_ent_event clearer and more efficient.

Suggested change
- if get_event(s_list[i]) == cur_ent_event and get_event(s_list[i]) != None:
-     s_list[i] = s_list[i][1:]
-     if len(s_list[i]) == 0:
-         continue
- # else:
- cur_ent_event = get_event(s_list[i])
+ event = get_event(s_list[i])
+ if event is not None and event == cur_ent_event:
+     s_list[i] = s_list[i][1:]
+     if not s_list[i]:
+         continue
+     cur_ent_event = get_event(s_list[i])
+ else:
+     cur_ent_event = event
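
To check that the suggested refactor keeps the behavior of the patched code, the two variants can be run side by side. This is a sketch under assumptions: get_event and the event set are illustrative stand-ins, the function names patched and refactored are hypothetical wrappers around the two loop bodies, and all input segments are non-empty on entry.

```python
# Illustrative stand-in for the library's event set (assumption).
EVENT_SET = {"🎼", "👏"}


def get_event(s):
    # Assumed shape of the helper: first character if it is an event marker.
    return s[0] if s[0] in EVENT_SET else None


def patched(s_list):
    # Loop body as merged in this PR (with "is not None" instead of "!= None").
    cur_ent_event = get_event(s_list[0])
    for i in range(1, len(s_list)):
        if get_event(s_list[i]) == cur_ent_event and get_event(s_list[i]) is not None:
            s_list[i] = s_list[i][1:]
            if len(s_list[i]) == 0:
                continue
        cur_ent_event = get_event(s_list[i])
    return s_list


def refactored(s_list):
    # Loop body from the review suggestion: one get_event call before the branch.
    cur_ent_event = get_event(s_list[0])
    for i in range(1, len(s_list)):
        event = get_event(s_list[i])
        if event is not None and event == cur_ent_event:
            s_list[i] = s_list[i][1:]
            if not s_list[i]:
                continue
            cur_ent_event = get_event(s_list[i])
        else:
            cur_ent_event = event
    return s_list


cases = [
    ["🎼a", "🎼", "🎼b"],
    ["a", "🎼b", "👏c", "👏"],
    ["🎼a", "b", "🎼c"],
]
for case in cases:
    # Copy each case because both functions modify the list in place.
    assert patched(list(case)) == refactored(list(case))
```

The refactored form calls get_event once before the branch and once more only after a removal, where the patched form evaluates it up to three times per segment.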

@LauraGPT LauraGPT merged commit 528f92f into modelscope:main Oct 1, 2025