Remove LONG_SPEAKER_ID and instead only use SPEAKER_ID #1643

Closed
jrobble opened this issue Feb 10, 2023 · 1 comment

@jrobble
Member

jrobble commented Feb 10, 2023

Related to #1674.

Having both LONG_SPEAKER_ID and SPEAKER_ID is a little confusing to end users. LONG_SPEAKER_ID has the format <start>-<stop>-#, where "start" and "stop" are the frames or times (in ms) that bound a media segment and # is the speaker number within that segment. (Currently, only video files are segmented by the Workflow Manager.) SPEAKER_ID contains only the # part.
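
A minimal sketch of how the two formats relate, assuming the segment bounds and the per-segment speaker number are plain integers. The helper names format_long_speaker_id and parse_long_speaker_id are hypothetical and are not part of the OpenMPF SDK.

```python
from typing import Tuple


def format_long_speaker_id(start: int, stop: int, speaker: int) -> str:
    """Build a <start>-<stop>-# id (hypothetical helper, not an OpenMPF API)."""
    return f"{start}-{stop}-{speaker}"


def parse_long_speaker_id(long_id: str) -> Tuple[int, int, int]:
    """Split a <start>-<stop>-# id back into its three integer parts.

    Raises ValueError if the id does not have exactly three '-'-separated parts.
    """
    start, stop, speaker = long_id.split("-")
    return int(start), int(stop), int(speaker)


long_id = format_long_speaker_id(0, 499, 1)
print(long_id)                            # 0-499-1
print(parse_long_speaker_id(long_id)[2])  # 1 (the short # part)
```

The short SPEAKER_ID is just the last field, so it can always be recovered from the long form, but not the other way around.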

LONG_SPEAKER_ID is always set by speech-to-text components. In some cases, however, SPEAKER_ID is set to 0 to indicate that it should not be used and that LONG_SPEAKER_ID should be used instead. This happens when a video is segmented. Because each segment is processed independently, speakers are identified relative to their segments. For example, the speaker with id 0 in segment A may not be the same person as the speaker with id 0 in segment B. The <start>-<stop>- prefix ensures that each of these speakers has a unique id.
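
To illustrate the collision, here is a small sketch using two made-up segments (frames 0-499 and 500-999; the segment bounds and speaker lists are hypothetical example data). Speaker numbering restarts at 0 in each segment, so the short ids collide while the long ids stay unique.

```python
# Hypothetical per-segment results: each segment is processed independently,
# so speaker numbering restarts at 0 in every segment.
segments = [
    {"start": 0, "stop": 499, "speakers": [0, 1]},
    {"start": 500, "stop": 999, "speakers": [0]},
]

short_ids = []
long_ids = []
for seg in segments:
    for speaker in seg["speakers"]:
        short_ids.append(str(speaker))
        long_ids.append(f"{seg['start']}-{seg['stop']}-{speaker}")

print(short_ids)  # ['0', '1', '0']: "0" is ambiguous across segments
print(long_ids)   # ['0-499-0', '0-499-1', '500-999-0']: all unique
```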

Because SPEAKER_ID is sometimes valid and sometimes not, which causes confusion, we've decided that going forward we will use only the full <start>-<stop>-# format to represent speaker ids. Specifically, we're dropping LONG_SPEAKER_ID from the JSON output object and re-purposing the existing SPEAKER_ID to use the long <start>-<stop>-# format.
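
A sketch of the planned change applied to a single detection's properties, assuming the properties are a flat string-to-string dict. The function name migrate_speaker_properties and the TRANSCRIPT key are illustrative only and are not taken from the OpenMPF code base.

```python
from typing import Dict


def migrate_speaker_properties(props: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of props that uses only SPEAKER_ID in the long format.

    Hypothetical helper: if LONG_SPEAKER_ID is present, its value replaces
    SPEAKER_ID and the LONG_SPEAKER_ID key is dropped. The input is not mutated.
    """
    migrated = dict(props)
    long_id = migrated.pop("LONG_SPEAKER_ID", None)
    if long_id is not None:
        migrated["SPEAKER_ID"] = long_id
    return migrated


old = {"LONG_SPEAKER_ID": "0-499-1", "SPEAKER_ID": "0", "TRANSCRIPT": "hello"}
print(migrate_speaker_properties(old))
# {'SPEAKER_ID': '0-499-1', 'TRANSCRIPT': 'hello'}
```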

@cdglasz
Contributor

cdglasz commented Feb 20, 2023

This will be pushed to the next major release. All speech components currently have temporary logic that renames SPEAKER_ID to LONG_SPEAKER_ID and overwrites SPEAKER_ID with 0. To complete this change, that temporary logic simply needs to be removed, the unit tests updated to reference SPEAKER_ID instead of LONG_SPEAKER_ID, and openmpf-python-component-sdk/detection/component_util/mpf_component_util/job_config.py altered to set speaker_id from SPEAKER_ID instead of LONG_SPEAKER_ID.
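
A rough sketch of why removing the temporary logic is safe, assuming detection properties are a string-to-string dict. None of these functions are the actual component or job_config.py code; apply_temporary_renaming, speaker_id_before and speaker_id_after are hypothetical stand-ins for the renaming step and the speaker_id lookup before and after the change.

```python
from typing import Dict, Optional


def apply_temporary_renaming(props: Dict[str, str]) -> Dict[str, str]:
    """Hypothetical stand-in for the temporary component logic to be removed:
    move SPEAKER_ID into LONG_SPEAKER_ID and overwrite SPEAKER_ID with 0."""
    renamed = dict(props)
    renamed["LONG_SPEAKER_ID"] = renamed["SPEAKER_ID"]
    renamed["SPEAKER_ID"] = "0"
    return renamed


def speaker_id_before(props: Dict[str, str]) -> Optional[str]:
    """Hypothetical lookup today: speaker_id comes from LONG_SPEAKER_ID."""
    return props.get("LONG_SPEAKER_ID")


def speaker_id_after(props: Dict[str, str]) -> Optional[str]:
    """Hypothetical lookup after the change: speaker_id comes from SPEAKER_ID."""
    return props.get("SPEAKER_ID")


# Assumed component output: SPEAKER_ID already holds the full <start>-<stop>-# id.
component_output = {"SPEAKER_ID": "0-499-1"}

# Today: the renaming runs and the lookup reads LONG_SPEAKER_ID.
print(speaker_id_before(apply_temporary_renaming(component_output)))  # 0-499-1
# After: no renaming, and the lookup reads SPEAKER_ID directly.
print(speaker_id_after(component_output))  # 0-499-1
```

Both paths yield the same long id, which is why the change amounts to deleting the renaming step and switching which key the lookup reads.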
