3 changes: 1 addition & 2 deletions python/AzureSpeechDetection/README.md
@@ -36,8 +36,7 @@ Returned `AudioTrack` objects have the following members in their `detection_pro

 | Property Key | Description |
 |--------------------------------|-------------|
-| `LONG_SPEAKER_ID` | A unique speaker identifier, of the form "`<start_offset>-<stop_offset>-<#>`, where `<start_offset>` and `<stop_offset>` are integers indicating the segment range (in frame counts for video jobs, milliseconds for audio jobs) for sub-jobs when a job has been segmented by the Workflow Manager. The final `#` portion of the ID is a 1-indexed counter for speaker identity within the indicated segment range. When jobs are not segmented, or not submitted through the Workflow Manager at all, `stop_offset` may instead be `EOF`, indicating that the job extends to the end of the file. |
-| `SPEAKER_ID` | A dummy field set to "0". |
+| `SPEAKER_ID` | A unique speaker identifier, of the form "`<start_offset>-<stop_offset>-<#>`, where `<start_offset>` and `<stop_offset>` are integers indicating the segment range (in frame counts for video jobs, milliseconds for audio jobs) for sub-jobs when a job has been segmented by the Workflow Manager. The final `#` portion of the ID is a 1-indexed counter for speaker identity within the indicated segment range. When jobs are not segmented, or not submitted through the Workflow Manager at all, `stop_offset` may instead be `EOF`, indicating that the job extends to the end of the file. |
 | `GENDER` | Only present if supplied by an upstream component. The gender of the speaker. |
 | `GENDER_CONFIDENCE` | Only present if supplied by an upstream component. The confidence of the gender classification. |
 | `TRANSCRIPT` | The text of the utterance transcript. Words are space-separated. |
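
Reviewer note: the `SPEAKER_ID` format described in the table above can be illustrated with a small parser. This is only a sketch based on the README wording; `SpeakerId` and `parse_speaker_id` are hypothetical names that are not part of the component, and the example assumes the three fields are always separated by single hyphens and that offsets are non-negative.

```python
from typing import NamedTuple, Optional


class SpeakerId(NamedTuple):
    """Hypothetical container for the three parts of a SPEAKER_ID value."""
    start_offset: int            # frames (video jobs) or milliseconds (audio jobs)
    stop_offset: Optional[int]   # None when the ID carries the literal "EOF"
    speaker_number: int          # 1-indexed counter within the segment range


def parse_speaker_id(speaker_id: str) -> SpeakerId:
    """Split '<start_offset>-<stop_offset>-<#>' into its components."""
    parts = speaker_id.split('-')
    if len(parts) != 3:
        raise ValueError(f'Unexpected SPEAKER_ID format: {speaker_id!r}')
    start, stop, number = parts
    # "EOF" means the job extends to the end of the file, so there is
    # no numeric stop offset to report.
    stop_offset = None if stop == 'EOF' else int(stop)
    return SpeakerId(int(start), stop_offset, int(number))
```

Under these assumptions, the old dummy value `"0"` is rejected with a `ValueError`, which makes it easy to spot output produced before this change.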
@@ -70,11 +70,6 @@ def get_detections_from_job(
             logger.exception(f'Exception raised while processing audio: {e}')
             raise

-        # Remove this block to drop LONG_SPEAKER_ID
-        for track in audio_tracks:
-            track.detection_properties['LONG_SPEAKER_ID'] = track.detection_properties['SPEAKER_ID']
-            track.detection_properties['SPEAKER_ID'] = '0'
-
         logger.info('Processing complete. Found %d tracks.' % len(audio_tracks))
         return audio_tracks
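
Reviewer note: the removed block moved the real identifier into `LONG_SPEAKER_ID` and set `SPEAKER_ID` to `'0'`, so code that reads this component's output may see either layout during a transition. A minimal sketch of a tolerant reader follows; `resolve_speaker_id` is a hypothetical helper that is not part of this PR, and it assumes `detection_properties` behaves like a plain `dict`.

```python
from typing import Mapping, Optional


def resolve_speaker_id(detection_properties: Mapping[str, str]) -> Optional[str]:
    """Return the real speaker identifier from either property layout.

    Old layout: LONG_SPEAKER_ID holds the identifier, SPEAKER_ID is '0'.
    New layout: SPEAKER_ID holds the identifier, LONG_SPEAKER_ID is absent.
    """
    long_id = detection_properties.get('LONG_SPEAKER_ID')
    if long_id is not None:
        # Output produced before this change: prefer the long form.
        return long_id
    # Output produced after this change (or no speaker information at all).
    return detection_properties.get('SPEAKER_ID')
```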

2 changes: 1 addition & 1 deletion python/AzureSpeechDetection/tests/test_acs_speech.py
@@ -184,7 +184,7 @@ def test_diarization(self):
         # There should be two speakers with diarization, one without
         len_raw, len_dia = [
             len(set([
-                track.detection_properties['LONG_SPEAKER_ID']
+                track.detection_properties['SPEAKER_ID']
                 for track in result
             ]))
             for result in results