incorporating the PR comments, adding sanity check
- not more than 20% utterances can be dropped on `kaldi import`
KarelVesely84 committed Aug 24, 2023
1 parent d072615 commit 087c5ef
Showing 1 changed file with 5 additions and 2 deletions.
7 changes: 5 additions & 2 deletions lhotse/kaldi.py
@@ -148,13 +148,16 @@ def fix_id(t: str) -> str:
 
     # remove recordings with 'None' duration (i.e. there was a read error)
     for recording_id, dur_value in durations.items():
-        if dur_value == None:
+        if dur_value is None:
             logging.warning(
                 f"[{recording_id}] Could not get duration. "
                 f"Failed to read audio from `{recordings[recording_id]}`. "
-                f"Dropping the recording from manifest."
+                "Dropping the recording from manifest."
             )
             del recordings[recording_id]
+    # make sure not too many utterances were dropped
+    if len(recordings) < len(durations) * 0.8:
+        raise RuntimeError(f"Failed to load more than 20% utterances of the dataset: \"{path}\"")
 
     # assemble the new RecordingSet
     recording_set = RecordingSet.from_recordings(

Check warnings (Codecov / codecov/patch): added lines #L152, #L157, and #L160 of lhotse/kaldi.py were not covered by tests.
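The behaviour this hunk introduces can be tried out in isolation. The following is a minimal sketch, not lhotse code: `drop_unreadable` and `min_kept_ratio` are hypothetical names, the recordings are plain strings rather than lhotse `Recording` objects, and the helper works on a copy instead of deleting from the input dict as the commit does.

```python
import logging
from typing import Dict, Optional


def drop_unreadable(
    recordings: Dict[str, str],
    durations: Dict[str, Optional[float]],
    min_kept_ratio: float = 0.8,  # assumption: mirrors the hard-coded 0.8 in the hunk
) -> Dict[str, str]:
    """Hypothetical helper mirroring the hunk above on plain dicts."""
    kept = dict(recordings)  # copy, so the caller's dict is left untouched
    for recording_id, dur_value in durations.items():
        # `is None` tests identity; `== None` can be overridden by a custom `__eq__`
        if dur_value is None:
            logging.warning(
                "[%s] Could not get duration. Dropping the recording.", recording_id
            )
            kept.pop(recording_id, None)
    # sanity check: refuse to continue if fewer than 80% of the entries survive
    if len(kept) < len(durations) * min_kept_ratio:
        raise RuntimeError("Failed to load more than 20% utterances of the dataset")
    return kept


if __name__ == "__main__":
    recs = {f"utt{i}": f"utt{i}.wav" for i in range(10)}
    durs: Dict[str, Optional[float]] = {key: 1.0 for key in recs}
    durs["utt0"] = None  # simulate one read error
    print(sorted(drop_unreadable(recs, durs)))
```

Because the comparison is strict, losing exactly 20% of the entries is still accepted (floating-point rounding aside); only a larger loss raises `RuntimeError`.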
