The mismatch between split and split-lazy, specifically counting from 0 or 1 #1152

yfyeung · 2023-09-16T14:18:47Z

I noticed that lhotse split generates splits numbered from 1, while lhotse split-lazy generates splits numbered from 0.
In some icefall recipes, such as gigaspeech, we initially used lhotse split and later switched to lhotse split-lazy. However, we didn't account for this mismatch, which resulted in some bugs.


yfyeung · 2023-09-16T14:22:38Z

For lhotse split:

lhotse/lhotse/utils.py

Line 336 in 567ba29

def split_sequence(

For lhotse split-lazy:

lhotse/lhotse/utils.py

Line 277 in 567ba29

def split_manifest_lazy(

pzelasko · 2023-09-16T18:46:56Z

Sorry for that, I think initially I wanted to be compatible with Kaldi's convention of 1-based splits for data directories, and later I didn't remember that when introducing the lazy thing. We should fix that. I'd probably prefer to adopt 0-based counting everywhere; it seems more consistent with the rest of the library and with Python in general. WDYT @yfyeung @csukuangfj @desh2608 @danpovey?

yfyeung · 2023-09-17T06:35:37Z

Thanks. I agree with adhering to a 0-based counting system.

Best regards.

desh2608 · 2023-09-17T11:46:35Z

Submitting jobs on an SGE cluster requires indices starting at 1, which I suppose is why 1-based indexing is used in Kaldi. I would suggest adding a "start_index" option to those functions, which defaults to 0.
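
A minimal sketch of what such an option could look like. The function name split_with_start_index and its signature are hypothetical, for illustration only; this is not lhotse's actual API or the fix that was eventually merged.

```python
from typing import Dict, List, Sequence, TypeVar

T = TypeVar("T")


def split_with_start_index(
    items: Sequence[T], num_splits: int, start_index: int = 0
) -> Dict[int, List[T]]:
    """Hypothetical helper (not part of lhotse).

    Splits ``items`` into ``num_splits`` contiguous chunks and keys each
    chunk by its split index, counting from ``start_index``.
    """
    if num_splits <= 0:
        raise ValueError("num_splits must be positive")
    chunk_size, remainder = divmod(len(items), num_splits)
    splits: Dict[int, List[T]] = {}
    begin = 0
    for i in range(num_splits):
        # The first `remainder` chunks get one extra item each.
        end = begin + chunk_size + (1 if i < remainder else 0)
        splits[start_index + i] = list(items[begin:end])
        begin = end
    return splits


# 0-based by default; pass start_index=1 for SGE-style array job indices.
zero_based = split_with_start_index(list(range(5)), num_splits=2)
one_based = split_with_start_index(list(range(5)), num_splits=2, start_index=1)
```

With a default of 0 the behavior matches Python conventions, while callers that need Kaldi-style or SGE-style numbering can opt in explicitly.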

desh2608 · 2023-09-17T11:54:57Z

The above is also why we added the num_digits option in the lazy version. In the original implementation, I think padding was applied by default, but this made it hard to use the split manifests in array job submissions.
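
To illustrate the padding issue, here is a rough sketch. The helper split_filename, the "cuts" prefix, and the ".jsonl.gz" suffix are assumptions made up for this example and do not reproduce lhotse's real file naming.

```python
from typing import Optional


def split_filename(
    prefix: str, idx: int, num_digits: Optional[int] = None, suffix: str = ".jsonl.gz"
) -> str:
    """Hypothetical naming helper (not part of lhotse).

    With ``num_digits=None`` the index is written as-is; otherwise it is
    zero-padded to ``num_digits`` characters.
    """
    idx_str = str(idx) if num_digits is None else f"{idx:0{num_digits}d}"
    return f"{prefix}.{idx_str}{suffix}"


# An array job typically exposes a plain integer task id such as 3.
unpadded = split_filename("cuts", 3)              # matches the raw task id
padded = split_filename("cuts", 3, num_digits=6)  # needs extra formatting in the job script
```

With unpadded names, a job script can substitute the task id into the path directly; with padded names it first has to reformat the id to the same width.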

* Tutorial materials in main readme page * Fixes for #1152 #1153 and #1154 * Fix isinstance use in Python 3.7-3.9

pzelasko · 2023-09-18T01:34:26Z

Thanks, should be fixed now.

pzelasko added a commit that referenced this issue Sep 18, 2023

Fixes for #1152 #1153 and #1154

cc3183e

pzelasko mentioned this issue Sep 18, 2023

Fixes for #1152 #1153 and #1154 #1156

Merged

pzelasko closed this as completed in #1156 Sep 18, 2023

pzelasko added a commit that referenced this issue Sep 18, 2023

Fixes for #1152 #1153 and #1154 (#1156)

3dde48d

