Duplicate manifest id after mixing #1265

Closed
MarcoMultichannel opened this issue Jan 22, 2024 · 4 comments

Comments

@MarcoMultichannel

Hello,
With lhotse 1.18 everything works fine, but since 1.19 I run into a problem that I discovered while trying the zipformer recipe in icefall, specifically during the mixing phase with MUSAN samples.

The assertion fails and this is the output:

batch = train_dl.dataset[cuts]
  File "/home/marco/icefall_test/.venv/lib/python3.10/site-packages/lhotse/dataset/speech_recognition.py", line 109, in __getitem__
    cuts = tnfm(cuts)
  File "/home/marco/icefall_test/.venv/lib/python3.10/site-packages/lhotse/dataset/cut_transforms/mix.py", line 70, in __call__
    ).to_eager()
  File "/home/marco/icefall_test/.venv/lib/python3.10/site-packages/lhotse/serialization.py", line 380, in to_eager
    return cls.from_items(self)
  File "/home/marco/icefall_test/.venv/lib/python3.10/site-packages/lhotse/cut/set.py", line 310, in from_cuts
    return CutSet(cuts=index_by_id_and_check(cuts))
  File "/home/marco/icefall_test/.venv/lib/python3.10/site-packages/lhotse/utils.py", line 710, in index_by_id_and_check
    assert m.id not in id2man, f"Duplicated manifest ID: {m.id}"
AssertionError: Duplicated manifest ID: <MANIFEST_ID>
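
For readers unfamiliar with this check, here is a minimal sketch of what the failing assertion does. It is a simplified stand-in built around the single assertion line visible in the traceback, not the actual lhotse implementation; `FakeCut`, `index_by_id_sketch`, and `try_index` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass
class FakeCut:
    """Hypothetical stand-in for a lhotse cut; only the `id` field matters here."""

    id: str


def index_by_id_sketch(manifests: Iterable[FakeCut]) -> Dict[str, FakeCut]:
    """Build an id -> item mapping, failing on the first repeated ID."""
    id2man: Dict[str, FakeCut] = {}
    for m in manifests:
        # Same assertion text as the line shown in the traceback.
        assert m.id not in id2man, f"Duplicated manifest ID: {m.id}"
        id2man[m.id] = m
    return id2man


def try_index(ids: List[str]) -> Optional[str]:
    """Return the assertion message, or None when all IDs are unique."""
    try:
        index_by_id_sketch(FakeCut(i) for i in ids)
    except AssertionError as err:
        return str(err)
    return None


print(try_index(["a", "b"]))       # unique IDs pass the check
print(try_index(["a", "b", "a"]))  # the repeated "a" triggers the assertion
```

The point of the sketch is that the error is raised as soon as any two items handed to the eager collection share an ID, regardless of where those items came from.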

The problem isn't related to my manifests, since it works if you use lhotse 1.18.

This is the code where the transform is added:

cuts_musan = load_manifest(self.args.manifest_dir / "musan_cuts.jsonl.gz")
transforms.append(CutMix(cuts=cuts_musan, p=0.5, snr=(10, 20), preserve_id=True))

Setting preserve_id to False also solves the issue.
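
One possible way to picture why this workaround helps is the toy model below. It assumes two things that are not confirmed in this thread: that the batch passed to the transform already contains a repeated cut ID, and that with preserve_id=False each mixed cut receives a freshly generated ID. The function names are hypothetical and none of this is lhotse code.

```python
import uuid
from typing import List


def mixed_ids(original_ids: List[str], preserve_id: bool) -> List[str]:
    """Toy model of ID assignment after mixing (an assumption, not lhotse code)."""
    if preserve_id:
        # Mixed cuts keep the IDs they had before mixing.
        return list(original_ids)
    # Assumed behaviour: every mixed cut gets a brand-new random ID.
    return [str(uuid.uuid4()) for _ in original_ids]


def has_duplicates(ids: List[str]) -> bool:
    """True when at least one ID occurs more than once."""
    return len(set(ids)) != len(ids)


# Hypothetical batch in which one ID appears twice.
batch = ["cut-1", "cut-2", "cut-1"]

print(has_duplicates(mixed_ids(batch, preserve_id=True)))   # repeated ID survives
print(has_duplicates(mixed_ids(batch, preserve_id=False)))  # fresh IDs do not collide
```

Under these assumptions, turning preserve_id off hides the collision rather than removing its cause, which would be consistent with the error disappearing.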

@pzelasko
Collaborator

Thanks for reporting. This is the same issue as #1267, which is now resolved via #1268.

@Mahaotian1

I ran into this problem just now:
assert m.id not in id2man, f"Duplicated manifest ID: {m.id}"
AssertionError: Duplicated manifest ID: roots_29_morris_0109-8426
But I have a question: I hit this problem while training the 8th epoch. I have not changed anything in the cutset or the h5 file, so why did it happen suddenly? The lhotse version is "1.20.0.dev+git.b3373c0.clean".

@pzelasko
Collaborator

I just merged it ~1h ago, so you’d need to pip uninstall lhotse and then pip install git+https://github.com/lhotse-speech/lhotse to get this fix. I intend to release a new version of lhotse to pip with the fix soon.
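
After reinstalling, it can be useful to confirm which lhotse build the training environment actually picks up. The sketch below uses only the standard library (`importlib.metadata`); `installed_version` is a hypothetical helper name, and the snippet makes no claim about which version string contains the fix.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Run this with the same interpreter (virtualenv) that runs the training script.
    print("lhotse:", installed_version("lhotse"))
```

A development install from the git repository typically reports a version with a git hash in it, similar to the "1.20.0.dev+git.b3373c0.clean" string quoted earlier in this thread, so comparing the hash before and after reinstalling shows whether the environment really changed.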

@Mahaotian1

> I just merged it ~1h ago, so you’d need to pip uninstall lhotse and then pip install git+https://github.com/lhotse-speech/lhotse to get this fix. I intend to release a new version of lhotse to pip with the fix soon.

Is that the same question as above?
