
Adding alignments to supervision #304

Merged: 38 commits into lhotse-speech:master on May 20, 2021
Conversation

@desh2608 (Collaborator) commented May 11, 2021

This PR will add alignments to SupervisionSegment, as discussed in #298.

@desh2608 (Collaborator, author):

Some questions/comments:

  1. I don't fully understand what you meant about using __post_init__ in SupervisionSegment. Could you explain a little more?
  2. Where should from_ctm and to_ctm be implemented? CTM word supervisions are usually at the recording level, so a CTM would more naturally be associated with a SupervisionSet I think.
  3. Some functions in cuts.py which prepare supervision masks can be modified to optionally use the alignments for mask generation. This would be particularly useful for VAD training etc.
  4. What's a natural workflow for adding alignments? Should they be included in Lhotse recipes, or do users get to add them to the prepared supervisions manifest?

@pzelasko (Collaborator) left a comment:

Looks good! I left some comments and suggestions.

Before you go on with adding more stuff, I suggest you add/extend unit tests to cover the new code paths being introduced. In particular, you might want to look at lhotse.testing.dummies (https://github.com/lhotse-speech/lhotse/blob/master/lhotse/testing/dummies.py) and add alignments to the dummy supervision segments to see how many existing tests break (if any). You could also extend lhotse.testing.fixtures, which performs randomized testing with PyTorch datasets (https://github.com/lhotse-speech/lhotse/blob/master/lhotse/testing/fixtures.py#L81).

Your questions:

I don't fully understand what you meant about using __post_init__ in SupervisionSegment. Could you explain a little more?

When you store the manifest in JSON (or another format), it will convert each AlignmentItem to a 3-element list. When we read that JSON back, we need to convert these 3-element lists into AlignmentItems. We can do it in two ways. One is a special dataclass method called __post_init__, which runs after __init__ (this lets you keep the default generated __init__ and still customize construction). But I think there's actually a better way: SupervisionSegment has a from_dict method that you can adjust instead.
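A minimal sketch of the __post_init__ approach, using simplified stand-in classes (not the actual Lhotse definitions) to show how deserialized 3-element lists could be converted back:

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class AlignmentItem:
    # Simplified stand-in for illustration; the real Lhotse fields may differ.
    symbol: str
    start: float
    duration: float

@dataclass
class SupervisionSegment:
    id: str
    alignment: Optional[Dict[str, List]] = None

    def __post_init__(self):
        # Runs automatically after the generated __init__: turn any plain
        # 3-element lists (as read back from JSON) into AlignmentItem objects.
        if self.alignment is not None:
            self.alignment = {
                ali_type: [
                    item if isinstance(item, AlignmentItem) else AlignmentItem(*item)
                    for item in items
                ]
                for ali_type, items in self.alignment.items()
            }

seg = SupervisionSegment(id='utt1', alignment={'word': [['hello', 0.0, 0.5]]})
print(type(seg.alignment['word'][0]).__name__)  # AlignmentItem
```

The from_dict route mentioned above would instead perform this conversion at deserialization time and leave construction untouched.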

Where should from_ctm and to_ctm be implemented? CTM word supervisions are usually at the recording level, so a CTM would more naturally be associated with a SupervisionSet I think.

Yeah SupervisionSet makes sense to me. Actually if we read with from_ctm, we would not have the speaker etc. information anymore -- maybe we can instead make it sth like add_ctm_info (not sure how to name it well...) that adds alignments to an existing object? Or we can read and merge two supervision sets. Or maybe there is another way -- it's up to you.

Some functions in cuts.py which prepare supervision masks can be modified to optionally use the alignments for mask generation. This would be particularly useful for VAD training etc.

Makes sense!

What's a natural workflow for adding alignments? Should they be included in Lhotse recipes, or do users get to add them to the prepared supervisions manifest?

I think users will typically extend existing supervision manifests, as not too many corpora provide alignments.

lhotse/supervision.py: several review threads (resolved)
@danpovey (Collaborator):
Guys, I haven't looked at this in detail, but I do want to mention something that's been on my mind recently RE snowfall and (eventual) Icefall.

Firstly, the timeline for designing Icefall is basically: in the next few weeks we'll be figuring out how to start, and we'd like Icefall to be "officially releasable" in time for the Interspeech tutorial around September 1st. Think of Icefall as a "properly designed" version of snowfall.

Anyway, the related issue is: I'd like to be able to deal with things like alignments in Icefall/Snowfall, e.g. for purposes of building trees, segmenting training data, data cleaning and analysis, and so on. I'm not saying the format we'd use for such things would necessarily be identical to the way they are represented in Lhotse. But a workflow that I think might end up being fairly common is: we train some fairly basic neural net model that only sees limited acoustic context (like a TDNN) for purposes of data alignment. If we're using a model where there is a phone for silence, or at least a unit that represents "end-of-word", it may be useful to have a blank-free topology here. I don't believe we currently have code for such a topology in snowfall.

And we might want some way to represent those data alignments. E.g., one simple representation might just be the label sequences, indexed somehow by utterance-id. There are several different label sequences that might be relevant here, depending on the type of system: the ilabel from the model; the phone label "without repetitions", which we could store as a separate attribute on the graphs by using the "inner_labels=xxx" arg to compose in the appropriate stage of graph creation; and the olabel, which is the word label. One possibility is to just store these as a dict indexed by utterance-id and then by 'ilabel', 'olabel' and 'phone_label' (for phone labels without repetitions), and store it as a .pt file with torch.save().
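As a rough illustration of that storage scheme (the utterance-id and label values are made up; torch.save is the suggestion above, but pickle is used here to keep the sketch stdlib-only):

```python
import pickle

# Hypothetical alignment store: a dict indexed by utterance-id, then by
# label-sequence type ('ilabel', 'olabel', 'phone_label').
alignments = {
    'utt-0001': {
        'ilabel': [0, 0, 5, 5, 5, 0, 8],  # per-frame model labels (0 = blank/silence)
        'phone_label': [5, 8],            # phone labels without repetitions
        'olabel': [42],                   # word labels
    },
}

# In practice this would be torch.save(alignments, 'ali.pt'); a pickle
# round-trip demonstrates the same serialize/restore idea with the stdlib.
restored = pickle.loads(pickle.dumps(alignments))
print(restored['utt-0001']['phone_label'])  # [5, 8]
```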

@pzelasko (Collaborator):
That makes sense to me. In another project, where I prototyped some of the alignment-related stuff, I'm using this CTM-like format together with methods to convert to/from a frame-level alignment (an int sequence). It handles conversions between frame shifts such as 10 ms and 12 ms, etc. I think it's going to play well with what you're planning for Icefall.
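A sketch of what such a CTM-to-frame-level conversion might look like (a hypothetical helper, not the actual code from that project):

```python
def ctm_to_frame_labels(items, frame_shift=0.01, blank=0, total_duration=None):
    """Convert (symbol_id, start, duration) items into a per-frame int sequence.

    Illustrative only: maps each item's time span onto frames of `frame_shift`
    seconds, filling uncovered frames with `blank`.
    """
    if total_duration is None:
        total_duration = max(start + dur for _, start, dur in items)
    num_frames = round(total_duration / frame_shift)
    labels = [blank] * num_frames
    for symbol_id, start, dur in items:
        begin = round(start / frame_shift)
        end = min(round((start + dur) / frame_shift), num_frames)
        labels[begin:end] = [symbol_id] * (end - begin)
    return labels

# Two words at a 10 ms frame shift: frames 0-9 get label 7, frames 10-19 get 3.
print(ctm_to_frame_labels([(7, 0.0, 0.1), (3, 0.1, 0.1)]))
```

The inverse direction (frame labels back to CTM-like items) would group runs of identical labels and multiply run lengths by the frame shift.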

@danpovey (Collaborator):
Something else the alignments might be useful for, if we can get word-level alignments, is in designing BPE-type units. It's desirable to have units where words that have a short pronunciation also have a short representation, so we can keep the frame rate slow.

@pzelasko pzelasko added this to the v0.7 milestone May 18, 2021
@@ -251,12 +254,49 @@ def from_segments(segments: Iterable[SupervisionSegment]) -> 'SupervisionSet':
def from_dicts(data: Iterable[Dict]) -> 'SupervisionSet':
return SupervisionSet.from_segments(SupervisionSegment.from_dict(s) for s in data)

-    def add_alignments_from_ctm(ctm) -> 'SupervisionSet':
+    def add_alignments_from_ctm(self, ctm_file: Pathlike, type: str = 'word') -> 'SupervisionSet':
Collaborator:
I'd suggest with_alignments_from_ctm, add suggests we're mutating the original object

    :return: A new SupervisionSet with AlignmentItem objects added to the segments.
    """
    ctm_words = []
    with open(ctm_file, 'r') as f:
Collaborator:
Suggested change:
-    with open(ctm_file, 'r') as f:
+    with open(ctm_file) as f:

    with open(ctm_file, 'r') as f:
        for line in f:
            reco_id, channel, start, duration, symbol = line.strip().split()
            ctm_words.append((reco_id, channel, float(start), float(duration), symbol))
Collaborator:
shouldn't channel be an int?

Collaborator:
(at least in Lhotse we always map them to ints and expect int, so be careful there)

Collaborator (author):
Ideally, yes. But sometimes they are also denoted by A, B, etc. in CTM files.

Collaborator:
yes, channels do not have to be int

    ctm_words = sorted(ctm_words, key=lambda x: x[0])
Collaborator:
is python's sorted guaranteed to be stable? otherwise you might want to change the key to be a tuple of (reco_id, start)

Collaborator (author):
Yes, it is guaranteed (https://docs.python.org/3/library/functions.html#sorted). Also, we don't particularly need the alignments to be sorted by time I suppose? This sorting by reco_id is just to enable groupby.
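The stability guarantee and the groupby pattern can be seen in a small sketch (dummy CTM tuples, not real data):

```python
from itertools import groupby

# Dummy CTM entries: (reco_id, channel, start, duration, symbol).
ctm_words = [
    ('reco2', 1, 0.0, 0.4, 'hi'),
    ('reco1', 1, 0.5, 0.3, 'world'),
    ('reco1', 1, 0.0, 0.5, 'hello'),
]
# sorted() is guaranteed stable, so sorting by reco_id alone preserves the
# original relative order within each recording, which is all groupby needs.
ctm_words = sorted(ctm_words, key=lambda x: x[0])
reco_to_ctm = {k: list(v) for k, v in groupby(ctm_words, key=lambda x: x[0])}
print([w[4] for w in reco_to_ctm['reco1']])  # ['world', 'hello'] -- input order kept
```

Using `key=lambda x: (x[0], x[2])` instead would additionally sort each recording's words by start time.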

Collaborator:
We probably don't, but I think it's a reasonable expectation that they are sorted (I know I'd be surprised if they're not). If it's guaranteed then it's cool :)

    reco_to_ctm = defaultdict(list, {k: list(v) for k, v in groupby(ctm_words, key=lambda x: x[0])})
    segments = []
    for reco_id in reco_to_ctm:
        segs = [s for s in self if s.recording_id == reco_id]
Collaborator:
I think a faster variant is self.find(recording_id=reco_id) which internally builds an index and caches it

        for seg in segs:
            alignment = [AlignmentItem(word[4], word[2], word[3]) for word in ctm_words
                         if overspans(
                             TimeSpan(start=seg.start, end=seg.end),
Collaborator:
I think it's OK to just pass seg here due to duck typing

    return SupervisionSet.from_segments(segments)


def write_ctm(self, ctm_file: Pathlike, type: str = 'word') -> None:
Collaborator:
how about write_alignment_to_ctm? it's un-ambiguous, as it's also possible to convert "normal" supervisions to a CTM

        segs = [s for s in self if s.recording_id == reco_id]
        for seg in segs:
            alignment = [AlignmentItem(word[4], word[2], word[3]) for word in ctm_words
                         if overspans(
Collaborator:
what should happen if the alignment item overlaps a supervision segment? maybe we should issue some warning about potential mismatch? I'm not sure.
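One possible policy for that case, sketched with simplified stand-ins for TimeSpan and overspans (not the actual Lhotse implementations): warn when an alignment item overlaps a segment but is not fully contained in it.

```python
import warnings

class Span:
    # Minimal stand-in for a time interval; anything exposing .start and .end
    # (segments, time spans) is duck-type compatible with the checks below.
    def __init__(self, start: float, end: float):
        self.start, self.end = start, end

def overspans(spanning, spanned) -> bool:
    # True when `spanning` fully contains `spanned`.
    return spanning.start <= spanned.start and spanned.end <= spanning.end

def overlaps(a, b) -> bool:
    # True when the two intervals share any time at all.
    return a.start < b.end and b.start < a.end

seg = Span(0.0, 2.0)
word = Span(1.8, 2.3)  # crosses the segment's right boundary
if overlaps(seg, word) and not overspans(seg, word):
    warnings.warn('Alignment item crosses a supervision segment boundary; '
                  'possible CTM/supervision mismatch.')
```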

@pzelasko (Collaborator) left a comment:
Thanks @desh2608! I left some comments, could you also add unit tests for the two new methods of import/export to CTM?

@pzelasko (Collaborator):
(you can add dummy CTMs to test/fixtures, don't forget to add them to git)

@desh2608 (Collaborator, author) commented May 19, 2021:

@pzelasko I have made the changes you suggested and also added tests for reading/writing CTM. Perhaps the PR can be merged now if it looks good to you (I will try to make the changes to supervision mask generation in Cuts later).

@pzelasko (Collaborator):
OK I am merging as it is. We can try them out and adjust or optimize as needed.

Thanks for all the work on this, great job!

@pzelasko pzelasko merged commit a3aa32e into lhotse-speech:master May 20, 2021
@pzelasko pzelasko changed the title [WIP] Adding alignments to supervision Adding alignments to supervision May 20, 2021
            for item in ali
        ]
        for ali_type, ali in self.alignment.items()
    } if self.alignment else None
Contributor:
Line 192 already checks whether self.alignment is None, so no need to do an extra check here.

                segments.append(fastcopy(seg, alignment={type: alignment}))
        else:
            segments.append([s for s in self.find(recording_id=reco_id)])
    print (segments)
Contributor:
This kind of debugging code should not be checked into the final commit.

    We use the recursive conversion only if alignments are present, since it may
    potentially be slower due to type checking of member objects.
    """
    return asdict_nonull(self) if self.alignment is None else asdict_nonull_recursive(self)
Contributor:
The AlignmentItem class is no longer a subclass of NamedTuple, is asdict_nonull_recursive really necessary? It makes the code slow, I believe.

The code in cut.py should also be updated to remove asdict_nonull_recursive.

Collaborator:
Good point. @desh2608, could you address Fangjun's comments?

Btw @csukuangfj we should make sure that Lhotse alignments and Snowfall alignments interact well together. If there are any additions/changes that are helpful on Lhotse side, please let us know.

@desh2608 (Collaborator, author) commented Jun 5, 2021:

@csukuangfj thanks for your review. I have made the suggested changes in #313.
