GetOneBestTimes() prevent overlap #1465

Open
danpovey opened this issue Feb 28, 2017 · 3 comments
Labels
stale-exclude Stale bot ignore this issue

Comments

@danpovey
Contributor

This relates to MBR decoding when the output is written as CTM. See the following exchange on kaldi-help:

Rémi Francis via googlegroups.com
9:01 AM (4 hours ago)

to kaldi-help
Hi,

I'm using lattice-to-ctm-conf with MBR decoding to get the CTM with confidences, but I've noticed that the word timings sometimes overlap slightly:
audio-000000 1 92.17 0.85 football 0.66
audio-000000 1 93.03 0.34 scouting 0.99
audio-000000 1 93.36 0.54 agency 1.00

Here "scouting" ends at 93.37 (93.03 + 0.34), whereas "agency" starts at 93.36.
I've looked at the code, and the overlap seems to come from averaging the times of arcs in the lattice, but I don't know whether this is a bug or just something to expect.

An easy fix would be to adjust the timings afterwards so that they don't overlap, since the timings are only precise to about 0.03s anyway, but I wonder if there is a better fix that could be made inside the binaries.
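
To make the overlap in the excerpt above easy to check, here is a minimal Python sketch that scans CTM lines and reports adjacent words whose intervals overlap. It is not part of Kaldi; the function name `find_overlaps` is made up for illustration, and it assumes the field order shown in the excerpt (utterance, channel, start, duration, word, confidence). `Decimal` is used so that the two-decimal values compare exactly.

```python
from decimal import Decimal


def find_overlaps(ctm_lines):
    """Return (word, next_word, overlap_in_seconds) for adjacent overlapping words."""
    entries = []
    for line in ctm_lines:
        # Assumed field order: utterance, channel, start, duration, word, [confidence]
        utt, chan, start, dur, word = line.split()[:5]
        entries.append((utt, chan, Decimal(start), Decimal(dur), word))

    overlaps = []
    for prev, cur in zip(entries, entries[1:]):
        if prev[:2] != cur[:2]:
            continue  # different utterance or channel: not comparable
        prev_end = prev[2] + prev[3]
        if prev_end > cur[2]:
            overlaps.append((prev[4], cur[4], prev_end - cur[2]))
    return overlaps


ctm = [
    "audio-000000 1 92.17 0.85 football 0.66",
    "audio-000000 1 93.03 0.34 scouting 0.99",
    "audio-000000 1 93.36 0.54 agency 1.00",
]
print(find_overlaps(ctm))
# [('scouting', 'agency', Decimal('0.01'))]
```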

Daniel Povey dpovey@gmail.com
1:12 PM (1 minute ago)

to kaldi-help
It's inherent to the algorithm that the timings may overlap a bit... it would be possible to modify the algorithm to do a second pass over the times just to remove any overlap. If you had time to do that it would be great. Search for GetOneBestTimes(). I'll create an issue to keep track of it.
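
As a rough illustration of the "second pass over the times" idea, the sketch below takes a list of (begin, end) pairs and moves each overlapping boundary to the midpoint of the overlap. This is only one possible policy, written in Python rather than in Kaldi's C++, and it is not the implementation in Kaldi. The function name `remove_overlaps` and the sample values are hypothetical, and the sketch assumes the pairs are in time order and that each overlap is shorter than either of the two words involved.

```python
def remove_overlaps(times):
    """Second pass over (begin, end) pairs: split each overlap at its midpoint.

    Assumes the pairs are in time order and that overlaps are small
    (shorter than either neighbouring word).
    """
    fixed = [list(t) for t in times]
    for cur, nxt in zip(fixed, fixed[1:]):
        if cur[1] > nxt[0]:
            # Both words give up half of the overlapping region.
            boundary = 0.5 * (cur[1] + nxt[0])
            cur[1] = boundary
            nxt[0] = boundary
    return [tuple(t) for t in fixed]


# Hypothetical times, e.g. in frames; the first and third pairs overlap their successors.
times = [(10.0, 20.5), (20.0, 30.0), (30.0, 41.0), (40.0, 50.0)]
print(remove_overlaps(times))
# [(10.0, 20.25), (20.25, 30.0), (30.0, 40.5), (40.5, 50.0)]
```

Splitting at the midpoint treats both words symmetrically; trimming only the earlier word's end time would be an equally simple alternative.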

@stale

stale bot commented Jun 19, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale Stale bot on the loose label Jun 19, 2020
@stale

stale bot commented Jul 19, 2020

This issue has been automatically closed by a bot strictly because of inactivity. This does not mean that we think that this issue is not important! If you believe it has been closed hastily, add a comment to the issue and mention @kkm000, and I'll gladly reopen it.

@stale stale bot closed this as completed Jul 19, 2020
@kkm000 kkm000 added the ask-dan label Jul 19, 2020
@danpovey danpovey reopened this Jul 20, 2020
@stale stale bot removed the stale Stale bot on the loose label Jul 20, 2020
@kkm000 kkm000 added stale-exclude Stale bot ignore this issue and removed ask-dan labels Jul 20, 2020
@sdrobert
Contributor

Hi @kkm000,

It's not exactly what was asked, but I've written a Perl script that does the fiddling of the timings. resolve_ctm_overlaps.py also resolves overlaps, but does so by deleting overlapping segments. My script does the dumb thing and decreases the duration of the segments until they no longer overlap with later segments, plus some handling of empty segments.

This script could easily be added as an optional final stage to the get_ctm_conf{,_fast}.sh scripts. Let me know if you're interested in me making a PR.
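
For readers who want to see the duration-trimming idea in code, here is a minimal Python sketch. It is not the Perl script mentioned above, and its handling of empty segments (dropping any entry whose duration would become zero or negative) is an assumption made for illustration. The function name `trim_ctm` is hypothetical, and the CTM field order is assumed to be utterance, channel, start, duration, word, confidence.

```python
from decimal import Decimal


def trim_ctm(ctm_lines):
    """Shorten each entry so that it ends no later than the next entry starts."""
    rows = [line.split() for line in ctm_lines]
    out = []
    for i, row in enumerate(rows):
        start, dur = Decimal(row[2]), Decimal(row[3])
        # Only compare against the next entry of the same utterance and channel.
        if i + 1 < len(rows) and rows[i + 1][:2] == row[:2]:
            next_start = Decimal(rows[i + 1][2])
            if start + dur > next_start:
                dur = next_start - start
        if dur <= 0:
            continue  # assumption: drop entries that end up empty
        out.append(" ".join(row[:3] + [str(dur)] + row[4:]))
    return out


ctm = [
    "audio-000000 1 92.17 0.85 football 0.66",
    "audio-000000 1 93.03 0.34 scouting 0.99",
    "audio-000000 1 93.36 0.54 agency 1.00",
]
for line in trim_ctm(ctm):
    print(line)
# audio-000000 1 92.17 0.85 football 0.66
# audio-000000 1 93.03 0.33 scouting 0.99
# audio-000000 1 93.36 0.54 agency 1.00
```

Unlike deleting overlapping segments, this keeps every word that still has a positive duration and changes only the duration field.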

Thanks for your time,
Sean
