DNA audio and silence splitting interaction #52

Open
joanise opened this issue Apr 20, 2021 · 5 comments

@joanise
Member

joanise commented Apr 20, 2021

Marc found a case where silence splitting made the first word span half of the DNA range at the beginning of a file. In the data below, [0, 4576] ms is DNA (do-not-align) audio, yet the first word in the SMIL file starts at exactly 4576 ms / 2 = 2288 ms = 2.288 s.

Config:

{
    "do-not-align": {
        "method": "remove",
        "segments": [
            {
                "begin": 0,
                "end": 4576
            },
            {
                "begin": 11726,
                "end": 26267
            }
        ]
    }
}
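The segment bounds in this config are in milliseconds, while the SMIL clip times are in seconds. Below is a minimal Python sketch for loading the do-not-align block and converting it to seconds so the two can be compared. The helper name dna_segments_seconds and its validation rule are assumptions for illustration, not part of readalongs.

```python
import json
from typing import List, Tuple

# The "do-not-align" block from the config above; begin/end are in milliseconds.
CONFIG_TEXT = """
{
    "do-not-align": {
        "method": "remove",
        "segments": [
            {"begin": 0, "end": 4576},
            {"begin": 11726, "end": 26267}
        ]
    }
}
"""


def dna_segments_seconds(config: dict) -> List[Tuple[float, float]]:
    """Hypothetical helper: return DNA segments as sorted (begin, end) pairs in seconds."""
    segments = config.get("do-not-align", {}).get("segments", [])
    pairs = sorted((seg["begin"] / 1000, seg["end"] / 1000) for seg in segments)
    for begin, end in pairs:
        # Assumed rule: a segment must have a positive length.
        if begin >= end:
            raise ValueError(f"empty or inverted DNA segment: {begin}-{end}")
    return pairs


config = json.loads(CONFIG_TEXT)
print(config["do-not-align"]["method"])  # remove
print(dna_segments_seconds(config))      # [(0.0, 4.576), (11.726, 26.267)]
```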

Command:
readalongs align --config ./s0387_intro2.json --debug --save-temps --force-overwrite --language iku s0387_intro2.xml s0387_intro2.mp3 s0387_intro2.config 2> s0387_intro2.out.config

Data files: Marc e-mailed Eric the data files needed to reproduce this on 2021-04-19.

Output:

<smil xmlns="http://www.w3.org/ns/SMIL" version="3.0">
    <body>
        <par id="par-t0b0d0p0s0w0">
            <text src="s0387_intro2.config.xml#t0b0d0p0s0w0"/>
            <audio src="s0387_intro2.config.mp3" clipBegin="2.288" clipEnd="5.016"/>
        </par>
        <par id="par-t0b0d0p0s0w1">
            <text src="s0387_intro2.config.xml#t0b0d0p0s0w1"/>
            <audio src="s0387_intro2.config.mp3" clipBegin="5.016" clipEnd="5.526"/>
        </par>
        <par id="par-t0b0d0p0s0w2">
            <text src="s0387_intro2.config.xml#t0b0d0p0s0w2"/>
            <audio src="s0387_intro2.config.mp3" clipBegin="5.526" clipEnd="6.806"/>
        </par>
        <par id="par-t0b0d0p0s0w3">
            <text src="s0387_intro2.config.xml#t0b0d0p0s0w3"/>
            <audio src="s0387_intro2.config.mp3" clipBegin="6.806" clipEnd="8.201"/>
        </par>
        <par id="par-t0b0d0p0s0w4">
            <text src="s0387_intro2.config.xml#t0b0d0p0s0w4"/>
            <audio src="s0387_intro2.config.mp3" clipBegin="8.201" clipEnd="9.301"/>
        </par>
        <par id="par-t0b0d0p0s0w5">
            <text src="s0387_intro2.config.xml#t0b0d0p0s0w5"/>
            <audio src="s0387_intro2.config.mp3" clipBegin="9.301" clipEnd="10.126"/>
        </par>
        <par id="par-t0b0d0p0s0w6">
            <text src="s0387_intro2.config.xml#t0b0d0p0s0w6"/>
            <audio src="s0387_intro2.config.mp3" clipBegin="10.126" clipEnd="10.606"/>
        </par>
        <par id="par-t0b0d0p0s0w7">
            <text src="s0387_intro2.config.xml#t0b0d0p0s0w7"/>
            <audio src="s0387_intro2.config.mp3" clipBegin="10.606" clipEnd="11.716"/>
        </par>
    </body>
</smil>
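One quick way to spot this kind of problem is to check every SMIL clip against the DNA segments. This is a hypothetical helper, not a readalongs feature; it assumes the clip times are on the original audio's timeline and that the DNA bounds from the config have been converted from ms to seconds. Only the first and last par elements of the output are copied in, to keep it short.

```python
import xml.etree.ElementTree as ET

SMIL_NS = {"smil": "http://www.w3.org/ns/SMIL"}

# DNA segments from the config, converted from milliseconds to seconds.
DNA_SEGMENTS = [(0.0, 4.576), (11.726, 26.267)]

# Abbreviated copy of the SMIL output (first and last <par> only).
SMIL_TEXT = """<smil xmlns="http://www.w3.org/ns/SMIL" version="3.0">
    <body>
        <par id="par-t0b0d0p0s0w0">
            <text src="s0387_intro2.config.xml#t0b0d0p0s0w0"/>
            <audio src="s0387_intro2.config.mp3" clipBegin="2.288" clipEnd="5.016"/>
        </par>
        <par id="par-t0b0d0p0s0w7">
            <text src="s0387_intro2.config.xml#t0b0d0p0s0w7"/>
            <audio src="s0387_intro2.config.mp3" clipBegin="10.606" clipEnd="11.716"/>
        </par>
    </body>
</smil>"""


def words_overlapping_dna(smil_text, dna_segments):
    """Return (par id, clipBegin, clipEnd) for every clip that overlaps a DNA segment."""
    root = ET.fromstring(smil_text)
    hits = []
    for par in root.iterfind(".//smil:par", SMIL_NS):
        audio = par.find("smil:audio", SMIL_NS)
        begin = float(audio.get("clipBegin"))
        end = float(audio.get("clipEnd"))
        # Strict overlap test: merely touching a segment boundary does not count.
        if any(begin < dna_end and end > dna_begin for dna_begin, dna_end in dna_segments):
            hits.append((par.get("id"), begin, end))
    return hits


print(words_overlapping_dna(SMIL_TEXT, DNA_SEGMENTS))
# [('par-t0b0d0p0s0w0', 2.288, 5.016)]
```

The last word (clipEnd 11.716) stops just short of the second DNA segment at 11.726 s, so only the first word is flagged.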
@roedoejet
Collaborator

Are we sure this is actually an issue and not just a coincidence? The remove method cuts the first 4576 ms out of the audio, so the first word may really align at 2.288 s in the shortened audio. What happens if you change the DNA method to mute instead? Also, does the readalong look wrong when you visualize it?
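With remove, any time reported on the shortened audio has to be shifted back to the original timeline, while mute keeps the timeline as-is. Below is a sketch of that mapping, assuming the removed segments are sorted and given in seconds; to_original_time is a hypothetical name, and the thread doesn't say how readalongs does this internally. Under this assumption, a word found at 2.288 s in the shortened audio would land near 6.864 s in the original, not at 2.288 s.

```python
from typing import List, Tuple

# DNA segments removed from the audio, in seconds on the original timeline (sorted).
REMOVED: List[Tuple[float, float]] = [(0.0, 4.576), (11.726, 26.267)]


def to_original_time(t_cut: float, removed: List[Tuple[float, float]]) -> float:
    """Hypothetical helper: map a time in the shortened ("remove") audio back
    to the original audio.

    Each removed segment that starts at or before the running position shifts
    the time later by that segment's length. With "mute" the timeline is kept
    intact, so no mapping would be needed.
    """
    t = t_cut
    for begin, end in removed:
        if begin <= t:
            t += end - begin
        else:
            break
    return t


# 2.288 s into the shortened audio is about 6.864 s into the original audio.
print(round(to_original_time(2.288, REMOVED), 3))  # 6.864
```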

roedoejet self-assigned this Apr 20, 2021
roedoejet added the question label (Further information is requested) Apr 20, 2021
@joanise
Member Author

joanise commented Apr 20, 2021

With remove, soundswallower never sees that range at all, so it cannot have aligned the word to that timestamp; the shift must come from the post-processing we apply to the soundswallower results.
Yes, the readalong looks wrong when visualized.
Good idea to see what happens with mute; I'll test that.

@roedoejet
Collaborator

Right. It looks like this is an interaction with the way we split the silence between adjoining words:

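if not bare:
    # Split the silence/noise between adjoining words at its midpoint: the
    # preceding word's end and the following word's start both move there.
    # last_end starts at 0.0, so the first word's start moves to half its
    # original start time.
    last_end = 0.0
    last_word = None
    for word in results["words"]:
        silence = word["start"] - last_end
        midpoint = last_end + silence / 2
        if silence > 0:
            if last_word is not None:
                last_word["end"] = midpoint
            word["start"] = midpoint
        last_word = word
        last_end = word["end"]
    # Give the last word half of the trailing silence before final_end.
    silence = final_end - last_end
    if silence > 0:
        if last_word is not None:
            last_word["end"] += silence / 2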
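To check the arithmetic from the original report, the loop above can be wrapped in a function and fed a word that starts exactly at the end of the first DNA segment. The function name and the input are made up for illustration, and times are assumed to be in seconds. Because last_end starts at 0.0, the first word's start is moved to 4.576 / 2 = 2.288 s, which matches the clipBegin of the first word in the SMIL output.

```python
def split_silences(words, final_end):
    """Hypothetical wrapper around the midpoint-splitting loop quoted above."""
    last_end = 0.0
    last_word = None
    for word in words:
        silence = word["start"] - last_end
        midpoint = last_end + silence / 2
        if silence > 0:
            if last_word is not None:
                last_word["end"] = midpoint
            word["start"] = midpoint
        last_word = word
        last_end = word["end"]
    silence = final_end - last_end
    if silence > 0 and last_word is not None:
        last_word["end"] += silence / 2
    return words


# Made-up input: the first word starts right where the DNA segment [0, 4.576] s ends.
words = [{"start": 4.576, "end": 5.016}, {"start": 5.016, "end": 5.526}]
split_silences(words, final_end=5.526)
print(words[0]["start"])  # 2.288
```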

I'm not really sure what the intended behavior should be here. Maybe we should include the ends of DNA segments as possible last_end values?
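Below is a minimal sketch of that idea; it is not the content of 752553f, which the thread doesn't show. The end of a DNA segment that sits in the gap before a word is used as last_end, the start of that DNA segment caps how far the previous word can grow, and the trailing silence after the last word stops at the next DNA segment. The function name is hypothetical, and the sketch assumes sorted inputs in seconds and no DNA segment overlapping a word before splitting.

```python
from typing import Dict, List, Tuple


def split_silences_dna_aware(
    words: List[Dict[str, float]],
    dna: List[Tuple[float, float]],
    final_end: float,
) -> List[Dict[str, float]]:
    """Split inter-word silence at its midpoint without crossing DNA segments."""
    dna = sorted(dna)
    last_end = 0.0
    last_word = None
    for word in words:
        # DNA segments lying entirely in the gap before this word.
        inside = [(b, e) for b, e in dna if b >= last_end and e <= word["start"]]
        if inside:
            # The previous word gets half the silence up to the first DNA
            # segment, like the original code does at the end of the file.
            if last_word is not None:
                last_word["end"] += (inside[0][0] - last_end) / 2
            # The end of the last DNA segment becomes the new last_end, and no
            # earlier word shares the rest of the gap.
            last_end = inside[-1][1]
            last_word = None
        silence = word["start"] - last_end
        midpoint = last_end + silence / 2
        if silence > 0:
            if last_word is not None:
                last_word["end"] = midpoint
            word["start"] = midpoint
        last_word = word
        last_end = word["end"]
    # Trailing silence stops at the next DNA segment, if any, before final_end.
    limit = min([b for b, _ in dna if b >= last_end] + [final_end])
    silence = limit - last_end
    if silence > 0 and last_word is not None:
        last_word["end"] += silence / 2
    return words


words = [{"start": 4.576, "end": 5.016}, {"start": 5.016, "end": 5.526}]
split_silences_dna_aware(words, [(0.0, 4.576), (11.726, 26.267)], final_end=30.0)
print(words[0]["start"])  # 4.576
```

With a final_end past the second DNA segment, the last word here only grows from 5.526 s to about 8.626 s, instead of running into the DNA segment that starts at 11.726 s.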

@roedoejet
Collaborator

This is a possible fix: 752553f

roedoejet added the bug label (Something isn't working) and removed the question and bug labels Apr 20, 2021
@joanise
Member Author

joanise commented Apr 20, 2021

752553f: just from reading the code, I think it fixes the case where splitting the silence pushes a word's start back into the preceding DNA segment, so that part probably works. I don't have a test case, so I can't check right now, but will it also avoid pushing the silence at the end of a word into a DNA segment that follows it?
I'll have to test this fix anyway, but I'm not ready to do that right now; maybe Marc could.

Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants