Possible fixes from fork #35

nmfisher · 2023-04-05T01:03:52Z

Thanks for this great package! I forked the repo to tweak a few things to help my use case, and some of them might be useful to merge back into the master branch. I haven't submitted a PR because some of them might not be appropriate/desirable to merge, so I figured you could tell me which ones you want and I could clean up the code/add some tests if necessary and submit a PR then.

Fork is at https://github.com/nmfisher/charsiu

Changes are:

1. Don't require the sampling rate to be explicitly provided, as librosa can resample to 16000 Hz when loading a file.
2. Re-instate punctuation and insert the punctuation token, rather than silence, into the phone list.
3. Downweight silence to minimize erroneous insertion of silence in the middle of a word (this should probably be a parameter rather than a hardcoded 0.1).
4. Ignore silence where the left and right phones are identical (to completely avoid inserting silence frames in the middle of consecutive frames for a single phone). This works for me right now but needs a bit more thought: if phones are intentionally repeated (e.g. "ai ai"), this will fold the silence between them into the left phone, so "ai [SIL] ai" will always become "ai ai". The solution is probably just to pass a parameter for a minimum silence duration (so if a silence is longer than X, it's preserved; otherwise it's folded into the left phone).
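
To illustrate the silence downweighting change above as a parameter rather than a hardcoded constant, here is a minimal sketch. It assumes the aligner exposes per-frame class probabilities as plain lists; the function name `downweight_silence` and the parameters `sil_index` and `sil_weight` are hypothetical and are not taken from charsiu or from the fork.

```python
from typing import List, Sequence


def downweight_silence(
    frame_probs: Sequence[Sequence[float]],
    sil_index: int,
    sil_weight: float = 0.1,  # hypothetical parameter replacing the hardcoded 0.1
) -> List[List[float]]:
    """Scale the silence probability of every frame, then renormalize.

    frame_probs: one probability vector per frame (assumed layout).
    sil_index:   position of the silence class in each vector (assumed).
    """
    if not 0.0 <= sil_weight <= 1.0:
        raise ValueError("sil_weight must be between 0 and 1")

    out: List[List[float]] = []
    for probs in frame_probs:
        scaled = list(probs)
        scaled[sil_index] *= sil_weight
        total = sum(scaled)
        # Renormalize so each frame still sums to 1 (skip all-zero frames).
        if total > 0:
            scaled = [p / total for p in scaled]
        out.append(scaled)
    return out
```

With `sil_weight=1.0` the probabilities are returned unchanged, so the default behaviour could stay opt-in.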

lingjzhu · 2023-04-07T17:47:23Z

Thank you for your help! I have been through many changes in my life, so I haven't updated this repo regularly. These changes are highly appreciated!

I think 2 is really helpful for some applications but not for others. For example, sometimes people only want to work with phonemes, so punctuation is not necessary for them. Could you make it optional?

3 and 4 are really, really helpful! Thank you so much!!!
Let me know if I can help in any way. I am working on an improved model, so hopefully I can also incorporate your features in the new models. But that might take a few months to complete :)

phliulei · 2023-04-19T16:24:34Z

Firstly, I would like to express my gratitude for the development of such an excellent tool. During my testing with L2 Mandarin speech, it became clear that these speakers tend to speak more slowly, which often results in the insertion of false [SIL]. The modified script has been shown to produce better results with this speech, but I am curious to know if there is a way to completely avoid the insertion of [SIL], particularly when it is inserted in the middle of a single Chinese character, given that this is a rare occurrence in Mandarin.

nmfisher · 2023-04-21T02:56:33Z

@phliulei I think the best way is to specify a minimum silence duration, so anything shorter is ignored/treated as part of the previous phone. I mentioned this in point (4) above but I haven't had a chance to implement it yet.
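
A rough sketch of what that could look like as a post-processing step, since it isn't implemented yet. It assumes the alignment is a list of `(start_sec, end_sec, label)` tuples with silence labelled `[SIL]`; the function name `fold_short_silences` and the parameter `min_sil_dur` are hypothetical and not part of charsiu or the fork.

```python
from typing import List, Tuple

# (start_sec, end_sec, label): assumed shape of one aligned segment
Segment = Tuple[float, float, str]


def fold_short_silences(
    segments: List[Segment],
    min_sil_dur: float = 0.05,  # hypothetical threshold in seconds
    sil_label: str = "[SIL]",
) -> List[Segment]:
    """Fold silences shorter than min_sil_dur into the previous segment."""
    merged: List[Segment] = []
    for start, end, label in segments:
        is_short_sil = label == sil_label and (end - start) < min_sil_dur
        if is_short_sil and merged:
            # Extend the previous phone to cover the short silence.
            prev_start, _, prev_label = merged[-1]
            merged[-1] = (prev_start, end, prev_label)
        else:
            # Keep phones, long silences, and a leading silence as they are.
            merged.append((start, end, label))
    return merged


segments = [
    (0.00, 0.10, "n"),
    (0.10, 0.12, "[SIL]"),  # 20 ms: folded into "n"
    (0.12, 0.30, "i"),
    (0.30, 0.60, "[SIL]"),  # 300 ms: preserved
]
print(fold_short_silences(segments))
```

Because only silences below the threshold are folded, an intentional repetition such as "ai [SIL] ai" with a long enough pause keeps its silence, and the two phones stay separate segments either way.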
