No Praat TextGrid file output #9
Comments
Hi Loreto, I haven't seen that WARNING (from HTK, not the script) before. It strikes me as a weird one though: does your pronunciation dictionary have a lot of ambiguity (i.e., multiple pronunciations for a single orthographic word)? Presumably this could result in search failure (which results in no TextGrids). In general, if HTK is not completely silent, something probably went wrong once the data was handed over to HTK. See the recently posted issue about error codes; if this warning gives a non-zero return code, a future version of Prosodylab-Aligner will catch it. I am working on said issue, and on making error messages more informative in general, later this week. What's the data like? How much data (in minutes)? It takes a considerable amount of data to train a new acoustic model (as mentioned in the README). Kyle On Mar 20, 2013, at 7:32 PM, Loreto Parisi notifications@github.com wrote:
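One way to check a dictionary for the ambiguity asked about here is sketched below. It assumes an HTK/CMU-style dictionary in which each line holds a word followed by its space-separated phones; the function names and the sample entries are illustrative and not part of Prosodylab-Aligner. It counts exact duplicate entries (which may be what the HVite warning refers to, though that is an assumption) and lists words with more than one distinct pronunciation.

```python
from collections import defaultdict


def load_pronunciations(lines):
    """Map each word to the list of pronunciations found for it."""
    prons = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        word, phones = fields[0], tuple(fields[1:])
        prons[word].append(phones)
    return prons


def summarize(prons):
    """Return (number of exact duplicate entries, words with >1 distinct pronunciation)."""
    duplicates = 0
    ambiguous = {}
    for word, plist in prons.items():
        unique = set(plist)
        duplicates += len(plist) - len(unique)
        if len(unique) > 1:
            ambiguous[word] = sorted(unique)
    return duplicates, ambiguous


# Made-up sample entries, assumed format: WORD PH1 PH2 ...
sample = [
    "RIVER R IH1 V ER0",
    "RIVER R IH1 V ER0",
    "THE DH AH0",
    "THE DH IY0",
    "BLUE B L UW1",
]
duplicates, ambiguous = summarize(load_pronunciations(sample))
print(duplicates)        # 1
print(list(ambiguous))   # ['THE']
```

For a real dictionary, passing an open file object to `load_pronunciations` in place of the sample list should work, provided the file follows the assumed format.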
|
When using the bash script, it ends up with:
macbookproloreto:Prosodylab-Aligner loreto$ ./align_ex.sh data/River.wav data/River.lab
I've found that the error "LatFromPaths: Align have dur<=0" was due to an error in a conditional check in HTKLib/HRec.c, lines 1626 and 1651, where labid != splabid should be replaced with labpr != splabid. After patching and compiling HTK again, it worked in some cases, but now it has stopped working in almost all other alignment tests. |
On Mar 20, 2013, at 7:46 PM, Loreto Parisi notifications@github.com wrote:
So, HTK's first HVite pass failed, so no TextGrids were generated apparently.
Is this a documented bug? If so, has it been accepted in HTK? Could you provide a citation? |
Hi Kyle, here is the source: http://speechtechie.wordpress.com/2009/06/12/using-htk-3-4-1-on-mac-os-10-5/ I'm new to HTK, so I cannot say whether it has been accepted, but I will look into it. Regarding the pronunciation dictionary, I guess I am missing something in my workflow. I'm going to build the dictionary again and will get back to you as soon as it is done. Thanks! |
I don't see any reason to assume that that bug is real. If changing a line makes it work, but breaks other things, it's probably not a bug. (And who is Felix, and why didn't he submit the bug to the HTK bugtracker?) How much audio do you have for training? |
Yes, I suppose you're right 👍 and I don't know who Felix is. The audio is about 4 minutes. |
FYI I just pushed some new features: try the newest version when you get a chance. |
Thanks, I tried it and now (without improving the pronunciation dictionary) it captures the error: macbookproloreto:Prosodylab-Aligner loreto$ ./align_ex.sh data/River.wav data/River.lab |
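The kind of error capture discussed in this thread (a non-zero return code, or HTK not being completely silent) could be sketched roughly as follows. This is not the aligner's actual implementation: `diagnose` is a hypothetical helper, and the commands are Python stand-ins rather than real HVite invocations, so that the sketch runs without HTK installed.

```python
import subprocess
import sys


def diagnose(cmd):
    """Run cmd and return a list of reasons to distrust the run (empty if none)."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    problems = []
    if proc.returncode != 0:
        problems.append(f"exit code {proc.returncode}")
    # Treat any output at all as suspicious, following the rule of thumb
    # that a healthy HTK run is completely silent.
    output = (proc.stdout + proc.stderr).strip()
    if output:
        problems.append(f"not silent: {output}")
    return problems


# Stand-in commands (not HTK) so the sketch runs anywhere Python does.
quiet = [sys.executable, "-c", "pass"]
noisy = [sys.executable, "-c", "import sys; print('WARNING [-8221]', file=sys.stderr)"]
failing = [sys.executable, "-c", "import sys; sys.exit(3)"]

for cmd in (quiet, noisy, failing):
    print(diagnose(cmd))
```

Checking both signals matters here because a warning like the one in this issue may still come with a zero exit status; whether it does is not established in the thread.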
Thanks, this is easier to understand. Could you send to me, or post, the audio and transcription files? Kyle On Mar 22, 2013, at 1:32 PM, Loreto Parisi notifications@github.com wrote:
|
It also would be worthwhile to see if that hack attributed to Felix improves things, or if using HTK 3.4.0 helps. Kyle On Mar 22, 2013, at 1:32 PM, Loreto Parisi notifications@github.com wrote:
|
Hi Kyle, Considering that p2f seems to work (no errors coming from the HVite) we could say that the patched version of HVite worked out (Felix patch), but I cannot be sure of that in any case. Going to send you these results by email. |
Oh, it's music, you should have said so! I am unaware of any working forced alignment with music. At the moment, this technology only works well for loud, close-mic'ed recordings (preferably in an anechoic chamber) with a minimal amount of background noise. This is true of just about any speech technology except state-of-the-art academic and commercial research systems.

If you wanted to align this, what you'd want to do is take a look at software for making "a cappellas" and "instrumentals", and do that as a preprocessing step. This is outside of my area of expertise, but I'm sure others have ideas about this.

You also generally get better alignments by chopping things up into smaller pieces. While the technology is relatively robust to long files in general, you often get a suboptimal alignment that way, because alignments are computed using heuristic search with a narrow "beam". Also, it helps to put "sil" tokens into the label file where you expect silences (but they have to be real silences, not just pauses in singing…). This is presumably the source of your error.

I took a look at the P2FA alignment (I was vaguely involved in that project, but that aligner has very serious problems: no training, numerous stale bugs, and it doesn't deal with silence in a standard fashion; this is why we developed Prosodylab-Aligner). While I don't have a ready copy of "Blue", I can quickly see from the TextGrid that the P2FA alignments failed quite spectacularly. There is no way there is a single 27-second-long nasal sound in that song, for instance.

Kyle On Mar 22, 2013, at 4:36 PM, Loreto Parisi loretoparisi@gmail.com wrote:
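The advice about chopping recordings into smaller pieces could be tried with something like the sketch below, which uses only Python's standard `wave` module. `split_wav` is a hypothetical helper, not part of Prosodylab-Aligner, and it cuts at fixed intervals purely for illustration: in practice the cuts would need to fall at real silences, and the transcript would have to be split to match each piece.

```python
import os
import tempfile
import wave


def split_wav(path, chunk_seconds, out_prefix):
    """Write consecutive chunks of at most chunk_seconds; return their paths."""
    out_paths = []
    with wave.open(path, "rb") as src:
        params = src.getparams()
        frames_per_chunk = int(chunk_seconds * src.getframerate())
        if frames_per_chunk <= 0:
            raise ValueError("chunk_seconds is too small for this sample rate")
        index = 0
        while True:
            frames = src.readframes(frames_per_chunk)
            if not frames:
                break  # end of the source file
            out_path = f"{out_prefix}_{index:03d}.wav"
            with wave.open(out_path, "wb") as dst:
                dst.setparams(params)    # same channels, width, and rate
                dst.writeframes(frames)  # header frame count is fixed on close
            out_paths.append(out_path)
            index += 1
    return out_paths


# Demo on one second of synthetic silence (8 kHz, mono, 16-bit).
with tempfile.TemporaryDirectory() as tmp:
    source = os.path.join(tmp, "source.wav")
    with wave.open(source, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(8000)
        w.writeframes(b"\x00\x00" * 8000)
    chunks = split_wav(source, 0.4, os.path.join(tmp, "piece"))
    frame_counts = []
    for chunk in chunks:
        with wave.open(chunk, "rb") as r:
            frame_counts.append(r.getnframes())

print(len(chunks), frame_counts)
```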
|
Looking at the TextGrid now I realized that it's true! Of course I'm applying your technology - we can say - in the worst case, So, you are right when saying "You also generally get better alignments by In fact it was my aim to start with that, making suboptimal alignment for 2013/3/23 Kyle Gorman kylebgorman@gmail.com
Dott. Ing. Loreto Parisi Email: loretoparisi@gmail.com |
Don't know why, but when I sent it by email it was posted as yourself 🎱 |
I have a good guess why: GitHub passes around fancy reply-to headers and you happened to include the ones associated with my username when you replied. But interesting engineering! |
Loreto, okay if I close this issue? |
Hi Kyle, It could be interesting to fork this process and continue the work with Prosodylab in the specific case of music, if it makes sense of course. This comment is outside the specific problem I raised here, so it could be a separate discussion somewhere else. Thanks! Let's see what happens. |
Best of luck, and feel free to fork it! |
Running a training and a sample test alignment, the procedure ends up with no TextGrid data, and no errors:
macbookproloreto:Prosodylab-Aligner loreto$ ./align.py -s 44010 -t data data
Nearest viable SR is 40000 Hz
Initializing...
Training...
Modeling silence...
More training...
Realigning...
WARNING [-8221] InitPronHolders: Total of 296 duplicate pronunciations removed in HVite
More training...
Final aligning...
Making TextGrids...
Alignment complete.
No TextGrid file was found in ./data/ directory
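Because the run above reports "Alignment complete." even though nothing was written, a quick check after the run can make the failure visible. The sketch below assumes that each `.wav` file in the data directory should end up with a `.TextGrid` of the same base name next to it; that naming convention and the helper `missing_textgrids` are assumptions for illustration, not documented behaviour of the aligner.

```python
import tempfile
from pathlib import Path


def missing_textgrids(data_dir):
    """Return the names of .wav files in data_dir that have no matching .TextGrid."""
    data_dir = Path(data_dir)
    return sorted(
        wav.name
        for wav in data_dir.glob("*.wav")
        if not wav.with_suffix(".TextGrid").exists()
    )


# Demo with empty placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.wav", "b.wav", "a.TextGrid"):
        (Path(tmp) / name).touch()
    missing = missing_textgrids(tmp)

print(missing)  # ['b.wav']
```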