
How to download and generate the data set #1

Open
miyamamoto opened this issue Mar 8, 2017 · 7 comments

Comments

@miyamamoto

That's a great task.
Please provide a sample dataset for running the code.

@klauscc
Owner

klauscc commented Mar 9, 2017

The model part is easy to understand, I suppose!
I have been trying to replicate this work recently, but it still performs very badly: on our small word-level dataset it only achieves 57% accuracy (with the GRU layers removed and log-loss instead of CTC).
If you are working on this too, I would be glad to discuss it with you.
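For context, the architecture being replicated is roughly three spatiotemporal conv blocks, two Bi-GRU layers, and a CTC-trained softmax. A minimal Keras sketch, with input shape and layer sizes assumed from the paper rather than taken from this repo:

```python
from keras.layers import (Input, Conv3D, MaxPooling3D, TimeDistributed,
                          Flatten, Bidirectional, GRU, Dense)
from keras.models import Model

# 75 frames of 50x100 RGB mouth crops (GRID-style input; shape assumed).
inputs = Input(shape=(75, 50, 100, 3))

# Three spatiotemporal conv blocks, as described in the paper.
x = Conv3D(32, (3, 5, 5), strides=(1, 2, 2), padding='same',
           activation='relu')(inputs)
x = MaxPooling3D(pool_size=(1, 2, 2))(x)
x = Conv3D(64, (3, 5, 5), padding='same', activation='relu')(x)
x = MaxPooling3D(pool_size=(1, 2, 2))(x)
x = Conv3D(96, (3, 3, 3), padding='same', activation='relu')(x)
x = MaxPooling3D(pool_size=(1, 2, 2))(x)

x = TimeDistributed(Flatten())(x)  # one feature vector per frame

# Two Bi-GRU layers over the frame sequence.
x = Bidirectional(GRU(256, return_sequences=True))(x)
x = Bidirectional(GRU(256, return_sequences=True))(x)

# Per-frame distribution over the characters plus the CTC blank (28 assumed).
y_pred = Dense(28, activation='softmax')(x)

model = Model(inputs=inputs, outputs=y_pred)
# Training attaches a CTC loss to y_pred (omitted here); the word-level
# variant mentioned above drops the GRUs and trains with plain log-loss.
```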

@rizkiarm

rizkiarm commented May 2, 2017

Stumbled upon your work!

I see that you're working on the same thing as what I did here: https://github.com/rizkiarm/LipNet
Using that model, I managed to achieve 10.2% CER, 15.0% WER, and an 84.4% BLEU score in 15 epochs, which is only about 3% more error than the original model. You can check it out and use it if you're interested.

It is still under development. It would be great if you could contribute to its development and share some results with me :)

@klauscc
Owner

klauscc commented May 4, 2017

My results are:
WER 13.2%, CER 2.44% on seen speakers (unlike the paper, the test set was not included in the training set);
WER 23.7%, CER 5.76% on unseen speakers (1, 2, 20, 21).
The CER is nearly the same as the paper reports, but the WER is far higher than the paper's.
In the paper they split sentences into words, but I got worse results when I did that.
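For anyone comparing these numbers: CER and WER here are just edit distance over characters and words respectively, normalized by the reference length. A minimal self-contained sketch, not the exact implementation used in either repo:

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences via dynamic programming."""
    m, n = len(ref), len(hyp)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[m][n]

def cer(ref, hyp):
    """Character error rate: character edit distance / reference length."""
    return edit_distance(list(ref), list(hyp)) / len(ref)

def wer(ref, hyp):
    """Word error rate: word edit distance / number of reference words."""
    return edit_distance(ref.split(), hyp.split()) / len(ref.split())

# GRID-style sentence: one substituted word, one substituted character.
print(wer("bin blue at f two now", "bin blue at f too now"))  # 1/6 ~ 0.167
print(cer("bin blue at f two now", "bin blue at f too now"))  # 1/21 ~ 0.048
```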

@rizkiarm

rizkiarm commented May 4, 2017

How did you manage to outperform the paper's CER on unseen speakers?
From what I can see, you haven't applied any postprocessing to the output, didn't implement a language model for decoding, and didn't use any special strategy.
May I know how many epochs your model was trained for?

@klauscc
Owner

klauscc commented May 4, 2017

I compute all the metrics in the callback lipnet-replication/model/lipnet.py/StatisticCallback, which evaluates the performance on the validation set at the end of each epoch. For decoding I only use greedy search at the moment.
The best model (selected by the loss on the seen-speaker test set) was trained for 176 epochs.
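Such an epoch-end callback looks roughly like the sketch below. This is an illustration, not the repo's actual StatisticCallback; `labels_to_text` and the `cer`/`wer` helpers sketched earlier are assumed:

```python
import numpy as np
import keras.backend as K
from keras.callbacks import Callback

class StatisticCallback(Callback):
    """Evaluate CER/WER on the validation set at the end of every epoch."""

    def __init__(self, predict_model, val_inputs, val_texts, input_length):
        self.predict_model = predict_model  # model emitting per-frame softmax
        self.val_inputs = val_inputs
        self.val_texts = val_texts          # ground-truth transcripts
        self.input_length = input_length    # number of frames per sample

    def on_epoch_end(self, epoch, logs=None):
        y_pred = self.predict_model.predict(self.val_inputs)
        lengths = np.full((y_pred.shape[0],), self.input_length)
        # Greedy (best-path) CTC decoding; no language model involved.
        decoded, _ = K.ctc_decode(y_pred, lengths, greedy=True)
        labels = K.get_value(decoded[0])  # padded with -1 at the end
        # labels_to_text is an assumed helper mapping label ids (skipping
        # the -1 padding) back to a string.
        hyps = [labels_to_text(seq) for seq in labels]
        mean_cer = np.mean([cer(r, h) for r, h in zip(self.val_texts, hyps)])
        mean_wer = np.mean([wer(r, h) for r, h in zip(self.val_texts, hyps)])
        print('epoch %d: CER %.4f WER %.4f' % (epoch, mean_cer, mean_wer))
```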

@klauscc
Owner

klauscc commented May 4, 2017

The CER is the same as the mean edit distance.
I noticed a difference between my code and the code released by the author: he adds a dropout layer after each Bi-GRU layer, but I do not. When I tried adding dropout after the GRUs, the CER on seen speakers matched the paper's accuracy, but on unseen speakers the CER and WER were very bad.
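In Keras terms the difference is roughly the following; the 0.5 rate, layer size, and input shape are assumptions for illustration, so check the author's released code for the exact values:

```python
from keras.layers import Bidirectional, Dropout, GRU, Input
from keras.models import Model

def bigru_stack(x, with_dropout, rate=0.5):
    """Two Bi-GRU layers, each optionally followed by dropout."""
    for _ in range(2):
        x = Bidirectional(GRU(256, return_sequences=True))(x)
        if with_dropout:
            x = Dropout(rate)(x)  # present in the author's code, absent here
    return x

inputs = Input(shape=(75, 1728))  # (frames, features); shape illustrative
outputs = bigru_stack(inputs, with_dropout=True)
model = Model(inputs, outputs)
```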

@rizkiarm

rizkiarm commented May 4, 2017

I see, that explains why your model achieved that much better accuracy than mine (176 epochs compared to 15 epochs).
Yeah, I know that CER is edit distance; that's why it surprised me.
How bad is it?
