New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add SoX deps, resolves #3487 #3488
Conversation
Note I have also added |
@@ -18,7 +18,12 @@ RUN apt-get update && apt-get install -y --no-install-recommends \ | |||
locales \ | |||
python3-venv \ | |||
unzip \ | |||
wget | |||
wget \ | |||
sox \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: please ensure proper indentation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great pickup, thx @lissyx
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have mixed feelings:
- it's easy and not a big deal, so why not
- it opens the door to potentially everybody wanting to put its required deps here, and we end up with a mess we can't properly maintain
I'm really not sure it's required we add vim
/nano
there, but maybe I am biaised by my workflow such that I mostly never have to work within the container.
For the sox
deps, my workflow is always to derive a new image from the base image we produce, so it would be where I place the dep.
@reuben what's your take on that?
I have literally never used the Dockerfiles so my opinion counts very little I'm afraid. I agree with you that adding all required deps of all importers will quickly become untenable, but I guess Common Voice is "first-party" enough that it's warranted to add a dep for that importer. So overall I'm OK with this PR as is. |
All fair and valid points. The reason that I'm using a |
I still feel like advising people to product their own image based on ours and thus managing their deps would be a good requirements. (and we could still merge those anyway) what do you think @KathyReid ? |
I think what you propose is a good way forward; with The one downside I can see from this approach is that by encouraging people to train a model from a Docker environment, we are trying to standardise or reduce the variability of environments that are trained with - thus reducing the range of issues that come up for support, and reducing support volume. If we encourage people to modify their own Docker images for training, that will increase variability in support issues. Should we add something to the |
I may be showing my inexperience with docker here, but would it be feasible to have two dockerfiles? For those looking to apply it in their own version with specialist needs, they'd potentially start with the lite one but for use in the playbook/with beginners, the batteries included one would be the place to start because it's simple to get going and will reduce support confusion stemming from those who are less experienced |
Thanks @nmstoker for the feedback - appreciated. I don't have strong opinions either way on your suggestion - my guiding principle here is "what does the community need to reduce the hurdle of getting to train a model?". For me, that's likely to be a "batteries included" Dockerfile that allows someone to train a model from say the CV corpus, with minimal preparation work. There are already two Sorry I'm not more helpful on this one, but it's probably Lissy or Reuben's call here. |
The truth is that I was working on that on a spare project: https://github.com/Common-Voice/commonvoice-fr/blob/master/DeepSpeech/ This is the project you want. And the goal was that our
@KathyReid @reuben Maybe it is time for https://github.com/Common-Voice/commonvoice-fr/blob/master/DeepSpeech/ to move somewhere else and be more supported? |
I re-tested this with the Docker Hub image (not one I built myself) and tried to do data formatting for CV datasets, which is the example given in the PlayBook, and it fails on Can we please have this pulled into DeepSpeech so that the Docker Hub image includes
|
I'm really not sure we should enable that, as I stated earlier. |
OK, so the alternatives I can see here are;
So I think option #1 would be better - what do you think? |
people should derive using
no that's no good You forget #3: finally address my suggestion stated in #3488 (comment) |
|
My point is, reducing ongoing support for the usecase you want to cover in the playbook is best addressed by finding an answer to that question
If at least we advertise clearly that:
I think it should be okay. |
I've resolved this via |
No description provided.