
Train own model with Ubuntu 18? #7

Open
TDHTTTT opened this issue Feb 17, 2021 · 11 comments

Comments

@TDHTTTT

TDHTTTT commented Feb 17, 2021

I tried to bypass the version check in snowboy_pmdl.py by changing it to `elif platform.linux_distribution()[0] == "Ubuntu" and platform.linux_distribution()[1] in ("16.04", "18.04"):`.
When I run generate_pmdl.py, it produces the following output (no errors):

```
template cut
personal enroll
channels: 1, sample rate: 16000, bits: 16
processing xxx.wav
processing xxx.wav
processing xxx.wav
saving file to hotword.pmdl
finished
```

But when I try to use hotword.pmdl with the demo, it fails:

```
terminate called after throwing an instance of 'std::runtime_error'
  what():  ERROR (ReadToken():snowboy-io.cc:131) Fail to read token in ReadToken(), position -1

[stack trace: ]
/home/tdhttt/workspace/snowboy/examples/Python/_snowboydetect.so(_ZN7snowboy13GetStackTraceEv+0x35) [0x7f627ff5d1d5]
/home/tdhttt/workspace/snowboy/examples/Python/_snowboydetect.so(_ZN7snowboy13SnowboyLogMsgD1Ev+0x47a) [0x7f627ff5d7ba]
/home/tdhttt/workspace/snowboy/examples/Python/_snowboydetect.so(_ZN7snowboy9ReadTokenEbPSsPSi+0x270) [0x7f627ff69980]
/home/tdhttt/workspace/snowboy/examples/Python/_snowboydetect.so(_ZN7snowboy14PipelineDetect14ClassifyModelsERKSsPSsS3_+0x1f4) [0x7f627ff4b7d4]
/home/tdhttt/workspace/snowboy/examples/Python/_snowboydetect.so(_ZN7snowboy14PipelineDetect8SetModelERKSs+0x15a) [0x7f627ff4bd3a]
.
.
.
python(PyRun_FileExFlags+0x82) [0x56536f768222]
python(PyRun_SimpleFileExFlags+0x18d) [0x56536f767c4d]
python(Py_Main+0x616) [0x56536f716a86]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe7) [0x7f6281fc3b97]
python(_start+0x2a) [0x56536f71637a]

Aborted (core dumped)
```

This error makes me think the .pmdl file is corrupted, i.e. it was not generated properly in the first place. How can I make this work on Ubuntu 18? What's the difference between Ubuntu 16 and 18? Should I try Docker?
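For reference, the version check quoted at the start of this comment compares the values returned by platform.linux_distribution(), a function that was deprecated in Python 3.5 and removed in Python 3.8. A minimal sketch of reading the same distribution ID and version from /etc/os-release is below; it assumes a distribution that ships that file, and parse_os_release and is_supported_ubuntu are hypothetical helpers, not code from snowboy_pmdl.py. It only reports which platform such a check would see: as this comment shows, relaxing the check on Ubuntu 18.04 did not by itself yield a usable model.

```python
# Hypothetical helpers (not part of snowboy): read the distro name and version
# from /etc/os-release instead of platform.linux_distribution(), which was
# removed in Python 3.8.


def parse_os_release(text):
    """Parse the KEY=value lines of an os-release file into a dict."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments and anything that is not KEY=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Values may be wrapped in single or double quotes.
        info[key] = value.strip().strip("\"'")
    return info


def is_supported_ubuntu(info, versions=("16.04", "18.04")):
    """Same shape as the quoted check: Ubuntu plus one of `versions`."""
    # os-release uses the lowercase ID "ubuntu"; linux_distribution() returned "Ubuntu".
    return info.get("ID") == "ubuntu" and info.get("VERSION_ID") in versions


if __name__ == "__main__":
    try:
        with open("/etc/os-release") as f:
            info = parse_os_release(f.read())
    except FileNotFoundError:
        info = {}
    print(info.get("ID"), info.get("VERSION_ID"), is_supported_ubuntu(info))
```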

@dschnabel

I had the same problem, so I made a Dockerfile:

```dockerfile
FROM ubuntu:16.04

RUN apt update && apt --yes --force-yes install wget unzip build-essential python python-dev virtualenv portaudio19-dev
RUN wget https://github.com/seasalt-ai/snowboy/archive/master.zip && unzip master.zip

RUN cd snowboy-master/ && \
    virtualenv -p python2 venv/snowboy && \
    . venv/snowboy/bin/activate && \
    cd examples/Python && \
    pip install -r requirements.txt

RUN apt -y remove wget unzip build-essential portaudio19-dev && apt -y autoremove && apt clean && rm -rf /var/lib/apt/lists/*

CMD cd snowboy-master/ && \
    . venv/snowboy/bin/activate && \
    cd examples/Python && \
    python generate_pmdl.py -r1=model/record1.wav -r2=model/record2.wav -r3=model/record3.wav -lang=en -n=model/hotword.pmdl
```

Save the above in a file called Dockerfile and, from the same directory, build a Docker image like this:

```shell
docker build -t snowboy-pmdl .
```

This creates an image that you can run to train your personal model. For this to work, you'll need to create a directory called model on your host machine (Ubuntu 18 or whichever system you use) and place your three audio files in it. The directory should look like this (note: the WAV files must have exactly the names below, because the container's command refers to them by those names):

```shell
$ ls model/
record1.wav  record2.wav  record3.wav
```
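Because the container reads the recordings by fixed names, a quick pre-flight check on the host can save a failed run. A minimal sketch, assuming Python 3 on the host and that the recordings should match the 1 channel, 16000 Hz, 16-bit format that generate_pmdl.py reported in the first comment of this issue; check_recording and check_model_dir are hypothetical names:

```python
import os
import wave

# The Dockerfile's command expects exactly these names inside model/.
REQUIRED = ("record1.wav", "record2.wav", "record3.wav")


def check_recording(path, channels=1, rate=16000, sampwidth=2):
    """Return a list of problems found in one WAV file (empty if none)."""
    try:
        w = wave.open(path, "rb")
    except (wave.Error, EOFError) as exc:
        # e.g. compressed or float WAVs, which the wave module cannot read
        return [f"unreadable: {exc}"]
    problems = []
    with w:
        if w.getnchannels() != channels:
            problems.append(f"expected {channels} channel(s), got {w.getnchannels()}")
        if w.getframerate() != rate:
            problems.append(f"expected {rate} Hz, got {w.getframerate()} Hz")
        if w.getsampwidth() != sampwidth:
            problems.append(
                f"expected {8 * sampwidth}-bit samples, got {8 * w.getsampwidth()}-bit")
    return problems


def check_model_dir(model_dir="model"):
    """Map each required file name to its list of problems."""
    report = {}
    for name in REQUIRED:
        path = os.path.join(model_dir, name)
        if not os.path.isfile(path):
            report[name] = ["missing"]
        else:
            report[name] = check_recording(path)
    return report


if __name__ == "__main__":
    for name, problems in check_model_dir().items():
        print(name, "OK" if not problems else "; ".join(problems))
```

Run it from the parent directory of model, the same place the docker run command is issued from.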

Finally, run the container (note: you need to be in the parent directory of model, because the command mounts $(pwd)/model):

```shell
docker run -it -v $(pwd)/model:/snowboy-master/examples/Python/model snowboy-pmdl
```

This command mounts the model directory into the Docker container, whose default command (CMD) then runs generate_pmdl.py.

If everything went well, you should now have a file called hotword.pmdl in your model directory.
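The run step can also be driven from a script. A minimal sketch, assuming Python 3 on the host and the image name snowboy-pmdl from the build command; docker_run_cmd and train_in_docker are hypothetical helpers. It drops -it, since a script may not have a terminal attached, and adds --rm so the finished container is removed:

```python
import os
import subprocess

# Mount point used by the Dockerfile's CMD (no WORKDIR is set, so it runs from /).
CONTAINER_MODEL_DIR = "/snowboy-master/examples/Python/model"


def docker_run_cmd(model_dir="model", image="snowboy-pmdl"):
    """Build the docker run argument list for the training container."""
    host_dir = os.path.abspath(model_dir)
    # --rm removes the container once generate_pmdl.py exits.
    return ["docker", "run", "--rm", "-v", f"{host_dir}:{CONTAINER_MODEL_DIR}", image]


def train_in_docker(model_dir="model", image="snowboy-pmdl"):
    """Run the container and return the path of the generated model."""
    subprocess.run(docker_run_cmd(model_dir, image), check=True)
    out = os.path.join(model_dir, "hotword.pmdl")
    # A missing or empty file clearly means training failed; a non-empty
    # file is not a guarantee that the model is valid.
    if not os.path.isfile(out) or os.path.getsize(out) == 0:
        raise RuntimeError(f"{out} was not produced")
    return out
```

Calling train_in_docker() from the parent directory of model is equivalent to the docker run command, plus a basic check on the output. As the first comment in this issue shows, a model file can exist and still fail to load, so the demo remains the real test.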

@Hemanshu-Bhargav

Hi Daniel Schnabel, I'm not using the Ubuntu version, but it seems the API is not functional at the moment. I see your Dockerfile creates a personal model; however, is that model actually trained on the new audio samples? Just to confirm: does your Dockerfile work for personal models trained after the API was shut down on December 31st, 2020?

@dschnabel

My Dockerfile does not use the API; it uses the Python script to train a personal model. This is the part of the Dockerfile that does the job:

```shell
python generate_pmdl.py -r1=model/record1.wav -r2=model/record2.wav -r3=model/record3.wav -lang=en -n=model/hotword.pmdl
```

@Hemanshu-Bhargav

Hemanshu-Bhargav commented Feb 20, 2021

> My dockerfile does not use the API, it uses the python script to train a personal model.

Thanks for your reply. Do you know if this script performs the same function as the API (without using training_service)? @chenguoguo Can you maybe clarify? Thanks!

@dschnabel

@Hemanshu-Bhargav I asked a similar question in #5

@Hemanshu-Bhargav

Hemanshu-Bhargav commented Feb 21, 2021

@dschnabel
Ah, thanks. I knew that models trained before the shutdown remain functional, but I didn't know any of the original model training still worked for new hotwords.

@chenguoguo @hs79hs
If this new script is intended to replace the API, then in the Python demo, for example, training_service.py has been removed in favour of the new script, correct? Also, as raised in #3, what would the execution flow look like for Android/iOS?

@chenguoguo
Collaborator

@Hemanshu-Bhargav Yes, the new script is meant to replace the old training_service.py script. For Android/iOS, you can set up your own service and then call that service from Android/iOS.

@dschnabel would you like to add your dockerfile to the repo? By the way it looks great!
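The "own service" suggested above for Android/iOS could look roughly like the sketch below. It is only an illustration under several assumptions: Python 3 runs the server, the working directory is snowboy's examples/Python, and `python` on the PATH is an interpreter that can run generate_pmdl.py (in the Dockerfile, the python2 virtualenv). The JSON request shape `{"recordings": [three base64 WAV strings], "lang": "en"}`, the helper names and the port are all made up for this example, and the handler serves one request at a time.

```python
import base64
import json
import os
import subprocess
import tempfile
from http.server import BaseHTTPRequestHandler, HTTPServer

# Assumption: adjust this to whatever interpreter can run generate_pmdl.py.
GENERATE_CMD = ["python", "generate_pmdl.py"]


def write_recordings(payload, workdir):
    """Decode three base64-encoded WAVs into record1..3.wav; return their paths."""
    recordings = payload["recordings"]
    if len(recordings) != 3:
        raise ValueError("exactly three recordings are required")
    paths = []
    for i, b64 in enumerate(recordings, start=1):
        path = os.path.join(workdir, f"record{i}.wav")
        with open(path, "wb") as f:
            f.write(base64.b64decode(b64))
        paths.append(path)
    return paths


def build_args(paths, out_path, lang="en"):
    """Same flags as the generate_pmdl.py call in the Dockerfile."""
    r1, r2, r3 = paths
    return GENERATE_CMD + [f"-r1={r1}", f"-r2={r2}", f"-r3={r3}",
                           f"-lang={lang}", f"-n={out_path}"]


class TrainHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        try:
            length = int(self.headers.get("Content-Length", 0))
            payload = json.loads(self.rfile.read(length))
        except ValueError as exc:
            self.send_error(400, explain=f"invalid request: {exc}")
            return
        try:
            # Each request trains in its own temporary directory.
            with tempfile.TemporaryDirectory() as workdir:
                paths = write_recordings(payload, workdir)
                out_path = os.path.join(workdir, "hotword.pmdl")
                args = build_args(paths, out_path, payload.get("lang", "en"))
                subprocess.run(args, check=True)
                with open(out_path, "rb") as f:
                    model = f.read()
        except (KeyError, TypeError, ValueError) as exc:
            self.send_error(400, explain=str(exc))
            return
        except (subprocess.CalledProcessError, OSError) as exc:
            self.send_error(500, explain=str(exc))
            return
        # Return the trained model as the raw response body.
        self.send_response(200)
        self.send_header("Content-Type", "application/octet-stream")
        self.send_header("Content-Length", str(len(model)))
        self.end_headers()
        self.wfile.write(model)


def serve(port=8000):
    """Start the (single-threaded) training server; 8000 is an arbitrary port."""
    HTTPServer(("0.0.0.0", port), TrainHandler).serve_forever()
```

Calling serve() starts the server; the mobile app would then POST the three recordings and save the response body as its .pmdl file.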

@Hemanshu-Bhargav

Hemanshu-Bhargav commented Apr 26, 2021

@chenguoguo Thanks for the confirmation. I have another question related to Ubuntu, if you don't mind. I can open a separate issue if that keeps things more organized.

I believe support for the Raspberry Pi was later added to the original repository by other collaborators, but I've been running into an issue with SciPy on the Raspberry Pi and wanted to ask whether you've had a similar experience.

I've tried different versions of SciPy, Python virtual environments, and @dschnabel's Dockerfile on both Ubuntu and Raspbian, but they all fail, reporting either that SciPy is not available or that Ubuntu 16.04 is required. The Dockerfile works without issue on Ubuntu running on any other architecture.

Any thoughts?

@dschnabel

@chenguoguo I created a PR: #14

@rrsaikat

rrsaikat commented Oct 15, 2022

Hi @dschnabel @chenguoguo
The Dockerfile does exactly what I wanted, but I'm facing a major issue with the size of the generated .pmdl file. I used three recordings of different sizes (245 KB, 350 KB and 300 KB), but the output was only 35 KB. I also tried other recordings to see whether the result changes, and the script always produces a file between 30 and 35 KB.
Because of that, detection is not very good. Can you suggest what I can do?

Thank you
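A quick sanity check on those input sizes, assuming the recordings are uncompressed PCM WAV files with the canonical 44-byte header and the 1 channel, 16000 Hz, 16-bit format that generate_pmdl.py reports in the first comment of this issue: at 16000 × 2 × 1 = 32000 bytes per second, file size converts directly into duration. The helper name is hypothetical, and the sketch says nothing about how the size of the .pmdl output is determined.

```python
# Assumes uncompressed PCM WAV with the canonical 44-byte header, recorded as
# 16 kHz, 16-bit mono (the format reported in the generate_pmdl.py log).
HEADER_BYTES = 44


def wav_duration_seconds(num_bytes, rate=16000, sampwidth=2, channels=1):
    """Estimate playback time from file size: audio bytes / bytes per second."""
    bytes_per_second = rate * sampwidth * channels  # 32000 for 16 kHz 16-bit mono
    return max(num_bytes - HEADER_BYTES, 0) / bytes_per_second


if __name__ == "__main__":
    # The three recording sizes from the comment above, read as 1 KB = 1000 bytes.
    for kb in (245, 350, 300):
        print(f"{kb} KB -> about {wav_duration_seconds(kb * 1000):.1f} s")
```

With 1 KB read as 1000 bytes, the three files come out at roughly 7.7 s, 10.9 s and 9.4 s of audio. For exact figures on a real file, Python's wave module gives the duration as getnframes() / getframerate().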

@codakkk

codakkk commented Oct 24, 2022

I'm having the same issue and can't find a way to make it work.
Is there any way to fix it?
