Welcome to Piper Discussions! #136

synesthesiam · 2023-07-11T20:00:02Z

Maintainer

👋 Welcome!

We’re using Discussions as a place to connect with other members of our community. We hope that you:

Ask questions you’re wondering about.
Share ideas.
Engage with other community members.
Welcome others and are open-minded. Remember that this is a community we build together 💪.

To get started, comment below with an introduction of yourself and tell us about what you do with this community.

thorstenMueller · 2023-07-11T21:13:01Z

Guude (Hi) 👋,
My name is Thorsten Müller (aka Thorsten-Voice) and I'm a German guy with lots of passion for open voice technology 😊.

0 replies

amdrozdov · 2023-07-12T14:17:53Z

Hello, maybe it's a good idea to add C++ examples (without Python) for whisper.cpp users. Let's say a project is using whisper.cpp for speech recognition and Piper for speech generation.

0 replies

prevoste · 2023-07-13T13:22:02Z

Hi Guys,

Great project, I have managed to install piper on a Raspberry Pi 4 and I am impressed with the quality of the synthesis.

So far I have managed to use the ‘en_US-lessac-medium.onnx’ model successfully, but a couple of others I have tried, for example ‘en_GB-northern_english_male-medium.onnx’, have generated some sort of JSON error.

Error:

terminate called after throwing an instance of 'nlohmann::json_abi_v3_11_2::detail::parse_error'
what(): [json.exception.parse_error.101] parse error at line 1, column 1: syntax error while parsing value - invalid literal; last read: '<'

But my main reason for this submission is to ask: is it possible to send the output directly to a speaker rather than to a WAV file?

Thanks for your time, great job.

Ernie

1 reply

synesthesiam Jul 17, 2023
Maintainer Author

Thanks! I'll have to check on the config error.
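One thing worth checking in the meantime (a guess from the message, not a confirmed diagnosis): `last read: '<'` at line 1, column 1 means the file being parsed as JSON begins with `<`, which is what you would see if the voice's config file were an HTML or XML page saved by mistake rather than JSON. The Python sketch below tests a config file for that symptom. The `check_voice_config` helper is hypothetical and not part of Piper, and the assumption that the config sits next to the model as `<model>.onnx.json` should be verified against your install.

```python
import json
from pathlib import Path


def check_voice_config(config_path):
    """Return (ok, message) describing whether the file parses as JSON.

    Hypothetical helper for illustration; not part of Piper.
    """
    path = Path(config_path)
    if not path.is_file():
        return False, f"missing: {path}"

    text = path.read_text(encoding="utf-8", errors="replace")
    if text.lstrip().startswith("<"):
        # Same symptom as the parse error: markup where JSON was expected.
        return False, "starts with '<': looks like HTML/XML, not JSON"

    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return False, f"invalid JSON: {err}"
    return True, "valid JSON"
```

For example, `check_voice_config("en_GB-northern_english_male-medium.onnx.json")` returns a `(bool, str)` pair. If it reports markup, downloading the config file again is a reasonable next step.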

You can use --output-raw with piper to send raw (not WAV) audio directly to stdout. It will be 16-bit mono PCM samples at the same sample rate as the voice.
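A minimal Python sketch of wiring that raw output into `aplay`, under a few assumptions: `piper` and `aplay` are on the PATH, the voice config is a JSON file with the rate stored under `audio.sample_rate`, and the helper names (`read_sample_rate`, `piper_raw_command`, `aplay_raw_command`, `speak`) are made up for illustration. Because raw PCM has no header, `aplay` has to be told the format explicitly.

```python
import json
import subprocess


def read_sample_rate(config_path):
    """Read the sample rate from a voice config (assumed key: audio.sample_rate)."""
    with open(config_path, encoding="utf-8") as f:
        return int(json.load(f)["audio"]["sample_rate"])


def piper_raw_command(model_path, piper_bin="piper"):
    """Command that makes piper write raw PCM to stdout."""
    return [piper_bin, "--model", str(model_path), "--output-raw"]


def aplay_raw_command(sample_rate):
    """Command that makes aplay read headerless 16-bit mono PCM from stdin."""
    return ["aplay", "-t", "raw", "-f", "S16_LE", "-c", "1",
            "-r", str(sample_rate), "-"]


def speak(text, model_path, config_path):
    """Synthesize one phrase and play it without writing a WAV file."""
    rate = read_sample_rate(config_path)
    piper = subprocess.Popen(piper_raw_command(model_path),
                             stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    aplay = subprocess.Popen(aplay_raw_command(rate), stdin=piper.stdout)
    piper.stdout.close()  # aplay now owns the read end of the pipe
    piper.stdin.write((text + "\n").encode("utf-8"))
    piper.stdin.close()   # EOF lets piper finish and exit
    piper.wait()
    aplay.wait()
```

Calling `speak("Hello", "en_US-lessac-medium.onnx", "en_US-lessac-medium.onnx.json")` would then play the phrase, provided those files exist at those (assumed) paths.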

prevoste · 2023-07-18T12:38:19Z

Hi Michael,

Thanks for your response. Since I made the post, I came across the ‘output-raw’ option, which helps a lot.

My aim is to create a voice assistant where ‘piper’ would be the response side. I can now pass a message to ‘piper’ and then pipe the output to ‘aplay’ to play it through the speaker. This all works at the command line but is slow! For each phrase to be spoken, ‘piper’ has to start up and load the appropriate voice model, and then ‘aplay’ has to start up to play the audio.

What I was hoping to do was load ‘piper’ as a subprocess in a Python script or C program and have it wait for input phrases on ‘stdin’, and do the same with ‘aplay’, passing the ‘output-raw’ data from the ‘piper’ ‘stdout’ to the ‘aplay’ ‘stdin’, in the hope that this would speed everything up.

Does this sound sensible or can you suggest another approach?

Thanks again for your reply.

Ernie

1 reply

synesthesiam Jul 19, 2023
Maintainer Author

Another approach is to use Piper as a subprocess (Popen) and do this for each phrase:

Write the phrase to the process stdin with a newline + flush()
Read a line from the process stdout to get the path to the WAV file

I usually use the tempfile module to create a TemporaryDirectory and set that as the --output_dir for Piper. After reading each output WAV, I delete it.
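The steps above can be sketched in Python roughly as follows. This is an illustration under assumptions, not the code from the linked sample: it assumes `piper` is on the PATH and prints exactly one WAV path per input line on stdout, and the helper names (`piper_command`, `synthesize`, `synthesize_all`) are made up here.

```python
import subprocess
import tempfile
from pathlib import Path


def piper_command(model_path, output_dir, piper_bin="piper"):
    """Command that makes piper write one WAV per input line into output_dir."""
    return [piper_bin, "--model", str(model_path), "--output_dir", str(output_dir)]


def synthesize(proc, phrase):
    """Send one phrase, read back the WAV path, return its bytes, delete the file."""
    proc.stdin.write(phrase.strip() + "\n")
    proc.stdin.flush()  # without the flush, piper may never see the line
    line = proc.stdout.readline()
    if not line:
        raise RuntimeError("piper closed stdout before reporting a WAV path")
    wav_path = Path(line.strip())
    data = wav_path.read_bytes()
    wav_path.unlink()  # delete each WAV once it has been read
    return data


def synthesize_all(model_path, phrases):
    """Keep a single piper process (and loaded voice) alive for all phrases."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        proc = subprocess.Popen(piper_command(model_path, tmp_dir),
                                stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                                text=True)
        try:
            return [synthesize(proc, phrase) for phrase in phrases]
        finally:
            proc.stdin.close()  # EOF tells piper to exit
            proc.wait()
```

The voice model is loaded once when the process starts, so only the first phrase pays the startup cost.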

Here is some sample code: https://github.com/rhasspy/rhasspy3/blob/master/programs/tts/piper/bin/piper_server.py

prevoste · 2023-07-28T13:36:38Z

Hi Michael,

Thanks again for all the useful information you have given me.

I have not used socket makefile() before. I usually just use send() / recv().

I assume the client program basically just sends the text to be spoken over to your server program. Do you have an example client program so I can see the format of the data sent?

Just to understand your approach: you create a subprocess using Popen to run ‘piper’, which takes the text to be spoken on stdin, writes a WAV file to a temporary directory, and prints the file's path on stdout.

How is the WAV file then played?

Thanks again for your help.

Ernie

1 reply

synesthesiam Jul 29, 2023
Maintainer Author

Just call aplay as a subprocess with the path to the WAV file.
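In Python that might look like the sketch below. The function names are hypothetical and `aplay` is assumed to be on the PATH. A WAV file carries its own header, so no format flags are needed here, unlike with raw output.

```python
import subprocess
from pathlib import Path


def aplay_command(wav_path):
    """Command that plays a WAV file with aplay."""
    return ["aplay", str(wav_path)]


def play_and_delete(wav_path):
    """Play the WAV, then remove it so the temporary directory stays empty."""
    try:
        subprocess.run(aplay_command(wav_path), check=True)
    finally:
        Path(wav_path).unlink(missing_ok=True)
```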

MetaMachina · 2024-04-14T17:14:49Z

Hi Michael ( @synesthesiam )

I would like to request a new feature similar to SpeakProgressEventArgs.Text and SpeakProgressEventArgs.AudioPosition properties in Microsoft's System.Speech.Synthesis namespace.

For example, the sentence "Struggles and challenges are integral to growth and resilience" has 9 words in it.
Now, in addition to a .wav file, I want Piper to also generate a .csv file that contains a timestamp for each spoken word. At minimum, the .csv needs only two columns: Text and AudioPosition.

Text: It will be the actual spoken word
AudioPosition: It will be the position in the audio output stream where the word ends. Value can be in milliseconds etc.

Assuming the .wav produced has total duration of 1984ms, then the .csv file's data will look somewhat like this:

Struggles,320
and,448
challenges,800
are,928
integral,1216
to,1312
growth,1536
and,1664
resilience,1984
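To make the format concrete, here is a small Python sketch of how such rows could be produced if per-word durations were available. Piper does not expose these today; the durations below are simply back-computed from the example values above, and `word_end_positions` and `to_csv` are hypothetical helpers.

```python
import csv
import io
from itertools import accumulate


def word_end_positions(words, durations_ms):
    """Pair each word with the position (in ms) where it ends in the audio."""
    if len(words) != len(durations_ms):
        raise ValueError("need exactly one duration per word")
    return list(zip(words, accumulate(durations_ms)))


def to_csv(rows):
    """Render (Text, AudioPosition) rows as CSV text without a header row."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


sentence = "Struggles and challenges are integral to growth and resilience"
# Hypothetical per-word durations in milliseconds (they sum to 1984).
durations = [320, 128, 352, 128, 288, 96, 224, 128, 320]
rows = word_end_positions(sentence.split(), durations)
csv_text = to_csv(rows)
```

Each AudioPosition is the running sum of the durations, so the last row always equals the total length of the audio.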

Thanks.

3 replies

MetaMachina Apr 22, 2024

@synesthesiam, I'm writing to see if you've had a chance to look into the problem I mentioned earlier. I'm curious to know if it is even feasible to implement the feature in Piper? Could you please advise if there are any potential workarounds? Appreciate your help on it. Thanks.

synesthesiam Apr 22, 2024
Maintainer Author

This is definitely feasible to do in Piper: #425
It will require re-exporting all of the voices and adjusting the API first. Once that's done, the piper command can be expanded to support writing out the phonemes + timestamps to a file.

MetaMachina Apr 30, 2024

@synesthesiam, is it possible to share an approximate time-frame as to when the next release of Piper, with the capability to export phonemes and their timestamps to a file, will be published? Thanks.
