Voice navigation #49

Merged
merged 80 commits into from Dec 17, 2017

3 participants
@rinigus
Contributor

rinigus commented Aug 14, 2017

This PR adds voice navigation using an installed external TTS program. At present, mimic, flite, picotts, and espeak are supported. The PR was described in #30; for completeness, I repeat that description below. Compared to the earlier description, support for OSRM and Mapquest Open has been added.

Background

Voice commands come with several restrictions. First, voice commands are not available in all languages, and the user may well want to specify the voice command language. Second, the availability of voice synthesis is rather poor; there are several options, discussed below. Third, with better-quality voices, synthesis may take some time. The code handles these limitations as well as I could manage.

Text to speech

After some searching, I think we have three packages providing TTS in SFOS: mimic (based on flite), picotts, and espeak. Of these, mimic supports English only, picotts has a few other languages (de, es, fr, and it), and espeak has more. The quality of espeak is rather poor, though. All packages are available at OpenRepos (I uploaded mimic and picotts myself from OBS).

To handle these options, I made a class VoiceEngineBase that serves as the base class for the voice engines: VoiceEngineMimic, VoiceEngineFlite (supports flite if you prefer), VoiceEnginePicoTTS, and VoiceEngineEspeak. These voice engines are used by VoiceCommand, which handles

  • selection of the voice engine for the given language by searching a list of engines sorted by (subjective) quality
  • receiving requests for new voice commands (prompts) and asking the voice engine to synthesize them asynchronously
  • providing a file with the voice corresponding to the command
  • handling a cache of voice commands
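A minimal sketch of the engine selection and cache naming described in the list above. The class names follow the PR description, but the `languages` attribute, the `supports` method, `select_engine`, and `cache_key` are hypothetical names used only for illustration, and the espeak language list is an illustrative subset, not the real one.

```python
import hashlib


class VoiceEngineBase:
    """Hypothetical base class; the real VoiceEngineBase API may differ."""
    name = "base"
    languages = ()  # primary language codes the engine can speak

    def supports(self, language):
        # Compare only the primary subtag, e.g. "en" for "en_US" or "en-GB".
        primary = language.replace("-", "_").split("_")[0].lower()
        return primary in self.languages


class VoiceEngineMimic(VoiceEngineBase):
    name = "mimic"
    languages = ("en",)


class VoiceEnginePicoTTS(VoiceEngineBase):
    name = "picotts"
    languages = ("en", "de", "es", "fr", "it")


class VoiceEngineEspeak(VoiceEngineBase):
    name = "espeak"
    languages = ("en", "de", "es", "fr", "it", "fi", "et")  # illustrative subset


# Sorted by (subjective) quality, best first.
ENGINES = [VoiceEngineMimic, VoiceEnginePicoTTS, VoiceEngineEspeak]


def select_engine(language, engines=ENGINES):
    """Return the first engine in quality order that supports the language."""
    for cls in engines:
        engine = cls()
        if engine.supports(language):
            return engine
    return None


def cache_key(engine_name, language, text):
    """Build a stable cache file name for a synthesized prompt."""
    digest = hashlib.sha1(text.encode("utf-8")).hexdigest()
    return "{}-{}-{}.wav".format(engine_name, language, digest)
```

With this ordering, `select_engine("en_US")` would pick mimic, while a language that only espeak covers falls through to espeak, and an unsupported language yields `None` so the caller can disable voice prompts.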

Voice prompts

Voice prompts are given through Narrative and its Maneuver. Audio is handled through QML Audio by giving it the audio file. Compared to the earlier Maneuver, we now have to store and play verbal instructions. In Valhalla, these are the alert (about 200-300 meters before), pre-maneuver (turn left now), and post-maneuver (continue for 1 km) prompts. So the routing engine (Valhalla and others) has to provide these data. If only the narrative is provided, it is used for the alert and pre-maneuver prompts.

Compared to the earlier Narrative, the proposed version has to define current_maneuver (to know when to play the post-maneuver prompt), interface with VoiceCommand, and remember which prompts have already been voiced (at present, a prompt is simply deleted from a copy of the current maneuver after it has been voiced). To keep track of maneuvers, Narrative has begin and end methods, called by the QML Map at the start and end of navigation.

To support slower TTS (like mimic's ap voice), I made the voices synthesized in advance, at present 3 maneuvers ahead. This should also allow us to extend to online synthesis, if we wish.

To ensure that we don't miss a maneuver, I reduced the period of the narration timer to one second. At present, the preferred voice gender (where a choice is possible; picotts has only female voices) is specified in Preferences. Maybe we can move it to the new Navigation page.

The overall preference on whether to enable voice commands is stored in config.py/voice_commands, but it has no GUI, as you suggested.

rinigus added some commits Aug 4, 2017

otsaloma Nov 27, 2017

Owner

Could you explain the reasoning for the pre-maneuver prompt times? Why does it vary? Why is it shown twice if there's plenty of time – is it common practice in navigators? (I'm not really familiar with navigators or voice navigation.)

https://github.com/rinigus/poor-maps/blob/bba6774db2c7935ca398d9cabe7175d04d74e122/poor/narrative.py#L83-L92

otsaloma Nov 27, 2017

Owner

Now that I have thought a bit more, I understand the double prompt after a long drive, but still why the 20 vs. 30 vs. 35 seconds? And another question: the minimum leg duration of 60 seconds, below which to ignore prompts, seems a bit high. Is that to leave room for a previous post-maneuver prompt?

rinigus Nov 28, 2017

Contributor

In general, I am sure that the timings can be tuned further. Sometimes they seem to give rather short notice. It's all about the psychology of when to notify, when to assume that the driver needs a double notification (as when we have been driving a longer stretch without maneuvers), and when to keep prompts from being too frequent. But I am sure that if we manage to collect feedback from others, it's possible to get this working decently well.

20/30/35 was chosen as a compromise. However, it's possible that I overdid it, since I tuned the timings at least twice: once when tuning by distance and later (current implementation) by time. 20 seems to be on the short side, but at least it gives a prompt. The 60 sec cutoff was meant to avoid being too intrusive, but maybe it should be reduced to 45 sec or so. You need room for the post-maneuver prompt as well, indeed.

One remaining issue shows up, for example, when you are on a roundabout: it keeps giving post-maneuver prompts that are interrupted by the next maneuver. But I actually consider this an issue with Valhalla; it's a case where it maybe shouldn't give a post-maneuver prompt at all. I would suggest addressing it a bit later.

otsaloma Nov 28, 2017

Owner

That overlap handling somewhat bothers me. It's not just roundabouts; I also expect one-block legs in a city. I find the approach of maneuver-relative times and separate attributes for the different prompts difficult in that sense. I'm tempted to rewrite the different prompts of all maneuvers into a single list, something like

[
    VoicePrompt(dist=100,
                time=60,
                text="In 200 meters, turn left",
                priority=2,
                voice_generated=False,
                passed=False),

    VoicePrompt(dist=350,
                time=70,
                text="Turn left",
                priority=3,
                voice_generated=False,
                passed=False),
                
    VoicePrompt(dist=450,
                time=80,
                text="Continue for 1 kilometer",
                priority=1,
                voice_generated=False,
                passed=False),

    ...
]

After initial population, overlaps would be removed, comparing times, keeping the one with the highest priority. And generation and playback would be simply based on the distance to destination (or from origin), which is already calculated by the existing functions.
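A rough sketch of how that overlap removal could look, assuming prompts closer together than some minimum gap count as overlapping. The `min_gap` parameter and the function name `remove_overlaps` are assumptions for illustration; the field names follow the list above.

```python
from dataclasses import dataclass


@dataclass
class VoicePrompt:
    dist: float    # distance along the route, in meters
    time: float    # time along the route, in seconds
    text: str
    priority: int  # higher value wins when prompts overlap


def remove_overlaps(prompts, min_gap=10.0):
    """Drop prompts closer than min_gap seconds to a higher-priority one.

    min_gap is a hypothetical tuning parameter, not a value from the PR.
    On equal priority, the earlier prompt is kept.
    """
    kept = []
    for prompt in sorted(prompts, key=lambda p: p.time):
        if kept and prompt.time - kept[-1].time < min_gap:
            # Overlap with the last kept prompt: keep the higher priority.
            if prompt.priority > kept[-1].priority:
                kept[-1] = prompt
            continue
        kept.append(prompt)
    return kept
```

Applied to the three example prompts above (times 60, 70, 80) with `min_gap=15`, only "Turn left" would survive, since it has the highest priority of the three.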

Do you see any obvious problem here? If not, I'll do it now rather than later.

As for progress, the rest of the package code is done; routers and QML are left, but they look fine and there's not much to do there. I haven't pushed any commits yet, since some partial changes mean some stuff is broken. I'll push the code before I begin testing.

rinigus Nov 29, 2017

Contributor

It's OK that it bothers you - that keeps Poor Maps standards high :)

Just a comment on the effect of overlap before the other points. At present, when prompts overlap, the current voice prompt is stopped in the middle and the new one is voiced. The two prompts never speak at the same time; one is just interrupted. Now, back to the main topic.

Indeed, having a global list would allow us to get rid of the overlap. However, there are points to think about:

  • I would suggest targeting playback by time to the maneuver, not distance. It's a much better experience when the prompts come as determined by time (global time on the route?). Technically it can still be done via global distance since we know the average speed of the leg, but see below.

  • Ideally, we should take into account the current speed and not the average leg speed. This would require additional data to be provided to the calling routine, but maybe it's worth it.

  • We should avoid synthesizing all voice prompts in advance. Long routes could have plenty of them, and there is a bigger chance of being rerouted. It would take significant CPU resources, which could be wasted anyway due to rerouting. As a result, I would suggest keeping a global "window" of synthesized prompts, similar to what's done now, except that the list would be global.

  • To properly check for overlaps, we would need to know the duration of each prompt. An alternative is a solution that works most of the time and just demands a minimum amount of time between prompts. Such an alternative would allow generating the list for the whole route and dropping prompts by their priority, as you suggested.
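The "window" idea from the list above could be sketched roughly as follows. This assumes a flat, ordered list of prompts with a `voice_generated` flag as in the proposed VoicePrompt; the function name `prompts_to_synthesize`, the `next_index` argument, and the window size of 3 are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class VoicePrompt:
    text: str
    voice_generated: bool = False


def prompts_to_synthesize(prompts, next_index, window=3):
    """Return the not-yet-synthesized prompts among the next `window` ones.

    next_index is the index of the next prompt to be played; prompts
    further ahead than the window are left alone, so a reroute wastes
    at most `window` syntheses.
    """
    upcoming = prompts[next_index:next_index + window]
    return [p for p in upcoming if not p.voice_generated]
```

The caller would request synthesis for the returned prompts each time `next_index` advances, so the window slides along the route.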

In sum, I think it's the right direction and will work. Just a few points to consider.

otsaloma Nov 29, 2017

Owner

Ideally, we should take into account the current speed and not the average leg speed.

I have thought about this too, but I figured that the current or past speed is not necessarily a good predictor for speed on the rest of the leg, especially in the city with traffic lights, varying traffic conditions, etc. So, I think I'll skip that.

To properly check for overlaps, we would need to know the duration of the prompt.

I expect using an average characters per second value with a bit of extra added would work well enough, but we'll see.
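As a sketch of that estimate, assuming a speaking rate of 15 characters per second and half a second of padding (both numbers are placeholders, not measured values, and the function names are hypothetical):

```python
def estimate_duration(text, chars_per_second=15.0, extra=0.5):
    """Estimate how long a prompt takes to speak, in seconds."""
    return len(text) / chars_per_second + extra


def overlaps(first_time, first_text, second_time, **kwargs):
    """Return True if the first prompt would still be speaking when the
    second one is due. Times are in seconds along the route."""
    return first_time + estimate_duration(first_text, **kwargs) > second_time
```

The real rate depends on the engine and voice, so the constants would need tuning against actual synthesized files.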

rinigus Nov 29, 2017

Contributor

Re speed: yes, let's keep that and see if we need something more precise later

Re chars: it could work quite well as a first approximation. Let's see it in action. Also, just requiring a minimal time between prompts may work almost as well.

What about the case where only prompts with the same priority are left (in case of frequent maneuvers)? Maybe, if the priority is high, we should make an exception and keep them all and hope for the best. This would keep the driver alert, rather than having the navigation assistant stay silent in the hope that you can constantly glance at the screen.

otsaloma Nov 29, 2017

Owner

Yes, I agree, all prompts of the highest priority can be kept.

otsaloma Dec 13, 2017

Owner

Took a while, but I've now done the adaptation: mainly revised verbal prompts and overlap removal, use of standard locales and fallbacks without hard-coded mappings, and small fixes and improvements such as saving maneuvers between sessions, rerouting voice prompts, etc. Hope I didn't break anything. The commits are not very atomic, sorry about that; it's a result of file-by-file work and fixing all errors found in testing in one go.

I've done some desktop testing with the fake position source, and it seems to work OK. I'll do more testing, but since I don't have a car, it'll be just desktop testing, and actual field testing is appreciated to see if the parameters need tweaking. The most relevant parameters are:

And, @rinigus, do you actually use the voice navigation? When testing with Mimic, I found the pronunciation of Finnish road names to be horribly disturbing; at times it even seems to fall back on just enumerating the letters. It can't be much better in Estonian, or can it?

rinigus Dec 14, 2017

Contributor

Thank you very much! It's a huge change and I will read (and learn) the new code with pleasure. I hope you were not too frustrated with the original PR ...

As for use, yes, I use it every time I use navigation. A bit more in the summer than now, but still every time I need to go outside my routine places. When you drive it's of great help, even with these issues with street names. I think picotts was doing a better job with the streets, but mimic has much better voice quality, so I made it the default. In addition, mimic/flite are developed in proper open source fashion, while picotts is a no-longer-developed source code dump from Android.

The state of TTS is far from ideal on Linux (and on SFOS, as a result). However, by making apps that use it we'll at least expose the problems, and maybe someone (or we) will get interested in fixing them.

The original PR was designed so that online TTS engines could be plugged in, if we wish. I'll have to see if this is still the case (whether we can pre-synthesise well in advance).

I am sad to see English Pirate go. I think it was very appropriate for Sailfish OS with all its "sailors". It's quite well developed by Valhalla and adds some personality to the navigation software. I wonder if it's possible to restore it in Poor Maps?

I'll test the new version today and will report back.

rinigus Dec 14, 2017

Contributor

From testing: I think it works well. I tested it a few times today (had to drive around a bit) and was prompted to make maneuvers in time. When I could, I checked the distance displayed on Poor Maps and compared it with the prompt text.

A few times, it seemed that the prompt was said 100 meters early: at 400 meters I was told that it's 300 to go. I'll read the code and look at the parameters that you listed earlier.

A couple of times the post-maneuver text was interrupted by the next one. However, I didn't notice it on roundabouts, which was an issue earlier, although more testing will be needed there. (Please see the note below, though.)

One question, before I forget: do we still have the ability to give multiple prompts before a maneuver, if the leg is really long?

Note that while I drive, I don't need a perfect solution, just one that helps. These prompts are very handy and I don't mind if they are interrupted by the next, more relevant prompt. It's all to help the driver keep their eyes on the road, not the map, when possible. Right now, it already works well with prompts and an occasional glance at the map. :)

Off to read the new code...

otsaloma Dec 14, 2017

Owner

Do you think it would make sense to add transformations for non-English letters? With an English TTS voice, the result for Finnish street names sounds much better, at least with Mimic and Flite, if I replace "ä" with "ae" etc. – try e.g. "Jämeräntaival" vs. "Jaemeraentaival".

I am sad to see English Pirate go.

I understand, but I don't want to include something that hardly anyone uses more than once, if it takes up space on a small screen and needs special casing in code.

One question, before I forget: do we still have the ability to give multiple prompts before a maneuver, if the leg is really long?

Yes: https://github.com/rinigus/poor-maps/blob/voice/poor/narrative.py#L533

rinigus Dec 14, 2017

Contributor

Do you think it would make sense to add transformations for non-English letters? With an English TTS voice, the result for Finnish street names sounds much better, at least with Mimic and Flite, if I replace "ä" with "ae" etc. – try e.g. "Jämeräntaival" vs. "Jaemeraentaival".

I would prefer to avoid it. In reality, it's a TTS issue which has to be solved properly through the mimic - Valhalla interaction. Just imagine transliteration for languages we know nothing about - I have no clue how it would work with Russian or Chinese, for example. I would prefer to feed it what we get from Valhalla and hope for the best.

Proper solutions on Linux are tricky - we either need someone to sell us a good TTS, use some online service, or maybe get the Android TTS layer exposed in SFOS. I wrote in a thread at TJC about the state of TTS, but not much happened there. Maybe when navigation with spoken instructions shows up, things will become more interesting for everyone.

I understand, but I don't want to include something that hardly anyone uses more than once, if it takes up space on a small screen and needs special casing in code.

I am surely biased, but I actually used it more than regular English. It's somewhat amusing and surely brings you out of routine. No idea how much others were using it, though. [And here I went against my own policy of not fixing TTS by spelling out a few words. But that was because pirate-spoken TTS is way too far from SFOS.] I do understand your reasoning as well.

Have to stop here and get some sleep. I'll continue tomorrow.

otsaloma Dec 14, 2017

Owner

I would prefer to avoid it. In reality, it's a TTS issue which has to be solved properly through the mimic - Valhalla interaction. Just imagine transliteration for languages we know nothing about - I have no clue how it would work with Russian or Chinese, for example.

Yes, I figured there might be generality issues, and Wikipedia seems to agree, though Russian and Chinese surely don't use "ä"!

I'm still tempted to special-case Finnish, as there are a lot of users here and no Finnish TTS engine, and those characters seem to trigger letter enumeration, which is quite bad, e.g. for the "taival" part of "Jämeräntaival". I'll give it some more thought.

locale = poor.util.get_default_locale()
if locale.startswith("fi") and self.language.startswith("en"):
    self.text = self.text.replace("ä", "ae")
    self.text = self.text.replace("ö", "oe")
    self.text = self.text.replace("å", "aa")

Edit 1: Default locale does not of course necessarily match the language of the street names, so it's far from perfect.

Edit 2: Or maybe I should just file a bug report against Mimic on this?

Owner

otsaloma commented Dec 14, 2017

I would prefer to avoid it. In reality, its TTS issue which has to be solved properly by mimic - Valhalla interaction. Just imagine transliteration of some languages we are not aware about - I have no clue how it will work with Russian or Chinese, for example.

Yes, I figured there might be generality issues, and Wikipedia seems to agree, though Russian and Chinese surely don't use "ä"!

I'm still tempted to special case Finnish as there are a lot of users here, no TTS engine and those characters seem to trigger letter enumeration, which is quite bad, e.g. "taival" part of "Jämeräntaival". I'll give it some more thought.

locale = poor.util.get_default_locale()
if locale.startswith("fi") and self.language.startswith("en"):
    self.text = self.text.replace("ä", "ae")
    self.text = self.text.replace("ö", "oe")
    self.text = self.text.replace("å", "aa")

Edit 1: The default locale does not, of course, necessarily match the language of the street names, so it's far from perfect.

Edit 2: Or maybe I should just file a bug report against Mimic on this?

@otsaloma

Owner

otsaloma commented Dec 14, 2017

This is what I mean:

@rinigus

Contributor

rinigus commented Dec 15, 2017

If we think it through, it's quickly becoming unrealistic, and maybe some hacky way is OK. Ideally:

  • Valhalla should convey the language info in the prompt. Something like
    <en>Turn right into</en><fi>Jämeräntaival</fi><en>street</en>. This is probably doable if we file an issue in Valhalla and work on/test it. The street name language should be known by Valhalla, and it should be possible to insert it into the resulting string, if requested. There is a standard that allows doing it properly, see https://www.w3.org/TR/speech-synthesis11/#AppB

  • Our multilingual TTS engine should be able to parse such a string and generate the WAV file. And here it is quickly becoming a pipe dream. We have a TTS of decent quality for only one language (maybe a few more if we count picoTTS), which is far from covering the possible requirements. I think that unless we get an engine that is able to process multilingual input, there is no big point in bugging Valhalla. However, it looks like mimic is expanding to support more languages. See the repos available under https://github.com/MycroftAI?utf8=%E2%9C%93&q=mimic&type=&language= . I will have to look into it at some point and see whether I can release an updated version.
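
As a rough illustration of the first point, a language-tagged prompt could be written in SSML 1.1 with its `lang` element and `xml:lang` attribute. This is only a sketch: the helper name `to_ssml` and the (language, text) segment format are made up for the example, Valhalla does not produce such segments today, and whether any of the engines discussed here would accept this markup is not established.

```python
from xml.sax.saxutils import escape, quoteattr

SSML_NS = "http://www.w3.org/2001/10/synthesis"

def to_ssml(segments, default_lang="en"):
    """Wrap (language, text) segments in SSML 1.1 markup.

    Hypothetical helper: segments in the default language are emitted
    as plain text, all others are wrapped in a <lang> element.
    """
    parts = []
    for lang, text in segments:
        if lang == default_lang:
            parts.append(escape(text))
        else:
            parts.append("<lang xml:lang={}>{}</lang>".format(
                quoteattr(lang), escape(text)))
    return "<speak version=\"1.1\" xmlns={} xml:lang={}>{}</speak>".format(
        quoteattr(SSML_NS), quoteattr(default_lang), " ".join(parts))

prompt = to_ssml([
    ("en", "Turn right into"),
    ("fi", "Jämeräntaival"),
    ("en", "street"),
])
print(prompt)
```

The street name ends up as `<lang xml:lang="fi">Jämeräntaival</lang>` inside an English `speak` element, so a multilingual engine would at least know where to switch voices.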

Maybe, for now, we can just replace ä, ö and a few other chars for any language (Estonian would benefit from it too) and see later how to progress from there.
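
A minimal sketch of such a replacement, not tied to the default locale, could look like the following. The function name `transliterate` and the table are assumptions for illustration: the ä, ö and å entries follow the snippet earlier in this thread, while the õ and ü entries are only a guess at what Estonian names might need.

```python
import unicodedata

# Illustrative table; the õ and ü entries are guesses, not tested with any TTS.
REPLACEMENTS = {
    "ä": "ae", "Ä": "Ae",
    "ö": "oe", "Ö": "Oe",
    "å": "aa", "Å": "Aa",
    "õ": "o", "Õ": "O",
    "ü": "ue", "Ü": "Ue",
}
TABLE = str.maketrans(REPLACEMENTS)

def transliterate(text, language):
    """Replace selected non-ASCII letters when the voice is English."""
    if not language.startswith("en"):
        # Leave the text alone for voices that may know these letters.
        return text
    # Compose "a" + combining diaeresis into "ä" so the table matches.
    text = unicodedata.normalize("NFC", text)
    return text.translate(TABLE)

print(transliterate("Jämeräntaival", "en"))
```

With an English voice this gives "Jaemeraentaival", and the text passes through unchanged for any other voice language.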

@rinigus

Contributor

rinigus commented Dec 15, 2017

From reading the code - it looks nice and tidy, as expected, thank you very much! Reorganization and rewriting surely helped. I enjoyed reading it and I hope I didn't miss anything important.

Maybe we should reduce the distance at which we stop giving voice directions a bit. Right now it's 200 meters; maybe 100 meters is OK, as for rerouting. Right now, when I change the route, I sometimes get the instruction from the old route. This would reduce the probability of such events.

On my part, it's ready :)

@otsaloma

Owner

otsaloma commented Dec 16, 2017

Maybe, for now, we can just replace ä, ö and a few other chars for any language (Estonian would benefit from it too) and see later how to progress from there.

OK, I'm really not looking for more than a small hack to work around the issue.

Maybe we should reduce the distance at which we stop giving voice directions a bit. Right now it's 200 meters; maybe 100 meters is OK, as for rerouting.

Sure. Thanks for testing, I'll make these remaining changes soon, then merge and do a couple of other fixes before making a release.

@otsaloma otsaloma changed the title from [WIP] Voice navigation to Voice navigation Dec 17, 2017

@otsaloma otsaloma merged commit 7791993 into otsaloma:master Dec 17, 2017

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed

@otsaloma otsaloma referenced this pull request Dec 17, 2017

Closed

Add voice navigation #30
