Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Localize opsdroid #422

Open
anxodio opened this issue Jan 24, 2018 · 11 comments
Open

Localize opsdroid #422

anxodio opened this issue Jan 24, 2018 · 11 comments

Comments

@anxodio
Copy link
Contributor

anxodio commented Jan 24, 2018

Opsdroid is designed with English in mind. A developer can write his skills (regex and responses) in any other language, but actually it has some complications:

  • Some parts of the core code assume English (see this or this examples)
  • As a developer you could want have your skill ready for multiple languages:
    • But you can only have one matcher, and write a regexpr ready for various languages it's hard and difficult to mantain
    • And there is no system to translate the answer strings, so it has to be implemented in every skill that need it

So let's discuss what we need to do to go multilanguage (i already used some @jacobtomlinson ideas)

Breaking changes

The idea of the multi language design is to don't break anything. So if you don't change the language, Opsdroid has to assume english and work as normally.

Configuration

We need a new language configuration item, that can be filled with any ISO 693-1 two-letter code. By default it will be en.

Core localization

We need to use some Python translation library. In my opinion, we should use the gettext library, it is extended, easy to use, and in standard libraries. Here is a post explaining the usage.

So the rule should be mark as translatable every string that can be read by the end user through the chat (like the Whoops error).

Parser/Matcher localization

Now we have different behaviours:

  • For always or crontab there is no problem.
  • For regex we should add some extra decorators with the locale code at the end (for example @match_regex_es). The basic decordator will be mandatory, and it will point always to english. And if the configurated language is not defined by a decorator, it will default to english. For example, for the skill-hello, it can be like:
@match_regex(r'hi|hello|hey|hallo')
@match_regex_es(r'hola|buenas|saludos')
async def hello(opsdroid, config, message):
    [...]
  • recastai or dialogflow requires a language code to be sent in every request. So my proposal is to define a list of accepted languages (defined by the service) and check if the configurated language is in it. If the configurated language it's not suported by the service, fallback to english.
  • luisai or witai don't recieve a language code in the request. Instead, you need to configure the language as a configuration item on the service. So here, if you need various languages, you need various endpoints. So we need to add more configuration items, like:
parsers:
    - name: witai
      access-token: XJF475SKGITJ98KHFO
      min-score: 0.6
    - name: witai
      lang: es
      access-token: XJF475JD8DJ2J98F2DX
      min-score: 0.7
    - name: luisai
      appid: "<application-id>"
      appkey: "<subscription-key>"
      verbose: True
      min-score: 0.6
    - name: luisai
      appid: "<application-id>"
      appkey: "<subscription-key>"
      lang: es
      min-score: 0.7
  • rasanlu works locally and depends on us. So the point here is to translate the intents files. By defualt, we have intents.md pointing to english. We can also add a lang code at the end to translate it, like intents_es.md. But the languages that rasa supports are limited, so probably we need to define a list and check it (like in recastai or dialogflow)

Skill answer localization

I think we sould bet for gettext too. We need to implement an interface thinking in the skills too, and let every skill module to has his own translations.
Another time, if it's not translated or gettext methods are not used, it will not break and default to english.

@anxodio anxodio changed the title Lozalize opsdroid Localize opsdroid Jan 24, 2018
@jacobtomlinson
Copy link
Member

jacobtomlinson commented Jan 24, 2018

This is a great idea and I basically agree with everything you have said. Here are a few additional thoughts from me:

  • All logging should be translated into the language specified in the config.
  • We need to be consistent with the word we use for translation configuration, you use langauge and lang in your comments. My vote would be for lang.
  • I'm apprehensive to duplicate matchers to add language features. I would rather add kwargs to the existing matchers. e,g
@match_regex(r'hi|hello|hey|hallo')
@match_regex(r'hola|buenas|saludos', lang='es')
async def hello(opsdroid, config, message):
    [...]
  • We need to decide what happens if the parsers do not support your language. Do we fall back to english or disable the parser and log an error?
  • We could optionally take a dictionary in the Message.respond() method too which would allow multilingual responses. E.g
@match_regex(r'hi|hello|hey|hallo')
@match_regex(r'hola|buenas|saludos', lang='es')
async def hello(opsdroid, config, message):
    message.respond({"en": "Hi", "es": "Hola"})

@anxodio
Copy link
Contributor Author

anxodio commented Jan 24, 2018

  • In my opinion, logging translation is not as important as answer translation, because logging is read by developers/system administrators and normally they speak English. But I think it's a nice plus, so I agree with the idea of add the mechanism to translate them and do it with core logs.
  • Totally agree with the use of lang in configuration.
  • I agree with your @match_regex design. I was thinking about use introspection to avoid code duplication, but definitely the solution you propose is simpler.
  • About the decision of what happens if the parsers do not support your language... it's hard to define, because all options have pros and cons. Probably I will vote to fallback to English and log a warning, and let the developer decide if he wants to change something or disable it.
  • I prefer gettext than a dictionary in message respond. I think that have the translated strings in .pot files simplifies the work for translators. A person that don't know anything about developing can translate strings with some software like Poedit, but if we use the dictionary solution minimal knowledge of development will be required. Also I think that maintaining strings in different files included in the skill module it's easier than maintaining the dictionaries along the code.

@jacobtomlinson
Copy link
Member

  • As a native English speaker it's always hard for me to know what people expect to be translated and what is acceptable in English. Let's aim to translate the logs though as it would be nice.
  • Falling back to English is probably best, but it needs to be clear in the logs that this has happened.
  • I've not used gettext before so need to read up on it. But happy to take your advice.
    • Would you imagine that skill developers would include a .pot file along with their skill in the same way Rasa NLU skills include a intends.md file?
    • Would the skill developer need to import gettext and configure where the .pot file is or can we do that for them in the core module?
    • To simplify skill development perhaps we should provide some helper functions that wrap gettext?

@FabioRosado
Copy link
Member

I like the idea of allowing other people that don't speak english very well to use opsdroid.
I also like the idea of using a language param on the @match_regex it should make things a bit easier to implement.

English should be the fallback language if the parser doesn't support the chosen language, this way everything will work fine - I agree with the warning if the language is not supported.

I never played around with the gettext so I'm just going to ask: Will the fact that we mostly use the .format() in string formatting create some sort of issue?

@jacobtomlinson
Copy link
Member

jacobtomlinson commented Jan 24, 2018

It shouldn't cause any problems. The translated string should contain the same placeholder.

"My name is {}".format("Jacob")

will become

_("My name is {}").format("Jacob")

and if the language is set to Spanish then the translated string could be mi nombre es {} which will still work with the formatting.

We might want to think about switching our formatting from arg based to kwarg based which will allow translations to omit placeholders if they don't make sense or change the order if that is appropriate.

For example in english we would say "The blue car" but in Spanish you might say "El auto azul". The adjective goes in a different place. If we wanted to format the colour and object we would need to have something like The {colour} {object} and the translation would be El {object} {colour}. We would of course need to translate the templated words too.

_("The {colour} {object}").format(colour=_("blue"), object=_("car"))

@jacobtomlinson
Copy link
Member

@FabioRosado I found this blog post useful.

@anxodio
Copy link
Contributor Author

anxodio commented Jan 24, 2018

Yeah, I agree totally with the use of .format() if it's possible (I think logging uses old formatting tools with %). This is a great lecture: pyformat

And I think it's easy for the skills developers to use gettext, because it is in standard libraries, and the .pot files are automatically created by a command. Of course, we need to document the process and publish some example translated skills

@jacobtomlinson
Copy link
Member

jacobtomlinson commented Jan 24, 2018

Sounds good.

So the agreed design is:

  • Use gettext for translations.
  • Add a lang option to the main config for users to set their language.
  • Add a lang kwarg to the applicable matchers to allow the skill developer to specify which language it is.
  • Add language support for matchers like LUIS by passing the language through to the API.
  • Add language support for skill training by handling intents_(es|de|fr).md files for Rasa NLU.
  • Document and encourage the use of gettext in skill responses.
  • Update all log messages to use gettext.

As there is quite a bit of work here I'm happy to take multiple PRs to tick off these requirements. This issue will be closed once all are complete.

@afibanez I ❤️ pyformat! Thanks for sharing that.

@FabioRosado
Copy link
Member

FabioRosado commented Jan 24, 2018

Sounds awesome! Yeah I was thinking that using something like: '{1} {0}'.format('one', 'two') to print (two, one) could work, or how you wrote it down @jacobtomlinson. We just need to update all the formatted strings since we have been just passing {}.
I'll start checking the parsers that I created to see if we can just use different languages in all of them 😄

I can also give a hand translating things to Portuguese and might even bother the fiancé to help with polish 👍

@haardikdharma10
Copy link
Contributor

@jacobtomlinson, Is Hindi a good option to add here? I'd love to help adding Hindi translations.

@jacobtomlinson
Copy link
Member

Absolutely!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants