Add a button to choose toot lang #3478

Closed
ghost opened this issue May 31, 2017 · 14 comments · Fixed by #18420
Labels
suggestion Feature suggestion

Comments

@ghost

ghost commented May 31, 2017

TL;DR: Please make the "choose your toot language" happen.

I've been browsing the issues list, and it seems that the team chose to go with CLD2 for language detection… and closed issue #691.

But CLD2 detects only around 80 languages, and there are more than 7000 out there. On top of that, it can be wrong.

On the other hand, I totally agree that most people speak one of the top 50 languages spoken on Earth.

But there are still some weirdos who use regional dialects or conlangs. I'm one of them. For the record, I toot in French, English, Toki Pona and Kotava. I've also seen quite a lot of Esperanto and medieval French.

So I'm asking the team to consider a button allowing mastonauts to force the language on a per-toot basis, possibly with a default language and a CLD2 fallback if nothing was specified.

Thank you

@Cassolotl

Cassolotl commented May 31, 2017

Looks like my toots are being correctly assigned about 68% of the time? https://docs.google.com/spreadsheets/d/1BexKpvslEWedQSdhCH4Htlm0ZTxVbRjG449mgsBvs24/edit?usp=sharing (Edit: It's up to 79%!)

I would love this feature!

@mjankowski
Contributor

I think the language count is a bit higher than that, and we're using CLD3 now ... but I think your general point still stands, and we're never going to have 100% perfect coverage on this.

I think switching the filter to be "opt out" instead of "opt in" was an improvement ... because now people are only selecting things to not see, and we should only be tagging things we are confident about, so that should reduce false positives.

I have no objection to the eventual inclusion of a UI here, but I have a strong preference for trying to handle as many cases as we can without one first.

There's a just-recently-merged but not yet running anywhere commit which removes usernames and hashtags from text before language identification, which should help improve things. I'd like to let that get out there and see what it does to general identification.
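
As a rough illustration of that preprocessing step, here is a minimal Python sketch. The regular expressions and the `strip_for_detection` name are assumptions made for this example; they are simpler than whatever the merged commit actually uses.

```python
import re

# Simplified patterns (assumptions): "@user" or "@user@domain", and "#tag".
# The lookbehind avoids matching inside e-mail addresses such as bob@example.com.
MENTION_RE = re.compile(r"(?<!\w)@\w+(?:@[\w.-]+)?")
HASHTAG_RE = re.compile(r"(?<!\w)#\w+")


def strip_for_detection(text: str) -> str:
    """Remove mentions and hashtags, then collapse whitespace."""
    text = MENTION_RE.sub(" ", text)
    text = HASHTAG_RE.sub(" ", text)
    return " ".join(text.split())


print(strip_for_detection("@alice@example.social bonjour #mastodon tout le monde"))
```

The idea is that usernames and hashtags are often not words in the toot's language, so leaving them in can skew the detector.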

For what it's worth, the current behavior already is "if the detector can't find the language, fall back on the user locale, and if they don't have one, fall back on the instance locale". I think that's what you are proposing towards the end there...?
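
A minimal sketch of that fallback order in Python, assuming a hypothetical `detect` callable that returns a language code or `None` when the detector is not confident (it stands in for CLD here and is not Mastodon's actual code):

```python
from typing import Callable, Optional


def resolve_language(
    text: str,
    detect: Callable[[str], Optional[str]],
    user_locale: Optional[str],
    instance_locale: str,
) -> str:
    """Detector result first, then the user locale, then the instance locale."""
    detected = detect(text)
    if detected is not None:
        return detected
    if user_locale:
        return user_locale
    return instance_locale


# Stub detector for illustration only: it never recognises anything.
def never_detects(text: str) -> Optional[str]:
    return None


print(resolve_language("toki!", never_detects, "fr", "en"))
print(resolve_language("toki!", never_detects, None, "en"))
```

With a detector that gives up, the first call falls back on the user locale and the second on the instance locale.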

@ghost
Author

ghost commented Jun 1, 2017

I believe automatic language selection is here to make Mastodon as user-friendly as possible, and I totally agree with this idea, since that's what helps software gain adoption.
The current behavior isn't bad at all in that respect. I just happen to be one of the exotic cases.

My use case is the following:
– I'm tooting in a weird language (Kotava, ISO 639-3: avk; Toki Pona, ISO 639-3: mis; etc.)
– CLD fails to detect what language I'm using (not very surprising here)
– Mastodon falls back on my locale (French) or my instance's locale (I think it's also French)
– The wrong language ends up assigned to the toot.

On top of that, I think CLD would probably fail on slang and on toots with a lot of grammatical and spelling mistakes, unless you are using a huge set of data to train the neural network.

So here is a quick mockup of the least intrusive way of doing this that I can think of:
[image: capture]

Assuming that in my user preferences:
– I said I'm French (so by default, if CLD doesn't understand, my toots are tagged French)
– I said I'm also tooting in English, Toki Pona and Kotava
On hovering over the toot button, Mastodon would make the dropdown appear, allowing me to easily force the toot's language. If I don't and just click the first button, CLD will try to determine what language I'm using and we go down the normal path.
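
The proposal above could be sketched like this in Python. Everything here is hypothetical (the `LanguagePrefs` class, the `language_for_toot` function, and the `detect` callable standing in for CLD); it only illustrates the order of precedence: an explicit choice wins, then the detector, then the user's default.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class LanguagePrefs:
    """Hypothetical per-user settings: a default plus extra tooting languages."""
    default: str
    extra: List[str] = field(default_factory=list)

    def choices(self) -> List[str]:
        return [self.default, *self.extra]


def language_for_toot(
    text: str,
    prefs: LanguagePrefs,
    detect: Callable[[str], Optional[str]],
    forced: Optional[str] = None,
) -> str:
    # 1. A language picked in the dropdown always wins.
    if forced is not None:
        if forced not in prefs.choices():
            raise ValueError(f"{forced!r} is not one of the user's languages")
        return forced
    # 2. Otherwise try the detector, 3. then fall back on the user's default.
    return detect(text) or prefs.default


prefs = LanguagePrefs(default="fr", extra=["en", "mis", "avk"])
print(language_for_toot("Va rin", prefs, lambda t: None, forced="avk"))
```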

I believe that exotic language users probably wouldn't mind going into the settings to set a few things up. And for regular users, CLD will just work as expected.

Last note: I'm wondering what data CLD3 uses to determine what language we're tooting in. Since it's a Google product and privacy matters to some Mastodon users, I hope this doesn't send the toot content to Google. Also, the CLD3 repo says it's intended to work in Chrome; here again, I hope this won't leave non-Chrome users out.

Thank you for reading

@Cassolotl

Your dropdown requires each language to have a different word for "toot", and it also requires the user to remember which translation of "toot" belongs to which language. So as much as I like it, I think it would cause problems!

@ghost
Author

ghost commented Jun 1, 2017

Well, that's an example; it could contain the ISO code or even the language name.
But if I'm fluent enough in a language to toot in it… I can probably remember which translation of « toot » corresponds to which language.
Also, we could add a flag or an icon for some of them.

And maybe, though I think this would be rather hard to implement, a toot's language could be updated from the same menu that lets you expand or delete it.

@Cassolotl

if I'm fluent enough in a language to toot in it… I can probably remember which translation of « toot » corresponds to which language.

This may be true for you, but it is not true for everyone. I think you would be surprised at how much of a struggle it would be. You are a person who is really into language(s), and your mind is very much tuned to that, right? It's different for people for whom language is not a special interest - being multi-lingual can cause aphasia, and aphasia is a symptom of various mental illnesses and developmental disorders.

it could contain the iso code or even the language name.

Using whatever most people will recognise seems most sensible. I liked @Lomplac's mock-up a lot - the language code is small, and when you click it, the menu contains both the language code and the language's full name.

[image: mockup]

@ghost
Author

ghost commented Jun 1, 2017

This may be true for you, but it is not true for everyone.

You're right… okay.

I initially liked @Lomplac's idea; that's what I had in mind when I created the issue.
But on second thought, it adds a new button or a new field to the UI. This might be confusing for most of the users, who don't need this feature.

So I came up with my mockup… well, we can put the ISO 639-3 code and the language name into the dropdown of the « TOOT! » button.

@Gargron
Member

Gargron commented Jul 2, 2017

I could allow setting language directly through the API, but I wouldn't want to implement any dropdowns like this in the web UI.
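
For illustration, a client-side sketch in Python of what setting the language through the API could look like. Current Mastodon versions accept a `language` form parameter on `POST /api/v1/statuses`; the instance URL, the token and the `build_status_request` helper are placeholders invented for this example, and the request is only built, not sent.

```python
import urllib.parse
import urllib.request


def build_status_request(
    instance_url: str, access_token: str, text: str, language: str
) -> urllib.request.Request:
    """Build (but do not send) a request that posts a status with an explicit language."""
    body = urllib.parse.urlencode({"status": text, "language": language}).encode("utf-8")
    return urllib.request.Request(
        f"{instance_url.rstrip('/')}/api/v1/statuses",
        data=body,
        headers={"Authorization": f"Bearer {access_token}"},
        method="POST",
    )


# Placeholder instance and token; pass the result to urllib.request.urlopen to send it.
req = build_status_request("https://example.social", "TOKEN", "Va rin", "avk")
print(req.get_method(), req.full_url)
```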

@Iteratix

I don't usually bother bringing this up with centralized services, but as decentralized services are attempting to be better, I'd like to bring it up here. Spoken languages are not the only languages in our world. Would you please consider adding signed languages such as American Sign Language, British Sign Language, etc.?

These would be mostly useful as a tool for discovery, versus changing the UI to reflect the language choice. American Sign Language is currently conveyed through video. Adding support for this early would aid in decentralized services being truly for the people, and not just the spoken language parts of the world. Thanks.

@nightpool
Member

nightpool commented Dec 27, 2017

@Iteratix If this were implemented, it would probably support ASL/BSL, given that they already have valid BCP47 tags (for example, the language tag 'sgn-ase' for ASL).
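
To show how a tag like 'sgn-ase' breaks down, here is a small Python sketch that splits a tag into primary language, extended language, script and region subtags. It covers only a simplified subset of the BCP 47 grammar (no variants, extensions, private-use or grandfathered tags) and does not check subtags against the IANA registry; `parse_tag` is a name made up for this example.

```python
import re
from typing import Dict, Optional

# Simplified subset of the BCP 47 "langtag" grammar (assumption: good enough for a demo).
TAG_RE = re.compile(
    r"^(?P<language>[A-Za-z]{2,3})"            # primary language, e.g. "fr", "sgn"
    r"(?P<extlang>(?:-[A-Za-z]{3}){0,3})"      # up to three extended language subtags
    r"(?:-(?P<script>[A-Za-z]{4}))?"           # optional script, e.g. "Hant"
    r"(?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?$" # optional region, e.g. "GB", "419"
)


def parse_tag(tag: str) -> Optional[Dict[str, object]]:
    """Return the subtags of a well-formed tag, or None if it does not match."""
    match = TAG_RE.match(tag)
    if match is None:
        return None
    return {
        "language": match.group("language").lower(),
        "extlang": [s.lower() for s in match.group("extlang").split("-") if s],
        "script": match.group("script"),
        "region": match.group("region"),
    }


print(parse_tag("sgn-ase"))
```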

@Iteratix

Neat. I'm just getting up to speed myself on how this kind of thing is implemented.

@MattiJarvinen

As someone somewhat fluent in 3 languages, I totally get the need for this.

... If a user indicates that they toot in more than one language, then add a language dropdown to the toot button (with the language code). The last tooted language would move to the default spot.

  1. Most users never see the feature
  2. Multilingual users gain the feature
  3. UI complexity is avoided
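
A minimal Python sketch of that rule, with a hypothetical `dropdown_languages` helper: users with a single language get no dropdown at all, and for everyone else the last tooted language moves to the front.

```python
from typing import List, Optional


def dropdown_languages(languages: List[str], last_used: Optional[str] = None) -> List[str]:
    """Return the dropdown entries in display order, or [] when no dropdown is needed."""
    # Users who toot in one language (or none configured) never see the feature.
    if len(languages) <= 1:
        return []
    # The last tooted language takes the default (first) spot.
    if last_used in languages:
        return [last_used] + [lang for lang in languages if lang != last_used]
    return list(languages)


print(dropdown_languages(["fi", "en", "sv"], last_used="sv"))
print(dropdown_languages(["fi"]))
```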

@ghost
Author

ghost commented Apr 1, 2018

CLD isn't going to be able to cover media types such as images, videos, etc.

@oriold

oriold commented Jun 23, 2018

I understand that the W3C standard supports multiple translations. I think the way this works on Facebook, for example (default language + multiple translations), would be very desirable. I've always thought the lack of this is one of Twitter's biggest flaws:

https://www.w3.org/TR/activitystreams-core/#naturalLanguageValues
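
As a sketch of what that section of the spec allows, here is an Activity Streams 2.0 note carrying several translations in `contentMap`, plus a hypothetical `pick_content` helper (not part of any library) that picks the best version for a reader. The sample texts are made up.

```python
from typing import Any, Dict, List, Optional

# A Note whose content is given per language tag, as the spec's contentMap permits.
note: Dict[str, Any] = {
    "@context": "https://www.w3.org/ns/activitystreams",
    "type": "Note",
    "contentMap": {
        "en": "Hello",
        "fr": "Bonjour",
    },
}


def pick_content(obj: Dict[str, Any], preferred: List[str]) -> Optional[str]:
    """Return the first translation matching the reader's languages, with fallbacks."""
    content_map = obj.get("contentMap", {})
    for tag in preferred:
        if tag in content_map:
            return content_map[tag]
    # Fall back on a plain "content" value, then on any available translation.
    if "content" in obj:
        return obj["content"]
    return next(iter(content_map.values()), None)


print(pick_content(note, ["fr", "en"]))
```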
