feat(i18n): able to finetune transliterator #12378

Merged: 1 commit merged into jellyfin:master on Sep 11, 2024

Contributor

@BLumia BLumia commented Aug 2, 2024

Changes
Following up on #11172, this PR lets users fine-tune the behavior of the existing transliterator to fit their own needs. All they need to do is set the JELLYFIN_TRANSLITERATOR_ID environment variable to a preferred value; the accepted values are explained in this page and this page.

This approach is more flexible than only offering an option to disable the ICU transliterator. Users can then:

  1. Transliterate only specific languages instead of "Any". For example, use Hangul-Latin; Hiragana-Latin; instead of Any-Latin; so that only Hangul and Hiragana characters get transliterated. Other non-ASCII characters (e.g. Kanji and Chinese characters) are left untouched.
  2. Fine-tune the transliterator behavior (since any valid Transliterator ID can be set in that environment variable).
  3. Simply disable it. Set this environment variable to an empty value and the transliterator will do nothing.
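
The three cases above can be sketched as a small lookup. This is a Python illustration under stated assumptions, not Jellyfin's C# code: the helper name resolve_transliterator_id is hypothetical, the default ID is the one quoted in the review snippet of this PR, and both the fallback to that default when the variable is unset and the handling of whitespace-only values are assumptions.

```python
import os

# Default rule chain, as quoted in the review snippet of this PR.
DEFAULT_ID = (
    "Any-Latin; Latin-Ascii; Lower; NFD; "
    "[:Nonspacing Mark:] Remove; [:Punctuation:] Remove;"
)
ENV_VAR = "JELLYFIN_TRANSLITERATOR_ID"


def resolve_transliterator_id(environ=None):
    """Return the Transliterator ID to use, or None when transliteration is disabled.

    Hypothetical helper: the fallback to the default and the whitespace
    handling are assumptions, not confirmed behavior of the merged code.
    """
    env = os.environ if environ is None else environ
    value = env.get(ENV_VAR)
    if value is None:
        # Variable not set: assume the existing default behavior is kept.
        return DEFAULT_ID
    if not value.strip():
        # Set but empty: the transliterator does nothing.
        return None
    # Any other value is passed through as a custom Transliterator ID.
    return value


if __name__ == "__main__":
    print(resolve_transliterator_id({}))
    print(resolve_transliterator_id({ENV_VAR: "Hangul-Latin; Hiragana-Latin;"}))
    print(resolve_transliterator_id({ENV_VAR: ""}))
```

Passing a plain dict instead of the real environment keeps the sketch easy to try without changing any server settings.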

Some side notes (some of which are unrelated to the changes made in this PR):

  1. Users (and plugins) can always override the sort title by setting the sort title field manually. If a sort title has already been set manually, Jellyfin won't attempt to override it.
  2. If you set the sort title manually or via a plugin, make sure that at least its first character is an ASCII character, or you won't be able to find that item using the alphabet filter on the right-hand side of the page.
  3. When generating the fallback sort title, Jellyfin already attempts to remove diacritics in ModifySortChunks before the ICU transliterator is used. Keep that in mind when choosing a transliterator rule (Transliterator ID).
  4. You need to trigger a re-scan to make Jellyfin update the existing sort title cache. Choosing "Update Missing Metadata" is sufficient.
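
Notes 2 and 3 can be illustrated with a rough Python sketch. It is not Jellyfin's implementation: strip_diacritics only approximates diacritic removal through NFD decomposition, and reachable_by_alphabet_filter is a hypothetical check that encodes the "first character must be ASCII" rule from note 2 as an assumption.

```python
import unicodedata


def strip_diacritics(text):
    """Decompose to NFD and drop nonspacing marks (Unicode category Mn)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def reachable_by_alphabet_filter(sort_title):
    """Assumed rule from note 2: the first character must be ASCII."""
    return bool(sort_title) and sort_title[0].isascii()


if __name__ == "__main__":
    print(strip_diacritics("Amélie"))              # Amelie
    print(reachable_by_alphabet_filter("amelie"))  # True
    print(reachable_by_alphabet_filter("アメリ"))    # False
```

Because accents are already gone by the time the transliterator runs, a custom Transliterator ID mainly needs to cover scripts that have no ASCII base letters.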

Issues

@gnattu
Member

gnattu commented Aug 2, 2024

Related: #11880

Member

@gnattu gnattu left a comment

Need better documentation, ideally on jellyfin.org. It should probably also be mentioned in the changelog as a user action needed to fix undesired behavior. Otherwise LGTM.

private static readonly Lazy<Transliterator> _transliterator = new(() => Transliterator.GetInstance(
"Any-Latin; Latin-Ascii; Lower; NFD; [:Nonspacing Mark:] Remove; [:Punctuation:] Remove;"));
private static readonly Lazy<string> _transliteratorId = new(() =>
Environment.GetEnvironmentVariable("JELLYFIN_TRANSLITERATOR_ID")
Member

I'm personally fine with reading the env var directly, but I'm not sure whether there is a better way to integrate the configuration for string extensions.

@BLumia
Contributor Author

BLumia commented Sep 7, 2024

Need better documentation, ideally on jellyfin.org.

Agree.

Related to this, maybe we should also rename "Sort title" in the web UI and documentation to something else (e.g. "Transliterated title")? To be honest, the name caused me some confusion ( #11172 (comment) ). With this change, users and developers could better understand that the sortname metadata is supposed to be ascii-only, and maybe more plugin developers could make use of that metadata.

@gnattu
Member

gnattu commented Sep 7, 2024

sortname metadata is supposed to be ascii-only

Actually it is not. Computers do sort non ascii characters but may or may not in the human acceptable way, and using ASCII-only characters is an easier way to control sorting behavior. But if users really want to input a non-ASCII character, that is also OK.

@BLumia
Contributor Author

BLumia commented Sep 8, 2024

Computers do sort non ascii characters but may or may not in the human acceptable way

Is there any real-life usage example to help me understand better about this usage? If sortname is intended to allow non-ASCII characters, then why can't users use the filter on the right-hand side to find the programs whose sortnames start with non-ASCII characters?

@gnattu
Member

gnattu commented Sep 8, 2024

If sortname is intended to allow non-ASCII characters, then why can't users use the filter on the right-hand side to find the programs whose sortnames start with non-ASCII characters?

Because there are too many of them

Is there any real-life usage example to help me understand better about this usage?

For example, the ASCII order of the romaji for hiragana and katakana is not the same as the traditional Japanese あいうえお order. Language is just complicated, and you cannot assume that ASCII order is the most correct order.
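
A tiny Python sketch of that point, limited to the five vowel kana. The romaji table is written out by hand for illustration and is not taken from any transliterator.

```python
# Romaji for the five hiragana vowels, listed in traditional あいうえお order.
ROMAJI = {"あ": "a", "い": "i", "う": "u", "え": "e", "お": "o"}

kana = list(ROMAJI)  # dicts keep insertion order in Python 3.7+

# Sorting by the ASCII romaji reorders the kana alphabetically: a, e, i, o, u.
by_romaji = sorted(kana, key=ROMAJI.get)

# Sorting the kana by code point keeps the traditional order for these five.
by_code_point = sorted(kana)

if __name__ == "__main__":
    print("".join(by_romaji))      # あえいおう
    print("".join(by_code_point))  # あいうえお
```

For these five characters the code point order happens to match the traditional order; that is not guaranteed in general, which is why a manually set sort title can still be the better choice.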

@BLumia
Contributor Author

BLumia commented Sep 9, 2024

Because there are too many of them

If a user has too many items that start with a non-ASCII character, then I'd say the right-hand side filter is completely useless for that user. Something like Alphabetic Index might help improve this, though I'm not 100% sure about it. Whether or not that can be improved, it would be a completely new discussion topic that is not related to this PR.

Anyway, as far as this PR is concerned, the conclusion is that we won't change the wording of "Sort Title" to something else :)

@crobibero crobibero merged commit 81aca67 into jellyfin:master Sep 11, 2024
12 checks passed
@BLumia BLumia deleted the finetune branch September 11, 2024 17:04
Labels
None yet
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

[Issue]: Wrong content sorting in RU language.
3 participants