Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

external_files: detect locales with a region like en-US #13916

Merged
merged 4 commits into from May 9, 2024

Conversation

guidocella
Copy link
Contributor

commit 1: external_files: rename variables in guess_lang_from_filename()

commit 2: external_files: detect locales with a region like en-US

This loads subtitle files like foo.en-US.srt with --sub-auto=exact.

To preserve the case of these locales and not convert them to e.g. en-us, stop lower casing filenames, and instead use case insensitive functions to check if the media filename is contained in the external filenames. Extensions, whitelisted cover art filenames and idx files were already being compared case insensitively.

Fixes #12372, fixes #13251.

Copy link

github-actions bot commented Apr 17, 2024

Download the artifacts for this pull request:

Windows
macOS

@dyphire
Copy link
Contributor

dyphire commented Apr 17, 2024

So what about sgn-FSL and en-simple as mentioned in #8144 ? There are also Chinese tags, such as zh-Hans and zh-Hant.

@guidocella
Copy link
Contributor Author

I guess we should just allow any length?

@dyphire
Copy link
Contributor

dyphire commented Apr 17, 2024

I guess we should just allow any length?

According to the BCP 47 language tags standard, there will only be 2 or 3 characters before the first - character, which can be used to determine validity.
The problem is that it is uncertain how many characters will follow the first - character. Perhaps we can choose to only support commonly seen language tags after the first - character, which are within the range of 2 to 6 characters.

player/external_files.c Outdated Show resolved Hide resolved
player/external_files.c Outdated Show resolved Hide resolved
i--;
}

if (n < 2 || i == 0 || name.start[i] != thing)
if (lang_length < 2 || lang_length > 6 || (name.start[i] != '-' && lang_length > 3))
return (struct bstr){NULL, 0};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return (struct bstr){NULL, 0};
return (struct bstr){0};

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

player/external_files.c Outdated Show resolved Hide resolved
@dyphire
Copy link
Contributor

dyphire commented Apr 18, 2024

The current detection is too lenient, it defaults to loading subtitles that do not comply with the specifications, such as test-en and en-simpletest-simpletest.

As I mentioned earlier: according to the BCP 47 language tags standard, there will only be 2 or 3 characters before the first - character. And this best to match the valid language tags in language.c (Uncertain about this, which may result in previously loadable custom suffixes such as sc or chs no longer being recognized and loaded).

And according to the BCP 47 language tags standard, there will be a maximum of eight characters after each hyphens (-). Therefore, we still need to check the eight character limit after each hyphens (-).

Refer to relevant BCP 47 language tags normative content:
https://en.wikipedia.org/wiki/IETF_language_tag#Syntax_of_language_tags

@guidocella guidocella force-pushed the sub-region branch 3 times, most recently from 8cd0cf7 to edc0ed9 Compare April 18, 2024 16:09
@kasper93 kasper93 added this to the Release v0.39.0 milestone Apr 18, 2024
This loads subtitle files like foo.en-US.srt with --sub-auto=exact.

To preserve the case of these locales and not convert them to e.g.
en-us, stop lower casing filenames, and instead use case insensitive
functions to check if the media filename is contained in the external
filenames. Extensions, whitelisted cover art filenames and idx files
were already being compared case insensitively.

Fixes mpv-player#12372, fixes mpv-player#13251.
@guidocella guidocella force-pushed the sub-region branch 2 times, most recently from bb96c49 to c5812e6 Compare May 9, 2024 20:28
misc/language.c Outdated Show resolved Hide resolved
misc/language.c Outdated Show resolved Hide resolved
misc/language.c Outdated Show resolved Hide resolved
@kasper93 kasper93 merged commit 63d820b into mpv-player:master May 9, 2024
17 checks passed
@guidocella guidocella deleted the sub-region branch May 10, 2024 05:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

mpv wont load/pickup subtitles? Poor support for locale specific subtitles
3 participants