Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sub: add sub-detect-rtl option #12985

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from
Draft

Conversation

llyyr
Copy link
Contributor

@llyyr llyyr commented Nov 27, 2023

libass assumes all text to be left-to-right for compatibility with VSFilter. We can force auto detection by setting Encoding=-1, but it may break older subtitles so don't enable it by default.

Fixes #12978

Copy link

github-actions bot commented Nov 27, 2023

Download the artifacts for this pull request:

Windows

sub/sd_ass.c Outdated Show resolved Hide resolved
@llyyr
Copy link
Contributor Author

llyyr commented Nov 27, 2023

Updated to add a force mode which just prepends the RTL mark at the start of every subtitle event.

@ShlomoCode can you try this version once it's finished rebuilding?

@sfan5
Copy link
Member

sfan5 commented Nov 27, 2023

Updated to add a force mode which just prepends the RTL mark at the start of every subtitle event.

That sounds like the least flexible solution, needs a better fix IMO.

@llyyr
Copy link
Contributor Author

llyyr commented Nov 27, 2023

That sounds like the least flexible solution, needs a better fix IMO.

That's just the auto detection mode, the force solution works in every situation where the auto detection fails. I'm not sure if there's a better fix here from our side at least

I asked in #libass and this is what I was recommended to do, I suppose libass could provide an API to do this on their side instead

@ShlomoCode
Copy link

ShlomoCode commented Nov 27, 2023

@ShlomoCode can you try this version once it's finished rebuilding?

  • --sub-detect-rtl=force Displays everything without RTL, even when the first word in the sentence is in Hebrew...
  • --sub-detect-rtl (noforce) is correct only when the first word is in Hebrew.
  • when I tried to load 2 subtitle tracks + --sub-detect-rtl=force, the player crashed on opening:
CleanShot.2023-11-28.at.01.21.02-converted.mp4
  • Another point is that it should be possible to activate the RTL by force only on a single subtitle strip, otherwise apparently in the case of double subtitles (for example Hebrew and English) and activating force mode, the English subtitles will be displayed distorted with RTL...

@llyyr
Copy link
Contributor Author

llyyr commented Nov 28, 2023

I'm afraid it's not possible to display this correctly then without changes from libass side. I still think there's some merit to having the sub-detect-rtl option without the force setting, so I'll change the PR back to that.

libass assumes all text to be left-to-right for compatibility with
VSFilter. We can force auto detection by setting Encoding=-1, but
it may break older subtitles so don't enable it by default.
@@ -2426,6 +2426,12 @@ Subtitles
``--sub-speed=25/23.976`` plays frame based subtitles which have been
loaded assuming a framerate of 23.976 at 25 FPS.

``--sub-detect-rtl=<yes|no>``
Default: no. Set the ``Encoding`` flag to ``-1`` to enable Fribidi's base
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

considering it only applies to converted subs (and thus VSFilter is not relevant), in which situation would you possibly want to set this to no?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not know how well Fribidi's auto detection algorithm works, so I'm scared of introducing a change in default that might possibly regress some LTR subtitles. If you think it should be enabled by default then I could change that. also cc @avih

I should update the description, however.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking that either:

  • it works correctly, then it can be enabled and we don't need an option
  • it doesn't work correctly, then the entire PR isn't really that useful

That it regresses any LTR content is very unlikely, Unicode has clear definitions which glyphs/scripts are RTL. You'd have to mess up really bad.

Copy link
Member

@avih avih Nov 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it works correctly, then it can be enabled and we don't need an option
That it regresses any LTR content is very unlikely

I think I mostly agree with this.

Auto RTL detection (based on the first word language apparently) is highly unlikely to get wrong IMO.

However, it is still a guess for text which is missing RTL marks. So I think it would be useful to allow disabling it too, i.e. keep it an option.

I also think it would be useful to keep the force mode (i.e. force RTL instead of autodetect) because we know for a fact that in some cases detection is not good enough, though the method, while working (of inserting RTL marks), is a bit iffy, because libass doesn't currently have other methods to control direction.

But a user script can probably add RTL EMBEDDING marks for all sub lines? so just let user scripts allow forcing RTL?

Or maybe we could add an SDH filter which inserts RTL embedding?

@sfan5 sfan5 added the priority:on-ice may be revisited later label Nov 29, 2023
@llyyr llyyr marked this pull request as draft November 29, 2023 22:44
@llyyr
Copy link
Contributor Author

llyyr commented Nov 29, 2023

Converting this to a draft for now as it has questionable usefulness in its current state. It seems libass will provide us some way to force RTL, so I can add the force setting back in then.

Currently, this option can be emulated by setting --sub-ass-style-overrides=Encoding=-1

@dyphire
Copy link
Contributor

dyphire commented Dec 4, 2023

Isn't there already an existing ASS_FEATURE_WHOLE_TEXT_LAYOUT API for implementing bidirectional text processing in libass? Isn't it better to add an option to choose whether to enable it?

@llyyr
Copy link
Contributor Author

llyyr commented Dec 4, 2023

Isn't there already an existing ASS_FEATURE_WHOLE_TEXT_LAYOUT API for implementing bidirectional text processing in libass? Isn't it better to add an option to choose whether to enable it?

Encoding=-1 already implies whole-text layout, and as seen, it's not enough for webvtt files obtained from Youtube. This is because Youtube sub tracks don't contain bidi information themselves, but are provided to the browser through css.

@TheOneric
Copy link
Contributor

TheOneric commented Dec 4, 2023

To clarify further and recap additional details from the discussion on IRC a while ago:

  • WHOLE_TEXT affects how BiDi runs are split, but does nothing about the implicit base direction
  • Encoding=-1 is a libass-specific extension (files using it will have both gabled BiDi and broken fontselection in standard ASS renderers). It changes the implicit base direction to auto from ASS’ default of ltr and force-enables WHOLE_TEXT for affected events
  • features/encoding in different sub formats:
    • for native ASS no ASS_FEATURE_*s must be enabled, it breaks rendering
    • standard WebVTT files always use auto base direction and will want WHOLE_TEXT and probably most or all of INCOMPATIBLE_EXTENSIONS
    • for SRT it’s a toss up. There are SRT files which use ASS override tags and depend on VSFilter quirks but also files which know nothing of that and would very much like WHOLE_TEXT, Encoding=-1 etc
    • I don't know enough about other subtitle formats to say what they want
  • the samples from the referenced issue are from YouTube (auto-generated and auto-translated subs iiuc). YouTube has some custom WebVTT extensions, but also apparently sets the implicit base direction to something other than auto for some scenarios (presumably language-based?) and does so via an external channel without recording the base direction in the WebVTT file itself. Its files are thus broken and need to be fixed up before playback.
    (In standard WebVTT the correct and intended way to change the base direction is to insert Unicode RTL/LTR marks as needed and arguably this is what YT should preferably do)
  • FFmpeg currently strips RTL/LTR marks from WebVTT files when converting to (shoddy) ASS

llyyr added a commit to llyyr/mpv that referenced this pull request Dec 16, 2023
Standard webvtt files always use auto base direction, so we should
always enable incompatible extensions. This currently covers bidi base
direction, as well as bidi bracket matching, and unicode soft line
wrapping.

This _fixes_ right-to-left text rendering for webvtt files which
correctly mark rtl/ltr. Webvtt files obtained from sources which
sideload the RTL information through css also see an improvement
due to the auto detection.

Additionally, we could also consider enabling this for other sub formats
in the future on a case-by-case basis.

See also: mpv-player#12985 (comment)
llyyr added a commit to llyyr/mpv that referenced this pull request Dec 16, 2023
Enable ASS_FEATURE_INCOMPATIBLE_EXTENSIONS and auto base detection
by default, and add an option to disable this if needed.

This currently means auto base direction, changing how bidi runs are
split, bidi bracket matching, and unicode soft line wrapping.

This is strictly an improvement for Webvtt files as they always use
auto base detection. This _fixes_ right-to-left text rendering for
webvtt files which correctly mark rtl/ltr. Webvtt files obtained from
sources which sideload the RTL information through css also see an
improvement due to the auto detection.

Generally SRT files also want this, but some are also written to
workaround VSFilter quirks.

See also: mpv-player#12985 (comment)
llyyr added a commit to llyyr/mpv that referenced this pull request Dec 16, 2023
Enable ASS_FEATURE_INCOMPATIBLE_EXTENSIONS and auto base detection
by default, and add an option to disable this if needed.

This currently means auto base direction, changing how bidi runs are
split, bidi bracket matching, and unicode soft line wrapping.

This is strictly an improvement for Webvtt files as they always use
auto base detection. This _fixes_ right-to-left text rendering for
webvtt files which correctly mark rtl/ltr. Webvtt files obtained from
sources which sideload the RTL information through css also see an
improvement due to the auto detection.

Generally SRT files also want this, but some are also written to
workaround VSFilter quirks.

See also: mpv-player#12985 (comment)
llyyr added a commit to llyyr/mpv that referenced this pull request Dec 16, 2023
Enable ASS_FEATURE_INCOMPATIBLE_EXTENSIONS and auto base detection
by default, and add an option to disable this if needed.

This currently means auto base direction, changing how bidi runs are
split, bidi bracket matching, and unicode soft line wrapping.

This is strictly an improvement for Webvtt files as they always use
auto base detection. This _fixes_ right-to-left text rendering for
webvtt files which correctly mark rtl/ltr. Webvtt files obtained from
sources which sideload the RTL information through css also see an
improvement due to the auto detection.

Generally SRT files also want this, but some are also written to
workaround VSFilter quirks.

See also: mpv-player#12985 (comment)
llyyr added a commit to llyyr/mpv that referenced this pull request Dec 16, 2023
Enable ASS_FEATURE_{WHOLE_TEXT_LAYOUT, BIDI_BRACKETS} and auto base
detection by default, and add an option to disable this if needed.

This is strictly an improvement for webvtt files as they always use
auto base detection. This _fixes_ right-to-left text rendering for
webvtt files which correctly mark rtl/ltr. Webvtt files obtained from
sources which sideload the RTL information through css also see an
improvement due to the auto detection.

Generally SRT files also want this, but some are also written to
workaround VSFilter quirks.

See also: mpv-player#12985 (comment)
llyyr added a commit to llyyr/mpv that referenced this pull request Dec 16, 2023
Enable ASS_FEATURE_{WHOLE_TEXT_LAYOUT, BIDI_BRACKETS} and auto base
detection by default, and add an option to disable this if needed.

This is strictly an improvement for webvtt files as they always use
auto base detection. This _fixes_ right-to-left text rendering for
webvtt files which correctly mark rtl/ltr. Webvtt files obtained from
sources which sideload the RTL information through css also see an
improvement due to the auto detection.

Generally SRT files also want this, but some are also written to
workaround VSFilter quirks.

See also: mpv-player#12985 (comment)
llyyr added a commit to llyyr/mpv that referenced this pull request Dec 16, 2023
Enable ASS_FEATURE_{WHOLE_TEXT_LAYOUT, BIDI_BRACKETS} and auto base
detection by default, and add an option to disable this if needed.

This is strictly an improvement for webvtt files as they always use
auto base detection. This _fixes_ right-to-left text rendering for
webvtt files which correctly mark rtl/ltr. Webvtt files obtained from
sources which sideload the RTL information through css also see an
improvement due to the auto detection.

Generally SRT files also want this, but some are also written to
workaround VSFilter quirks.

See also: mpv-player#12985 (comment)
llyyr added a commit to llyyr/mpv that referenced this pull request Dec 25, 2023
Enable ASS_FEATURE_{WHOLE_TEXT_LAYOUT, BIDI_BRACKETS} and auto base
detection by default, and add an option to disable this if needed.

This is strictly an improvement for webvtt files as they always use
auto base detection. This _fixes_ right-to-left text rendering for
webvtt files which correctly mark rtl/ltr. Webvtt files obtained from
sources which sideload the RTL information through css also see an
improvement due to the auto detection.

Generally SRT files also want this, but some are also written to
workaround VSFilter quirks.

See also: mpv-player#12985 (comment)
@TheOneric
Copy link
Contributor

FFmpeg currently strips RTL/LTR marks from WebVTT files when converting to (shoddy) ASS

update: ffmpeg git master now preserves bidi marks

llyyr added a commit to llyyr/mpv that referenced this pull request Apr 14, 2024
Enable ASS_FEATURE_{WHOLE_TEXT_LAYOUT, BIDI_BRACKETS} and auto base
detection by default, and add an option to disable this if needed.

This is strictly an improvement for webvtt files as they always use
auto base detection. This _fixes_ right-to-left text rendering for
webvtt files which correctly mark rtl/ltr. Webvtt files obtained from
sources which sideload the RTL information through css also see an
improvement due to the auto detection.

Generally SRT files also want this, but some are also written to
workaround VSFilter quirks.

See also: mpv-player#12985 (comment)
kasper93 pushed a commit that referenced this pull request Apr 17, 2024
Enable ASS_FEATURE_{WHOLE_TEXT_LAYOUT, BIDI_BRACKETS} and auto base
detection by default, and add an option to disable this if needed.

This is strictly an improvement for webvtt files as they always use
auto base detection. This _fixes_ right-to-left text rendering for
webvtt files which correctly mark rtl/ltr. Webvtt files obtained from
sources which sideload the RTL information through css also see an
improvement due to the auto detection.

Generally SRT files also want this, but some are also written to
workaround VSFilter quirks.

See also: #12985 (comment)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
priority:on-ice may be revisited later
Projects
None yet
Development

Successfully merging this pull request may close these issues.

RTL display for subtitles
6 participants