Servo does not break Thai text line properly. #13088

Open
veer66 opened this Issue Aug 28, 2016 · 8 comments

veer66 commented Aug 28, 2016

I tested Servo a0f45c6 on Ubuntu 16.04. The text "การบ้านการบ้านการบ้าน" should be broken across lines, but it isn't.

[Screenshot: servo_break1]

But it is supposed to look like this:

[Screenshot: ffbreak1]

The page for testing is here: http://file.veerkesto.net/servo_test/thai_line_break.html http://file.veer66.rocks/servo_test/thai_line_break.html

Manishearth commented Aug 28, 2016

cc @glennw

mbrubeck commented Sep 26, 2016

Thai line breaking requires language-specific dictionaries or text analysis. UAX #14 says:

> The third style is used for scripts such as Thai, which do not use spaces, but which restrict word-breaks to syllable boundaries, the determination of which requires knowledge of the language comparable to that required by a hyphenation algorithm. Such an algorithm is beyond the scope of the Unicode Standard.

Thai line breaking is implemented by libraries like libthai. To support this in Servo, we would need either a Rust binding to an existing library or a Rust implementation of the algorithm. We would then need to call it from `TextRun::break_and_shape` when shaping text in the Thai script.
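
As a rough illustration of the dictionary-based approach mentioned above, here is a minimal Rust sketch of greedy longest-match segmentation. It is not libthai's algorithm or API, and it is not Servo code: the function name `break_opportunities` and the tiny word list in the usage note are hypothetical, chosen only to show the idea. A real implementation would need a full dictionary, handling for unknown words, and care not to break before combining marks.

```rust
use std::collections::HashSet;

/// Hypothetical helper (not part of Servo or libthai): greedy longest-match
/// segmentation against a word list. Returns the byte offsets in `text`
/// where a line break would be allowed between two words.
fn break_opportunities(text: &str, dict: &HashSet<&str>) -> Vec<usize> {
    // Byte offsets of every char boundary, including the end of the string.
    let bounds: Vec<usize> = text
        .char_indices()
        .map(|(i, _)| i)
        .chain(std::iter::once(text.len()))
        .collect();

    let mut breaks = Vec::new();
    let mut start = 0; // index into `bounds`
    while start + 1 < bounds.len() {
        // Try the longest candidate first. If nothing matches, fall back to
        // a single char (a simplification: real segmenters treat unknown
        // text more carefully).
        let mut end = start + 1;
        for cand in (start + 1..bounds.len()).rev() {
            if dict.contains(&text[bounds[start]..bounds[cand]]) {
                end = cand;
                break;
            }
        }
        start = end;
        // Record a break opportunity unless we have reached the end of text.
        if start + 1 < bounds.len() {
            breaks.push(bounds[start]);
        }
    }
    breaks
}
```

With a word list containing "การบ้าน", "การ", and "บ้าน", the text from this issue would be split into three copies of "การบ้าน", giving two break opportunities. The returned offsets are the kind of information a caller such as `TextRun::break_and_shape` would need, though how it would consume them is left open here.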

veer66 commented Sep 26, 2016

Is this approach from 2007 still preferred? https://bugzilla.mozilla.org/show_bug.cgi?id=336959

mbrubeck commented Sep 26, 2016

Yes, that's similar to what I suggested in my previous comment. It looks like Gecko currently uses Pango on GTK platforms, Uniscribe on Windows, system APIs on macOS, and its own rule-based line breaker on other platforms.

mbrubeck referenced this issue Sep 27, 2016: Implement `word-break: keep-all` (#9673) #13414 (merged)
veer66 commented Sep 27, 2016

Since obtaining the `offsetWidth` of a span does not work yet (#12939), I suppose the word-break tests do not work as expected.

khaledhosny commented Oct 17, 2016

I don’t think Gecko uses Pango or Uniscribe for anything right now. ICU provides line breaking support that covers Thai and other languages, and I think Gecko is moving to replace its line breaker with ICU’s.

nox commented Oct 4, 2017

Sample is gone.

veer66 commented Oct 6, 2017

I have put up a new URL.
