Servo does not break Thai text line properly. #13088

Open
veer66 opened this issue Aug 28, 2016 · 8 comments

Comments

Contributor

@veer66 veer66 commented Aug 28, 2016

I tested on Servo a0f45c6 on Ubuntu 16.04. The text "การบ้านการบ้านการบ้าน" is supposed to be split but it isn't.

[image: servo_break1]

But it is supposed to look like this:

[image: ffbreak1]

The page for testing is here: http://file.veerkesto.net/servo_test/thai_line_break.html http://file.veer66.rocks/servo_test/thai_line_break.html

Member

@Manishearth Manishearth commented Aug 28, 2016

cc @glennw

Contributor

@mbrubeck mbrubeck commented Sep 26, 2016

Thai line breaking requires language-specific dictionaries or text analysis. UAX#14 says:

The third style is used for scripts such as Thai, which do not use spaces, but which restrict word-breaks to syllable boundaries, the determination of which requires knowledge of the language comparable to that required by a hyphenation algorithm. Such an algorithm is beyond the scope of the Unicode Standard.

Thai line breaking is implemented by libraries like libthai. To support this in Servo, we would need either a Rust binding to an existing library, or a Rust implementation of the algorithm. Then we would need to call it from TextRun::break_and_shape when shaping text using the Thai script.
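As a rough illustration of what a Rust implementation could look like, here is a minimal sketch of dictionary-based greedy longest-match segmentation. It is not libthai's algorithm and not existing Servo code: the function name `break_opportunities` and the word list passed to it are hypothetical, and a real implementation would need a full Thai dictionary plus handling for unknown words.

```rust
use std::collections::HashSet;

/// Returns the byte offsets inside `text` where a line break would be allowed,
/// using greedy longest-match against a caller-supplied word list.
/// Hypothetical helper for illustration only.
fn break_opportunities(text: &str, dictionary: &HashSet<&str>) -> Vec<usize> {
    // Byte offsets of every char boundary, including the end of the string.
    let boundaries: Vec<usize> = text
        .char_indices()
        .map(|(i, _)| i)
        .chain(std::iter::once(text.len()))
        .collect();
    let last = boundaries.len() - 1;

    let mut breaks = Vec::new();
    let mut start = 0; // index into `boundaries`
    while start < last {
        // Try the longest candidate first, then shrink one char at a time.
        let mut end = last;
        let mut matched = false;
        while end > start {
            if dictionary.contains(&text[boundaries[start]..boundaries[end]]) {
                matched = true;
                break;
            }
            end -= 1;
        }
        if matched {
            // A break is allowed after a known word, except at the end of the text.
            if end < last {
                breaks.push(boundaries[end]);
            }
            start = end;
        } else {
            // Unknown character: skip it without creating a break opportunity.
            start += 1;
        }
    }
    breaks
}
```

With a word list containing only "การบ้าน", the test string from this issue ("การบ้านการบ้านการบ้าน") yields break opportunities at byte offsets 21 and 42, i.e. between the three repetitions. The quadratic scan is fine for a sketch; a trie would be the usual choice for real text.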

Contributor Author

@veer66 veer66 commented Sep 26, 2016

Is the approach from 2007 still preferred? https://bugzilla.mozilla.org/show_bug.cgi?id=336959

Contributor

@mbrubeck mbrubeck commented Sep 26, 2016

Yes, that's similar to what I suggested in my previous comment. It looks like Gecko currently uses Pango on GTK platforms, Uniscribe on Windows, system APIs on macOS, and its own rule-based line breaker on other platforms.

Contributor Author

@veer66 veer66 commented Sep 27, 2016

Since obtaining the offsetWidth of a span does not work yet (#12939), I suppose that word-break tests will not work as expected.


@khaledhosny khaledhosny commented Oct 17, 2016

I don’t think Gecko uses Pango or Uniscribe for anything right now. ICU provides line breaking support that covers Thai and other languages, and I think Gecko is moving to replace its line breaker with ICU’s.

Member

@nox nox commented Oct 4, 2017

Sample is gone.

Contributor Author

@veer66 veer66 commented Oct 6, 2017

I put up a new URL.
