Displaying unicode or different language like Bengali doesn't work. #598

Closed
monirz opened this issue Jan 4, 2020 · 23 comments
Labels
Internationalisation I18n and support for non-bundled fonts and languages

Comments

@monirz

monirz commented Jan 4, 2020

I'm trying to make an application using Fyne, where I need to display Bengali. But didn't find a way to display it. Is there a way to display Unicode/different language?

@andydotxyz
Member

The built-in fonts do not support all languages, and Bengali is one that is not included, sorry. If you download a TTF font file and specify its path in the FYNE_FONT environment variable then you should be able to use the characters you want. “Noto Sans Bengali” would match the built-in fonts well.

@monirz
Author

monirz commented Jan 5, 2020

Okay, I tried the FYNE_FONT variable and it now shows letters instead of blank boxes, but it's broken. I tried a few popular Bengali fonts and the result is the same.
Also, I need the fonts bundled into the binary, so the user won't have to provide the font and I don't have to guess whether the user already has the fonts installed.

@andydotxyz
Member

Can you expand a bit more, please, on “it's broken”? If the letters are displayed, what else do you expect?

@monirz
Author

monirz commented Jan 5, 2020

I think I used the term "letter" wrong; it's more that the font rendering is broken, meaning it cannot render the font properly.

@monirz
Author

monirz commented Jan 5, 2020

This is the link to the image of how it looks with Fyne, and this is how it should look: link

@andydotxyz
Member

As someone who knows the language can you explain what the differences are - or what a possible cause may be?

It looks to me like characters are supposed to combine in some way - but that is an uneducated guess.

@monirz
Author

monirz commented Jan 6, 2020

Well, that problem also happens in some other cases, such as in Chrome when the proper font is not installed on the system. But in this case I'm not sure.

@mrezai

mrezai commented Jan 8, 2020

I can't read Bengali but it seems this problem is related to "text shaping" and to support it you need something like HarfBuzz
Some related links:
https://en.wikipedia.org/wiki/Complex_text_layout
https://github.com/grisha/hbshape

@mrezai

mrezai commented Jan 8, 2020

My previous comment was a guess. Now, after running the demo, it seems that text rendering for languages known as "complex scripts" hasn't been implemented at all.
In addition to HarfBuzz, something like ICU is needed for BiDi.
I think it's a good idea to add CTL to milestone 2.0, and all of this means "opening Pandora's box"!

@beoran

beoran commented Jan 8, 2020

For non-complex scripts such as Japanese or Chinese, etc. it would be enough to be able to embed a custom font in the binary using, e.g. go-bindata, if only fyne had a way to set that font other than setting FYNE_FONT.

Edit: additionally, support for .otf and .ttc fonts might be very useful.

@andydotxyz
Member

Applications can provide their own font by implementing a custom theme (like https://github.com/andydotxyz/beebui).
The process for doing so should be easier though!

@andydotxyz
Member

Also I agree we should aim for full internationalisation in 2.0

@kkartaltepe

IMO the easiest example of lack of support for shaping is in Arabic (and related languages). A simple test like
مرحبا
arabic_example
You can see the lack of RTL support (Full height Vertical character will appear on the right), and the lack of shaping (characters rendered individually instead of connected like cursive).
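
The two problems described above (ordering and shaping) are separate, which a small standard-library sketch can illustrate. The helper names `isRTLRune` and `naiveVisualOrder` are made up for this example and are not part of Fyne. Reversing an all-RTL word puts the first logical letter on the right, but every rune is still an isolated code point, so the letters would still be drawn unconnected without a shaper.

```go
package main

import (
	"fmt"
	"unicode"
)

// isRTLRune reports whether r belongs to a right-to-left script.
// Assumption: only Arabic and Hebrew are checked; a real implementation
// would use the Unicode Bidi_Class property instead of script tables.
func isRTLRune(r rune) bool {
	return unicode.Is(unicode.Arabic, r) || unicode.Is(unicode.Hebrew, r)
}

// naiveVisualOrder reverses a string made up only of RTL runes so that a
// left-to-right renderer draws the first logical rune on the right.
// Strings containing any other rune are returned in logical order, because
// mixed-direction text needs the full bidirectional algorithm.
func naiveVisualOrder(s string) []rune {
	runes := []rune(s)
	for _, r := range runes {
		if !isRTLRune(r) {
			return runes
		}
	}
	for i, j := 0, len(runes)-1; i < j; i, j = i+1, j-1 {
		runes[i], runes[j] = runes[j], runes[i]
	}
	return runes
}

func main() {
	// The test word from the comment above, written with escapes so the
	// logical (storage) order is unambiguous in the source file.
	word := "\u0645\u0631\u062D\u0628\u0627"
	for i, r := range []rune(word) {
		fmt.Printf("logical %d: %U rtl=%v\n", i, r, isRTLRune(r))
	}
	fmt.Printf("visual order: %U\n", naiveVisualOrder(word))
}
```

Reordering like this only addresses the RTL half of the problem; choosing the initial, medial, and final letter forms is the job of a text shaper.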

@Nik-U

Nik-U commented Jul 24, 2020

Rendering "non-complex scripts" (e.g., for English, Russian, or Korean) can be done with a naïve text shaper that simply queries the font file for a glyph on a rune-by-rune basis, as long as one supplies an appropriate font file. One annoying limitation is that the SFNT format (used by TTF/OTF files) is limited to 2^16 glyphs, which is not enough to support all languages + emoji. Thus, even a text shaper that is limited to non-complex scripts will need to implement "font fallback" behavior in practice if one wants to properly render strings containing arbitrary languages. For the Gio project we implemented this with support for OpenType Collection files like these giant merged Noto fonts that I prepared. Alternatively, you can use the installed system fonts and read the fallback order from the system configuration (in most Linux distributions this is handled by fontconfig).
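
A minimal sketch of the rune-by-rune lookup with font fallback described above might look like the following. The `Font`, `PlacedGlyph`, and `shapeNaive` names are hypothetical, and the character map is a plain Go map standing in for a parsed cmap table; this is neither Gio's nor Fyne's implementation.

```go
package main

import "fmt"

// GlyphID is 16 bits wide, mirroring the SFNT glyph index limit.
type GlyphID uint16

// Font is a hypothetical stand-in for a parsed font file: a name plus a
// character-to-glyph map (the role played by the cmap table).
type Font struct {
	Name string
	Cmap map[rune]GlyphID
}

// PlacedGlyph records which font supplied the glyph for a rune.
type PlacedGlyph struct {
	Rune  rune
	Font  string // empty when no font in the chain covers the rune
	Glyph GlyphID
}

// shapeNaive maps each rune to a glyph independently, trying the fonts in
// chain order and taking the first one that covers the rune. Runes that no
// font covers keep glyph 0, conventionally the "missing glyph" box.
func shapeNaive(text string, chain []Font) []PlacedGlyph {
	out := make([]PlacedGlyph, 0, len(text))
	for _, r := range text {
		pg := PlacedGlyph{Rune: r}
		for _, f := range chain {
			if id, ok := f.Cmap[r]; ok {
				pg.Font, pg.Glyph = f.Name, id
				break
			}
		}
		out = append(out, pg)
	}
	return out
}

func main() {
	// Toy fonts with made-up glyph numbers.
	latin := Font{Name: "latin", Cmap: map[rune]GlyphID{'H': 1, 'i': 2, ' ': 3}}
	kana := Font{Name: "kana", Cmap: map[rune]GlyphID{'こ': 10, 'ん': 11}}
	for _, g := range shapeNaive("Hi こん?", []Font{latin, kana}) {
		fmt.Printf("%q -> font=%q glyph=%d\n", g.Rune, g.Font, g.Glyph)
	}
}
```

Because every rune is resolved on its own, this approach cannot form ligatures, reorder vowel signs, or join letters, which is why it breaks down for scripts such as Arabic or Bengali.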

The Gio project has recently run into the same text shaping limitations for "complex scripts" (e.g., for Arabic or Bengali) that are discussed in this GitHub issue. See gio#146 for a more detailed write-up and suggested path forward. Fyne will need to walk a similar path to resolve this issue.

To summarize:

  • The key components for a solution are using HarfBuzz to compute glyphs and offsets, implementing an algorithm similar to the Pango library for preparing calls to hb_shape, implementing the Unicode bidirectional algorithm, and then gluing the resulting glyph data to the existing font and rendering systems.
  • HarfBuzz is the project that implements text shaping for complex scripts with the highest accuracy. It is so dominant in this space that it is used by the browser rendering engines (Chromium and Firefox), the major desktop GUI toolkits (Qt and GTK+), Android, and Java, among others.
  • The project is large, complex, requires a lot of resources to maintain (shaping bugfixes for languages still come in weekly even after a decade of development), and is written in C++.
  • Duplicating the effort is probably intractable; the Rustaceans opted to duplicate the effort with Allsorts, but it lags behind HarfBuzz in terms of language coverage.
  • Windows and macOS each have a proprietary alternative (Uniscribe and Core Text, respectively).

There are a few possible paths forward discussed in gio#146, summarized below:

  • Write a cgo wrapper for HarfBuzz. This adds a sizeable system dependency for compilation.
  • Try to do an automated translation of HarfBuzz code into native Go or a specialty font encoding like Graphite. We're talking about (specialized) C++ to Go transpilation here, which is definitely non-trivial. Translation of lookup tables or unicode data files alone is insufficient.
  • Try to do something clever with linking (e.g., distributing a precompiled HarfBuzz and/or linking with .syso files).
  • Accept incomplete language coverage for now. This might mean implementing support for Graphite fonts and having users wait for appropriate fonts to be released, or it might mean porting or linking to the Allsorts project.
  • Something else we haven't thought of yet.

None of these options are without drawbacks. However, resolving the HarfBuzz linking problem in one way or another would be very beneficial for a lot of Go projects—not just Fyne and Gio, but pretty much anything that needs to render arbitrary text.
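
To give a feel for the "preparing calls to hb_shape" step mentioned in the summary, here is a rough standard-library sketch of script itemization: splitting text into runs of a single script so that each run can be handed to a shaper separately. It is not Pango's algorithm. The `Run`, `scriptOf`, and `itemize` names and the short candidate list are assumptions for this example, and a real itemizer also splits on direction, font, and language.

```go
package main

import (
	"fmt"
	"unicode"
)

// Run is a maximal stretch of text assigned to a single script.
type Run struct {
	Script string
	Text   string
}

// candidates lists the scripts this sketch recognises. Assumption: a real
// itemizer would consult the full Unicode script data.
var candidates = []string{"Latin", "Arabic", "Bengali", "Gujarati"}

// scriptOf returns the first candidate script that contains r, or "" for
// runes such as spaces, digits, and punctuation that are shared between
// scripts (or belong to a script not in the list).
func scriptOf(r rune) string {
	for _, name := range candidates {
		if unicode.Is(unicode.Scripts[name], r) {
			return name
		}
	}
	return ""
}

// itemize splits text into runs of one script each. Runes without a script
// of their own are attached to the run that is currently open.
func itemize(text string) []Run {
	var runs []Run
	start, current := 0, ""
	for i, r := range text {
		s := scriptOf(r)
		if s == "" || s == current {
			continue
		}
		if current != "" {
			// The script changed: close the open run at this byte offset.
			runs = append(runs, Run{Script: current, Text: text[start:i]})
			start = i
		}
		current = s
	}
	if len(text) > 0 {
		runs = append(runs, Run{Script: current, Text: text[start:]})
	}
	return runs
}

func main() {
	// "Fyne" followed by a Bengali word, written with escapes.
	text := "Fyne \u09AC\u09BE\u0982\u09B2\u09BE"
	for _, run := range itemize(text) {
		fmt.Printf("%-8s %q\n", run.Script, run.Text)
	}
}
```

Each resulting run would then get its own direction, script, and language settings before being passed to the shaper.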

@kkartaltepe

If you resign yourself to linking HB, you can simply link fribidi (with no dependencies it's fairly nice) for bidi.

@beoran

beoran commented Jul 24, 2020

I looked at HarfBuzz and I think 50% of the code is C/C++ support code, like memory management, that we don't need in Go because the language provides it for us. Furthermore, the HarfBuzz API isn't all that great; it suffers from being very C-like due to memory allocation problems. In Go we have the standard library "unicode" package, "golang.org/x/text/language", and "x/image/font". HarfBuzz uses Ragel parsers, which can also be generated as Go instead of C. The remaining few thousand lines of code should not be too hard to convert manually, at least not for the basic shaping API. This could look something like this:

    // gotesh is only a proposed package; nothing is implemented yet.
    buffer := gotesh.Buffer(text)
    buffer.SetDirection(gotesh.LeftToRight)
    buffer.SetScript(language.Bengali.Script())
    buffer.SetLanguage(language.Bengali)
    glyphInfo, glyphPosition, err := buffer.Shape(fontFace, features)

Nothing there yet, but go here if you like the challenge ;)
https://gitlab.com/beoran/gotesh

@andydotxyz
Member

> you can simply link fribidi

Unfortunately statically linking fribidi is not an option due to licensing.

@andydotxyz
Member

> Thus, even a text shaper that is limited to non-complex scripts will need to implement "font fallback" behavior in practice

Quite right @Nik-U - we have the fallback in place so when someone loads, for example, a Japanese font, they still see the English text that the project includes etc.

I think that realistically there are multiple steps -> user defined font -> app defined font -> language font -> toolkit fallback.

We have avoided doing system lookups thus far because a consistent experience was deemed important and some distros (particularly some lightweight Linux ones) don't even ship vector fonts by default.

I did consider a massive combined font file - but this resulted in around 200MB once Japanese and Chinese (traditional and simplified) were added - which is clearly more than we can reasonably embed in binaries.
Mostly for that reason I think we may need to go down the route of a "language pack" that can be downloaded and used in a font-lookup-order mechanism like described above.
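
The lookup order proposed above (user defined font, then app defined font, then language font, then toolkit fallback) could be assembled along these lines. This is only a sketch: `fontChain` and the example paths are hypothetical, and Fyne does not expose such a function.

```go
package main

import (
	"fmt"
	"os"
)

// fontChain returns the configured font sources in lookup order: user,
// application, language pack, then the toolkit's built-in font. Sources
// that are not configured (empty strings) are skipped.
func fontChain(userFont, appFont, langFont, builtin string) []string {
	var chain []string
	for _, src := range []string{userFont, appFont, langFont, builtin} {
		if src != "" {
			chain = append(chain, src)
		}
	}
	return chain
}

func main() {
	chain := fontChain(
		os.Getenv("FYNE_FONT"),                // user defined font, if set
		"",                                    // this app bundles no font
		"langpack/NotoSansBengali-Regular.ttf", // hypothetical language pack path
		"builtin",                             // placeholder for the toolkit fallback
	)
	fmt.Println(chain)
}
```

A renderer would then consult the chain front to back for each character it cannot find in an earlier font.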

@beoran

beoran commented Aug 13, 2020

On investigating the issue, I'd say that HarfBuzz etc. are not the right idea, whereas Graphite is: https://scripts.sil.org/cms/scripts/page.php?site_id=projects&item_id=graphite_home . Instead of having to program the language rules manually, Graphite compiles them directly into the font. All that is needed is to implement a VM. That would be even easier to do in Go.

@andydotxyz
Member

The rendering issues referred to here should be resolved in v2.3.0. The code that follows seems to render well:

package main

import (
	"os"

	"fyne.io/fyne/v2/app"
	"fyne.io/fyne/v2/container"
	"fyne.io/fyne/v2/widget"
)

func main() {
	os.Setenv("FYNE_FONT", "/Users/andy/Downloads/shruti.ttf")
	a := app.New()
	w := a.NewWindow("Hello")

	hello := widget.NewLabel("િદ્ધની")
	w.SetContent(container.NewVBox(
		hello,
	))

	w.ShowAndRun()
}

Screenshot 2022-12-02 at 14 25 35

@beoran

beoran commented Dec 2, 2022

Great job porting the whole of HarfBuzz from C++ to Go! You are a master amongst masters!

@andydotxyz
Member

> Great job porting the whole of HarfBuzz from C++ to Go! You are a master amongst masters!

I appreciate the enthusiasm, but honestly all of the thanks belong to @benoitkugler for the porting work and @whereswaldon for building most of the https://github.com/go-text/typesetting project that made this possible!
There will be more improvements to the rendering efficiency which we will be sharing through go-text as well, but so far we are standing on the shoulders of giants!

@beoran

beoran commented Dec 2, 2022

Sorry, I had that mixed up. Honor to those who deserve it! Giants indeed!
