Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use a local font when the font isn't embedded (bug 1766039) #14844

Closed
wants to merge 1 commit into from

Conversation

calixteman
Copy link
Contributor

- it aims to fix https://bugzilla.mozilla.org/show_bug.cgi?id=1766039;
- if the font is not present locally and it has a standard font replacement
  then use the standard font.
if (!Array.isArray(localNames)) {
localNames = [localNames];
}
localNames = localNames.map(name => `local('${name}')`).join(" ");

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there are multiple local names to try, I believe the local('...') entries should be separated by commas, not just spaces.

@Snuffleupagus
Copy link
Collaborator

What sort of overhead does this lead to, given that it'll affect all documents without embedded fonts (of which there are a lot), since it'd force the worker-thread parsing to wait for the main-thread (potentially more than once) which really doesn't seem great in the general case?

Hence I cannot help wondering if we should instead perhaps consider always loading the standard font data, since that should also circumvent these types of issues (and we'd not need to depend on users installed fonts)?
That solution would obviously also have some overhead, and would require issue #14843 to be fixed, but it'd be interesting to compare those approaches!

@calixteman calixteman marked this pull request as draft April 26, 2022 17:04
@calixteman
Copy link
Contributor Author

Please correct me if I'm wrong somewhere:

  • I'm in favor of Replace most Foxit fonts with Liberation fonts #14843: it'd help to have something more consistent and at least provide a good coverage of languages using more or less a latin alphabet;
  • but it doesn't help for asian, arabic, ... missing fonts and I'm not sure we want to embed a fallback for each language... so an idea could be:
  • in case we only use the browser font render, we don't have to load Liberation in the worker because we don't need to sanitize it or whatever: we can just use it.
  • afaik, in Firefox, fetching in the worker is in fact fetching in the main thread so afaik there's no special value to fetch in the worker (except the fact we don't need to pass a message to the main thread).

So my ideas would be:

  • if a font is missing, then just send a message to the main thread to load the missing font,
  • load a local font: local(name in pdf), local(subst 1), local(subst 2), ...
  • if we fail then load Liberation,
  • no await in the worker: we don't have to have the font,
  • in case we need to have it (should be pretty rare.... is it possible ??) just ask to the main thread to pass Liberation to the worker

This way there is no overhead in the worker, almost no overhead in main thread (compared to the current situation).
I think it's in general better to try to use system fonts when they're available: I guess most linux distros have Liberation or most of Windows setup have Times or Arial so having a list of possible substitutions would help to use them instead of loading them.

@Snuffleupagus
Copy link
Collaborator

* I'm in favor of [Replace most Foxit fonts with Liberation fonts #14843](https://github.com/mozilla/pdf.js/issues/14843): it'd help to have something more consistent and at least provide a good coverage of languages using more or less a latin alphabet;

Thanks!

* but it doesn't help for asian, arabic, ... missing fonts and I'm not sure we want to embed a fallback for each language... so an idea could be:

You're obviously correct about that, and I completely agree that don't want to embed fonts for all the worlds languages (because of the file-sizes and the complexity).

  * to try to find the font on the system, if not present try some possible fallbacks with close metrics (like Liberation and Times);
  * if I understand correctly: https://pdfium.googlesource.com/pdfium.git/+/refs/heads/main/xfa/fgas/font/fgas_fontutils.cpp#195 could help to have some substitution names.

Those ideas are perhaps better to explore in a follow-up, if that's indeed deemed necessary to fix specific bugs.

* in case we only use the browser font render, we don't have to load `Liberation` in the worker because we don't need to sanitize it or whatever: we can just use it.

* afaik, in Firefox, fetching in the worker is in fact fetching in the main thread so afaik there's no special value to fetch in the worker (except the fact we don't need to pass a message to the main thread).

While I don't know anything about the fetch-implementation in Firefox, I do recall @brendandahl mentioning that fetching in the worker-thread was slightly faster (maybe related to no postMessage overhead); please see #12726 (comment)

Please note: There's some cases where worker-thread fetching actually makes sense, even in browsers.
There's certain text-rendering modes, and patterns can also be used when rendering text, that will require having "proper" font-data available on the worker-thread (to allow building glyph-outlines). Hence the need to load/parse font-data on the worker-thread.
However, this is currently not actually properly supported since by the time that we find out about those cases the font-loading/parsing has already been done. This is a known issue, that's not trivial to fix, and which was mentioned as a follow-up in e.g. #12726 (comment)

So my ideas would be:

* if a font is missing, then just send a message to the main thread to load the missing font,

That sounds like a good idea to me...

* load a local font: `local(name in pdf), local(subst 1), local(subst 2), ...`

... and so does this.

* if we fail then load `Liberation`,

That would actually require fixing issue #14843 first though, since otherwise the fonts we load will be too incomplete to be generally helpful. Given that this shouldn't be necessary to fix the referenced bug, unless I'm misunderstanding things, this can probably be deferred until later!?

However, if/when we actually implement this we'd still need to have a way to await the Liberation...-font being loaded though. Otherwise there's a risk that rendering would start before the Liberation...-font has actually been loaded (on the main-thread), thus breaking things.
Most likely you could utilize the existing font-caching in the API to work-around that problem, but again this is probably better left for follow-up work!?

* no await in the worker: we don't have to have the font,

That'd would indeed address my concerns from #14844 (comment), about the worker-thread having to wait for the main-thread.

* in case we need to have it (should be pretty rare.... is it possible ??) just ask to the main thread to pass `Liberation` to the worker

Above I've tried to outline the situations where we need access to the font-data on the worker-thread (in browsers), however that'd most likely require a different kind of implementation overall. Let's ignore that part for now, or at least here.

This way there is no overhead in the worker, almost no overhead in main thread (compared to the current situation). I think it's in general better to try to use system fonts when they're available: I guess most linux distros have Liberation or most of Windows setup have Times or Arial so having a list of possible substitutions would help to use them instead of loading them.

Agreed, most (if not all) environments should hopefully have the "standard" fonts available.

@calixteman
Copy link
Contributor Author

* I'm in favor of [Replace most Foxit fonts with Liberation fonts #14843](https://github.com/mozilla/pdf.js/issues/14843): it'd help to have something more consistent and at least provide a good coverage of languages using more or less a latin alphabet;

Thanks!

* but it doesn't help for asian, arabic, ... missing fonts and I'm not sure we want to embed a fallback for each language... so an idea could be:

You're obviously correct about that, and I completely agree that don't want to embed fonts for all the worlds languages (because of the file-sizes and the complexity).

  * to try to find the font on the system, if not present try some possible fallbacks with close metrics (like Liberation and Times);
  * if I understand correctly: https://pdfium.googlesource.com/pdfium.git/+/refs/heads/main/xfa/fgas/font/fgas_fontutils.cpp#195 could help to have some substitution names.

Those ideas are perhaps better to explore in a follow-up, if that's indeed deemed necessary to fix specific bugs.

Yep, sure, it's was my plan, I was just enumerating what I've in mind to fix the missing font issues in general.
This patch is just a first step.

* in case we only use the browser font render, we don't have to load `Liberation` in the worker because we don't need to sanitize it or whatever: we can just use it.

* afaik, in Firefox, fetching in the worker is in fact fetching in the main thread so afaik there's no special value to fetch in the worker (except the fact we don't need to pass a message to the main thread).

While I don't know anything about the fetch-implementation in Firefox, I do recall @brendandahl mentioning that fetching in the worker-thread was slightly faster (maybe related to no postMessage overhead); please see #12726 (comment)

Yes it's correct: the price to pay in fetching in the main thread will be to use postMessage.
If we manage to not await in the worker, then the price is likely close to 0.

Please note: There's some cases where worker-thread fetching actually makes sense, even in browsers. There's certain text-rendering modes, and patterns can also be used when rendering text, that will require having "proper" font-data available on the worker-thread (to allow building glyph-outlines). Hence the need to load/parse font-data on the worker-thread. However, this is currently not actually properly supported since by the time that we find out about those cases the font-loading/parsing has already been done. This is a known issue, that's not trivial to fix, and which was mentioned as a follow-up in e.g. #12726 (comment)

If we need to have some glyphs outlines, for now it's only possible with fonts where we've the raw font data (so embedded fonts and standard fonts). If for example Liberation has already been loaded on the main thread we can pass the buffer for free in the worker but if we fall back on another one then we'll need to fetch it... but it's only in case we've to display an outline for a missing font so I guess it should be pretty rare.

So my ideas would be:

* if a font is missing, then just send a message to the main thread to load the missing font,

That sounds like a good idea to me...

* load a local font: `local(name in pdf), local(subst 1), local(subst 2), ...`

... and so does this.

* if we fail then load `Liberation`,

That would actually require fixing issue #14843 first though, since otherwise the fonts we load will be too incomplete to be generally helpful. Given that this shouldn't be necessary to fix the referenced bug, unless I'm misunderstanding things, this can probably be deferred until later!?

However, if/when we actually implement this we'd still need to have a way to await the Liberation...-font being loaded though. Otherwise there's a risk that rendering would start before the Liberation...-font has actually been loaded (on the main-thread), thus breaking things. Most likely you could utilize the existing font-caching in the API to work-around that problem, but again this is probably better left for follow-up work!?

We could add a dependency on the font to be sure the render will await when we're drawing.

* no await in the worker: we don't have to have the font,

That'd would indeed address my concerns from #14844 (comment), about the worker-thread having to wait for the main-thread.

* in case we need to have it (should be pretty rare.... is it possible ??) just ask to the main thread to pass `Liberation` to the worker

Above I've tried to outline the situations where we need access to the font-data on the worker-thread (in browsers), however that'd most likely require a different kind of implementation overall. Let's ignore that part for now, or at least here.

+1

This way there is no overhead in the worker, almost no overhead in main thread (compared to the current situation). I think it's in general better to try to use system fonts when they're available: I guess most linux distros have Liberation or most of Windows setup have Times or Arial so having a list of possible substitutions would help to use them instead of loading them.

Agreed, most (if not all) environments should hopefully have the "standard" fonts available.

@theres-waldo
Copy link

What is the next step here? Is there anything I can do to help move this forward? I run into bug 1766039 pretty frequently, making it a challenge to use Firefox as the default PDF viewer on Linux.

@calixteman
Copy link
Contributor Author

This PR is superseeded by #16363.

@calixteman calixteman closed this Apr 27, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants