Is there a way to distinguish regular font from pseudo-bold in PyMuPDF? #2881

zhangxiaojiawow · 2023-12-12T09:20:52Z

Is your feature request related to a problem? Please describe.
Some of the text in my PDF file is rendered as pseudo-bold, and I want to distinguish it from text in the regular font. Currently, using the font size, flags and font name, I can't achieve this.

JorjMcKie (Maintainer) · 2023-12-12T10:14:40Z

This is a Discussions item, not an issue.

There are multiple methods to make non-bold text appear bold:

1. Write the same text twice, with a small offset the second time. The PDF author may have done this character-by-character or in larger chunks (word-by-word, line-by-line, etc.). This can be detected as doubled text in extractions ("aaabbbccc", "abcabcabc", etc.) together with (largely) overlapping text positions.
2. Thicken the border of each character: normally, a character is written by only filling its interior with a fill color. In addition, one can add a stroke color to also draw the character's border lines. This cannot be detected in normal text extraction (and this will not change in the foreseeable future). But you can use page.get_texttrace(). This is a low-level text extraction, which returns separate text spans for the fill color version of the text and the stroke color version. Each span contains a "type" key in its dictionary, which is normally 0 (fill color) or 1 (stroke color). Please consult its documentation.
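For the first method, here is a minimal sketch of how doubled text might be spotted when the author repeated whole words or larger chunks. It assumes the input is the list of word tuples returned by `page.get_text("words")`, i.e. `(x0, y0, x1, y1, "word", block_no, line_no, word_no)`. The helper names and the 0.8 overlap threshold are made up for illustration and are not part of PyMuPDF. Character-by-character doubling ("aaabbbccc") would need a different check.

```python
def overlap_ratio(a, b):
    """Intersection area of two (x0, y0, x1, y1) boxes, divided by the
    area of the smaller box. Returns 0.0 if the boxes do not intersect."""
    ix = min(a[2], b[2]) - max(a[0], b[0])
    iy = min(a[3], b[3]) - max(a[1], b[1])
    if ix <= 0 or iy <= 0:
        return 0.0
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    smaller = min(area_a, area_b)
    return (ix * iy) / smaller if smaller > 0 else 0.0


def find_doubled_words(words, min_overlap=0.8):
    """Return (text, bbox) for every word that is followed by an identical
    word whose box largely overlaps it (hypothetical threshold: 0.8)."""
    hits = []
    for i, w in enumerate(words):
        for v in words[i + 1:]:
            if w[4] == v[4] and overlap_ratio(w[:4], v[:4]) >= min_overlap:
                hits.append((w[4], tuple(w[:4])))
                break  # one match is enough to flag this word
    return hits
```

With a real document, this would be called as `find_doubled_words(page.get_text("words"))`. The pairwise comparison is quadratic in the number of words, which is usually acceptable for a single page.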

Here is an image explaining point 2:

Imagine both colors are black: render mode 2 (fill, then stroke) would then look like bold text.
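For the second method, a minimal sketch of filtering the output of `page.get_texttrace()` for stroked spans could look like the following. It relies on the span "type" key described above, and it assumes each span also carries "chars" (tuples whose first item is the Unicode code point), "bbox", "font" and "linewidth" keys. The helper names are hypothetical, so please check the key names against the documentation for your PyMuPDF version.

```python
def span_text(span):
    """Rebuild a span's text from its "chars" tuples, assuming the first
    item of each tuple is the character's Unicode code point."""
    return "".join(chr(c[0]) for c in span["chars"])


def stroked_spans(trace):
    """Return a summary of every span whose "type" is 1 (stroke color).
    `trace` is a list of span dictionaries as returned by get_texttrace()."""
    result = []
    for span in trace:
        if span.get("type") == 1:
            result.append(
                {
                    "text": span_text(span),
                    "bbox": span.get("bbox"),
                    "font": span.get("font"),
                    "linewidth": span.get("linewidth"),
                }
            )
    return result
```

With a real document, this would be called as `stroked_spans(page.get_texttrace())`. A stroked span whose text and bbox match a filled span (type 0) is a likely candidate for pseudo-bold text.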

zhangxiaojiawow Dec 12, 2023
Author

Hello @JorjMcKie,
Thanks for your quick reply. You are right, this should be a discussion. Your method works in my case (which is actually the second situation).
