Unicode characters replaced with Unicode black square (U+25a0) #1125

rogerbinns · 2023-03-07T18:51:53Z

Description of problem

I have an rst file containing Unicode characters such as line-drawing characters and text from the front page of Wikipedia. Once converted to PDF, those characters are replaced with black squares.

Note that this is not an artifact of how the PDF viewer renders the text. If you copy the text out of the PDF viewer, it really contains the black square codepoints. Tested on both the default Linux PDF viewer and Apple's iPad PDF viewer.

My encoding is UTF-8, and I also tried adding that to code-block, which made no difference. I can't find a solution in the manual or through searches.

What is the expected output? What do you see instead?

This file has some line drawing and then text from Wikipedia. (GitHub wouldn't let me upload it uncompressed.)

foo.rst.gz

It is also available in a gist, which renders correctly.

This is what it looks like if rst2html is run:

And this is page 1 from running rst2pdf:

And page 2 with wikipedia text:

This is copy and paste from html output:

Polski العربية Deutsch English Español Français Italiano مصرى Nederlands 日本語

This is copy and paste from PDF:

Polski ■■■■■■■ Deutsch English Español Français Italiano ■■■■ Nederlands ■■■
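The two pasted lines above can be compared position by position to see exactly which characters were lost. The following is a minimal sketch, assuming both strings have the same length and line up character for character; the function name `replaced_characters` is hypothetical and not part of rst2pdf.

```python
import unicodedata

BLACK_SQUARE = "\u25a0"  # the replacement character seen in the PDF text


def replaced_characters(original: str, extracted: str) -> list[tuple[str, str]]:
    """Return (character, Unicode name) for each character of `original`
    that shows up as U+25A0 at the same position in `extracted`.

    Assumption: the two strings are aligned one-to-one; zip() silently
    stops at the shorter string if they are not.
    """
    return [
        (src, unicodedata.name(src, "UNKNOWN"))
        for src, out in zip(original, extracted)
        if out == BLACK_SQUARE and src != BLACK_SQUARE
    ]


# Text copied from the HTML output and from the PDF, as quoted in this report.
html_text = "Polski العربية Deutsch English Español Français Italiano مصرى Nederlands 日本語"
pdf_text = "Polski ■■■■■■■ Deutsch English Español Français Italiano ■■■■ Nederlands ■■■"

for char, name in replaced_characters(html_text, pdf_text):
    print(f"U+{ord(char):04X} {name}")
```

In this sample the accented Latin letters survive while the Arabic and CJK characters do not, which is the pattern you would expect if the font in use simply has no glyphs for those scripts.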

🖥 Versions

python -V
Python 3.10.7
pip freeze | grep rst2pdf
rst2pdf==0.99
pip freeze | grep reportlab
reportlab==3.6.12

Which operating system are you using?

Ubuntu 22.10


lornajane · 2023-04-23T19:24:58Z

Thanks for the clear bug report, this is always appreciated here! I am not sure what to suggest, except perhaps a font with more symbol coverage. Maybe someone with more expertise in this area could chime in to assist.

akrabat · 2023-04-24T08:12:40Z

This is related to fonts. The standard PDF fonts have a very limited set of glyphs, so you need to provide your own for the glyphs you need. For your example, I grabbed unifont and then created a stylesheet:

foo.yaml:

---

# Download unifont-15.0.01.ttf from http://www.unifoundry.com/unifont/index.html
# and rename it to unifont.ttf
embeddedFonts: 
  - [unifont.ttf, unifont.ttf, unifont.ttf, unifont.ttf]

fontsAlias: 
  fontSans: unifont

You can now create your PDF using rst2pdf foo.rst -s foo.yaml and it will generate foo.pdf with the correct glyphs for nearly all the characters.
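To get a rough idea ahead of time of which characters in a document are likely to need a custom font, one option is to look for characters that fall outside the Windows-1252 (WinAnsi) repertoire. This is only a proxy: it assumes the standard PDF fonts cover roughly that character set, and it does not inspect any actual font file. The function name `chars_outside_winansi` is hypothetical.

```python
def chars_outside_winansi(text: str) -> set[str]:
    """Return the non-whitespace characters of `text` that cannot be
    encoded as Windows-1252.

    Assumption: cp1252 approximates what the standard PDF fonts can show;
    this is a heuristic, not a check against a real font's glyph table.
    """
    missing = set()
    for ch in text:
        if ch.isspace():
            continue
        try:
            ch.encode("cp1252")
        except UnicodeEncodeError:
            missing.add(ch)
    return missing


sample = "Polski Español Français 日本語 ─│┌┐"
for ch in sorted(chars_outside_winansi(sample)):
    print(f"U+{ord(ch):04X} {ch}")
```

Any character this reports is a candidate for the black square treatment unless the stylesheet embeds a font that contains it, as the unifont example above does.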

akrabat closed this as completed Apr 24, 2023

lornajane mentioned this issue Dec 6, 2023

Serbian Cyrillic letters not supported #1182

Open
