Unicode characters replaced with Unicode black square (U+25a0) #1125

Closed
rogerbinns opened this issue Mar 7, 2023 · 2 comments

@rogerbinns

Description of problem

I have an rst file containing Unicode characters, such as line-drawing characters or text from the front page of Wikipedia. Once the file is converted to PDF, those characters are replaced with black squares.

Note that this is not an artifact of how the PDF viewer renders the text. If you copy the text out of the viewer, it really does contain the black square code points. I tested this with both the default Linux PDF viewer and Apple's iPad PDF viewer.

My file encoding is UTF-8, and I also tried specifying that on the code-block, which made no difference. I can't find a solution in the manual or through searches.

What is the expected output? What do you see instead?

This file has some line drawing and then text from Wikipedia. (GitHub wouldn't let me upload it uncompressed.)

foo.rst.gz

It is also available in a gist that renders correctly.

This is what it looks like if rst2html is run:

image

And this is page 1 from running rst2pdf:

image

And page 2 with the Wikipedia text:

image

This is copied and pasted from the HTML output:

Polski العربية Deutsch English Español Français Italiano مصرى Nederlands 日本語

This is copied and pasted from the PDF:

Polski ■■■■■■■ Deutsch English Español Français Italiano ■■■■ Nederlands ■■■
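
For anyone wanting to reproduce the comparison, here is a minimal Python sketch that lines up the two pasted strings and lists which characters ended up as U+25A0. The function name `replaced_characters` is made up for illustration, and the sketch assumes both strings have the same length, which holds for the two samples above.

```python
import unicodedata

BLACK_SQUARE = "\u25a0"


def replaced_characters(expected, actual):
    """Return the distinct characters of `expected` that became U+25A0 in `actual`."""
    if len(expected) != len(actual):
        raise ValueError("strings differ in length; cannot compare position by position")
    seen = []
    for want, got in zip(expected, actual):
        # Only count positions where a real character turned into a black square.
        if got == BLACK_SQUARE and want != BLACK_SQUARE and want not in seen:
            seen.append(want)
    return seen


# The two samples pasted above (HTML output first, PDF output second).
html_text = "Polski العربية Deutsch English Español Français Italiano مصرى Nederlands 日本語"
pdf_text = "Polski ■■■■■■■ Deutsch English Español Français Italiano ■■■■ Nederlands ■■■"

missing = replaced_characters(html_text, pdf_text)
for ch in missing:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}")
```

In these samples only the Arabic and CJK characters are affected; the accented Latin letters in "Español" and "Français" survive the conversion.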

🖥 Versions

python -V
Python 3.10.7
pip freeze | grep rst2pdf
rst2pdf==0.99
pip freeze | grep reportlab
reportlab==3.6.12

Which operating system are you using?

Ubuntu 22.10

@lornajane
Contributor

Thanks for the clear bug report, this is always appreciated here! I am not sure what to suggest, except perhaps trying a font with more symbol coverage. Maybe someone with more expertise in this area could chime in to assist.

@akrabat
Member

akrabat commented Apr 24, 2023

This is related to fonts. The standard PDF fonts have a very limited set of glyphs, so you need to provide your own for the glyphs you need. For your example, I grabbed unifont and then created a stylesheet:

foo.yaml:

---

# Download unifont-15.0.01.ttf from http://www.unifoundry.com/unifont/index.html
# and rename it to unifont.ttf
embeddedFonts: 
  - [unifont.ttf, unifont.ttf, unifont.ttf, unifont.ttf]

fontsAlias: 
  fontSans: unifont

You can now create your PDF using rst2pdf foo.rst -s foo.yaml and it will generate foo.pdf with the correct glyphs for nearly all the characters:

image
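
To find out which characters fall outside "nearly all", one option is to compare the text against the code points the font actually maps. The following is a rough Python sketch, not part of rst2pdf: the helper names `missing_from_font` and `load_coverage` are hypothetical, and it assumes the third-party fontTools package is installed (`pip install fonttools`).

```python
def missing_from_font(text, covered_codepoints):
    """Return the sorted distinct non-space characters in `text` with no glyph mapping."""
    return sorted(
        {ch for ch in text if not ch.isspace() and ord(ch) not in covered_codepoints}
    )


def load_coverage(ttf_path):
    """Read the set of code points mapped by a TrueType font.

    Assumes fontTools is available; the import is local so the pure
    helper above can be used without it.
    """
    from fontTools.ttLib import TTFont  # third-party dependency

    with TTFont(ttf_path) as font:
        # getBestCmap() returns a {codepoint: glyph name} dict, or None.
        return set(font.getBestCmap() or {})


# Hypothetical usage with the files from this thread:
#   with open("foo.rst", encoding="utf-8") as handle:
#       text = handle.read()
#   for ch in missing_from_font(text, load_coverage("unifont.ttf")):
#       print(f"U+{ord(ch):04X} {ch}")
```

Any character this reports would still need a different font, since a font can only draw the code points it maps.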
