Quarto render error #7029

Closed
Damonsoul opened this issue Sep 27, 2023 · 18 comments · Fixed by #7948
Labels
bug Something isn't working windows
@Damonsoul

Bug description

Quarto fails to render when a .qmd document contains Chinese characters, in some situations.

Steps to reproduce

https://github.com/Damonsoul/quartotest

Expected behavior

Both t1.qmd and t2.qmd render successfully.

Actual behavior

t1.qmd fails to render and returns:
����: lexical error: invalid char in json text.
{r}\r\nt = "娴嬭瘯娴媆"\r\n```\r\n"},"results":"C:\Us
(right here) ------^
ִֹͣ��

t2.qmd renders successfully.
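One plausible mechanism for the garbled text in this log, offered as a hedged sketch rather than a diagnosis: the chunk is passed along as UTF-8 JSON, but something reads those bytes as CP936 (GBK). The R sketch below simulates that misreading with iconv(), using GB18030 (a superset of GBK) as a stand-in for CP936 and assuming the chunk contains 测试 ("test"). It also shows how the last byte of a multi-byte character can pair with a following backslash, which would break a JSON escape such as \" and is consistent with the "invalid char in json text" error.

```r
# Chinese text written with \u escapes so this script itself is plain ASCII.
utf8_text <- "\u6d4b\u8bd5"            # 测试 ("test")
print(charToRaw(utf8_text))            # its UTF-8 bytes: e6 b5 8b e8 af 95

# Misread the UTF-8 bytes as GB18030 (a superset of GBK/CP936).
# iconv() ignores any declared encoding and decodes the bytes using `from`.
garbled <- iconv(utf8_text, from = "GB18030", to = "UTF-8")
print(garbled)                         # 3 unrelated CJK characters

# A UTF-8 character immediately followed by a JSON-style backslash.
with_escape <- "\u6d4b\\"              # 测 followed by one backslash
garbled_escape <- iconv(with_escape, from = "GB18030", to = "UTF-8")

# 0x8b (last byte of 测) + 0x5c (backslash) form one two-byte GBK character,
# so the backslash disappears from the misread text.
grepl("\\", with_escape, fixed = TRUE)     # TRUE
grepl("\\", garbled_escape, fixed = TRUE)  # FALSE
```

If a JSON parser then sees a quote that was meant to be escaped, the string ends early and the rest of the line is no longer valid JSON.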

Your environment

"Mountain Hydrangea" Release (583b465e, 2023-06-05) for windows
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) RStudio/2023.06.0+421 Chrome/110.0.5481.208 Electron/23.3.0 Safari/537.36

Quarto check output

$ quarto check
Quarto 1.4.386
[>] Checking versions of quarto binary dependencies...
Pandoc version 3.1.8: OK
Dart Sass version 1.55.0: OK
Deno version 1.33.4: OK
[>] Checking versions of quarto dependencies......OK
[>] Checking Quarto installation......OK
Version: 1.4.386
Path: C:\Program Files\Quarto\bin
CodePage: 936

[>] Checking tools....................OK
TinyTeX: (external install)
Chromium: 869685

[>] Checking LaTeX....................OK
Using: TinyTex
Path: C:\Users\Administrator\AppData\Roaming\TinyTeX\bin\windows
Version: 2023

[>] Checking basic markdown render....OK

[>] Checking Python 3 installation....OK
Version: 3.9.16 (Conda)
Path: E:/SoftwareENU/Anaconda/python.exe
Jupyter: 5.3.0
Kernels: python3

[>] Checking Jupyter engine render....OK

[>] Checking R installation...........OK
Version: 4.3.1
Path: E:/SoftwareENU/R
LibPaths:
- E:/SoftwareENU/R/library
knitr: 1.43
rmarkdown: 2.23

[>] Checking Knitr engine render......OK

@Damonsoul Damonsoul added the bug Something isn't working label Sep 27, 2023
@cderv
Collaborator

cderv commented Sep 27, 2023

I can't reproduce the issue by:

  • cloning your repo
  • running quarto render on the files.

They all render fine.

I am on

  • Windows 11 x64 (build 22621)
  • R 4.3.1
  • Quarto 1.4.386
  • knitr: 1.44
  • rmarkdown: 2.25

Can you update knitr and rmarkdown, just in case?

Possibly an encoding issue, or maybe a locale issue... 🤔

@cderv cderv added the needs-repro Issues that are blocked until reporter provides an adequate reproduction label Sep 27, 2023
@Damonsoul
Author

I updated knitr and rmarkdown:

  • rmarkdown: 2.25
  • knitr: 1.44

t2.qmd still fails to render.

I added three more examples at https://github.com/Damonsoul/quartotest:
t3.qmd and t4.Rmd render fine and the results are correct. t3.qmd uses the Jupyter kernel and t4 is an Rmd file.

t5.qmd renders successfully, but the result (t5.html, also at https://github.com/Damonsoul/quartotest) has something wrong with it.

The error occurred on a Windows machine with codepage 936. However, when I rendered the documents on a Windows computer with utf-8, all of them were rendered correctly and produced the correct results.
@cderv

@mcanouil
Collaborator

mcanouil commented Sep 27, 2023

From what you are saying, the issue is that you are not using UTF-8, which is the encoding Pandoc (and thus Quarto) uses for both input and output; see https://pandoc.org/MANUAL.html#character-encoding.

@Damonsoul
Author

The encoding of my qmd document is UTF-8. The issue I'm facing is that when rendering with Quarto on a Windows system that does not use UTF-8, some documents render successfully and produce the correct results (such as t1.qmd), but others do not (such as t2.qmd). The iconv program is used to convert the encoding of text, but my document is already in UTF-8.

@cderv
Collaborator

cderv commented Sep 27, 2023

Thanks for the additional information @Damonsoul.

Codepage 936 is for Chinese characters on Windows, I believe. Can you share the results of chcp in cmd (BAT) and/or PowerShell windows?

On my Windows machine, chcp returns 1250. It is not UTF-8, but rendering works. I think what matters could be that the files are encoded in UTF-8.

  • Is it possible that your input files are not encoded in UTF-8? Can you check the file encoding and retry?
    When I clone your repo and open it in VS Code, all files are encoded in UTF-8.
  • Can you check that too?
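One way to answer the question above from R, without relying on an editor's encoding detection, is to read a file's raw bytes and test them with base R's validUTF8(). The helper name is_utf8_file() is hypothetical, and the sketch assumes small text files without embedded NUL bytes (rawToChar() rejects those).

```r
# Hypothetical helper: TRUE if the file's bytes form valid UTF-8.
is_utf8_file <- function(path) {
  # Read the bytes exactly as stored on disk; no decoding or re-encoding.
  bytes <- readBin(path, what = "raw", n = file.size(path))
  validUTF8(rawToChar(bytes))
}

# A file saved as UTF-8: t = "测试"
ok_file <- tempfile(fileext = ".qmd")
writeBin(charToRaw(enc2utf8("t = \"\u6d4b\u8bd5\"\n")), ok_file)

# The same Chinese text saved as GBK/CP936 bytes instead.
gbk_file <- tempfile(fileext = ".qmd")
writeBin(as.raw(c(0xb2, 0xe2, 0xca, 0xd4, 0x0a)), gbk_file)  # 测试 in GBK

is_utf8_file(ok_file)   # TRUE
is_utf8_file(gbk_file)  # FALSE
```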

on a Windows computer with utf-8

What do you mean by a UTF-8 computer? What is the code page there?

@Damonsoul
Author

Thanks for the reply @cderv
chcp returns:
(base) PS C:\Users\Administrator> chcp
活动代码页: 936 (Active code page: 936)

All documents are encoded UTF-8.

on a Windows computer with utf-8

The code page there is 65001 (Windows 10).

@Damonsoul
Copy link
Author

Damonsoul commented Sep 27, 2023

My Windows 10 system's default code page is 936, and this causes the issue I mentioned above. Windows 10 offers a way to change the code page;
see https://learn.microsoft.com/en-us/windows/apps/design/globalizing/use-utf8-code-page#set-a-process-code-page-to-utf-8. When I change the system code page from 936 to 65001, the qmd renders fine. But if I change it back to 936, it fails again. @cderv

@cderv
Collaborator

cderv commented Sep 27, 2023

Thanks for digging through this. My code page is not UTF-8 and I don't have this option activated.
I don't think we require users to use this "Beta: Use Unicode UTF-8 for worldwide language support." feature.

So there must be something we can do. I would like to understand which part of our code is throwing this. It is possible that this is not linked to Pandoc's UTF-8 requirement at all.

Can you run it again after activating the stack trace? https://quarto.org/docs/troubleshooting/#get-a-stack-trace

This will tell us exactly where this fails; the error seems linked to a JSON parsing issue.

@cderv cderv added needs-repro Issues that are blocked until reporter provides an adequate reproduction windows and removed needs-repro Issues that are blocked until reporter provides an adequate reproduction labels Sep 27, 2023
@Damonsoul
Author

Damonsoul commented Sep 27, 2023

Thanks @cderv
As follows:
(base) PS C:\Users\Administrator> $env:QUARTO_PRINT_STACK = "true"
(base) PS C:\Users\Administrator> cd E:\R\Rproject\quartotest
(base) PS E:\R\Rproject\quartotest> quarto render t1.qmd
����: lexical error: invalid char in json text.
{r}\r\nt = "娴嬭瘯娴媆"\r\n```\r\n"},"results":"C:\Us
(right here) ------^
ִֹͣ��
ERROR: Error
at renderFiles (file:///C:/Program%20Files/Quarto/bin/quarto.js:75885:29)
at eventLoopTick (ext:core/01_core.js:181:11)
at async render (file:///C:/Program%20Files/Quarto/bin/quarto.js:80860:21)
at async Command.fn (file:///C:/Program%20Files/Quarto/bin/quarto.js:81037:32)
at async Command.execute (file:///C:/Program%20Files/Quarto/bin/quarto.js:8111:13)
at async quarto (file:///C:/Program%20Files/Quarto/bin/quarto.js:111534:5)
at async file:///C:/Program%20Files/Quarto/bin/quarto.js:111552:9

@dragonstyle dragonstyle added this to the v1.5 milestone Nov 28, 2023
@Damonsoul
Author

@cderv I changed line 140 of 'rmd.R' in Quarto 1.4.527 from 'input <- readLines(stdin, warn = FALSE)' to 'input <- readLines(stdin, warn = FALSE, encoding = "UTF-8")', and it works well.
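Why this one-argument change can matter, as a hedged sketch: according to the R documentation, the encoding argument of readLines() does not convert anything; it only marks the returned strings as UTF-8. Without it, the strings are not declared as UTF-8, so in a non-UTF-8 locale R treats them as being in the native encoding (CP936 in this report) and valid UTF-8 bytes can later be decoded as GBK. A temporary file stands in here for the stdin stream that rmd.R reads.

```r
# The line t = "测试" from a chunk, written to disk as UTF-8 bytes.
payload <- "t = \"\u6d4b\u8bd5\""
path <- tempfile(fileext = ".txt")
writeBin(charToRaw(paste0(payload, "\n")), path)

unmarked <- readLines(path, warn = FALSE)                      # no declaration
marked   <- readLines(path, warn = FALSE, encoding = "UTF-8")  # declared UTF-8

# The bytes are identical: encoding = "UTF-8" does not re-encode anything...
identical(charToRaw(unmarked), charToRaw(marked))   # TRUE
# ...it only records how those bytes should be interpreted.
Encoding(marked)                                    # "UTF-8"
```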

@cderv
Collaborator

cderv commented Dec 18, 2023

Oh interesting ! Let me check that !

@cderv
Collaborator

cderv commented Dec 18, 2023

@Damonsoul I opened a PR with a change. Are you able to try the dev version of Quarto? (https://github.com/quarto-dev/quarto-cli#development-version)

Otherwise, we'll do a build, or you can apply my change locally as you did. I do think we need to set the encoding in the file() call too, to be safe.

@cderv cderv modified the milestones: v1.5, v1.4 Dec 18, 2023
@Damonsoul
Author

@cderv It raises the following error if I add encoding = "UTF-8" to file():
"""
����: lexical error: invalid bytes in UTF8 string.
:"Ŀ¼","toc-title-website":"��ҳ������","related-formats-ti
(right here) ------^
����: Warning message:
In xfun::read_utf8(stdin, error = FALSE) :
These lines contain invalid UTF-8 characters: 1
ִֹͣ��
"""

@cderv
Collaborator

cderv commented Dec 18, 2023

Oh ok. Thank you for trying. Could it be that stdin is not UTF-8, then... Can you tell me what getOption("encoding") returns on your system?

@Damonsoul
Copy link
Author

@cderv
getOption("encoding") return :

getOption("encoding")
[1] "native.enc"

and my sys encoding is cp936 as follow:
"
C:\Users\Administrator>chcp
活动代码页: 936
"

@cderv cderv linked a pull request Dec 18, 2023 that will close this issue
@cderv
Copy link
Collaborator

cderv commented Dec 19, 2023

Thanks a lot. I tweaked it to simply:

  stdin <- file("stdin", "r", encoding = "")
  input <- readLines(stdin, warn = FALSE, encoding = "UTF-8")

Hopefully it works.
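The two lines above can be wrapped in a small helper to see the intended behaviour outside Quarto. This is a sketch, not Quarto's actual rmd.R: read_utf8_lines() is a hypothetical name and a temporary file stands in for "stdin". As far as the R documentation indicates, encoding = "" on the connection means the bytes are passed through without re-encoding, and encoding = "UTF-8" in readLines() then only marks them as UTF-8. Passing "" explicitly, instead of relying on the default getOption("encoding"), presumably also guards against a user-level options(encoding = ...) setting; that rationale is an assumption.

```r
# Hypothetical helper mirroring the stdin-reading pattern above.
read_utf8_lines <- function(description) {
  # encoding = "": hand the bytes through as-is, whatever the code page is.
  con <- file(description, "r", encoding = "")
  on.exit(close(con))
  # encoding = "UTF-8": declare the unchanged bytes to be UTF-8.
  readLines(con, warn = FALSE, encoding = "UTF-8")
}

# Stand-in for the JSON Quarto would send on stdin.
path <- tempfile(fileext = ".json")
writeBin(charToRaw("{\"title\":\"\u6d4b\u8bd5\"}\n"), path)

lines <- read_utf8_lines(path)
Encoding(lines)    # "UTF-8"
validUTF8(lines)   # TRUE
```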

I see you are using R 4.3.1. Do you have an older version of R available, like R 4.0.5, to test? On Windows, R 4.2 switched to UTF-8 as the native encoding, so I wonder if there is a difference. If you are willing to help test, rig (https://github.com/r-lib/rig) can help you install several R versions.
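For that kind of comparison across R versions, a short diagnostic can report how each session is set up. A hedged sketch using base R only: l10n_info() reports whether the session's native encoding is UTF-8 (R 4.2.0 and later use UTF-8 as the native encoding on recent Windows 10/11 builds), and on Windows it also includes the code page R is using.

```r
info <- l10n_info()
# TRUE when the session's native encoding is UTF-8.
native_is_utf8 <- isTRUE(info[["UTF-8"]])

cat("R version:            ", R.version.string, "\n")
cat("Native encoding UTF-8:", native_is_utf8, "\n")
cat("getOption('encoding'):", getOption("encoding"), "\n")

if (.Platform$OS.type == "windows") {
  # Windows-only field: the code page this R session runs with.
  cat("R code page:          ", info[["codepage"]], "\n")
}
```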

@Damonsoul
Copy link
Author

@cderv I tested the configuration stdin <- file("stdin", "r", encoding = "") and input <- readLines(stdin, warn = FALSE, encoding = "UTF-8") on R 4.0.5, and it ran successfully. Thank you for fixing it!

@cderv
Collaborator

cderv commented Dec 20, 2023

Awesome! Thanks for confirming !

And thanks a lot for the report. There may be other bad side effects due to encoding, so please do not hesitate to open new issues to let us know. Thank you!


4 participants