Ñ crashes pandoc #242

Closed
lselzer opened this issue Aug 28, 2014 · 25 comments

@lselzer

lselzer commented Aug 28, 2014

Hi, the letter Ñ is very common in Spanish. Using it crashes pandoc.


---
title: "Untitled"
author: "Luciano Selzer"
date: "Thursday, August 28, 2014"
output: ioslides_presentation

---

## Slide

Ñato

Using the knit button I get

pandoc.exe: Cannot decode byte '\xc3': Data.Text.Encoding.Fusion.streamUtf8: Invalid UTF-8 stream
Error: pandoc document conversion failed with error 1
> sessionInfo()
R version 3.1.1 (2014-07-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=Spanish_Spain.1252  LC_CTYPE=Spanish_Spain.1252   
[3] LC_MONETARY=Spanish_Spain.1252 LC_NUMERIC=C                  
[5] LC_TIME=Spanish_Spain.1252    

attached base packages:
[1] stats4    splines   stats     graphics  grDevices utils     datasets  methods  
[9] base     

other attached packages:
[1] pander_0.3.8   lsmeans_2.11   lawstat_2.4.1  VGAM_0.9-4     mvtnorm_1.0-0 
[6] ggplot2_1.0.0  magrittr_1.1.0 reshape2_1.4   knitr_1.6     

loaded via a namespace (and not attached):
 [1] colorspace_1.2-4 digest_0.6.4     evaluate_0.5.5   formatR_1.0     
 [5] grid_3.1.1       gtable_0.1.2     htmltools_0.2.4  lattice_0.20-29 
 [9] markdown_0.7.4   MASS_7.3-33      Matrix_1.1-4     multcomp_1.3-6  
[13] munsell_0.4.2    plyr_1.8.1       proto_0.3-10     Rcpp_0.11.2     
[17] rmarkdown_0.2.64 sandwich_2.3-2   scales_0.2.4     slidify_0.4.5   
[21] stringr_0.6.2    survival_2.37-7  TH.data_1.0-3    tools_3.1.1     
[25] whisker_0.3-2    yaml_2.1.13      zoo_1.7-11      

I tried changing the encoding between ISO-8859-1 and UTF-8, but it still fails.

I believe it's related to issues #170 and #134

Any workaround?
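
For readers unfamiliar with the error message above: in UTF-8, `Ñ` (U+00D1) is stored as the two bytes `0xC3 0x91`, so `'\xc3'` is the first byte of that character. A strict decoder rejects `0xC3` whenever it is not followed by a valid continuation byte. The Python sketch below only illustrates that general rule. It uses Python's decoder as a stand-in for pandoc's, and the helper name `first_bad_byte` is made up for this example; nothing here shows where the byte stream actually gets damaged.

```python
# Illustration only: Python's strict UTF-8 decoder stands in for pandoc's.
encoded = "\u00d1".encode("utf-8")     # "Ñ", written as an escape to stay ASCII-safe
print(encoded.hex(" "))

# The complete two-byte sequence decodes fine.
assert encoded.decode("utf-8") == "\u00d1"


def first_bad_byte(data: bytes):
    """Return the byte at which strict UTF-8 decoding fails, or None."""
    try:
        data.decode("utf-8")
        return None
    except UnicodeDecodeError as err:
        return data[err.start]


# A lead byte whose continuation byte is missing or replaced is invalid.
truncated = first_bad_byte(b"\xc3")
replaced = first_bad_byte(b"\xc3?ato")
print(hex(truncated), hex(replaced))
```

Both failing inputs are rejected at the `0xC3` byte, which matches the byte named in the pandoc message.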

@lselzer
Author

lselzer commented Aug 28, 2014

It seems that something funny is going on with encodings. I saved the file as UTF-8, knit it myself, and then used the console to convert it to HTML, and it works.

@jmcphers
Member

If it's really the same issue as #134 (and it looks like it is) then you should only see it if you're making an ioslides_presentation. Do you get an error if you change the output format to html_document?

@lselzer
Author

lselzer commented Aug 28, 2014

It works with html_document.

@lselzer
Author

lselzer commented Aug 29, 2014

If it helps further, this works with ioslides_presentation:

---
title: "Ñato"
author: "Luciano Selzer"
date: "Thursday, August 28, 2014"
output: ioslides_presentation
---

## Something Else

This is an R Markdown presentation. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see <http://rmarkdown.rstudio.com>.

When you click the **Knit** button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document.

## Slide with Bullets

- Bullet 1
- Bullet 2
- Bullet 3

@lselzer
Author

lselzer commented Aug 29, 2014

This works too:

---
title: "Ñato"
author: "Luciano Selzer"
date: "Thursday, August 28, 2014"
output: ioslides_presentation
---

## Something Else


This works ñ

but the same line with Ñ doesn't work. It totally beats me why it works in the title but won't work anywhere else.

At least it gives me a workaround for now.

@lselzer
Author

lselzer commented Aug 29, 2014

I believe it's a pandoc issue, not an rmarkdown one, and it has to do with the use of custom templates. Using the default sample.lua also breaks this, and the latest pandoc version, 1.13.0.1, also has this bug.

@yihui
Member

yihui commented Aug 30, 2014

@lselzer There is an encoding bug in knitr 1.6, but I do not know if it is related to the issue here. Anyway, do you mind testing the current development version of knitr?

install.packages('knitr', repos = c('http://rforge.net', 'http://cran.rstudio.org'),
                 type = 'source')

@lselzer
Author

lselzer commented Aug 31, 2014

I tried with the dev version and it still fails with the same error.
This minimal example triggers the error from pandoc.

sample.lua
1.md:

## a


This doesn't work Ñ

pandoc 1.Rmd -t sample.lua -f markdown

@yihui
Member

yihui commented Sep 1, 2014

#170 has been solved by upgrading knitr to its development version (>= 1.6.15). For #134 and this issue, I cannot reproduce either one on Windows using the same locale.

@lselzer
Author

lselzer commented Sep 18, 2014

I have tried it on another machine with a fresh install of Win7, R, and RStudio, and it still fails with Ñ. It's strange that it doesn't fail when the Ñ is in the YAML header.

@i-ozkan

i-ozkan commented Nov 21, 2014

Hi all,
I got the very same error for some Turkish characters (ğ, ş, etc.).
Here is the minimal example:

I hope we can figure out a way to produce ioslides in Turkish as well.

Thanks in advance!!

---
title: "Untitled"
author: "I. Ozkan"
date: "Friday, November 21, 2014"
output: ioslides_presentation
---

## R Markdown

ğ

-----Here is what I got:------

rmarkdown::render("hello.md")
C:/Users/win7/AppData/Local/Pandoc/pandoc hello.utf8.md --to html --from markdown+autolink_bare_uris+ascii_identifiers+tex_math_single_backslash-implicit_figures --output hello.html --smart --email-obfuscation none --self-contained --variable transition=0.4 --template C:\Users\win7\Documents\R\win-library\3.1\rmarkdown\rmd\ioslides\default.html --mathjax --variable mathjax-url:https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML --variable ioslides-url=C:\Users\win7\Documents\R\win-library\3.1\rmarkdown\rmd\ioslides\ioslides-13.5.1
C:/Users/win7/AppData/Local/Pandoc/pandoc hello.utf8.md --to ioslides_presentation.lua --from markdown+autolink_bare_uris+ascii_identifiers+tex_math_single_backslash-implicit_figures --output C:\Users\win7\AppData\Local\Temp\RtmpKci8et\ioslides-output1180755b36b5.html --data-dir .
pandoc.exe: Cannot decode byte '\xc3': Data.Text.Internal.Encoding.Fusion.streamUtf8: Invalid UTF-8 stream
Error: pandoc document conversion failed with error 1
In addition: Warning message:
running command 'C:/Users/win7/AppData/Local/Pandoc/pandoc hello.utf8.md --to ioslides_presentation.lua --from markdown+autolink_bare_uris+ascii_identifiers+tex_math_single_backslash-implicit_figures --output C:\Users\win7\AppData\Local\Temp\RtmpKci8et\ioslides-output1180755b36b5.html --data-dir .' had status 1

----session info:------

sessionInfo()
R version 3.1.0 (2014-04-10)
Platform: i386-w64-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=Turkish_Turkey.1254 LC_CTYPE=Turkish_Turkey.1254
[3] LC_MONETARY=Turkish_Turkey.1254 LC_NUMERIC=C
[5] LC_TIME=Turkish_Turkey.1254

attached base packages:
[1] stats graphics grDevices utils datasets methods
[7] base

other attached packages:
[1] knitr_1.8.2

loaded via a namespace (and not attached):
[1] digest_0.6.4 evaluate_0.5.5 formatR_1.0
[4] htmltools_0.2.4 rmarkdown_0.2.64 stringr_0.6.2
[7] tools_3.1.0 yaml_2.1.13

@mattflor

mattflor commented Feb 2, 2015

I got the same problems with German characters: lowercase Umlauts (ä, ö, ü) work fine but uppercase Umlauts (Ä, Ö, Ü) as well as the sharp s (ß) produce the \xc3 pandoc error.
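
One pattern in the characters reported in this thread can be checked directly. All of them encode to two UTF-8 bytes starting with `0xC3`, but the ones reported as failing (Ñ, Ä, Ö, Ü, ß) have a second byte in the range `0x80` to `0x9F`, while the ones reported as working (ñ, ä, ö, ü) have a second byte in `0xA0` to `0xBF`. The Python sketch below only tabulates the bytes. The grouping into "fails" and "works" is copied from the reports above, the helper `second_byte` is a name made up for this example, and whether this byte range is actually the cause is an assumption the thread does not confirm.

```python
# Character lists are taken from the reports in this thread; escapes keep the
# example independent of the encoding this file is saved in.
reported_failing = "\u00d1\u00c4\u00d6\u00dc\u00df"   # Ñ Ä Ö Ü ß
reported_working = "\u00f1\u00e4\u00f6\u00fc"         # ñ ä ö ü


def second_byte(ch: str) -> int:
    """Second byte of a character's two-byte UTF-8 encoding."""
    encoded = ch.encode("utf-8")
    assert len(encoded) == 2 and encoded[0] == 0xC3
    return encoded[1]


for label, chars in (("fails", reported_failing), ("works", reported_working)):
    for ch in chars:
        print(f"{label}  U+{ord(ch):04X}  c3 {second_byte(ch):02x}")

failing_in_low_range = all(0x80 <= second_byte(c) <= 0x9F for c in reported_failing)
working_in_high_range = all(0xA0 <= second_byte(c) <= 0xBF for c in reported_working)
print(failing_in_low_range, working_in_high_range)
```

If the split holds for other characters as well, it may help narrow down which processing step mishandles the second byte.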

I've created two sample presentations, one is an ioslides presentation which has the error, the other one is a Slidy presentation which works fine.

First, the ioslides presentation (utf8ioslides.Rmd):

---
title: "utf8ioslides"
output:
    ioslides_presentation:
        keep_md: true
---

## German Characters

- Lowercase Umlauts (ä, ö, ü) work fine
- Keeping any of the uppercase Umlauts (Ä, Ö, Ü) produces a pandoc error
- Sharp S (ß): pandoc error again

Here's the full error:

pandoc.exe: Cannot decode byte '\xc3': 
Data.Text.Encoding.Fusion.streamUtf8: Invalid UTF-8 stream
Error: pandoc document conversion failed with error 1
In addition: Warning message:
running command '"C:/Program Files (x86)/User/R-Studio0981059/bin/pandoc/pandoc" Preview
-1608110e2893.utf8.md --to ioslides_presentation.lua --from markdown+autolink_bare_uris
+ascii_identifiers+tex_math_single_backslash-implicit_figures --output "C:\Users\flor\App
Data\Local\Temp\Rtmp4KQ3Vs\ioslides-output161037312b9.html"' had status 1 
Execution halted

Second, everything works fine with Slidy output (utf8slidy.Rmd):

---
title: "utf8slidy"
output:
    slidy_presentation:
        keep_md: true
---

## German Characters

- Umlauts
    - Lowercase: ä, ö, ü work fine
    - Uppercase: Ä, Ö, Ü work fine, too

- Sharp S: ß also ok

If I include sessionInfo() R code chunks in the presentations the output is the same for both ioslides and Slidy:

## R version 3.0.3 (2014-03-06)
## Platform: i386-w64-mingw32/i386 (32-bit)
## 
## locale:
## [1] LC_COLLATE=German_Germany.1252  LC_CTYPE=German_Germany.1252   
## [3] LC_MONETARY=German_Germany.1252 LC_NUMERIC=C                   
## [5] LC_TIME=German_Germany.1252    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] httr_0.6.1
## 
## loaded via a namespace (and not attached):
##  [1] bitops_1.0-6    digest_0.6.8    evaluate_0.5.5  formatR_1.0    
##  [5] htmltools_0.2.6 knitr_1.9.1     RCurl_1.95-4.5  rmarkdown_0.5.1
##  [9] stringr_0.6.2   tools_3.0.3     yaml_2.1.13

Now, if I look at the file encodings I actually see a difference: In a MinGW console in the directory where I saved my presentation .Rmd files, I ran this:

$ file --mime-encoding *
utf8ioslides.Rmd:   utf-8
utf8ioslides.html:  us-ascii
utf8ioslides_files: binary
utf8slidy.Rmd:      utf-8
utf8slidy.html:     utf-8
utf8slidy.md:       utf-8

Note that I have configured RStudio to encode all source files as UTF-8.

And if I remove the Umlauts and the sharp s from the utf8ioslides.Rmd file so that everything runs through I get:

$ file --mime-encoding *
utf8ioslides.Rmd:   utf-8
utf8ioslides.html:  us-ascii
utf8ioslides.md:    utf-8
utf8ioslides_files: binary
utf8slidy.Rmd:      utf-8
utf8slidy.html:     utf-8
utf8slidy.md:       utf-8

The templates included in the rmarkdown package are located here on my machine:

/c/Program Files (x86)/User/R/R-3.0.3/library/rmarkdown/rmd/ioslides
/c/Program Files (x86)/User/R/R-3.0.3/library/rmarkdown/rmd/slidy

If I check file encodings in these directories I get this for ioslides:

$ file --mime-encoding *
default.html:              us-ascii
ioslides-13.5.1:           binary
ioslides_presentation.lua: us-ascii

and this for Slidy:

$ file --mime-encoding *
Slidy2:       binary
default.html: us-ascii

So why is it that the default.html files in both templates are encoded as us-ascii but in the presentation directory the encoding becomes utf-8 for the Slidy presentation but stays us-ascii for the ioslides presentation? Could that be the problem? Also, the .lua file is encoded as us-ascii.
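
A note on reading the `file --mime-encoding` output: the tool guesses from content, and a file made up only of 7-bit bytes is reported as `us-ascii`. Because ASCII is a subset of UTF-8, such a file is also valid UTF-8, so a `us-ascii` label on a template or a `.lua` file does not by itself indicate an encoding mismatch. The Python sketch below is a rough simplification of that kind of guess, not how `file` is implemented; the function name, the sample byte strings, and the `unknown-8bit` fallback label are all chosen for this example.

```python
def classify(data: bytes) -> str:
    """Rough content-based guess, loosely mimicking `file --mime-encoding`."""
    if all(b < 0x80 for b in data):
        return "us-ascii"          # only 7-bit bytes
    try:
        data.decode("utf-8")
        return "utf-8"             # high bytes form valid UTF-8 sequences
    except UnicodeDecodeError:
        return "unknown-8bit"      # fallback label chosen for this sketch


# Hypothetical contents, not the real rmarkdown templates or outputs.
template = b"<html><body>$body$</body></html>"
slide = "<p>\u00c4, \u00d6, \u00dc</p>".encode("utf-8")

print(classify(template))
print(classify(slide))
# ASCII-only bytes decode identically as ASCII and as UTF-8.
print(template.decode("utf-8") == template.decode("ascii"))
```

Under this reading, the label says more about whether a file happens to contain non-ASCII characters than about how it was saved.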

Sorry for the long post, but hopefully this helps somebody to figure out what goes wrong here...

@nkalvi

nkalvi commented Apr 21, 2015

@i-ozkan @mattflor

I'm wondering whether this is similar to jgm/pandoc#2101
So, if you're running under Windows, could you please run some tests with the following command to change the codepage first (in the Windows command shell before running the actual command):

chcp 65001

@yihui
Member

yihui commented Apr 21, 2015

It would be nice if everybody who had this problem could test the latest development version of rmarkdown (which aims to fix the similar issue #134):

devtools::install_github('rstudio/rmarkdown')

@nkalvi

nkalvi commented Apr 21, 2015

Ah, yes; should've checked it first.

@jmcphers you mentioned some chars fail the conversion even with the change. I'm curious as to which ones. I find these files helpful for testing:
https://github.com/bits/UTF-8-Unicode-Test-Documents

@mattflor

@yihui Just github-installed the dev version of rmarkdown as suggested, and now my ioslides test presentation renders capital German umlauts (Ä, Ö, Ü) and the sharp s (ß) perfectly, without errors. Thanks!
Any idea when this will make it into the CRAN version?

@jjallaire
Member

This should make it onto CRAN within the next 3-4 weeks.

J.J.

On Wed, Apr 22, 2015 at 4:56 AM, Matt Flor notifications@github.com wrote:

@yihui Just github-installed the dev version
of rmarkdown as suggested, and now my ioslides test presentation renders
capital German umlauts (Ä, Ö, Ü) and the sharp s (ß) perfectly, without
errors. Thanks!
Any idea when this will make it into the CRAN version?



@lselzer
Author

lselzer commented Apr 22, 2015

I can confirm that the fix works for Spanish characters. Thank you very much!!

@yihui yihui closed this as completed May 5, 2015
@ldeassis

I am still having this problem, but only when I use strange characters in a knitr::kable command while running a notebook.

In other situations it works for me.

@yihui
Member

yihui commented Nov 17, 2016

@ldeassis Could you provide a minimal, self-contained, and reproducible example? Please also include your session info:

devtools::session_info('rmarkdown')

@ldeassis


devtools::session_info('rmarkdown')

Session info ------------------------------------------------------
 setting  value
 version  R version 3.3.1 (2016-06-21)
 system   x86_64, mingw32
 ui       RStudio (1.0.44)
 language (EN)
 collate  Portuguese_Brazil.1252
 tz       America/Sao_Paulo
 date     2016-11-17

Packages ----------------------------------------------------------
 package   * version  date       source
 backports   1.0.4    2016-10-24 CRAN (R 3.3.2)
 base64enc   0.1-3    2015-07-28 CRAN (R 3.3.0)
 bitops      1.0-6    2013-08-17 CRAN (R 3.3.0)
 caTools     1.17.1   2014-09-10 CRAN (R 3.3.1)
 digest      0.6.10   2016-08-02 CRAN (R 3.3.1)
 evaluate    0.10     2016-10-11 CRAN (R 3.3.1)
 highr       0.6      2016-05-09 CRAN (R 3.3.1)
 htmltools   0.3.5    2016-03-21 CRAN (R 3.3.1)
 jsonlite    1.1      2016-09-14 CRAN (R 3.3.1)
 knitr       1.15.1   2016-11-17 local
 magrittr    1.5      2014-11-22 CRAN (R 3.3.1)
 markdown    0.7.7    2015-04-22 CRAN (R 3.3.1)
 mime        0.5      2016-07-07 CRAN (R 3.3.1)
 Rcpp        0.12.7   2016-09-05 CRAN (R 3.3.1)
 rmarkdown   1.1.9017 2016-11-17 Github (ace87ba)
 rprojroot   1.1      2016-10-29 CRAN (R 3.3.2)
 stringi     1.1.2    2016-10-01 CRAN (R 3.3.1)
 stringr     1.1.0    2016-08-19 CRAN (R 3.3.1)
 yaml        2.1.14   2016-11-12 CRAN (R 3.3.2)

The problem is when I use foreign chars in kable, take a look:

abc <- data.frame(
  a=as.factor(c(rep("Não",10), rep("Também Não", 10))),
  b= rnorm(20)
)

knitr::kable(abc)

Error in gregexpr(calltext, singleline, fixed = TRUE) :
  regular expression is invalid UTF-8

pandoc.exe: Cannot decode byte '\xe3': Data.Text.Internal.Encoding.Fusion.streamUtf8: Invalid UTF-8 stream
running command '"C:\Program Files (x86)\Pandoc/pandoc" +RTS -K512m -RTS "C:\Users\leona\AppData\Local\Temp\Rtmp8oZDZD_rs_md_3a942a5b24ea.md" --to html --output "C:\Users\leona\Documents\Survey Design.Rproj.user\DC6E55D\unsaved-notebooks\A384F7A8\1\cz2xmjj6omkax_t_rs_html_3a942a18aa7.html"' had status 1
Error: pandoc document conversion failed with error 1



@yihui
Member

yihui commented Nov 17, 2016

@ldeassis I only need the Rmd file when you send your reproducible example, which can simply be pasted here or uploaded to GitHub as a zip file. You can come to GitHub and see what your reply looks like (not very readable).

Anyway, I cannot reproduce your issue with this example:

---
output: html_notebook
---

```{r tidy=FALSE}
abc <- data.frame(
  a=as.factor(c(rep("Não",10), rep("Também Não", 10))),
  b= rnorm(20)
)

knitr::kable(abc)
```

I saved it as test.Rmd with the UTF-8 encoding, and rendered with rmarkdown::render('test.Rmd', encoding = 'UTF-8').
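
The byte in the error reported earlier in this exchange, `'\xe3'`, happens to be how `ã` is stored in Windows-1252 (the code page of the reported `Portuguese_Brazil.1252` locale), whereas UTF-8 stores it as `0xC3 0xA3`. That would be consistent with natively encoded text reaching pandoc, though nothing in this thread confirms where such a re-encoding would happen. A small Python sketch of the difference, purely as an illustration:

```python
text = "N\u00e3o"                     # "Não", written with an escape to stay ASCII-safe

as_utf8 = text.encode("utf-8")        # ã becomes the two bytes c3 a3
as_cp1252 = text.encode("cp1252")     # ã becomes the single byte e3
print(as_utf8, as_cp1252)

# The UTF-8 bytes round-trip; the Windows-1252 bytes are not valid UTF-8.
assert as_utf8.decode("utf-8") == text
try:
    as_cp1252.decode("utf-8")
    bad_byte = None
except UnicodeDecodeError as err:
    bad_byte = as_cp1252[err.start]
print(hex(bad_byte))
```

This is why saving the file as UTF-8 and passing `encoding = 'UTF-8'` to `rmarkdown::render()`, as described above, is a useful first check when the message names a byte other than `\xc3`.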

@ldeassis

The problem, I guess, is not with the render command itself but with the ‘Preview’ command in RStudio. I tested again using render and got no error, but when I use the RStudio ‘Preview’ command, I can reproduce the problem, even with your code.

Leonard


@yihui
Member

yihui commented Nov 18, 2016

@ldeassis Okay. I think this is the same issue as #860, which must have been fixed in the latest daily version of RStudio (see #860 for the link).

@github-actions

github-actions bot commented Nov 3, 2020

This old thread has been automatically locked. If you think you have found something related to this, please open a new issue by following the issue guide (https://yihui.org/issue/), and link to this old issue if necessary.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 3, 2020