
Code with a very long string gets modified during highlight() #128

@cderv

Description


This comes from a report in rstudio/distill#442, where someone has a chunk containing a very long string (more than 1000 characters).

It seems R has a limitation in getParseData(): strings longer than 1000 characters are not stored in the result and are replaced by a placeholder such as [1011 chars quoted with ''']. As a result, the code highlighted by downlit::highlight() no longer matches the input.

Here is a reprex:

val <- paste0(rep('very', 200), collapse = " ")
text <- sprintf(" x <- '%s long string'", val)
downlit::highlight(text)
#> [1] " <span class='nv'>x</span> <span class='o'>&lt;-</span> <span class='s'>[1011 chars quoted with ''']</span>"
utils::getParseData(parse(text = text))
#>   line1 col1 line2 col2 id parent       token terminal
#> 7     1    2     1 1019  7      0        expr    FALSE
#> 1     1    2     1    2  1      3      SYMBOL     TRUE
#> 3     1    2     1    2  3      7        expr    FALSE
#> 2     1    4     1    5  2      7 LEFT_ASSIGN     TRUE
#> 4     1    7     1 1019  4      6   STR_CONST     TRUE
#> 6     1    7     1 1019  6      7        expr    FALSE
#>                           text
#> 7                             
#> 1                            x
#> 3                             
#> 2                           <-
#> 4 [1011 chars quoted with ''']
#> 6

val <- paste0(rep('very', 100), collapse = " ")
text <- sprintf(" x <- '%s long string'", val)
downlit::highlight(text)
#> [1] " <span class='nv'>x</span> <span class='o'>&lt;-</span> <span class='s'>'very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very long string'</span>"
utils::getParseData(parse(text = text))
#>   line1 col1 line2 col2 id parent       token terminal
#> 7     1    2     1  519  7      0        expr    FALSE
#> 1     1    2     1    2  1      3      SYMBOL     TRUE
#> 3     1    2     1    2  3      7        expr    FALSE
#> 2     1    4     1    5  2      7 LEFT_ASSIGN     TRUE
#> 4     1    7     1  519  4      6   STR_CONST     TRUE
#> 6     1    7     1  519  6      7        expr    FALSE
#>                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                text
#> 7                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  
#> 1                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 x
#> 3                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  
#> 2                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                <-
#> 4 'very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very long string'
#> 6

Created on 2022-03-01 by the reprex package (v2.0.1)
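A small check that may help when investigating: even when the text column holds the placeholder, the line1/col1/line2/col2 positions of the STR_CONST token still cover the whole string (col 7 to 1019 in the first reprex). The sketch below flags affected tokens; the regular expression is an assumption based on the placeholder format seen above, and keep.source = TRUE is passed explicitly because getParseData() returns NULL when source references are not kept (e.g. under Rscript).

```r
# Flag STR_CONST tokens whose text getParseData() replaced with a
# "[N chars quoted with 'q']" placeholder (pattern assumed from the reprex)
val <- paste0(rep("very", 200), collapse = " ")
text <- sprintf(" x <- '%s long string'", val)
pd <- utils::getParseData(parse(text = text, keep.source = TRUE))
truncated <- pd$token == "STR_CONST" &
  grepl("^\\[[0-9]+ chars quoted with '.'\\]$", pd$text)
# The positions are intact, so the original text is still recoverable
pd[truncated, c("line1", "col1", "line2", "col2", "text")]
```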

This may be an edge case for downlit, since such long strings are rarely encountered, but at least the limitation is now documented in this issue.

Maybe this could be fixed if highlight() used parsed$text, or the provided text, directly instead of parsed$data$text. I have not looked into why one is used rather than the other.
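One possible direction, sketched under assumptions: keep using the parse data for tokens, but re-read the source text for any STR_CONST token that carries the placeholder, using its recorded positions. restore_long_strings() is a hypothetical helper, not part of downlit, and this is not how downlit currently works. It assumes the columns reported by getParseData() line up with substr() character positions, which holds for plain ASCII code but may not for multibyte or wide characters. utils::getParseText(), which reads token text back from the srcfile, may also be worth checking as an alternative.

```r
# Hypothetical helper (not part of downlit): re-read the source text of
# STR_CONST tokens that getParseData() replaced with a placeholder.
# Assumes parse-data columns match substr() positions (ASCII source).
restore_long_strings <- function(text) {
  exprs <- parse(text = text, keep.source = TRUE)
  pd <- utils::getParseData(exprs)
  srcfile <- attr(exprs, "srcfile")
  placeholder <- "^\\[[0-9]+ chars quoted with '.'\\]$"
  long <- which(pd$token == "STR_CONST" & grepl(placeholder, pd$text))
  for (i in long) {
    lines <- getSrcLines(srcfile, pd$line1[i], pd$line2[i])
    n <- length(lines)
    # Trim the end of the last line first, so col1 below still refers to
    # the original column positions when the token sits on a single line
    lines[n] <- substr(lines[n], 1, pd$col2[i])
    lines[1] <- substr(lines[1], pd$col1[i], nchar(lines[1]))
    pd$text[i] <- paste(lines, collapse = "\n")
  }
  pd
}

val <- paste0(rep("very", 200), collapse = " ")
code <- sprintf(" x <- '%s long string'", val)
pd <- restore_long_strings(code)
pd$text[pd$token == "STR_CONST"]
```

Because only placeholder tokens are touched, the rest of the parse data (and any code relying on it) stays unchanged.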

This is probably not high priority; deactivating downlit in distill is a workaround for the original issue.
