---
title: "Manipulating the nested parse table"
author: "Lorenz Walthert"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{Manipulating the nested parse table}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
> This vignette is partly outdated since the nested structure has been fully
> implemented. In particular, the serialization is now done differently.
library("dplyr")
library("purrr")
pkgload::load_all()
This vignette builds on the vignette "Data Structures" and discusses how
to go forward with the nested structure of the parse data.
In order to compute the white space information in a nested data structure, we
use a [visitor approach](https://en.wikipedia.org/wiki/Visitor_pattern) to
separate the algorithm (computing white space information and later applying
transformations) from the object (nested data structure).
The function `create_filler()` (name deprecated, now called
`default_style_guide_attributes()`) can then be used to compute current
white space information on every level of nesting within the nested parse data
if applied in combination with the visitor. In the sequel, a parse table at
one level of nesting will be denoted with the term *nest*, which always
represents a complete expression. Our visiting functions `pre_visit()` and
`post_visit()` take an object to operate on and a list of functions.
Concretely, the object is the nested parse table. With `pre_visit()`, each
function is applied at each level of nesting before the next level of nesting
is entered. You can find out more about the visitor in the help file for
`visit` (note that this function is not exported by styler).
pre_visit
## function(pd_nested, funs) {
## if (is.null(pd_nested)) return()
## pd_transformed <- visit_one(pd_nested, funs)
##
## pd_transformed$child <- map(pd_transformed$child, pre_visit, funs = funs)
## pd_transformed
## }
## <environment: namespace:styler>
visit_one
## function(pd_flat, funs) {
## reduce(funs, function(x, fun) fun(x),
## .init = pd_flat)
## }
## <environment: namespace:styler>
This comes with two advantages.
- We don't need a \*\_nested() version of every function we want to
apply to the parse tables, in particular the rules in R/rules.R.
- We go through the whole structure only as many times as we call the visitor
(instead of every \*\_nested() function going through it once), which is
more efficient in terms of speed.
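
The traversal described above can be tried out on a toy structure without styler. The following is a minimal sketch in base R, assuming a nest is a plain list with a `value` and a list of `children` rather than a parse table. The names `toy_visit_one()`, `toy_pre_visit()`, `double_value()` and `add_one()` are hypothetical and only serve the illustration.

```r
# Hypothetical toy layout: a nest is list(value = <number>, children = <list of nests>).
# This is NOT styler's parse table, only an illustration of the visiting order.

toy_visit_one <- function(nest, funs) {
  # Apply every function in turn, feeding the result of one into the next.
  Reduce(function(x, fun) fun(x), funs, nest)
}

toy_pre_visit <- function(nest, funs) {
  if (is.null(nest)) return(NULL)
  # Transform the current level first ...
  nest <- toy_visit_one(nest, funs)
  # ... then descend into the children of the transformed nest.
  nest$children <- lapply(nest$children, toy_pre_visit, funs = funs)
  nest
}

# Two ordinary, non-recursive functions that know nothing about nesting.
double_value <- function(nest) {
  nest$value <- nest$value * 2
  nest
}
add_one <- function(nest) {
  nest$value <- nest$value + 1
  nest
}

tree <- list(value = 1, children = list(
  list(value = 2, children = list()),
  list(value = 3, children = list(
    list(value = 4, children = list())
  ))
))

# One pass over the tree applies both functions at every level.
visited <- toy_pre_visit(tree, list(double_value, add_one))
visited$value                              # 1 * 2 + 1
visited$children[[2]]$children[[1]]$value  # 4 * 2 + 1
```

Neither `double_value()` nor `add_one()` contains any recursion, which is the point of the first advantage: the visitor alone is responsible for walking the structure.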
`create_filler()` was adapted to also initialize the columns `indent` and
`lag_newlines`.
create_filler
## function(pd_flat) {
##
## pd_flat$line3 <- lead(pd_flat$line1, default = tail(pd_flat$line2, 1))
## pd_flat$col3 <- lead(pd_flat$col1, default = tail(pd_flat$col2, 1) + 1L)
## pd_flat$newlines <- pd_flat$line3 - pd_flat$line2
## pd_flat$lag_newlines <- lag(pd_flat$newlines, default = 0L)
## pd_flat$col2_nl <- if_else(pd_flat$newlines > 0L, 0L, pd_flat$col2)
## pd_flat$spaces <- pd_flat$col3 - pd_flat$col2_nl - 1L
## pd_flat$multi_line <- ifelse(pd_flat$terminal, FALSE, NA)
## pd_flat$indention_ref_id <- NA
## ret <- pd_flat[, !(names(pd_flat) %in% c("line3", "col3", "col2_nl"))]
##
##
## if (!("indent" %in% names(ret))) {
## ret$indent <- 0
## }
##
## if (any(ret$spaces < 0L)) {
## stop("Invalid parse data")
## }
##
## ret
## }
## <environment: namespace:styler>
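
To make the arithmetic in `create_filler()` easier to follow, here is a hedged base R sketch that repeats the same steps on a hand-written flat table for the two-line input `a <- 1` followed by `b`. The data frame `tokens` and its values are assumptions made for this illustration. `c(x[-1], default)` stands in for `lead()` and `c(0L, x[-n])` for `lag()`.

```r
# Hand-written positions for the tokens of "a <- 1\nb" (illustrative data only).
tokens <- data.frame(
  text  = c("a", "<-", "1", "b"),
  line1 = c(1L, 1L, 1L, 2L), col1 = c(1L, 3L, 6L, 1L),
  line2 = c(1L, 1L, 1L, 2L), col2 = c(1L, 4L, 6L, 1L),
  stringsAsFactors = FALSE
)
n <- nrow(tokens)

# Where does the *next* token start? For the last token, fall back to its own end.
line3 <- c(tokens$line1[-1], tokens$line2[n])
col3  <- c(tokens$col1[-1], tokens$col2[n] + 1L)

# Line breaks after each token, and the same information shifted by one row.
tokens$newlines     <- line3 - tokens$line2
tokens$lag_newlines <- c(0L, tokens$newlines[-n])

# After a line break the next token is measured from column 0, not from col2.
col2_nl <- ifelse(tokens$newlines > 0L, 0L, tokens$col2)
tokens$spaces <- col3 - col2_nl - 1L

tokens[, c("text", "newlines", "lag_newlines", "spaces")]
```

In this toy table, `a` and `<-` are each followed by one space, `1` is followed by one line break, and `b` carries `lag_newlines = 1` because the break precedes it.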
code <- "a <- function(x) { if(x > 1) { 1+1 } else {x} }"
pd_nested <- compute_parse_data_nested(code)
pd_nested_enhanced <- pre_visit(pd_nested, c(create_filler))
pd_nested_enhanced
## # A tibble: 1 x 20
## line1 col1 line2 col2 id parent token terminal text short
## <int> <int> <int> <int> <int> <int> <chr> <lgl> <chr> <chr>
## 1 1 1 1 47 49 0 expr FALSE
## # ... with 10 more variables: token_before <chr>, token_after <chr>,
## # internal <lgl>, child <list>, newlines <int>, lag_newlines <int>,
## # spaces <int>, multi_line <lgl>, indention_ref_id <lgl>, indent <dbl>
As a next step, we need to find a way to serialize the nested tibble, or
in other words, to transform it to its character vector representation.
As a starting point, consider the function `serialize` that was
introduced in the vignette "Data Structures".
serialize <- function(x) {
out <- Map(
function(terminal, text, child) {
if (terminal)
text
else
serialize(child)
},
x$terminal, x$text, x$child
)
out
}
serialize(pd_nested) %>% unlist
## [1] "a" "<-" "function" "(" "x" ")"
## [7] "{" "if" "(" "x" ">" "1"
## [13] ")" "{" "1" "+" "1" "}"
## [19] "else" "{" "x" "}" "}"
`serialize` can be combined with `serialize_parse_data_flat`. The latter
pastes together the column "text" of a flat parse table by taking into
account space and line break information, splits the string by line
break and returns it.
serialize_parse_data_flat
## function(pd_flat) {
## pd_flat %>%
## summarize_(
## text_ws = ~paste0(
## text, newlines_and_spaces(newlines, spaces),
## collapse = "")) %>%
## .[["text_ws"]] %>%
## strsplit("\n", fixed = TRUE) %>%
## .[[1L]]
## }
## <environment: namespace:styler>
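
The paste-then-split idea can be illustrated without dplyr. This is a minimal sketch, assuming that the helper `newlines_and_spaces()` simply emits the requested number of line breaks followed by the requested number of spaces. `toy_newlines_and_spaces()` and `toy_serialize_flat()` are hypothetical stand-ins, not styler functions.

```r
# Hypothetical stand-in: line breaks first, then spaces (an assumption about
# what the internal helper does, not its actual source).
toy_newlines_and_spaces <- function(newlines, spaces) {
  paste0(strrep("\n", newlines), strrep(" ", spaces))
}

toy_serialize_flat <- function(text, newlines, spaces) {
  # Glue every token to the white space that follows it ...
  collapsed <- paste0(
    text, toy_newlines_and_spaces(newlines, spaces),
    collapse = ""
  )
  # ... and cut the resulting string into one element per line.
  strsplit(collapsed, "\n", fixed = TRUE)[[1L]]
}

flat_lines <- toy_serialize_flat(
  text     = c("a", "<-", "1", "b"),
  newlines = c(0L, 0L, 1L, 0L),
  spaces   = c(1L, 1L, 0L, 0L)
)
flat_lines
```

The result is a character vector with one element per line of code, which is the shape needed for writing the code back to a file.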
However, things get a bit more complicated, mainly because line break
and white space information is not only contained in the terminal
tibbles of the nested parse data, but also in the non-terminal nests
above them, as the following example shows.
pd_nested_enhanced$child[[1]]
## # A tibble: 3 x 20
## line1 col1 line2 col2 id parent token terminal text short
## <int> <int> <int> <int> <int> <int> <chr> <lgl> <chr> <chr>
## 1 1 1 1 1 3 49 expr FALSE
## 2 1 3 1 4 2 49 LEFT_ASSIGN TRUE <- <-
## 3 1 6 1 47 48 49 expr FALSE
## # ... with 10 more variables: token_before <chr>, token_after <chr>,
## # internal <lgl>, child <list>, newlines <int>, lag_newlines <int>,
## # spaces <int>, multi_line <lgl>, indention_ref_id <lgl>, indent <dbl>
pd_nested_enhanced$child[[1]]$child[[1]]
## # A tibble: 1 x 20
## line1 col1 line2 col2 id parent token terminal text short
## <int> <int> <int> <int> <int> <int> <chr> <lgl> <chr> <chr>
## 1 1 1 1 1 1 3 SYMBOL TRUE a a
## # ... with 10 more variables: token_before <chr>, token_after <chr>,
## # child <list>, internal <lgl>, newlines <int>, lag_newlines <int>,
## # spaces <int>, multi_line <lgl>, indention_ref_id <lgl>, indent <dbl>
After "a" in `code`, there is a space, but this information is not
contained in the tibble where we find the terminal "a". In general, we
must add newlines and spaces values *after* we have computed the character
vector representation of the expression. In our example, we know that there
is a space after the non-terminal expression that contains "a" by looking at
`pd_nested_enhanced$child[[1]]`. Therefore, we need to add this space
after the very last terminal of that expression before we
collapse everything together.
serialize_parse_data_nested_helper
## function(pd_nested, pass_indent) {
## out <- pmap(list(pd_nested$terminal, pd_nested$text, pd_nested$child,
## pd_nested$spaces, pd_nested$lag_newlines, pd_nested$indent),
## function(terminal, text, child, spaces, lag_newlines, indent) {
## total_indent <- pass_indent + indent
## preceding_linebreak <- if_else(lag_newlines > 0, 1, 0)
## if (terminal) {
## c(add_newlines(lag_newlines),
## add_spaces(total_indent * preceding_linebreak),
## text,
## add_spaces(spaces))
## } else {
## c(add_newlines(lag_newlines),
## add_spaces(total_indent * preceding_linebreak),
## serialize_parse_data_nested_helper(child, total_indent),
## add_spaces(spaces))
## }
## }
## )
## out
## }
## <environment: namespace:styler>
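
Stripped of newlines and indention, the core of this recursion can be sketched in base R. The example assumes a simplified nest, namely a list of parallel vectors `terminal`, `text`, `spaces` and a list `child`. This layout and the name `toy_serialize_nested()` are hypothetical and much simpler than the real parse table.

```r
toy_serialize_nested <- function(nest) {
  pieces <- Map(
    function(terminal, text, child, spaces) {
      # A terminal contributes its text, a non-terminal the serialized child.
      body <- if (terminal) text else toy_serialize_nested(child)
      # The trailing spaces belong to this row, so they are appended only
      # after the whole sub-expression has been turned into text.
      paste0(body, strrep(" ", spaces))
    },
    nest$terminal, nest$text, nest$child, nest$spaces
  )
  paste0(unlist(pieces), collapse = "")
}

# Innermost nests: one terminal each, no trailing space recorded here.
nest_a <- list(terminal = TRUE, text = "a", child = list(NULL), spaces = 0L)
nest_1 <- list(terminal = TRUE, text = "1", child = list(NULL), spaces = 0L)

# Top nest for "a <- 1": the space after "a" is stored on the non-terminal row.
top <- list(
  terminal = c(FALSE, TRUE, FALSE),
  text     = c("", "<-", ""),
  child    = list(nest_a, NULL, nest_1),
  spaces   = c(1L, 1L, 0L)
)

toy_serialize_nested(top)
```

Note that `nest_a` itself records zero spaces. The space between `a` and `<-` only appears because the parent row appends it after the child has been serialized.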
serialize_parse_data_nested
## function(pd_nested) {
## out <- c(add_newlines(start_on_line(pd_nested) - 1),
## serialize_parse_data_nested_helper(pd_nested, pass_indent = 0)) %>%
## unlist() %>%
## paste0(collapse = "") %>%
## strsplit("\n", fixed = TRUE) %>%
## .[[1L]] %>%
## trimws(which = "right")
## out
## }
## <environment: namespace:styler>
Before we are done, we need to add information regarding indention to
the parse table. We can add indention after every line break that comes
after a round bracket with `indent_round()` and then serialize the result.
pre_visit(pd_nested,
c(create_filler,
purrr::partial(indent_round, indent_by = 2)))
## # A tibble: 1 x 20
## line1 col1 line2 col2 id parent token terminal text short
## <int> <int> <int> <int> <int> <int> <chr> <lgl> <chr> <chr>
## 1 1 1 1 47 49 0 expr FALSE
## # ... with 10 more variables: token_before <chr>, token_after <chr>,
## # internal <lgl>, child <list>, newlines <int>, lag_newlines <int>,
## # spaces <int>, multi_line <lgl>, indention_ref_id <lgl>, indent <dbl>
We can see how indention works with a more complicated example:
indented <- c(
"call(",
" 1,",
" call2(",
" 2, 3,",
" call3(1, 2, 22),",
" 5",
" ),",
" 144",
")"
)
not_indented <- trimws(indented)
back_and_forth <- not_indented %>%
compute_parse_data_nested() %>%
pre_visit(c(create_filler,
purrr::partial(indent_round, indent_by = 2))) %>%
serialize_parse_data_nested()
identical(indented, back_and_forth)
## [1] TRUE
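
How `lag_newlines` and `indent` end up as text can be sketched on a flat token list. This is an illustration under simplifying assumptions: the indention is given per token instead of being passed down through the nests, and `toy_emit_tokens()` is a hypothetical name. As in `serialize_parse_data_nested_helper()`, the indention is multiplied by an indicator so that it only appears directly after a line break.

```r
toy_emit_tokens <- function(text, lag_newlines, indent, spaces) {
  # 1 if the token starts a new line, 0 otherwise.
  preceding_linebreak <- as.integer(lag_newlines > 0L)
  pieces <- paste0(
    strrep("\n", lag_newlines),
    # Indention is only written when the token is first on its line.
    strrep(" ", indent * preceding_linebreak),
    text,
    strrep(" ", spaces)
  )
  strsplit(paste0(pieces, collapse = ""), "\n", fixed = TRUE)[[1L]]
}

emitted <- toy_emit_tokens(
  text         = c("call", "(", "1", ",", "2", ")"),
  lag_newlines = c(0L, 0L, 1L, 0L, 1L, 1L),
  indent       = c(0L, 0L, 2L, 2L, 2L, 0L),
  spaces       = c(0L, 0L, 0L, 0L, 0L, 0L)
)
emitted
```

The comma has an indention of 2 as well, but since it does not follow a line break, no spaces are written in front of it.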