Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parse 0x consistently #761

Merged
merged 14 commits into from Mar 30, 2021
Merged

Parse 0x consistently #761

merged 14 commits into from Mar 30, 2021

Conversation

lorenzwalthert
Copy link
Collaborator

@lorenzwalthert lorenzwalthert commented Mar 25, 2021

Closes #760.

  • cache_applying: 0.04 -> 0.04 (0.1%)
  • cache_recording: 1.28 -> 1.28 (0.1%)
  • without_cache: 3.95 -> 3.95 (0.1%)

fc31e73:

  • cache_applying: 0.03 -> 0.03 (0.1%)
  • cache_recording: 1.15 -> 1.16 (0.5%)
  • without_cache: 3.33 -> 3.36 (0.7%)

d4449cb

  • cache_applying: 0.03 -> 0.03 (-2.5%)
  • cache_recording: 1 -> 1 (0%)
  • without_cache: 2.91 -> 2.88 (-1%)

348298e

  • cache_applying: 0.03 -> 0.03 (0%)
  • cache_recording: 1.16 -> 1.16 (-0.5%)
  • without_cache: 3.46 -> 3.41 (-1.6%)

@codecov-io
Copy link

codecov-io commented Mar 25, 2021

Codecov Report

Merging #761 (348298e) into master (c048223) will decrease coverage by 0.00%.
The diff coverage is 90.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #761      +/-   ##
==========================================
- Coverage   90.57%   90.56%   -0.01%     
==========================================
  Files          47       47              
  Lines        2397     2396       -1     
==========================================
- Hits         2171     2170       -1     
  Misses        226      226              
Impacted Files Coverage Δ
R/parse.R 85.39% <90.00%> (-0.17%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c048223...348298e. Read the comment docs.

Copy link
Member

@krlmlr krlmlr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking this on!

R/parse.R Outdated
Comment on lines 221 to 224
candidate_substring <- substr(
pd$text[is_problematic_num_const], 1L, 2L
)
is_problematic_num_const[is_problematic_num_const] <- candidate_substring == "0x"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A regex is shorter and most likely runs faster than creating and checking substrings:

Suggested change
candidate_substring <- substr(
pd$text[is_problematic_num_const], 1L, 2L
)
is_problematic_num_const[is_problematic_num_const] <- candidate_substring == "0x"
is_problematic_num_const[is_problematic_num_const] <- grepl("^0x", pd$text[is_problematic_num_const])

Same for ..._strings() above.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right...

R/parse.R Outdated
is_parent_of_problematic_string <-
pd$id %in% problematic_strings$parent
pd$id %in% problematictext$parent
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Underscores have gone missing here, is this intended?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, reverted.

is_insufficiently_parsed_string(pd),
is_insufficiently_parsed_number(pd)
)
problematic_text <- pd[is_problematic_text, ]
Copy link
Member

@krlmlr krlmlr Mar 27, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we could speed up the code if we return early if is_problematic_text is FALSE everywhere.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably, let's see what touchstone says...

R/parse.R Outdated
}

is_insufficiently_parsed_number <- function(pd) {
grepl("^0x", pd$text)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to check for "STR_NUM" here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I first hoped we don't but since parent matches children too, it messes things up.

@krlmlr
Copy link
Member

krlmlr commented Mar 29, 2021

I'm seeing with this PR:

styler::style_text(text = "a <- 0x2L", scope = "line_breaks")
#> Error in paste("Error in styler:::ensure_correct_txt().", "Please file an issue on GitHub (https://github.com/r-lib/styler/issues)", : argument is missing, with no default

Created on 2021-03-29 by the reprex package (v1.0.0)

@lorenzwalthert
Copy link
Collaborator Author

lorenzwalthert commented Mar 29, 2021

Right... Now I get the right result (only visible after as.character()):

as.character(
  styler::style_text(text = "a <- 0x2L", scope = "line_breaks")
)
#> [1] "a <- 0x2L"

Created on 2021-03-29 by the reprex package (v1.0.0)

Speed implications are also modest.

Copy link
Member

@krlmlr krlmlr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think 1% less is a lot from such a small change. It's easy to get slower, it's way harder to speed up things.

R/parse.R Outdated Show resolved Hide resolved
Co-authored-by: Kirill Müller <krlmlr@users.noreply.github.com>
@lorenzwalthert
Copy link
Collaborator Author

Well I don't think it's actually making things faster, because from a theoretical point of view, it must make things slower I believe. There is some variance in the estimator but at least we have good reasons to believe it's not making things much slower.

@krlmlr
Copy link
Member

krlmlr commented Mar 30, 2021

I agree if the shortcut is hit rarely. In our case, most of the code won't contain 0x2L or the other problematic sequence, so in the majority of cases the shortcut will save time.

What is the other problem that's handled in this code?

There are chances that this problem is fixed upstream in R 4.2 or earlier. So, eventually, we may be able to get rid of that.

@lorenzwalthert
Copy link
Collaborator Author

Ah sorry I was not referring to your return-early suggestion, but to the whole PR, which gets 1% faster for unseen code, which I don't think makes sense from a theoretical point of view but can be explained with uncertainty in the estimation. We'll see what https://github.com/r-lib/styler/pull/761/checks?check_run_id=2224978050 brings. The other problem that's handed in that ensure_correct_txt()() is very long strings which will be truncated in the parent text, which is apparently hardcoded in the R parser as noted in #216

@lorenzwalthert
Copy link
Collaborator Author

lorenzwalthert commented Mar 30, 2021

Ok, with the changes you proposed, it's actually unambiguous if we get faster or slower with this PR from an intuitive point of view, because we return earlier and then use regex instead of subsetting with substring. Sorry for the confusion. Another benchmark run indicates -1.6%, so probably it's a net positive change for us in speed as well 🥳.

@lorenzwalthert lorenzwalthert merged commit a6da1da into r-lib:master Mar 30, 2021
@lorenzwalthert lorenzwalthert deleted the issue-760 branch March 30, 2021 07:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Expression not equivalent with scope = "line_breaks"
3 participants