Parsing Bug in Bazel Extract #8785

cpsauer · 2021-02-20T09:13:28Z

Hey awesome Renovate folks,

I was trying to track down a misparsing error that showed up in a Renovate PR. Along the way, I noticed what I'm pretty sure is a bug in the parser--though perhaps not the one causing my issue.

In particular, the comment says the parser should "if ["""] found then ignore all [)] until closing ["""] parsed." but doesn't seem to actually ignore ')' inside of a doctoring comment. Not sure how much this really matters, but it definitely doesn't do as the comment says.

A test case that shows the issue is findBalancedParenIndex('""")""")'). It returns -1, but should return 7 (the last index) after ignoring the first paren in the docstring. Could also add an assert checking that the parenNestingDepth is never negative.

Here's the function in question:
https://github.com/renovatebot/renovate/blob/d2a71472223ce2cae3ffaa5ade6ad5f8c1a29423/lib/manager/bazel/extract.ts#L56:58

I think this is different from the parsing problem causing my real issue. In the real issue, Renovate tries to update an external dependency in a comment (which is nice, actually), but misparses somewhere and tries to eat the next function definition. The code in extract.ts looks to me like it handles this fine--I'm not sure where in the code is causing this bug.

Easiest way to show the error case was a screenshot:

Raw text for debugging:

    # Eigen already had a good BUILD file from Tensorflow.
# http_archive(
#     name = "rules_foreign_cc",
#     url = "https://github.com/bazelbuild/rules_foreign_cc/archive/dfccdce2c9d1063c59ddd331b94eb7cb528a96ee.tar.gz",
#     sha256 = "5469ef8b4e2c475de443c13290cf91ba7d1255899442b1e42fcb7fcdee8ceed8",
#     strip_prefix = "rules_foreign_cc-dfccdce2c9d1063c59ddd331b94eb7cb528a96ee",
# )
# load("@rules_foreign_cc//:workspace_definitions.bzl", "rules_foreign_cc_dependencies")
# rules_foreign_cc_dependencies()
# # Usage is a little weird, and depends 
# FOREIGN_CC_EXPOSE_ALL_FILES = """filegroup(name = "all", srcs = glob(["**"]), visibility = ["//visibility:public"])"""


########################################
# C++ & Cross-Platform Libraries

# Boost.
# Famous C++ library that gives rise to many new additions in the C++ standard library.
# See https://github.com/nelhage/rules_boost, recommended from https://docs.bazel.build/versions/master/rules.html
http_archive(
    name = "com_github_nelhage_rules_boost",
    url = "https://github.com/nelhage/rules_boost/archive/98495a618246683c9058dd87c2c78a2c06087999.tar.gz",
    sha256 = "f92cb7ed66a5b24f97a7fc3917407f808c70d2689273bdd68f93d70a379d22d3",
    strip_prefix = "rules_boost-98495a618246683c9058dd87c2c78a2c06087999",
)
load("@com_github_nelhage_rules_boost//:boost/boost.bzl", "boost_deps")
boost_deps() # Also pulls in a bunch of boost depenencies if you don't have them already. See https://github.com/nelhage/rules_boost/blob/master/boost/boost.bzl

Update: Yep, some different bug at work. findBalancedParenthases finds the right thing there.

Happy to help track down more tm, but it's late enough in my time zone that I better head to bed for now.

Thanks so much,
Chris

The text was updated successfully, but these errors were encountered:

github-actions · 2021-02-20T12:14:14Z

This issue has been labeled with status:requirements. This indicates that we don't know enough to start work and further requirements or a reproduction are needed first.

This label will be replaced with status:ready once all requirements and reproductions necessary to start work have been met.

If it's not clear what is missing to move this issue forward, ask for clarification in a new comment. If you think we already have what we need to move forward, mention this in a new comment.

github-actions · 2021-02-20T12:14:15Z

Hi there,

The Renovate team needs your help! To fix the problem, we first need to know exactly what's causing the bug. A minimal reproduction help us to pinpoint the exact cause of the bug.

To get started, please read our guide on minimal reproductions to understand what is needed.

We may close the issue if you have not provided a minimal reproduction within two weeks. If you need more time, or are stuck, please ask for help or more time in a comment.

Good luck,

The Renovate team

rarkins · 2021-02-20T12:14:51Z

Hi @cpsauer, happy to accept a fix or reproduction

rarkins · 2021-02-20T12:14:51Z

Hi @cpsauer, happy to accept a fix or reproduction

cpsauer · 2021-02-20T22:53:57Z

Thanks @rarkins.

I think I got you the repro info for both, right? I assume you'd want the "unit test" equivalent for the parsing issue (findBalancedParenIndex('""")""")'). It returns -1, but should return 7). And the failing section of the WORKSPACE as a minimal reproducing case? At least that's what I imagined want in your shoes for a fast debug, since I'd was just feeding each into the console to try to narrow things down.

For the two sub issues separately:

Key question on the docstring parsing one: Do you actually care about handling that case?
- There are lots of string/comment cases unhandled, and this only matters in the case where the docstring has unmatched parens... Seems like a weird case that's no more common than unmatched string/parens in a normal comment.
- Fastest fix would be to just remove the buggy docstring code. Doesn't do anything good in its current form anyway, I don't think. Not that hard to make it work, but questionable benefit for the complexity, I think.
- I think we should consider using some external parser (maybe even python's or starlark) if we want to really want to handle all these cases. Parsing can get tricky and recursive fast!
  - But if we did this, we'd lose the ability to update commented packages. Not sure which way you feel on that one either.
The WORKSPACE update misparse I haven't tracked down to a function--again maybe happening at some other, deeper update parsing level I haven't seen yet. I don't understand how this all fits together nearly as well as you do.
- For now, I can just ignore the dependency (it's commented out anyway), but thought you'd want to know about it because indicates that there's likely some deeper parsing issue somewhere.

rarkins · 2021-02-21T07:53:00Z

I'm ok with removing the current code if it's unhelpful. And if a unit test can successfully reproduce then that's fine too.

We try to avoid adding any parsing that requires using child process, because that has always caused us pain before (setup.py, Gradle..).

As a general rule: we don't aspire or need to fully parse any package file - just enough to correctly extract the dependencies most of the time. In rare cases where we fail to detect if people utilize weird syntax in a package file because they "can" and not "need", I'm not keen to add code to handle that. I think that making package definitions more semantic or parseable in order to utilize the benefits of Renovate's automation is hopefully a good-enough trade-off for most, although understand that sometimes the person wanting to add Renovate may not be empowered to make the modifications necessary in a big company.

github-actions · 2021-02-21T07:53:21Z

Thank you for providing a reproduction! 🎉 🚀

The Renovate team will take a look at the reproduction repository. Once we confirm the provided repository reproduces the problem, the label will be changed to reproduction:confirmed.

zharinov · 2021-02-21T21:11:43Z

Hi @cpsauer, don't you mind if I'll use your reproduction case as the test fixture in our code base?

cpsauer · 2021-02-22T06:49:31Z

Wow! I was coming back to clean out the problematic code--that philosophy totally makes sense, @rarkins; thanks for explaining. But it looks like @zharinov has already taken things quite far in the better lexing/parsing direction. Awesome!

@zharinov, of course you can. Thanks for digging into it and for asking :)

Did you find the source of the problematic PR while working on the code? When I was looking, it seemed like it might be be a separate parsing bug from another layer, since I think it should have parsed correctly even with the problematic code in findBalancedParenIndex(). Ignoring commented lines will sidestep the issue, of course, but I'm wondering if there might be something else there.

Three things that might be handy in the PR:
(I haven't used moo before, and I don't quite follow the diffs in the tests, so pardon me if I'm mistaken. Wanted to say in case handy, though.)
(1) I think you might already get the """ case for free if you handle ", and allow new lines, as you do. Clever that you allow newlines in strings and sidestep the <newline> case because you can count on a well-formed file. :)
(2) Heads that Starlark (Bazel's language) is happy with single quote strings. Might be an easy win to add if easy?
(3) Similarly there's the " and ' cases. Might be worth munching . to dance around termination issues?

renovate-release · 2021-03-10T18:27:06Z

🎉 This issue has been resolved in version 24.78.3 🎉

The release is available on:

GitHub release
24.78.3

Your semantic-release bot 📦🚀

rarkins added reproduction:provided and removed auto:reproduction A minimal reproduction is necessary to proceed status:requirements Full requirements are not yet known, so implementation should not be started labels Feb 21, 2021

zharinov self-assigned this Feb 21, 2021

zharinov mentioned this issue Feb 21, 2021

fix(bazel): Handle comment and string parsing #8794

Closed

6 tasks

zharinov mentioned this issue Feb 23, 2021

fix(bazel): Handle comment and string parsing #8830

Merged

6 tasks

HonkingGoose added status:in-progress Someone is working on implementation and removed status:ready labels Feb 25, 2021

rarkins closed this as completed in #8830 Mar 10, 2021

github-actions bot locked as resolved and limited conversation to collaborators Apr 10, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Parsing Bug in Bazel Extract #8785

Parsing Bug in Bazel Extract #8785

cpsauer commented Feb 20, 2021 •

edited

Loading

github-actions bot commented Feb 20, 2021

github-actions bot commented Feb 20, 2021

rarkins commented Feb 20, 2021

rarkins commented Feb 20, 2021

cpsauer commented Feb 20, 2021 •

edited

Loading

rarkins commented Feb 21, 2021

github-actions bot commented Feb 21, 2021

zharinov commented Feb 21, 2021

cpsauer commented Feb 22, 2021

renovate-release commented Mar 10, 2021

Parsing Bug in Bazel Extract #8785

Parsing Bug in Bazel Extract #8785

Comments

cpsauer commented Feb 20, 2021 • edited Loading

github-actions bot commented Feb 20, 2021

github-actions bot commented Feb 20, 2021

rarkins commented Feb 20, 2021

rarkins commented Feb 20, 2021

cpsauer commented Feb 20, 2021 • edited Loading

rarkins commented Feb 21, 2021

github-actions bot commented Feb 21, 2021

zharinov commented Feb 21, 2021

cpsauer commented Feb 22, 2021

renovate-release commented Mar 10, 2021

cpsauer commented Feb 20, 2021 •

edited

Loading

cpsauer commented Feb 20, 2021 •

edited

Loading