Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parsing Bug in Bazel Extract #8785

Closed
cpsauer opened this issue Feb 20, 2021 · 10 comments · Fixed by #8830
Closed

Parsing Bug in Bazel Extract #8785

cpsauer opened this issue Feb 20, 2021 · 10 comments · Fixed by #8830
Assignees
Labels
manager:bazel Bazel WORKSPACE files priority-3-medium Default priority, "should be done" but isn't prioritised ahead of others status:in-progress Someone is working on implementation type:bug Bug fix of existing functionality

Comments

@cpsauer
Copy link
Contributor

cpsauer commented Feb 20, 2021

Hey awesome Renovate folks,

I was trying to track down a misparsing error that showed up in a Renovate PR. Along the way, I noticed what I'm pretty sure is a bug in the parser--though perhaps not the one causing my issue.

In particular, the comment says the parser should "if ["""] found then ignore all [)] until closing ["""] parsed." but doesn't seem to actually ignore ')' inside of a doctoring comment. Not sure how much this really matters, but it definitely doesn't do as the comment says.

A test case that shows the issue is findBalancedParenIndex('""")""")'). It returns -1, but should return 7 (the last index) after ignoring the first paren in the docstring. Could also add an assert checking that the parenNestingDepth is never negative.

Here's the function in question:
https://github.com/renovatebot/renovate/blob/d2a71472223ce2cae3ffaa5ade6ad5f8c1a29423/lib/manager/bazel/extract.ts#L56:58

I think this is different from the parsing problem causing my real issue. In the real issue, Renovate tries to update an external dependency in a comment (which is nice, actually), but misparses somewhere and tries to eat the next function definition. The code in extract.ts looks to me like it handles this fine--I'm not sure where in the code is causing this bug.

Easiest way to show the error case was a screenshot:

Screen Shot 2021-02-20 at 1 08 44 AM

Raw text for debugging:

    # Eigen already had a good BUILD file from Tensorflow.
# http_archive(
#     name = "rules_foreign_cc",
#     url = "https://github.com/bazelbuild/rules_foreign_cc/archive/dfccdce2c9d1063c59ddd331b94eb7cb528a96ee.tar.gz",
#     sha256 = "5469ef8b4e2c475de443c13290cf91ba7d1255899442b1e42fcb7fcdee8ceed8",
#     strip_prefix = "rules_foreign_cc-dfccdce2c9d1063c59ddd331b94eb7cb528a96ee",
# )
# load("@rules_foreign_cc//:workspace_definitions.bzl", "rules_foreign_cc_dependencies")
# rules_foreign_cc_dependencies()
# # Usage is a little weird, and depends 
# FOREIGN_CC_EXPOSE_ALL_FILES = """filegroup(name = "all", srcs = glob(["**"]), visibility = ["//visibility:public"])"""


########################################
# C++ & Cross-Platform Libraries

# Boost.
# Famous C++ library that gives rise to many new additions in the C++ standard library.
# See https://github.com/nelhage/rules_boost, recommended from https://docs.bazel.build/versions/master/rules.html
http_archive(
    name = "com_github_nelhage_rules_boost",
    url = "https://github.com/nelhage/rules_boost/archive/98495a618246683c9058dd87c2c78a2c06087999.tar.gz",
    sha256 = "f92cb7ed66a5b24f97a7fc3917407f808c70d2689273bdd68f93d70a379d22d3",
    strip_prefix = "rules_boost-98495a618246683c9058dd87c2c78a2c06087999",
)
load("@com_github_nelhage_rules_boost//:boost/boost.bzl", "boost_deps")
boost_deps() # Also pulls in a bunch of boost depenencies if you don't have them already. See https://github.com/nelhage/rules_boost/blob/master/boost/boost.bzl

Update: Yep, some different bug at work. findBalancedParenthases finds the right thing there.

Happy to help track down more tm, but it's late enough in my time zone that I better head to bed for now.

Thanks so much,
Chris

@rarkins rarkins added manager:bazel Bazel WORKSPACE files priority-3-medium Default priority, "should be done" but isn't prioritised ahead of others auto:reproduction A minimal reproduction is necessary to proceed status:requirements Full requirements are not yet known, so implementation should not be started type:bug Bug fix of existing functionality labels Feb 20, 2021
@github-actions
Copy link
Contributor

This issue has been labeled with status:requirements. This indicates that we don't know enough to start work and further requirements or a reproduction are needed first.

This label will be replaced with status:ready once all requirements and reproductions necessary to start work have been met.

If it's not clear what is missing to move this issue forward, ask for clarification in a new comment. If you think we already have what we need to move forward, mention this in a new comment.

@github-actions
Copy link
Contributor

Hi there,

The Renovate team needs your help! To fix the problem, we first need to know exactly what's causing the bug. A minimal reproduction help us to pinpoint the exact cause of the bug.

To get started, please read our guide on minimal reproductions to understand what is needed.

We may close the issue if you have not provided a minimal reproduction within two weeks. If you need more time, or are stuck, please ask for help or more time in a comment.

Good luck,

The Renovate team

@rarkins
Copy link
Collaborator

rarkins commented Feb 20, 2021

Hi @cpsauer, happy to accept a fix or reproduction

1 similar comment
@rarkins
Copy link
Collaborator

rarkins commented Feb 20, 2021

Hi @cpsauer, happy to accept a fix or reproduction

@cpsauer
Copy link
Contributor Author

cpsauer commented Feb 20, 2021

Thanks @rarkins.

I think I got you the repro info for both, right? I assume you'd want the "unit test" equivalent for the parsing issue (findBalancedParenIndex('""")""")'). It returns -1, but should return 7). And the failing section of the WORKSPACE as a minimal reproducing case? At least that's what I imagined want in your shoes for a fast debug, since I'd was just feeding each into the console to try to narrow things down.

For the two sub issues separately:

  • Key question on the docstring parsing one: Do you actually care about handling that case?
    • There are lots of string/comment cases unhandled, and this only matters in the case where the docstring has unmatched parens... Seems like a weird case that's no more common than unmatched string/parens in a normal comment.
    • Fastest fix would be to just remove the buggy docstring code. Doesn't do anything good in its current form anyway, I don't think. Not that hard to make it work, but questionable benefit for the complexity, I think.
    • I think we should consider using some external parser (maybe even python's or starlark) if we want to really want to handle all these cases. Parsing can get tricky and recursive fast!
      • But if we did this, we'd lose the ability to update commented packages. Not sure which way you feel on that one either.
  • The WORKSPACE update misparse I haven't tracked down to a function--again maybe happening at some other, deeper update parsing level I haven't seen yet. I don't understand how this all fits together nearly as well as you do.
    • For now, I can just ignore the dependency (it's commented out anyway), but thought you'd want to know about it because indicates that there's likely some deeper parsing issue somewhere.

@rarkins
Copy link
Collaborator

rarkins commented Feb 21, 2021

I'm ok with removing the current code if it's unhelpful. And if a unit test can successfully reproduce then that's fine too.

We try to avoid adding any parsing that requires using child process, because that has always caused us pain before (setup.py, Gradle..).

As a general rule: we don't aspire or need to fully parse any package file - just enough to correctly extract the dependencies most of the time. In rare cases where we fail to detect if people utilize weird syntax in a package file because they "can" and not "need", I'm not keen to add code to handle that. I think that making package definitions more semantic or parseable in order to utilize the benefits of Renovate's automation is hopefully a good-enough trade-off for most, although understand that sometimes the person wanting to add Renovate may not be empowered to make the modifications necessary in a big company.

@rarkins rarkins added reproduction:provided and removed auto:reproduction A minimal reproduction is necessary to proceed status:requirements Full requirements are not yet known, so implementation should not be started labels Feb 21, 2021
@github-actions
Copy link
Contributor

Thank you for providing a reproduction! 🎉 🚀

The Renovate team will take a look at the reproduction repository. Once we confirm the provided repository reproduces the problem, the label will be changed to reproduction:confirmed.

@zharinov
Copy link
Collaborator

Hi @cpsauer, don't you mind if I'll use your reproduction case as the test fixture in our code base?

@cpsauer
Copy link
Contributor Author

cpsauer commented Feb 22, 2021

Wow! I was coming back to clean out the problematic code--that philosophy totally makes sense, @rarkins; thanks for explaining. But it looks like @zharinov has already taken things quite far in the better lexing/parsing direction. Awesome!

@zharinov, of course you can. Thanks for digging into it and for asking :)

Did you find the source of the problematic PR while working on the code? When I was looking, it seemed like it might be be a separate parsing bug from another layer, since I think it should have parsed correctly even with the problematic code in findBalancedParenIndex(). Ignoring commented lines will sidestep the issue, of course, but I'm wondering if there might be something else there.

Three things that might be handy in the PR:
(I haven't used moo before, and I don't quite follow the diffs in the tests, so pardon me if I'm mistaken. Wanted to say in case handy, though.)
(1) I think you might already get the """ case for free if you handle ", and allow new lines, as you do. Clever that you allow newlines in strings and sidestep the <newline> case because you can count on a well-formed file. :)
(2) Heads that Starlark (Bazel's language) is happy with single quote strings. Might be an easy win to add if easy?
(3) Similarly there's the " and ' cases. Might be worth munching . to dance around termination issues?

@HonkingGoose HonkingGoose added status:in-progress Someone is working on implementation and removed status:ready labels Feb 25, 2021
@renovate-release
Copy link
Collaborator

🎉 This issue has been resolved in version 24.78.3 🎉

The release is available on:

Your semantic-release bot 📦🚀

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Apr 10, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
manager:bazel Bazel WORKSPACE files priority-3-medium Default priority, "should be done" but isn't prioritised ahead of others status:in-progress Someone is working on implementation type:bug Bug fix of existing functionality
Projects
None yet
5 participants