Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

parse empty lines and comments #8

Closed
wants to merge 1 commit into from

Conversation

blurrymoi
Copy link
Contributor

Allows to parse whole files.

The idea is that I force-reduce on inputunit -> NEWLINE rule and accept by creating a dummy newline node that only retains position (needed for further parsing). These nodes are filtered in the end to prevent output such as:
NewlineNode
NewlineNode
CommandNode
...

I handpicked the relevant changes, so I hope I didn't miss anything.

@idank
Copy link
Owner

idank commented Feb 20, 2016

Thanks for sending this. I'm curious how GNU bash handles new lines,
ideally we should do the same so we stay close to the source. Though if
this works, I guess I'm fine with taking it.

Can you add some tests?
On Feb 20, 2016 02:12, "blurrymoi" notifications@github.com wrote:

Allows to parse whole files.

The idea is that I force-reduce on inputunit -> NEWLINE rule and accept by
creating a dummy newline node that only retains position (needed for
further parsing). These nodes are filtered in the end to prevent output
such as:
NewlineNode
NewlineNode
CommandNode
...

I handpicked the relevant changes, so I hope I didn't miss anything.

You can view, comment on, or merge this pull request online at:

#8
Commit Summary

  • parse empty lines and comments

File Changes

Patch Links:


Reply to this email directly or view it on GitHub
#8.

@blurrymoi
Copy link
Contributor Author

I did some thinking prior to making this and on line 388 they do create a NULL command and yacc accept:
http://git.savannah.gnu.org/cgit/bash.git/tree/parse.y?id=df2c55de9c87c2ee8904280d26e80f5c48dd6434#n388

The "None" approach would not really be viable, because a pointer can be of specific type, but None is of no type, so I assumed this to be error-prone. Also, in parser.py around lines 600-620, the index is being set and should there be None, the position would have to be remembered differently (not to mention testing for None-ness).

Perhaps this has to do with actually executing the code?

@idank
Copy link
Owner

idank commented Feb 21, 2016

Maybe. I tried a more naive approach:

@@ -593,6 +597,9 @@ def parse(s, strictmode=True, expansionlimit=None, convertpos=False):
     ef.visit(parts[-1])
     index = max(parts[-1].pos[1], ef.end) + 1
     while index < len(s):
+        if s[index].isspace():
+            index += 1
+            continue
         part = _parser(s[index:], strictmode=strictmode).parse()

         if not isinstance(part, ast.node):

But then started thinking about recursive parsing and indeed this fails when there's a newline in a command substitution, try: a $(b\nc). I also tried your fix and it fails as well. I'll try debugging this for a bit and see if I can find how the C parser deals with this.

I'll still merge this, but let's see if we can fix this case too.

@idank
Copy link
Owner

idank commented Feb 22, 2016

Ok, I think I got it working. It deviates from bash a little because I got tired of debugging it. Can you try https://github.com/idank/bashlex/tree/newlines

@blurrymoi
Copy link
Contributor Author

I played with it a bit and one assert ( assert p[2].kind == 'list' from parser.py/215 (while or until) ) fails for example on:

while [ sth ]
do
   sth
done

It works without the assert, though.

The two issues are barely related, so you could still combine the commits and both parse newlines and comments and fix the newline in substitution problem. I am not trying to enforce my changes, on the contrary, I'd love to see comment-parsing being handled non-trivially, it's just that each and every file that I have to parse contains comments and having a whole different version of the parser is not in my interest.

Thanks for the advice, I would not dare alter your former test cases.

@idank
Copy link
Owner

idank commented Feb 28, 2016

What do you mean have a whole different version of the parser?

@blurrymoi
Copy link
Contributor Author

I added my proposition for the combination of the two solutions. It should be able to parse all of the aforementioned problems: newlines, comments and newlines in command substitution.

What I meant is that if you choose to go with your newlines version and disregard comments altogether I'd have to use my approach for comment-parsing and digress from the mainstream bashlex and I would like to avoid this.

@blurrymoi
Copy link
Contributor Author

Squashed into a single commit for your convenience~

@idank
Copy link
Owner

idank commented Mar 1, 2016

My changes shouldn't ignore comments. I made them in haste, I'm sure it's fixable.

Does your original patch work with new lines in substitutions? Changing the grammar the way you did is a little scary and it's hard to see what else might be affected by it.

@blurrymoi
Copy link
Contributor Author

No, my original patch does not regard new lines in substitutions as that was in no way related to what I was trying to accomplish. And I didn't know such problem existed.
Neither did I touch the structure of the grammar. Although I admit that it cannot be said what else the changes may affect.

Your proposed version by itself also seems not to parse \n a.

@idank
Copy link
Owner

idank commented Mar 7, 2016

That's easy to fix.

What I meant by touching the grammar is this line yaccparser.action[8][tt.name] = -2. I can't really tell if that's harmless or not outside of what you were trying to fix. I think my approach is safer but happy to discuss any concerns you might have.

If you want to refine my patch, feel free. Otherwise I'll work on it when I have the time.

@thehesiod
Copy link

FYI I tried this PR with moderate complexity multiline bash files and it doesn't work.

@idank
Copy link
Owner

idank commented Sep 22, 2017

Did you debug further? What was giving it trouble?

@blurrymoi
Copy link
Contributor Author

Can you provide an example?

I analysed the testsuite of the abrt project (some 100+ files) and only a handful of tests failed to parse for reasons unrelated to newline parsing.
Can you maybe try with my devel branch and let me know if the problem persists?
( https://github.com/blurrymoi/bashlex/tree/devel )

@thehesiod
Copy link

I'll try again, gimme a min

@thehesiod
Copy link

btw can you merge master to your branch? You're 18 commits behind, and have conflicts here

@thehesiod
Copy link

so it fails on this script:

#!/bin/bash -xe

docker build --pull --tag s2-build .
docker run -it --rm -v `pwd`:/s2 s2-build

with error:

  File "/Users/amohr/dev/third_party/bashlex/bashlex/parser.py", line 583, in parse
    parts = [p.parse()]
  File "/Users/amohr/dev/third_party/bashlex/bashlex/parser.py", line 642, in parse
    tree = theparser.parse(lexer=self.tok, context=self)
  File "/Users/amohr/dev/third_party/bashlex/bashlex/yacc.py", line 277, in parse
    return self.parseopt_notrack(input,lexer,debug,tracking,tokenfunc,context)
  File "/Users/amohr/dev/third_party/bashlex/bashlex/yacc.py", line 1079, in parseopt_notrack
    tok = self.errorfunc(errtoken)
  File "/Users/amohr/dev/third_party/bashlex/bashlex/parser.py", line 539, in p_error
    p.lexer.source, p.lexpos)
bashlex.errors.ParsingError: unexpected token '\n' (position 16)

@idank
Copy link
Owner

idank commented Jan 17, 2024

fixed by ea0c135

@idank idank closed this Jan 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants