New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor to preserve reporting of original line numbers in requirements files #3125

Merged
merged 8 commits into from Oct 5, 2015

Conversation

Projects
None yet
5 participants
@qwcode
Contributor

qwcode commented Sep 21, 2015

this fixes #3009

it's alternative to PR #3030 that involves less refactor, and maintains the smaller functions for testability, and avoids the large "do-it-all" functions

@pgervais

This comment has been minimized.

Show comment
Hide comment
@pgervais

pgervais Sep 22, 2015

I should have mentioned that I also fixed two others bugs in #3030:

If the file contains

line1 \\
# comment \\
line2

Then the three lines are merged as 'line1 # comment line2' and 'line2' is dropped.

And more anecdotally, when there is no space before :

line1\\
# line2

Is merged into 'line1#line2' and the comment is not dropped.

Merging join_lines() and ignore_comments() seemed the cleanest way to handle that.

There is also another cornercase: if the last line in the file ends with \ it is discarded because the yield in join_lines is never executed.

I would recommend at least porting some of my tests to this PR to make sure the above cases are handled properly.

pgervais commented Sep 22, 2015

I should have mentioned that I also fixed two others bugs in #3030:

If the file contains

line1 \\
# comment \\
line2

Then the three lines are merged as 'line1 # comment line2' and 'line2' is dropped.

And more anecdotally, when there is no space before :

line1\\
# line2

Is merged into 'line1#line2' and the comment is not dropped.

Merging join_lines() and ignore_comments() seemed the cleanest way to handle that.

There is also another cornercase: if the last line in the file ends with \ it is discarded because the yield in join_lines is never executed.

I would recommend at least porting some of my tests to this PR to make sure the above cases are handled properly.

- add a preprocess function so that interactions between filtering and
  joining can easily be tested
- add some test cases mentioned by @pgervais
- deal with case of last line ending with \
@qwcode

This comment has been minimized.

Show comment
Hide comment
@qwcode

qwcode Sep 23, 2015

Contributor

@pgervais I made some updates for the cases you mentioned.

for readability sake, I'd like to keep the function structure the way I have it, if possible, but I also want to respect the time you put into your PR and thinking through all the test cases

So... if you're interested, I'd like your help in porting over missing tests. I haven't looked at all of them yet.

You'd basically cut a PR to my qwcode:req_line_numbers branch (instead of pypa:develop), and then I'd merge and it would all show up here with your name on the commits.

Does that work for you?

Contributor

qwcode commented Sep 23, 2015

@pgervais I made some updates for the cases you mentioned.

for readability sake, I'd like to keep the function structure the way I have it, if possible, but I also want to respect the time you put into your PR and thinking through all the test cases

So... if you're interested, I'd like your help in porting over missing tests. I haven't looked at all of them yet.

You'd basically cut a PR to my qwcode:req_line_numbers branch (instead of pypa:develop), and then I'd merge and it would all show up here with your name on the commits.

Does that work for you?

@pgervais

This comment has been minimized.

Show comment
Hide comment
@pgervais

pgervais Sep 23, 2015

Works for me. The rest of my day is full, I'll work on it tomorrow morning. Thanks for your time.

pgervais commented Sep 23, 2015

Works for me. The rest of my day is full, I'll work on it tomorrow morning. Thanks for your time.

@qwcode

This comment has been minimized.

Show comment
Hide comment
@qwcode

qwcode Sep 25, 2015

Contributor

@pgervais, just checking in... I'll probably go ahead and finish this off myself this weekend if I don't hear back. no worries if you're busy.

Contributor

qwcode commented Sep 25, 2015

@pgervais, just checking in... I'll probably go ahead and finish this off myself this weekend if I don't hear back. no worries if you're busy.

@pgervais

This comment has been minimized.

Show comment
Hide comment
@pgervais

pgervais Sep 28, 2015

Sorry for the long silence, I was indeed pretty busy last week. I'll have more time this week.
If you've already done the job, no worries. Just tell me :-)

pgervais commented Sep 28, 2015

Sorry for the long silence, I was indeed pretty busy last week. I'll have more time this week.
If you've already done the job, no worries. Just tell me :-)

@qwcode

This comment has been minimized.

Show comment
Hide comment
@qwcode

qwcode Sep 28, 2015

Contributor

np, I haven't yet. so I'll wait til later this week or next weekend.

Contributor

qwcode commented Sep 28, 2015

np, I haven't yet. so I'll wait til later this week or next weekend.

@qwcode

This comment has been minimized.

Show comment
Hide comment
@qwcode

qwcode Oct 4, 2015

Contributor

ok @pgervais , I migrated over what tests I thought were important, except for one case which won't work.

for this

req1 \\
# comment
req2

you had it ending up like this

req1
req2

whereas, I have it like this:

req1 req2

prior to this change, comments were processed first, then line continuations, so we should stay with that for compatibility.

Contributor

qwcode commented Oct 4, 2015

ok @pgervais , I migrated over what tests I thought were important, except for one case which won't work.

for this

req1 \\
# comment
req2

you had it ending up like this

req1
req2

whereas, I have it like this:

req1 req2

prior to this change, comments were processed first, then line continuations, so we should stay with that for compatibility.

qwcode added a commit that referenced this pull request Oct 5, 2015

Merge pull request #3125 from qwcode/req_line_numbers
refactor to preserve reporting of original line numbers in requirements files

@qwcode qwcode merged commit 0e870a7 into pypa:develop Oct 5, 2015

1 check was pending

continuous-integration/travis-ci/pr The Travis CI build is in progress
Details
@pgervais

This comment has been minimized.

Show comment
Hide comment
@pgervais

pgervais Oct 5, 2015

@qwcode Thanks a lot for following-up on that, sorry for not being very responsive.

I think processing comments before joining lines will confuse people. It's not the way parsing works for Python and as far as I can tell not in any other languages. In that case keeping backward compatibility seems to me less important than being consistent.

But you did all the work so up to you :-)

pgervais commented Oct 5, 2015

@qwcode Thanks a lot for following-up on that, sorry for not being very responsive.

I think processing comments before joining lines will confuse people. It's not the way parsing works for Python and as far as I can tell not in any other languages. In that case keeping backward compatibility seems to me less important than being consistent.

But you did all the work so up to you :-)

@qwcode

This comment has been minimized.

Show comment
Hide comment
@qwcode

qwcode Oct 5, 2015

Contributor

@pfmoore @dstufft @Xafer @Ivoz
For requirements file preprocessing, filter out comments before are after joining lines?
Since this was added in 7.x, comments were filtered first.
I'm not sure what "right" is here?
practically, not sure it will matter much, but I'm willing to change it, if people think so.

Contributor

qwcode commented Oct 5, 2015

@pfmoore @dstufft @Xafer @Ivoz
For requirements file preprocessing, filter out comments before are after joining lines?
Since this was added in 7.x, comments were filtered first.
I'm not sure what "right" is here?
practically, not sure it will matter much, but I'm willing to change it, if people think so.

@pfmoore

This comment has been minimized.

Show comment
Hide comment
@pfmoore

pfmoore Oct 5, 2015

Member

In terms of the examples in the discussion, IMO "right" is not to do stuff like that...

So I more or less don't care what you do, and am happy with whatever you choose.

For reference, though, Python treats a comment as ending at the end of the physical line, so that

# A comment \
print("Hello")

prints "Hello".

So if I understand the PR, you're consistent with Python's behaviour, which is probably the most intuitive thing to do.

Member

pfmoore commented Oct 5, 2015

In terms of the examples in the discussion, IMO "right" is not to do stuff like that...

So I more or less don't care what you do, and am happy with whatever you choose.

For reference, though, Python treats a comment as ending at the end of the physical line, so that

# A comment \
print("Hello")

prints "Hello".

So if I understand the PR, you're consistent with Python's behaviour, which is probably the most intuitive thing to do.

@dstufft

This comment has been minimized.

Show comment
Hide comment
@dstufft

dstufft Oct 5, 2015

Member

I don't have a strong opinion, matching Python's behavior (whatever that is) probably makes sense.

Member

dstufft commented Oct 5, 2015

I don't have a strong opinion, matching Python's behavior (whatever that is) probably makes sense.

@Ivoz

This comment has been minimized.

Show comment
Hide comment
@Ivoz

Ivoz Oct 5, 2015

Member

A) requirements.txt never had the most formal definition... (slightly helps us in this instance)

B) If you define "Everything after a # is a comment" then naively \ is also part of the comment, and not functional.

C) You can choose to interpret the other order (parse \ first), which gives you the ability to do

some-req # this is some comment \
and some more comment \
and a last bit of explanation
another-req

But that really looks inferior, in readability, to

some-req # this is some comment
# and some more comment
# and a last bit of explanation
another-req

In my eyes. So I see no reason not to parse comments first.

Member

Ivoz commented Oct 5, 2015

A) requirements.txt never had the most formal definition... (slightly helps us in this instance)

B) If you define "Everything after a # is a comment" then naively \ is also part of the comment, and not functional.

C) You can choose to interpret the other order (parse \ first), which gives you the ability to do

some-req # this is some comment \
and some more comment \
and a last bit of explanation
another-req

But that really looks inferior, in readability, to

some-req # this is some comment
# and some more comment
# and a last bit of explanation
another-req

In my eyes. So I see no reason not to parse comments first.

@dstufft

This comment has been minimized.

Show comment
Hide comment
@dstufft

dstufft Oct 5, 2015

Member

Makes sense to me.

Member

dstufft commented Oct 5, 2015

Makes sense to me.

@pgervais

This comment has been minimized.

Show comment
Hide comment
@pgervais

pgervais Oct 5, 2015

One more data point:

a = \
# comment
1

Is syntactically incorrect in Python whether or not there is a backslash at the end of the comment.

pgervais commented Oct 5, 2015

One more data point:

a = \
# comment
1

Is syntactically incorrect in Python whether or not there is a backslash at the end of the comment.

@qwcode

This comment has been minimized.

Show comment
Hide comment
@qwcode

qwcode Oct 5, 2015

Contributor

if we did process \ first, I was planning on still ignoring the case where \ follows a comment line, so @Ivoz 's case is not the decider on this I think. It's the slightly bizarre case @pgervais and I were going back and forth on.

this:

req1 \
# comment
req2

processing \ first gives:

req1
req2

processing comments first gives:

req1 req2

processing \ first here is consistent with python and maybe a bit more intuitive I think, but it's not completely clear.

Contributor

qwcode commented Oct 5, 2015

if we did process \ first, I was planning on still ignoring the case where \ follows a comment line, so @Ivoz 's case is not the decider on this I think. It's the slightly bizarre case @pgervais and I were going back and forth on.

this:

req1 \
# comment
req2

processing \ first gives:

req1
req2

processing comments first gives:

req1 req2

processing \ first here is consistent with python and maybe a bit more intuitive I think, but it's not completely clear.

@pfmoore

This comment has been minimized.

Show comment
Hide comment
@pfmoore

pfmoore Oct 5, 2015

Member

@pgervais that's because the comment is removed, but the line still ends - so you get

a =
1

@qwcode I would expect

req1 \
# a comment
req2

to give

req1
req2

(i.e. process \ first). But my logic for thinking so is not that you process \ first. Rather it's that you process comments first, but a full-line comment is still a line. So

req1 \
# a comment
req2

becomes

req1 \

req2

and then

req1
req2

when the backslash joins the empty line to the req1 line.

This feels more in line with

req1 \
xxx # a comment
req2

becoming

req1 xxx
req2

(via

req1 \
xxx 
req2

)

But again, the real point here is that people shouldn't be mixing line continuation and comments anyway, because the results are confusing :-)

Member

pfmoore commented Oct 5, 2015

@pgervais that's because the comment is removed, but the line still ends - so you get

a =
1

@qwcode I would expect

req1 \
# a comment
req2

to give

req1
req2

(i.e. process \ first). But my logic for thinking so is not that you process \ first. Rather it's that you process comments first, but a full-line comment is still a line. So

req1 \
# a comment
req2

becomes

req1 \

req2

and then

req1
req2

when the backslash joins the empty line to the req1 line.

This feels more in line with

req1 \
xxx # a comment
req2

becoming

req1 xxx
req2

(via

req1 \
xxx 
req2

)

But again, the real point here is that people shouldn't be mixing line continuation and comments anyway, because the results are confusing :-)

@qwcode

This comment has been minimized.

Show comment
Hide comment
@qwcode

qwcode Oct 5, 2015

Contributor

that's because the comment is removed, but the line still ends - so you get

I may be wrong, but I don't think that's it @pfmoore

if you have "test.py" like so:

a = \
# comment
1

python will error with

  File "test.py", line 2
    # comment
            ^

it appears to error as if you had

a = # comment

which fails like this

  File "test.py", line 1
    a = # comment
                ^
Contributor

qwcode commented Oct 5, 2015

that's because the comment is removed, but the line still ends - so you get

I may be wrong, but I don't think that's it @pfmoore

if you have "test.py" like so:

a = \
# comment
1

python will error with

  File "test.py", line 2
    # comment
            ^

it appears to error as if you had

a = # comment

which fails like this

  File "test.py", line 1
    a = # comment
                ^
@pfmoore

This comment has been minimized.

Show comment
Hide comment
@pfmoore

pfmoore Oct 5, 2015

Member

Well, it removes the comment, leaving a blank line, joins it with the "a =" line, then errors on the newline. When reporting the error, it reports the original content of the line where the unexpected newline character was found.

But honestly, who cares about why? What concerns me is behaviour. I'll add notes to the tests to say what I prefer, and then leave it at that.

Member

pfmoore commented Oct 5, 2015

Well, it removes the comment, leaving a blank line, joins it with the "a =" line, then errors on the newline. When reporting the error, it reports the original content of the line where the unexpected newline character was found.

But honestly, who cares about why? What concerns me is behaviour. I'll add notes to the tests to say what I prefer, and then leave it at that.

req2
""")
result = preprocess(content, None)
assert list(result) == [(1, 'req1 req2')]

This comment has been minimized.

@pfmoore

pfmoore Oct 5, 2015

Member

Should be 2 lines, 'req1 ' and 'req2'.

@pfmoore

pfmoore Oct 5, 2015

Member

Should be 2 lines, 'req1 ' and 'req2'.

req2
""")
result = preprocess(content, None)
assert list(result) == [(1, 'req1 req2')]

This comment has been minimized.

@pfmoore

pfmoore Oct 5, 2015

Member

Should be 2 lines, 'req1 ' and 'req2'.

@pfmoore

pfmoore Oct 5, 2015

Member

Should be 2 lines, 'req1 ' and 'req2'.

@Ivoz

This comment has been minimized.

Show comment
Hide comment
@Ivoz

Ivoz Oct 7, 2015

Member

Yeah, I agree with @pfmoore

get ignore_comments() to stop disgarding empty lines, and remove them in a separate function after processing \. I think that would do the job.

Member

Ivoz commented Oct 7, 2015

Yeah, I agree with @pfmoore

get ignore_comments() to stop disgarding empty lines, and remove them in a separate function after processing \. I think that would do the job.

@qwcode

This comment has been minimized.

Show comment
Hide comment
@qwcode

qwcode Oct 8, 2015

Contributor

Well, it removes the comment, leaving a blank line, joins it with the "a =" line,
then errors on the newline. When reporting the error, it reports the original
content of the line where the unexpected newline character was found.

can you reference something in the python docs that supports this version?

I see this, which seems to not support this: https://docs.python.org/3.4/reference/lexical_analysis.html#blank-lines

But honestly, who cares about why? What concerns me is behaviour.

I care about both. : ) seems easier to end up with the right behavior, if you know the why

Contributor

qwcode commented Oct 8, 2015

Well, it removes the comment, leaving a blank line, joins it with the "a =" line,
then errors on the newline. When reporting the error, it reports the original
content of the line where the unexpected newline character was found.

can you reference something in the python docs that supports this version?

I see this, which seems to not support this: https://docs.python.org/3.4/reference/lexical_analysis.html#blank-lines

But honestly, who cares about why? What concerns me is behaviour.

I care about both. : ) seems easier to end up with the right behavior, if you know the why

@qwcode

This comment has been minimized.

Show comment
Hide comment
@qwcode

qwcode Oct 8, 2015

Contributor

to be clear, I agree with Paul's behavior recommendations.

Contributor

qwcode commented Oct 8, 2015

to be clear, I agree with Paul's behavior recommendations.

erikrose added a commit to erikrose/peep that referenced this pull request Dec 2, 2015

Assume correct line-numbering behavior in pip>=8.
That's when pypa/pip#3125 lands.

Also capitalize format_control_arg, since it's as much a const as PIP_COUNTS_COMMENTS is.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment