Unexpected result with regard to ^ #8

Closed
O-D-S opened this issue Sep 21, 2017 · 9 comments

@O-D-S

O-D-S commented Sep 21, 2017

Test Test
Test Test
Test Test
Test Test

python.exe subst.py -b --count 0 --verbose -p "^Test Test\r\nTest" -r "Match Match\r\nMatch" "Test.txt"

results in

Match Match
Match Test
Test Test
Test Test

python.exe subst.py -b --count 0 -l --verbose -p "^Test Test\r\nTest" -r "Match Match\r\nMatch" "Test.txt"

results in

Test Test
Test Test
Test Test
Test Test

python.exe subst.py -b --count 0 -l --pattern-multiline --verbose -p "^Test Test\r\nTest" -r "Match Match\r\nMatch" "Test.txt"

results in

Test Test
Test Test
Test Test
Test Test

python.exe subst.py -b --count 0 --pattern-multiline --verbose -p "^Test Test\r\nTest" -r "Match Match\r\nMatch" "Test.txt"

results in

Match Match
Match Test
Test Test
Test Test

None of the combinations of options results in

Match Match
Match Test
Match Match
Match Test

I am not sure whether this is intended. (I know that --count 0 is the default.)

(I am using the version of subst.py in which you used the fnmatch module instead of the glob module. If there is a bug with regard to ^ or one of the options, and you fix it before you fix the wildcard bug, I cannot check whether the fix works on Windows with Python 2.7.11.)

@msztolcman
Owner

The -l option changes how data is processed. Without -l, the whole file is read at once, and replacements are done on the whole data at once. With -l, the file is read line by line, and replacements are done on each line separately. This means that multiline patterns do not match, by design. The -l option is useful when you are processing a very big file that cannot fit into your computer's memory.

But you found another interesting bug: the --pattern-* params weren't processed for the -p/-r params. Fixed, thanks!
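
To illustrate the difference between the two modes described above, here is a minimal sketch using Python's re module directly. It is not taken from subst.py: the function names replace_whole and replace_by_line are hypothetical, and the sketch assumes the line mode behaves like applying the same substitution to each line in isolation.

```python
import re

# Pattern and replacement from the report; MULTILINE lets ^ match at every line start.
PATTERN = re.compile(r"^Test Test\r\nTest", re.MULTILINE)
REPLACEMENT = "Match Match\r\nMatch"
DATA = "Test Test\r\n" * 4


def replace_whole(data):
    # Whole-file mode: the regex sees every line break,
    # so a pattern that spans two lines can match.
    return PATTERN.sub(REPLACEMENT, data)


def replace_by_line(data):
    # Line mode: each line is processed on its own, so nothing
    # follows the trailing "\r\n" and the two-line pattern never matches.
    lines = data.splitlines(keepends=True)
    return "".join(PATTERN.sub(REPLACEMENT, line) for line in lines)


if __name__ == "__main__":
    print(repr(replace_whole(DATA)))
    print(repr(replace_by_line(DATA)))
```

In this sketch the line-by-line variant returns the input unchanged, which matches the output reported for the -l runs.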

@O-D-S
Author

O-D-S commented Sep 23, 2017

But there is no combination of options that results in:

Match Match
Match Test
Match Match
Match Test

This is different from the result at https://regexr.com/3gqjc.

There, the global and the multiline flags are set. Doesn't "global" mean "--count 0"?

@msztolcman
Owner

@O-D-S works for me in the newest version:

% cat x.txt
Test Test
Test Test
Test Test
Test Test
% ./subst.py -p '^Test Test\r\nTest' -r 'Match Match\r\nMatch' x.txt --pattern-multiline
% cat x.txt
Match Match
Match Test
Match Match
Match Test

> There, the global and the multiline flags are set. Doesn't "global" mean "--count 0"?

If you mean the g flag, then yes: it is the same as --count=0 in subst.
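
For comparison, the same flags can be tried with Python's re.sub. This is only a sketch: it assumes that --count corresponds to the count argument of re.sub (where 0 means "replace all") and that --pattern-multiline corresponds to re.MULTILINE. Neither mapping is confirmed by the subst source here.

```python
import re

text = "Test Test\r\n" * 4
pattern = r"^Test Test\r\nTest"
repl = "Match Match\r\nMatch"

# count=0 replaces every match (like the g flag); MULTILINE makes ^ match at each line start.
all_matches = re.sub(pattern, repl, text, count=0, flags=re.MULTILINE)

# count=1 stops after the first replacement.
first_only = re.sub(pattern, repl, text, count=1, flags=re.MULTILINE)

# Without MULTILINE, ^ matches only at the very start of the data,
# so even count=0 finds a single match.
no_multiline = re.sub(pattern, repl, text, count=0)

if __name__ == "__main__":
    for result in (all_matches, first_only, no_multiline):
        print(repr(result))
```

The last case reproduces the output from the original report, where only the first two lines changed.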

@O-D-S
Author

O-D-S commented Sep 23, 2017

You have uploaded more than one version this evening, haven't you? I have downloaded subst.py again and

subst.py -p '^Test Test\r\nTest' -r 'Match Match\r\nMatch' x.txt --pattern-multiline

indeed works now as expected.

I will test it tomorrow more extensively.

@O-D-S
Author

O-D-S commented Sep 23, 2017

In the future I will post the checksum of the version I am referring to.

Good night!

Edited:

I have to check this tomorrow again:

-p "^[:blank:]*Test" does not work.
-p "[:blank:]*Test" works.
-p "^ *Test" works.

@msztolcman
Owner

Yep, uploaded twice, but the second one was uploaded before my last comment :) You don't need a checksum, just look at the commits :)

About [:blank:] - it's a POSIX class, and Python's re module doesn't handle them at all. The second one works because the star means '0 or more': whether the class is there or not, the pattern will match.

PCRE has the \s class for whitespace characters.
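
A short sketch of why the three patterns behave differently in Python: outside of another bracket pair, re reads [:blank:] as an ordinary character set containing the characters ':', 'b', 'l', 'a', 'n' and 'k'. The sample strings below are made up for illustration.

```python
import re

line = "  Test"  # two leading spaces

posix_style = re.compile(r"[:blank:]*Test")
anchored_posix = re.compile(r"^[:blank:]*Test")
anchored_set = re.compile(r"^[ \t]*Test")

# Unanchored: the set matches zero characters and "Test" is found further right.
unanchored_hit = posix_style.search(line)

# Anchored: the set cannot consume the spaces, so nothing matches at the line start.
anchored_hit = anchored_posix.search(line)

# An explicit space/tab set consumes the indentation as intended.
set_hit = anchored_set.search(line)

# The "POSIX" spelling does match a run of its literal characters.
literal_hit = anchored_posix.search("blank:Test")

if __name__ == "__main__":
    print(unanchored_hit.group())  # Test
    print(anchored_hit)            # None
    print(repr(set_hit.group()))   # '  Test'
    print(literal_hit.group())     # blank:Test
```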

@O-D-S
Author

O-D-S commented Sep 24, 2017

> About [:blank:] - it's a POSIX class, and Python's re module doesn't handle them at all.

\s includes newlines. How about ( |\t)? I compared the speed of the script: it is slightly faster with [:blank:] than with ( |\t). [:blank:] works, but not in combination with the caret. I will also test \s in combination with the caret again.

@msztolcman
Owner

For tabs and spaces only, you can use [ \t]. It's better and faster than ( |\t).
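
A rough way to check this yourself is sketched below. The sample text and the helper names matched_spans and time_pattern are made up for illustration, and no timings are claimed here because they depend on the Python version and the input.

```python
import re
import timeit

CHAR_CLASS = re.compile(r"^[ \t]*Test", re.MULTILINE)
ALTERNATION = re.compile(r"^( |\t)*Test", re.MULTILINE)

# Hypothetical sample: no indentation, spaces, mixed tabs/spaces, and a non-matching line.
SAMPLE = "Test\r\n  Test\r\n\t \tTest\r\nno match\r\n"


def matched_spans(regex):
    # Both spellings should select exactly the same spans of text.
    return [m.span() for m in regex.finditer(SAMPLE)]


def time_pattern(regex, number=10000):
    # Time repeated substitutions with one pattern.
    return timeit.timeit(lambda: regex.sub("X", SAMPLE), number=number)


if __name__ == "__main__":
    print(matched_spans(CHAR_CLASS) == matched_spans(ALTERNATION))
    print("character class:", time_pattern(CHAR_CLASS))
    print("alternation:    ", time_pattern(ALTERNATION))
```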

@O-D-S
Author

O-D-S commented Sep 24, 2017

Thanks!

I get unexpected results with ^\s*Alpha. The result is different from that of ^[ \t]*Alpha. Maybe this is because some of the newline characters are treated as the beginning of a new line rather than as the end of the previous line. That is probably not a bug in subst.
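
One possible explanation, sketched with Python's re module under the assumption that multiline mode is on: \s also matches "\r" and "\n", so ^\s* can start at an empty line and run across the line break into the next line. The sample text is made up for illustration.

```python
import re

# An empty line sits between "one" and the indented "Alpha".
text = "one\r\n\r\n  Alpha\r\n"

# \s* starts at the empty line and swallows its "\r\n" plus the indentation.
with_s = re.sub(r"^\s*Alpha", "Beta", text, flags=re.MULTILINE)

# [ \t]* cannot cross a line break, so only the indentation is consumed.
with_set = re.sub(r"^[ \t]*Alpha", "Beta", text, flags=re.MULTILINE)

if __name__ == "__main__":
    print(repr(with_s))    # 'one\r\nBeta\r\n'
    print(repr(with_set))  # 'one\r\n\r\nBeta\r\n'
```

With \s the empty line disappears from the result, while [ \t] leaves it in place.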
