Whitespace skipping breaks matches unrelated to whitespaces #253

@Hi-Angel

Description

Given the default whitespace-skipping behavior, LineStart() + Regex('word') should match both of these strings: (1) 'word' and (2) '  word' (note the leading spaces). However, (2) doesn't match. One might think this is a corner case where whitespace is significant, but no: LineStart() + Regex('\s+') does not match (2) either.
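To make the expectation concrete, here is a minimal sketch of the semantics described above, written with the standard `re` module rather than pyparsing. It is only an illustration of "anchor at line start, skip leading spaces or tabs, then match the token"; the names `LINE_START_WORD` and `expected_tokens` are hypothetical and this is not how pyparsing implements whitespace skipping.

```python
import re

# Hypothetical stand-in for LineStart() + Regex('word') with whitespace
# skipping: anchor at the start of a line, skip spaces/tabs, capture the token.
LINE_START_WORD = re.compile(r"^[ \t]*(word)", re.MULTILINE)


def expected_tokens(text):
    """Return the tokens the report expects to be found in `text`."""
    return [m.group(1) for m in LINE_START_WORD.finditer(text)]


print(expected_tokens("word"))    # string (1): no leading whitespace
print(expected_tokens("  word"))  # string (2): leading spaces are skipped
```

Under these assumptions both strings yield a non-empty list, which is the behavior the report expects from the pyparsing expression as well.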

Steps to reproduce (in terms of terminal commands)

 λ cat test2.py
from pyparsing import LineStart, Regex

r = LineStart() + Regex(r'(\s+|word)')
s = '  word'
print([r for r in r.scanString(s)])
 λ python test2.py
[]

Expected

The list of matches is not empty

Actual

The list of matches is empty

Known workarounds

Disabling whitespace skipping globally, as in the following code, makes the match succeed.

from pyparsing import LineStart, Regex, ParserElement

ParserElement.setDefaultWhitespaceChars('')
r = LineStart() + Regex(r'(\s+|word)')
s = '  word'
print([r for r in r.scanString(s)])

Version

2.4.7
