Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

regexp: document that FindReaderIndex reads past the match in some cases #21883

Closed
as opened this issue Sep 14, 2017 · 4 comments
Closed

regexp: document that FindReaderIndex reads past the match in some cases #21883

as opened this issue Sep 14, 2017 · 4 comments

Comments

@as
Copy link
Contributor

@as as commented Sep 14, 2017

What did you do?

https://play.golang.org/p/aKIosxo6aP

func main() {
	br := bytes.NewReader([]byte("abaab"))
	re := regexp.MustCompile("a")
	for i:=0;;i++{
		q := re.FindReaderIndex(br)
		fmt.Printf("step %d q=%v\n",i,q)
		if q==nil{
			break
		}
	}
}

What did you expect to see?

step 0 q=[0 1]
step 1 q=[1 2]
step 2 q=[0 1]

What did you see instead?

step 0 q=[0 1]
step 1 q=[]

I remember reading somewhere in the godoc that the readers can eat the underlying runes to an arbitrarily degree, but it's not clear from the documentation for FindReaderIndex that this is possible.

func (re *Regexp) FindReaderIndex(r io.RuneReader) (loc []int)
    FindReaderIndex returns a two-element slice of integers defining the
    location of the leftmost match of the regular expression in text read from
    the RuneReader. The match text was found in the input stream at byte offset
    loc[0] through loc[1]-1. A return value of nil indicates no match.

Workaround

I had to manually rewind the reader. This solution was not obvious at first, maybe a short addendum to the package function will clarify that it the function may skip matches.

https://play.golang.org/p/J9ubm_43wW

@ianlancetaylor ianlancetaylor added this to the Go1.10 milestone Sep 15, 2017
@ianlancetaylor ianlancetaylor changed the title regexp: not clear why FindReaderIndex loses matches on some strings regexp: document that FindReaderIndex reads past the match in some cases Sep 15, 2017
@gopherbot
Copy link

@gopherbot gopherbot commented Nov 18, 2017

Change https://golang.org/cl/78635 mentions this issue: regexp: examples for Regexp.FindAllSubmatchIndex and Regexp.FindAllSubmatchIndex methods

@bradfitz bradfitz modified the milestones: Go1.10, Go1.11 Nov 29, 2017
@ianlancetaylor ianlancetaylor modified the milestones: Go1.11, Unplanned Jun 30, 2018
@fraenkel
Copy link
Contributor

@fraenkel fraenkel commented Jun 30, 2018

Not sure there is anything that needs to be done.
The current regexp docs state the following:

Note that regular expression matches may need to examine text beyond the text returned by a match, so the methods that match text from a RuneReader may read arbitrarily far into the input before returning.

@as
Copy link
Contributor Author

@as as commented Jul 2, 2018

@fraenkel

There are two commonly used package scope functions (MatchReader, FindReaderIndex) that do not mention the behavior. It would be more forgiving if these were the methods attached to a regexp.Regexp type, but the usage pattern of godoc and go doc make it unlikely a user will see this caveat without reading the entire package documentation.

The package documentation spans over 7 paragraphs, and introduces Submatching before talking about constraints. The user will likely stop reading the documentation as they realize the subject matter is out of scope for their use-case.

It would be ideal for this caveat to be mentioned in the package scoped functions or made to be more apparent in the package documentation.

@rsc
Copy link
Contributor

@rsc rsc commented Oct 2, 2018

This is documented in the package docs.

@rsc rsc closed this Oct 2, 2018
@golang golang locked and limited conversation to collaborators Oct 2, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
6 participants
You can’t perform that action at this time.