Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fmt: confusion about Scanf patterns, newline, space #13565

Closed
rsc opened this issue Dec 10, 2015 · 13 comments
Closed

fmt: confusion about Scanf patterns, newline, space #13565

rsc opened this issue Dec 10, 2015 · 13 comments
Assignees
Milestone

Comments

@rsc
Copy link
Contributor

@rsc rsc commented Dec 10, 2015

While updating code using fmt.Scanf after CL 16165, I discovered that there were users with patterns ending in \n as a way to assert that Scanf had consumed an entire input.

The examples that follow are given as three-line stanzas printed by the program at the end of the report. The first line is the Scanf call, the second line is its return values, and the third line is the values of x and y after the call.

The case that CL 16165 changed intentionally is:

# Go 1.5
fmt.Sscanf("1,2", "%d,%d\n", &x, &y)
2 <nil>
1 2

which became:

fmt.Sscanf("1,2", "%d,%d\n", &x, &y)
2 input does not match format
1 2

But there exists code using a \n at the end of the pattern to reject matching a partial input token:

# code doesn't want this match
fmt.Sscanf("1,2,3", "%d,%d", &x, &y)
2 <nil>
1 2

# so it uses this instead
fmt.Sscanf("1,2,3", "%d,%d\n", &x, &y)
2 expected space in input to match format
1 2

The \n here is serving to reject inputs that would be accepted without it, because \n matched end-of-string in Go 1.5 and earlier. Unfortunately, in Go 1.6 the \n will reject end-of-string, so code like this needs to be adjusted. How?

It seems that changing \n to space works: space matches end-of-string still.

# space accepts end-of-string
fmt.Sscanf("1,2", "%d,%d ", &x, &y)
2 <nil>
1 2

# space still rejects non-space additional text
fmt.Sscanf("1,2,3", "%d,%d ", &x, &y)
2 expected space in input to match format
1 2

Of course, using space in this way still accepts spaces, which \n did not. So if \n was being used this way:

# Go 1.5
fmt.Sscanf("1 2", "%d %d\n", &x, &y)
2 <nil>
1 2
fmt.Sscanf("1 2 3", "%d %d\n", &x, &y)
2 newline in format does not match input
1 2

now the same code produces:

fmt.Sscanf("1 2", "%d %d\n", &x, &y)
2 input does not match format
1 2
fmt.Sscanf("1 2 3", "%d %d\n", &x, &y)
2 newline in format does not match input
1 2

and changing \n to space in the pattern does not help reject the second input:

fmt.Sscanf("1 2", "%d %d ", &x, &y)
2 <nil>
1 2
fmt.Sscanf("1 2 3", "%d %d ", &x, &y)
2 <nil>
1 2

The reason this seems to come up is that people read a line at a time from some source and then process it with fmt.Scanf with a pattern containing \n, perhaps not realizing that the \n has been stripped off by the line reader, or perhaps just taking advantage of the fact that \n served like the regexp $ in such processing.

I am not sure how people should update their code. I can't find any promise in the docs that a space in the pattern matches end-of-string, and it seems odd to me that it does. But if we don't do that, then there's no way to update some common uses. And even if we do make the promise, it doesn't help reject unmatched text beginning with space, as in the last few examples. (To be fair, \n doesn't help reject unmatched text beginning with \n, but that doesn't come up in line-at-a-time processing, as seems to be common.)

  1. Should we roll back CL 16165 and simply document that a single \n at end of pattern is allowed to match end-of-string.
  2. Or should we document that space at end of pattern matches end-of-string?
  3. Or should we make space at end of pattern not match end-of-string?

If we do (3), then there becomes no way to preserve the semantics of many existing uses, since Scanf gives no indication that it did not process the entire input string.

@rsc rsc added this to the Go1.6 milestone Dec 10, 2015
@rsc
Copy link
Contributor Author

@rsc rsc commented Dec 10, 2015

Program:

package main

import "fmt"

func main() {
    var x, y int

    x = 0
    y = 0
    fmt.Println(`fmt.Sscanf("1,2", "%d,%d ", &x, &y)`)
    fmt.Println(fmt.Sscanf("1,2", "%d,%d ", &x, &y))
    fmt.Println(x, y)

    x = 0
    y = 0
    fmt.Println(`fmt.Sscanf("1,2", "%d,%d\n", &x, &y)`)
    fmt.Println(fmt.Sscanf("1,2", "%d,%d\n", &x, &y))
    fmt.Println(x, y)

    x = 0
    y = 0
    fmt.Println(`fmt.Sscanf("1,2,3", "%d,%d", &x, &y)`)
    fmt.Println(fmt.Sscanf("1,2,3", "%d,%d", &x, &y))
    fmt.Println(x, y)

    x = 0
    y = 0
    fmt.Println(`fmt.Sscanf("1,2,3", "%d,%d\n", &x, &y)`)
    fmt.Println(fmt.Sscanf("1,2,3", "%d,%d\n", &x, &y))
    fmt.Println(x, y)

    x = 0
    y = 0
    fmt.Println(`fmt.Sscanf("1,2,3", "%d,%d ", &x, &y)`)
    fmt.Println(fmt.Sscanf("1,2,3", "%d,%d ", &x, &y))
    fmt.Println(x, y)

    x = 0
    y = 0
    fmt.Println(`fmt.Sscanf("1 2", "%d %d\n", &x, &y)`)
    fmt.Println(fmt.Sscanf("1 2", "%d %d\n", &x, &y))
    fmt.Println(x, y)

    x = 0
    y = 0
    fmt.Println(`fmt.Sscanf("1 2 3", "%d %d\n", &x, &y)`)
    fmt.Println(fmt.Sscanf("1 2 3", "%d %d\n", &x, &y))
    fmt.Println(x, y)

    x = 0
    y = 0
    fmt.Println(`fmt.Sscanf("1 2", "%d %d ", &x, &y)`)
    fmt.Println(fmt.Sscanf("1 2", "%d %d ", &x, &y))
    fmt.Println(x, y)

    x = 0
    y = 0
    fmt.Println(`fmt.Sscanf("1 2 3", "%d %d ", &x, &y)`)
    fmt.Println(fmt.Sscanf("1 2 3", "%d %d ", &x, &y))
    fmt.Println(x, y)
}
@robpike
Copy link
Contributor

@robpike robpike commented Dec 10, 2015

"Space in pattern matches end of string" is already documented, but perhaps not centrally enough:

%s and %v on strings scan a space-delimited token

Plus:

func Scanf(format string, a ...interface{}) (n int, err error)
Scanf scans text read from standard input, storing successive
space-separated values into successive arguments as determined by the
format. ...

func Scan(a ...interface{}) (n int, err error)
Scan scans text read from standard input, storing successive space-separated
values into successive arguments. ...

If you think more is needed, I think it can be made explicit in the package comment.

@rsc
Copy link
Contributor Author

@rsc rsc commented Dec 10, 2015

The text you mention all seems to me to be about the way %s and %v handle spaces in input, which is fine. It does not seem to say anything about the meaning of a space in the format pattern. What I do see is:

When scanning with Scanf, Fscanf, and Sscanf, all non-empty runs of space characters (except newline) are equivalent to a single space in both the format and the input. With that proviso, text in the format string must match the input text; scanning stops if it does not, with the return value of the function indicating the number of arguments scanned.

But that would suggest that " " in the format pattern should not match "" in the input. Yet it does.

P.S. The Scanf doc comment seems like it should drop "space-separated": scanning "%d,%d" against "1,2" doesn't have anything to do with spaces.

@robpike
Copy link
Contributor

@robpike robpike commented Dec 10, 2015

Well, I don't know how to document the scanning part of this package (or make it work, but that's another problem). To be fair, no one else seems to know how to define it either. C's scanf is well known but not documented to any level of precision. Maybe I should have taken that route and just hinted at what it does.

Suggestions welcome.

@rsc
Copy link
Contributor Author

@rsc rsc commented Dec 10, 2015

I'm happy to figure out docs if we can answer these questions:

  1. Should Sscanf("123", "%d\n") succeed?
  2. Should Sscanf("123", "%d ") succeed?
  3. If the answer to 1 is no but the answer to 2 is yes, why are they different?
  4. If the answer to 1 is no and the answer to 2 is no, how do you write a pattern scanning a single integer that succeeds on input "123" but fails on input "123 456"?
@robpike
Copy link
Contributor

@robpike robpike commented Dec 10, 2015

Based entirely on my mental model:

  1. No, because there is a newline in the format and not in the input. (true in 1.6, not in 1.5)
  2. Yes. (true in 1.6 and 1.5)
  3. Because newlines in the format must match the input.
  4. N/A.

Make sense?

I wish fmt.Scanf did not exist. Its overuse and its pain wildly outrun its value.

@rsc
Copy link
Contributor Author

@rsc rsc commented Dec 11, 2015

OK, I'll try to document that.

@rsc
Copy link
Contributor Author

@rsc rsc commented Dec 11, 2015

I sent CL 17723 with documentation of the current behavior, but I think I found a serious enough problem to warrant postponing any changes to Scanf until Go 1.7. Specifically, "X\n Y" does not match "X\n Y" (the same string). The fundamental issue (and a source of much confusion for me until I read the code) is that in the input format, any newline surrounded by zero or more space characters (that is, any section of input format matching / *\n */) corresponds to consuming / *\n/ from the input. Note the trailing space-star in the first regexp but not the second: " \n " in the format is the same as "\n"; both can consume " \n" from the input but not " \n ".

Less serious but still odd, "X %c" does not match "X \n" (reading '\n' into the %c argument) while "X%c" does match "X\n".

I think we should roll back the one newline-related change from the Go 1.6 cycle and revisit the whole big picture for Go 1.7. If we're going to break user code, we might as well do it just once.

I sent CL 17724 for the rollback of Go 1.6's newline change.

@robpike
Copy link
Contributor

@robpike robpike commented Dec 11, 2015

SGTM

@rsc rsc modified the milestones: Go1.7, Go1.6, Go1.7Early Dec 14, 2015
@gopherbot
Copy link

@gopherbot gopherbot commented Feb 18, 2016

CL https://golang.org/cl/17723 mentions this issue.

@bradfitz bradfitz modified the milestones: Go1.8Early, Go1.7Early May 5, 2016
@bradfitz
Copy link
Contributor

@bradfitz bradfitz commented May 5, 2016

Didn't happen for Go 1.7. Kicking to 1.8Early.

@rsc rsc assigned rsc and unassigned robpike Sep 26, 2016
@rsc rsc added the NeedsFix label Sep 26, 2016
@gopherbot
Copy link

@gopherbot gopherbot commented Oct 6, 2016

CL https://golang.org/cl/30610 mentions this issue.

@gopherbot
Copy link

@gopherbot gopherbot commented Oct 6, 2016

CL https://golang.org/cl/30611 mentions this issue.

gopherbot pushed a commit that referenced this issue Oct 7, 2016
There are no semantic changes here, just tests to establish
the status quo. A followup CL will make some semantic changes,
the (limited) scope of which should be clear from the number of
tests that change.

For #13565.

Change-Id: I960749cf59d4dfe39c324875bcc575096654f883
Reviewed-on: https://go-review.googlesource.com/30610
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rob Pike <r@golang.org>
@gopherbot gopherbot closed this in 0db9518 Oct 7, 2016
@golang golang locked and limited conversation to collaborators Oct 7, 2017
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
4 participants
You can’t perform that action at this time.