-
Notifications
You must be signed in to change notification settings - Fork 17.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bytes, strings: add iterator forms of existing functions #61901
Comments
Will |
Should |
Is |
This proposal has been added to the active column of the proposals project |
That relies on specialized optimizations to avoid an allocation and a copy. I'm not sure if those optimizations exist right now. The current best way to do it, as far as I know, is for i := 0; i < len(s); i++ {
c := s[i]
// ...
} I don't think that's particularly difficult, personally, but I think that |
Unless I misunderstand I think you just mean Split and Fields. And there is another misplaced reference to Runes a few lines down. |
I like this overall. Two things I wonder about:
|
Don‘t think so. And shouldn‘t there be only one way of doing things in go? |
You can pass iterators to other functions instead of just immediately using it in a range so being able to make iterators for built in things is still useful. |
ISTM that strings.Lines encourages people to write:
I'd rather see an iterator version of bufio.Scanner. |
@aarzilli I think it's pretty trivial to add an |
I wonder if it's necessary to return the index for bytes, since it's trivial to count them yourself. I guess it's nice to mirror the for/range statement.
Should this return the start index of the string too?
Should we have SplitAfterNSeq and SplitNSeq too? If not, why? These are the only ones left out.
If you need to know the length, the existing functions can handle that case. |
Join would be useful to have. For example, just now I wanted to make a String() method for a named slice. The easy way to do it would be |
Finishing this proposal discussion is blocked on #61405. |
Change https://go.dev/cl/558735 mentions this issue: |
Have all remaining concerns about this proposal been addressed? The proposal is to add these to both bytes and strings:
|
I would not expect them to include the terminating newlines if I didn't read the documentation. |
Good reason to read the documentation! 😄 If the newlines are not included, then the user of the iterator cannot distinguish Lines("abc") from Lines("abc\n"). It is often important to know whether a file has a non-terminated final line. |
If the 'last' line does end in a newline, does Edit: And if it does yield that last line, how is |
I know there are cases where it matters if the file ends with a newline or not but I don't think I have ever had to deal with that in any code I have ever written. |
I think the reason to keep the \n is then you don't have to document what happens to \r\n. |
It does not yield a final "". |
@rsc In my experience, I almost never care if a file ends in a newline or not and I'd want all newlines to be stripped off, so I can easily imagine having to use Other than this concern of mine, this proposal LGTM. 👍 |
Based on the discussion above, this proposal seems like a likely accept. The proposal is to add these to both bytes and strings:
|
@rsc There is some percentage of usages of such a mechanism that require the newline. If it were majority newline or even 50/50 this would be a fine API. I don't know what the actual percentages are for needing \n but I would guess it's much less than 50%. I wouldn't mind if there were a pair of iterators for newline and sans-newline or if there's just the |
As existing precedence, the Personally, I've wanted newlines excluded a majority of the time, but have wanted it occasionally.
It seems to me that if I wanted to include newlines, I should just use |
I think the choice matters mostly because we don't have #61898 (or anything like it). It's easy to do a Which, to me, also seems to imply that @rsc is correct at least in the long-run. At some point, you can write |
|
I said before that returning the \n means you don't have to document the \r\n case, but now I lean to the side that Lines should be like bufio.Scanner and strip \r\n and not return a final empty "". People who want the endings and the final blank can use
|
If @earthboundkid's suggestion isn't palatable, then I still believe that a newline-stripping alternative should be documented in |
No change in consensus, so accepted. 🎉 The proposal is to add these to both bytes and strings:
|
What's the accepted version of Lines? |
And at the risk of sounding like a broken record, has my suggestion to document a newline-stripping variant of |
Whats accepted is explicitly documented in the acceptance comment: #61901 (comment)
|
There was a lot of discussion in between the final comment period starting and the actual final decision, so I think it's good to be explicit about the ruling on that. |
Just a minor thought on the docs for
If
Because in that case there are no newline-terminated lines in the string Similarly the last sentence does not quite sit right for me:
Because if it doesn't end in a newline, it surely falls outside the definition of "an iterator over the newline-terminated lines". All that said, I haven't managed to conjure up anything better! Because capturing the edge cases in pithy documentation here seems particularly hard. |
The accepted version of Lines is what is in the accept message: the newlines are included in the lines, for the reasons given above. In essentially all cases I can think of, the newline chopping is easily done when parsing the line inside the loop over the lines. As others have noted, it is easy to do s = strings.TrimSuffix(s, "\n"). @dsnet mentioned bufio.Reader.ReadLine, but that doc comment says "ReadLine is a low-level line-reading primitive. Most callers should use Reader.ReadBytes('\n') or Reader.ReadString('\n') instead or use a Scanner." ReadBytes and ReadString do include the delimiter, while Scanner by default does not. The standard library is as split as this discussion, but the basic primitives ReadBytes and ReadString preserve the \n, while the higher-level, more configurable API defaults to removing them but is easily reconfigured (with a different split function). My general rule for parsing functionality is to avoid throwing away information, which is the justification for keeping the \n. |
Change https://go.dev/cl/587095 mentions this issue: |
We propose to add the following functions to package bytes and package strings, to allow callers to iterate over these results without having to allocate the entire result slice. This text shows only the string package form.
This is one of a collection of proposals updating the standard library for the new 'range over function' feature (#61405). It would only be accepted if that proposal is accepted. See #61897 for a list of related proposals.
Iterating over lines is an incredibly common operation that we’ve resisted adding only because we didn’t want to encourage allocation of a potentially large slice. Iterators provide a way to finally add it.
Iterating over bytes in a string is common and too difficult, since range ranges over runes. This function will inline to the obvious for loop (because we will make sure it does):
Iterating over runes is served by a regular range loop, but like slices.All and maps.All, it could be useful as an input to other iterator adapters. The name is Runes, not Seq or All, so that its clear at call sites what is being iterated over (runes not bytes).
Similar to Lines, there should be iterator forms of Split, Fields, and Runes, to avoid requiring the allocation of a slice when the caller only wants to iterate over the individual results. If we were writing the library from scratch, we might use the names Split, Fields, and Runes for the iterator-returning versions, and code that wanted the full slice could use slices.Collect. But that's not an option here, so we add a distinguishing Seq suffix. We do not expect that new functions will use the Seq suffix. For example the function above is Lines, not LinesSeq.
The text was updated successfully, but these errors were encountered: