Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement String::drain(range) according to RFC 574 #25028

Merged
merged 2 commits into from May 2, 2015

Conversation

bluss
Copy link
Member

@bluss bluss commented May 1, 2015

collections: Implement String::drain(range) according to RFC 574

.drain(range) is unstable and under feature(collections_drain).

This adds a safe way to remove any range of a String as efficiently as
possible.

As noted in the code, this drain iterator has none of the memory safety
issues of the vector version.

RFC tracking issue is #23055

@rust-highfive
Copy link
Collaborator

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @brson (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. The way Github handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

Please see CONTRIBUTING.md for more information.

@bluss
Copy link
Member Author

bluss commented May 1, 2015

Previous vec drain PR is #24781.

NOTE: The iterator is really the side show. Iterating a slice out of the string can be done more flexibly with just foo[a..b] and applying the string iterator you want.

The lead role here is the ability to remove a range from anywhere in the string. So this gives String, (just like just given to Vec), magic memmove (a.k.a. ptr::copy) capabilities.

This is needed in the parser itself at a FIXME marked issue #12884 in src/libsyntax/codemap.rs, where we can avoid reallocating a whole file's source code.

@bluss bluss changed the title collections: Implement String::drain(range) according to RFC 574 Implement String::drain(range) according to RFC 574 May 1, 2015
@bluss
Copy link
Member Author

bluss commented May 1, 2015

I added a change to libsyntax to put it into use immediately. Not a big thing, but it clears up a FIXME and avoided some other already avoidable string copying.

@Gankra
Copy link
Contributor

Gankra commented May 1, 2015

Hrm, I wish this composed on Vec::drain better. You could just call Vec::drain(start..end); in the dtor, right? Is there any particularly nasty downside to this? Redoing bounds checks?

@Gankra
Copy link
Contributor

Gankra commented May 1, 2015

(otherwise this basically LGTM)

@bluss
Copy link
Member Author

bluss commented May 1, 2015

I considered that but I didn't want to take the chance that the iteration in the Vec::drain Drop wouldn't be compiled out in the u8 case. Maybe it will optimize well.

@bluss
Copy link
Member Author

bluss commented May 1, 2015

vec::drain(..) (for u8) optimizes to just set length 0, that's glorious.

This is the asm think it looks good enough? Panic code goes away only if we repeat the bounds checks

#![feature(collections_drain)]
#![crate_type="lib"]

#[inline(never)]
pub fn remove_range(v: &mut Vec<u8>, a: usize, b: usize) {
    if a <= b && b <= v.len() {
        v.drain(a..b);
    }
}
    .section    .text._ZN12remove_range20h1888d273e1a01667eaaE,"ax",@progbits
    .globl  _ZN12remove_range20h1888d273e1a01667eaaE
    .align  16, 0x90
    .type   _ZN12remove_range20h1888d273e1a01667eaaE,@function
_ZN12remove_range20h1888d273e1a01667eaaE:
    .cfi_startproc
    cmpq    %fs:112, %rsp
    ja  .LBB0_2
    movabsq $24, %r10
    movabsq $0, %r11
    callq   __morestack
    retq
.LBB0_2:
    pushq   %r15
.Ltmp0:
    .cfi_def_cfa_offset 16
    pushq   %r14
.Ltmp1:
    .cfi_def_cfa_offset 24
    pushq   %rbx
.Ltmp2:
    .cfi_def_cfa_offset 32
.Ltmp3:
    .cfi_offset %rbx, -32
.Ltmp4:
    .cfi_offset %r14, -24
.Ltmp5:
    .cfi_offset %r15, -16
    movq    %rsi, %rbx
    movq    %rdi, %r14
    cmpq    %rdx, %rbx
    ja  .LBB0_6
    movq    8(%r14), %r15
    subq    %rdx, %r15
    jb  .LBB0_6
    movq    %rbx, 8(%r14)
    je  .LBB0_6
    movq    (%r14), %rdi
    addq    %rdi, %rdx
    addq    %rbx, %rdi
    movq    %rdx, %rsi
    movq    %r15, %rdx
    callq   memmove@PLT
    addq    %rbx, %r15
    movq    %r15, 8(%r14)
.LBB0_6:
    popq    %rbx
    popq    %r14
    popq    %r15
    retq
.Ltmp6:
    .size   _ZN12remove_range20h1888d273e1a01667eaaE, .Ltmp6-_ZN12remove_range20h1888d273e1a01667eaaE
    .cfi_endproc

@Gankra
Copy link
Contributor

Gankra commented May 1, 2015

I have been told that this looks good enough for now, at least.

}

#[unstable(feature = "collections_drain", reason = "recently added")]
impl<'a> ExactSizeIterator for Drain<'a> { }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like the underlying Chars iterator itself isn't an exact size iterator, so we may not be able to provide this implementation just yet :(

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I'll remove that, I just didn't think that through.

@alexcrichton
Copy link
Member

The lead role here is the ability to remove a range from anywhere in the string. So this gives String, (just like just given to Vec), magic memmove (a.k.a. ptr::copy) capabilities.

Currently this isn't quite as 0-cost as Vec::drain as it still runs through the whole decoding step for UTF-8. We've got the choice of the iterator returning u8 (like bytes()) or char, or perhaps even both! If the main goal here though is to just provide the ability to remove a chunk of memory from a string then perhaps a byte-oriented iterator would be better?

I also believe that our naming conventions for strings would call this method drain_chars instead of just plain drain to ensure the element is in the name. I'm not 100% sure on this because the "collection-like" methods like pop and push implicitly use char, so this is arguably a "collection-like" method which uses char implicitly by default. On the other hand strings give no iterators that are possibly ambiguous!

@bluss
Copy link
Member Author

bluss commented May 1, 2015

Running the iterator is optional, not needed for removing the range. (See drop).

I would suggest dropping the iterator part and just naming it remove_range, but the RFC doesn't allow for that. As I said previously above, there are many iterator choices for strings and it's simple to do them just before you use remove_range.

@Gankra
Copy link
Contributor

Gankra commented May 1, 2015

The API is unstable, so I'm happy to land drain now for symmetry and see how people use it first.

Edit: this is in reply to remove_range, not whether it should be _chars or byte-oriented.

@bluss
Copy link
Member Author

bluss commented May 1, 2015

I expect we will write another RFC to straighten out the use of ranges and to add drain to other collections

@Gankra
Copy link
Contributor

Gankra commented May 1, 2015

One advantage implementation-wise of byte-oriented is that this just becomes a pure forwarding of Vec::drain. On the other hand that might make it less useful (just get the inner Vec)?

@bluss
Copy link
Member Author

bluss commented May 1, 2015

I implemented another toy thing for String first, a remove_range that returns a value that does the memmove on drop, just like this iterator. But instead of returning an iterator, it just returns something that derefs to &str. You can use it to iterate with any of the &str iterators instead of just char, but it's too complicated to explain and it's cuter than useful.

@alexcrichton
Copy link
Member

Running the iterator is optional, not needed for removing the range. (See drop).

Right, of course!

The API is unstable, so I'm happy to land drain now for symmetry and see how people use it first.

Yes I agree, I don't want to hold it up too much on this point, mostly just thinking out loud.

@bluss
Copy link
Member Author

bluss commented May 1, 2015

If you have a range:

let r = a..b;

Just use that range to both iterate with arbitrary iterator and then remove the range?

let computation = foo[r].bytes().all( /* etc */);
foo.drain(r);

@bluss
Copy link
Member Author

bluss commented May 1, 2015

Thank you both. It always improves a lot with review.

@alexcrichton
Copy link
Member

@bors: r+ 490ea42

Thanks!

@alexcrichton alexcrichton assigned alexcrichton and unassigned brson May 1, 2015
Ulrik Sverdrup added 2 commits May 1, 2015 19:51
`.drain(range)` is unstable and under feature(collections_drain).

This adds a safe way to remove any range of a String as efficiently as
possible.

As noted in the code, this drain iterator has none of the memory safety
issues of the vector version.

RFC tracking issue is rust-lang#23055
Avoid creating a new String when there is no BOM to strip, and
otherwises use .drain(..3) to strip the BOM using the same allocation.
@bluss
Copy link
Member Author

bluss commented May 1, 2015

Pushed a squash of the review fix into the first patch. It was needed to keep it tidy. No changes, just the squash.

@alexcrichton
Copy link
Member

@bors: r+ da03c9d

@bors
Copy link
Contributor

bors commented May 2, 2015

⌛ Testing commit da03c9d with merge 700b4c1...

bors added a commit that referenced this pull request May 2, 2015
collections: Implement String::drain(range) according to RFC 574

`.drain(range)` is unstable and under feature(collections_drain).

This adds a safe way to remove any range of a String as efficiently as
possible.

As noted in the code, this drain iterator has none of the memory safety
issues of the vector version.

RFC tracking issue is #23055
@bors bors merged commit da03c9d into rust-lang:master May 2, 2015
@bluss bluss deleted the drain-string branch May 2, 2015 09:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants