Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

String#clear keep the string capacity #4373

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

casperisfine
Copy link
Contributor

[Feature #17790] https://bugs.ruby-lang.org/issues/17790

This is useful when trying to efficiently re-use a buffer, and is consistent with Array#clear.

[Feature #17790]

This is useful when trying to efficently re-use a buffer.
@dylanahsmith
Copy link
Contributor

is consistent with Array#clear

It doesn't look like Array#clear does keep the capacity (

ruby/array.c

Lines 4554 to 4556 in ff43ecc

if (ARY_DEFAULT_SIZE * 2 < ARY_CAPA(ary)) {
ary_resize_capa(ary, ARY_DEFAULT_SIZE * 2);
}
)

@casperisfine
Copy link
Contributor Author

It doesn't look like Array#clear does keep the capacity

Interesting, I actually didn't think to check the implementation. So apparently Array#clear shink the capacity of if it's larger than 32. Hence why my experimentation suggested the capacity was preserved.

@ioquatix
Copy link
Member

ioquatix commented Apr 9, 2021

This is an interesting issue.

Given that users can choose between:

thing.clear

thing = thing.class.new

I believe it's reasonable that thing.clear retains any existing buffers/allocations.

This would be consistent with the typical use cases - buffers, etc, where the string/array/thing would be reused with the similar amount of data.

The only case where it might not be realistic is when thing.clear is used to immediately release memory.

@casperisfine
Copy link
Contributor Author

The only case where it might not be realistic is when thing.clear is used to immediately release memory.

Yes, that's what I'm worried of.

Also the idea is to be able to empty the string (or array or hash for that matter) while still retaining its capacity to reuse it.

But I agree with some of the concerns raised. To do this safely you might often need to be able to query that capacity to decide wether you'd rather shrink it or not.

So maybe what would be needed would be String#capacity and String#capacity=.

@ioquatix
Copy link
Member

ioquatix commented Apr 9, 2021

My feeling is 99% of these problems happen because we try to use strings for binary buffers.

This problem has been going around in circles for a long time (at least several years).

Honestly, not sure what the solution is - my last attempt to solve this was to treat strings with binary encoding differently (assumption that it's a data buffer). The implication in this issue would be that if the string was binary encoded, string.clear would not resize the buffer.

If you wanted to be more generic:

string.clear(free: true/false)

That being said, I can imagine tons of use cases where string.clear would benefit from not releasing memory. Maybe we need a benchmark?

@dylanahsmith
Copy link
Contributor

I don't see why the encoding matters, since non-binary encoded strings could still be used as a buffer with mutable strings.

free: should not be used as a parameter name for changing the capacity, since with a variable width object allocation implementation we could store the string as part of the same allocation, in which case it would be reallocating. The concept of shrinking the capacity seems more appropriate by making less of an implication about the internals.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
4 participants