Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make an RFC for immutability and freshness in the standard library #22

Open
LiberalArtist opened this issue Jul 17, 2019 · 10 comments
Open

Comments

@LiberalArtist
Copy link

Racket 2 presents a good opportunity to revisit which standard library functions return mutable values and which are guaranteed to return freshly-allocated values.

Spoiler: I think more things should be immutable and freshness should rarely be guaranteed.

My particular pet issue is strings. I don't think I have ever mutated a string except to demonstrate bugs caused by the mutability of strings. I strongly suspect most other Racketeers also reason about strings as though they were immutable. Yet almost every standard library function returns mutable strings: for example, (immutable? (string-append "a" "")) and even (immutable? (string-append "" "")) both return #false. I think that should change. My primary motivation is the ease of reasoning that immutable values provide, but the change might also expose additional optimization opportunities.

The question applies to other datatypes, as well. I suspect bytes may be mutated much more often than character strings, but I would still like better support for working with immutable bytes. For example, path->bytes could return immutable bytes, or else a path->immutable-bytes could be added. Maybe bytes-append should return immutable bytes, or there should be an immutable-bytes-append, or the mutability of the result should depend on the mutability of the arguments (though I personally think that would be confusing, especially in a dynamically-typed context). Basically the same trade-offs exist for vectors.

@jackfirth
Copy link
Sponsor Collaborator

Strongly agree. I think we should do what we did for lists and split all mutable versions of the existing types into separate entirely modules. Like if you want to use mutable strings, you should have to use a mutable-string? type and mutable-string-* functions.

@sorawee
Copy link
Contributor

sorawee commented Jul 17, 2019

I think we should do what we did for lists and split all mutable versions of the existing types into separate entirely modules

I'm not sure if I like this idea because it really clutters modules with identifiers. Considering hash map as an example. We right now have |{weak, mutable, immutable}| x |{eq, equal, eqv}| = 9 variants, which is absurd IMO. For constructors, I think there should be only one. The variant should be specified by keyword argument.

But things do get more complicated when it comes to a function that produces a new object, like string-append. For hash map, there's not much problem[*] because there are normal variant and !-variant, where !-variant always mutate the object in-place. string-append is weird because string-append! doesn't make sense (unless we are talking about C's strcat)...

Also, we have a proposal that things should be more generic (#19). This proposal on the other hand suggests that things should be splitted even more. These two goals look conflicting.

[*]: actually there's one: I always wish for/hash to have an option to return a mutable hash.

@rocketnia
Copy link

I think the places mutable strings and mutable bytes matter most are where people have taken it upon themselves to optimize their code, at which point generic operations are less desirable because they're not as optimized as the specialized operations are. Conflict between the goals is natural in this case, and I'm optimistic that everything can have a place.

There are sort of three layers: Libraries that define the most efficient operations for a data structure (some of which probably should do whatever they need for maximum efficiency, but the wilder ones should probably have longer names), libraries that define operations that almost surely will never be efficient or recommended (so are shunned from the main library) but are still too convenient to completely abandon, and generic operations. The generic operations have to be exposed by the first (efficient) library because that's where the data type is exposed, but they'll often be inefficient guilty pleasures like the operations in the second library. Even worse, sometimes they won't even be particularly convenient or meaningful, just there because they fill a gap in the Expression Problem tableau. The sheer comprehensiveness of the tableau can be very convenient even if not all the entries are star performers.

@dbenoit17
Copy link

dbenoit17 commented Jul 17, 2019

I think genericness may not be mutually exclusive with expressiveness. Consider Rust, which has the mut keyword as a modifier property of types. In such a schema, mutable-string is equivalent to mut String, where String is the base type and the mutability property is a modifier. By making mutability a special property of types, Rust can also perform checks at compile time and emit warnings when mutability is unnecessary in a given circumstance.

In cases where one must check for a mutable string type, mutable-string? is really a combination of the contracts (and/c string? mutable?). Identifiers like mutable-string? are in a sense just built-in sugar over the contract combinator. If identifier cluttering is a concern, perhaps a solution is to use contract-call forms which could be more generic than built-in identifiers.

(define-syntax-rule (and/c? [contract ...] predicate)
  ((and/c contract ...) predicate))

(if (and/c? [string? mutable?] "immutable string")
    true-expr
    false-expr)

@dbenoit17
Copy link

@rocketnia I think the concern over performance of generics is well-founded, and is something we should discuss in the generics RFC. Performance impact may or may not be possible to avoid, depending on which dispatching operations can be done at compile time.

@gus-massa
Copy link

My particular pet issue is strings. I don't think I have ever mutated a string except to demonstrate bugs caused by the mutability of strings.

I agree.

I like the idea of making most (all) functions return an immutable version, and the immutable thing may be not fresh. I guess this include 99% of my use of string and bytes. But vectors are more prone to be mutated. It is also optimizer friendly, but I guess it would be useful to add some cases to reduce (string-copy (string-copy x)).

I'd like to drop the immutable? function because it has a weird idea of what is immutable. Let's split it in string-immutable?, ... (or string-mutable?, ...).

Nevertheless, I like to maintain a immutable version of "" and a mutable version of "". I'm never sure if this is a good idea, but I like it.

@rmculpepper
Copy link

I like the idea of making more things immutable by default and guaranteeing freshness in fewer places.

I don't like the idea of splitting existing primitive operations like string-ref into string-ref and mutable-string-ref (namespace pollution). For constructors, though, it might make more sense to have different names.

I would propose keeping immutable? and adding generic mutable and immutable procedures that convert (shallowly!) between immutable and mutable kinds of data types. For example, to join two strings into a mutable string: (mutable (string-append str1 str2)). If most operations return immutable values, then uses of immutable should be rare---probably mostly at the end of code that uses mutation to construct a value. If mutation is rare, then mutable should be fairly rare. And presumably the compiler can watch for such combinations and eliminate copies in some cases.

Similarly, fresh could return a value equal (and of the same mutability) but not eq to its argument.

@spdegabrielle
Copy link
Sponsor Member

#37 closed because covered by #22

Drop more guarantees about new object allocations: for example, allow (append lst '()) to return lst, and the same for other list, string, bytes, vector functions (drop 0 items from the end of a list. substring and regexp-replace that don't change a string, etc).
@elibarzilay 3 Oct 2012

@rocketnia
Copy link

As long as this is a topic about generic relationships between mutable and immutable types, I want to point out #28 as an example of how a single immutable type (immutable vectors) can naturally correspond to more than one distinct mutable type (fixed-length vectors and resizable vectors). So while there may be a straightforward generic operation that converts mutable types to immutable ones, any operation that converts the other way will likely make at least a few non-obvious choices.

@LiberalArtist
Copy link
Author

I don't think I have ever mutated a string except to demonstrate bugs caused by the mutability of strings.

Here's one of those "bugs caused by the mutability of strings:" racket/racket#2179

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

8 participants