Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Combine trait for string and collection concatenation #203

Closed
wants to merge 8 commits into from

Conversation

apoelstra
Copy link
Contributor

@bgamari
Copy link

bgamari commented Aug 17, 2014

I think the move away from + for string concatenation would be a good thing. + as an arithmetic operator is commutative and this feels like an important enough property that we should distinguish it from a potentially non-commutative semigroup operation.

@pnkfelix
Copy link
Member

@apoelstra IMO this could very well have gone up on discourse first. (I saw your comment on rust-lang/rust#16541 that mentioned potentially doing that.) My opinion is that draft RFC's can go through rounds of improvement on discourse first. But then again, perhaps you are more eager to see a quick yay/nay rather than going through discourse-based bike-shedding before coming here.


As for the operator itself: Have you considered that going forward we probably want to encourage use of AddAssign (+=) rather than Add (+) for building up strings? Do you think that we will need to put in a ++= operator?

@CloudiDust
Copy link
Contributor

I think we should go with Accumulate or Accum as the trait name, as it is both generic and to the point.

@pnkfelix, yes I do believe we need AccumAssign and ++=.

@japaric
Copy link
Member

japaric commented Aug 17, 2014

we probably want to encourage use of AddAssign (+=) rather than Add (+) for building up strings?

👍 I've suggested this before as well.


@apoelstra What would be the signature of the compose method? If it's going to take &self, then it's going to be as inefficient as the current Add implementation for Strings, because it would make an implicit clone of the lhs String. In that case I would prefer not to add this ++ sugar, and encourage += instead.

@CloudiDust
Copy link
Contributor

According to the wikipedia article on Monoids, especially the part on "Monoids in computer science", CS people tend to utilize monoids with "accumulating" or "folding" operations.

And string concatenations and function compositions are "accumulating" operations.

We have other uses for the name Fold, and Accum fits ++ better.

The problem is, does "Accumulate" imply the involvement of more than two elements?

Sorry that I am on my phone and cannot edit the previous post.

@CloudiDust
Copy link
Contributor

@japaric, or ++=?

@japaric
Copy link
Member

japaric commented Aug 17, 2014

@japaric, or ++=?

I'm in favor of either += or ++= if (the current) + for String gets deprecated. But, IMO, adding a new lang item (++=) just for Strings is overkill.

@CloudiDust
Copy link
Contributor

@japaric, I think ++= can be implemented on other collections, like vectors, or linked lists, or even hash maps.

And a += b should mean a = a + b (barring possible performance differences), so if we reuse += for strings, we would still be violating the commutative law that + should abide.

@apoelstra
Copy link
Contributor Author

@japaric ++= is not "just for strings", it is applicable everywhere that ++ is. As @CloudiDust points out, this includes a lot of collection types. Using += for something that is not x = x + y in a language that has + and = is an inconsistent interface.

@pnkfelix Yes, SOP for discourse is endless bikeshedding on such things as ++ vs ., arguments about other languages' syntax having nonabelian +, nontransitive ==, etc., and "nobody minds", complaints that Compose is a bad term, it should be more foreboding/less forboding, etc., etc.

This RFC came out of several IRC discussions with people familiar with Rust and the problem space, and has already undergone revisions because of it. For example I wanted Compose to be a monoid trait with an empty method, but somebody pointed out that this way Compose can be semigroup while Compose+Default is monoid. (Plus it was not expected that a monoid trait would be accepted, since that is just too Haskellish for Rust.)

@bluss
Copy link
Member

bluss commented Aug 17, 2014

The RFC should mention using ++ for Vec as well.

@japaric
Copy link
Member

japaric commented Aug 17, 2014

@japaric ++= is not "just for strings", it is applicable everywhere that ++ is.

The current RFC only proposes ++ for strings. ++ being usable for other collections is going to depend on the signature of the compose method, which is not included in the current RFC. If you have more ideas that would increase the usefulness of the Compose trait, you should update your RFC.

@apoelstra
Copy link
Contributor Author

Oh! FWIW the signature of ++ should be identical to that of +, but you're right that that belongs in the RFC itself. Edit: I've updated.

Edit2: Thanks for the suggestion @bluss. Updated RFC.

`Compose` should also be used for function compositions, at least for single-argument
functions `T->T`. How would this interact with our current/future coherence rules?

Where else should `Compose` be used?
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Compose could be used on an Iterator, as a synonym of the chain method. It could also be used on GenericPath, performing a join.

@SiegeLord
Copy link

I am very strongly against any requirement of commutativity on Add, since it breaks expression templates.

@bgamari
Copy link

bgamari commented Aug 17, 2014

Andrew Poelstra notifications@github.com writes:

This RFC came out of several IRC discussions with people familiar with
Rust and the problem space, and has undergone changes because of
it. For example I wanted Compose to be a monoid trait with an
empty method, but somebody pointed out that this way Compose can
be semigroup while Compose+Default is monoid.

This is a nice point and Haskell has shown that separating Semigroup and
Monoid indeed has value. If this is the direction taken will there be formal
laws on the instances (e.g. a note in the documentation indicating that
instances should satisfy a ++ default() == a, etc.)?

@bgamari
Copy link

bgamari commented Aug 17, 2014

SiegeLord notifications@github.com writes:

I am very strongly against any requirement of commutativity on Add,
since it breaks expression templates.

Could you be a bit more explicit here? I'm probably just being thick but
I'm not seeing the problem here.

@CloudiDust
Copy link
Contributor

@apoelstra, I think all collection types that implement Extendable are eligible for ++/++= implementations.

@huonw
Copy link
Member

huonw commented Aug 18, 2014

It depends exactly what we define commutativity to mean. For expression templates, the memory of a + b and b + a are different but they represent the same thing (i.e. (a + b).evaluate() and (b + a).evaluate() are equal).

@Gankra
Copy link
Contributor

Gankra commented Aug 18, 2014

+1 to the general idea. I like having the separation just for a intuitive scanning of code. + is a merge, and is "usually cheap", while ++ is concatenate, and is "usually expensive".

@SiegeLord
Copy link

Yes, what huonw said.

One more issue I thought of today is types declared outside of libcore. They cannot implement an operation that uses a built-in type in the LHS. Is an operator commutative if one of the commutations cannot be written down? E.g. consider matrix + scalar operator overload. It's perfectly natural to have it, but the reverse cannot be written down due to how operators are implemented in Rust.

@bgamari
Copy link

bgamari commented Aug 18, 2014

@SiegeLord a fair point.

@apoelstra
Copy link
Contributor Author

@SiegeLord Yes, I would consider an operation to be commutative even if one of the commutations can't be written down. This throws a wrench in my dreams of having unit tests for commutativity, but those were already broken by the fact that you can define a + b but not b + a if a and b are different types, and you can't have mutually-dependent traits. (These sorts of issues are easily solvable by a trait-testing RFC by the way -- Add<B for A> and Add<A for B> would both be required for the commutativity unit test, and expression templates would need a way to opt out of most every unit test since expressions naturally don't satisfy the properties of the operations they represent.)

The main reason is that I consider "one of the commutations can't be written down" to be a bug in the current trait definitions, which would be fixed by multiple dispatch. The two type parameters to (most) binary operators should be treated on equal footing.

@reem
Copy link

reem commented Aug 18, 2014

I am strongly in favor of this RFC. Rust has, should absolutely be implementing the best traits defined in languages like Haskell that abstract massively popular patterns. ++ could be use for the vast majority of collections and could go a long way towards making Rust code surrounding those collections much cleaner.

@apoelstra
Copy link
Contributor Author

@CloudiDust Good call on Extendable -- in fact, I think every collection should implement this, and I'll update the RFC to say so.

@CloudiDust
Copy link
Contributor

@liigo, we are not creating another operator, ++ is what Haskell uses. The reason there are bad operator overloadings is that, the overloadings violate assumptions about the operators. And concatenation violates assumptions about +, for the same reason, ./../&/|| are all not good choices.

EDIT: Grammar.

@CloudiDust
Copy link
Contributor

@liigo, also, just because something is widespread in mainstream languages, doesn't automatically mean it is right, or Rust should adopt it. Unconstrained null is such a thing. And IMHO + for concatenations is another.

@CloudiDust
Copy link
Contributor

@liigo I think a reason against ++ would be that, C/C++/Java programmers already have their assumptions about ++.

However their ++ is unary, while Rust's will be binary.

@liigo
Copy link
Contributor

liigo commented Sep 15, 2014

@CloudiDust My last comment focused on "common languages", not ++ or +. (I'm not against ++)

@apoelstra
Copy link
Contributor Author

@liigo "common language" means ordinary English. In English, "addition" refers to a commutative operator.

As for "you can abuse ++ just like +, so why add another operator?" Currently we are abusing +. I propose that we should stop. The addition of ++ is because when I looked into the reason that we have + for strings today, I found that apparently there was a belief that there was demand for a string concatenation operator. (The conspicuous absense of anybody actually using + for strings in Rust argues against that, but it's not an argument I'm interested in having.) And when I suggested ++ on IRC, people pointed out that in fact, it would be applicable to arbitrary collections, paths, iterators (except not really as @huonw pointed out), etc.

I am not proposing that we abuse ++, so worries about this abuse are out of scope of the RFC.

@liigo
Copy link
Contributor

liigo commented Sep 15, 2014

@apoelstra Yes I see. thanks for you explanation.

@CloudiDust
Copy link
Contributor

@liigo, I was just pointing out a possible reason that was against ++, but not necessary against "using a different operator for concatenation", because earlier I voiced my reason against . /.. / & / ||, so I thought some balance was needed. ;)

Users who want an abelian group can then use `Add+Zero+Neg` (or `Add+Zero+Sub`,
this ambiguity should probably be addressed in a later RFC related to fixing the
numeric traits once we have associated items); users who want an arbitrary group
can use `Mul+One+Div`; users who want a monoid can use `Combine+Default`, etc.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if I want to write a generic method that works on all groups? According to this I would write

fn f<T: Mul+One+Div>(...

Naturally this algorithm also works for all abelian groups, but how does the compiler know that? The biggest argument for this RFC seems to be that abstract algorithms can make certain assumptions, but does this scheme even support abstract algorithms?

@mahkoh
Copy link
Contributor

mahkoh commented Sep 15, 2014

I'd find it a bit more natural to remove the Add, Mul, One, and Zero traits completely and to replace them by a set of other traits where the behavior is already encoded in the name.

  • Magma defines the * operator and makes no guarantees.
  • CommutativeMagma defines the + operator and guarantees commutativity.
  • Semigroup as Magma but with associativity.
  • CommutativeSemigroup as CommutativeMagma but with associativity.
  • Monoid as Semigroup but with the Monoid::one() function.
  • CommutativeMonoid as CommutativeSemigroup but with CommutativeMonoid::zero()
  • Group as Monoid but with the / operator.
  • CommutativeGroup as CommutativeMonoid but with the - infix and prefix operator.
  • PseudoRing defines the + and the * operator with the usual guarantees.
  • Ring as above with a multiplicative identity.
  • CommutativeRing ...
  • Field

One example:

trait Field {
    fn zero() -> Self;
    fn one() -> Self;

    fn add(self, other: &Self) -> Self;
    fn sub(self, other: &Self) -> Self;

    fn neg(self) -> Self {
        Field::zero().sub(&self)
    }

    fn mul(self, other: &Self) -> Self;
    fn div(self, other: &Self) -> Self;

    fn inv(self) -> Self {
        Field::one().div(&self)
    }
}

As you can see, the functions take self for more efficiency. Then s *= v and friends can be rewritten as s = s * v.They are also not generic over the second and last parameter. This seems like a problem at first but @thestinger has a PR open that allows to go from &str to &String cheaply.

Then comes the compiler magic: Every type can implement at most one of these traits and the compiler knows, e.g., that every monoid is also a magma.

I'm not yet sure if there's really a good story here. The biggest problem is that i8 and friends don't fit into this scheme. What you really want is a EuclideanRing, but 128u8 * 2 = 0 and therefore we have don't have one. There really would have to be a completely new algebraic structure that encodes the behavior of these integers. Another problem is that I'm not sure how you would pass a Ring to a function that takes a Monoid, given that Ring has two operations but Monoid only one.

PS: String would implement Monoid.

@Gankra
Copy link
Contributor

Gankra commented Sep 15, 2014

Please god don't add group-theoretic categories to Rust's standard library. Monoid/Monad/Functor has caused untold educational pain in the Haskell world. If we have to have them, they should at least have decent names that relate to what they actually are.

@thestinger
Copy link

If operating overloading is intended to be abused for creating an arbitrary DSL, then it shouldn't be tied to traits. Traits without guarantees aren't useful, because it's not possible to write correct generic code with them. I think it should be tied to traits and adding better support for creating a DSL is a problem for language improvements to deal with. The ability to define custom operators or infix call sugar would alleviate the need to abuse the basic operators. The == and != operators are already expected to meet basic requirements and I think that should be extended to the arithmetic operators.

I don't think new operators should be added as special cases. Please just propose a general solution rather than adding more and more built-in operators.

@mahkoh
Copy link
Contributor

mahkoh commented Sep 15, 2014

Monad/Functor has caused untold educational pain in the Haskell world.

Don't worry, those really have nothing to do with what I've described above.

@SiegeLord
Copy link

I think it should be tied to traits and adding better support for creating a DSL is a problem for language improvements to deal with. The ability to define custom operators or infix call sugar would alleviate the need to abuse the basic operators.

I really don't think operator abuse is that clear cut. E.g. it might be useful to have a trait which adds an overload for * for matrix multiplication which is quite different from scalar multiplication. If you use the current system with 'one operator, one trait' equivalence, then you either have to make Mul not be commutative (which might be a useful thing to assert) or make it commutative and preclude useful mathematical libraries from using it. Maybe you'd classify matrix multiplication as operator abuse, but it seems very draconian.

So it seems obvious to me that it's better to divorse operator overloading from traits (i.e. do what @mahkoh suggested except perhaps with less jargon and compiler magic). That way we can have two traits, commutative Mul with clear semantics, and NonCommutativeMul with clear semantics. Both useful for generic code, and both using the nice syntax sugar. I don't find the fact that the sugar is the same while semantics are different any more offensive than unrelated types having the same method name.

@apoelstra
Copy link
Contributor Author

@SiegeLord If "commutative multiplication" refers to a group operator, then addition can be used. If addition is already taken, presumably we are in a ring or something, and the appropriate way to approach this is to have traits Ring, CommutativeRing, Field, etc., which require Add and Mul and proper relations between them.

I also don't think this sort of thing belongs in the standard library, but rather in some external "mathtypelib" which would have the full gamut of algebraic structures. All I'm trying to do with this RFC is make sure the stdlib doesn't discourage that by setting a bad example.

@SiegeLord
Copy link

@apoelstra That seems completely opposite of what you suggest in the RFC, however. If you want to defer the semantics to those mathy traits, then the operator overloading traits should have no semantics at all, and should not be used in generic code directly (i.e. you'd use your Ring or whatnot instead of Add + Sub).

@thestinger
Copy link

I think it's an awful workaround for the lack of better language features. Reusing the same few names or operators for different operations is poor API design. It should be possible to define custom operators if operators are going to be used as sugar for cases not meeting sane requirements for the mathematical traits.

@SiegeLord
Copy link

Reusing the same few names or operators for different operations is poor API design.

I don't disagree, but in the realm of syntax sugar where there is enormous pressure to pick very short names, this will be inevitable, even with custom operators.

@apoelstra
Copy link
Contributor Author

@SiegeLord Hmm, you're right, my "all I'm trying to do" comment is a bit disingenuous. I definitely think that Sub should be the inverse of Add, Div the inverse of Mul, Add and Mul should be associative, etc. (I list these as examples because I don't think they're controversial.) And I think commutativity of addition falls into this same category of "basic properties" that the stdlib operators should require.

On the other hand, I worry about the slide into Haskellism where we are intimidating new users with too much abstract algebra, which is why I don't advocate having Field and CommutativeRing and Monoid be stdlib traits. There is a balancing act between the two extremes of "free for all" (where we have like C++ where every operator requires you to look up the definiton, which is probably overloaded 30 times, and is likely to do something random and even side-effecting) and "category theory" (where users feel like they need an advanced math degree to use the language).

My feeling is that commutativity and associativity are okay to enforce where appropriate, because I think that ordinary programmers are very familiar with these concepts.

@glaebhoerl
Copy link
Contributor

I think a sensible middle ground, which mostly tracks the current approach, might be:

  • For each operator, have a single trait which is maximally polymorphic (a type parameter per argument and return type), and used only for overloading that operator, without any additional requirements or guarantees.
  • Strongly discourage people from overloading generic code on these traits. Actually add a warn-by-default lint for using these traits as a bound on any type parameter of any function.
  • Add subtraits which inherit from the operator-overloading traits, constrain their types, and do specify laws. Generic code should only ever use these traits. Types should only implement these traits if their operator overloads obey the laws.

This allows full syntactic flexibility for EDSLs while still retaining sanity for normal generic code.

The main drawback is a readability hit: you wouldn't be able see the use of an operator in arbitrary code and assume that it has any particular types or laws; if it's generic code you would have to look at what trait bounds are being required, and for non-generic code you would need to be familiar with the type it's being used with. (Of course, community pressure can still be applied to discourage truly egregious abuses.)

Tying the operator overloads directly to laws as @thestinger suggests is appealing, and if that's the road we eventually choose to go down, it wouldn't necessarily bother me. I think that's a perfectly sensible option as well.

But not even GHC manages to entirely hold the line on this. They have an extension called RebindableSyntax which is even more laissez-faire and, for any functions or operators implicated by the language itself (i.e. lang items), causes the compiler to just use whatever function or operator of that name is currently in scope, instead of the real e.g. Monad trait from the Prelude. The Eq trait constraining the return type of == directly to Bool is also a huge pain point for basically every EDSL ever. I think this suggests that tying operator overloads directly to specific type signatures and laws, however desirable, may not be tenable over the long term.

@thestinger
Copy link

Why should they be traits if they're not meant to be used as traits? It doesn't make much sense to me. If it's not doing the same thing, it shouldn't use the same name / operator. The debate is already over for ==, !=, <, >, <= and >= because it's all marked with a stability marker promising only slight changes - at least if those markers mean anything at all. The only operators up for discussion are the arithmetic / bitwise ones.

@glaebhoerl
Copy link
Contributor

It's a mechanism which already exists and works for the purpose, basically.

The problem is that "the same thing" is too general a concept to be captured precisely in the type system (or at least in ours). An EDSL in which the == operator is emitted as an equality comparison in the target language rather than returning a bool to the host language is, in some sense, doing "the same thing" (even up to obeying the same laws). But good luck writing a type which allows that, still disallows "different things", and isn't overly complex for the common case.

@thestinger
Copy link

The traits for overloading == and != already have requirements. It doesn't need to be captured in the type system - it's something that correct implementations have to adhere to. Yes, you can write incorrect code that's not caught by the type system. I don't understand why that's relevant.

@aturon
Copy link
Member

aturon commented Mar 5, 2015

ping @huonw

@aturon aturon added the T-libs-api Relevant to the library API team, which will review and decide on the RFC. label May 22, 2015
@alexcrichton
Copy link
Member

This RFC has been inactive for quite some time now, so I'm going to close this. It would be good to take a fresh look at this RFC with today's modern standard library, especially with respect to ops traits and how the ownership of their arguments have changed. I'd definitely believe there's still a place for a trait like this in the standard library, but it would be useful to start anew!

withoutboats pushed a commit to withoutboats/rfcs that referenced this pull request Jan 15, 2017
Right now just specialize one element as that's the same size as a vector, but
we can consider adding more later.

Closes rust-lang#203
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
T-libs-api Relevant to the library API team, which will review and decide on the RFC.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet