Allow autoderef and autoref in operators #2147
So, as the drawbacks section says:
> Passing something by value when it was expected to be passed by reference can lead to severe performance issues. Passing something by reference when it was expected to be passed by value can also lead to severe performance issues. And both can make the behavior of code less transparent to the person reading it.
(Please note: I acknowledge that having to add
In an effort to mitigate this: are there ways we could provide appropriate lints for potentially problematic cases?
This RFC addresses the technical issues that made the previous RFC technically problematic to implement. I don't feel like it addresses the remaining issues for whether we should do this, and what protections we should provide to attempt to mitigate potential issues this will cause.
@joshtriplett Can you give a concrete example? The RFC provides a concrete example of code that is made significantly less comprehensible by not having this feature, what is an example of something that would be made less comprehensible?
As far as I can see, this would only really trigger if you only have by-ref impls. It's true, you won't know whether an operation on an unfamiliar type is by-ref or by-value, but why is that so critical to know that it's not enough to look up the impl?
If we allow them, they will only be used in pathologically unclear code. If we don't allow them, no code that is not pathologically unclear would miss them.
Well, except for the "DSL maker" school that tries to overload syntax in every way possible. I'm not sure what our best option is there.
@arielb1 I guess someone could make
I'm finding the RFC a little unclear about this point. When you say:
Does that just mean the bindings involved need to be
Not sure whether that's an argument for or against
It means that all the overloaded derefs/indexes in the middle need to implement
Your knowledge of this stuff is impressive!
The way this compares to #2111 should be clarified. Although the RFCs share similar motivations they only overlap in obscure cases and solve orthogonal issues.
What do you think of the alternative of waiting for intersection impls so that
```rust
impl<T: Borrow<U>, U> Borrow<U> for &T
```
Then people would implement their operators like:
```rust
impl<T: Borrow<i32>, U: Borrow<i32>> Add<T> for U
```
This would solve many common cases. I wonder if it would also work for indexing.
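To make the Borrow-based alternative concrete, here is a minimal sketch of a coherence-friendly variant. The fully generic `impl<T: Borrow<i32>, U: Borrow<i32>> Add<T> for U` above would be rejected by today's orphan and overlap rules (hence the wait for intersection impls), but restricting `Self` to a concrete type already works. The `Meters` newtype is hypothetical, invented for illustration:

```rust
use std::borrow::Borrow;
use std::ops::Add;

#[derive(Debug, PartialEq)]
struct Meters(i32);

// One blanket impl over the right-hand side lets `m + 5`, `m + &5`,
// and similar all compile, because both `i32: Borrow<i32>` and
// `&i32: Borrow<i32>` hold via std's blanket impls.
impl<T: Borrow<i32>> Add<T> for Meters {
    type Output = Meters;
    fn add(self, rhs: T) -> Meters {
        Meters(self.0 + rhs.borrow())
    }
}

fn main() {
    assert_eq!(Meters(1) + 2, Meters(3));  // by-value RHS
    assert_eq!(Meters(1) + &2, Meters(3)); // by-ref RHS, same impl
    println!("{:?}", Meters(1) + 2); // prints Meters(3)
}
```

The limitation is visible in the shape of the impl: the left-hand side is still fixed to by-value `Meters`, so only the RHS gains the flexibility this RFC would provide on both sides.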
Just fyi there are some analysis and benchmarks of using
tl;dr It might work, but it requires considerable care, which makes writing math libraries harder. Also if autojazz works reasonably then it might leave more room for optimization.
@arielb1 I don't follow, we have
I'm basically in favor of this. It seems to be essentially the 'obvious extension' of method-call dispatch to operator overloading. But I want to raise a question about possible programming patterns just to make the arguments in favor more specific.
When I work with
So now if you want to "escape" a local variable or parameter (e.g., to store in a field), you do
Do you think that this pattern would apply to bignums? Where are the painpoints when you do this? Presumably one would implement
In any case, as a historical note, the
(Actually, before my time,
@joshtriplett I am definitely wary of adding too much autoref sugar in general and finding that there are performance pathologies created, or just introducing plain confusion for users (as an example, when I teach ownership transfer, I always get a question about why
That said, I do think that the damage here is somewhat mitigated by the fact that one must provide the impls to be used in the first place -- if you think a type is too expensive to pass by reference, or should only be passed by reference -- that is within your power to enforce (using newtypes, if necessary), no?
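To sketch what "within your power to enforce" looks like: if a newtype only provides by-reference operator impls, by-value use simply doesn't compile. The `BigNum` type here is a toy stand-in for an expensive-to-copy value, not any real library's API:

```rust
use std::ops::Add;

// Hypothetical big-number newtype; the Vec stands in for an
// expensive-to-copy representation.
#[derive(Debug, Clone, PartialEq)]
struct BigNum(Vec<u64>);

// Implementing the operator only for references enforces by-ref use:
// `&a + &b` compiles, while `a + b` has no impl and is rejected.
impl<'a, 'b> Add<&'b BigNum> for &'a BigNum {
    type Output = BigNum;
    fn add(self, rhs: &'b BigNum) -> BigNum {
        // Toy element-wise addition (no carries) just to show the shape.
        BigNum(self.0.iter().zip(&rhs.0).map(|(a, b)| a + b).collect())
    }
}

fn main() {
    let a = BigNum(vec![1, 2]);
    let b = BigNum(vec![3, 4]);
    let c = &a + &b; // `a + b` would be a compile error
    assert_eq!(c, BigNum(vec![4, 6]));
    println!("{:?}", c); // prints BigNum([4, 6])
}
```

Under this RFC the call sites could drop the `&`, but the impl author still decides that the only codegen available is the by-reference one.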
@nikomatsakis I'd suggested the operators being
I'd imagine that
If otoh you're working with non-crypto bignums, then your type might looks roughly like
so you'd want to pass by value.
If you are working with cryptographic bignums for RSA, then you'll want roughly either a
As an aside,
This happens to me also. But you haven't really explained why you want to do this. With this RFC, like the match RFC, the end goal is really to get to a place where you just don't have to go through this edit-compile-debug cycle over errors that are not semantically significant to your program.
It's especially worth keeping in mind, with this RFC, that we're just applying a subset of the coercions we apply to method receivers to operators. That is, today you can write:

```rust
let foo: i32 = 100; foo.is_positive()
```

We don't make you write:

```rust
let foo: i32 = 100; (&foo).is_positive()
```
In fact, the
```rust
vec.iter().filter(|x| x.is_positive())
vec.iter().filter(|x| x > 0)
```
But the second will not compile, and instead you'll have to do something like:
```rust
vec.iter().filter(|x| **x > 0)
```
This seems quite silly to me.
@withoutboats The difference is "not semantically significant" in the same sense that the difference between
I managed to find the blog post I was looking for: https://medium.com/@robertgrosse/how-copying-an-int-made-my-code-11-times-faster-f76c66312e0f
This is the kind of problem that I think autoderef and autoref will make far more prevalent.
Team member @withoutboats has proposed to merge this. The next step is review by the rest of the tagged teams:
No concerns currently listed.
Once these reviewers reach consensus, this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!
See this document for info about what commands tagged team members can give me.
(emphasis added) Both of the bolded portions of your comment seem exaggerated to me. Let me try to organize several interwoven lines of argument for why I think this will not have a significant negative performance impact.
The micro-optimization opportunities are infrequent
First, let's define the scenario concretely. You have code like this:
```rust
vec.iter().filter(|x| x > 0)
```
In other words, there are two narrowing conditions here: the micro-optimization of removing the indirection must actually be available, and performing it must actually affect performance.
I'm more likely to, by necessity or laziness, just write this:
```rust
vec.iter().filter(|&&x| x > 0)
```
In other words, the code the compiler would have inserted for me anyway.
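For reference, here is that closure in a runnable form. The `&&x` pattern peels both layers of reference off the `&&i32` that `filter` hands the closure, binding `x: i32` by copy, which is the same adjustment the compiler would insert under this RFC:

```rust
fn main() {
    let vec = vec![1, -2, 3];
    // `iter()` yields `&i32`, and `filter` passes `&&i32` to the
    // closure; destructuring with `&&x` copies out the plain i32.
    let positives: Vec<&i32> = vec.iter().filter(|&&x| x > 0).collect();
    assert_eq!(positives, [&1, &3]);
    println!("{:?}", positives); // prints [1, 3]
}
```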
If you knew you should perform this optimization, you're likely to do it regardless
The scenario you're describing is that because you received an error message, you realize that you can perform a micro-optimization. But this seems a little farfetched to me: most importantly, how do you know that it's a micro-optimization, instead of a micro-pessimization?
If you already know that removing the indirection will be faster, why did you introduce it in the first place? I see two scenarios in which you might know about the performance impact:
You might, today, remove indirection when you get errors like this, and that's fine. But are you really doing that for good reason, or based on a hunch that it will probably be faster?
It just seems bad to me to introduce a re-edit cycle in order to get you to think about the performance of an indirection. A re-edit cycle has a huge cost, and if the performance impact of something like this really matters, you'll know some other way.
The compiler can optimize many cases already
I believe that in a lot of these cases the difference between the possibilities (especially when you mention something like removing a reference through a match) is going to be a wash for the compiler. I believe these dereferences are largely reorderable, removable, and optimizable, and LLVM will only get better at doing this for you over time.
The scenarios impacted by this RFC are not extremely common
Lastly, as I discussed previously, we already do this kind of autoref and autoderef. We currently do it for method receivers, this only extends that behavior to most operators.
This means the only time you get these kind of "prompting errors" today that you wouldn't after this RFC is when you use a binary operator on the indirected value. If instead you called methods on it, you get no error, and if you used it in other ways that give you an error today, that won't change as a result of this RFC. So talk about "far more prevalent" seems like a large exaggeration.
Now, to take a contrary approach and try to lay out reasons why I think this is not only not bad but very good. :-)
We have carefully benchmarked code in the wild that will benefit from this RFC
Today, curve25519-dalek has concrete examples of nigh-unreadable code because they have found that passing their FieldNumber type by value has a serious deleterious performance impact. We have a clear example of production code that's trying to be competitive with handwritten assembly - the absolute high end for Rust performance - and which is hurt by our current semantics.
If they weren't competing with handwritten assembly, the authors of curve25519-dalek might've been satisfied with the performance of by-value operations on their field numbers & not taken on the high syntactic burden of performing operations by reference.
After this RFC, people writing high performance math code will be more likely to actually benchmark the difference between by-ref and by-value code than they are today (they'll just have to change the impls to get the different codegen, rather than throwing syntactic salt into every use site). As a result, this RFC could actually improve the performance in crates where it really counts.
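To illustrate "changing the impls rather than the use sites": a type can carry both a by-value and a by-reference impl side by side, and the difference today shows up as syntax at every call site. The `Scalar` type is hypothetical, chosen only to keep the sketch small:

```rust
use std::ops::Add;

#[derive(Debug, Clone, Copy, PartialEq)]
struct Scalar(u64);

// By-value impl: callers write `a + b` and operands are copied/moved.
impl Add for Scalar {
    type Output = Scalar;
    fn add(self, rhs: Scalar) -> Scalar {
        Scalar(self.0.wrapping_add(rhs.0))
    }
}

// By-reference impl: today callers must write `&a + &b` at every use
// site to reach it; under this RFC, plain `a + b` could resolve to it
// if the by-value impl were removed, with no call-site edits.
impl<'a, 'b> Add<&'b Scalar> for &'a Scalar {
    type Output = Scalar;
    fn add(self, rhs: &'b Scalar) -> Scalar {
        Scalar(self.0.wrapping_add(rhs.0))
    }
}

fn main() {
    let (a, b) = (Scalar(2), Scalar(3));
    assert_eq!(a + b, Scalar(5));   // resolves to the by-value impl
    assert_eq!(&a + &b, Scalar(5)); // explicit by-ref, required today
    println!("{:?}", a + b); // prints Scalar(5)
}
```

With autoref in operators, benchmarking the two codegen strategies becomes a one-line change to the impls instead of an edit to every expression.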
Users who don't need to care don't get bothered
On the other end of the spectrum, as I've demonstrated there are many cases where this "prompting error" is not helpful:
So, today, we have an error being introduced that users can usually fix only one way, and even if they could fix it the other way, it won't matter to them - either because the performance difference is not significant, or because performance is not really critical in their application (especially if they're still learning Rust, and systems programming through Rust).
This error is just annoying to advanced users (which is bad), but for new users it can really be the end of their Rust experience. A lot of the "borrowing frustration" that new users especially run up against is not necessarily complex lifetime situations, but just this kind of "type tetris" juggling. Mitigating it can be a high-impact way of easing the onboarding experience for people coming to Rust, which is probably our biggest user complaint.
I really think this can help, but I also have a concern that's been growing on me recently, which is that it seems the various RFCs (like Match Ergonomics) are adding different ways of doing coercion in different places, and I wonder if these amount to local optimizations that make it hard for users to maintain a global understanding of what coercion is generally available.
My recent example is this:
It looks like this RFC will make "operators" more like the
I find the function argument case particularly important because as it is, I find myself to be a bit wary of creating abstractions, because too often it means I lose some coercion that means I have to add hacks that make my code harder to read.
Yes, I'd assume the current RFCs will incur a much longer path to stabilization, due to there being so many of them.
Ideally, anyone doing funny business with
As per the commentary here and in the related coercions RFC, the lang team has decided to close this RFC in favor of experimentation.
That is, while we think this design is very plausible, there are a number of related changes being considered, and we'd like to land these all behind feature gates and gain experience with them, before coming back after the impl period with fresh RFCs.
Thanks @arielb1 for writing this so well and so quickly; I suspect much of the text will wind up in the final RFC.