const-ness of view operations #385
Comments
I agree with the tone of this RFC, these are real issues.
I think that views are already expensive, and that the serial case is an important one, so my feelings are currently against a model that requires internal synchronization to implement a conforming view. Still, this hasn't been investigated yet and maybe (hopefully?) isn't as bad as I expect it to be.
There is another issue. While views are not thread-safe in general, they are (or should be) cheap to copy, so the thread-safety issues that might arise are from users inadvertently sharing views between threads instead of copying them (which is what users should be doing) [*]. This is another point in which views differ from containers (which are expensive to copy and might be worth sharing). Views should almost never be worth sharing and by making them thread-safe we might be encouraging the wrong usage patterns.
Sometimes one might really want to share a view, but in those cases one wants to perform sharing explicitly as opposed to by mistake, so I think thread-safe versions of some of the views (something that you are not proposing) could be really useful even though having the world split might impact re-usability (e.g. can't reuse
Is there a list somewhere of which views are not thread safe? (e.g. probably all the
[*] Maybe the cppguidelines could in the future improve the safety issues by allowing marking some views as not-thread safe in the spirit of Rust
I quickly searched the code and discovered that box.hpp and stride.hpp already use atomics for some synchronization. I find that surprising. Could anyone comment on the rationale behind that? Aren't we paying for something we don't normally use? PS: it seems this is for some mutable members, where the user would assume thread safety because of the const interface.
The current model is: either the range has visible mutable state and does not have
It's so that
Ack. Baby, meet bathwater. Some views simply cannot be const, logical or physical; an |
One simplification is that no views have |
This implementation of
template<typename Val>
struct istream_range
  : view_facade<istream_range<Val>, unknown>
{
private:
    friend range_access;
    std::istream *sin_;
    mutable semiregular_t<Val> obj_;
    struct cursor
    {
    private:
        istream_range const *rng_;
    public:
        cursor() = default;
        explicit cursor(istream_range const &rng)
          : rng_(&rng)
        {}
        void next()
        {
            rng_->next();
        }
        Val &get() const noexcept
        {
            return rng_->cached();
        }
        bool done() const
        {
            return !*rng_->sin_;
        }
        Val && move() const noexcept
        {
            return detail::move(rng_->cached());
        }
    };
    void next() const
    {
        *sin_ >> cached();
    }
    cursor begin_cursor() const
    {
        return cursor{*this};
    }
    Val & cached() const noexcept
    {
        return obj_;
    }
    istream_range(std::istream &sin, Val *)
      : sin_(&sin), obj_{}
    {}
    istream_range(std::istream &sin, semiregular<Val> *)
      : sin_(&sin), obj_{in_place}
    {}
public:
    istream_range() = default;
    istream_range(std::istream &sin)
      : istream_range(sin, _nullptr_v<semiregular_t<Val>>())
    {
        next(); // prime the pump
    }
};
has
That's interesting, but it feels a bit dirty. I don't like |
I disagree with the claim that there are " Do you disagree with the semantics we've defined for views and input iterators that make this implementation valid, or do you disagree with the results this implementation produces for programs with undefined behavior? |
Neither. I'm expressing my distaste for using up all the wiggle room we've given ourselves wrt input ranges. I don't think I disagree that this is allowed. And it probably should be. But man does it feel wrong. EDIT: I think it feels wrong because not everybody will be cognizant of the fact that input iterators are not required to be dereferenceable after increment, and the type system is giving them no help. A good interface makes it easy to write correct code and hard to write incorrect code. This is not that. |
For the record, this issue interests me greatly. The interaction of the type system with views is very important to get right. I would love to hear a better suggestion, but so far I favor the current design as warty and surprising as it can be. Perhaps we look for a library solution that guides people away from the warts and toward doing the right thing, like providing the const-correct EDIT: One of the reasons |
What exactly do you mean here? Iterators are only dereferenceable after increment if |
If that is the case the problem is IMO that dereferencing doesn't mean the same thing for InputIterator than for other iterators, and it looks to me that we got ourselves into this hole by calling it the same thing. A solution could then be to call it something else This is far from perfect, breaks the iterator hierarchy, and EDIT: going this far is IMO not worth the trouble, and making InputIterators return a range is too weird. |
This.
If by "we" you mean the original designers of the STL then I agree. It's too late to change it now. The question is, what hand can we make of the cards we've been dealt? Casey thinks he can pull an inside straight (stretching the metaphor), I think we should settle for a std pair. |
Dereference is fine, really, it's increment that causes the insanity. Increment has spooky action-at-a-distance in that it renders copies invalid. Such action-at-a-distance is anathema to our ability to reason locally about program behavior.
Yes, and that ship got on the bus and left the train station long ago. We can't bolt safety onto single-pass iterators now. What we can do is suggest implementation techniques that exploit the undefined behavior to provide at least some error detection. An input range that detects repeated calls to
And yes, I admit I'm doing a lot more complaining about problems in this thread than discussing solutions. My goal was not to belittle the current use of const/mutable for views, so much as to point out some things I think need improvement and see if others agree.
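To make the "spooky action-at-a-distance" point above concrete, here is a minimal sketch of a single-pass range that keeps its cached element in the range object, in the same spirit as the istream_range shown earlier. The names int_stream_range and copy_sees_increment are hypothetical and are not part of range-v3; the sketch only shows that incrementing one iterator changes what an earlier copy reads, which is exactly the case input iterator requirements leave unsupported.

```cpp
#include <cassert>
#include <istream>
#include <sstream>

// Hypothetical single-pass range over whitespace-separated ints.
// The current element lives in the range, not in the iterator.
class int_stream_range {
    std::istream* in_;
    mutable int cached_ = 0;  // shared by every iterator into this range
public:
    class iterator {
        const int_stream_range* rng_;
    public:
        explicit iterator(const int_stream_range& r) : rng_(&r) {}
        int operator*() const { return rng_->cached_; }
        iterator& operator++() {
            *rng_->in_ >> rng_->cached_;  // overwrites the shared element
            return *this;
        }
    };

    explicit int_stream_range(std::istream& in) : in_(&in) {
        *in_ >> cached_;  // read the first element up front
    }
    iterator begin() const { return iterator{*this}; }
};

// Returns true if advancing one iterator changed what an earlier copy reads.
bool copy_sees_increment() {
    std::istringstream in("1 2 3");
    int_stream_range rng(in);
    auto it = rng.begin();
    auto copy = it;
    int before = *copy;  // reads the first element
    ++it;                // advances the range, not just `it`
    int after = *copy;   // the copy now reads the second element
    return before == 1 && after == 2;
}
```

Nothing in the type system stops a caller from dereferencing the stale copy, which is the complaint raised in the comments above.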
I don't think const should affect the behaviour of views. If you want to cache state, using mutable is fine. This issue caught me by surprise and cost me a lot of time. Had I known this beforehand, I might have stuck with Boost.Range v2 instead.
Possibly for Input ranges, but how could it be fine to mutate a const object that is possibly being read concurrently from different parts of your program? Guarding against that kind of misuse is exactly what
Seriously? Be my guest. |
Boost.Iterator/Boost.Range are fairly simple both in concept and implementation, are well documented, well understood, mature, and I happen to have a lot of experience with them. I chose to use Range v3 for a toy project since it was a chance to evaluate it, but given the minimal documentation, the amount of complexity I ran into, and the open major design issues like this one, it might appear that it's not quite ready.
Anyway, I'm not quite used to the view/cursor dichotomy as opposed to the range/iterator one yet, but couldn't all state go into the iterator rather than the view, removing all needs for synchronization?
I think maybe the library needs a FAQ because @mgaunard's questions are fair for new users to have, and come up very often. New users seem to have two issues with the library:
So because they don't know this, and some of them come with a Boost.Range v2 background, they first try to make everything const, and "clash" against the library. Then they ask themselves why can't I make a view const? It has no state! And because they don't know the second point, they end up very confused.
This is how range-v3 is designed. While some views don't store state, most do. Don't make them const. This might be hard if you are used to "make everything const", but ask yourself, does this stateful algorithm really need to be const here?
That is, if you do:
auto v = range | view::remove_if(pred);
for (auto&& i : v) { ... } // begin/end are amortized O(1) here
for (auto&& i : v) { ... } // begin/end are amortized O(1) here as well
The way this is implemented is that some views cache the begin/end on construction, so that all the calls to begin/end from that point on are O(1). This requires state, so views that do this cannot be const (@ericniebler: I don't find this in the design paper, have you written about this somewhere besides in the issues? I guess this also would belong in a FAQ).
@mgaunard This is just a guarantee that range-v3 views offer you. If you don't need this/want this, range-v3 offers you the tools to build your own
But think about it thrice before going down that route. What do you win? The ability to make a view const? For what? Is that worth it? What do you lose? (O(1) begin/end)
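As a rough illustration of why a cached begin forces a non-const begin(), here is a minimal sketch of a filtering view over a std::vector<int>. It is a simplified stand-in, not range-v3's implementation: caching_filter_view and the helper function are hypothetical names, the cache is filled lazily on the first call rather than at construction, and begin() returns a plain vector iterator instead of a filtering iterator.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified filtering view that remembers its first match.
template <typename Pred>
class caching_filter_view {
    using iter = std::vector<int>::const_iterator;
    iter first_, last_;
    Pred pred_;
    iter cached_begin_{};
    bool has_cache_ = false;
public:
    caching_filter_view(const std::vector<int>& v, Pred pred)
        : first_(v.begin()), last_(v.end()), pred_(pred) {}

    // Non-const on purpose: the first call scans for the first match and
    // stores it, so a const caching_filter_view cannot call begin().
    iter begin() {
        if (!has_cache_) {
            cached_begin_ = first_;
            while (cached_begin_ != last_ && !pred_(*cached_begin_))
                ++cached_begin_;
            has_cache_ = true;
        }
        return cached_begin_;
    }
    iter end() const { return last_; }
};

// Counts how many predicate calls a second begin() makes after the first.
std::size_t extra_predicate_calls_on_second_begin() {
    std::vector<int> data{2, 4, 6, 7, 8};
    std::size_t calls = 0;
    auto is_odd = [&calls](int x) { ++calls; return x % 2 != 0; };
    caching_filter_view<decltype(is_odd)> view(data, is_odd);
    view.begin();                     // scans 2, 4, 6, 7
    std::size_t after_first = calls;
    view.begin();                     // served from the cache
    return calls - after_first;
}
```

The second call does no predicate work, which is the amortized O(1) guarantee under discussion; the price is that the cache write makes begin() a mutating operation.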
From @CaseyCarter in #428 :
The difference between
if(0 != ranges::advance(it, -1, ranges::begin(rng_->mutable_base())))
    it = rng_->get_end_(BoundedRange<Rng>());
If
For ranges like
The design compromise that is currently implemented makes I'm sure you can find examples in the other views where So, here are the alternatives I see:
Any other options I missed? <spitballing> I can imagine a separate range adaptor that caches the underlying range's begin/end on construction. With C++17's guarantees about copy elision, we can investigate the possibility of using this either internally or letting users use it directly. That requires that let |
Why wouldn't you just do |
(Drifting off-topic):
This implementation of reverse_view avoids LWG#198. |
Was this design aspect (constness drops range status of certain views) discussed in WG21 yet? |
With the current design, const does not affect the strength of the concept that a range satisfies. So no, it has not been discussed in WG21.
@CaseyCarter Thoughts? I'm going to start work on Range TS2 "Views Edition" soon/eventually, and as of right now, I haven't heard a better alternative than the current design. I can play with improved designs for |
I ran into this when trying to improve a bit the Needless to say, I gave up since the best I could come up with was this:
Given the current situation, maybe it is possible to implement a " It would also improve teachability if we had a more solid story for the undefined behavior behind |
This is a fallacy. It is exactly as thread-safe as any other component in the standard library.
The problems with calculating begin/end of the underlying range on view construction are discussed above. What are you contributing to the discussion that is new? That in your use case view copy doesn't happen very often? Should we conclude from this that for all use cases it's OK for copy (and move and construct) to be O(N) instead of O(1) just so that users can iterate over a const filtered, reversed view? That doesn't seem like the right tradeoff. And what to do about the other views that need to maintain mutable internal state, like
That is not my understanding. Iterators always refer to boundaries between elements. The same is true of pointers to objects in memory. Despite the fact that objects take up a range of bytes in memory, the singular address of that object is the point at which the object starts. That is the boundary between memory that is not part of the object and memory that is. Iterators (and pointers) combine access and traversal. By convention, dereferencing the pointer (or iterator) yields the value of the first element of that range. In different algorithms, the "access" mode of the iterator has greater prominence than the "traversal" mode. This can give the impression that the iterator denotes the element as opposed to the boundary immediately before the element. (The language of the standard doesn't help here.) For most cases, the difference is immaterial and we've skated by this long with muddled language and muddled thinking.
I've looked at the slides. As I discuss above, I disagree with your premise. There is no confusion or inconsistency that I can see about what an iterator denotes, so there is no reason to conclude that iterators are a broken abstraction that needs to be abandoned. When you create a
Dereferencing that iterator yields the element before that boundary, which, when taken together with the way increment is defined, is consistent with the convention that dereference yields the value of the first element of the range. So what should Instead, if we see the I don't think this nature of iterators is one that is widely appreciated, but I have found it leads to some confusion. I should probably write a blog post. |
If you pass your filter range to two threads and they both call begin(), you have a race, or not? You think it is more important that |
And regarding
When inlining the current reverse adaptor, we get something like:
So reverse is not a zero-cost abstraction. Why? What are we missing? We have the wrong abstraction. If we write the loop abstractly as:
reverse can be implemented as
with no extra ifs, no two iterators and no access to With iterators defined as they are today, the element after a border is a no-op, the element before a border requires an iterator decrement. But in a reverse adaptor, the element after a border requires a base iterator decrement, the element before a border is a noop. The reverse adaptor is a victim of the lack of symmetry in the definition of iterators. |
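The asymmetry described above can be observed with std::reverse_iterator, used here as a stand-in for the reverse adaptor under discussion (an assumption; the thread is about range-v3's own adaptor). A reverse iterator stores a base iterator and dereferences the element before it, so every access implies a decrement of a copy of the base. The function name below is hypothetical.

```cpp
#include <cassert>
#include <iterator>
#include <vector>

// Checks that dereferencing a reverse iterator yields the element
// immediately before its base() iterator.
bool reverse_deref_is_prev_of_base() {
    std::vector<int> v{10, 20, 30};
    auto rit = v.rbegin();  // base() == v.end()
    bool ok = (rit.base() == v.end()) &&
              (*rit == *std::prev(rit.base()));  // 30, found by stepping back
    ++rit;                  // base() now points at 30
    return ok && (*rit == 20) && (*rit.base() == 30);
}
```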
"If you pass [any standard library component] to two threads and they both [call non-
Correct. Accessing every element of a range using
We're not missing anything. It is not the point of the iterator abstraction -- or any abstraction for that matter -- to provide exactly zero overhead for all use cases. Stepanov was well aware of this when he designed the STL. A good example is the iterators of
The lack of symmetry is in the mathematical notion of a half-open range, not in iterators. I'll grant that the convention that the dereference operation, in returning the value of the element following the denoted boundary, is not symmetric, but that property comes from pointers. Your beef is with Dennis Ritchie, not Stepanov, and it's not something we can change by choosing a different iterator abstraction. It is valid for you to say that reverse is not optimally efficient. It's valid for you to say that reverse is sufficiently important for your use case that you are abandoning the iterator abstraction. It is not valid for you to conclude that the iterator abstraction is inconsistent in what it denotes (it isn't) or that the abstraction is broken and should be abandoned because it doesn't provide exactly zero overhead for all possible use cases. |
I think we understand each other's arguments. To be more constructive, can't we have two This way, there would be compile time safety regarding the performance guarantees of adaptors. Ranges V3 can provide the fast version for all adaptors, but others like me making a different trade-off do not have to. This would not add undue complexity. Novice users could probably go for a long time without knowing the And adaptors like
While @ericniebler answered to this above I have to dig a bit deeper here: What do you mean by pass? Do you mean storing the view on thread A, and passing references to the view to threads B and C so that they can concurrently mutate it? If so, none of what's being proposed here would be enough because we would need to store every pointer and every counter in all the adaptors behind atomics (or rwlocks/mutexes) for this to work, and this is a price that everybody would need to pay. The alternative being pursued here is that those who want to do this should just put the view behind a synchronization primitive. If what you meant instead was about passing copies of the view to thread B and C, then that just works. Views are cheap to copy. You can create the view in thread A, call
It doesn't really matter. What matters is what's the complexity of For me at least the largest advantages of worst-case amortized O(1) are:
When you ask "Can't we have two This change destroys the two advantages mentioned above. What does the change buy us that make that worth it? @ericniebler I found it very enlightening to think of reversing the range |
Just concurrently iterate, no mutation. You could make a copy, but then you may have to hold the functor by reference if it has non-trivial size, and your code needs to reason about whether to make a copy or not. I also have filtering containers in my library, a container aggregated into a lazy filter. They are very practical (try returning a lazily transformed vector from a function), but expensive to copy.
We have differences either way, the world isn't as uniform as we want it to be. I find subtle race conditions occurring with some adaptors harder to teach or reason about than performance differences. I do not want to impose this view on everyone, I just want to have the option to have that view within the specifications of the standard library. If you want to make all your adaptors O(1)
Are all the range adaptors in your library thread safe? If so, how do you implement |
No, they are not. And even if they were, this is not a requirement I want to impose on everyone and everything. In particular, imposing a requirement such as O(1) begin()/end() or thread safety so early in the process while people are still learning how to use and also how to implement ranges is premature. Even with the C++20 standard finalized, users still won't be able to judge if we made the right choices because there are no adaptors in C++20 for them to try out. Let's put out a RangesWithAdaptors TS first, so we get multiple implementations and users taking ranges seriously and starting to use them first before cementing the fundamentals.
I am confused. I thought that you were arguing that by making begin/end amortized O(1), and thus non-const, we were sacrificing thread safety, which for you was important. But now you are saying that thread-safety is not it. So what problems do you see in amortized O(1) non-const begin/end? Or did I just completely misunderstand this and we all agreed that there weren't any? (I haven't had my coffee yet).
I do not want to impose any blanket constraints on implementations at this point. For many ranges, such as |
Just to add some data, here are the counts of various adaptors in our codebase:
Looking through Ranges TS, where is the View concept used? Where does the implementation of the library as it is proposed depend on it? |
Nowhere in the Ranges TS directly, but it is used in proposals that build on the TS, like P0789, "Range Adaptors and Utilities". |
I think there is a possibility for filter with O(1) const begin() and O(1) copy by using index instead of iterator as the abstraction and doing the caching in the filter view ctor (but not again in the copy ctor). Indices are like iterators, but any operation gets supplied the index and its range. Iterators can easily be built on top by aggregating an index and a pointer to the range, so compatibility is no problem. Typically, the index of an adaptor is the index of the container (which for legacy containers is simply an iterator) at the very bottom of the adaptor stack, or a std::variant storing one of them in case of concat and index plus one of them for join. So as long as the container is not copied, view indices are trivially stable against copy and move, and copying/moving a filter view can simply copy the begin() cache. I will try an implementation in our codebase.
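The index idea above might look roughly like the following sketch. It is only a guess at the shape of such a design under stated assumptions: index_filter_view and its member names (begin_index, end_index, increment_index, dereference_index) are hypothetical, the underlying range is fixed to std::vector<int>, and the index is simply the vector's iterator. The first match is computed once in the constructor, every operation is const, and the compiler-generated copy constructor just copies the cached index.

```cpp
#include <cassert>
#include <vector>

// Hypothetical index-based filter view: operations take the index and are
// performed by the view, so the index itself needs no pointer to the view.
template <typename Pred>
class index_filter_view {
public:
    using index = std::vector<int>::const_iterator;

    index_filter_view(const std::vector<int>& base, Pred pred)
        : base_(&base), pred_(pred), begin_(skip(base.begin())) {}

    index begin_index() const { return begin_; }  // O(1), const
    index end_index() const { return base_->end(); }
    void increment_index(index& i) const { ++i; i = skip(i); }
    int dereference_index(index i) const { return *i; }

private:
    // Advances to the next element that satisfies the predicate.
    index skip(index i) const {
        while (i != base_->end() && !pred_(*i)) ++i;
        return i;
    }
    const std::vector<int>* base_;
    Pred pred_;
    index begin_;  // cached once; stays valid while the vector is not copied
};

// Iterates a const copy of a const view and sums the odd elements.
int sum_via_copy() {
    std::vector<int> data{2, 3, 4, 5, 6, 7};
    auto odd = [](int x) { return x % 2 != 0; };
    const index_filter_view<decltype(odd)> view(data, odd);
    const auto copy = view;  // copies the cached index, no re-scan
    int sum = 0;
    for (auto i = copy.begin_index(); i != copy.end_index();
         copy.increment_index(i))
        sum += copy.dereference_index(i);
    return sum;  // 3 + 5 + 7
}
```

The trade-off is that construction is O(N) in the worst case, which is the cost debated earlier in the thread; the sketch only shows that copying need not repeat it.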
@schoedl How would this index approach work for an InputRange/View, like |
Does filter need to cache begin() for input ranges? You cannot iterate twice anyway... |
Good question.
You can still take multiple
#include <iostream>
#include <range/v3/all.hpp>
auto perms() {
    return ranges::view::generate([x = 0]() mutable -> int{
        std::cout << "Calling generate: x = " << x << '\n';
        return x++;
    });
}
int main(int /*argc*/, char ** /*argv*/) {
    auto ps = perms();
    auto ps_f = ps | ranges::view::filter([](auto&& v) { return v % 2 != 0; }) | ranges::view::take(3);
    std::cout << "before begin\n";
    auto b = ranges::begin(ps_f); // maybe cached? advances the range
    std::cout << "before end\n";
    auto e = ranges::end(ps_f); // end not cached: not BidirectionalRange
    std::cout << "deref: " << *b << std::endl;
    // take another begin - don't know if this re-uses the cached value
    // or not (the first element in the range satisfies the filter so...):
    std::cout << "begin deref: " << *ranges::begin(ps_f) << std::endl;
    std::cout << "loop\n";
    while (b != e) {
        std::cout << "x = " << *b << '\n';
        ++b;
    }
}
Prints:
Calling generate: x = 0
before begin
Calling generate: x = 1
before end
deref: 1
begin deref: 1
loop
x = 1
Calling generate: x = 2
Calling generate: x = 3
x = 3
Calling generate: x = 4
Calling generate: x = 5
x = 5
Calling generate: x = 6
Calling generate: x = 7
I argued against position-based (or index-based) ranges in 2014 here: This is rather late in the game to suggest a different set of basis operations, and it has never been clear to me that the benefits of such a design outweigh the costs. In particular, you need a different set of APIs to access a begin/end index ( |
My original motivation for positions was to make adaptor stack iterators smaller. With purely iterators, iterator size grows linearly with stack height. With positions as implementational helper, iterator size is typically 2 words, independent of stack height. Any container can keep its (1-word) iterator, which will moonlight as position if someone (typically an adaptor) really wants one. |
I think with the standardization of views, we can finally put this old issue to rest. Thanks all. |
Then let the GitHub record of this issue and of #254 stand as a permanent memento of those heady days in late-2015 / early-2016 when we were youthful and idealistic; when we exuberantly frolicked through the design-space of ranges; when we waved our Thanks all. |
Containers
Containers in C++ may have both const and non-const overloads of begin. They differ only in their return types: non-const begin returns a mutable iterator, and const begin a constant iterator. Neither overload changes the bits of the container representation or modifies the semantic value of the container object. Two overloads exist so that it is necessary to establish a non-const access path to the container object in order to obtain a mutable iterator. end is the same. size is always const because it provides no way to modify the container's content.
Views
Views in range-v3 may have both const and non-const overloads of begin/end/size (herein termed "operations"). Views have pointer semantics - a view is essentially a pointer to a sequence of elements - so mutability of the elements viewed is orthogonal to mutability of the view object itself. The const distinction here has no relation to that of containers. Non-const operations do not modify the semantic objects being viewed, nor do they "swing the pointer" so that the same view designates different semantic objects. Non-const operations mutate internal state that does not contribute to the semantic value of the view; the const-ness here is purely bitwise.
The const-ness model used by views makes view composition painful. You can always provide the non-const overloads, but const overloads are preferred when achievable. So a composer, e.g.:
ends up providing two definitions of each operation: one const that's constrained to require const operations over the underlying view(s):
and one mutable that's constrained to only be available when the const version isn't:
Ranges
I'm concerned that the differing const distinctions for containers and views don't mesh well into a consistent notion of what const means for operations on general ranges. I see a potential for latent bugs where a programmer accustomed to the fact that calling begin/end on mutable containers is threadsafe calls begin/end on mutable ranges without realizing there are sharp corners here.
The only mutating operation on pointers is assignment. If views are supposed to be range-pointers, perhaps assignment should be the only mutating operation? We (I) need to investigate an alternative model where view operations are always const and perform internal synchronization if needed.
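The composition pattern described in the Views section, where an adaptor provides a const operation when the underlying view supports it and a mutable one otherwise, could be sketched as follows. This is not the code from the original issue, which did not survive in this copy of the page; wrap_view, has_const_begin, span_like and lazy_like are hypothetical names, and plain SFINAE is used in place of range-v3's concept machinery.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Detects whether begin() can be called on a const V.
template <typename V, typename = void>
struct has_const_begin : std::false_type {};
template <typename V>
struct has_const_begin<V, decltype(void(std::declval<const V&>().begin()))>
    : std::true_type {};

// Hypothetical pass-through adaptor with the two constrained overloads.
template <typename V>
class wrap_view {
    V base_;
public:
    explicit wrap_view(V base) : base_(std::move(base)) {}

    // const overload: only when the underlying view is const-iterable
    template <typename U = V,
              std::enable_if_t<has_const_begin<U>::value, int> = 0>
    auto begin() const { return base_.begin(); }

    // mutable overload: only when the const overload is unavailable
    template <typename U = V,
              std::enable_if_t<!has_const_begin<U>::value, int> = 0>
    auto begin() { return base_.begin(); }
};

// An underlying view whose begin() is const.
struct span_like {
    const int* first;
    const int* last;
    const int* begin() const { return first; }
};

// An underlying view whose begin() writes a cache, so it is non-const only.
struct lazy_like {
    const int* first;
    const int* cached = nullptr;
    const int* begin() {
        if (!cached) cached = first;
        return cached;
    }
};

// The adaptor's const-ness follows that of the view it wraps.
static_assert(has_const_begin<wrap_view<span_like>>::value, "");
static_assert(!has_const_begin<wrap_view<lazy_like>>::value, "");

int first_through_both() {
    static const int data[] = {4, 5, 6};
    const wrap_view<span_like> a(span_like{data, data + 3});  // const is fine
    wrap_view<lazy_like> b(lazy_like{data});                  // must be mutable
    return *a.begin() + *b.begin();
}
```

Every operation (begin, end, size) needs the same pair of overloads, which is the composition burden the issue describes.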