Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reconsidering the recent changes in sort guarantees #38524

Closed
ghost opened this issue Dec 21, 2016 · 12 comments
Closed

Reconsidering the recent changes in sort guarantees #38524

ghost opened this issue Dec 21, 2016 · 12 comments
Labels
T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@ghost
Copy link

ghost commented Dec 21, 2016

The PR in question is #38192.

@steveklabnik had some objections to new guarantees of the sort algorithm.

Old guarantees: "This sort is stable and O(n log n) worst-case but allocates approximately 2 * n where n is the length of self."
New guarantees: "This sort is stable and O(n log n) worst-case, but allocates temporary storage half the size of self."

The new version is ATM in nightly only, so we still have time to adjust it. I agree that saying "approximately half the size" would be slightly better.

However... should we perhaps leave the guarantees as they were (2 * n), in order not to make alternative implementations non-compliant? What exactly should the documentation say?

Here's my stance on the issue:

First, sort algorithms should never ever allocate 2 * n additional memory. That's just way too much and I'm strongly against 2 * n. Even the old merge sort can be easily tweaked to allocate just n without compromising performance.

Second, stable sorts can often get away with allocating just n / 2. In fact, I successfully tweaked the old merge sort to allocate just n / 2 before I jumped on the TimSort ship.

Any stable sort that requires n memory can be made to work with just n / 2. Here's how. Split the input array in half. Sort each half using the allocated auxilliary buffer (which is of same length as the half itself). Move one of the halves into the buffer. Merge them into the original orray. Done.

Guarantees in some other programming languages:

  1. Python (source) - TimSort allocates n / 2.
  2. Java (source) - TimSort allocates n / 2.
  3. C++ (docs) - std::stable_sort allocates n.

I believe it's safe to say that the standard sort allocates approximately n / 2, but would be okay with guaranteeing n, too.

Libs team, what do you think? @bluss?

@sfackler
Copy link
Member

I'm not sure I understand in what contexts 2 * n allocation is a deal breaker but n or n / 2 isn't?

@brson
Copy link
Contributor

brson commented Dec 22, 2016

It seems to me like the question here is only about what the docs say. I don't know the answers to your concerns, but the specificity of the documentation here, and that it has changed at all, both raise suspicion with me.

@sfackler I think the concern is that the documentation is constraining the implementation by being so specific about its space requirements. There aren't many sorts that can guarantee exactly n / 2 extra space. So alternate implementations are basically going to have to do what we're doing now, and we are constrained in our own choices in the future.

My instinct is that the docs should not make it sound like the implementation is guaranteed to use n / 2 space.

@nrc
Copy link
Member

nrc commented Dec 22, 2016

IMO, the docs should not be considered a normative spec for the implementation

@ghost
Copy link
Author

ghost commented Dec 22, 2016

@brson Yes, the question is only about what the docs say. Sorry that this change slipped through.

What you said makes sense, and I no longer believe that guaranteeing n/2 is a good idea. Or anything too specific, while we're at it.

By the way, here's what the C++ standard says about std::stable_sort: pdf, page 911 (or 925 in the pdf).
TL;DR: It doesn't specify how much it allocates, just that it will allocate if extra memory is available.

Allow me then to move forward and suggest two ways of expressing our guarantees:

  1. "This sort is stable and O(n log n) worst-case, but may allocate temporary memory up to approximately the size of self."
  2. "This sort is stable and O(n log n) worst-case, but may allocate some temporary memory."

Which one do you like best? I'm open to other suggestions, too.

@notriddle
Copy link
Contributor

This sort is stable. It is O(n log n) with respect to comparisons and swaps, and O(n) with respect to memory allocated, worst-case.

@ghost ghost closed this as completed Dec 22, 2016
@ghost ghost reopened this Dec 22, 2016
@steveklabnik
Copy link
Member

IMO, the docs should not be considered a normative spec for the implementation

I am also very happy to have this be true, if that's what @rust-lang/libs wants. This issue, as @stjepang mentions, is mostly about gaining clarity: if this comment was normative, then this would have been a breaking change.

I have no personal opinion on if it should be or not, but I fear it appears normative today.

@aturon
Copy link
Member

aturon commented Dec 23, 2016

@nrc

IMO, the docs should not be considered a normative spec for the implementation

Why not?

In general, we need a place to document the guarantees that client code should be able to rely on, and documentation is the natural place to do that. There are plenty examples of giving such contracts in documentation throughout the standard library.

Personally, I would take any behavior mentioned in the documentation as something you can and should rely on in your code, unless stated otherwise. I'm not sure how else to reliably program against an API.

@ghost
Copy link
Author

ghost commented Jan 2, 2017

Also, the Rust reference is explicitly non-normative. Isn't there an official decision regarding normativity of the stdlib documentation?

@nrc
Copy link
Member

nrc commented Jan 4, 2017

@aturon

Why not?

I might be wrong, but I believe that when adding documentation, a reviewer does not review on the basis that this is the spec for the API for evermore and will be binding up to our stability policy. Therefore it seems possible that there is documentation that we would not be happy to take as a normative spec.

Put another way, I would expect that any specification should be opted in to, rather than opted out of and documentation should indicate whether it is expected to be taken as a source of truth. Otherwise, I would treat docs as informative only with the same level of reliability as the source code when it comes to correctness (i.e., we are able to fix bugs in documentation in the same way as bugs in the source).

In general, we need a place to document the guarantees that client code should be able to rely on

Ideally, yes we do, but IMO, we don't have such a place at the moment. Likewise, we should have a proper specification for the language, but we don't (we have the RFCs and references and book, all of which are approaching such a thing, but are not there yet).

@aturon aturon added I-nominated T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. labels Jan 4, 2017
@aturon
Copy link
Member

aturon commented Jan 4, 2017

This definitely seems worth a live discussion on the libs team; nominating for the next meeting.

@steveklabnik
Copy link
Member

I'd like to attend, if possible.

@aturon aturon removed the I-nominated label Jan 9, 2017
@aturon
Copy link
Member

aturon commented Jan 9, 2017

We discussed the question of normativity for std docs in the lang team meeting today. There are good arguments in both directions. But by and large, most items we add to the standard library provide behavior which is intended to be reliable -- changes to observable behavior are extremely rare in std. The docs are usually carefully reviewed when landing new std API surface, and we take pains to highlight areas that are subject to change (see this section in I/O for example). So on balance, the team feels comfortable with treating the docs as normative, with the understanding that they are still best-effort and may contain bugs.

FWIW, part of the calculation here is that people are likely to rely on observable behavior whether documented/guaranteed or not. So we're likely to be prevented from making behavioral changes just to avoid breakage in practice, let alone because of what the docs happen to say.

steveklabnik added a commit to steveklabnik/rust that referenced this issue Jan 10, 2017
steveklabnik added a commit to steveklabnik/rust that referenced this issue Jan 25, 2017
bors added a commit that referenced this issue Jan 26, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

7 participants