Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Missing documentation: sortperm! is not truly non-allocating unless scratchspace is provided #53834

Open
AhmedSalih3d opened this issue Mar 24, 2024 · 6 comments
Labels
docs This change adds or pertains to documentation sorting Put things in order

Comments

@AhmedSalih3d
Copy link

AhmedSalih3d commented Mar 24, 2024

Hi!

Full thread here:

https://discourse.julialang.org/t/why-does-sortperm-allocate-here/112028/3

Doing a small example it is easy to see that sortperm! will allocate:

function main()
    Cartesians    = Vector{CartesianIndex{2}}(undef, 100)
    SortedIndices = collect(LinearIndices(Cartesians))

    # Fill the array with random CartesianIndex{2} values
    for i in 1:100
        # Assuming you want indices in the range 1:10 for both dimensions
        Cartesians[i] = CartesianIndex(rand(1:10), rand(1:10))
    end

    for iter = 1:5
        b = @allocated sortperm!(SortedIndices, Cartesians)
        println("Iteration ", iter, " : ", b , " allocated bytes")
    end
end

main()

Iteration 1 : 896 allocated bytes
Iteration 2 : 896 allocated bytes
Iteration 3 : 896 allocated bytes
Iteration 4 : 896 allocated bytes
Iteration 5 : 896 allocated bytes

This is because a scratch-space has not been preallocated. Dan kindly provided the following code which fixes the issue:

function main()
    Cartesians    = Vector{CartesianIndex{2}}(undef, 10000)
    SortedIndices = collect(LinearIndices(Cartesians))
    _, t = Base.Sort.make_scratch(nothing, eltype(SortedIndices), length(Cartesians))

    # Fill the array with random CartesianIndex{2} values
    for i in 1:100
        # Assuming you want indices in the range 1:10 for both dimensions
        Cartesians[i] = CartesianIndex(rand(1:10), rand(1:10))
    end

    for iter = 1:5
        b = @allocated sortperm!(SortedIndices, Cartesians; scratch=t)
        println("Iteration ", iter, " : ", b , " allocated bytes")
    end
end 

I would not have been able to discover this without help since the ? sortperm! returns:

sortperm!(ix, A; alg::Algorithm=DEFAULT_UNSTABLE, lt=isless, by=identity, rev::Bool=false, order::Ordering=Forward, [dims::Integer])

  Like sortperm, but accepts a preallocated index vector or array ix with the same axes as A. ix is initialized to contain the values LinearIndices(A).

  │ Warning
  │
  │  Behavior can be unexpected when any mutated argument shares memory with any other argument.

  │ Julia 1.9
  │
  │  The method accepting dims requires at least Julia 1.9.

  Examples
  ≡≡≡≡≡≡≡≡

  julia> v = [3, 1, 2]; p = zeros(Int, 3);

  julia> sortperm!(p, v); p
  3-element Vector{Int64}:
   2
   3
   1

  julia> v[p]
  3-element Vector{Int64}:
   1
   2
   3

  julia> A = [8 7; 5 6]; p = zeros(Int,2, 2);

  julia> sortperm!(p, A; dims=1); p
  2×2 Matrix{Int64}:
   2  4
   1  3

  julia> sortperm!(p, A; dims=2); p
  2×2 Matrix{Int64}:
   3  1
   2  4

With no mention of scratch-space.

I believe that any ! function is assumed to be non-allocating, and that this goes against this.

I went to check help for sort! too and did not find any mention of scratch space either.

Tasks

No tasks being tracked yet.
@giordano
Copy link
Contributor

I believe that any ! function is assumed to be non-allocating, and that this goes against this.

It's a style convention, not something enforced, and it's about modifying at least one of arguments, not being entirely non-allocating.

@AhmedSalih3d
Copy link
Author

I believe that any ! function is assumed to be non-allocating, and that this goes against this.

It's a style convention, not something enforced, and it's about modifying at least one of arguments, not being entirely non-allocating.

I don't disagree that ! can be simply taking as modify arguments in place but I believe that most instinctively see it as does not allocate. Even if one does not agree with this, I still think it would be beneficial to somewhere in sort functions, to actually mention the scratchspace variable.

@LilithHafner LilithHafner added docs This change adds or pertains to documentation sorting Put things in order labels Mar 25, 2024
@LilithHafner
Copy link
Member

Scratchspace handling in sorting is okay but not perfect. For example, sometimes it still allocates when passed a scratchspace. I don't want to commit to keeping the current system as is forever. For example, it's possible using Memory might be better. Consequently, I chose not to publicize it (and therefore make it a part of the stable API) when I added it.

I have yet to find a single real-world example where explicitly passing a scratch space to sort! or any of its variants is a good idea or even has a measurable impact on performance. Until someone finds such an example, or scratch space handling is perfect, I'm inclined to think it's premature to publicize & document it.

Actionable things here

  • Document that sorting may allocate even when the function ends with an !, and note that slower non-allocating algs exist.
  • Document more clearly that ! does not mean non-allocating.

@AhmedSalih3d
Copy link
Author

Scratchspace handling in sorting is okay but not perfect. For example, sometimes it still allocates when passed a scratchspace. I don't want to commit to keeping the current system as is forever. For example, it's possible using Memory might be better. Consequently, I chose not to publicize it (and therefore make it a part of the stable API) when I added it.

I have yet to find a single real-world example where explicitly passing a scratch space to sort! or any of its variants is a good idea or even has a measurable impact on performance. Until someone finds such an example, or scratch space handling is perfect, I'm inclined to think it's premature to publicize & document it.

Actionable things here

  • Document that sorting may allocate even when the function ends with an !, and note that slower non-allocating algs exist.
  • Document more clearly that ! does not mean non-allocating.

Thanks for detailed explanation.

Okay, if scratchspace is not part of official/stable API, it does not make sense to document it inside of ? in my opinion - especially if providing a scratch space at times is not enough to avoid allocations. What are the slower non-allocating algorithms ?

I think what you suggest by improving the explanation here, https://docs.julialang.org/en/v1/manual/style-guide/#bang-convention, to highlight that in-place operations does not mean non-allocating. I just make a fork of official doc, put some text in and then we take it from there?

I had a weird edge case, in which if I commented out a line, then sortperm! would not allocate at all, if I commented it in, it would allocate once. I am programming non-allocating algorithms for simulations, so I need to ensure there is truly none - this is why I spotted it in the first place.

Kind regards

@LilithHafner
Copy link
Member

I am programming non-allocating algorithms for simulations

Why must your algorithms not allocate? If it is for performance reasons, remember that unexpected allocations are typically a sign of type instability or other major performance issues, but allocations themselves are not all that slow and can enable faster algorithms.

What are the slower non-allocating algorithms?

On Julia 1.9+, you can use alg=QuickSort. On older versions there are none (QuickSort sometimes does allocate)

julia> VERSION
v"1.8.5"

julia> x = rand(1:10, 100);

julia> @b sort!($x, alg=QuickSort)
248.288 ns (1 allocs: 144 bytes)

I just make a fork of official doc, put some text in and then we take it from there?

Yes! Once you are ready to contribute the changes you've made to the official docs, make a pull request, and feel free to @ me in it.

@sumiya11
Copy link
Contributor

I have yet to find a single real-world example where explicitly passing a scratch space to sort! or any of its variants is a good idea or even has a measurable impact on performance. Until someone finds such an example, or scratch space handling is perfect, I'm inclined to think it's premature to publicize & document it.

FWIW, I have experimented with scratch spaces for sort! in my code (repetitive sorting of rows and columns of huge sparse matrices), which gave 1-3% overall speed up in linear algebra for particular "real-world" examples.

So I think it is a nice improvement :), though no idea how useful it is in general

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
docs This change adds or pertains to documentation sorting Put things in order
Projects
None yet
Development

No branches or pull requests

4 participants