
Why is this allocating at all? #2

Open
oxinabox opened this issue Feb 16, 2020 · 7 comments

Comments

@oxinabox
Owner

There is no real reason this code should allocate, AFAICT.

I think something is going wrong with Cassette.

@MasonProtter

There is no real reason this code should allocate, AFAICT.

What code? Did you mean to post an example?

@oxinabox
Owner Author

Any code using avoid_allocations.

E.g. the example from the README:

julia> using AutoPreallocation, BenchmarkTools

julia> foo() = ones(1, 2096) * ones(2096, 1024) * ones(1024,1)
foo (generic function with 1 method)

julia> const foo_res, foo_record = record_allocations(foo);

julia> @btime avoid_allocations($foo_record, foo)
  1.376 ms (29 allocations: 672 bytes)
1×1 Array{Float64,2}:
 2.146304e6

@oxinabox
Owner Author

So one place some of the allocations come from is creating the Context (thanks @vchuravy).
But AFAICT that accounts for only a small minority of them:

julia> @btime AutoPreallocation.new_replay_ctx($foo_record)
  14.621 ns (3 allocations: 64 bytes)
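
One way to sanity-check that (a sketch only; it assumes the context returned by new_replay_ctx is an ordinary Cassette context that can be handed straight to Cassette.overdub and reused across calls without resetting the record) is to build the context once, outside the benchmark, so any remaining allocations cannot be blamed on context construction:

using AutoPreallocation, BenchmarkTools, Cassette

foo() = ones(1, 2096) * ones(2096, 1024) * ones(1024, 1)
foo_res, foo_record = record_allocations(foo)

# Construct the replay context a single time, then benchmark only the
# overdubbed call itself.
ctx = AutoPreallocation.new_replay_ctx(foo_record)
@btime Cassette.overdub($ctx, foo)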

@oxinabox
Owner Author

oxinabox commented Feb 17, 2020

It doesn't allocate like this on 1.2, only on 1.3+.
I believe this is JuliaLabs/Cassette.jl#153.

In 1.3:

julia> @btime avoid_allocations($record, f_matmul)
  2.012 μs (15 allocations: 352 bytes)

in 1.2:

julia> @btime avoid_allocations($record, f_matmul);
  1.317 μs (3 allocations: 64 bytes)

The 64 bytes are just the cost of creating the context.

@oxinabox
Owner Author

This was not fixed by JuliaLabs/Cassette.jl#166, but I am going to guess it is something similar.
Will need to dig deeper.

@MasonProtter

MasonProtter commented Feb 18, 2020

I think this is related to all the splatting that happens in the definition of overdub. I tried tweaking https://github.com/jrevels/Cassette.jl/blob/master/src/overdub.jl#L524 to

using SpecializeVarargs
@specialize_vararg 5 recurse(ctx::Context, ::typeof(Core._apply), f, args...) = Core._apply(recurse, (ctx, f), args...)

and I find that the allocations in

using AutoPreallocation, BenchmarkTools

foo() = ones(1, 2096) * ones(2096, 1024) * ones(1024,1)

let
    foo_res, foo_record = record_allocations(foo)
    @btime avoid_allocations($foo_record, $foo)
end

goes from 7 allocations: 192 bytes to 6 allocations: 176 bytes, which suggests I might be on the right track. I tried for a while to manually add more methods to overdub so it doesn't rely as heavily on varargs, but I couldn't figure out how to do it correctly.
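
For reference, the rough idea behind @specialize_vararg (a sketch with placeholder names, not the macro's actual expansion) is to replace a single vararg method with a family of explicitly type-parameterized methods up to some arity, plus a vararg fallback for everything longer:

# Sketch only; ToyCtx, my_apply and do_call are placeholder names.
struct ToyCtx end

do_call(ctx, f, args...) = f(args...)   # stand-in for the real method body

my_apply(ctx::ToyCtx, f) = do_call(ctx, f)

# Generate explicitly specialized methods for 1 to 5 arguments.
for n in 1:5
    args = [Symbol(:arg, i) for i in 1:n]            # arg1, arg2, ...
    Ts   = [Symbol(:T, i)   for i in 1:n]            # T1, T2, ...
    sig  = [:($a::$T) for (a, T) in zip(args, Ts)]   # arg1::T1, arg2::T2, ...
    @eval my_apply(ctx::ToyCtx, f, $(sig...)) where {$(Ts...)} =
        do_call(ctx, f, $(args...))
end

# Fallback for calls with more arguments than we specialized for.
my_apply(ctx::ToyCtx, f, args...) = do_call(ctx, f, args...)

Each generated method is fully specialized on its argument types instead of going through one splatted vararg method, which is presumably why forcing this expansion on the Core._apply overdub above shifts the allocation count.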

@MasonProtter

To clarify what I mean, I think that if overdub were defined with methods

overdub(OVERDUB_CONTEXT_NAME::Context)
overdub(OVERDUB_CONTEXT_NAME::Context, arg1::T1) where {T1}
overdub(OVERDUB_CONTEXT_NAME::Context, arg1::T1, arg2::T2) where {T1, T2}
overdub(OVERDUB_CONTEXT_NAME::Context, arg1::T1, arg2::T2, arg3::T3) where {T1, T2, T3}
overdub(OVERDUB_CONTEXT_NAME::Context, arg1::T1, arg2::T2, arg3::T3, arg4::T4) where {T1, T2, T3, T4}

instead of just

overdub(OVERDUB_CONTEXT_NAME::Context, args...)

the allocations might be avoided here. The explicit type parameters ::T1, ..., ::T4 are important for forcing the compiler to specialize on (and infer) the argument types rather than handling them as an unspecialized vararg.
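
A minimal illustration of that point (hypothetical helper names, not Cassette code; whether the loose version actually allocates depends on the call site): Julia's heuristics can skip specializing on a Vararg that is merely passed through to another call, whereas an explicit type parameter forces full specialization.

# May not be specialized on the concrete types of `args`, because the
# vararg is only forwarded to another call.
apply_loose(f, args...) = f(args...)

# The explicit type parameters force specialization on both the function
# and the number (and hence the types) of the arguments.
apply_tight(f::F, args::Vararg{Any,N}) where {F,N} = f(args...)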
