Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use Base.@default_eltype for iterators and generators eltype function which currently returns Any? #54157

Open
Tortar opened this issue Apr 19, 2024 · 28 comments

Comments

@Tortar
Copy link
Contributor

Tortar commented Apr 19, 2024

I'm using this function to compute the type generically for iterators:

function compute_eltype(iter)
    T = eltype(iter)
    return T === Any ? Base.@default_eltype(iter) : T
end

and I'm asking myself if this couldn't be applied by default to some generators and iterators which currently returns Any (actually if it is applied specifically to just those types it can be used only Base.@default_eltype(iter) I think)

For example consider

julia> iter = Iterators.flatten(((3,), [4, 5]))
Base.Iterators.Flatten{Tuple{Tuple{Int64}, Vector{Int64}}}(((3,), [4, 5]))

julia> iter = Iterators.flatten(((3,), [4, 5]))

julia> eltype(iter)
Any

julia> compute_eltype(iter)
Int64

julia> iter = (x^2 for x in 1:10)
Base.Generator{UnitRange{Int64}, var"#3#4"}(var"#3#4"(), 1:10)

julia> eltype(iter)
Any

julia> compute_eltype(iter)
Int64

Performance-wise I see a cost of a couple of ns. I found that there is this other issue and associated PR about a special case of this: #48249 and #48277, but I think that a function as above could be applied more generally. Would this be okay? Or is this problematic?

@nsajko
Copy link
Contributor

nsajko commented Apr 21, 2024

As far as I remember, Base.@default_eltype relies on type inference. Type inference works on a best-effort basis, so it's good for optimizations in some cases, but mustn't be relied on for correctness. So the change you're proposing wouldn't be correct.

Apart from that, the run time cost of type inference wouldn't be acceptable for eltype.

@Tortar
Copy link
Contributor Author

Tortar commented Apr 21, 2024

Consider that though collect(::Base.Generator) relies on Base.@default_eltype to infer the type for the array. So it would be logical in my opinion to use it for eltype.

Also I see very good performance using it on some micro-benchmarks:

julia> using BenchmarkTools

julia> iter = (x for x in 1:10)
Base.Generator{UnitRange{Int64}, typeof(identity)}(identity, 1:10)

julia> @benchmark eltype($iter)
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
 Range (min  max):  1.182 ns  6.061 ns  ┊ GC (min  max): 0.00%  0.00%
 Time  (median):     1.192 ns             ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.197 ns ± 0.056 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▂        █▆       ▄▃                 ▁         ▃▂       ▁ ▁
  █▁▁▁▁▁▁▁▁██▁▁▁▁▁▁▁██▁▁▁▁▁▁▁▁▄▁▁▁▁▁▁▁▁██▁▁▁▁▁▁▁▁██▁▁▁▁▁▁▁█ █
  1.18 ns     Histogram: log(frequency) by time     1.24 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark Base.@default_eltype($iter)
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
 Range (min  max):  1.182 ns  6.843 ns  ┊ GC (min  max): 0.00%  0.00%
 Time  (median):     1.192 ns             ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.202 ns ± 0.085 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

           █                                                 
  ▃▁▁▁▁▁▁▁▁█▆▁▁▁▁▁▁▁▂▂▁▁▁▁▁▁▁▁▂▁▁▁▁▁▁▁▁▃▂▁▁▁▁▁▁▁▄▃▁▁▁▁▁▁▁▁▃ ▂
  1.18 ns        Histogram: frequency by time       1.24 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> iter = (x < 5 ? x : 2.0*x for x in 1:10)
Base.Generator{UnitRange{Int64}, var"#1#2"}(var"#1#2"(), 1:10)

julia> @benchmark eltype($iter)
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
 Range (min  max):  1.182 ns  22.753 ns  ┊ GC (min  max): 0.00%  0.00%
 Time  (median):     1.192 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.203 ns ±  0.218 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

           █                                                  
  ▃▁▁▁▁▁▁▁▁█▆▁▁▁▁▁▁▁▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂▁▁▁▁▁▁▁▁▄▃▁▁▁▁▁▁▁▁▃ ▂
  1.18 ns        Histogram: frequency by time        1.24 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark Base.@default_eltype($iter)
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
 Range (min  max):  1.182 ns  6.622 ns  ┊ GC (min  max): 0.00%  0.00%
 Time  (median):     1.193 ns             ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.209 ns ± 0.073 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

           █                                                 
  ▂▁▁▁▁▁▁▁▁█▆▁▁▁▁▁▁▁▃▂▁▁▁▁▁▁▁▁▂▁▁▁▁▁▁▁▁▃▂▁▁▁▁▁▁▁▇▄▁▁▁▁▁▁▁▁▅ ▂
  1.18 ns        Histogram: frequency by time       1.24 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

@nsajko
Copy link
Contributor

nsajko commented Apr 23, 2024

xref #54207

@Tortar
Copy link
Contributor Author

Tortar commented Apr 23, 2024

Indeed, it would be really okay if @default_eltype was just public API so that I could just use it as a fallback with confidence when eltype returns Any which can be also mentioned in the docstring:

infer_eltype(itr) = (T = eltype(itr)) === Any ? @default_eltype(itr) : T

@Tortar
Copy link
Contributor Author

Tortar commented Apr 23, 2024

Another possibility is to create a new function as above and export it or even something maybe a bit stronger as

infer_eltype(itr) = ifelse((T2 = Base.@default_eltype(itr)) <: (T1 = eltype(itr)), T2, T1)

with all the needed disclaimers in the docstring

@LilithHafner
Copy link
Member

We might want to special case Union{} because sometimes type inference can infer that the itr is empty, in which case it tells us Union{} which is true and precise and not usually what folks are looking for

julia> Base.@default_eltype Iterators.Filter([1,2,3]) do x
           sum([1]) == 0
       end
Int64

julia> Base.@default_eltype Iterators.Filter([1,2,3]) do x
           sum(1) == 0
       end
Union{}

@Tortar
Copy link
Contributor Author

Tortar commented Apr 23, 2024

A thing I don't understand is why in Base.@default_eltype (https://github.com/JuliaLang/julia/blob/master/base/array.jl#L744) it is used promote_typejoin_union(T), I would prefer to infer a Union of concrete types than an abstract type e.g.

julia> s = (x < 5 ? x : 2.0*x for x in 1:10)
Base.Generator{UnitRange{Int64}, var"#5#6"}(var"#5#6"(), 1:10)

julia> T = Core.Compiler.return_type(Base._iterator_upper_bound, Tuple{typeof(s)})
Union{Float64, Int64}

julia> Base.promote_typejoin_union(T)
Real

Why can't we use Core.Compiler.return_type(Base._iterator_upper_bound, Tuple{typeof(s)}) at least when the Union is not too big? Notice that this affects also collect:

julia> collect(s)
10-element Vector{Real}:
...

@vtjnash
Copy link
Sponsor Member

vtjnash commented Apr 23, 2024

collect does not return a Union, as it gets the types from the real values and not from inference

@Tortar
Copy link
Contributor Author

Tortar commented Apr 23, 2024

right, now I see, it widens the type. Nonetheless I guess with an infer_eltype function which returns also Unions one can probably take advantage of this with collect(infer_eltype(itr), itr)

@Tortar
Copy link
Contributor Author

Tortar commented Apr 23, 2024

@vtjnash this is a bit derailing the conversation, but I'd like to ask you: is it a bad idea to return a Union (of not too many types)? e.g. I was able to return a Vector{Union{Int, Float64}} in the above case by changing https://github.com/JuliaLang/julia/blob/master/base/array.jl#L822 in new = similar(dest, Union{T, typeof(el)}).

@vtjnash
Copy link
Sponsor Member

vtjnash commented Apr 23, 2024

Yes. That will never be the type of the object when it actually contains data, so it slows down the case of when there is data to benefit the case where there is not data

@vtjnash
Copy link
Sponsor Member

vtjnash commented Apr 23, 2024

You typically want the static and dynamic types to nearly coincide, for optimal results

@aplavin
Copy link
Contributor

aplavin commented Apr 24, 2024

We might want to special case Union{} because sometimes type inference can infer that the itr is empty, in which case it tells us Union{} which is true and precise and not usually what folks are looking for

It would be really unfortunate, especially if this avoidance of Union{} is motivated solely by current compiler limitations (and not language design concerns).

Do you suggest turning Union{} into Any, or something else? There's just no other type that would retain the nice invariant that the (inferred) eltype of concatenation of A + B is a supertype of A and B eltypes.

@Tortar
Copy link
Contributor Author

Tortar commented Apr 24, 2024

If we are going the route where we export a new function, which to me seems preferable, I (maybe mis)understood what @LilithHafner has suggested to be implemented as for example

function infer_eltype(itr)
    T1, T2 = eltype(itr), Base.@default_eltype(itr)
    ifelse(T2 !== Union{} && T2 <: T1, T2, T1)
end

@LilithHafner
Copy link
Member

It would be really unfortunate, especially if this avoidance of Union{} is motivated solely by current compiler limitations (and not language design concerns).

Inference is tasked with finding the narrowest type it can come up with that is a supertype of all types of instances that may actually occur. Coming up with a supertype of all types that could actually occur is a correctness thing: it has to get that right every time or we have compiler bugs. Picking a narrow type is on a best-effort basis. Returns(Any) is a perfectly valid type inference algorithm for Julia, just not a particularly performant one. Consequently a, when it is possible to prove that an iterator is empty, a really good type inference algorithm will produce Union{}, which is the narrowest type. Union{} is not a supertype of any concrete type, but nevertheless it is (vacuously) a supertype of all types of instances that may actually occur in the (empty) iterator. This is all at the language level. We happen to be running into this problem now as type inference is good enough to make these emptiness proofs. In the past some folks (including Base) have at times relied on compiler limitations by expecting type inference not to return Union{} here.

Do you suggest turning Union{} into Any, or something else? There's just no other type that would retain the nice invariant that the (inferred) eltype of concatenation of A + B is a supertype of A and B eltypes.

We cannot guarantee that in any case so long as we rely on inference because inference may decide to provide an arbitrarily wide eltype for A and/or B. Even turning Union{} into Any would not help. For example if A infers to Union{} and B infers to Int, it is quite plausible for their concatenation to infer to Int, which is not a supertype of Any.

@Tortar
Copy link
Contributor Author

Tortar commented Apr 24, 2024

if we want to return the narrowest type possible would then be fine to use (a macro version of) Core.Compiler.return_type instead of Base.@default_eltype(itr) so that to infer Union types in some cases instead of abstract ones?

@aplavin
Copy link
Contributor

aplavin commented Apr 24, 2024

when it is possible to prove that an iterator is empty, a really good type inference algorithm will produce Union{}, which is the narrowest type.<...> We happen to be running into this problem now as type inference is good enough to make these emptiness proofs.

This is great – inference is improving!

In the past some folks (including Base) have at times relied on compiler limitations by expecting type inference not to return Union{} here.

And this seems to be the actual issue.
It's just really weird to make the inferred type artificially wider, as if the inference was less precise.

@LilithHafner
Copy link
Member

And this seems to be the actual issue.
It's just really weird to make the inferred type artificially wider, as if the inference was less precise.

Which raises the question: when would someone use this infer_eltype function?

@Tortar
Copy link
Contributor Author

Tortar commented Apr 29, 2024

Which raises the question: when would someone use this infer_eltype function?

I would say at least whoever is using Base.@default_eltype currently, actually probably more because it is not even exported now

@Tortar
Copy link
Contributor Author

Tortar commented Apr 29, 2024

Seems like around 20 packages some of which very popular are using it: https://juliahub.com/ui/Search?q=@default_eltype&type=code

@LilithHafner
Copy link
Member

LilithHafner commented Apr 29, 2024

Yeah, for example

function _DisjointSet(xs, ::Base.EltypeUnknown)
    T = Base.@default_eltype(xs)
    (isconcretetype(T) || T === Union{}) || return Base.grow_to!(DisjointSet{T}(), xs)
    return DisjointSet{T}(xs)
end

to

function _DisjointSet(xs, ::Base.EltypeUnknown)
    T = infer_eltype(xs)
    isconcretetype(T) || return Base.grow_to!(DisjointSet{T}(), xs
    return DisjointSet{T}(xs)
end

has one semantic change: when inference can infer that the iterator is empty you get at DisjointSet{eltype(iter)}() instead of a DisjointSet{Union{}}().
(source)

@LilithHafner
Copy link
Member

IMO a pull request adding infer_eltype (with appropriate docs) should be accepted.

@Tortar
Copy link
Contributor Author

Tortar commented Apr 30, 2024

I could try to make a PR but I'm still unsure about a thing, consider:

julia> struct A end

julia> struct B end

julia> itr = (x < 5 ? A() : B() for x in 1:10)
Base.Generator{UnitRange{Int64}, var"#1#2"}(var"#1#2"(), 1:10)

julia> Base.@default_eltype(itr)
Any

julia> Core.Compiler.return_type(Base._iterator_upper_bound, Tuple{typeof(itr)})
Union{A, B}

I would really prefer the second one, so I think a possible infer_eltype function should not use Base.@default_eltype but a narrower version of it (without promote_typejoin_union at the end), is that okay?

@Tortar
Copy link
Contributor Author

Tortar commented Apr 30, 2024

ok, maybe we could actually have two versions, with a tight/narrow = false argument so that the user can choose

@LilithHafner
Copy link
Member

Sure, IIUC that promote_typejoin_union dates back to when the compiler was bad at handling Unions performantly, so it would be reasonable to remove. No need for an option, the user can always call promote_typejoin_union on the result. It's also possible to continue this discussion once you open a PR (most PRs have a bit of discussion and revision before merging).

@vtjnash
Copy link
Sponsor Member

vtjnash commented May 1, 2024

It was a recent performance improvement for collect, which is designed not to return random Union types normally (only having trouble with the empty case). The exact PR for it should be findable by git blame if interested in the full picture

@LilithHafner
Copy link
Member

The PR that changed @default_eltype was #42046.

collect has been returning abstract types instead of unions since at least 1.0.

@vtjnash
Copy link
Sponsor Member

vtjnash commented May 1, 2024

Yes, looks like the main explanation of that PR is in #30485 though

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants