
Tests failing on Apple Silicon #197

Closed · joachimbrand opened this issue Feb 27, 2023 · 0 comments · Fixed by #196

joachimbrand (Owner) commented:
Rimu tests fail on an Apple MacBook Pro with an M2 Pro processor.

The error seems to be triggered by MPI reductions:

AllOverlaps: Error During Test at /Users/brand/git/code/Rimu/test/lomc.jl:90
  Got exception outside of a @test
  User-defined reduction operators are currently not supported on non-Intel architectures.
  See https://github.com/JuliaParallel/MPI.jl/issues/404 for more details.
  Stacktrace:
    [1] error(s::String)
      @ Base ./error.jl:35
    [2] MPI.Op(f::Function, T::Type; iscommutative::Bool)
      @ MPI ~/.julia/packages/MPI/APiiL/src/operators.jl:95
    [3] MPI.Op(f::Function, T::Type)
      @ MPI ~/.julia/packages/MPI/APiiL/src/operators.jl:91
    [4] Allreduce!(rbuf::MPI.RBuffer{Base.RefValue{Rimu.MultiScalar{Tuple{Int64, Int64, Int64, Float64}}}, Base.RefValue{Rimu.MultiScalar{Tuple{Int64, Int64, Int64, Float64}}}}, op::Function, comm::MPI.Comm)
      @ MPI ~/.julia/packages/MPI/APiiL/src/collective.jl:668
    [5] Allreduce!(sendbuf::Base.RefValue{Rimu.MultiScalar{Tuple{Int64, Int64, Int64, Float64}}}, recvbuf::Base.RefValue{Rimu.MultiScalar{Tuple{Int64, Int64, Int64, Float64}}}, op::Function, comm::MPI.Comm)
      @ MPI ~/.julia/packages/MPI/APiiL/src/collective.jl:670
    [6] Allreduce(obj::Rimu.MultiScalar{Tuple{Int64, Int64, Int64, Float64}}, op::Function, comm::MPI.Comm)
      @ MPI ~/.julia/packages/MPI/APiiL/src/collective.jl:695
    [7] sort_into_targets!(dtarget::Rimu.RMPI.MPIData{DVec{BoseFS{5, 15, BitString{19, 1, UInt32}}, Float64, IsDynamicSemistochastic{Float64, Rimu.StochasticStyles.ThresholdCompression{Float64}, DynamicSemistochastic{Float64, WithReplacement{Float64}}}, Dict{BoseFS{5, 15, BitString{19, 1, UInt32}}, Float64}}, Rimu.RMPI.MPIPointToPoint{Pair{BoseFS{5, 15, BitString{19, 1, UInt32}}, Float64}, 1}}, w::DVec{BoseFS{5, 15, BitString{19, 1, UInt32}}, Float64, IsDynamicSemistochastic{Float64, Rimu.StochasticStyles.ThresholdCompression{Float64}, DynamicSemistochastic{Float64, WithReplacement{Float64}}}, Dict{BoseFS{5, 15, BitString{19, 1, UInt32}}, Float64}}, stats::Rimu.MultiScalar{Tuple{Int64, Int64, Int64, Float64}})
      @ Rimu.RMPI ~/git/code/Rimu/src/RMPI/helpers.jl:91
...

Related MPI issue: JuliaParallel/MPI.jl#404.

In short, we perform many reduction operations with MPI.Allreduce. The issue arises whenever the reduction operation does anything other than one of the few built-in reductions on scalars. Passing a general Julia function as the reduction operator to MPI.Allreduce apparently works only on Intel (x86) processors at the moment; see the sketch below.
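
A minimal sketch of the failure mode, assuming MPI.jl maps + on machine numbers to the predefined MPI.SUM while any other Julia function requires a user-defined MPI.Op:

using MPI
MPI.Init()
comm = MPI.COMM_WORLD

# Predefined reduction: + on an Int is mapped to MPI.SUM and works on all architectures.
MPI.Allreduce(1, +, comm)

# Custom Julia function: MPI.jl must construct a user-defined MPI.Op from a
# closure, which currently errors on non-Intel (e.g. ARM/Apple Silicon) CPUs.
MPI.Allreduce(1, (a, b) -> a + b, comm)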

For example, the MPI-enabled sum fails because sum reduces with Base.add_sum, which is not one of the recognised built-in operators and performs type promotion for small integers:

"""
    Base.add_sum(x, y)

The reduction operator used in `sum`. The main difference from [`+`](@ref) is that small
integers are promoted to `Int`/`UInt`.
"""
add_sum(x, y) = x + y
add_sum(x::SmallSigned, y::SmallSigned) = Int(x) + Int(y)
add_sum(x::SmallUnsigned, y::SmallUnsigned) = UInt(x) + UInt(y)
add_sum(x::Real, y::Real)::Real = x + y

Attempt to resolve (or work around) the issue: #196
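
One conceivable workaround (only a sketch, not necessarily what #196 implements) is to reduce the individual components of the statistics with the predefined operators instead of passing a custom Julia function:

using MPI

# Hypothetical helper: reduce a tuple of per-rank statistics componentwise.
# Each component is a plain Int or Float, so + is mapped to the predefined
# MPI.SUM and no user-defined MPI.Op is needed.
function allreduce_componentwise(stats::Tuple, comm::MPI.Comm)
    return map(x -> MPI.Allreduce(x, +, comm), stats)
end

# Usage, e.g.: allreduce_componentwise((1, 2, 3, 0.5), MPI.COMM_WORLD)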

joachimbrand linked a pull request on Mar 5, 2023 that will close this issue (#196).