
Tests failing on Apple Silicon #197

Closed · joachimbrand opened this issue Feb 27, 2023 · 0 comments · Fixed by #196

joachimbrand (Owner) commented:
Rimu tests fail on an Apple MacBook Pro with an M2 Pro processor.

The error seems to be triggered by MPI reductions:

AllOverlaps: Error During Test at /Users/brand/git/code/Rimu/test/lomc.jl:90
  Got exception outside of a @test
  User-defined reduction operators are currently not supported on non-Intel architectures.
  See https://github.com/JuliaParallel/MPI.jl/issues/404 for more details.
  Stacktrace:
    [1] error(s::String)
      @ Base ./error.jl:35
    [2] MPI.Op(f::Function, T::Type; iscommutative::Bool)
      @ MPI ~/.julia/packages/MPI/APiiL/src/operators.jl:95
    [3] MPI.Op(f::Function, T::Type)
      @ MPI ~/.julia/packages/MPI/APiiL/src/operators.jl:91
    [4] Allreduce!(rbuf::MPI.RBuffer{Base.RefValue{Rimu.MultiScalar{Tuple{Int64, Int64, Int64, Float64}}}, Base.RefValue{Rimu.MultiScalar{Tuple{Int64, Int64, Int64, Float64}}}}, op::Function, comm::MPI.Comm)
      @ MPI ~/.julia/packages/MPI/APiiL/src/collective.jl:668
    [5] Allreduce!(sendbuf::Base.RefValue{Rimu.MultiScalar{Tuple{Int64, Int64, Int64, Float64}}}, recvbuf::Base.RefValue{Rimu.MultiScalar{Tuple{Int64, Int64, Int64, Float64}}}, op::Function, comm::MPI.Comm)
      @ MPI ~/.julia/packages/MPI/APiiL/src/collective.jl:670
    [6] Allreduce(obj::Rimu.MultiScalar{Tuple{Int64, Int64, Int64, Float64}}, op::Function, comm::MPI.Comm)
      @ MPI ~/.julia/packages/MPI/APiiL/src/collective.jl:695
    [7] sort_into_targets!(dtarget::Rimu.RMPI.MPIData{DVec{BoseFS{5, 15, BitString{19, 1, UInt32}}, Float64, IsDynamicSemistochastic{Float64, Rimu.StochasticStyles.ThresholdCompression{Float64}, DynamicSemistochastic{Float64, WithReplacement{Float64}}}, Dict{BoseFS{5, 15, BitString{19, 1, UInt32}}, Float64}}, Rimu.RMPI.MPIPointToPoint{Pair{BoseFS{5, 15, BitString{19, 1, UInt32}}, Float64}, 1}}, w::DVec{BoseFS{5, 15, BitString{19, 1, UInt32}}, Float64, IsDynamicSemistochastic{Float64, Rimu.StochasticStyles.ThresholdCompression{Float64}, DynamicSemistochastic{Float64, WithReplacement{Float64}}}, Dict{BoseFS{5, 15, BitString{19, 1, UInt32}}, Float64}}, stats::Rimu.MultiScalar{Tuple{Int64, Int64, Int64, Float64}})
      @ Rimu.RMPI ~/git/code/Rimu/src/RMPI/helpers.jl:91
...

Related MPI issue: JuliaParallel/MPI.jl#404.

In short, we perform many reduction operations with MPI.Allreduce. The issue arises whenever the reduction operation does anything other than one of the few built-in reductions on scalars. Passing a general Julia function as the reduction operator to MPI.Allreduce apparently works only on Intel (x86) processors at the moment; see the sketch below.
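
A minimal sketch of the failure mode, assuming MPI.jl maps + on machine numbers to the predefined MPI.SUM while any other Julia function requires a user-defined MPI.Op:

using MPI
MPI.Init()
comm = MPI.COMM_WORLD

# Predefined reduction: + on an Int is mapped to MPI.SUM and works on all architectures.
MPI.Allreduce(1, +, comm)

# Custom Julia function: MPI.jl must construct a user-defined MPI.Op from a
# closure, which currently errors on non-Intel (e.g. ARM/Apple Silicon) CPUs.
MPI.Allreduce(1, (a, b) -> a + b, comm)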

For example, the MPI-enabled sum fails because sum reduces with Base.add_sum, which is not one of the recognised built-in operators and performs type promotion for small integers:

"""
    Base.add_sum(x, y)

The reduction operator used in `sum`. The main difference from [`+`](@ref) is that small
integers are promoted to `Int`/`UInt`.
"""
add_sum(x, y) = x + y
add_sum(x::SmallSigned, y::SmallSigned) = Int(x) + Int(y)
add_sum(x::SmallUnsigned, y::SmallUnsigned) = UInt(x) + UInt(y)
add_sum(x::Real, y::Real)::Real = x + y

Attempt to resolve (or work around) the issue: #196
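
One conceivable workaround (only a sketch, not necessarily what #196 implements) is to reduce the individual components of the statistics with the predefined operators instead of passing a custom Julia function:

using MPI

# Hypothetical helper: reduce a tuple of per-rank statistics componentwise.
# Each component is a plain Int or Float, so + is mapped to the predefined
# MPI.SUM and no user-defined MPI.Op is needed.
function allreduce_componentwise(stats::Tuple, comm::MPI.Comm)
    return map(x -> MPI.Allreduce(x, +, comm), stats)
end

# Usage, e.g.: allreduce_componentwise((1, 2, 3, 0.5), MPI.COMM_WORLD)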

joachimbrand linked a pull request on Mar 5, 2023 that will close this issue (#196).