
Consider having a configurable memory model #3673

Closed
WojciechMazur opened this issue Jan 15, 2024 · 2 comments · Fixed by #3719
@WojciechMazur
Contributor

WojciechMazur commented Jan 15, 2024

Currently, following the Java Memory Model imposes a significant performance regression, ~10-15%. Every val class member coming from the scalac compiler is treated as a Java final field, e.g. class A(val x: Int) { val y: AnyRef = ??? } or case class B(x: Int) { val y: AnyRef = ??? }. This requires us to introduce an atomic acquire for every load, and a release fence at the end of the constructor.
Not every application requires such strong guarantees. For those, we could introduce a relaxed memory model that reduces the amount of synchronisation compared to the current strict memory model. The memory model should be selectable in the NativeConfig.
An alternative would be to annotate specific classes or modules/packages/libraries (?, how?) to use a dedicated memory model.
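To illustrate the cost being discussed, here is a minimal sketch (the comments describe conceptual codegen, not actual compiler output): under the JMM's final-field rules, each val below behaves like a Java final field, so the constructor must end with a release fence and, on weakly-ordered targets, reads of such fields become acquire loads.

```scala
// Conceptual illustration of the final-field cost described above.
class Box(val x: Int) {        // ctor: store x ... store y ... fence(release)
  val y: AnyRef = new Object   // every read of `box.y`: load-acquire
}

object Demo {
  def main(args: Array[String]): Unit = {
    val b = new Box(42)
    assert(b.x == 42 && (b.y ne null))
    println("ok")
  }
}
```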

@armanbilge
Member

👍 yes please! I'm very interested in this. Particularly, I would like to relax the final Field Semantics imposed by the Java Memory Model, which are known to have a non-trivial performance cost even on the JVM, especially for ARM architectures.

Not every application would require such strong guarantees.

This is a very subtle issue. I agree that in practice, most application code does not require such strong guarantees. I make this argument for Cats Effect applications.

However, that application code may be calling library code that does rely on these strong guarantees. This is currently the case in Cats Effect, particularly in our runtime components. It is also very likely the case for various j.u.concurrent data structures in Scala Native's Javalib.

Thus (nearly) all applications will rely on these guarantees somewhere in the call stack. All this to say, we cannot globally disable final Field Semantics at linking time. Instead, this should be known at compile time and encoded in the NIR.

Another complication is situations where users want to rely on final Field Semantics for classes they don't control (e.g. originating in other libraries). However, there are reasonable workarounds available, such as wrapping the class in another user-controlled class that does have final Field Semantics, or publishing via some other form of synchronization instead of subjecting it to a data race.
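A sketch of the "publish via other synchronization" workaround mentioned above; ThirdPartyConfig and Published are illustrative names, not real APIs. Instead of relying on the third-party class's own fields, the user publishes the instance through a volatile field, whose write/read pair supplies the release/acquire edge.

```scala
// ThirdPartyConfig stands in for a class from a library compiled
// without final-field semantics.
final class ThirdPartyConfig(val timeoutMs: Int)

// Illustrative user-controlled wrapper: the volatile write is a release,
// the volatile read is an acquire, so readers that observe the instance
// also observe its fully initialized fields.
final class Published[A <: AnyRef] {
  @volatile private var value: A = null.asInstanceOf[A]
  def publish(a: A): Unit = value = a // volatile write: release
  def get: Option[A] = Option(value)  // volatile read: acquire
}

object Demo {
  def main(args: Array[String]): Unit = {
    val box = new Published[ThirdPartyConfig]
    box.publish(new ThirdPartyConfig(5000))
    assert(box.get.exists(_.timeoutMs == 5000))
    println("ok")
  }
}
```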


Before we release Scala Native 0.5 and debut multi-threading, we have the opportunity to relax the memory model by default without it being a breaking change.

For example, it seems attractive if we decide not to follow the JMM and disable final Field Semantics by default. This means that users can get the fastest performance without configuring anything. However, this shifts the burden to maintainers of concurrency frameworks and data structures to be mindful of the Scala Native Memory Model. The latter group is a much smaller set of libraries and developers.

There is of course risk in straying away from the well-established JMM :) but in general, you can make a memory model stronger backwards-compatibly, but you cannot make it weaker.

Concretely, here is one proposal of how we might action this:

  1. Do not implement the JMM final Field Semantics for vals by default.

  2. Introduce an annotation e.g. "@publish" that can be used to annotate vals that should follow JMM final Field Semantics. This annotation should be encoded as an NIR-level attribute. This annotation can be made available in a stub-library for facilitating cross-compilation with JVM.

    Note: it might even be reasonable to allow this "@publish" annotation to also be applied to vars or to an entire class. The All Fields Are Final proposal, implemented as the JVM flag -XX:+AlwaysSafeConstructors, extends final fields' publication semantics to all fields of all classes.

  3. (Optional) Introduce a compile time config (not linking time) that library authors can use to automatically apply the "@publish" annotation to all vals within an entire compilation unit. This would essentially be an escape hatch.

  4. (Optional) Introduce a linking time config that applications can use to restore final Field Semantics for all vals, in their own code and library NIR. Another escape hatch.

  5. Adapt java.util.concurrent and scala.concurrent to this new relaxed memory model. Changes are only necessary where classes are published by data race instead of by volatiles or atomics. In general, I believe these sorts of algorithms are the exception.
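To make the proposal concrete, here is a hypothetical sketch of what point 2 might look like in user code. The @publish annotation does not exist; it is stubbed here as a plain StaticAnnotation purely to illustrate the intended usage, and in a real implementation it would be provided by Scala Native and encoded as an NIR-level attribute.

```scala
import scala.annotation.StaticAnnotation

// Stub of the proposed annotation (hypothetical; for illustration only).
final class publish extends StaticAnnotation

// Only the annotated vals would get JMM final-field publication semantics;
// plain vals like `cached` would use the relaxed model with no
// acquire/release cost on reads.
final class Node(@publish val payload: AnyRef) {
  @publish val createdAt: Long = System.nanoTime()
  val cached: String = payload.toString
}

object Demo {
  def main(args: Array[String]): Unit = {
    val n = new Node("data")
    assert(n.cached == "data")
    println("ok")
  }
}
```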


I need to make another review of the Java Memory Model to see if there are other areas that Scala Native may consider relaxing.

But I am confident that disabling final Field Semantics will give concurrency frameworks like Cats Effect much finer-grained control over exactly when expensive memory synchronization happens, without compromising the correctness of library and application code written using the framework.

@WojciechMazur
Contributor Author

Thank you for all the feedback and great ideas @armanbilge. I initially thought about marking a whole class with one of several enum-like annotations:

enum MemoryModel:
   case Strict  // Follows the JMM as closely as possible
   case Relaxed // No final-field semantics, atomic unsynchronized access
   case None    // Does not introduce any memory guarantees

However, being able to mark a single field might be even more interesting. I like the idea of @publish as it's more self-explanatory.

  1. (Optional) Introduce a compile time config (not linking time) that library authors can use to automatically apply the "@publish" annotation to all vals within an entire compilation unit.

For this one, it would probably be best to use compiler plugin settings; we've already had a few in the past. That way the solution would also be more portable and would not require additional interop with each build tool.
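A hypothetical sketch of how such a compiler-plugin setting might look in sbt. The flag name is purely illustrative; no such option exists:

```scala
// build.sbt (illustrative): a compiler-plugin option that would apply the
// proposed "@publish" semantics to every val in this compilation unit.
// "-P:scalanative:publishAllFields" is a hypothetical flag name.
Compile / scalacOptions += "-P:scalanative:publishAllFields"
```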

During some recent benchmarks based on the 1BRC, we found that the current memory model accounts for almost 4x higher CPU time, and over 2x slower total time. That motivates me to change the default behaviour ASAP.
