Test a new scheme to implement lazy vals #6979
Conversation
tests/run/lazy-impl.scala

```scala
def awaitRelease(): AnyRef = synchronized {
  if (!done) wait()
```
To handle spurious wake-ups:

```diff
- if (!done) wait()
+ while (!done) wait()
```
I don't see how a spurious wakeup could happen?
That's what makes them spurious :)
Are you referring to something like this?

That something like this could happen was news to me! Yes, in that world we need a `while`.
@odersky yes, exactly that.
You also need to consider what would/should happen if someone fires a Thread.interrupt at any threads waiting on initialisation.
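As a hedged sketch only (the `Waiting` helper follows the shape discussed later in this thread, and is not the PR's actual code), one way to combine a `while` loop against spurious wake-ups with interrupt handling is to swallow the interrupt while waiting and restore the interrupt status before returning, so the value is always published and the caller can still observe the interruption:

```scala
// Sketch: a Waiting helper whose awaitRelease survives both spurious
// wake-ups (via the while loop) and Thread.interrupt (by remembering
// the interrupt and restoring the thread's interrupt status afterwards).
class Waiting {
  private var done = false
  private var result: AnyRef = _

  def release(result: AnyRef): Unit = synchronized {
    this.result = result
    done = true
    notifyAll()
  }

  def awaitRelease(): AnyRef = synchronized {
    var interrupted = false
    while (!done) {
      try wait()
      catch { case _: InterruptedException => interrupted = true }
    }
    if (interrupted) Thread.currentThread().interrupt()
    result
  }
}
```

An alternative design would be to propagate `InterruptedException` to the caller instead; which is preferable depends on the desired semantics of interrupting a lazy val read.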
Aleksandar Prokopec has pointed out to me a problem with this. There is no happens-before between the reader of a lazy val and the CAS of the initializing code. This is not a problem for the lazy value itself, since the algorithm takes care to go into a CAS if the lazy val is not propagated, which will publish the correct value. But it is a problem for fields contained in the lazy val, which will then not be safely published under the JMM. So the lazy val has to be made @volatile. Here is the relevant part of Alek's mail:

> Consider the following program:
> My point is that, according to the way I understand the JMM, the assert can fail on some architectures. Assume that the thread T2 is faster, and it executes the first and the second CAS. Assume that the main thread now proceeds: at this point, there is no guarantee that the main thread also sees the other, earlier memory effects of the thread T2 at address(_obj). READ(obj + 12), which is needed by the assert, can return something that was at that memory address before T2's WRITE(obj + 12, 2). The reason for this is that the READ(_x) is not followed by a load-load barrier (which a volatile read would typically guarantee).
Is this solvable by adding an explicit barrier in the initialization code? I believe @retronym is the expert here.
It seems to me the problem cannot happen. For the main thread, the following two reads are impossible to be reordered due to the dependency between them:

```
READ(_x, REG1)
...
READ(REG1 + 12)
```
I think it best to run it past the concurrency-interest ML.

> It seems to me the problem cannot happen. For the main thread, the following two reads are impossible to be reordered due to the dependency between them:
>
> ```
> READ(_x, REG1)
> ...
> READ(REG1 + 12)
> ```
From what I understand, it depends on the architecture. I'm no expert on ARM, but it seems that ARM might, as you point out, prevent reordering due to a data or control-flow dependency, as explained starting from page 13 here: https://www.cl.cam.ac.uk/~pes20/ppc-supplemental/test7.pdf

However, I don't think one should rely on this. More importantly, what are the suspected gains from avoiding a volatile read? In particular, the read will not be reordered with subsequent loads, volatile or not (http://gee.cs.oswego.edu/dl/jmm/cookbook.html). For example, a JIT compiler could technically pull a volatile read of

No hoisting is possible in the next snippet when
Although, people using lazy vals and caring about performance would probably manually extract this read out of the loop. However, if you really think that lazy val accesses should be optimized, then I think you should identify a set of workloads where such accesses are important, and benchmark them to see the differences. It is possible that potential performance differences can be reduced or eliminated by improving the JIT compilers, rather than by making the accesses weaker.

I am under the impression that there is an inherent assumption here that a volatile
I don't think so. The issue seems to be that a barrier is required in the fast path, i.e. the part that reads the lazy value.
If we make the cache @volatile we get volatile writes as well as volatile reads. Strictly speaking, the write is not needed, as the value of the cache is set with a CAS, which is already in a happens-before with a volatile read. How expensive would the volatile write be in this case? Should we optimize it away?

EDIT: Actually, it won't matter. We only set
@axel22 pointed out to me that we can also handle local lazy vals this way. In fact, if a lazy val is not captured in a closure, it is always confined to a single thread, so a simple null test, like the scheme in #6967, suffices. If a lazy val is captured, the cache variable will have to be boxed. And then we have the box object to hang our CAS on. For this to work, we have to move the LazyVals miniphase after CapturedVars and have to modify CapturedVars so that it boxes captured lazy vals as well as mutable variables.
@axel22 Thanks for the links and detailed explanation, it's pretty helpful. The spec (JLS §17.4.5) says that happens-before is consistent with program order. Maybe I am missing something here.

Edited: Alpha seems to be indeed weird: http://www.cs.umd.edu/~pugh/java/memoryModel/AlphaReordering.html
You could also consider the following:

```scala
@volatile private[this] final var _x$generated: AnyRef = _ // do not assign null, to avoid a volatile write

@tailrec final def x = _x$generated match {
  case cur: A => cur // Fast path: there's already a value
  case null =>
    if (CMPXCHG(null, Evaluating)) {
      val xResult = result // What are the desired semantics if `result` throws?
      XCHG(xResult) match {
        case s: Semaphore => // Semaphore would be a type which can never be `A`
          s.release()
          xResult
        case _ =>
          xResult // If there are no waiters, we can simply return the result
      }
    } else x // If we fail the CAS then there was a race for starting evaluation, so retry `x`
  case Evaluating =>
    CAS(Evaluating, Semaphore(1)) // Only allocates in the case of initialization clashes
    x // Retry `x`; it might already have a value
  case s: Semaphore =>
    blocking { s.awaitRelease() } // Lets any current ExecutionContext provide ManagedBlocking when the initializer takes time, to try to prevent deadlocks
    _x$generated
}
```

In essence this means that initialization is best-case volatile read + LOCK cmpxchg + LOCK xchg.
@viktorklang Yes, that seems to work as well and looks more streamlined to me. How do I access CMPXCHG and XCHG? My knowledge only extends to the old sun.misc.Unsafe, which had compare-and-set but not compare-and-swap.

@odersky cmpxchg is compareAndSwapObject, and xchg is getAndSetObject.

@odersky And, perhaps needless to say,

@viktorklang Thanks! I did not find

@odersky I do not know of any "official" javadocs for Unsafe. Here's what's available in jdk8: https://github.com/AdoptOpenJDK/openjdk-jdk/blob/jdk8-b120/jdk/src/share/classes/sun/misc/Unsafe.java#L1104
Synthesizing the state of discussion so far, and also handling exceptions:
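As a hedged sketch only (not the PR's actual code), the synthesized scheme could look roughly like this, with `java.util.concurrent.atomic.AtomicReference` standing in for the Unsafe field-offset machinery, and `LazyCell`, `Evaluating`, and `Waiting` as illustrative names. The slot holds `null` (uninitialized), `Evaluating`, a `Waiting` object, or the computed value, and an exception in the right-hand side resets the slot to `null`:

```scala
import java.util.concurrent.atomic.AtomicReference

// Sentinel meaning "some thread is currently running the initializer".
object Evaluating

// Blocked readers park on this; the initializer wakes them via release.
final class Waiting {
  private var done = false
  private var result: AnyRef = _
  def release(r: AnyRef): Unit = synchronized { result = r; done = true; notifyAll() }
  def awaitRelease(): AnyRef = synchronized { while (!done) wait(); result }
}

// Illustrative stand-in for a class with a `lazy val x: String = rhs`.
final class LazyCell(rhs: () => String) {
  private val _x = new AtomicReference[AnyRef](null)

  def x: String = {
    while (true) {
      _x.get() match {
        case s: String => return s           // fast path: value already there
        case null =>
          if (_x.compareAndSet(null, Evaluating)) {
            val result =
              try rhs()
              catch { case ex: Throwable => publish(null); throw ex } // reset so another thread can retry
            publish(result)
            return result
          }                                   // lost the race: loop and re-read
        case Evaluating =>
          _x.compareAndSet(Evaluating, new Waiting) // install a waiter object, then re-read
        case w: Waiting =>
          w.awaitRelease()                    // block until published, then re-read
      }
    }
    throw new IllegalStateException("unreachable")
  }

  private def publish(result: AnyRef): Unit =
    if (!_x.compareAndSet(Evaluating, result)) {
      // A Waiting object was installed in the meantime: store the value, wake waiters.
      val w = _x.get().asInstanceOf[Waiting]
      _x.set(result)
      w.release(result)
    }
}
```

This mirrors the generated `x$lzy` loop shown further down in the thread; the real implementation CASes directly on the field slot via Unsafe instead of a separate `AtomicReference`.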
@viktorklang Thanks for the link! Browsing some more, it seems that XCHG is usually implemented with CAS in a loop. So we can just use CAS instead.
Here's a slightly shorter version that uses a try-finally:

(Actually, the generated bytecode is almost the same in both cases, since a try-finally duplicates the finally block.)
```scala
class LazyControl

class Waiting extends LazyControl {
```
I believe this is equivalent to a CountDownLatch initialised with a count of 1: release() is now countDown() and awaitRelease() is await(). It may be worth benchmarking both.
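A minimal sketch of that suggestion (assuming the same `release`/`awaitRelease` interface as the `Waiting` class above, which is this thread's naming, not a standard API):

```scala
import java.util.concurrent.CountDownLatch

// Waiting as a thin wrapper over a one-shot CountDownLatch.
final class Waiting {
  private val latch = new CountDownLatch(1)
  @volatile private var result: AnyRef = _

  def release(r: AnyRef): Unit = {
    result = r
    latch.countDown()     // plays the role of release()
  }

  def awaitRelease(): AnyRef = {
    latch.await()         // plays the role of awaitRelease(); handles spurious wake-ups internally
    result
  }
}
```

CountDownLatch is built on AbstractQueuedSynchronizer, so waiters park instead of spinning in a wait/notify loop; whether that is faster here is exactly what the benchmark would show.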
```scala
def initialize(base: Object, offset: Long, result: Object): Unit =
  if (!unsafe.compareAndSwapObject(base, offset, Evaluating, result)) {
    val lock = unsafe.getObject(base, offset).asInstanceOf[Waiting]
    unsafe.compareAndSwapObject(base, offset, lock, result)
```
I don't think we need a CAS here:

```scala
unsafe.putObjectVolatile(base, offset, result)
```
```scala
    x$lzy
  }

  def x$lzy: String = {
```
Currently Tailrec runs before LazyVals. We need to write the loop manually:

```scala
def x$lzy: String = {
  while (<EmptyTree>) {
    val current = _x
    if (current.isInstanceOf[String])
      return current.asInstanceOf[String]
    else {
      val offset = C.x_offset
      if (current == null) {
        if (LazyRuntime.isUnitialized(this, offset)) {
          try {
            val result = init("x")
            LazyRuntime.initialize(this, offset, result)
            return result
          }
          catch {
            case ex: Throwable =>
              LazyRuntime.initialize(this, offset, null)
              throw ex
          }
        }
      }
      else
        LazyRuntime.awaitInitialized(this, offset, current)
    }
  }
}
```

Note the `while (<EmptyTree>)`. We special-case it in the compiler: it means an infinite loop and types to Nothing: https://github.com/lampepfl/dotty/blob/963719ed2679f3d4c8188ec068e53941b181ef76/compiler/src/dotty/tools/dotc/typer/TypeAssigner.scala#L529-L530
Good to know. That's a trap one could fall into easily.
Might want to rethrow the original exception in case the second initialize fails.
I don't think the second initialize can fail. There's no user code executed for it.
@odersky Good point. I guess in the cases where it fails, it is something like a SOE or similar VME.
There is a line in the JLS that says:

I am pasting a part of my reply to Martin that he did not paste above, and which is about (my interpretation of) the JMM description in the JLS:

> A happens-before relationship between an action A1 and an action A2 exists iff
>
> If two actions are in a happens-before relationship, then their memory effects are visible.

As far as I understand, the intent in the first solution is that, since the execution of
I think you do not need to worry about the cost of a volatile write vs a CAS. I'd suggest just using CAS everywhere for simplicity: in the uncontended case, it's just an L1-cache access. CAS will usually be intrinsified to a native instruction by most VMs and on most architectures. Other instructions such as

Agreed. In the contended case, in which a
So it might be worth it to use getAndSetObject, then? I guess we can't really say without trying it.

It would be interesting to try out.
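To make the two publication strategies under discussion concrete, here is a hedged sketch with `AtomicReference` standing in for the Unsafe calls (`getAndSet` for getAndSetObject, `compareAndSet` for compareAndSwapObject); `publishWithExchange` and `publishWithCas` are hypothetical names, not part of the PR:

```scala
import java.util.concurrent.atomic.AtomicReference

object PublishDemo {
  // Variant 1: a single atomic exchange. The previous value tells the
  // initializer whether a Waiting object was installed while computing.
  def publishWithExchange(slot: AtomicReference[AnyRef], result: AnyRef): AnyRef =
    slot.getAndSet(result)

  // Variant 2: the CAS shape used in the PR's initialize method: a fast
  // CAS from Evaluating, and a second CAS to replace an installed waiter.
  def publishWithCas(slot: AtomicReference[AnyRef], evaluating: AnyRef, result: AnyRef): Unit =
    if (!slot.compareAndSet(evaluating, result)) {
      val waiter = slot.get()
      slot.compareAndSet(waiter, result)
    }
}
```

The exchange variant is one atomic operation in all cases; the CAS variant is one atomic operation on the fast path and two when a waiter raced in. Which wins in practice is what the benchmark would have to show, especially given the observation above that xchg is often implemented as a CAS loop anyway.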
We can use

Here's a somewhat relevant sample test: https://hg.openjdk.java.net/code-tools/jcstress/file/96d2a23d4b7b/jcstress-samples/src/main/java/org/openjdk/jcstress/samples/JMMSample_04_PartialOrder.java

I tried (but failed) to do this for the safe publication of
One unrelated thought that was in the back of my head: wouldn't it be reasonable that a lazy value whose
That's actually what happens. If rhs throws an exception, the value is reinitialized back to
@smarter Sorry to necro a comment from two years ago, but I was crawling through the Dotty project board and found this:

```diff
- if (!done) wait()
+ while (!done) wait()
```

I read that article on "Spurious Wakeups", but I'm not certain I get what difference the `while` makes here. If you read this and find the time to respond, I would be thankful if you might explain the scenario that breaks the logic here 🙂

```scala
class Waiting:
  private var done = false
  private var result: AnyRef = _

  def release(result: AnyRef): Unit = synchronized:
    this.result = result
    done = true
    notifyAll()

  def awaitRelease(): AnyRef = synchronized:
    if !done then wait()
    result
```
@GavinRay97 If I am not mistaken, a thread that called
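For context on the `if` vs `while` question: the javadoc of `Object.wait` itself warns that a thread can return from `wait()` without having been notified (a spurious wake-up), so waits should always happen in a loop that re-checks the condition. With the plain `if`, a spurious return would fall through and hand back the not-yet-published (null) `result`. A hedged Scala 2 sketch of the loop-based version of the class quoted above:

```scala
// Waiting with the wait-in-a-loop idiom recommended by Object.wait's javadoc.
final class Waiting {
  private var done = false
  private var result: AnyRef = _

  def release(result: AnyRef): Unit = synchronized {
    this.result = result
    done = true
    notifyAll()
  }

  def awaitRelease(): AnyRef = synchronized {
    // If wait() returns spuriously, done is still false, so we simply
    // wait again instead of returning a null result.
    while (!done) wait()
    result
  }
}
```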
This is a demonstrator for a new algorithm to handle lazy vals. The idea is that we use the field slot itself for all synchronization; there are no separate bitmaps or locks. The type of a field is always Object. The field goes through the following state changes:
The states of a field are characterized as follows:
Note 1: This assumes that fields cannot have `null` as a normal value. Once we have nullability checking, this should be the standard case. We can still accommodate fields that can be null by representing `null` with a special value (say `NULL`) and storing `NULL` instead of `null` in the field. The necessary tweaks are added as comment lines to the code below.
A lazy val `x: A = rhs` is compiled to the following code scheme:

The code makes use of the following runtime class:
Note 2: The code assumes that the getter result type `A` is disjoint from the type of `Evaluating` and the `Waiting` class. If this is not the case (e.g. `A` is AnyRef), then the conditions in the match have to be re-ordered so that case `_x: A` becomes the final default case.
Cost analysis:
whether cache has updated
Code sizes for getter:

- this scheme, if nulls are excluded in type: 72 bytes
- current Dotty scheme: 131 bytes
- Scala 2 scheme: 39 bytes + 1 exception handler
Advantages of the scheme:
and normal code
Disadvantages: