Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The time-step elision optimization is broken #681

Open
cdsmith opened this issue Jul 1, 2018 · 12 comments
Open

The time-step elision optimization is broken #681

cdsmith opened this issue Jul 1, 2018 · 12 comments
Labels

Comments

@cdsmith
Copy link
Collaborator

cdsmith commented Jul 1, 2018

We broke time-step ellision for interactions somewhere. See https://code.world/haskell#PPUTRWUDaX3tF9KMQYGTS-w

@cdsmith cdsmith added the bug label Jul 1, 2018
@cdsmith cdsmith changed the title The time-step ellision optimization is broken The time-step elision optimization is broken Jul 5, 2018
cdsmith added a commit that referenced this issue Jul 15, 2018
@cdsmith
Copy link
Collaborator Author

cdsmith commented Jul 16, 2018

Strangely, the current situation is that the constant state compares equal, but isUniversallyConstant returns false, so we still sit in the loop trying new values for dt. I'm not really sure how that's possible.

@cdsmith
Copy link
Collaborator Author

cdsmith commented Aug 12, 2018

Might be a missing $! at

oldName <- makeStableName old
, I suppose. It's after 3am, though, so I'll set up the logging to test this theory tomorrow.

cdsmith added a commit that referenced this issue Aug 12, 2018
Was hoping this might help with #681, but alas, no change.
@cdsmith
Copy link
Collaborator Author

cdsmith commented Aug 13, 2018

I admit defeat for now. I added a debug function that printed if any function changed stable names of its result; but adding the debug function in the wrong places makes the problem go away. Perhaps the debugging is preventing a rewrite rule from firing, or something like that? Who knows. But this is some seriously spooky stuff.

cdsmith added a commit that referenced this issue Aug 14, 2018
This still doesn't fix the issue, but eliminates a few bugs regarding #681
@nixorn
Copy link
Contributor

nixorn commented Mar 22, 2019

What is time-step elision ?
Can you point to commit where all worked, so I could see difference?

@cdsmith
Copy link
Collaborator Author

cdsmith commented Mar 26, 2019

I answered in our email thread, but here's the details for the public record. Suppose I write this code:

program = activityOf(initial, change, solidCircle)
initial(rs) = 1
change(n, KeyPress("Up")) = n + 1
change(n, other) = n

In theory, change should be called every frame with a TimePassing event. However, in practice, we can tell that the TimePassing event will be ignored in this case. We can make that observation because change(n, TimePassing(undefined)) is always identical to n. When that happens, we'd rather not spin in a busy loop delivering TimePassing events that we know will be ignored. Time step ellision is the word for this: when TimePassing doesn't change anything regardless of the argument, we just stop sending TimePassing events at all.

The second complication is this: we can't in general compare two states, because they might contain functions or infinite data structures. So instead of comparing with ==, we instead compare for pointer equality. If two values compare equal for pointer equality, this means that they are literally represented by the same allocated object in memory. This is sufficient to prove that they are equal, but it's a conservative approximation. Two values might be equal, but NOT compare equal for pointer equality (if they are represented by different locations in memory, and those locations just have the same value).

There are two ways to compare for pointer euqality: reallyUnsafePtrEquality# is the older one, but it requires keeping the old value around to compare with, and that could increase memory usage. The other is StableName. We can make a "stable name" for a value, and any other value with the same stable name must be identical to it. The advantage is that we can keep around old stable names without actually retaining the full value, which can be important if the state type is large. There's a class of streaming algorithms that rely on garbage collection to be O(1) space, while holding onto a pointer to the original value would turn them into O(n) space. For example, suppose you have an infinite lazy list like those produced by randomeNumbers in codeworld-base. Dropping the first n elements is O(1) space if you don't keep the original, but O(n) if you do.

So what's going on is that we're failing to detect when time-step ellision is possible. Most likely, that means that I've made a mistake with the result that the two state values are equal, but no longer share the same memory location. A common way that happens is that I am applying some modification to a field of a record, and then reconstructing the record with the new value. For example:

data WrappedState a = WrappedState {
    wrappedValue :: a,
    someOtherData :: Int
    }
    deriving (Functor)

Now if I do something like fmap id wrappedState, the identity function will preserve memory sharing for the wrappedValue field, but fmap will construct a NEW WrappedState around it. So the wrapper has lost sharing even though the value inside of it hasn't. That breaks time-step ellision.

This is almost surely the kind of thing that's going on. But it's a tricky one to track down. I have tried and failed.

@cdsmith
Copy link
Collaborator Author

cdsmith commented Mar 26, 2019

@nixorn A good first step to fixing this is probably to reproduce it in a test. To do that, you'd want to refactor codeworld-api:CodeWorld.Driver to that it's possible to run the game loop with mock browser code, that doesn't actually draw any pictures. You could run it that way in a unit test, and make assertions about sharing or number of frames drawn. Given how fragile this is, that test would be a good thing to have around!

@nixorn
Copy link
Contributor

nixorn commented Mar 27, 2019

Ok, doing.

@nixorn
Copy link
Contributor

nixorn commented Mar 27, 2019

  1. How can I write program which does not draw picture if ghc-land run requires (s -> Picture) as one of arguments? I think drawing Blank is not what I need?
  2. I launched this https://code.world/haskell#PPUTRWUDaX3tF9KMQYGTS-w in local ghci and got canvas binded to http://127.0.0.1:3000/ . After I opened this in browser it started to print "foo"s into console. Is this means that we need browser to evaluate this? If it is, how can I write ghc-land tests for it?
    (Here I realized that your offer to refactor Driver.hs is more likely about moving game loop to GHC, so I guess 1 and 2 is irrelevant)
  3. is ghc consistent with ghcjs? I mean can any thing which can happen in generated js - be reproduced in ghc?

@nixorn
Copy link
Contributor

nixorn commented Mar 27, 2019

Looks I need to separate applying of fullStepHandler and drawHandler to separate functions here https://github.com/google/codeworld/blob/master/codeworld-api/src/CodeWorld/Driver.hs#L1787

@cdsmith
Copy link
Collaborator Author

cdsmith commented Mar 27, 2019

I may have been unclear. Here are a few clarifications.

First of all, I care about this behaving correctly in GHCJS, not GHC. That's because (a) 99% of users of codeworld-api are using it through the web site, and (b) the main use case for time-step ellision was to make it more efficient to embed multiple CodeWorld programs into a web page. So the test should be running locally with GHCJS and node.js. Don't test the GHC version. In fact, if I did care about time-step ellision for the GHC code path (and I do care at least a little bit; it's just not a priority), I think the fix would be to work on sharing even more code between the two paths.

To make the GHCJS code run in node.js instead of a browser, you'll need to replace a few parts of the behavior. You want to do this at the lowest level possible. A good start would be to write a new instance for MonadCanvas (https://github.com/google/codeworld/blob/master/codeworld-api/src/CodeWorld/CanvasM.hs). Your new instance would essentially do nothing. Something like this:

newtype StubCanvasM a = StubCanvasM { runStubCanvas :: IO a }
    deriving (Functor, Applicative, Monad)

instance MonadCanvas where
    type Image StubCanvasM = ()
    save = StubCanvasM $ return ()
    ...
    newImage _ _ x = fmap ((),)
    ...
    measureText _ = StubCanvasM $ return 50

and so on.

That won't remove all the browser dependencies, but it will get you started. The remaining browser dependencies can be handled either by moving them to MonadCanvas if it makes sense, or by adding a new #ifdef in CodeWorld.Driver based on whether you're in a unit test or not. (You will have to do this anyway to decide which MonadCanvas implementation to delegate to.)

cdsmith added a commit that referenced this issue Apr 2, 2019
allows much more code to be shared between the various code paths.

My goal here is to help with creating a unit test for #681, which
are currently blocked because the driver code is not easily
testable.  With a bit more progress here, we can stub out the
browser-specific canvas bits (or even mock them, using monad-mock)
and test the surrounding driver logic.
@cdsmith cdsmith reopened this Apr 3, 2019
cdsmith added a commit that referenced this issue May 12, 2019
This doesn't fix the problem, but it does fix the specific causes
in the Wrapped type.

I don't even really understand why this works.  There's some real
oddness going on here with strictness.  Making all of the
non-wrapped fields strict fixes the bug, but it should not be
required.  I wish things made more sense.
@misterbeebee
Copy link
Contributor

I've noticed high CPU usage (40%+ CPU on a 4-core 2.8 GHz Intel Core i7) in general , even in the trivial program

import CodeWorld
main = activityOf () (const id) (const blank)

and even when run under debugActivityOf and the program is paused in the CodeWorld onscreen heads-up-display debugger (but not when CodeWorld Inspector is open).

Is this a side-effect of "The time-step elision optimization is broken #681", or a separate problem?
This CPU usage happens even when animation is paused in the CodeWorld debugger, but not when the CodeWorld Inspector is open. If the CodeWorld debugger runs part of the overall animation itself, then it stands to reason that even a paused program animation would still be an active animation in CodeWorld that relies on time-step elision for CPU efficiency.

(Poking at the Chrome JS console, it appears that what's happening is a busy loop in requestAnimationFrame calling into (I presume) a GCHJS-compiled h$mainLoop function. I'm not sure, because I'm manually pausing/running the JS in Chrome DevTools, which might have bad interactions with real-clock-based timers in CodeWorld runtime)

If this is independent of time-step elision, I can file a separate issue if you want to track this.
(Perhaps one of the "main loops" is running as fast as possible, even though TimePassing events only happen at at 60fps. Also, related, if slowing down the overall system to 30fps or 15fps would cut CPU usage by a substantial fraction, providing that option would be friendly to client machines. But the main issue of this comment is that even when 0 animation work is happening, the CW runtime is doing a far too much work.)

@cdsmith
Copy link
Collaborator Author

cdsmith commented Aug 5, 2019

Yes, this is the same thing. The system is redrawing the screen at 60fps even though nothing is changing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants