New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

getEvent can't deserialize event serialized by putEvent #41

Open
osa1 opened this Issue Aug 28, 2018 · 7 comments

Comments

Projects
None yet
2 participants
@osa1
Copy link

osa1 commented Aug 28, 2018

ghc-events currently fails to deserialize some events that it serializes. To reproduce, first generate a minimal eventlog file from the program main = return () (tried GHC 8.4), then run

import Data.Binary.Get
import Data.Binary.Put
import qualified Data.ByteString as BS
import qualified Data.ByteString.Lazy as LBS
import GHC.RTS.Events
import GHC.RTS.Events.Binary
import GHC.RTS.Events.Incremental

main = do
    bs <- LBS.readFile "empty.eventlog"
    let header = runGet getHeader bs
    let parsers = mkEventParsers header

    let event = Event
          { evTime = 415731793
          , evSpec = GCStatsGHC
              { heapCapset = 0
              , gen = 1
              , copied = 305104
              , slop = 250088
              , frag = 3088384
              , parNThreads = 12
              , parMaxCopied = 4246
              , parTotCopied = 305104
              }
          , evCap = Nothing
          }

    print (runGet (getEvent parsers) (runPut (putEvent event)))

Output:

bug: Data.Binary.Get.runGet at position 60: not enough bytes
CallStack (from HasCallStack):
  error, called at libraries/binary/src/Data/Binary/Get.hs:351:5 in binary-0.8.5.1:Data.Binary.Get
@maoe

This comment has been minimized.

Copy link
Member

maoe commented Aug 28, 2018

What does mkEventParsers do?

@osa1

This comment has been minimized.

Copy link

osa1 commented Aug 29, 2018

Given a header it returns event parsers (mkEventParsers :: Header -> EventParsers). See the definition here: https://github.com/osa1/ghc-events/blob/5090f824fa709415336248342674fdd4862695da/src/GHC/RTS/Events/Incremental.hs#L161 basically just the event parser builder from mkEventDecoder factored out for reuse.

@osa1

This comment has been minimized.

Copy link

osa1 commented Aug 29, 2018

I added a bunch of debug prints and it seems like the parser for EVENT_GC_STATS_GHC event can actually parse this and the parsed evSpec is identical to the original one, but somehow getEvent needs one more word before returning. If I add one more word to the bytestring

let bs = runPut (putEvent event)
print (runGet (getEvent parsers) (bs <> "00000000"))

it works...

@osa1

This comment has been minimized.

Copy link

osa1 commented Aug 29, 2018

OK so this is because of padParser which for some reason makes this parser expect one more word after the actual contents. Investigating more ...

@osa1

This comment has been minimized.

Copy link

osa1 commented Aug 29, 2018

So according to the event type parser event type of EVENT_GC_STATS_GHC is

EventType {num = 53, desc = "GC statistics", size = Just 58}

this says the size is 58 but actually it's declared as 50 in GHC.RTS.Events.Binary and putEvent also puts 50 bytes for the event spec (60 when you add the timestamp and event type).

So assuming event type parser is not buggy

  • ghc-events thinks size of EVENT_GC_STATS_GHC is 50
  • GHC thinks its size is 58

Not sure why there's a mismatch. Perhaps there's one more field to EVENT_GC_STATS that we currently don't parse?

@osa1

This comment has been minimized.

Copy link

osa1 commented Aug 29, 2018

Right, so GHC generates one more word for EVENT_GC_STATS_GHC which is for "par_balanced_copied": https://github.com/ghc/ghc/blob/65eec9cfd4410c0e30b0ed06116c15f8ce3de49d/rts/eventlog/EventLog.c#L413-L424

Not sure how to deal with this ...

@osa1

This comment has been minimized.

Copy link

osa1 commented Aug 29, 2018

Quoting from the diff that added this new field: https://phabricator.haskell.org/D3658

Add the new value at the end, and we'll gradually deprecate par_max_copied. Old clients that expect the old format will understand everything except the new field, and will mostly work, except that they might miscalculate work balance.

So they relied on the fact that ghc-events (or other GHC event parsers) can deal with events larger than the parser for the event expects.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment