Enhance handling of large eventlogs #29

Open
ghc-mirror opened this Issue May 16, 2014 · 5 comments

Comments

3 participants

ghc-mirror commented May 16, 2014

Original reporter: jan.stolarek@

When I enable detailed spark logging via the -lf flag, I end up with huge eventlog files (130 MB). Attempting to load these into ThreadScope practically kills my OS: memory runs out, swapping begins, and I am forced to kill TS (which takes some time before the OS actually responds and kills the process). This makes the -lf flag useless for my program, and I think this might not be an uncommon situation. It would be good if TS supported some sort of lazy loading of big eventlogs, so that users could at least view parts of the log.

ghc-mirror self-assigned this May 16, 2014

ghost commented Jan 5, 2016

Even with the -l flag I get a 500 MB eventlog, and ThreadScope eats 16 GB of RAM. Please provide either some sort of granularity control before loading the file, or live streaming.

ghost unassigned ghc-mirror Jan 5, 2016

osa1 commented Aug 25, 2018

This is still a problem. Loading a 1 GB eventlog file is impossible even with 32 GB of RAM. I think we need two things:

  • Externally sort GHC-generated .eventlog files. Currently ThreadScope sorts events with ghc-events's sortEvents, which requires all events to be in memory and sorts them with Data.List. See haskell/ghc-events#32 for the tracking issue for this.

  • Implement an abstraction over Array Int Event that doesn't require loading the whole file into memory. As far as I can see, this array is used in two places:

    • hecEventArray, which uses it to implement
      - eventIndexToTimestamp :: HECs -> Int -> Timestamp
      - timestampToEventIndex :: HECs -> Timestamp -> Int

    • EventsView which uses a range of it to show the "Raw events" tab

    So it seems to me that we need to support three operations:

    1. Get nth event
    2. Get events in the given range (can be implemented using operation 1)
    3. Get index of the event at given timestamp (this currently does binary search)
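The three operations above could be expressed against an abstract, random-access event source, so that the in-memory array becomes just one possible backend. The following is a minimal Haskell sketch under stated assumptions, not ThreadScope code: `EventStore`, `storeIndex`, `eventsInRange`, `timestampToIndex` and the simplified `Event`/`Timestamp` types are hypothetical stand-ins (the real types come from ghc-events). Operation 3 is a binary search, so it only works if the store is already sorted by timestamp; here it returns the first index whose timestamp is >= t, and ThreadScope's exact convention may differ.

```haskell
import Data.Array (Array, listArray, (!))
import Data.Word (Word64)

-- Hypothetical stand-ins for the ghc-events types: a timestamp plus a
-- label is enough to show the idea.
type Timestamp = Word64

data Event = Event { evTime :: Timestamp, evLabel :: String }
  deriving (Show, Eq)

-- Hypothetical random-access event source. An in-memory array, a
-- file-backed cache or an SQLite table could all sit behind it.
data EventStore = EventStore
  { storeLength :: Int              -- number of events
  , storeIndex  :: Int -> IO Event  -- operation 1: get the nth event (0-based)
  }

-- Operation 2: events in the half-open index range [from, to), built only
-- on top of operation 1; the range is clamped to the valid indices.
eventsInRange :: EventStore -> Int -> Int -> IO [Event]
eventsInRange store from to =
  mapM (storeIndex store) [max 0 from .. min (storeLength store) to - 1]

-- Operation 3: index of the first event whose timestamp is >= t, found by
-- binary search. Assumes the store is sorted by timestamp; returns
-- storeLength when every event is earlier than t.
timestampToIndex :: EventStore -> Timestamp -> IO Int
timestampToIndex store t = go 0 (storeLength store)
  where
    go lo hi
      | lo >= hi  = return lo
      | otherwise = do
          let mid = (lo + hi) `div` 2
          ev <- storeIndex store mid
          if evTime ev < t then go (mid + 1) hi else go lo mid

-- Today's behaviour: every event held in memory in an Array Int Event.
fromList :: [Event] -> EventStore
fromList evs = EventStore n (\i -> return (arr ! i))
  where
    n   = length evs
    arr = listArray (0, n - 1) evs :: Array Int Event
```

Because `storeIndex` lives in `IO`, the same `eventsInRange` and `timestampToIndex` code would work unchanged if `fromList` were replaced by a backend that reads events from disk on demand.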

One idea that comes to mind is to use something like SQLite, which makes these operations almost trivial.

One thing that may be a problem is scrolling the "Raw events" tab, because it would have to query a filesystem-backed event database (SQLite or not), so we may have to implement lazy rendering of "Raw events". As far as I can see it doesn't support this currently: drawEvents blocks the thread until all events in the range are drawn.
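One way to make the "Raw events" tab lazy is to fetch and draw only the rows that are actually visible. The sketch below assumes a fixed row height and uses hypothetical names (`visibleWindow`, `renderVisible`); it is not a ThreadScope or GTK API, and a real implementation would call something like it from the view's scroll handler and draw the returned strings.

```haskell
-- Hypothetical helper: which rows of the "Raw events" list are on screen?
-- totalRows is a row count, the other arguments are in pixels (rowHeight > 0).
-- The result is a half-open row range [first, end) clamped to [0, totalRows].
visibleWindow :: Int -> Int -> Int -> Int -> (Int, Int)
visibleWindow totalRows rowHeight viewportHeight scrollOffset = (first, end)
  where
    clamp i = max 0 (min totalRows i)
    first   = clamp (scrollOffset `div` rowHeight)
    -- Round up so that a partially visible bottom row is still drawn.
    end     = clamp ((scrollOffset + viewportHeight + rowHeight - 1) `div` rowHeight)

-- Fetch and format only the visible rows, so one scroll step costs at most
-- a screenful of lookups instead of drawing the whole selected range.
renderVisible :: (Int -> IO String) -> Int -> Int -> Int -> Int -> IO [String]
renderVisible fetchRow totalRows rowHeight viewportHeight scrollOffset =
  mapM fetchRow [first .. end - 1]
  where
    (first, end) = visibleWindow totalRows rowHeight viewportHeight scrollOffset
```

With 20-pixel rows and a 100-pixel viewport scrolled to offset 30, `visibleWindow 1000 20 100 30` gives `(1, 7)`: rows 1 to 6, including the two partially visible ones. The cost per scroll step therefore depends on the viewport size, not on the size of the selected range.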

Any other ideas?

osa1 commented Aug 25, 2018

I started working on a fix. I currently have an external sort library and another library for filesystem-backed, cached arrays. I'll probably report back in a few days.
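The external sort library itself isn't shown in this thread. As a rough illustration of the general technique (sort fixed-size chunks independently, then merge the sorted runs), here is a minimal in-memory Haskell sketch. `sortedRuns`, `mergeRuns` and `externalSortOn` are hypothetical names, and the runs are kept as lists instead of temporary files so the example stays self-contained; for eventlogs the sort key would be the event timestamp.

```haskell
import Data.List (sortOn)

-- Phase 1: split the input into runs of at most chunkSize elements and sort
-- each run. A real external sort would write every sorted run to a
-- temporary file, so only one chunk is resident in memory at a time.
sortedRuns :: Ord k => Int -> (a -> k) -> [a] -> [[a]]
sortedRuns chunkSize key = go
  where
    n = max 1 chunkSize  -- guard against a non-positive chunk size
    go [] = []
    go xs = let (chunk, rest) = splitAt n xs
            in sortOn key chunk : go rest

-- Phase 2: merge the sorted runs. Merging pairwise in rounds keeps the code
-- short but passes over the data about log2(k) times for k runs; a
-- heap-based k-way merge would need only one pass.
mergeRuns :: Ord k => (a -> k) -> [[a]] -> [a]
mergeRuns _ []     = []
mergeRuns _ [run]  = run
mergeRuns key runs = mergeRuns key (pairUp runs)
  where
    pairUp (a : b : rest) = merge a b : pairUp rest
    pairUp rest           = rest
    -- Ties are taken from the earlier run, so the merge is stable.
    merge [] ys = ys
    merge xs [] = xs
    merge xa@(x : xs) ya@(y : ys)
      | key y < key x = y : merge xa ys
      | otherwise     = x : merge xs ya

externalSortOn :: Ord k => Int -> (a -> k) -> [a] -> [a]
externalSortOn chunkSize key = mergeRuns key . sortedRuns chunkSize key
```

For example, `externalSortOn 3 id [5, 3, 9, 1, 4, 8, 2, 7, 6, 0]` returns `[0 .. 9]`. Because `sortOn` is stable and ties are taken from the earlier run, events with equal timestamps keep their original relative order.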

osa1 commented Aug 29, 2018

Currently blocked on haskell/ghc-events#42.

maoe commented Sep 1, 2018

We may need to fix haskell/ghc-events#14 as well, since it causes ghc-events to crash when reading back serialized events from eventlogs that contain deprecated events.
