Refactor task scheduler #95

Merged (27 commits) on May 10, 2023

@robomics commented on May 4, 2023

Several improvements to the task scheduler:

  • More robust handling of large object allocation/deallocation (e.g. contact matrices are allocated and deallocated once in a thread-safe manner using std::call_once)
  • Make scheduler more general, so that tasks can refer to genomic intervals instead of entire chromosomes
  • Refactor task scheduling logic into a ContextManager class
  • Refactor logging of model internal state. Logs are now compressed with ZSTD instead of gzip
  • Rework PRNG seeding one more time. Each genomic interval/chromosome is hashed together with the seed provided through the CLI. PRNGs for tasks referring to the same interval are initialized by the main thread by repeatedly calling jump().
  • Rework contact sampling logic so that it no longer hangs when simulating loop extrusion on small regions using a small number of LEFs (e.g. 1 LEF on a 100 kbp interval)
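
A minimal C++17 sketch of the std::call_once pattern from the first bullet, assuming a hypothetical `IntervalState` object shared by every task that simulates the same genomic interval (the class, its members, and the dense matrix layout are illustrative, not MoDLE's). The first caller allocates the contact matrix, concurrent callers wait until allocation has finished, and deallocation is guarded by a second `std::once_flag`.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical state shared by all tasks that simulate the same interval.
// A dense num_bins x num_bins matrix is used only to keep the sketch short.
class IntervalState {
 public:
  explicit IntervalState(std::size_t num_bins) : _num_bins(num_bins) {}

  // Returns the contact matrix, allocating it on first use. The lambda runs
  // exactly once even if many worker threads call contacts() concurrently;
  // the other callers block until it has completed.
  // Precondition: must not be called after release().
  std::vector<std::uint32_t>& contacts() {
    std::call_once(_alloc_flag, [this] {
      _matrix = std::make_unique<std::vector<std::uint32_t>>(_num_bins * _num_bins, 0U);
      _num_allocations.fetch_add(1);
    });
    assert(_matrix != nullptr);
    return *_matrix;
  }

  // Frees the matrix exactly once (e.g. after it has been written to disk).
  void release() {
    std::call_once(_dealloc_flag, [this] {
      _matrix.reset();
      _num_deallocations.fetch_add(1);
    });
  }

  int num_allocations() const { return _num_allocations.load(); }
  int num_deallocations() const { return _num_deallocations.load(); }

 private:
  std::size_t _num_bins;
  std::once_flag _alloc_flag;
  std::once_flag _dealloc_flag;
  std::unique_ptr<std::vector<std::uint32_t>> _matrix;
  std::atomic<int> _num_allocations{0};
  std::atomic<int> _num_deallocations{0};
};
```

Because the completing `std::call_once` invocation synchronizes with every later call on the same flag, threads returning from `contacts()` see the fully constructed matrix without an extra mutex. Concurrent writes to the matrix itself would still need their own synchronization.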

This PR changes MoDLE's output, so the test datasets have also been updated.

Several improvements to the task scheduler:
- More robust handling of large object allocation/deallocation (e.g.
  contact matrices are allocated and deallocated once in a thread-safe
  manner using std::call_once)
- Make scheduler more general, so that tasks can refer to genomic
  intervals instead of entire chromosomes

Rework contact sampling to deal with very small simulations (e.g. those
involving a single LEF).

With a single LEF or only a few LEFs, it is possible (and even likely)
that none of the LEFs are within bounds when sampling interactions. In
this case, the contact sampling loop would run indefinitely.
The changes in this commit address this issue.

IMPORTANT: the changes in this commit change MoDLE's output!
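
If a sampling loop draws LEFs and simply retries whenever the drawn LEF is out of bounds, it can never finish when no LEF is in bounds. A minimal sketch of one way to guarantee termination, assuming a hypothetical `Lef` struct and `sample_contacts()` function (neither is MoDLE's code, and the actual fix may differ): filter the in-bounds LEFs up front and return early when there are none.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <utility>
#include <vector>

// Hypothetical loop-extruding factor: positions of its two extrusion units.
struct Lef {
  std::uint64_t left;
  std::uint64_t right;
};

using Contact = std::pair<std::uint64_t, std::uint64_t>;

// Hypothetical sampler: draws `target` contacts from LEFs whose units both
// fall within [start, end).
std::vector<Contact> sample_contacts(const std::vector<Lef>& lefs, std::uint64_t start,
                                     std::uint64_t end, std::size_t target,
                                     std::mt19937_64& rng) {
  assert(start <= end);
  std::vector<Contact> contacts;

  // Collect in-bounds LEFs first. If there are none, return right away
  // instead of drawing and rejecting LEFs forever.
  std::vector<const Lef*> candidates;
  for (const auto& lef : lefs) {
    if (lef.left >= start && lef.right < end) {
      candidates.push_back(&lef);
    }
  }
  if (candidates.empty()) {
    return contacts;
  }

  // Every draw now succeeds, so the loop runs exactly `target` times.
  std::uniform_int_distribution<std::size_t> pick(0, candidates.size() - 1);
  contacts.reserve(target);
  while (contacts.size() < target) {
    const Lef* lef = candidates[pick(rng)];
    contacts.emplace_back(lef->left, lef->right);
  }
  return contacts;
}
```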

Instead of seeding PRNGs by hashing tasks, seed the PRNG once with the
default/user-provided seed, then call PRNG::jump(), which is equivalent
to calling next() 2**64 times and can be used to generate 2**64
non-overlapping subsequences of pseudo-random numbers.
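
The seed-once-then-jump idea can be illustrated with a generator whose jump-ahead is easy to verify. This sketch swaps in SplitMix64 (a stand-in, not the PRNG MoDLE uses) and a hypothetical `make_task_prngs()` helper: one generator is seeded once, and each task receives a copy advanced by a fixed stride, so tasks draw from disjoint windows of a single sequence. The 2^32 stride is an arbitrary choice for the example, not the jump distance quoted above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// SplitMix64 stand-in: the state advances by a constant on every call, so
// skipping ahead n steps is a single multiply-add (mod 2^64).
class SplitMix64 {
 public:
  explicit SplitMix64(std::uint64_t seed) : _state(seed) {}

  std::uint64_t next() {
    std::uint64_t z = (_state += kGamma);
    z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9ULL;
    z = (z ^ (z >> 27)) * 0x94d049bb133111ebULL;
    return z ^ (z >> 31);
  }

  // Equivalent to calling next() `n` times and discarding the results.
  void discard(std::uint64_t n) { _state += n * kGamma; }

 private:
  static constexpr std::uint64_t kGamma = 0x9e3779b97f4a7c15ULL;
  std::uint64_t _state;
};

// Hypothetical stride: each task may draw up to 2^32 numbers.
constexpr std::uint64_t kStride = std::uint64_t{1} << 32;

// Hypothetical helper: seeds one generator, then gives task i a copy that
// has been advanced by i * kStride steps.
std::vector<SplitMix64> make_task_prngs(std::uint64_t seed, std::size_t num_tasks) {
  assert(num_tasks <= kStride);  // at most 2^64 / kStride disjoint windows
  std::vector<SplitMix64> prngs;
  prngs.reserve(num_tasks);
  SplitMix64 prng(seed);
  for (std::size_t i = 0; i < num_tasks; ++i) {
    prngs.push_back(prng);  // copy the current position in the sequence
    prng.discard(kStride);  // jump to the start of the next window
  }
  return prngs;
}
```

With a generator that provides a jump() function, the loop would call jump() where this sketch calls discard(kStride); the overall structure stays the same.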

Move most of the scheduling logic out of Simulation and into
ContextManager.
ContextManager now owns the worker and IO threads, as well as the task
queues and exception pointers.
There are two task queues:
1. A queue for pending tasks
2. A queue for finished tasks

The main thread pushes tasks onto queue #1.
Worker threads pop tasks from queue #1, then set up and run a
simulation instance.
Upon completion of a simulation instance, worker threads push the
corresponding task onto queue #2.
IO threads (for now, only the thread responsible for writing interactions
to a .cool file) consume tasks from queue #2 and perform IO operations
when appropriate.

The main thread must signal that no more tasks will be submitted to
queue #1 by calling the shutdown() method.
This method blocks and waits until all tasks have been processed and
worker/IO threads have returned.
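
A minimal C++17 sketch of the two-queue pipeline and shutdown() described above. `TaskQueue`, `Task`, and this `ContextManager` are hypothetical stand-ins (MoDLE's real classes are not shown in this PR), and squaring the task id stands in for running a simulation.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <optional>
#include <thread>
#include <utility>
#include <vector>

// Minimal closable blocking queue (hypothetical, not MoDLE's implementation).
template <typename T>
class TaskQueue {
 public:
  void push(T item) {
    {
      std::lock_guard<std::mutex> lck(_mtx);
      _items.push_back(std::move(item));
    }
    _cv.notify_one();
  }
  // Blocks until an item is available; returns std::nullopt once the queue
  // has been closed and drained.
  std::optional<T> pop() {
    std::unique_lock<std::mutex> lck(_mtx);
    _cv.wait(lck, [this] { return !_items.empty() || _closed; });
    if (_items.empty()) {
      return std::nullopt;
    }
    T item = std::move(_items.front());
    _items.pop_front();
    return item;
  }
  void close() {
    {
      std::lock_guard<std::mutex> lck(_mtx);
      _closed = true;
    }
    _cv.notify_all();
  }

 private:
  std::mutex _mtx;
  std::condition_variable _cv;
  std::deque<T> _items;
  bool _closed{false};
};

// Hypothetical task: an id plus a result filled in by a worker.
struct Task {
  int id;
  long result{0};
};

// main thread -> pending queue -> workers -> finished queue -> IO thread
class ContextManager {
 public:
  ContextManager(std::size_t num_workers, std::function<void(const Task&)> io_fn) {
    for (std::size_t i = 0; i < num_workers; ++i) {
      _workers.emplace_back([this] {
        while (auto task = _pending.pop()) {
          task->result = static_cast<long>(task->id) * task->id;  // stand-in for a simulation
          _finished.push(*task);
        }
      });
    }
    _io_thread = std::thread([this, io_fn = std::move(io_fn)] {
      while (auto task = _finished.pop()) {
        io_fn(*task);  // e.g. write interactions to a .cool file
      }
    });
  }

  ~ContextManager() { shutdown(); }

  void submit(Task task) {
    assert(_io_thread.joinable() && "submit() called after shutdown()");
    _pending.push(std::move(task));
  }

  // Blocks until both queues are drained and all threads returned; idempotent.
  void shutdown() {
    _pending.close();
    for (auto& t : _workers) t.join();
    _workers.clear();
    _finished.close();  // safe: no worker can push anymore
    if (_io_thread.joinable()) _io_thread.join();
  }

 private:
  TaskQueue<Task> _pending;
  TaskQueue<Task> _finished;
  std::vector<std::thread> _workers;
  std::thread _io_thread;
};
```

The ordering inside shutdown() is the important design choice in this sketch: the pending queue is closed only after the last submit(), and the finished queue is closed only after every worker has joined, so no task can be dropped.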

The ContextManager is also responsible for keeping track of exceptions
thrown inside worker/IO threads.
ContextManager::operator bool() or ContextManager::exception_thrown() can
be used to check whether any exception has been raised.
All threads should regularly call one of these methods and return
immediately if an exception has been raised.
The main thread should check whether any exceptions have been raised by
calling ContextManager::check_exceptions(), which internally calls
shutdown() and rethrows the exceptions as appropriate.
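
A minimal sketch of how exceptions raised in worker/IO threads can be captured with `std::exception_ptr` and rethrown on the main thread. `ExceptionTracker`, `run_guarded()`, and the "rethrow the first exception" policy are assumptions made for illustration; only `exception_thrown()` and `check_exceptions()` echo names from the commit message, and joining the threads here stands in for shutdown().

```cpp
#include <atomic>
#include <cassert>
#include <exception>
#include <mutex>
#include <stdexcept>
#include <thread>
#include <vector>

// Hypothetical helper, not MoDLE's ContextManager.
class ExceptionTracker {
 public:
  // Runs fn and stores any exception it throws. Without this, an exception
  // escaping a std::thread's function would call std::terminate().
  template <typename Fn>
  void run_guarded(Fn&& fn) {
    try {
      fn();
    } catch (...) {
      std::lock_guard<std::mutex> lck(_mtx);
      _exceptions.push_back(std::current_exception());
      _exception_thrown = true;
    }
  }

  // Threads should poll this regularly and return early when it is true.
  bool exception_thrown() const noexcept { return _exception_thrown.load(); }

  // Joins all threads (the role shutdown() plays in the PR), then rethrows
  // the first captured exception, if any. Must be called from the main thread.
  void check_exceptions(std::vector<std::thread>& threads) {
    for (auto& t : threads) {
      if (t.joinable()) {
        assert(t.get_id() != std::this_thread::get_id());
        t.join();
      }
    }
    std::exception_ptr first;
    {
      std::lock_guard<std::mutex> lck(_mtx);
      if (!_exceptions.empty()) {
        first = _exceptions.front();
      }
    }
    if (first) {
      std::rethrow_exception(first);
    }
  }

 private:
  std::mutex _mtx;
  std::vector<std::exception_ptr> _exceptions;
  std::atomic<bool> _exception_thrown{false};
};
```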

@robomics added 14 commits on May 6, 2023 16:03

Simulation was derived from Config to save some typing while iterating
rapidly through different implementations in the early days.
Those days are long gone, so this hack is no longer required.

Going forward, params from Config should be accessed by calling
Simulation::c().
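
A minimal sketch of the composition-based design, assuming hypothetical `Config` fields (`seed`, `num_cells`); only the `Simulation::c()` accessor name comes from the commit message.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>

// Hypothetical subset of the CLI parameters; field names are illustrative.
struct Config {
  std::uint64_t seed{0};
  std::size_t num_cells{1};
};

// Before (sketch): `class Simulation : Config {...}` let simulation code read
// parameters as if they were its own members.
// After (sketch): Simulation owns a Config and exposes it read-only via c().
class Simulation {
 public:
  explicit Simulation(Config config) : _config(std::move(config)) {
    assert(_config.num_cells > 0);
  }

  const Config& c() const noexcept { return _config; }

 private:
  Config _config;
};
```

In this sketch, call sites change from reading `seed` directly to `c().seed`, which makes it explicit where each parameter comes from.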

- Rename write_contacts_to_disk to simulate_io
- Update simulate_io to write 1D LEF occupancy profiles as soon as all
  simulation instances for a given genomic interval are finished
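
A minimal sketch of the "write as soon as all instances for an interval are finished" logic. `FinishedTask`, `OccupancyWriter`, the per-bin occupancy counters, and the interval naming are hypothetical; the actual simulate_io function may track completion differently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical finished-task record: the interval it belongs to and the 1D
// LEF occupancy profile (one counter per bin) from one simulation instance.
struct FinishedTask {
  std::string interval;  // e.g. "chr1:0-100" (illustrative)
  std::vector<std::uint64_t> occupancy;
};

using WriteFn = std::function<void(const std::string&, const std::vector<std::uint64_t>&)>;

// Hypothetical writer: sums profiles per interval and flushes an interval as
// soon as its last simulation instance has finished.
class OccupancyWriter {
 public:
  OccupancyWriter(std::size_t instances_per_interval, WriteFn write)
      : _instances_per_interval(instances_per_interval), _write(std::move(write)) {
    assert(_instances_per_interval > 0);
  }

  void on_task_finished(const FinishedTask& task) {
    auto& entry = _pending[task.interval];
    if (entry.profile.empty()) {
      entry.profile.resize(task.occupancy.size(), 0);
    }
    assert(entry.profile.size() == task.occupancy.size());
    for (std::size_t i = 0; i < task.occupancy.size(); ++i) {
      entry.profile[i] += task.occupancy[i];
    }
    if (++entry.num_finished == _instances_per_interval) {
      _write(task.interval, entry.profile);
      _pending.erase(task.interval);  // free this interval's buffer right away
    }
  }

 private:
  struct Entry {
    std::vector<std::uint64_t> profile;
    std::size_t num_finished{0};
  };
  std::size_t _instances_per_interval;
  WriteFn _write;
  std::map<std::string, Entry> _pending;
};
```

Flushing per interval means memory for an interval's profile is held only while some of its instances are still running, rather than until the whole run completes.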

@robomics force-pushed the refactor-scheduler branch 2 times, most recently from f874948 to d6250d7 on May 8, 2023 20:02
@robomics merged commit 94ab614 into main on May 10, 2023
@robomics deleted the refactor-scheduler branch on May 10, 2023 10:29