biosal is a distributed BIOlogical Sequence Actor Library.
biosal applications are written as actors that send each other messages, change their state, spawn other actors, and eventually die of old age. These actors run on top of the biosal runtime system, called "The Thorium Engine", through its API. Thorium is a distributed actor machine and a general-purpose implementation of the actor model. It uses script-wise symmetric actor placement and targets high-performance computing.
Name | Description |
---|---|
argonnite | k-mer counter |
gc | Guanine-cytosine counter |
spate | Exact, convenient, and scalable metagenome assembly and genome isolation for everyone (with emphasis on low-abundance species too) |
Key | Value |
---|---|
Name | biosal |
Computation model | actor model (Hewitt & Baker 1977) |
Programming language | C99 (ISO/IEC 9899:1999) |
Message passing | MPI 2.2 |
Threads | Pthreads |
License | The BSD 2-Clause License |
git clone https://github.com/GeneAssembly/biosal.git
cd biosal
make tests # run tests
make examples # run examples
- see compilation options
- see examples and tests
Branch name | Person (alphabetical order) | Clone URL | Branch Build Status |
---|---|---|---|
master | Sébastien Boisvert | HTTPS | |
energy | Sébastien Boisvert | HTTPS | |
pami | Huy Bui | HTTPS | |
granularity | George K. Thiruvathukal | HTTPS | |
entropy | Fangfang Xia | HTTPS | |
- There is also a read-only mirror.
- The Integration Manager Workflow is used.
- Nightly build of master branch:
- Issues: https://github.com/GeneAssembly/biosal/issues?state=open
- Website: https://github.com/GeneAssembly/biosal
- Mailing list: biosal@lists.cels.anl.gov https://lists.cels.anl.gov/mailman/listinfo/biosal
- Jenkins Continuous Integration: http://jenkins.biosal.anl.boisvert.info:8080/ (Hosted in Magellan OpenStack cloud at Argonne)
- Alternate address: http://jenkins-biosal.mcs.anl.gov:8080/ (does not seem to work everywhere)
The actor model is built mostly from two things: actors and messages.
When an actor receives a message, it can (Agha 1986, p. 12, 2.1.3):
- send a finite number of messages to other actors (thorium_actor_send);
- create a finite number of new actors (thorium_actor_spawn, ACTION_SPAWN);
- designate the behavior to be used for the next message it receives (thorium_actor_concrete_actor, ACTION_STOP).
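The three actions above can be illustrated with a toy, single-threaded actor machine. This is only a sketch and does not use the Thorium API: every name with a `toy_` prefix is hypothetical, the mailbox is one shared fixed-size queue, and "dying" is modeled by setting the behavior to `NULL`.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_ACTORS 8
#define TOY_MAX_MESSAGES 16

enum { TOY_START = 1, TOY_PING = 2, TOY_PONG = 3 };

struct toy_machine;
struct toy_actor;

struct toy_message { int source; int destination; int tag; };

/* A behavior is the function used for the next message an actor receives. */
typedef void (*toy_behavior)(struct toy_machine *machine, struct toy_actor *self,
                             const struct toy_message *message);

struct toy_actor { int name; toy_behavior behavior; /* NULL means dead */ };

struct toy_machine {
    struct toy_actor actors[TOY_MAX_ACTORS];
    int actor_count;
    struct toy_message queue[TOY_MAX_MESSAGES];
    int head;
    int tail;
};

void toy_init(struct toy_machine *machine)
{
    machine->actor_count = 0;
    machine->head = 0;
    machine->tail = 0;
}

/* Action 2: create a new actor; its name is its index in the table. */
int toy_spawn(struct toy_machine *machine, toy_behavior behavior)
{
    int name = machine->actor_count;
    assert(name < TOY_MAX_ACTORS);
    machine->actors[name].name = name;
    machine->actors[name].behavior = behavior;
    machine->actor_count++;
    return name;
}

/* Action 1: send a message to another actor. */
void toy_send(struct toy_machine *machine, int source, int destination, int tag)
{
    assert(machine->tail < TOY_MAX_MESSAGES);
    machine->queue[machine->tail].source = source;
    machine->queue[machine->tail].destination = destination;
    machine->queue[machine->tail].tag = tag;
    machine->tail++;
}

/* The child replies to a ping, then dies. */
void toy_child(struct toy_machine *machine, struct toy_actor *self,
               const struct toy_message *message)
{
    if (message->tag == TOY_PING) {
        toy_send(machine, self->name, message->source, TOY_PONG);
        self->behavior = NULL;
    }
}

/* The parent waits for the reply, then dies. */
void toy_waiting(struct toy_machine *machine, struct toy_actor *self,
                 const struct toy_message *message)
{
    (void)machine;
    if (message->tag == TOY_PONG) {
        self->behavior = NULL;
    }
}

/* Initial parent behavior: spawn, send, then designate the next behavior. */
void toy_starting(struct toy_machine *machine, struct toy_actor *self,
                  const struct toy_message *message)
{
    if (message->tag == TOY_START) {
        int child = toy_spawn(machine, toy_child);
        toy_send(machine, self->name, child, TOY_PING);
        self->behavior = toy_waiting; /* Action 3 */
    }
}

/* Deliver queued messages until none remain; returns the number delivered. */
int toy_run(struct toy_machine *machine)
{
    int delivered = 0;
    while (machine->head < machine->tail) {
        struct toy_message message = machine->queue[machine->head++];
        struct toy_actor *actor;
        if (message.destination < 0 || message.destination >= machine->actor_count) {
            continue;
        }
        actor = &machine->actors[message.destination];
        if (actor->behavior != NULL) {
            actor->behavior(machine, actor, &message);
            delivered++;
        }
    }
    return delivered;
}
```

To try it, initialize a machine with `toy_init`, spawn one actor with `toy_starting` as its behavior, send it `TOY_START`, and call `toy_run`. Unlike a real actor machine, this toy delivers messages in FIFO order, which is a simplification.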
Other names for the actor model: actors, virtual processors, activation frames, streams (Hewitt, Bishop, Steiger 1973).
Also, in the actor model, the arrival order of messages is both arbitrary and unknown (Agha 1986, p. 22, 2.4).
One of the most important requirements of actors is that of acquaintances. An actor can only send messages to its acquaintances. Acquaintance vectors were introduced in (Hewitt and Baker 1977, p. 7, section III.3). Any actor that uses an acquaintance vector can be migrated by an actor machine. When all the actors in an actor system use acquaintance vectors, an actor machine can distribute and balance actors according to arbitrary rules.
- Why say Actor Model instead of message passing?
- Actors: a model of concurrent computation in distributed systems
- A universal modular ACTOR formalism for artificial intelligence
- Actor model
Key concepts
Concept | Description | Structure |
---|---|---|
Message | Information with a source and a destination | struct thorium_message |
Actor | Something that receives messages and behaves according to a script | struct thorium_actor |
Actor mailbox | Messages for an actor are buffered there | struct core_fast_ring |
Script | Describes the behavior of an actor (Hewitt, Bishop, Steiger 1973) | struct thorium_script |
Node | A runtime system that can be connected to other nodes (see Erlang's definition) | struct thorium_node |
Worker pool | A set of available workers inside a node | struct thorium_worker_pool |
Worker | An instance that has an actor scheduling queue | struct thorium_worker |
Scheduler | Each worker has an actor scheduling queue and an outbound message queue | struct thorium_scheduler |
Transport | Each node has a transport subsystem for moving messages between nodes | struct thorium_transport |
The code has to be formulated in terms of actors. Actors are executed inside a controlled environment managed by the Thorium engine, which is a runtime system. An actor has a name and does something when it receives a message. A message, however, is first received by a node. The node gives it to a worker pool. The worker pool assigns the actor to a worker. Finally, the worker eventually calls the corresponding receive function with the actor and the message.
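The delivery path described above (node, then worker pool, then worker, then receive function) can be sketched as a chain of function calls. This is a simplified illustration, not Thorium code: all `flow_` names are hypothetical, and the rule that picks a worker (actor name modulo the number of workers) is an assumption made for the example.

```c
#include <assert.h>
#include <stddef.h>

#define FLOW_WORKERS 4

struct flow_message { int destination; int tag; };

struct flow_actor { int name; int last_tag; int received; };

struct flow_worker { int dispatched; };

struct flow_worker_pool { struct flow_worker workers[FLOW_WORKERS]; };

struct flow_node {
    struct flow_worker_pool pool;
    struct flow_actor *actors; /* actors that live on this node */
    int actor_count;
};

/* Step 4: the receive function of the actor handles the message. */
void flow_actor_receive(struct flow_actor *actor, const struct flow_message *message)
{
    actor->last_tag = message->tag;
    actor->received++;
}

/* Step 3: the worker calls the receive function. */
void flow_worker_work(struct flow_worker *worker, struct flow_actor *actor,
                      const struct flow_message *message)
{
    worker->dispatched++;
    flow_actor_receive(actor, message);
}

/* Step 2: the pool assigns the actor to a worker (assumed rule: name modulo). */
int flow_worker_pool_dispatch(struct flow_worker_pool *pool, struct flow_actor *actor,
                              const struct flow_message *message)
{
    int index = actor->name % FLOW_WORKERS;
    flow_worker_work(&pool->workers[index], actor, message);
    return index;
}

/* Step 1: the node receives the message first and looks up the actor.
 * Returns the index of the worker used, or -1 if the actor is unknown. */
int flow_node_receive(struct flow_node *node, const struct flow_message *message)
{
    int i;
    for (i = 0; i < node->actor_count; i++) {
        if (node->actors[i].name == message->destination) {
            return flow_worker_pool_dispatch(&node->pool, &node->actors[i], message);
        }
    }
    return -1;
}
```

In the real runtime the worker runs on its own thread and picks actors from a scheduling queue, so the call is not synchronous as it is here.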
When an actor sends a message, the destination is either an actor on the same node or an actor on another node. The runtime uses MPI to send messages to actors on other nodes. Otherwise, the message is prepared and given directly to the destination actor on the same node.
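The local-versus-remote decision can be sketched as follows. The mapping from an actor name to its node (name modulo the number of nodes) is an assumption made for this illustration, and the `route_` names are hypothetical. A real implementation would hand remote messages to the transport subsystem.

```c
#include <assert.h>

enum route_kind { ROUTE_LOCAL, ROUTE_REMOTE };

/* Assumed placement rule: the node that hosts an actor is name % node_count. */
int route_node_of(int actor_name, int node_count)
{
    assert(actor_name >= 0);
    assert(node_count > 0);
    return actor_name % node_count;
}

/* Decide whether a message stays on this node or must cross the network. */
enum route_kind route_message(int this_node, int destination_actor, int node_count)
{
    if (route_node_of(destination_actor, node_count) == this_node) {
        return ROUTE_LOCAL;  /* give the message directly to the actor */
    }
    return ROUTE_REMOTE;     /* hand the message to the transport (MPI) */
}
```

With 256 nodes under this assumed rule, actor 512 lives on node 0, so a message sent to it from node 0 stays local, while a message to actor 513 goes to node 1.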
Each biosal node is managed by an instance of the Thorium engine. These Thorium instances speak to each other. The number of nodes is set by mpiexec -n @number_of_thorium_nodes ./a.out. The number of threads on each node is set with -threads-per-node.
The following command starts 256 nodes (there is 1 MPI rank per node) and 64 threads per node for a total of 256 * 64 = 16384 threads.
mpiexec -n 256 ./a.out -threads-per-node 64
Because everything is driven by inbound and outbound messages, a single node can run many more actors than it has threads.
The runtime also supports asymmetric numbers of threads, but most platforms have identical compute nodes (examples: IBM Blue Gene/Q, Cray XC30) so this is not very useful. Example:
# launch 32 nodes with 32 threads, 244 threads, 32 threads, 244 threads, and so on
mpiexec -n 32 ./a.out -threads-per-node 32,244
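The thread counts in the two examples above can be checked with a small calculation. This sketch assumes the comma-separated list is applied cyclically across nodes, as the comment "32 threads, 244 threads, 32 threads, 244 threads, and so on" suggests. The `plan_` functions are hypothetical helpers, not part of the runtime or its option parser.

```c
#include <assert.h>

/* Threads for one node when the list of counts repeats cyclically. */
int plan_threads_for_node(const int *counts, int count_length, int node)
{
    assert(count_length > 0);
    assert(node >= 0);
    return counts[node % count_length];
}

/* Total threads across all nodes. */
long plan_total_threads(const int *counts, int count_length, int node_count)
{
    long total = 0;
    int node;
    for (node = 0; node < node_count; node++) {
        total += plan_threads_for_node(counts, count_length, node);
    }
    return total;
}
```

With the list `{64}` and 256 nodes, the total is 256 * 64 = 16384 threads. With the list `{32, 244}` and 32 nodes, it is 16 * 32 + 16 * 244 = 4416 threads.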
- name := biosal
- bio := Biological or Biology
- s := Sequence or Scalable or Salable or Salubrious or Satisfactory
- a := Analysis or Actor or Actors
- l := Library or Lift
Alternative name: BIOlogy Scalable Actor Library
- Cray XE6
- IBM Blue Gene/Q
- XPS 13 with Fedora
- Xeon E7-4830 with CentOS 6.5
- Apple Mavericks
see CREDITS.md