Bundling containers for a kernel session #78

Closed · 5 of 8 tasks
achimnol opened this issue Apr 7, 2017 · 2 comments
Labels: comp:agent (Related to Agent component) · comp:manager (Related to Manager component) · type:feature (Add new features)

achimnol (Member) commented Apr 7, 2017

For large-scale computations, sometimes we need to run multiple containers on different hosts for resource aggregation and distributed/parallel processing.

In the past, this was difficult to implement because Docker's networking was limited to linking one container to another via a hostname alias (--link), which is essentially a one-to-one private link. Now, in 2017, Docker offers a distributed coordination mode called "Swarm" that includes overlay networking.

Docker Swarm uses the Raft algorithm to share node information, and any new Docker daemon can join an existing Swarm via a host:port address and a secret token. Once joined, containers running on any daemon in the swarm can be connected to ephemeral overlay networks created and destroyed at runtime, as the sketch below illustrates.
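
For illustration, here is a rough sketch using the docker-py SDK (the manager address, join token, network name, and image are placeholders, not anything defined by this project):

```python
import docker

client = docker.from_env()

# Join an existing Swarm using the manager's host:port and its secret join token.
client.swarm.join(
    remote_addrs=["swarm-manager.example.com:2377"],
    join_token="SWMTKN-1-<secret>",
)

# Create an overlay network at runtime; "attachable" lets plain containers
# (not only swarm services) connect to it.
client.networks.create("session-xyz-net", driver="overlay", attachable=True)

# Any daemon in the swarm can now attach its containers to this network.
client.containers.run(
    "python:3", ["sleep", "infinity"],
    name="kernel-1", hostname="kernel-1",
    network="session-xyz-net", detach=True,
)
```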

Let's try this and support multi-container distributed computing!

Update for 2020!

Docker Swarm has problems with overlapping IP addresses across different overlay networks, and creating/destroying and attaching/detaching networks at runtime has proven to be unstable.

After some testing by @kyujin-cho, we decided to fall back to the "classic" Swarm mode, which uses an external etcd to manage multi-host networks, and to use namespaced container hostname aliases to access other containers in the same overlay network.

Basically, we keep the same "kernels" table extension that we prototyped in 2017-2018. A single record of the kernels table corresponds to one container, and multiple records may share the same sess_id, indicating that they belong to an overlay cluster, roughly as in the sketch below.
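
For illustration, a rough SQLAlchemy sketch of that extension (not the actual schema; all column names other than sess_id are assumptions):

```python
import sqlalchemy as sa

metadata = sa.MetaData()

# One row per container; rows sharing the same sess_id form one overlay cluster.
kernels = sa.Table(
    "kernels", metadata,
    sa.Column("id", sa.String, primary_key=True),         # per-container kernel ID
    sa.Column("sess_id", sa.String, index=True),          # shared by all kernels of a session
    sa.Column("cluster_role", sa.String, nullable=True),  # hypothetical: e.g., "master" / "worker"
    sa.Column("cluster_idx", sa.Integer, nullable=True),  # hypothetical: 1-based index within the role
)
```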

Phase 1

  • Automatic reconfiguration of existing Docker daemons in agents so that they use our etcd.
  • Add a simple integration test script that manually spawns two containers on different hosts to check whether the overlay network actually works (a minimal sketch follows below). Let's place the script into the scripts directory of this meta-repository.
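
A minimal sketch of such a test script, assuming two agent hosts whose Docker daemons already share the etcd-backed overlay networking (host addresses, network name, and image are placeholders):

```python
import docker

NETWORK = "bai-overlay-test"
IMAGE = "alpine:3"

host_a = docker.DockerClient(base_url="tcp://agent-a.example.com:2375")
host_b = docker.DockerClient(base_url="tcp://agent-b.example.com:2375")

# With an etcd-backed overlay driver, a network created on one daemon
# becomes visible to the other daemons as well.
net = host_a.networks.create(NETWORK, driver="overlay")

server = host_a.containers.run(
    IMAGE, ["sleep", "60"], name="overlay-server", hostname="overlay-server",
    network=NETWORK, detach=True)
client = host_b.containers.run(
    IMAGE, ["ping", "-c", "3", "overlay-server"],
    name="overlay-client", network=NETWORK, detach=True)

# The ping succeeds only if the cross-host overlay network actually works.
assert client.wait()["StatusCode"] == 0, "cross-host ping over the overlay failed"
print("overlay network OK")

for c in (client, server):
    c.remove(force=True)
net.remove()
```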

Phase 2

  • Design and implement a template format for a compute session. → Task template #70
    • A task template contains the target image, resource occupation, environment variables, default vfolder mounts, bootstrap script, etc.
  • Design and implement a template format for a cluster composed of session templates.
    • The cluster template may declare multiple "roles" of containers and the min/max numbers of containers for each role.
    • Each container should have special environment variables so that it can detect its role and index. (e.g., BACKEND_CLUSTER_ROLE, BACKEND_CLUSTER_ROLE_IDX)
  • (Re-)implement lifecycle mgmt. of multiple containers for a single session
    • First, use a plain for-loop to trigger creation/destruction of each kernel.
    • Set up custom hostnames via Docker daemon/API so that all containers in the same overlay cluster can access each other.
      • e.g., spawning 2 "master" and 12 "worker" nodes: "master1", "master2", "worker1", "worker2", ..., "worker12"
      • Don't zero-pad the indices for lexicographical ordering because it causes extra complexity in automated scripts for images and task templates. (A minimal sketch of this enumeration follows after this list.)
    • We need to extend the semantics of kernel lifecycle operations:
      • All lifecycle operations against a session must wait until the same operation has completed on all kernels of that session.
      • Restarting or terminating any of the kernels triggers the same operation on the whole overlay cluster.
    • Kernel creation options such as environment variables and vfolder mounts must be applied to all kernels of a session; they override those defined by task templates, which in turn override those defined in the images.
  • Extend the GUI to have a "collapsible" kernel list so that the kernels of a single session are grouped and folded together.
    • The primary kernel of a session (which is exposed in the main session list) is the first-indexed kernel of the first role in the defined order.
    • Kernels with different roles may define their own service ports, and those service ports are supported transparently.
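
A minimal sketch of the role/hostname/environment-variable enumeration mentioned above (the template structure and function name are hypothetical, not the actual cluster template format):

```python
# Hypothetical cluster template: 2 "master" and 12 "worker" containers.
cluster_template = {
    "roles": [
        {"name": "master", "count": 2},
        {"name": "worker", "count": 12},
    ],
}

def enumerate_kernels(template):
    """Yield (hostname, env) per kernel: master1, master2, worker1, ..., worker12.
    Indices are 1-based and deliberately not zero-padded."""
    for role in template["roles"]:
        for idx in range(1, role["count"] + 1):
            yield f"{role['name']}{idx}", {
                "BACKEND_CLUSTER_ROLE": role["name"],
                "BACKEND_CLUSTER_ROLE_IDX": str(idx),
            }

for hostname, env in enumerate_kernels(cluster_template):
    print(hostname, env)
```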

Phase 3

  • Keep consistency when the manager is interrupted (e.g., restarted) during multi-container spawning and destruction (i.e., when a cluster is left partially provisioned).
    • We may need to add a column specifying the "desired state" so that we can resume and continue the cluster provisioning jobs upon manager restarts.
  • Optimize the provisioning using asyncio.gather() with proper interruption handling (a minimal sketch follows below).
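
A minimal sketch of that optimization, assuming hypothetical create_kernel()/destroy_kernel() coroutines that return/accept kernel IDs:

```python
import asyncio

async def provision_cluster(kernel_specs, create_kernel, destroy_kernel):
    # Spawn all kernels of the session concurrently instead of using a plain for-loop.
    tasks = [asyncio.ensure_future(create_kernel(spec)) for spec in kernel_specs]
    try:
        return await asyncio.gather(*tasks)
    except (Exception, asyncio.CancelledError):
        # Interrupted or partially failed: cancel the in-flight creations and
        # roll back the ones that already succeeded, so we never leave a
        # half-provisioned cluster behind.
        for t in tasks:
            t.cancel()
        results = await asyncio.gather(*tasks, return_exceptions=True)
        created = [r for r in results if not isinstance(r, BaseException)]
        await asyncio.gather(
            *(destroy_kernel(kernel_id) for kernel_id in created),
            return_exceptions=True,
        )
        raise
```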

achimnol self-assigned this Apr 7, 2017
achimnol changed the title from "Bundled containers for a kernel session" to "Bundling containers for a kernel session" Apr 27, 2017
achimnol referenced this issue in lablup/backend.ai-manager Aug 23, 2017
 * Now manager + gateway runs on multiple CPU cores with sane
   transaction semantics.  Thanks to aiotools!

   - It no longer depends on Redis as a pub-sub broker nor a database.
     All communications are done via ZeroMQ with no centralized queue
     server.

   - Redis is used only for per-keypair rate-limiting.

 * Now the manager searches for an available agent to spawn new containers
   based on its available memory / CPU / GPU capacity units.
   No more hard-coded instance types!

 * The db schema is now prepared for multi-container kernel sessions.

   - User-facing APIs now use "session ID" which is directed to the
     master container of the given session.

   - Each container has a unique "kernel ID" and is managed individually.

 * Replace asyncpg + asyncpgsa with aiopg for better SQLAlchemy
   support (especially custom type decorators).

 * TODOs

   - Stabilize accounting of used/available resource units.

   - Some parts still confuse session IDs and kernel IDs...
achimnol referenced this issue in lablup/backend.ai-manager Aug 23, 2017
 * Refactor "app.dbpool" to use the recommended custom context format:
   app['dbpool']
achimnol referenced this issue in lablup/backend.ai-manager Aug 29, 2017
achimnol referenced this issue in lablup/backend.ai-manager Jan 4, 2018
There were many places that missed appropriate filtering conditions when fetching active sessions.
This bug has been the major source of concurrency tracking errors.
achimnol transferred this issue from lablup/backend.ai-manager Dec 10, 2019
achimnol added this to the 20.03 milestone Jan 20, 2020
achimnol added the comp:manager (Related to Manager component) and comp:agent (Related to Agent component) labels Feb 6, 2020
achimnol modified the milestones: 20.03 → 20.09 Aug 10, 2020
achimnol (Member, Author) commented:

Now the server-side implementation is done with the v20.09.0 release, with the following work:

achimnol (Member, Author) commented:

Closing as completed; let's handle fine-grained UI improvements in separate issues.
