diff --git a/docs/docs/references/RELEASE.md b/docs/docs/references/RELEASE.md index e3343dd18..4bf6a0ac3 100644 --- a/docs/docs/references/RELEASE.md +++ b/docs/docs/references/RELEASE.md @@ -1312,7 +1312,7 @@ Explicit user facing changes: - `qhub deploy -c qhub-config.yaml` no longer prompts unsupported argument for `load_config_file`. - Minor changes on the Step-by-Step walkthrough on the docs. -- Revamp of README.md to make it concise and highlight QHub HPC. +- Revamp of README.md to make it concise and highlight Nebari Slurm. ### Breaking changes diff --git a/docs/nebari-slurm/_static/images/architecture.png b/docs/nebari-slurm/_static/images/architecture.png new file mode 100644 index 000000000..d6cdf169f Binary files /dev/null and b/docs/nebari-slurm/_static/images/architecture.png differ diff --git a/docs/nebari-slurm/_static/images/qhub-dashboards-bug.png b/docs/nebari-slurm/_static/images/qhub-dashboards-bug.png new file mode 100644 index 000000000..0d6fb3a46 Binary files /dev/null and b/docs/nebari-slurm/_static/images/qhub-dashboards-bug.png differ diff --git a/docs/nebari-slurm/_static/images/qhub-dask-gateway-cluster.png b/docs/nebari-slurm/_static/images/qhub-dask-gateway-cluster.png new file mode 100644 index 000000000..a2201edaf Binary files /dev/null and b/docs/nebari-slurm/_static/images/qhub-dask-gateway-cluster.png differ diff --git a/docs/nebari-slurm/_static/images/qhub-dask-gateway.png b/docs/nebari-slurm/_static/images/qhub-dask-gateway.png new file mode 100644 index 000000000..22c23853c Binary files /dev/null and b/docs/nebari-slurm/_static/images/qhub-dask-gateway.png differ diff --git a/docs/nebari-slurm/_static/images/qhub-grafana-node-exporter.png b/docs/nebari-slurm/_static/images/qhub-grafana-node-exporter.png new file mode 100644 index 000000000..b649946e7 Binary files /dev/null and b/docs/nebari-slurm/_static/images/qhub-grafana-node-exporter.png differ diff --git a/docs/nebari-slurm/_static/images/qhub-grafana-slurm.png b/docs/nebari-slurm/_static/images/qhub-grafana-slurm.png new file mode 100644 index 000000000..8ce6197c5 Binary files /dev/null and b/docs/nebari-slurm/_static/images/qhub-grafana-slurm.png differ diff --git a/docs/nebari-slurm/_static/images/qhub-grafana-traefik.png b/docs/nebari-slurm/_static/images/qhub-grafana-traefik.png new file mode 100644 index 000000000..05346a8ac Binary files /dev/null and b/docs/nebari-slurm/_static/images/qhub-grafana-traefik.png differ diff --git a/docs/nebari-slurm/_static/images/qhub-jupyterlab-profile.png b/docs/nebari-slurm/_static/images/qhub-jupyterlab-profile.png new file mode 100644 index 000000000..b46323377 Binary files /dev/null and b/docs/nebari-slurm/_static/images/qhub-jupyterlab-profile.png differ diff --git a/docs/nebari-slurm/_static/images/qhub-landing-page.png b/docs/nebari-slurm/_static/images/qhub-landing-page.png new file mode 100644 index 000000000..83b535a55 Binary files /dev/null and b/docs/nebari-slurm/_static/images/qhub-landing-page.png differ diff --git a/docs/nebari-slurm/benchmark.md b/docs/nebari-slurm/benchmark.md new file mode 100644 index 000000000..cf8667500 --- /dev/null +++ b/docs/nebari-slurm/benchmark.md @@ -0,0 +1,99 @@ +# Benchmarking + +There are many factors that go into HPC performance. Aside from the +obvious cpu performance network and storage are equally important in a +distributed context. + +## Storage + +[fio](https://fio.readthedocs.io/en/latest/fio_doc.html) is a powerful +tool for benchmarking filesystems. 
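fio is not installed by default on Ubuntu; a minimal sketch of installing it and locating a filesystem to benchmark (assuming the apt-based Ubuntu nodes this distribution targets) is shown below.

```shell
# Install fio on an Ubuntu node
sudo apt-get update && sudo apt-get install -y fio

# List mounted filesystems to choose a directory to benchmark (e.g. an NFS-mounted /home)
df -h
```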
Maximum performance, especially on extremely high-performance filesystems, can be tricky to measure and will require research into how to use the tool effectively. Oftentimes, measuring maximum performance on high-performance distributed filesystems requires multiple nodes and threads reading/writing at once. Even so, fio should provide a good ballpark of performance.

Substitute `<directory>` with a path on the filesystem that you want to test. `df -h` is a great way to see where each drive is mounted. `fio` will need the ability to read/write in the given directory.

IOPs (input/output operations per second) measures how many individual operations the filesystem can service, while throughput measures the sustained transfer rate.

### Maximum Write Throughput

```shell
fio --ioengine=sync --direct=0 \
    --fsync_on_close=1 --randrepeat=0 --nrfiles=1 --name=seqwrite --rw=write \
    --bs=1m --size=20G --end_fsync=1 --fallocate=none --overwrite=0 --numjobs=1 \
    --directory=<directory> --loops=10
```

### Maximum Write IOPs

```shell
fio --ioengine=sync --direct=0 \
    --fsync_on_close=1 --randrepeat=0 --nrfiles=1 --name=randwrite --rw=randwrite \
    --bs=4K --size=1G --end_fsync=1 --fallocate=none --overwrite=0 --numjobs=80 \
    --sync=1 --directory=<directory> --loops=10
```

### Maximum Read Throughput

```shell
fio --ioengine=sync --direct=0 \
    --fsync_on_close=1 --randrepeat=0 --nrfiles=1 --name=seqread --rw=read \
    --bs=1m --size=240G --end_fsync=1 --fallocate=none --overwrite=0 --numjobs=1 \
    --directory=<directory> --invalidate=1 --loops=10
```

### Maximum Read IOPs

```shell
fio --ioengine=sync --direct=0 \
    --fsync_on_close=1 --randrepeat=0 --nrfiles=1 --name=randread --rw=randread \
    --bs=4K --size=1G --end_fsync=1 --fallocate=none --overwrite=0 --numjobs=20 \
    --sync=1 --invalidate=1 --directory=<directory> --loops=10
```

## Network

To test network latency and bandwidth you need a source and a destination machine to test between. iperf exposes a port on the server, by default `5201`.

### Bandwidth

Start a server on a given `<dest>` machine

```shell
iperf3 -s
```

Now on the `<src>` machine run

```shell
iperf3 -c <dest>
```

This will measure the bandwidth of the link between the nodes from `<src>` to `<dest>`. This means that if you are using a provider where your Internet connection has very different upload vs. download speeds, you will see very different results depending on the direction. Add a `-R` flag to the client to test the other direction.

### Latency

[ping](https://linux.die.net/man/8/ping) is a great way to watch the latency between `<src>` and `<dest>`.

From the `<src>` machine run

```shell
ping -4 -c 10 <dest>
```

Keep in mind that ping reports the bi-directional (round-trip) time, so dividing by 2 gives a rough estimate of the one-way latency.
diff --git a/docs/nebari-slurm/comparison.md b/docs/nebari-slurm/comparison.md
new file mode 100644
index 000000000..de7ec3997
--- /dev/null
+++ b/docs/nebari-slurm/comparison.md
@@ -0,0 +1,79 @@
# Nebari and Nebari Slurm Comparison

At a high level, QHub is focused on a Kubernetes- and container-based deployment of all of its components. A container-based deployment brings advantages in security and in the scalability of components and compute nodes.

QHub-HPC is focused on bringing many of the same features within a bare-metal installation, allowing users to take full advantage of their hardware for performance. Additionally, these installations tend to be easier to manage and debug when issues arise (traditional Linux sys-admin experience works well here).
Due to this approach, QHub-HPC lacks containers but achieves workflows and scheduling of compute via [Slurm](https://slurm.schedmd.com/documentation.html) while keeping [services available](https://www.freedesktop.org/wiki/Software/systemd/) via systemd.

Questions to help determine which solution may be best for you:

1. Are you deploying to the cloud e.g. AWS, GCP, Azure, or Digital Ocean?

QHub is likely your best option. The auto-scalability of QHub compute allows for cost-effective usage of the cloud while taking advantage of a managed Kubernetes.

2. Are you deploying to a bare metal cluster?

QHub-HPC may be your best option since deployment does not require the complexity of managing a Kubernetes cluster. If you do have a DevOps or IT team to help manage Kubernetes on bare metal, QHub could be a great option, but be advised that managing Kubernetes comes with quite a lot of complexity which the cloud providers otherwise handle for you.

3. Are you concerned about absolute best performance?

QHub-HPC is likely your best option. But note that when we say absolute performance we mean your software is able to fully take advantage of your network's InfiniBand hardware, uses MPI, and uses SIMD instructions. Few users fall into this camp, and this should rarely be the reason to choose QHub-HPC (unless you know why you are making this choice).

# Feature Matrix

| Core                                               | QHub                                | QHub-HPC          |
| -------------------------------------------------- | ----------------------------------- | ----------------- |
| Scheduler                                           | Kubernetes                          | SystemD and Slurm |
| User Isolation                                      | Containers (cgroups and namespaces) | Slurm (cgroups)   |
| Auto-scaling compute nodes                          | X                                   |                   |
| Cost efficient compute support (Spot/Preemptible)   | X                                   |                   |
| Static compute nodes                                |                                     | X                 |

| User Services                      | QHub | QHub-HPC |
| ---------------------------------- | ---- | -------- |
| Dask Gateway                       | X    | X        |
| JupyterHub                         | X    | X        |
| JupyterHub-ssh                     | X    | X        |
| CDSDashboards                      | X    | X        |
| Conda-Store environment management | X    | X        |
| ipyparallel                        |      | X        |
| Native MPI support                 |      | X        |

| Core Services                                                  | QHub | QHub-HPC |
| -------------------------------------------------------------- | ---- | -------- |
| Monitoring via Grafana and Prometheus                           | X    | X        |
| Auth integration (OAuth2, OpenID, ldap, kerberos)               | X    | X        |
| Role based authorization on JupyterHub, Grafana, Dask-Gateway   | X    | X        |
| Configurable user groups                                        | X    | X        |
| Shared folders for each user's group                            | X    | X        |
| Traefik proxy                                                   | X    | X        |
| Automated Let's Encrypt and manual TLS certificates             | X    | X        |
| Forward authentication ensuring all endpoints authenticated     | X    |          |
| Backups via Restic                                              |      | X        |

| Integrations | QHub | QHub-HPC |
| ------------ | ---- | -------- |
| ClearML      | X    |          |
| Prefect      | X    |          |
| Bodo         |      | X        |
diff --git a/docs/nebari-slurm/configuration.md b/docs/nebari-slurm/configuration.md
new file mode 100644
index 000000000..888748083
--- /dev/null
+++ b/docs/nebari-slurm/configuration.md
@@ -0,0 +1,413 @@
# Configuration

## User and group management

### Adding a new user

To add new users to Nebari-Slurm, add them to the `enabled_users` variable. The format for each user is:

```yaml
enabled_users:
  ...
  - username: <unix username>
    uid: <unix uid>
    fullname: <full name>
    email: <email address>
    primary_group: <name of primary group>
    groups: [<list of additional groups>]
  ...
```

### Adding new groups

To add new groups to Nebari-Slurm, add them to the `enabled_groups` variable. The format for each group is:

```yaml
enabled_groups:
  ...
  - name: <group name>
    gid: <unix gid>
  ...
```

### Ensuring groups or users do not exist

To ensure that a given user or group does not exist on the nodes, add its name to `disabled_users` or `disabled_groups`.

```yaml
disabled_groups:
  ...
  - <group name>
  ...

disabled_users:
  ...
  - <username>
  ...
```

## Adding additional packages to nodes

Setting the variable `installed_packages` will ensure that the given Ubuntu packages are installed on the given node.

```yaml
installed_packages:
  ...
  - git
  ...
```

## NFS client mounts

You may configure arbitrary NFS mounts via the `nfs_client_mounts` variable. The format for `nfs_client_mounts` is as follows:

```yaml
nfs_client_mounts:
  ...
  - host: <nfs server host>
    path: <path to mount>
  ...
```

## Samba/CIFS client mounts

You may mount arbitrary CIFS/Samba or Windows file shares with `samba_client_mounts`. The `username`, `password`, `options`, and `domain` fields are optional.

```yaml
samba_client_mounts:
  ...
  - name: <mount name>
    host: <samba server host>
    path: <share path>
    options: <mount options>
    username: <username>
    password: <password>
    domain: <domain>
  ...
```

## JupyterHub

### Setting arbitrary traitlets in JupyterHub

Setting a `<class>`, `<key>`, `<value>` entry in `jupyterhub_custom` is equivalent to setting the traitlet `c.<class>.<key> = <value>`. For example, to set the traitlet `c.Spawner.start_timeout = 60`:

```yaml
jupyterhub_custom:
  Spawner:
    start_timeout: 60
```

### Arbitrary additional files as configuration

You may add additional files that are run at the end of JupyterHub's configuration via Traitlets.

```yaml
jupyterhub_additional_config:
  ...
  01-myconfig: "{{ inventory_dir }}/<filename>.py"
  ...
```

The variable `inventory_dir` is a convenient variable that allows you to reference files created within your inventory directory.

### JupyterHub idle culler

Adjusting the `idle_culler` settings or disabling the culler is configurable.

```yaml
idle_culler:
  enabled: true
  timeout: 86400 # 1 day
  cull_every: 3600 # 1 hour
```

- `timeout` is the period of inactivity after which a user's JupyterLab instance is culled
- `cull_every` is the interval at which inactive JupyterLab instances are checked for and deleted

### Set default UI to classic Jupyter notebooks

As of JupyterHub 2.0, the default user interface is JupyterLab. If the classic Jupyter Notebook UI is preferred, this can be configured as shown below.

```yaml
jupyterhub_custom:
  QHubHPCSpawner:
    default_url: '/tree'
  Spawner:
    environment:
      JUPYTERHUB_SINGLEUSER_APP: notebook.notebookapp.NotebookApp
```

### Turn off resource selection user options form

The resource selection options form allows users to choose the CPU, memory, and partition on which to run their user server. This feature is enabled by default, but can be disabled as shown below. This controls the options form for both the user and CDSDashboards.

```yaml
jupyterhub_qhub_options_form: false
```

### Profiles (Slurm job resources)

Profiles in Nebari-Slurm are defined within a YAML configuration file. Each profile specifies a set of resources that will be allocated to the JupyterHub session or job when selected by the user. Below is an example of how to define profiles in the configuration file:

```yaml
jupyterhub_profiles:
  - small:
      display_name: Profile 1 [Small] (1CPU-2GB)
      options:
        req_memory: "2"
        req_nprocs: "1"
  - medium:
      display_name: Profile 2 [Medium] (1CPU-4GB)
      options:
        req_memory: "4"
        req_nprocs: "1"
```

In the example above, two profiles are defined: small and medium. Each profile has a display_name that describes the profile to users in a human-readable format, including the resources allocated by that profile (e.g., "Profile 1 [Small] (1CPU-2GB)").
The options section specifies the actual resources to be allocated:

- **req_memory**: The amount of memory (in GB) to be allocated.
- **req_nprocs**: The number of CPU processors to be allocated.

_Note_: All Slurm-related configuration values need to be passed down as strings.

### Services

Additional services can be added to the `jupyterhub_services` variable. Currently each entry is simply a `<service name>: <service token>` pair. You must keep the `dask_gateway` section.

```yaml
jupyterhub_services:
  ...
  <service name>: <service token>
  ...
```

### Theme

The theme variables use [qhub-jupyterhub-theme](https://github.com/Quansight/qhub-jupyterhub-theme). All variables are configurable. `logo` is a special variable where you supply a URL that the user's web browser can access.

```yaml
jupyterhub_theme:
  template_vars:
    hub_title: "This is Nebari Slurm"
    hub_subtitle: "your scalable open source data science laboratory."
    welcome: "have fun."
    logo: "/hub/custom/images/jupyter_qhub_logo.svg"
    primary_color: '#4f4173'
    secondary_color: '#957da6'
    accent_color: '#32C574'
    text_color: "#111111"
    h1_color: "#652e8e"
    h2_color: "#652e8e"
```

## Copying Arbitrary Files onto Nodes

Arbitrary files and folders can be copied from the Ansible control node onto the managed nodes as part of the Ansible playbook deployment by setting the following Ansible variables, which copy files onto all nodes, all nodes in a particular group, or only a particular node, respectively.

- `copy_files_all`
- `copy_files_[ansible_group_name]`
- `copy_files_[ansible_ssh_host_name]`

Copying two files/folders onto the hpc02-test node could be done by setting the following Ansible variable, e.g. in the host_vars/hpc02-test.yaml file.

```yaml
...
copy_files_hpc02-test:
  - src: /path/to/file/on/control/node
    dest: /path/to/file/on/managed/node
    owner: root
    group: root
    mode: 'u=rw,g=r,o=r'
    directory_mode: '644'
  - src: /path/to/other/file/on/control/node
    dest: /path/to/other/file/on/managed/node
    owner: vagrant
    group: users
    mode: '666'
    directory_mode: 'ugo+rwx'
```

The owner, group, and mode fields are optional. See [copying modules](https://docs.ansible.com/ansible/latest/collections/ansible/builtin/copy_module.html#id2) for more detail about each field.

Remember that the home directory of users is a network file system, so it is only necessary to copy files in the user directories onto a single node.

## Slurm

Only a few Slurm variables should need to be configured. Sadly, due to how Ansible works, all variables must be copied from the defaults. These two configuration settings allow additional ini sections and keys to be set.

```yaml
slurm_config:
  ...

slurmdbd_config:
  ...
```

## Traefik

### Accessing Nebari Slurm from a Domain

By default, a Nebari-Slurm deployment must be accessed using the IP address of the hpc_master node. However, if a domain name has been set up to point to the hpc_master node, then Nebari Slurm's router, [Traefik](https://doc.traefik.io/traefik/), can be configured to work with the domain by setting the `traefik_domain` Ansible variable.

For example, if you had the example.com domain set up to point to the hpc_master node, then you could add the following to the all.yaml file and redeploy, after which navigating to `https://example.com` in a web browser would bring up your Nebari Slurm deployment sign-in page.
+ +```yaml +traefik_domain: example.com +``` + +### Automated Lets-Encrypt Certificate + +Traefik can provision a tls certificate from Let's Encrypt assuming +that your master node ip is publicly accessible. Additionally the +`traefik_domain` and `traefik_letsencrypt_email` must be set. + +```yaml +traefik_tls_type: letsencrypt +traefik_letsencrypt_email: myemail@example.com +``` + +### Custom TLS Certificate + +By default, traefik will create and use a self signed TLS certificate +for user communication. If desired, a custom TLS Certificate can be +copied from ansible to the appropriate location for use by Traefik. +To do so, set the following settings in the all.yaml file. + +```yaml +traefik_tls_type: certificate +traefik_tls_certificate: /path/to/MyCertificate.crt +traefik_tls_key: /path/to/MyKey.key +``` + +For testing out this optional it is easy to generate your own +self-signed certificate. Substitute all of the values for values that +fit your use case. If you need to copy the certificate and/or key from a +location on the remote server (as opposed to a local file on the Ansible +host), you can also add `traefik_tls_certificate_remote_src: true` and +`traefik_tls_key_remote_src: true`, respectively. + +```shell +export QHUB_HPC_DOMAIN=example.com +openssl req -x509 -newkey rsa:4096 -keyout key.pem -out cert.pem -sha256 -days 365 \ + -subj "/C=US/ST=Oregon/L=Portland/O=Quansight/OU=Org/CN=$QHUB_HPC_DOMAIN" \ + -nodes +``` + +## Prometheus + +### Adding additional scrape configs + +If you want to add additional jobs/targets for Prometheus to scrape and ingest, you can +use the `prometheus_additional_scrape_configs` variable to define your own: + +```yaml +prometheus_additional_scrape_configs: + - job_name: my_job + static_configs: + - targets: + - 'example.com:9100' +``` + +## Grafana + +### Changing provisioned dashboards folder + +All provisioned dashboards in the `grafana_dashboards` variable will be added to the +"General" folder by default. "General" is a special folder where anyone can add +dashboards and can't be restricted so if you wish to separate provisioned dashboards you +can set `grafana_dashboards_folder`: + +```yaml +grafana_dashboards_folder: "Official" +``` + +### Specifying version + +The latest Grafana version will be installed by default but a specific or minimum +version can be specified with `grafana_version`: + +```yaml +grafana_version: "=9.0.0" # Pin to 9.0.0 +grafana_version: ">=9.0.0" # Install latest version only if current version is <9.0.0 +``` + +### Adding additional configuration + +You can add additional configuration to the `grafana.ini` file using +`grafana_additional_config`: + +```yaml +grafana_additional_config: | + [users] + viewers_can_edit = true # Allow Viewers to edit but not save dashboards +``` + +## Backups + +Backups are performed in Nebari Slurm via [restic](https://restic.net/) +an open source backup tool. It is extremely flexible on where backups +are performed as well as supporting encrypted, incremental backups. + +### Variables + +The following shows a daily backup on S3 for QHub. + +```yaml +backup_enabled: true +backup_on_calendar: "daily" +backup_randomized_delay: "3600" +backup_environment: + RESTIC_REPOSITORY: "s3:s3.amazonaws.com/bucket_name" + RESTIC_PASSWORD: "thisismyencryptionkey" + AWS_ACCESS_KEY_ID: accesskey + AWS_SECRET_ACCESS_KEY: mylongsecretaccesskey +``` + +- `backup_enabled` :: determines whether backups are enabled +- `backup_on_calendar` :: determines the frequency to perform backups. 
Consult the [systemd timer](https://www.freedesktop.org/software/systemd/man/systemd.timer.html) documentation for the syntax
- `backup_randomized_delay` :: is the random delay in seconds to apply to backups. Useful to prevent backups from all being performed at the exact same time each day
- `backup_environment` :: are all the key/value pairs used to configure restic. RESTIC_REPOSITORY and RESTIC_PASSWORD are required. The rest are environment variables for the specific [backup repository](https://restic.readthedocs.io/en/stable/030_preparing_a_new_repo.html).

### Manual backup

At any time you can trigger a manual backup. SSH into the master node and run:

```shell
sudo systemctl start restic-backup.service
```
diff --git a/docs/nebari-slurm/development.md b/docs/nebari-slurm/development.md
new file mode 100644
index 000000000..9035f78a8
--- /dev/null
+++ b/docs/nebari-slurm/development.md
@@ -0,0 +1,153 @@
# Development Guide

Welcome to the Nebari Slurm development guide! This guide will help you set up your development environment and tools for working with Nebari Slurm, using Vagrant to orchestrate the virtual machines (VMs) and Ansible to provision the necessary infrastructure.

## Prerequisites

Before you begin, make sure you have the following prerequisites installed on your system:

- Vagrant: Vagrant is a tool for building and managing virtual machine environments. Install the latest version for your operating system.

- Virtualization Provider: Depending on your preference, choose either Libvirt or VirtualBox as your virtualization provider.

- Git: Git is a version control system. You'll need it to clone the Nebari Slurm repository.

Let's get started!

## Additional Development Tasks

It is recommended to use a personal environment for development. This will allow you to install additional packages and tools without affecting your system's global environment. We recommend using Conda to create a personal environment, but you could use pipenv or virtualenv as well. Keep in mind that Ansible cannot run on a Windows host natively, though it can run under the Windows Subsystem for Linux (WSL).

### Installing Ansible

We recommend installing Ansible via Conda for a seamless development experience. First install Conda, then create an environment containing Ansible:

```bash
conda create -n qhub-hpc -c conda-forge ansible
conda activate qhub-hpc
```

This creates a Conda environment named qhub-hpc and installs Ansible within it. You can activate this environment whenever you need to use Ansible for Nebari Slurm development tasks.

If you prefer not to use Conda and want to install Ansible using other methods, please follow the official Ansible installation instructions for your specific platform.

## Choose a Virtualization Provider

Nebari Slurm supports two virtualization providers: Libvirt and VirtualBox. You can choose the one that suits your needs. If you're unsure, we recommend using Libvirt.

### Providers

Vagrant is a versatile tool that can manage different types of machines through various providers. While Vagrant ships with support for VirtualBox, Hyper-V, and Docker, it can work with other providers as well. Choosing the right provider can offer features that align with your specific use case.

Alternate providers may offer advantages such as better stability and performance. For example, if you intend to use Vagrant for significant workloads, VMware providers are often recommended due to their robust support and reliability, surpassing VirtualBox in many scenarios.
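As a quick sanity check before picking a provider, the commands below (a minimal sketch, assuming Vagrant is already installed and on your PATH) show the Vagrant version and list any provider plugins that are already present.

```shell
# Confirm Vagrant is installed and check its version
vagrant --version

# List installed plugins; provider plugins such as vagrant-libvirt show up here
vagrant plugin list
```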
Before you can use a different provider, you must install it using the Vagrant plugin system. Once installed, using the provider is straightforward and aligns with Vagrant's user-friendly approach.

For Nebari Slurm development, we provide instructions for two providers: Libvirt and VirtualBox. Follow the relevant subsection for your chosen provider to set up your development environment.

### Libvirt

Libvirt is a toolkit for managing virtualization platforms. It provides a common API for different virtualization technologies, including QEMU, KVM, Xen, LXC, and VirtualBox. Libvirt is a popular choice for Linux-based systems and is the default provider for Vagrant on Linux.

If you're using a Linux-based system, we recommend using Libvirt as your provider. It offers better performance and stability than VirtualBox and is the default provider for Vagrant on Linux. For installation documentation, please refer to the [Libvirt provider documentation](https://ubuntu.com/server/docs/virtualization-libvirt).

Libvirt will also require you to install an extension for Vagrant called `vagrant-libvirt`. You can install this extension by running the following command:

```bash
vagrant plugin install vagrant-libvirt
```

Note: For more information you can refer to this Opensource.com article on [Libvirt](https://opensource.com/article/21/10/vagrant-libvirt).

### VirtualBox

VirtualBox is a popular virtualization platform that supports a wide range of operating systems. It is a cross-platform solution that is available for Windows, macOS, and Linux. VirtualBox is the default provider for Vagrant on Windows and macOS.

If you're using Windows or macOS, we recommend using VirtualBox as your provider. It is the default provider for Vagrant on these platforms and offers a stable and reliable experience. For installation documentation, please refer to the [VirtualBox community documentation](https://help.ubuntu.com/community/VirtualBox/Installation).

## Create and Provision VMs

Select one of the available test Vagrant files already present in this repository under the `/tests` directory. For example, if you want to test the `ubuntu1804` Vagrant file, run the following commands:

```bash
cd tests/ubuntu1804
vagrant up --provider=<provider>
```

Before creating and provisioning the virtual machines (VMs), please be aware of potential naming conflicts if someone else is already using the same machine. To avoid conflicts, consider one of the following options:

1. Choose a Different Ubuntu Version:

   - Instead of `ubuntu1804`, try using `ubuntu2004` if it's available.

2. Add Your Own Prefix:
   You can add a unique prefix to the VM names to avoid conflicts. Set an environment variable `HPC_VM_PREFIX` with your chosen prefix before running any Vagrant commands. For example:

   ```bash
   export HPC_VM_PREFIX='<your-prefix>-'
   ```

This will prefix all VM names with the string you provide. For example, if you set `HPC_VM_PREFIX='abc-'`, the VM names will be `abc-ubuntu1804-master`, `abc-ubuntu1804-worker-1`, and so on.

This should spin up the VMs and provision them using Ansible. If you encounter any errors, please refer to the [Troubleshooting] section.

Now that your VMs are up and running, let's populate them with the QHub infrastructure. To do so, copy the contents of `templates.inventory/*` over into the newly created `.vagrant/provisioners/ansible/inventory/` directory.
For example, assuming you are in the `tests/ubuntu1804` directory, run the following command:

```bash
cp -r ../../templates.inventory/* .vagrant/provisioners/ansible/inventory/
```

You should now see two new folders, `host_vars` and `group_vars`, under the `.vagrant/provisioners/ansible/inventory/` directory. These folders contain the variables that are used by Ansible to provision the VMs. For more information on the variables, please refer to the [configuration](./configuration.md) page in this documentation.

In the example above, the directory structure should be as follows:

```bash
tests/ubuntu1804/.vagrant/provisioners/ansible/inventory/
├── group_vars
│   ├── all.yaml
│   ├── hpc_master.yaml
│   └── hpc_worker.yaml
├── host_vars
│   └── hpc01-test.yaml
└── vagrant_ansible_inventory
```

Now, to make the new changes propagate, run the following command:

```bash
vagrant provision
```

This should now populate the VMs with the QHub infrastructure. You can now access the JupyterHub instance by visiting `https://<ip-address>/`, where `<ip-address>` is the IP address of your specific deployment. You will be greeted by the JupyterHub landing page.

If you would like to set a DNS record for the master node, you can do so by adding the following line to your `/etc/hosts` file:

```bash
<ip-address> <domain-name>
```

Make sure to replace `<ip-address>` with the IP address of your master node and `<domain-name>` with the domain name you would like to use. If using a service such as CloudFlare to manage your DNS records, you can also set up an A record to point to the master node IP address, and the above step will not be required.

In either case, do update the contents of `group_vars/all.yaml` to include the extra field (at the end of the file):

```yaml
traefik_domain: <domain-name>
```

This will ensure that the Traefik reverse proxy is configured to use the domain name you have set.

## Checking services

For debugging purposes, you can inspect service status and logs or restart an individual service while connected to the master node. Below is an example for the JupyterHub service; the same commands apply to the Slurm services (for example `slurmctld` on the master node or `slurmd` on the workers).

```bash
# Restart JupyterHub:
systemctl restart jupyterhub

# Inspect logs:
journalctl -u jupyterhub -e
```

`SlurmSpawner` logs are stored on the worker nodes in the home folder of the user running JupyterLab, e.g., `/home/example-user/.jupyterhub_slurmspawner_9.log`

## Troubleshooting
diff --git a/docs/nebari-slurm/faq.md b/docs/nebari-slurm/faq.md
new file mode 100644
index 000000000..04e0f2474
--- /dev/null
+++ b/docs/nebari-slurm/faq.md
@@ -0,0 +1,24 @@
# Frequently Asked Questions

Q1: Can a user access another user's home directory in JupyterLab?

No. Every user's home directory is private, and users cannot access the contents of any other user's home directory. The example below shows the permissions of user directories in `/home`.

```bash
$ ls -ltrh /home

total 36K
drwx------ 9 john-doe  example-user 4.0K Apr  1 19:22 john-doe
drwx------ 9 alice-doe example-user 4.0K Apr  1 19:34 alice-doe
```

```bash
john-doe@worker-01:~$ pwd
/home/john-doe

# The user john-doe is unable to access the contents of user alice-doe's home directory:
john-doe@worker-01:~$ ls /home/alice-doe/
ls: cannot open directory '/home/alice-doe/': Permission denied
```
diff --git a/docs/nebari-slurm/installation.md b/docs/nebari-slurm/installation.md
new file mode 100644
index 000000000..2e4b23715
--- /dev/null
+++ b/docs/nebari-slurm/installation.md
@@ -0,0 +1,157 @@
# Installation

## Requirements

QHub-HPC currently requires Ubuntu bare-metal machines. There are plans to support additional OSs such as RHEL. We actively test on the latest stable Ubuntu release. We require:

- 1 main node with at least 4 CPUs and 16 GB of RAM
- 0-N worker nodes (the main node can also be a worker node, but this is not recommended) with no significant requirements on resources

You must be able to SSH into each node from the node where you run the Ansible commands, connecting either as root (not recommended) or as a user with sudo (with or without a password).

## Dependencies

We recommend installing Ansible via Conda. First you must [install conda](https://docs.conda.io/projects/conda/en/latest/user-guide/install/index.html).

```shell
conda create -n qhub-hpc -c conda-forge ansible
conda activate qhub-hpc
```

If you do not want to use Conda, follow the [Ansible installation instructions](https://docs.ansible.com/ansible/latest/installation_guide/intro_installation.html).

## Installation

Prior to the `0.4` release each QHub-HPC deployment required maintaining a fork and rebasing all the changes to QHub-HPC within the `main` branch. Now we strive to maintain the variables within the QHub-HPC roles.

### Copy the template

In this example we create our own deployment `my-deployment` and create a managed git repository for our deployment. You will want to substitute this name.

```shell
git clone https://github.com/Quansight/qhub-hpc /tmp/qhub-hpc

mkdir my-deployment
cd my-deployment
git init

cp -r /tmp/qhub-hpc/inventory.template/* .
```

This will initialize the directory with:

- `inventory` which is an [ansible inventory](https://docs.ansible.com/ansible/latest/user_guide/intro_inventory.html)
- `host_vars` which is a collection of host-specific variables
- `group_vars` which is a collection of group-specific variables

Over time you may add additional directories and files to the repository.

## Modify the ansible inventory

Below is an example Ansible inventory file used for `ansible-playbook`. There are [great docs on modifying the ansible inventory file](https://docs.ansible.com/ansible/latest/user_guide/intro_inventory.html). The important keys to set:

- `ansible_ssh_host` which is the DNS-accessible name
- `ansible_port` which defaults to `22`
- `ansible_user` which is the username used to log into the node; by default it is the user that the `ansible-playbook` command is run as
- `ansible_ssh_private_key_file` which is the path to the SSH key used to log into the node

Next you must configure the groups, in this case the `hpc_master` and `hpc_worker` groups. There must only be one node in the `hpc_master` group. `N` nodes can be in the `hpc_worker` section (including the hpc_master node, which is not recommended).

```
hpc02-test ansible_ssh_host=192.168.121.124 ansible_port=22 ansible_user='vagrant' ansible_ssh_private_key_file='/home/costrouc/.vagrant.d/insecure_private_key'
hpc03-test ansible_ssh_host=192.168.121.176 ansible_port=22 ansible_user='vagrant' ansible_ssh_private_key_file='/home/costrouc/.vagrant.d/insecure_private_key'
hpc04-test ansible_ssh_host=192.168.121.133 ansible_port=22 ansible_user='vagrant' ansible_ssh_private_key_file='/home/costrouc/.vagrant.d/insecure_private_key'
hpc01-test ansible_ssh_host=192.168.121.35 ansible_port=22 ansible_user='vagrant' ansible_ssh_private_key_file='/home/costrouc/.vagrant.d/insecure_private_key'

[hpc_master]
hpc01-test

[hpc_worker]
hpc02-test
hpc03-test
hpc04-test

[partition_example]
hpc02-test
hpc04-test
```

Arbitrary additional groups with a name of the form `partition_<name>` may be added to create additional Slurm partition groups named `<name>`. This can be useful if you want a Slurm partition for the GPU or high-memory nodes.

# host_vars

If you would like to set specific variables for a given host you must create a file `host_vars/<hostname>.yaml`. Currently we only recommend setting a few variables in host_vars: the Slurm resources for each node, e.g. the following.

```yaml
slurm_memory: 16000
slurm_cpus: 4
slurm_sockets_per_board: 4
```

This, however, is difficult to set correctly; see the section "Configuring and adding node information" in [slurm.md](./slurm.md). A host_vars file should be created for each node in the `hpc_worker` group. In the inventory example above the following files would exist:

- `host_vars/hpc02-test.yaml`
- `host_vars/hpc03-test.yaml`
- `host_vars/hpc04-test.yaml`

It is okay to get the memory and cpus wrong. This can later be corrected on a re-deployment.

# group_vars

Most if not all configuration will be done in the `group_vars` directory. Within that directory there are three groups:

- `all.yaml` which contains variables set for all nodes
- `hpc_master.yaml` which contains variables set for the hpc master node
- `hpc_worker.yaml` which contains variables set for the hpc worker nodes

For detailed information on customizing the configuration, see the [configuration](./configuration.md) page.

# Deployment

After any modifications to `host_vars/*`, `group_vars/*`, or the `inventory`, an Ansible deployment should be performed. QHub-HPC has intentionally followed a normal Ansible deployment pattern to allow for reusing the amazing tutorials and documentation currently in place around Ansible.

```shell
cd my-deployment
ansible-playbook -i inventory /tmp/qhub-hpc/playbook.yaml
```

# Checking the status of the deployment

Upon successful deployment you should be able to visit `https://<ip-or-domain>/`, where `<ip-or-domain>` is the IP address or DNS name of your specific deployment. You will be greeted by the JupyterHub landing page.
diff --git a/docs/nebari-slurm/overview.mdx b/docs/nebari-slurm/overview.mdx
index 6f3d1a4d2..988a2a1fb 100644
--- a/docs/nebari-slurm/overview.mdx
+++ b/docs/nebari-slurm/overview.mdx
@@ -8,3 +8,64 @@ The high level goal of this distribution is to form a cohesive set of tools that
 * monitoring of compute infrastructure and services
 * scalable and efficient compute via jupyterlab and dask
 * deployment of jupyterhub on prem without requiring deep devops knowledge of the Slurm/HPC and jupyter ecosystem
:::important
Nebari Slurm was previously called QHub-HPC. The documentation pages are being migrated, so there are still a few mentions of the original name.
+::: + +# Overview + +Nebari Slurm is a High-Performance Computing (HPC) deployment using [JupyterHub](https://jupyterhub.readthedocs.io/en/stable/). In this document, we will discuss the services that run within this architecture and how they are interconnected. The setup follows a standard HPC configuration with a master/login node and 'N' worker nodes. + +The master node serves as the central control and coordination hub for the entire cluster. It plays a pivotal role in managing and optimizing cluster resources and ensuring secure, efficient, and reliable operations. In contrast, worker nodes primarily focus on executing computational tasks and rely on instructions from the master node for job execution. + +At a high level, the architecture comprises several key services: monitoring, the job scheduler ([Slurm](https://slurm.schedmd.com/overview.html)), and JupyterHub along with related Python services. + +Important URLs: + +- `https:///`: JupyterHub server +- `https:///monitoring/`: Grafana server +- `https:///auth/`: Keycloak server +- `https:///gateway/`: Dask-Gateway server for remote connections +- `ssh -p 8022`: SSH into a JupyterLab session for users (requires a JupyterHub token) + +## Services (All Nodes) + +- [node_exporter](https://github.com/prometheus/node_exporter): Collects node metrics (default port 9100) + +## Master Node + +### Services + +#### Authentication + +- [Keycloak](https://www.keycloak.org/): Provides enterprise-grade open-source authentication + +#### Control and Coordination + +- [Slurm](https://slurm.schedmd.com/overview.html): Manages job scheduling, resource allocation, and cluster control +- [slurmctld](https://slurm.schedmd.com/slurmctld.html): Manages the Slurm central management daemon +- [slurmdbd](https://slurm.schedmd.com/slurmdbd.html): Handles Slurm accounting +- [MySQL](https://www.mysql.com/): Acts as the database for Slurm accounting + +#### Reverse Proxy and Routing + +- [Traefik](https://traefik.io/): Serves as an open-source network proxy, routing network traffic efficiently + +#### Monitoring and Metrics + +- [Grafana](https://grafana.com/): Acts as a central place to view monitoring information (default port 3000) +- [Prometheus](https://prometheus.io/docs/introduction/overview/): Scrapes metrics (default port 9090) +- [slurm_exporter](https://github.com/vpenso/prometheus-slurm-exporter): Provides Slurm metrics (default port 9341) +- [Traefik exported metrics](https://doc.traefik.io/traefik/observability/metrics/overview/) +- [JupyterHub exported metrics](https://jupyterhub.readthedocs.io/en/stable/reference/metrics.html) + +#### Python Ecosystem + +- [JupyterHub](https://jupyter.org/hub): Provides scalable interactive computing (default port 8000) +- [Dask-Gateway](https://gateway.dask.org/): Enables scalable distributed computing +- [NFS server](https://en.wikipedia.org/wiki/Network_File_System): Facilitates sharing Conda environments and home directories among all users +- [conda-store](https://conda.store/): Manages Conda environments within nodes + +## Worker Nodes + +Worker nodes primarily focus on executing computational tasks and have minimal dependencies, making them efficient for running parallel workloads. They rely on instructions from the master node for job execution and do not have the same level of control and coordination responsibilities as the master node. The master node's role is pivotal in orchestrating the overall cluster's functionality and ensuring efficient and secure operations. 
diff --git a/docs/nebari-slurm/slurm.md b/docs/nebari-slurm/slurm.md new file mode 100644 index 000000000..8c7dbcabe --- /dev/null +++ b/docs/nebari-slurm/slurm.md @@ -0,0 +1,151 @@ +# Slurm + +For detailed slurm information please refer to the +[documentation](https://slurm.schedmd.com/overview.html). + +## Checking Health of Slurm Cluster + +[sinfo](https://slurm.schedmd.com/sinfo.html) is your bread and butter +and should be used to quickly check the health of the cluster. + +```shell +sinfo +``` + +```shell +PARTITION AVAIL TIMELIMIT NODES STATE NODELIST +general* up infinite 1 mix hpc02-test +general* up infinite 2 idle hpc03-test,hpc04-test +``` + +## Get current job queue including running jobs + +```shell +squeue +``` + +## Getting information about a job + +```shell +scontrol show job +``` + +```shell +JobId=37 JobName=spawner-jupyterhub + UserId=vagrant(1000) GroupId=vagrant(1000) MCS_label=N/A + Priority=4294901724 Nice=0 Account=(null) QOS=normal + JobState=RUNNING Reason=None Dependency=(null) + Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0 + RunTime=00:01:16 TimeLimit=UNLIMITED TimeMin=N/A + SubmitTime=2021-01-19T14:27:24 EligibleTime=2021-01-19T14:27:24 + AccrueTime=2021-01-19T14:27:24 + StartTime=2021-01-19T14:27:24 EndTime=Unknown Deadline=N/A + SuspendTime=None SecsPreSuspend=0 LastSchedEval=2021-01-19T14:27:24 + Partition=general AllocNode:Sid=localhost:135266 + ReqNodeList=(null) ExcNodeList=(null) + NodeList=hpc02-test + BatchHost=hpc02-test + NumNodes=1 NumCPUs=1 NumTasks=1 CPUs/Task=1 ReqB:S:C:T=0:0:*:* + TRES=cpu=1,mem=1G,node=1,billing=1 + Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=* + MinCPUsNode=1 MinMemoryNode=1G MinTmpDiskNode=0 + Features=(null) DelayBoot=00:00:00 + OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null) + Command=(null) + WorkDir=/home/vagrant + StdErr=/home/vagrant/.jupyterhub_slurmspawner_37.log + StdIn=/dev/null + StdOut=/home/vagrant/.jupyterhub_slurmspawner_37.log + Power= +``` + +## Configuring and adding node information + +For each node create a `host_vars/.yaml` and omit any +fields if you want to use the default value. Suppose the following +configuration for `host_vars/hpc02-test.yaml`. + +```yaml +slurm_memory: 7976 # RealMemory (default 1024) +slurm_cpus: 4 # CPUs (default 1) +slurm_boards: 1 # Boards (default 1) +slurm_sockets_per_board: 4 # SocketsPerBoard (default 1) +slurm_cores_per_socket: 1 # CoresPerSocket (default 1) +slurm_threads_per_core: 1 # ThreadsPerCore (default 1) +``` + +Would result in the following slurm node configuration + +```init +# Nodes +NodeName=hpc02-test RealMemory=7976 CPUs=4 Boards=1 SocketsPerBoard=4 CoresPerSocket=1 ThreadsPerCore=1 State=UNKNOWN +``` + +You can get the detailed node specs via slurmd and can be used to +easily set the node configuration. The more accurately that you set +the node information for slurm the more accurately users can target +their programs on the hardware. + +```shell +slurmd -C +``` + +```shell +NodeName=hpc02-test CPUs=4 Boards=1 SocketsPerBoard=4 CoresPerSocket=1 ThreadsPerCore=1 RealMemory=7976 +UpTime=0-01:46:52 +``` + +If you have set an incorrect configuration, the nodes may enter a +DRAIN state with low cores*sockets*threads and memory error. You will +then need to modify the node state to IDLE once it is properly +configured. + +## Modifying Node State + +There are several common cases where one would need to manually change +the node state. All slurm management is done via the `sacct` and +`scontrol` command. 
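Before changing a node's state, it can help to see which nodes are currently down or drained and why; the standard `sinfo -R` command lists the reason Slurm recorded for each affected node.

```shell
# Show down, drained, or failing nodes along with the recorded reason and time
sinfo -R
```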
To change the state itself, use the `scontrol` command:

```shell
scontrol update nodename=<node name> state=IDLE
```

This is useful if you want to resume a node for operation.

## Node States

The full list of [node states](https://slurm.schedmd.com/sinfo.html#lbAG) is given in the sinfo documentation. Here we outline some of the common ones.

- ALLOCATED :: node is completely consumed
- MIXED :: node is partially consumed
- IDLE :: node is idle and has no running jobs
- DRAIN :: node is unable to schedule new jobs but running jobs will finish

## Adding Slurm Partitions

Partitions in Slurm can easily be created via Ansible groups. Any group whose name starts with `partition_` creates a partition. For example

```ini
[hpc_master]
hpc01-test

[hpc_worker]
hpc02-test
hpc03-test
hpc04-test

[partition_example]
hpc02-test
hpc04-test
```

will create the following Slurm partitions

```ini
# Partitions
PartitionName=general Nodes=hpc02-test,hpc03-test,hpc04-test Default=YES MaxTime=INFINITE State=UP
PartitionName=example Nodes=hpc02-test,hpc04-test Default=NO MaxTime=INFINITE State=UP
```
diff --git a/docs/sidebarsSlurm.js b/docs/sidebarsSlurm.js
index 9a6d1503f..8c381d50a 100644
--- a/docs/sidebarsSlurm.js
+++ b/docs/sidebarsSlurm.js
@@ -13,11 +13,46 @@
 /** @type {import('@docusaurus/plugin-content-docs').SidebarsConfig} */
 module.exports = {
-  sidebar: [
-    {
-      label: "Overview",
-      type: "doc",
-      id: "overview",
-    },
+  sidebar: [
+    {
+      label: "Overview",
+      type: "doc",
+      id: "overview",
+    },
+    {
+      label: "Installation",
+      type: "doc",
+      id: "installation",
+    },
+    {
+      label: "Configuration",
+      type: "doc",
+      id: "configuration",
+    },
+    {
+      label: "Benchmark",
+      type: "doc",
+      id: "benchmark"
+    },
+    {
+      label: "Slurm",
+      type: "doc",
+      id: "slurm"
+    },
+    {
+      label: "Development",
+      type: "doc",
+      id: "development"
+    },
+    {
+      label: "Comparison with Nebari",
+      type: "doc",
+      id: "comparison"
+    },
+    {
+      label: "FAQ",
+      type: "doc",
+      id: "faq"
+    },
   ],
-  }
+}