**⚠️This repo is still under ACTIVE development.**

## Python Version Compatibility
Requires Python version `>= 3.9`.

## Huh?
**Slurm** is a robust open-source workload manager designed for high-performance computing clusters. It efficiently allocates resources, manages job submissions, and optimizes task execution. With commands like `sbatch` and `squeue`, Slurm provides a flexible and scalable solution for seamless task control and monitoring, making it a preferred choice in academic and research settings. Various research centers and universities have unique names for their Slurm clusters. At the University of Queensland, our clusters go by the distinctive name "Bunya."
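As an illustration of the kind of output these tools work with, here is a minimal, hypothetical sketch of turning `squeue`'s default tabular output into Python dicts. The sample text stands in for a real call on a cluster login node, and the job details in it are made up; the column layout matches `squeue`'s default format.

```python
# Sample output mimicking squeue's default columns
# (JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)).
SAMPLE_SQUEUE = """\
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
            123456       gpu    train    alice  R    1:02:03      1 bun042
            123457       gpu     eval    alice PD       0:00      1 (Priority)
"""

def parse_squeue(output: str) -> list[dict]:
    """Turn squeue's default tabular output into a list of job dicts."""
    lines = output.strip().splitlines()
    header = lines[0].split()
    jobs = []
    for line in lines[1:]:
        # Split into exactly as many fields as there are header columns,
        # so a NODELIST(REASON) value containing spaces stays intact.
        fields = line.split(None, len(header) - 1)
        jobs.append(dict(zip(header, fields)))
    return jobs

jobs = parse_squeue(SAMPLE_SQUEUE)
running = [j for j in jobs if j["ST"] == "R"]
print(f"{len(jobs)} job(s), {len(running)} running")  # 2 job(s), 1 running
```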

## SlurmWatch

Introducing **SlurmWatch** - a tool meticulously crafted for effortless monitoring of sbatch jobs. Say goodbye to uncertainties; experience prompt notifications, ensuring you stay informed and in control.

### Current Capabilities

- monitor the signed-in user's own Slurm job(s) -> `src/my_jobs.py`
- monitor multiple users' Slurm GPU job(s) -> `src/gpu_jobs.py`
- monitor resource (GPU) usage of multiple FileSet(s) -> `src/quota.py`
- monitor resource (node) availability -> `src/available_nodes.py`
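As a rough illustration of the node-availability idea, here is a hypothetical sketch (not the actual `src/available_nodes.py`) that counts idle nodes per partition from `sinfo`-style output. The sample text and partition names are made up; the layout mimics `sinfo`'s default summary columns.

```python
# Sample output mimicking sinfo's default columns
# (PARTITION AVAIL TIMELIMIT NODES STATE NODELIST).
SAMPLE_SINFO = """\
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
general*     up 14-00:00:0     12   idle bun[001-012]
general*     up 14-00:00:0      4  alloc bun[013-016]
gpu          up  7-00:00:0      2   idle bun[042-043]
"""

def idle_nodes_by_partition(output: str) -> dict[str, int]:
    """Sum the NODES column for rows whose STATE is 'idle'."""
    counts: dict[str, int] = {}
    for line in output.strip().splitlines()[1:]:
        partition, _avail, _limit, nodes, state, _nodelist = line.split()
        if state == "idle":
            counts[partition] = counts.get(partition, 0) + int(nodes)
    return counts

print(idle_nodes_by_partition(SAMPLE_SINFO))  # {'general*': 12, 'gpu': 2}
```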

### Scheduling

- For the moment, you can fork this repo, or just clone it and use crontab to run `monitor.py`
- Follow the `dot_env_template` to create your own `.env` file
- then run `crontab -e`
- and add an entry of the form `* * * * * your-python-path complete-file-path-to-monitor.py`
- for example, `* * * * * ~/anaconda3/bin/python /scratch/user/your-username/SlurmWatch/src/quota.py` runs the quota monitor at a 1 minute interval
- to choose a different schedule, check this helpful [crontab expression page](https://www.atatus.com/tools/cron)
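Before pasting a schedule into `crontab -e`, it can help to sanity-check the five-field expression. The helper below is purely illustrative (it is not part of SlurmWatch) and only covers the common `*`, number, range, and step forms.

```python
import re

# One crontab field: "*", a number, or a range, optionally with a "/step",
# and comma-separated combinations of those.
FIELD = re.compile(r"^(\*|\d+(-\d+)?)(/\d+)?(,(\*|\d+(-\d+)?)(/\d+)?)*$")

def looks_like_cron(expr: str) -> bool:
    """Rough check: five whitespace-separated crontab schedule fields."""
    fields = expr.split()
    return len(fields) == 5 and all(FIELD.match(f) for f in fields)

print(looks_like_cron("* * * * *"))    # True  (every minute)
print(looks_like_cron("*/5 * * * *"))  # True  (every 5 minutes)
print(looks_like_cron("every minute")) # False
```

Note this only checks the schedule part; the full crontab line also needs the command (interpreter path plus script path) after the five fields.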

### Integration

#### Slack

- follow the [slack webhook tutorial](https://api.slack.com/messaging/webhooks) to create a Slack app for your workspace and add it to the appropriate channels
- remember to replace the webhook URL in your `.env` file with your own
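For reference, here is a minimal stdlib-only sketch of posting to an incoming webhook. It assumes the URL is exported from `.env` as an environment variable named `SLACK_WEBHOOK_URL` (the variable name is an assumption; match whatever `dot_env_template` uses). Incoming webhooks accept a JSON body with a `"text"` field.

```python
import json
import os
import urllib.request

def build_payload(message: str) -> bytes:
    """Encode a message as the JSON body Slack incoming webhooks expect."""
    return json.dumps({"text": message}).encode("utf-8")

def notify(message: str) -> None:
    """POST the message to the webhook configured in the environment."""
    url = os.environ["SLACK_WEBHOOK_URL"]  # assumed variable name, loaded from .env
    req = urllib.request.Request(
        url,
        data=build_payload(message),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)

print(build_payload("job 123456 finished").decode())  # {"text": "job 123456 finished"}
```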

### Future Features & Integrations

- notification when a job's status changes
- capability to monitor multiple users' jobs instead of only the signed-in user
- flexible configuration
- a debug mode

Currently, the future integrations under consideration are:
- email