# cloudos-cli training

## Repository and documentation

Repository link: https://github.com/lifebit-ai/cloudos-cli (public repository).
Available documentation:
   - Repository documentation: https://github.com/lifebit-ai/cloudos-cli/blob/main/README.md


## Installation

The package requires:
- Python >= 3.7
- click >= 8.0.1
- pandas >= 1.3.4
- numpy==1.26.4
- requests >= 2.26.0
- pip

Clone the repository and install it using pip:

In [None]:
git clone https://github.com/lifebit-ai/cloudos-cli
cd cloudos-cli
pip install -r requirements.txt
pip install .
cd ..
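
Before installing, it can be useful to confirm that the local interpreter meets the minimum Python version listed above. The following is a minimal sketch, assuming the interpreter is exposed as `python3`; the `PY_OK` variable is only illustrative and is not used by `cloudos-cli`:

```shell
# Check that python3 exists and is at least version 3.7.
# If python3 is missing, the command fails and PY_OK is set to "no".
if python3 -c 'import sys; sys.exit(0 if sys.version_info >= (3, 7) else 1)' 2>/dev/null; then
    PY_OK="yes"
else
    PY_OK="no"
fi
echo "Python >= 3.7 available: $PY_OK"
```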

### Recommended alternative: Docker image

Instead of installing from the GitHub repository, we recommend using the available Docker image. You can check the latest version at https://github.com/lifebit-ai/cloudos-cli/releases or simply use the `latest` tag.

`docker run --rm -it quay.io/lifebitaiorg/cloudos-cli:latest` (currently equivalent to `docker run --rm -it quay.io/lifebitaiorg/cloudos-cli:v2.13.0`)

You can check the current version using:

In [None]:
cloudos --version

## CloudOS required variables

Running `cloudos-cli` usually requires some values from the CloudOS UI:
- CloudOS URL: https://cloudos.lifebit.ai
- Workspace ID: 5c6d3e9bd954e800b23f8c62
- API key: xxx (you first need to generate it from the UI)
> NOTE: Please change these values according to your CloudOS workspace.

Additionally, to use the job functionality you normally also need:
- Project name: an existing project from the "Projects" CloudOS section.
- Workflow name: an available pipeline from the "Pipelines & Tools" CloudOS section.
> NOTE: currently, `cloudos-cli` only supports the execution of Nextflow and WDL pipelines.

We can set them as bash variables to reuse them in several `cloudos-cli` calls:

In [None]:
# Please, change the CloudOS URL and workspace ID according to your version of CloudOS
CLOUDOS="https://cloudos.lifebit.ai"
WORKSPACE_ID="5c6d3e9bd954e800b23f8c62"
APIKEY="xxx"
PROJECT="cloudos-cli-training"
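
Because every later command depends on these variables, it can help to fail early when one of them is empty. The following is a minimal sketch; `check_cloudos_vars` is a hypothetical helper written for this training and is not part of `cloudos-cli`:

```shell
# Hypothetical helper (not part of cloudos-cli): report any required variable
# that is unset or empty, and return non-zero if at least one is missing.
check_cloudos_vars() {
    missing=0
    for name in CLOUDOS WORKSPACE_ID APIKEY PROJECT; do
        # Indirect lookup of the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "Missing required variable: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Only report success when every variable is set.
check_cloudos_vars && echo "All CloudOS variables are set"
```

Note that this only checks that the variables are non-empty; it does not verify that the API key is valid.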

We can test our credentials by running a simple command to list all the available projects in the workspace:

In [None]:
cloudos project list \
    --cloudos-url $CLOUDOS \
    --apikey $APIKEY \
    --workspace-id $WORKSPACE_ID

## Preview of cloudos-cli features

Currently, `cloudos-cli` includes the following modules:
- **job**: CloudOS job functionality: run and check jobs in CloudOS.
- **cromwell**: Cromwell server functionality: check status, start and stop.
- **workflow**: CloudOS workflow functionality: list workflows in CloudOS.
- **project**: CloudOS project functionality: list projects in CloudOS.
- **queue**: CloudOS job queue functionality.

You can get general help using the `--help` option:

In [None]:
cloudos --help

You can also get module-specific help, with a description of all the available parameters, by using `--help` on each module and submodule. For example:

In [None]:
cloudos job --help

In [None]:
cloudos job run --help

For a more detailed explanation of all the available features, please check the official documentation at: https://github.com/lifebit-ai/cloudos-cli/blob/main/README.md

## cloudos-cli test case 1: launch and monitor a Nextflow job

In this first test case, we will launch a job and check its status using the following pipeline: "Cufflinks pipeline".
We will use the following example parameters for this pipeline:
```
--reads "s3://gel-lifebit-featured-datasets/pipelines/rnatoy-data"
--genome "s3://gel-lifebit-featured-datasets/pipelines/rnatoy-data/ggal_1_48850000_49020000.Ggal71.500bpflank.fa"
--annot "s3://gel-lifebit-featured-datasets/pipelines/rnatoy-data/ggal_1_48850000_49020000.bed.gff"
```
When running with the AWS Batch executor, there is an optional parameter:
- `--job-queue` (optional): the name of the job queue to use. If no valid queue is provided, `cloudos-cli` will use the workspace default queue.

To list all available job queues in your workspace you can use:

In [None]:
cloudos queue list \
    --cloudos-url $CLOUDOS \
    --apikey $APIKEY \
    --workspace-id $WORKSPACE_ID

In [None]:
cat job_queue_list.csv

> Note: the job queue name that is visible in CloudOS, and that has to be used with the `--job-queue` parameter, is the one in the `label` field.
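
Since `--job-queue` expects the value of the `label` field, a small helper can list just that column. This is a minimal sketch, assuming the file is a plain comma-separated table whose header row contains a `label` column and whose fields contain no embedded commas; `queue_labels` is a hypothetical helper, not a `cloudos-cli` command:

```shell
# Hypothetical helper: print the values of the "label" column of a CSV file.
queue_labels() {
    awk -F',' '
        # Header row: remember which column is named "label".
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == "label") col = i; next }
        # Data rows: print that column, if it was found.
        col { print $col }
    ' "$1"
}

# Only run against the file produced above if it exists.
if [ -f job_queue_list.csv ]; then
    queue_labels job_queue_list.csv
fi
```

If the real file uses quoted fields with commas inside them, a proper CSV parser would be a safer choice than `awk`.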

A typical command to launch a Nextflow job like this using `cloudos-cli` would be:

In [None]:
cloudos job run \
    --cloudos-url $CLOUDOS \
    --apikey $APIKEY \
    --workspace-id $WORKSPACE_ID \
    --project-name $PROJECT \
    --job-name "Cufflinks-test" \
    --workflow-name "Cufflinks pipeline" \
    --parameter "reads=s3://gel-lifebit-featured-datasets/pipelines/rnatoy-data" \
    --parameter "genome=s3://gel-lifebit-featured-datasets/pipelines/rnatoy-data/ggal_1_48850000_49020000.Ggal71.500bpflank.fa" \
    --parameter "annot=s3://gel-lifebit-featured-datasets/pipelines/rnatoy-data/ggal_1_48850000_49020000.bed.gff" \
    --job-queue "job_queue_nextflow"
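
Each pipeline option shown earlier (for example `--reads <value>`) is passed to `cloudos-cli` as a `--parameter "reads=<value>"` flag. The mapping can be previewed without submitting anything, as in the sketch below; `param_flags` and `DATA` are hypothetical names introduced only for this illustration:

```shell
# Hypothetical helper (not part of cloudos-cli): turn key=value pairs into
# repeated --parameter flags. Values containing double quotes are not handled.
param_flags() {
    for kv in "$@"; do
        printf ' --parameter "%s"' "$kv"
    done
}

DATA="s3://gel-lifebit-featured-datasets/pipelines/rnatoy-data"

PARAMS=$(param_flags \
    "reads=$DATA" \
    "genome=$DATA/ggal_1_48850000_49020000.Ggal71.500bpflank.fa" \
    "annot=$DATA/ggal_1_48850000_49020000.bed.gff")

# Print the flags for inspection only; nothing is sent to CloudOS.
echo "Parameter flags:$PARAMS"
```

Keeping the shared S3 prefix in one variable makes it harder to mistype one of the three paths.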

We can check the status of our submitted job using the command suggested in the output of the previous run:

In [None]:
cloudos job status \
        --apikey $APIKEY \
        --cloudos-url $CLOUDOS \
        --job-id 645a52dbb60a3fd7b2884d7f

### Extra option: wait for job completion

If we want to avoid constantly checking the job status, we can use the `--wait-completion` flag when launching the job. With this flag, `cloudos-cli` will keep reporting the job status until the job completes.

```
# NOTE: this command can take more than 10 min to complete
cloudos job run \
    --cloudos-url $CLOUDOS \
    --apikey $APIKEY \
    --workspace-id $WORKSPACE_ID \
    --project-name $PROJECT \
    --job-name "Cufflinks-test-wait-completion" \
    --workflow-name "Cufflinks pipeline" \
    --parameter "reads=s3://gel-lifebit-featured-datasets/pipelines/rnatoy-data" \
    --parameter "genome=s3://gel-lifebit-featured-datasets/pipelines/rnatoy-data/ggal_1_48850000_49020000.Ggal71.500bpflank.fa" \
    --parameter "annot=s3://gel-lifebit-featured-datasets/pipelines/rnatoy-data/ggal_1_48850000_49020000.bed.gff" \
    --job-queue "job_queue_nextflow" \
    --wait-completion
```

> NOTE: this command is not actually executed in this session to avoid waiting > 10 min until job completion.

## cloudos-cli test case 2: launch and monitor a WDL job

In this second test case, we will launch a WDL pipeline job: "wdl-tests". The main difference is that it requires a running Cromwell server in CloudOS. `cloudos-cli` can manage this automatically, so the job launch command looks very similar to the previous one. Another important difference is that for WDL pipelines you should specify the main file with `--wdl-mainfile <mainfile>` and, if required, an imports file with `--wdl-importsfile <importsfile>`.
For this example, we will use the example job parameters provided with the `cloudos-cli` repository:

In [None]:
cat cloudos-cli/cloudos/examples/wdl.config

In [None]:
cloudos job run \
    --cloudos-url $CLOUDOS \
    --apikey $APIKEY \
    --workspace-id $WORKSPACE_ID \
    --project-name $PROJECT \
    --job-name "WDL-test" \
    --workflow-name "member-created-wdl" \
    --wdl-mainfile "hello.wdl" \
    --wdl-importsfile "imports.zip" \
    --job-config "cloudos-cli/cloudos/examples/wdl.config"


Again, we can also check the job status using `cloudos-cli`:

In [None]:
cloudos job status \
        --apikey $APIKEY \
        --cloudos-url $CLOUDOS \
        --job-id 645a52e0b60a3fd7b2884f67

When your job has completed, you can stop the Cromwell server using the following command:

In [None]:
cloudos cromwell stop \
    --cloudos-url $CLOUDOS \
    --apikey $APIKEY \
    --workspace-id $WORKSPACE_ID