# Execution of tasks on remote host without shared file system

SoS allows you to execute tasks on remote hosts with their own file system. For example, you can execute a complex workflow mostly locally, but execute a few jobs on remote servers if they have more computing power, or if they have some software that cannot be installed locally.

With help from a few runtime options (options to `task`), SoS can

* Copy specified local files to the remote host, possibly to different directories
* Start a SoS task on the remote machine and wait for the completion of the task
* Copy results back from the remote host if the execution is successful


## System setup

### Set up public-key access to the remote host

Following any online tutorial, set up public-key access from your local machine to the remote host. If your public key does not work, check the file permissions of `.ssh`, the keys under `.ssh`, and, in some cases, `$HOME`. After setting up the server, make sure you can log in without a password using the command

```
% ssh remote-host
```
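
If key-based login still prompts for a password, overly permissive file modes are a common cause. The following is a minimal sketch, not part of SoS, that flags paths writable by group or other users. It is meant to be run on the remote host, and the list of paths checked is an assumption based on the locations named above.

```python
import os
import stat


def writable_by_others(path):
    """Return True if group or other users have write permission on `path`."""
    mode = os.stat(path).st_mode
    return bool(mode & (stat.S_IWGRP | stat.S_IWOTH))


def report(paths):
    """Return the existing paths in `paths` that look too permissive."""
    return [p for p in paths if os.path.exists(p) and writable_by_others(p)]


if __name__ == "__main__":
    home = os.path.expanduser("~")
    # Assumed locations: the home directory, ~/.ssh and ~/.ssh/authorized_keys.
    candidates = [
        home,
        os.path.join(home, ".ssh"),
        os.path.join(home, ".ssh", "authorized_keys"),
    ]
    for path in report(candidates):
        print(f"too permissive: {path}")
```

Any path it reports can usually be fixed with `chmod go-w`.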

### Install SoS and required software

You will need to install the latest version of SoS on the remote host (preferably the same version as on the local host), along with the software you need to run. Test the installation by logging in to the remote machine and running

```
% sos -h
```

### Make sure `$PATH` works in non-login mode

Commands that are available in a login shell are not necessarily available during remote execution, because remote execution through `ssh` invokes a non-interactive, non-login shell with a basic `$PATH`. SoS tries to address this problem by executing commands through a login shell

```
% ssh remote-host "bash --login -c 'sos-command-to-run'"
```

but the default `.bashrc` on the remote server might contain a line like

```
[ -z "$PS1" ] && return
```

that makes `.bashrc` stop when `bash` is not running interactively, so any `$PATH` settings after it are skipped. This line has to be removed for remote execution to work.

Now, run the command

```
% ssh remote-host "bash --login -c 'sos -h'"
```

from your local machine and see if `sos` can be invoked. Similarly, check if the command you would like to execute remotely can be executed in this way.
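
The same check can be scripted. The sketch below only shows how such a command line can be assembled with correct quoting; the function name `login_shell_command` is hypothetical and this is not how SoS itself is implemented.

```python
import shlex


def login_shell_command(host, command):
    """Build an argv list that runs `command` on `host` in a bash login shell."""
    # Quote the command so that it reaches `bash -c` as a single argument.
    remote = "bash --login -c " + shlex.quote(command)
    return ["ssh", host, remote]


if __name__ == "__main__":
    argv = login_shell_command("remote-host", "sos -h")
    print(argv)
    # To actually run it: subprocess.run(argv, check=True)
```

Passing the list to `subprocess.run` avoids a second layer of local shell quoting.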

### Set up a local configuration file

This step is optional but highly recommended. You can save the necessary information for each remote host in the SoS configuration file so that you do not have to specify it each time.

First, give your host an alias so that you do not have to specify the long URL each time. To do this, execute the command

```
% sos config --global --set hosts.monster.address dcdrlue8ee.yourdomain.com
```

where `monster` is a short alias and `dcdrlue8ee.yourdomain.com` is the complete address.

This command writes the following entry to `~/.sos/config.yml`

```
$ cat ~/.sos/config.yml
hosts:
  monster:
    address: dcdrlue8ee.yourdomain.com
```

You can edit this file directly if you are familiar with the YAML format.

If your account names differ between the local and remote hosts, the address should take the form `username@address`, for example `john@dcdrlue8ee.yourdomain.com` if the remote account is `john`.

### Set up `path_map`

`path_map` is a list of mappings between local and remote directories. For example, if you work locally on a Mac with home directory `/Users/myuser`, and the remote server is a Linux machine with home directory `/home/myuser`, you should define a `path_map` using the command

```
% sos config --global --set hosts.monster.path_map /Users/myuser/:/home/myuser/
```

In this way, if the local data is `/Users/myuser/projects/input.fastq`, the path will be translated to `/home/myuser/projects/input.fastq` during remote execution.

In more complicated cases, where local directories map to different remote locations, you can specify more than one mapping. For example, if you have directories under different volumes, you can map them separately using

```
% sos config --global --set hosts.monster.path_map \
    /Users/myuser/projects/:/home/myuser/scratch/projects/  \
    /Volumes/Resource:/home/myuser/resource \
    /Users/myuser:/home/myuser
```

This command will result in 

```
$ cat ~/.sos/config.yml
hosts:
  monster:
    address: dcdrlue8ee.yourdomain.com
    path_map:
    - /Users/myuser/projects/:/home/myuser/scratch/projects/
    - /Volumes/Resource:/home/myuser/resource
    - /Users/myuser:/home/myuser
```

and will map your local directories as follows

```
~/projects/data ==> /home/myuser/scratch/projects/data
/Volumes/Resource/hg19.fasta ==> /home/myuser/resource/hg19.fasta
~/myscript.py ==> /home/myuser/myscript.py
```

Note that

1. SoS expands local directories to absolute paths before applying `path_map`.
2. SoS applies path maps in the order in which they are specified, so you should list general mappings after more specific ones.
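
The two notes above can be illustrated with a short sketch. This is an assumption about the behaviour described here, not SoS's actual implementation: the helper `map_path` is hypothetical, it matches whole directory prefixes, and the first matching entry wins.

```python
import os


def map_path(path, path_map):
    """Translate a local path using a list of 'local:remote' entries, in order."""
    # Expand `~` and make the path absolute before any mapping is tried.
    local = os.path.abspath(os.path.expanduser(path))
    for entry in path_map:
        src, dst = entry.split(":", 1)
        src, dst = src.rstrip("/"), dst.rstrip("/")
        # Match on a directory boundary so /Users/myuser does not match /Users/myuser2.
        if local == src or local.startswith(src + "/"):
            return dst + local[len(src):]
    return local  # no entry applies, so the path is left unchanged


path_map = [
    "/Users/myuser/projects/:/home/myuser/scratch/projects/",
    "/Volumes/Resource:/home/myuser/resource",
    "/Users/myuser:/home/myuser",
]

if __name__ == "__main__":
    print(map_path("/Users/myuser/projects/data", path_map))
    print(map_path("/Volumes/Resource/hg19.fasta", path_map))
```

If the general `/Users/myuser` entry were listed first, `/Users/myuser/projects/data` would map to `/home/myuser/projects/data` instead of the `scratch` directory, which is why specific mappings should come first.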


## Copying files back and forth (`task` options `to_host` and `from_host`)

Now that you have your machine configured, you should try to copy some files and check that the transfer works correctly. File copy is specified with options `to_host` and `from_host`. You can test these options using a simple SoS step such as the following (replace the filenames with files you have, of course):

```
[1]
task: on_host='monster',
    to_host=['~/projects/data/test1', '/Volumes/Resource/hg19.fasta'],
    from_host='~/projects/data/test1.res'
run:
    echo "Hello, World"
```

Note that:

1. Option `on_host` specifies the host to connect to, and allows SoS to retrieve `path_map` from the configuration file.
2. Option `to_host` specifies local files or directories that will be copied to the remote host (using `rsync`).
3. Option `from_host` specifies **local files** that need to be copied back from the remote host. SoS will use `path_map` to determine the path of the corresponding remote file to be copied.
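
To make the direction of each copy concrete, here is a rough sketch of the `rsync` invocations these options imply. The helper names, the single-prefix mapping and the `-a` flag are assumptions for illustration; run SoS with `-v3` to see the commands it really issues.

```python
def to_remote(path, local_root, remote_root):
    """Hypothetical helper: swap one local prefix for its remote counterpart."""
    if not path.startswith(local_root):
        raise ValueError(f"{path} is not under {local_root}")
    return remote_root + path[len(local_root):]


def transfer_commands(host, to_host, from_host, local_root, remote_root):
    """Return (uploads, downloads), each a list of rsync argv lists."""
    uploads = [
        # local file -> mapped path on the remote host
        ["rsync", "-a", path, f"{host}:{to_remote(path, local_root, remote_root)}"]
        for path in to_host
    ]
    downloads = [
        # mapped path on the remote host -> the local path named in from_host
        ["rsync", "-a", f"{host}:{to_remote(path, local_root, remote_root)}", path]
        for path in from_host
    ]
    return uploads, downloads


if __name__ == "__main__":
    up, down = transfer_commands(
        "monster",
        to_host=["/Users/myuser/projects/data/test1"],
        from_host=["/Users/myuser/projects/data/test1.res"],
        local_root="/Users/myuser",
        remote_root="/home/myuser",
    )
    for argv in up + down:
        print(" ".join(argv))
```

Note that both lists take local paths as input; only the remote side of each command is derived from the mapping.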

## Translate your task to be executed on remote host (`task` option `map_vars`)

Each task has an `environment` that contains the variables used to, for example, compose the scripts to be executed. Even if the task will be executed remotely, you should write your task as if it were executed locally. For example, you might have a script that generates a `STAR` index from a FASTA file. You should have all these files available locally and write the task as:

```
depends:      hg19_fasta
run:
    STAR \
        --runThreadN 8 \
        --runMode genomeGenerate \
        --genomeDir ${hg19_star_index} \
        --genomeFastaFiles ${hg19_fasta} \
        --sjdbGTFfile ${hg19_genes_gtf} \
        --sjdbOverhang 100
```

Now, to make the script execute remotely, you should tell SoS which variables need to be translated to remote paths. This can be done using option `map_vars`:


```
depends:      hg19_fasta

task:     on_host='monster', map_vars=['hg19_fasta', 'hg19_star_index', 'hg19_genes_gtf']
run:
    STAR \
        --runThreadN 8 \
        --runMode genomeGenerate \
        --genomeDir ${hg19_star_index} \
        --genomeFastaFiles ${hg19_fasta} \
        --sjdbGTFfile ${hg19_genes_gtf} \
        --sjdbOverhang 100
```

This option **changes the runtime environment** of the task: the listed variables are rewritten to remote paths, so that the script is composed with values that are valid on the remote server.

Note that 

1. Variables such as `_input`, `input`, and `_output` are translated automatically, so you only need to specify non-system variables.
2. You can hard-code paths on the remote host, but that will make your script host-dependent. It is therefore highly recommended that you **write your script using local paths** and let SoS do the conversion, so that you do not have to change the script itself when switching to another host with a different `path_map`.
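
Conceptually, `map_vars` rewrites selected entries of the task environment before the script is composed. The sketch below illustrates that idea with a plain dictionary and a single prefix mapping. The function `translate_vars` and the example paths are hypothetical, and SoS's real environment handling is more involved.

```python
def translate_vars(env, map_vars, local_root, remote_root):
    """Return a copy of `env` in which the named variables hold remote paths."""
    remote_env = dict(env)  # leave the local environment untouched
    for name in map_vars:
        value = env[name]
        if value.startswith(local_root):
            remote_env[name] = remote_root + value[len(local_root):]
    return remote_env


if __name__ == "__main__":
    env = {
        "hg19_fasta": "/Users/myuser/resource/hg19.fasta",
        "hg19_star_index": "/Users/myuser/resource/STAR_index",
        "notes": "/Users/myuser/notes.txt",  # not listed in map_vars, so unchanged
    }
    remote = translate_vars(
        env, ["hg19_fasta", "hg19_star_index"], "/Users/myuser", "/home/myuser"
    )
    for name, value in remote.items():
        print(name, "=", value)
```

Only the variables named in `map_vars` change, which is why any path the script uses on the remote host has to be listed explicitly.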

## Running task remotely

With all the pieces put together, you can now execute the task on the remote host using the following `task` options

```sos
depends:      hg19_fasta

task:     on_host='monster', map_vars=['hg19_fasta', 'hg19_star_index', 'hg19_genes_gtf'],
          to_host=[hg19_fasta, hg19_genes_gtf], from_host=hg19_star_index

run:
    STAR \
        --runThreadN 8 \
        --runMode genomeGenerate \
        --genomeDir ${hg19_star_index} \
        --genomeFastaFiles ${hg19_fasta} \
        --sjdbGTFfile ${hg19_genes_gtf} \
        --sjdbOverhang 100
```

## Advanced usage

### `path_map` option

You can also specify `path_map` as a `task` option. This keeps everything explicit in the script, so you do not have to rely on a configuration file. For example,

```
task:     on_host='dcdrlue8ee.yourdomain.com', path_map='/Users/myuser:/home/myuser',
          other_options...
```

### `send_cmd`, `received_cmd`, `execute_cmd`

SoS uses the `rsync` command to exchange files between hosts and uses `ssh` to execute commands. If the default commands do not work for your configuration (e.g. if you do not have `rsync` and need to use `scp`), you can

1. Use option `-v3` to display the exact commands used to transfer files and execute commands.

2. Define options `send_cmd`, `received_cmd` and `execute_cmd` for your particular configuration. These commands should be written with the placeholders `${source}` and `${dest}`, which will be replaced by the source and destination filenames for each file.
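
The placeholder substitution can be previewed with Python's standard `string.Template`, which happens to use the same `${name}` syntax. This is only a way to check what a template expands to: the `scp` template below is a hypothetical example, and SoS's own interpolation may behave differently.

```python
from string import Template


def render_command(template, source, dest):
    """Fill the ${source} and ${dest} placeholders of a command template."""
    return Template(template).substitute(source=source, dest=dest)


# Hypothetical scp-based replacement for the default rsync transfer.
send_template = "scp ${source} monster:${dest}"

if __name__ == "__main__":
    print(render_command(
        send_template,
        "/Users/myuser/projects/data/test1",
        "/home/myuser/projects/data/test1",
    ))
```

`substitute` raises `KeyError` if the template contains a placeholder other than `source` or `dest`, which makes typos in a template easy to spot.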

