<a href="https://colab.research.google.com/github/rosafilgueira/Workflows_Seminar/blob/main/cwl_tutorial_2024.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### 1.1 - <font color='#097969'>Opening Statement</font>


**Purpose**

This tutorial demonstrates how to write workflows using a workflow language. It introduces workflow theory, shows how to implement workflows in the Common Workflow Language (CWL), and demonstrates how to execute them in a compute environment. By the end of the session, participants will be able to write and execute basic workflows in CWL. The skills learned transfer readily to other workflow languages (e.g. Nextflow) that participants may want to learn in the future.







**Goals**

The workshop is split into 2 sections: *Tools* and *Workflows*.

During the ***tools*** section, we will learn the mechanics of CWL. <br>
We will learn how to write YAML, and how tools are defined in CWL. <br>
As we go, we will write a CWL tool wrapper for the `cutadapt` software.

During the ***workflows*** section, we will extend our knowledge to workflows.  <br>

By the end of this tutorial, you will be able to:

- Understand the main components of a workflow
- Understand the main features of CWL
- Write basic workflows in CWL
- Run CWL workflows using an execution engine
- Have the skills to continue your own learning

**Intended Knowledge**

 This tutorial will involve writing workflows using CWL and running command line tools.

The following is highly recommended:

- Basic familiarity with a programming language (e.g. Python, R, JavaScript)


### 1.2 - <font color='#097969'>Google Colab</font>

Workflows can be developed using any IDE or environment you are familiar with. Visual Studio Code (VS Code) is a common choice for workflow development because it is popular and has plugins for various workflow languages.

Today, we are using Google Colab.
[Colab](https://research.google.com/colaboratory/faq.html) is a popular service which provides an [interactive notebook](https://jupyter.org/) and a [runtime](https://research.google.com/colaboratory/faq.html#executed-code).
The notebook allows us to write text and code, while the runtime provides some compute resources to execute our code. It is a fantastic resource for learning so we will use this environment today.

Workflows are usually run from a command line interface (CLI). In this notebook, some helper code ("magic") lets us run CWL directly from cells to reduce complexity.

**The notebook (main window)** allows you to read instructions, and write / run CWL.  
This is where we will spend most of our time.

**The sidebar** allows you to perform basic tasks. You can view the notebook outline, search for text, or manipulate files in our local directory. Click the Explorer button to see which files are in this repository. If a file is opened, it will appear as a new panel on the right of the notebook.

**Runtime** Colab provides you with a free VM environment which runs your code in the background. <br>It has 1-2 CPU cores, a few GB of memory, and 100 GB of disk. These are more than adequate for today. The runtime will time out after 90 minutes of inactivity.  

**Files can be saved** to your local machine from the file browser (sidebar -> files). <br>
Hover over a file, click on the three dots `⋮` then select `Download`.



<br>

**Fixing Mistakes**

If you make a mistake, that's OK!

To undo editing a CWL cell, `ctrl + z` works as normal. <br>
Make sure you're actively editing the cell - double click on it if needed.

If you deleted a cell from the notebook, `ctrl + m + z` should do the trick.

If things aren't recoverable, refreshing the page and reconnecting to a new runtime should reset everything. <br>
In this case, make sure you re-acquire the data and re-install the software environment (see the [setup section](#setup)).


<br>

**CWL Cells**

This material will ask the user to write and run CWL. <br>
When running CWL, user inputs may be needed.



<br>

**Editing CWL Cells**

You can edit a cell containing CWL code by `double-clicking`.<br>
Once you are done, `click away` or `shift + enter` to stop editing. <br>
Make sure you keep the three backticks \`\`\` at the top and bottom of the cell!


<figure>
<img src="https://raw.githubusercontent.com/GraceAHall/cwl-workshop/main/media/colab3.gif" width="600">
<figcaption>Fig 1. gif illustrating how to edit a CWL cell. </figcaption>
</figure>


<br>

**Running CWL Cells**



To run a CWL code cell, press the ▶ button below.

<figure>
<img src="https://raw.githubusercontent.com/GraceAHall/cwl-workshop/main/media/colab1.gif" width="600">
<figcaption>Fig 2. gif illustrating how to run a CWL cell. </figcaption>
</figure>


<br>

**Input Value Form**



The values for inputs can also be changed. <br>This will be used to test our CWL code given different inputs.

<figure>
<img src="https://raw.githubusercontent.com/GraceAHall/cwl-workshop/main/media/colab2.gif" width="600">
<figcaption>Fig 3. gif illustrating how to change form values. </figcaption>
</figure>


If **Show Code** was clicked on a CWL cell run form, it can be hidden again using *right click -> form -> hide code*

<figure>
<img src="https://raw.githubusercontent.com/GraceAHall/cwl-workshop/main/media/colab5.gif" width="600">
<figcaption>Fig 4. gif illustrating how to hide form code. </figcaption>
</figure>


<br>


### 1.3 - <font color='#097969'>Background</font>


Computational workflows have become a core aspect of scientific research. Most researchers will need to run a repeatable analysis involving multiple software tools at some stage in their career. For this reason, Research Software Engineer (RSE) skills involving developing, executing, or maintaining workflows are becoming increasingly important.  



***History***

Workflows are not new!

They can be written in almost any computer language. <br>
For example, a workflow can be written using a shell script, or Python. <br>
The core ingredient of what makes a workflow is simply running multiple programs to provide a more complete analysis.

Today, there are more than 150 workflow systems designed to execute workflows.
They are continually evolving as new data processing and internet technologies arise.

In the last decade, <font color="red">workflow languages & systems</font> have gained increasing popularity. <br>
The choice of language / system depends on multiple factors: whether a UI or code is preferred, which software programs are executed in the workflow, the hardware required, and even the privacy requirements of the data.

Galaxy, CWL, WDL and Nextflow are currently very popular.  While Galaxy is the easiest system to get started with, it is not suitable for all workflows and all data. In these cases, researchers may need to use a dedicated workflow language; for example, CWL.






***Tools and Workflows***

<font color="red">Workflows</font> are computational pipelines designed to perform analysis. <br> They accept input data, do some processing, then return outputs. <br>
For processing, workflows use ***tools*** to perform the tasks needed.

<font color="red">Tools</font> perform a specific task in a workflow. <br>
An example in bioinformatics is [read alignment](https://www.ebi.ac.uk/training/online/courses/functional-genomics-ii-common-technologies-and-data-analysis-methods/rna-sequencing/performing-a-rna-seq-experiment/data-analysis/read-mapping-or-alignment/) to a reference genome. <br>
Tools accept inputs, run the software required, then return outputs back to the workflow.


<figure>
<img src="https://raw.githubusercontent.com/GraceAHall/cwl-workshop/main/media/tools_and_workflows.png" width="700px">
<figcaption>Fig 5. Relationship between tools, workflows, and workflow languages. </figcaption>
</figure>



***Workflow Languages & Workflow Systems***

A well written workflow should be easy for someone else to pick up and run without detailed instruction. <br>
This is harder than it sounds! Different RSEs write code differently, and workflows can be tricky to execute when you consider the filesystem, software environment, and running on different machines. The goal of workflow languages & workflow systems is to solve these issues.

<font color="red">Workflow languages</font> are specially designed for defining computational workflows.<br> They provide a standardised syntax to define workflow components and their behaviour, as well as the movement of data through the pipeline.

Workflow languages strive to achieve two goals: simplicity & expressiveness.
In terms of simplicity - workflows should be easy to read and write. This is so the development process is smooth, and so others can read & understand your code. Expressiveness refers to the flexibility of the language - in short, can users create the exact workflow they have in mind? If not, they may be turned away.

<font color="red">Workflow systems</font> simplify the execution of workflows.
They handle many complex tasks associated with workflows automatically. For example, they can manage the filesystem and software environment while executing tools and workflows, and can even cache intermediate results so workflows can be resumed later in the case of an error or crash.

Due to the above, workflow languages & their systems have become a fantastic way to deliver FAIR (Findable, Accessible, Interoperable, Reusable) research.  RSEs can more easily ***share*** workflow definitions because of the simplified & structured language. The finished workflows are more ***accessible*** for others due to the workflow system handling execution, and the recent adoption of containerisation and cloud computing features. Together, these properties make workflow languages & systems a great choice when developing computational pipelines.












***In This Training Material...***

This training material aims to demonstrate how CWL tools and workflows are written.

In the first section, we will learn how to write CWL tools. <br>

In the second section, we will focus on CWL workflows. <br>


<br>

---

# 2 - Setup

Run each of the following cells. <br>
These are needed to get the input data, and set up the software environment. <br>
If your ***runtime disconnects*** at any point, you may need to re-run these.







step 1: runtime

- To execute code, make sure your runtime is connected.<br>
- In the top right hand corner of the screen, click `Connect` to acquire a runtime.  <br>
- Once connected, you should see a RAM and DISK gauge.

step 2: data + software  (below)

- Once you have a runtime, we need to *get the data* and *install some software*.
- The data is input data we will use to test our tool and workflow definitions. <br>
It is cloned from a github repository.
- The software is required to run CWL and execute the tools in our workflow. <br>We will need `java + cwltool` to run CWL, and will need `cutadapt`, `bwa`, `samtools` and `freebayes` to execute our pipeline.


In [None]:
#@title Get Session and Exercise Data
%%capture
# cwl files for learning & input data
!sudo rm -rf sample_data   # remove default google colab 'sample_data'
!sudo rm -rf session       # remove 'session' folder if it exists
!sudo rm -rf exercise      # remove 'exercise' folder if it exists
!git clone https://github.com/GraceAHall/cwl-workshop session   # clone  files
!git clone https://github.com/screx/cwl-tutorial.git exercise  # clone files for the Hands-On-Exercise


In [None]:
#@title Install CWL

%%capture
### thanks to Sanjeev v (https://stackoverflow.com/questions/51287258/how-can-i-use-java-in-google-colab)

# cwltool
!pip install cwltool

# cwlref-runner
!pip install cwlref-runner

# some magic so we can run cells as cwl
from google.colab import _message

def write_cell_above_to_file(search_term, filename):
  """Write the fenced code block from the cell above the calling cell to a file."""
  cell = get_cell_above(search_term)
  code_block = get_cell_code_block(cell)
  with open(filename, 'w') as fp:
    fp.writelines(code_block)

def get_cell_above(search_term):
  """Return the cell just before the first cell whose source contains search_term."""
  # Load the notebook JSON.
  nb = _message.blocking_request('get_ipynb')
  cells = nb['ipynb']['cells']

  # Search for the calling cell (using the search term).
  for i, cell in enumerate(cells):
    if search_term in ''.join(cell['source']):
      if i == 0:
        raise ValueError(f'no cell above the one containing {search_term!r}')
      return cells[i - 1]
  raise ValueError(f'no cell contains {search_term!r}')

def get_cell_code_block(cell):
  """Return the lines found between the fence lines of a markdown cell."""
  cell_lines = cell['source']
  code_block = []
  in_block = False
  for ln in cell_lines:
    if '```' in ln:
      in_block = not in_block  # a fence line opens or closes the block
    elif in_block:
      code_block.append(ln)
  return code_block
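
To see what the "magic" above does without needing Colab, here is a minimal standalone sketch of the fence-extraction step. The function name `extract_fenced_lines` and the sample cell source are made up for illustration; the logic mirrors `get_cell_code_block`, which keeps only the lines between the two fence lines of a markdown cell.

```python
FENCE = "`" * 3  # the three-backtick marker that opens and closes a code block


def extract_fenced_lines(cell_lines):
    """Return only the lines that sit between fence lines."""
    code_block = []
    in_block = False
    for ln in cell_lines:
        if FENCE in ln:
            in_block = not in_block  # a fence line toggles the state
        elif in_block:
            code_block.append(ln)
    return code_block


# A made-up markdown cell source, in the list-of-lines form notebooks use.
sample_source = [
    "Some explanatory text\n",
    FENCE + "\n",
    "cwlVersion: v1.2\n",
    "class: CommandLineTool\n",
    FENCE + "\n",
    "More text\n",
]

print("".join(extract_fenced_lines(sample_source)))
```

Only the two CWL lines are kept, which is why the backticks at the top and bottom of each CWL cell must stay in place.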


---

# 3 - Tools

In this section, we will build a simple tool using `echo`.

If you're unfamiliar: `echo` prints any supplied text to the console. <br>
The goal will be to demonstrate how tools work using ***inputs***, the ***command***, and ***outputs***.

>***Note***<br>
>Don't worry about the exact syntax for now - it's explained in detail below!






### 3.1 - <font color='#097969'>What are Tools?</font>

A ***tool*** represents a task to perform.

They are the smallest unit of computation in a workflow. <br>
For example - a variant calling workflow might use `bwa mem` to align reads.<br>
In this case, we need a ***tool*** which encapsulates `bwa mem` so it can be used in our workflow.


<figure>
<img src="https://raw.githubusercontent.com/GraceAHall/cwl-workshop/main/media/tool1.png" width="60%">
<figcaption>Fig 6. Diagram of a tool. </figcaption>
</figure>



Tools consist of 3 main components.
- <font color='red'>inputs</font> - data supplied from outside world
- <font color='red'>command</font> - the dynamically generated CLI command to execute
- <font color='red'>outputs</font> - data we will return to outside world




Keep in mind - the command is ***separate*** from the inputs and outputs. <br>Tool inputs ***are not*** command inputs, and tool outputs ***are not*** command outputs.




This structure means Tools are encapsulated - they ***do not*** know anything about the outside world. <br>
Because they don't depend on anything external, they are reusable.
- A single tool can be used *multiple times* within a workflow
- A single tool can be used in *multiple different* workflows



<figure>
<img src="https://raw.githubusercontent.com/GraceAHall/cwl-workshop/main/media/tool2.png" width="60%">
<figcaption>Fig 7. Dataflow before, during, and post command execution. </figcaption>
</figure>



When a tool is run, there are multiple stages.

Gather inputs
- At the start, each ***input*** is fed a value.  

Execute command
- After all necessary inputs have a value, a ***command*** is generated.
- The command is then run, performing the desired processing.

Collect outputs
- The command may have produced some ***outputs*** during runtime.
- If we have specified some of these for collection, they will be gathered as outputs.
- These outputs can be returned to the user (after the workflow has finished), or can be passed to other tools as inputs.
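
The three stages above can be sketched in plain Python. This is only an analogy, not how a CWL engine is implemented: `run_tool` is a hypothetical function, and `sys.executable` (the Python interpreter) stands in for a real program such as `echo` so the sketch runs anywhere.

```python
import subprocess
import sys


def run_tool(inputs: dict) -> dict:
    # Stage 1: gather inputs (every required input must have a value).
    message = inputs["message"]

    # Stage 2: generate the command, then execute it.
    command = [sys.executable, "-c", "import sys; print(sys.argv[1])", message]
    completed = subprocess.run(command, capture_output=True, text=True, check=True)

    # Stage 3: collect only the outputs we asked for (here, stdout).
    return {"message_out": completed.stdout.strip()}


print(run_tool({"message": "Hello there!"}))
```

Notice that the caller only sees the tool's inputs and outputs; the command itself stays hidden inside the tool.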


<br>

### 3.2 - <font color='#097969'>Our First CWL Tool</font>

Here is an example of the most basic tool we can write in CWL.

It has no inputs, no outputs, and the final command simply evaluates to `echo hello`.




```
cwlVersion: v1.2
class: CommandLineTool

baseCommand: [echo, hello]
inputs: []
outputs: []
```




In [None]:
#@markdown <font size='4'>Run the cell above</font>
search_term = 'id:overview1'
write_cell_above_to_file(search_term, '/tmp/tmpfile.cwl')

!cwltool '/tmp/tmpfile.cwl'

Congratulations! ✨✨✨

You just ran your first CWL tool.



Let's explain a little about the code you see.

- The first two lines are <font color='red'>metadata</font>. These specify this is a CWL `CommandLineTool` (tool), and the CWL version is v1.2.
- The `baseCommand` is an array of strings (text) which will appear at the start of the <font color='red'>command</font>.
- `inputs` is a list of the <font color='red'>inputs</font> to this tool. We use them to customise our command.
- `outputs` is a list of the <font color='red'>outputs</font> this tool produces. We tell CWL to collect specific outputs the software produces during execution.




>***Note***<br>
>CWL has 3 release versions - CWL v1.0, v1.1 and v1.2. <br>
>The current version is v1.2 and should be used. <br>
>Older versions may have slightly different syntax.

### 3.2.2 - <font color='#097969'> Another way to implement the Echo tool: echo.cwl </font>


Here we implement the `echo` tool in CWL a different way: the message we want to print is supplied in a separate YAML file.

The YAML file stores the actual values that the CWL tool will use for the inputs it declares.

```
# helloworld.yml

message: Hello, world!
```

In [None]:
#@markdown <font size='4'>Run the cell above</font>
search_term = 'id:overview1.2'
write_cell_above_to_file(search_term, '/tmp/helloworld.yml')

Creating the CWL tool - echo.cwl

```
#!/usr/bin/env cwltool

cwlVersion: v1.2
class: CommandLineTool # this says that the CWL file describes a command line tool
baseCommand: echo # this says we are using the echo command

inputs:
  message: # the unique identifier of the input
    type: string # the type of input parameter that is passed in
    inputBinding:
      position: 1 # place the message directly after the baseCommand
    label: message to print to stdout

outputs: [] # there are no outputs, as we are simply printing to stdout
```

In [None]:
#@markdown <font size='4'>Run the cell above</font>
search_term = 'id:overview1.3'
write_cell_above_to_file(search_term, '/tmp/echo.cwl')

!cwltool '/tmp/echo.cwl'  '/tmp/helloworld.yml'




The echo tool takes one input parameter, `message`, whose type is `string`. The value for `message` is supplied by the YAML file `helloworld.yml`.

### 3.3 - <font color='#097969'>BaseCommand</font>

Let's focus on the `baseCommand`.

```
baseCommand: echo              # single item
baseCommand:                   # multiple items (list format)
  - program1
  - program2        
baseCommand: [program1, program2]        # multiple items (bracket format)
baseCommand: ["echo", "hi"]    # quoting is fine
baseCommand: [echo, hello] # without quoting is fine too
```

The `baseCommand` is an array of strings which appear at the start of the command. <br>
Above, we see that either *list format* or *bracket format* is allowed. <br>
These formats are explained in the [Tools - YAML](#tools_-_yaml) section.

The best approach is to use `[]`. <br>
This is the clearest option, as it visually shows that the `baseCommand` is an *array of strings.*<br>

The exception is where `baseCommand` is a single item.<br>
In this case, we can omit the `[]` because CWL is smart enough to convert it to a single item array. <br>
This is what we are seeing in the `baseCommand: echo` example above.
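
The single-item shortcut can be pictured with a small Python sketch. `normalize_base_command` is a hypothetical helper written for this tutorial, not a function from cwltool; it only illustrates the idea that a lone string is treated as a one-element array.

```python
def normalize_base_command(value):
    """Return baseCommand as a list of strings (illustrative helper)."""
    if isinstance(value, str):
        return [value]      # single item -> one-element array
    return list(value)      # already an array


print(normalize_base_command("echo"))
print(normalize_base_command(["echo", "hi"]))
```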

Run the CWL cell below to see `baseCommand` in action with multiple items.

```
cwlVersion: v1.2
class: CommandLineTool

baseCommand: [echo, hello, there!]
inputs: []
outputs: []
```

In [None]:
#@markdown <font size='4'>Run the cell above</font>
search_term = 'id:overview1.1'
write_cell_above_to_file(search_term, '/tmp/tmpfile.cwl')

!cwltool '/tmp/tmpfile.cwl'

<br>

### 3.4 - <font color='#097969'>CWL Output Log</font>

Before we continue, let's look at that CWL output. <br>
The CWL output provides useful information, summarised in the example below.
```
INFO [job /tmp/tmpfile.cwl] /tmp/4g1ax89r$ echo \    # command which was executed  
    hello \                                          # command which was executed
    there!                                           # command which was executed
hello there!                                         # stdout
INFO [job /tmp/tmpfile.cwl] completed success        # tool status
{}                                                   # tool outputs
INFO Final process status is success                 # final status
```
The CWL output spreads the ***command*** over multiple lines using `\`.<br>
In the text above, we can infer the ***command*** was `echo hello there!`, <br>
***stdout*** was `hello there!`.  

For the rest of this section, keep an eye on the *command* and *stdout*.
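
If reading the multi-line command by eye feels fiddly, the joining step can be sketched in Python. This is a rough illustration that assumes the log looks like the example above (a `$` prompt followed by backslash-continued lines); the exact log format may differ between cwltool versions, and `extract_command` is a made-up helper.

```python
log = """\
INFO [job /tmp/tmpfile.cwl] /tmp/4g1ax89r$ echo \\
    hello \\
    there!
hello there!
INFO [job /tmp/tmpfile.cwl] completed success
"""


def extract_command(log_text: str) -> str:
    """Join the backslash-continued lines that follow the `$` prompt."""
    parts = []
    collecting = False
    for line in log_text.splitlines():
        if not collecting and "$ " in line:
            line = line.split("$ ", 1)[1]  # drop the log prefix and working dir
            collecting = True
        elif not collecting:
            continue
        parts.append(line.rstrip("\\").strip())
        if not line.rstrip().endswith("\\"):
            break  # no trailing backslash: this was the last command line
    return " ".join(parts)


print(extract_command(log))
```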

<br>

### 3.5 - <font color='#097969'>Adding an Input</font>

With `baseCommand: echo` alone, our CWL tool doesn't do anything useful.

The command line simply evaluates to `echo` - there's nothing to actually print! <br>
Let's add a new ***input*** to our tool definition so we can provide some text after `echo`. <br>
Our new input's name is `message`, it accepts text (`type: string`), and appears directly after the `baseCommand` (`position: 1`).

```
cwlVersion: v1.2
class: CommandLineTool

baseCommand: echo

inputs:
  message:                        # new
    type: string                  # new
    inputBinding:                 # new
      position: 1                 # new

outputs: []
```

In [None]:
#@markdown <font size='4'>Run the cell above</font>
search_term = 'id:overview2'
write_cell_above_to_file(search_term, '/tmp/tmpfile.cwl')

import os
message = 'Hello there!' #@param {type:"string"}
os.environ['MSG'] = message

!cwltool '/tmp/tmpfile.cwl' --message "$MSG"

We can confirm this has been successful by looking at the CWL output.
- The ***command*** (line 3) was `echo 'Hello there!'`
- The ***stdout*** (line 4) was `Hello there!`


⚡ <font color='Orange'>**TASK**</font> <br>
>Change the text supplied to `message` and re-run the cell above.<br>
>Does the output change?


<br>

### 3.6 - <font color='#097969'>Adding an Output</font>

The final thing to add is an output.

Except in special cases, tools will always have ***inputs*** and ***outputs***. <br>
Currently, we are not collecting any outputs from this tool (regardless of whether anything is produced!)<br>
Let's collect the text being printed to `stdout` as an output.



```
cwlVersion: v1.2
class: CommandLineTool

baseCommand: echo

inputs:
  message:            
    type: string      
    inputBinding:     
      position: 1     

outputs:
  message_out:                 # new
    type: stdout               # new
```

In [None]:
#@markdown <font size='4'>Run the cell above</font>
search_term = 'id:overview3'
write_cell_above_to_file(search_term, '/tmp/tmpfile.cwl')

import os
message = 'Hello there!' #@param {type:"string"}
os.environ['MSG'] = message

!cwltool '/tmp/tmpfile.cwl' --message "$MSG"


Instead of a blank `{}`, we now see some information about our ***tool output***.<br>
We can see that there is an output called `message_out` which is a `File` amongst other info. <br>
This confirms we have collected our first output, which we got from `stdout`.




Now that we have witnessed a basic ***CWL tool***, and have gotten a little familiar with its ***syntax***, we will start learning how to write these tools for ourselves. <br>

<br>

---

# 4 - Tools: YAML

>***Note*** - If you are familiar with YAML, feel free to skip this section










Before diving in, let's explain the ***format*** of the CWL code above.

CWL is written with [YAML](https://learnxinyminutes.com/docs/yaml/). It is highly similar to JSON, but easier to work with because it is more human readable. <br>
YAML is based on `key: value` pairs, where each `key` is a string (text) and each `value` is a primitive type, an array, or an object.
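
Because YAML and JSON are so close, it can help to picture a CWL document as nested dictionaries and lists. The sketch below rewrites the first tool from section 3.2 as a Python dictionary and prints it as JSON. The variable name `tool` is just for illustration; since JSON text is also valid YAML, the printed text describes the same tool.

```python
import json

# The first tool from section 3.2, written as a Python dictionary.
tool = {
    "cwlVersion": "v1.2",               # key: string, value: string (primitive)
    "class": "CommandLineTool",
    "baseCommand": ["echo", "hello"],   # value: array
    "inputs": [],
    "outputs": [],
}

print(json.dumps(tool, indent=2))
```

Compare the printed braces, brackets and quotes with the YAML version: the structure is the same, but YAML needs far less punctuation.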

<br>

### 4.1 - <font color='#097969'>Key Value Pairs</font>

YAML is built on `key: value` pairs.

YAML `keys` are always ***strings*** (text), while the associated `value` can take multiple forms. <br>
Have a look at the `key: value` pairs below.<br>
Each `key` is a meaningful name, and each `value` is the value that name takes.<br>
It makes sense to read that this person's name is `Grace`, their age is `111` years old, and their height is `184.5cm`.

```
name:   Grace         
age:    111       
height: 184.5     
```

<br>

### 4.2 - <font color='#097969'>Arrays</font>

Rather than a single `value`, sometimes a key should be associated with multiple `values`. <br>
In this case we can use an ***array***. <br>
If you're familiar with Python, an array is similar to a list. The difference is that CWL generally expects every item in an array to have the same datatype.

```
hobbies:
  - programming
  - fashion
  - anime
```

The YAML above states that there are 3 hobbies. <br>
Each hobby appears on a new line, and starts with a dash `-`.  

> ***Note the indentation!*** <br>
By indenting 2 spaces, we specify this element ***belongs*** to the element above.

This structure is the most common format used in CWL, but arrays can also be specified in-line if desired:

```
hobbies: [programming, fashion, anime]
```

This format is ok here, but may result in bad readability for other situations. <br>
It is generally suggested to use the ***indented format***, except in specific cases.


<br>

### 4.3 - <font color='#097969'>Objects</font>

The final type a `value` can take is an ***object***. <br>
An object is an entity with its own `key: value` pairs. These key value pairs belong to that object.  <br>
In the example below, `friends` and `josie` are both objects.
- `friends` is an object because the key `josie` is contained inside.
- `josie` is an object because the keys `name`, `age`, `height` are contained inside.

```
friends:
  josie:
    name:   josie
    age:    21
    height: 175.7
```

Remember that by dropping to a new line and indenting 2 spaces, we specify this element ***belongs*** to the element above. <br>
Therefore, `josie` is a sub-object of `friends`, and `name`, `age`, `height` belong to `josie`!


Always pay close attention to the level of indentation. Mistakes in indentation cause the YAML to have different meaning. <br>
Try to identify the mistake in the YAML below:

```
friends:
  josie:
    name:   josie
    age:    21
  height: 175.7
```

The indentation of `height` is wrong! It should be shifted right 2 spaces.  <br>
As written, the YAML says there are 2 `friends` - `josie` and `height` - which is a little odd!
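
One way to see why indentation matters is to write both versions as the nested Python dictionaries they correspond to. The variable names `correct` and `mistake` are only for illustration.

```python
correct = {
    "friends": {
        "josie": {"name": "josie", "age": 21, "height": 175.7},
    }
}

mistake = {
    "friends": {
        "josie": {"name": "josie", "age": 21},
        "height": 175.7,  # now a sibling of josie, not a property of josie
    }
}

print(sorted(correct["friends"]))  # keys directly under friends
print(sorted(mistake["friends"]))
```

The first line prints `['josie']`, while the second prints `['height', 'josie']`: two missing spaces have moved `height` into a different object.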

<br>

### 4.4 - <font color='#097969'>Be careful of quotes!</font>

Types in YAML are auto-interpreted. <br>
Most individual values will be interpreted as strings, unless they look like another type such as a number (integer or float) or a boolean (`true` / `false`). <br>
For example - the `baseCommand` of a CWL tool must be a single string, or an array of strings.

```
baseCommand: echo             # ok!  (single item)
baseCommand: "echo hi"          # error  (multiple items, not in an array)
baseCommand: [echo, "42"]     # ok!  (echo will be interpreted as string)
baseCommand: ["echo", "42"]   # ok!  (each item is string)
baseCommand: [echo, 42]       # error  (42 is interpreted as int)
```

In the last example, the unquoted `42` is interpreted as an integer, which would cause a CWL error. <br>
In the two examples above it, `"42"` is ok because the quotes specify that `42` is a string.
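
The type rule can be sketched as a small check in Python. `check_base_command` is a hypothetical helper, not part of cwltool, and it only covers the type aspect: a string such as `"echo hi"` would pass this check even though it would not work as a command.

```python
def check_base_command(value):
    """Accept a string, or a list containing only strings (illustrative check)."""
    items = [value] if isinstance(value, str) else list(value)
    bad = [item for item in items if not isinstance(item, str)]
    if bad:
        raise TypeError(f"baseCommand items must be strings, got: {bad!r}")
    return items


print(check_base_command(["echo", "42"]))  # quoted in YAML -> string
try:
    check_base_command(["echo", 42])       # unquoted in YAML -> integer
except TypeError as err:
    print("error:", err)
```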



<br>

### 5.2 - <font color='#097969'>The Basics</font>

Inputs provide a gateway to the outside world.

They allow us to provide some ***data*** to the tool, which can then be used within the command.

```
inputs:
  message:
    type: string            
```

Tool inputs are added as sub-objects inside the `inputs` key.

Inputs must have a unique name (in this case `message`). <br>
Inputs must also specify the `type` of data they will receive (in this case `string`).





***New concept:*** `types`
> Each tool input / output must specify a ***type***.<br>
> This lets CWL check that the data supplied to this input during runtime looks correct. <br>
> Common types include `string`, `int`, `float`, `boolean` and `File`.

Let's explore how to create and manipulate inputs using the `echo` command.

Below is an `echo` tool which has an input called `message`.<br>
This `message` input should accept a `string`, so we can print something to the console with echo. <br>
The command we wish to produce is `echo <message>`.

```
cwlVersion: v1.2
class: CommandLineTool

baseCommand: echo

inputs:
  message:                       
    type: string                 

outputs: []
```

In [None]:
#@markdown <font size='4'>Run the cell above</font>
search_term = 'id:inputBinding0'
write_cell_above_to_file(search_term, '/tmp/tmpfile.cwl')

import os
message = 'hi there!' #@param {type:"string"}
os.environ['MSG'] = message

!cwltool '/tmp/tmpfile.cwl' --message "$MSG"


😐 Hmm... that worked, but didn't go as planned!

Our message isn't appearing in the ***command*** (line 3) and isn't printed to ***stdout*** (line 4). <br>
The reason - <font color='red'>tool inputs don't automatically appear in the command!</font> <br>
To make them appear in the command, we need to specify an `inputBinding`.





<br>

### 5.3 - <font color='#097969'>InputBinding</font>

An `inputBinding` customises how something is presented in the ***command***.

```
baseCommand: echo

inputs:
  message:                 
    type: string           
    inputBinding:          
      position: 1
      prefix: --my-message
```

The `baseCommand` always appears at the start, whereas each tool `input` appears according to its ***inputBinding***. <br>
In the example above, the final command would be `echo --my-message <message>`
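
To make the rule concrete, here is a simplified Python sketch of how a command could be assembled. It is an illustration under stated assumptions, not cwltool's implementation: `build_command` is a made-up function, it only understands `position` and `prefix`, and it treats a missing `position` as 0.

```python
def build_command(base_command, inputs, values):
    """Simplified sketch: only inputs with an inputBinding reach the command."""
    bound = []
    for name, spec in inputs.items():
        binding = spec.get("inputBinding")
        if binding is None or name not in values:
            continue  # no binding (or no value) -> not in the command
        tokens = []
        if "prefix" in binding:
            tokens.append(binding["prefix"])
        tokens.append(str(values[name]))
        bound.append((binding.get("position", 0), name, tokens))

    command = list(base_command)        # baseCommand always comes first
    for _, _, tokens in sorted(bound):  # then inputs, ordered by position
        command.extend(tokens)
    return command


inputs = {
    "message": {
        "type": "string",
        "inputBinding": {"position": 1, "prefix": "--my-message"},
    },
    "unbound": {"type": "string"},  # no inputBinding
}
print(build_command(["echo"], inputs, {"message": "hi", "unbound": "ignored"}))
```

This prints `['echo', '--my-message', 'hi']`. The `unbound` input received a value, but without an `inputBinding` it never appears in the command.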

An `inputBinding` has multiple attributes, but for now let's just focus on `position`.


<br>

In the example below, it becomes clear that `arguments` is an ***array***. <br>
Each `argument` is declared using a `dash -`, and any lines including and below the dash are part of that argument. <br>
To supply a value, `valueFrom` is used.

`prefix`, `position`, `separate` can also be used, and behave in the same way as `inputs`. <br>
The ***order*** of these attributes does not matter.





```
cwlVersion: v1.2
class: CommandLineTool

baseCommand: echo

inputs: []

arguments:
  - valueFrom: "one"
    position: 1
    prefix: --argument

  - prefix: --argument
    valueFrom: two
    position: 2

outputs: []
```

In [None]:
#@markdown <font size='4'>Run the cell above</font>
search_term = 'id:argument2'
write_cell_above_to_file(search_term, '/tmp/tmpfile.cwl')

!cwltool '/tmp/tmpfile.cwl'

When supplying a value using `valueFrom`, make sure it is interpreted as a string!<br>
If it is interpreted as anything else, CWL will report an error. <br>
To get a string, either wrap the value in quotes, or rely on CWL auto-interpreting the type as string where possible.  

```
arguments:
  - valueFrom: 42                # not ok! interpreted as int
  - valueFrom: "42"              # ok! quoted = string
```

>***Note***<br>
>`valueFrom` can be used in a number of places, not just `arguments`.

hmmmm, `position`... `prefix`... that sounds familiar.

That's because each `argument` is actually just a plain old `inputBinding`!
<br>
They don't have a name, nor a `type`, but still need an `inputBinding` to appear in the command.
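
As a rough illustration, the two arguments from the `echo` example earlier in this section can be rendered with a few lines of Python. `render_arguments` is a made-up helper that only handles literal `valueFrom` strings, `prefix` and `position`; it assumes a missing `position` counts as 0 and that ties keep their written order.

```python
def render_arguments(base_command, arguments):
    """Simplified sketch: place each argument's prefix and value by position."""
    keyed = []
    for index, arg in enumerate(arguments):
        tokens = []
        if "prefix" in arg:
            tokens.append(arg["prefix"])
        tokens.append(arg["valueFrom"])
        # Ties on position keep their original order via the index.
        keyed.append((arg.get("position", 0), index, tokens))

    command = list(base_command)
    for _, _, tokens in sorted(keyed):
        command.extend(tokens)
    return command


arguments = [
    {"valueFrom": "one", "position": 1, "prefix": "--argument"},
    {"prefix": "--argument", "valueFrom": "two", "position": 2},
]
print(" ".join(render_arguments(["echo"], arguments)))
```

This prints `echo --argument one --argument two`. The order of the keys inside each argument makes no difference; only `position` decides where it lands.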

🔥 <font color='red'>**CHALLENGE**</font>

```
arguments:
  - valueFrom: "200"
    prefix: --max-length
  - prefix: --min-length
    valueFrom: "100"
  - valueFrom: "30"
    prefix: --quality-cutoff
  - prefix: --output
    valueFrom: reads_1.trimmed.fq
    position: 1
  - position: 2
    prefix: --paired-output
    valueFrom: reads_2.trimmed.fq    
```

In [None]:
#@markdown How many `arguments` are in the code snippet above?

correct_ans = 5
user_ans = 0 #@param {type:"integer"}

def check_results(correct_answers, user_answers):
  for correct, user in zip(correct_answers, user_answers):
    if user != correct:
      return '😐 try again'
  return '✅ correct!'

print(check_results([correct_ans], [user_ans]))

### 5.9 - <font color='#097969'>Default Values (extension)</font>

`Inputs` can have a ***default*** value. <br>
If no value is supplied to the `input` during execution, the default will take its place in the command.



This is different from an ***optional*** tool input:

- If an optional input is not supplied a value, it will be missing from the command.
- If an input with a default is not supplied a value, it will still be present in the command.



A default value for an input is specified using `default: <default>`.<br>
`default` is an attribute of the `input` (same as `type`), rather than the input's `inputBinding`.


```
cwlVersion: v1.2
class: CommandLineTool

baseCommand: echo

inputs:
  threads:
    type: int
    default: 8
    inputBinding:
      prefix: --threads

outputs: []
```



In [None]:
#@markdown <font size='4'>Run the cell above</font>
search_term = 'id:inputDefault1'
write_cell_above_to_file(search_term, '/tmp/tmpfile.cwl')

null = None
threads = 3 #@param {type:"raw"}

import os
if isinstance(threads, int):
  os.environ['THREADS'] = '--threads ' + str(threads)
else:
  os.environ['THREADS'] = ''

!cwltool '/tmp/tmpfile.cwl' $THREADS

What about ***optional*** inputs with a ***default***?

This is fine! That said, it is a little redundant.  <br>
It will behave exactly the same regardless of whether it is marked ***optional*** or not. <br>
This is because ***default*** means the input will always have a value, even if we don't supply one.  

Run the cell below to check.

```
cwlVersion: v1.2
class: CommandLineTool

baseCommand: echo

inputs:
  int_optional:
    type: int?
    inputBinding:
      prefix: --int-optional

  int_default:
    type: int
    default: 8
    inputBinding:
      prefix: --int-default

  int_optional_default:
    type: int?
    default: 8
    inputBinding:
      prefix: --int-optional-default

outputs: []
```

In [None]:
#@markdown <font size='4'>Run the cell above</font>
search_term = 'id:inputDefault2'
write_cell_above_to_file(search_term, '/tmp/tmpfile.cwl')

null = None
int_optional = 3 #@param {type:"raw"}
int_default = 2 #@param {type:"raw"}
int_optional_default = 1 #@param {type:"raw"}

import os
if isinstance(int_optional, int):
  os.environ['OPTTHREADS'] = '--int_optional ' + str(int_optional)
else:
  os.environ['OPTTHREADS'] = ''

if isinstance(int_default, int):
  os.environ['DEFTHREADS'] = '--int_default ' + str(int_default)
else:
  os.environ['DEFTHREADS'] = ''

if isinstance(int_optional_default, int):
  os.environ['OPTDEFTHREADS'] = '--int_optional_default ' + str(int_optional_default)
else:
  os.environ['OPTDEFTHREADS'] = ''

!cwltool '/tmp/tmpfile.cwl' $OPTTHREADS $DEFTHREADS $OPTDEFTHREADS

### 5.10 - <font color='#097969'>Flags (extension)<font>

Some software arguments don't have a value. <br>
They are either 'on' or 'off' depending on whether they are present in the command.  <br>
For example, `--conservative`,  or `--compress-output`.

This type of argument is also known as a boolean 'flag'.  <br>
Their `type` is `boolean`, and can either be `true` or `false`. <br>
If set to `true`, the flag appears in the command line. If `false`, the flag does not appear.
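
As a rough Python sketch of that rule (the `bind_flag` and `build_command` helpers are hypothetical, and `None` stands in for a flag that was not supplied):

```python
def bind_flag(prefix, value):
    """Return the tokens contributed by one boolean flag."""
    # Only True emits the flag; False and None (not supplied) emit nothing.
    return [prefix] if value is True else []


def build_command(base_command, flags):
    """Assemble a command from a base command and a mapping of flag -> value."""
    tokens = [base_command]
    for prefix, value in flags.items():
        tokens.extend(bind_flag(prefix, value))
    return tokens


print(build_command("echo", {"--conservative": True, "--compress-output": False}))
# → ['echo', '--conservative']
```

Note that a flag never carries a value of its own: the prefix alone is the whole argument.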

We can see them in action below.

>***NOTE***
>
>With `default: false`, as in the example below, the flag is off unless we supply `true`.

```
cwlVersion: v1.2
class: CommandLineTool

baseCommand: echo

inputs:
  conservative:
    type: boolean
    default: false
    inputBinding:
      prefix: --conservative

outputs: []
```

In [None]:
#@markdown <font size='4'>Run the cell above</font>
search_term = 'id:inputFlag1'
write_cell_above_to_file(search_term, '/tmp/tmpfile.cwl')

conservative = 'true' #@param ["false", "true"]

import os
if conservative == 'true':
  os.environ['CONS'] = '--conservative'
else:
  os.environ['CONS'] = ''

!cwltool '/tmp/tmpfile.cwl' $CONS

⚡ <font color='Orange'>**TASK**</font> <br>
>Change the value for `conservative` from `true` to `false` in the form above then re-run the cell.<br>
>Does `--conservative` still appear?

It is often convenient to mark flags as optional. <br>
Marking them optional results in more concise code, and means we don't always need to provide a `true` / `false` value.



```
cwlVersion: v1.2
class: CommandLineTool

baseCommand: echo

inputs:
  conservative:
    type: boolean?
    inputBinding:
      prefix: --conservative

outputs: []
```

In [None]:
#@markdown <font size='4'>Run the cell above</font>
search_term = 'id:inputFlag2'
write_cell_above_to_file(search_term, '/tmp/tmpfile.cwl')

conservative = "null" #@param ["null", "false", "true"]

import os
if conservative == "true":
  os.environ['CONS'] = '--conservative'
else:
  os.environ['CONS'] = ''

!cwltool '/tmp/tmpfile.cwl' $CONS

### 5.11 - <font color='#097969'>Parameter References (extension)<font>

<font color="red">Parameter references</font> allow us to access ***properties*** of objects within our CWL tool definition. <br>
The simplest way to explain this is by looking at some CWL.












In certain locations in a CWL tool / workflow definition, we can <font color="red">refer</font> to other CWL <font color="red">parameters</font> using their names.

`$()` is the syntax for these <font color="red">parameter references</font>.

When CWL sees `$()`, it knows that the text inside refers to a variable. <br>
The example we will use is `$(inputs.infile.basename)`.







We can understand what this variable means by looking at YAML.

```
friends:
  josie:
    name:   josie
    age:    21
    height: 175.7
```

Data within YAML files can be accessed using bracket or dot notation.

Taking the above as an example, we can access data about `josie` via
- bracket notation - `friends['josie']` or
- dot notation - `friends.josie`

To further illustrate - if we wanted to know `josie`'s age, we could write `friends.josie.age`.
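
The same idea can be tried out in Python, where the YAML above would load as nested dictionaries. Python dictionaries only support bracket notation directly, so the `lookup` helper below is a hypothetical stand-in for dot notation.

```python
# The YAML above, written out as the nested dictionaries it would load into.
data = {"friends": {"josie": {"name": "josie", "age": 21, "height": 175.7}}}


def lookup(data, dotted_path):
    """Follow a dotted path such as 'friends.josie.age' through nested dicts."""
    current = data
    for key in dotted_path.split("."):
        current = current[key]
    return current


print(data["friends"]["josie"]["age"])    # bracket notation → 21
print(lookup(data, "friends.josie.age"))  # dot notation    → 21
```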


CWL allows us to access this information in a similar way.<br>
If we want to access attributes of our `inputs`, we can write `inputs.<name>` where `<name>` is the input we are interested in.

If we see `$(inputs.infile.basename)`, we know that we are accessing the `infile` input.
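
A minimal sketch of what resolving such a reference involves is shown below. It is a simplification under stated assumptions: `resolve_references` is a hypothetical helper, the `reads.fq` value is made up, and real parameter references also allow bracket and index syntax that this regex does not handle.

```python
import re

# Matches $( ... ) and captures the dotted path inside.
PARAM_REF = re.compile(r"\$\(([^)]+)\)")


def resolve_references(text, context):
    """Replace every $(a.b.c) in text with the value found in context."""
    def replace(match):
        value = context
        for key in match.group(1).split("."):
            value = value[key]
        return str(value)

    return PARAM_REF.sub(replace, text)


# Hypothetical runtime context: one File input called 'infile'.
context = {"inputs": {"infile": {"class": "File", "basename": "reads.fq"}}}
print(resolve_references("$(inputs.infile.basename).trimmed", context))
# → reads.fq.trimmed
```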




***file.basename***

At this point, there is still a question - what is `basename`? <br>
We can't see `basename` in the CWL definition - so what is it?

`basename` is a special attribute of `File` type data.

It is created by CWL at runtime, by looking at the data which was passed to the `File`. <br>
`File` data has a number of these hidden attributes.
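
To give a feel for these attributes, the sketch below derives a few of them from a path using Python's standard library. The attribute names `dirname`, `nameroot` and `nameext` are `File` fields as we understand the CWL specification; the `file_attributes` helper and the example path are hypothetical, and a real engine fills in more fields than this.

```python
import posixpath


def file_attributes(path):
    """Derive some CWL-style File attributes from a path string."""
    basename = posixpath.basename(path)
    # nameroot + nameext == basename; nameext is empty or starts with a period.
    nameroot, nameext = posixpath.splitext(basename)
    return {
        "path": path,
        "dirname": posixpath.dirname(path),
        "basename": basename,
        "nameroot": nameroot,
        "nameext": nameext,
    }


attrs = file_attributes("/data/run1/reads_1.fq")
print(attrs["basename"])  # → reads_1.fq
print(attrs["nameroot"])  # → reads_1
print(attrs["nameext"])   # → .fq
```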





---

# 6 - Workflows

### 6.1 - <font color='#097969'>Overview<font>

Workflows combine multiple tools to perform a more comprehensive analysis.

Rather than just running a single tool, we often want to run a group of tools in a pipeline.

Take variant calling as an example - we need to perform read QC, align reads to a reference genome, then identify variants. <br>
These three main steps can be combined end to end in a workflow. <br>
Our workflow <font color='red'>inputs</font> are raw reads and a reference genome, and our workflow <font color='red'>outputs</font> are the identified variants.

***Files & the File Explorer***

From this point, we will start using Google Colab differently. <br>
Previously we have been focusing on the notebook. From here, we will use Colab as an IDE. <br>

Instead of writing CWL inside notebook cells, we will switch to writing CWL as proper files. <br>
We will also be looking at the files produced when we run tools / workflows.  

To start, open the <font color='red'>file explorer</font> (files) in the left-hand sidebar. <br>
You should see a folder called `session` and a few files. <br>



***Running CWL via CLI***

Up to this point, the mechanism of running CWL was hidden to avoid complexity. Now it's time to learn this for ourselves.

An <font color='red'>execution engine</font> is used to run CWL tool and workflow files. From here, we will be responsible for creating / editing CWL files, and executing those files using the `cwltool` engine.

In the paragraph above covering *Files & the File Explorer*, we could see that there is a `cutadapt` tool definition at *session/variants/tools/cutadapt.cwl*.

Read & run the CLI commands below to see how ***tools*** and ***workflows*** can be executed using the `cwltool` engine.

>***Note***<br>
>If curious, you can open these files using the file explorer. <br>
>If the workflow doesn't make sense, that's ok! It's all explained below.
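
The command pattern used in the cells below is `cwltool <file.cwl> [inputs]`, where each input is passed as `--<name> <value>`. As a small sketch, the same command can be assembled in Python. `build_cwltool_command` is a hypothetical helper; it only builds the argument list and does not run anything.

```python
import shlex


def build_cwltool_command(cwl_file, inputs):
    """Build the argument list for `cwltool <file.cwl> --name value ...`."""
    command = ["cwltool", cwl_file]
    for name, value in inputs.items():
        command.extend([f"--{name}", str(value)])
    return command


command = build_cwltool_command(
    "session/examples/basics/echo.cwl",
    {"message": "hello there!"},
)
# shlex.join quotes values containing spaces so the line is safe to paste into a shell.
print(shlex.join(command))
```

The resulting list could be handed to `subprocess.run(command)` on a machine where `cwltool` is installed.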

In [None]:
# TOOL
#cwltool <tool.cwl> [inputs]
!cwltool session/examples/basics/echo.cwl --message "hello there!"

In [None]:
# WORKFLOW
#cwltool <workflow.cwl> [inputs]
!cwltool session/examples/basics/wf_outputs.cwl --greeting "good afternoon!"

<br>

### 6.2 - <font color='#097969'>Our First CWL Workflow<font>

Workflows are highly similar to tools. <br>
Workflows have the following main components:
- <font color='red'>metadata</font> - information about the workflow & extra details
- <font color='red'>inputs</font> - data supplied from outside world
- <font color='red'>steps</font> - tools to execute
- <font color='red'>outputs</font> - data we collect after workflow execution is finished






```
class: Workflow
cwlVersion: v1.2

inputs:
  in1: string

steps:
  echo:
    run: ./echo.cwl
    in:
      message: in1
    out: [phrase]
    
outputs:
  echo_out:
    type: File
    outputSource: echo/phrase
```



**Inputs** are the data supplied from the outside world. Each time the workflow is run, we need to supply values for these inputs. Inputs can be used by steps in the workflow.

**Steps** consist of the tools we wish to use. We need to specify the <font color='green'>*tool*</font> we wish to execute, as well as how to supply data to its <font color='green'>*tool inputs*</font>. We also need to specify which <font color='green'>*tool outputs*</font> we are interested in.

**Outputs** are the data produced by the workflow which we wish to return to the outside world. <br>
After the workflow has finished running, these outputs are presented back to the user.

⚡ <font color='Orange'>**TASK**</font> <br>
>📂 <font color='red'>**Open**</font> *session/examples/basics/wf_final.cwl* - double-click the file <br>
>📂 <font color='red'>**Open**</font> *session/examples/basics/echo.cwl*  - double-click the file <br>
> (Both files are also reproduced below.) <br>

> Looking at `wf_final.cwl`, you should see a single input called `greeting`, and a single output called `echo_out`. <br>
>It has a step called `echo` which runs `./echo.cwl`.<br>
>Switch over to `echo.cwl` to see this tool definition. <br>
>
>After a brief look, run the cell below to execute the workflow.

## session/examples/basics/wf_final.cwl
```
class: Workflow
cwlVersion: v1.2

inputs:
  greeting: string

steps:
  echo:
    run: ./echo.cwl
    in:
      message: greeting
    out: [phrase]

outputs:
  echo_out:
    type: File
    outputSource: echo/phrase
```

## session/examples/basics/echo.cwl
```
cwlVersion: v1.2
class: CommandLineTool

baseCommand: echo
stdout: phrase.txt

inputs:
  message:
    type: string
    inputBinding:
      position: 1

outputs:
  phrase:
    type: stdout
```



In [None]:
#cwltool <workflow.cwl> [inputs]
!cwltool session/examples/basics/wf_final.cwl --greeting "good morning!"

CWL produces a useful output log while running workflows.

We can use this for ***debugging***.<br>
The start and finish of each workflow, step, and tool, as well as any errors that occur are logged. <br>
If errors arise during execution, this information can be used to track down where the error occurred in the pipeline.



```
INFO [workflow ] start                   # workflow started
INFO [step echo] start                   # step started (supplying data)
INFO [job echo] <tmpfile> echo           # tool started
INFO [job echo] completed success        # tool finished
INFO [step echo] completed success       # step finished
{ ... }                                  # workflow outputs
INFO Final process status is success     # final status
```
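
When a log gets long, it can help to filter it. The sketch below scans log text for `completed <status>` lines and reports anything that did not finish with `success`. The line format is taken from the sample log above; the `permanentFail` status in the usage note is an assumption about what a failed job reports, and both helper functions are hypothetical.

```python
import re

# Matches e.g. "[job echo] completed success" or "[workflow ] completed success".
STATUS_LINE = re.compile(r"\[(workflow|step|job)\s*([^\]]*)\] completed (\w+)")


def completion_statuses(log_text):
    """Return (kind, name, status) for every 'completed' line in the log."""
    results = []
    for line in log_text.splitlines():
        match = STATUS_LINE.search(line)
        if match:
            kind, name, status = match.groups()
            results.append((kind, name.strip(), status))
    return results


def failures(log_text):
    """Return only the entries whose status is not 'success'."""
    return [entry for entry in completion_statuses(log_text) if entry[2] != "success"]


sample_log = """\
INFO [workflow ] start
INFO [step echo] start
INFO [job echo] completed success
INFO [step echo] completed success
INFO [workflow ] completed success
"""
print(failures(sample_log))  # → []
```

If a line such as `[job echo] completed permanentFail` were present, `failures` would return `[('job', 'echo', 'permanentFail')]`, pointing at the step to investigate.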

Congratulations! ✨✨✨

Now that we have run our first CWL workflow, we can learn how to write them ourselves. <br>
We will learn how inputs, steps, and outputs are written for a CWL workflow. <br>
As we learn, we will build a variant calling workflow using `cutadapt` and other tools we write along the way.

>***Note***<br>
>If you get stuck during the next three chapters (inputs / steps / outputs), move on to the next chapter. <br>
>The starting point of the next chapter is the answer for the current chapter!

<br>

### 6.3 - <font color='#097969'>Inputs<font>



Each **Workflow Input** must have a unique name and a type.

They can be defined on a single line, if only `name: type` is needed.

```
inputs:
  my_string: string
  my_int: int
  my_file: File
```

Inputs also allow other parameters, including `label`, `doc`, `default`, and <font color="red">`secondaryFiles`</font> which will be touched on later.

```
inputs:
  my_string:
    type: string
    label: "input message"
    doc: "A mandatory string message. Used by step.... etc etc"
    default: "HELLO!!"
```

>***Note***<br>
>Workflow inputs can be optional. This is performed in the same manner as for tool inputs.
>```
inputs:
  my_string: string?     # optional
  my_int: int?           # optional
  my_file: File
```

⚡ <font color='Orange'>**TASK**</font> <br>
>📂 <font color='red'>**Open**</font> *session/examples/basics/wf_inputs.cwl*
>
>Add an input called `greeting`, with the type `string`. <br>
>A value for this input is supplied in the CLI command.
>
>Run the cell below to check your answer.



In [None]:
!cwltool session/examples/basics/wf_inputs.cwl --greeting "hello!"

If correct, `cwltool` will run without errors. You should see something like the following:
```
INFO [workflow ] start
INFO [workflow ] completed success
{}
INFO Final process status is success
```

We started the workflow, executed zero steps and collected zero outputs.

<br>

### 6.4 - <font color='#097969'>Steps<font>

Each ***Workflow Step*** has a unique name, and three mandatory subobjects.




```
steps:
  wc:
    run: ./wc.cwl  
    in:
      infile: my_file      
    out: [count]    
```

<font color="red">`run`</font> provides a path to a CWL tool definition we wish to use. <br>
The path can be relative to the workflow file path, or absolute.

<font color="red">`in`</font> allows us to feed data to the tool's inputs.<br>
Make sure each required tool input is accounted for - if any are missing, CWL will throw an error! <br>
Both <font color="green">workflow inputs</font>, and the outputs of other <font color="green">workflow steps</font> can be used to feed data to tool inputs. <br>

<font color="red">`out`</font> specifies the tool outputs we wish to keep for this step.  <br>These can be used in other steps, or as workflow outputs.



***Feeding data from workflow inputs***

workflow.cwl
```
class: Workflow
cwlVersion: v1.2

inputs:
  my_file: File

steps:
  wc:
    run: ./wc.cwl          # the tool to use is 'wc.cwl' (a CWL tool stored in the local directory)
    in:
      infile: my_file      # the 'infile' tool input of wc.cwl will be fed data from the 'my_file' workflow input
    out: [count]           # the 'count' tool output of wc.cwl will be used later in the workflow

outputs: []
```

./wc.cwl
```
cwlVersion: v1.2
class: CommandLineTool
baseCommand: wc

inputs:
  infile:
    type: File
    inputBinding:
      position: 1

outputs:
  count:
    type: stdout  
```


In the example above, `workflow.cwl` has a single step called `wc`. <br>
In that step, we see that
- `./wc.cwl` is the tool to use
- The <font color="green">`my_file`</font> workflow input should feed the <font color="orange">`infile`</font> tool input
- We wish to keep the <font color="orange">`count`</font> tool output for use somewhere else in the workflow.



>***Note***<br>
>If a tool input is `optional`, we can choose whether to supply data to that input.<br>
>We can omit it from the step inputs.


⚡ <font color='Orange'>**TASK 11**</font> <br>
>📂 <font color='red'>**Open**</font> *session/examples/basics/wf_steps_task1.cwl*<br>
>
>Execute the workflow by running the cell below. <br>
>Note that for `[job echo]` `"hello!"` is being echoed.
>
>This workflow has two inputs - <font color="green">`greeting`</font> and <font color="green">`farewell`</font>.<br>
>Swap the workflow input feeding the <font color="orange">`message`</font> tool input from <font color="green">`greeting`</font> to <font color="green">`farewell`</font>.<br>
>It should now echo `"goodbye!"`


In [None]:
!cwltool session/examples/basics/wf_steps_task1.cwl --greeting "hello!" --farewell "goodbye!"

***Feeding data from step outputs***

Tool inputs can also be fed data from the outputs of other workflow steps.<br>
This is achieved using `step/outname` - where `step` is the name of the step, and `outname` is the name of the specific output.


```
class: Workflow
cwlVersion: v1.2

inputs:
  raw_reads: File
  reference: File

steps:
  cutadapt:
    run: tools/cutadapt.cwl
    in:
      reads: raw_reads
    out: [trimmed_reads]
    
  bwa_mem:
    run: tools/bwa_mem.cwl
    in:
      reads: cutadapt/trimmed_reads     # fed from a step output
      ref: reference
    out: [bam]

outputs: []
```
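
Because `bwa_mem` consumes `cutadapt/trimmed_reads`, it cannot start until `cutadapt` has finished; the order in which steps are written does not matter. One way to picture this is as a dependency graph. The sketch below derives a valid run order from the `step/outname` sources using Python's `graphlib` (Python 3.9+). `execution_order` is a hypothetical helper, not part of any CWL engine.

```python
from graphlib import TopologicalSorter


def execution_order(steps):
    """Return step names in an order that respects step/outname dependencies."""
    graph = {}
    for step_name, step in steps.items():
        upstream = set()
        for source in step["in"].values():
            # 'step/outname' refers to another step; a bare name is a workflow input.
            if "/" in source:
                upstream.add(source.split("/", 1)[0])
        graph[step_name] = upstream
    return list(TopologicalSorter(graph).static_order())


# The steps from the workflow above, deliberately listed in reverse order.
steps = {
    "bwa_mem": {"in": {"reads": "cutadapt/trimmed_reads", "ref": "reference"}},
    "cutadapt": {"in": {"reads": "raw_reads"}},
}
print(execution_order(steps))  # → ['cutadapt', 'bwa_mem']
```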

⚡ <font color='Orange'>**TASK**</font> <br>
>📂 <font color='red'>**Open**</font> *session/examples/basics/wf_steps_task2.cwl*
>
>Read the `wc` step. <br>
>Is data being fed to the `infile` tool input using a <font color="green">workflow input</font>, or the output of another <font color="green">workflow step</font>?<br>
>Run the cell below to see the workflow in action. <br>

In [None]:
!cwltool session/examples/basics/wf_steps_task2.cwl --greeting "hello!" --farewell "goodbye!"

⚡ <font color='Orange'>**TASK 13**</font> <br>
>📂 <font color='red'>**Open**</font> *session/examples/basics/wf_steps_task3.cwl*
>
>Add a step called `echo`.
>
>This step should use the `./echo.cwl` tool. <br>
>It should feed the `greeting` workflow input to the `message` tool input. <br>
>It should collect the `phrase` tool output.
>
>Run the cell below to check your answer.<br>
>The `echo` step should appear in the CWL log starting with `[job echo]`.



In [None]:
!cwltool session/examples/basics/wf_steps_task3.cwl --greeting "hello!"

<br>

### 6.5 - <font color='#097969'>Outputs<font>

***Workflow Outputs*** collect the data we want to return after execution.<br>
They have a unique name, a `type`, and an <font color="red">`outputSource`</font>. <br>
Their `type` is almost always `File`.

```
class: Workflow
cwlVersion: v1.2

inputs:
  my_string: string

steps:
  echo:
    run: ./echo.cwl
    in:
      message: my_string
    out: [phrase]

outputs:
  echo_out:
    type: File
    outputSource: echo/phrase
```

In the workflow above, we have a single workflow output called `echo_out`. <br>
You may notice that the `outputSource` is familiar - it's referencing a step output! <br>
In practice, `outputSource` is almost always a step output.
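
A common mistake is an `outputSource` that names a step output which the step never lists under `out`. The sketch below checks for that on a workflow written out as a Python dictionary. `unresolved_output_sources` is a hypothetical helper; treating workflow inputs as valid sources is our reading of the specification rather than something shown in this tutorial.

```python
def unresolved_output_sources(workflow):
    """Return names of workflow outputs whose outputSource cannot be found."""
    # Valid sources: workflow inputs, plus 'step/outname' for each declared step output.
    available = set(workflow["inputs"])
    for step_name, step in workflow["steps"].items():
        available.update(f"{step_name}/{out}" for out in step["out"])
    return [
        name
        for name, output in workflow["outputs"].items()
        if output["outputSource"] not in available
    ]


workflow = {
    "inputs": {"my_string": "string"},
    "steps": {"echo": {"out": ["phrase"]}},
    "outputs": {"echo_out": {"type": "File", "outputSource": "echo/phrase"}},
}
print(unresolved_output_sources(workflow))  # → []
```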



⚡ <font color='Orange'>**TASK 14**</font> <br>
>📂 <font color='red'>**Open**</font> *session/examples/basics/wf_outputs.cwl*
>
>Add an output to this workflow called `echo_out`. <br>
>Its `type` should be `File`, and it should reference the `phrase` output of the `echo` step.
>
>Run the cell below to check your answer. <br>
>The CWL output log should now report an output called `echo_out` being produced.


In [None]:
!cwltool session/examples/basics/wf_outputs.cwl --greeting "hello!"

<br>

### 6.6 - <font color='#097969'>Job Params<font>






We must supply inputs when running a tool or workflow with an execution engine.<br>
Until now, we have been supplying inputs manually using the CLI :

<font color='green'>`!cwltool echo.cwl --message "hello there!"`</font>

This has been fine since we only have a single input. <br>
Once a workflow has 5... 10... 20...+ inputs, this method can get inconvenient!<br>
Luckily, we can specify our inputs using a YAML file instead.

<font color='green'>`!cwltool echo.cwl params.yml`</font>

This file is commonly named `params.yml`, `job.yml` or `inputs.yml` - whichever makes most sense to you. <br>
Today, we have used `params.yml`.
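
A params file is just a mapping from input names to values. The sketch below builds one in Python and prints it as JSON, which is also valid YAML, so the printed text could be saved as a params file. The `make_job_params` helper, the values, and the file path are all hypothetical; the `class: File` form is how `File` inputs are written in a job file.

```python
import json


def make_job_params(greeting, my_int, file_path):
    """Build a job-parameters mapping: one string, one int and one File input."""
    return {
        "greeting": greeting,
        "my_int": my_int,
        # File inputs are objects with a class and a path, not bare strings.
        "my_file": {"class": "File", "path": file_path},
    }


params = make_job_params("hello!", 42, "data/example.txt")
print(json.dumps(params, indent=2))
```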




⚡ <font color='Orange'>**TASK**</font> <br>
>📂 <font color='red'>**Open**</font> *session/examples/basics/params.yml*
and inspect the contents.<br>
>See that the `greeting` workflow input is being provided. <br>
>
>We also provide two extra values - `my_int` and `my_file`. <br>
>They're just included to provide a reference for `File` and `int` type input values. <br>
>Since their names don't correspond to any workflow input, they are dropped. <br>
>
>Run the cell below to demonstrate. It should run without errors.


In [None]:
#cwltool <file.cwl> <params.yml>
!cwltool session/examples/basics/wf_final.cwl session/examples/basics/params.yml

<br>


# 7 - <font color='#097969'>Continue Learning CWL<font>






***Best Practices***

CWL tools and workflows can be written in many different ways.

CWL provides a set of features and rules - the way these building blocks are pieced together depends on the developer!

To avoid common pitfalls, the [CWL best-practices guide](https://www.commonwl.org/user_guide/topics/best-practices.html#) can be consulted. <br>
This is a culmination of experience garnered from the CWL community, so is a highly valuable resource!
More useful best-practices information can be found in the [CWL FAQ guide](https://www.commonwl.org/user_guide/faq.html).


***Advanced CWL***

This material covered the fundamentals of CWL tools and Workflows, but there is still more you may need to learn.

The main omission from this material is array inputs.<br>
A chapter on arrays will be added at a future date as an extension.

Here are some resources which will extend your CWL skills even further:

Tools
- [CWL CommandLineTool specification](https://www.commonwl.org/v1.2/CommandLineTool.html)
- [Expressions](https://www.commonwl.org/user_guide/topics/expressions.html)
- [Creating files at runtime](https://www.commonwl.org/user_guide/topics/creating-files-at-runtime.html)

Workflows
- [CWL Workflow specification](https://www.commonwl.org/v1.2/Workflow.html)
- [Conditional steps](https://www.commonwl.org/user_guide/topics/workflows.html)
- [Scatter parallelisation](https://www.commonwl.org/user_guide/topics/workflows.html)
- [Subworkflows](https://www.commonwl.org/user_guide/topics/workflows.html)




***Communities***

CWL has an active community of developers. <br>
Becoming a part of this community is the best way to increase your skills and become an expert!

Here are some of the many ways to become part of the CWL community:
- [CWL Gitter](https://gitter.im/common-workflow-language/common-workflow-language)
- [CWL Discourse Group](https://cwl.discourse.group/)
- [Community Meetings](https://www.commonwl.org/community/#cwl-online-community-meetings)

# 8 - <font color='#097969'>Exercises<font>

## 8.1 - <font color='#097969'> Building Your CWL Tools <font>

Create three CWL tools:

*GREP Tool*: Follow the instructions/code detailed [here](https://github.com/screx/cwl-tutorial?tab=readme-ov-file#grep)
* Define inputs for the search term and the file to search in.
* Specify the base command and arguments to reflect GREP's syntax.

*WC Tool*: Follow the instructions/code detailed [here](https://github.com/screx/cwl-tutorial?tab=readme-ov-file#wc)
* Set up parameters to count occurrences.
* Configure the output to capture the count in a designated file.

*TAR Tool*: Follow the instructions/code detailed [here](https://github.com/screx/cwl-tutorial?tab=readme-ov-file#tar)
* Detail the input as the compressed file.
* Configure the output as the uncompressed content.

Note: You can use the input files provided in the `exercise/cl-tools/<TOOL>` folder.


In [None]:
### Code for GREP CWL Tool

In [None]:
### Run GREP CWL Tool

In [None]:
### Code for WC CWL Tool

In [None]:
### Run WC CWL Tool

In [None]:
### Code for TAR CWL Tool

In [None]:
### Run TAR CWL Tool

## 8.2 - <font color='#097969'> Creating Your CWL Workflow <font>

Design a workflow that integrates the GREP, WC, and TAR tools.

The workflow should uncompress a file, search it for a string, and count the occurrences, saving the result in count.txt.

Follow the instructions/code detailed [here](https://github.com/screx/cwl-tutorial?tab=readme-ov-file#workflows).

**Workflow Steps**:
* Step 1: Start with the TAR tool to uncompress the input file.
* Step 2: Use the GREP tool to search for the desired string in the uncompressed data.
* Step 3: Apply the WC tool on the output of GREP to count the occurrences and output to count.txt.

**Workflow Execution**:
* Ensure that each tool's output is correctly piped as the input to the subsequent tool.
* Set up the final output file count.txt to store the count from the WC tool.


In [None]:
### Code for CWL workflow

In [None]:
### Run CWL workflow