# Day 6 - Nextflow Basics: Introduction to channels and operators

Today, we will begin exploring Nextflow, the programming language that powers the advanced nf-core pipelines you worked with last week. Your task is to dive into the core concepts and syntax of Nextflow, understanding how this language enables the development of scalable and reproducible workflows.

Nextflow works quite differently from traditional programming languages like Python or Java that you may already be familiar with. To get started, it is essential to understand the foundational concepts that set Nextflow apart.

### 1. Describe the concept of Workflows, Processes and Channels we deal with in Nextflow

Nextflow is built on a dataflow programming paradigm.  

- Workflows: A workflow is a higher-level composition that connects processes via channels and defines the pipeline logic.  

- Processes: A process in Nextflow encapsulates a computational step (script, command, tool) you want to run. It has declared inputs and outputs. Independent processes execute tasks when their inputs are ready. Processes don’t share mutable state; they communicate only via channels. 

- Channels: They are the “wires” through which data flows from producers (process outputs) to consumers (process inputs).

Thus, a pipeline is essentially a graph: processes are nodes, channels are edges defining how data flows.  
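
As a rough plain-Python analogy (not Nextflow code), the graph idea can be sketched with generators standing in for channels and functions standing in for processes. The names `channel_of`, `to_upper` and `add_exclamation` are hypothetical and exist only for this illustration.

```python
from typing import Iterable, Iterator


def channel_of(*items: str) -> Iterator[str]:
    """Stand-in for a channel: emits items one by one."""
    yield from items


def to_upper(inputs: Iterable[str]) -> Iterator[str]:
    """Stand-in for a process: one independent task per input item."""
    for item in inputs:
        yield item.upper()


def add_exclamation(inputs: Iterable[str]) -> Iterator[str]:
    """A second 'process' that consumes the output of the first."""
    for item in inputs:
        yield item + "!"


# Wiring the graph: channel -> to_upper -> add_exclamation
greetings = channel_of("hello", "world")
result = list(add_exclamation(to_upper(greetings)))
print(result)  # ['HELLO!', 'WORLD!']
```

Unlike this sequential sketch, Nextflow can run the tasks of a process in parallel, so the order of emitted items is not guaranteed in general.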

Sources: https://www.nextflow.io/docs/latest/workflow.html, https://medium.com/23andme-engineering/introduction-to-nextflow-4d0e3b6768d1, https://www.nextflow.io/docs/latest/process.html

## Introduction to channels

Please refer to the file $\texttt{channels\_intro.nf}$ for the next exercises, then run the code below with the respective `--step` flag.

In [1]:
# Check example
!nextflow run channels_intro.nf --step 0


 N E X T F L O W   ~  version 25.04.7
Launching `channels_intro.nf` [berserk_sammet] DSL2 - revision: 937e040295
1


In [2]:
# Check example
!nextflow run channels_intro.nf --step 00


 N E X T F L O W   ~  version 25.04.7
Launching `channels_intro.nf` [focused_brattain] DSL2 - revision: 937e040295
1


In [3]:
# check example
!nextflow run channels_intro.nf --step 000


 N E X T F L O W   ~  version 25.04.7
Launching `channels_intro.nf` [exotic_planck] DSL2 - revision: 937e040295
1


In [11]:
# Task 1 - Create a channel that enumerates the numbers from 1 to 10
!nextflow run channels_intro.nf --step 1


 N E X T F L O W   ~  version 25.04.7
Launching `channels_intro.nf` [soggy_rutherford] DSL2 - revision: ee2744220b
1
2
3
4
5
6
7
8
9
10


In [13]:
# Task 2 - Create a channel that gives out the entire alphabet
!nextflow run channels_intro.nf --step 2


 N E X T F L O W   ~  version 25.04.7
Launching `channels_intro.nf` [festering_stonebraker] DSL2 - revision: 327ad19a85
A
B
C
D
E
F
G
H
I
J
K
L
M
N
O
P
Q
R
S
T
U
V
W
X
Y
Z


In [15]:
# Task 3 - Create a channel that includes all files in the "files_dir" directory
!nextflow run channels_intro.nf --step 3


 N E X T F L O W   ~  version 25.04.7
Launching `channels_intro.nf` [golden_moriondo] DSL2 - revision: 327ad19a85
/home/chrissi/BioPrak/computational-workflows-2025/notebooks/day_04/files_dir/file_4.txt
/home/chrissi/BioPrak/computational-workflows-2025/notebooks/day_04/files_dir/fastq_1.fq
/home/chrissi/BioPrak/computational-workflows-2025/notebooks/day_04/files_dir/SRR4_2.fq
/home/chrissi/BioPrak/computational-workflows-2025/notebooks/day_04/files_dir/SRR1_1.fq
/home/chrissi/BioPrak/computational-workflows-2025/notebooks/day_04/files_dir/fastq3_2.fq
/home/chrissi/BioPrak/computational-workflows-2025/notebooks/day_04/files_dir/fastq1_2.fq
/home/chrissi/BioPrak/computational-workflows-2025/notebooks/day_04/files_dir/file_2.txt
/home/chrissi/BioPrak/computational-workflows-2025/notebooks/day_04/files_dir/fastq.fq
/home/chrissi/BioPrak/computational-workflows-2025/notebooks/day_04/files_dir/fastq2_2.fq

In [16]:
# Task 4 - Create a channel that includes all TXT files in the "files_dir" directory
!nextflow run channels_intro.nf --step 4


 N E X T F L O W   ~  version 25.04.7
Launching `channels_intro.nf` [hungry_heisenberg] DSL2 - revision: 327ad19a85
/home/chrissi/BioPrak/computational-workflows-2025/notebooks/day_04/files_dir/file_4.txt
/home/chrissi/BioPrak/computational-workflows-2025/notebooks/day_04/files_dir/file_2.txt
/home/chrissi/BioPrak/computational-workflows-2025/notebooks/day_04/files_dir/file_3.txt
/home/chrissi/BioPrak/computational-workflows-2025/notebooks/day_04/files_dir/file_5.txt
/home/chrissi/BioPrak/computational-workflows-2025/notebooks/day_04/files_dir/file_1.txt


In [17]:
# Task 5 - Create a channel that includes the files "fastq_1.fq" and "fastq_2.fq" in the "files_dir" directory
!nextflow run channels_intro.nf --step 5


 N E X T F L O W   ~  version 25.04.7
Launching `channels_intro.nf` [mighty_gilbert] DSL2 - revision: 327ad19a85
/home/chrissi/BioPrak/computational-workflows-2025/notebooks/day_04/files_dir/fastq_1.fq
/home/chrissi/BioPrak/computational-workflows-2025/notebooks/day_04/files_dir/fastq_2.fq


In [20]:
# Task 6 - go back to the time when you included all files. 
# Are you sure that really ALL files are included? If not, how can you include them?
!nextflow run channels_intro.nf --step 6


 N E X T F L O W   ~  version 25.04.7
Launching `channels_intro.nf` [festering_mestorf] DSL2 - revision: 96f2b422d2
/home/chrissi/BioPrak/computational-workflows-2025/notebooks/day_04/files_dir/file_4.txt
/home/chrissi/BioPrak/computational-workflows-2025/notebooks/day_04/files_dir/fastq_1.fq
/home/chrissi/BioPrak/computational-workflows-2025/notebooks/day_04/files_dir/SRR4_2.fq
/home/chrissi/BioPrak/computational-workflows-2025/notebooks/day_04/files_dir/SRR1_1.fq
/home/chrissi/BioPrak/computational-workflows-2025/notebooks/day_04/files_dir/fastq3_2.fq
/home/chrissi/BioPrak/computational-workflows-2025/notebooks/day_04/files_dir/fastq1_2.fq
/home/chrissi/BioPrak/computational-workflows-2025/notebooks/day_04/files_dir/file_2.txt
/home/chrissi/BioPrak/computational-workflows-2025/notebooks/day_04/files_dir/fastq.fq
/home/chrissi/BioPrak/computational-workflows-2025/notebooks/day_04/files_dir/fastq2_2.

In [22]:
# Task 7 - get all filepairs in the "files_dir" directory
!nextflow run channels_intro.nf --step 7


 N E X T F L O W   ~  version 25.04.7
Launching `channels_intro.nf` [jolly_cray] DSL2 - revision: 0814395a3e
[fastq3, [/home/chrissi/BioPrak/computational-workflows-2025/notebooks/day_04/files_dir/fastq3_1.fq, /home/chrissi/BioPrak/computational-workflows-2025/notebooks/day_04/files_dir/fastq3_2.fq]]
[SRR4, [/home/chrissi/BioPrak/computational-workflows-2025/notebooks/day_04/files_dir/SRR4_1.fq, /home/chrissi/BioPrak/computational-workflows-2025/notebooks/day_04/files_dir/SRR4_2.fq]]
[SRR2, [/home/chrissi/BioPrak/computational-workflows-2025/notebooks/day_04/files_dir/SRR2_1.fq, /home/chrissi/BioPrak/computational-workflows-2025/notebooks/day_04/files_dir/SRR2_2.fq]]
[fastq, [/home/chrissi/BioPrak/computational-workflows-2025/notebooks/day_04/files_dir/fastq_1.fq, /home/chrissi/BioPrak/computational-workflows-2025/notebooks/day_04/files_dir/fastq_2.fq]]
[fastq5, [/home/chrissi/BioPrak/computational-w

## Now that you have a solid understanding of the basic concepts of channels in Nextflow, it’s time to experiment and see how they work in practice.

To do so, Nextflow provides operators, which transform channels and pass data between them.

Please answer the questions in $\texttt{basic\_channel\_operations.nf}$ and run the code here. 

In [23]:
# Task 1 - Extract the first item from the channel
!nextflow run basic_channel_operations.nf --step 1


 N E X T F L O W   ~  version 25.04.7
Launching `basic_channel_operations.nf` [adoring_mclean] DSL2 - revision: c8a597c056
1


In [25]:
# Task 2 - Extract the last item from the channel
!nextflow run basic_channel_operations.nf --step 2


 N E X T F L O W   ~  version 25.04.7
Launching `basic_channel_operations.nf` [golden_linnaeus] DSL2 - revision: 642c12c221
3


In [26]:
# Task 3 - Use an operator to extract the first two items from the channel
!nextflow run basic_channel_operations.nf --step 3


 N E X T F L O W   ~  version 25.04.7
Launching `basic_channel_operations.nf` [prickly_neumann] DSL2 - revision: 642c12c221
1
2


In [27]:
# Task 4 - Return the squared values of the channel
!nextflow run basic_channel_operations.nf --step 4


 N E X T F L O W   ~  version 25.04.7
Launching `basic_channel_operations.nf` [golden_wegener] DSL2 - revision: ad0f1ab6d6
4
9
16


In [None]:
# Task 5 - Remember the previous task where you squared the values of the channel. 
# Now, extract the first two items from the squared channel
!nextflow run basic_channel_operations.nf --step 5


 N E X T F L O W   ~  version 25.04.7
Launching `basic_channel_operations.nf` [silly_payne] DSL2 - revision: ad0f1ab6d6
4
9


In [1]:
# Task 6 - Remember when you used bash to reverse the output? 
# Try to use map and Groovy to reverse the output
!nextflow run basic_channel_operations.nf --step 6


 N E X T F L O W   ~  version 25.04.7
Launching `basic_channel_operations.nf` [amazing_sax] DSL2 - revision: 4927197e8a
[Swift, Taylor]


In [3]:
# Task 7 - Use fromPath to include all fastq files in the "files_dir" directory, then use map to return a pair containing the file name and the file path (Hint: include groovy code)
!nextflow run basic_channel_operations.nf --step 7


 N E X T F L O W   ~  version 25.04.7
Launching `basic_channel_operations.nf` [dreamy_plateau] DSL2 - revision: ec611231c6
[fastq_1.fq, /home/chrissi/BioPrak/computational-workflows-2025/notebooks/day_04/files_dir/fastq_1.fq]
[SRR4_2.fq, /home/chrissi/BioPrak/computational-workflows-2025/notebooks/day_04/files_dir/SRR4_2.fq]
[SRR1_1.fq, /home/chrissi/BioPrak/computational-workflows-2025/notebooks/day_04/files_dir/SRR1_1.fq]
[fastq3_2.fq, /home/chrissi/BioPrak/computational-workflows-2025/notebooks/day_04/files_dir/fastq3_2.fq]
[fastq1_2.fq, /home/chrissi/BioPrak/computational-workflows-2025/notebooks/day_04/files_dir/fastq1_2.fq]
[fastq.fq, /home/chrissi/BioPrak/computational-workflows-2025/notebooks/day_04/files_dir/fastq.fq]
[fastq2_2.fq, /home/chrissi/BioPrak/computational-workflows-2025/notebooks/day_04/files_dir/fastq2_2.fq]
[fastq3_1.fq, /home/chrissi/BioPrak/computational-workflows-2025/notebo

In [8]:
# Task 8 - Combine the items from the two channels into a single channel
!nextflow run basic_channel_operations.nf --step 8


 N E X T F L O W   ~  version 25.04.7
Launching `basic_channel_operations.nf` [wise_sinoussi] DSL2 - revision: f75a2f797e
a
b
c
1
4
5
2
6
3


In [9]:
# Task 9 - Flatten the list in the channel
!nextflow run basic_channel_operations.nf --step 9


 N E X T F L O W   ~  version 25.04.7
Launching `basic_channel_operations.nf` [chaotic_woese] DSL2 - revision: 4b844ea913
1
2
3
4
5
6


In [None]:
# Task 10 - Collect the items of a channel into a list. What kind of channel is the output channel?
!nextflow run basic_channel_operations.nf --step 10

# The collect operator returns a value channel containing a single item, 
# which is a list of all items emitted by the input channel.


 N E X T F L O W   ~  version 25.04.7
Launching `basic_channel_operations.nf` [ridiculous_knuth] DSL2 - revision: 076998424b
[1, 2, 3]


What kind of channel is the output channel?  
The collect operator returns a value channel containing a single item, which is a list of all items emitted by the input channel.

In [12]:
# Task 11 - From the input channel, create lists where each first item in the list of lists is the first item in the output channel, followed by a list of all the items it's paired with
!nextflow run basic_channel_operations.nf --step 11


 N E X T F L O W   ~  version 25.04.7
Launching `basic_channel_operations.nf` [backstabbing_lichterman] DSL2 - revision: ba16f89ca5
[1, [V, f, B]]
[3, [M, G, 33]]
[2, [O, L, E]]


In [16]:
# Task 12 - Create a channel that joins the input to the output channel. What do you notice?
!nextflow run basic_channel_operations.nf --step 12
!nextflow run basic_channel_operations.nf --step 122


 N E X T F L O W   ~  version 25.04.7
Launching `basic_channel_operations.nf` [gloomy_colden] DSL2 - revision: f4a810abb3
[1, V, f]
[3, M, G]
[2, O, L]

 N E X T F L O W   ~  version 25.04.7
Launching `basic_channel_operations.nf` [intergalactic_lumiere] DSL2 - revision: f4a810abb3
[1, V, f]
[3, M, G]
[2, O, L]
[1, B, null]
[3, 33, null]
[2, null, E]


Task 12 - What do you notice compared to Task 11?  
The join operator transforms a sequence of tuples like (K, V1, V2, ..) and (K, W1, W2, ..) into a sequence of tuples like (K, V1, V2, .., W1, W2, ..). By default, the first element of each item is used as the key, and unmatched items are discarded. With the remainder option, unmatched items are emitted at the end, with null in place of the missing values (see the second run above).
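
The join behaviour described here can be modelled in a few lines of plain Python. This is only a sketch of the semantics, not Nextflow's implementation: the function `join` and the sample data are hypothetical, and it assumes that each key occurs at most once per channel and that each tuple carries exactly one value next to the key.

```python
def join(left: list[tuple], right: list[tuple], remainder: bool = False) -> list[tuple]:
    """Model of a key-based join: the first element of each tuple is the key."""
    right_by_key = {item[0]: item[1:] for item in right}
    left_keys = {item[0] for item in left}
    joined, leftovers = [], []
    for key, *values in left:
        if key in right_by_key:
            # Matching key: concatenate the values of both sides.
            joined.append((key, *values, *right_by_key[key]))
        elif remainder:
            # Unmatched left item: pad the missing right value with None.
            leftovers.append((key, *values, None))
    if remainder:
        for key, values in right_by_key.items():
            if key not in left_keys:
                # Unmatched right item: pad the missing left value with None.
                leftovers.append((key, None, *values))
    # Unmatched items are emitted after all matched ones.
    return joined + leftovers


left = [(1, "V"), (2, "O"), (3, "M")]
right = [(1, "f"), (3, "G"), (4, "x")]

print(join(left, right))
# [(1, 'V', 'f'), (3, 'M', 'G')]
print(join(left, right, remainder=True))
# [(1, 'V', 'f'), (3, 'M', 'G'), (2, 'O', None), (4, None, 'x')]
```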

In [17]:
# Task 13 - Split the input channel into two channels, one of all the even numbers and the other of all the odd numbers.
#           Write them to stdout including information about which is which
!nextflow run basic_channel_operations.nf --step 13 -dump-channels


 N E X T F L O W   ~  version 25.04.7
Launching `basic_channel_operations.nf` [astonishing_jepsen] DSL2 - revision: 0e42119425
Odd numbers: [1, 3, 5, 7, 9]
Even numbers: [2, 4, 6, 8, 10]


In [35]:
# Task 14 - Nextflow has the concept of maps. 
# Write the names in the maps in this channel to a file called "names.txt". 
# Each name should be on a new line. 
# Store the file in the "results" directory under the name "names.txt"

!nextflow run basic_channel_operations.nf --step 14

!cat results/names.txt


 N E X T F L O W   ~  version 25.04.7
Launching `basic_channel_operations.nf` [voluminous_lichterman] DSL2 - revision: d188aafa6d
Snape
Albus
Ron
Hagrid
Dobby
Hermione
Harry


## Now that we learned about Channels and Operators to deal with them, let's focus on Processes that make use of these channels.

Please answer the questions in $\texttt{basics\_processes.nf}$ and run the code here. 

In [36]:
# Task 1 - create a process that says Hello World! (add debug true to the process right after initializing to be able to print the output to the console)
!nextflow run basics_processes.nf --step 1


 N E X T F L O W   ~  version 25.04.7
Launching `basics_processes.nf` [curious_cori] DSL2 - revision: df317feb24
executor >  local (1)
[73/b31345] SAYHELLO | 1 of 1 ✔
Hello world!



In [None]:
# Task 2 - create a process that says Hello World! using Python
!nextflow run basics_processes.nf --step 2

# Also possible with shebang in script
# #!/usr/bin/env python as first line in script section


 N E X T F L O W   ~  version 25.04.7
Launching `basics_processes.nf` [sleepy_pauling] DSL2 - revision: dc64163480
executor >  local (1)
[20/95f639] SAYHELLO_PYTHON | 1 of 1 ✔
Hello world!



In [39]:
# Task 3 - create a process that reads in the string "Hello world!" from a channel and write it to command line
!nextflow run basics_processes.nf --step 3


 N E X T F L O W   ~  version 25.04.7
Launching `basics_processes.nf` [admiring_jepsen] DSL2 - revision: c3011366e2
executor >  local (1)
[c8/6ace09] SAYHELLO_PARAM (1) | 1 of 1 ✔
Hello world!



In [None]:
# About the bin directory:
# Nextflow automatically adds a directory named 'bin' located in the same directory as the 
# main script to the PATH environment variable of all processes.
# This allows you to place executable scripts in that directory and invoke them directly from your process scripts.
# This is particularly useful for organizing and reusing scripts across multiple processes within the same workflow.
# You need to make the script executable with: chmod +x scriptname, or chmod a+x scriptname (all users)

In [40]:
# Task 4 - create a process that reads in the string "Hello world!" from a channel and write it to a file. 
!nextflow run basics_processes.nf --step 4


 N E X T F L O W   ~  version 25.04.7
Launching `basics_processes.nf` [agitated_ritchie] DSL2 - revision: 0bfb5aa036
executor >  local (1)
[9c/59848a] SAYHELLO_FILE (1) | 1 of 1 ✔



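The progress line shows the abbreviated task hash [9c/59848a], which is the beginning of the path of that task's work directory under work/ (debug true only forwards the process's stdout to the console).  
Inside that work directory, the greeting.txt can be found.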

In [42]:
# Task 5 - create a process that reads in a string and converts it to uppercase and saves it to a file as output. View the path to the file in the console
!nextflow run basics_processes.nf --step 5


 N E X T F L O W   ~  version 25.04.7
Launching `basics_processes.nf` [prickly_gilbert] DSL2 - revision: 687f23068f
executor >  local (1)
[a3/5b7b15] UPPERCASE (1) | 1 of 1 ✔
/home/chrissi/BioPrak/computational-workflows-2025/notebooks/day_04/work/a3/5b7b154ba0b346969df355cdcb76eb/uppercase.txt



In [43]:
# Task 6 - add another process that reads in the resulting file from UPPERCASE and print the content to the console (debug true).
!nextflow run basics_processes.nf --step 6


 N E X T F L O W   ~  version 25.04.7
Launching `basics_processes.nf` [compassionate_banach] DSL2 - revision: 9e06b5a980
executor >  local (2)
[f4/de3d93] UPPERCASE (1)  | 1 of 1 ✔
[85/fb3883] PRINTUPPER (1) | 1 of 1 ✔
HELLO WORLD!



Compared to all the other runs, what changed in the output here and why?  
So far, our workflow called only one process. In this task, the UPPERCASE process must first write the file before the PRINTUPPER process can read it. This is why two process lines (two [] entries) now appear, and why they are executed one after the other.

In [36]:
# Task 7 - based on the parameter "zip" (see at the head of the file), create a process that zips the file created in the UPPERCASE process either in "zip", "gzip" OR "bzip2" format.
#          Print out the path to the zipped file in the console
!nextflow run basics_processes.nf --step 7


 N E X T F L O W   ~  version 25.04.7
Launching `basics_processes.nf` [prickly_pike] DSL2 - revision: a850a52625
executor >  local (2)
[d7/9198db] SAYHELLO_FILE (1) | 1 of 1 ✔
[ce/0e464e] ZIPFILE (1)       | 1 of 1 ✔
  adding: greeting.txt (stored 0%)
Created file: uppercase.zip

/home/chrissi/BioPrak/computational-workflows-2025/notebooks/day_04/work/ce/0e464e268bbbe48dfa0bd67569154e/uppercase.zip

In [37]:
# Task 8 - Create a process that zips the file created in the UPPERCASE process in "zip", "gzip" AND "bzip2" format. Print out the paths to the zipped files in the console
!nextflow run basics_processes.nf --step 8


 N E X T F L O W   ~  version 25.04.7
Launching `basics_processes.nf` [evil_nobel] DSL2 - revision: 29610a1ec7
executor >  local (2)
[78/efb249] SAYHELLO_FILE (1) | 1 of 1 ✔
[3b/538dde] ZIP_VARIANTS (1)  | 1 of 1 ✔
  adding: greeting.txt (stored 0%)
Created file: uppercase.zip
Created file: uppercase.gz
Created file: uppercase.bz2

[/home/chrissi/BioPrak/computational-workflows-20

In [40]:
# Task 9 - Create a process that reads in a list of names and titles from a channel and writes them to a file.
#          Store the file in the "results" directory under the name "names.tsv"
!nextflow run basics_processes.nf --step 9


 N E X T F L O W   ~  version 25.04.7
Launching `basics_processes.nf` [angry_torricelli] DSL2 - revision: 94a25a768f
executor >  local (7)
[02/27a2b3] WRITETOFILE (4) | 7 of 7 ✔



## Now, let's try some more advanced Operators

Please answer the questions in $\texttt{advanced\_channel\_operations.nf}$ and run the code here. 

To come closer to actual pipelines, we introduce the concept of "meta-maps" which you can imagine as dictionaries that are passed with data via channels containing crucial metadata on the sample. 

Also, we will come back to samplesheets which you should remember from last week.
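
To make the meta-map idea concrete, here is a plain-Python sketch (not Nextflow code) that turns samplesheet rows into `[meta, [fastq_1, fastq_2]]` pairs. The inline CSV is hypothetical sample data, its column names follow the keys that appear in the task outputs further down, and the meta key `id` is an assumed naming choice.

```python
import csv
import io

# Hypothetical samplesheet content (assumption, not the course file itself).
SAMPLESHEET = """sample,fastq_1,fastq_2,strandedness
CONTROL_REP1,fq_1_R1.fastq.gz,fq_1_R2.fastq.gz,auto
CONTROL_REP2,fq_2_R1.fastq.gz,fq_2_R2.fastq.gz,forward
"""


def to_meta_and_reads(row: dict) -> tuple:
    """Split one samplesheet row into (meta-map, [read files])."""
    # The meta-map holds only metadata; file names go into a separate list.
    meta = {"id": row["sample"], "strandedness": row["strandedness"]}
    reads = [row["fastq_1"], row["fastq_2"]]
    return meta, reads


in_ch = [to_meta_and_reads(row) for row in csv.DictReader(io.StringIO(SAMPLESHEET))]
for meta, reads in in_ch:
    print(meta, reads)
```

Keeping the metadata in one dictionary and the files in a separate list means later steps can group or branch on `meta` without touching the file list.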

In [42]:
# Task 1 - Read in the samplesheet.

!nextflow run advanced_channel_operations.nf --step 1


 N E X T F L O W   ~  version 25.04.7
Launching `advanced_channel_operations.nf` [berserk_escher] DSL2 - revision: b4e40b479d


In [43]:
# Task 2 - Read in the samplesheet and create a meta-map with all metadata and another list with the filenames ([[metadata_1 : metadata_1, ...], [fastq_1, fastq_2]]).
#          Set the output to a new channel "in_ch" and view the channel. YOU WILL NEED TO COPY AND PASTE THIS CODE INTO SOME OF THE FOLLOWING TASKS (sorry for that).

!nextflow run advanced_channel_operations.nf --step 2


 N E X T F L O W   ~  version 25.04.7
Launching `advanced_channel_operations.nf` [sharp_panini] DSL2 - revision: cdb95245c4
[[CONTROL_REP1, fq_1_R1.fastq.gz, fq_1_R2.fastq.gz, auto], [null, fq_1_R1.fastq.gz, fq_1_R2.fastq.gz]]
[[CONTROL_REP2, fq_2_R1.fastq.gz, fq_2_R2.fastq.gz, forward], [null, fq_2_R1.fastq.gz, fq_2_R2.fastq.gz]]
[[CONTROL_REP3, fq_3_R1.fastq.gz, fq_3_R2.fastq.gz, reverse], [null, fq_3_R1.fastq.gz, fq_3_R2.fastq.gz]]
[[CONTROL_REP1, fq_4_R1.fastq.gz, fq_4_R2.fastq.gz, auto], [null, fq_4_R1.fastq.gz, fq_4_R2.fastq.gz]]


In [49]:
# Task 3 - Now we assume that we want to handle different "strandedness" values differently. 
#          Split the channel into the right amount of channels and write them all to stdout so that we can understand which is which.

!nextflow run advanced_channel_operations.nf --step 3 -dump-channels


 N E X T F L O W   ~  version 25.04.7
Launching `advanced_channel_operations.nf` [romantic_rutherford] DSL2 - revision: 48f5a33fb2
Metadaten:
	sample=CONTROL_REP1
	fastq_1=fq_1_R1.fastq.gz
	fastq_2=fq_1_R2.fastq.gz
	strandedness=auto
File-Listen:
	[null, fq_1_R1.fastq.gz, fq_1_R2.fastq.gz]
Metadaten:
	sample=CONTROL_REP2
	fastq_1=fq_2_R1.fastq.gz
	fastq_2=fq_2_R2.fastq.gz
	strandedness=forward
File-Listen:
	[null, fq_2_R1.fastq.gz, fq_2_R2.fastq.gz]
Metadaten:
	sample=CONTROL_REP3
	fastq_1=fq_3_R1.fastq.gz
	fastq_2=fq_3_R2.fastq.gz
	strandedness=reverse
File-Listen:
	[null, fq_3_R1.fastq.gz, fq_3_R2.fastq.gz]
Metadaten:
	sample=CONTROL_REP1
	fastq_1=fq_4_R1.fastq.gz
	fastq_2=fq_4_R2.fastq.gz
	strandedness=auto
File-Listen:
	[null, fq_4_R1.fastq.gz, fq_4_R2.fastq.gz]


In [50]:
# Task 4 - Group together all files with the same sample-id and strandedness value.

!nextflow run advanced_channel_operations.nf --step 4


 N E X T F L O W   ~  version 25.04.7
Launching `advanced_channel_operations.nf` [elated_ride] DSL2 - revision: 2e913d0d37
Metadaten:
	sample=CONTROL_REP1
	fastq_1=fq_1_R1.fastq.gz
	fastq_2=fq_1_R2.fastq.gz
	strandedness=auto
File-Listen:
	[null, fq_1_R1.fastq.gz, fq_1_R2.fastq.gz]
Metadaten:
	sample=CONTROL_REP2
	fastq_1=fq_2_R1.fastq.gz
	fastq_2=fq_2_R2.fastq.gz
	strandedness=forward
File-Listen:
	[null, fq_2_R1.fastq.gz, fq_2_R2.fastq.gz]
Metadaten:
	sample=CONTROL_REP3
	fastq_1=fq_3_R1.fastq.gz
	fastq_2=fq_3_R2.fastq.gz
	strandedness=reverse
File-Listen:
	[null, fq_3_R1.fastq.gz, fq_3_R2.fastq.gz]
Metadaten:
	sample=CONTROL_REP1
	fastq_1=fq_4_R1.fastq.gz
	fastq_2=fq_4_R2.fastq.gz
	strandedness=auto
File-Listen:
	[null, fq_4_R1.fastq.gz, fq_4_R2.fastq.gz]


## It's finally time to link processes and channels with each other

Please go to the file $\texttt{link\_p\_c.nf}$

In [None]:
!nextflow run link_p_c.nf

# Later on, an if-clause with params.step == 1 was added, so to rerun use:
# !nextflow run link_p_c.nf --step 1


 N E X T F L O W   ~  version 25.04.7
Launching `link_p_c.nf` [small_solvay] DSL2 - revision: 73b3894485
[-        ] SPLITLETTERS   -
[-        ] CONVERTTOUPPER -
Input channel: [[block_size:4], Hello World]
Input channel: [[block_size:3], Computational Workflows]
executor >  local (2)
[fd/46f959] SPLITLETTERS (1) | 1 of 2
[-        ] CONVERTTOUPPER   -
Input channel: [[block_size:

### Give a list with the paths to the chunk files

In [62]:
# Give a list of chunk files by listing all files in the results directory that start with "upper_chunk_", using Python and without Nextflow

import os
import glob

# Directory that contains the resulting files
results_dir = "/home/chrissi/BioPrak/computational-workflows-2025/notebooks/day_04/results"

# List all files whose names start with "upper_chunk_"
chunk_files = glob.glob(os.path.join(results_dir, "upper_chunk_*"))

# Optional: absolute paths
chunk_files_abs = [os.path.abspath(f) for f in chunk_files]

# Output
print("Found chunk files:")
for f in chunk_files_abs:
    print(f)

Found chunk files:
/home/chrissi/BioPrak/computational-workflows-2025/notebooks/day_04/results/upper_chunk_007.txt
/home/chrissi/BioPrak/computational-workflows-2025/notebooks/day_04/results/upper_chunk_000.txt
/home/chrissi/BioPrak/computational-workflows-2025/notebooks/day_04/results/upper_chunk_004.txt
/home/chrissi/BioPrak/computational-workflows-2025/notebooks/day_04/results/upper_chunk_006.txt
/home/chrissi/BioPrak/computational-workflows-2025/notebooks/day_04/results/upper_chunk_005.txt
/home/chrissi/BioPrak/computational-workflows-2025/notebooks/day_04/results/upper_chunk_001.txt
/home/chrissi/BioPrak/computational-workflows-2025/notebooks/day_04/results/upper_chunk_003.txt
/home/chrissi/BioPrak/computational-workflows-2025/notebooks/day_04/results/upper_chunk_002.txt


### Why was CONVERTTOUPPER run so often?

This behavior is expected, because Nextflow runs one task per input tuple.  

split_ch.flatMap { meta, files -> 
    files.collect { file -> tuple(meta, file) } 
}  
- split_ch is the output of SPLITLETTERS, which contains several chunk files per sample (depending on block_size and file size).
- flatMap creates one input tuple for CONVERTTOUPPER per chunk file.
- CONVERTTOUPPER writes a new output file to results/ for each of these files.  
Result: as many output files as there are chunks in total.  

Suppose there are 2 samples, each split into 4 chunks:  
split_ch contains 2 × 4 = 8 chunk files.  
flatMap turns each file into its own input for CONVERTTOUPPER.  
CONVERTTOUPPER therefore creates 8 files in results/.  

If fewer output files are wanted, one would have to either:
- Choose a larger block_size so that fewer chunks are produced.
- Merge the chunks again after SPLITLETTERS (e.g. with collect()) before feeding them into CONVERTTOUPPER.
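
The counting argument can be reproduced in plain Python. This is a sketch of what flatMap does to the channel, not Nextflow code; the chunk file names and the contents of `split_ch` are hypothetical placeholders.

```python
# Hypothetical stand-in for split_ch: one (meta, [chunk files]) tuple per sample.
split_ch = [
    ({"block_size": 4}, ["chunk_aa", "chunk_ab", "chunk_ac", "chunk_ad"]),
    ({"block_size": 3}, ["chunk_aa", "chunk_ab", "chunk_ac", "chunk_ad"]),
]

# flatMap: expand every (meta, files) tuple into one (meta, file) tuple per file.
per_chunk = [(meta, f) for meta, files in split_ch for f in files]

print(len(split_ch))   # 2 items before flatMap
print(len(per_chunk))  # 8 items after flatMap, so 8 CONVERTTOUPPER tasks
```

Each element of `per_chunk` corresponds to one task, which matches the eight `upper_chunk_*` files listed above.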

In [64]:
!nextflow run link_p_c.nf --step 2


 N E X T F L O W   ~  version 25.04.7
Launching `link_p_c.nf` [condescending_salas] DSL2 - revision: 5bf26cc739
executor >  local (4)
[6b/e26700] SPLITLETTERS (3) | 0 of 4
[-        ] CONVERTTOUPPER   -
ERROR ~ Error executing process > 'SPLITLETTERS (1)'

Caused by:
  Process `SPLITLETTERS (1)` terminated with an error exit status (1)


Command executed:

  echo "Processing: [null, fq_1_R1.fastq.gz, fq_1_R2.fastq.gz] with block_size: null"
  echo "[null, fq_1_R1.fast

I was not able to fix this error in the remaining time.