

# Introduction to Shell Scripting

## Section 1: Shell Scripting - Variables, Arrays, and Expressions

1. **Introduction to Variables in Bash**
   - Explanation and examples of declaring and using variables.
   - Interactive cells where students can declare their own variables and print them.

2. **Introduction to Bash Functions**
   - Explanation and examples of declaring and using functions.
   - Interactive cells where students can declare their own functions and call them.

3. **Arrays in Bash**
   - Demonstrating how to create and access arrays.
   - Interactive examples for students to modify and access array elements.

4. **Basic Expressions and Operations**
   - Arithmetic operations and string manipulations.
   - Hands-on cells for students to try out expressions.

## Section 2: Flow Control and Repetition

1. **Conditional Statements**
   - Explanation of `if-else` and `case` statements with examples.
   - Interactive cells for students to write their own conditional statements.

2. **Loops in Bash**
   - Demonstrating `for`, `while`, and `until` loops with practical examples.
   - Exercises for students to write loops for specific tasks.

## Hands-On Exercise

- A set of exercises that combine variables, arrays, conditional statements, and loops.
- Example: A script that reads data from a file, processes it based on certain conditions, and outputs results.

## Assignment: Basic Bioinformatics Workflow

- Task: Write a Bash script for basic file manipulations in bioinformatics.
- Example assignment: A script that filters a dataset based on certain criteria, counts the number of occurrences of specific patterns, or reformats data files.



# Section 1: Shell Scripting - Variables, Arrays, and Expressions


## 1. **Introduction to Variables in Bash**
   - Explanation and examples of declaring and using variables.
   - Interactive cells where students can declare their own variables and print them.

## Variable naming conventions

- Variable names can contain only letters, numbers, and underscores; spaces and other characters such as `-` are not allowed.
- Variable names must start with a letter or an underscore, never with a number.
- Variable names are case-sensitive: `sample` and `SAMPLE` are two different variables.
- Variable names should be descriptive, without being too long.
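A minimal sketch of these rules; the variable names are made up for illustration. It also shows a rule that is easy to trip over: an assignment must not have spaces around `=`.

```bash
sample_name="E_coli_K12"   # letters, digits, and underscores are allowed
_tmp_dir="/tmp"            # a leading underscore is allowed
read_count=42
Read_count=7               # a different variable: names are case-sensitive
echo "$sample_name $read_count $Read_count"

# The following lines are NOT valid assignments (left commented out):
# read_count = 42   -> runs a command named read_count with the arguments "=" and "42"
# 2nd_sample=x      -> a name cannot start with a digit, so Bash tries to run it as a command
# my-var=x          -> a hyphen is not allowed in a name
```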

## Global vs. Local Variables
- Global variables are available throughout the script, while local variables are declared inside a function with the `local` keyword and exist only while that function runs (functions called from it can also see them).
- A variable assigned inside a function *without* `local` is global by default.
- Global variables can be accessed within a function, but local variables cannot be accessed outside the function.
- Global variables are usually declared at the beginning of the script, while local variables are declared within functions.
- By convention, global variables are written in uppercase, while local variables are written in lowercase.
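A small sketch of the difference, with hypothetical names. Note that a variable assigned inside a function without the `local` keyword is global, so it "leaks" out of the function.

```bash
GREETING="global hello"    # global: visible everywhere in the script

show_scope() {
    local greeting="local hello"   # local: exists only while show_scope runs
    leaked="set without local"     # no 'local' keyword, so this becomes global
    echo "inside:  $GREETING / $greeting"
}

show_scope
echo "outside: $GREETING / ${greeting:-<unset>}"
echo "leaked:  $leaked"
```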




In [1]:
# Declaring variables (no spaces are allowed around '=')

my_var="Hello World"
new_var=10

# The backslash escapes '#', so echo prints it instead of treating it as the start of a comment

echo \#Concatenating variables
echo $my_var $new_var

echo \#Concatenating variables with strings
echo $my_var "I am a new string"

echo Make new variable from existing variables
new_var2=$my_var$new_var
echo $new_var2



#Concatenating variables
Hello World 10


#Concatenating variables with strings
Hello World I am a new string
Make new variable from existing variables
Hello World10


In [2]:
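echo \# Using curly braces with variables

echo ${my_var}2

echo \# Alternative to curly braces: end the name with a closing quote
echo "$my_var"2

echo \# Problems when not using curly braces
# Bash looks up a variable named my_var2, which is unset, so an empty line is printed
echo $my_var2


# Using curly braces with variables
Hello World2


# Alternative to curly braces: end the name with a closing quote
Hello World2
# Problems when not using curly braces
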


## Common Environment Variables

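The following environment variables are commonly used in Bash scripts. They are set by the operating system, the login process, or Bash itself, and are conventionally written in uppercase. They can be modified by the user. Strictly speaking, some of them (such as `PS1` and the `HIST*` variables) are shell variables that are usually not exported to child processes.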

- `HOME`: The home directory of the current user.
- `PATH`: A list of directories where the shell looks for commands.
- `PWD`: The current working directory.
- `SHELL`: The path to the current shell.
- `USER`: The username of the current user.
- `HOSTNAME`: The hostname of the current machine.
- `TERM`: The terminal type.
- `EDITOR`: The default text editor.
- `LANG`: The default locale (language and character encoding).
- `MAIL`: The location of the mailbox.
- `PS1`: The primary prompt string.
- `PS2`: The secondary prompt string.
- `HISTSIZE`: The number of commands to remember in the command history.
- `HISTFILESIZE`: The maximum number of lines contained in the history file.
- `HISTCONTROL`: Determines what commands are saved in the history list.
- `HISTIGNORE`: A list of patterns to ignore when saving the history list.
- `HISTTIMEFORMAT`: The format for the date/time stamp associated with each history entry.
- `HISTFILE`: The name of the file in which command history is saved.




In [None]:
echo \# Print the PATH environment variable
echo $PATH

In [None]:

echo \# Print all environment variables currently set
printenv
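A small sketch of how a variable becomes part of the environment that child processes inherit. `MY_SETTING` and the `$HOME/my_tools` directory are hypothetical names used only for illustration; the changes last only for the current shell session.

```bash
# A plain shell variable is not passed to child processes...
MY_SETTING="plain shell variable"
bash -c 'echo "child sees: ${MY_SETTING:-<nothing>}"'

# ...until it is exported into the environment
export MY_SETTING
bash -c 'echo "child sees: ${MY_SETTING:-<nothing>}"'

# Prepend a hypothetical tools directory to PATH for this session only
export PATH="$HOME/my_tools:$PATH"
echo "first PATH entry: ${PATH%%:*}"
```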


## Variable Expansion

This is a very useful technique for manipulating variables in Bash. It lets you supply default values, extract substrings, and replace patterns without calling external programs. The forms below handle unset or empty variables; substring and pattern operations are shown in the examples that follow.

- `${var}`: The value of the variable `var`.
- `${var:-word}`: If `var` is unset or null, the expansion of `word` is substituted. Otherwise, the value of `var` is substituted.
- `${var:=word}`: If `var` is unset or null, the expansion of `word` is assigned to `var`. The value of `var` is then substituted. Positional parameters and special parameters may not be assigned to in this way.
- `${var:?message}`: If `var` is null or unset, the expansion of `message` is written to the standard error and the shell, if it is not interactive, exits. Otherwise, the value of `var` is substituted.
- `${var:+word}`: If `var` is null or unset, nothing is substituted. Otherwise, the expansion of `word` is substituted.
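A short sketch of the four forms above, using a hypothetical variable `name`. The `${var:?message}` form is run inside a subshell so that, if it stops the shell, only the subshell exits.

```bash
unset name
echo "1: ${name:-guest}"    # name is unset: the default is printed, name stays unset
echo "2: ${name:+is set}"   # name is unset: nothing is printed after "2: "
echo "3: ${name:=guest}"    # the default is assigned to name, then printed
echo "4: $name"             # name now holds "guest"
echo "5: ${name:+is set}"   # name is set: the alternative word is printed

# ${var:?message} stops a non-interactive script when var is unset or empty.
# Running it in a subshell ( ... ) lets this demo continue afterwards.
( : "${missing_var:?missing_var must be set}" ) 2>/dev/null || echo "6: missing_var was not set"
```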



## Examples of Variable Expansion


In [5]:
echo \# Declare a variable
var="Hello World"


# Declare a variable


In [6]:

echo \# Print the value of the variable
echo $var

echo \# Print the length of the variable
echo ${#var}

echo \# Print the substring from offset 2 to the end, counting from 0
echo ${var:2}


# Print the value of the variable
Hello World


# Print the length of the variable
11
# Print the substring from offset 2 to the end, counting from 0
llo World


In [7]:
echo \# Print 7 characters starting at offset 2
echo ${var:2:7}


# Print 7 characters starting at offset 2
llo Wor


In [8]:
echo \# Replace the first occurrence of a pattern
echo ${var/Hello/Goodbye}

echo \# Replace all occurrences of a pattern
echo ${var//l/L}


# Replace the first occurrence of a pattern
Goodbye World


# Replace all occurrences of a pattern
HeLLo WorLd


In [9]:
echo ${my_var/Hello/Goodbye}

echo \# Variable expansion with multiple replacements

echo ${my_var//l/L}

Goodbye World
# Variable expansion with multiple replacements


HeLLo WorLd


## Introduction to bash functions

- Functions are a way to group commands for later execution using a single name for the group.
- Functions are declared using the following syntax:

```bash
function_name () {
    commands
}
```

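- To call a function, write its name followed by any arguments, separated by spaces. Unlike many other languages, no parentheses are used in the call.
- A function must be defined before it is called: Bash reads the script from top to bottom, so calling a function on a line above its definition fails with `command not found`.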
- Functions can be declared in the global scope or inside other functions.
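A minimal sketch of defining and calling a function; `greet`, `name`, and `message` are hypothetical names. Because Bash functions return only an exit status, the usual way to hand back a value is to print it and capture the output with command substitution.

```bash
# Define the function first: Bash must have read the definition
# before the line that calls it is executed
greet() {
    local name="$1"            # first argument passed to the function
    echo "Hello, ${name}."
}

# Call it by name with space-separated arguments (no parentheses)
greet "World"

# Capture the function's output in a variable with command substitution
message=$(greet "Bash")
echo "$message"
```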



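The next two cells return to variable expansion. `${var#pattern}` and `${var##pattern}` remove the shortest and the longest matching prefix of the value, while `${var%pattern}` and `${var%%pattern}` remove the shortest and the longest matching suffix.

In [10]: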
my_path="/home/username/Documents/Scripts"

echo ${my_path#/*/}
echo ${my_path##/*/}

username/Documents/Scripts


Scripts


In [11]:
my_path="/home/username/Documents/Scripts"

echo one ${my_path%/*/*s}
echo two ${my_path%%/*/*s}

one /home/username
two
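The same prefix and suffix removal is handy for file names in bioinformatics pipelines. A sketch with a hypothetical FASTQ path; for a simple path like this one, the first two lines give the same result as the `basename` and `dirname` commands without starting a new process.

```bash
file="/data/run1/sample_R1.fastq.gz"    # hypothetical path

echo "${file##*/}"    # remove the longest prefix ending in '/':  sample_R1.fastq.gz
echo "${file%/*}"     # remove the shortest suffix starting with '/':  /data/run1
echo "${file%%.*}"    # remove the longest suffix starting with '.':  /data/run1/sample_R1

name="${file##*/}"
echo "${name%.gz}"    # remove a fixed suffix:  sample_R1.fastq
```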


In [50]:

# Global variables holding ANSI color codes for terminal text
red=31
green=32
yellow=33
blue=34
purple=35
cyan=36
white=37


# Wrapper function for echo command with color
function info() {
    # $1 is the color code
    # The rest of the arguments are the strings to print
    local color=$1
    shift
    # $* joins the remaining arguments into one string
    echo -e "\e[1;${color}m$*\e[0m"
}


In [13]:

# Example usage
info $red "Hello World"
info $purple "Hello World"

echo \$0: $0

[1;31mHello World[0m


[1;35mHello World[0m
$0: /usr/bin/bash


## Special variables in bash functions

Inside a function, the positional parameters (`$1`, `$2`, ...) refer to the function's arguments; at the top level of a script they refer to the script's arguments.

- `$0`: The name of the script (or of the shell in an interactive session); it does not change inside a function.
- `$1`, `$2`, `$3`, ...: The first, second, third, ... argument. From the tenth argument on, use braces: `${10}`.
- `$@`: All arguments. When quoted as `"$@"`, each argument stays a separate word.
- `$*`: All arguments. When quoted as `"$*"`, they are joined into a single word.
- `$#`: The number of arguments.
- `$?`: The exit status of the most recently executed command.
- `$$`: The process ID of the current shell.
- `$!`: The process ID of the most recent background command.
- `$-`: The option flags currently set in the shell.
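A sketch that prints several of these special variables from inside a hypothetical function `show_args`. Passing `"one two"` as a single quoted argument shows how `"$@"` keeps arguments separate while `$*` joins them.

```bash
show_args() {
    echo "number of arguments: $#"
    echo "first argument: $1"
    # "$@" keeps each argument as a separate word
    for arg in "$@"; do
        echo "  item: $arg"
    done
    # $* joins all arguments into one string, separated by spaces
    echo "joined: $*"
}

show_args "one two" three

# $? holds the exit status of the most recent command (0 means success)
false || echo "exit status of false: $?"
```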


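### Exercise 1:

Write a function that takes two numbers as arguments and prints their sum. (Bash functions can only `return` an exit status between 0 and 255, so use `echo` to output the result.)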


## 3. Arrays in Bash

- Arrays are a way to store multiple values in a single variable.
- Arrays are declared using the following syntax:

```bash 
array_name=(value1 value2 ... valueN)
```

- To access an element in an array, use the following syntax:

```bash
${array_name[index]}
```

- To access all elements in an array, use the following syntax. Quoting it keeps elements that contain spaces intact:

```bash
"${array_name[@]}"
```

- To get the length of an array, use the following syntax:

```bash
${#array_name[@]}
```

- To get the length of an element in an array, use the following syntax:

```bash
${#array_name[index]}
```

- To add an element to an array, use the following syntax:

```bash
array_name+=(value)
```

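- To remove an element from an array, use the following syntax. The quotes stop the shell from treating `[index]` as a filename pattern. The other elements keep their indices, so this leaves a gap:

```bash
unset 'array_name[index]'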
```

- To remove all elements from an array, use the following syntax:

```bash
unset array_name[@]
```

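- To remove an element from an array by value, the following pattern substitution is often used. Use it with care: it deletes the matching *text* from every element instead of comparing whole elements (removing `apple` also turns `pineapple` into `pine`), and because the expansion is unquoted, elements that contain spaces are split into several elements:

```bash
array_name=(${array_name[@]/value})
```

- To remove an element by index and renumber the remaining elements so that the array has no gaps, use the following syntax:

```bash
array_name=("${array_name[@]:0:index}" "${array_name[@]:index+1}")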
```
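A safer way to remove elements by value is to build a new array that keeps every element not exactly equal to the value. This is a sketch with hypothetical data; the pattern-substitution trick would mangle `"green apple"` and `pineapple` here, while this loop leaves them intact.

```bash
fruits=(apple banana "green apple" pineapple)
to_remove="apple"

# Build a new array, keeping every element that is not exactly "apple"
kept=()
for item in "${fruits[@]}"; do
    if [[ "$item" != "$to_remove" ]]; then
        kept+=("$item")
    fi
done
fruits=("${kept[@]}")

printf '%s\n' "${fruits[@]}"
```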


Common use cases for arrays in bash scripts:

- Storing the output of a command in an array.
- Holding the arguments passed to a function.
- Holding a list of filenames or directories.
- Holding a list of strings to be used in a loop.
- Holding a list of numbers to be used in a loop.


In [14]:
# Useful common examples of using arrays in bash

# Declare an array
my_array=(one two three four five)
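A sketch that applies the array syntax listed above to `my_array`. It shows that `unset` leaves a gap in the indices and how re-assigning the array from its own quoted expansion renumbers it.

```bash
# Declare an array
my_array=(one two three four five)

echo "${my_array[0]}"        # first element (indices start at 0): one
echo "${my_array[@]}"        # all elements: one two three four five
echo "${#my_array[@]}"       # number of elements: 5
echo "${#my_array[2]}"       # length of the element "three": 5

my_array+=(six)              # append an element
unset 'my_array[1]'          # remove "two"; this leaves a gap at index 1
echo "${!my_array[@]}"       # remaining indices: 0 2 3 4 5
my_array=("${my_array[@]}")  # re-pack the array so the indices are 0..n-1 again
echo "${!my_array[@]}"       # 0 1 2 3 4
echo "${my_array[@]}"        # one three four five six
```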


## Basic Math Expressions and Operations

- Arithmetic operations can be performed using the following syntax. Bash arithmetic works with integers only, so `/` discards any remainder (`$((7 / 2))` gives `3`):

```bash
$((expression))
```

- The following operators can be used in arithmetic expressions:

| Operator | Description |
|----------|-------------|
| `+`      | Addition    |
| `-`      | Subtraction |
| `*`      | Multiplication |
| `/`      | Division |
| `%`      | Modulus |
| `**`     | Exponentiation |
| `++`     | Increment |
| `--`     | Decrement |
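A short sketch of the operators in the table, using two made-up values. Because Bash arithmetic is integer-only, the last line hands a decimal division to `awk`, which is one common workaround (others exist, such as `bc`).

```bash
a=7
b=2
echo "sum:        $((a + b))"
echo "difference: $((a - b))"
echo "product:    $((a * b))"
echo "quotient:   $((a / b))"     # integer division: the remainder is discarded
echo "remainder:  $((a % b))"
echo "power:      $((a ** b))"

count=0
((++count))                      # increment; (( )) evaluates without printing anything
((count += 5))
echo "count:      $count"

# Bash has no floating point, so use another tool such as awk for decimals
awk -v x="$a" -v y="$b" 'BEGIN { printf "decimal:    %.2f\n", x / y }'
```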



# Section 2: Flow Control and Repetition


## Conditional Statements

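- Conditional statements are used to execute a block of code only if a condition is true. In Bash the condition is a command: it counts as true when the command exits with status 0. The test command `[ ... ]` is the most common choice.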
- The following syntax is used to write conditional statements in Bash:

```bash
if condition
then
    commands
fi
```

- The `if` statement can be followed by an optional `elif` statement and an optional `else` statement.
- The `elif` statement is used to check another condition if the previous condition is false.
- The `else` statement is used to execute a block of code if all previous conditions are false.
- The `elif` and `else` statements must be preceded by an `if` statement.

```bash
if condition1
then
    commands
elif condition2
then
    commands
else
    commands
fi
```
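In Bash the condition of an `if` can be any command; the branch taken depends on its exit status (0 means true). A sketch using `grep -q` on a small hypothetical FASTA file:

```bash
mkdir -p test_dir
printf '>seq1\nACGT\n' > test_dir/example.fasta   # small hypothetical FASTA file

# grep -q prints nothing; only its exit status (0 = a match was found) is used by if
if grep -q '^>' test_dir/example.fasta; then
    echo "example.fasta contains at least one FASTA header"
else
    echo "example.fasta has no FASTA headers"
fi
```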


In [15]:
# Example of conditional statement
# Quote the variable: unquoted, "Hello World" splits into two words
# and [ fails with "too many arguments"

if [ "$my_var" == "Hello World" ]; then
    echo "The condition is true"
else
    echo "The condition is false"
fi

The condition is true


In [16]:
# Example using if elif else, with the variable quoted

if [ "$my_var" == "Hello World" ]; then
    echo "The condition is true"
    echo "You said hello"
elif [ "$my_var" == "Goodbye World" ]; then
    echo "You said goodbye"
else
    echo "The condition is false"
fi

The condition is true
You said hello


### File and directory tests in bash
- `-e`: True if the file exists.
- `-f`: True if the file exists and is a regular file.
- `-d`: True if the file exists and is a directory.
- `-s`: True if the file exists and is not empty.
- `-r`: True if the file exists and is readable.
- `-w`: True if the file exists and is writable.
- `-L`: True if the file exists and is a symbolic link (`-h` is equivalent).
- `-x`: True if the file exists and is executable.


In [17]:
# Some preparation for testing files and directories

mkdir -p test_dir
touch test_dir/test_file.txt
echo "Hello World" > test_dir/non_empty_file.txt
echo "echo Hello World" >test_dir/test_script.sh
chmod +x test_dir/test_script.sh


In [18]:
# Example of testing if a file exists

if [ -e test_dir/test_file.txt ]; then
    echo "The file exists"
else
    echo "The file does not exist"
fi

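# Example of testing if a file is a regular file (not a directory or device).
# Note: -f follows symbolic links, so a link to a regular file also passes.

if [ -f test_dir/test_file.txt ]; then
    echo "The file is a regular file"
else
    echo "The file is not a regular file"
fi

# A directory is not a regular file, so this test fails
if [ -f test_dir ]; then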
    echo "The file is a regular file"
else
    echo "The file is not a regular file"
fi

# Example of testing if a file is a directory

if [ -d test_dir ]; then
    echo "The file is a directory"
else
    echo "The file is not a directory"
fi



The file exists
The file is a regular file
The file is not a regular file
The file is a directory
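A sketch of the symbolic-link test. It creates a hypothetical link `test_link.txt` next to `test_file.txt` and shows that `-L` detects the link, while `-f` also succeeds because it follows the link to the regular file.

```bash
mkdir -p test_dir
touch test_dir/test_file.txt
ln -sf test_file.txt test_dir/test_link.txt   # relative link inside test_dir

[ -L test_dir/test_link.txt ] && echo "test_link.txt is a symbolic link"
[ -f test_dir/test_link.txt ] && echo "test_link.txt also passes -f, because -f follows the link"
[ -L test_dir/test_file.txt ] || echo "test_file.txt is not a symbolic link"
```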


In [19]:

# A function to test and print the result of file and directory tests
# Function name: test_file
# Arguments: $1 is the file or directory to test
# Inside the function assign input=$1 for clarity
# More improvements could be made to this function.

function test_file() {
    input=$1
    if [ -e $input ]; then
        echo "$input exists"
    else
        echo "$input does not exist"
    fi

    if [ -f $input ]; then
        echo "$input is a regular file"
    else
        echo "$input is not a regular file"
    fi

    if [ -d $input ]; then
        echo "$input is a directory"
    else
        echo "$input is not a directory"
    fi

    if [ -r $input ]; then
        echo "$input is readable"
    else
        echo "$input is not readable"
    fi

    if [ -w $input ]; then
        echo "$input is writable"
    else
        echo "$input is not writable"
    fi

    # Combine two or more tests with logical AND
    
    if [ -f $input ] && [ -r $input ]; then
        echo "$input is a regular file and readable"
    else
        echo "$input is not a regular file or not readable"
    fi

}

# Warning: The above function is buggy. Can you fix it?
# Problem: 
# - The function does not work with files or directories with spaces in the name
# - What if the user does not pass an argument to the function?
# - What if user passes an empty string as an argument to the function?



### Numeric tests in bash
- `-eq`: True if the two operands are equal.
- `-ne`: True if the two operands are not equal.
- `-lt`: True if the first operand is less than the second operand.
- `-le`: True if the first operand is less than or equal to the second operand.
- `-gt`: True if the first operand is greater than the second operand.
- `-ge`: True if the first operand is greater than or equal to the second operand.


In [20]:
# Example of numeric comparison

num_a=10
num_b=20
num=0
empty_num=""

if [ $num_a -lt $num_b ]; then
    echo "$num_a is less than $num_b"
else
    echo "$num_a is not less than $num_b"
fi

# It is a good idea to use double quotes around variables
# to avoid problems with spaces in the variable value, 
# or if the variable is empty

if [ "$num_a" -gt "$num_b" ]; then
    echo "$num_a is greater than $num_b"
elif [ "$num_a" -lt "$num_b" ]; then
    echo "$num_a is less than $num_b"
else
    echo "$num_a is equal to $num_b"
fi


10 is less than 20
10 is less than 20


In [21]:

# Buggy code with empty string

if [ $empty_num -gt $num_b ]; then
    echo "$empty_num is greater than $num_b"
elif [ $empty_num -lt $num_b ]; then
    echo "$empty_num is less than $num_b"
else
    echo "$empty_num is equal to $num_b"
fi


bash: [: -gt: unary operator expected
bash: [: -lt: unary operator expected
 is equal to 20


In [22]:
# Exercise: Fix the above code to work with empty strings



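### Combining logical operators
- `-a`: Inside `[ ]`, returns true if both operands are true.
- `-o`: Inside `[ ]`, returns true if either operand is true.
- `!`: Inverts the result of the following expression.
- `-a` and `-o` are considered obsolescent by POSIX; joining separate tests with `&&` and `||` (for example `[ ... ] && [ ... ]`) is generally more reliable.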

In [33]:
# Example of using logical AND with `-a` flag and single square brackets

if [ "$num_a" -gt "$num" -a "$num_a" -lt "$num_b" ]; then
    echo "$num_a is greater than $num and less than $num_b"
else
    echo "$num_a is not greater than $num or not less than $num_b"
fi


info $green "# Combine two tests with logical OR"

if [ "$num_a" -gt "$num_b" ] || [ "$num_a" -eq "$num_b" ]; then
    echo "$num_a is greater than or equal to $num_b"
else
    echo "$num_a is less than $num_b"
fi



10 is greater than 0 and less than 20
[1;32m# Combine two tests with logical OR[0m
10 is less than 20


In [36]:

info $green "# Check if number is positive and even"
num=10

if [ "$num" -gt 0 ] && [ "$((num % 2))" -eq 0 ]; then
    echo "$num is positive and even"
else
    echo "$num is not positive and even"
fi

# Rewrite above example using double square brackets
# This is not POSIX compliant, but it works in `bash`

if [[ "$num" -gt 0 && "$((num % 2))" -eq 0 ]]; then
    echo "$num is positive and even"
else
    echo "$num is not positive and even"
fi



[1;32m# Check if number is positive and even[0m


10 is positive and even
10 is positive and even


In [38]:

info $green "Another way with '-a' flag"

num=-10
info $green "# This works as expected"

if [ "$num" -gt 0 -a "$((num % 2))" -eq 0 ]; then
    echo "$num is positive and even"
else
    echo "$num is not positive and even"
fi

info $red "This is dangerous. It seems to work but the logic is wrong."
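num=-10
# Inside [ ], '>' is a shell redirection, not a comparison: it creates a file named 0,
# and what remains only checks that "$num" is non-empty and that "$((num % 2))" equals 0
if [ "$num" > 0 -a "$((num % 2))" = 0 ]; then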
    echo "$num is positive and even"
else
    echo "$num is not positive and even"
fi

[1;32mAnother way with '-a' flag[0m


[1;32m# This works as expected[0m
-10 is not positive and even
[1;31mThis is dangerous. It seems to work but the logic is wrong.[0m
-10 is positive and even


In [39]:

info $green "# Alternative syntax for logical OR"
info $green "# with double square brackets"

if [[ "$num_a" -gt "$num_b" || "$num_a" -eq "$num_b" ]]; then
    echo "$num_a is greater than or equal to $num_b"
else
    echo "$num_a is less than $num_b"
fi

# Note: [[ ]] is not POSIX compliant.
# It may fail in a plain POSIX `sh` such as `dash`,
# but it works in `bash`.


[1;32m# Alternative syntax for logical OR[0m
[1;32m# with double square brackets[0m
10 is less than 20


In [40]:

# Fix the following code:
if [[ "$num_a" -gt "$num_b"  -o "$num_a" -eq "$num_b" ]]; then
    echo "$num_a is greater than or equal to $num_b"
else
    echo "$num_a is less than $num_b"
fi

bash: syntax error in conditional expression
bash: syntax error near `-o'
10 is greater than or equal to 20
bash: syntax error near unexpected token `else'
10 is less than 20
bash: syntax error near unexpected token `fi'




### String tests in bash
- `=`: True if the strings are equal.
- `!=`: True if the strings are not equal.
- `-z`: True if the length of the string is zero.
- `-n`: True if the length of the string is non-zero.



In [None]:
# Exercise: String tests. Write your own code to test strings based on the information above and the examples of numeric tests


## Branching with the case statement

- The `case` statement executes the block of code belonging to the first pattern that matches a value. Patterns use the same wildcards as filename globbing (`*`, `?`, `[...]`).
- The following syntax is used to write `case` statements in Bash:

```bash
case expression in
    pattern1)
        commands
        ;;
    pattern2)
        commands
        ;;
    pattern3)
        commands
        ;;
    *)
        commands
        ;;
esac
```
Compared with a long chain of `if`/`elif` tests on the same variable, this makes the code more readable and easier to maintain.

In [41]:
# Example of using case statement

fruit="apple"

case $fruit in
    apple)
        echo "This is an apple"
        ;;
    banana)
        echo "This is a banana"
        ;;
    orange)
        echo "This is an orange"
        ;;
    *)
        echo "I do not know what this is"
        ;;
esac


This is an apple


In [45]:
# Another example of using a case statement: testing whether a string variable starts with a certain character
# Wildcard matching is allowed. And multiple patterns can be tested with the same code block

fruit="Bapple"

case $fruit in
    a*|A*)
        echo "This fruit starts with a"
        ;;

    b*|B*)
        echo "This fruit starts with b"
        ;;

    # Note: The order of patterns matters
    # Start with b or B, ends with e or E
    b*e|B*e)
        echo "This fruit starts with b and ends with e"
        ;;
    *)
        echo "I do not know what this is"
        ;;
esac


This fruit starts with b


In [47]:
# Case statement with bracket expressions: [aA] matches a single a or A (these are glob patterns, not regular expressions)

fruit="Bapple"

case $fruit in
    [aA]*)
        echo "This fruit starts with a"
        ;;
    # Note: The order of patterns matters
    # Start with b or B, ends with e or E
    [bB]*[eE])
        echo "This fruit starts with b and ends with e"
        ;;

    [bB]*)
        echo "This fruit starts with b"
        ;;

    *)
        echo "I do not know what this is"
        ;;
esac

This fruit starts with b and ends with e


In [48]:
# Case statement to check a number

num=10

case $num in
    1)
        echo "The number is 1"
        ;;
    2)
        echo "The number is 2"
        ;;
    3)
        echo "The number is 3"
        ;;
    *)
        echo "The number is not 1, 2, or 3"
        ;;
esac


The number is not 1, 2, or 3
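A more practical sketch: choosing how to handle a sequence file from its name. `describe_file` is a hypothetical helper and the file names are only examples; note how `|` lets one block handle several extensions.

```bash
# describe_file is a hypothetical helper; the file names below are only examples
describe_file() {
    case "$1" in
        *.fastq.gz|*.fq.gz)
            echo "$1: compressed FASTQ (read it with zcat)"
            ;;
        *.fasta|*.fa|*.fna)
            echo "$1: FASTA (read it with cat)"
            ;;
        *)
            echo "$1: unknown format"
            ;;
    esac
}

describe_file reads_R1.fastq.gz
describe_file genome.fna
describe_file notes.txt
```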


## 2. **Loops in Bash**
   - Demonstrating `for`, `while`, and `until` loops with practical examples.
   - Exercises for students to write loops for specific tasks.

### `for` loops in bash

- `for` loops are used to iterate over a list of items.
- The following syntax is used to write `for` loops in Bash:

```bash
for item in list
do
    commands
done
```


In [None]:
info $green "# Example of using for loop"

# Loop through numbers 1 to 10
for i in {1..10}; do
    echo $i
done

info $green "Loop through numbers 1 to 10 with step size 2"
for i in {1..10..2}; do
    echo $i
done

info $green Loop through a list of items
for i in apple banana orange; do
    echo "This is a $i"
done

info $green Set an array of fruits
fruits=(apple banana orange)

info $green Loop through the array
for fruit in "${fruits[@]}"; do
    echo "This is a $fruit"
done
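Brace ranges such as `{1..10}` only accept literal numbers. A sketch showing why `{1..$n}` does not work and the C-style `for` loop that handles bounds stored in variables:

```bash
n=5

# Brace expansion happens before variable expansion,
# so {1..$n} is not a range: the loop runs once with the text {1..5}
for i in {1..$n}; do
    echo "$i"
done

# A C-style for loop works with bounds stored in variables
for ((i = 1; i <= n; i++)); do
    echo "$i"
done
```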


In [75]:
info $green For loop with output from a command

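# Note: parsing ls output breaks on file names that contain spaces;
# a glob such as `for i in *` is safer
for i in $(ls); do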
    echo $i
done

[1;32mFor loop with output from a command[0m
2016VangLe.GATKEssentials.org
2023_Novodan_LinuxEssentials.org
2023_Novodan_LinuxEssentials.pdf
2023_Novodan_LinuxEssentials.tex
data
examples
images
install_dependencies.sh
install_jupyterlab.sh
jupyter_venv
Learn_gawk_hands-on.ipynb
Session1_LinuxBasics.ipynb
Session2_Intro_Shell_Scripting.ipynb
Session3_Advanced_Shell_Scripting.ipynb
start_jupyterlab.sh
Teaching_Plan.docx
Teaching_Plan.md
test_dir
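A sketch of looping over files with a glob instead of `$(ls)`. It creates a hypothetical scratch folder `test_dir/glob_demo` in which one file name contains a space; the glob keeps that name in one piece.

```bash
# Create example files in a scratch folder; one name contains a space
mkdir -p test_dir/glob_demo
touch "test_dir/glob_demo/sample 1.fasta" test_dir/glob_demo/sample2.fasta test_dir/glob_demo/notes.txt

# The glob expands to one word per matching file, so "sample 1.fasta" stays in one piece
for file in test_dir/glob_demo/*.fasta; do
    echo "Found FASTA file: $file"
done
```

If nothing matches, the loop runs once with the literal pattern; `shopt -s nullglob` makes an unmatched glob expand to nothing instead.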


### `while` loops

- `while` loops are used to execute a block of code as long as a condition is true.
- The following syntax is used to write `while` loops in Bash:

```bash
while condition
do
    commands
done
```

`while` loops are useful when you don't know how many times you need to execute a block of code.


In [57]:
# Use while loop to read a file line by line

# Create a file with the following content with cat and heredoc
cat << EOF > test_dir/test_while.txt
1 one
2 two
3 three
4 four
EOF


In [62]:

# The following code reads the file line by line.
# It stores each line in my_array, using the first column as the key
# and the second column as the value,
# and prints the line number and the content of each line.
my_array=()
i=0 # Reset the counter before the loop to avoid leftover values from earlier cells

# IFS= and -r make read keep leading spaces and backslashes unchanged
while IFS= read -r line; do
    echo "Line $((++i)): $line"
    # Split the line into words at spaces, storing them in the temporary array a
    a=($line)
    # Add an item to my_array whose key is the first element of a
    # and whose value is the second element of a.
    # This works because my_array is an indexed array and the keys are integers;
    # text keys would need an associative array declared with `declare -A`.
    my_array[${a[0]}]=${a[1]}

done < test_dir/test_while.txt

info $green "Print the array"
echo ${my_array[@]}
info $green "Print the array keys"
echo ${!my_array[@]}
info $green "Print the array values"
echo ${my_array[@]}
info $green "Print the array length"
echo ${#my_array[@]}

info $green Loop through the array by keys
for key in ${!my_array[@]}; do
    echo "Key: $key, Value: ${my_array[$key]}"
done

info $green Loop through the array by values
for value in ${my_array[@]}; do
    echo "Value: $value"
done


Line 1: 1 one
Line 2: 2 two
Line 3: 3 three
Line 4: 4 four
[1;32mPrint the array[0m
one two three four
[1;32mPrint the array keys[0m
1 2 3 4
[1;32mPrint the array values[0m
one two three four
[1;32mPrint the array length[0m
4
[1;32mLoop through the array by keys[0m
Key: 1, Value: one
Key: 2, Value: two
Key: 3, Value: three
Key: 4, Value: four
[1;32mLoop through the array by values[0m
Value: one
Value: two
Value: three
Value: four
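The indexed array in the cell above works only because the keys in the first column are integers. For text keys, an associative array is needed. A sketch with a hypothetical two-column file of gene names and lengths; `read -r gene length` splits each line directly into two variables.

```bash
mkdir -p test_dir
# A small hypothetical two-column file: gene name and length in bp
cat << EOF > test_dir/gene_lengths.txt
geneA 1200
geneB 850
geneC 2300
EOF

# Associative arrays (declare -A) need bash 4 or newer
declare -A gene_length
# read -r splits each line into the variables gene and length
while read -r gene length; do
    gene_length[$gene]=$length
done < test_dir/gene_lengths.txt

echo "geneB is ${gene_length[geneB]} bp long"
echo "number of genes: ${#gene_length[@]}"
```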


### `until` loops

- `until` loops are used to execute a block of code as long as a condition is false.
- The following syntax is used to write `until` loops in Bash:

```bash
until condition
do
    commands
done
```
- It is similar to the `while` loop, except that the condition is reversed.

In [63]:
info $green Until loop example

# Set a counter
counter=0

# Loop until counter is greater than 5
until [ $counter -gt 5 ]; do
    echo $counter
    # Increment counter
    ((counter++))
done


[1;32mUntil loop example[0m


0
1
2
3
4
5
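A common practical use of `until` is retrying a command until it succeeds. A sketch with a hypothetical `task` function that simply succeeds on its third attempt; in real scripts you would typically add a `sleep` between attempts.

```bash
attempt=1
max_attempts=3

# Hypothetical flaky task: succeeds only when called with a value of 3 or more
task() { [ "$1" -ge 3 ]; }

until task "$attempt"; do
    echo "Attempt $attempt failed, retrying..."
    attempt=$((attempt + 1))
    if [ "$attempt" -gt "$max_attempts" ]; then
        echo "Giving up"
        break
    fi
done
echo "Finished after $attempt attempt(s)"
```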


# Extra hands-on exercises


# Assignment: Basic Bioinformatics Workflow

- Task: Write a Bash script for basic file manipulations in bioinformatics.
- Example assignment: A script that filters a dataset based on certain criteria, counts the number of occurrences of specific patterns, or reformats data files.


In [65]:
# Use wget to download the E. coli genome from NCBI

ecoli_genome=test_dir/ecoli_genome.fasta

# Check if the file exists; if not, download and decompress it
if [ ! -f "$ecoli_genome" ]; then
    wget -O "$ecoli_genome.gz" ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/005/845/GCF_000005845.2_ASM584v2/GCF_000005845.2_ASM584v2_genomic.fna.gz
    gunzip "$ecoli_genome.gz"
fi

# Count the number of lines in the file
wc -l "$ecoli_genome"

# Count the lines that start with '>', i.e. the number of sequences
grep -c "^>" "$ecoli_genome"

# Count each nucleotide: drop header lines, print one character per line, then count identical lines

grep -v "^>" "$ecoli_genome" | grep -o . | sort | uniq -c



58022 test_dir/ecoli_genome.fasta
1
1142742 A
1180091 C
1177437 G
1141382 T


In [71]:

# Count each nucleotide in the file and sort by the number of occurrences, with the most frequent at the top
# Here gawk does the counting instead of sort | uniq -c

grep -v "^>" test_dir/ecoli_genome.fasta | grep -o . | gawk '{count[$1]++} END {for (i in count) print count[i], i}' | sort -nr



1180091 C
1177437 G
1142742 A
1141382 T


In [73]:

# A gawk-only solution that counts each nucleotide separately for every sequence in the file
# FASTA header lines start with >, so each header is used as the first index of a 2-D array
# (arrays of arrays are a gawk extension and do not work in other awk implementations)
# The second index is each nucleotide in the sequence
# The value of each element is the number of occurrences of that nucleotide

gawk '/^>/ {header=$0; next} {
    for (i=1; i<=length($0); i++) 
    count[header][substr($0, i, 1)]++
} END {
    for (header in count) {
        print header; 
        for (nucleotide in count[header]) print nucleotide, count[header][nucleotide] 
        }
}' test_dir/ecoli_genome.fasta # | sort -nr -k2,2 





>NC_000913.3 Escherichia coli str. K-12 substr. MG1655, complete genome
A 1142742
C 1180091
T 1141382
G 1177437
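The same per-sequence idea extends to GC content, a common summary statistic. This is a sketch, not part of the original workflow: it uses a tiny hypothetical multi-FASTA file so it runs without a download, and it relies only on standard awk features (`length`, `gsub`), so it also works with gawk. Pointing it at `test_dir/ecoli_genome.fasta` instead would report a single line for the E. coli genome.

```bash
mkdir -p test_dir
# A tiny hypothetical multi-FASTA file, so the example runs without a download
cat << EOF > test_dir/mini.fasta
>seq1 example
ACGTGC
GGCA
>seq2 example
ATATAT
EOF

# For each record: add up the sequence length and the number of G/C bases,
# then print the GC content as a percentage (sorted by sequence name)
gc_report=$(awk '
    /^>/ { name = substr($1, 2); next }
    {
        total[name] += length($0)
        gc[name] += gsub(/[GCgc]/, "")
    }
    END { for (n in total) printf "%s\t%.1f%%\n", n, 100 * gc[n] / total[n] }
' test_dir/mini.fasta | sort)

echo "$gc_report"
```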


**Assignment:** Do the same as above, but for each chromosome of the *S. cerevisiae* genome.

Hint: Link to download the genome: 
