<a href="https://colab.research.google.com/github/lorenzopallante/BiomeccanicaMultiscala/blob/main/LAB/03-Intro_BashLinux/03-Intro_LinuxBash.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Laboratorio 3
**Introduction to Linux and Bash language**


Authors:
    
- Prof. Marco A. Deriu (marco.deriu@polito.it)
- Lorenzo Pallante (lorenzo.pallante@polito.it)
- Eric A. Zizzi (eric.zizzi@polito.it)
- Marcello Miceli (marcello.miceli@polito.it)
- Marco Cannariato (marco.cannariato@polito.it)

# Table of Contents

3. What is Linux? 
4. The PDB files
5. Exercises

**Learning outcomes:** 
- understand what Linux is, its basic commands, and the basics of scripting
- know PDB files and their structure
- test yourself with exercises

**Remember to copy all the necessary files if you're using COLAB**

In [None]:
# copy over data repository
!if [ -n "$COLAB_GPU" ]; then git clone https://github.com/lorenzopallante/BiomeccanicaMultiscala.git; fi
!if [ -n "$COLAB_GPU" ]; then mv BiomeccanicaMultiscala/LAB/03-Intro_BashLinux/* .; fi

# What is Linux? 
- An operating system
- Unix was the predecessor of Linux
- One of the most popular platforms on the planet, Android, is powered by the Linux operating system
- An operating system is software that manages all of the hardware resources associated with your desktop or laptop
![Linux Image](https://www.redhat.com/cms/managed-files/tux-327x360.png)

***Why use Linux?***
1. Linux is **free**
2. It’s **fully customizable**
3. It’s **stable** (i.e. it almost never crashes)

*These characteristics make it an ideal OS for programmers and scientists*

The Linux operating system comprises several different pieces:

* **Bootloader** –  The software that manages the boot process of your computer. For most users, this will simply be a splash screen that pops up and eventually goes away to boot into the operating system.
* **Kernel** – This is the one piece of the whole that is actually called ‘Linux’. The kernel is the core of the system and manages the CPU, memory, and peripheral devices. The kernel is the lowest level of the OS.
* **Init system** – This is a sub-system that bootstraps the user space and is charged with controlling daemons. One of the most widely used init systems is systemd, which also happens to be one of the most controversial. The init system manages the boot process once the initial booting is handed over from the bootloader (e.g., GRUB, the GRand Unified Bootloader).
* **Daemons** – These are background services (printing, sound, scheduling, etc.) that either start up during boot or after you log into the desktop.
* **Graphical server** – This is the sub-system that displays the graphics on your monitor. It is commonly referred to as the X server or just X.
* **Desktop environment** – This is the piece that the users actually interact with. There are many desktop environments to choose from (GNOME, Cinnamon, Mate, Pantheon, Enlightenment, KDE, Xfce, etc.). Each desktop environment includes built-in applications (such as file managers, configuration tools, web browsers, and games).
* **Applications** – Desktop environments do not offer the full array of apps. Just like Windows and macOS, Linux offers thousands upon thousands of high-quality software titles that can be easily found and installed. Most modern Linux distributions (more on this below) include App Store-like tools that centralize and simplify application installation. For example, Ubuntu Linux has the Ubuntu Software Center (a rebrand of GNOME Software) which allows you to quickly search among the thousands of apps and install them from one centralized location.

## Basic Command Structure

`$ command -x <args> -y <args> … <target>`

- `command`: The binary or script to run. It must be found in `$PATH`, or be called with its path (e.g. `./script.sh` for a script in the current folder)
- `args`: A specific option with its argument (e.g. `--verbose False`)
- `target`: Positional argument, e.g. a target file (if applicable)
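
As a concrete sketch of this structure, the call below uses `head` as the command, `-n` as an option with argument `2`, and a file as the target. The file name `demo.txt` and the temporary folder are made up for illustration.

```shell
# Work in a throwaway folder so nothing in the lab directory is touched
workdir=$(mktemp -d)
printf 'first\nsecond\nthird\n' > "$workdir/demo.txt"

# command = head, option = -n with argument 2, target = demo.txt
first_two=$(head -n 2 "$workdir/demo.txt")
echo "$first_two"
```

In a notebook the same lines would go in a cell that starts with `%%bash`.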

<div class="alert alert-block alert-success"> 
    
    
In general, you can use `command --help` or `command -h` to get a help message for the command.

</div>

<div class="alert alert-block alert-warning"> 
    
**REMEMBER** 

If you want to write code in bash you must start your code cell with  `%%bash`  or use  `!`  before your bash line
</div>

When you use a **native Linux environment**, you can run all your commands inside a **terminal** using the bash syntax.

![bash2](imgs/bash.png)

In this tutorial, we will execute all cells in the notebook simulating the shell environment!

## The Directory Tree
The **Directory Tree** is the organization of folders and files in your computer.

***Note***: Unless explicitly specified, commands look for files (and create them) in the current directory. Any other location must be given as a path!

![title](imgs/01-Picture1.png)

To know your current location, you can type the following command: 

In [None]:
!pwd

## Basic Linux Commands

### pwd (Print Working Directory)
Command to get the current location inside the directory tree

In [None]:
!pwd

### echo 
Command to print text

> `echo [message]`

***Hint***: `echo` prints all its arguments separated by a single space. Enclose the message in " " to keep it as one argument, with its spacing and special characters preserved!

*Examples:*
- `echo "ciao"`: print ciao
- `echo "a"`: print the letter a
- `echo "$a"`: print the content of the variable a
- `echo "$(command)"`: print the output of the command in the parentheses
- `echo "ciao" >file.txt`: create a file named file.txt and write ciao in it (an existing file.txt is overwritten)
- `echo "ciao" >>file.txt`: append ciao to the file named file.txt

In [None]:
%%bash
a=10
echo $a
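
The examples listed above can be tried together in one short sketch. The variable `greeting` and the temporary folder are hypothetical names chosen for this illustration.

```shell
workdir=$(mktemp -d)
greeting="ciao   mondo"              # three spaces between the words

echo $greeting                       # unquoted: the shell collapses the spaces
echo "$greeting"                     # quoted: the spacing is preserved
echo "I am in $(pwd)"                # command substitution inside a message

echo "ciao"  > "$workdir/file.txt"   # > creates (or overwrites) the file
echo "mondo" >> "$workdir/file.txt"  # >> appends a new line
n_lines=$(wc -l < "$workdir/file.txt")
echo "file.txt now has $n_lines lines"
```

Quoting variables is a good habit in scripts, since file names with spaces would otherwise be split into several arguments.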

### cd (Change Directory)

- `cd ..`: go up one level (to the parent folder)
- `cd ../../`: go up two levels
- `cd foldername`: go to the folder named "foldername" (***Note: the folder must exist!!***)
- `cd absolute_path_folder`: go to the folder given by its absolute path (e.g. `/usr/local/bin`)

<div class="alert alert-block alert-warning"> 

**WARNING**
    
Inside *notebooks*, the `cd` command affects only the present cell. This means that in a new cell your location will be again the notebook location.
    
</div>

In [2]:
%%bash
pwd
cd data/ # enter the data folder
pwd
cd .. # return to the previous level
pwd
cd /usr/local/bin # go to the absolute location of the "bin" folder
pwd

/home/lorenzo/Documenti/GitHub/BiomeccanicaMultiscala/LAB/03-Intro_BashLinux
/home/lorenzo/Documenti/GitHub/BiomeccanicaMultiscala/LAB/03-Intro_BashLinux/data
/home/lorenzo/Documenti/GitHub/BiomeccanicaMultiscala/LAB/03-Intro_BashLinux
/usr/local/bin
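
The behaviour described in the warning above is what one would expect if each `%%bash` cell runs in its own shell process. A similar effect can be reproduced in a plain terminal with a subshell, as in this small sketch (`/tmp` is only an example destination, and the variable names are illustrative).

```shell
start_dir=$(pwd)

# Command substitution $(...) runs its commands in a subshell,
# much like a separate notebook cell
inside=$(cd /tmp && pwd)
echo "inside the subshell: $inside"

# Back in the parent shell the location has not changed
after_dir=$(pwd)
echo "after the subshell:  $after_dir"
```

For this reason, a cell that needs to work inside a folder should start with its own `cd`.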


### ls (list)
Lists all items (files and folders) in the folder you are in.
> `ls [options] [target]`

- `ls folder`: List items in the specified folder
- `ls -l`: Show more information about the files and folders listed
- `ls -a`: It also shows the hidden files in the folder

See the example below:

In [None]:
%%bash 
ls
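
To see what the `-a` and `-l` options change, the sketch below lists a throwaway folder that contains one regular and one hidden file. Both file names are invented for the example.

```shell
workdir=$(mktemp -d)
touch "$workdir/visible.txt" "$workdir/.hidden.txt"

plain=$(ls "$workdir")       # names starting with a dot are skipped
all=$(ls -a "$workdir")      # -a also lists .hidden.txt, plus . and ..
echo "$plain"
echo "$all"
ls -l "$workdir"             # long format: permissions, owner, size, date, name
```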

### mkdir (make directory)
Command to create directories

> `mkdir [target]`

- `mkdir pippo`: Create a folder named "pippo"
- `mkdir -p pippo/pluto`: Recursively create folder tree "pippo/pluto"


In [None]:
%%bash
ls

In [None]:
%%bash
mkdir -p pippo
ls

### Exercise 1

- Print the current working directory
- Create a new directory called "exe-0" and enter it
- Check that it worked by printing the new working directory path

In [None]:
# try yourself

### Solution 1

In [None]:
%%bash
echo $(pwd)
mkdir -p exe-0
cd exe-0
echo $(pwd)

### touch
Command to create files

> `touch [target]`

- `touch file.txt`: create an empty file named file.txt
- `touch exe-0/file.txt`: create an empty file named file.txt in the exe-0 folder

In [None]:
%%bash
ls
touch file.txt

In [None]:
%%bash 
ls

### mv (move)
Command to rename or move files and folders

> `mv [source] [destination]`

- `mv file.txt file1.txt`: rename file.txt to file1.txt
- `mv exe-0 exe-00`: rename folder exe-0 to exe-00
- `mv file1.txt exe-00`: move file1.txt into folder exe-00

<div class="alert alert-block alert-warning"> 

**WARNING**
    
If **[destination]** already exists in the target path, the command `mv` will **overwrite** it without asking.

</div>

In [None]:
%%bash 
mv file.txt file1.txt
ls
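
The overwrite behaviour from the warning above can be observed safely in a temporary folder. This sketch also uses the `-n` option, which asks `mv` not to replace an existing destination (`mv -i` asks for confirmation instead). The file names `a.txt` and `b.txt` are hypothetical.

```shell
workdir=$(mktemp -d)
echo "old" > "$workdir/a.txt"
echo "new" > "$workdir/b.txt"

mv -n "$workdir/b.txt" "$workdir/a.txt"   # a.txt exists, so nothing is moved
kept=$(cat "$workdir/a.txt")
echo "after mv -n: $kept"

mv "$workdir/b.txt" "$workdir/a.txt"      # plain mv silently replaces a.txt
replaced=$(cat "$workdir/a.txt")
echo "after mv:    $replaced"
```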

### cp (copy)
Command to copy files and folders

> `cp [source] [destination]`

- `cp file.txt file1.txt`: copy file.txt to file1.txt
- `cp -r exe-0 exe-00`: copy folder exe-0 (with its content) to exe-00
- `cp file.txt exe-00/`: copy file.txt into folder exe-00

In [None]:
%%bash
cp file1.txt exe-0/
ls exe-0/

### rm (remove)
Command to remove files and folders

> `rm [target]`

- `rm file.txt`: remove file named file.txt
- `rm -r foldername`: remove folder named foldername

In [None]:
%%bash
rm -r pippo
rm file1.txt
ls

### cat 
Print the content of a file

> `cat [target]`

- `cat file.txt`: print the content of file.txt

In [None]:
%%bash
echo -e "1\n2\n3\n4" >file.txt # create a file with a number in each row
cat file.txt # print the content of the file

### head 
Print the beginning of a file

> `head [target]`

- `head file.txt`: print the beginning of file.txt
- `head -n 20 file.txt`: print the first 20 lines of file.txt

### tail
Print the end of a file

> `tail [target]`

- `tail file.txt`: print the end of file.txt
- `tail -n 20 file.txt`: print the last 20 lines of file.txt

In [None]:
%%bash
head -n 3 file.txt # print the first 3 lines of file.txt

In [None]:
%%bash
tail -n 2 file.txt # print the last 2 lines of file.txt
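
`head` and `tail` can be chained with a pipe to extract a block of lines from the middle of a file. A minimal sketch, assuming a made-up file `numbers30.txt` that holds the numbers 1 to 30 (generated here with `seq`):

```shell
workdir=$(mktemp -d)
seq 1 30 > "$workdir/numbers30.txt"     # one number per line, 1 to 30

# Lines 10 to 20 inclusive: keep the first 20, then the last 11 of those
middle=$(head -n 20 "$workdir/numbers30.txt" | tail -n 11)
echo "$middle"

# tail -n +K starts printing at line K instead of counting from the end
from_28=$(tail -n +28 "$workdir/numbers30.txt")
echo "$from_28"
```

Note that a range from line A to line B inclusive contains B - A + 1 lines, which is why `tail -n 11` is used for lines 10 to 20.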


### grep
Command to isolate rows containing a specific pattern (word, number, character, etc...)

> `grep "PATTERN" [target]`

- `grep 3 file.txt`: isolate lines of file.txt containing the character 3
- `grep "ciao mondo" file.txt`: isolate lines of file.txt containing the pattern "ciao mondo"

In [None]:
%%bash
grep 3 file.txt # print the row(s) containing 3
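
A pattern matches anywhere in the line unless it is anchored. The sketch below, on an invented four-line file `records.txt`, shows the `^` anchor together with the `-c` (count) and `-v` (invert) options, which may be handy for the PDB exercises later on.

```shell
workdir=$(mktemp -d)
printf 'ATOM 1\nREMARK ATOM records follow\nTER\nATOM 3\n' > "$workdir/records.txt"

grep "ATOM" "$workdir/records.txt"                  # also matches the REMARK line
n_start=$(grep -c "^ATOM" "$workdir/records.txt")   # ^ anchors to line start; -c counts
n_other=$(grep -vc "^ATOM" "$workdir/records.txt")  # -v inverts the match
echo "$n_start lines start with ATOM, $n_other do not"
```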

### uniq
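Collapses **adjacent** repeated lines into a single copy. Repeated lines that are not adjacent are kept, so the file is usually sorted first.

> `uniq [target]`

- `uniq file.txt`: print file.txt keeping one copy of each run of identical adjacent rows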

In [None]:
%%bash
echo -e "1\n2\n2\n2\n2\n2" >temp.txt # create a file with duplicated rows
cat temp.txt
echo -e "\nUnique rows:\n"
uniq temp.txt 
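
Since `uniq` compares each row only with the one before it, duplicates that are far apart survive. A small sketch on a hypothetical list of residue names shows the difference between `uniq` alone and `sort` followed by `uniq`:

```shell
workdir=$(mktemp -d)
printf 'ALA\nGLY\nALA\nALA\n' > "$workdir/residues.txt"

adjacent_only=$(uniq "$workdir/residues.txt")        # the first ALA is not next to the others
truly_unique=$(sort "$workdir/residues.txt" | uniq)  # sorting groups equal rows together
echo "$adjacent_only"
echo "$truly_unique"
sort "$workdir/residues.txt" | uniq -c               # -c prefixes each row with its count
```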

### sort
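Sorts the rows of a file. By default rows are compared as text, character by character (so 10 comes before 9); use the `-n` option to sort by numeric value.

> `sort [target]`

- `sort file.txt`: sort the rows of file.txt as text
- `sort -n file.txt`: sort the rows of file.txt by numeric value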

In [None]:
%%bash
echo -e "4\n3\n2\n1" >temp.txt # create a file with numbers in descending order
cat temp.txt
echo -e "\nSorted rows:\n"
sort temp.txt
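
With single-digit numbers, text order and numeric order coincide, so the example above does not show the difference. A minimal sketch with multi-digit values (the file `values.txt` is made up for this purpose):

```shell
workdir=$(mktemp -d)
printf '10\n9\n100\n2\n' > "$workdir/values.txt"

as_text=$(sort "$workdir/values.txt")        # compared character by character
as_numbers=$(sort -n "$workdir/values.txt")  # -n compares numeric values
echo "$as_text"
echo "$as_numbers"
sort -nr "$workdir/values.txt"               # -r reverses the order
```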

### cut
Print specific parts of each line of a file (columns or characters).

> `cut [OPTIONS] [target]`

- `cut -c 1-5 file.txt` print characters in positions 1 to 5

In [None]:
%%bash
echo -e "ciao mondo\ncome va?\n123456789" >temp.txt # create a file with three rows of text
cat temp.txt 
echo -e "\nCut rows:\n"
cut -c 1-6 temp.txt
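
Besides character positions, `cut` can select columns separated by a delimiter with the `-d` and `-f` options. A short sketch on an invented comma-separated file `table.csv`:

```shell
workdir=$(mktemp -d)
printf 'ALA,12,1.5\nGLY,13,2.0\n' > "$workdir/table.csv"

names=$(cut -d ',' -f 1 "$workdir/table.csv")       # first comma-separated field
last_two=$(cut -d ',' -f 2,3 "$workdir/table.csv")  # fields 2 and 3, delimiter kept
echo "$names"
echo "$last_two"
```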

### wc
Count the lines, words, characters, or bytes of a file, depending on the option.

> `wc [OPTIONS] [target]`

- `wc -l file.txt` count number of lines in the file file.txt
- `wc -w file.txt` count number of words in the file file.txt
- `wc -m file.txt` count number of characters in the file file.txt
- `wc -c file.txt` count number of bytes in the file file.txt

In [None]:
%%bash
echo -e "ciao mondo\ncome va?\n123456789" >temp.txt # create a file with three rows of text
wc -l temp.txt #lines
wc -w temp.txt #words
wc -m temp.txt #characters
wc -c temp.txt #bytes
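
When the count has to be stored in a variable, the file name printed by `wc` gets in the way. One common way around it, sketched here on a temporary two-line file, is to feed the file through standard input:

```shell
workdir=$(mktemp -d)
printf 'ciao mondo\ncome va?\n' > "$workdir/temp.txt"

wc -l "$workdir/temp.txt"                 # prints the count followed by the file name
n_lines=$(wc -l < "$workdir/temp.txt")    # reading from stdin prints only the count
n_words=$(wc -w < "$workdir/temp.txt")
echo "lines: $n_lines, words: $n_words"
```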

### Exercise 2

Create a folder called “exe-01” and move into it. Print the folder path and save it in a variable called pathx. Print the variable on your screen.

**Hints**: mkdir, cd, pwd, echo

In [None]:
# try yourself


### Solution 2

In [None]:
%%bash
mkdir -p exe-01
cd exe-01
pathx=$(pwd)
echo $pathx

### Exercise 3

Create a file called file01.txt inside the exe-01 folder previously created; write the pathx variable containing the current path into it; finally, print the file to the terminal.

**Hints**: touch, pwd, cat

In [None]:
# try yourself


### Solution 3

In [None]:
%%bash
cd  exe-01/
touch file01.txt
pathx=$(pwd)
echo "$pathx" >file01.txt
cat file01.txt

### awk
awk is a command that embeds a complete programming language. We will see just a few of its features.

AWK can be described as a generic filter for text files.
It reads the file one line at a time and performs different actions depending on whether the line meets certain conditions or contains certain patterns, applying the rules defined by the programmer to each line.
* The awk program must be enclosed in single quotes ' '
* Actions are enclosed in curly brackets { }
* Multiple instructions are executed in the order they appear and must be separated by ;
* A pattern is a regular expression enclosed between / /


> `awk '{awk code}' [target]`

**Examples using awk:** 

In [None]:
# loading example file text
!cp data/elenco.txt .
!cp data/numbers.txt . 

# have a look at the files
!cat elenco.txt
!echo -e "\n"
!cat numbers.txt

Print only a specific column..

In [None]:
!cat elenco.txt

In [None]:
%%bash
echo -e "\nSecond Column:\n"
cat elenco.txt | awk '{print $2}' #print the 2nd column

Print number of fields per row..

In [None]:
!cat elenco.txt

In [None]:
%%bash
echo -e "\nNumber of fields per row:"
cat elenco.txt | awk '{print NF}' # print the number of fields

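Print the number of each row (`NR` holds the number of rows read so far, so its value on the last row is the total number of rows)..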

In [None]:
!cat elenco.txt

In [None]:
%%bash
echo -e "\nRow numbers:"
cat elenco.txt | awk '{print NR}' # print the number of each processed row
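
Because `NR` is updated on every row, `print NR` inside the main block prints a running counter, one value per row. To obtain only the total, `NR` can be read in an `END` block, which runs once after the last row. A minimal sketch on a made-up three-row file `list.txt`:

```shell
workdir=$(mktemp -d)
printf 'anna 10\nluca 20\nsara 30\n' > "$workdir/list.txt"

awk '{print NR": "$0}' "$workdir/list.txt"          # NR grows by one on every row
n_rows=$(awk 'END {print NR}' "$workdir/list.txt")  # in END, NR holds the total
echo "total rows: $n_rows"
```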

Change columns...

In [None]:
!cat elenco.txt

In [None]:
%%bash
echo -e "\nThird column = 2: "
awk '{$3=2}{print}' elenco.txt

Do some math...

In [None]:
!cat numbers.txt

In [None]:
%%bash
echo -e "\nFind maximum of 4th column:"
awk -v max=0 '{ if ($4>max) {max=$4} } END { print max }' numbers.txt

Do some more math...

In [None]:
!cat numbers.txt

In [None]:
%%bash
echo -e "\nsum values of the third column:"
awk '{ sum += $3 } END { printf "%d\n", sum}' numbers.txt
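
The `/ /` patterns mentioned at the start of this section can be combined with the accumulation shown above, so that only matching rows contribute. The sketch below uses an invented file `values.txt` and computes the mean of the second column over the rows that start with ATOM.

```shell
workdir=$(mktemp -d)
printf 'ATOM 1.0\nREMARK 9.0\nATOM 2.0\nATOM 6.0\n' > "$workdir/values.txt"

# /^ATOM/ is the pattern: the action runs only on rows that match it
selected=$(awk '/^ATOM/ {print $2}' "$workdir/values.txt")
echo "$selected"

# Accumulate a sum and a counter, then report the mean once all rows are read
mean=$(awk '/^ATOM/ {sum += $2; n++} END {if (n > 0) printf "%.2f\n", sum / n}' "$workdir/values.txt")
echo "mean of column 2 on ATOM rows: $mean"
```

The `if (n > 0)` check avoids a division by zero when no row matches.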

# The PDB files

## What is a pdb file

The **Protein Data Bank (PDB)** file format is a text file format describing the *three-dimensional structures* of molecules held in the [Protein Data Bank](https://www.rcsb.org/).

![title](imgs/pdb.png)

<div class="alert alert-block alert-success"> 
    
For further information, please visit: https://cupnet.net/pdb-format/

</div>
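
In a PDB file each piece of information in an ATOM record sits at fixed character positions: for example the atom serial number in columns 7-11, the residue name in columns 18-20, and the x, y, z coordinates in columns 31-38, 39-46 and 47-54. The sketch below builds a single record with `printf` and reads the fields back with `cut -c`. The atom and its coordinates are made-up values, not taken from 1yzb.pdb.

```shell
# Build one ATOM record with the fixed column widths of the PDB format
# (hypothetical values, used only to illustrate the layout)
line=$(printf '%-6s%5s %-4s%1s%3s %1s%4s%1s   %8s%8s%8s' \
       "ATOM" "1" " N" "" "MET" "A" "1" "" "27.340" "24.430" "2.614")
echo "$line"

serial=$(echo "$line" | cut -c 7-11 | tr -d ' ')   # atom serial number
resname=$(echo "$line" | cut -c 18-20)             # residue name
x=$(echo "$line" | cut -c 31-38 | tr -d ' ')       # x coordinate
y=$(echo "$line" | cut -c 39-46 | tr -d ' ')       # y coordinate
z=$(echo "$line" | cut -c 47-54 | tr -d ' ')       # z coordinate
echo "atom $serial ($resname): x=$x y=$y z=$z"
```

Reading by character position keeps working even when two neighbouring values touch each other, a case in which splitting on whitespace would merge them into one field.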



![pdb2vmd](imgs/pdb2vmd.png)

# Exercises

## Exercise 4

Copy the 1yzb.pdb file from the "data/" folder into a new folder named exe-03. Print the first 50 lines into a file named first50.pdb. Check that this succeeded by counting the lines of first50.pdb.

**Hints**: cp, cat, head, wc!

In [None]:
# try yourself

## Solution 4

In [None]:
%%bash
# solution
mkdir -p exe-03
cd exe-03
cp ../data/1yzb.pdb .
head -n 50 1yzb.pdb >first50.pdb
cat first50.pdb | wc -l

## Exercise 5 (intermediate) 

- Enter the exe-03 folder you previously created
- Extract only the ATOM lines from 1yzb.pdb and save them to a file named atoms.pdb
- Extract only the first 10 ATOM lines and only the last 10 ATOM lines of 1yzb.pdb: save these into first10.pdb and last10.pdb respectively
- Extract only the ATOM lines from 10 to 20 (inclusive) of 1yzb.pdb and put them into a file named 10_20.pdb

**Hints**: grep, head, tail!

In [None]:
# try yourself


## Solution 5

In [None]:
%%bash
#solution
cd exe-03/
grep "^ATOM" 1yzb.pdb >atoms.pdb # keep only the lines that start with ATOM
head -n 10 atoms.pdb >first10.pdb
tail -n 10 atoms.pdb >last10.pdb
head -n 20 atoms.pdb | tail -n 11 >10_20.pdb # lines 10 to 20 inclusive are 11 lines
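
A range of lines can also be selected with awk, by testing the row counter `NR` directly. The sketch below, on a hypothetical 50-line file produced with `seq`, compares this with the `head` and `tail` combination.

```shell
workdir=$(mktemp -d)
seq 1 50 > "$workdir/lines.txt"

# A condition with no action prints every row for which it is true
with_awk=$(awk 'NR >= 10 && NR <= 20' "$workdir/lines.txt")
# Same range with head and tail: 20 - 10 + 1 = 11 lines
with_head_tail=$(head -n 20 "$workdir/lines.txt" | tail -n 11)
echo "$with_awk"
```

The awk form states both ends of the range explicitly, which some may find easier to check than the line arithmetic needed for `tail`.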

## Exercise 6 (intermediate)

- Enter the exe-03 folder you previously created
- Create a new file named extracted.pdb consisting of ATOMs 1-10, 100-110 and the last 10 ATOMs from 1yzb.pdb
- Extract only the ATOMs with x lower than 1.5, and save them into x_below.pdb
- Extract only the x y z coordinates and atom numbers of the ATOMs in the 1yzb.pdb file; save this data to simple.pdb

**Hints**: cat, grep, head, tail, cut, awk!

In [None]:
# try yourself


## Solution 6

In [None]:
%%bash
#solution
cd exe-03/
grep "^ATOM" 1yzb.pdb >atoms.pdb # keep only the lines that start with ATOM
head -n 10 atoms.pdb >extracted.pdb
tail -n +100 atoms.pdb | head -n 11 >>extracted.pdb
tail -n 10 atoms.pdb >>extracted.pdb
# when split on whitespace, field 7 holds the x coordinate
# (this assumes a chain ID is present and no neighbouring values are fused)
awk '$7<1.5 {print $0}' atoms.pdb >x_below.pdb
# field 2 holds the atom number
# fields 7, 8, 9 hold the x, y, z coordinates respectively
awk '{print $2, $7, $8, $9}' atoms.pdb >simple.pdb

## Exercise 7 (advanced)

- Enter the exe-03 folder you previously created
- Count how many ALA residues are contained in 1yzb.pdb. Save the count into a variable and print the variable
- Count how many trajectory steps are in the 1yzb.pdb file (every step is introduced by the word MODEL). Save the count into a variable and print the variable
- Write a file sequence.txt with a number sequence from 1 to 1000

**Hints**: cat, grep, wc, uniq, sort, seq

In [None]:
# try yourself


## Solution 7

In [None]:
%%bash
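#solution
cd exe-03/
# every residue has one CA atom, so counting the CA atoms of ALA counts the ALA residues
# (atoms.pdb comes from Solution 5; residues are counted once per model in the file)
num_ala=$(grep ALA atoms.pdb | grep CA | wc -l)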
echo "The pdb collects $num_ala ALA residues"
nframes=$(grep -v REMARK 1yzb.pdb | grep MODEL | wc -l)
echo "The pdb has $nframes frames"
seq 1 1000 >sequence.txt