# Basic Introduction to Linux

## Listing the contents of a directory

We start by looking at the contents of the Linux filesystem. To do this, we list the files contained within a directory by using the command 'ls'.

In [1]:
ls

01 - Introduction to Linux Commands.ipynb  welcome.sh


As you can see, the 'ls' command lists the files and directories in the current directory. However, it does not show any details about them.

To display more information, we can add options after the 'ls' command:

    -l  long list format (show more detail)
    -a  show all files, including hidden ones (in Linux, filenames starting with . are hidden)
    -h  show file sizes in human-readable units (e.g. K, M, G); used together with -l
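A minimal sketch of these options, assuming a Linux shell where `mktemp -d` is available. The scratch directory and the file names `visible.txt` and `.hidden` are hypothetical, created only for this demo and removed at the end:

```shell
start=$(pwd)                  # remember the current directory
demo=$(mktemp -d)             # throwaway directory for this demo
cd "$demo"
touch visible.txt .hidden     # .hidden starts with a dot, so it is hidden
ls                            # lists only visible.txt
ls -a                         # also lists ., .. and .hidden
ls -lah                       # -l, -a and -h can be combined after one dash
cd "$start"
rm "$demo/visible.txt" "$demo/.hidden"   # clean up: remove the files...
rmdir "$demo"                            # ...then the now-empty directory
```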

In [3]:
ls -lh

total 24K
-rw-rw-r-- 1 jupyter jupyter 18K May 19 16:20 01 - Introduction to Linux Commands.ipynb
-rwxrwxr-x 1 jupyter jupyter  45 May 19 11:12 welcome.sh


The long list option provides more information about files and directories. From left to right, each line shows:

    permissions  - who can read/write/execute the file, and whether it is a directory
    links        - the number of hard links to the file
    owner/group  - the user who owns the file and the group it belongs to
    file size    - size in bytes (or in K/M/G units when -h is used)
    date/time    - when the file was last modified
    filename
    
    
The permissions specify who (the owner, the group, or all others) can read (r), write (w), or execute (x) a file. The first character also tells you whether the entry is a directory.

<img src="http://linuxcommand.org/images/permissions_diagram.gif">

In this case, the notebook file has the permission setting -rw-rw-r--:

- The first character is '-' and not 'd', indicating that this is a file and not a directory
- The next 3 characters specify the permissions for the owner. Here 'rw-' means read and write access, but not execute
- The following 3 characters specify the permissions for the group. 'rw-' again means read and write access
- The last 3 characters specify the permissions for all other users. 'r--' means read-only access

By contrast, welcome.sh has -rwxrwxr-x: the extra 'x' in each group means it can also be executed (run) as a program.
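To make the grouping concrete, the sketch below splits the two permission strings seen in the listings above into their four parts using `cut`. The variable names are illustrative, not part of the notebook:

```shell
# Split permission strings (taken from the listings above) into their parts
for perms in -rw-rw-r-- drwxrwxr-x; do
  kind=$(printf '%s\n' "$perms" | cut -c1)      # '-' = file, 'd' = directory
  owner=$(printf '%s\n' "$perms" | cut -c2-4)   # permissions of the owner
  group=$(printf '%s\n' "$perms" | cut -c5-7)   # permissions of the group
  others=$(printf '%s\n' "$perms" | cut -c8-10) # permissions of everyone else
  printf '%s  type=%s owner=%s group=%s others=%s\n' \
    "$perms" "$kind" "$owner" "$group" "$others"
done
```

Running it prints:

    -rw-rw-r--  type=- owner=rw- group=rw- others=r--
    drwxrwxr-x  type=d owner=rwx group=rwx others=r-x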

## Finding out where we are

Files and directories are organized in a hierarchical structure, starting from the root directory /.

For example:

<img src="https://www.codepuppet.com/wp-content/uploads/2012/06/linux_directory_structure.png" width=640>

Let's find out where we are in the directory structure by using the 'pwd' command (print working directory).


In [4]:
pwd

/data/jupyter/000/01 - Getting Started


## Creating, changing and deleting directories

We can make directories or folders to organize and store our files.

Here, let's make a directory called 'mydir' using the command 'mkdir' (short for make directory).

In [5]:
mkdir mydir



Let's see the listing again using 'ls -lh'.

In [6]:
ls -lh

total 24K
-rw-rw-r-- 1 jupyter jupyter 18K May 19 16:24 01 - Introduction to Linux Commands.ipynb
drwxrwxr-x 2 jupyter jupyter  10 May 19 16:24 mydir
-rwxrwxr-x 1 jupyter jupyter  45 May 19 11:12 welcome.sh


We can see that we now have a new entry called mydir, with the permissions drwxrwxr-x. The first character of the permissions is 'd', indicating that this is a directory.

Having created the directory, we can enter it to look around using the command 'cd' (for change directory)

In [7]:
cd mydir



In [8]:
ls -l

total 0


Not surprisingly, the directory is empty, since it was just created. We can add and delete files inside it. For example, let us use the command 'touch' to create an empty file.

In [9]:
touch myfile



In [10]:
ls -l

total 0
-rw-rw-r-- 1 jupyter jupyter 0 May 19 16:26 myfile


Let's now leave the directory by typing 'cd ..'.

The '..' refers to the parent directory, so 'cd ..' moves us up one level (like backtracking).
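The effect of 'cd ..' can be checked with 'pwd'. This sketch assumes `mktemp -d` is available to create a throwaway directory, so the real `mydir` is left alone:

```shell
start=$(pwd)                  # remember where we began
demo=$(mktemp -d)             # throwaway directory for this demo
mkdir "$demo/mydir"
cd "$demo/mydir"
pwd                           # ends in /mydir
cd ..
pwd                           # one level up: the throwaway directory itself
cd "$start"                   # return to where we began
rmdir "$demo/mydir" "$demo"   # both directories are empty, so rmdir works
```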

In [11]:
cd ..



In [13]:
ls -lh

total 24K
-rw-rw-r-- 1 jupyter jupyter 19K May 19 16:28 01 - Introduction to Linux Commands.ipynb
drwxrwxr-x 2 jupyter jupyter  27 May 19 16:26 mydir
-rwxrwxr-x 1 jupyter jupyter  45 May 19 11:12 welcome.sh


We are now back out of 'mydir'.

We can also list the contents of a directory without entering it, by typing 'ls -l (name of directory)'.

In [14]:
ls -l mydir

total 0
-rw-rw-r-- 1 jupyter jupyter 0 May 19 16:26 myfile


Let us see how we can delete a directory using the command 'rmdir' (for remove directory)

In [15]:
rmdir mydir

rmdir: failed to remove ‘mydir’: Directory not empty


Oops! 'rmdir' can only remove an empty directory. To delete it, we first need to delete the files inside it using the command 'rm' (for remove).

**Be very careful with the 'rm' command: deleted files do not go to a trash folder, and there is no undo.**

We can remove the file we created in the directory in 2 ways:

1. change into mydir, remove the file, move out of mydir
2. remove the file within mydir by referring to the path

Let us use the 2nd method.
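For comparison, both methods are sketched below on two hypothetical directories, `dirA` and `dirB`, inside a throwaway directory (assuming `mktemp -d` is available), so nothing in the notebook's own directory is touched:

```shell
start=$(pwd)
demo=$(mktemp -d)              # throwaway directory for this demo
cd "$demo"
mkdir dirA dirB
touch dirA/myfile dirB/myfile

# Method 1: change into the directory, remove the file, move back out
cd dirA
rm myfile
cd ..

# Method 2: stay put and refer to the file by its path
rm dirB/myfile

rmdir dirA dirB                # both are now empty, so rmdir succeeds
cd "$start"
rmdir "$demo"
```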

In [16]:
rm mydir/myfile



In [17]:
ls -l mydir

total 0


Now that the directory is empty, we can remove it.

In [18]:
rmdir mydir



In [19]:
ls -l

total 24
-rw-rw-r-- 1 jupyter jupyter 19535 May 19 16:30 01 - Introduction to Linux Commands.ipynb
-rwxrwxr-x 1 jupyter jupyter    45 May 19 11:12 welcome.sh


## Input/Output and Redirection/Piping

Next, let us look at how input and output are handled in Linux, and how we can redirect a program's output to a file or to another program.

A program reads its input from **stdin** (standard input), which may come from the keyboard or from another program. It writes its normal output to **stdout** (standard output) and its error messages to **stderr** (standard error).

NOTE: The **stdout** from a program can either be redirected into a file, or passed on to another program.

<img src="https://www.vlsci.org.au/documentation/fundamentals/using_unix/img/stdin_stdout.png">

As an example, let us use the command 'echo' to print out a statement to stdout

In [20]:
echo "I love this workshop"

I love this workshop


In this case, there was no error, so no stderr output was generated
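To see that stdout and stderr really are separate streams, this sketch runs 'ls' on a file that does not exist. The name `no_such_file` is hypothetical, and the `2>` form, which redirects stderr rather than stdout, goes slightly beyond the notebook:

```shell
demo=$(mktemp -d)    # throwaway directory for this demo

# no_such_file does not exist, so ls writes an error message to stderr
ls "$demo/no_such_file" > "$demo/out.txt" || echo "(ls failed)"
# The error still appeared on the terminal: '>' captured only stdout
wc -c "$demo/out.txt"          # 0 bytes were written to the file

# '2>' redirects stderr instead, so this time the error goes to a file
ls "$demo/no_such_file" 2> "$demo/err.txt" || echo "(ls failed)"
cat "$demo/err.txt"            # the error message, read back from the file

rm "$demo/out.txt" "$demo/err.txt"
rmdir "$demo"
```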

Now, this stdout can be:

1. Redirected to a file to save the output
2. Passed on to another program

Let us try (1), which is directing the output of the echo command to a file. We do this using the '>' symbol.

In [21]:
echo "I love this workshop" > myfile.txt



In [22]:
ls -l

total 28
-rw-rw-r-- 1 jupyter jupyter 20286 May 19 16:32 01 - Introduction to Linux Commands.ipynb
-rw-rw-r-- 1 jupyter jupyter    21 May 19 16:34 myfile.txt
-rwxrwxr-x 1 jupyter jupyter    45 May 19 11:12 welcome.sh


We can take a look at the contents of a file by using the command 'cat'

In [23]:
cat myfile.txt

I love this workshop
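One caution about '>': it replaces whatever the file already contained. A minimal sketch in a throwaway directory (assuming `mktemp -d` is available):

```shell
demo=$(mktemp -d)                              # throwaway directory
echo "I love this workshop" > "$demo/myfile.txt"
echo "Linux is fun" > "$demo/myfile.txt"       # '>' replaces the old contents
cat "$demo/myfile.txt"                         # prints only: Linux is fun
rm "$demo/myfile.txt"
rmdir "$demo"
```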


Now, let us see (2), which is passing the stdout to another program.

We will first use the command 'wc' (word count) to count the number of characters in myfile.txt. The '-c' option makes wc report only the number of characters (strictly, bytes) instead of lines and words.

In [24]:
wc -c myfile.txt

21 myfile.txt


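Here, we see that the file contains 21 characters: the 20 characters of "I love this workshop" plus the newline that 'echo' adds at the end.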

We can also count the characters generated by the echo command directly, without first saving them to a file.

To do this, we will use the '|' symbol which passes the stdout of a program to the next program.

In [25]:
echo "I love this workshop" | wc -c

21
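Why 21 rather than 20? 'echo' ends its output with a newline, and 'wc -c' counts it. As a sketch, 'printf' with the format `'%s'` prints the same text without the trailing newline:

```shell
echo "I love this workshop" | wc -c           # 21: 20 characters plus a newline
printf '%s' "I love this workshop" | wc -c    # 20: printf '%s' adds no newline
```

(On some non-GNU systems, wc pads the number with leading spaces.)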


## Paths and Running Programs

So far we have used commands such as 'wc' and 'echo', which are programs that count characters/words and print text, respectively.

For Linux to run a command, it needs to know where the corresponding program is located. Let us find the location of the commands we used with the command 'which'.

In [26]:
which echo

/bin/echo


In [27]:
which wc

/bin/wc


We can see that both commands reside in the /bin directory. Linux finds programs by searching the directories listed, separated by colons, in an environment variable called PATH.

We can take a look at PATH using 'echo'. Note that when we refer to the value of a variable, we need to add the $ prefix.

In [28]:
echo $PATH

/opt/sge/bin:/opt/sge/bin/lx-amd64:/bin:/sbin:/bin:/usr/sbin:/usr/bin
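The lookup that 'which' performs can be approximated by hand: the shell tries each directory in PATH, in order, and uses the first one that contains an executable file with the command's name. A rough sketch (it ignores shell builtins, aliases and empty PATH entries); with the PATH above it should report /bin/wc, matching 'which wc':

```shell
cmd=wc                     # the command to look for
found=""
old_ifs=$IFS
IFS=:                      # split PATH on colons
for dir in $PATH; do
  if [ -f "$dir/$cmd" ] && [ -x "$dir/$cmd" ]; then
    found="$dir/$cmd"
    break                  # the first match in PATH order wins
  fi
done
IFS=$old_ifs               # restore normal word splitting
echo "${found:-$cmd: not found}"
```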


Let's now take a look at a small program 'welcome.sh' in our current directory

In [29]:
ls -l

total 32
-rw-rw-r-- 1 jupyter jupyter 21613 May 19 16:38 01 - Introduction to Linux Commands.ipynb
-rw-rw-r-- 1 jupyter jupyter    21 May 19 16:34 myfile.txt
-rwxrwxr-x 1 jupyter jupyter    45 May 19 11:12 welcome.sh


Try running the program

In [30]:
welcome.sh

bash: welcome.sh: command not found...


The program can't be found because the current directory is not in PATH. However, we can run it by specifying its location explicitly, using ./ to refer to the current directory.

In [31]:
./welcome.sh

Welcome to the Workshop!
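The difference between running a program by path and by name can be reproduced with a small hypothetical script, `hello.sh`, standing in for welcome.sh (whose contents are not shown). This sketch assumes `mktemp -d` is available and uses 'chmod +x' to mark the script executable, which goes beyond what the notebook covers:

```shell
start=$(pwd)
demo=$(mktemp -d)                 # throwaway directory for this demo
cd "$demo"
# hello.sh is a hypothetical stand-in for welcome.sh
printf '#!/bin/sh\necho "Hello from hello.sh"\n' > hello.sh
chmod +x hello.sh                 # give it the x permission, as welcome.sh has

hello.sh || echo "(not found: the current directory is not in PATH)"
./hello.sh                        # works: ./ gives the exact location

old_path=$PATH
PATH="$PATH:$demo"                # add the directory to PATH...
hello.sh                          # ...and now the name alone is enough
PATH=$old_path                    # restore the original PATH

rm hello.sh
cd "$start"
rmdir "$demo"
```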


Now, let us try running the command 'bwa', a commonly used alignment program for DNA sequences.

In [32]:
bwa

bash: bwa: command not found...


Oops, it looks like the command bwa isn't in the PATH.

On this system, bwa was installed in a separate directory, and it cannot be run by name until that directory is added to the environment variable PATH.

## Setting the path with environment-modules

On many servers and HPC systems, programs are installed in separate directories, and the environment-modules program is used to manage which of these directories are included in PATH.

To see a catalog of the programs available on the system, we can use the command 'module avail'.

In [33]:
module avail


------------------------ /usr/share/Modules/modulefiles ------------------------
bio/abyss          bio/cufflinks      bio/mummer         bio/vcftools
bio/aihunter       bio/delly          bio/opossum        dot
bio/altanalyze     bio/diamond        bio/picard         module-git
bio/amos           bio/fastqvalidator bio/pindel         module-info
bio/annovar        bio/fastx-toolkit  bio/platypus       modules
bio/bamtools       bio/flash          bio/prinseq        null
bio/bcftools       bio/freebayes      bio/rapsearch2     use.own
bio/bedtools       bio/genometools    bio/samtools       util/bpipe
bio/blat           bio/hisat2         bio/scanindel      util/nextflow
bio/bwa            bio/htslib         bio/seqtk
bio/circos         bio/inchworm       bio/snap
bio/clustal        bio/mojo           bio/stringtie


We can see that there are several programs available in this system, including bwa.

Let us load the module using the command 'module load'

In [34]:
module load bio/bwa



In [35]:
bwa


Program: bwa (alignment via Burrows-Wheeler transformation)
Version: 0.7.12-r1044
Contact: Heng Li <lh3@sanger.ac.uk>

Usage:   bwa <command> [options]

Command: index         index sequences in the FASTA format
         mem           BWA-MEM algorithm
         fastmap       identify super-maximal exact matches
         pemerge       merge overlapping paired ends (EXPERIMENTAL)
         aln           gapped/ungapped alignment
         samse         generate alignment (single ended)
         sampe         generate alignment (paired ended)
         bwasw         BWA-SW for long queries

         shm           manage indices in shared memory
         fa2pac        convert FASTA to PAC format
         pac2bwt       generate BWT from PAC
         pac2bwtgen    alternative algorithm for generating BWT
         bwtupdate     update .bwt to the new format
         bwt2sa        generate SA from BWT and Occ

Note: To use BWA, you need to first index the genome with `bwa 

Now we are able to execute the program. This is because the 'module load' command added the location of bwa to PATH.

Let's take a look at PATH

In [36]:
echo $PATH

/usr/local/bio/bwa:/opt/sge/bin:/opt/sge/bin/lx-amd64:/bin:/sbin:/bin:/usr/sbin:/usr/bin


We see here that the path /usr/local/bio/bwa has been added to the PATH variable.
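In terms of PATH alone, 'module load bio/bwa' behaved here like prepending a directory by hand. The sketch below mimics that and then undoes it. It is only an approximation: real modulefiles can also set other variables, and the directory is simply copied from the output above.

```shell
saved_path=$PATH                          # keep a copy of the current PATH

# Roughly what 'module load bio/bwa' did to PATH on this system
export PATH="/usr/local/bio/bwa:$PATH"
echo "$PATH"                              # now starts with /usr/local/bio/bwa

# Roughly what 'module unload bio/bwa' did: put the old value back
export PATH="$saved_path"
echo "$PATH"
```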

To check which modules have been loaded, we use the command 'module list'

In [37]:
module list

Currently Loaded Modulefiles:
  1) bio/bwa


We can also unload the program by using the command 'module unload'.

In [38]:
module unload bio/bwa



In [39]:
module list

No Modulefiles Currently Loaded.


Let's try bwa again

In [40]:
bwa

bash: bwa: command not found...


In [41]:
echo $PATH

/opt/sge/bin:/opt/sge/bin/lx-amd64:/bin:/sbin:/bin:/usr/sbin:/usr/bin
