# Linux for Bioinformatics

## Introduction

Unix and Unix-like operating systems are the standard on most large computer systems in scientific research, in the same way that Microsoft Windows is the dominant operating system on desktop PCs. In this course we will use Linux, a Unix-like operating system that was originally created to provide a free, open-source operating system for PCs.

Linux and MS Windows both perform the important job of managing the computer's hardware (screen, keyboard, mouse, hard disks, network connections, etc.) on your behalf. They also provide you with tools to manage your files and to run application software. Both offer a graphical user interface (desktop). These desktop interfaces look different, use different names for things (e.g. directory versus folder) and use different icons, but they mostly offer the same functionality.

Linux is a powerful, secure, robust and stable operating system which allows dozens of people to run programs on the same computer at the same time. This is why it is the preferred operating system for large-scale scientific computing. It runs on all kinds of machines, from mobile phones (Android) and desktop PCs to supercomputers.


### Why Linux?

Increasingly, the output of biological research exists as in silico data, usually in the form of large text files. Linux is particularly suitable for working with such files and has several powerful and flexible commands that can be used to process and analyse this data. One advantage of learning Linux is that many of the commands can be combined in almost unlimited ways. So if you learn just six Linux commands, you will be able to do a lot more than just six things.
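As a small illustration of how commands combine, the sketch below chains a few standard commands with the pipe symbol (`|`), which passes the output of one command to the next. The file `example.fasta` and its contents are made up for this example and are not part of the course data.

```shell
# Create a tiny FASTA-style file in a temporary directory
# (hypothetical data, for illustration only)
tmpdir=$(mktemp -d)
fasta="$tmpdir/example.fasta"
printf '>seq1 human\nACGT\n>seq2 mouse\nGGCC\n>seq3 human\nTTAA\n' > "$fasta"

# Two commands combined: grep keeps the header lines, wc -l counts them
n_headers=$(grep '>' "$fasta" | wc -l)
echo "Number of sequences: $n_headers"

# Four commands combined: keep the headers, cut out the second
# space-separated field (the species), sort it, and count each value
species_counts=$(grep '>' "$fasta" | cut -d ' ' -f 2 | sort | uniq -c)
echo "$species_counts"

# Tidy up the temporary directory
rm -r "$tmpdir"
```

Each command here does one simple job; the pipe is what turns them into a small analysis. These commands are introduced properly in the sections that follow.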

Linux contains hundreds of commands, but you will probably need only 10 or so for most of your analyses. In this tutorial we will introduce you to some useful Linux commands and provide examples of how they can be used in bioinformatics analyses.

## Learning outcomes
By the end of the tutorial you can expect to be able to:

* Understand the Linux directory structure and navigate around this structure
* Extract information from large files
* Use regular expressions to search for particular patterns in a file
* Create a bash script to perform several tasks at once

## Tutorial sections
This tutorial comprises the following sections: 

1. [Basic Linux](basic.ipynb) 
2. [Commands grep and awk](grep_and_awk.ipynb) 
3. [Advanced Linux (loops and Bash scripts)](advanced_linux.ipynb) 
4. [Bash scripting](bash.ipynb) 

**Note:** We do not expect you to get through all the material in the time allocated. A good target to aim for is the end of section 3, [Advanced Linux (loops and Bash scripts)](advanced_linux.ipynb). The remaining material is optional; it is for students who would like to expand their Linux skills and can be completed outside the course hours.

## Authors and License
This tutorial was created by [Jacqui Keane](https://github.com/jacquikeane) and [Martin Hunt](https://github.com/martinghunt).

The content is licensed under a [Creative Commons Attribution 4.0 International License (CC BY 4.0)](https://creativecommons.org/licenses/by/4.0/).

## Running the commands in this tutorial
You can follow this tutorial by typing the commands you see into a terminal window on your computer. A terminal is similar to the "Command Prompt" window on MS Windows systems, which allows the user to type DOS commands to manage files.

To get started, open a terminal window, type the command below and then press the `Enter` key:

    cd ~/course_data/linux/data

Now you can follow the instructions in the tutorial from here.
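If you want to confirm that you are in the right place first, a minimal check might look like the sketch below. It assumes the course data lives at the path used in the `cd` command above; the variable names `data_dir` and `status` are only for illustration.

```shell
# Assumed location of the course data (taken from the cd command above)
data_dir="$HOME/course_data/linux/data"

if [ -d "$data_dir" ]; then
    cd "$data_dir"
    pwd    # print the directory you are now in
    ls     # list the files it contains
    status="found"
else
    # The directory does not exist on this machine
    echo "Directory not found: $data_dir"
    status="missing"
fi
```

If the directory is reported as not found, check with your instructor where the course data has been installed.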

## Cheat sheet

We've also included a [cheat sheet](cheat_sheet.ipynb) at the end of this tutorial. It probably won't make a lot of sense now, but it might be a useful reminder of this module later in the tutorial.

## Let’s get started!
    
To get started with the tutorial, go to the next section: [Basic Linux](basic.ipynb)