<hr style="height:0px; visibility:hidden;" />

<h1><center><b>GL4U: Introduction</b></center></h1>

<h1><center>2. Unix Introduction</center></h1>

<div class="alert alert-block alert-success">

<b>Unix is very likely the most foundational skillset we can develop for bioinformatics</b> (and much more than bioinformatics). Many of the most common and powerful bioinformatics approaches happen in this text-based environment, and having a solid foundation here can make everything we’re trying to learn and do much easier. Here we are only going to go over a very brief intro which is taken in part from a 5-module <a href="https://astrobiomike.github.io/unix/unix-intro" target="_blank">Unix introduction</a> at <a href="https://astrobiomike.github.io" target="_blank">Happy Belly Bioinformatics.</a>

</div>
    
---

<center>This is notebook 2 of 4 of GL4U's Introduction Module Set. It is expected that the previous notebook has been completed already.</center>

---

[**Previous:** 1. Jupyter intro](01-jupyter-intro.ipynb)
<br>

<div style="text-align: right"><a href="03-R-intro.ipynb"><b>Next:</b> 3. R intro</a></div>

---
---

# Table of Contents

* [1. What is Unix?](#1.-What-is-Unix?) 
* [2. Why learn Unix?](#2.-Why-learn-Unix?) 
* [3. A few foundational rules](#3.-A-few-foundational-rules) 
* [4. Accessing example files](#4.-Accessing-example-files) 
* [5. Running commands](#5.-Running-commands) 
  * [date](#date) 
  * [head](#head) 
* [6. The Unix file-system structure](#6.-The-Unix-file-system-structure) 
  * [6a. Absolute vs relative path](#6a.-Absolute-vs-relative-path) 
* [7. Navigating in Unix](#7.-Navigating-in-Unix) 
  * [cd](#cd) 
  * [pwd](#pwd) 
  * [ls](#ls) 
* [8. Additional useful commands](#8.-Additional-useful-commands) 
  * [mkdir](#mkdir) 
  * [cp](#cp) 
  * [mv](#mv) 
  * [cat](#cat) 
* [9. Variables](#9.-Variables) 
* [10. The '*' wildcard](#10.-The-*-wildcard) 
* [11. Summary](#11.-Summary) 

<br>

---
---

<a class="anchor" id="what-is-unix"></a>

## 1. What is Unix?
The term "Unix" encompasses a family of operating systems that share a common ancestor initially developed in the late 1960s. It was built on a modular design philosophy, where relatively simple tools can be strung together seamlessly to accomplish more complicated tasks. In short, it has spread like crazy throughout the computational world over the past 5 decades. 

Many successful offshoots of Unix have been developed, which is why we might see the term "Unix-like" used in order to encompass all the systems that aren't technically "Unix". 

Here are some terms that are often used interchangeably: 

| Term     | Definition          |
|:-------------|:------------------|
| `shell` | what we use to talk to the computer; anything where we are pointing and clicking with a mouse is a **G**raphical **U**ser **I**nterface (**GUI**) shell; something with text only is a **C**ommand **L**ine **I**nterface (**CLI**) shell |  
| `command line/terminal` | A text-based environment capable of taking input and providing output |  
| `bash` | A common programming language used in a Unix CLI |  

<hr style="height:0px; visibility:hidden;" />

---

<a class="anchor" id="why-learn-unix"></a>

## 2. Why learn Unix?
Getting familiar with working at a Unix-like command-line is one of the most fundamental skillsets we can develop for bioinformatics, but also much, much more. As Brian Kernighan (a member of the original Unix team) puts it in his 2019 book [*Unix: A history and a memoir*](https://www.cs.princeton.edu/~bwk/memoir.html):

> *Unix and its derivatives aren't widely known outside a particular technical community, but they are at the heart of any number of systems that are part of everyone's world. Google, Facebook, Amazon, and plenty of other services are powered by Unix-like operating systems. If you have a cell phone or a Mac, it runs on some version of Unix. If you have gadgets like Alexa at home or navigation software in your car, they're powered by Unix-like systems too.*

Being the framework for so much of our world, learning to speak its language also gives us greater access to things like remote servers and cloud-computing. It can allow us to access and manipulate large datasets we otherwise couldn't, and use programs we otherwise couldn't.

Which brings us back to it being foundational to bioinformatics. Many of the most common and powerful bioinformatics approaches happen in this text-based environment, and having a solid foundation here can make everything we’re trying to learn and do *much* easier. 


**So, here are a few reasons why learning Unix is worth it:**  

* It’s the foundation for most of bioinformatics (and much more)  
* Enables the use of non-GUI (Graphical User Interface) tools  
* Improves reproducibility (GUIs are super-convenient for lots of things, but they are typically not ideal when it comes to reproducibility)  
* Enables things like quickly performing operations on large files (without needing to read them into memory)  
* Can allow us to programmatically access data  
* Helps automate repetitive tasks (e.g., need to rename 1,000 files?)  
* Enables use of higher-powered computers elsewhere (servers/cloud-computing)  

<hr style="height:0px; visibility:hidden;" />

<div class="alert alert-block alert-info">
As mentioned above, we are only going to go over a very brief intro here to give us enough background about the commands we will use to process data in the <a href="https://github.com/nasa/GeneLab-Training/tree/main/GL4U">GL4U</a>  'omics-specific module sets. To learn more about Unix, check out the full 5-module <a href="https://astrobiomike.github.io/unix/unix-intro" target="_blank">Unix introduction</a> at <a href="https://astrobiomike.github.io" target="_blank">Happy Belly Bioinformatics.</a> 🙂
</div>

---

<a class="anchor" id="rules"></a>

## 3. A few foundational rules
* **Spaces are special!** The command line uses spaces to know how to properly break things apart. This is why it's not ideal to have filenames that contain spaces; it's better to use dashes (**`-`**) or underscores (**`_`**) to separate terms (e.g., "draft-v3.txt" is preferred over "draft v3.txt").  

* The general syntax working at the command line goes like this: `command argument`.  

* Arguments (which can also be referred to as "flags" or "options" or "parameters") can be **optional** or **required** based on the command being used.  
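To make the first rule concrete, here is a minimal sketch of how spaces change what a command receives. `count_args` is a hypothetical helper defined only for this illustration (it is not a standard command); it simply reports how many arguments it was given.

```shell
# Hypothetical helper for illustration: prints the number of arguments it receives
count_args() {
    echo "$#"
}

count_args draft v3.txt       # the space splits this into 2 arguments
count_args "draft v3.txt"     # quotes keep it together as 1 argument
count_args draft-v3.txt       # no space, so 1 argument with no quotes needed
```

A filename without spaces never needs the quotes, which is one less thing to get wrong.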

### A reminder on running cells in a Jupyter notebook

Remember, to run a cell we first need to select it, then we can either click the "Run" button, represented as a "play" icon, at the top of the notebook, or we can use shortcut keys. Here are a few:

Run a cell and keep the same cell selected:
- Mac:     `CMD + RETURN`
- Windows: `CTRL + ENTER`  

Run a cell and move to the next cell:
- Mac:     `SHIFT + RETURN`
- Windows: `SHIFT + ENTER`

---

<a class="anchor" id="files"></a>

## 4. Accessing example files

Running this next code block will change your location to the folder containing the example files we will use in this Jupyter notebook. This will be the <b>only</b> time I would like you to blindly run a code block without necessarily understanding what it is doing 🙂

So select the following code block, and run it either with shortcut keys as noted above, or by clicking the "Run" button at the top of the document. After doing so, you will see a path appear indicating where we are located within the GitPod environment we are using.

In [None]:
cd /workspace/GeneLab-Training/intro
pwd

<br>
<b><center>Great! Now let's get started!</center></b>

---

<a class="anchor" id="running-commands"></a>

## 5. Running commands

Using the foundational rules described above, we will begin running some commands.

<a class="anchor" id="date"></a>

### date


`date` is a command that prints out the date and time. This particular command doesn't require any arguments:

In [None]:
date

When we run `date` with no arguments, it uses some default settings, like assuming we want to know the time in our computer's currently set time zone. But we can provide optional arguments to `date`. 

Optional arguments most often require putting a dash in front of them in order for the program to interpret them properly. 

Here, we are adding the `-u` argument to tell the `date` program to report UTC time instead of the local time – which will be the same if the computer we're using happens to be set to UTC time: 

In [None]:
date -u

Note that if we try to run the command above without the dash, we get an error (ignore the message that prints out highlighted in red, we wouldn't normally see that outside of a notebook):

In [None]:
date u

<div class="alert alert-block alert-info">
<b>Note</b>
<br>

Notice that the error above comes from the program `date`. So the program we wanted to use *is* actually responding to us, but it doesn't seem to know what to do with the letter `u` we gave it. And this is because it wasn't prefixed with a dash, like `-u`.

</div>

Let's see what happens if we try to enter this without the space separating `date` and the optional argument `-u`. The computer won't know how to break apart the command, and we get a different error (again, ignoring the red output):

In [None]:
date-u

<div class="alert alert-block alert-info">
<b>Note</b>
<br>

Notice that unlike the first error we got with `date u`, this error is not coming from the `date` program, but instead the command line is telling us it can't find a command (or program) called "date-u". And it was looking for that because by missing the space in between the command `date` and the argument `-u`, we weren't telling it how to properly break things apart.

</div>
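One way to check this for ourselves is to ask the shell whether it can find a command by a given name. The sketch below relies on the shell built-in `command -v` and on an `if` statement, neither of which is covered in this notebook, and it assumes a bash-like shell.

```shell
# Ask the shell whether it can find a command called "date"
if command -v date > /dev/null; then
    echo "found a command called date"
fi

# Without the space, "date-u" is treated as one command name, which does not exist
if command -v date-u > /dev/null; then
    echo "found a command called date-u"
else
    echo "no command called date-u"
fi
```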


<div class="alert alert-block alert-warning">
<b>A note on error messages</b>
<br>
Error messages can often seem obtuse and just confusing (and sometimes they are), but in many cases they can also be helpful. Being able to interpret some of them is certainly an acquired skill, but it's always worth doing our best to try to pay attention to them if we're having trouble with something.
</div>

<a class="anchor" id="head"></a>

### head

Unlike `date`, most commands require arguments and won't work without them. `head` is a command that prints the first few lines of a file, so it **requires** us to provide the file we want it to act on. Run the code block below to print out the first few lines of a file called "example.txt", one of the example files in the directory we moved into [above](#4.-Accessing-example-files): 

In [None]:
head example.txt

Here "example.txt" is the **required** argument, and in this case it is also known as a **positional** argument (we'll see examples of what's *not* a "positional" argument below). 

Whether things need to be provided as positional arguments or not depends on how the command or program we are using was written. Sometimes we need to specify the input file by putting something in front of it (e.g., some commands will use the `-i` flag, but it's often other things as well).

There are also optional arguments for the `head` command. The default for `head` is to print the first 10 lines of a file. We can change that by specifying the `-n` flag followed by how many lines we want:

In [None]:
head -n 5 example.txt

How would we know we needed the `-n` flag for that? There are a few ways to find out. Many standard Unix commands and other programs will have built-in help menus that we can access by providing `-h` or `--help` as the only argument. Let's try `-h` first:

In [None]:
head -h

The error message above tells us that `-h` is an invalid option, and the tool version we are using kindly prints out how to access the help menu, so let's try with `--help`:

In [None]:
head --help

The `head --help` command spits out a lot of information (and `head` is a relatively simple command compared to many others), but somewhere in there we can see "-n, --lines..." (we could have used `--lines 5` instead of `-n 5` to get the same result), but even that can be confusing if we're not familiar with how this information is presented. 
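The default and the `-n` behavior described above can also be checked on a file whose contents we know in advance. This is a minimal sketch, assuming the `mktemp` and `seq` utilities are available (they are on most Linux and macOS systems); `scratch_file` is a made-up variable name, and the `>` symbol sends output into a file.

```shell
# Make a small scratch file holding the numbers 1 through 20, one per line
scratch_file=$(mktemp)
seq 1 20 > "$scratch_file"

# With no options, head prints the first 10 lines
head "$scratch_file"

# With -n 3, head prints only the first 3 lines
head -n 3 "$scratch_file"
```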

The built-in help menu is a good place to start because it's often immediately accessible and might help. But if it's not working or if it's too confusing, quickly shifting to just do an internet search will often lead to a more easily understood answer. 

What options are available for a certain command, and how to specify them, are parts of this process that are not about memorization at all. We might remember a few flags or specific options if we happen to use them a lot, but searching for options and details when needed is definitely the norm!

<div class="alert alert-block alert-info">
    What we've done so far already really is the framework for how almost all things work at the command line! Multiple commands can be strung together, and some commands can have many options, inputs, and outputs and can grow to be quite long, but this is the general framework that underlies it all.
    <br>
    <br>
    <center><b>Becoming familiar with these baseline rules is important, memorizing particular commands and options is not!</b></center>
</div>

---

<a class="anchor" id="file-system"></a>

## 6. The Unix file-system structure

Computers store file locations in a hierarchical structure. We are typically already used to navigating through this structure by clicking on various folders (known as directories in the Unix world) in a Windows Explorer window or a Mac Finder window. Just like we need to select the appropriate files in the appropriate locations there (in a Graphical User-Interface, or GUI), we need to do the same when working at a command-line interface. What this means in practice is that each file and directory has its own "address", and that address is called its "**path**". 

Additionally, there are two special locations in all Unix-based systems, so 2 more terms we should become familiar with: 
* the "**root**" location is where the address system of the computer starts
* the "**home**" location is where the current user's location starts
  

Here is an image of an example file-system structure. Let's take a peek at it. First imagine just "clicking" through folders (called "directories" in the Unix world) in a GUI in order to reach the file we want, "processing_notes.txt". Then we'll talk about it in terms of the "path" we could use to get to the same file at the command line.

<br>

![file_system_structure.png](../images/file_system_structure.png)

<br>

We tell the command line where files and directories are located by providing their address, their "path". If we use the `pwd` command (for **p**rint **w**orking **d**irectory), we can find out what the path is for the directory (folder) we are sitting in:

In [None]:
pwd

<div class="alert alert-block alert-info">
<b>Note</b>
<br>

The output of `pwd` provides the path starting from the special **root** location, because it begins with a leading `/`, which when in front like this is the special character that denotes the start of the address system.
</div>


Next, we'll use the `ls` command (for **l**i**s**t) to see what directories and files are in the current directory we are in:

In [None]:
ls

<a class="anchor" id="path"></a>

### 6a. Absolute vs relative path

There are two ways to specify the path (address in the computer) of the file we want to find or do something with:

* An **absolute path** is an address that starts from one of those two special locations we mentioned above: either the "root" (specified with `/`) or the "home" (specified with `~/`) location. 

* A **relative path** is an address that starts from wherever we are currently sitting.

These can sound a little more confusing at first than they are, so it's best to just look at some examples.

Let's start by looking again at the **`head`** command we ran above:

In [None]:
head example.txt

**What we are actually doing here is using a *relative path* to specify where the "example.txt" file is located.** The command line automatically looks in the current working directory if we don't specify anything else about a file's location. So this works specifically because there is a file called "example.txt" in the current directory where we are running the command from.

We can also run the same command on the same file, but specifying the file's location using an **absolute path**:

In [None]:
head /workspace/GeneLab-Training/intro/example.txt

In the command above, we are using the absolute path to the `example.txt` file, which includes the full path to the directory that holds the file, starting from the root (`/`) location, then providing the name of the file. 

The previous two commands both point to the same exact file. But the first way, `head example.txt`, will only work if we are entering it while we are in the directory that holds that file, while the second way will work no matter "where" we happen to be in the computer.

<div class="alert alert-block alert-info">
    <b>Note</b>
    <br>
    The address of a file, its "path", includes the name of the file also. It doesn't stop at the directory that holds it.
</div>
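The difference between the two kinds of path can be sketched in a throwaway directory, so that nothing depends on the notebook's own files. The directory and file names below are made up for illustration, `mktemp -d` is assumed to be available, and the sketch uses `cd` and a variable, both of which are covered later in this notebook.

```shell
# Scratch directory standing in for a project folder (names are made up)
scratch_dir=$(mktemp -d)
mkdir "$scratch_dir/experiment"
echo "first line of notes" > "$scratch_dir/experiment/notes.txt"

# From inside the scratch directory, a relative path works
cd "$scratch_dir"
head experiment/notes.txt

# The absolute path works from anywhere, such as the root location
cd /
head "$scratch_dir/experiment/notes.txt"

# The same relative path fails from here, because it is now resolved starting at "/"
if head experiment/notes.txt 2> /dev/null; then
    echo "unexpectedly found the file"
else
    echo "relative path not found from $(pwd)"
fi
```

Note that this sketch leaves the working directory at `/`, so we would need to `cd` back to wherever we want to continue working.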

It is important to always think about *where* we are in the computer when working at the command line. **One of the most common errors/easiest mistakes to make is trying to do something to a file that isn't where we think it is.** 

Let's run `head` on the "example.txt" file again, using a relative path by just providing the name of the file, and then let's try it on another file, "notes.txt":

In [None]:
head example.txt

In [None]:
head notes.txt

Here the `head` command works fine on "example.txt", but we get an error message when we call it on "notes.txt" that tells us "no such file or directory" (ignore the red highlighted line, we wouldn't see that if outside of the notebook). 

If we run the `ls` command to **l**i**s**t the contents of the current working directory, we can see the computer is absolutely right – spoiler alert: it usually is – and there is no file here named "notes.txt":

In [None]:
ls

The `ls` command by default operates on the current working directory if we don't specify any location, but we can tell it to list the contents of a different directory by providing the path to that directory as a positional argument. Here we are telling it to list the contents inside the "experiment" directory:

In [None]:
ls experiment

We can see the file we were looking for is located in this sub-directory called "experiment". Here is how we can run `head` on "notes.txt" by specifying an accurate **relative path** to that file:

In [None]:
head experiment/notes.txt

---

<a class="anchor" id="navigating"></a>

## 7. Navigating in Unix

Next, we'll practice some commands that are useful when navigating within a Unix file-system structure.

<a class="anchor" id="cd"></a>

### cd

We can move into the directory containing the file we want to work with by using the `cd` command (**c**hange **d**irectory). This command takes a positional argument that is the path (or address) of the directory we want to change into. This can be a relative path or an absolute path. 

In the next cell, try using the `cd` command to change into the "experiment" directory, by providing that as a positional argument to the `cd` command:

<div class="alert alert-block alert-success">

<details>
<summary><b>Solution</b></summary>

<br>
This could be done with a <i>relative path</i> (if we are in the "~/intro" directory currently):<br><br>
    
`cd experiment`

Or it could be done with an <i>absolute path</i> from anywhere in the computer:<br>

`cd /workspace/GeneLab-Training/intro/experiment`

</details>
</div>


<a class="anchor" id="pwd"></a>

### pwd

Now let's use `pwd` (**p**rint **w**orking **d**irectory) to see where we are:

In [None]:
pwd

And we can see we have successfully navigated to the "experiment" sub-directory.

<a class="anchor" id="ls"></a>

### ls

Next, we'll use `ls` (for **l**i**s**t) to see what is in this directory:

In [None]:
ls

The `ls` command by default operates on the current working directory if we don't specify any location.

We can also explicitly specify the current working directory with the shortcut, `.`. So the command above can also be run as follows to list all contents in the current working directory: 

In [None]:
ls .

From the output of `ls` when we are in the "experiment" directory (our current working directory), we can see the "notes.txt" file is in there.

Now try running `head` on the "notes.txt" file by just providing the file name "notes.txt" as the positional argument:

<div class="alert alert-block alert-success">

<details>
<summary><b>Solution</b></summary>

<br>

`head notes.txt`

</details>
</div>


Great. But now how do we get back "up" to the directory above us? One way would be to provide an absolute path to the `cd` (**c**hange **d**irectory) command, like `cd /workspace/GeneLab-Training/intro`, but there is also a handy shortcut. 

`..` are special characters that act as a relative path specifying "up" one level – one directory/folder – from wherever we currently are. So we can provide that as the positional argument to `cd` to get back to where we started:

In [None]:
cd ..

And we can check out where we are and what is here again with `pwd` and `ls` (having two commands like this one line after another will just run them one after the other and print the output of each command as it goes):

In [None]:
pwd
ls

Notice that by default `ls` arranges the items in columns across the output rather than one per line. However, it is sometimes useful to list one item per line. Use the next code block to run the `ls` command again, but this time add `-1` as an optional argument to list the items one per line:

<div class="alert alert-block alert-success">

<details>
<summary><b>Solution</b></summary>

<br>

`ls -1`
    
Or it could be done by specifying the current working directory with the `.` shortcut:<br>

`ls -1 .`

</details>
</div>


If we didn't know we needed to add `-1`, what could we have done to view the possible `ls` arguments? Use the code block below to test your answer:

<div class="alert alert-block alert-success">

<details>
<summary><b>Solution</b></summary>

<br>

`ls --help`

Notice that the `-1` argument is listed in the help menu and indicates that `-1` lists one file per line.

</details>
</div>
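The `.` and `..` shortcuts from this section can be tried together in a scratch directory. This is only a sketch: the directory is created with `mktemp -d` (assumed to be available), and `scratch_dir` is a made-up variable name.

```shell
# Set up a scratch directory containing one sub-directory, then move into it
scratch_dir=$(mktemp -d)
mkdir "$scratch_dir/experiment"
cd "$scratch_dir/experiment"

# "." is the current working directory, so these two list the same (empty) location
ls
ls .

# ".." is one level up, so this lists the scratch directory, which holds "experiment"
ls ..

# Move up one level and confirm where we are
cd ..
pwd
```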

---

<a class="anchor" id="additional-commands"></a>

## 8. Additional useful commands

Here are a few of the other commands that we may come across during our work.

<a class="anchor" id="mkdir"></a>

### mkdir


`mkdir` (for **m**a**k**e **dir**ectory) is the command that allows us to create new folders/directories. It takes as a positional argument the name of the directory we want to create (it can also create multiple directories by providing multiple different directory names as positional arguments, each separated with a space).

Here is how we would create a new directory called "SolarSystem":

In [None]:
mkdir SolarSystem

And now check that it exists with the `ls` command:

Notice that the "SolarSystem" directory was created in our current working directory. If you want to create a new directory somewhere other than your current working directory, you can do so by specifying either a relative or absolute path to where you want your new directory created. For example, if you wanted to create the "SolarSystem" directory in your home directory, you could do so by providing an absolute path to `mkdir` from anywhere in your computer as follows:

`mkdir ~/SolarSystem`

Let's look at the contents inside of the "SolarSystem" directory we made above with the `ls` command, but specifying the new directory as a positional argument:

<div class="alert alert-block alert-success">

<details>
<summary><b>Solution</b></summary>

<br>

This will list the contents of the current working directory by default:

`ls`


And this will list the contents of the "SolarSystem" directory (which is empty right now):

`ls SolarSystem`

</details>
</div>
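As mentioned at the start of this subsection, `mkdir` also accepts several names at once. Here is a minimal sketch in a scratch directory; the directory names are made up, and `mktemp -d` is assumed to be available.

```shell
# Work inside a scratch directory so nothing in the notebook folder is touched
scratch_dir=$(mktemp -d)
cd "$scratch_dir"

# One mkdir call, three new directories
mkdir Mercury Venus Earth

# A relative path places a new directory inside an existing one
mkdir Earth/Moon

ls
ls Earth
```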


Next, we'll learn about the **cp** and **mv** commands.

<div class="alert alert-block alert-warning">
<b><center>WARNING</center></b>
Using commands that do things like create, copy, and move files at the command line will overwrite files if they have the same name in the same location. And using commands that delete things will do so permanently. Use caution while getting used to things – and then forever after 🙂
</div>
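To see what that warning means in practice, here is a small sketch run in a scratch directory (file names are made up, and `mktemp -d` is assumed to be available). It uses `cp`, `cat`, and the `>` redirector, all of which are introduced below.

```shell
scratch_dir=$(mktemp -d)
cd "$scratch_dir"

# Two small files with different contents
echo "original results" > results.txt
echo "new draft" > draft.txt

# cp replaces the existing results.txt without asking
cp draft.txt results.txt

# The earlier contents of results.txt are gone
cat results.txt
```

Both `cp` and `mv` accept an `-i` option that asks for confirmation before overwriting an existing file, which can be a useful safety net while getting started.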

<a class="anchor" id="cp"></a>

### cp

The `cp` (**c**o**p**y) command requires at least two positional arguments – the first is the file we want to act on, and the second is where we want it to go (which can include the name we want to give it). 

Here's how we can make a copy of "example.txt" and put it in the same directory just with a different name, while running `ls` before and after:

In [None]:
ls

In [None]:
cp example.txt example_copy.txt

In [None]:
ls

Remember the file name by itself in those two positional arguments above is actually a *relative path* that is just starting from the directory we are sitting in. 

So if we wanted to make a copy of "example.txt" and put it in the "SolarSystem" directory we just created above, we would just need to modify the second positional argument above to be `SolarSystem/example_copy.txt`. Try that out here, and then check that the copied file ended up in the "SolarSystem" directory by running `ls` on that directory:

<div class="alert alert-block alert-success">

<details>
<summary><b>Solution</b></summary>

<br>

This will copy the file into the directory, giving it the name being specified here:

`cp example.txt SolarSystem/example_copy.txt`


And this will list the contents of the "SolarSystem" directory:

`ls SolarSystem`
    
If you are getting an error, it may be that the "SolarSystem" directory wasn't created in the above `mkdir` step.

</details>
</div>

If we want to copy (or move) something from a different directory into our current working directory, and keep the same original name of the file, we can use the special character of one period (`.`) as the second positional argument (as we saw above, the `.` is a shorthand for specifying the current working directory). 

Here is how we could copy a file from the "data" directory into our current working directory while keeping the original name:

In [None]:
cp data/example.csv .

In [None]:
ls

How could you change the command above if you wanted to copy the "example.csv" file to a directory one level up from your current working directory?

<div class="alert alert-block alert-success">

<details>
<summary><b>Solution</b></summary>

<br>

Here is one way:

`cp data/example.csv ..`

You don't need to create a cell and run that though.

</details>
</div>

<a class="anchor" id="mv"></a>

### mv

Like the `cp` command, the `mv` (**m**o**v**e) command has the same basic structure. It requires at least two positional arguments – the first is the file we want to act on, and the second is where we want it to go (which can include the name we want to give it). 

If we want to move (or copy) something and keep the same name, so long as it's going into a different directory than it's starting in, we can just leave the name off. For example, here is how we can move a file called "example.csv" from the "data" directory into the "SolarSystem" directory:

In [None]:
mv data/example.csv SolarSystem/

In [None]:
ls SolarSystem/

Since SolarSystem is a directory, the above commands would both work the same whether we included the `/` (as in "SolarSystem/") or not ("SolarSystem").

And since we *moved* the file that time, the original file no longer exists in the "data" directory:

In [None]:
ls data/

<div class="alert alert-block alert-info">
<b>Note</b>
<br>
The <code>mv</code> command is also what is used to rename files. This may seem strange at first, but remember that the path (address) of a file actually includes its name too (otherwise everything in the same directory would have the same path).
</div>

Use the command block below to rename the "example.csv" file in the "SolarSystem" directory to "example_original.csv", keeping it in that same directory. Then look in the "SolarSystem" directory to see if you were successful:

<div class="alert alert-block alert-success">

<details>
<summary><b>Solution</b></summary>

<br>

`mv SolarSystem/example.csv SolarSystem/example_original.csv`  
`ls SolarSystem`

</details>
</div>


Next, use the command block below to make a copy of the "example_original.csv" file in the "SolarSystem" directory and name the copy "example.csv", keeping the copy in that same directory. Then look in the "SolarSystem" directory to see if you were successful:

<div class="alert alert-block alert-success">

<details>
<summary><b>Solution</b></summary>

<br>

`cp SolarSystem/example_original.csv SolarSystem/example.csv`  
`ls SolarSystem`

</details>
</div>
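Renaming and moving are the same operation to `mv`: both change a file's path. The sketch below shows each in a scratch directory, using stand-in files rather than the notebook's own (`mktemp -d` is assumed to be available, and the file contents are made up).

```shell
scratch_dir=$(mktemp -d)
cd "$scratch_dir"
mkdir SolarSystem
echo "planet,moons" > SolarSystem/example.csv

# Rename: same directory, new file name
mv SolarSystem/example.csv SolarSystem/example_original.csv
ls SolarSystem

# Move: new directory, same file name (the name can be left off the destination)
mv SolarSystem/example_original.csv .
ls
```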

<a class="anchor" id="cat"></a>

### cat

The `cat` command (which stands for con**cat**enate) will print all the contents of the specified file(s) to the terminal. It requires one or more positional arguments that it will print out in order, so if we gave it more than one, it would be concatenating them together for us. Similar to the `head` program, here is how we can use it using a relative path to the "notes.txt" file:

In [None]:
cat experiment/notes.txt

Use the command block below to concatenate the "notes.txt" file followed by the "example_original.csv" file (that we renamed above):
> Remember that the "notes.txt" file is in the "experiment" directory and the "example_original.csv" file is in the "SolarSystem" directory.

<div class="alert alert-block alert-success">

<details>
<summary><b>Solution</b></summary>

<br>

`cat experiment/notes.txt SolarSystem/example_original.csv`  

</details>
</div>


As seen above, the contents of these files are printed to the terminal. However, if we wanted to save the output in a new file instead, we would need to use what's known as a **redirector**. We can re-direct the output from the `cat` command to a new file using the `>` symbol followed by the name of the new file as follows:

In [None]:
cat experiment/notes.txt SolarSystem/example_original.csv > new_file.txt
ls

<div class="alert alert-block alert-warning">
<b><center>WARNING</center></b>

Using the `>` redirector will overwrite the file we specify if it exists already.

</div>
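A short sketch of that overwriting behavior, run in a scratch directory with a made-up file name (`mktemp -d` is assumed to be available). It also shows `>>`, a second redirector not covered in this notebook, which appends to a file instead of replacing it.

```shell
scratch_dir=$(mktemp -d)
cd "$scratch_dir"

echo "first" > log.txt
echo "second" > log.txt     # ">" overwrites: log.txt now holds only "second"
echo "third" >> log.txt     # ">>" appends: "third" is added after "second"

cat log.txt
```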

Use the code block below to run `cat` on the new file we just created to see if indeed it contains the contents of the "notes.txt" file followed by the "example_original.csv" file:

<div class="alert alert-block alert-success">

<details>
<summary><b>Solution</b></summary>

<br>

`cat new_file.txt`  

</details>
</div>


How could we change the `cat` command above where we created "new_file.txt" if we wanted to create it in our home directory?

<div class="alert alert-block alert-success">

<details>
<summary><b>Solution</b></summary>

<br>

Here is one way: 

`cat experiment/notes.txt SolarSystem/example_original.csv > ~/new_file.txt`  

</details>
</div>

---

<a class="anchor" id="variables"></a>

## 9. Variables

A variable is a named object that holds data stored in memory. Variables can be helpful in making our code more readable and modifiable, and they are also inherently essential components of things like loops. 

To set a variable at a Unix-like command line, we need to provide the variable name we want, an equals sign, and then the value we want the variable to hold (with no spaces in between any of that). 

Here is an example:

In [None]:
my_var=Europa

Nothing prints out when the variable is set, but the value "Europa" has now been stored in the variable "my_var". 

To access what's been stored in a variable in a Unix-like environment, the variable name needs to be preceded by a `$` so the terminal knows to *evaluate* what follows, rather than just treating it as generic characters. 

To see this, we'll use the `echo` command. `echo` is a command that prints out whatever is provided to it (which turns out to be very useful in some places – like in a program, for example, to report information to the user). Here we'll use it to check what's being stored in our variable:

In [None]:
echo $my_var

In the next code block, try running `echo` on "my_var" without preceding it with the dollar sign:

<div class="alert alert-block alert-success">

<details>
<summary><b>Solution</b></summary>

<br>

`echo my_var`

If we don't tell the command line to evaluate "my_var" by preceding it with a dollar sign, it will just print the text "my_var" back to us.
    
</details>
</div>

Recall that spaces are special characters at the command line. If we wanted to set a variable to a value that contained spaces, we would need to surround that value in quotes to tell the command line that it should be considered as one thing:

In [None]:
my_new_var="Europa is awesome."

Use `echo` again below to print out the contents of this new variable:

<div class="alert alert-block alert-success">

<details>
<summary><b>Solution</b></summary>

<br>

`echo $my_new_var`
    
</details>
</div>
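
To see why the quotes matter, the sketch below tries the same kind of assignment without them. The variable names "unquoted_var" and "quoted_var" are made up for this illustration. It also assumes there is no command named `is` on the system, and it uses `2> /dev/null` (to hide the error message, which differs between shells) and `||` (to run the second command only if the first one fails), neither of which is covered in this notebook:

```shell
# Without quotes, the command line splits this at the spaces. It treats
# "unquoted_var=Europa" as a temporary setting for a command named "is",
# and that command does not exist, so the line fails.
unquoted_var=Europa is awesome. 2> /dev/null || echo "the unquoted version failed"

# With quotes, the whole phrase is stored as one value
quoted_var="Europa is awesome."
echo "${quoted_var}"
```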

One last note about using variables at the Unix-like command line: it is generally good practice, when accessing a variable, to surround its name with squiggly brackets `{}` after the dollar sign. This tells the command line specifically where our variable name starts and ends. Sometimes this doesn't matter, like in the cases above, but sometimes it does. 

For example, let's say we want to append the text "\_table" to the end of the value stored in our variable. If we try to do that without the brackets, it won't work:

In [None]:
echo $my_var_table

That prints back only an empty line, because the command line is looking for a variable called "my_var_table", which doesn't exist.

But if we surround the variable in squiggly brackets following the dollar sign, the command line knows exactly which part to try to evaluate:

In [None]:
echo ${my_var}_table

And that will produce what we expect in this case, "Europa_table". As mentioned above, sometimes this doesn't matter, but it never hurts to use the squiggly brackets when trying to access a variable, so it's a good habit to form.
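
As one illustration of how variables can make code easier to modify, they are often used to build file names. The sketch below is hypothetical: the variable name "moon" and the file names are invented for this example and are not part of the example data. It only prints the names and does not create any files:

```shell
# Store a name once, then reuse it to build several file names
moon=Europa
echo "${moon}_notes.txt"
echo "${moon}_table.csv"

# Changing the variable in one place updates every name built from it
moon=Titan
echo "${moon}_notes.txt"
echo "${moon}_table.csv"
```

Without the squiggly brackets, the command line would look for variables called "moon_notes" and "moon_table" instead.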

---

<a class="anchor" id="wildcard"></a>

## 10. The `*` wildcard

Wildcards as used at the command line are special characters that enable us to use pattern-matching expressions. The most commonly used is the `*`, which represents any character, any number of times (including zero times).

As we've seen, running `ls` by itself lists the contents of the current working directory: 

In [None]:
ls

By default it assumes we want everything. But we can be more specific about what we're interested in by giving it a positional argument that narrows things down. 

If we wanted to only list all the files that end with ".txt", we could use the `*` wildcard like so:

In [None]:
ls *.txt

This works for much more than just `ls`, including copying or moving files and specifying which files to give a program. More on wildcards, and the also-powerful redirectors, can be found [here](https://astrobiomike.github.io/unix/wild-redirectors).
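
As a minimal sketch of using the wildcard with a command other than `ls`, the example below copies only the ".txt" files from one directory into another. The "/tmp/wildcard_demo" location and the file names are made up for this illustration (assuming a writable "/tmp" directory), and the `-p` flag to `mkdir`, which creates any missing parent directories, is an addition not covered above:

```shell
# Set up a small scratch area holding two ".txt" files and one ".csv" file
mkdir -p /tmp/wildcard_demo/text_only
echo "a" > /tmp/wildcard_demo/notes.txt
echo "b" > /tmp/wildcard_demo/readme.txt
echo "c" > /tmp/wildcard_demo/table.csv

# Copy only the files ending in ".txt" into the sub-directory
cp /tmp/wildcard_demo/*.txt /tmp/wildcard_demo/text_only/

# notes.txt and readme.txt were copied; table.csv was not
ls /tmp/wildcard_demo/text_only
```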

---

<a class="anchor" id="summary"></a>

## 11. Summary

The information above is the foundation needed to start utilizing the Unix-like command line – which then gives us the capability to use lots of tools that only work at a command line, manipulate large files rapidly, access and work with remote computers, and more. This is all we will be covering as our foundation for the commands we will use to process data in the <a href="https://github.com/nasa/GeneLab-Training/tree/main/GL4U">GL4U</a> 'omics-specific module sets. If you plan to spend more time doing general bioinformatics work, we recommend going through the full 5-part <a href="https://astrobiomike.github.io/unix/unix-intro" target="_blank">Unix introduction</a> at <a href="https://astrobiomike.github.io" target="_blank">Happy Belly Bioinformatics</a> to learn more about Unix.

---

Next, we are going to move on to a brief [introduction to R](03-R-intro.ipynb).

---


**Terms introduced:**

| Term     | Definition          |
|:----------:|------------------|
| `path` | the address system the computer uses to keep track of files and directories |
| `root` | where the address system of the computer starts, `/` |
| `home` | where the current user's location starts, `~/` |
| `absolute path` | an address that starts from a specified location, i.e., root or home |
| `relative path` | an address that starts from wherever we are |


<br>

**Commands introduced:**

|Command     |Function          |
|:----------:|------------------|
|`date`| prints out information about the current date and time |
|`head`| prints out the first lines of a file |
|`pwd` | prints out where we are in the computer (print working directory) |
|`ls`  | lists contents of a directory (list) |
|`cd`| change directories |
|`echo`  | print out whatever we tell it |
|`mkdir`  | create a new directory (make directory) |
|`cp`  | copy a file or directory |
|`mv`  | move/rename a file or directory |
|`cat`  | print out contents of one or more files |


<br>

**Special characters introduced:**

|Characters     | Meaning          |
|:----------:|------------------|
| `/` | the computer's root location |
| `~/` | the user's home location |
| `./` or `.` | specifies the current working directory |
| `../` or `..` | specifies a directory one level "above" the current working directory |
| `>` | redirects the output to a file, overwriting that file if it already exists |
| `*` | a wildcard that represents any character, any number of times |
| `$` | tells the command line to evaluate the variable name that follows, e.g., `$my_var` or `${my_var}` |



---
---

[**Previous:** 1. Jupyter intro](01-jupyter-intro.ipynb)
<br>

<div style="text-align: right"><a href="03-R-intro.ipynb"><b>Next:</b> 3. R intro</a></div>
