# Introduction

Welcome to command-line fundamentals! At the end of this tutorial you will be able to:  


*   Move between folders and browse their contents.
*   Manipulate and view files.
*   Execute programs and customize their parameters.
*   Control the input and output of commands.
*   Connect to a network computer.
*   Assess a computer's current usage/load.
*   Open and use a text editor from the command-line.







## What is the Command-Line?

The command-line interface (CLI) is a program that lets you communicate directly with your computer via text. The command-line operates quite differently from how we typically use our computers, as there are no buttons or menus to click through with a mouse. Instead, we issue commands by typing them out (e.g. "firefox").

## Why use the Command-Line?

There might not be any obvious benefits to leaving your beautiful, visual desktop environment behind. Here are a few reasons why you might consider learning to use the command-line:

1. The programs you want to use are only available as command-line programs.
2. High Performance Computing (HPC) clusters and remote computers typically don't have a graphical desktop. The command-line is your only option.  
3. Automation of repetitive tasks, so that you don't have to click/type the same things over and over again.
4. The command-line works in much the same way across most operating systems (Linux and macOS in particular). It's like discovering a universal language to communicate with many machines!
5. Command-line programs are compatible with each other. The output of one program can fluidly become the input of the next. 
6. Precise control and authority over your computer. The command-line can be forced to bypass safety restrictions on operating system files.

[Source for inspiration for a few items on this list.](https://www.howtogeek.com/437682/command-lines-why-do-people-still-bother-with-them/)

## Using this Tutorial

This document is a Google Colab/Jupyter Notebook. It consists of interspersed blocks of text for reading, and code that is executed on a machine in the cloud (not on your computer). That means you don't have to worry about installing anything, and can play with code without any consequences!  

If you are in Google Colab, to execute and edit code, click the button in the top left that says "Open in playground". The first time you execute code you will have to acknowledge a warning that this notebook was authored by me (Katherine Eaton) not Google.

Each topic in this tutorial is presented in the following format:  
1.   The topic (phrased as a question) with instructions.
2.   The code to execute.
3.   The explanation and interpretation.  

For example:
***

**How can I print text out to the screen?** 

Code that can be executed has a grey background, like the line below. In Google Colab, if you hover your mouse over the square brackets on the left-hand side, a play button will appear. Click it, and a new box should appear below containing the words "Hello there!". If you are in Binder running this as a Jupyter notebook, click into the grey code box, then select the "Run" button from the menu at the top. You've executed your first command-line code! 

In [0]:
%%bash
echo "Hello there!"

Echo statements provide a useful way to print helpful text to the screen, and organize complicated output. 
***

If you're lucky enough to have access to the "terminal" application on your computer (macOS and Linux), it's highly recommended that you also work through this tutorial afterwards (or in parallel) on your own machine. Note that Windows does not come with a compatible terminal, so you will need to install an additional application such as MobaXterm: https://mobaxterm.mobatek.net/

*The first line of all code blocks in this notebook begins with %%bash.  
This is a Google Colab/Jupyter specific instruction that tells the notebook to run the code with the "bash" program (which executes it in the terminal). If you are working in an actual terminal on your computer, do not type %%bash, as it will be an unrecognized command.*

## The Computer in the Cloud

During this tutorial, we will manipulate files and move through folders that are present in the cloud computer but not on your local computer. At the start of each code block in this tutorial, we begin in a folder called "content". When the code block finishes, we are always returned to the folder "content" (even if we've moved somewhere else). If you are working on your local computer, you will have different folders present and your location will not reset unless you log out or close the terminal application.

If you make a permanent change to the cloud computer (e.g. accidentally delete a file), do not panic! To regenerate a "fresh" computer, go to File -> Save a copy in Drive. This will make a new copy of this notebook elsewhere in your Google Drive, connected to a refreshed computer with the filesystem intact.

# Part 1: Navigation
***

### **Where am I currently located in the computer's folder system?**

In [1]:
%%bash
pwd

/content


You are in a folder called *content*.  
*pwd* stands for **P**rint **W**orking **D**irectory.  
Directory is another name for folder.
***

### **What files or folders are in the current directory?**

In [2]:
%%bash
ls

sample_data


Just a single folder called *sample_data*.  
*ls* **lists** a directory's contents.
***

### **How do I move to a different location?**  

1. Navigate down into the *sample_data* folder.  
2. Confirm that you have successfully changed places.  
3. Then check what files are in it.

In [0]:
%%bash
cd sample_data/
pwd
ls

These three commands are executed in order:  
1.   First we **C**hange **D**irectories, but this command prints no output so we have no idea if it actually worked. So we then,
2.   **P**rint our **W**orking **D**irectory, which confirms that yes, we are in the folder sample_data which is within the folder content (represented as /content/sample_data).
3.   Finally, we **L**ist the folder's contents. You should see several files, including one called README.md. This is a file that is written in the [Markdown Language](https://en.wikipedia.org/wiki/Markdown), hence the .md suffix. But it's actually just a text file, and we will take a look at its contents shortly!
***

### **What if I want to list the contents of a folder I'm not currently in?**  

(Remember our location gets reset to "/content" each time).

In [0]:
%%bash
ls /content/sample_data

The folder contents should be the same as the output of the code we just ran. We just didn't have to *cd* (**C**hange **D**irectories) first!
***

### **My computer's File Explorer/Finder has a button to move back/up just one level. How do I do that?**  

1. Navigate down to the sample_data folder.  
2. Check your location.  
3. Then navigate back up to the content folder.  
4. Check your location.

In [0]:
%%bash
cd sample_data/
pwd
cd ../
pwd

The notation *../* tells the computer to move up one folder level from where you are currently.  
The notation *../../* would mean move up two levels.  
This style of navigating is called using **Relative Paths** (because it is relative to where you are currently).  
A **Path** is a chain of folder names linked with a separator (in this case,  a "/") that tells the computer how to get to a particular destination.
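
To try relative paths without touching any real files, here is a small sketch that builds a throwaway folder tree and walks back up it. The folder names (level1, level2) and the use of *mktemp* to get an empty scratch folder are assumptions made for illustration; they are not part of the cloud computer's filesystem. In a notebook cell, start with %%bash.

```shell
# Build a throwaway folder tree: <sandbox>/level1/level2 (names are illustrative)
sandbox=$(mktemp -d)
mkdir -p "$sandbox/level1/level2"

cd "$sandbox/level1/level2"
cd ../                          # up one level
one_up=$(basename "$PWD")       # basename keeps only the last folder name
echo "After ../ we are in: $one_up"

cd level2                       # back down, again relative to where we are
cd ../../                       # up two levels
two_up=$(basename "$PWD")
echo "After ../../ we are in: $two_up"

# Tidy up the scratch folder
cd /
rm -r "$sandbox"
```

Both *cd ../* and *cd ../../* are interpreted from wherever you happen to be, which is why the same command can land you in different places.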
***

### **What if I know exactly where I want to go to but don't want to list its path relative to me?**

In [0]:
%%bash
cd /content/sample_data/
pwd
cd /content/
pwd

This style of navigating is called using **Absolute Paths** (because they do not depend on where you are currently).
Note how the paths after *cd* begin with "/". This is called the **root** because it is the top-level folder that contains everything else on the computer. On Windows, the closest equivalent is the C Drive (C:\).
***

### **What is inside the mysterious, nameless "root" (/) folder?**

In [0]:
%%bash
ls /

Lots of things! You will see folders such as "bin", and "home". "bin" is where programs are stored (otherwise known as **bin**aries because they're binary code). "home" is where your user account files would be. On your own computer, you could find something like /home/username/Documents.

Using the code below, edit the path after the *cd* command to explore the mysterious cloud computer.

In [0]:
%%bash
cd /usr/bin
ls

If you are on your local computer, be very careful as you navigate the root (/) folders as this is where files crucial to the operating system are contained. While just listing the contents of these folders is harmless, soon we will be able to manipulate files and folders...
***

# Part 2: File and Folder Manipulation
***

### **How do I move a file into a different folder?**

1.  Move down to the sample_data folder.  
2.  Move the file README.md up one level into the main content/ folder.  
3.  List the contents of sample_data/ and content/ to check if it worked.

In [0]:
%%bash
cd sample_data/
mv README.md ../
echo "These files are in the sample_data folder:"
ls
echo "These files are in the content folder:"
ls ../


*mv* stands for **M**o**V**e files from one place to the other. The first part is the command name (mv), the second is the origin file (README.md) and the third part is the destination (../ or one folder level up).

*Note: If you try to run this command again, you might get an error because the README.md file has already been moved from its original location!*
***

### **How do I move a file to a specific folder?**  

Move the file README.md back down into the sample_data/ folder using an Absolute Path.   
Check the contents of sample_data.

In [0]:
%%bash
mv README.md /content/sample_data/
echo "These files are in the sample_data folder:"
ls sample_data

The file README.md has moved back into its original folder.  
Note: An alternative way to achieve the same goal would have been: *mv README.md sample_data/*  
However, this uses a Relative Path, and the instructions asked for an Absolute Path.
***

### **How do I rename a file?**  

Rename the file README.md as READYOU.md

In [0]:
%%bash
cd sample_data/
mv README.md READYOU.md
ls

If the third part of the *mv* command is a filename rather than a folder, the origin file (ex. README.md) will be renamed.

Two variations that would have achieved the same goal (run from /content, without the *cd* step):  
1. mv sample_data/README.md sample_data/READYOU.md
2. mv /content/sample_data/README.md /content/sample_data/READYOU.md  

The first option uses relative paths, the second uses absolute paths.
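
The two behaviours of *mv* (move when the destination is a folder, rename when it is a filename) can be tried safely in a scratch folder. This is only a sketch: the names notes.txt, ideas.txt and archive are made up for illustration, and *mktemp* is used just to get an empty sandbox. In a notebook cell, start with %%bash.

```shell
sandbox=$(mktemp -d)
cd "$sandbox"
mkdir archive      # creates an empty folder (mkdir is covered later in this part)
touch notes.txt    # creates an empty file (touch is covered later in this part)

mv notes.txt archive/                    # destination is a folder: the file is moved
mv archive/notes.txt archive/ideas.txt   # destination is a filename: the file is renamed

archive_contents=$(ls archive)
echo "archive now contains: $archive_contents"

# Tidy up the scratch folder
cd /
rm -r "$sandbox"
```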
***

### **How do I copy a file?**

Make a copy of READYOU.md and call it README.md.

In [0]:
%%bash
cd sample_data/
cp READYOU.md README.md
ls

There are now two files, README.md and READYOU.md

Using the explanation from the previous step, can you write the two variations of the *cp* command using relative and absolute paths?
***

### **How do I delete a file?**

Delete the READYOU.md file.

In [0]:
%%bash
cd sample_data/
rm READYOU.md
ls

anscombe.json		      mnist_test.csv
california_housing_test.csv   mnist_train_small.csv
california_housing_train.csv




*rm* stands for **R**e**M**ove and it deletes files. Be very careful when using this command, as you can accidentally delete important files!

If you make a permanent change to the cloud computer (e.g. accidentally delete a file), do not panic! To regenerate a "fresh" computer, go to File -> Save a copy in Drive. This will make a new copy of this notebook elsewhere in your Google Drive, connected to a refreshed computer with the filesystem intact. On your own computer, there is no undo!

***

### **How do I create a file?**

Create a file called READUS.md in the sample_data folder.

In [0]:
%%bash
touch sample_data/READUS.md
ls sample_data/





*touch* is a command that **touch**es a file (opens/closes it) just to change the last modified date. However, when given the name of a file that does not exist, *touch* creates it instead.
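
Both behaviours of *touch* can be seen in a scratch folder. This is a minimal sketch; the sandbox created with *mktemp* is an assumption for illustration, so nothing in sample_data is affected. In a notebook cell, start with %%bash.

```shell
sandbox=$(mktemp -d)
cd "$sandbox"

touch READUS.md          # READUS.md does not exist yet, so it is created (empty)
touch READUS.md          # it exists now: no error, only its timestamp is refreshed
listing=$(ls)            # capture what ls prints
echo "The scratch folder contains: $listing"

# Tidy up the scratch folder
cd /
rm -r "$sandbox"
```

Running *touch* twice still leaves exactly one file, because the second call only updates the last modified date.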
***

### **How do I create a folder/directory?**

In [0]:
%%bash
mkdir test_folder/
ls

*mkdir* is a command that **M**a**K**es **Dir**ectories.

To prepare for the next command, move the READUS.md file into the newly created test_folder.

In [0]:
%%bash
mv sample_data/READUS.md test_folder/
ls test_folder/

READUS.md
***

### **How do I delete a folder/directory?**

We cannot simply use the *rm* command, as the code below will generate an error.

In [0]:
%%bash
rm test_folder/

We instead need to supply a flag to *rm*, the -r flag for **R**ecursion. This will allow the command *rm* to recursively walk through the contents of the folder, beginning with the contents, and deleting the folder itself last.

In [0]:
%%bash
rm -r test_folder
ls 

As you can imagine, this is wandering into dangerous territory, as entire folders could disappear in the blink of an eye.

What would happen if you ran (**rm -r /**), to delete the root folder which is the entire computer? If you are on your own computer, DO NOT DO THIS. I repeat, never use the *rm* command unless you are absolutely certain of what you are doing.

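But if you are on Google Colab, I dare you to run this ;) (Most modern versions of *rm* refuse to operate on / by default, so you may simply see an error message.)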

In [0]:
%%bash
rm -r /

***

# Part 3: Programs and Parameters
***

### **How do I look at the beginning of a file?**

Take a look at the first few lines of the file README.md

In [0]:
%%bash
cd sample_data/
head README.md

*head* prints out the beginning of the file (or the head of a document). By default it prints the first 10 lines.
***

### **How do I look at the first 20 lines of a file?**

In [0]:
%%bash
cd sample_data/
head -n 20 README.md

The added part "-n 20" is called a *switch* or a *flag*. This is a standardized format in which commands can be customized. Flags begin with a dash (-) followed by a letter specifying the option selected. In this case we are changing the **n**umber of lines and thus an "n" is used. The part after the -n gives specificity to the option selected, in that we want 20 lines printed.

Flags begin with a double dash (--) if their name consists of multiple letters (e.g. --help).

Flags almost always precede filenames (e.g. README.md) and come after the command name (e.g. head).  
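
To see exactly what a flag changes, it helps to use a file whose contents are predictable. The sketch below is an illustration under assumptions: numbers.txt is a made-up file holding the numbers 1 to 30 (one per line), created with *seq* in a scratch folder. In a notebook cell, start with %%bash.

```shell
sandbox=$(mktemp -d)
cd "$sandbox"
seq 1 30 > numbers.txt    # the > symbol (covered in Part 4) saves seq's output to a file

default_lines=$(head numbers.txt)        # no flag: the default of 10 lines
three_lines=$(head -n 3 numbers.txt)     # -n 3: only the first 3 lines

echo "Default head output:"
echo "$default_lines"
echo "With -n 3:"
echo "$three_lines"

# Tidy up the scratch folder
cd /
rm -r "$sandbox"
```

The order is the same as in the example above: command name first, then the flag and its value, then the filename.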
***

### **How do I know what flags are possible for a given command?**

Investigate the possible flags for the command *head*.

In [0]:
%%bash
head --help

For almost every command, you can provide the flag *--help* afterwards (many commands also accept the shorter *-h*, although for some commands -h means something else). Note that no additional options come afterwards (no number, like in the previous example -n 20) because we don't need to give further specificity to our option. We just need "help", plain and simple. We also don't need to give head a filename, we just want to know how the command works.
***

### **What other commands might we customize with flags?**

We previously used *ls* to **l**i**s**t the contents of a folder. Let's give it a flag to see the contents plus information about the date they were last modified.

In [0]:
%%bash
cd sample_data/
ls -l

*ls* run with the flag -l uses the **l**ong listing format. You get additional information, including the date and time last modified. The README.md file should show today's date and time if you ran the commands in Part 2. Otherwise it might say Jan 1 2000. Y2K, hmmm suspicious...
***

# Part 4: Input and Output
***

### **What is Input and Output?**

We have actually already seen command-line input and output in action so far.  
In the following code-block, the text file README.md is the **input** to the *head* command.  
After the code is executed, the first 10 lines of the file are the **output**.

In [0]:
%%bash
head sample_data/README.md

Sometimes it is useful to have output printed to the screen as we have been doing so far, but other times we want to redirect and manipulate that output.
***

### **How do I save the output of a command to a file?**

Instead of printing the output of *head* to the screen, we will redirect it to a file called: head_output.txt

In [0]:
%%bash
head sample_data/README.md > head_output.txt

The symbol ">" is the redirection operator. It tells the computer to take the output of the previous command (head sample_data/README.md) and redirect it into the filename that comes after (head_output.txt).  
Note that nothing was printed to the screen when you executed that code.  

What is inside head_output.txt?

In [0]:
%%bash
cat head_output.txt

The *cat* command prints a file's entire contents (and if multiple files are supplied, it con-**cat**-enates their contents).  
The contents of head_output.txt are just 10 lines, the output of our previous code!
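
One detail worth knowing about ">" is that it replaces whatever was already in the target file. The sketch below shows this in a scratch folder, and also uses ">>", an operator not covered in this tutorial that appends instead of overwriting. The file name log.txt is made up for illustration. In a notebook cell, start with %%bash.

```shell
sandbox=$(mktemp -d)
cd "$sandbox"

echo "first"  > log.txt     # > creates log.txt
echo "second" > log.txt     # > again: the file is overwritten, "first" is gone
echo "third" >> log.txt     # >> appends to the end instead of overwriting

log_contents=$(cat log.txt)
echo "$log_contents"

# Tidy up the scratch folder
cd /
rm -r "$sandbox"
```

Only "second" and "third" remain, so it pays to double-check the filename after ">" before running a command.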

*less* is also an alternative to *cat*. It allows you to scroll through the file page by page, moving forward and backwards with the arrow keys. The following code does not show its power in Google Colab as there is a line limit, but on a local computer you can scroll with the up and down arrow keys. Then hit the "q" key to quit.

In [0]:
%%bash
less sample_data/README.md

***

### **How do I send the output of a command to another command?**

How often does the word "sample" appear in the first 20 lines of the README.md file?

In [0]:
%%bash
head -n 20 sample_data/README.md | grep -c "sample"


Twice!  

Alright let's break it down.  

The first part should seem familiar: *head -n 20 sample_data/README.md* will *output* the first 20 lines of the README.md file.  

Next is a new symbol, the **pipe** ( | ), which is like the redirect ( > ) except instead of being used to send output to a file, it sends the output to a program. A series of connected commands is called a **pipeline**.

The new command in this example is *grep*, which stands for **G**lobal **R**egular **E**xpression **P**rint. Ugh. grep is simply a search function for looking within files or input. In this case, we are searching for the word "sample", and using the flag -c which means report the **C**ount of matching lines.  

So in summary, we use the head command to output the first 20 lines of README.md, then pipe ( | ) that output into the next command *grep*, which counts the lines containing "sample" just in those first 20 lines. 
 
**But what were the matches to "sample"?**  
In that case, just remove the -c flag from *grep*.

In [0]:
%%bash
head -n 20 sample_data/README.md | grep "sample"
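
A subtlety of *grep -c* is that it counts matching lines, not individual matches. The sketch below uses a made-up two-line file (demo.txt, an assumption for illustration) in which "sample" appears three times. It also uses the *-o* flag, which prints each match on its own line, and *wc -l*, which counts lines; neither is covered elsewhere in this tutorial. In a notebook cell, start with %%bash.

```shell
sandbox=$(mktemp -d)
cd "$sandbox"
# Two lines of made-up text; the word "sample" appears three times in total
printf 'sample data and more sample data\none sample\n' > demo.txt

matching_lines=$(grep -c "sample" demo.txt)                     # lines containing a match
total_matches=$(grep -o "sample" demo.txt | wc -l | tr -d ' ')  # every single match

echo "lines containing sample: $matching_lines"
echo "total occurrences:       $total_matches"

# Tidy up the scratch folder
cd /
rm -r "$sandbox"
```

The second command is itself a small pipeline: the output of *grep -o* is piped into *wc -l*, and *tr -d ' '* only strips padding spaces that some systems add.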


***

### **What is stdout?**

stdout is **St**andard **Out**put, the default place where the output of a program gets written. It can be referred to by a special file path on the computer (/dev/stdout), and by default it is connected to your screen.

The following code:

In [0]:
%%bash
head sample_data/README.md

Is equivalent to:

In [0]:
%%bash
head sample_data/README.md > /dev/stdout

In the first example, it is simply implied (by default) that we are writing the output to stdout (and therefore to our terminal screen).  
You may encounter the notation */dev/stdout* as you read and write more complicated **pipelines**.
***

### **What is stdin?**

stdin is **St**andard **In**put, the default place from which programs read input. It can also be referred to by a special file path on the computer (/dev/stdin).

The following code (that we've seen before):

In [0]:
%%bash
head -n 20 sample_data/README.md | grep -c "sample"

Is equivalent to:

In [0]:
%%bash
head -n 20 sample_data/README.md | grep -c "sample" /dev/stdin

In the first example, it is simply implied (by default) that the grep command is reading its input from stdin (i.e. input that is being fed to it through the pipe).  
You may encounter the notation */dev/stdin* as you read and write more complicated **pipelines**.
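
The equivalence can be checked without README.md by feeding *grep* a few lines of made-up text with *printf*. This is a sketch under that assumption; the three lines of text are invented for illustration. In a notebook cell, start with %%bash.

```shell
# Three made-up lines; two of them contain the word "sample"
implied=$(printf 'sample one\nother line\nsample two\n' | grep -c "sample")
explicit=$(printf 'sample one\nother line\nsample two\n' | grep -c "sample" /dev/stdin)

echo "reading stdin by default: $implied"
echo "reading /dev/stdin:       $explicit"
```

Both commands report the same count, because in each case *grep* reads what the pipe feeds it.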

# Part 5: Network Computers
***

You may at some point need to connect to a different computer over the network. This may be a remote server, or a High Performance Computing (HPC) cluster with greater computational power for your analysis. The following instructions apply if:
1. The network computer is running an SSH server
2. The system administrator has created a user account and password for you.  

**The following code examples WILL NOT EXECUTE through Google Colab/Jupyter and are for explanation only. To practice execution, you will need to login through your own computer's terminal application.**
***

### **How do I connect to the remote computer?**

Login using ssh and your user credentials.

In [0]:
%%bash
ssh myusername@domain.com

As long as you have a verified account with the server you are trying to connect to, you should be prompted next for your password.
***

### **How do I disconnect from the remote computer?**

In [0]:
%%bash
exit

Executing *exit* will disconnect you from the remote computer, and return you to the location on your computer right before you executed the *ssh* command.
***

### **How do I copy files to the remote computer?**

Copying files over an ssh connection is done with the *scp* command (**S**ecure **C**o**P**y).

In [0]:
%%bash
scp sample_data/README.md myusername@domain.com:/home/myusername/

The format of this command is:
1. First - program (scp)
2. Second - file to be copied (sample_data/README.md)
3. Third - destination for the file, a combination of the computer ssh address (myusername@domain.com), a colon (:) and the absolute path of the desired destination on the remote computer (/home/myusername/)  
***

### **How do I copy files from the remote computer?**

In [0]:
%%bash
scp myusername@domain.com:/home/myusername/README.md sample_data/

The format of this command is:
1. First - program (scp)
2. Second - origin of the file, a combination of the computer ssh address (myusername@domain.com), a colon (:) and the absolute path of the origin on the remote computer (/home/myusername/README.md) 
3. Third - destination for the file on the local computer (sample_data/)
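
How the remote part of an *scp* command is put together can be sketched without any network connection. The code below only assembles and prints the two commands; nothing is copied. It uses shell variables (name=value, read back with $name), which this tutorial does not cover, and every value is a placeholder taken from the examples above. In a notebook cell, start with %%bash.

```shell
# All values are placeholders; nothing is sent over the network
remote_user="myusername"
remote_host="domain.com"
remote_dir="/home/${remote_user}"

# The ssh address is user@host; a colon joins it to the path on the remote computer
remote_address="${remote_user}@${remote_host}"

upload_cmd="scp sample_data/README.md ${remote_address}:${remote_dir}/"
download_cmd="scp ${remote_address}:${remote_dir}/README.md sample_data/"

echo "To the remote computer:   $upload_cmd"
echo "From the remote computer: $download_cmd"
```

The only difference between the two directions is which side of the command carries the address and colon.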
***

### **How do I check how busy/overloaded a computer is?**

A network computer is typically shared by multiple people. Therefore it is important not to overload these machines by running too many computationally-intensive commands.  

Note: This command can be run in Google Colab/Jupyter.

In [0]:
%%bash
top -b -n 1 | head -n 5

top - 04:07:55 up  2:16,  0 users,  load average: 0.00, 0.00, 0.00
Tasks:   8 total,   1 running,   7 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.5 us,  0.3 sy,  0.0 ni, 99.1 id,  0.2 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem : 13335192 total, 10962620 free,   545944 used,  1826628 buff/cache
KiB Swap:        0 total,        0 free,        0 used. 12527316 avail Mem

*top* shows the current state of the computer system. The flags -b (batch mode) and -n 1 (a single snapshot) make it print plain text once instead of refreshing the screen interactively. And since it shows A LOT of output by default, we've piped it to head so that we just see the first five lines.

On the first line, we see information about the "load average:" followed by three numbers. Let's use [these numbers](https://www.tecmint.com/understand-linux-load-averages-and-monitor-performance/) as the example (1.00, 0.40, 3.35). These numbers represent how loaded the computer was over the last 1 minute, 5 minutes, and 15 minutes respectively.  

On a computer with a single CPU, this means:
1. Last minute (1.00). On average the CPU was fully used (100%). 1 process was running on the CPU (1.00).
2. Last 5 minutes (0.40). On average the CPU was idle 60% of the time. No processes were waiting for CPU time (0.40).
3. Last 15 minutes (3.35). On average the CPU was overloaded by 235%. 2.35 processes were waiting for CPU time (3.35).
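
The arithmetic behind this interpretation can be sketched in a few lines. The helper *load_percent* below is hypothetical (it is not a standard command): it divides a load average by a number of CPUs and reports the result as a percentage of capacity. Dividing by the CPU count is a common rule of thumb for machines with more than one CPU, offered here as an assumption rather than something covered above. In a notebook cell, start with %%bash.

```shell
# Hypothetical helper: load average as a percentage of capacity
# $1 = load average, $2 = number of CPUs
load_percent() {
    awk -v load="$1" -v cpus="$2" 'BEGIN { printf "%.0f\n", load / cpus * 100 }'
}

one_cpu=$(load_percent 3.35 1)
four_cpus=$(load_percent 3.35 4)

echo "Load 3.35 on 1 CPU:  ${one_cpu}% of capacity"
echo "Load 3.35 on 4 CPUs: ${four_cpus}% of capacity"
```

On a single CPU, 335% of capacity is the same as being overloaded by 235%, while the same load spread over four CPUs stays below full capacity.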

Alternatively, we can examine current usage in the moment (instead of over time):


In [0]:
%%bash
# -b runs top in batch (plain text) mode and -n 1 prints a single snapshot
top -b -n 1 | head -n 15

top - 04:11:43 up  2:20,  0 users,  load average: 0.00, 0.00, 0.00
Tasks:   8 total,   1 running,   7 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.5 us,  0.3 sy,  0.0 ni, 99.1 id,  0.2 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem : 13335192 total, 10958996 free,   548508 used,  1827688 buff/cache
KiB Swap:        0 total,        0 free,        0 used. 12524680 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
      1 root      20   0   39192   6484   4988 S   0.0  0.0   0:00.03 run.sh
      8 ro



With a few extra lines of top, we begin to see the start of a table. The rows are processes, or commands that are run by users (in this case, a user named "root"). The %CPU column indicates how much of the computer that particular command is actively occupying.  
***

If you are anticipating running a very computationally-intensive program, it is wise (and courteous) to use these methods to investigate the state of the network computer before carrying out your tasks.
***

### **I checked and the computer's not overloaded, how do I use extra CPU Power?**

Certain programs will give you the ability to perform tasks in parallel, or to use multiple CPUs simultaneously (also called "cores" or "threads"). This is referred to as **multi-threading** and is very commonly encountered in high-performance computing analysis.  

If a program has this feature, you can find out by using the --help flag. Let's check out the *make* command (hint: look for the -j or --jobs flag).

In [0]:
%%bash
make --help

*make* is actually used to **make** or build programs. It's crucial in compiling human-readable programming code into a language that computers can understand.

So if one wanted to run the program *make* with 3 processes in parallel, it would be executed as:

In [0]:
%%bash
make --jobs 3
# OR
make -j 3

The flags --jobs 3 and -j 3 are equivalent. Note that when using a flag with multiple letters ("jobs"), a double dash (--) is used.

Note: this block of code includes a "comment", the symbol (#). This tells the computer to ignore that line of code, it is just a note for humans to read, not machines to execute.
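
Before choosing a number for -j, it helps to know how many CPUs the machine actually has. The sketch below uses *nproc*, a command not covered in this tutorial (it is part of GNU coreutils, so it should be present on most Linux systems but may be missing on macOS). It only prints a suggested command and does not run *make*. In a notebook cell, start with %%bash.

```shell
cores=$(nproc)     # number of CPUs available to this session
echo "This machine reports $cores CPU(s)"
echo "A make command using all of them would be: make -j $cores"
```

On a shared network computer, check the load first and pick a smaller number than the total so that other users still have CPU time.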

# Part 6: Text Editors
***

Eventually, you will reach a point where you need to edit or write text on the command-line. This may be because you are writing your own program, creating some data, editing some configuration files, etc. If you are connected to a network computer, you will not have access to a graphical text editor like Notepad/Text Edit/gedit. Instead, most people choose one of these three programs:  

**nano, emacs, vim**

This code cannot execute in Google Colab/Jupyter so we cannot practice with these text editors. For guides on this subject please check out the following. I highly recommend **nano** as the simplest editor for beginners. Note that the nano basics guide is only about 4 paragraphs long as opposed to emacs and vim which are... not that short.

[nano](https://wiki.gentoo.org/wiki/Nano/Basics_Guide)  
[emacs](http://www.jesshamrick.com/2012/09/10/absolute-beginners-guide-to-emacs/)  
[vim](https://scotch.io/tutorials/getting-started-with-vim-an-interactive-guide)


In [0]:
%%bash
nano

In [0]:
%%bash
emacs

In [0]:
%%bash
vim

# Summary
***

In Part 1, we learned to navigate around the computer's file system efficiently and **list** folder contents. We covered the difference between **relative paths** and **absolute paths**.

In Part 2, we manipulated the file system by **moving**, **renaming**, **copying**, **creating**, and **deleting** files and folders.  

In Part 3, we learned about **command** syntax by specifying **parameters** using **flags**.

In Part 4, we learned about **redirecting** output to files and downstream programs. We also discussed **stdin** and **stdout** notation, and the useful viewing commands **cat** and **less**.

In Part 5, we previewed how to connect to a **remote computer** and transfer files over **ssh**. We explored how to check if a computer is **overloaded** before beginning our analysis.  

In Part 6, we introduced command-line **text editors** for manipulating the content of text files. 