This one-hour workshop introduces the basics of interacting with a computer using Bash (the Bourne Again Shell). You may have heard the terms "command line", "terminal", "console", "shell", "interactive prompt", "git-bash", etc. These all refer to using the keyboard to control your computer by typing commands.
The material for this workshop is based on a UofT Coders lesson and the Software Carpentry Unix Shell lesson.
Preparation: To follow along with this workshop on your own computer, you
will need to have bash
installed. For Linux and Mac users, bash
is included
as part of the operating system. In OSX, it can be accessed with terminal
.
For Windows users, I recommend git bash
, which is packaged with
git (a common version control system). Follow the
instructions here
to install git bash. Another option
is Cygwin. As of August 2016, the Windows Subsystem
for Linux (WSL) provides access to most command line tools provided by Ubuntu.
See here for
instructions to enable/install the WSL.
After this workshop, you will have learned:
- what Bash is and when and why to use it
- how to navigate your computer from the command line using commands like
pwd
,cd
, andls
- how to work with files from the command line using commands like
cat
,cp
,mv
,mkdir
, andrm
- handy commands for working with files like
wc
,head
,tail
,history
, and the wildcard character*
- using "pipes" to combine commands and
>
to redirect output - where to get more information and help
- time permitting:
cut
,sort
, anduniq
At a high level, computers do four things:
- run programs
- store data
- communicate with each other, and
- interact with us
Computers can interact with us in many different ways, such as through a keyboard and mouse, touch screen interfaces, or using speech recognition systems. While touch and voice interfaces are becoming more commonplace, most interaction is still done using traditional screens, mice, touchpads and keyboards.
The graphical user interface (GUI) is the most widely used way to interact with personal computers. We give instructions (to run a program, to copy a file, to create a new folder/directory) with the convenience of a few mouse clicks. This way of interacting with a computer is intuitive and easy to learn, but it can become slow and frustrating if we want to repeat similar actions many times. For example, if we want to copy the third line of each of a thousand text files and paste them into a single file, using the traditional GUI approach of clicks will take several hours to do this.
This is where we take advange of the shell - a command-line interface to make such repetitive tasks automatic and fast. It can take a single instruction and repeat it as is or with some modification as many times as we want.
Typing instructions to the computer might at first seem counterintuitive and unnecessarily difficult, but there are several advantages to learning how to do this.
-
Repetitive tasks can be automated. Like in the example above, text-based commands will make it easy for us to repeat the same process many times.
-
Familiar tasks can be done very quickly. Think about learning a language: in the beginning it's nice to look things up in a dictionary (or a menu in a graphical program), and slowly string together sentences one word at a time. But once we become more proficient in the language and know what we want to say, it is easier to say or type it directly, instead of having to look up every word in the dictionary first.
-
Text-based interfaces are less resource-intensive and faster to develop than GUIs, which means some programs you might want to use don't have a GUI at all. Graphical components of programs require more resources from your computer to be run, and they also take more work to create in the first place. Many powerful programs are written without a graphical user interface (which makes it faster to create these programs) and to use these programs you often need to know how to use a text interface. For example, many of the best data analysis and machine learning packages are written in Python or R, and you need to know a bit about these languages to use them. Even if the program or package you want to use is in a different language, much of the knowledge you gain from understanding one programming language (like Bash) can be transferred to others. In addition, most powerful computers that you can log into remotely might only give you a text interface to navigate the computer with.
-
A record of text commands facilitates reproducibility. It is much easier to automate and reproduce / repeat any task once you have all the instructions written down. Compare being shown how to perform a certain analysis in spreadsheet software, where the instructions will be "first you click here, then here, then here...", with being handed the same workflow written down in several lines of codes which you can analyse and understand at your own pace.
The Shell is a program which runs other programs rather than doing calculations itself. Those programs can be as complicated as climate modeling software and as simple as a program that creates a new directory. The simple programs which are used to perform stand-alone tasks are usually refered to as commands. The most popular Unix shell is Bash, (the Bourne Again SHell — so-called because it’s derived from a shell written by Stephen Bourne).
When the shell is first opened, you are presented with a prompt, indicating that
the shell is waiting for input. It might have some text about your computer and
your user identity, and it probably ends with a $
symbol. Sometimes bash
commands are written down as $ ls
(for example): you don't have to type the
$
, only the command itself (ls
).
The prompt:
$
Let's try a command: ls
lists the contents of the current directory.
ls
If you type a command that Bash doesn't recognize, you'll get an error message:
ks: command not found
.
ks
Stuff on your computer is organized into files and folders (also called directories). You might be used to navigating through your filesystem using a navigator application and clicking on directory icons; we can do an equivalent thing from the command line.
First, the command pwd
(print working directory) tells us where we are in
the filesystem.
pwd
The output you see will likely be different depending on your operating system. We're all likely in our home directory, where terminal or Git Bash opens by default.
ls
lists the files and directories inside the current directory. We can add
flags or options to commands to modify their behaviour: try adding -F
and
see if you notice what changes about the output.
ls -F
In general, a Bash command begins with the command name (ls
), is followed by a
list of options that begin with -
or --
(-F
), and ends with arguments.
We can add an argument to ls
that specifies which directory we want ls
to
look in. Try the following commands and see what they do.
ls -a
ls Downloads
ls -F Downloads
ls -F -a /
ls -Fa /
ls *.txt
The flag -a
stands for "all": ls -a
shows all files and directories, even
hidden ones. The hidden directory ..
is shorthand for the parent directory,
and the hidden directory .
is shorthand for the current directory.
The character *
is a wildcard - it matches any character. The command
ls *.txt
will list any file that ends in .txt
, since *
matches anything
that comes before. This is handy if you're looking for something and you know
a bit about it, like that it ends in .jpg
or starts with a
, for example.
ls
has many options - we've just seen two. To get more information about a
command, you can use the option --help
, which will print out the manual page
for the command.
ls --help
Another way to access the manual page is with man
, which stands for manual:
man ls
It's possible that only one of these two will work depending on your environment.
Challenge
What does
ls
do when used with the-l
option? What about if you use both the-l
and the-h
option?
pwd
and ls
tell us about where we are and what's there, but we can also
move around through our filesystem. The command cd
(change directory) is
how to do this.
cd Downloads
pwd
ls
What if we want to go back up one level? We might try using the name of the higher directory:
cd madeleine # or whatever your home directory is called
But this gives an error: bash: cd: madeleine: No such file or directory
.
Path names are relative: commands like cd
and ls
try to relate file and
directory names to ones that can be seen from the current location. We have a
few options to go up one level:
- Specify the full path
cd /home/madeleine
pwd
- Use the shortcut
..
to go up one level
..
is shorthand notation for the parent directory,
the directory that contains the current directory.
cd Downloads
cd ..
pwd
- Use the shortcut
-
to go back to the previous directory
-
is shorthand for "the directory we were in last," which in this case also
happens to be the directory above us.
cd Downloads
cd -
pwd
- Use the shortcut
~
to go to the home directory
'~' is shorthand for the home directory, which in this case is where we want to go.
cd Downloads
cd ~
pwd
All of these options have the same result in our case where we want to go from a directory below the home directory back to home. But if we were somewhere else, they wouldn't necessarily be equivalent.
Challenge
Starting from
/Users/amanda/Desktop/data
, which of the following commands could Amanda use to navigate to her home directory, which is/Users/amanda
?Hint: play around with these choices on your own computer by creating and navigating to a folder called
data
on your desktop using the following commands:
cd Desktop
mkdir data
cd data
Remember that
cd -
will take you back to the previous directory after each trial solution you use, and thatpwd
will show you where you are to check if it worked.
- cd .
- cd /
- cd /home/amanda
- cd ../..
- cd ~
- cd home
- cd ~/Desktop/data/../..
- cd
- cd ..
Handy tip: tab completion
Whenever you're typing in your terminal, you can hit Tab
to auto-complete a
command, filename, or directory name. If there's only one possibility, it will
auto-complete; if there are more than one, hitting Tab
a second time will show
all the possibilities.
cd Des # and hit "Tab"
Handy tip: up-arrow
Hitting the up
arrow key will fill in the most recent command you've run.
Hitting Enter
will run it again. Hitting up
more than once will cycle back
in time through all the previous commands you've run.
We've seen how to navigate our computer using commands, and now it's time to actually make some changes.
Let's work on the Desktop so that we're all in the same place.
cd ~/Desktop
pwd
ls
The command mkdir
makes a directory in the current working directory. We'll
make a directory called 'thesis'.
mkdir thesis
ls
You could also create a directory using whatever file navigator you're used to, and it would turn out exactly the same.
Now let's go into the thesis directory and create a file.
cd thesis
touch intro.txt
ls
The command touch
will either update the timestamp of a file that exists, or
create a new file if the filename you give as an argument doesn't exist yet.
The file intro.txt
is blank, which we know because we just made it and haven't
done anything to it. In general, how can you tell if a file is empty? There are
many ways, and we'll see more that could be applied to this question as we go.
ls -lh intro.txt
The flag -l
stands for long, and it causes ls
to display more information
about a file: the permissions, user information, size, and time information. The
flag -h
stands for human-readable: it converts the file size from just bytes
to human-readable sizes like KB or MB. Using ls -lh
tells us that the file
intro.txt
has size 0.
Another way to see if a file is empty is to check directly what's inside it. The
command cat
(concatenate) will print the contents of a file to the screen.
cat intro.txt
There's no output because the file is empty.
You can copy a file using the command cp
. The first argument to cp
is the
filename you want to copy, and the second argument is the name of the destination
file.
cp intro.txt intro_v1.txt
ls
The command mv
moves a file to a new location or filename. Since renaming is
fundamentally the same thing as moving, you can use mv to rename files.
mv intro_v1.txt intro_backup.txt
ls
With mv
and cp
, you can specify different paths for the filenames. In this
example, supplying a destination directory but not a filename will tell mv
to
re-use the same filename.
mv intro_backup.txt ../
ls
ls ../
Challenge
Suppose that you created a plain-text file in your current directory to contain a list of the statistical tests you will need to do to analyze your data, and named it:
statstics.txt
After creating and saving this file you realize you misspelled the filename! You want to correct the mistake; which of the following commands could you use to do so?
cp statstics.txt statistics.txt
mv statstics.txt statistics.txt
mv statstics.txt .
cp statstics.txt .
The command rm
will remove the file whose name you supply as the argument.
Be careful: this is not reversible! The file will be deleted forever, bypassing
the recycling bin. If you do want a bit more warning, however, you can use the
flag -i
, which will ask for confirmation before removing the file.
touch test.txt
rm test.txt
touch test.txt
rm -i test.txt
Handy tip: setting an alias
You can define an alias
for any command that redefines
what happens when you type something. If you want the command rm
to always
ask for confirmation, for example, you can set an alias that runs rm -i
every
time you type rm
by running the command alias rm="rm -i"
. Running the
command alias
is not permanent: this alias will last only as long as your
current terminal window is open. If you want to set an alias permanently, you
can do this by editing your bash configuration file, which is beyond the scope
of today's material. This blog post
has more information about modifying your bash configuration, and
this one has
step-by-step instructions.
We've seen how to create, delete, move, copy, and rename files and directories. Now we'll learn some commands for working with and analyzing the contents of files.
First, a handy command that shows you the most recent commands you've run in reverse chronological order:
history
Using history
is a great way to quickly make a task reproducible - if you've been
trying different things and you find the one that works, you can find it back
again using history
. Better yet, you can redirect the output of a command from
the screen to a file, and doing this with history
saves the command history to
a file.
history > history.txt
ls
The symbol >
tells Bash to take the output of the first command and send it
to a file.
Now we have a file called 'history.txt' that contains the output of the history
command. Let's have a look at the contents of this file.
cat history.txt
cat
is great for seeing what's in a file, but if the file is very big, cat
will still try to print the entire thing. Two commands that are very helpful for
getting a taste of a file without a huge amount of output are head
and tail
.
head history.txt
tail history.txt
Both of these commands have an option -n
that specifies the number of lines to
return:
tail -n 20 history.txt
A command I use a lot is wc
, word count, which returns the number
of lines, words, and characters in a file.
wc history.txt
wc
has options to turn off and on each of its outputs: wc -l
just returns
the number of lines in the file.
wc -l history.txt
Much of the power of Bash comes from combining a few commands to make powerful pipelines. To do this, there's one more operation to learn: the pipe.
cat history.txt | tail
The vertical bar |
is a pipe: it tells Bash to take the output of the first
command and use it as the input to the next command. Remember that tail
takes
a file as its argument; now it's taking the output of cat history.txt
in place
of a filename. The result here is the same as tail history.txt
, which means
we probably don't need to use a pipe in this example. But it can be very useful
to skip intermediate steps sometimes: for example, instead of saving history.txt
,
we could have directly piped history
into the command tail
to see the most
recent commands we've run.
history | tail
You can combine multiple pipes in a chain as long as you want.
history | head | wc -l
Challenge
Which line of
history.txt
will be returned by the following command?
cat history.txt | head -n 5 | tail -n 1
Let's combine what we've learned so far with a few new commands to do something
useful: figure out which Bash commands we used most often. We already have a
file history.txt
which has the ~2000 most recent Bash commands in it, so we'll
use this file to do our analysis.
head history.txt
history.txt
has several columns: the first column is a number that gives the
order in which each command was run, and the second column is the command itself.
But we don't want to know all the specific options and arguments of each time we ran ls
, we want
to count just ls
by itself. We can think of there being at least
three columns: the first with the command number, the second with the command, and the
third with options and arguments.
My second all-time favourite Bash command is cut
: cut
cuts columns in a file
by a delimiter that you specify, and you can return one or more of the resulting
columns by itself.
cat history.txt | cut -d' ' -f4
-d
specifies which character delimits the columns; here it's a space
character. -f4
tells cut to return the 4th column. It's the 4th column and not
the 2nd because it looks like there's a few extra whitespace characters that cut
is counting as their own columns. You might need to change the number by trial
and error until you get a list of just commands, the output we want.
Next, we want a list of all the unique commands and how many times they appear.
There's a command uniq
that can do just that!
cat history.txt | cut -d' ' -f4 | uniq
But wait: some commands are still showing up multiple times. It turns out that
uniq
only collapses repeated lines if they are adjacent: if you have the
sequence ls, cd, ls
, uniq
will still return all three. We need an intermediate
step where we sort all the commands using the command sort
.
cat history.txt | cut -d' ' -f4 | sort | uniq
That's more like it! To count how many times each command turns up, we can add
the flag -c
to uniq
:
cat history.txt | cut -d' ' -f4 | sort | uniq -c
A final pipe to sort
will arrange these in order:
cat history.txt | cut -d' ' -f4 | sort | uniq -c | sort
sort
by itself sorts based on the first character; we can add the -n
flag
to sort by the numeric value of each line.
cat history.txt | cut -d' ' -f4 | sort | uniq -c | sort -n
If you wanted, you could save this output to a file as a record of your command use:
cat history.txt | cut -d' ' -f4 | sort | uniq -c | sort -n > common_commands.txt
When you're getting familiar with the command line, it can be hard to know what's possible and where to get help. It's not that easy to make an irreversible error, so a good way to learn is to try things. If you're wondering how to do something in particular, a Google search for your question, like "Bash how to get the second column of a text file", will usually return a StackOverflow page with the same or similar question. There's usually more than one way to do something, and it's up to you which methods you end up being comfortable with.
If you'd like a more in-depth tutorial, I recommend Software Carpentry's Bash lesson. This workshop is based on the Software Carpentry material, but there's much more in the online notes than we had time to cover today.
- searching: through history, files, file contents, etc.
- loops: repeating the same action on many different cases
- writing Bash scripts: saving your commands in a file to be re-used
- setting aliases to modify commands and save time typing