Advanced UNIX: Shell scripting {#chap:sscripting}
==============================

Shell scripting: What and Why
-----------------------------

Instead of typing all the UNIX commands we need to perform one after the
other, we can save them all in a file (a “script”) and execute them all
at once.

The <span>bash</span> shell we are using provides a proper syntax that
can be used to build complex command sequences and scripts.

Scripts can be used to automate repetitive tasks, to do simple data
manipulation or to perform maintenance of your computer (e.g., backup).
Indeed, much routine data manipulation can be handled by scripts without
the need to write a full program.

Scripting: How
--------------

There are two ways of running a script, say <span>myscript.sh</span>:

1.  The first is to call the bash interpreter to run the file (try this,
    but it won’t work yet, as you don’t have a <span>myscript.sh</span>
    script!)

In [None]:
$ bash myscript.sh # OR sh myscript.sh

(A script that does something specific in a given project)

2.  OR, make the script executable and execute it:

In [None]:
$ chmod +x myscript.sh
$ ./myscript.sh # the leading ./ is needed unless the script's directory is in your PATH

(A script that does something generic, and is likely to be reused
    again and again – can you think of examples?)

The generic scripts of type (2) can be saved in <span>~/bin/</span> (a
<span>bin</span> directory in your home directory) and made executable
(the <span>.sh</span> extension is not needed):

In [None]:
$ mkdir ~/bin
$ PATH=$PATH:$HOME/bin # Tell UNIX to also look in ~/bin for commands
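
To see the effect of extending <span>PATH</span>, here is a minimal sketch that uses a temporary directory as a stand-in for <span>~/bin</span>, so nothing in your home directory is touched. The script name <span>say_hello_demo</span> and the variable names are made up for this illustration.

```shell
# Temporary directory standing in for ~/bin (an assumption of this sketch)
DEMO_BIN=$(mktemp -d)

# A tiny generic script, saved without the .sh extension
cat > "$DEMO_BIN/say_hello_demo" <<'EOF'
#!/bin/bash
echo "hello from a script found via PATH"
EOF
chmod +x "$DEMO_BIN/say_hello_demo"

# Append the directory to the search path, then call the script by name
PATH=$PATH:$DEMO_BIN
DEMO_OUTPUT=$(say_hello_demo)
echo "$DEMO_OUTPUT"

# Clean up the temporary directory
rm -r "$DEMO_BIN"
```

A change to <span>PATH</span> made this way lasts only for the current terminal session.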

Your first shell script
-----------------------

Let’s write our first shell script! For starters, write and save
<span>boilerplate.sh</span> in <span>CMEECourseWork/Week1/Code</span>,
and add the following script to it (type it in a code editor like
geany):
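
A minimal boilerplate script might look like the sketch below. The author, email, and date entries are placeholders to replace with your own, and the variable name <span>MESSAGE</span> is chosen only for this illustration.

```shell
#!/bin/bash
# Author: Your Name your.login@example.com (placeholder)
# Script: boilerplate.sh
# Desc: simple boilerplate for shell scripts
# Arguments: none
# Date: (fill in)

# Store the text in a variable, then print it surrounded by blank lines
MESSAGE="This is a shell script!"
printf '\n%s\n\n' "$MESSAGE"
```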

The first line is a “shebang” (or sha-bang or hashbang or pound-bang or
hash-exclam or hash-pling, according to Wikipedia). It can also be
written as <span>\#!/bin/sh</span>. It tells the system which interpreter
should run the script (here bash; with <span>\#!/bin/sh</span>, the
system's default <span>sh</span> is used, which is not necessarily
bash). A hash mark elsewhere tells the interpreter to ignore the rest
of that line; that’s how you put in script documentation (who wrote the
script and when, what the script does, etc.) and comments on particular
lines of the script.

Geany users can send lines of code directly to the terminal with a
keyboard shortcut, after two configuration steps:

1.  Enable “Send Selection to Terminal” with the <span>
    &lt;Primary&gt;Return</span> keys by going to geany’s
    <span>Edit &gt; Preferences &gt; Keybindings</span> menu item.

2.  Now edit the file <span>geany.conf</span>. You can use geany itself
    to do this:

In [None]:
$ geany ~/.config/geany/geany.conf

This will open <span>geany.conf</span> in geany. In this file, set
    <span> send\_selection\_unsafe=true</span>, then close the file, and
    restart geany.

Now run your boilerplate shell script by typing in the terminal:

In [None]:
$ bash boilerplate.sh

A useful shell-scripting example
--------------------------------

Let’s write a shell script to transform comma-separated files (csv) to
tab-separated files and vice versa. This can be handy: for example, in
certain computer languages (e.g., <span>C</span>), it is much easier to
read tab or space separated files than csv.

To do this in bash, we can use <span>tr</span>, which deletes or
substitutes characters. Here are some examples:

In [None]:
$ echo "Remove    excess      spaces." | tr -s " "
Remove excess spaces.
$ echo "remove all the as" | tr -d "a"
remove ll the s
$ echo "set to uppercase" | tr '[:lower:]' '[:upper:]'
SET TO UPPERCASE
$ echo "10.00 only numbers 1.33" | tr -d '[:alpha:]' | tr -s " " ","
10.00,1.33

Now write a shell script called <span>tabtocsv.sh</span> in
<span>Week1/Code</span> that substitutes all tabs with commas:
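
One possible sketch of <span>tabtocsv.sh</span> follows. It assumes the input file is passed as the first argument and writes the result to a new file with <span>.csv</span> appended to the input name; the function name <span>tab_to_csv</span> is made up for this illustration.

```shell
#!/bin/bash
# Script: tabtocsv.sh
# Desc: substitute the tabs in a file with commas,
#       saving the output in a new .csv file
# Arguments: 1 -> tab-delimited file

# Replace every tab in file $1 with a comma; the input file is left unchanged
tab_to_csv() {
    tr "\t" "," < "$1" > "$1.csv"
}

# Only run the conversion when an input file was supplied
if [ "$#" -ge 1 ]; then
    echo "Creating a comma-delimited version of $1 ..."
    tab_to_csv "$1"
    echo "Done!"
fi
```

Each tab becomes exactly one comma here, so empty fields are preserved; adding <span>-s</span> to <span>tr</span> would instead squeeze a run of tabs into a single comma.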

Now test it (note where the output file gets saved):

In [None]:
$ echo -e "test \t\t test" >> ../Sandbox/test.txt
$ bash tabtocsv.sh ../Sandbox/test.txt

Variables in shell scripting
----------------------------

There are three ways to assign values to variables (note the lack of
spaces around <span>=</span>!):

1.  Explicit declaration: <span>MYVAR=myvalue</span>

2.  Reading from the user: <span>read MYVAR</span>

3.  Command substitution: <span>MYVAR=\$(ls | wc -l)</span>

Here are some examples of assignments (try them out; save as
<span>Week1/Code/variables.sh</span>):
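
The three kinds of assignment can be sketched as below. The variable names and the numbers are assumptions made for this illustration, and a here-document stands in for the user's typing so that the sketch runs without waiting for input.

```shell
#!/bin/bash
# Shows the three ways of assigning values to variables

# 1. Explicit declaration (no spaces around "=")
MY_VAR="some string"
echo "The current value of the variable is: $MY_VAR"

# 2. Reading values: normally typed by the user, here supplied through a
#    here-document so that the sketch runs without waiting for input
read -r FIRST SECOND <<EOF
3 4
EOF
echo "You entered $FIRST and $SECOND"

# 3. Command substitution: capture the output of a command
MY_SUM=$(expr "$FIRST" + "$SECOND")
echo "Their sum is: $MY_SUM"
```

In an interactive script, drop the here-document and write plain <span>read -r FIRST SECOND</span> so that the values are typed at the prompt.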

And also (save as <span>Week1/Code/MyExampleScript.sh</span>):
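
A minimal sketch of what such a script might contain, assuming its purpose is to build a greeting from two variables; the variable names and the fallback value <span>stranger</span> are made up for this illustration.

```shell
#!/bin/bash
# Builds a greeting from two variables

MSG1="Hello"
# USER is normally set by the login shell; fall back to "stranger" if it is not
MSG2=${USER:-stranger}
GREETING="$MSG1 $MSG2"

echo "$GREETING"
```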

Some more Examples
------------------

Here are a few more illustrative examples (test each one out, save in
<span>Week1/Code/</span> with the given name):

<span>CountLines.sh</span>:
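
A sketch of what this script might look like, assuming the file to count is passed as the first argument; the function name <span>count_lines</span> is made up for this illustration.

```shell
#!/bin/bash
# Counts the lines in the file given as the first argument

count_lines() {
    # Redirecting the file into wc prints only the number, not the file name
    wc -l < "$1"
}

# Only report a count when a file was supplied
if [ "$#" -ge 1 ]; then
    NUM_LINES=$(count_lines "$1")
    echo "The file $1 has $NUM_LINES lines"
fi
```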

<span>ConcatenateTwoFiles.sh</span>:
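
A sketch of what this script might look like, assuming the two input files are passed as the first and second arguments and the name of the merged output file as the third; the function name <span>concatenate_two_files</span> is made up for this illustration.

```shell
#!/bin/bash
# Arguments: 1 -> first input file
#            2 -> second input file
#            3 -> merged output file

concatenate_two_files() {
    cat "$1" > "$3"     # start the output file with the first input
    cat "$2" >> "$3"    # append the second input to it
}

# Only run the merge when all three file names were supplied
if [ "$#" -eq 3 ]; then
    concatenate_two_files "$1" "$2" "$3"
    echo "Merged file is:"
    cat "$3"
fi
```

Note that <span>&gt;</span> overwrites the output file if it already exists, while <span>&gt;&gt;</span> appends to it.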

Practical
---------

1.  <span>**Some instructions**</span>:

Along with the completeness of the practicals/exercises themselves,
you will be marked on the basis of how complete and well-organized
your directory structure and content is.

Review (especially if you got lost along the way) and make sure all
your shell scripts are functional: <span>boilerplate.sh</span>,
<span>ConcatenateTwoFiles.sh</span>, <span>CountLines.sh</span>,
<span>MyExampleScript.sh</span>, <span>tabtocsv.sh</span>,
<span>variables.sh</span>.

Don’t worry about how some of these scripts will run on my computer
without explicit inputs (e.g., <span>ConcatenateTwoFiles.sh</span>
needs two input files) — I will run them with my own test files.

Make sure you have your weekly directory organized with
<span>Data</span>, <span>Sandbox</span>, and <span>Code</span>
subdirectories containing the necessary files, under
<span>CMEECourseWork/Week1</span>. <span>*All scripts should run on
any other Unix/Linux machine*</span>; for example, always call data
from the <span>Data</span> directory using relative paths.

**Make sure there is a <span>readme</span> file in every
week’s directory. This file should give an overview of the weekly
directory contents, listing all the scripts and what they do. This
is different from the <span>readme</span> for your overall git
repository, of which <span>Week 1</span> is a part. You will write a
similar <span> readme</span> for each subsequent
weekly submission.**

Don’t put any scripts that are part of the submission in your
<span>~/bin</span> directory! You can put a copy there, but a
working version should be in your repository.

2.  Finally, a small exercise: write a <span>csvtospace.sh</span> shell
    script that takes a <span>c</span>omma <span>s</span>eparated
    <span>v</span>alues file and converts it to a space separated
    values file. However, it must not change the input file; it should
    save the result as a differently named file.

Save the script in <span>CMEECourseWork/Week1/Code</span>, and run
it on the <span>csv</span> data files that are in
<span>Temperatures</span> in the master repository’s
<span>Data</span> directory.

<span>*Don’t modify anything (or refer to anything) in your local
copy of the master repository. All changes you make in the master
repository will be lost. Copy whatever you need from the master
repository to your own repository.*</span>

*Commit and push everything by next Wednesday 5 PM.*

This includes <span>UnixPrac1.txt</span>! Check the updated instructions
from Chapter \[chap:unix1\] on this practical.

Readings & Resources
--------------------

-   Plenty of shell scripting resources and tutorials out there; in
    particular, look up
    <http://www.tutorialspoint.com/unix/unix-using-variables.htm>