# Linux shell commands using Bash

In [1]:
# Two numbers should be produced and they should be identical
commands="tree find ls mv mkdir cp rm rmdir cat head tail wc fgrep cut seq shuf sed awk"
<<< ${commands} tr ' ' '\n' | wc -l
which ${commands} | wc -l

18
18
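
The two counts tell us whether a command is missing, but not which one. Here is a minimal sketch of a per-command check, assuming the shell's `command -v` builtin is available. The list is shortened for illustration, and `no-such-command-xyz` is a made-up name included only to show what a miss looks like.

```shell
# Shortened command list for illustration; no-such-command-xyz is hypothetical
commands="ls cat head tail wc cut sed awk no-such-command-xyz"

missing=""
for cmd in ${commands}; do
    # command -v prints where a command lives and fails if it cannot be found
    if ! command -v "${cmd}" > /dev/null 2>&1; then
        missing="${missing} ${cmd}"
    fi
done

echo "missing:${missing}"
```

An empty list after `missing:` would mean every command was found.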


## Files and the Filesystem

Computer operating systems use a number of different data structures in order to operate.  We will start exploring two of those data structures: the filesystem and files.

Using [the CRUD model]( https://en.wikipedia.org/wiki/Create,_read,_update_and_delete ), we will start with the filesystem then later we'll explore files.  And we are going to start by using two "Read" commands: `tree` and `find`.


### tree and find commands

The filesystem is a hierarchy of files and directories ( aka folders ).
Folders can contain both files and folders ( called subfolders or subdirectories ).
The commands `tree` and `find` allow us to traverse ( aka descend or recurse ) this hierarchy.

> Note: As a shorthand, when I write `file`, I will often mean both files and folders.

Starting from a specified location, both commands will show all the files that are "lower" in the hierarchy.

In these two examples, I am specifying the starting location as `/etc/apt`.

In [2]:
tree /etc/apt

/etc/apt
├── apt.conf.d
│   ├── 01autoremove
│   ├── 01autoremove-kernels
│   ├── 01-vendor-ubuntu
│   ├── 70debconf
│   ├── docker-autoremove-suggests
│   ├── docker-clean
│   ├── docker-gzip-indexes
│   └── docker-no-languages
├── auth.conf.d
├── preferences.d
├── sources.list
├── sources.list.d
└── trusted.gpg.d
    ├── ubuntu-keyring-2012-archive.gpg
    ├── ubuntu-keyring-2012-cdimage.gpg
    └── ubuntu-keyring-2018-archive.gpg

5 directories, 12 files


Those are all the files that are "below" `/etc/apt`.

The `tree` command provides a nice, graphic representation of the hierarchy, 
with lower levels indented from upper levels and connected by lines.
We can distinguish between files and folders by using the `-F` flag ( aka option ), which adds a `/` to the end of folders.  

In [3]:
tree -F /etc/apt

/etc/apt
├── apt.conf.d/
│   ├── 01autoremove
│   ├── 01autoremove-kernels
│   ├── 01-vendor-ubuntu
│   ├── 70debconf
│   ├── docker-autoremove-suggests
│   ├── docker-clean
│   ├── docker-gzip-indexes
│   └── docker-no-languages
├── auth.conf.d/
├── preferences.d/
├── sources.list
├── sources.list.d/
└── trusted.gpg.d/
    ├── ubuntu-keyring-2012-archive.gpg
    ├── ubuntu-keyring-2012-cdimage.gpg
    └── ubuntu-keyring-2018-archive.gpg

5 directories, 12 files


Notice that all the folders end with a `/`, but files do not.

In contrast to the `tree` command, the `find` command provides a **listing** of all the files in the hierarchy. 

In [4]:
find /etc/apt

/etc/apt
/etc/apt/sources.list.d
/etc/apt/sources.list
/etc/apt/auth.conf.d
/etc/apt/apt.conf.d
/etc/apt/apt.conf.d/docker-clean
/etc/apt/apt.conf.d/01autoremove
/etc/apt/apt.conf.d/docker-no-languages
/etc/apt/apt.conf.d/docker-autoremove-suggests
/etc/apt/apt.conf.d/01autoremove-kernels
/etc/apt/apt.conf.d/01-vendor-ubuntu
/etc/apt/apt.conf.d/70debconf
/etc/apt/apt.conf.d/docker-gzip-indexes
/etc/apt/preferences.d
/etc/apt/trusted.gpg.d
/etc/apt/trusted.gpg.d/ubuntu-keyring-2012-archive.gpg
/etc/apt/trusted.gpg.d/ubuntu-keyring-2018-archive.gpg
/etc/apt/trusted.gpg.d/ubuntu-keyring-2012-cdimage.gpg


There's no easy equivalent option to `-F` with `find`.
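
That said, `find` can filter by file type, which gets at the same distinction from a different angle. Here is a small sketch, assuming the standard `-type d` ( folders ) and `-type f` ( regular files ) tests. The folder and file names are made up so that the example does not depend on `/etc/apt` being present.

```shell
# Build a small throwaway hierarchy (names are made up for this example)
demo_dir=$(mktemp -d)
mkdir "${demo_dir}/conf.d" "${demo_dir}/keys.d"
touch "${demo_dir}/sources.list" "${demo_dir}/conf.d/01example"

# Only folders, including the starting location itself
find "${demo_dir}" -type d
dir_count=$(find "${demo_dir}" -type d | wc -l | tr -d ' ')

# Only regular files
find "${demo_dir}" -type f
file_count=$(find "${demo_dir}" -type f | wc -l | tr -d ' ')

echo "${dir_count} directories, ${file_count} files"

# Clean up the throwaway hierarchy
rm -r "${demo_dir}"
```

Note that `find` counts the starting folder as one of the folders, whereas the summary line printed by `tree` does not.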


### paths

The `/etc/apt/` notation specifies what is known as a `path`, which is a sequence of folder names separated by the `/` character.  For example, we can go a level deeper by appending `trusted.gpg.d` to the path.


In [5]:
tree -F /etc/apt/trusted.gpg.d

/etc/apt/trusted.gpg.d
├── ubuntu-keyring-2012-archive.gpg
├── ubuntu-keyring-2012-cdimage.gpg
└── ubuntu-keyring-2018-archive.gpg

0 directories, 3 files
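
To make the "sequence of names separated by `/`" idea concrete, here is a small sketch that splits a path into its components. It only manipulates the text of the path, so it works whether or not the folders exist. The variable names are my own.

```shell
path="/etc/apt/trusted.gpg.d"

# Put each component on its own line; the leading / produces an empty
# first line, which sed deletes
components=$(echo "${path}" | tr '/' '\n' | sed '/^$/d')
echo "${components}"

# Count the components to get the depth below the root folder
depth=$(echo "${components}" | wc -l | tr -d ' ')
echo "depth: ${depth}"
```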


### ls command

`tree` and `find` are not the only ways to traverse and display the filesystem hierarchy. 
`ls` is most commonly used to display the contents of a single folder.  And like `tree` it can use the `-F` option to append a `/` to folders.

In [6]:
ls -F /etc/apt/

apt.conf.d/
auth.conf.d/
preferences.d/
sources.list
sources.list.d/
trusted.gpg.d/


`ls` can also traverse the filesystem hierarchy when given the `-R` option, but I find the output a bit more challenging to interpret.

In [7]:
ls -FR /etc/apt/

/etc/apt/:
apt.conf.d/
auth.conf.d/
preferences.d/
sources.list
sources.list.d/
trusted.gpg.d/

/etc/apt/apt.conf.d:
01autoremove
01autoremove-kernels
01-vendor-ubuntu
70debconf
docker-autoremove-suggests
docker-clean
docker-gzip-indexes
docker-no-languages

/etc/apt/auth.conf.d:

/etc/apt/preferences.d:

/etc/apt/sources.list.d:

/etc/apt/trusted.gpg.d:
ubuntu-keyring-2012-archive.gpg
ubuntu-keyring-2012-cdimage.gpg
ubuntu-keyring-2018-archive.gpg


`ls` does more than list the names of the files and folders inside a folder.  When combined with the `-l` option, it also displays metadata, that is, information about the files and folders such as permissions, link count, owner, group, size, and modification date.

In [8]:
ls -Fl /etc/apt/

total 24
drwxr-xr-x 2 root root 4096 Jan 26 08:32 apt.conf.d/
drwxr-xr-x 2 root root 4096 Jun 15  2021 auth.conf.d/
drwxr-xr-x 2 root root 4096 Apr 20  2018 preferences.d/
-rw-r--r-- 1 root root 2765 Jan 26 08:32 sources.list
drwxr-xr-x 2 root root 4096 Apr 20  2018 sources.list.d/
drwxr-xr-x 2 root root 4096 Jan 26 08:32 trusted.gpg.d/


To use this entry as an example:
```
-rw-r--r-- 1 root root 2765 Jan 26 08:32 sources.list
```

- `-rw-r--r--` the file type ( first character ) and the permissions
- `1` the reference count ( the number of hard links to the file )
- `root` the owner
- `root` the group
- `2765` the size in bytes
- `Jan 26` the date of last modification
- `08:32` the time of last modification
- `sources.list` the name of the file
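
Because the fields are separated by whitespace, they can be picked out by position. Here is a minimal sketch that uses `awk` to do so on a throwaway file. It assumes the field order shown above and a file name without spaces.

```shell
# Create a throwaway file containing exactly 6 bytes ("Hello" plus a newline)
demo_file=$(mktemp)
printf 'Hello\n' > "${demo_file}"

# awk splits each line on whitespace; $1 is the first field, $5 the fifth
listing=$(ls -l "${demo_file}")
permissions=$(echo "${listing}" | awk '{ print $1 }')
links=$(echo "${listing}" | awk '{ print $2 }')
size=$(echo "${listing}" | awk '{ print $5 }')

echo "permissions=${permissions} links=${links} size=${size}"

# Clean up the throwaway file
rm "${demo_file}"
```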




Within a folder there are often "hidden" files and folders.  Files whose names begin with a dot ( "." ) are not displayed unless the `-a` option is used with `ls`.

In [9]:
ls -Fla /etc/apt/

total 32
drwxr-xr-x 7 root root 4096 Jan 26 08:32 ./
drwxr-xr-x 1 root root 4096 Feb 26 03:10 ../
drwxr-xr-x 2 root root 4096 Jan 26 08:32 apt.conf.d/
drwxr-xr-x 2 root root 4096 Jun 15  2021 auth.conf.d/
drwxr-xr-x 2 root root 4096 Apr 20  2018 preferences.d/
-rw-r--r-- 1 root root 2765 Jan 26 08:32 sources.list
drwxr-xr-x 2 root root 4096 Apr 20  2018 sources.list.d/
drwxr-xr-x 2 root root 4096 Jan 26 08:32 trusted.gpg.d/


Notice that we now see two additional folders: `./` and `../`.  Those are actually "pointer" folders in that they point to the current folder ( `./` ) and to the parent folder ( `../` ).
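
`/etc/apt` happens to contain no dot files, so here is a small sketch that creates one in a throwaway folder to show the difference `-a` makes. The file names are made up for the example.

```shell
# Throwaway folder with one ordinary file and one dot file
demo_dir=$(mktemp -d)
touch "${demo_dir}/visible.txt" "${demo_dir}/.hidden.txt"

# Without -a, names starting with a dot are left out
plain=$(ls "${demo_dir}")
echo "${plain}"

# With -a, the dot file appears along with . and ..
all=$(ls -a "${demo_dir}")
echo "${all}"
all_count=$(echo "${all}" | wc -l | tr -d ' ')

# Clean up the throwaway folder
rm -r "${demo_dir}"
```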

There is much more to mention about the file system, in particular the remaining CRUD operations.  But we'll change our focus to files and return to the file system later.

## Files

### File contents

#### cat

Just as `ls` displays the contents of a folder, `cat` displays the contents of a file ... with some interpretation.  We'll use the file `/etc/debian_version` as an example.

In [10]:
cat /etc/debian_version

buster/sid


#### xxd

`cat` displays the characters that are in the file. But characters are not what is really stored on the filesystem.  Rather computers store everything as numbers, specifically 1's and 0's.  And we can display that using the `xxd` command.

In [11]:
xxd -b -g1 /etc/debian_version

00000000: 01100010 01110101 01110011 01110100 01100101 01110010  buster
00000006: 00101111 01110011 01101001 01100100 00001010           /sid.


We see three fields.  The field on the left is the address offset, ending in a colon ( : ). Then we see a series of 1's and 0's.  Those are called bits, short for "binary digits."  Those bits are the actual content of the file. On the right, we see the character interpretation of those 1's and 0's.

Here's the same output again.  Notice that the bits are displayed in groups of 8, called bytes; the `-g1` option tells `xxd` to put one byte in each group.

In [12]:
xxd -b -g1 /etc/debian_version

00000000: 01100010 01110101 01110011 01110100 01100101 01110010  buster
00000006: 00101111 01110011 01101001 01100100 00001010           /sid.


On the far right of the output, we see that the first group of 8 bits ( the first byte ) gets interpreted as the letter 'b', the second as the letter 'u', etc.  Also notice that the last byte '00001010' is being displayed as a dot ( '.' ). That byte is actually a non-printable character, one of many.  This one happens to be the end-of-line ( aka newline or '\n' ) character, which we will encounter more of later on.  Other non-printable characters frequently encountered include tab ( '\t' ), carriage return ( '\r' ), and null ( '\0' ).

As an alternative to binary, we can display the bytes in a file in hexadecimal, which is what `xxd` does when the `-b` option is left out.  The `-u` option used here prints the hex digits in uppercase.

In [13]:
xxd -u -g1 /etc/debian_version

00000000: 62 75 73 74 65 72 2F 73 69 64 0A                 buster/sid.


We see that hex 62 maps to 'b', hex 75 to 'u', etc.
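
The mapping between characters and numbers can be checked from the shell itself. Here is a small sketch using `printf`, plus `od` for a byte dump. `od` is not used elsewhere in this notebook; I am swapping it in for `xxd` only because it is more widely installed.

```shell
# Hexadecimal 62 and decimal 98 are the same number
printf '%d\n' 0x62          # prints 98

# A leading quote asks printf for the numeric value of the character
printf '%x\n' "'b"          # prints 62
printf '%x\n' "'u"          # prints 75

# Dump the bytes of "b" followed by a newline; tr and sed only tidy the spacing
bytes=$(printf 'b\n' | od -An -tx1 | tr -s ' ' | sed 's/^ //; s/ $//')
echo "${bytes}"             # prints 62 0a
```

The trailing `0a` is the same newline byte that `xxd` displayed as a dot.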

### Lines

The interpretation of the bytes in a file is left to programs.  




One way to interpret a file stream is as a collection of "lines" with each line being a collection of "text" characters.  Many files use this approach, e.g. CSV, YAML, HTML.  For these "text" files, a program can read the file one character at a time until it gets to an "end-of-line" character, then it can operate on that line, then read the next line.  `cat` does this with every line in a file. For example, we can have `cat` prefix each line with the line number:

In [14]:
cat -n /etc/debian_version

     1	buster/sid


That's not very exciting with a file that has only one line.  So, here's the same command run on a file with multiple lines:

In [15]:
cat -n /etc/os-release

     1	NAME="Ubuntu"
     2	VERSION="18.04.6 LTS (Bionic Beaver)"
     3	ID=ubuntu
     4	ID_LIKE=debian
     5	PRETTY_NAME="Ubuntu 18.04.6 LTS"
     6	VERSION_ID="18.04"
     7	HOME_URL="https://www.ubuntu.com/"
     8	SUPPORT_URL="https://help.ubuntu.com/"
     9	BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
    10	PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
    11	VERSION_CODENAME=bionic
    12	UBUNTU_CODENAME=bionic


In this case, `cat` reads each character until it gets to the end of the line, prints the line number followed by the line, then repeats the process until it gets to the end of the file.
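
The read-a-line, operate, repeat pattern can be written out by hand in the shell. The following is only a sketch of the idea, not how `cat` is actually implemented. The file contents are made up, and the `printf` format is my approximation of the `cat -n` layout.

```shell
# Throwaway three-line file; the contents are made up for this example
demo_file=$(mktemp)
printf 'NAME="Demo"\nID=demo\nVERSION_ID="1.0"\n' > "${demo_file}"

line_number=0
# read consumes characters up to the next newline and stores them in $line
while IFS= read -r line; do
    line_number=$((line_number + 1))
    # Right-align the number in 6 columns, then a tab, then the line itself
    printf '%6d\t%s\n' "${line_number}" "${line}"
done < "${demo_file}"

echo "lines read: ${line_number}"

# Clean up the throwaway file
rm "${demo_file}"
```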

Many other commands use this pattern of reading a line, operating on it, then repeating.  Let's look at a few:
- head
- tail
- cut
- wc
- file

BTW, for all these commands, many more details on options and how they work can be found using a Google search for "unix man " followed by the command.  For example, "[unix man head](https://www.google.com/search?q=unix+man+head)"

In [16]:
# head displays the first 10 lines of a file if not given any options
## you can specify more or fewer lines by giving it the option -n X, where X is a whole number
## here we get the first 4 words from a dictionary file
head -n 4 /usr/share/dict/words

A
A'asia
A's
AATech


In [17]:
# tail displays the last 10 lines of a file if not given any options
## you can specify more or fewer lines by giving it the option -n X, where X is a whole number
## here we get the last 4 words from a dictionary file
tail -n 4 /usr/share/dict/words

évolué
évolués
événement
événements


In [18]:
# cut displays the character range specified by the -c option or a field range specified by the -f option
## range is specified using 1-based counting
## here we get the first 6 characters from the /etc/debian_version file.
cat /etc/debian_version
cut -c 1-6 /etc/debian_version

buster/sid
buster


In [19]:
## here we get characters 7-10 from the /etc/debian_version file.
cut -c 7-10 /etc/debian_version

/sid


In [20]:
# wc gives a summary of how many lines, words, and bytes there are in a file
wc /usr/share/dict/words

 654749  654749 6876726 /usr/share/dict/words
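
The three numbers are, in order, lines, words, and bytes. The first two are equal for the dictionary because every line holds exactly one word. Here is a small sketch on a file whose counts are easy to verify by hand. The `<` symbol feeds the file to `wc` as input, which keeps the file name out of the output.

```shell
# Two lines, three words, 8 + 6 = 14 bytes (newlines included)
demo_file=$(mktemp)
printf 'one two\nthree\n' > "${demo_file}"

# -l, -w, and -c select lines, words, and bytes; tr strips any padding spaces
lines=$(wc -l < "${demo_file}" | tr -d ' ')
words=$(wc -w < "${demo_file}" | tr -d ' ')
bytes=$(wc -c < "${demo_file}" | tr -d ' ')

echo "${lines} lines, ${words} words, ${bytes} bytes"

# Clean up the throwaway file
rm "${demo_file}"
```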


In [21]:
# file gives you a reasonable guess as to what type of file it is.
file /usr/share/dict/words
file /etc/dictionaries-common/words
file /usr/share/dict/american-english-insane
file /etc/debian_version
file /bin/grep

/usr/share/dict/words: symbolic link to /etc/dictionaries-common/words
/etc/dictionaries-common/words: symbolic link to /usr/share/dict/american-english-insane
/usr/share/dict/american-english-insane: UTF-8 Unicode text
/etc/debian_version: ASCII text
/bin/grep: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 3.2.0, BuildID[sha1]=4806f6fd2346800fffcaeedb877623aa54cf94e8, stripped


The caveat is that these programs only work if the file is organized as a "text" file.  That is, the bytes are interpreted as printable characters with line endings.  When some other convention is used, then the file is termed a "binary" file. This can lead to some confusion, since every file is ultimately stored as bits and is "binary" in that sense.  The difference is in how the bytes are organized in the file and interpreted by some program.

## Creating a file

The previous examples used pre-existing files.  Now we will use some commands that will create data and then put them into a file. We will explore the following commands:
- date
- echo
- seq
- curl

In [22]:
# date prints the date
date

Sun Feb 26 06:18:31 UTC 2023


We can tell a command to put the data into a file by redirecting its output.  That is done using the '>' symbol.  For example, to save the output from the `date` command to a file called `date.txt`:

In [23]:
# list the current folder; date.txt already appears here because it was left over from an earlier run
ls -F

abq.air-quality.dat
bash.intro.ipynb
code.kata.data-munging.bash.ipynb
code.kata.data-munging.python.ipynb
CRUD.ipynb
data-structures.files.folders.ipynb
date.txt
env.rc
hw.txt
notes.md
README.md
seq.txt


In [24]:
# generate a date and redirect the output to the date.txt file
date > date.txt


In [25]:
# show that date.txt now does exist
ls -F

abq.air-quality.dat
bash.intro.ipynb
code.kata.data-munging.bash.ipynb
code.kata.data-munging.python.ipynb
CRUD.ipynb
data-structures.files.folders.ipynb
date.txt
env.rc
hw.txt
notes.md
README.md
seq.txt


In [26]:
# display the contents of the date.txt file with a line number
cat -n date.txt

     1	Sun Feb 26 06:18:31 UTC 2023


In [27]:
# echo displays the provided text
echo 'Hello, world!'

Hello, world!


In [28]:
# to save output to a file
echo 'Hello, world!' > hw.txt

In [29]:
# display the contents
cat -n hw.txt

     1	Hello, world!
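
One thing to be aware of: `>` replaces whatever the file already held. Here is a small sketch of that behavior, using a throwaway folder so that nothing in the current folder is touched. It also uses `>>`, the appending form of redirection, which does not appear elsewhere in this notebook.

```shell
# Work in a throwaway folder (hw.txt here is a separate, temporary file)
demo_dir=$(mktemp -d)
demo_file="${demo_dir}/hw.txt"

echo 'Hello, world!' > "${demo_file}"
# A second > overwrites the file; the first line is gone
echo 'Hello again!' > "${demo_file}"
after_overwrite=$(cat "${demo_file}")

# >> appends to the end of the file instead
echo 'Hello, world!' >> "${demo_file}"
line_count=$(wc -l < "${demo_file}" | tr -d ' ')

cat -n "${demo_file}"

# Clean up the throwaway folder
rm -r "${demo_dir}"
```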


In [30]:
# seq generates a range of numbers
seq 1 10 > seq.txt
cat -n seq.txt

     1	1
     2	2
     3	3
     4	4
     5	5
     6	6
     7	7
     8	8
     9	9
    10	10
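
`seq` also accepts other argument counts. Here is a brief sketch, assuming the usual `seq FIRST INCREMENT LAST` and `seq LAST` forms.

```shell
# Three arguments: count from 2 to 10 in steps of 2
evens=$(seq 2 2 10)
echo "${evens}"

# One argument: count from 1 up to that number
count=$(seq 5 | wc -l | tr -d ' ')
echo "seq 5 produced ${count} lines"
```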


In [31]:
# curl GETs a webpage
## here it downloads a file containing air quality data from the city of Albuquerque
curl -s http://data.cabq.gov/airquality/aqindex/history/042222.0017 > abq.air-quality.dat
head abq.air-quality.dat


BEGIN_FILE
FORMAT_VERSION,2
AGENCY,0017
FILENAME,042222.0017
DATA_VERSION,201904222215
TZONE,MST,7
BEGIN_GROUP
VARIABLE,CO
DATA_TYPE,POINT
MEASUREMENT_TYPE,SAMPLE


## Command pipeline

Much like one can do method chaining in Python, Ruby, JavaScript, and other languages, commands can be piped together using a vertical bar '|'.  In this way, the output of one command can be piped as input into the next command.  For example:

In [32]:
# here the first ten lines of a file are numbered
head abq.air-quality.dat | cat -n

     1	BEGIN_FILE
     2	FORMAT_VERSION,2
     3	AGENCY,0017
     4	FILENAME,042222.0017
     5	DATA_VERSION,201904222215
     6	TZONE,MST,7
     7	BEGIN_GROUP
     8	VARIABLE,CO
     9	DATA_TYPE,POINT
    10	MEASUREMENT_TYPE,SAMPLE


In [33]:
# here only the first field is displayed from the first ten lines and then numbered
head abq.air-quality.dat | cut -d, -f 1 | cat -n

     1	BEGIN_FILE
     2	FORMAT_VERSION
     3	AGENCY
     4	FILENAME
     5	DATA_VERSION
     6	TZONE
     7	BEGIN_GROUP
     8	VARIABLE
     9	DATA_TYPE
    10	MEASUREMENT_TYPE
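
Pipelines make it easy to combine small commands into something none of them does alone. As a sketch, `head` piped into `tail` picks out a single line by number, and `cut` then picks out one field from it. The sample text is copied from the first lines of the file shown above so that the example runs without the download.

```shell
# First six lines of the air quality file, inlined for a self-contained example
sample='BEGIN_FILE
FORMAT_VERSION,2
AGENCY,0017
FILENAME,042222.0017
DATA_VERSION,201904222215
TZONE,MST,7'

# Keep the first 3 lines, then the last 1 of those: that is line 3
third_line=$(printf '%s\n' "${sample}" | head -n 3 | tail -n 1)
echo "${third_line}"     # AGENCY,0017

# Add one more stage to keep only the second comma-separated field
agency=$(printf '%s\n' "${sample}" | head -n 3 | tail -n 1 | cut -d, -f 2)
echo "${agency}"         # 0017
```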


In [34]:
# here every line is numbered, then 10 of the numbered lines are picked at random
cat -n abq.air-quality.dat | shuf -n 10

   171	Del Norte HS 2      ,350010023,13.5,12.8,11.6,10.8,10.5,9.3,11.7,14.1,17.4,19.7,18.7,18.8,19.7,19.7,19.7,19.5,18.5,16.7,11.6,10.3,9.5,9.5
    63	Del Norte HS 1      ,350010023,36,33,26,35,33,20,16,28,-999,-999,50,51,52,51,50,50,48,49,45,44,43,43
    60	Foothills           ,350011012,G,G,G,G,G,G,G,G,G,B,B,G,G,G,G,G,G,G,G,G,G,G
     1	BEGIN_FILE
   198	MEASUREMENT_TYPE,SAMPLE
    69	DATA_TYPE,POINT
    70	MEASUREMENT_TYPE,SAMPLE
   239	START_REF,0
   185	START_REF,0
   150	BEGIN_DATA


In [35]:
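# here 100 random words are picked, each is trimmed to its first 10 characters, and the first ten results are numbered
shuf -n 100 /usr/share/dict/words | cut -c1-10 | head | cat -n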


     1	orchillas
     2	architect
     3	Asterias
     4	stannate's
     5	justment
     6	gopak
     7	lithomarge
     8	turboexcit
     9	gladstone'
    10	adynamy
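
Because `shuf` picks lines at random, the words above will differ on every run. What stays fixed is the number of lines, as this small sketch shows. It assumes `shuf` is installed, which is normally the case on Linux systems with GNU coreutils.

```shell
# Shuffling changes the order of the lines, not how many there are
shuffled_count=$(seq 1 10 | shuf | wc -l | tr -d ' ')

# -n keeps only that many of the shuffled lines
sample_count=$(seq 1 10 | shuf -n 3 | wc -l | tr -d ' ')

echo "${shuffled_count} lines shuffled, ${sample_count} lines sampled"
```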


In [36]:
ls -la

total 124
drwxr-xr-x 1 jovyan jovyan  4096 Feb 26 06:18 ./
drwxr-xr-x 1 jovyan jovyan  4096 Feb 26 03:40 ../
-rw-r--r-- 1 jovyan jovyan  8508 Feb 26 06:18 abq.air-quality.dat
-rw-r--r-- 1 jovyan jovyan 23073 Feb 26 06:18 bash.intro.ipynb
-rw-r--r-- 1 jovyan jovyan  6889 Feb 26 03:09 code.kata.data-munging.bash.ipynb
-rw-r--r-- 1 jovyan jovyan  5189 Feb 26 03:09 code.kata.data-munging.python.ipynb
-rw-r--r-- 1 jovyan jovyan  1288 Feb 26 04:51 CRUD.ipynb
-rw-r--r-- 1 jovyan jovyan 21878 Feb 26 03:09 data-structures.files.folders.ipynb
-rw-r--r-- 1 jovyan jovyan    29 Feb 26 06:18 date.txt
-rw-r--r-- 1 jovyan jovyan   324 Feb 26 03:09 env.rc
-rw-r--r-- 1 jovyan jovyan    14 Feb 26 06:18 hw.txt
drwxr-xr-x 2 jovyan jovyan  4096 Feb 26 04:48 .ipynb_checkpoints/
-rw-r--r-- 1 jovyan jovyan    82 Feb 26 03:09 notes.md
-rw-r--r-- 1 jovyan jovyan   231 Feb 26 03:09 README.md
-rw-r--r-- 1 jovyan jovyan    21 Feb 26 06:18 seq.txt


In [37]:
file "4b1 - Naive Bayes.pdf"

4b1 - Naive Bayes.pdf: cannot open `4b1 - Naive Bayes.pdf' (No such file or directory)
