# Fundamental shell commands



## References

- [Linux Phrasebook, Second Edition](https://learning.oreilly.com/library/view/linux-phrasebook-second/9780133038576/cover.html)

## Setup

Run `bash jupyter.light.sh` from the base folder in the terminal to start a Jupyter lab session with the bash Jupyter kernel.

In [None]:
apt-get update
apt-get install -y wamerican-insane bc

ls -la /usr/share/dict/
wc /usr/share/dict/words


## Data flow


- redirection (<<<, <<, <, <(), |, >, >>)
- grouping (), {}
- /dev/null
- /dev/urandom

These are used to stitch together other commands.
We'll use these in examples below.
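
As a quick preview, here is a minimal sketch of each item on throwaway data. It assumes bash (needed for `<<<` and `<()`), and `/tmp/flow-demo.txt` is a hypothetical scratch path chosen only for illustration.

```shell
# scratch file used only for this illustration (hypothetical path)
demo=/tmp/flow-demo.txt

# '>' truncates and writes, '>>' appends
echo "first" > "$demo"
echo "second" >> "$demo"

# '<' feeds a file to standard input
wc -l < "$demo"

# '<<<' (here-string) feeds a string to standard input
tr a-z A-Z <<< "hello"

# '<<' (here-doc) feeds the lines up to the marker to standard input
cat << 'END'
a here-doc line
END

# '<()' (process substitution) makes a command's output look like a file
paste <( seq 1 3 ) <( seq 4 6 )

# '{ }' groups commands in the current shell, so one pipe reads the output of both
{ echo b ; echo a ; } | sort

# '( )' groups commands in a subshell; the 'cd' does not affect the current shell
( cd /tmp && pwd )

# '/dev/null' discards whatever is written to it, here an error message
ls /nonexistent-path 2> /dev/null || echo "ls failed quietly"

# '/dev/urandom' supplies endless random bytes; 'head -c' stops after 4 of them
head -c 4 /dev/urandom | od -An -tx1

rm "$demo"
```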

## Generators


- echo
- printf
- seq
- date
- yes
- bc
- xargs


In [None]:
# Generate some output
echo "Hello, world"

In [None]:
printf "Hello, world"

In [None]:
# Generate a sequence of numbers
seq 1 10

In [None]:
# Output the current date and time, often in the local time zone
date

In [None]:
# 'yes' goes on forever until stopped, i.e. an infinite loop
# here 'head' is used to stop 'yes'
yes 10 | head -5

In [None]:
# pipe commands into 'bc', a calculator
echo 1 + 1 | bc

In [None]:
# turns rows into columns
# more accurately, turns a list of lines into arguments to commands
seq 1 10 | xargs echo
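
By default `xargs` packs as many input lines as it can into a single command invocation. A small sketch of the `-n` option, which caps the number of arguments per invocation so that `echo` runs once per pair of lines:

```shell
# run 'echo' once for every 2 input lines
# prints three lines: "1 2", "3 4", "5 6"
seq 1 6 | xargs -n 2 echo
```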

## Filesystem


- tree
- find
- df 
- du


In [None]:
# Create a graphical tree of a folder and its files and subfolders
tree /etc/apt

In [None]:
# generate a list of files by traversing the filesystem, starting at a specific folder
find /etc/apt -type f

In [None]:
find /etc/apt -type f | xargs wc
# [ wc(x) for x in find("/etc/apt", type="file") ]
# find /etc/apt -type f | xargs wc | tr -s ' ' '\t' | cut -f3,5



In [None]:
# Display the information about the filesystem given a file or folder on that filesystem
df -h -T -P -l /etc/apt

In [None]:
# show disk usage for descendent files given a folder or file
du -m -a -c /etc/apt

## Whole file - metadata


- wc
- file
- ls
- stat


In [None]:
# Display the number of lines, words, and bytes in a file
wc /usr/share/dict/words

In [None]:
# redirect input from a file
wc -l < /usr/share/dict/words

In [None]:
# guess the file type
file /usr/share/dict/american-english-insane

In [None]:
# list a file or files
ls -la /usr/share/dict/american-english-insane

In [None]:
# display the meta-data about a file
stat /usr/share/dict/american-english-insane

In [None]:
find /etc/apt -type f | xargs stat | grep -E -o '[a-zA-Z ]+: ([a-zA-Z0-9)(/+.:-]+ ?[a-zA-Z0-9)(/+.:-]*)'

In [None]:
find /etc/apt -type f | xargs stat | sed -r -e '{ s/([a-zA-Z ]+: )/\n\1/g } ' | grep : | sed '{ s/^ *// } ' | sed -e '{ s/^File/\nFile/ }'

## Line-by-line, usually


- cat
- head
- tail
- rev
- tac
- cut
- sort
- uniq
- shuf
- grep
- column
- sed


In [None]:
# display the contents of a file or files
# 'cat' comes from concatenate
cat -n /etc/os-release


In [None]:
# input redirection from string
<<< "Hello, world" cat

In [None]:
# echo "Hello, world" > /tmp/pseudo-file
# cat /tmp/pseudo-file
# rm /tmp/pseudo-file


In [None]:
# here-doc
<< 'agoobi' cat
Hello, world
agoobi

In [None]:
< /etc/os-release cat

In [None]:
# display the lines at the top (head) of a file
head -5 /etc/os-release


In [None]:
# display the lines at the bottom (tail) of a file
tail -5 /etc/os-release


In [None]:
# reverse the sequence of characters on each line
rev /etc/os-release

In [None]:
# 'tac' displays the lines in a file from bottom to top
# piping through 'rev' also reverses the characters on each line
tac /etc/os-release | rev


In [None]:
# display the characters at a range of byte positions per line
cut -c3-7 /etc/os-release

In [None]:
< /etc/os-release rev | cut -c1-4 | rev

In [None]:
# sort and group lines
sort /etc/os-release


In [None]:
# count how often each character occurs and display the 5 most frequent
# 'uniq' collapses adjacent duplicate lines, so it is often used in conjunction with sort, which groups identical lines together
grep -o . /etc/os-release | sort | uniq -c | sort -r -n | head -5
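
The reason for sorting first is that `uniq` only compares neighbouring lines. A minimal sketch on made-up input:

```shell
# 'uniq' only collapses duplicates that are adjacent
printf 'b\na\nb\n' | uniq            # prints b, a, b: no two duplicates are adjacent
# sorting first brings the duplicates together
printf 'b\na\nb\n' | sort | uniq     # prints a, b
# '-c' prefixes each line with the number of times it occurred
printf 'b\na\nb\n' | sort | uniq -c
```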


In [None]:
# randomly select and display lines
cat -n /etc/os-release | shuf -n 5


In [None]:
# display lines that match a pattern, i.e. regular expression
grep -i 'ver' --color=always /etc/os-release


In [None]:
grep -o -i 'vers.*=' /etc/os-release | rev | cut -c2- | rev

In [None]:
# creates space padded tabular data
column -s= -t /etc/os-release


In [None]:
# sed == stream editor
# used most often for search/replace patterns
# format is 'range { action }',
#   where range can be a single line number, a range of line numbers ( e.g. 1,3 ), a pattern ( e.g. /id/ ), or a combination
#   and actions are usually single letters, e.g. s (substitute), p (print), d (delete)
sed '4,12 { s/ubuntu/stuff/i }' /etc/os-release | cat -n


In [None]:
sed -r -e '/URL/ { s/(.*)(=.*)$/foo_\1\2/i }' /etc/os-release | cat -n
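
The examples above only use the `s` action. Here is a small sketch of `d` and `p` on made-up input, assuming GNU sed as in the rest of this notebook:

```shell
# sample input (made up for this example)
sample=$(printf 'one\ntwo\nthree\nfour\n')

# 'd' deletes the lines in the range: drop lines 2 through 3
echo "$sample" | sed '2,3 d'              # prints one, four

# 'p' prints; with '-n' (no automatic printing) only the matching lines appear
echo "$sample" | sed -n '/t/ p'           # prints two, three

# a pattern as the range: on lines matching /o/, replace the first 'o' with '0'
echo "$sample" | sed '/o/ { s/o/0/ }'     # prints 0ne, tw0, three, f0ur
```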


## Line+character


- head
- tail
- cut
- awk


In [None]:
# first characters
head -c50 /etc/os-release

In [None]:
# last characters
tail -c10 /etc/os-release

In [None]:
cut -c1-3 /etc/os-release

In [None]:
# similar to sed, awk splits a line into fields
# format is 'pattern { action }'
# also has arrays, hashes ( associative arrays ), and flow control ( if, while, for )
cat -n /etc/passwd
echo
cat -n /etc/passwd | grep gnats
echo
awk -F: '/gnats/ { print $5 }' /etc/passwd

# [ x.split(":")[4] for x in passwd_file.split("\n") if "gnats" in x ]
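
To illustrate the arrays and flow control mentioned above, here is a minimal sketch that counts login shells. The three passwd-style records are made up so that the result does not depend on the machine.

```shell
# made-up records in passwd format: name:password:uid:gid:comment:home:shell
records='root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
alice:x:1000:1000:Alice:/home/alice:/bin/bash'

echo "$records" | awk -F: '
  $3 >= 1000 { print "regular user:", $1 }     # a pattern on a field, then an action
  { count[$7]++ }                              # associative array keyed by the shell
  END { for (s in count) print s, count[s] }   # flow control: loop over the keys
'
```

The order in which `for (s in count)` visits the keys is unspecified, so pipe the output through `sort` if a stable order matters.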


## Character


- dd
- tr
- od
- hexdump
- xxd


In [None]:
# copies bytes from input to output, with options to skip blocks, set the block size, and limit the block count
# often used to "image" a filesystem or create sparse files
dd if=/etc/os-release bs=1c skip=10 count=10
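
The effect of `skip` and `count` is easier to see on a known input. A minimal sketch using the alphabet, with a 1-byte block size so that both options count single bytes:

```shell
# skip the first 10 bytes (a-j) and copy the next 10
# '2> /dev/null' hides the transfer statistics that dd writes to standard error
printf 'abcdefghijklmnopqrstuvwxyz' | dd bs=1 skip=10 count=10 2> /dev/null   # prints klmnopqrst
echo
```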

In [None]:
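# transliterate: map, compress, remove characters
# reads only from standard input, so files must be redirected or piped in
# often used to remove undesirable characters or implement a Caesar cipher ( e.g. rot13 )
# the sets are quoted so the shell cannot expand them as filename patterns
< /etc/os-release tr 'a-zA-Z' 'n-za-mN-ZA-M'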


In [None]:
# applying rot13 twice restores the original text
< /etc/os-release tr 'a-zA-Z' 'n-za-mN-ZA-M' | tr 'a-zA-Z' 'n-za-mN-ZA-M'
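
The three uses of `tr` (map, compress, remove) can be sketched on short made-up strings:

```shell
# map: translate lowercase to uppercase
echo "hello world" | tr 'a-z' 'A-Z'     # prints HELLO WORLD
# compress: '-s' squeezes runs of the same character into one
echo "a    b  c" | tr -s ' '            # prints a b c
# remove: '-d' deletes the listed characters
echo "1_000_000" | tr -d '_'            # prints 1000000
```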


In [None]:
# "GACT" original
# "CTGA" complement
# "AGTC" reverse complement
echo "GA[GCAT]TC" | tr ACTG TGAC | rev
echo "CAGCAG[GCAT]{25}[GCAT][GCAT]"
echo "AGCTAGCTAGACTGGTACCTAGCGAGCTAGC" | sed -re  '{ s/^(.*GGTAC)(C.*$)/\1 --- \2/ }' 

In [None]:
# display numerical encodings
od -bc /etc/os-release | head

In [None]:
# display numerical encodings
hexdump -bc /etc/os-release | head

In [None]:
# display numerical encodings
xxd -b -g1 /etc/os-release | head

## Multi-file by line, usually


- diff
- paste
- comm
- join
- split


In [None]:
# show differences between two files
# often used to create a 'patch'
diff -y <( seq 1 10 ) <( seq 5 15 )

In [None]:
# combine files side-by-side
paste <( seq 1 10 ) <( seq 11 20 )

In [None]:
# show which lines are in which file
comm <( seq 1 10 ) <( seq 6 15 ) 2> /dev/null

In [None]:
# it can get kind of ugly because comm wants numbers sorted as though they are text
comm <( { seq 1 5 ; seq 11 15 ; } | sort ) <( seq 6 15 | sort ) | sort -n
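
`comm` prints three columns: lines only in the first file, lines only in the second, and lines in both. The options `-1`, `-2`, and `-3` hide the corresponding column, which is usually what you want in a pipeline. A minimal sketch on made-up, already sorted input (bash is assumed for `<()`):

```shell
# lines common to both inputs (hide columns 1 and 2)
comm -12 <( printf 'a\nb\nc\n' ) <( printf 'b\nc\nd\n' )   # prints b, c
# lines found only in the first input (hide columns 2 and 3)
comm -23 <( printf 'a\nb\nc\n' ) <( printf 'b\nc\nd\n' )   # prints a
```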

In [None]:
# perform an inner join between two files
head -3 /etc/passwd /etc/group
echo
echo '==> join'
join -t: -1 4 -2 3 <( sort -t: -k4,4 /etc/passwd) <( sort -t: -k3,3 /etc/group ) | head -3
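
The same inner join is easier to follow on a tiny made-up data set. In this sketch both inputs are already sorted on their first field, which `join` uses by default, so only ids present in both inputs survive:

```shell
# made-up data: 'id:name' on the left, 'id:city' on the right
# prints 2:bob:oslo and 3:cat:rome; ids 1 and 4 have no partner and are dropped
join -t: <( printf '1:ann\n2:bob\n3:cat\n' ) <( printf '2:oslo\n3:rome\n4:lima\n' )
```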

In [None]:
# split files into smaller files
## split /etc/passwd into files with two lines each (`-l 2`), having the name prefixed with "zfoo.", 
## and having numerical suffixes (`-d`) that are 3 digits (`-a 3`)
cd /tmp
split -l 2 -a 3 -d /etc/passwd zfoo.
wc zfoo*
wc /etc/passwd
md5sum /etc/passwd <( cat zfoo.* )

## Example of adding up 1..1e7, i.e. 10 million numbers


This generates parameters (`1 10_000_000`) for the `seq` command, removes the `_` from the input, generates a list of 10 million numbers with one per line,
combines them with a `+` character, and pipes them to `bc` to add up.

Notice all the tools that are being used:
- `echo` and `seq` to generate data
- `|` to pipe the output of one command as input to the next
- `tr` to modify input data
- `xargs` to read input and apply a command to it
- `paste` to combine lines with a delimiter
- `bc` to act on the input data



```
$ time -p echo 1 10_000_000 | tr -d _ | xargs seq  | paste -sd + | bc
50000005000000
real 82.98
user 93.41
sys 17.36
```

In [None]:
# a sample of what happens below before piping into `bc`
echo 1 1_0 | tr -d _ | xargs seq | paste -sd +

time -p echo 1 10_000_000 | tr -d _ | xargs seq  | paste -sd + | bc
