# Stupid Shell Tricks

----

First things first... let's talk about shell redirection, STDIN, STDOUT, STDERR, all that fun stuff. In Unix, streams are how the shell communicates with you and most other processes. These streams are assigned file descriptors:

 * STDOUT is the standard output stream and is fd(1)
 * STDIN is the standard input stream and is fd(0)
 * STDERR is where errors are displayed and is fd(2) <br />
     Some people erroneously dump errors to STDOUT. Those people should be publicly mocked.

In [1]:
echo "This is going to STDOUT" >&1
echo "This is going to STDERR" >&2

This is going to STDOUT
This is going to STDERR


There are three basic redirection operators. They are ">", "<" and "|"<br />
 * `> redirects an output stream`
 * `< redirects an input stream`
 * `| connects STDOUT of one process to STDIN of another process`
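
Here is a minimal sketch of all three operators at work. The scratch directory and the file name `greeting.txt` are made up for this illustration; nothing here comes from a real system.

```shell
# Scratch directory so the demo does not touch real files (hypothetical names).
demo_dir=$(mktemp -d)

# '>' sends STDOUT of echo into a file instead of the terminal.
echo "hello world" > "$demo_dir/greeting.txt"

# '<' makes the file the STDIN of tr, which upper-cases what it reads.
upper=$(tr '[:lower:]' '[:upper:]' < "$demo_dir/greeting.txt")

# '|' connects STDOUT of echo to STDIN of wc, which counts the words.
# (tr -d ' ' strips the padding some wc versions add.)
words=$(echo "hello world" | wc -w | tr -d ' ')

echo "$upper"
echo "$words"

rm -rf "$demo_dir"
```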

That means we can use the redirect operator to dump output of a command to a file, or even throw it away by redirecting it
to /dev/null:

In [None]:
cat /etc/passwd > /dev/null

See? No output. This can be very handy for commands that produce a LARGE amount of output you don't care about.

In [None]:
find -L /tmp -name \* -exec touch {} \; >/dev/null

Wait? What? Where did all that come from? Oh, that is the STDERR stream. Yeah, I might want to see those errors.
That's why dumping error output to STDOUT instead of STDERR is a crime against humanity. What if I don't want
to see those errors after all? Simple, just redirect fd(2) to fd(1). Ordering is IMPORTANT here: redirections are processed left to right, so `2>&1` has to come after `>/dev/null`.
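
To see why the ordering matters, here is a small sketch. The `noisy` function is a hypothetical stand-in for any command that writes to both streams; the command substitution plays the role of "whatever STDOUT currently points at".

```shell
# Hypothetical helper that writes one line to each stream.
noisy() {
    echo "normal output"
    echo "error output" >&2
}

# Right order: STDOUT goes to /dev/null first, then fd 2 is pointed at
# wherever fd 1 points now (/dev/null). Nothing is left to capture.
quiet=$(noisy >/dev/null 2>&1)

# Wrong order: fd 2 is pointed at the current STDOUT (the capture) first,
# and only then is STDOUT sent to /dev/null. The error line still gets out.
leaky=$(noisy 2>&1 >/dev/null)

echo "right order captured: '$quiet'"
echo "wrong order captured: '$leaky'"
```

The second form is occasionally useful on purpose, when you want to pipe only the errors somewhere, but it is not what you want for silencing a command.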

In [None]:
find -L /tmp -name \* -exec touch {} \; >/dev/null 2>&1

Voila, no output. There is an important thing to note about the distinction between '>' and '|'.

The > operator writes directly to a file or file descriptor. Unless the target is a descriptor such as `&2`, it will create the named file in the file system, or overwrite it if it already exists. The | operator connects two processes, causing one process to read from another. There is a VERY large difference between these two commands:<br />

```
echo "foo" > test
echo "foo" | test
```

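The first writes "foo" into a file named test; the second runs the `test` command and feeds "foo" to its STDIN. It is also important to note that > is destructive. If you want an append operation instead, use ">>".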
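
A quick way to convince yourself of the overwrite-versus-append behavior, sketched against a scratch file whose name is picked by `mktemp` (not a file from the examples above):

```shell
log_file=$(mktemp)   # scratch file, name chosen by mktemp

echo "first"  > "$log_file"    # creates or truncates the file
echo "second" > "$log_file"    # truncates again: "first" is gone
echo "third" >> "$log_file"    # appends after "second"

first_line=$(head -n 1 "$log_file")
line_count=$(wc -l < "$log_file" | tr -d ' ')

cat "$log_file"                # shows the two surviving lines

rm -f "$log_file"
```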

In [None]:
Now we have the basics. We can use the above knowledge to start doing some fairly powerful things. The philosophy
of Unix is that you should have a lot of tiny tools that do one or two things and do them very well. They should
be able to read from STDIN and write to STDOUT, and this allows us to build "pipelines" by connecting a series of
these tools. Let's find out all the home directories that are in use right now that are legit (non-legit ones will
be /var/empty). The home directory is field 6 of each colon-separated entry in the passwd file (the GECOS field is field 5).

In [None]:
cat /etc/passwd | cut -f1,6 -d: | egrep -ve "(^#|empty)" 

In [None]:
Now, the above command would earn me the "egregious use of cat" award and prove that I don't truly understand shell
redirection. Cat is an entirely unnecessary addition to the above pipeline; the better way to write it would be to
use the < operator:

In [None]:
egrep -ve "(^#|empty)" < /etc/passwd | cut -f1,6 -d:

In [None]:
The above redirects the STDIN stream of egrep to read from the /etc/passwd file instead. However, since I
also know that egrep accepts a filename on the command line and only reads from STDIN if it's not present,
I can actually remove the first redirect completely:

In [None]:
egrep -ve "(^#|empty)" /etc/passwd | cut -f1,6 -d:
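
If you want to try the pipeline without depending on what your own /etc/passwd looks like, here is a sketch against a made-up sample file. The accounts below are invented, and I've used `grep -E`, which is the current spelling of `egrep`.

```shell
# Made-up passwd-style sample so the demo does not depend on the real /etc/passwd.
sample=$(mktemp)
cat > "$sample" <<'EOF'
# comment line, as found at the top of some passwd files
root:*:0:0:System Administrator:/var/root:/bin/sh
daemon:*:1:1:System Services:/var/empty:/usr/bin/false
alice:*:501:20:Alice Example:/home/alice:/bin/bash
EOF

# Drop comments and /var/empty accounts, then keep the login name (field 1)
# and the home directory (field 6).
homes=$(grep -Ev '(^#|empty)' "$sample" | cut -d: -f1,6)
echo "$homes"

rm -f "$sample"
```

Feeding a here-document into a file is one of the places where `cat` is not egregious: there is no input file to hand to another tool.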

In [None]:
Ok, how about something a bit more useful? First, let's set up some files.

In [None]:
# First, let's get our PID and use that to set up a temp directory
my_pid=$$
tmp_dir=/tmp/sst.$my_pid
rm -Rf $tmp_dir

mkdir -p $tmp_dir

for i in 2 3 4 5 6 7 8 9 0; do echo "foo" > $tmp_dir/foo$i ; echo "bar" > $tmp_dir/bar$i ; done

Now we should have a directory that contains 9 numbered "foo" and 9 numbered "bar" files.

In [None]:
ls $tmp_dir

Let's do something fun with it. Let's play with the find command, which allows us to find files that match patterns,
types, etc.

In [None]:
find $tmp_dir -name foo\* -print

In [None]:
grep 'foo' $tmp_dir/*

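Let's change the contents of all the foo files to "baz". (The `-i ''` form below is the BSD/macOS sed syntax; GNU sed wants a plain `-i` with no empty argument.)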

In [None]:
find $tmp_dir -name foo\* -exec sed -e "s/foo/baz/" -i '' {} \;
grep 'baz' $tmp_dir/*

There are multiple ways to do the above (for example you could use grep -R to recursively search). Find is an **extremely** powerful Swiss Army knife, and is by default recursive. You can limit it to certain depths in the directory tree, find only files of certain types (such as directory nodes), etc.
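
A small sketch of the depth and type limits, run against a throwaway tree built just for this demo. The directory and file names are made up, and `-maxdepth` is an extension that GNU and BSD find both support rather than something every find is guaranteed to have.

```shell
# Throwaway tree: top.txt, a/mid.txt, a/b/deep.txt (hypothetical names).
tree=$(mktemp -d)
mkdir -p "$tree/a/b"
touch "$tree/top.txt" "$tree/a/mid.txt" "$tree/a/b/deep.txt"

# All regular files, at any depth (find recurses by default).
all_files=$(find "$tree" -type f | wc -l | tr -d ' ')

# Only directory nodes: the tree itself, a, and a/b.
dirs=$(find "$tree" -type d | wc -l | tr -d ' ')

# Regular files no more than one level below the starting point.
shallow=$(find "$tree" -maxdepth 1 -type f | wc -l | tr -d ' ')

echo "files=$all_files dirs=$dirs shallow=$shallow"

rm -rf "$tree"
```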

When you are manipulating files like this, any search method (grep, find, etc) can return a LARGE number of files. Sometimes the list of files it returns will exceed the command line length limit for whatever you are passing them to. There's a way around this, and that is to use xargs, which automatically breaks up long command lines and does multiple invocations where necessary.
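
The batching is normally invisible because it only kicks in on very long lists. As a rough illustration, `-n 2` forces xargs to hand over at most two arguments per invocation, so you can watch it split the work:

```shell
# Five arguments, at most two per invocation: echo runs three times,
# and each run prints one line.
batches=$(printf '%s\n' one two three four five | xargs -n 2 echo)
echo "$batches"
```

One caveat: xargs splits its input on whitespace, so file names containing spaces need extra care. GNU and BSD tools offer `find -print0` paired with `xargs -0` for that case.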

In [None]:
grep -R "bar" $tmp_dir

In [None]:
grep -R "bar" $tmp_dir | cut -d: -f1 | xargs sed -e "s/bar/quux/" -i ''

In [None]:
grep -R "quux" $tmp_dir

I used this exact technique to correct a lot of typos in the ansible directory in one shot, with a typical command-line junkie pipeline:
```
grep -R sentimel ~/projects/taskrabbit/tr_ansible/* | cut -d: -f1 | sort -u | xargs sed -i '.bak' 's/sentimel/sentinel/g'
```

First we'll use a recursive grep to return a list of files that had the word "sentimel" in them. The output looks vaguely like this:

```
$ grep -R sentimel ~/projects/taskrabbit/tr_ansible/*
/dir1/dir2/file1:"some random sentence with sentimel"
/dir3/dir4/file4:"hahahaha sentimel ho ho ho"
/dir1/dir2/file1:"oh look another random sentimel sighting"
/dir2/dir3/file2:"blah blah sentimel"
```

Next, we need JUST the file name. We can get this with the ```cut``` utility, telling it to delimit fields with a : character and give us the first field. That means we now have a list that looks like:

```
$ grep -R sentimel ~/projects/taskrabbit/tr_ansible/* | cut -f1 -d:
/dir1/dir2/file1
/dir3/dir4/file4
/dir1/dir2/file1
/dir2/dir3/file2
```

Now you'll notice we have some duplicate files. We only need to run sed on a file once to correct everything in that file, so let's dedupe the list with ```sort -u``` which will sort the list and then remove any dupes. Now we have:

```
$ grep -R sentimel ~/projects/taskrabbit/tr_ansible/* | cut -f1 -d: | sort -u
/dir1/dir2/file1
/dir2/dir3/file2
/dir3/dir4/file4
```
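
You can rehearse the cut and sort steps without the real repository by feeding them the sample grep lines from above through `printf`. This is only a sketch on that illustrative data:

```shell
# The four sample grep lines, fed in via printf instead of a real grep.
files=$(printf '%s\n' \
    '/dir1/dir2/file1:"some random sentence with sentimel"' \
    '/dir3/dir4/file4:"hahahaha sentimel ho ho ho"' \
    '/dir1/dir2/file1:"oh look another random sentimel sighting"' \
    '/dir2/dir3/file2:"blah blah sentimel"' |
    cut -d: -f1 | sort -u)   # keep field 1, then sort and drop duplicates
echo "$files"
```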

Now we tell sed to edit each of these files in place, changing every occurrence of "sentimel" to "sentinel". We keep a backup of the original file by copying it in place to a new file with the extension ".bak" on it. We pass this all through xargs because we may get a potentially huge list of files back from the grep.

```
$ grep -R sentimel ~/projects/taskrabbit/tr_ansible/* | cut -f1 -d: | sort -u | xargs sed -i '.bak' 's/sentimel/sentinel/g'
```

There you go, we corrected every occurrence with one shell pipeline. Once we have verified our work, we can remove all the .bak files:

```
$ find ~/projects/taskrabbit/tr_ansible -name \*.bak -exec rm {} \;
```