From 0cc2f2596148b01ca182170b987f5a945a2e9a58 Mon Sep 17 00:00:00 2001 From: Noam Ross Date: Wed, 11 Feb 2015 10:28:07 -0800 Subject: [PATCH] Adding HTML pages --- 01-filedir.html | 34 ++++--- 02-create.html | 8 +- 06-find.html | 239 +++++++++++++++++++++++++++++------------------- 3 files changed, 171 insertions(+), 110 deletions(-) diff --git a/01-filedir.html b/01-filedir.html index 1183749d1..b117588d3 100644 --- a/01-filedir.html +++ b/01-filedir.html @@ -61,10 +61,14 @@

Alphabet Soup

To understand what a "home directory" is, let's have a look at how the file system as a whole is organized. At the top is the root directory that holds everything else. We refer to it using a slash character / on its own; this is the leading slash in /users/nelle.

Inside that directory are several other directories: bin (which is where some built-in programs are stored), data (for miscellaneous data files), users (where users' personal directories are located), tmp (for temporary files that don't need to be stored long-term), and so on:

-

The Filesystem

+
+The Filesystem

The Filesystem

+

We know that our current working directory /users/nelle is stored inside /users because /users is the first part of its name. Similarly, we know that /users is stored inside the root directory / because its name begins with /.

Underneath /users, we find one directory for each user with an account on this machine. The Mummy's files are stored in /users/imhotep, Wolfman's in /users/larry, and ours in /users/nelle, which is why nelle is the last part of the directory's name.

-

Home Directories

+
+Home Directories

Home Directories

+

Notice that there are two meanings for the / character. When it appears at the front of a file or directory name, it refers to the root directory. When it appears inside a name, it's just a separator.

@@ -73,7 +77,9 @@

Alphabet Soup

creatures  molecules           pizza.cfg
 data       north-pacific-gyre  solar.pdf
 Desktop    notes.txt           writing
-

Nelle's Home Directory

+
+Nelle's Home Directory

Nelle's Home Directory

+

ls prints the names of the files and directories in the current directory in alphabetical order, arranged neatly into columns. We can make its output more comprehensible by using the flag -F, which tells ls to add a trailing / to the names of directories:

$ ls -F
creatures/  molecules/           pizza.cfg
@@ -149,9 +155,11 @@ 

Nelle's Pipeline: Organizing Files

and then presses tab, the shell automatically completes the directory name for her:

$ ls north-pacific-gyre/

If she presses tab again, Bash will add 2012-07-03/ to the command, since it's the only possible completion. Pressing tab again does nothing, since there are 1520 possibilities; pressing tab twice brings up a list of all the files, and so on. This is called tab completion, and we will see it in many other tools as we go on.

-

Filesystem for Challenge Questions

-
-

FIXME

+
+Filesystem for Challenge Questions

Filesystem for Challenge Questions

+
+
+

Relative path resolution

If pwd displays /users/thing, what will ls ../backup display?

  1. ../backup: No such file or directory
@@ -160,8 +168,8 @@

FIXME

  3. original pnas_final pnas_sub
-
-

FIXME

+
+

ls reading comprehension

If pwd displays /users/backup, and -r tells ls to display things in reverse order, what command will display:

pnas-sub/ pnas-final/ original/
@@ -171,8 +179,8 @@

FIXME

  1. Either #2 or #3 above, but not #1.
-
-

FIXME

+
+

Default cd action

What does the command cd without a directory name do?

  1. It has no effect.
@@ -181,9 +189,9 @@

FIXME

  3. It produces an error message.
-
-

FIXME

-

What does the command ls do when used with the -s and -h arguments?

+
+

Exploring more ls arguments

+

What does the command ls do when used with the -s and -h arguments?

diff --git a/02-create.html b/02-create.html
index 701e90f67..b0c58bf7a 100644
--- a/02-create.html
+++ b/02-create.html
@@ -27,7 +27,7 @@

The Unix Shell

Creating Things

-
+

Learning Objectives

  • Create a directory hierarchy that matches a given diagram.
@@ -35,7 +35,7 @@

Learning Objectives

  • Display the contents of a directory using the command line.
  • Delete specified files and/or directories.
-
+

We now know how to explore files and directories, but how do we create them in the first place? Let's go back to Nelle's home directory, /users/nelle, and use ls -F to see what it contains:

$ pwd
/users/nelle
@@ -71,7 +71,9 @@

Which Editor?

No matter what editor you use, you will need to know where it searches for and saves files. If you start it from the shell, it will (probably) use your current working directory as its default location. If you use your computer's start menu, it may want to save files in your desktop or documents directory instead. You can change this by navigating to another directory the first time you "Save As..."

Let's type in a few lines of text, then use Control-O to write our data to disk:

-

Nano in Action

+
+Nano in action

Nano in action

+

Once our file is saved, we can use Control-X to quit the editor and return to the shell. (Unix documentation often uses the shorthand ^A to mean "control-A".) nano doesn't leave any output on the screen after it exits, but ls now shows that we have created a file called draft.txt:

$ ls
draft.txt
diff --git a/06-find.html b/06-find.html
index 693bd9fa1..896b1cc64 100644
--- a/06-find.html
+++ b/06-find.html
@@ -1,43 +1,81 @@
-Software Carpentry: The Unix Shell
- -
-
-

The Unix Shell

-

Finding Things

-
+The Unix Shell

Learning Objectives

  • Use grep to select lines from text files that match simple patterns.
  • Use find to find files whose names match simple patterns.
  • Use the output of one command as the command-line parameters to another command.
-
  • Explain what is meant by "text" and "binary" files, and why many common tools don't handle the latter well.
+
  • Explain what is meant by “text” and “binary” files, and why many common tools don’t handle the latter well.
-
-

You can guess someone's age by how they talk about search: young people use "Google" as a verb, while crusty old Unix programmers use "grep". The word is a contraction of "global/regular expression/print", a common sequence of operations in early Unix text editors. It is also the name of a very useful command-line program.

-

grep finds and prints lines in files that match a pattern. For our examples, we will use a file that contains three haikus taken from a 1998 competition in Salon magazine. For this set of examples we're going to be working in the writing subdirectory:

+ +

You can guess someone’s age by how they talk about search: young people use “Google” as a verb, while crusty old Unix programmers use “grep”. The word is a contraction of “global/regular expression/print”, a common sequence of operations in early Unix text editors. It is also the name of a very useful command-line program.

+

grep finds and prints lines in files that match a pattern. For our examples, we will use a file that contains three haikus taken from a 1998 competition in Salon magazine. For this set of examples we’re going to be working in the writing subdirectory:

$ cd
 $ cd writing
 $ cat haiku.txt
@@ -52,30 +90,30 @@

Learning Objectives

Yesterday it worked
Today it is not working
Software is like that.
-
+

Forever, or Five Years

-

We haven't linked to the original haikus because they don't appear to be on Salon's site any longer. As Jeff Rothenberg said, "Digital information lasts forever --- or five years, whichever comes first."

-
-

Let's find lines that contain the word "not":

+

We haven’t linked to the original haikus because they don’t appear to be on Salon’s site any longer. As Jeff Rothenberg said, “Digital information lasts forever — or five years, whichever comes first.”

+ +

Let’s find lines that contain the word “not”:

$ grep not haiku.txt
Is not the true Tao, until
 "My Thesis" not found
 Today it is not working
-

Here, not is the pattern we're searching for. It's pretty simple: every alphanumeric character matches against itself. After the pattern comes the name or names of the files we're searching in. The output is the three lines in the file that contain the letters "not".

-

Let's try a different pattern: "day".

+

Here, not is the pattern we’re searching for. It’s pretty simple: every alphanumeric character matches against itself. After the pattern comes the name or names of the files we’re searching in. The output is the three lines in the file that contain the letters “not”.

+

Let’s try a different pattern: “day”.

$ grep day haiku.txt
Yesterday it worked
 Today it is not working
-

This time, the output is lines containing the words "Yesterday" and "Today", which both have the letters "day". If we give grep the -w flag, it restricts matches to word boundaries, so that only lines with the word "day" will be printed:

+

This time, the output is lines containing the words “Yesterday” and “Today”, which both have the letters “day”. If we give grep the -w flag, it restricts matches to word boundaries, so that only lines with the word “day” will be printed:

$ grep -w day haiku.txt
-

In this case, there aren't any, so grep's output is empty.

+

In this case, there aren’t any, so grep’s output is empty.

Another useful option is -n, which numbers the lines that match:

$ grep -n it haiku.txt
5:With searching comes loss
 9:Yesterday it worked
 10:Today it is not working
-

Here, we can see that lines 5, 9, and 10 contain the letters "it".

-

We can combine flags as we do with other Unix commands. For example, since -i makes matching case-insensitive and -v inverts the match, using them both only prints lines that don't match the pattern in any mix of upper and lower case:

+

Here, we can see that lines 5, 9, and 10 contain the letters “it”.
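The three behaviours described above (a plain pattern matches anywhere in a line, -w restricts matches to whole words, -n adds line numbers) can be checked with a small, self-contained sketch. It assumes a POSIX shell with GNU or BSD grep, and writes a hypothetical three-line sample file (reusing lines quoted from haiku.txt) to a temporary directory, so the real file is not needed. It uses grep's -c option, which prints the number of matching lines instead of the lines themselves, to make the results easy to compare.

```shell
# Throwaway directory holding a hypothetical three-line sample file.
dir=$(mktemp -d)
printf '%s\n' 'Yesterday it worked' 'Today it is not working' 'Software is like that.' > "$dir/sample.txt"

# Plain pattern: "day" matches inside "Yesterday" and "Today".
substring_hits=$(grep -c day "$dir/sample.txt")

# -w: only whole words count, and no line contains the word "day" on its own.
# grep exits with status 1 when nothing matches, so "|| true" keeps the script going.
word_hits=$(grep -c -w day "$dir/sample.txt" || true)

# -n: prefix each matching line with its line number.
numbered=$(grep -n it "$dir/sample.txt")

echo "lines containing 'day': $substring_hits"
echo "lines containing the word 'day': $word_hits"
echo "$numbered"

rm -r "$dir"
```

Run as-is, it reports 2 lines for the plain pattern, 0 for -w, and prints "1:Yesterday it worked" and "2:Today it is not working".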

+

We can combine flags as we do with other Unix commands. For example, since -i makes matching case-insensitive and -v inverts the match, using them both only prints lines that don’t match the pattern in any mix of upper and lower case:

$ grep -i -v the haiku.txt
You bring fresh toner.
 
@@ -84,7 +122,7 @@ 

Forever, or Five Years

Yesterday it worked
Today it is not working
Software is like that.
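To see why both flags matter in the example above, the following sketch counts the lines kept with -v alone and with -i -v. It assumes a POSIX shell with GNU or BSD grep, and uses a hypothetical sample file built from three lines quoted earlier.

```shell
# Hypothetical sample file in a throwaway directory.
dir=$(mktemp -d)
printf '%s\n' 'Is not the true Tao, until' '"My Thesis" not found' 'With searching comes loss' > "$dir/sample.txt"

# -v alone: drop lines containing lowercase "the".
# "Thesis" survives because matching is case-sensitive by default.
case_sensitive=$(grep -c -v the "$dir/sample.txt")

# -i -v: drop lines containing "the" in any mix of case, so "Thesis" goes too.
case_insensitive=$(grep -c -i -v the "$dir/sample.txt")
kept=$(grep -i -v the "$dir/sample.txt")

echo "lines kept without -i: $case_sensitive"
echo "lines kept with -i: $case_insensitive"
echo "$kept"

rm -r "$dir"
```

Without -i, two lines survive; with -i, only "With searching comes loss" is left, because it is the one line with no "the" in any case.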
-

grep has lots of other options. To find out what they are, we can type man grep. man is the Unix "manual" command: it prints a description of a command and its options, and (if you're lucky) provides a few examples of how to use it:

+

grep has lots of other options. To find out what they are, we can type man grep. man is the Unix “manual” command: it prints a description of a command and its options, and (if you’re lucky) provides a few examples of how to use it:

$ man grep
GREP(1)                                                                                              GREP(1)
 
@@ -119,19 +157,19 @@ 

Forever, or Five Years

Interpret PATTERN as a list of fixed strings, separated by newlines, any of which is to be matched. (-F is specified by POSIX.) ... ... ...
-
+

Wildcards

-

grep's real power doesn't come from its options, though; it comes from the fact that patterns can include wildcards. (The technical name for these is regular expressions, which is what the "re" in "grep" stands for.) Regular expressions are both complex and powerful; if you want to do complex searches, please look at the lesson on our website. As a taster, we can find lines that have an 'o' in the second position like this:

-
$ grep -E '^.o' haiku.txt
+

grep’s real power doesn’t come from its options, though; it comes from the fact that patterns can include wildcards. (The technical name for these is regular expressions, which is what the “re” in “grep” stands for.) Regular expressions are both complex and powerful; if you want to do complex searches, please look at the lesson on our website. As a taster, we can find lines that have an ‘o’ in the second position like this:

+
$ grep -E '^.o' haiku.txt
 You bring fresh toner.
 Today it is not working
 Software is like that.
-

We use the -E flag and put the pattern in quotes to prevent the shell from trying to interpret it. (If the pattern contained a '*', for example, the shell would try to expand it before running grep.) The '^' in the pattern anchors the match to the start of the line. The '.' matches a single character (just like '?' in the shell), while the 'o' matches an actual 'o'.

-
-

While grep finds lines in files, the find command finds files themselves. Again, it has a lot of options; to show how the simplest ones work, we'll use the directory tree shown below.

-

File Tree for Find Example

-

Nelle's writing directory contains one file called haiku.txt and four subdirectories: thesis (which is sadly empty), data (which contains two files one.txt and two.txt), a tools directory that contains the programs format and stats, and an empty subdirectory called old.

-

For our first command, let's run find . -type d. As always, the . on its own means the current working directory, which is where we want our search to start; -type d means "things that are directories". Sure enough, find's output is the names of the five directories in our little tree (including .):

+

We use the -E flag to turn on extended regular expressions, and put the pattern in quotes to prevent the shell from trying to interpret it. (If the pattern contained a ‘*’, for example, the shell would try to expand it before running grep.) The ‘^’ in the pattern anchors the match to the start of the line. The ‘.’ matches a single character (just like ‘?’ in the shell), while the ‘o’ matches an actual ‘o’.
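A small sketch of how the ‘^’ anchor changes the result, assuming a POSIX shell with GNU or BSD grep and a hypothetical sample file made from lines quoted above:

```shell
# Hypothetical sample file in a throwaway directory.
dir=$(mktemp -d)
printf '%s\n' 'You bring fresh toner.' 'Yesterday it worked' 'Today it is not working' 'Software is like that.' > "$dir/sample.txt"

# Quoted, so the shell passes ^.o to grep untouched.
# ^ anchors to the start of the line, . is any one character, o is a literal o.
anchored=$(grep -c -E '^.o' "$dir/sample.txt")

# Without ^, the pattern may match anywhere in the line.
unanchored=$(grep -c -E '.o' "$dir/sample.txt")

echo "lines with 'o' as the second character: $anchored"
echo "lines with 'o' after any character: $unanchored"

rm -r "$dir"
```

The anchored pattern matches 3 of the 4 lines; without the anchor, "Yesterday it worked" also matches (on the "wo" in "worked"), giving 4.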

+ +

While grep finds lines in files, the find command finds files themselves. Again, it has a lot of options; to show how the simplest ones work, we’ll use the directory tree shown below.

+

File Tree for Find Example

+

Nelle’s writing directory contains one file called haiku.txt and four subdirectories: thesis (which is sadly empty), data (which contains two files one.txt and two.txt), a tools directory that contains the programs format and stats, and an empty subdirectory called old.

+

For our first command, let’s run find . -type d. As always, the . on its own means the current working directory, which is where we want our search to start; -type d means “things that are directories”. Sure enough, find’s output is the names of the five directories in our little tree (including .):

$ find . -type d
.
 ./data
@@ -147,7 +185,7 @@ 

Wildcards

./thesis/empty-draft.md
./data/one.txt
./data/two.txt
-

find automatically goes into subdirectories, their subdirectories, and so on to find everything that matches the pattern we've given it. If we don't want it to, we can use -maxdepth to restrict the depth of search:

+

find automatically goes into subdirectories, their subdirectories, and so on to find everything that matches the pattern we’ve given it. If we don’t want it to, we can use -maxdepth to restrict the depth of search:

$ find . -maxdepth 1 -type f
./haiku.txt

The opposite of -maxdepth is -mindepth, which tells find to only report things that are at or below a certain depth. -mindepth 2 therefore finds all the files that are two or more levels below us:

@@ -156,49 +194,50 @@

Wildcards

./data/two.txt
./tools/format
./tools/stats
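The -type, -maxdepth, and -mindepth options can be tried on a throwaway copy of the tree. This sketch assumes a POSIX shell and a find that supports -maxdepth and -mindepth (GNU and BSD find both do). The files it creates are empty placeholders named after the ones in the example, not Nelle's real data.

```shell
# Hypothetical copy of the example tree, with empty placeholder files.
dir=$(mktemp -d)
mkdir -p "$dir/writing/data" "$dir/writing/thesis" "$dir/writing/tools" "$dir/writing/old"
touch "$dir/writing/haiku.txt" "$dir/writing/thesis/empty-draft.md" \
      "$dir/writing/data/one.txt" "$dir/writing/data/two.txt" \
      "$dir/writing/tools/format" "$dir/writing/tools/stats"

# Each $() runs in a subshell, so every cd affects only the find it precedes.
# All directories, including the starting point "." itself.
dirs=$(cd "$dir/writing" && find . -type d | sort)

# Files directly inside the starting directory only.
top_files=$(cd "$dir/writing" && find . -maxdepth 1 -type f)

# Files two or more levels below the starting directory.
deep_files=$(cd "$dir/writing" && find . -mindepth 2 -type f | sort)

echo "directories:"; echo "$dirs"
echo "files at depth 1:"; echo "$top_files"
echo "files at depth 2 or more:"; echo "$deep_files"

rm -r "$dir"
```

It should list five directories, ./haiku.txt as the only top-level file, and five files two levels down.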
-

Now let's try matching by name:

+

Now let’s try matching by name:

$ find . -name *.txt
./haiku.txt

We expected it to find all the text files, but it only prints out ./haiku.txt. The problem is that the shell expands wildcard characters like * before commands run. Since *.txt in the current directory expands to haiku.txt, the command we actually ran was:

$ find . -name haiku.txt

find did what we asked; we just asked for the wrong thing.

-

To get what we want, let's do what we did with grep: put *.txt in single quotes to prevent the shell from expanding the * wildcard. This way, find actually gets the pattern *.txt, not the expanded filename haiku.txt:

-
$ find . -name '*.txt'
+

To get what we want, let’s do what we did with grep: put *.txt in single quotes to prevent the shell from expanding the * wildcard. This way, find actually gets the pattern *.txt, not the expanded filename haiku.txt:

+
$ find . -name '*.txt'
./data/one.txt
 ./data/two.txt
 ./haiku.txt
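The quoting problem described above is easy to reproduce. This sketch assumes a POSIX shell with GNU or BSD find, and runs the same search with and without quotes over empty placeholder files in a temporary directory:

```shell
# Hypothetical placeholder files: one .txt at the top level, two below it.
dir=$(mktemp -d)
mkdir -p "$dir/writing/data"
touch "$dir/writing/haiku.txt" "$dir/writing/data/one.txt" "$dir/writing/data/two.txt"

# Unquoted on purpose: the shell expands *.txt to haiku.txt before find runs,
# so find only ever sees "-name haiku.txt".
unquoted=$(cd "$dir/writing" && find . -name *.txt)

# Quoted: find receives the pattern *.txt itself and applies it at every level.
quoted=$(cd "$dir/writing" && find . -name '*.txt' | sort)

echo "unquoted: $unquoted"
echo "quoted:"
echo "$quoted"

rm -r "$dir"
```

If the starting directory held two or more .txt files, the unquoted form would usually fail outright instead, because the shell would pass find several names after -name.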
-
-

Listing vs. Finding

+
+

Listing vs. Finding

ls and find can be made to do similar things given the right options, but under normal circumstances, ls lists everything it can, while find searches for things with certain properties and shows them.

-
-

As we said earlier, the command line's power lies in combining tools. We've seen how to do that with pipes; let's look at another technique. As we just saw, find . -name '*.txt' gives us a list of all text files in or below the current directory. How can we combine that with wc -l to count the lines in all those files?

+ +

As we said earlier, the command line’s power lies in combining tools. We’ve seen how to do that with pipes; let’s look at another technique. As we just saw, find . -name '*.txt' gives us a list of all text files in or below the current directory. How can we combine that with wc -l to count the lines in all those files?

The simplest way is to put the find command inside $():

-
$ wc -l $(find . -name '*.txt')
+
$ wc -l $(find . -name '*.txt')
11 ./haiku.txt
 300 ./data/two.txt
 70 ./data/one.txt
 381 total
-

When the shell executes this command, the first thing it does is run whatever is inside the $(). It then replaces the $() expression with that command's output. Since the output of find is the three filenames ./data/one.txt, ./data/two.txt, and ./haiku.txt, the shell constructs the command:

+

When the shell executes this command, the first thing it does is run whatever is inside the $(). It then replaces the $() expression with that command’s output. Since the output of find is the three filenames ./data/one.txt, ./data/two.txt, and ./haiku.txt, the shell constructs the command:

$ wc -l ./data/one.txt ./data/two.txt ./haiku.txt
-

which is what we wanted. This expansion is exactly what the shell does when it expands wildcards like * and ?, but lets us use any command we want as our own "wildcard".

-

It's very common to use find and grep together. The first finds files that match a pattern; the second looks for lines inside those files that match another pattern. Here, for example, we can find PDB files that contain iron atoms by looking for the string "FE" in all the .pdb files above the current directory:

-
$ grep FE $(find .. -name '*.pdb')
+

which is what we wanted. This expansion is exactly what the shell does when it expands wildcards like * and ?, but lets us use any command we want as our own “wildcard”.
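Here is a self-contained version of the $() example. It assumes a POSIX shell and hypothetical files whose line counts (3, 2, and 1) are chosen so the total is easy to check. It also shows find's -exec ... {} + form, which the text above does not cover: find hands the file names to wc directly instead of splicing them into the shell's command line.

```shell
# Hypothetical files with known line counts: 3 + 2 + 1 = 6 lines in total.
dir=$(mktemp -d)
mkdir -p "$dir/writing/data"
printf 'a\nb\nc\n' > "$dir/writing/haiku.txt"
printf 'a\nb\n'    > "$dir/writing/data/one.txt"
printf 'a\n'       > "$dir/writing/data/two.txt"

# The shell runs find first, then pastes its output into the wc command line.
# The last line of wc's output is the total; awk keeps just the number.
total=$(cd "$dir/writing" && wc -l $(find . -name '*.txt') | tail -n 1 | awk '{print $1}')

# find passes the names to wc itself; "{} +" puts many names in one wc call.
total_exec=$(cd "$dir/writing" && find . -name '*.txt' -exec wc -l {} + | tail -n 1 | awk '{print $1}')

echo "total via \$(): $total"
echo "total via -exec: $total_exec"

rm -r "$dir"
```

Both totals come out as 6. The $() form is the simpler one to read, but the shell splits find's output on whitespace, so a file name containing a space would reach wc as two separate (and probably nonexistent) names. The -exec form does not have that problem.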

+

It’s very common to use find and grep together. The first finds files that match a pattern; the second looks for lines inside those files that match another pattern. Here, for example, we can find PDB files that contain iron atoms by looking for the string “FE” in all the .pdb files in and below the parent directory (..):

+
$ grep FE $(find .. -name '*.pdb')
../data/pdb/heme.pdb:ATOM     25 FE           1      -0.924   0.535  -0.518
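A runnable sketch of the find-and-grep combination, assuming a POSIX shell and a hypothetical directory layout: heme.pdb holds the single iron line shown above, and water.pdb is an invented file with no iron in it. The sketch adds grep's -l option, which prints only the names of the files that contain a match.

```shell
# Hypothetical layout: a writing/ directory next to data/pdb/.
dir=$(mktemp -d)
mkdir -p "$dir/writing" "$dir/data/pdb"
# heme.pdb reuses the iron line from the example; water.pdb is invented.
printf '%s\n' 'ATOM     25 FE           1      -0.924   0.535  -0.518' > "$dir/data/pdb/heme.pdb"
printf '%s\n' 'ATOM      1 O            1       0.000   0.000   0.000' > "$dir/data/pdb/water.pdb"

# find lists every .pdb file in and below the parent directory (..);
# grep -l then reports which of those files contain "FE".
iron_files=$(cd "$dir/writing" && grep -l FE $(find .. -name '*.pdb'))

echo "$iron_files"

rm -r "$dir"
```

It prints ../data/pdb/heme.pdb. Because "FE" is matched as a plain substring, it could also match unrelated text in other files; adding -w, as described earlier, is one way to tighten the search.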
-
+

Binary Files

-

We have focused exclusively on finding things in text files. What if your data is stored as images, in databases, or in some other format? One option would be to extend tools like grep to handle those formats. This hasn't happened, and probably won't, because there are too many formats to support.

-

The second option is to convert the data to text, or extract the text-ish bits from the data. This is probably the most common approach, since it only requires people to build one tool per data format (to extract information). On the one hand, it makes simple things easy to do. On the negative side, complex things are usually impossible. For example, it's easy enough to write a program that will extract X and Y dimensions from image files for grep to play with, but how would you write something to find values in a spreadsheet whose cells contained formulas?

-

The third choice is to recognize that the shell and text processing have their limits, and to use a programming language such as Python instead. When the time comes to do this, don't be too hard on the shell: many modern programming languages, Python included, have borrowed a lot of ideas from it, and imitation is also the sincerest form of praise.

-
-

Conclusion

-

The Unix shell is older than most of the people who use it. It has survived so long because it is one of the most productive programming environments ever created --- maybe even the most productive. Its syntax may be cryptic, but people who have mastered it can experiment with different commands interactively, then use what they have learned to automate their work. Graphical user interfaces may be better at the first, but the shell is still unbeaten at the second. And as Alfred North Whitehead wrote in 1911, "Civilization advances by extending the number of important operations which we can perform without thinking about them."

-
-

FIXME

+

We have focused exclusively on finding things in text files. What if your data is stored as images, in databases, or in some other format? One option would be to extend tools like grep to handle those formats. This hasn’t happened, and probably won’t, because there are too many formats to support.

+

The second option is to convert the data to text, or extract the text-ish bits from the data. This is probably the most common approach, since it only requires people to build one tool per data format (to extract information). On the one hand, it makes simple things easy to do. On the other hand, complex things are usually impossible. For example, it’s easy enough to write a program that will extract X and Y dimensions from image files for grep to play with, but how would you write something to find values in a spreadsheet whose cells contained formulas?

+

The third choice is to recognize that the shell and text processing have their limits, and to use a programming language such as Python instead. When the time comes to do this, don’t be too hard on the shell: many modern programming languages, Python included, have borrowed a lot of ideas from it, and imitation is also the sincerest form of praise.

+ +
+

Conclusion

+

The Unix shell is older than most of the people who use it. It has survived so long because it is one of the most productive programming environments ever created — maybe even the most productive. Its syntax may be cryptic, but people who have mastered it can experiment with different commands interactively, then use what they have learned to automate their work. Graphical user interfaces may be better at the first, but the shell is still unbeaten at the second. And as Alfred North Whitehead wrote in 1911, “Civilization advances by extending the number of important operations which we can perform without thinking about them.”

+
+

find pipeline reading comprehension

Write a short explanatory comment for the following shell script:

-
find . -name '*.dat' | wc -l | sort -n
-
-
-

FIXME

+
find . -name '*.dat' | wc -l | sort -n
+ +
+

Matching ose.dat but not temp

The -v flag to grep inverts pattern matching, so that only lines which do not match the pattern are printed. Given that, which of the following commands will find all files in /data whose names end in ose.dat (e.g., sucrose.dat or maltose.dat), but do not contain the word temp?

  1. find /data -name '*.dat' | grep ose | grep -v temp

@@ -206,22 +245,34 @@

FIXME

  3. grep -v temp $(find /data -name '*ose.dat')

  4. None of the above.

-
-
+ +

Little Women

You and your friend, having just finished reading Little Women by Louisa May Alcott, are in an argument. Of the four sisters in the book, Jo, Meg, Beth, and Amy, your friend thinks that Jo was the most mentioned. You, however, are certain it was Amy. Luckily, you have a file LittleWomen.txt containing the full text of the novel. Using a for loop, how would you tabulate the number of times each of the four sisters is mentioned? Hint: one solution might employ the commands grep and wc and a |, while another might utilize grep options.

+
-