Reword "Pipes and Filters" #225

Open
wants to merge 7 commits into
from

8 participants

@rgaiacs
Collaborator
rgaiacs commented Aug 30, 2015

While preparing for a workshop, I thought of rewording Pipes and Filters a little.

DONE

  • Move the files from data/Users/nelle/data/pdb to data/Users/nelle/molecules.
  • Lower the bar of the examples for complete novice learners
  • Update the diagram

TODO

  • Review the exercises (better leave for another pull request)
@jduckles
Contributor
jduckles commented Sep 1, 2015

I'm -1 on writing the files to /tmp, as that will cause confusion. Learners are already trying to understand what wildcards and redirection mean, and introducing a new concept (/tmp) at the same time just distracts from the core concepts of the lesson. Let's keep the redirects to a file in the current directory. Otherwise, I like the PR.

@rgaiacs rgaiacs changed the title from WIP: Reword "Pipes and Filters" to Reword "Pipes and Filters" Sep 9, 2015
@rgaiacs
Collaborator
rgaiacs commented Sep 9, 2015

I addressed @jduckles's suggestions.

@ChristinaLK ChristinaLK commented on an outdated diff Sep 13, 2015
03-pipefilter.md
~~~
-Let's go into that directory with `cd` and run the command `wc *.pdb`.
-`wc` is the "word count" command:
-it counts the number of lines, words, and characters in files.
-The `*` in `*.pdb` matches zero or more characters,
-so the shell turns `*.pdb` into a complete list of `.pdb` files:
+> ## `.pdb` Extension {.callout}
@ChristinaLK
ChristinaLK Sep 13, 2015 Contributor

omit this callout? I'm not sure it's necessary.

OR, I would make it not a callout - most of the callouts are technical asides and this isn't so much...

@ChristinaLK ChristinaLK commented on an outdated diff Sep 13, 2015
03-pipefilter.md
~~~
-We can put the sorted list of lines in another temporary file called `sorted-lengths.txt`
-by putting `> sorted-lengths.txt` after the command,
-just as we used `> lengths.txt` to put the output of `wc` into `lengths.txt`.
-Once we've done that,
-we can run another command called `head` to get the first few lines in `sorted-lengths.txt`:
+The `*` in `*nol.pdb` matches zero or more characters,
+so the shell turns `*nol.pdb` into a complete list
+of files that end with `nol.pdf`
+
+> ## Wildcards {.callout}
@ChristinaLK
ChristinaLK Sep 13, 2015 Contributor

I would move this back to the top, to line 30/77 - we use the wildcard at the very start, so we should explain what it is then too. One thought: the first paragraph could be outside the callout (as it's necessary information) and the rest of the callout can remain, but be entitled "more about wildcards"
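A minimal sketch of the "necessary information" part of the wildcard explanation. It is not part of the PR. It assumes a POSIX shell and uses empty placeholder files whose names are only stand-ins for the lesson's `molecules` directory. Because `echo` just prints its arguments, it shows what the shell expanded the pattern into before any command ran:

```shell
# Work in a throwaway directory so the real lesson data is untouched.
cd "$(mktemp -d)" || exit 1
# Hypothetical placeholder files standing in for the lesson's molecules.
touch methane.pdb ethane.pdb methanol.pdb notes.txt

# The shell replaces *.pdb with the matching names, sorted.
echo *.pdb      # ethane.pdb methane.pdb methanol.pdb
# A narrower pattern: only names ending in "nol.pdb".
echo *nol.pdb   # methanol.pdb
```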

@ChristinaLK ChristinaLK commented on an outdated diff Sep 13, 2015
03-pipefilter.md
~~~ {.bash}
-$ cd molecules
-$ wc *.pdb
-~~~
-~~~ {.output}
- 20 156 1158 cubane.pdb
- 12 84 622 ethane.pdb
- 9 57 422 methane.pdb
- 30 246 1828 octane.pdb
- 21 165 1226 pentane.pdb
- 15 111 825 propane.pdb
- 107 819 6081 total
+$ ls *.pdb > content-of-molecules
@ChristinaLK
ChristinaLK Sep 13, 2015 Contributor

This is descriptive, but I'd prefer something shorter than content-of-molecules, maybe molecule-list? Less to type is better for teaching (and learning).

@ChristinaLK ChristinaLK commented on an outdated diff Sep 13, 2015
03-pipefilter.md
-If we run `wc -l` instead of just `wc`,
-the output shows only the number of lines per file:
+For (2) we will use `wc`
@ChristinaLK
ChristinaLK Sep 13, 2015 Contributor

Re-explain what (2) is, something like "now we want to be able to count the lines of content-of-molecules, as they will tell us how many files were in the directory"
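The two steps being discussed could be sketched as follows. This is an illustration only. The file name `molecule-list.txt` and the placeholder `.pdb` files are hypothetical:

```shell
cd "$(mktemp -d)" || exit 1
touch ammonia.pdb methane.pdb ethane.pdb   # hypothetical stand-ins

# Step 1: save the list of .pdb files. When its output goes to a file
# rather than a terminal, ls writes one name per line.
ls *.pdb > molecule-list.txt

# Step 2: count the lines of that file. Each line is one file name,
# so the count is the number of .pdb files.
wc -l molecule-list.txt
```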

@ChristinaLK ChristinaLK commented on an outdated diff Sep 13, 2015
03-pipefilter.md
~~~ {.bash}
-$ wc -l *.pdb > lengths.txt
+$ wc -l content-of-molecules
+~~~
+~~~ {.output}
+48
~~~
@ChristinaLK
ChristinaLK Sep 13, 2015 Contributor

Maybe add a concluding sentence like "we've answered our question! There are 48 pdb files"

@ChristinaLK ChristinaLK commented on an outdated diff Sep 13, 2015
03-pipefilter.md
~~~
-The vertical bar between the two commands is called a **pipe**.
@ChristinaLK
ChristinaLK Sep 13, 2015 Contributor

I like this previous explanation - could we move it up to line 143/163 and then after the "we don't have to know or care," maybe refer people to the "Process" callout, if they DO care how it works.
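For reference, here is a sketch of the pipe version of the same count, using hypothetical placeholder files. The vertical bar connects the output of `ls` directly to the input of `wc`, so the intermediate file disappears:

```shell
cd "$(mktemp -d)" || exit 1
touch ammonia.pdb methane.pdb ethane.pdb   # hypothetical stand-ins

# ls writes the names, one per line, straight into wc's standard input.
ls *.pdb | wc -l
```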

@ChristinaLK ChristinaLK commented on an outdated diff Sep 13, 2015
03-pipefilter.md
~~~ {.bash}
-$ wc -l *.pdb | sort -n | head -1
+$ wc -l *.pdb | sort
@ChristinaLK
ChristinaLK Sep 13, 2015 Contributor

need -n. :)
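A small sketch of the restored pipeline, assuming three hypothetical files with 7, 9 and 12 lines created by `seq`:

```shell
cd "$(mktemp -d)" || exit 1
# Hypothetical stand-ins with 7, 9 and 12 lines.
seq 7  > ammonia.pdb
seq 9  > methane.pdb
seq 12 > ethane.pdb

# sort -n compares the leading numbers numerically. head -n 1 then
# keeps only the first line, i.e. the file with the fewest lines.
wc -l *.pdb | sort -n | head -n 1
```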

@ChristinaLK ChristinaLK commented on an outdated diff Sep 13, 2015
03-pipefilter.md
~~~
-This is exactly like a mathematician nesting functions like *log(3x)*
@ChristinaLK
ChristinaLK Sep 13, 2015 Contributor

I'd like to keep this paragraph as well - right before the Process callout, maybe?

@ChristinaLK
Contributor

Sorry for leaving this for so long @rgaiacs . I've made several inline comments; my only other question is: is it possible to truncate the very long output (now that we have lots of files)? I like that there are more files for data handling purposes, but it's going to make the lesson very tedious to scroll through.

Also, my comments so far are purely content + organization. For grammar, do you want me to do inline comments, submit a PR against your branch/repo, or just edit when I merge this in?

@rgaiacs
Collaborator
rgaiacs commented Sep 13, 2015

> I've made several inline comments;

Thanks for all the comments. I will need one-two weeks to work on it.

> my only other question is: is it possible to truncate the very long output (now that we have lots of files)? I like that there are more files for data handling purposes, but it's going to make the lesson very tedious to scroll through.

Yes.

> Also, my comments so far are purely content + organization. For grammar, do you want me to do inline comments, submit a PR against your branch/repo, or just edit when I merge this in?

I prefer inline comments so that I will learn when fixing it. =)

@ChristinaLK
Contributor

That's what I thought, re: grammar. :) I'll do another round of comments once you've added the changes - no rush!

@rgaiacs
Collaborator
rgaiacs commented Sep 20, 2015

@ChristinaLK I rebased this branch and addressed your comments. Could you review it again?

@lexnederbragt lexnederbragt commented on the diff Oct 27, 2015
03-pipefilter.md
~~~
+The `*` in `*n.pdb` matches zero or more characters,
+so the shell turns `*nol.pdb` into a complete list
@lexnederbragt
lexnederbragt Oct 27, 2015 Member

These three lines mention `*n.pdb`, `*nol.pdb` and `*.pdb`. Is that intentional? I'm confused. In addition, there is already an explanation in the callout just below...

@lexnederbragt lexnederbragt commented on the diff Oct 27, 2015
03-pipefilter.md
~~~ {.bash}
-$ ls molecules
+$ cd molecules
+~~~
+~~~ {.bash}
+$ ls
~~~
~~~ {.output}
@lexnederbragt
lexnederbragt Oct 27, 2015 Member

When the terminal is wide enough, a simple ls will in fact result in two or more columns, as in the original file. Maybe keep that as it is the behavior the learners will experience?
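A sketch of the difference, using hypothetical placeholder files. In an interactive terminal, plain `ls` may arrange names in columns. `ls -1` forces one name per line, which is also what `ls` does when its output is piped or redirected:

```shell
cd "$(mktemp -d)" || exit 1
touch ammonia.pdb methane.pdb ethane.pdb   # hypothetical stand-ins

ls        # may print several names per line in a wide terminal
ls -1     # always one name per line
ls | cat  # output is a pipe, not a terminal: one name per line
```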

@lexnederbragt lexnederbragt commented on the diff Oct 27, 2015
03-pipefilter.md
~~~ {.bash}
-$ wc -l *.pdb > lengths.txt
+$ ls *.pdb > molecule-list
@lexnederbragt
lexnederbragt Oct 27, 2015 Member

I would suggest following the best practice of adding an extension here too, i.e. using molecule-list.txt as the filename.

@lexnederbragt lexnederbragt commented on the diff Oct 27, 2015
03-pipefilter.md
@@ -79,119 +86,86 @@ $ wc *.pdb
> themselves. It is the shell, not the other programs, that deals with
@lexnederbragt
lexnederbragt Oct 27, 2015 Member

In line 84, `wc` is used, but it has not been introduced yet in your modified lesson. This may confuse learners.

@lexnederbragt lexnederbragt commented on the diff Oct 27, 2015
03-pipefilter.md
-~~~ {.bash}
-$ ls lengths.txt
-~~~
-~~~ {.output}
-lengths.txt
-~~~
+> ## Redirecting Input {.callout}
+>
+> As well as using `>` to redirect a program's output, we can use `<` to
+> redirect its input, i.e., to read from a file instead of from standard
+> input. For example, instead of writing `wc ammonia.pdb`, we could write
@lexnederbragt
lexnederbragt Oct 27, 2015 Member

Again, `wc` is used before it has been introduced in the lesson.
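A sketch of what the "Redirecting Input" callout describes, assuming a hypothetical 7-line stand-in for `ammonia.pdb`. With a file name argument, `wc` opens the file itself and prints the name. With `<`, the shell connects the file to `wc`'s standard input, so `wc` sees only data and prints no name:

```shell
cd "$(mktemp -d)" || exit 1
seq 7 > ammonia.pdb   # hypothetical 7-line stand-in

wc -l ammonia.pdb     # count followed by the file name
wc -l < ammonia.pdb   # count only; wc never sees the name
```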

@lexnederbragt lexnederbragt commented on the diff Oct 27, 2015
03-pipefilter.md
+ 49 sucrose.pdb
+ 51 norethindrone.pdb
+ 51 strychnine.pdb
+ 52 quinine.pdb
+ 53 lsd.pdb
+ 53 testosterone.pdb
+ 54 ethylcyclohexane.pdb
+ 54 tuberin.pdb
+ 55 vitamin-a.pdb
+ 78 cholesterol.pdb
+ 79 heme.pdb
+ 7 ammonia.pdb
+ 9 methane.pdb
+~~~
+
+Ops. Something went catastrophic wrong.
@lexnederbragt
lexnederbragt Oct 27, 2015 Member

On my system the sort operation works fine, due to the extra spaces the `wc` command puts in front of the numbers. The result is

$ wc -l *.pdb | sort
       7 ammonia.pdb
       9 methane.pdb
      10 methanol.pdb
      10 vinyl-chloride.pdb
      12 ethane.pdb
....
      78 cholesterol.pdb
      79 heme.pdb
     248 lanoxin.pdb
    1808 total

This is something I didn't like about the lesson before: it skips over the need for `sort -n` when sorting numbers...
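A minimal sketch of why `-n` still matters. It takes three counts from the output above and writes them without padding (the exact padding `wc` adds varies between implementations). Plain `sort` compares character by character, so "10" sorts before "7":

```shell
# Unpadded lines: plain sort compares text, so 10 and 248 come before 7.
printf '7 ammonia.pdb\n10 methanol.pdb\n248 lanoxin.pdb\n' | sort
# 10 methanol.pdb
# 248 lanoxin.pdb
# 7 ammonia.pdb

# -n compares the leading numbers as numbers, with or without padding.
printf '7 ammonia.pdb\n10 methanol.pdb\n248 lanoxin.pdb\n' | sort -n
# 7 ammonia.pdb
# 10 methanol.pdb
# 248 lanoxin.pdb
```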

@lexnederbragt lexnederbragt commented on the diff Oct 27, 2015
03-pipefilter.md
-we only want the first line of the file;
-`-20` would get the first 20,
-and so on.
-Since `sorted-lengths.txt` contains the lengths of our files ordered from least to greatest,
-the output of `head` must be the file with the fewest lines.
-
-If you think this is confusing,
-you're in good company:
-even once you understand what `wc`, `sort`, and `head` do,
-all those intermediate files make it hard to follow what's going on.
-We can make it easier to understand by running `sort` and `head` together:
+each time that you want to know the number of files inside a directory
+will be tedious.
+If we could drop the file our life will be more easy.
+Again,
+we are luck and we can acomplish it with
@lexnederbragt
lexnederbragt Oct 27, 2015 Member

acomplish -> accomplish

@lexnederbragt
Member

In `fig/redirects-and-pipes.png`, the bottom command seems mangled. Maybe two commands were drawn in the same place?

@wking
Member
wking commented Nov 23, 2015

Adding to Lex's comment, the redirects-and-pipes image also still has wc as the first step in the middle (redirect to file) entry, but the command-line version there is now talking about ls.

@gdevenyi
Contributor

Hi, this needs a rebase to be reviewed. Thanks

@shwina shwina commented on the diff Jun 14, 2016
03-pipefilter.md
@@ -16,46 +16,53 @@ Now that we know a few basic commands,
we can finally look at the shell's most powerful feature:
the ease with which it lets us combine existing programs in new ways.
We'll start with a directory called `molecules`
-that contains six files describing some simple organic molecules.
-The `.pdb` extension indicates that these files are in Protein Data Bank format,
-a simple text format that specifies the type and position of each atom in the molecule.
+so lets go into that directory and take a look at that directory:
@shwina
shwina Jun 14, 2016 Member

"Let's start by going into the molecules directory and having a look at what's in there"

@gvwilson
Member

@rgaiacs please rebase

@rgaiacs rgaiacs was assigned by gvwilson Aug 4, 2016