Commit: Rename to chapter order

Philip (flip) Kromer committed Feb 17, 2013
1 parent 3b1798b commit 561b91b
Showing 101 changed files with 209 additions and 241 deletions.
3 changes: 3 additions & 0 deletions 00-preface.asciidoc
@@ -0,0 +1,3 @@
[[preface]]
== Preface

45 changes: 23 additions & 22 deletions 00a-about.asciidoc
@@ -1,5 +1,3 @@
-== Preface
-
 // :author: Philip (flip) Kromer
 // :doctype: book
 // :toc:
@@ -41,7 +39,7 @@ This is the plan. We'll roll material out over the next few months. Should we fi
 5. *The Hadoop Toolset*
    - toolset overview
-   - launching jobs
+   - launching and debugging jobs
    - overview of wukong
    - overview of pig

@@ -60,53 +58,54 @@ This is the plan. We'll roll material out over the next few months. Should we fi
    - Pointwise Mutual Information
    - K-means Clustering

-9. Interlude I: *Data Models, Data Formats, Data Management*:
-   - How to design your data models
-   - How to serialize their contents (orig, scratch, prod)
-   - How to organize your scripts and your data
-
-10. *Statistics*:
-   - Averages, Percentiles, and Normalization
+9. *Statistics*:
+   - Summarizing: Averages, Percentiles, and Normalization
    - Sampling responsibly: it's harder and more important than you think
    - Statistical aggregates and the danger of large numbers

-11. *Time Series*
+10. *Time Series*

-12. *Geographic Data*:
+11. *Geographic Data*:
    - Spatial join (find all UFO sightings near Airports)
    -

-13. *`cat` herding*
+12. *`cat` herding*
    - total sort
    - transformations from the commandline (grep, cut, wc, etc)
    - pivots from the commandline (head, sort, etc)
    - commandline workflow tips
    - advanced hadoop filesystem (chmod, setrep, fsck)

-14. *Data Munging (Semi-Structured Data)*: The dirty art of data munging. It's a sad fact, but too often the bulk of time spent on a data exploration is just getting the data ready. We'll show you street-fighting tactics that lessen the time and pain. Along the way, we'll prepare the datasets to be used throughout the book:
+13. *Data Munging (Semi-Structured Data)*: The dirty art of data munging. It's a sad fact, but too often the bulk of time spent on a data exploration is just getting the data ready. We'll show you street-fighting tactics that lessen the time and pain. Along the way, we'll prepare the datasets to be used throughout the book:
    - Wikipedia Articles: Every English-language article (12 million) from Wikipedia.
    - Wikipedia Pageviews: Hour-by-hour counts of pageviews for every Wikipedia article since 2007.
    - US Commercial Airline Flights: every commercial airline flight since 1987
    - Hourly Weather Data: a century of weather reports, with hourly global coverage since the 1950s.
    - "Star Wars Kid" weblogs: large collection of apache webserver logs from a popular internet site (Andy Baio's waxy.org).

-15. Interlude II: *Best Practices and Pedantic Points of style*
-   - Pedantic Points of Style
-   - Best Practices
-   - How to Think: there are several design patterns for how to pivot your data, like Message Passing (objects send records to meet together); Set Operations (group, distinct, union, etc); Graph Operations (breadth-first search). Taken as a whole, they're equivalent; with some experience under your belt it's worth learning how to fluidly shift among these different models.
-   - Why Hadoop
-   - robots are cheap, people are important
+14. Interlude I: *Organizing Data*:
+   - How to design your data models
+   - How to serialize their contents (orig, scratch, prod)
+   - How to organize your scripts and your data

-16. *Graph Processing*:
+15. *Graph Processing*:
+   - Graph Representations
    - Community Extraction: Use the page-to-page links in Wikipedia to identify similar documents
    - Pagerank (centrality): Reconstruct pageview paths from web logs, and use them to identify important pages

-17. *Machine Learning without Grad School*: We'll combine the record of every commercial flight since 1987 with the hour-by-hour weather data to predict flight delays using
+16. *Machine Learning without Grad School*: We'll combine the record of every commercial flight since 1987 with the hour-by-hour weather data to predict flight delays using
    - Naive Bayes
    - Logistic Regression
    - Random Forest (using Mahout)
    We'll equip you with a picture of how they work, but won't go into the math of how or why. We will show you how to choose a method, and how to cheat to win.

+17. Interlude II: *Best Practices and Pedantic Points of style*
+   - Pedantic Points of Style
+   - Best Practices
+   - How to Think: there are several design patterns for how to pivot your data, like Message Passing (objects send records to meet together); Set Operations (group, distinct, union, etc); Graph Operations (breadth-first search). Taken as a whole, they're equivalent; with some experience under your belt it's worth learning how to fluidly shift among these different models.
+   - Why Hadoop
+   - robots are cheap, people are important

 PRACTICAL

@@ -147,6 +146,8 @@ APPENDIX
    - Sizes of the Universe
    - Hadoop Tuning & Configuration Variables

+25. *Appendix*:

 ==== Not Contents ====

 I'm not currently planning to cover Hive -- I believe the pig scripts will translate naturally for folks who are already familiar with it. There will be a brief section explaining why you might choose it over Pig, and why I chose it over Hive. If there's popular pressure I may add a "translation guide".
6 changes: 4 additions & 2 deletions 00b-topics.asciidoc
@@ -29,8 +29,8 @@
    - (visualize)
 INTERMEDIATE

-5. *The Hadoop Toolset*
+5. *The Toolset*
    - toolset overview
    - pig vs hive vs impala
    - hbase & elasticsearch (not accumulo or cassandra)
@@ -226,3 +226,5 @@ APPENDIX
    - Regular Expressions
    - Sizes of the Universe
    - Hadoop Tuning & Configuration Variables
+
+25. *Appendix*
2 changes: 2 additions & 0 deletions 01-first_exploration.asciidoc
@@ -0,0 +1,2 @@
[[first_exploration]]
== First Exploration

2 changes: 2 additions & 0 deletions 014-organizing_data.asciidoc
@@ -0,0 +1,2 @@
[[data_management]]
== Data Management

2 changes: 2 additions & 0 deletions 02-simple_transform.asciidoc
@@ -0,0 +1,2 @@
[[simple_transform]]
== Simple Transform

2 changes: 2 additions & 0 deletions 03-transform_pivot.asciidoc
@@ -0,0 +1,2 @@
[[transform_pivot]]
== Transform Pivot

File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
2 changes: 2 additions & 0 deletions 04-geographic_flavor.asciidoc
@@ -0,0 +1,2 @@
[[geographic_flavor]]
== Geographic Flavor

2 changes: 2 additions & 0 deletions 05-toolset.asciidoc
@@ -0,0 +1,2 @@
[[toolset]]
== Toolset

File renamed without changes.
Empty file.
File renamed without changes.
File renamed without changes.
2 changes: 2 additions & 0 deletions 06-filesystem_mojo.asciidoc
@@ -0,0 +1,2 @@
[[filesystem_mojo]]
== Filesystem Mojo

2 changes: 2 additions & 0 deletions 07-server_logs.asciidoc
@@ -0,0 +1,2 @@
[[server_logs]]
== Server Logs

File renamed without changes.
2 changes: 2 additions & 0 deletions 08-text_processing.asciidoc
@@ -0,0 +1,2 @@
[[text_processing]]
== Text Processing

File renamed without changes.
2 changes: 2 additions & 0 deletions 09-statistics.asciidoc
@@ -0,0 +1,2 @@
[[statistics]]
== Statistics

File renamed without changes.
File renamed without changes.
File renamed without changes.
2 changes: 2 additions & 0 deletions 10-time_series.asciidoc
@@ -0,0 +1,2 @@
[[time_series]]
== Time Series

File renamed without changes.
2 changes: 2 additions & 0 deletions 11-geographic.asciidoc
@@ -0,0 +1,2 @@
[[geographic]]
== Geographic

File renamed without changes.
2 changes: 2 additions & 0 deletions 12-cat_herding.asciidoc
@@ -0,0 +1,2 @@
[[cat_herding]]
== Cat Herding

File renamed without changes.
2 changes: 2 additions & 0 deletions 13-data_munging.asciidoc
@@ -0,0 +1,2 @@
[[data_munging]]
== Data Munging

File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
2 changes: 2 additions & 0 deletions 15-graphs.asciidoc
@@ -0,0 +1,2 @@
[[graphs]]
== Graphs

File renamed without changes.
File renamed without changes.
File renamed without changes.
2 changes: 2 additions & 0 deletions 16-machine_learning.asciidoc
@@ -0,0 +1,2 @@
[[machine_learning]]
== Machine Learning

File renamed without changes.
File renamed without changes.
2 changes: 2 additions & 0 deletions 17-best_practices.asciidoc
@@ -0,0 +1,2 @@
[[best_practices]]
== Best Practices

File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
2 changes: 2 additions & 0 deletions 18-java_api.asciidoc
@@ -0,0 +1,2 @@
[[java_api]]
== Java Api

File renamed without changes.
2 changes: 2 additions & 0 deletions 19-advanced_pig.asciidoc
@@ -0,0 +1,2 @@
[[advanced_pig]]
== Advanced Pig

File renamed without changes.
File renamed without changes.
2 changes: 2 additions & 0 deletions 20-hbase_data_modeling.asciidoc
@@ -0,0 +1,2 @@
[[hbase_data_modeling]]
== Hbase Data Modeling

File renamed without changes.
2 changes: 2 additions & 0 deletions 21-hadoop_internals.asciidoc
@@ -0,0 +1,2 @@
[[hadoop_internals]]
== Hadoop Internals

File renamed without changes.
File renamed without changes.
2 changes: 2 additions & 0 deletions 22-hadoop_tuning.asciidoc
@@ -0,0 +1,2 @@
[[hadoop_tuning]]
== Hadoop Tuning

File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
2 changes: 2 additions & 0 deletions 23-datasets_and_scripts.asciidoc
@@ -0,0 +1,2 @@
[[datasets_and_scripts]]
== Datasets And Scripts

File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
2 changes: 2 additions & 0 deletions 24-cheatsheets.asciidoc
@@ -0,0 +1,2 @@
[[cheatsheets]]
== Cheatsheets

76 changes: 76 additions & 0 deletions 24a-unix_cheatsheet.asciidoc
@@ -0,0 +1,76 @@
== Cheatsheets ==

=== Terminal Commands ===

[[hadoop_filesystem_commands]]
.Hadoop Filesystem Commands
[options="header"]
|=======
| action | command
| |
| list files | `hadoop fs -ls`
| list files' disk usage | `hadoop fs -du`
| total HDFS usage/available | visit namenode console
| |
| |
| copy local -> HDFS |
| copy HDFS -> local |
| copy HDFS -> remote HDFS |
| |
| make a directory | `hadoop fs -mkdir ${DIR}`
| move/rename | `hadoop fs -mv ${FILE}`
| dump file to console | `hadoop fs -cat ${FILE} \| cut -c 1-10000 \| head -n 10000`
| |
| |
| remove a file |
| remove a directory tree |
| remove a file, skipping Trash |
| empty the trash NOW |
| |
| health check of HDFS |
| report block usage of files |
| |
| decommission nodes |
| |
| |
| list running jobs |
| kill a job |
| kill a task attempt |
| |
| |
| CPU usage by process | `htop`, or `top` if that's not installed
| Disk activity |
| Network activity |
| |
| | `grep -e '[regexp]'`
| | `head`, `tail`
| | `wc`
| | `uniq -c`
| | `sort -n -k2`
| tuning | csshX, htop, dstat, ulimit
| |
| also useful: | cat, echo, true, false, yes, tee, time, watch
| dos-to-unix line endings | `ruby -ne 'puts $_.gsub(/\r\n?/, "\n")'`
| |
| |
|=======
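
Several rows above are still blank. Until they are filled in, here is an unofficial sketch of the missing commands, assuming the Hadoop 1.x-era CLI; the paths and job/attempt IDs are made-up placeholders, so check everything against your distribution's docs before leaning on it:

[source,bash]
----
hadoop fs -put local_file.tsv /data/              # copy local -> HDFS
hadoop fs -get /data/part-00000 ./                # copy HDFS -> local
hadoop distcp hdfs://nn1/data hdfs://nn2/data     # copy HDFS -> remote HDFS

hadoop fs -rm  /data/scratch/foo.tsv              # remove a file
hadoop fs -rmr /data/scratch                      # remove a directory tree
hadoop fs -rm -skipTrash /data/scratch/foo.tsv    # remove a file, skipping Trash
hadoop fs -expunge                                # empty the trash NOW

hadoop fsck /                                     # health check of HDFS
hadoop fsck /data -files -blocks                  # report block usage of files

hadoop job -list                                  # list running jobs
hadoop job -kill job_201302170001_0001            # kill a job (hypothetical ID)
hadoop job -kill-task attempt_201302170001_0001_m_000000_0   # kill a task attempt
----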

[[commandline_tricks]]
.UNIX commandline tricks
[options="header"]
|=======
| action | command | Flags
| Sort data | `sort` | reverse the sort: `-r`; sort numerically: `-n`; sort on a field: `-t [delimiter] -k [index]`
| Sort large amount of data | `sort --parallel=4 -S 500M` | use four cores and a 500 megabyte sort buffer
| Cut delimited field | `cut -f 1,3-7 -d ','` | emit comma-separated fields one and three through seven
| Cut range of characters | `cut -c 1,3-7` | emit characters one and three through seven
| Split on spaces | `\| ruby -ne 'puts $_.split(/\\s+/).join("\t")'` | split on continuous runs of whitespace, re-emit as tab-separated
| Distinct fields | `\| sort \| uniq` | only dupes: `-d`
| Quickie histogram | `\| sort \| uniq -c` | TODO: check the rendering for backslash
| Per-process usage | `htop` | if installed
| Running system usage | `dstat -drnycmf -t 5` | 5-second rolling system stats. You likely will have to http://dag.wieers.com/home-made/dstat/[install dstat] yourself. If that's not an option, use `iostat -x 5 & sleep 3 ; ifstat 5` for an interleaved 5-second running average.
|======

For example: `cat * | cut -c 1-4 | sort | uniq -c` cuts the first four characters of each line and counts how many times each distinct four-character prefix appears.

Not all commands are available on all platforms; OSX users should use Homebrew, Windows users should use Cygwin.
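
To see how these tricks chain together, here is a sketch (not from the book's text; `article_text.tsv` is a hypothetical input file) of a crude word-count histogram built entirely from the table above:

[source,bash]
----
# one word per line, then count distinct words and show the 25 most common
cat article_text.tsv |
  ruby -ne 'puts $_.split(/\s+/)' |
  sort | uniq -c | sort -rn | head -n 25
----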
@@ -1,79 +1,3 @@
(deletes the "== Cheatsheets ==" / "=== Terminal Commands ===" section: the same 76 lines shown above as additions to 24a-unix_cheatsheet.asciidoc)


=== Regular Expressions ===


@@ -253,18 +177,3 @@ Ascii table:
 "~"
 "\x7F" \c
 "\x80" \c


(deletes the "=== Pig Operators ===" cheatsheet: the same lines shown below as additions to 24c-pig_cheatsheet.asciidoc)

13 changes: 13 additions & 0 deletions 24c-pig_cheatsheet.asciidoc
@@ -0,0 +1,13 @@
=== Pig Operators ===

[[pig_cheatsheet]]
.Pig Operator Cheatsheet
[options="header"]
|=======
| action | operator
| |
| | JOIN
| | FILTER
| |
|=======
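
While the table is a stub, here is roughly what the two operators it names look like in use; the relation and field names (`articles`, `pageviews`, `page_id`, `namespace`) are invented for illustration:

[source,pig]
----
-- keep only main-namespace articles
articles_main   = FILTER articles BY namespace == 0;
-- attach hourly pageview counts to each surviving article
arts_with_views = JOIN articles_main BY page_id, pageviews BY page_id;
----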

3 changes: 3 additions & 0 deletions 24d-hadoop_tunables_cheatsheet.asciidoc
@@ -0,0 +1,3 @@
=== Hadoop Tunables Cheatsheet
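
Nothing here yet; as a placeholder, a few of the Hadoop 1.x property names the Hadoop Tuning chapter will presumably cover (values shown are illustrative, not recommendations):

[source,xml]
----
<!-- mapred-site.xml (illustrative values only) -->
<property><name>mapred.reduce.tasks</name>    <value>12</value></property>
<property><name>mapred.child.java.opts</name> <value>-Xmx1024m</value></property>
<property><name>io.sort.mb</name>             <value>200</value></property>
<property><name>io.sort.factor</name>         <value>25</value></property>
<!-- hdfs-site.xml -->
<property><name>dfs.block.size</name>         <value>134217728</value></property>
<property><name>dfs.replication</name>        <value>3</value></property>
----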


2 changes: 2 additions & 0 deletions 25-appendix.asciidoc
@@ -0,0 +1,2 @@
[[appendix]]
== Appendix

File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.