Added three simple opening exercises
Philip (flip) Kromer committed Mar 2, 2014
1 parent 01f4076 commit fdf425c
Showing 1 changed file with 37 additions and 0 deletions.
37 changes: 37 additions & 0 deletions 01-opening.asciidoc
@@ -73,3 +73,40 @@ Peter Norvig (Google's Director of Research) calls this the "Unreasonable Effect
This proposition is sure to cause barroom brawls at scientific conferences for years to come, because it advocates another path to truth that _does not follow_ the Scientific Method. Roughly speaking, the scientific method has you (a) use a simplified model of the universe to make falsifiable predictions; (b) test those predictions in controlled circumstances; (c) use established truths to bound any discrepancies footnote:[plus (d) a secret dose of our sense of the model's elegance]. Under this paradigm, data is non-comprehensive: scientific practice demands you carefully control experimental conditions, and the whole point of the model is to strip out all but the reductionistically necessary parameters. A large part of the analytic machinery acts to account for discrepancies from sampling (too little comprehensiveness) or discrepancies from "extraneous" effects (too much comprehensiveness). If those discrepancies are modest, the model is judged to be valid. This paradigm is regarded as the only acceptably rigorous way to admit a simplified representation of the world into the canon of truth.





=== Simple Exploration

(TODO transplant intro to UFO sighting data here)
(TODO introduce this in context of reindeer?)

Sad to say, many of the sighting reports are likely to be bogus. To eliminate sightings that lack a detailed description, we can filter out records whose description field is shorter than 80 characters:

----
TODO code
----
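As a minimal sketch of that filter in plain Python (an illustration of the logic, not the script this section will eventually hold), assume each sighting is a dict with a hypothetical `description` key. The sample records and function names are made up for the example:

```python
# Sketch only: the record layout and field names are assumptions for illustration.
MIN_DESCRIPTION_LENGTH = 80


def has_detailed_description(sighting, min_length=MIN_DESCRIPTION_LENGTH):
    """Return True when the description has at least min_length characters."""
    # Treat a missing or None description as an empty string
    description = sighting.get("description") or ""
    return len(description) >= min_length


def filter_sightings(sightings, min_length=MIN_DESCRIPTION_LENGTH):
    """Keep only the sightings whose description is long enough."""
    return [s for s in sightings if has_detailed_description(s, min_length)]


# Hypothetical sample data
sample_sightings = [
    {"craft_type": "disk", "description": "Bright light."},
    {"craft_type": "cigar", "description": "x" * 120},
    {"craft_type": "disk", "description": None},
]

detailed = filter_sightings(sample_sightings)
```

Only the record with the 120-character description survives; the short and missing descriptions are dropped.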

A key activity in a Big Data exploration is summarizing big datasets into comprehensible smaller ones. Each sighting has a field giving the shape of the flying object: cigar, disk, etc. This script will tell us how many sightings there are for each craft type:

----
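-- The input path and schema below are placeholders (assumptions); adjust them to your copy of the data
sightings    = LOAD 'data/geo/ufo_sightings' AS (sighted_at:chararray, craft_type:chararray, description:chararray);
-- Collect all sightings that share a craft type into one group
cf_sightings = GROUP sightings BY craft_type;
-- Emit one row per craft type: the type and its number of sightings
cf_counts    = FOREACH cf_sightings GENERATE group AS craft_type, COUNT_STAR(sightings) AS n_sightings;
STORE cf_counts INTO 'out/geo/ufo_sightings/craft_type_counts';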
----
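The same group-and-count idea can be tried on a handful of records in plain Python. This is only a small in-memory sketch of the summarization, assuming a hypothetical `craft_type` key on each record; it says nothing about how the work is distributed across a cluster:

```python
from collections import Counter


def count_by_craft_type(sightings):
    """Return a mapping from craft type to the number of sightings of that type."""
    # Counter tallies one entry per distinct craft type
    return Counter(s["craft_type"] for s in sightings)


# Hypothetical sample data; the field name is an assumption for illustration
sample_sightings = [
    {"craft_type": "disk"},
    {"craft_type": "cigar"},
    {"craft_type": "disk"},
]

craft_type_counts = count_by_craft_type(sample_sightings)

for craft_type, n_sightings in craft_type_counts.most_common():
    print(craft_type, n_sightings)
```

The shape is the same as in the script: many input records collapse into one small row per group.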

We can make a little travel guide for the sightings by amending each sighting with the Wikipedia article about its place. The JOIN operator matches records from different tables based on a common key:

----
TODO pseudocode
----
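To make the idea of matching on a common key concrete, here is a small in-memory sketch in plain Python. It assumes, purely for illustration, that both sightings and articles carry a hypothetical `place` key and that each place has at most one article. It behaves like an inner join: sightings with no matching article are dropped.

```python
def join_sightings_to_articles(sightings, articles):
    """Attach the matching article title to each sighting, matching on 'place'."""
    # Index the articles by the join key so each lookup is a single dict access
    articles_by_place = {a["place"]: a for a in articles}
    joined = []
    for sighting in sightings:
        article = articles_by_place.get(sighting["place"])
        if article is not None:
            # Copy the sighting and add the article's title alongside it
            joined.append({**sighting, "article_title": article["title"]})
    return joined


# Hypothetical sample data; field names and values are assumptions
sample_sightings = [
    {"place": "Roswell, NM", "craft_type": "disk"},
    {"place": "Nowhere, XX", "craft_type": "cigar"},
]
sample_articles = [
    {"place": "Roswell, NM", "title": "Roswell, New Mexico"},
]

travel_guide = join_sightings_to_articles(sample_sightings, sample_articles)
```

Building the lookup table from the smaller dataset and streaming the larger one past it is a common way to structure such a join when one side fits in memory.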

This yields the following output:

Of course this would make a much better travel guide if it held not just the one article about the general location but a set of prominent nearby places of interest. We'll show you how to do a nearby-ness query in the Geodata chapter (REF), and how to attach a notion of "prominence" in the event log chapter (REF).




