A Seriously Fun guide to Big Data Analytics in Practice


Big Data for Chimps: A Seriously Fun guide to Terabyte-scale data processing

This is the work-in-progress version of the upcoming O'Reilly book, Big Data for Chimps: A Seriously Fun guide to Hadoop and Terabyte-scale data processing.

Our intent is to provide the best guide for exploratory data analytics using Hadoop -- for data science in practice. We use high-level languages (Pig and Ruby) that make Hadoop a tool, not a framework, allowing re-use and rapid development. We'll cover enough Hadoop internals to save you from diving into the source code, and enough tuning advice to let you know where to drill deep.

In all cases, the focus is on maximizing your time and creativity -- on helping you uncover what question to ask and the right way to ask it.

O'Reilly has courageously agreed to release the book under a http://creativecommons.org/licenses/by-nc-sa/3.0/[CC-BY-NC-SA] license. To buy a physical copy of the book, or a Kindle (.mobi) or iOS/Nook (.epub) edition, visit the early-release page at the http://shop.oreilly.com[O'Reilly bookstore] (TODO: link to early release page). Buy it now, and you'll get frequently updated access and the final version once it is available.

License

This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-sa/3.0/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA.

Code is Apache licensed unless specifically labeled otherwise.