This repository has been archived by the owner. It is now read-only.
Join GitHub today
GitHub is home to over 31 million developers working together to host and review code, manage projects, and build software together.Sign up
hadoop ruby/streaming statistically improbable phrases
Fetching latest commit…
Cannot retrieve the latest commit at this time.
on the train project to attempt to copy amazons statistically improbable phrase calculations data from project gutenberg, runs using hadoop streaming with ruby map/reduce functions see project page at http://matpalm.com/sip bash> <start-hadoop-here/> bash> rake prepare_files input=input.eg # upload 8 file example bash> rake upload_input bash> rake calculate_sips bash> rake cat dir=least_freq_trigrams bash> # bask in glow of diy-sips bash> zcat hadoop-input/*gz | ./calc_sips_simple.rb bash> # be amazed by how much faster it was to NOT use hadoop coming soon: running in the cloud, when does hadoop become worth it...