Passing options to streamers is now deprecated. Use
Streamer by default has a periodic monitor that logs (to STDERR by default) every 10_000 lines or 30 seconds
Examples cleaned up, should all run
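The periodic monitor described above (a log line every 10_000 lines or 30 seconds, to STDERR by default) can be pictured with a small stand-alone sketch. The class name `PeriodicLogger` and its parameters are hypothetical and are not Wukong's actual monitor API; only the two thresholds and the STDERR default come from the note above.

```ruby
# Hypothetical sketch of a periodic monitor; NOT Wukong's own class.
# It reports progress whenever the line count hits a multiple of
# every_lines, or when every_secs have passed since the last report.
class PeriodicLogger
  attr_reader :count

  def initialize(every_lines: 10_000, every_secs: 30, out: $stderr)
    @every_lines = every_lines
    @every_secs  = every_secs
    @out         = out
    @count       = 0
    @last_report = Time.now
  end

  # Call once per record. Returns true if a progress line was logged.
  def tick
    @count += 1
    now = Time.now
    due = (@count % @every_lines).zero? || (now - @last_report) >= @every_secs
    return false unless due
    @out.puts "#{now.utc} processed #{@count} lines"
    @last_report = now
    true
  end
end
```

A streamer-like loop would call `tick` once per input line. Logging to STDERR keeps the progress messages out of the data written to STDOUT.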
- you can now pass Script.new an instance of Streamer to use as mapper or reducer
- Adding an experimental sugar:
    #!/usr/bin/env ruby
    require 'wukong/script'
    LineStreamer.map do |line|
      emit line.reverse
    end.run
Note that you can now tweet a wukong script.
- It's now recommended that at the top of a wukong script you say
    require 'wukong/script'
  Among other benefits, this lets you refer to wukong streamers without prefix.
- EMR support now works very well
- A couple of bugfixes. Sorry about that.
- Documentation fixes
Use --run=emr to launch a job onto the Amazon Elastic MapReduce cloud.
- copies the script to s3, as foo-mapper.rb and foo-reducer.rb (removing the need for the --map flag)
- copies the wukong libs up as a .tar.bz2, and extracts it into the cwd
- combines settings from commandline and yaml config, etc to configure and launch job
It's still way shaky and I don't think anything but the sample app will run. That sample app runs, tho.
Incompatible changes to option handling and script launching:
- Script doesn't use extra_options any more. You should relocate them to the initializer or to configliere.
- there is no more default_mapper or default_reducer
- Improvements to the pig conversion methods
hdp-rm respects the -skipTrash method
- added the
- added jobconfs for io_job_mb and friends.
- added a loadable module to convert output data to pig bags and tuples
- pulled in several methods from active_support, incl. Enumerable#sum
- Scripts to find percentile rank of elements in a dataset
- We are starting to move wukong to a model where streaming is from a generic source into a generic sink. Several stores have been landed in the code, but many are in a half- or un-baked state. Please ignore this for the moment.
- made scripts inject a helpful job name using mapred.job.name
- Hash.compact_blank! and HashLike.compact_blank! -- eliminate all key-value pairs whose value is blank?
- Bug in passing commandline args down to map and reduce child processes
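To illustrate the compact_blank! entry in the list above, here is a minimal stand-alone sketch. It is written as free functions rather than as methods on Hash. The helper `blank_value?` is a hypothetical stand-in for `blank?`, assuming blank means nil, false, empty, or a whitespace-only string; Wukong's own implementation may differ.

```ruby
# Hypothetical stand-in for blank? (assumed semantics: nil, false,
# empty collection, or whitespace-only string).
def blank_value?(value)
  return true if value.nil? || value == false
  return value.strip.empty? if value.is_a?(String)
  value.respond_to?(:empty?) ? value.empty? : false
end

# Free-standing sketch of compact_blank!: removes, in place, every
# key-value pair whose value is blank, and returns the same hash.
def compact_blank!(hash)
  hash.delete_if { |_key, value| blank_value?(value) }
end

record = { name: 'bob', bio: '', tags: [], age: 0, url: nil }
compact_blank!(record)
p record   # only :name and :age remain; 0 is not blank
```

Note that the hash is modified in place, as the trailing `!` suggests.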
Lots more examples:
- examples/stats/avg_value_frequency.rb does an Average Value Frequency histogram
- examples/server_logs has a quite useful apache log file parser
- Made the base streamer use each_record, opening the door for alternative record injection (eg Datamapper!)
- wukong/streamer/counting_reducer.rb is an um reducer and it counts things.
- A HELLA AWESOME working example from retail web analytics by @lenbust
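As a rough picture of what a counting reducer like the one listed above does, the sketch below counts runs of identical keys in key-sorted, tab-separated lines, which is how Hadoop streaming hands data to a reducer. The method name `count_sorted_keys` is hypothetical; this is plain Ruby and does not use the classes in wukong/streamer/counting_reducer.rb.

```ruby
# Sketch only: assumes the input lines are already sorted by key and
# that the key is the first tab-separated field of each line.
# Yields [key, count] for each run of identical keys.
def count_sorted_keys(lines)
  return enum_for(:count_sorted_keys, lines) unless block_given?
  current = nil
  count   = 0
  lines.each do |line|
    key = line.chomp.split("\t", 2).first.to_s
    if count > 0 && key != current
      yield current, count   # key changed: flush the finished run
      count = 0
    end
    current = key
    count  += 1
  end
  yield current, count if count > 0   # flush the final run
end

sorted = ["apple\t1\n", "apple\t7\n", "pear\tx\n"]
count_sorted_keys(sorted) { |key, n| puts [key, n].join("\t") }
```

Because only the current key and its count are held in memory, this style of reducer works on inputs far larger than RAM, provided the sort has already happened upstream.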
- In --run=local mode, you can use '-' alone as a filename to indicate STDIN / STDOUT as input / output respectively.
- Minor tweaks to contrib/jeans
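The '-' filename convention for local mode noted above is a common Unix idiom. A minimal sketch of the idea follows. The helper names `open_input` and `open_output` are hypothetical and are not part of Wukong, whose own option handling may work differently.

```ruby
# Hypothetical helpers illustrating the '-' convention; not Wukong code.
# A path of '-' selects the standard stream, anything else is a file.
def open_input(path, stdin: $stdin)
  path == '-' ? stdin : File.open(path, 'r')
end

def open_output(path, stdout: $stdout)
  path == '-' ? stdout : File.open(path, 'w')
end
```

With helpers like these, a script can sit in the middle of a shell pipeline without temporary files, reading from the previous command and writing to the next.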