#!/usr/bin/env ruby
require 'rubygems'
require 'wukong/script'
Settings.define :sampling_fraction, :type => Float, :required => true,
  :description => "floating-point number between 0 and 1 giving the fraction of lines to emit: at sampling_fraction=1 all records are emitted, at 0 none are."
#
# Probabilistically emit some fraction of the input records (lines).
#
# Set the sampling fraction on the command line with the
#   --sampling_fraction=
# option. For example, to take a random sample of about 1/1000th of the
# lines in huge_files:
#   ./examples/sample_records.rb --sampling_fraction=0.001 --run huge_files sampled_files
#
class Mapper < Wukong::Streamer::LineStreamer
  include Wukong::Streamer::Filter

  #
  # Emit each line independently with probability +sampling_fraction+.
  # rand returns a float in [0, 1), so 1 keeps every line and 0 keeps none.
  #
  def emit?(line)
    rand < Settings.sampling_fraction
  end
end
#
# Executes the script
#
Wukong.run(Mapper,
  nil,
  :reduce_tasks => 0,
  :reuse_jvms   => true
)
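#
# For illustration only: a minimal sketch of the same per-line sampling rule
# written without Wukong, which may help when checking the behaviour on a
# small array of lines. The names +sample_lines+ and +rng+ are hypothetical
# and introduced here; they are not part of Wukong or of this script.
#

```ruby
# Hypothetical helper (an assumption, not Wukong API): keep each line
# independently with probability +fraction+. Passing a seeded Random
# makes a run repeatable; by default a fresh generator is used.
def sample_lines(lines, fraction, rng = Random.new)
  unless (0.0..1.0).cover?(fraction)
    raise ArgumentError, "fraction must be between 0 and 1"
  end
  # rng.rand yields a float in [0, 1), so fraction == 1.0 keeps every
  # line and fraction == 0.0 keeps none, matching emit? above.
  lines.select { rng.rand < fraction }
end

# Example use:
#   sample_lines(File.readlines("huge_file"), 0.001, Random.new(42))
```

# Because each line is decided independently, the number of lines kept only
# approximates fraction * total; it is not an exact count.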