EmrEtlRunner: use Elasticity to specify Thrift-specific configuration (closes #3252)
BenFradet committed Jun 27, 2017
1 parent f39f487 commit 5714d5e
Showing 1 changed file with 14 additions and 28 deletions.
42 changes: 14 additions & 28 deletions 3-enrich/emr-etl-runner/lib/snowplow-emr-etl-runner/emr_job.rb
@@ -22,33 +22,6 @@
 require 'contracts'
 require 'iglu-client'

-# Global variable used to decide whether to patch Elasticity's AwsRequestV4 payload with Configurations
-# This is only necessary if we are loading Thrift with AMI >= 4.0.0
-$patch_thrift_configuration = false
-
-# Monkey patched to support Configurations
-module Elasticity
-  class AwsRequestV4
-    def payload
-      if $patch_thrift_configuration
-        @ruby_service_hash["Configurations"] = [{
-          "Classification" => "core-site",
-          "Properties" => {
-            "io.file.buffer.size" => "65536"
-          }
-        },
-        {
-          "Classification" => "mapred-site",
-          "Properties" => {
-            "mapreduce.user.classpath.first" => "true"
-          }
-        }]
-      end
-      AwsUtils.convert_ruby_to_aws_v4(@ruby_service_hash).to_json
-    end
-  end
-end
-
 # Ruby class to execute Snowplow's Hive jobs against Amazon EMR
 # using Elasticity (https://github.com/rslifka/elasticity).
 module Snowplow
@@ -165,7 +138,20 @@ def initialize(debug, enrich, shred, elasticsearch, s3distcp, archive_raw, confi
       @jobflow.add_bootstrap_action(action)
     end
   else
-    $patch_thrift_configuration = true
+    [{
+      "Classification" => "core-site",
+      "Properties" => {
+        "io.file.buffer.size" => "65536"
+      }
+    },
+    {
+      "Classification" => "mapred-site",
+      "Properties" => {
+        "mapreduce.user.classpath.first" => "true"
+      }
+    }].each do |config|
+      @jobflow.add_configuration(config)
+    end
   end
 end
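
For readers skimming the change, here is a minimal sketch of the pattern the second hunk introduces: each configuration hash is registered on the job flow itself instead of being injected into every request payload through a global flag. `StubJobFlow` and `THRIFT_CONFIGURATIONS` are hypothetical names used only for illustration. The stub models nothing beyond the `add_configuration` call that appears in the diff, and it is not Elasticity's actual class.

```ruby
require 'json'

# Hypothetical stand-in for the job flow object; it only records
# the hashes passed to add_configuration, as the diff does.
class StubJobFlow
  attr_reader :configurations

  def initialize
    @configurations = []
  end

  # One configuration hash per call, mirroring the loop in the diff.
  def add_configuration(config)
    @configurations << config
  end
end

# The two Thrift-specific settings taken verbatim from the commit.
THRIFT_CONFIGURATIONS = [
  {
    "Classification" => "core-site",
    "Properties" => { "io.file.buffer.size" => "65536" }
  },
  {
    "Classification" => "mapred-site",
    "Properties" => { "mapreduce.user.classpath.first" => "true" }
  }
].freeze

jobflow = StubJobFlow.new
THRIFT_CONFIGURATIONS.each { |config| jobflow.add_configuration(config) }

# A second job flow is unaffected, unlike with the removed global flag.
other = StubJobFlow.new

puts JSON.pretty_generate(jobflow.configurations)
```

Because the settings now live on the individual job flow, a job flow that never takes the `else` branch carries no configurations, which the global `$patch_thrift_configuration` flag could not guarantee once it had been set.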
