Pass GZIP compression argument to S3DistCp as "gz" not "gzip" (closes s…
BenFradet authored and peel committed May 25, 2020
1 parent a3a2a0f commit 3dba229
Showing 2 changed files with 8 additions and 1 deletion.
4 changes: 3 additions & 1 deletion lib/snowplow-emr-etl-runner/utils.rb
@@ -168,7 +168,9 @@ def partition_by_run(folder, run_id, retain=true)
 def output_codec_from_compression_format(compression_format)
   # those are the supported compression codecs
   if not compression_format.nil? and [ 'gzip', 'gz', 'lzo', 'snappy' ].include?(compression_format.downcase)
-    [ '--outputCodec', compression_format.downcase ]
+    downcased = compression_format.downcase
+    format = [ 'gzip', 'gz' ].include?(downcased) ? 'gz' : downcased
+    [ '--outputCodec', format ]
   else
     []
   end
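To see the new mapping in isolation, here is a minimal standalone sketch that reproduces the patched logic using guard clauses instead of the single `if` condition in the diff. The `CodecSketch` module and the `SUPPORTED` constant are hypothetical names introduced only for illustration; in the commit the method lives in `lib/snowplow-emr-etl-runner/utils.rb` and the list of formats is written inline.

```ruby
# Hypothetical standalone module reproducing the patched method from
# lib/snowplow-emr-etl-runner/utils.rb (not the runner's actual module).
module CodecSketch
  # Compression formats accepted by the method (compared case-insensitively).
  SUPPORTED = %w[gzip gz lzo snappy].freeze

  # Returns the extra S3DistCp arguments for a compression format, or an
  # empty array when the format is nil or not in SUPPORTED.
  def self.output_codec_from_compression_format(compression_format)
    return [] if compression_format.nil?

    downcased = compression_format.downcase
    return [] unless SUPPORTED.include?(downcased)

    # Both spellings of GZIP are normalised to "gz"; other formats pass through.
    format = %w[gzip gz].include?(downcased) ? 'gz' : downcased
    ['--outputCodec', format]
  end
end

p CodecSketch.output_codec_from_compression_format('GZIP')   # → ["--outputCodec", "gz"]
p CodecSketch.output_codec_from_compression_format('gz')     # → ["--outputCodec", "gz"]
p CodecSketch.output_codec_from_compression_format('Snappy') # → ["--outputCodec", "snappy"]
p CodecSketch.output_codec_from_compression_format('NONE')   # → []
p CodecSketch.output_codec_from_compression_format(nil)      # → []
```

Either spelling of GZIP now yields `gz`, every other supported format is passed through lowercased, and `nil` or unsupported values such as `NONE` produce no extra arguments. The guard-clause layout is only a readability choice; it returns the same values as the single `if` in the diff.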
5 changes: 5 additions & 0 deletions spec/snowplow-emr-etl-runner/utils_spec.rb
@@ -163,6 +163,11 @@
     expect(subject.output_codec_from_compression_format('NONE')).to eq([])
   end
 
+  it 'should give back gz if the compression format is GZIP or GZ' do
+    expect(subject.output_codec_from_compression_format('GZIP')).to eq([ '--outputCodec', 'gz' ])
+    expect(subject.output_codec_from_compression_format('GZ')).to eq([ '--outputCodec', 'gz' ])
+  end
+
   it 'should return the proper output codec if the provided one is supported' do
     expect(subject.output_codec_from_compression_format('LZO')).to eq([ '--outputCodec', 'lzo' ])
   end