Add support for running steps on a persistent EMR cluster (closes sno…
Showing 12 changed files with 296 additions and 80 deletions.
@@ -0,0 +1,28 @@
#!/bin/sh

# Copyright (c) 2012-2018 Snowplow Analytics Ltd. All rights reserved.
#
# This program is licensed to you under the Apache License Version 2.0,
# and you may not use this file except in compliance with the Apache License Version 2.0.
# You may obtain a copy of the Apache License Version 2.0 at http://www.apache.org/licenses/LICENSE-2.0.
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the Apache License Version 2.0 is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the Apache License Version 2.0 for the specific language governing permissions and limitations there under.

# Author:: Joshua Beemster (mailto:support@snowplowanalytics.com)
# Copyright:: Copyright (c) 2012-2018 Snowplow Analytics Ltd
# License:: Apache License Version 2.0

# Recursively removes a list of directories on HDFS
# (directories that do not exist are reported and skipped)
for i in "$@"
do
  if hadoop fs -test -d "${i}"; then
    echo "Removing directory ${i} ..."
    hadoop fs -rm -r -skipTrash "${i}"
  else
    echo "Directory ${i} does not exist"
  fi
done
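The script is a check-then-delete loop: each path is tested with `hadoop fs -test -d` and removed only if it exists, so a missing directory produces a message instead of an `rm` error. As a rough illustration only (not part of this commit), the same idempotent pattern can be sketched in Ruby against the local filesystem. `remove_directories` is a hypothetical helper, and `File.directory?` and `FileUtils.rm_r` stand in for the `hadoop fs` commands.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical local-filesystem analogue of the HDFS cleanup script: each
# existing directory is removed recursively, and a missing path only yields a
# message, so running it twice is harmless. Returns the messages it would echo.
def remove_directories(paths)
  paths.map do |path|
    if File.directory?(path)
      FileUtils.rm_r(path)
      "Removing directory #{path} ..."
    else
      "Directory #{path} does not exist"
    end
  end
end

Dir.mktmpdir do |root|
  target = File.join(root, 'run-1', 'output')
  FileUtils.mkdir_p(target)
  puts remove_directories([target, File.join(root, 'never-created')])
  puts File.exist?(target) # prints "false"
end
```

Calling it a second time with the same paths returns only "does not exist" messages, which is the property that makes this kind of cleanup step safe to re-run.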
@@ -0,0 +1,69 @@
# Copyright (c) 2012-2018 Snowplow Analytics Ltd. All rights reserved.
#
# This program is licensed to you under the Apache License Version 2.0,
# and you may not use this file except in compliance with the Apache License Version 2.0.
# You may obtain a copy of the Apache License Version 2.0 at http://www.apache.org/licenses/LICENSE-2.0.
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the Apache License Version 2.0 is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the Apache License Version 2.0 for the specific language governing permissions and limitations there under.

# Author:: Joshua Beemster (mailto:support@snowplowanalytics.com)
# Copyright:: Copyright (c) 2012-2018 Snowplow Analytics Ltd
# License:: Apache License Version 2.0

require 'contracts'
require 'pathname'
require 'uri'

module Snowplow
  module EmrEtlRunner
    module EMR

      include Contracts

      # Attempts to find an active EMR JobFlow with a given name
      #
      # Parameters:
      # +client+:: EMR client
      # +name+:: EMR cluster name
      def get_emr_jobflow_id(client, name)
        # Marker is used for paginating through all results
        marker = nil
        emr_clusters = []

        loop do
          response = list_clusters(client, marker)
          emr_clusters = emr_clusters + response['Clusters'].select { |c| c['Name'] == name }
          marker = response['Marker'] # nil on the last page, which ends the loop
          break if marker.nil?
        end

        case emr_clusters.size
        when 0
          return nil
        when 1
          emr_cluster = emr_clusters.first
          if emr_cluster['Status']['State'] == "RUNNING"
            raise EmrClusterStateError, "EMR Cluster must be in WAITING state before new job steps can be submitted - found #{emr_cluster['Status']['State']}"
          end
          return emr_cluster['Id']
        else
          raise EmrDiscoveryError, "EMR Cluster name must be unique for safe discovery - found #{emr_clusters.size} with name #{name}"
        end
      end

      private

      def list_clusters(client, marker)
        options = {
          states: ["WAITING", "RUNNING"],
        }
        options[:marker] = marker unless marker.nil?
        client.list_clusters(options)
      end

    end
  end
end
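`get_emr_jobflow_id` pages through `list_clusters` results by passing back the `Marker` from each response until a response arrives without one. A minimal sketch of that loop against a stubbed client follows. `FakeEmrClient` and `find_clusters_by_name` are hypothetical names, and the stub assumes only the hash-shaped responses (`'Clusters'`, optional `'Marker'`) that the module reads, not the API of any particular EMR client library.

```ruby
# Hypothetical stub: maps the requested marker (nil for the first page) to a
# response hash shaped like the ones get_emr_jobflow_id reads.
class FakeEmrClient
  def initialize(pages)
    @pages = pages
  end

  def list_clusters(options)
    @pages.fetch(options[:marker])
  end
end

# Hypothetical helper: collects clusters named +name+ across every page.
def find_clusters_by_name(client, name)
  marker = nil
  found = []
  loop do
    options = { states: %w[WAITING RUNNING] }
    options[:marker] = marker unless marker.nil?
    response = client.list_clusters(options)
    found.concat(response['Clusters'].select { |c| c['Name'] == name })
    marker = response['Marker'] # nil once the last page omits 'Marker'
    break if marker.nil?
  end
  found
end

pages = {
  nil => {
    'Clusters' => [{ 'Id' => 'j-AAA', 'Name' => 'etl', 'Status' => { 'State' => 'WAITING' } }],
    'Marker' => 'page-2'
  },
  'page-2' => {
    'Clusters' => [{ 'Id' => 'j-BBB', 'Name' => 'other', 'Status' => { 'State' => 'RUNNING' } }]
  }
}

client = FakeEmrClient.new(pages)
p find_clusters_by_name(client, 'etl').map { |c| c['Id'] } # => ["j-AAA"]
```

Because the marker is reassigned on every iteration, a final page that omits `'Marker'` sets it to `nil` and ends the loop; keeping the previous marker instead would request the same page indefinitely.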