Merge branch 'release/0.9.14'
alexanderdean committed Dec 31, 2014
2 parents 0ff2c5a + e64ccaf commit f93812d
Showing 91 changed files with 3,884 additions and 176 deletions.
4 changes: 3 additions & 1 deletion .gitignore
@@ -1,2 +1,4 @@
# Only project- and language-specific ignores in here. Use global .gitignore for editors etc
# Nothing in here yet

# Vagrant
.vagrant/
2 changes: 1 addition & 1 deletion 2-collectors/clojure-collector/java-servlet/project.clj
@@ -13,7 +13,7 @@
;;;; Copyright: Copyright (c) 2012-2013 Snowplow Analytics Ltd
;;;; License: Apache License Version 2.0

-(defproject snowplow/clojure-collector "0.9.0" ;; MUST also bump version in server.xml
+(defproject snowplow/clojure-collector "0.9.1" ;; MUST also bump version in server.xml
:license {:name "Apache Version 2.0"
:url "http://www.apache.org/licenses/LICENSE-2.0"}
:description "A SnowPlow event collector written in Clojure. AWS Elastic Beanstalk compatible."
@@ -23,7 +23,7 @@
(def ^:const cookie-name "sp")
(def ^:const id-name "uid")

-(def pixel (Base64/decodeBase64 (.getBytes "R0lGODlhAQABAAAAACH5BAEKAAEALAAAAAABAAEAAAICTAEAOw=="))) ; Can't define ^:const on this as per http://stackoverflow.com/questions/13109958/why-cant-i-use-clojures-const-with-a-java-byte-array
+(def pixel (Base64/decodeBase64 (.getBytes "R0lGODlhAQABAPAAAP///wAAACH5BAEAAAAALAAAAAABAAEAAAICRAEAOw=="))) ; Can't define ^:const on this as per http://stackoverflow.com/questions/13109958/why-cant-i-use-clojures-const-with-a-java-byte-array
(def ^:const pixel-length (str (alength pixel)))

(defn- uuid
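
Note on the pixel change above: decoding the two Base64 strings suggests the old payload is a 37-byte GIF with no global color table, while the new one is 43 bytes and carries a two-entry table. The Ruby sketch below is only one way to check this outside the collector; the helper name `gif_info` is hypothetical and is not part of this commit.

```ruby
# Base64 payloads copied verbatim from the hunk above.
OLD_PIXEL = "R0lGODlhAQABAAAAACH5BAEKAAEALAAAAAABAAEAAAICTAEAOw=="
NEW_PIXEL = "R0lGODlhAQABAPAAAP///wAAACH5BAEAAAAALAAAAAABAAEAAAICRAEAOw=="

# Hypothetical helper: decode a Base64 GIF and read its logical screen descriptor.
def gif_info(base64)
  bytes = base64.unpack("m").first                   # "m" decodes Base64 (core Ruby, no require)
  width, height, packed = bytes[6, 5].unpack("vvC")  # little-endian width, height, then flag byte
  {
    size: bytes.bytesize,                       # total payload size in bytes
    signature: bytes[0, 6],                     # "GIF87a" or "GIF89a"
    width: width,
    height: height,
    global_color_table: (packed & 0x80) != 0    # top bit of the packed flag byte
  }
end

p gif_info(OLD_PIXEL)
p gif_info(NEW_PIXEL)
```

Both payloads decode to a 1x1 GIF89a image; the difference is the extra six bytes of color table in the new one.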
@@ -67,7 +67,7 @@
APR (HTTP/AJP) Connector: /docs/apr.html
Define a non-SSL HTTP/1.1 Connector on port 8080
-->
-<Connector port="8080" protocol="HTTP/1.1"
+<Connector port="8080" maxHttpHeaderSize="65536" protocol="HTTP/1.1"
connectionTimeout="20000"
redirectPort="8443"
enableLookups="true" />
@@ -135,7 +135,7 @@
Note: The pattern used ensures that the access log format matches that produced by the Cloudfront Collector. (So that the same ETL process can be employed for both collectors.) -->
<Valve className="com.snowplowanalytics.snowplow.collectors.clojure.SnowplowAccessLogValve" directory="logs"
prefix="localhost_access_log" suffix=".txt" rotatable="false" requestAttributesEnabled="true"
-pattern="%{yyyy-MM-dd}t&#9;%{HH:mm:ss}t&#9;-&#9;%b&#9;%a&#9;%m&#9;%h&#9;%U&#9;%s&#9;%{Referer}i&#9;%{User-Agent}I&#9;%q&amp;cv=clj-0.9.0-%v&amp;nuid=%{sp}C&#9;-&#9;-&#9;-&#9;%~&#9;%w" />
+pattern="%{yyyy-MM-dd}t&#9;%{HH:mm:ss}t&#9;-&#9;%b&#9;%a&#9;%m&#9;%h&#9;%U&#9;%s&#9;%{Referer}i&#9;%{User-Agent}I&#9;%q&amp;cv=clj-0.9.1-%v&amp;nuid=%{sp}C&#9;-&#9;-&#9;-&#9;%~&#9;%w" />


</Host>
Binary file modified 2-collectors/cloudfront-collector/static/i
Binary file not shown.
1 change: 1 addition & 0 deletions 3-enrich/emr-etl-runner/Gemfile
@@ -23,6 +23,7 @@ gem "contracts", "~> 0.4"
gem "elasticity", "~> 3.0.4"
gem "sluice", "~> 0.2.1"
gem "awrence", "~> 0.1.0"
gem "time_diff", "~> 0.3.0"

group :development do
gem "rspec", "~> 2.14", ">= 2.14.1"
16 changes: 16 additions & 0 deletions 3-enrich/emr-etl-runner/Gemfile.lock
@@ -1,6 +1,12 @@
GEM
remote: https://rubygems.org/
specs:
activesupport (4.1.8)
i18n (~> 0.6, >= 0.6.9)
json (~> 1.7, >= 1.7.7)
minitest (~> 5.1)
thread_safe (~> 0.1)
tzinfo (~> 1.1)
awrence (0.1.0)
builder (3.2.2)
contracts (0.4)
@@ -38,10 +44,13 @@ GEM
fog-json (1.0.0)
multi_json (~> 1.0)
formatador (0.2.5)
i18n (0.6.11)
inflecto (0.0.2)
ipaddress (0.8.0)
json (1.8.1)
mime-types (2.3)
mini_portile (0.6.0)
minitest (5.4.3)
multi_json (1.10.1)
net-scp (1.2.1)
net-ssh (>= 2.6.5)
@@ -69,7 +78,13 @@ GEM
term-ansicolor (1.3.0)
tins (~> 1.0)
thor (0.19.1)
thread_safe (0.3.4)
time_diff (0.3.0)
activesupport
i18n
tins (1.3.0)
tzinfo (1.2.2)
thread_safe (~> 0.1)
unf (0.1.4)
unf_ext
unf_ext (0.0.6)
@@ -84,3 +99,4 @@ DEPENDENCIES
elasticity (~> 3.0.4)
rspec (~> 2.14, >= 2.14.1)
sluice (~> 0.2.1)
time_diff (~> 0.3.0)
4 changes: 2 additions & 2 deletions 3-enrich/emr-etl-runner/config/config.yml.sample
@@ -40,8 +40,8 @@
:etl:
:job_name: Snowplow ETL # Give your job a name
:versions:
-:hadoop_enrich: 0.10.1 # Version of the Hadoop Enrichment process
-:hadoop_shred: 0.2.1 # Version of the Hadoop Shredding process
+:hadoop_enrich: 0.11.0 # Version of the Hadoop Enrichment process
+:hadoop_shred: 0.3.0 # Version of the Hadoop Shredding process
:collector_format: cloudfront # Or 'clj-tomcat' for the Clojure Collector
:continue_on_unexpected_error: false # Set to 'true' (and set :out_errors: above) if you don't want any exceptions thrown from ETL
:iglu:
2 changes: 1 addition & 1 deletion 3-enrich/emr-etl-runner/lib/snowplow-emr-etl-runner.rb
@@ -26,6 +26,6 @@
module Snowplow
module EmrEtlRunner
NAME = "snowplow-emr-etl-runner"
-VERSION = "0.9.2"
+VERSION = "0.10.0"
end
end
48 changes: 47 additions & 1 deletion 3-enrich/emr-etl-runner/lib/snowplow-emr-etl-runner/emr_job.rb
@@ -279,7 +279,7 @@ def run()
status = wait_for()

if !status
-raise EmrExecutionError, "EMR jobflow #{jobflow_id} failed, check Amazon EMR console and Hadoop logs for details (help: https://github.com/snowplow/snowplow/wiki/Troubleshooting-jobs-on-Elastic-MapReduce). Data files not archived."
+raise EmrExecutionError, get_failure_details()
end

logger.debug "EMR jobflow #{jobflow_id} completed successfully."
@@ -354,6 +354,52 @@ def wait_for()
success
end

# Prettified string containing failure details
# for this job flow.
Contract None => String
def get_failure_details()

js = @jobflow.status

[
"EMR jobflow #{js.jobflow_id} failed, check Amazon EMR console and Hadoop logs for details (help: https://github.com/snowplow/snowplow/wiki/Troubleshooting-jobs-on-Elastic-MapReduce). Data files not archived.",
"#{js.name}: #{js.state} [#{js.last_state_change_reason}] ~ #{self.class.get_elapsed_time(js.started_at, js.ended_at)} #{self.class.get_timespan(js.started_at, js.ended_at)}"
].concat(js.steps
.sort { |a,b|
a.started_at <=> b.started_at
}
.each_with_index
.map { |s,i|
" - #{i + 1}. #{s.name}: #{s.state} ~ #{self.class.get_elapsed_time(s.started_at, s.ended_at)} #{self.class.get_timespan(s.started_at, s.ended_at)}"
})
.join("\n")
end

# Gets the time span.
#
# Parameters:
# +start+:: start time
# +_end+:: end time
Contract Maybe[Time], Maybe[Time] => String
def self.get_timespan(start, _end)
"[#{start} - #{_end}]"
end

# Gets the elapsed time in a
# human-readable format.
#
# Parameters:
# +start+:: start time
# +_end+:: end time
Contract Maybe[Time], Maybe[Time] => String
def self.get_elapsed_time(start, _end)
if start.nil? or _end.nil?
"elapsed time n/a"
else
Time.diff(start, _end, '%H %N %S')[:diff]
end
end

# We need to partition our output buckets by run ID
# Note buckets already have trailing slashes
#
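
Note on the hunk above: `get_elapsed_time` guards against nil timestamps, but the sort inside `get_failure_details` compares `started_at` values directly. In Ruby, `<=>` returns nil when either side is nil, which makes `sort` raise an ArgumentError. Whether Elasticity ever reports a step with a nil `started_at` is an assumption here, not something this diff confirms. The sketch below shows a nil-tolerant ordering and a stdlib-only elapsed-time formatter; `Step`, `order_steps` and `elapsed_time` are hypothetical names, and the output format is not the one produced by the time_diff gem.

```ruby
# Hypothetical stand-in for a job flow step; only the fields used here.
Step = Struct.new(:name, :started_at, :ended_at)

# Stdlib-only elapsed time, mirroring the nil guard in get_elapsed_time.
def elapsed_time(start_time, end_time)
  return "elapsed time n/a" if start_time.nil? || end_time.nil?

  total = (end_time - start_time).round   # Time - Time yields seconds as a Float
  hours, rest = total.divmod(3600)
  minutes, seconds = rest.divmod(60)
  "#{hours}h #{minutes}m #{seconds}s"
end

# Order steps by start time, placing steps that never started last.
def order_steps(steps)
  steps.sort_by { |s| [s.started_at ? 0 : 1, s.started_at || Time.at(0)] }
end

steps = [
  Step.new("enrich", Time.at(20), Time.at(3743)),
  Step.new("shred", nil, nil),
  Step.new("staging", Time.at(10), Time.at(20))
]

order_steps(steps).each_with_index do |s, i|
  puts " - #{i + 1}. #{s.name} ~ #{elapsed_time(s.started_at, s.ended_at)}"
end
```

Sorting on a two-element key keeps the comparison between real `Time` values only, so a step without a start time no longer breaks the report.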
@@ -56,7 +56,7 @@ def self.stage_logs_for_emr(args, config)
files_to_move = case
when (args[:start].nil? and args[:end].nil?)
if config[:etl][:collector_format] == 'clj-tomcat'
-'.*localhost\_access\_log\.txt\-.*'
+'.*localhost\_access\_log\.txt.*'
else
'.+'
end
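
Note on the pattern change above: dropping the `\-` means the new expression no longer requires a hyphen straight after `.txt`. The Ruby sketch below compares the two patterns against a few file names. The names are hypothetical and chosen only to show the difference, and how Sluice applies the pattern to S3 keys is not shown in this diff.

```ruby
# Patterns copied from the hunk above (single-quoted, so backslashes stay literal).
OLD_PATTERN = Regexp.new('.*localhost\_access\_log\.txt\-.*')
NEW_PATTERN = Regexp.new('.*localhost\_access\_log\.txt.*')

# Hypothetical file names, for illustration only.
SAMPLE_NAMES = [
  "localhost_access_log.txt-2014-12-31-00",  # hyphen after .txt
  "localhost_access_log.txt.2014-12-31",     # no hyphen after .txt
  "catalina.out"                             # unrelated file
]

# Return the names that the given pattern matches.
def matching(pattern, names)
  names.select { |name| pattern =~ name }
end

puts "old: #{matching(OLD_PATTERN, SAMPLE_NAMES).inspect}"
puts "new: #{matching(NEW_PATTERN, SAMPLE_NAMES).inspect}"
```

The old pattern selects only the hyphenated name; the new one also selects the name without a hyphen, and neither selects the unrelated file.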
2 changes: 1 addition & 1 deletion 3-enrich/scala-common-enrich/project/BuildSettings.scala
@@ -21,7 +21,7 @@ object BuildSettings {
// Basic settings for our app
lazy val basicSettings = Seq[Setting[_]](
organization := "com.snowplowanalytics",
-version := "0.9.1",
+version := "0.10.0",
description := "Common functionality for enriching raw Snowplow events",
scalaVersion := "2.10.1",
scalacOptions := Seq("-deprecation", "-encoding", "utf8",
2 changes: 1 addition & 1 deletion 3-enrich/scala-common-enrich/project/Dependencies.scala
@@ -46,7 +46,7 @@ object Dependencies {
val refererParser = "0.2.2"
val maxmindIplookups = "0.2.0"
val json4s = "3.2.11"
-val igluClient = "0.1.1"
+val igluClient = "0.2.0"
// Scala (test only)
val specs2 = "1.14"
val scalazSpecs2 = "0.1.2"
@@ -10,7 +10,7 @@
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the Apache License Version 2.0 for the specific language governing permissions and limitations there under.
*/
-package com.snowplowanalytics
+package com.snowplowanalytics
package snowplow
package enrich
package common
@@ -29,7 +29,10 @@ import registry.{
SnowplowAdapter,
IgluAdapter,
CallrailAdapter,
-MailchimpAdapter
+MailchimpAdapter,
+MandrillAdapter,
+PagerdutyAdapter,
+PingdomAdapter
}

/**
@@ -39,10 +42,13 @@ import registry.{
object AdapterRegistry {

private object Vendor {
-val Snowplow = "com.snowplowanalytics.snowplow"
-val Iglu = "com.snowplowanalytics.iglu"
-val Callrail = "com.callrail"
+val Snowplow = "com.snowplowanalytics.snowplow"
+val Iglu = "com.snowplowanalytics.iglu"
+val Callrail = "com.callrail"
val Mailchimp = "com.mailchimp"
+val Mandrill = "com.mandrill"
+val Pagerduty = "com.pagerduty"
+val Pingdom = "com.pingdom"
}

/**
@@ -64,6 +70,9 @@ object AdapterRegistry {
case (Vendor.Iglu, "v1") => IgluAdapter.toRawEvents(payload)
case (Vendor.Callrail, "v1") => CallrailAdapter.toRawEvents(payload)
case (Vendor.Mailchimp, "v1") => MailchimpAdapter.toRawEvents(payload)
case (Vendor.Mandrill, "v1") => MandrillAdapter.toRawEvents(payload)
case (Vendor.Pagerduty, "v1") => PagerdutyAdapter.toRawEvents(payload)
case (Vendor.Pingdom, "v1") => PingdomAdapter.toRawEvents(payload)
// TODO: add Sendgrid et al
case _ => s"Payload with vendor ${payload.api.vendor} and version ${payload.api.version} not supported by this version of Scala Common Enrich".failNel
}