Merge branch 'develop'
alexanderdean committed May 16, 2014
2 parents f559c6e + d5a6161 commit 0068ae7
Showing 18 changed files with 454 additions and 172 deletions.
1 change: 1 addition & 0 deletions .coveralls.yml
@@ -0,0 +1 @@
service_name: travis-ci
10 changes: 10 additions & 0 deletions .travis.yml
@@ -0,0 +1,10 @@
language: ruby
cache: bundler

rvm:
- 1.9.3
- jruby
- 2.0.0
- 2.1.0

script: 'bundle exec rspec spec'
17 changes: 17 additions & 0 deletions CHANGELOG
@@ -1,3 +1,20 @@
Version 0.2.0 (2014-05-16)
--------------------------
Bumped Contracts to 0.4 (#22)
Bumped Fog to 1.22.0 (#24)
Added gem button to README (#18)
Added Coveralls code coverage to project (#17)
Added Code Climate button to README (#23)
Added Travis support to project (#16)
Added initial unit tests (#15)
Added in FogFile = Fog::Storage::AWS::File-based contracts (#12)
Added additional contracts (#20)
Broke up s3.rb into separate files (#21)
Updated alter_filename_lambda to accept original filepath as 2nd arg (#19)
Overrode equality operator for S3::Location to support tests (#10)
Fixed break bugs in core Sluice process flow (#6)
Made is_empty? work if another folder starts with the same name as this one (#5)

Version 0.1.5 (2013-10-13)
--------------------------
Fixed is_empty? returns true if folder contains 1 file (#9)
11 changes: 11 additions & 0 deletions Guardfile
@@ -0,0 +1,11 @@
guard 'rspec' do
  # watch /lib/ files
  watch(%r{^lib/(.+)\.rb$}) do |m|
    "spec/#{m[1]}_spec.rb"
  end

  # watch /spec/ files
  watch(%r{^spec/(.+)\.rb$}) do |m|
    "spec/#{m[1]}.rb"
  end
end
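
A minimal sketch of how the first watch rule maps a changed library file to its spec, assuming the dot before `rb` is meant as a literal dot. The `LIB_PATTERN` constant and `spec_for` helper are hypothetical names used only for illustration; they are not part of the Guardfile or of Guard itself.

```ruby
# Hypothetical stand-in for the lib/ watch rule; not part of Guard's API.
LIB_PATTERN = %r{^lib/(.+)\.rb$}

# Returns the spec path for a lib file, or nil if the path is not a lib .rb file.
def spec_for(path)
  m = LIB_PATTERN.match(path)
  m && "spec/#{m[1]}_spec.rb"
end

puts spec_for('lib/sluice/storage/s3/location.rb')
# prints spec/sluice/storage/s3/location_spec.rb
```

With an unescaped dot, a path such as `lib/sluice_rb` would also match, because `.` accepts any character.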
10 changes: 7 additions & 3 deletions README.md
@@ -1,4 +1,8 @@
# Sluice
[![Gem Version](https://badge.fury.io/rb/sluice.svg)](http://badge.fury.io/rb/sluice)
[![Build Status](https://travis-ci.org/snowplow/sluice.png)](https://travis-ci.org/snowplow/sluice)
[![Code Climate](https://codeclimate.com/github/snowplow/sluice.png)](https://codeclimate.com/github/snowplow/sluice)
[![Coverage Status](https://coveralls.io/repos/snowplow/sluice/badge.png?branch=feature%2F0.2.0)](https://coveralls.io/r/snowplow/sluice?branch=feature%2F0.2.0)

Sluice is a Ruby gem (built with [Bundler] [bundler]) to help you build cloud-friendly ETL (extract, transform, load) processes.

@@ -21,7 +25,7 @@ Sluice has been extracted from a pair of Ruby ETL applications built by the [Sno

Or in your Gemfile:

-gem 'sluice', '~> 0.1.0'
+gem 'sluice', '~> 0.2.0'

## Usage

@@ -32,7 +36,7 @@ Rubydoc and usage examples to come.
To hack on Sluice locally:

$ gem build sluice.gemspec
-$ sudo gem install sluice-0.1.0.gem
+$ sudo gem install sluice-0.2.0.gem

To contribute:

@@ -48,7 +52,7 @@ Sluice was developed by [Alex Dean] [alexanderdean] ([Snowplow Analytics] [snowp

## Copyright and license

-Sluice is copyright 2012-2013 Snowplow Analytics Ltd.
+Sluice is copyright 2012-2014 Snowplow Analytics Ltd.

Licensed under the [Apache License, Version 2.0] [license] (the "License");
you may not use this software except in compliance with the License.
2 changes: 0 additions & 2 deletions Rakefile

This file was deleted.

14 changes: 6 additions & 8 deletions lib/sluice.rb
@@ -1,4 +1,4 @@
-# Copyright (c) 2012 SnowPlow Analytics Ltd. All rights reserved.
+# Copyright (c) 2012-2014 Snowplow Analytics Ltd. All rights reserved.
#
# This program is licensed to you under the Apache License Version 2.0,
# and you may not use this file except in compliance with the Apache License Version 2.0.
@@ -10,14 +10,12 @@
# See the Apache License Version 2.0 for the specific language governing permissions and limitations there under.

# Author:: Alex Dean (mailto:support@snowplowanalytics.com)
-# Copyright:: Copyright (c) 2012 SnowPlow Analytics Ltd
+# Copyright:: Copyright (c) 2012-2014 Snowplow Analytics Ltd
# License:: Apache License Version 2.0

require 'sluice/errors'
require 'sluice/storage/storage'
-require 'sluice/storage/s3'
-
-module Sluice
-  NAME = "sluice"
-  VERSION = "0.1.5"
-end
+require 'sluice/storage/s3/contracts'
+require 'sluice/storage/s3/location'
+require 'sluice/storage/s3/manifest'
+require 'sluice/storage/s3/s3'
4 changes: 2 additions & 2 deletions lib/sluice/errors.rb
@@ -1,4 +1,4 @@
-# Copyright (c) 2012 SnowPlow Analytics Ltd. All rights reserved.
+# Copyright (c) 2012-2014 Snowplow Analytics Ltd. All rights reserved.
#
# This program is licensed to you under the Apache License Version 2.0,
# and you may not use this file except in compliance with the Apache License Version 2.0.
@@ -10,7 +10,7 @@
# See the Apache License Version 2.0 for the specific language governing permissions and limitations there under.

# Author:: Alex Dean (mailto:support@snowplowanalytics.com)
-# Copyright:: Copyright (c) 2012 SnowPlow Analytics Ltd
+# Copyright:: Copyright (c) 2012-2014 Snowplow Analytics Ltd
# License:: Apache License Version 2.0

# All errors
32 changes: 32 additions & 0 deletions lib/sluice/storage/s3/contracts.rb
@@ -0,0 +1,32 @@
# Copyright (c) 2012-2014 Snowplow Analytics Ltd. All rights reserved.
#
# This program is licensed to you under the Apache License Version 2.0,
# and you may not use this file except in compliance with the Apache License Version 2.0.
# You may obtain a copy of the Apache License Version 2.0 at http://www.apache.org/licenses/LICENSE-2.0.
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the Apache License Version 2.0 is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the Apache License Version 2.0 for the specific language governing permissions and limitations there under.

# Authors:: Alex Dean (mailto:support@snowplowanalytics.com), Michael Tibben
# Copyright:: Copyright (c) 2012-2014 Snowplow Analytics Ltd
# License:: Apache License Version 2.0

require 'fog'
require 'fog/aws/models/storage/file'

require 'contracts'
include Contracts

module Sluice
  module Storage
    module S3

      # Aliases for Contracts
      FogStorage = Fog::Storage::AWS::Real
      FogFile = Fog::Storage::AWS::File

    end
  end
end
77 changes: 77 additions & 0 deletions lib/sluice/storage/s3/location.rb
@@ -0,0 +1,77 @@
# Copyright (c) 2012-2014 Snowplow Analytics Ltd. All rights reserved.
#
# This program is licensed to you under the Apache License Version 2.0,
# and you may not use this file except in compliance with the Apache License Version 2.0.
# You may obtain a copy of the Apache License Version 2.0 at http://www.apache.org/licenses/LICENSE-2.0.
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the Apache License Version 2.0 is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the Apache License Version 2.0 for the specific language governing permissions and limitations there under.

# Authors:: Alex Dean (mailto:support@snowplowanalytics.com), Michael Tibben
# Copyright:: Copyright (c) 2012-2014 Snowplow Analytics Ltd
# License:: Apache License Version 2.0

require 'contracts'
include Contracts

module Sluice
  module Storage
    module S3

      # Class to describe an S3 location
      # TODO: if we are going to require trailing slashes on
      # S3 locations, maybe we should make that clearer?
      class Location

        attr_reader :bucket, :dir

        # Location constructor
        #
        # Parameters:
        # +s3_location+:: the S3 location config string, e.g. "s3://bucket/directory/"
        #                 (must start with s3:// or s3n:// and end with a slash)
        Contract String => Location
        def initialize(s3_location)
          @s3_location = s3_location

          s3_location_match = s3_location.match('^s3n?://([^/]+)/?(.*)/$')
          raise ArgumentError, 'Bad S3 location %s' % s3_location unless s3_location_match

          @bucket = s3_location_match[1]
          @dir = s3_location_match[2]
          self
        end

        Contract nil => String
        def dir_as_path
          if @dir.length > 0
            return @dir+'/'
          else
            return ''
          end
        end

        Contract nil => String
        def to_s
          @s3_location
        end

        Contract Any => Bool
        def ==(o)
          o.class == self.class && o.state == state
        end
        alias_method :eql?, :==

        protected

        Contract nil => [String, String, String]
        def state
          [@s3_location, @bucket, @dir]
        end

      end

    end
  end
end
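
A minimal sketch of what the pattern in `Location#initialize` accepts and rejects, written without the Contracts gem so it runs on its own. The `S3_LOCATION` constant and `parse_s3_location` helper are hypothetical names for illustration only; the regular expression itself is the one used in the constructor above.

```ruby
# Same pattern as Location#initialize, written as a Regexp literal.
S3_LOCATION = %r{^s3n?://([^/]+)/?(.*)/$}

# Hypothetical helper: returns [bucket, dir], or nil when the string is rejected.
def parse_s3_location(str)
  m = S3_LOCATION.match(str)
  m && [m[1], m[2]]
end

p parse_s3_location('s3://my-bucket/in/2014/')  # ["my-bucket", "in/2014"]
p parse_s3_location('s3n://my-bucket/')         # ["my-bucket", ""]
p parse_s3_location('s3://my-bucket/in')        # nil (no trailing slash)
```

The last case is the one that raises `ArgumentError` in `Location`: a location without a trailing slash is treated as malformed rather than silently normalised.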
129 changes: 129 additions & 0 deletions lib/sluice/storage/s3/manifest.rb
@@ -0,0 +1,129 @@
# Copyright (c) 2012-2014 Snowplow Analytics Ltd. All rights reserved.
#
# This program is licensed to you under the Apache License Version 2.0,
# and you may not use this file except in compliance with the Apache License Version 2.0.
# You may obtain a copy of the Apache License Version 2.0 at http://www.apache.org/licenses/LICENSE-2.0.
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the Apache License Version 2.0 is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the Apache License Version 2.0 for the specific language governing permissions and limitations there under.

# Authors:: Alex Dean (mailto:support@snowplowanalytics.com), Michael Tibben
# Copyright:: Copyright (c) 2012-2014 Snowplow Analytics Ltd
# License:: Apache License Version 2.0

require 'set'

require 'contracts'
include Contracts

module Sluice
  module Storage
    module S3

      # Legitimate manifest scopes:
      # 1. :filename - store only the filename
      #                in the manifest
      # 2. :relpath  - store the relative path
      #                to the file in the manifest
      # 3. :abspath  - store the absolute path
      #                to the file in the manifest
      # 4. :bucket   - store bucket PLUS absolute
      #                path to the file in the manifest
      #
      # TODO: add support for 2-4. Currently only 1 supported
      class ManifestScope

        @@scopes = Set::[](:filename) # TODO add :relpath, :abspath, :bucket

        def self.valid?(val)
          val.is_a?(Symbol) &&
            @@scopes.include?(val)
        end
      end

      # Class to read and maintain a manifest.
      class Manifest
        attr_reader :s3_location, :scope, :manifest_file

        # Manifest constructor
        #
        # Parameters:
        # +s3_location+:: the S3 Location in which the
        #                 manifest file is stored
        # +scope+:: whether file entries in the
        #           manifest should be scoped to
        #           filename, relative path, absolute
        #           path, or absolute path and bucket
        Contract Location, ManifestScope => nil
        def initialize(s3_location, scope)
          @s3_location = s3_location
          @scope = scope
          @manifest_file = "%ssluice-%s-manifest" % [s3_location.dir_as_path, scope.to_s]
          nil
        end

        # Get the current file entries in the manifest
        #
        # Parameters:
        # +s3+:: A Fog::Storage s3 connection
        #
        # Returns an Array of filenames as Strings
        Contract FogStorage => ArrayOf[String]
        def get_entries(s3)

          manifest = self.class.get_manifest(s3, @s3_location, @manifest_file)
          if manifest.nil?
            return []
          end

          manifest.body.split("\n").reject(&:empty?)
        end

        # Add (i.e. append) the following file entries
        # to the manifest.
        # Files listed previously in the manifest will
        # be kept in the new manifest file.
        #
        # Parameters:
        # +s3+:: A Fog::Storage s3 connection
        # +entries+:: an Array of filenames as Strings
        #
        # Returns all entries now in the manifest
        Contract FogStorage, ArrayOf[String] => ArrayOf[String]
        def add_entries(s3, entries)

          existing = get_entries(s3)
          filenames = entries.map { |filepath|
            File.basename(filepath)
          } # TODO: update when non-filename-based manifests supported
          all = (existing + filenames)

          manifest = self.class.get_manifest(s3, @s3_location, @manifest_file)
          body = all.join("\n")
          if manifest.nil?
            bucket = s3.directories.get(s3_location.bucket).files.create(
              :key => @manifest_file,
              :body => body
            )
          else
            manifest.body = body
            manifest.save
          end

          all
        end

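        private

        # Helper to get the manifest file
        # NOTE: `private` has no effect on methods defined with `def self.`,
        # so this helper is still publicly callable; get_entries and
        # add_entries rely on that when calling self.class.get_manifest.
        Contract FogStorage, Location, String => Maybe[FogFile]
        def self.get_manifest(s3, s3_location, filename)
          s3.directories.get(s3_location.bucket, prefix: s3_location.dir).files.get(filename) # TODO: break out into new generic get_file() procedure
        end

      end

    end
  end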
end
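
A minimal sketch of the two pure-string steps in `Manifest`: building the manifest key and merging new entries into an existing body. It deliberately leaves out Fog and Contracts so it can run standalone. `manifest_key` and `merged_manifest_body` are hypothetical helper names, not part of Sluice; the format string and the basename logic are taken from the code above.

```ruby
# Hypothetical helper: the manifest key for a directory path and scope,
# following the "%ssluice-%s-manifest" format used in Manifest#initialize.
def manifest_key(dir_as_path, scope)
  '%ssluice-%s-manifest' % [dir_as_path, scope.to_s]
end

# Hypothetical helper: merge new file paths into an existing manifest body,
# keeping only basenames, as add_entries does for the :filename scope.
def merged_manifest_body(existing_body, filepaths)
  existing  = existing_body.to_s.split("\n").reject(&:empty?)
  filenames = filepaths.map { |filepath| File.basename(filepath) }
  (existing + filenames).join("\n")
end

puts manifest_key('in/', :filename)
# prints in/sluice-filename-manifest
puts merged_manifest_body("a.gz\nb.gz\n", ['in/2014/c.gz'])
# prints a.gz, b.gz and c.gz on separate lines
```

Like `add_entries`, this sketch appends without de-duplicating, so adding the same filename twice leaves two entries in the manifest.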