version 0.3.0: fixing accidental flattening of array inputs

qiaopeng55 · Nov 7, 2009 · 8e3af0e
1 parent 4009f6a

Showing 8 changed files with 36 additions and 7 deletions.
diff --git a/cloud-crowd.gemspec b/cloud-crowd.gemspec
@@ -1,7 +1,7 @@
 Gem::Specification.new do |s|
   s.name      = 'cloud-crowd'
-  s.version   = '0.2.9'         # Keep version in sync with cloud-crowd.rb
+  s.version   = '0.3.0'         # Keep version in sync with cloud-crowd.rb
-  s.date      = '2009-11-03'
+  s.date      = '2009-11-06'
 
   s.homepage    = "http://wiki.github.com/documentcloud/cloud-crowd"
   s.summary     = "Parallel Processing for the Rest of Us"

diff --git a/lib/cloud-crowd.rb b/lib/cloud-crowd.rb
@@ -44,7 +44,7 @@ module CloudCrowd
   autoload :WorkUnit,     'cloud_crowd/models'
 
   # Keep this version in sync with the gemspec.
-  VERSION        = '0.2.9'
+  VERSION        = '0.3.0'
 
   # Increment the schema version when there's a backwards incompatible change.
   SCHEMA_VERSION = 3
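The version string has to be bumped by hand in two places, `cloud-crowd.gemspec` and `lib/cloud-crowd.rb`. The sketch below is a hypothetical release check and is not part of CloudCrowd. It pulls the quoted version out of each file's text with a regex and compares the two. To stay self-contained it uses inline sample strings. A real check would pass in `File.read('cloud-crowd.gemspec')` and `File.read('lib/cloud-crowd.rb')` instead.

```ruby
# Hypothetical release check (not part of CloudCrowd): both files carry the
# version as a single-quoted string, so a regex can extract each one.
GEMSPEC = "  s.version   = '0.3.0'         # Keep version in sync\n"
LIBRARY = "  VERSION        = '0.3.0'\n  SCHEMA_VERSION = 3\n"

# Matches the `s.version = '...'` line inside the Gem::Specification block.
def gemspec_version(text)
  text[/s\.version\s*=\s*'([^']+)'/, 1]
end

# \b keeps this from matching SCHEMA_VERSION, since "_" and "V" are both word
# characters and there is no word boundary between them.
def library_version(text)
  text[/\bVERSION\s*=\s*'([^']+)'/, 1]
end

gem_v = gemspec_version(GEMSPEC)
lib_v = library_version(LIBRARY)
puts(gem_v == lib_v ? "in sync at #{gem_v}" : "mismatch: #{gem_v} vs #{lib_v}")
```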

diff --git a/lib/cloud_crowd/models/job.rb b/lib/cloud_crowd/models/job.rb
@@ -45,7 +45,7 @@ def check_for_completion
       return unless all_work_units_complete?
       set_next_status    
       outs = gather_outputs_from_work_units
-      return queue_for_workers(outs) if merging?
+      return queue_for_workers([outs]) if merging?
       if complete?
         update_attributes(:outputs => outs, :time => time_taken)
         Thread.new { fire_callback } if callback_url
@@ -177,7 +177,7 @@ def gather_outputs_from_work_units
     # away.
     def queue_for_workers(input=nil)
       input ||= JSON.parse(self.inputs)
-      [input].flatten.each {|i| WorkUnit.start(self, action, i, status) }        
+      input.each {|i| WorkUnit.start(self, action, i, status) }        
       self
     end
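The bug comes from `Array#flatten`, which removes every level of nesting. Wrapping the inputs in an array and flattening them therefore turns `[[1,2], [3,4]]` into four scalar work units instead of two array-valued ones. The two methods below are hypothetical stand-ins for `queue_for_workers`. They return the list of inputs that would be handed to `WorkUnit.start` instead of starting the units. The last lines illustrate why `check_for_completion` now passes `[outs]`. Wrapping the gathered outputs in a one-element array means the merge step starts exactly one work unit, whatever `outs` contains. The `outs` value used here is invented for illustration.

```ruby
# Hypothetical stand-ins for Job#queue_for_workers: each returns the inputs
# that would be passed to WorkUnit.start, one entry per work unit.

def units_before_fix(input)
  # 0.2.9 behaviour: wrap and flatten, which strips *all* nesting.
  [input].flatten
end

def units_after_fix(input)
  # 0.3.0 behaviour: iterate only over the top level of the inputs array.
  units = []
  input.each { |i| units << i }
  units
end

inputs = [[1, 2], [3, 4]]
p units_before_fix(inputs)  # => [1, 2, 3, 4]       (four work units)
p units_after_fix(inputs)   # => [[1, 2], [3, 4]]   (two work units)

# Merge step: wrapping the gathered outputs yields a single merging unit.
outs = ["a.pdf", "b.pdf"]   # hypothetical gathered outputs
p units_after_fix([outs])   # => [["a.pdf", "b.pdf"]]  (one work unit)
```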
 

diff --git a/lib/cloud_crowd/models/work_unit.rb b/lib/cloud_crowd/models/work_unit.rb
@@ -87,6 +87,7 @@ def self.find_by_worker_name(name)
 
     # Convenience method for starting a new WorkUnit.
     def self.start(job, action, input, status)
+      input = input.to_json unless input.is_a? String
       self.create(:job => job, :action => action, :input => input, :status => status)
     end
 
@@ -97,7 +98,6 @@ def self.start(job, action, input, status)
     def finish(result, time_taken)
       if splitting?
         [parsed_output(result)].flatten.each do |new_input|
-          new_input = new_input.to_json unless new_input.is_a? String
           WorkUnit.start(job, action, new_input, PROCESSING)
         end
         self.destroy
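Before this commit, only the splitting branch of `finish` converted non-String inputs to JSON. After the change, `queue_for_workers` can pass whole arrays to `WorkUnit.start`, so the conversion moves into `start` and every caller is handled the same way. The helper below is a hypothetical extraction of that one added line. Strings pass through untouched, and anything else is stored as its JSON encoding.

```ruby
require 'json'

# Hypothetical helper mirroring the line added to WorkUnit.start: values that
# are not already Strings are stored as their JSON encoding.
def serialize_input(input)
  input.is_a?(String) ? input : input.to_json
end

p serialize_input([1, 2])                      # => "[1,2]"
p serialize_input({ 'page' => 3 })             # => "{\"page\":3}"
p serialize_input('http://example.com/a.pdf')  # => "http://example.com/a.pdf"
```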

diff --git a/test/unit/test_job.rb b/test/unit/test_job.rb
@@ -65,6 +65,11 @@ class JobTest < Test::Unit::TestCase
       assert job.splitting?
     end
 
+    should "not accidentally flatten array inputs" do
+      job = Job.create_from_request({'inputs' => [[1,2], [3,4]], 'action' => 'process_pdfs'})
+      assert JSON.parse(job.work_units.first.input) == [1,2]
+    end
+
     should "fire a callback when a job has finished, successfully or not" do
       @job.update_attribute(:callback_url, 'http://example.com/callback')
       Job.any_instance.stubs(:fire_callback).returns(true)
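The new test checks that the first work unit of a job created with `[[1,2], [3,4]]` receives `[1,2]` intact. Here is a database-free analogue of that check, useful for seeing the two changes work together. `FakeJob` and `FakeWorkUnit` are hypothetical in-memory stand-ins, reduced to the behaviour shown in this diff. The job keeps its inputs as a JSON string, which is why `queue_for_workers` calls `JSON.parse(self.inputs)`. They are not the real ActiveRecord models.

```ruby
require 'json'

# Hypothetical in-memory stand-in for WorkUnit (no database).
class FakeWorkUnit
  attr_reader :input

  def self.all
    @all ||= []
  end

  # Mirrors WorkUnit.start after the fix: non-String inputs are serialized.
  def self.start(input)
    input = input.to_json unless input.is_a?(String)
    unit = new(input)
    all << unit
    unit
  end

  def initialize(input)
    @input = input
  end
end

# Hypothetical stand-in for Job: inputs are kept as a JSON string.
class FakeJob
  def initialize(inputs)
    @inputs = inputs.to_json
  end

  # Mirrors queue_for_workers after the fix: one unit per top-level input.
  def queue_for_workers(input = nil)
    input ||= JSON.parse(@inputs)
    input.each { |i| FakeWorkUnit.start(i) }
    self
  end
end

FakeJob.new([[1, 2], [3, 4]]).queue_for_workers
p FakeWorkUnit.all.map(&:input)             # => ["[1,2]", "[3,4]"]
p JSON.parse(FakeWorkUnit.all.first.input)  # => [1, 2]
```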

diff --git a/wiki/change_log.textile b/wiki/change_log.textile
@@ -1,6 +1,11 @@
+h3. Version 0.3.0
+
+* Fixed a bug where passing arrays as @inputs@ to an action would cause the 
+  arrays to be flattened.
+
 h3. Version 0.2.9
 
-* Added CloudCrowd.server? and CloudCrowd.node? methods, so that action
+* Added @CloudCrowd.server?@ and @CloudCrowd.node?@ methods, so that action
   dependencies can more easily be required only in the node.
 
 h3. Version 0.2.8

diff --git a/wiki/home.textile b/wiki/home.textile
@@ -19,6 +19,8 @@ sudo gem install cloud-crowd
 
 [[The Configuration Folder]]
 
+[[CloudCrowd on Rails]]
+
 [[Gallery of Actions]]
 
 [[Wish List]]

diff --git a/wiki/rails.textile b/wiki/rails.textile
@@ -0,0 +1,17 @@
+CloudCrowd can share code with your Rails applications, taking advantage of pre-existing models and performing expensive database operations in parallel. The precise technique to accomplish this may vary: not all applications will need to load the entire Rails stack in order to perform a job. At *DocumentCloud*, we store our actions in @app/actions@ and load the Rails environment in order to seamlessly access all of our models, just as during a real web request. Because CloudCrowd forks a worker for each work unit, Rails is loaded only once, when a node spins up, and the child processes don't incur the expense of loading it.
+
+Here's an example snippet of code that we use to set up the Rails environment for an action. The @CloudCrowd.node?@ check ensures that the Rails stack is never loaded on the central server.
+
+<pre>
+# Inherit Rails environment from Sinatra.
+RAILS_ROOT = File.expand_path(Dir.pwd)
+RAILS_ENV = ENV['RAILS_ENV'] = ENV['RACK_ENV']
+
+# Load the DocumentCloud environment if we're in a Node context.
+if CloudCrowd.node?
+  require 'rubygems'
+  require 'active_record'
+  ActiveRecord::Base.logger = Logger.new(STDOUT)
+  require File.join(RAILS_ROOT, 'config', 'environment')
+end
+</pre>
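The page says the Rails stack is loaded only once per node because each work unit runs in a forked child. The general Unix mechanism behind this can be sketched in plain Ruby. A child created with `fork` inherits a copy of the parent's memory, so anything loaded before forking is already present in every child. Here `load_environment` is a hypothetical, deliberately slow stand-in for requiring `config/environment`, and it counts how often it runs. This illustrates the fork pattern only and is not CloudCrowd's actual worker code. `fork` is unavailable on Windows.

```ruby
# Hypothetical stand-in for an expensive boot step such as loading
# config/environment; it counts how many times it actually runs.
$boot_count = 0

def load_environment
  $boot_count += 1
  sleep 0.1 # pretend this is slow
  { :models => %w[Document Page] }
end

# Runs once, in the parent process (the node).
APP_ENV = load_environment

# Flush before forking so buffered output is not duplicated in the children.
$stdout.flush

# One child per work unit. Each child inherits APP_ENV from the parent's
# memory and never calls load_environment itself.
pids = (1..3).map do |n|
  fork do
    ok = APP_ENV[:models].size == 2 && $boot_count == 1
    puts "work unit #{n}: models=#{APP_ENV[:models].join(',')} boots=#{$boot_count}"
    $stdout.flush
    exit!(ok) # exit without running at_exit handlers inherited from the parent
  end
end

statuses = pids.map { |pid| Process.wait2(pid).last }
puts "environment loaded #{$boot_count} time(s) in the parent"
```

Each child's exit status reports whether it saw the already-loaded environment without booting again.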