Fleshed out schema export

1 parent 36a8112 commit 2b92784d4a754700e84225ca1585f72f0ddaa969 Philip (flip) Kromer committed Oct 6, 2009
Showing with 322 additions and 40 deletions.
  1. +63 −0 INSTALL.textile
  2. +16 −1 README.textile
  3. +18 −14 lib/wukong/datatypes/enum.rb
  4. +17 −0 lib/wukong/datatypes/fake_types.rb
  5. +208 −25 lib/wukong/schema.rb
@@ -0,0 +1,63 @@
+---
+layout: default
+title: Install Notes
+---
+h1(gemheader). {{ site.gemname }} %(small):: install%
+
+<notextile><div class="toggle"></notextile>
+
+h2. Get the code
+
+Wukong is still under active development. The newest version is available via "Git":http://git-scm.com on "github:":http://github.com/mrflip/{{ site.gemname }}
+
+pre. $ git clone git://github.com/mrflip/{{ site.gemname }}
+
+A gem is available from "github:":http://gems.github.com
+
+pre. $ sudo gem install mrflip-{{ site.gemname }} --source=http://gems.github.com
+
+or from "gemcutter":http://gemcutter.org
+
+pre. $ sudo gem install {{ site.gemname }} --source=http://gemcutter.org
+
+You can instead download this project in either "zip":http://github.com/mrflip/{{ site.gemname }}/zipball/master or "tar":http://github.com/mrflip/{{ site.gemname }}/tarball/master formats.
+
+<notextile></div><div class="toggle"></notextile>
+
+h2. Get the Dependencies
+
+* Hadoop, pig
+* extlib, YAML, JSON
+* Optional gems: trollop, addressable/uri, htmlentities
+
+
+<notextile></div><div class="toggle"></notextile>
+
+h2. Setup
+
+1. Allow Wukong to discover where his elephant friend lives: either
+** set a $HADOOP_HOME environment variable,
+** or create a file 'config/wukong-site.yaml' with a line that points to the top-level directory of your hadoop install: @:hadoop_home: /usr/local/share/hadoop@
+2. Add wukong's @bin/@ directory to your $PATH, so that you may use its filesystem shortcuts.
+
+<notextile></div></notextile>
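
The two discovery options in the Setup steps above could be resolved roughly as in the sketch below. This is illustrative only and is not taken from the Wukong source: the helper name @resolve_hadoop_home@ is hypothetical, and checking the environment variable before the YAML file is an assumption, since the install notes do not say which one wins.

```ruby
require 'yaml'

# Hypothetical helper (not part of Wukong): find the Hadoop install directory.
# Assumption: $HADOOP_HOME takes priority over config/wukong-site.yaml.
#
# env       - a Hash-like object such as ENV
# site_yaml - the text of config/wukong-site.yaml, or nil if the file is absent
def resolve_hadoop_home(env = ENV, site_yaml = nil)
  from_env = env['HADOOP_HOME']
  return from_env unless from_env.nil? || from_env.empty?
  return nil if site_yaml.nil?

  # The documented line ":hadoop_home: /path" uses a Symbol key,
  # so Symbol has to be permitted when loading safely.
  settings = YAML.safe_load(site_yaml, permitted_classes: [Symbol]) || {}
  settings[:hadoop_home]
end
```

Passing the environment and the file contents as arguments keeps the sketch easy to try out without touching a real Hadoop install.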
+
+h2. Installing and Running Wukong under Hadoop
+
+Wukong is used by many in a non-Hadoop environment -- anywhere you can stream data records you can unleash its monkey power. It was developed for Hadoop, though, and we think it's actually the best (and certainly the most fun) way to use Hadoop.
+
+h3. Set up a Hadoop cluster
+
+To set up Hadoop, your best bet is the Cloudera AMIs on Amazon's EC2 compute cloud:
+
+* http://www.cloudera.com/hadoop-ec2
+* http://www.cloudera.com/hadoop-ec2-ebs-beta
+
+EC2 means anyone with a $10 bill can rent a 10-machine cluster with 1TB of distributed storage for 8 hours.
+
+If you have a local cluster, or just want to experiment with a single-machine install, check out the Cloudera packages for both Debian/Ubuntu-based and Red Hat/RPM-based Linux systems.
+
+h3. Run Wukong on the Amazon AWS EC2 Cloud
+
+Phil Ripperger has prepared "instructions on getting wukong to work on the Amazon AWS cloud":http://blog.pdatasolutions.com/post/191978092/ruby-on-hadoop-quickstart that are better than anything we could put here. Thanks Phil!
+
@@ -12,6 +12,22 @@ Wukong is friends with "Hadoop":http://hadoop.apache.org/core the elephant, "Pig
The main documentation -- including tutorials and tips for working with big data -- lives on the "Wukong Pages":http://mrflip.github.com/wukong and there is some supplemental information on the "wukong wiki.":http://wiki.github.com/mrflip/wukong
+h2. Install
+
+Wukong is still under active development. The newest version is available at
+
+ http://github.com/mrflip/wukong
+
+A gem is available from "github:":http://gems.github.com
+
+ gem install mrflip-wukong --source=http://gems.github.com
+
+or from "gemcutter":http://gemcutter.org
+
+ gem install wukong --source=http://gemcutter.org
+
+Phil Ripperger has prepared "instructions on getting wukong to work on the Amazon AWS cloud.":http://blog.pdatasolutions.com/post/191978092/ruby-on-hadoop-quickstart Thanks Phil!
+
h2. How to write a Wukong script
Here's a script to count words in a text stream:
@@ -165,7 +181,6 @@ h2. Setup
2. Add wukong's @bin/@ directory to your $PATH, so that you may use its filesystem shortcuts.
-
h2. How to run a Wukong script
To run your script using local files and no connection to a hadoop cluster,
@@ -26,25 +26,38 @@ def initialize val
def self.[] *args
new *args
end
+
+ # returns the value corresponding to that string representation
+ def index *args
+ # delegate
+ self.class.names.index *args
+ end
+
+ # Representations:
def to_i
val
end
def to_s
return nil if val.nil?
self.class.names[val]
end
+
def inspect
"<#{self.class.to_s} #{to_i} (#{to_s})>"
end
- # returns the value corresponding to that string representation
- def index *args
- # delegate
- self.class.names.index *args
- end
+
def to_flat
to_s #to_i
end
+ def self.to_sql_str
+ "ENUM('#{names.join("', '")}')"
+ end
+
+ def self.to_pig
+ 'chararray'
+ end
+
#
# Use enumerates to set the class' names
#
@@ -57,17 +70,8 @@ def to_flat
def self.enumerates *names
self.names = names.map(&:to_s)
end
-
- def self.to_sql_str
- "ENUM('#{names.join("', '")}')"
- end
-
- def self.typify
- 'chararray'
- end
end
-
#
# Note that bin 0 is
#
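
To see how the methods moved around in the enum.rb hunks fit together, here is a minimal self-contained sketch. It is a stand-in written for illustration, not the Wukong class itself: @SketchEnum@ and @Weekday@ are hypothetical names, and storing @names@ in a class-level instance variable is an assumption about how the real class keeps its list.

```ruby
# Illustrative stand-in for the Enum class touched by this diff (not Wukong source).
class SketchEnum
  class << self
    # Assumption: each subclass keeps its own list of names.
    attr_accessor :names

    # Use enumerates to set the class' names
    def enumerates(*names)
      self.names = names.map(&:to_s)
    end

    def [](*args)
      new(*args)
    end

    # Schema export: SQL column type for this enum
    def to_sql_str
      "ENUM('#{names.join("', '")}')"
    end

    # Schema export: Pig type (the enum is flattened to its string name)
    def to_pig
      'chararray'
    end
  end

  attr_reader :val

  def initialize(val)
    @val = val
  end

  # Position of a name within the class' names (delegates to Array#index)
  def index(*args)
    self.class.names.index(*args)
  end

  def to_i
    val
  end

  def to_s
    return nil if val.nil?
    self.class.names[val]
  end

  def inspect
    "<#{self.class} #{to_i} (#{to_s})>"
  end

  def to_flat
    to_s
  end
end

# Hypothetical enum used only for this example
class Weekday < SketchEnum
  enumerates :mon, :tue, :wed
end

puts Weekday[1].inspect
puts Weekday.to_sql_str
```

The instance methods give the per-record representations (@to_i@, @to_s@, @to_flat@), while the class methods describe the column type for a target system, which is why the diff groups them separately.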
@@ -0,0 +1,17 @@
+module Wukong
+ module Datatypes
+ class Text < String ; end unless defined?(Text)
+ class Blob < String ; end unless defined?(Blob)
+ class Boolean < Integer ; end unless defined?(Boolean)
+ class BigDecimal < Float ; end unless defined?(BigDecimal)
+ class EpochTime < Integer ; end unless defined?(EpochTime)
+ class FilePath < String ; end unless defined?(FilePath)
+ class Flag < String ; end unless defined?(Flag)
+ class IPAddress < String ; end unless defined?(IPAddress)
+ class URI < String ; end unless defined?(URI)
+ class Csv < String ; end unless defined?(Csv)
+ class Yaml < String ; end unless defined?(Yaml)
+ class Json < String ; end unless defined?(Json)
+ class Regex < Regexp ; end unless defined?(Regex)
+ end
+end
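
The placeholder classes above carry no behavior of their own; their value is that a schema exporter can tell them apart by class. The sketch below shows one way that could work, under loud assumptions: the module @SketchTypes@, the @PIG_TYPES@ table, and the @pig_type@ helper are all hypothetical and are not taken from lib/wukong/schema.rb, which is not shown in this diff. Note that @defined?@ follows normal constant lookup, so inside a module it also sees top-level constants of the same name.

```ruby
# Hypothetical illustration of guarded placeholder types (not Wukong source).
module SketchTypes
  # Define a marker class only if no constant of that name is already visible.
  class Text    < String  ; end unless defined?(Text)
  class Boolean < Integer ; end unless defined?(Boolean)

  # Without the guard this line would raise a superclass mismatch TypeError;
  # with it, the definition is skipped because Text already exists.
  class Text < Symbol ; end unless defined?(Text)

  # Assumed mapping from Ruby base classes to Pig type names.
  PIG_TYPES = { String => 'chararray', Integer => 'int', Float => 'double' }.freeze

  # Walk a class's ancestors and return the Pig type of the first known base.
  def self.pig_type(klass)
    base = klass.ancestors.find { |ancestor| PIG_TYPES.key?(ancestor) }
    PIG_TYPES.fetch(base)
  end
end

puts SketchTypes.pig_type(SketchTypes::Text)
puts SketchTypes.pig_type(SketchTypes::Boolean)
```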