Initial factored-out Skywriting bindings.

Commit e8dce2851203c888a10fef20903e6646e038c166 (parent b4fb54d), committed by @mrry on Jul 28, 2011.
Showing 425 changed files with 15 additions and 36,426 deletions.
@@ -1,24 +0,0 @@
-<?xml version="1.0" encoding="UTF-8"?>
-<classpath>
- <classpathentry kind="src" path="src/java"/>
- <classpathentry kind="src" path="examples/skyhout/src/java"/>
- <classpathentry kind="src" path="examples/tests/src/java"/>
- <classpathentry kind="src" path="examples/Pi/src/java"/>
- <classpathentry kind="src" path="examples/SmithWaterman/src/java"/>
- <classpathentry kind="src" path="examples/Grep/src"/>
- <classpathentry kind="src" path="examples/TeraSort/src"/>
- <classpathentry kind="src" path="examples/WordCount/src"/>
- <classpathentry kind="src" path="examples/kmeans/src/java"/>
- <classpathentry kind="con" path="org.eclipse.jdt.launching.JRE_CONTAINER"/>
- <classpathentry kind="lib" path="ext/google-gson-1.6/gson-1.6.jar"/>
- <classpathentry kind="lib" path="ext/mahout-0.3/lib/commons-logging-1.1.1.jar"/>
- <classpathentry kind="lib" path="ext/mahout-0.3/lib/gson-1.3.jar"/>
- <classpathentry kind="lib" path="ext/mahout-0.3/lib/hadoop-core-0.20.2.jar"/>
- <classpathentry kind="lib" path="ext/mahout-0.3/lib/slf4j-api-1.5.8.jar"/>
- <classpathentry kind="lib" path="ext/mahout-0.3/lib/slf4j-jcl-1.5.8.jar"/>
- <classpathentry kind="lib" path="ext/mahout-0.3/lib/uncommons-maths-1.2.jar"/>
- <classpathentry kind="lib" path="ext/mahout-0.3/mahout-collections-0.3.jar"/>
- <classpathentry kind="lib" path="ext/mahout-0.3/mahout-core-0.3.jar"/>
- <classpathentry kind="lib" path="ext/mahout-0.3/mahout-math-0.3.jar"/>
- <classpathentry kind="output" path="bin"/>
-</classpath>
@@ -1,33 +0,0 @@
-<?xml version="1.0" encoding="UTF-8"?>
-<projectDescription>
- <name>ciel</name>
- <comment>Cloned from default (/auto/homes/dgm36/projects/PhD/mercator.hg)</comment>
- <projects>
- </projects>
- <buildSpec>
- <buildCommand>
- <name>org.eclipse.jdt.core.javabuilder</name>
- <arguments>
- </arguments>
- </buildCommand>
- <buildCommand>
- <name>org.python.pydev.PyDevBuilder</name>
- <arguments>
- </arguments>
- </buildCommand>
- <buildCommand>
- <name>org.eclipse.ui.externaltools.ExternalToolBuilder</name>
- <triggers>full,incremental,</triggers>
- <arguments>
- <dictionary>
- <key>LaunchConfigHandle</key>
- <value>&lt;project&gt;/.externalToolBuilders/Examples.launch</value>
- </dictionary>
- </arguments>
- </buildCommand>
- </buildSpec>
- <natures>
- <nature>org.eclipse.jdt.core.javanature</nature>
- <nature>org.python.pydev.pythonNature</nature>
- </natures>
-</projectDescription>
@@ -1,11 +0,0 @@
-<?xml version="1.0" encoding="UTF-8" standalone="no"?>
-<?eclipse-pydev version="1.0"?>
-
-<pydev_project>
-<pydev_property name="org.python.pydev.PYTHON_PROJECT_VERSION">python 2.6</pydev_property>
-<pydev_property name="org.python.pydev.PYTHON_PROJECT_INTERPRETER">Default</pydev_property>
-<pydev_pathproperty name="org.python.pydev.PROJECT_SOURCE_PATH">
-<path>/ciel/src/python</path>
-</pydev_pathproperty>
-
-</pydev_project>
@@ -1,2 +1 @@
-include src/python/skywriting/runtime/lighttpd.conf
include src/sw/stdlib/*
README (12 changes)
@@ -1,12 +1,8 @@
-CIEL is an execution engine for distributed, parallel
-computation. Like previous frameworks (such as MapReduce and Dryad),
-it masks the complexity of distributed programming. Unlike those
-frameworks, it offers a full general-purpose programming language,
-called Skywriting, for expressing distributed coordination, which
-enables developers to implement algorithms using arbitrary task
-structures.
+Skywriting is a full general-purpose programming language for
+expressing distributed coordination, which enables developers to
+implement algorithms using arbitrary task structures.
For documentation, tutorials and examples, please visit:
-http://www.cl.cam.ac.uk/netos/ciel/
+http://www.cl.cam.ac.uk/netos/ciel/skywriting/
@@ -1,48 +0,0 @@
-To run the examples shipped with Ciel:
-
-DEPENDENCIES:
-
-1. Python 2.6
-2. Python packages: pycurl, cherrypy
-3. Java runtime environment available and in PATH
-
-BUILD EXAMPLES:
-
-In the repository root:
-
-./build-all.sh
-
-This will:
-a. Retrieve Hadoop/Mahout JARs to ext/mahout-0.3
-b. Compile Java examples found under examples/, leaving .class files in build/ and .jar files in dist/
-c. Generate input data for some of the examples, stored under data/
-
-RUNNING A LOCAL CLUSTER:
-
-cd scripts
-./run_master.sh
-
-In another terminal:
-
-./run_worker.sh
-
-To launch further workers:
-
-WORKER_PORT=N ./run_worker.sh
-where N is any port other than the default of 9001
-
-RUNNING EXAMPLES:
-
-cd scripts
-PYTHONPATH=../src/python ./sw-start-job ../src/packages/[eg-name].pack
-where eg-name is one of:
-
-java_test: Basic functionality test
-pi: Compute pi. Data-independent.
-grep: Searches for the word 'for' in the local dictionary, or another file if DATAREF is specified in the environment
-wordcount-java: Counts all words in its input, by default the local dictionary; set DATAREF in the environment to use another file
-kmeans: Computes kmeans clusters over some random data. Set KMEANS_VECTORS and KMEANS_CLUSTERS in the environment to override the default example data files. See the script (repo root)/gen-data.sh for an example of data generation.
-pagerank: Computes pagerank given a random web graph. Set PAGERANK_GRAPH_FILE in the environment to override its source.
-smithwaterman: Performs Smith-Waterman gene alignment using random input data
-
-At present most examples output the name of a reference rather than an intelligible result; this is to be improved.
@@ -1,51 +0,0 @@
-MapReduce:
-----------
-
-// inputs is a list of file references
-inputs = [file(pc1, /tmp/input1), file(pc1, /tmp/input2), file(pc2, /tmp/input3), file(pc2, /tmp/input4)];
-mapper = "uk.ac.cam.cl.cloudmake.Mapper";
-reducer = "uk.ac.cam.cl.cloudmake.Reducer";
-num_reducers = 10;
-
-outputs = map_reduce(inputs, mapper, reducer, num_reducers);
-
-function map_reduce(input_partitions, map_class, reduce_class, num_reducers) {
- map_outputs = [];
- foreach (i in range(0, len(input_partitions))) {
- map_outputs[i] = run_java_class(map_class, input_partitions[i], num_reducers) ;
- }
- reduce_outputs = [];
- foreach (i in range(0, num_reducers)) {
- reducer_inputs = []
- foreach (j in range(0, len(input_partitions))) {
- reducer_inputs[j] = map_outputs[j][i];
- }
- reduce_outputs[i] = run_java_class(reduce_class, reducer_inputs, 1);
- }
- return reduce_outputs;
-
-}
-
-Power iteration (PageRank):
----------------------------
-
-inputs = [file(pc1, /tmp/input1), file(pc2, /tmp/input2), file(pc3, /tmp/input3), ...];
-curr_vector = file(pc1, /tmp/vector);
-
-do {
- result_parts = [];
- foreach (i in range(len(inputs))) {
- result_parts[i] = run_c_program("matrix_vector", inputs[i], curr_vector);
- }
- new_vector = combine(result_parts);
- converged = compare(curr_vector, new_vector, EPSILON);
-} while (!*converged);
-
-Halo swapping:
---------------
-
-// m x n array of inputs
-m = M;
-n = N;
-curr_state = [ [ cell_1_1, cell_1_2, cell_1_3, ... ], [ cell_2_1, ... ], [ cell_3_1, ... ], ... ];
-
@@ -1,69 +0,0 @@
-// MapReduce
-
-num_reduces = 10;
-
-map_inputs = [http('http://www.mrry.co.uk/input00'),
- http('http://www.mrry.co.uk/input01'),
- http('http://www.mrry.co.uk/input02'),
- ...,
- http('http://www.mrry.co.uk/input80')];
-
-function map_reduce(map_inputs, num_reduces, map, reduce) {
-
- map_outputs = [map(m, num_reduces) for m in map_inputs];
- // or
- map_outputs = []
- for i in range(len(map_inputs))
- map_outputs[i] = map(map_inputs[i], num_reduces)
-
- reduce_inputs = [[map_outputs[j][i] for j in range(len(map_inputs))] for i in range(num_reduces)];
- // or
- reduce_inputs = [];
- for i in range(num_reduces) {
- reduce_inputs[i] = [];
- for j in range(len(map_inputs))
- reduce_inputs[i][j] = map_outputs[j][i];
- }
-
- reduce_outputs = [reduce(r) for r in reduce_inputs];
- // or
- reduce_outputs = [];
- for i in range(len(reduce_inputs))
-  reduce_outputs[i] = reduce(reduce_inputs[i]);
-
-}
-
-// Halo-swap
-
-grid_x = 10;
-grid_y = 10;
-
-input_data = file('/home/initial_climate_settings');
-
-init_data = partition_data(input_data, grid_x, grid_y);
-
-curr_iter = []
-n_halo = []
-s_halo = []
-e_halo = []
-w_halo = []
-
-for i in range(grid_x) {
- curr_iter[i] = [];
- for j in range(grid_y) {
- curr_iter[i][j] = compute_first(init_data[i][j]);
- }
-}
-
-do {
- next_iter = [];
- for i in range(grid_x) {
- next_iter[i] = []
- for j in range(grid_y) {
- next_iter[i][j] = compute(curr_iter[i][j].data,
- curr_iter[i_left][j].e_halo,
- ...);
- }
- }
- should_term = summarise(next_iter);
-} while (!*should_term);
@@ -1,60 +0,0 @@
-
-Ciel now comes equipped with a primitive distributed filesystem to make the process of starting jobs that depend on multiple files easier.
-
-scripts/sw-start-job is a helper script that populates the DFS namespace at job startup time.
-
-Its input is the name of a file containing JSON data that describes the desired namespace and the job startup parameters. The format is as follows:
-
-{"package":
- {
- "name1": <filespec>,
- "name2": <filespec>
- },
- "start":
- {
- "handler": <handler_name>,
- "args": <args_dict>
- }
-}
-
-Where <filespec> has the form: {"filename": <filename>, ["index": <boolean>]}. The square brackets around 'index' indicate it is optional, not that it is a list.
-
-If "index" is not specified or is false, the file will be synchronously uploaded to the cluster master and a reference associated with the corresponding name.
-If it is true, the file will be uploaded in chunks to the cluster's workers, an index created and a reference to that index returned. This is identical to the action the sw-load script.
-
-At present this functionality is quite limited, but I plan to add more possible sources of files and to permit more fine-grained control of sw-load.
-
-Looking up references against the DFS namespace varies from language to language, but to use Skywriting as an example, 'x = package("name1");' would retrieve either the reference or index-reference corresponding to that name.
-
-The <handler_name> and <args_dict> parameters specify how the first task in the new job should be started. They are passed verbatim to the "init" executor, which will then spawn a task with the given handler and arguments. The required keys in args_dict vary from handler to handler, but again to give a Skywriting example, "handler": "swi", "args": {"sw_file_ref": <ref>}, where <ref> is a reference to a Skywriting file, would start the job by running the given file with no arguments or environment.
-
-PACKAGE REFERENCES
-
-It is frequently necessary to refer to references in the DFS namespace when passing the initial task args_dict. This can be done using the notation {"__package__": "name"}. To give a complete example, a very simple Skywriting script could be uploaded and run using:
-
-{"package": {"swmain": "~/test.sw"}, "start": {"handler": "swi", "args": {"sw_file_ref": {"__package__": "swmain"}}}}
-
-ARGUMENTS AND ENVIRONMENT VARIABLES
-
-sw-start-job supports simple customisation of package descriptors: the constructs:
-
-{"__env__": "<Unix environment variable name>", ["default": <default value>]} and {"__args__": <integer>, ["default": <default value>]}
-
-can be used in both package and start dictionaries.
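As an illustrative sketch only (the environment variable name SW_SCRIPT, the descriptor filename my-job.pack, and the exact substitution point are assumptions rather than documented behaviour), the one-file example above could be parameterised with "__env__" like so:

{"package": {"swmain": {"__env__": "SW_SCRIPT", "default": "~/test.sw"}}, "start": {"handler": "swi", "args": {"sw_file_ref": {"__package__": "swmain"}}}}

It could then be launched in the same way as the packaged examples, e.g. SW_SCRIPT=/path/to/other.sw PYTHONPATH=../src/python ./sw-start-job my-job.pack, where my-job.pack contains the descriptor above.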
-
-SIMPLER JOBS
-
-This form is rather onerous for describing very simple jobs that have no need for the distributed filesystem. Simple driver programs are supplied to run one-file Skywriting and SkyPy jobs: sw-job and skypy-job respectively. Both are simple wrapper programs that generate and run trivial package descriptors.
-
-MORE COMPLEX JOBS
-
-Naturally this form can't express arbitrarily complicated jobs; it's really just some simple macros to make it easy to write down short descriptions of moderately involved jobs. For more complicated tasks you should write a Python program that uses the innards of sw-start-job. For example, the Skywriting example above could be executed using:
-
-from skywriting.runtime.util.start_job import submit_job_with_package
-
-submit_job_with_package({"swmain": "~/test.sw"}, "swi", {"sw_file_ref": {"__package__": "swmain"}}, <path-root>, <master-url>)
-
-where path-root is the root directory against which to resolve any relative names in the package dictionary and master-url is an absolute HTTP URL corresponding to a Ciel master node.
-
-That package also contains sub-functions which may be useful if you need to pull the process apart further. TODO: factor it better and document.
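To make the above concrete, here is a minimal driver sketch; the expanduser-based path root and the http://localhost:8000/ master URL are placeholders chosen for illustration, not values taken from the documentation:

import os

from skywriting.runtime.util.start_job import submit_job_with_package

# Package a single Skywriting script under the name "swmain" and start it
# with the "swi" handler, mirroring the JSON example above.
package = {"swmain": "~/test.sw"}
start_args = {"sw_file_ref": {"__package__": "swmain"}}

submit_job_with_package(package,
                        "swi",
                        start_args,
                        os.path.expanduser("~"),   # path root for resolving relative names in the package
                        "http://localhost:8000/")  # placeholder URL of the Ciel master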
-