Browse files

notes on pigstorefunc, need to fix outputformat so the options arent …

…namespaced with "wonderdog" in any way
  • Loading branch information...
thedatachef committed Jan 24, 2011
1 parent 302f2e4 commit b25d04c17ac839fe99877fa9563cde9a275eddbd
Showing with 46 additions and 0 deletions.
  1. +45 −0 notes/pigstorefunc.pig
  2. +1 −0 src/java/com/infochimps/elasticsearch/
@@ -0,0 +1,45 @@
+-- Doesn't work at the moment, just some notes on how the storefunc might look.
+-- Right now the ElasticSearchOutputFormat gets all its options from the
+-- Job object. We can use the call to setStoreLocation in the storefunc
+-- to set the required parameters. Need to make sure the following are
+-- set:
+-- - should be set by the storefunc constructor
+-- wonderdog.bulk.size - should be set by the storefunc constructor
+-- wonderdog.field.names - should be set by the call to checkSchema
+-- - should be set by the storefunc constructor
+-- wonderdog.object.type - should be set by the storefunc constructor
+-- wonderdog.plugins.dir - should be set by call to setStoreLocation
+-- wonderdog.config - should be set by call to setStoreLocation
+-- FIXME: options used in the ElasticSearchOutputFormat should NOT be
+-- namespaced with 'wonderdog'
+%default INDEX 'es_index'
+%default OBJ 'text_obj'
+records = LOAD '$DATA' AS (text_field:chararray);
+records_with_id = LOAD '$IDDATA' AS (id_field:int, text_field:chararray);
+-- Here we would use the elasticsearch index name as the uri, pass in a
+-- comma separated list of field names as the first arg, the id field
+-- as the second arg and the bulk size as the third.
+-- and so on.
+STORE records INTO '$INDEX/$OBJ' USING ElasticSearchStorage('my_text_field', '-1', '1000');
+-- but it would be really nice to duplicate what's in in that,
+-- should a bulk request fail, the failed records are written to hdfs. The
+-- user should have some control of this. Also, it should be possible to generate
+-- the field names directly from the pig schema? (We'd have to be VERY explicit in the
+-- docs about this as it would be a point of headscratching/swearing...) In this
+-- case we might have something like:
+named_records = FOREACH records GENERATE text_field AS text_field_name;
+STORE records INTO '/path/to/failed_requests' USING ElasticSearchStorage('$INDEX/$OBJ', '-1', '1000');
@@ -130,6 +130,7 @@ private void start_embedded_client() {
return new ElasticSearchRecordWriter();
+ // FIXME: options used here should NOT be namespaced with 'wonderdog'
public void setConf(Configuration conf) {
this.indexName = conf.get("");
this.bulkSize = Integer.parseInt(conf.get("wonderdog.bulk.size"));

0 comments on commit b25d04c

Please sign in to comment.