Changing `es.resource` for writing during mapping phase #181

Closed
whitfin opened this Issue Mar 28, 2014 · 14 comments


whitfin commented Mar 28, 2014

I'm trying to migrate JSON from Hadoop into different indexes, but the index each document should go to depends on some of its JSON fields.

Is there a way to change the resource we're writing to during the mapping phase?

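@Override
protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
    // Placeholder: the target index is computed from the incoming value.
    String index = computeIndex(value); // hypothetical helper
    // Attempt to switch the target resource for this record.
    conf.set("es.resource", "/" + index + "/type");
    context.write(key, value);
}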

The above change to the conf was just ignored, and the one specified in the main() was used. If I don't specify one in the main, it reports that none was ever set.

Not sure if there are limitations around this, or if there's a workaround. But any advice is appreciated.

If this feature doesn't exist, it'd be cool to see it added to the lib if possible.

Member

costin commented Mar 28, 2014

It's available in master but not documented yet. You can derive the index/type from a field of the entry being read; if you generate the index/type using a different strategy, then you are best off creating your own OutputFormat that handles it.

Assuming you are using an index/type based on the data read, when using Map/Reduce (your use case) you can set es.resource to something like:
es.resource=this-is-{my}/special-{pattern}
and the my and pattern fields will be resolved from the MapWritable passed to the OutputFormat.
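
To make the idea of a per-record pattern concrete, here is a minimal, self-contained Java sketch of how a pattern such as this-is-{my}/special-{pattern} could be resolved against one record. This is an illustration only and not es-hadoop's implementation: the class `ResourcePatternSketch` and its `resolve` method are hypothetical names, and a plain `Map<String, String>` stands in for the MapWritable.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only: NOT es-hadoop's actual code.
class ResourcePatternSketch {

    // Matches placeholders such as {my} or {pattern}.
    private static final Pattern FIELD = Pattern.compile("\\{([^{}]+)\\}");

    // Replaces each {field} in the pattern with that field's value from the record.
    static String resolve(String pattern, Map<String, String> record) {
        Matcher m = FIELD.matcher(pattern);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String field = m.group(1);
            String value = record.get(field);
            if (value == null) {
                throw new IllegalArgumentException("Record has no field '" + field + "'");
            }
            // quoteReplacement keeps '$' and '\' in values from being treated specially.
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> record = new HashMap<>();
        record.put("my", "index");
        record.put("pattern", "type");
        // prints this-is-index/special-type
        System.out.println(resolve("this-is-{my}/special-{pattern}", record));
    }
}
```

The sketch fails fast when a record lacks a referenced field; how es-hadoop itself reacts in that situation is not covered in this thread.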

Author

whitfin commented Mar 28, 2014

@costin My use case is more along the lines of having to handle this during execution. Here is an example:

I have 3 JSON objects:

{
    "media_type":"film",
    "title":"Harry Potter",
    "year":"2013"
}

{
    "media_type":"book",
    "title":"Harry Potter",
    "year":"2010"
}

{
    "media_type":"music",
    "title":"Don't Stop Believin'",
    "year":"1981"
}

They all represent a line in a file (the same file) being fed line-by-line into my Mapper. I want to change my index like this:

conf.set("es.resource", "/" + object.get("media_type") + "/store");

Meaning that the objects are processed as follows:

object1 => /film/store
object2 => /book/store
object3 => /music/store

Is this not possible?

costin commented Mar 28, 2014

As long as the pattern is represented through a field in your JSON/Map it is possible out of the box:
es.resource={media_type}/store

Remember that es-hadoop can extract data from raw JSON as well, so you don't have to do it yourself. In this case, that is, if you're passing JSON instead of a Map, it can automatically pick up the "media_type" field.
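
As a rough picture of what picking up "media_type" from raw JSON amounts to for the three sample documents in this thread, the following Java sketch routes each line to the resource it would resolve to under {media_type}/store. It is a simplification under stated assumptions: `JsonRoutingSketch`, `stringField` and `resourceFor` are hypothetical names, the regular expression only handles flat, single-line objects with string values, and es-hadoop's own extraction logic is not shown here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only: a real JSON parser should be used in practice.
class JsonRoutingSketch {

    // Reads a string-valued field from a flat, single-line JSON object.
    static String stringField(String json, String field) {
        Pattern p = Pattern.compile("\"" + Pattern.quote(field) + "\"\\s*:\\s*\"([^\"]*)\"");
        Matcher m = p.matcher(json);
        if (!m.find()) {
            throw new IllegalArgumentException("No string field '" + field + "' in: " + json);
        }
        return m.group(1);
    }

    // Resource a document would resolve to under the pattern {media_type}/store.
    static String resourceFor(String json) {
        return stringField(json, "media_type") + "/store";
    }

    public static void main(String[] args) {
        String[] lines = {
            "{\"media_type\":\"film\",\"title\":\"Harry Potter\",\"year\":\"2013\"}",
            "{\"media_type\":\"book\",\"title\":\"Harry Potter\",\"year\":\"2010\"}",
            "{\"media_type\":\"music\",\"title\":\"Don't Stop Believin'\",\"year\":\"1981\"}"
        };
        // prints film/store, book/store and music/store, one per line
        for (String line : lines) {
            System.out.println(resourceFor(line));
        }
    }
}
```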

whitfin commented Mar 28, 2014

@costin So you'd just go ahead and set it as conf.set("es.resource", "/{media_type}/store")?

costin commented Mar 28, 2014

Yes, that's exactly what I've been suggesting in my previous comments.

whitfin commented Mar 28, 2014

@costin I'm seeing the following:

Caused by: org.apache.commons.httpclient.URIException: escaped absolute path not valid

with the latest snapshot, which is after the commit for this (I think).

Same issue with build from master.

costin commented Mar 29, 2014

@IWhitfield Maybe you hit a bug or your configuration is incorrect. I can't infer much from just one line. As I've mentioned before, post some sample code, the logs and the full stack trace somewhere other than here, such as in a gist.

whitfin commented Mar 29, 2014

@costin I've created a gist here, hopefully it explains better: https://gist.github.com/iwhitfield/5349684ac6656e11171d

costin commented Mar 30, 2014

Thanks - that helps.

Can you remove the leading "/" from es.resource, change it to:
conf.set("es.resource", "{media_type}/store");

and see whether that helps?

costin commented Mar 30, 2014

Actually, it looks like you are using the wrong es-hadoop version. Double check that you are actually using the latest BUILD-SNAPSHOT, since the line numbers in your stack trace do not match.
My guess is that you're still using M2, which doesn't understand index patterns.

whitfin commented Mar 30, 2014

@costin Looks like Maven was still using the older version, I've changed to the snapshot.

I still saw an IllegalArgumentException with the pattern, so trying without the leading /.

costin commented Mar 30, 2014

The stack trace you posted indicates an older code base: see EsOutputFormat#init (line 177), which in your stack trace invokes touch, while master [1] points to something else. The latest published build (you can check it in Maven) is elasticsearch-hadoop-1.3.0.BUILD-20140329.171321-361.jar; make sure that is the one you have. Additionally, you can build the sources yourself (see the readme).

I'm trying to help, but between the lack of info, cryptic error messages, incorrect library versions and the repetition needed just to get some basic messages across, this is going nowhere and I don't think either of us is making any progress.

[1] https://github.com/elasticsearch/elasticsearch-hadoop/blob/master/src/main/java/org/elasticsearch/hadoop/mr/EsOutputFormat.java#L177

On 3/30/14 2:30 AM, Isaac Whitfield wrote:

@costin It looks like I have the latest snapshot, since this is the dependency listed in my pom.xml:

<dependency>
    <groupId>org.elasticsearch</groupId>
    <artifactId>elasticsearch-hadoop</artifactId>
    <version>1.3.0.BUILD-SNAPSHOT</version>
</dependency>

whitfin commented Mar 30, 2014

@costin You were right about the leading "/". It looks as though it's working with {media_type}/store.
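
Since the leading "/" was the culprit here, one defensive option on the caller's side is to normalize the value before passing it to conf.set("es.resource", ...). The helper below is a hypothetical sketch and not part of es-hadoop; `ResourceNormalizer` and `normalize` are made-up names, and the check assumes the index/type shape used throughout this thread.

```java
// Hypothetical helper, not part of es-hadoop.
class ResourceNormalizer {

    // Strips leading slashes and checks for the "<index>/<type>" shape.
    static String normalize(String resource) {
        String trimmed = resource.trim();
        while (trimmed.startsWith("/")) {
            trimmed = trimmed.substring(1);
        }
        // The -1 limit keeps trailing empty segments so "a/b/" is rejected.
        String[] parts = trimmed.split("/", -1);
        if (parts.length != 2 || parts[0].isEmpty() || parts[1].isEmpty()) {
            throw new IllegalArgumentException("Expected <index>/<type> but got: " + resource);
        }
        return trimmed;
    }

    public static void main(String[] args) {
        // prints {media_type}/store
        System.out.println(normalize("/{media_type}/store"));
    }
}
```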

Although, there is still a thrown exception even though it's working: https://gist.github.com/iwhitfield/5e4c511374fe15f0588c

whitfin commented Mar 30, 2014

I'll close this, seeing as the feature is implemented and functionally sound. I'll leave it to the author to decide what to do about the above error being thrown incorrectly, as that is a separate issue.

Thanks for everything @costin!

@whitfin whitfin closed this Mar 30, 2014

costin added a commit that referenced this issue Mar 30, 2014

@costin costin added rest labels Mar 30, 2014

costin added a commit that referenced this issue Apr 8, 2014
