EsTap is not working #162

Closed
mbaig opened this Issue Mar 5, 2014 · 24 comments

@mbaig

mbaig commented Mar 5, 2014

Please note EsTap was/is working as expected in 1.3.0.M2, however, it seems to be broken in the last ~2 weeks of nightly builds. Also note, our usage pattern or code did not change between the release of 1.3.0.M2 and today (2014-03-05).

@costin

Member

costin commented Mar 5, 2014

Can you expand on that - what exception do you encounter?

@mbaig

Author

mbaig commented Mar 5, 2014

Hello Costin: thanks for the quick response! The exception I'm seeing is below. I pared everything down to keep things simple for debugging: I create an EsTap on an index/type together with an array of fields of interest, then simply output all data from the tap to stdout. There are no queries to complicate matters, and no "es.query" config param either, since that also doesn't seem to be working any longer. Note that with 1.3.0.M2 everything works as expected, but not with the snapshots.
14/03/05 15:19:11 ERROR stream.TrapHandler: caught Throwable, no trap available, rethrowing
cascading.tuple.TupleException: unable to read from input identifier: 'unknown'
    at cascading.tuple.TupleEntrySchemeIterator.hasNext(TupleEntrySchemeIterator.java:127)
    at cascading.flow.stream.SourceStage.map(SourceStage.java:76)
    at cascading.flow.stream.SourceStage.run(SourceStage.java:58)
    at cascading.flow.hadoop.FlowMapper.run(FlowMapper.java:127)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
Caused by: java.lang.IllegalStateException: Cannot build scroll [adevents-2014-02-12/click/_search?search_type=scan&scroll=5&size=50&_source=ri,bot_act,psn,sts,ptz,pv_lo,cid&preference=_shards:1;_only_node:s6gZjGaBT6KFEpn23r5vgA]
    at org.elasticsearch.hadoop.rest.QueryBuilder.build(QueryBuilder.java:201)
    at org.elasticsearch.hadoop.mr.EsInputFormat$ShardRecordReader.next(EsInputFormat.java:286)
    at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:192)
    at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:176)
    at cascading.tap.hadoop.util.MeasuredRecordReader.next(MeasuredRecordReader.java:61)
    at org.elasticsearch.hadoop.cascading.EsHadoopScheme.source(EsHadoopScheme.java:154)
    at cascading.tuple.TupleEntrySchemeIterator.getNext(TupleEntrySchemeIterator.java:140)
    at cascading.tuple.TupleEntrySchemeIterator.hasNext(TupleEntrySchemeIterator.java:120)
    ... 6 more
Caused by: java.io.IOException: Out of nodes and retries; caught exception
    at org.elasticsearch.hadoop.rest.NetworkClient.execute(NetworkClient.java:98)
    at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:250)
    at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:246)
    at org.elasticsearch.hadoop.rest.RestClient.scan(RestClient.java:274)
    at org.elasticsearch.hadoop.rest.RestRepository.scan(RestRepository.java:97)
    at org.elasticsearch.hadoop.rest.QueryBuilder.build(QueryBuilder.java:199)
    ... 13 more

@mbaig mbaig closed this Mar 5, 2014

@mbaig mbaig reopened this Mar 5, 2014

@mbaig

Author

mbaig commented Mar 5, 2014

Sorry, hit Close by accident, please ignore.

@costin

Member

costin commented Mar 5, 2014

The error indicates a network error, that is, es-hadoop cannot connect to your host.
I have pushed a new nightly build (20140305.224939-329); can you please try it out and let me know how it goes?
It seems that es-hadoop does connect to ES initially but then starts losing the connection for some reason.

@mbaig

Author

mbaig commented Mar 5, 2014

Just tried that nightly; sorry, it didn't work, same exception stack(s).
Btw, es-hadoop is able to connect to our ES host if I use 1.3.0.M2, so I think we can rule out poor connectivity on our side, although there may still be other programmatic connectivity issues in the es-hadoop client.

@costin

Member

costin commented Mar 5, 2014

Can you turn on TRACE-level logging for the org.elasticsearch.hadoop package in log4j.properties and report back your findings (upload the output to a gist somewhere)? Ideally try it on a small data set since there will be a lot of output.

Thanks!
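
For reference, a minimal sketch of what that logging change could look like, assuming the job uses log4j 1.x and picks up a log4j.properties file from its classpath. The string constant holds the line to add; the Java wrapper around it only checks that the fragment parses as a properties file. The class and method names are hypothetical and not part of es-hadoop.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper for illustration; not part of es-hadoop.
final class TraceLoggingSnippet {

    // Line to add to log4j.properties (log4j 1.x syntax is assumed).
    // It raises only the org.elasticsearch.hadoop package to TRACE.
    static final String FRAGMENT =
            "log4j.logger.org.elasticsearch.hadoop=TRACE\n";

    // Parses the fragment the way any .properties file is parsed.
    static Properties parse() {
        Properties props = new Properties();
        try {
            props.load(new StringReader(FRAGMENT));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        // Print the key/value pair that belongs in log4j.properties.
        parse().forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

Scoping the logger to a single package keeps the rest of the Hadoop and Cascading output at its existing level, which helps with the volume concern mentioned above.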

@mbaig

Author

mbaig commented Mar 6, 2014

Hi Costin: here is the trace-level log output you requested (see the gist below). I had to redact some parts of the logs, so if you see something like Received [200-OK] [], the empty [] was actually populated correctly; hope you understand. Thanks again for all your help with this and the great work you guys are doing with ES in general!
https://gist.github.com/mbaig/9397119

@costin

Member

costin commented Mar 6, 2014

Hi,

There are several suspicious things in the logs. There's the network error, but there's also the NoSuchMethodError at the end (this one is caused by some incompatible library).

There were several improvements made to cascading, so I've pushed a nightly build [1] - can you please check it out once it completes?
Then, if possible, please update the gist with logs from both the current build and M2 - I've checked the differences between the two but nothing stands out.

Are you available on IRC? This would make things a lot easier to debug - I'm 'costin' on #elasticsearch. Let's connect in 30' or so if that works for you.

Thanks!

@costin

Member

costin commented Mar 6, 2014

By the way, the #221 build has been published.

@mbaig

Author

mbaig commented Mar 6, 2014

Costin, is that build #331 or #221? Also, I'll be available on IRC, 'mbaig'.

@costin

Member

costin commented Mar 6, 2014

The ids of the build plan and Maven are not synchronized: #331 is the Maven number, #221 the number of the build plan.
Basically, try the latest available snapshot.

@costin

Member

costin commented Mar 6, 2014

As for IRC, give me a ping once you get online - I'll be available for the next 1.5h or so.

Cheers!

@costin

Member

costin commented Mar 6, 2014

Hi,

I've pushed some changes on a new branch - cfg-refactor.
Can you try it out? You can easily build the package using: gradlew -x test build. Still interested in the logs on M2.

Cheers!

@costin

Member

costin commented Mar 7, 2014

Hi,

Can you please try the latest build, #333? Also, please post the updated logs just in case.

Thanks!

@mbaig

Author

mbaig commented Mar 7, 2014

Costin: not sure what you changed, but, it looks like build #333 is working. That is, reading from ES looks good. Haven't tried writing, will do that next.

@costin

Member

costin commented Mar 7, 2014

That's good to know. Logs from both M2 and current master would still be useful - we can chat on IRC more if you'd like. Thanks!

@mbaig

Author

mbaig commented Mar 7, 2014

Yeah, I'll definitely get you those logs. I was trying to filter the dataset for the logs using the es.query job config param, which incidentally didn't work; however, passing the filter query to the EsTap constructor did work, so that should get me over that obstacle. I'm going to deploy #333 to our cluster now for a larger dataset test, fingers crossed.
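
For reference, a minimal sketch of the job-configuration route described above, using only java.util.Properties. The "es.query" key is the one named in this thread; the URI-style query value, the class name, and the method name are illustrative assumptions, and handing the properties to the Cascading flow is deliberately left out.

```java
import java.util.Properties;

// Hypothetical helper for illustration; not part of es-hadoop.
final class EsQueryConfig {

    // Builds job properties carrying the filter query under "es.query",
    // the configuration key discussed in this thread.
    static Properties withQuery(String query) {
        if (query == null || query.isEmpty()) {
            throw new IllegalArgumentException("query must not be empty");
        }
        Properties props = new Properties();
        props.setProperty("es.query", query);
        return props;
    }

    public static void main(String[] args) {
        // The field name "cid" appears in the _source list of the stack
        // trace above; the value 42 is made up for this example.
        Properties props = withQuery("?q=cid:42");
        System.out.println(props.getProperty("es.query"));
    }
}
```

The alternative reported to work in this comment is to pass the same query string to the EsTap constructor instead of the job configuration.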

@mbaig

Author

mbaig commented Mar 7, 2014

I'm deploying the new jar to our cluster, but, I just realized we upgraded our ES to 1.0.1 (successfully) last night. Will this be a problem for the es-hadoop client?

@costin

Member

costin commented Mar 7, 2014

It's not a problem. es-hadoop since M2 supports both ES 1.0 and 0.90.x.

@costin

Member

costin commented Mar 7, 2014

The Configuration option should work in master just like on M2.
Along with the logs, can you please post a simple code snippet that reproduces the issue?

Thanks!

@costin costin added bug labels Mar 10, 2014

@mbaig

Author

mbaig commented Mar 11, 2014

Hey: sorry, didn't mean to disappear on you like that; I was busy firefighting issues with our ES upgrade to 1.0.1. One of the issues I ran into was that the es-hadoop nightly stopped working again, albeit for a different reason. Let me know if I should open another issue for it; in the meantime I'll try to describe it here.
Our Hadoop cluster connects to our prod ES cluster via ssh tunnels on localhost:9200. It seems the es-hadoop client, instead of connecting to localhost, connects to the resolved IPs of the ES shards (more accurately, it connects to localhost the first time but switches to IPs for all subsequent calls). Since those IPs aren't visible, I'm seeing ConnectExceptions; see here for logs: https://gist.github.com/mbaig/9397119
I tried setting the config param "es.nodes.discovery" to false to force the client to only use localhost, but this doesn't seem to be doing what I hoped.
Btw, I'm using the snapshot of es-hadoop for this.
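
For reference, the settings described above amount to something like the following sketch. "es.nodes.discovery" is the key named in this comment; "es.nodes" as the key for the initial host list is an assumption, as are the class and method names. As reported here, this combination did not keep the client on localhost in the snapshot being tested.

```java
import java.util.Properties;

// Hypothetical helper for illustration; not part of es-hadoop.
final class TunnelSettings {

    // Settings for reaching Elasticsearch through a local SSH tunnel.
    static Properties forLocalTunnel(int localPort) {
        if (localPort < 1 || localPort > 65535) {
            throw new IllegalArgumentException("port out of range: " + localPort);
        }
        Properties props = new Properties();
        // Assumed key for the initial node list (not confirmed in this thread).
        props.setProperty("es.nodes", "localhost:" + localPort);
        // Key named in this thread. Per the report above, the client still
        // switched to node IPs after the first call with this set to false.
        props.setProperty("es.nodes.discovery", "false");
        return props;
    }

    public static void main(String[] args) {
        Properties props = forLocalTunnel(9200);
        System.out.println("es.nodes=" + props.getProperty("es.nodes"));
        System.out.println("es.nodes.discovery=" + props.getProperty("es.nodes.discovery"));
    }
}
```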

Thanks again Costin. Oh and I still owe you those M2 logs...

@costin

Member

costin commented Mar 12, 2014

@mbaig Best to open another issue. Tunnelling is not supported by es-hadoop and I'm not sure whether it ever will be. Without a direct network connection to each shard, the parallel reads/writes don't make sense and performance goes down the drain.
I'll look into this but it's not a priority at the moment - using a VPN is probably a better solution long term since it hides the tunnelling much better than an actual tunnel.

@costin

Member

costin commented Mar 12, 2014

@mbaig Those M2 logs would still be nice ....

@costin costin closed this Mar 12, 2014

@costin

Member

costin commented Mar 12, 2014

By the way, you could try setting up the JDK property for proxies, in particular the SOCKS one:
http://docs.oracle.com/javase/7/docs/api/java/net/doc-files/net-properties.html
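
A sketch of that suggestion, assuming a SOCKS proxy (for example one opened with ssh -D) is listening on localhost port 1080; the host, port, and class name are assumptions. socksProxyHost and socksProxyPort are the standard JDK networking properties described on the linked page. Whether es-hadoop's HTTP layer honors them was not confirmed in this thread.

```java
// Hypothetical helper for illustration; not part of es-hadoop.
final class SocksProxySetup {

    // Sets the standard JDK networking properties that route the JVM's
    // TCP sockets through a SOCKS proxy.
    static void useSocksProxy(String host, int port) {
        if (host == null || host.isEmpty()) {
            throw new IllegalArgumentException("host must not be empty");
        }
        if (port < 1 || port > 65535) {
            throw new IllegalArgumentException("port out of range: " + port);
        }
        System.setProperty("socksProxyHost", host);
        System.setProperty("socksProxyPort", Integer.toString(port));
    }

    public static void main(String[] args) {
        // Assumed proxy endpoint; adjust to wherever the proxy listens.
        useSocksProxy("localhost", 1080);
        System.out.println(System.getProperty("socksProxyHost")
                + ":" + System.getProperty("socksProxyPort"));
    }
}
```

The same properties can be given on the command line as -DsocksProxyHost=localhost -DsocksProxyPort=1080. For a Hadoop job they would need to be set in the task JVMs as well, not only in the process that submits the job.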
