es.port is not always used when set to non default port #138

Closed
nahap opened this Issue Feb 11, 2014 · 2 comments

@nahap

commented Feb 11, 2014

I'm encountering strange behaviour when writing to Elasticsearch from Hive using the current master branch of es-hadoop.

I have one Elasticsearch server running on the default port 9200 and another on port 9800.
When I define the Hive external table and insert into it like this:

CREATE EXTERNAL TABLE andy.test_with_es_port (
    id STRING,
    virt_cat BIGINT)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES('es.resource' = 'testing/reco',
              'es.nodes'='localhost',
              'es.port'='9800',
              'es.nodes.discovery' = 'false');

INSERT OVERWRITE TABLE andy.test_with_es_port
SELECT 
    id,
    virt_cat
from andy.test_data limit 10;

Then the following happens:

  1. The 'testing' index is created on both (!) servers.
  2. The 'testing' index on the default port contains data (it shouldn't be there at all).
  3. The 'testing' index on port 9800 is empty (this is where I expected the data to land).
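
One way to confirm the three observations above is to ask each node how many documents its 'testing' index holds. The following is a minimal sketch, not part of es-hadoop: the helper name `doc_count` and its `fetch` parameter are hypothetical, and it assumes both nodes are reachable on localhost and expose the standard Elasticsearch `_count` endpoint.

```python
import json
from urllib.request import urlopen


def doc_count(port, index="testing", host="localhost", fetch=urlopen):
    """Return the number of documents in `index` on the node at host:port.

    `fetch` defaults to urllib's urlopen; it is a parameter only so the
    function can be exercised without a running Elasticsearch node.
    """
    url = f"http://{host}:{port}/{index}/_count"
    with fetch(url) as response:
        # The _count endpoint answers with JSON such as {"count": 10, ...}
        return json.loads(response.read())["count"]
```

Calling `doc_count(9200)` and `doc_count(9800)` after the INSERT would show which node actually received the ten rows. With the behaviour described here, the first call would return a non-zero count and the second would return 0.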

If I run another test without es.port and instead specify the port in the es.nodes string:

CREATE EXTERNAL TABLE andy.test_with_port_string (
    id STRING,
    virt_cat BIGINT,
    keyword STRING,
    recommended_products ARRAY<BIGINT>)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES('es.resource' = 'testing/reco',
              'es.nodes'='localhost:9800',
              'es.nodes.discovery' = 'false');

INSERT OVERWRITE TABLE andy.test_with_port_string
SELECT 
    id,
    virt_cat
from andy.test_data limit 10;

Then es-hadoop works as expected, writing only to the server on port 9800 (with data)
and not to the server on the default port.
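
For reference, the two table definitions are expected to resolve to the same target node, because es.port is meant to supply the port for any entry in es.nodes that does not carry one. The sketch below illustrates that intended resolution only. It is not es-hadoop's code, the function name `resolve_nodes` is hypothetical, and it ignores IPv6 addresses.

```python
def resolve_nodes(es_nodes, es_port=9200):
    """Expand a comma-separated es.nodes value into (host, port) pairs.

    Entries written as host:port keep their own port; bare hosts fall
    back to es_port (9200 if es.port is not set).
    """
    resolved = []
    for entry in es_nodes.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate stray commas
        host, sep, port = entry.partition(":")
        resolved.append((host, int(port) if sep else int(es_port)))
    return resolved
```

Under this reading, `resolve_nodes("localhost", "9800")` and `resolve_nodes("localhost:9800")` both yield a single node on port 9800, so any write that reaches port 9200 in the first setup points to es.port being dropped somewhere along the way.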

costin added a commit that referenced this issue Feb 18, 2014

@costin

Member

commented Feb 18, 2014

I tried to reproduce your issue but couldn't; I'm not sure whether that's because everything is on localhost.
Can you please enable logging on the mr (org.elasticsearch.hadoop.mr) and network (org.elasticsearch.hadoop.rest) packages and post the logs somewhere?
After that, can you please try master? I've added a small fix which might address your issue (but since I couldn't reproduce it, I can't tell for sure).

Thanks!

@costin costin added bug labels Feb 20, 2014

@costin

Member

commented Apr 8, 2014

I'm closing this for now since it looks like it is fixed in master. If that is not the case, please reopen it.

@costin costin closed this Apr 8, 2014

costin added a commit that referenced this issue Apr 8, 2014
