Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update API support #116

Closed
wants to merge 3 commits into from
Closed

Update API support #116

wants to merge 3 commits into from

Conversation

ghost
Copy link

@ghost ghost commented Apr 21, 2015

This PR addresses issue #16 and #114

Add the ability to update existing ES document and support of upsert (if document doesn't exists, create it)

Configuration changes :

  • add "update" to action parameter
  • new parameter "doc_as_upsert"=> boolean : if document doesn't exists, use event as source for creating it
  • new parameter "upsert" => json_string: unused if "doc_as_upsert" is true, give control on the source for creating document if it doesn't exists. (support %{[field]} resolution)

Implementation questions/notes:

  • Adding retry on 409 http error code (version conflict) may introduce unwanted behaviors on other action (or not) ?
  • I am not convinced about json_string as "upsert" parameter format (need to serialize/deserialize json with use of event.sprintf), but give full control ...

Example with http protocol:

test.conf

input {
  file {
    path => "/logs/test.log"
    codec => "json"
    start_position => "beginning"
  }
}

output {
  if [use_case] == "doc_upsert" {
    elasticsearch {
      host => "elasticsearch"
      protocol => "http"
      action => "update"
      document_id => "%{[uid]}"
      doc_as_upsert => true
    }    
  } else if [use_case] == "doc_static_upsert" {
    elasticsearch {
      host => "elasticsearch"
      protocol => "http"
      action => "update"
      document_id => "%{[uid]}"
      upsert => '{"static_field": "demo"}'
    }        
  } else if [use_case] == "doc_dynamic_upsert" {
    elasticsearch {
      host => "elasticsearch"
      protocol => "http"
      action => "update"
      document_id => "%{[uid]}"
      upsert => '{"use_case": "%{[use_case]}", "dynamic": { "fieldC": "%{[dynamic_field][fieldC]}"}}'
    }        
  } else {
    elasticsearch {
      host => "elasticsearch"
      protocol => "http"
    }    
  }
}

test.log

{ "use_case": "doc_upsert", "uid": "1", "fieldA": "abcd" }
{ "use_case": "doc_upsert", "uid": "1", "fieldB": "cdef" }
{ "use_case": "doc_static_upsert", "uid": "2", "fieldA": "abcd" }
{ "use_case": "doc_static_upsert", "uid": "2", "fieldB": "cdef" }
{ "use_case": "doc_dynamic_upsert", "uid": "3", "fieldA": "abcd", "dynamic_field": {"fieldC": "ghij"} }
{ "use_case": "placebo"}

@talevy
Copy link
Contributor

talevy commented Apr 30, 2015

This is looking great!

would you mind adding some rspec tests for this?

@ghost
Copy link
Author

ghost commented Apr 30, 2015

I forgot "routing" in node/transport protocol. Now it's added !

I'm not comfortable with rspec, I need time, but it's in progress :-)

@ghost ghost mentioned this pull request Jun 10, 2015
@ghost
Copy link
Author

ghost commented Jun 18, 2015

Tests added : with and without upsert for all protocols.

@talevy
Copy link
Contributor

talevy commented Jun 18, 2015

@Da-wei This is awesome! thanks!

Unfortunately, we just updated the structure of the tests and broke them out into separate files. Do you mind rebasing off master? I suggest possible doing a reset of the tests you added, and then manually writing them into the appropriate files in the new structure. This way, it should be rather straightforward to update.

let me know if you need any help!

@ghost
Copy link
Author

ghost commented Jun 22, 2015

@talevy, after playing with rebase, tests was added to the new structure. Thanks for the docker feature, it's very appreciable.

@talevy
Copy link
Contributor

talevy commented Jun 22, 2015

@Da-wei thanks for updating your pr structure and rebasing! I'm sure it wasn't easy 👍

how did you find running the tests? There seems to be some compatibility issues across docker versions, so if you can share your experience that would be awesome!

I'll check out the tests now!

@ghost
Copy link
Author

ghost commented Jun 22, 2015

I use docker version 1.6.0. and a container made of java oracle 8 and elasticsearch 1.6.0.

My first attempt to run test failed because i haven't any elasticsearch:1.6 image in my local registry, I think it's missing a check or an automatic pull from public registry.
I was disappointed by some tests fails randomly or at the very beginning. I see two possible reasons:

  • the first is already known @Tavely: the TODO about the static timeout waiting elasticsearch startup, sleep 10 not always sufficient for my poor machine
  • the second is probably the lack of forced indices refresh in the"before" clause of some tests (in transport_create_spec.rb by example)

But definitively, its a time saver !

(just a question : is it normal that rspec never terminate. I always need to do Ctrl+C although results are successful)

@talevy
Copy link
Contributor

talevy commented Jun 23, 2015

ah. I will update the README to reflect this. docker pull elasticsearch:1.6 is a prerequisite.

static timeout is definitely an annoyance at this time. that should get updated soon.

there is some termination problems when running the node_spec.rb within bundler that does not terminate. and Ctrl+C is necessary. I will look into it.

Thanks!

@reneklacan
Copy link

Hello @Da-wei and @talevy, this looks great, any plan to merge this?

@errm
Copy link

errm commented Aug 7, 2015

Would love for this to get merged, @Da-wei @talevy is there anything I could do to help?

@talevy
Copy link
Contributor

talevy commented Aug 7, 2015

@errm. @Da-wei I think this branch looks pretty good, just need to rebase master and resolve the conflicts, and we can take a look at it again real quick and merge it in thereafter.

@ghost
Copy link
Author

ghost commented Aug 7, 2015

Hi all, rebase is done !
@talevy, PR is ready for review.

@talevy
Copy link
Contributor

talevy commented Aug 10, 2015

awesome @Da-wei. I will check this out now and run the tests and update you!

@suyograo
Copy link
Contributor

@talevy I am +1 on merging this, please check with @andrewvc re: refactoring work

@untergeek
Copy link
Contributor

lgtm!

@elasticsearch-bot
Copy link

Merged sucessfully into master!

jordansissel pushed a commit that referenced this pull request Aug 11, 2015
jordansissel pushed a commit that referenced this pull request Aug 11, 2015
@talevy
Copy link
Contributor

talevy commented Aug 11, 2015

@reneklacan @Da-wei @errm This has made it into a new version 1.0.7 release of this plugin!

andrewvc pushed a commit that referenced this pull request Aug 20, 2015
andrewvc pushed a commit that referenced this pull request Aug 20, 2015
andrewvc pushed a commit that referenced this pull request Aug 20, 2015
@wallflower
Copy link

I don't know if this is the right place for this, but I am trying this feature and it does not work as I expected (but maybe my expectation is wrong). Here is what I am doing:

  1. I created a mapping file for my elasticsearch index. (Let's say it has 10 fields.). I do a post against the index to create the mapping:
{ "mappings": {
        "well": {
            "_all": {
                "enabled": true
            },
            "properties": {
                        ... <10 fields detailed in here> ...
                       }
              }
       }
}
  1. I index my documents with logstash (coming from a jdbc input). I am planning on having a couple of these running, where config1 brings in 5 fields, config2 brings in 3 fields, and config3 brings in 2 fields. I know the documentId for all the config files. But I am needing to do seperate updates due to random updates for those different tables. Basically Elasticsearch is my coordinated document.

I am using the following elasticsearch conf for the three different imports:

elasticsearch {
        host => "localhost"
        protocol => "http"
        index => "wells"
        action => "update"
        document_type => "well"
        document_id =>  "%{[master_well_interval_id]}"
        doc_as_upsert => true
        manage_template => false
    }

Now, what happens is, when the document gets updated, by config1, the mapping gets changed to only having those 5 fields. The other five get removed.

If I remove the action => "update" part from the config, then the mapping stays, but the document gets overritten by each config that runs...

So, is this working as advertised? I would expect my configuration to keep my mapping, but that is not what I am experiencing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants