Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update API: Detect noop updates when using doc #6862

Closed
wants to merge 7 commits into from

Conversation

Projects
None yet
3 participants
@nik9000
Copy link
Contributor

nik9000 commented Jul 14, 2014

This allows you to request that Elasticsearch figure out if an update using a document is a noop and then skip it. This is useful for clients where it is difficult to figure out if the request is really a noop.

Also adds two tolerance measures that can be used to deem a request a noop even if it isn't quite. The idea being that it isn't that important to know that a page has 100 vs 101 links - so you may as well wait until the field is out of date by more then, say, 10%.

Close #6822

nik9000 added some commits Jul 14, 2014

Detect noop updates sent with doc_as_upsert
This should help prevent spurious updates that just cause extra writing
and cache invalidation for no real reason.

Close #6822
Add tolerance to noop detection
Just numeric tolerances - percentages and absolute numbers.  Other things
are certainly possible from here.
@nik9000

This comment has been minimized.

Copy link
Contributor Author

nik9000 commented Jul 14, 2014

I need to add docs and hit it over REST a few more times to make sure it works, but so far so good.

@nik9000

This comment has been minimized.

Copy link
Contributor Author

nik9000 commented Jul 14, 2014

Might also be good to see what the performance cost is but I don't imagine its huge. Next to 0 if you disable it.

@nik9000 nik9000 changed the title Detect noop updates from doc_as_upsert Detect noop updates when using doc in update API Jul 14, 2014

@nik9000

This comment has been minimized.

Copy link
Contributor Author

nik9000 commented Jul 14, 2014

Tried a quick and dirty performance test

echo '{
    "doc": {"foo": 2},
    "doc_as_upsert": true,
    "detect_noop": {
        "foo": "10%"
    }
}' > /tmp/test2
ab -n 1000 -c 20 -p /tmp/test2 http://localhost:9200/test/test/1/_update

Request dropped from 11ms to 8ms. Cut but not a big deal - I'm more interested in creating less garbage that has to be merged out later.

@jpountz

This comment has been minimized.

Copy link
Contributor

jpountz commented Jul 15, 2014

I understand the point of not scheduling future merges if they are not necessary, but I'm not very happy with the part about tolerances. Eg if you have one update that increments a counter by 10 it could be rejected while two updates that would increment the same counter by 5 would be accepted. I think that can be confusing and should rather be dealt with from client side.

@nik9000

This comment has been minimized.

Copy link
Contributor Author

nik9000 commented Jul 15, 2014

I understand the point of not scheduling future merges if they are not necessary, but I'm not very happy with the part about tolerances. Eg if you have one update that increments a counter by 10 it could be rejected while two updates that would increment the same counter by 5 would be accepted. I think that can be confusing and should rather be dealt with from client side.

I could certainly document that adding tolerances to counters is unlikely to be a good idea. In my use case I don't have counters but instead recount on the fly. In that case the tolerances make this more interesting by allowing me to trade off a tiny bit of accuracy for performance.

@jpountz

This comment has been minimized.

Copy link
Contributor

jpountz commented Jul 16, 2014

I'd still rather have it implemented on client side, where you have all the flexibility to define what you want to use as a distance between documents to know whether an update is worth it. I'm worried about making the API complicated for something that doesn't bring much value.

@nik9000

This comment has been minimized.

Copy link
Contributor Author

nik9000 commented Jul 16, 2014

Is it still worth having the noop detection without the tolerances? I'm happy to roll them back. I can't use the noop detection without the tolerances but if its worth having its worth having.

I imagine I'll give scripted updates another shot in 1.3 when I can use groovy. They'll probably be much less brittle then MVEL was.

@jpountz

This comment has been minimized.

Copy link
Contributor

jpountz commented Jul 16, 2014

Detecting no-ops without tolerances sounds ok to me, but I'm wondering how common it is to submit a no-op update.

I was thinking about making it an automatic optimization rather than an option, but it would sometimes not work as expected. For example, it is allowed to update mappings to add a new multi field. So even an update that would re-submit the same document could result in a different inverted index. So if we decide to add such detection, I think we would need to make it an opt-in to avoid surprises.

My current feeling is that it has a high cost (in terms of development/maintenance) compared to the potential speedup but if no-op updates prove to be common I could change my opinion. @clintongormley What do you think?

@clintongormley

This comment has been minimized.

Copy link
Member

clintongormley commented Jul 17, 2014

@jpountz i agree with you about not liking the tolerances, although what would need to be done to make this work in the application is to retrieve the document, check the tolerances, and then decide whether to reindex or not.

I do hear of people wanting noop updates - I don't think it is a corner case, so I'd be inclined to accept the PR without tolerances, but keeping noop as an option for the reasons you cite. Another reason to make it opt-in: the user updates a synonyms file.

nik9000 added some commits Jul 17, 2014

@nik9000

This comment has been minimized.

Copy link
Contributor Author

nik9000 commented Jul 17, 2014

Removed tolerances. I believe one side effect of the way the PR is implemented now is that updates that are literally empty objects ({}) are considered noops even with detection off. I imagine I can fix that pretty easily if its a problem.

Object old = source.get(changesEntry.getKey());
if (old instanceof Map && changesEntry.getValue() instanceof Map) {
// recursive merge maps
modified = update((Map<String, Object>) source.get(changesEntry.getKey()),

This comment has been minimized.

Copy link
@jpountz

jpountz Jul 17, 2014

Contributor

Should it be |= instead of =? Otherwise the last one would always win?

This comment has been minimized.

Copy link
@nik9000

nik9000 Jul 17, 2014

Author Contributor

Yup - I'll fix it.

}
modified = !old.equals(changesEntry.getValue());

This comment has been minimized.

Copy link
@jpountz

jpountz Jul 17, 2014

Contributor

|= ?

This comment has been minimized.

Copy link
@nik9000

nik9000 Jul 17, 2014

Author Contributor

This one I caught when I did the weighting and it looks like reverting that reverted my catch. Or I'm just remembering things. Anyway, I'll fix it.

This comment has been minimized.

Copy link
@nik9000

nik9000 Jul 18, 2014

Author Contributor

Just finishing up - this |= isn't required because we continue above if modified is true so we can be super duper lazy with the .equals method.

*/
public static void update(Map<String, Object> source, Map<String, Object> changes) {
public static boolean update(Map<String, Object> source, Map<String, Object> changes, boolean checkUpdatesAreUnequal) {

This comment has been minimized.

Copy link
@jpountz

jpountz Jul 17, 2014

Contributor

I think it would be cleaner to always return if the doc has been modified and then to check externally if we want to update a document that has not been modified?

This comment has been minimized.

Copy link
@nik9000

nik9000 Jul 17, 2014

Author Contributor

I thought passing in the flag would be better because it allows us to avoid the equality check if we don't need it. I imagine it'd really only matter for long strings but I also imagine this method receives long strings.

This comment has been minimized.

Copy link
@jpountz

jpountz Jul 17, 2014

Contributor

Oh that's a good reason for having such a parameter! Can you add a comment about that?

This comment has been minimized.

Copy link
@nik9000

nik9000 Jul 17, 2014

Author Contributor

Absolutely.

@jpountz

This comment has been minimized.

Copy link
Contributor

jpountz commented Jul 17, 2014

Indeed I think that the behaviour when detect_noop is false should be the same as today, so to always perform an update and increase the version?

@nik9000

This comment has been minimized.

Copy link
Contributor Author

nik9000 commented Jul 17, 2014

Indeed I think that the behaviour when detect_noop is false should be the same as today, so to always perform an update and increase the version?

Can do.

Updates from code review
1.  |= for recursive merge
2.  Do not consider empty updates noop without detect_noops
@nik9000

This comment has been minimized.

Copy link
Contributor Author

nik9000 commented Jul 18, 2014

Done with updates from this round of comments. Thanks for reviewing!

@@ -218,6 +218,11 @@ public void postDeleteByQuery(Engine.DeleteByQuery deleteByQuery) {
}
}

public void noopUpdate(String type) {
totalStats.noopUpdates.inc();
typeStats(type).noopUpdates.dec();

This comment has been minimized.

Copy link
@jpountz

jpountz Jul 22, 2014

Contributor

I think you mean .inc ?

@jpountz

This comment has been minimized.

Copy link
Contributor

jpountz commented Jul 22, 2014

Thanks Nik, I just merged this change. FYI I did a minor change while merging to replace dec with inc on the type stats in ShardIndexingService.

@jpountz jpountz closed this Jul 22, 2014

@jpountz jpountz removed the enhancement label Jul 22, 2014

@jpountz jpountz removed the review label Jul 22, 2014

@nik9000

This comment has been minimized.

Copy link
Contributor Author

nik9000 commented Jul 22, 2014

Thanks! I saw the comment and, yeah, it agree it was supposed to be inc.

Thanks again!

Nik

On Tue, Jul 22, 2014 at 8:57 AM, Adrien Grand notifications@github.com
wrote:

Thanks Nik, I just merged this change. FYI I did a minor change while
merging to replace dec with inc on the type stats in ShardIndexingService.


Reply to this email directly or view it on GitHub
#6862 (comment)
.

@clintongormley clintongormley changed the title Detect noop updates when using doc in update API Update API: Detect noop updates when using doc Sep 10, 2014

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.