Add pipeline to ensure unique Enrich index documents #46348

Merged
jbaiera merged 12 commits into elastic:enrich from enrich-set-unique-id on Sep 24, 2019

@jbaiera jbaiera commented Sep 4, 2019

This PR adds a pipeline that removes the id and routing from documents before they are indexed into enrich indices. Enrich documents may come from multiple source indices, so their ids can collide. The pipeline ensures that documents with colliding ids do not overwrite one another during the reindex operation that runs when an enrich policy is executed.
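
To make the collision concrete, here is a minimal in-memory sketch. It is a toy model and not the code in this PR: `EnrichIdCollisionSketch`, `SourceDoc`, and both `reindex…` methods are hypothetical names, the enrich index is modeled as a plain map keyed by id, and a random UUID stands in for whatever id the target index would assign once the original id is dropped.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

// Toy in-memory model of the id collision; this is NOT Elasticsearch code.
final class EnrichIdCollisionSketch {

    // Hypothetical stand-in for a source document: origin index, _id, and one value.
    static final class SourceDoc {
        final String index;
        final String id;
        final String value;

        SourceDoc(String index, String id, String value) {
            this.index = index;
            this.id = id;
            this.value = value;
        }
    }

    // Keeps the original _id, so a later document with the same id overwrites an earlier one.
    static Map<String, String> reindexKeepingIds(List<SourceDoc> docs) {
        Map<String, String> enrichIndex = new LinkedHashMap<>();
        for (SourceDoc doc : docs) {
            enrichIndex.put(doc.id, doc.value);
        }
        return enrichIndex;
    }

    // Drops the original _id and assigns a fresh one (a random UUID, as an assumption).
    static Map<String, String> reindexDroppingIds(List<SourceDoc> docs) {
        Map<String, String> enrichIndex = new LinkedHashMap<>();
        for (SourceDoc doc : docs) {
            enrichIndex.put(UUID.randomUUID().toString(), doc.value);
        }
        return enrichIndex;
    }

    public static void main(String[] args) {
        // Two documents from different source indices that share the id "1".
        List<SourceDoc> docs = List.of(
            new SourceDoc("source-a", "1", "alpha"),
            new SourceDoc("source-b", "1", "beta"));

        System.out.println("keeping ids:  " + reindexKeepingIds(docs).size() + " document(s)");
        System.out.println("dropping ids: " + reindexDroppingIds(docs).size() + " document(s)");
    }
}
```

With the original ids kept, the second document replaces the first and only one entry survives. With the ids dropped, both documents are retained.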

@jbaiera jbaiera added the >non-issue and :Data Management/Ingest Node labels on Sep 4, 2019
@elasticmachine

Pinging @elastic/es-core-features

@martijnvg

This PR build failed because node startup took too long in the enrich QA security module:

[2019-09-04T21:40:01,046][INFO ][o.e.e.NodeEnvironment    ] [node-0] using [1] data paths, mounts [[/ (/dev/sda1)]], net usable_space [106gb], net total_space [145.3gb], types [ext4]
[2019-09-04T21:40:01,053][INFO ][o.e.e.NodeEnvironment    ] [node-0] heap size [494.9mb], compressed ordinary object pointers [true]
[2019-09-04T21:40:01,204][INFO ][o.e.n.Node               ] [node-0] node name [node-0], node ID [rZ45d1yoT3StdGTi9fbenA], cluster name [x-pack_plugin_enrich_qa_rest-with-security_restTestCluster]
[2019-09-04T21:40:01,205][INFO ][o.e.n.Node               ] [node-0] version[8.0.0-SNAPSHOT], pid[78996], build[default/tar/7aa08d1d28a9b2e6e51585206be43bfdf0a654c7/2019-09-04T21:28:09.960250Z], OS[Linux/4.15.0-1041-gcp/amd64], JVM[Oracle Corporation/OpenJDK 64-Bit Server VM/11.0.2/11.0.2+7]
[2019-09-04T21:40:01,206][INFO ][o.e.n.Node               ] [node-0] JVM home [/var/lib/jenkins/.java/openjdk-11.0.2-linux]
[2019-09-04T21:40:01,207][INFO ][o.e.n.Node               ] [node-0] JVM arguments [-Xms1g, -Xmx1g, -XX:+UseConcMarkSweepGC, -XX:CMSInitiatingOccupancyFraction=75, -XX:+UseCMSInitiatingOccupancyOnly, -Des.networkaddress.cache.ttl=60, -Des.networkaddress.cache.negative.ttl=10, -XX:+AlwaysPreTouch, -Xss1m, -Djava.awt.headless=true, -Dfile.encoding=UTF-8, -Djna.nosys=true, -XX:-OmitStackTraceInFastThrow, -Dio.netty.noUnsafe=true, -Dio.netty.noKeySetOptimization=true, -Dio.netty.recycler.maxCapacityPerThread=0, -Dio.netty.allocator.numDirectArenas=0, -Dlog4j.shutdownHookEnabled=false, -Dlog4j2.disable.jmx=true, -Djava.io.tmpdir=/tmp/elasticsearch-5322334081754049278, -XX:+HeapDumpOnOutOfMemoryError, -XX:HeapDumpPath=data, -XX:ErrorFile=logs/hs_err_pid%p.log, -Xlog:gc*,gc+age=trace,safepoint:file=logs/gc.log:utctime,pid,tags:filecount=32,filesize=64m, -Djava.locale.providers=COMPAT, -Xms512m, -Xmx512m, -ea, -esa, -Dio.netty.allocator.type=unpooled, -XX:MaxDirectMemorySize=268435456, -Des.path.home=/var/lib/jenkins/workspace/elastic+elasticsearch+pull-request-2/x-pack/plugin/enrich/qa/rest-with-security/build/cluster/restTestCluster node0/elasticsearch-8.0.0-SNAPSHOT, -Des.path.conf=/var/lib/jenkins/workspace/elastic+elasticsearch+pull-request-2/x-pack/plugin/enrich/qa/rest-with-security/build/cluster/restTestCluster node0/elasticsearch-8.0.0-SNAPSHOT/config, -Des.distribution.flavor=default, -Des.distribution.type=tar, -Des.bundled_jdk=true]
[2019-09-04T21:40:01,210][WARN ][o.e.n.Node               ] [node-0] version [8.0.0-SNAPSHOT] is a pre-release version of Elasticsearch and is not suitable for production
[2019-09-04T21:40:12,171][INFO ][o.e.p.PluginsService     ] [node-0] loaded module [aggs-matrix-stats]
[2019-09-04T21:40:12,172][INFO ][o.e.p.PluginsService     ] [node-0] loaded module [analysis-common]
[2019-09-04T21:40:12,173][INFO ][o.e.p.PluginsService     ] [node-0] loaded module [data-frame]
[2019-09-04T21:40:12,173][INFO ][o.e.p.PluginsService     ] [node-0] loaded module [flattened]
[2019-09-04T21:40:12,174][INFO ][o.e.p.PluginsService     ] [node-0] loaded module [frozen-indices]
[2019-09-04T21:40:12,174][INFO ][o.e.p.PluginsService     ] [node-0] loaded module [ingest-common]
[2019-09-04T21:40:12,175][INFO ][o.e.p.PluginsService     ] [node-0] loaded module [ingest-geoip]
[2019-09-04T21:40:12,176][INFO ][o.e.p.PluginsService     ] [node-0] loaded module [ingest-user-agent]
[2019-09-04T21:40:12,176][INFO ][o.e.p.PluginsService     ] [node-0] loaded module [lang-expression]
[2019-09-04T21:40:12,177][INFO ][o.e.p.PluginsService     ] [node-0] loaded module [lang-mustache]
[2019-09-04T21:40:12,177][INFO ][o.e.p.PluginsService     ] [node-0] loaded module [lang-painless]
[2019-09-04T21:40:12,178][INFO ][o.e.p.PluginsService     ] [node-0] loaded module [mapper-extras]
[2019-09-04T21:40:12,178][INFO ][o.e.p.PluginsService     ] [node-0] loaded module [parent-join]
[2019-09-04T21:40:12,178][INFO ][o.e.p.PluginsService     ] [node-0] loaded module [percolator]
[2019-09-04T21:40:12,179][INFO ][o.e.p.PluginsService     ] [node-0] loaded module [rank-eval]
[2019-09-04T21:40:12,180][INFO ][o.e.p.PluginsService     ] [node-0] loaded module [reindex]
[2019-09-04T21:40:12,182][INFO ][o.e.p.PluginsService     ] [node-0] loaded module [repository-url]
[2019-09-04T21:40:12,183][INFO ][o.e.p.PluginsService     ] [node-0] loaded module [search-business-rules]
[2019-09-04T21:40:12,183][INFO ][o.e.p.PluginsService     ] [node-0] loaded module [spatial]
[2019-09-04T21:40:12,184][INFO ][o.e.p.PluginsService     ] [node-0] loaded module [transport-netty4]
[2019-09-04T21:40:12,184][INFO ][o.e.p.PluginsService     ] [node-0] loaded module [vectors]
[2019-09-04T21:40:12,218][INFO ][o.e.p.PluginsService     ] [node-0] loaded module [x-pack-analytics]
[2019-09-04T21:40:12,219][INFO ][o.e.p.PluginsService     ] [node-0] loaded module [x-pack-ccr]
[2019-09-04T21:40:12,232][INFO ][o.e.p.PluginsService     ] [node-0] loaded module [x-pack-core]
[2019-09-04T21:40:12,234][INFO ][o.e.p.PluginsService     ] [node-0] loaded module [x-pack-deprecation]
[2019-09-04T21:40:12,235][INFO ][o.e.p.PluginsService     ] [node-0] loaded module [x-pack-enrich]
[2019-09-04T21:40:12,249][INFO ][o.e.p.PluginsService     ] [node-0] loaded module [x-pack-graph]
[2019-09-04T21:40:12,250][INFO ][o.e.p.PluginsService     ] [node-0] loaded module [x-pack-ilm]
[2019-09-04T21:40:12,250][INFO ][o.e.p.PluginsService     ] [node-0] loaded module [x-pack-logstash]
[2019-09-04T21:40:12,251][INFO ][o.e.p.PluginsService     ] [node-0] loaded module [x-pack-ml]
[2019-09-04T21:40:12,253][INFO ][o.e.p.PluginsService     ] [node-0] loaded module [x-pack-monitoring]
[2019-09-04T21:40:12,254][INFO ][o.e.p.PluginsService     ] [node-0] loaded module [x-pack-rollup]
[2019-09-04T21:40:12,255][INFO ][o.e.p.PluginsService     ] [node-0] loaded module [x-pack-security]
[2019-09-04T21:40:12,262][INFO ][o.e.p.PluginsService     ] [node-0] loaded module [x-pack-sql]
[2019-09-04T21:40:12,262][INFO ][o.e.p.PluginsService     ] [node-0] loaded module [x-pack-voting-only-node]
[2019-09-04T21:40:12,263][INFO ][o.e.p.PluginsService     ] [node-0] loaded module [x-pack-watcher]
[2019-09-04T21:40:12,264][INFO ][o.e.p.PluginsService     ] [node-0] no plugins loaded
[2019-09-04T21:40:25,304][INFO ][o.e.x.s.a.s.FileRolesStore] [node-0] parsed [2] roles from file [/var/lib/jenkins/workspace/elastic+elasticsearch+pull-request-2/x-pack/plugin/enrich/qa/rest-with-security/build/cluster/restTestCluster node0/elasticsearch-8.0.0-SNAPSHOT/config/roles.yml]
[2019-09-04T21:40:26,505][INFO ][o.e.x.m.p.l.CppLogMessageHandler] [node-0] [controller/80093] [Main.cc@110] controller (64 bit): Version 8.0.0-SNAPSHOT (Build e2577717f7b9a4) Copyright (c) 2019 Elasticsearch BV
[2019-09-04T21:40:28,529][DEBUG][o.e.a.ActionModule       ] [node-0] Using REST wrapper from plugin org.elasticsearch.xpack.security.Security

I've seen this in other PR builds too (#46351). I'm not sure what the actual cause is here. Maybe a slow build node?

@martijnvg martijnvg left a comment

Looks good! I left two small comments.

IndexResponse indexRequest = client().index(new IndexRequest()
    .index(sourceIndex)
    .id(collidingDocId)
    .source(

I think we should set a specific _routing value here too, and verify that it is no longer present in the .enrich index.

Normally, when a _routing value is specified, it is indexed into a _routing field that is also queryable, so testing that it is queryable in the source index and not in the enrich index should be sufficient.
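
A rough sketch of the shape of that check, using a toy in-memory model instead of the real test infrastructure. Everything here is hypothetical and only for illustration: `RoutingStrippedSketch`, `Doc`, `stripIdAndRouting`, and `countByRouting` are made-up names, and the "query" is a simple filter over a list, not an Elasticsearch search.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the suggested assertion; this is NOT the PR's test or Elasticsearch code.
final class RoutingStrippedSketch {

    // Hypothetical document with optional id and routing metadata.
    static final class Doc {
        final String id;       // null when no id is set
        final String routing;  // null when no routing is set
        final String value;

        Doc(String id, String routing, String value) {
            this.id = id;
            this.routing = routing;
            this.value = value;
        }
    }

    // Stand-in for the pipeline: clears id and routing, keeps the content.
    static Doc stripIdAndRouting(Doc doc) {
        return new Doc(null, null, doc.value);
    }

    // Stand-in for the reindex: every source doc passes through the pipeline.
    static List<Doc> reindexThroughPipeline(List<Doc> source) {
        List<Doc> enrichIndex = new ArrayList<>();
        for (Doc doc : source) {
            enrichIndex.add(stripIdAndRouting(doc));
        }
        return enrichIndex;
    }

    // Stand-in for a query on the routing field: counts docs with that routing value.
    static long countByRouting(List<Doc> index, String routing) {
        return index.stream().filter(d -> routing.equals(d.routing)).count();
    }

    public static void main(String[] args) {
        List<Doc> source = List.of(new Doc("1", "r1", "alpha"));
        List<Doc> enrich = reindexThroughPipeline(source);

        // Routing should match in the source index and not in the enrich index.
        System.out.println("source hits: " + countByRouting(source, "r1"));
        System.out.println("enrich hits: " + countByRouting(enrich, "r1"));
    }
}
```

The real test would issue the equivalent routing query against both indices. The sketch only shows the two assertions being proposed: a hit in the source index and no hit in the enrich index.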

@martijnvg martijnvg left a comment

LGTM

@jbaiera jbaiera merged commit d5cf383 into elastic:enrich Sep 24, 2019
@jbaiera jbaiera deleted the enrich-set-unique-id branch September 24, 2019 19:16
jbaiera added a commit that referenced this pull request Oct 4, 2019
Adds a pipeline that removes ids and routing from documents before indexing
them into enrich indices. Enrich documents may come from multiple indices,
and thus have id collisions on them. This pipeline ensures that documents
with colliding id fields do not clobber one another during the reindex operation
while executing an enrich policy.
3 participants