Support square bracket array syntax in flexible ingest pipelines #133790
base: main
Pinging @elastic/es-data-management (Team:Data Management)
masseyke left a comment:
IngestDocumentTests::testRemoveFieldIgnoreMissing fails when it takes the "case 1" path (so about a third of the time), but otherwise it all looks good to me.
masseyke left a comment:
LGTM
Converting this to draft. We are going to put this on hold until we can determine how we'd like to expose this new syntax to the scripting field API.
Adds support for array indexing in the new `flexible` ingest document access pattern.

The classic way to index into an array field via ingest node is to use an integer field name. This is done in a context-sensitive manner: if the field path is at an array when it handles a field name, it tries to parse the name as an integer and uses it to index into the array. If the path is at an object when it handles an integer field name, it treats the integer as a regular field name:
| Field path | Document | Result |
|---|---|---|
| `a.b.c.1` | `{a: {b: {c: [0, 1, 2] } } }` | `1` |
| `a.b.c.1` | `{a: {b: {c: {0: foo, 1: bar, 2: baz} } } }` | `bar` |

Confusingly, if you try to write a value to an array index and that array does not exist, the path is created as a chain of objects and the index is used as the field name for the value:
Setting `a.b.c.1` to `foo`:

| Document | Result |
|---|---|
| `{a: {b: {c: [0, 1, 2] } } }` | `{a: {b: {c: [0, foo, 2] } } }` |
| `{}` | `{a: {b: {c: {1: foo} } } }` |

Since the `flexible` access pattern is a new way to denote field access in ingest documents, we will add support for a new array indexing syntax that is closer to how most programming languages surface the concept. This allows explicit control over whether a field path element is meant to be a field name or an array index, and should cut down on set operations that inconsistently create fields when array accesses were intended.

The new syntax follows some simple rules:
- Arrays are indexed by putting the position in square brackets, as in `a[0]`.
- Square brackets may be repeated to index higher-dimensional arrays, as in `a[0][1]`.
- A number after a dot is always treated as a field name.
- When retrieving a field, square brackets break a chain of dotted field names. This is because dotted field names can only be resolved against map data, while array indices can only be applied to array data.
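The rules above can be illustrated with a small standalone tokenizer and reader. This is a minimal sketch, not the Elasticsearch implementation: `FlexiblePath`, `Segment`, `tokenize`, and `get` are hypothetical names, the document is modeled as plain Java `Map`/`List` values, and a dotted run such as `a.b.c` is resolved by simply walking nested maps (the sketch does not try to reproduce how the `flexible` pattern matches dotted field names). The error messages reuse the wording from this description.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the bracket syntax rules; not Elasticsearch code.
final class FlexiblePath {
    private FlexiblePath() {}

    /** One path element: a (possibly dotted) field name, or an array index. */
    record Segment(String field, int index) {
        static Segment ofField(String name) { return new Segment(name, -1); }
        static Segment ofIndex(int index) { return new Segment(null, index); }
        boolean isIndex() { return field == null; }
        @Override public String toString() { return isIndex() ? "[" + index + "]" : field; }
    }

    /** Splits "a.b.c[2].d" into field runs and bracketed indices. */
    static List<Segment> tokenize(String path) {
        List<Segment> out = new ArrayList<>();
        StringBuilder name = new StringBuilder();
        int i = 0;
        while (i < path.length()) {
            char c = path.charAt(i);
            if (c != '[') { // ordinary character: part of the current dotted field run
                name.append(c);
                i++;
                continue;
            }
            if (name.length() > 0) { // a bracket ends the dotted run before it
                out.add(Segment.ofField(name.toString()));
                name.setLength(0);
            } else if (out.isEmpty()) {
                throw new IllegalArgumentException("path [" + path + "] cannot start with an array index");
            }
            int close = path.indexOf(']', i);
            if (close < 0) {
                throw new IllegalArgumentException("unclosed '[' in path [" + path + "]");
            }
            int index = Integer.parseInt(path.substring(i + 1, close));
            if (index < 0) {
                throw new IllegalArgumentException("negative array index in path [" + path + "]");
            }
            out.add(Segment.ofIndex(index));
            i = close + 1;
            // After ']' the path may end, open another '[', or resume field names after a '.'.
            if (i < path.length() && path.charAt(i) == '.') {
                i++;
                if (i == path.length()) {
                    throw new IllegalArgumentException("path [" + path + "] ends with '.'");
                }
            } else if (i < path.length() && path.charAt(i) != '[') {
                throw new IllegalArgumentException("expected '.' or '[' after ']' in path [" + path + "]");
            }
        }
        if (name.length() > 0) {
            out.add(Segment.ofField(name.toString()));
        }
        return out;
    }

    /** Reads a value: field runs need map data, indices need array (List) data. */
    static Object get(Map<String, Object> doc, String path) {
        Object current = doc;
        for (Segment segment : tokenize(path)) {
            if (segment.isIndex()) {
                if (!(current instanceof List<?> list)) {
                    throw new IllegalArgumentException("could not resolve array index [" + segment.index()
                        + "] against field type [" + typeName(current) + "]");
                }
                if (segment.index() >= list.size()) {
                    throw new IllegalArgumentException("index [" + segment.index()
                        + "] out of bounds for array with size [" + list.size() + "]");
                }
                current = list.get(segment.index());
            } else {
                // Simplification: split the run on '.' and walk nested maps; digits stay field names.
                for (String key : segment.field().split("\\.")) {
                    if (!(current instanceof Map<?, ?> map) || !map.containsKey(key)) {
                        throw new IllegalArgumentException("could not resolve field [" + segment.field() + "]");
                    }
                    current = map.get(key);
                }
            }
        }
        return current;
    }

    private static String typeName(Object value) {
        if (value == null) return "null";
        if (value instanceof Map) return "Map";
        if (value instanceof List) return "List";
        return value.getClass().getSimpleName();
    }
}
```

With these definitions, `tokenize("a.b.c[2].d.e.f[1].g")` yields the segments `a.b.c`, `[2]`, `d.e.f`, `[1]`, `g`. Unlike the classic pattern, `get` on `{a: [0, 1, 2]}` with path `a.1` fails, because a number after a dot is always a field name; the array element has to be read as `a[1]`.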
| Field path | Resolution |
|---|---|
| `a[0]` | `a`, then array index `0` |
| `a[0].b` | `a`, then array index `0`, then field `b` |
| `a.b.c[2].d.e.f[1].g` | `a.b.c`, then array index `2`, then field `d.e.f`, then array index `1`, then field `g` |

Setting a value on a document with a field path that includes array indices requires the arrays to already exist on the document:
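| Document | Operation | Result |
|---|---|---|
| `{a: [ 0, 1, 2 ] }` | `set a[1] = 5` | `{a: [ 0, 5, 2 ] }` |
| `{a: [ {b:foo} ]}` | `set a[0].b = bar` | `{a:[ {b:bar} ]}` |
| `{a: [] }` | `set a[0] = bar` | error: `index [0] out of bounds for array with size [0]` |
| `{a: {} }` | `set a[0] = bar` | error: `could not resolve array index [0] against field type [Map]` |
| `{}` | `set a[0] = bar` | error: `could not resolve field [a]` |

Append operations continue to work as expected; appending to a value that is not an array converts it into an array containing the existing value followed by the appended one: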
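| Document | Operation | Result |
|---|---|---|
| `{a: [ 0, 1, 2 ] }` | `a append 5` | `{a: [ 0, 1, 2, 5 ] }` |
| `{a: [ {b:foo} ]}` | `a append {c = bar}` | `{a:[ {b:foo}, {c:bar} ]}` |
| `{a: [] }` | `a append bar` | `{a: [bar] }` |
| `{a: {} }` | `a append bar` | `{a: [{}, bar] }` |
| `{}` | `a append bar` | `{a: [bar] }` |
| `{a: [0, 1] }` | `a[0] append bar` | `{a: [ [0, bar], 1] }` |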
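The set and append behavior in the tables above can be modeled with a minimal sketch over plain Java `Map`/`List` values. It is not Elasticsearch's `IngestDocument` logic: `BracketPathWriter`, `set`, and `append` are hypothetical names, and to stay short the path is passed pre-split as a `List<Object>`, where a `String` is a single map key and an `Integer` is an array index. Containers along the path are never created (only a missing final field is added), and the error messages reuse the wording from the tables.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: a path is a List<Object> where String = map key, Integer = array index.
final class BracketPathWriter {
    private BracketPathWriter() {}

    /** Sets the value at path; arrays and objects along the path must already exist. */
    @SuppressWarnings("unchecked")
    static void set(Map<String, Object> doc, List<Object> path, Object value) {
        Object parent = resolveParent(doc, path);
        Object last = path.get(path.size() - 1);
        if (last instanceof Integer index) {
            List<Object> list = (List<Object>) requireList(parent, index);
            checkBounds(list, index);
            list.set(index, value);
        } else if (parent instanceof Map<?, ?> map) {
            ((Map<String, Object>) map).put((String) last, value);
        } else {
            throw new IllegalArgumentException("could not resolve field [" + last + "]");
        }
    }

    /** Missing field -> [value]; array -> array + value; any other value -> [old, value]. */
    static void append(Map<String, Object> doc, List<Object> path, Object value) {
        Object parent = resolveParent(doc, path);
        Object last = path.get(path.size() - 1);
        boolean missingField = last instanceof String
            && !(parent instanceof Map<?, ?> map && map.containsKey(last));
        List<Object> merged = new ArrayList<>();
        if (!missingField) {
            Object existing = step(parent, last); // fails on bad index or wrong container type
            if (existing instanceof List<?> list) {
                merged.addAll(list);
            } else {
                merged.add(existing);
            }
        }
        merged.add(value);
        set(doc, path, merged);
    }

    /** Walks every segment except the last without creating anything. */
    private static Object resolveParent(Map<String, Object> doc, List<Object> path) {
        if (path.isEmpty()) throw new IllegalArgumentException("path must not be empty");
        Object current = doc;
        for (Object segment : path.subList(0, path.size() - 1)) {
            current = step(current, segment);
        }
        return current;
    }

    private static Object step(Object current, Object segment) {
        if (segment instanceof Integer index) {
            List<?> list = requireList(current, index);
            checkBounds(list, index);
            return list.get(index);
        }
        if (!(current instanceof Map<?, ?> map) || !map.containsKey(segment)) {
            throw new IllegalArgumentException("could not resolve field [" + segment + "]");
        }
        return map.get(segment);
    }

    private static List<?> requireList(Object value, int index) {
        if (value instanceof List<?> list) return list;
        String type = value instanceof Map ? "Map" : value == null ? "null" : value.getClass().getSimpleName();
        throw new IllegalArgumentException(
            "could not resolve array index [" + index + "] against field type [" + type + "]");
    }

    private static void checkBounds(List<?> list, int index) {
        if (index < 0 || index >= list.size()) {
            throw new IllegalArgumentException(
                "index [" + index + "] out of bounds for array with size [" + list.size() + "]");
        }
    }
}
```

The mutating calls assume mutable containers such as `HashMap` and `ArrayList`; immutable `Map.of`/`List.of` values would throw `UnsupportedOperationException` on write.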