source filtering on nested properties no longer returning empty[] when no nested documents. #23796

Open
murfee25 opened this Issue Mar 29, 2017 · 7 comments

murfee25 commented Mar 29, 2017

Elasticsearch version: 5.2.2

Plugins installed: none

JVM version: 1.8.0_111

OS version: Windows 10

Description of the problem including expected versus actual behavior:
Just upgraded from 5.0.1 and now hitting this issue.
When a document's nested array property is empty and we use source filtering to exclude any property of the nested documents, the result omits the array property entirely instead of returning it as an empty array.

Steps to reproduce:

  1. Create document with nested document array property
{
 "mydocument": {
   "properties": {
     "mynesteddocuments": {
       "type": "nested"
     }
   }
 }
}
  2. Add a couple of documents, one of which should have an empty mynesteddocuments array
{
  "strprop": "document string property",
  "boolprop": true,
  "intprop": 54321,
  "mynesteddocuments": []
}
  3. Search for the documents with source filtering to exclude, say, the "intprop" property of the nested documents.
{
  "_source": {
    "excludes": [
      "mynesteddocuments.intprop"
    ]
  },
  "query": {
    "match_all": {}
  }
}
  4. For documents with nested documents it correctly excludes "intprop", but for documents with an empty nested documents array the property is missing completely.
    Previously it always returned an empty [], which we relied on in code, and it would be painful to have to introduce null checks now.

Believe it may be somewhat related to #22557 and #22593

Member

colings86 commented Mar 31, 2017

Full script to reproduce:

PUT test
{
  "mappings": {
    "mydocument": {
      "properties": {
        "mynesteddocuments": {
          "type": "nested"
        }
      }
    }
  }
}

POST test/mydocument/1
{
  "strprop": "document string property",
  "boolprop": true,
  "intprop": 54321,
  "mynesteddocuments": []
}

POST test/mydocument/2
{
  "strprop": "document string property",
  "boolprop": true,
  "intprop": 54322,
  "mynesteddocuments": [{ "foo": "bar"}]
}

POST test/mydocument/3
{
  "strprop": "document string property",
  "boolprop": true,
  "intprop": 54323,
  "mynesteddocuments": [{ "foo": "bar"}]
}

GET test/_search
{
  "_source": {
    "excludes": [
      "mynesteddocuments.intprop"
    ]
  },
  "query": {
    "match_all": {}
  }
}

Note that in the above, the source filtering excludes mynesteddocuments.intprop, which does not exist on any of the nested documents. If you instead exclude intprop (see below), the document with no nested docs is returned with "mynesteddocuments": [], so the mynesteddocuments array is dropped only when the source excludes target fields inside the nested documents.

GET test/_search
{
  "_source": {
    "excludes": [
      "intprop"
    ]
  },
  "query": {
    "match_all": {}
  }
}
Member

colings86 commented Mar 31, 2017

Also note that this is not specific to nested documents. If you index the array as an embedded object (see below), you can reproduce the same thing:

DELETE test

POST test/mydocument/1
{
  "strprop": "document string property",
  "boolprop": true,
  "intprop": 54321,
  "mynesteddocuments": []
}

POST test/mydocument/2
{
  "strprop": "document string property",
  "boolprop": true,
  "intprop": 54322,
  "mynesteddocuments": [{ "foo": "bar"}]
}

POST test/mydocument/3
{
  "strprop": "document string property",
  "boolprop": true,
  "intprop": 54323,
  "mynesteddocuments": [{ "foo": "bar"}]
}

GET test/_search
{
  "_source": {
    "excludes": [
      "mynesteddocuments.intprop"
    ]
  },
  "query": {
    "match_all": {}
  }
}

Member

clintongormley commented Mar 31, 2017

This needs more investigation - need to figure out the edge cases before we can figure out how to make things more consistent.

What should we do with dots in field names, e.g. a field named foo.bar.baz when you exclude foo.bar?
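
To make the dotted-name question concrete, here is a toy Python model of two ways an exclude pattern could be interpreted. It is not Elasticsearch's filtering code: `exclude_by_path` and `exclude_by_prefix` are hypothetical names invented for this illustration, and arrays are treated as plain leaf values. The two strategies agree on truly nested objects but disagree on a key that literally contains dots.

```python
from typing import Any, Dict


def exclude_by_path(source: Dict[str, Any], path: str) -> Dict[str, Any]:
    """Toy exclude: walk nested objects one path segment at a time."""
    head, _, rest = path.partition(".")
    result: Dict[str, Any] = {}
    for key, value in source.items():
        if key != head:
            result[key] = value  # unrelated key, keep as-is
        elif rest and isinstance(value, dict):
            result[key] = exclude_by_path(value, rest)  # descend one level
        elif rest:
            result[key] = value  # pattern goes deeper than this leaf, keep it
        # otherwise the key is the excluded field itself, so it is dropped
    return result


def exclude_by_prefix(source: Dict[str, Any], path: str) -> Dict[str, Any]:
    """Toy exclude: compare full dotted names, so literal dots in keys count."""

    def walk(node: Dict[str, Any], prefix: str) -> Dict[str, Any]:
        result: Dict[str, Any] = {}
        for key, value in node.items():
            full = f"{prefix}.{key}" if prefix else key
            if full == path or full.startswith(path + "."):
                continue  # the full dotted name falls under the excluded path
            result[key] = walk(value, full) if isinstance(value, dict) else value
        return result

    return walk(source, "")


# Same logical field, stored as nested objects vs. as one key containing dots.
nested = {"foo": {"bar": {"baz": 1}, "other": 2}}
dotted = {"foo.bar.baz": 1, "other": 2}

nested_by_path = exclude_by_path(nested, "foo.bar")
nested_by_prefix = exclude_by_prefix(nested, "foo.bar")
dotted_by_path = exclude_by_path(dotted, "foo.bar")
dotted_by_prefix = exclude_by_prefix(dotted, "foo.bar")
```

In this sketch the segment-by-segment walk leaves the literal `foo.bar.baz` key untouched, while the prefix comparison removes it, which is the kind of inconsistency that would need a decision.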

b-viguier commented Jun 13, 2017

Hi!
We have a lot of issues on our API because of this breaking change… Instead of just receiving an empty array, we now have to deal with the case where the field is missing and replace it with an empty array on the fly, to prevent all our applications from crashing when testing the length of the array… 😞

Is it acceptable to restore previous behavior in a fix release?
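
For anyone needing the same stopgap, the on-the-fly replacement described above might look roughly like this in Python. This is only a sketch under stated assumptions: `ensure_array_fields` and `normalize_hits` are hypothetical helper names, the response is assumed to be the already-parsed JSON of a search request, and only top-level fields are handled.

```python
from typing import Any, Dict, Iterable, List


def ensure_array_fields(source: Dict[str, Any], fields: Iterable[str]) -> Dict[str, Any]:
    """Return a copy of a hit's _source in which every named top-level field
    is present, defaulting to an empty list when it is missing or null."""
    fixed = dict(source)  # shallow copy so the original response is untouched
    for name in fields:
        if fixed.get(name) is None:
            fixed[name] = []
    return fixed


def normalize_hits(response: Dict[str, Any], fields: Iterable[str]) -> List[Dict[str, Any]]:
    """Collect the _source of every hit in a parsed search response and patch it."""
    names = list(fields)
    hits = response.get("hits", {}).get("hits", [])
    return [ensure_array_fields(hit.get("_source", {}), names) for hit in hits]


# Hypothetical, trimmed response shaped like the one described in this issue:
# document 1 lost its empty array, document 2 kept its nested documents.
sample_response = {
    "hits": {
        "hits": [
            {"_id": "1", "_source": {"strprop": "document string property", "intprop": 54321}},
            {"_id": "2", "_source": {"strprop": "document string property", "intprop": 54322,
                                     "mynesteddocuments": [{"foo": "bar"}]}},
        ]
    }
}

sources = normalize_hits(sample_response, ["mynesteddocuments"])
```

Applying the patch at the single point where responses are decoded keeps the rest of the application free of null checks, at the cost of hiding the difference between a missing field and an empty one.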

Member

bleskes commented Jun 13, 2017

@b-viguier yeah, this is a tricky area where we tried to fix things, only to end up breaking something else (in this case your use case, sorry for that). We have decided to take a step back and rethink the entire source filtering logic, including more edge cases. No ETA known at the moment.

b-viguier commented Jun 13, 2017

@bleskes Thank you very much for this feedback and your work on this.
We'll stay tuned for any news 👍

@lcawl lcawl added v6.0.1 and removed v6.0.0 labels Nov 13, 2017

@lcawl lcawl added v6.0.2 and removed v6.0.1 labels Dec 6, 2017

@jaymode jaymode added v6.0.3 and removed v6.0.2 labels Dec 13, 2017

Member

javanna commented Mar 16, 2018
