New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HTTP _bulk allows "_id": "" to set empty _id #24116

Closed
pickypg opened this Issue Apr 14, 2017 · 7 comments

Comments

Projects
None yet
4 participants
@pickypg
Member

pickypg commented Apr 14, 2017

Tested on master, 5.3, 5.2, 5.0, and 2.3:

PUT /_bulk
{ "index": { "_index": "test", "_type": "type", "_id": "" } }
{ "doc": "1" }
{ "index": { "_index": "test", "_type": "type", "_id": "" } }
{ "doc": "2" }
{ "index": { "_index": "test", "_type": "type" } }
{ "doc": "3" }

This creates two documents:

{
  "took": 439,
  "errors": false,
  "items": [
    {
      "index": {
        "_index": "test",
        "_type": "type",
        "_id": "",
        "_version": 1,
        "result": "created",
        "_shards": {
          "total": 2,
          "successful": 1,
          "failed": 0
        },
        "_seq_no": 0,
        "created": true,
        "status": 201
      }
    },
    {
      "index": {
        "_index": "test",
        "_type": "type",
        "_id": "",
        "_version": 2,
        "result": "updated",
        "_shards": {
          "total": 2,
          "successful": 1,
          "failed": 0
        },
        "_seq_no": 1,
        "created": false,
        "status": 200
      }
    },
    {
      "index": {
        "_index": "test",
        "_type": "type",
        "_id": "AVttiPbjiL4QtkRlx6WB",
        "_version": 1,
        "result": "created",
        "_shards": {
          "total": 2,
          "successful": 1,
          "failed": 0
        },
        "_seq_no": 0,
        "created": true,
        "status": 201
      }
    }
  ]
}

This should create 3 documents, all with flake IDs (like the third document here). It's a bug because you can't actually GET a document with an empty _id.

GET /test/type/

spits out:

{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "No endpoint or operation is available at [type]"
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "No endpoint or operation is available at [type]"
  },
  "status": 400
}
@jasontedor

This comment has been minimized.

Member

jasontedor commented Apr 14, 2017

It's always been like this (at least back to the 1.x series), it's not just a since 5.x thing. I do agree it's problematic though, for example you can not get such a document via the get API.

This should create 3 documents, all with flake IDs (like the third document here).

Maybe, or perhaps it should simply be rejected as an invalid request?

@pickypg

This comment has been minimized.

Member

pickypg commented Apr 14, 2017

@jasontedor

Yeah, I've been slowly testing more and more builds.

Maybe, or perhaps it should simply be rejected as an invalid request?

That works for me too, specifically because individual indexing requests with a blank _id are rejected:

PUT /test/type/
{ "doc": 1 }
@jasontedor

This comment has been minimized.

Member

jasontedor commented Apr 14, 2017

The reason I think that it should be rejected is because allowing _id: "" is a form of leniency. An application bug could easily unintentionally introduce _id: "" on a request and if I were an application developer accidentally introducing such a bug I would want Elasticsearch to reject it. We already have a way to specify auto-generated IDs: do not send an explicit ID field _id on the request, I don't think that we need a second way.

@jasontedor jasontedor self-assigned this Apr 14, 2017

@gmoskovicz

This comment has been minimized.

Contributor

gmoskovicz commented Apr 14, 2017

Agree that we should reject empty ID's. By the way, the issue is only with empty ID's, because if it's a space, you can still get the document by ID:

PUT /_bulk
{"index":{"_index":"test","_type":"type","_id":" "}}
{"doc":"1"}

GET test/type/%20
{
  "_index": "test",
  "_type": "type",
  "_id": " ",
  "_version": 1,
  "found": true,
  "_source": {
    "doc": "1"
  }
}

TBH, not sure if it should be rejected, or treated like it needs to auto-generate the ID. That's another option, right?

@jasontedor

This comment has been minimized.

Member

jasontedor commented Apr 14, 2017

Yes, I do not think that we should reject a space as the _id as those can still be obtained exactly as you say.

TBH, not sure if it should be rejected, or treated like it needs to auto-generate the ID. That's another option, right?

I do not think so, that would be a breaking change.

@jasontedor

This comment has been minimized.

Member

jasontedor commented Apr 14, 2017

The issue of _id being empty however is already broken, there's nothing to break but rather to fix.

@jasontedor

This comment has been minimized.

Member

jasontedor commented Apr 14, 2017

I opened #24118.

@lcawl lcawl added :Distributed/CRUD and removed :Bulk labels Feb 13, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment