
INTERNAL_SERVER_ERROR when calling _stats #24872

Closed
cwurm opened this issue May 24, 2017 · 7 comments · Fixed by #24922

Comments

@cwurm
Member

commented May 24, 2017

Elasticsearch version: 5.4.0 (on Elastic Cloud)

Steps to reproduce:

  1. Call GET it_ops_logs/_stats
  2. Output:
{
  "_shards": {
    "total": 2,
    "successful": 0,
    "failed": 2,
    "failures": [
      {
        "shard": 0,
        "index": "it_ops_logs",
        "status": "INTERNAL_SERVER_ERROR",
        "reason": {
          "type": "failed_node_exception",
          "reason": "Failed node [ee8ER9CZQZaSSKkdcLJkzQ]",
          "caused_by": {
            "type": "illegal_state_exception",
            "reason": "Negative longs unsupported, use writeLong or writeZLong for negative numbers [-3]"
          }
        }
      },
      {
        "shard": 0,
        "index": "it_ops_logs",
        "status": "INTERNAL_SERVER_ERROR",
        "reason": {
          "type": "failed_node_exception",
          "reason": "Failed node [YAc58bLUTuiYAZy943nTAA]",
          "caused_by": {
            "type": "illegal_state_exception",
            "reason": "Negative longs unsupported, use writeLong or writeZLong for negative numbers [-3]"
          }
        }
      }
    ]
  },
  "_all": {
    "primaries": {},
    "total": {}
  },
  "indices": {}
}

Logs: Nothing unusual (no errors or anything similar).
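For reference, the "Negative longs unsupported" message is what Elasticsearch's variable-length long (VLong) encoding raises when asked to write a negative value, since the encoding can only represent non-negative numbers. Below is a minimal standalone sketch of that kind of guard; the class and method names are illustrative only, not the actual StreamOutput code.

// Simplified sketch (not Elasticsearch's actual StreamOutput implementation):
// a variable-length (VLong) encoder can only represent non-negative values,
// so it rejects negatives up front with the message seen in the stats response.
public final class VLongSketch {

    static void writeVLong(long i, java.io.DataOutput out) throws java.io.IOException {
        if (i < 0) {
            throw new IllegalStateException(
                "Negative longs unsupported, use writeLong or writeZLong for negative numbers [" + i + "]");
        }
        // 7 bits per byte; the high bit marks "more bytes follow"
        while ((i & ~0x7FL) != 0L) {
            out.writeByte((byte) ((i & 0x7F) | 0x80));
            i >>>= 7;
        }
        out.writeByte((byte) i);
    }

    public static void main(String[] args) throws Exception {
        java.io.DataOutputStream out =
            new java.io.DataOutputStream(new java.io.ByteArrayOutputStream());
        writeVLong(42, out);   // fine
        writeVLong(-3, out);   // throws, the same failure mode as the stats call above
    }
}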

@nik9000

Contributor

commented May 24, 2017

If it is possible to reproduce, it'd be helpful to have error_trace turned on.

@abeyad

Contributor

commented May 24, 2017

I can't reproduce this with the above steps; I'm sure some node failures occurred in a particular manner that allowed this error to manifest.

@cwurm Any other steps that you can add for the reproduction?

@abeyad self-assigned this May 24, 2017
@cwurm

Member Author

commented May 24, 2017

@nik9000 It's a Cloud cluster; I don't think I can turn this on. :-(

@abeyad It only happens on this index as far as I can tell. I know we resized that Cloud cluster (doubled it in size) earlier today. Cloud console shows all nodes as up and running. No errors in the log.

I'm not sure what I can do. You can access the cluster if you want - ping me.

@jasontedor

Member

commented May 24, 2017

It's a Cloud cluster, I don't think I can turn this on. :-(

It's a request parameter; you can set it with ?error_trace=true.

No errors in the log.

Are you sure? We log a warning for every request that is answered with a 500 (unless error_trace is set to true).

@cwurm

Member Author

commented May 24, 2017

@jasontedor Oh sorry, my bad. Running GET it_ops_logs/_stats?error_trace=true doesn't change the output though.

I'm searching furiously through the logs UI in Cloud, but can't find anything related. Unfortunately, it seems next to impossible to go through ES logs in Cloud sequentially (new log lines get added all the time and screw up the pagination). What would the exception look like? (I searched for exception, error, failed - anything I could think of).

@jasontedor

Member

commented May 27, 2017

I obtained the logs from this instance and I know what the issue is: we have a double-decrement bug when handling certain queries that fail in the fetch phase. This double decrement causes the number of outstanding queries on a shard to go negative, which in turn leads to the serialization failure seen here.
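A minimal standalone sketch of that failure mode follows (class, field, and method names are hypothetical, not the actual Elasticsearch search code): a per-shard "outstanding queries" counter is decremented once on completion and then decremented again by a buggy error path, so it drifts negative, and serializing it as an unsigned variable-length long then produces exactly the error reported above.

// Hypothetical sketch of a double-decrement bug; names are illustrative,
// not Elasticsearch's actual implementation.
import java.util.concurrent.atomic.AtomicLong;

public final class ShardQueryCounterSketch {

    private final AtomicLong outstandingQueries = new AtomicLong();

    void onQueryStart() { outstandingQueries.incrementAndGet(); }
    void onQueryDone()  { outstandingQueries.decrementAndGet(); }

    // Buggy error path: decrements again even though onQueryDone() already ran.
    void onFetchPhaseFailure() { outstandingQueries.decrementAndGet(); }

    long stats() {
        long current = outstandingQueries.get();
        if (current < 0) {
            // This is where the real cluster failed: a negative value cannot be
            // written with a variable-length (VLong) encoding.
            throw new IllegalStateException(
                "Negative longs unsupported, use writeLong or writeZLong for negative numbers [" + current + "]");
        }
        return current;
    }

    public static void main(String[] args) {
        ShardQueryCounterSketch shard = new ShardQueryCounterSketch();
        // Three queries complete normally...
        for (int i = 0; i < 3; i++) { shard.onQueryStart(); shard.onQueryDone(); }
        // ...but each also hits the buggy fetch-phase error path, so the
        // counter is decremented twice per query and ends up at -3.
        for (int i = 0; i < 3; i++) { shard.onFetchPhaseFailure(); }
        System.out.println(shard.stats()); // throws with [-3], matching the issue
    }
}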

@jasontedor

Member

commented May 27, 2017

I opened #24922.
