All shards failed on time series chart on TSVB #38544

Closed
bhavyarm opened this issue Jun 10, 2019 · 10 comments
Labels: bug · Feature:elasticsearch · Feature:TSVB (Time Series Visual Builder)

Comments

@bhavyarm (Contributor) commented Jun 10, 2019

Kibana version: 7.2.0 BC5

Elasticsearch version: 7.2.0 BC5

Server OS version: darwin_x86_64

Browser version: chrome latest

Browser OS version: OS X

Original install method (e.g. download page, yum, from source, etc.): from staging

Describe the bug: If the user drags to zoom on a time series chart in TSVB, at some point the chart is replaced with an "all shards failed" message.

Steps to reproduce:

  1. Open Visualize, click "Create new", and choose Visual Builder.
  2. Open the date picker and set the time range to 5 years ago.
  3. Drag to zoom in on a peak of the chart, then drag to zoom again.
  4. Instead of the chart, an "all shards failed" message is shown.

Screenshots (if relevant):
[attachment: tsvb]

Network request:

Request payload:

{
  "timerange": {
    "timezone": "America/Montreal",
    "min": "2019-05-25T07:09:00.862Z",
    "max": "2019-05-25T07:09:26.944Z"
  },
  "query": [
    {
      "query": "",
      "language": "kuery"
    }
  ],
  "filters": [],
  "panels": [
    {
      "id": "61ca57f0-469d-11e7-af02-69e470af7417",
      "type": "timeseries",
      "series": [
        {
          "id": "61ca57f1-469d-11e7-af02-69e470af7417",
          "color": "#68BC00",
          "split_mode": "everything",
          "metrics": [
            {
              "id": "61ca57f2-469d-11e7-af02-69e470af7417",
              "type": "count"
            }
          ],
          "separate_axis": 0,
          "axis_position": "right",
          "formatter": "number",
          "chart_type": "line",
          "line_width": 1,
          "point_size": 1,
          "fill": 0.5,
          "stacked": "none"
        }
      ],
      "time_field": "",
      "index_pattern": "",
      "interval": "",
      "axis_position": "left",
      "axis_formatter": "number",
      "axis_scale": "normal",
      "show_legend": 1,
      "show_grid": 1,
      "default_index_pattern": "kibana_sample_data_ecommerce",
      "default_timefield": "order_date"
    }
  ],
  "state": {}
}

Request response:

{
  "type": "timeseries",
  "uiRestrictions": {
    "whiteListedMetrics": {
      "*": true
    },
    "whiteListedGroupByFields": {
      "*": true
    }
  },
  "61ca57f0-469d-11e7-af02-69e470af7417": {
    "id": "61ca57f0-469d-11e7-af02-69e470af7417",
    "statusCode": 400,
    "error": {
      "error": {
        "root_cause": [
          {
            "type": "illegal_argument_exception",
            "reason": "Unable to parse interval [0.26s]"
          }
        ],
        "type": "search_phase_execution_exception",
        "reason": "all shards failed",
        "phase": "query",
        "grouped": true,
        "failed_shards": [
          {
            "shard": 0,
            "index": "kibana_sample_data_ecommerce",
            "node": "m_pWo1RmRTSIV8i-lemrSQ",
            "reason": {
              "type": "illegal_argument_exception",
              "reason": "Unable to parse interval [0.26s]"
            }
          }
        ],
        "caused_by": {
          "type": "illegal_argument_exception",
          "reason": "Unable to parse interval [0.26s]",
          "caused_by": {
            "type": "illegal_argument_exception",
            "reason": "Unable to parse interval [0.26s]"
          }
        }
      },
      "status": 400
    },
    "series": []
  }
}

ES Logs:

org.elasticsearch.transport.RemoteTransportException: [bhavyarajumandya][127.0.0.1:9300][indices:data/read/search[phase/query]]
Caused by: java.lang.IllegalArgumentException: Unable to parse interval [0.25s]
	at org.elasticsearch.search.aggregations.bucket.histogram.DateIntervalWrapper.createRounding(DateIntervalWrapper.java:299) ~[elasticsearch-7.2.0.jar:7.2.0]
	at org.elasticsearch.search.aggregations.bucket.histogram.DateHistogramAggregationBuilder.innerBuild(DateHistogramAggregationBuilder.java:490) ~[elasticsearch-7.2.0.jar:7.2.0]
	at org.elasticsearch.search.aggregations.support.ValuesSourceAggregationBuilder.doBuild(ValuesSourceAggregationBuilder.java:315) ~[elasticsearch-7.2.0.jar:7.2.0]
	at org.elasticsearch.search.aggregations.support.ValuesSourceAggregationBuilder.doBuild(ValuesSourceAggregationBuilder.java:39) ~[elasticsearch-7.2.0.jar:7.2.0]
	at org.elasticsearch.search.aggregations.AbstractAggregationBuilder.build(AbstractAggregationBuilder.java:139) ~[elasticsearch-7.2.0.jar:7.2.0]
	at org.elasticsearch.search.aggregations.AggregatorFactories$Builder.build(AggregatorFactories.java:332) ~[elasticsearch-7.2.0.jar:7.2.0]
	at org.elasticsearch.search.SearchService.parseSource(SearchService.java:789) ~[elasticsearch-7.2.0.jar:7.2.0]
	at org.elasticsearch.search.SearchService.createContext(SearchService.java:591) ~[elasticsearch-7.2.0.jar:7.2.0]
	at org.elasticsearch.search.SearchService.createAndPutContext(SearchService.java:550) ~[elasticsearch-7.2.0.jar:7.2.0]
	at org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:353) ~[elasticsearch-7.2.0.jar:7.2.0]
	at org.elasticsearch.search.SearchService.lambda$executeQueryPhase$1(SearchService.java:340) ~[elasticsearch-7.2.0.jar:7.2.0]
	at org.elasticsearch.action.ActionListener.lambda$map$2(ActionListener.java:145) ~[elasticsearch-7.2.0.jar:7.2.0]
	at org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:62) [elasticsearch-7.2.0.jar:7.2.0]
	at org.elasticsearch.search.SearchService$2.doRun(SearchService.java:1052) [elasticsearch-7.2.0.jar:7.2.0]
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-7.2.0.jar:7.2.0]
	at org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:44) [elasticsearch-7.2.0.jar:7.2.0]
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:758) [elasticsearch-7.2.0.jar:7.2.0]
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-7.2.0.jar:7.2.0]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
	at java.lang.Thread.run(Thread.java:835) [?:?]
bhavyarm added the bug and Feature:TSVB (Time Series Visual Builder) labels on Jun 10, 2019

@gospodarsky (Contributor) commented Jun 11, 2019

@bhavyarm After investigating the issue, we can draw the following conclusions:

  • The bug reproduces in the new chart as well as in the old charts, so it is not a charting problem.
  • The bug appears whenever the selected time range is shorter than about 50s (see the sketch after this list).
  • If the time gap between two data points is less than 1s, the following exception is thrown: java.lang.IllegalArgumentException: Unable to parse interval [0.???s].
  • We tried to reproduce the bug on older instances, and it works fine there.
  • Given that, we compared the history of the DateIntervalWrapper class, which emits the error, and the underlying problem appears to be there. We don't know what caused the change in behavior, so we are asking @elastic/elasticsearch for clarification.
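
For illustration only, here is a minimal TypeScript sketch of how an "auto" interval derived as range ÷ target bucket count and then formatted in seconds becomes fractional for short ranges. The function name and the target bucket count of 100 are assumptions, not Kibana's actual code:

// Hypothetical sketch (not Kibana's implementation): derive a bucket size
// from the selected time range and format it in seconds.
function naiveAutoInterval(minIso: string, maxIso: string, targetBuckets = 100): string {
  const rangeMs = Date.parse(maxIso) - Date.parse(minIso); // ~26082 ms for the range below
  const bucketSeconds = rangeMs / targetBuckets / 1000;    // ~0.26 s per bucket
  return `${Number(bucketSeconds.toFixed(2))}s`;           // fractional interval string
}

// Time range taken from the failing request payload in the issue description:
console.log(naiveAutoInterval('2019-05-25T07:09:00.862Z', '2019-05-25T07:09:26.944Z'));
// -> "0.26s", which date_histogram rejects with "Unable to parse interval [0.26s]"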

@polyfractal (Contributor)

I'm not sure why it worked before, but fractional times (0.26s) have been disallowed for a long time. E.g. from the 6.1 docs:

Note that fractional time values are not supported, but you can address this by shifting to another time unit (e.g., 1.5h could instead be specified as 90m).

Could we get a copy of the request being sent from older versions? Maybe Kibana is sending a different interval in those cases. It's also possible the recent refactoring fixed what was previously a bug (allowing those fractional units when it shouldn't have)... I can investigate in a little bit.
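
To illustrate the unit-shifting workaround quoted from the docs above (e.g. 1.5h expressed as 90m), here is a hedged TypeScript helper that rewrites a fractional interval in a smaller whole unit. The function and its unit table are hypothetical, not an existing Elasticsearch or Kibana API:

// Hypothetical helper: rewrite a fractional interval string using a smaller
// unit so the value stays a whole number, e.g. "1.5h" -> "90m", "0.26s" -> "260ms".
const MS_PER_UNIT: Record<string, number> = { d: 86_400_000, h: 3_600_000, m: 60_000, s: 1_000, ms: 1 };

function toWholeUnitInterval(interval: string): string {
  const match = /^([\d.]+)\s*(ms|s|m|h|d)$/.exec(interval.trim());
  if (!match) throw new Error(`Unrecognized interval: ${interval}`);
  const ms = parseFloat(match[1]) * MS_PER_UNIT[match[2]];
  // Pick the largest unit that divides the value evenly; fall back to whole milliseconds.
  for (const unit of ['d', 'h', 'm', 's']) {
    const value = ms / MS_PER_UNIT[unit];
    if (value >= 1 && Number.isInteger(value)) return `${value}${unit}`;
  }
  return `${Math.max(1, Math.round(ms))}ms`;
}

console.log(toWholeUnitInterval('1.5h'));  // "90m"
console.log(toWholeUnitInterval('0.26s')); // "260ms"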

@gospodarsky (Contributor) commented Jun 12, 2019

@polyfractal. (Shane edited / deleted this)

Request payload:

{
  "timerange": {
    "timezone": "Europe/Minsk",
    "min": "2019-04-26T14:40:25.000Z",
    "max": "2019-04-26T14:40:26.000Z"
  },
  "query": {
    "query": "",
    "language": "kuery"
  },
  "filters": [],
  "panels": [
    {
      "id": "61ca57f0-469d-11e7-af02-69e470af7417",
      "type": "timeseries",
      "series": [
        {
          "id": "61ca57f1-469d-11e7-af02-69e470af7417",
          "color": "rgba(0,156,224,1)",
          "split_mode": "everything",
          "metrics": [
            {
              "id": "61ca57f2-469d-11e7-af02-69e470af7417",
              "type": "count",
              "numerator": "FlightDelay:true"
            }
          ],
          "separate_axis": 0,
          "axis_position": "right",
          "formatter": "number",
          "chart_type": "line",
          "line_width": "2",
          "point_size": "0",
          "fill": 0.5,
          "stacked": "none",
          "label": "Percent Delays"
        }
      ],
      "time_field": "timestamp",
      "index_pattern": "kibana_sample_data_flights",
      "interval": "",
      "axis_position": "left",
      "axis_formatter": "number",
      "show_legend": 1,
      "show_grid": 1,
      "annotations": [],
      "legend_position": "bottom",
      "axis_scale": "normal",
      "default_index_pattern": "apm*"
    }
  ],
  "state": {}
}

Note that the difference between min and max is 1s.

Response payload:

{
  "61ca57f0-469d-11e7-af02-69e470af7417": {
    "id": "61ca57f0-469d-11e7-af02-69e470af7417",
    "series": [
      {
        "id": "61ca57f1-469d-11e7-af02-69e470af7417",
        "label": "Percent Delays",
        "color": "rgb(0, 156, 224)",
        "data": [
          [
            1556289625000,
            0
          ]
        ],
        "stack": false,
        "lines": {
          "show": true,
          "fill": 0.5,
          "lineWidth": 2,
          "steps": false
        },
        "points": {
          "show": false,
          "radius": 1,
          "lineWidth": 5
        },
        "bars": {
          "show": false,
          "fill": 0.5,
          "lineWidth": 2
        }
      }
    ]
  },
  "type": "timeseries"
}

I did the same thing on the latest master.

Request payload:

{
  "timerange": {
    "timezone": "Europe/Minsk",
    "min": "2019-06-04T19:01:05.000Z",
    "max": "2019-06-04T19:01:06.000Z"
  },
  "query": [
    {
      "query": "",
      "language": "kuery"
    }
  ],
  "filters": [],
  "panels": [
    {
      "id": "61ca57f0-469d-11e7-af02-69e470af7417",
      "type": "timeseries",
      "series": [
        {
          "id": "61ca57f1-469d-11e7-af02-69e470af7417",
          "color": "rgba(0,156,224,1)",
          "split_mode": "everything",
          "metrics": [
            {
              "id": "61ca57f2-469d-11e7-af02-69e470af7417",
              "type": "count",
              "numerator": "FlightDelay:true"
            }
          ],
          "separate_axis": 0,
          "axis_position": "right",
          "formatter": "number",
          "chart_type": "line",
          "line_width": "2",
          "point_size": "0",
          "fill": 0.5,
          "stacked": "none",
          "label": "Percent Delays"
        }
      ],
      "time_field": "timestamp",
      "index_pattern": "kibana_sample_data_flights",
      "interval": "",
      "axis_position": "left",
      "axis_formatter": "number",
      "show_legend": 1,
      "show_grid": 1,
      "annotations": [],
      "legend_position": "bottom",
      "axis_scale": "normal",
      "default_index_pattern": "kibana_sample_data_flights",
      "default_timefield": "timestamp"
    }
  ],
  "state": {}
}

Response payload:

{
  "type": "timeseries",
  "uiRestrictions": {
    "whiteListedMetrics": {
      "*": true
    },
    "whiteListedGroupByFields": {
      "*": true
    }
  },
  "61ca57f0-469d-11e7-af02-69e470af7417": {
    "id": "61ca57f0-469d-11e7-af02-69e470af7417",
    "statusCode": 400,
    "error": {
      "error": {
        "root_cause": [
          {
            "type": "illegal_argument_exception",
            "reason": "Unable to parse interval [0.01s]"
          }
        ],
        "type": "search_phase_execution_exception",
        "reason": "all shards failed",
        "phase": "query",
        "grouped": true,
        "failed_shards": [
          {
            "shard": 0,
            "index": "kibana_sample_data_flights",
            "node": "a-ZOVl94R1uvjGy0dO_-zw",
            "reason": {
              "type": "illegal_argument_exception",
              "reason": "Unable to parse interval [0.01s]"
            }
          }
        ],
        "caused_by": {
          "type": "illegal_argument_exception",
          "reason": "Unable to parse interval [0.01s]",
          "caused_by": {
            "type": "illegal_argument_exception",
            "reason": "Unable to parse interval [0.01s]"
          }
        }
      },
      "status": 400
    },
    "series": []
  }
}


In both cases the interval is auto.

@polyfractal (Contributor)

Sorry, I meant the request that Kibana is sending to Elasticsearch. Can we get a copy of that? E.g. I'd like to see if this is a date_histogram or auto_date_histogram, the interval being set (if it really is auto or something else), etc etc.

I looked around but didn't see any easy way to "inspect" the query that TSVB is sending.

@gospodarsky (Contributor)

@eskibars Sorry. The password and history have been changed.

@gospodarsky (Contributor) commented Jun 13, 2019

@polyfractal it took some time to get the data from both versions :) Nevertheless, it has been pulled.

7.x (with error)

{
  "size": 0,
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "timestamp": {
              "gte": "2019-06-05T10:46:23.000Z",
              "lte": "2019-06-05T10:46:24.000Z",
              "format": "strict_date_optional_time"
            }
          }
        }
      ],
      "filter": [
        {
          "match_all": {}
        }
      ],
      "should": [],
      "must_not": []
    }
  },
  "aggs": {
    "timeseries": {
      "date_histogram": {
        "field": "timestamp",
        "interval": "0.01s",
        "min_doc_count": 0,
        "time_zone": "Europe/Minsk",
        "extended_bounds": {
          "min": 1559731583000,
          "max": 1559731584000
        }
      },
      "aggs": {
        "61ca57f2-469d-11e7-af02-69e470af7417": {
          "bucket_script": {
            "buckets_path": {
              "count": "_count"
            },
            "script": {
              "source": "count * 1",
              "lang": "expression"
            },
            "gap_policy": "skip"
          }
        }
      },
      "meta": {
        "timeField": "timestamp",
        "intervalString": "0.01s",
        "bucketSize": 1,
        "seriesId": "61ca57f1-469d-11e7-af02-69e470af7417"
      }
    }
  }
}

7.0

{
  "size": 0,
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "timestamp": {
              "gte": "2019-07-06T15:52:22.000Z",
              "lte": "2019-07-06T15:52:23.000Z",
              "format": "strict_date_optional_time"
            }
          }
        }
      ],
      "filter": [
        {
          "match_all": {}
        }
      ],
      "should": [],
      "must_not": []
    }
  },
  "aggs": {
    "61ca57f1-469d-11e7-af02-69e470af7417": {
      "filter": {
        "match_all": {}
      },
      "aggs": {
        "timeseries": {
          "date_histogram": {
            "field": "timestamp",
            "interval": "1s",
            "min_doc_count": 0,
            "time_zone": "Europe/Minsk",
            "extended_bounds": {
              "min": 1562428342000,
              "max": 1562428343000
            }
          },
          "aggs": {
            "61ca57f2-469d-11e7-af02-69e470af7417": {
              "bucket_script": {
                "buckets_path": {
                  "count": "_count"
                },
                "script": {
                  "source": "count * 1",
                  "lang": "expression"
                },
                "gap_policy": "skip"
              }
            }
          }
        }
      },
      "meta": {
        "timeField": "timestamp",
        "intervalString": "1s",
        "bucketSize": 1
      }
    }
  }
}

@polyfractal (Contributor)

Thanks @Avinar-24! So it looks like 7.0 TSVB is sending a valid interval, which is why there are no errors:

"interval": "1s",

But 7.x TSVB is sending an invalid, fractional interval:

"interval": "0.01s",

That is where the exception is coming from. So something must have changed in Kibana or TSVB that is causing the interval to be sent incorrectly. This is using the regular date_histogram (not the auto_date_histogram), which means the "auto" interval setting is being generated by Kibana in some fashion, not by Elasticsearch.
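
As a sketch only (an assumption for illustration, not necessarily what the fix referenced below does), one way for the Kibana side to avoid emitting a fractional-second interval is to fall back to whole milliseconds once the computed bucket size drops below one second:

// Hypothetical guard, not the actual Kibana fix: format sub-second bucket
// sizes in whole milliseconds so date_histogram always gets a parseable value.
function toEsInterval(bucketMs: number): string {
  if (bucketMs >= 1000 && Number.isInteger(bucketMs / 1000)) {
    return `${bucketMs / 1000}s`;                    // e.g. 1000 -> "1s"
  }
  return `${Math.max(1, Math.round(bucketMs))}ms`;   // e.g. 10 -> "10ms", 260.82 -> "261ms"
}

console.log(toEsInterval(10));    // "10ms" instead of the rejected "0.01s"
console.log(toEsInterval(1000));  // "1s"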

@gospodarsky (Contributor)

@polyfractal the related PR has been created. Thank you too.

@polyfractal (Contributor)

np, happy to help :)

@gospodarsky (Contributor)

Closed #38880
