Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inconsistent treatment of invalid characters in index and bulk requests #61763

Closed
cjcenizal opened this issue Aug 31, 2020 · 7 comments
Closed
Labels
>bug Team:Clients Meta label for clients team

Comments

@cjcenizal
Copy link
Contributor

A user has gotten into a situation where the Index Management UI fails to load indices named like %{[@metadata][beat]}-%{[@metadata][version]}-2020.08.23 (elastic/kibana#75869). This occurs because we're fetching indices using the get index API (https://github.com/elastic/kibana/pull/66422/files#diff-784d69eb6ea217af4edd2559d5ca2445R44), which complains about these indices' names with an error like this one:

{
  "statusCode":405,
  "error":"Method Not Allowed",
  "message":"invalid escape sequence `%{[' at index 5 of: test,%{[@metadata][beat]}-%{[@metadata][version]}-2020.08.23, allowed: [PUT, DELETE, HEAD, GET]"
}

This looks similar to other problems users have reported regarding Logstash creating indices with these names (https://discuss.elastic.co/t/logstash-index-name-as-metadata-beat-metadata-version/184225). I suspect that Logstash is using the bulk API, which will allow creation of an index with this name (disallowed by the index API):

POST _bulk
{ "index" : { "_index" : "%{[@metadata][beat]}-%{[@metadata][version]}-2020.08.23", "_id" : "1" } }

It seems reasonable for the index API to reject invalid characters like %, [, and {, but inconsistent for the bulk API to allow them. It also seems trappy because then users can get into the situation as in the first link I shared, in which parts of the stack break for no clear reason, due to unexpected interactions between other parts of the stack.

Possibly related: #48308, #9379

@cjcenizal cjcenizal added the >bug label Aug 31, 2020
@gwbrown gwbrown added the :Core/Infra/REST API REST infrastructure and utilities label Sep 1, 2020
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-core-infra (:Core/Infra/REST API)

@elasticmachine elasticmachine added the Team:Core/Infra Meta label for core/infra team label Sep 1, 2020
@rjernst rjernst added team-discuss :Distributed/CRUD A catch all label for issues around indexing, updating and getting a doc by id. Not search. and removed :Core/Infra/REST API REST infrastructure and utilities labels Sep 1, 2020
@elasticmachine elasticmachine removed the Team:Core/Infra Meta label for core/infra team label Sep 1, 2020
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed (:Distributed/CRUD)

@elasticmachine elasticmachine added the Team:Distributed Meta label for distributed team label Sep 1, 2020
@rjernst
Copy link
Member

rjernst commented Sep 1, 2020

I think this falls under the distributed area as it's really about the behavior of bulk indexing requests (which I think CRUD includes), not about general infrastructure of all requests.

@ywelsch ywelsch removed the >bug label Sep 1, 2020
@ywelsch
Copy link
Contributor

ywelsch commented Sep 1, 2020

This looks like Kibana is not properly URL escaping the index name:

If I URL escape that index name using https://www.urlencoder.org, I get %25%7B%5B%40metadata%5D%5Bbeat%5D%7D-%25%7B%5B%40metadata%5D%5Bversion%5D%7D-2020.08.23, and using that with curl works just fine.

$ curl localhost:9200/%25%7B%5B%40metadata%5D%5Bbeat%5D%7D-%25%7B%5B%40metadata%5D%5Bversion%5D%7D-2020.08.23
{"%{[@metadata][beat]}-%{[@metadata][version]}-2020.08.23":{"aliases":{},"mappings":{},"settings":{"index":{"creation_date":"1598944051293","number_of_shards":"1","number_of_replicas":"1","uuid":"Lp0PuvbcRQeR2fkpZ9GhXw","version":{"created":"7090099"},"provided_name":"%{[@metadata][beat]}-%{[@metadata][version]}-2020.08.23"}}}}

Kibana DEV console also does not look to properly escape either.

POST /%{[@metadata][beat]}-%{[@metadata][version]}-2020.08.23/_doc/1
{}

returns

{
  "error" : {
    "root_cause" : [
      {
        "type" : "invalid_index_name_exception",
        "reason" : "Invalid index name [%7B[@metadata][beat]}-%7B[@metadata][version]}-2020.08.23], must be lowercase",
        "index_uuid" : "_na_",
        "index" : "%7B[@metadata][beat]}-%7B[@metadata][version]}-2020.08.23"
      }
    ],
    "type" : "invalid_index_name_exception",
    "reason" : "Invalid index name [%7B[@metadata][beat]}-%7B[@metadata][version]}-2020.08.23], must be lowercase",
    "index_uuid" : "_na_",
    "index" : "%7B[@metadata][beat]}-%7B[@metadata][version]}-2020.08.23"
  },
  "status" : 400
}

whereas the properly escaped

POST /%25%7B%5B%40metadata%5D%5Bbeat%5D%7D-%25%7B%5B%40metadata%5D%5Bversion%5D%7D-2020.08.23/_doc/1
{}

returns

{
  "_index" : "%{[@metadata][beat]}-%{[@metadata][version]}-2020.08.23",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 3,
  "result" : "updated",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 3,
  "_primary_term" : 1
}

@cjcenizal
Copy link
Contributor Author

Thanks, @ywelsch! You're right. It looks like this encoding step might be missing in the Elasticsearch JS client. Once I can confirm where the problem resides (in the client or in Kibana), I'll close this.

@elastic/es-clients I'm making a request to transport.request on the legacy client with this payload:

{
    method: 'GET',
    path: `/%{[@metadata][beat]}-%{[@metadata][version]}-2020.08.23`,
    query: {
      expand_wildcards: 'hidden,all',
    },
}

The response is this error:

invalid escape sequence `%{[' at index 0 of: %{[@metadata][beat]}-%{[@metadata][version]}-2020.08.23, allowed: [GET, HEAD, PUT, DELETE]

Does the client expect the consumer to url-encode the path? Or should the client be responsible for that?

@ywelsch ywelsch removed :Distributed/CRUD A catch all label for issues around indexing, updating and getting a doc by id. Not search. feedback_needed labels Sep 2, 2020
@elasticmachine elasticmachine removed the Team:Distributed Meta label for distributed team label Sep 2, 2020
@ywelsch ywelsch added Team:Clients Meta label for clients team >bug labels Sep 2, 2020
@delvedor
Copy link
Member

delvedor commented Sep 2, 2020

Hello! The js client is encoding only URL parameters and not the entire URL, see here for example.
Parameters are encoded with encodeURIComponent.

I fear the issue might be caused by the difference between encodeURI and encodeURIComponent. As far as I know encodeURI can't be used for encoding specific parts of the URL, only entire URLs.
The difference between the two methods is:

  • encodeURIComponent() encodes everything except these characters: A-Z a-z 0-9 - _ . ! ~ * ' ( )
  • encodeURI() ignores the same characters as encodeURIComponent(), and also ignores these: ; , / ? : @ & = + $ #

Side note, transport.request expects you to pass the URL already encoded. If you use the surface API of the client (client.indices.get), everything should be handled for you.

@cjcenizal
Copy link
Contributor Author

Thanks for your help @delvedor! We'll fix this on the Kibana side.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>bug Team:Clients Meta label for clients team
Projects
None yet
Development

No branches or pull requests

6 participants