_reindex from remote and its bugs #22027

Closed
celesteking opened this issue Dec 7, 2016 · 8 comments
Labels
discuss :Distributed/CRUD A catch all label for issues around indexing, updating and getting a doc by id. Not search.

Comments

@celesteking

I'm trying to mass-reindex from remote. As elasticdump is "unsupported", I tried the so-called "reindex from remote" feature. It failed in the following aspects:

  • It looks like it's impossible to not specify a destination index. I need to transfer multiple indexes from remote. I can't specify them one by one. I need multi-index notation, like index: blah-*-2016-*. It looks like this isn't possible without some dirty scripting. It would be wonderful if all this happened under the hood.

Next, I was trying to use the documented API.

GET _tasks/Af0W-dC3QQSlJ28uRru0fQ:8488
=>
{
  "completed": false,
  "task": {
    "node": "Af0W-dC3QQSlJ28uRru0fQ",
    "id": 8488,
    "type": "transport",
    "action": "indices:data/write/reindex",
    "status": {
      "total": 0,
      "updated": 0,
      "created": 0,
      "deleted": 0,
      "batches": 0,
      "version_conflicts": 0,
      "noops": 0,
      "retries": {
        "bulk": 0,
        "search": 0
      },
      "throttled_millis": 0,
      "requests_per_second": -1,
      "throttled_until_millis": 0
    },
    "description": "",
    "start_time_in_millis": 1481112063773,
    "running_time_in_nanos": 650222891445,
    "cancellable": true
  }
}
GET .tasks/task/8488
=>
{
  "_index": ".tasks",
  "_type": "task",
  "_id": "8488",
  "found": false
}

It should've been found as per docs.

GET /_tasks/taskId:8488
=>
{
  "error": {
    "root_cause": [
      {
        "type": "resource_not_found_exception",
        "reason": "task [taskId:8488] isn't running or stored its results"
      }
    ],
    "type": "resource_not_found_exception",
    "reason": "task [taskId:8488] isn't running or stored its results"
  },
  "status": 404
}

It's also unclear what's going on: was the task stalled, hung, or still connecting? Why is it taking so long without any movement? I'm on v5.0.2. Thanks.
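
The task response above is easier to read once the nanosecond counter and the progress fields are condensed. A minimal Python sketch, assuming the response shape shown in this comment; the function name `summarize_task` and the "no progress" rule (unfinished with zero batches) are illustrative assumptions, not states defined by Elasticsearch:

```python
def summarize_task(response: dict) -> dict:
    """Condense a GET _tasks/<node>:<id> response into a few readable fields."""
    task = response["task"]
    status = task["status"]
    return {
        # The tasks API addresses a task as "<node>:<id>"
        "task_id": f'{task["node"]}:{task["id"]}',
        "running_seconds": task["running_time_in_nanos"] / 1e9,
        "total": status["total"],
        "batches": status["batches"],
        # Assumption: an unfinished task with zero batches shows no visible progress
        "no_progress": (not response["completed"]) and status["batches"] == 0,
    }


# Trimmed copy of the response from this comment
example = {
    "completed": False,
    "task": {
        "node": "Af0W-dC3QQSlJ28uRru0fQ",
        "id": 8488,
        "running_time_in_nanos": 650222891445,
        "status": {"total": 0, "batches": 0},
    },
}

summary = summarize_task(example)
print(summary)
```

For the response above this gives roughly 650 seconds of running time with zero batches, which matches the "no movement" symptom.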

@celesteking
Author

celesteking commented Dec 7, 2016

Also, the docs are far from usable:

  "source": {
    "index": "metricbeat-*"
  },
  "dest": {
    "index": "metricbeat"
  },
  "script": {
    "lang": "painless",
    "inline": "ctx._index = 'metricbeat-' + (ctx._index.substring('metricbeat-'.length(), ctx._index.length())) + '-1'"
  }

You're assigning the index to itself, basically. This won't work. Maybe you meant ctx._source._index? This is really hard for a newbie, guys.

update: ctx._index = ctx._source._index didn't work. Nothing works. This is bullshit.

@jimczi
Contributor

jimczi commented Dec 7, 2016

You're assigning the index to itself, basically. This won't work. Maybe you meant ctx._source._index? This is really hard for a newbie, guys.

Did you try it? I just did and it works like a charm. It basically does what you're asking for below:

It looks like it's impossible to not specify a destination index

Yes it's possible with what you call "dirty scripting". Sorry if you don't like scripts but that's the way it works.
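
To see what the documented script does to an index name, here is a minimal Python sketch of the same string manipulation. It assumes, as the documented example does, that `ctx._index` starts out as the name of the index each document was read from. The helper names `rewrite_index` and `build_reindex_body` and the host `http://otherhost:9200` are hypothetical; only the request fields (`source.remote.host`, `source.index`, `dest.index`, `script`) come from the reindex API.

```python
PREFIX = "metricbeat-"
SUFFIX = "-1"

# The Painless expression quoted from the docs earlier in this thread
DOCUMENTED_SCRIPT = (
    "ctx._index = 'metricbeat-' + "
    "(ctx._index.substring('metricbeat-'.length(), ctx._index.length())) + '-1'"
)


def rewrite_index(source_index: str) -> str:
    """Python equivalent of the documented Painless expression."""
    # substring(prefix.length(), index.length()) keeps everything after the prefix
    remainder = source_index[len(PREFIX):]
    # The result is the original name plus "-1", not the original name itself
    return PREFIX + remainder + SUFFIX


def build_reindex_body(remote_host: str) -> dict:
    """Request body for a scripted multi-index reindex from a remote cluster."""
    return {
        "source": {"remote": {"host": remote_host}, "index": "metricbeat-*"},
        # A dest index is still required; the script overrides it per document
        "dest": {"index": "metricbeat"},
        "script": {"lang": "painless", "inline": DOCUMENTED_SCRIPT},
    }


new_name = rewrite_index("metricbeat-2016-12-07")
body = build_reindex_body("http://otherhost:9200")  # hypothetical host
print(new_name)
```

Under that assumption each source index maps to its own destination, so the script does not assign the index to itself: it builds a new name from the old one.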

update: ctx._index = ctx._source._index didn't work. Nothing works. This is bullshit.

Yes, it doesn't work if you invent new syntax; what works is what's documented. I'll ignore the last part of your comment since I understand your frustration, but please note that being aggressive doesn't solve anything ;)

Regarding the hang, I can reproduce if the source node is not responding. I tested with the source node down and in that case the reindex blocks and there is no way to access the hanging task in the destination node.
For this reason I'll leave this issue open but I am sure that @nik9000 has a solution for this.

@celesteking
Author

.tasks / _tasks (why 2 of them?) API is unusable, reread my comment -- it just doesn't work, even for tasks that completed fine.

Regarding the main problem: yes, I tried specifying the script exactly as per the docs. It doesn't work; it's trying to log to "metricbeat", not to "metricbeat-2016-12-07". I can provide access to our cluster so that you can try it.

@clintongormley

@celesteking to echo @jimczi's comment - this isn't your first issue where you get really aggressive. Seriously, instead of telling us how shit it all is, just point out the problems you're having and we'll try to help you. The aggression just makes us want to look at the other 1,000 open issues instead of yours.

.tasks / _tasks (why 2 of them?)

.tasks is the index where the info is stored, and the docs are trying to point out that you'll need to delete data from this index at some stage in the future so that it doesn't use too much space. Nowhere does it tell you to do GET .tasks/task/ID

_tasks is the API and it should be GET _tasks/Af0W-dC3QQSlJ28uRru0fQ:8488 as you used in the first example, not GET /_tasks/taskId:8488. I can see how these docs could be confusing and will improve that.
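
The task ID the API expects is the node ID and the numeric task ID joined by a colon, both of which appear in the task response. A minimal Python sketch of building the path; the helper names are hypothetical:

```python
def task_path(node_id: str, task_number: int) -> str:
    """Build the tasks API path for one task: _tasks/<node>:<id>."""
    return f"_tasks/{node_id}:{task_number}"


def task_path_from_response(response: dict) -> str:
    """Derive the path from the "node" and "id" fields of a task response."""
    task = response["task"]
    return task_path(task["node"], task["id"])


path = task_path("Af0W-dC3QQSlJ28uRru0fQ", 8488)
print(path)
```

Using the literal string taskId in place of the node ID, as in the failing request earlier in the thread, gives an ID that matches no node.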

@celesteking
Author

According to REST guidelines, you should've used DELETE _tasks/task/$id and problem solved. This is not the first time I've seen inconsistencies in the API.

I will stop logging issues from now on (or helping in any way) and will probably switch over to another tool for log storage. This thing is not production ready. Period.

@nik9000
Member

nik9000 commented Dec 8, 2016

Regarding the hang, I can reproduce if the source node is not responding. I tested with the source node down and in that case the reindex blocks and there is no way to access the hanging task in the destination node.
For this reason I'll leave this issue open but I am sure that @nik9000 has a solution for this.

@jimczi, do you know if the reindex was running on the source node? I can imagine a situation where you start a reindex, and then shoot the node that the reindex was running on before it finishes. The task get action notices that the node is no longer running and looks in the tasks index. If it doesn't find the task then it reports that error message. And it won't find it because the reindex didn't complete before the node left. I think I need to add a better error message to that.

As to the question of multi-source, multi-destination: this comes up a fair bit but I think the script solution is fine. The reason you might not want to do this at all is that you probably want to manage the process of creating each of the sub-indexes so you have progress and an easy way to pick up where you left off and things like that. If you do want to do it you can use the script. It is tested on every build so it is going to work.

@nik9000
Member

nik9000 commented Dec 8, 2016

@jimczi and I talked - trying to reindex from remote from a node that refuses the connection indeed hangs. The reindex process is still in the tasks API and can be found with curl 'localhost:9200/_tasks?pretty&detailed&actions=*reindex'. I'll have a look at that this afternoon.
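
The same listing request can be built programmatically. A minimal Python sketch that reproduces the curl URL above; the function name is hypothetical and `localhost:9200` is taken from the curl command:

```python
from urllib.parse import urlencode


def reindex_tasks_url(host: str = "localhost:9200") -> str:
    """URL that lists running tasks whose action name ends in "reindex"."""
    params = {"pretty": "true", "detailed": "true", "actions": "*reindex"}
    # safe="*" keeps the wildcard readable instead of encoding it as %2A
    return f"http://{host}/_tasks?" + urlencode(params, safe="*")


url = reindex_tasks_url()
print(url)
```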

nik9000 added a commit to nik9000/elasticsearch that referenced this issue Dec 8, 2016
If you try to close the rest client inside one of its callbacks then
it blocks itself. The thread pool switches the status to one that
requests a shutdown and then waits for the pool to shutdown. When
another thread attempts to honor the shutdown request it waits
for all the threads in the pool to finish what they are working on.
Thus thread a is waiting on thread b while thread b is waiting
on thread a. It isn't quite that simple, but it is close.

Relates to elastic#22027
nik9000 added a commit to nik9000/elasticsearch that referenced this issue Dec 8, 2016
Improves the error message returned when looking up a task that
belongs to a node that is no longer part of the cluster. The new
error message tells the user that the node isn't part of the cluster.
This is useful because if you start a task and the node goes down
there isn't a record of the task at all. This hints to the user that
the task might have died with the node.

Relates to elastic#22027
nik9000 added a commit that referenced this issue Dec 9, 2016
If you try to close the rest client inside one of its callbacks then
it blocks itself. The thread pool switches the status to one that
requests a shutdown and then waits for the pool to shutdown. When
another thread attempts to honor the shutdown request it waits
for all the threads in the pool to finish what they are working on.
Thus thread a is waiting on thread b while thread b is waiting
on thread a. It isn't quite that simple, but it is close.

Relates to #22027
nik9000 added a commit that referenced this issue Dec 9, 2016
Improves the error message returned when looking up a task that
belongs to a node that is no longer part of the cluster. The new
error message tells the user that the node isn't part of the cluster.
This is useful because if you start a task and the node goes down
there isn't a record of the task at all. This hints to the user that
the task might have died with the node.

Relates to #22027
@nik9000
Member

nik9000 commented Dec 9, 2016

@jimczi, do you know if the reindex was running on the source node? I can imagine a situation where you start a reindex, and then shoot the node that the reindex was running on before it finishes. The task get action notices that the node is no longer running and looks in the tasks index. If it doesn't find the task then it reports that error message. And it won't find it because the reindex didn't complete before the node left. I think I need to add a better error message to that.

I merged #22062 just now to improve the error message when the node isn't part of the cluster any more.

@jimczi and I talked - trying to reindex from remote from a node that refuses the connection indeed hangs. The reindex process is still in the tasks API and can be found with curl 'localhost:9200/_tasks?pretty&detailed&actions=*reindex'. I'll have a look at that this afternoon.

I merged #22061 this morning to fix the hang.
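
The commit message referenced above describes a self-deadlock: a callback running on a pool thread asks the pool to shut down and wait, while the pool waits for that same thread to finish. The following Python sketch illustrates one general way to avoid that pattern, by handing the shutdown to a thread outside the pool. It is an illustration only and is not taken from the Elasticsearch change; `ThreadPoolExecutor` stands in for the rest client's thread pool, and all names are hypothetical.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def on_response(pool: ThreadPoolExecutor, closed: threading.Event) -> None:
    """Callback that runs on a pool thread and wants the pool closed."""

    def closer() -> None:
        # Runs outside the pool, so waiting for the workers cannot wait on itself
        pool.shutdown(wait=True)
        closed.set()

    # Hand the shutdown off and return, which lets this worker finish
    threading.Thread(target=closer, name="pool-closer").start()


pool = ThreadPoolExecutor(max_workers=1)
closed = threading.Event()

# Run the callback on the pool, as a response handler would be
pool.submit(on_response, pool, closed).result(timeout=5)

# The separate thread completes the shutdown once the worker is idle
finished = closed.wait(timeout=5)
print("pool closed:", finished)
```

The callback never waits for the pool itself, so the worker can return and the shutdown can complete.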

@nik9000 nik9000 closed this as completed Dec 9, 2016
@lcawl lcawl added the :Distributed/CRUD label and removed the :Reindex API label Feb 13, 2018