[Feature Request] Add a river to ElasticSearch instance #1077

Closed
dadoonet opened this Issue Jun 30, 2011 · 53 comments

@dadoonet
Member

dadoonet commented Jun 30, 2011

As discussed on the mailing list: http://elasticsearch-users.115913.n3.nabble.com/How-to-reindex-an-ES-index-tp3089964p3089964.html

It would be nice to be able to reindex data from an ES instance using the _source field of previously stored documents.

With it, we could:

  • Change the mapping and reindex the documents stored in oldindex into a newindex index (even within the same cluster). The new mapping would be defined on newindex.
  • Easily migrate from one ES version to another if needed
  • Do many cool things that I can't imagine right now ;-)
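
Reindexing from _source as proposed above boils down to reading each stored document's `_source` from oldindex and writing it back, unchanged, into newindex so that the new mapping applies. Below is a minimal sketch of the write half in Python. It assumes hits shaped like search/scroll results (`_id` plus `_source`) and the typeless bulk format of later releases; 2011-era versions also needed a `_type` in each action line. `to_bulk_body` is a hypothetical helper, not part of any ES client:

```python
import json


def to_bulk_body(hits, dest_index):
    """Build a newline-delimited bulk request body that re-indexes each
    hit's _source into dest_index under the same _id."""
    lines = []
    for hit in hits:
        # Action line: tells the bulk API which index and _id to write.
        lines.append(json.dumps({"index": {"_index": dest_index, "_id": hit["_id"]}}))
        # Source line: the original document, sent back unchanged.
        lines.append(json.dumps(hit["_source"]))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n" if lines else ""


hits = [
    {"_index": "oldindex", "_id": "1", "_source": {"title": "hello"}},
    {"_index": "oldindex", "_id": "2", "_source": {"title": "world"}},
]
print(to_bulk_body(hits, "newindex"), end="")
```

The resulting body would be POSTed to the `_bulk` endpoint, which expects newline-delimited JSON (`Content-Type: application/x-ndjson` in recent versions).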

Thanks

@otisg

otisg commented Jul 1, 2011

+1

@apinkin

apinkin commented Jul 9, 2011

+1

@Vineeth-Mohan

Vineeth-Mohan Aug 13, 2011

Similar feature request - elasticsearch#1242

@gsf

gsf commented Sep 23, 2011

+1

@neogenix

neogenix commented Oct 5, 2011

+1

@dadoonet

Member

dadoonet commented Oct 5, 2011

Hi Shay,

Which would you prefer for this plugin? Should it be part of the elasticsearch project or live outside it?
Thanks

@dadoonet dadoonet closed this Oct 5, 2011

@dadoonet dadoonet reopened this Oct 5, 2011

@dadoonet

Member

dadoonet commented Oct 5, 2011

Sorry, closed by mistake...

@Vineeth-Mohan

Vineeth-Mohan Oct 5, 2011

As discussed in one of the related threads, there should be some versioning system for an entire index, which I believe is not implemented in ES.

Reference - elasticsearch#1242

@Yegoroff

Yegoroff Oct 18, 2011

+1

@HugoMag

HugoMag commented Oct 21, 2011

+1

@mikegrassotti

mikegrassotti Oct 25, 2011

+1

@outoftime

outoftime Oct 26, 2011

I've got a need for this as well -- I think to keep things simple I'm going to try to implement a plugin that provides API endpoints for the following:

  • Copy one document from one index to another
  • Copy the entire contents from one index to another

Presumably the latter call, by default, should not be blocking.

As far as re-mapping, etc., I think it's reasonable enough to still deal with that separately in the client. The main motivation I've got for wanting a full-index-copy is to get rid of the HTTP overhead of transmitting documents to/from a client just to move them from one index to another.

@Vineeth-Mohan

Vineeth-Mohan Oct 27, 2011

The problem with this solution, as Shay pointed out in a thread, is this: if one of the endpoints goes down and wants to resume from where it stopped, how will it do that?
The suggested solution was to version the index, so that a client can request all changes since a given version number. For this, versioning has to be implemented on the ES side.
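
The checkpoint-and-resume idea can be illustrated with an in-memory stand-in. This sketch assumes a hypothetical per-document, monotonically increasing `seq` number in place of the index-wide versioning requested in #1242 (ES did not provide it at the time); `changes_since` and `ResumableCopier` are illustrative names. The copier records the highest `seq` it has written, so after a failure it only asks for changes since that value:

```python
def changes_since(source_docs, checkpoint):
    """Return documents whose (hypothetical) `seq` is above the checkpoint, oldest first."""
    return sorted(
        (doc for doc in source_docs if doc["seq"] > checkpoint),
        key=lambda doc: doc["seq"],
    )


class ResumableCopier:
    """Copies documents into an in-memory target and remembers the last `seq` written."""

    def __init__(self):
        self.target = {}     # stand-in for the destination index, keyed by _id
        self.checkpoint = 0  # highest seq copied successfully so far
        self.writes = 0      # number of documents written, across all runs

    def run(self, source_docs, fail_after=None):
        # `fail_after` simulates the endpoint going down after N documents.
        for n, doc in enumerate(changes_since(source_docs, self.checkpoint)):
            if fail_after is not None and n == fail_after:
                raise ConnectionError("endpoint went down")
            self.target[doc["_id"]] = doc["_source"]
            self.writes += 1
            # Advance the checkpoint only after the write has succeeded.
            self.checkpoint = doc["seq"]


docs = [{"_id": str(i), "seq": i, "_source": {"n": i}} for i in range(1, 6)]
copier = ResumableCopier()
try:
    copier.run(docs, fail_after=3)
except ConnectionError:
    pass  # first run dies after copying seq 1..3
copier.run(docs)  # second run asks only for changes since seq 3
print(copier.checkpoint, copier.writes)  # 5 5
```

An updated document would get a new, higher `seq` and be copied again on the next run; deletions would need extra tracking, such as tombstones, which this sketch ignores.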

@dadoonet

Member

dadoonet commented Oct 27, 2011

100% agree with Vineeth-Mohan. So I have to wait for issue #1242 before writing the river.
That makes sense, and it's why for now I've only started writing a Java batch job that pulls a full index's content and pushes it to another index.

I will be able to move it to a river once a "changes"-like API is in ES.

@outoftime

outoftime Oct 27, 2011

Ah -- having something basic working would be very useful for me even if it didn't have the capability to resume after a node failure, so I'll probably pursue that as a plugin.

@drawks

drawks commented Dec 27, 2011

+1. Being able to apply new mappings without having to write a client and implement "schema versioning" in one of the various language API clients would be a big operational win. Changing the type of a field on an index already on disk should be a single-call affair, handled without having to fetch the _source of every document and send it back in an index call.

@TeRq

TeRq commented Mar 30, 2012

+1

@emilis

emilis commented Apr 4, 2012

+1

@llonchj

llonchj commented Apr 12, 2012

+1

@benmccann

Contributor

benmccann commented Apr 15, 2012

+1

@darklow

darklow commented Apr 24, 2012

+1

@piskvorky

piskvorky Apr 27, 2012

+1

@niuage

niuage commented Apr 30, 2012

+1

@radostyle

radostyle May 31, 2012

+1

@occ

occ commented Jul 5, 2012

+1

@atkinson

atkinson Jul 19, 2012

+1

I'd like to push changes out via SockJS, so an evented API, or even a RESTful webhook, would be awesome.

@davekhor

davekhor Sep 28, 2012

+1

My need is simpler: I was hoping to use an ES-to-ES river as a 'tee' (for those who still remember Unix :))

@anatolyg

anatolyg Sep 28, 2012

+1

I'd like to use this as a feeder for a backup Elasticsearch instance.

@JohnnyMarnell

Contributor

JohnnyMarnell commented Oct 5, 2012

+1

This would be really useful for tweaking mappings and queries, especially in the early stages of development.

@jantoniucci

jantoniucci Oct 29, 2012

+1

@deepeye

deepeye commented Oct 30, 2012

+1

@paivaric

paivaric Nov 14, 2012

+1

@spodgurskiy

spodgurskiy Nov 26, 2012

+1

@dadoonet

Member

dadoonet commented Nov 27, 2012

@karussel just created a plugin that may help. See: https://github.com/karussell/elasticsearch-reindex

@agindre

agindre commented Dec 11, 2012

+1

@sgzijl

sgzijl commented Feb 22, 2013

+1

@justinfx

justinfx commented Mar 3, 2013

+1

@pgaertig

pgaertig Mar 11, 2013

+1. At the moment this one works for me: https://github.com/geronime/es-reindex. It's implemented in Ruby and has a nice progress status with ETA, plus Ctrl+C handling.

@clintongormley

Owner

clintongormley commented Apr 5, 2013

Closed in favour of #1242

@algesten

algesten Apr 30, 2013

I must be thick. How does a change notification feature solve the reindex-because-i-changed-my-mapping problem?

@inspire22

inspire22 Jul 25, 2013

It seems the issue referenced when closing isn't the correct one. Is there a better (faster?) solution than es-reindex for this now? It seems stupid to download and re-upload the data when it could all be copied within Elasticsearch.

@pgaertig

pgaertig Jul 25, 2013

@inspire22 @algesten I think our problem deserves a separate feature request. #514 seems to be a good candidate for a one-shot reindex. I use es-reindex mostly for cross-index migrations within one cluster. As you mentioned, it performs poorly. This is because the data is serialized to JSON, retrieved over HTTP, collected into a bulk package, then sent back over HTTP and deserialized from JSON before it is finally reindexed.

I'm considering reimplementing it in Java as a native ES client node to avoid all this serialization and the HTTP round trip. However, a "copy"/"reindex"/"move" bulk command would be appreciated. That could work even as one bulk entry per document.
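
The client-side pipeline described above (pull documents out, collect them into bulk packages, send them back) can be sketched as a batching loop. This assumes documents arrive as an iterator of hits, for example from successive scroll pages, and that bulk size is a simple document count; `batched` and `hits_stream` are hypothetical helpers (Python 3.12 ships `itertools.batched`, which yields tuples rather than lists):

```python
from itertools import islice


def batched(iterable, size):
    """Yield lists of at most `size` items; each list becomes one bulk request."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(iterable)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def hits_stream(count):
    """Stand-in for documents streamed out of the source index."""
    for i in range(count):
        yield {"_id": str(i), "_source": {"n": i}}


for batch in batched(hits_stream(7), 3):
    # In a real copy, each batch would be serialized and sent to the bulk API.
    print([hit["_id"] for hit in batch])
```

Batching cuts the number of requests, but every document still travels to the client and back over HTTP, which is exactly the overhead a server-side copy/reindex command would remove.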

@Tobion

Tobion commented Oct 4, 2013

This is needed. The referenced issue has nothing to do with it, so this one should be reopened.

@bcackerman

bcackerman Dec 8, 2014

+1

@gouketsu

gouketsu Jan 12, 2015

+1

@dibeesh

dibeesh commented Mar 5, 2015

+1

@mehmetozerr

mehmetozerr Jun 9, 2015

+1

@dadoonet

Member

dadoonet commented Jun 9, 2015

For people who are interested in reindexing Elasticsearch: some time ago I wrote a blog post on how to use Logstash (LS) for this purpose. I hope it helps people who land on this issue: http://david.pilato.fr/blog/2015/05/20/reindex-elasticsearch-with-logstash/

@lukas-lansky

lukas-lansky Sep 17, 2015

+1

@eamocanu

eamocanu Nov 26, 2015

+1

@lizhenmxcz

lizhenmxcz Mar 7, 2016

+1

@carltierney

carltierney Mar 7, 2016

+1. Seriously, after the mapping disaster of the 1.x to 2.1 upgrade, this is why we need this feature. I have 200GB of indexes that I now have to write code to migrate to new mappings, because I can't upgrade from 1.7 to 2.1.

@brusic

Contributor

brusic commented Mar 7, 2016

This issue has been closed for almost 3 years. If you need to reindex, there is a forthcoming API: #15201
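
The API tracked in #15201 later shipped as the `_reindex` endpoint (Elasticsearch 2.3), which performs the copy inside the cluster. The sketch below only builds such a request with Python's standard library and sends nothing. It assumes a node at `http://localhost:9200` and reuses the oldindex/newindex names from the original request; `reindex_request` is a hypothetical helper:

```python
import json
import urllib.request


def reindex_request(base_url, source_index, dest_index):
    """Build (but do not send) a POST to the _reindex endpoint that copies
    every document of source_index into dest_index on the server side."""
    body = {
        "source": {"index": source_index},
        "dest": {"index": dest_index},
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/_reindex",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = reindex_request("http://localhost:9200", "oldindex", "newindex")
print(req.get_method(), req.full_url)
```

`_reindex` does not copy mappings or settings, so newindex would normally be created with its new mapping first. Sending the request is then `urllib.request.urlopen(req)`.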
