Upgrade fields with dot character to 2.0.0 #15122
I have a small number of fields with a "." character in my Elasticsearch 1.5 stack, which receives data from Logstash 1.4.2. I take snapshots to S3 daily.
The problem is that I can't start a 2.0.0 Elasticsearch and use the snapshot to restore, because it complains about the "." character. I checked the fields with a "." character using curl -XGET 'http://localhost:9200/_all/_mapping'. I know about the Logstash de_dot filter, but it does not help me, as it cannot go back and fix existing data.
How do I restore my snapshot? If the option is to delete the offending data then I'm ok with it. Could anyone let me know how?
Dots are no longer allowed because they introduce an ambiguity in field lookup that is impossible to work around. So your options are to reindex the data under renamed, 2.0-compatible field names, or to delete the offending data.
I went for the second option: I deleted the data using delete-by-query and took another snapshot. But the _restore still fails. I checked the _mapping, and the mapping still exists for the offending fields. Is that what is breaking the restore now?
How do I get around this?
That's not an option for me. I keep 45-60 days of logs pushed in by Logstash for display in Kibana, and these field names appear in all of the logstash-yyyy.mm.dd indices.
Is there a script that renames the existing fields and reindexes anywhere that I can use? I'm stuck.
The Perl and Python clients provide helpers for reindexing data, e.g. see:
Thanks Clinton. The Python library is quite easy to install. I can't say the same about the Perl library, which fails to install using cpan; it complains about Hijk 0.20. Anyway, I have worked out a way, and here are my steps:
A. Open all indexes
B. Identify my dodgy fields with a "." character
C. Search (count) my records with dodgy field
D. Once we know how many records will be affected and we are OK with purging them, delete those records. I'm OK with purging, but others may not be; I don't know how to rename the fields instead (maybe you could help us with that).
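The count (step C) and the delete (step D) can share the same request body: an exists filter on the dodgy field. A sketch of how I build it, using a hypothetical dotted field name; in 1.x this body works for both POST /index/_count and DELETE /index/_query (delete-by-query):

```python
def dotted_field_query(field):
    """Request body that matches documents carrying the given field.
    Usable both for counting (POST /{index}/_count) and for the 1.x
    delete-by-query API (DELETE /{index}/_query)."""
    return {
        "query": {
            "filtered": {
                "filter": {"exists": {"field": field}}
            }
        }
    }
```

For example, dotted_field_query("host.name") gives the JSON body to POST with curl for the count, and the same body again for the delete.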
E. Now the records are deleted, but the mapping still exists and the snapshot restore will still fail, so we need to reindex. Install the Python library:
I have written a script that reindexes the old index to a new name. It works on the 1.5.x version. Older versions of Elasticsearch may have a problem if indices are closed, as _cat did not list closed indices; for older versions, simply open all indices using curl first. The first argument is the Elasticsearch host and the second is the name of the index we want to reindex. The target name is the index name + 'a'. The script closes the old index after the reindex.
This will result in logstash-2015.12.02 being closed and a new index created called logstash-2015.12.02a
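The core of such a script can be sketched as a transformation from scroll hits to bulk actions. This is only a sketch under my assumptions (elasticsearch-py 1.x, where you would feed helpers.scan output into this generator and pass the result to helpers.bulk); the function name is mine:

```python
def bulk_actions(hits, target_index):
    """Turn scan/scroll hits from the source index into bulk-index
    actions aimed at the target index (source index name + 'a'),
    preserving each document's type and ID."""
    for hit in hits:
        yield {
            "_op_type": "index",       # plain index op: create-or-overwrite
            "_index": target_index,
            "_type": hit["_type"],
            "_id": hit["_id"],
            "_source": hit["_source"],
        }
```

Wired up, that is roughly: bulk(client, bulk_actions(scan(client, index=src), src + "a")), followed by a close of the old index.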
F. Call the above script for each and every index that contains a dodgy field. Once the reindex is complete, list the indices:
It will show both the old and new indices (as of 1.5.0). For older Elasticsearch versions, all indices will have to be opened first, with a wildcard in a curl command.
G. Once satisfied, delete the old indices (the ones not ending in 'a').
H. Once the new indices are created and the old ones deleted, take a snapshot using the snapshot API. Take it over to the new 2.0.0 cluster and it should restore, as it no longer has fields with a "." character.
Hope this helps someone. Ideally, a migration script should have been provided. Reindexing is a slow process: I have about 45 days' worth of logs (about 1 GB each), so it will take a couple of overnight jobs.
For new events, the de_dot filter in Logstash does the job fine, renaming dodgy fields to underscores on the fly.
For renaming existing data: I don't know how to do it. My process is based on deleting the offending records.
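For anyone who would rather rename than delete, one possible approach (a sketch I have not run against a real cluster) is to rewrite each document's _source during the reindex step, applying the same dot-to-underscore renaming the de_dot filter applies to new events; the helper name is mine:

```python
def dedot_keys(doc, replacement="_"):
    """Recursively rewrite dotted keys in a document's _source,
    e.g. {"host.name": "web1"} -> {"host_name": "web1"}, so the
    document can be reindexed under 2.0-compatible field names."""
    if isinstance(doc, dict):
        return {k.replace(".", replacement): dedot_keys(v, replacement)
                for k, v in doc.items()}
    if isinstance(doc, list):
        return [dedot_keys(v, replacement) for v in doc]
    return doc
```

In the reindex loop you would set "_source": dedot_keys(hit["_source"]) instead of copying it verbatim; note this can collide with an existing underscored field of the same name, so check the mappings first.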