
[Bug] ElasticSearch Backup #15

Closed
toxisch opened this issue Nov 25, 2019 · 14 comments
toxisch commented Nov 25, 2019

On large DBs the backup does not work, where "large" is relative: sometimes a little data plus system data is enough to make the backup fail.

I also experimented with the sources and limited the elasticdump call to single indices. In that case the backup works fine, so I assume the failure is related to the size of the backup.

It is also not a timeout issue: I ran my tests with a 10-hour timeout.

@toxisch toxisch changed the title ElasticSearch Backup [Bug] ElasticSearch Backup Nov 25, 2019
@JamesClonk (Member)

@toxisch In what way exactly does the backup fail? Do you have any error messages in the app log? Does the app run out of memory, perhaps? Does it work if you run elasticdump yourself manually against your DB?
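For reference, a manual run could be sketched like this (hypothetical host, credentials, and index name; substitute your own service binding values, and elasticdump must be installed, e.g. via npm):

```sh
# Dump a single index to a local gzip file, roughly mirroring what
# backman does internally ("--output=$" writes the dump to stdout).
elasticdump --quiet \
  --input=https://user:password@my-es-host.example.com:443/my-index \
  --output=$ | gzip > my-index.gz
```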


toxisch commented Dec 20, 2019

Hi @JamesClonk, sorry for the long response time. There is no difference between automatic and manual backup, and there is no memory problem either. Here is an Elasticsearch backup log; it ends with the S3 upload being canceled. But S3 itself works fine: Mongo and Maria backups work on this system without problems.

```
Dec 19, 2019 @ 16:15:09.812 level=error msg="could not upload service backup [analytics-els] to S3: Put https://xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx: context canceled"
Dec 19, 2019 @ 16:15:09.778 level=error
Dec 19, 2019 @ 16:15:09.778 level=error msg="requested backup for service [analytics-els] failed: elasticdump: signal: killed"
Dec 19, 2019 @ 16:15:09.778 level=error msg="could not backup service [analytics-els]: elasticdump: signal: killed"
Dec 19, 2019 @ 15:27:48.878 level=debug msg="upload S3 object [elasticsearch/analytics-els/analytics-els_20191219152748.gz]"
Dec 19, 2019 @ 15:27:48.877 level=debug msg="executing elasticsearch backup command: elasticdump --quiet --input=https://xxxxxxxxxxxxxxxxxxxxxxxxxx --output=$"
```


pvolkemer commented Mar 5, 2020

Hi guys, I'm having the same issue trying to back up my elastic instances with backman to an S3 storage.
The log says this:

```
2020-02-11T13:54:41.31+0100 [APP/PROC/WEB/0] OUT level=debug msg="executing elasticsearch backup command: elasticdump --quiet --input=https://full-access-<Username>:<Password>@<ElasticID>.<mydomain> --output=$"
2020-02-11T13:54:41.31+0100 [APP/PROC/WEB/0] OUT level=debug msg="upload S3 object [elasticsearch/elasticsearch-dev/elasticsearch-dev_20200211135441.gz]"
```

But the upload to S3 never completes, nor is anything stored there at all.


eatsan commented May 4, 2020

Hi everyone,

I am having a similar issue with the Elasticsearch backups in backman. In my case, I found out that there are (by default) seven system indices named ".monitoring-es-7-YYYY.MM.DD" that are used by the ES "Stack monitoring" feature (https://www.elastic.co/guide/en/kibana/current/xpack-monitoring.html), which is part of X-Pack. Each of these seven indices was relatively large in my case (about 1M docs and 1.1 GB each), and elasticdump was taking quite some time to read and gzip them. Meanwhile, CPU usage was around 4% and memory usage was relatively low (~100 MB). So I am planning to test the --concurrency and --limit options of the elasticdump executable to see how much benefit they bring.

Are you also using an ES with the Stack monitoring feature? Could this issue be affecting you as well?
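A sketch of such a test run, with a hypothetical endpoint and tuning values (--limit controls the batch size per request, --concurrency the number of requests in flight; both are elasticdump options, but useful values depend on the cluster):

```sh
# Dump one of the monitoring indices with larger batches and
# several concurrent requests, then gzip the result locally.
elasticdump --quiet \
  --limit=10000 \
  --concurrency=4 \
  --input=https://user:password@my-es-host.example.com:443/.monitoring-es-7-2020.05.04 \
  --output=$ | gzip > monitoring-es-7.gz
```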

@pvolkemer

To me it seems elasticdump is never even started, or it ends immediately. From my two log lines above you can see that "executing elasticsearch backup command:" and "upload S3 object" happen at the exact same time.
@JamesClonk, is there anything we/you can do about this?
Also, my S3 instance gets the service name dynstrg-2, so I need to configure this in my backman config.
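If I read backman's config layout correctly (a sketch, not verified against the docs), the S3 service binding can be selected via a service_label entry in the JSON config:

```json
{
  "s3": {
    "service_label": "dynstrg-2"
  }
}
```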


toxisch commented Jul 15, 2020

Hi @pvolkemer , for me the problem was solved by using the latest backman version.

@pvolkemer

@toxisch Your problem was different from mine. In my case, elasticdump doesn't seem to do anything, so there is nothing that could be uploaded to S3.
In your case, elasticdump seemed to create some file but the upload to S3 failed.
I tried with 1.15.0 but it still doesn't do anything.

@somehowchris

I do not really need backman to back up an ES instance, since our parser Vector (logstash is a waste for parsing logs, in my opinion) can also push logs to other destinations.

Now, because some colleagues chose an oversized instance, I was forced to downgrade, which would mean 1. backing up, 2. recreating the service, and 3. replaying the backup. But it seems that service never got backed up. There seems to be no config for the ES instance, yet backman should give it a default cron schedule.

Did that ever work? @JamesClonk

akovov commented Apr 19, 2021

I am constantly getting this error:

```
level=error msg="could not backup service [*********]: elasticdump: timeout: context deadline exceeded"
```

I have currently changed the timeout to 7 days. But could backman be improved to give the ability to choose between a complete backup and only the indices matching some pattern? It should not be too hard, as the underlying elasticdump supports this.
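For the timeout part, backman allows per-service settings in its JSON config. A sketch with assumed values (the service name and schedule are placeholders, and the exact keys should be checked against the backman README):

```json
{
  "services": {
    "analytics-els": {
      "schedule": "0 0 2 * * *",
      "timeout": "168h"
    }
  }
}
```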

akovov commented Apr 22, 2021

It failed before the 7-day timeout was reached. @JamesClonk, any suggestions on when this could be fixed?

@JamesClonk (Member)

I've created a new release, https://github.com/swisscom/backman/releases/tag/v1.28.0, which adds a direct_s3 configuration option for Elasticsearch backups. This makes elasticdump stream directly from/to S3 itself instead of going through backman internally. Maybe this helps solve the problem; you could try enabling it in your configs.

Unfortunately, I do not use Elasticsearch myself, and there are currently no integration tests for it in the CI workflow either. I can't test or support it if it does not work.
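Assuming the option is set per service like other Elasticsearch settings (a sketch; the exact nesting should be checked against the v1.28.0 release notes and README), enabling it could look like:

```json
{
  "services": {
    "analytics-els": {
      "direct_s3": true
    }
  }
}
```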

akovov commented May 2, 2021

I checked, and for me it worked as expected. I haven't checked with big volumes yet, but I plan to do so at the beginning of June.

akovov commented Nov 2, 2021

It has been working well for the last half year.

@JamesClonk (Member)

thanks 👍️

michaelbeutler pushed a commit to michaelbeutler/backman that referenced this issue Mar 28, 2024