documentation for backup and restore of Vault #5683
Comments
+1 from me. It would be helpful if there were some recommendations, success stories, etc. around it. |
CoreOS has a short doc on backing up Vault to an AWS S3 bucket: https://coreos.com/tectonic/docs/latest/vault-operator/user/recovery.html |
@antcs I think even the CoreOS authors may have gotten it wrong. As stated in #7191, even if you can make an atomic snapshot of the backend, Vault itself doesn't apply its changes to the backend atomically. This means there is no way to guarantee that your backup is in a consistent (and therefore usable) state if Vault is running. Currently, the only way to get a consistent snapshot of Vault's data is to stop Vault, back up the backend, and start Vault again. |
Backups aside, if Vault does not make transactional writes with any backend, and also does not know how to recover from an atomic point-in-time storage-level snapshot of these potentially logically incomplete writes (by applying redo/undo logs or such from the storage), does this not also mean that Vault cannot reliably recover from an abrupt instance failure in between two writes? Please tell me that is not the case ... @siepkes |
@thiloplanz I'm quite sure that's the case (in the worst case). That's also why I freaked out reading the original response on the Vault mailing list. |
@thiloplanz Yeah, that thought occurred to me too. I'm no expert on Vault's low-level storage, so what follows is mostly deduction and assumption, and I could be wrong. On the mailing list Chris Hoffman (HashiCorp employee and Vault committer) stated:
A quick glance at, for example, the PostgreSQL storage implementation shows that it exposes a fairly low-level, generic interface to the rest of the application. The rest of the application uses this interface to (sometimes) perform compound actions, for example calling the update function twice to perform what is functionally a single operation. This contrasts with a storage API that exposes high-level operations and wraps the two updates in a single transaction, or one that exposes a transaction API in the storage abstraction itself so the caller can indicate what is a compound operation. So backend data can get corrupted during an abrupt failure such as an application panic. The only thing that could save you from a really bad day is if Vault is smart enough to recover (i.e., start normally with minimal data loss) from an inconsistent (i.e., corrupt) data backend. I can't really find anything in the source that points to such capabilities (again, I could be wrong). If Vault could do this, ending up with an inconsistent backup wouldn't be a problem, since Vault would still be able to recover from it, and the backup advice would simply be: "back up the backend with the tools provided by the backend". But that's not the case. |
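To make the concern concrete, here is a minimal Python sketch of the failure mode described above. The `KVStore` class, the `create_token` operation and the `token/...` / `policy-index/...` key names are hypothetical and do not reflect Vault's real storage layout; the only point is that one logical operation split into two independent `put` calls can be interrupted halfway, leaving storage in a state no complete operation would produce.

```python
class CrashError(Exception):
    """Simulates an abrupt failure (e.g. kill -9 or power loss) between writes."""


class KVStore:
    """Hypothetical backend exposing only low-level get/put, with no transactions."""

    def __init__(self):
        self.data = {}

    def put(self, key, value):
        self.data[key] = value

    def get(self, key):
        return self.data.get(key)


def create_token(store, token_id, policy, crash_between_writes=False):
    # One logical operation implemented as two independent writes.
    store.put(f"token/{token_id}", {"policy": policy})
    if crash_between_writes:
        raise CrashError("process died between the two writes")
    store.put(f"policy-index/{policy}/{token_id}", True)


store = KVStore()
try:
    create_token(store, "t1", "admin", crash_between_writes=True)
except CrashError:
    pass

# The token record exists, but the index entry that should accompany it does not.
print(store.get("token/t1"))               # {'policy': 'admin'}
print(store.get("policy-index/admin/t1"))  # None
```

A storage-level snapshot taken at that moment would capture the same half-finished state, which is why the question of whether Vault can recover from it matters for backups too.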
Meaning that regardless of choice of storage backend, a sudden power outage at an unfortunate point in time can leave Vault in an undefined state. @chrishoffman Is this assessment correct? |
@chrishoffman I don't want to be pushy or sound alarmist (I realize you don't owe me anything) but I'm somewhat unsettled by the fact that currently I don't really see how one can make a proper backup of Vault (ie. a consistent dump while Vault is running). Automated shutdown and start of Vault seems kind of a risky operation to perform daily for backups. Could you give some feedback on this? Would love to hear it if I'm talking nonsense ;-). |
Did anyone find a working solution for creating backups? I feel very uncomfortable without backups on production 🙂 |
@pznamensky Sure -- take atomic snapshots at the storage level. Vault doesn't write everything transactionally because we can't rely on having that capability in storage, but instead we write the code such that a failure in the middle of a request can be tolerated. We do this in various ways, via how we order writes, using WALs, etc. We can always improve this, but the idea that Vault will be in some unworking undefined state if improperly shut down isn't the case, and thus atomic storage snapshots are also fine. |
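The general technique mentioned here (record intent in a write-ahead log before a multi-step change, so an interrupted change can be completed on restart) can be sketched as follows. This is a generic illustration with assumed names (`KVStore`, `apply_change`, `recover`, the `wal/` prefix), not Vault's actual WAL implementation. Because the WAL record lives in the same storage as the data, an atomic storage snapshot taken mid-request also captures it, so running recovery after a restore can finish the change.

```python
class CrashError(Exception):
    """Simulates an abrupt failure in the middle of a request."""


class KVStore:
    """Hypothetical non-transactional backend: get/put/delete/list only."""

    def __init__(self):
        self.data = {}

    def put(self, key, value):
        self.data[key] = value

    def get(self, key):
        return self.data.get(key)

    def delete(self, key):
        self.data.pop(key, None)

    def list(self, prefix):
        # Return a new list so callers may delete keys while iterating.
        return [k for k in self.data if k.startswith(prefix)]


WAL_PREFIX = "wal/"


def apply_change(store, entry, crash_midway=False):
    # Idempotent: re-running it after a crash yields the same final state.
    store.put(f"token/{entry['token_id']}", {"policy": entry["policy"]})
    if crash_midway:
        raise CrashError("process died between the two writes")
    store.put(f"policy-index/{entry['policy']}/{entry['token_id']}", True)


def create_token(store, token_id, policy, crash_midway=False):
    entry = {"token_id": token_id, "policy": policy}
    wal_key = WAL_PREFIX + token_id
    store.put(wal_key, entry)                 # 1. record intent first
    apply_change(store, entry, crash_midway)  # 2. perform the multi-step change
    store.delete(wal_key)                     # 3. clear intent only when complete


def recover(store):
    # On startup (or after restoring a snapshot), finish any interrupted change.
    for wal_key in store.list(WAL_PREFIX):
        apply_change(store, store.get(wal_key))
        store.delete(wal_key)


store = KVStore()
try:
    create_token(store, "t1", "admin", crash_midway=True)
except CrashError:
    pass

snapshot = dict(store.data)  # point-in-time copy of the half-finished state
restored = KVStore()
restored.data = snapshot
recover(restored)
print(restored.get("policy-index/admin/t1"), restored.list(WAL_PREFIX))  # True []
```

The ordering is what makes this work: the WAL record is written before the change and deleted only after it, so any crash point leaves either no trace or a record that recovery can replay safely.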
@jefferai Thanks for your answer! So the definitive answer is that making an atomic snapshot of the backend is enough and Vault will work with that? I'm double-checking because what your HashiCorp co-conspirator 😉 @chrishoffman says on the mailing list seems to contradict what you're saying (emphasis mine):
|
I'm going to preface this post with an idea that But to me the current backup situation seems extremely worrying, to the point that I'm afraid to run Vault in production environments. Replies above stated that Vault's behavior when recovering from a hard crash (kill -9/power issues) is undefined even if the storage backend can provide consistency guarantees (such as PostgreSQL or another DBMS). |
I absolutely second everything @mouzfun said, including the preface. Given the discrepancy between the comment above by @jefferai and the replies on Google Groups, I think it would be best if this were clearly documented in the official docs, so we have a definitive answer... nudge, nudge, pretty please HashiCorp? :-) |
Please add a backup/restore guide. I got here after searching the documentation and not finding a way to back up. It would be great if such a procedure were documented and battle-ready. Thank you! |
Is it possible to get either a statement from HashiCorp that the open source version of HashiCorp Vault cannot be backed up, or official documentation on how to back up its data safely? I think this is a showstopper for a lot of individuals and companies. Thank you in advance for your kind help! |
I worked out how to reinitialize raft directly, by using recovery mode - notes here. As far as I can see, just starting in recovery mode by itself is not enough. You have to generate a recovery token (which requires providing the unseal shares via the API), at which point raft sorts itself out. You don't actually need to use the recovery token itself. |
Apologies for a possibly not directly related question, but I was not able to find an answer either in the Vault docs or via a Google search.
I believe that's impossible, by design. The contents of the vault are encrypted with the unseal key. If you don't have the same unseal key available that it was originally encrypted with, then the data is unusable. |
to anyone using raft Integrated Storage, completely deleted vault cluster, with all pvc's and volumes, |
Can we please have |
@ArieLevs
After about 30 secs I got:
The snapshot file size is over 1GB; maybe this could be a problem? I already tried to raise the timeout values by setting environment variables like Does anyone have an idea what this message means? |
For those who are interested: Found out that the timeout comes from a default value which can be changed in the |
Hi folks! There is a document on learn about backup and restore for Vault: https://learn.hashicorp.com/tutorials/vault/sop-backup?in=vault/standard-procedures Is the ask here that there be a link to the tutorial on the docs site? Note that we are currently investigating ways of making the documentation easier to find and more streamlined. If there's something else that is needed, please feel free to let me know! |
Hi @hsimon-hashicorp the setting I mentioned above is due to the large file size of our Thanks for your help! |
There are Learn articles already outlining how backups & restores can be achieved using snapshots (the same as most apps), including what was mentioned earlier: Simply put - if you have the correct (HCL) file with matching seal & snapshot - then it's a simple In the case of other storage backends (Consul, etc.) a similar restore would be applicable, with the need for a matching Vault config. There are also Automated Integrated Storage Snapshots and other features in the Enterprise versions (DR / PR) which can achieve greater HA via a promotion model, without needing to resort to snapshot / restoration. In any case I believe this request should be closed. @weakcamel since you were the original requester, can you kindly confirm whether the available options now address what you were originally after? PS - you also have standard orchestration / VM-level options for backup / restoration, which can also serve as a last resort, the same as for other service deployments, where a point-in-time restoration of a Vault cluster can very well boot up fine, assuming the state was in good shape when it was captured. The same applies to snapshots too: you don't know if it's a valid or usable backup unless you confirm it (SRE 101). |
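One cheap first step toward confirming a backup is recording a SHA-256 checksum next to each snapshot file (for example one produced by `vault operator raft snapshot save`) and verifying it before a restore. The sketch below assumes a sidecar file named `<snapshot>.sha256` in `sha256sum` format; that naming convention is an assumption of this example, not something Vault produces. It hashes in chunks so multi-gigabyte snapshots don't have to fit in memory.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so multi-GB snapshots need little memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def sidecar_for(snapshot):
    """Assumed convention: the checksum lives in '<snapshot>.sha256'."""
    snapshot = Path(snapshot)
    return snapshot.with_name(snapshot.name + ".sha256")


def record_checksum(snapshot):
    """Write '<hex digest>  <file name>' (sha256sum-style) next to the snapshot."""
    snapshot = Path(snapshot)
    sidecar = sidecar_for(snapshot)
    sidecar.write_text(f"{sha256_of(snapshot)}  {snapshot.name}\n")
    return sidecar


def verify_checksum(snapshot):
    """Return True if the snapshot still matches its recorded checksum."""
    expected = sidecar_for(snapshot).read_text().split()[0]
    return sha256_of(snapshot) == expected


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        snap = Path(tmp) / "backup.snap"  # stand-in for a real snapshot file
        snap.write_bytes(b"example snapshot bytes")
        record_checksum(snap)
        print(verify_checksum(snap))      # True
        snap.write_bytes(b"corrupted bytes")
        print(verify_checksum(snap))      # False
```

A matching checksum only proves the file hasn't changed since it was recorded. It does not prove Vault can start from it with your seal configuration, so a periodic test restore is still the real confirmation.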
@aphorise Thank you for checking!
The procedures point to the same link for me, I'm guessing the first one was meant to be https://learn.hashicorp.com/tutorials/vault/sop-backup?in=vault/standard-procedures instead? Just double checking.
Not completely, no. Two open issues and one suggestion:
(1) No general SOP
The procedures above describe backups only for the Consul or Raft backend. I appreciate those are the recommended ones; however, "recommended" isn't the same as "the only supported". At the moment, 23 storage backends are listed as available in the Vault documentation, and so having a detailed SOP for just 2 of them but no suggestion at all about the remaining ones is IMO insufficient. After checking several of them, I didn't find any note stating that Vault doesn't support backup or restore on any of them. While the exact details of how to operate each backend may be out of scope for a Vault Backup/Restore SOP, it would be very useful to have general steps outlined one way or the other. For example:
Which leads to the second issue below...
(2) Hot vs cold backups
This discussion on the Vault mailing list created a big controversy which hasn't been addressed so far. Does a Vault instance or cluster indeed need to be stopped to achieve a reliable backup, or is that not necessary?
(3) Cross-references - being able to find the Backup/Restore docs
The tutorials for backup/restore linked above aren't easy to find. If you go to https://hashicorp.com, navigate to Vault and select Documentation, you won't be able to reach them. The search box on this page doesn't return any hits. Even if you choose Tutorials (which isn't an obvious step) and go through the procedures available in the left-side menu, there's nothing there pointing to the Backup & Restore SOPs, which IMO are essential for any admin of any type of application. The only way I was able to find them was to put "backup" in the search box, and the results are somewhere further down the list. The first hit is the Backup Consul Data and State doc, which isn't quite the same thing. It would be great if those documentation sources - especially https://www.vaultproject.io/docs - cross-referenced the backup and restore documentation so that it's easy to navigate to. |
Some really great points @weakcamel.
I believe the bigger issue is coming up with a proper one-size-fits-all set of SOPs; there will likely be more SOPs in that Learn guide. But I think only Raft or Consul are officially supported, as they deliver higher HA unlike the others, the rest all being community-driven. I believe, if presented, there would be reception to any additional material (docs, or possibly even Learn guides) that could be listed for all the others, if there are any suggestions.
In my opinion - no. Application-level snapshots (cold) are often taken at run time, and generally, as in the case of Consul, the lower-level file system would require released locks to be able to do anything with them. If I'm not mistaken, the snapshot approach in both Raft & Consul performs some sort of sanitised bundling that includes a sha256sum as well. To confirm reliability, a snapshot could be tested for proper boot / startup, similar to the testing common to most critical saves (DBs, etc.). BTW - in the case of using the file system (which is probably for testing / demo purposes only, not production), backing up would require stopping the Vault service before copying.
Totally agree 😃 - care to draft a mock or screen capture (dropped in here) with some arrows or a depiction of where you're looking? Maybe we can make the case to get it included; I'm always for better clarity and more reinforced linking. |
IMO lack of crucial documentation is always worse than an oversized manual.
The issue is, documentation doesn't phrase it like that at all. If you go to https://www.vaultproject.io/docs/configuration/storage, it reads:
In some backends you'll see the note about support for HA, in some you won't. And Consul + Raft aren't the only ones: It's simply a bit vague. If something's not supported (e.g. because it wasn't written or adopted by HashiCorp) that's a shame but fair enough - as users, we just need a clear statement. "Only these 2/3 filesystems have official support for backup and restore" is very different from "Some backends are more robust than others".
That's a very useful piece of information and IMO it would be great to have it in the official docs.
That's definitely the case for in-memory backend, however filesystem has the following description:
As an officially supported backend - if only for simplest cases - it deserves a clear backup & restore documentation.
Ideally an "Admin guide", "Operator procedures" or similar section could be added to include backup, configuration, and upgrade guides (maybe even installation?). A separate top-level "Backup and restore" page would be fine too. |
P.S. And to address the sentiment of others who have commented on this issue, see just the top results from a quick search:
The number of third-party methods to back up Vault is a clear indication that there's demand for a clearer, portable and more open data export/import method. |
@weakcamel - let's say it were to go into its own new section as you're proposing: the content should really be a rehash of the SOPs, excluding any of these 3rd-party tools. Don't get me wrong, these tools may be great, and you can always write your own, and you will always need to in the case of all other storage types (excluding Consul or Raft). Maybe I'm understating it, but in essence I see this as a typical save / restore operation + making sure you have the right config + unseal mechanism and/or recovery keys (typical for Vault). If you have something a bit more refined or specific in mind and feel you can already draft a PR, then please do so. |
@aphorise Sorry if it wasn't clear - the last post (open source backup/restore methods) was only a side note. That said, from a user or admin perspective a simple export/import would have been ideal. If Vault provided that out of the box, we wouldn't have to worry about specific instructions for specific storage backend and all the corner cases. |
@weakcamel - While I can appreciate the convenience or ease that you are after, there are, I believe, intentional design decisions and security factors behind why these things are separate that you may be overlooking. Simply put - the whole design is to ensure that if storage is compromised (i.e. stolen or copied) it will be of little use without access to the seal; access to the seal without the configuration, the same; access to the seal with the configuration but without recovery keys, the same, etc. What's more, storage types are not likely to be interchangeable any time soon, and that may never be available, for the same reasons that you can't interchange a PNG with a JPG without some potential for loss or other conversion issues. However, there is already the migration approach in place, where you can technically go from any backend to another. The request in this issue from the onset was for backup/recovery literature, which is now in place by way of the aforementioned Learn guide / SOPs that were not previously there: In the interest of closing this request before its 4th anniversary - my understanding at this stage is that what's pending is an explicit section elaborating on backup / restore within the general Documentation area (aka CLI docs). Technically, what was requested from the onset has already been delivered, just not in the expected format and documentation areas. If that's correct, then I can try to do a draft in the coming days / weeks, or, as I mentioned earlier, if you have impressions or opinions on how it could look, please do share a textual template (by way of a PR). The Vault teams are typically very receptive to substantiated contributions, and I'm sure that if it's submitted well, with clear benefits for all, it will be accepted. |
Agreed 100%. The import/export is a whole different debate and I appreciate that something that sounds simple may in reality be far from it.
Mostly yes, except for backup & restore for the filesystem backend. Since it's also a backend officially supported by Hashicorp (even if for simplest deployments), it IMO also should have a backup/restore SOP - or a clear statement saying that backup/restore is not available on this backend. For the existing SOPs, cross-referencing them in https://www.vaultproject.io/docs and adding to a doc index on https://learn.hashicorp.com/vault would do the trick. |
Well technically snapshot save / restore is only for Integrated Storage / Raft right? - with the only other supported store by them being Consul. So actually even though they may provide input on the use of Filesystem - it's certainly not supported by them - especially as it's not intended for use in any production setting or environment and typically for nothing more than demonstration or other such exceptions (development). If I open an enterprise support ticket with them they will say the same. If you also refer to the very same Filesystem that you mentioned - it's not a HA backend - vs others that are for example and already you can envisage two major sets both with further sub-category and considerations and so a complete guide would really be impossible especially for some of those community stores that the Vault engineers have very little opinion on. Rather if the community had an opinion for example on how Filesystem should be backed up / restored then we may be able to get it included but Hashi is not likely to do that as it falls outside their development or support focus. Anyway thanks for confirming:
I hope to share something in the near future. PS - I forgot to mention there's also a preference from HashiCorp to opt for Integrated Storage - for example, if you look at the Consul reference architecture, it also mentions using Raft / Integrated Storage if you don't have any strong reasons to use Consul 😄 |
I'm just going to just leave this link here, let the official documentation speak for itself: |
Well, durable is a bit of a subjective term there, right? If it's durable for you, then why not, but are file-based web services durable like that in a single instance? 😃 (how many do you know?) Also, support is an even looser term there... yes, code changes or improvements to Filesystem they support! - but not going to production or expecting their support with millions of identities / secrets in any High Availability sense for a large enterprise setting. For some exceptional / single use, sure, why not use Filesystem, but just because it says it's supported there doesn't make it co-equal to all the others they support too. BTW the same definitions apply to the In-Memory Storage Backend, which they support too, but does that mean they'd support you going to production with it? Maybe, but again I'd bet it would be the exception. What's more, you'll only find two reference architectures, which is not possible with the other storage backends that are missing features / guarantees, as I mentioned earlier. |
@aphorise Both "durable" (able to perform for a long time without loss of data / quality) and "supported" (something the producer considers valid and well within customer rights to use) have a pretty clear meaning to me. If you find them unclear, I guess you could ask the authors of the documentation?
There's another sentence in the source you're pointing to which clearly states:
Did you read the source you're referring to? I'm afraid I don't follow anymore, this is turning into a debate over semantics. Any chance you could explain your role in this project and whether you're a Hashicorp employee or not? You keep stating facts as if you were, but at the same time keep referring to Hashicorp as "them". |
Semantics is what we are after here? 😃 or am I misunderstanding? - so it's better to be clear & I'm trying to get how this request will come to a close. I'm a consumer of Vault the same as yourself I guess? I believe what you're asking for is a recovery / restoration reference to how backup & restores ought to be done right? - and I'm trying to help if you see it fit. The statement |
No, I'm after a missing piece of documentation for a supported backend. That's what I've been asking here 4 years ago and still am. |
@aphorise I have been following this issue and the endless back and forth for 4 years too. Can you please stop trying to derail the main ask of this issue and responding condescendingly?
At this point I too am curious about your role at HashiCorp. We users need to know whether this response is coming from within the company or from an external contributor, because that would help us understand the role HashiCorp is playing in deliberately not allowing users to use a supported but 3rd-party storage provider in production. |
Asking for clarity and offering help is not derailing - neither is it condescending to share opinions as I have been doing so far. The original ask of this issue was rather generic and simply states:
This would obviously depend on the store type used, of which there are numerous (20+?). My participation here, or lack thereof, won't materialize what's being sought after if we don't discuss some of the items that have been done so far with the screenshots, which are a step closer, right? @weakcamel if you are saying:
If this is with reference to Filesystem, then I personally am not sure what that guide would say? Stop Vault, copy files, then start Vault again? Other stores are similar, where you copy the contents of your stores. I'm not contesting that a guide should be there, but it would help to draft something around what's expected, especially if there are so many interests and people in the community commonly using Filesystem. Maybe the title of the issue could also be adjusted to read the same |
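For the filesystem backend, the "stop Vault, copy files, start Vault again" procedure could be sketched roughly as below. It assumes you know the directory configured as the storage `path`, and that you supply an `is_running` check suited to how Vault is supervised on your host; both are placeholders here, and actually stopping and starting Vault is deliberately left to your own tooling. As noted earlier in the thread, the archive is of little use without the matching seal configuration and unseal/recovery keys, which must be preserved separately.

```python
import shutil
import tempfile
import time
from pathlib import Path


def cold_backup(storage_dir, backup_dir, is_running):
    """Archive a *stopped* Vault's file storage directory as a .tar.gz file."""
    if is_running():
        raise RuntimeError("stop Vault before copying its storage directory")
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    base = backup_dir / time.strftime("vault-storage-%Y%m%dT%H%M%S")
    # make_archive appends ".tar.gz" to base and returns the archive's path.
    return Path(shutil.make_archive(str(base), "gztar", root_dir=str(storage_dir)))


def cold_restore(archive, storage_dir, is_running):
    """Unpack a backup into an empty storage directory while Vault is stopped."""
    if is_running():
        raise RuntimeError("stop Vault before restoring its storage directory")
    storage_dir = Path(storage_dir)
    if storage_dir.exists() and any(storage_dir.iterdir()):
        raise RuntimeError(f"{storage_dir} is not empty; refusing to overwrite it")
    shutil.unpack_archive(str(archive), str(storage_dir))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        data = Path(tmp) / "vault-data"  # stand-in for the configured storage path
        data.mkdir()
        (data / "example").write_text("data")
        not_running = lambda: False      # placeholder for a real liveness check
        archive = cold_backup(data, Path(tmp) / "backups", not_running)
        cold_restore(archive, Path(tmp) / "restored", not_running)
        print((Path(tmp) / "restored" / "example").read_text())  # data
```

Refusing to run while Vault is up, and refusing to restore over a non-empty directory, are design choices meant to avoid exactly the half-written states discussed earlier in this issue.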
Hi, folks. I'm going to lock commenting down for now while we in the Community team look into this. Thanks for your patience. |
Is your feature request related to a problem? Please describe.
A very common task for any sysadmin is to automatically back up the data of all applications. The same applies to Vault, obviously (and since it's a secret management application, it's one of the critical assets). Unfortunately, the only documentation for Vault's maintenance I was able to find was https://www.vaultproject.io/docs/install/index.html - the installation guide.
Backup and restore docs are IMO an essential part of the documentation.
Describe the solution you'd like
Ideally, I'd like to see an Administration (or Maintenance) section on https://www.vaultproject.io/docs/install/index.html which would include a manual on how to (a) install, (b) back up, and (c) restore data from a backup. It should also mention which files/directories and other data should be preserved to be able to successfully re-install Vault while preserving the data.
For example of such documentation, see https://docs.gitlab.com/omnibus/README.html
or https://www.jfrog.com/confluence/display/RTF/Managing+Backups
Describe alternatives you've considered
I've read through the docs, searched and decided to use mailing list: https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!msg/vault-tool/GDhj-KVqtHk/87iY0QwbDAAJ
It did the trick - I got very helpful answers, which I believe belong in the actual product documentation.
Explain any additional use-cases
I hope this issue is self-explanatory. Feel free to tell me to clarify if it's not.
Additional context
n/a