Orchestration-only server #1628

Closed
dhopfm opened this issue Dec 28, 2021 · 2 comments

dhopfm commented Dec 28, 2021

This is a collection of ideas that emerged from a recent discussion on Slack on how the existing Kopia Repository Server could be enhanced/changed. These ideas go well beyond a couple of added knobs here and there and instead propose a new architecture.

Status quo

The existing Kopia Repository Server performs encryption on behalf of clients and thus has access to the unencrypted data. Furthermore, it holds the policy which clients apply (the policy is stored in the repository).

Proposal

Keep server-managed policies (the orchestration piece) but leverage client-side encryption (optionally using a private encryption key). Furthermore, facilitate uploading to a storage backend different from the orchestration instance to improve scalability. Use orchestration to glue these components together in a secure and user-friendly fashion.

Motivation

Privacy

The existing architecture (with server-side encryption) is suitable for corporate environments (where the company generally owns client data and there's little expectation of privacy). It's less ideal in environments where private data is involved, as is the case with typical friends & family setups. In such cases, it's more appropriate for every user/client to use their own keys.

Policy management/orchestration

The existing central policy management is a great achievement for setups where clients are remotely managed. Some ideas for expansion:

  • Allow the orchestration piece to exist independent of the data storage location (see below).
  • Allow for easier provisioning. Users don't necessarily need to deal with user/password combinations; a one-time registration token could be a convenient alternative.
  • Include additional config items such as the upgrade policy and storage backend configuration.
  • Allow remote configuration of encryption keys (corporate scenario) or local-only definition (private scenario). Consider approaches such as HashiCorp Vault (which is a solution for secure credential management).
  • Extend policy management to the storage backend where ACLs can be set automatically to provide secure, pre-configured access. See below for details.
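
To make the one-time registration token idea more concrete, here is a minimal sketch of how an orchestrator might issue and redeem such tokens. Everything in it is hypothetical (the `RegistrationTokens` class, its method names, the in-memory store, the one-hour default lifetime); nothing like this exists in Kopia today, and a real implementation would need persistent storage and a secure channel for handing the token to the user.

```python
import hashlib
import secrets
import time


class RegistrationTokens:
    """Hypothetical in-memory store of one-time registration tokens."""

    def __init__(self, ttl_seconds=3600):
        self.ttl_seconds = ttl_seconds
        # Maps SHA-256(token) -> (client name, expiry time).
        # Only the digest is kept, so a leaked store does not reveal usable tokens.
        self._pending = {}

    def issue(self, client_name, now=None):
        """Create a token for client_name; the admin hands it to the user."""
        now = time.time() if now is None else now
        token = secrets.token_urlsafe(32)
        digest = hashlib.sha256(token.encode()).hexdigest()
        self._pending[digest] = (client_name, now + self.ttl_seconds)
        return token

    def redeem(self, token, now=None):
        """Return the client name once; None if unknown, already used, or expired."""
        now = time.time() if now is None else now
        digest = hashlib.sha256(token.encode()).hexdigest()
        entry = self._pending.pop(digest, None)  # pop makes the token single-use
        if entry is None:
            return None
        client_name, expires_at = entry
        return client_name if now <= expires_at else None


store = RegistrationTokens(ttl_seconds=60)
token = store.issue("alice-laptop")
print(store.redeem(token))  # first use succeeds
print(store.redeem(token))  # second use is rejected
```

On a successful redemption the orchestrator would then hand the client its policy and storage configuration, so the user never has to type a username/password pair.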

Data storage

As of today, the storage backend is coupled to policy management. Decoupling the two could provide several benefits:

  • The orchestrator would handle orchestration only and would not become a bottleneck through which all clients have to funnel their data.
  • Client-side encryption not only improves privacy but also leverages existing computational power (hashing, encryption) instead of letting one central server perform these tasks.
  • Making the storage backend configurable would remove the burden of implementing the whole storage backend (again) when solutions such as MinIO already provide a sophisticated backend.
  • A configurable storage backend would also allow steering clients to different backends, depending on policy.
  • Allowing clients to communicate with the storage backend directly would remove bottlenecks/SPOFs and generally increase available bandwidth.
  • By leveraging feature-rich backend ACLs (such as S3 ACLs), clients can be provided with direct storage access while still making sure that repositories can be shared between multiple users (deduplication) without risking that one client can corrupt the data of a different client. The idea here is that the orchestrator automatically configures both the client and the storage backend, including setting appropriate ACLs.
  • With client-side encryption, deduplication hinges on all clients applying the exact same splitting, hashing, encryption (convergent encryption). This should be much easier to achieve if all clients are managed centrally (incl. minimum software version, etc.). Even in cases where this isn't possible, an admin might be happy to trade a small amount of lost deduplication (between unrelated users) for greatly improved privacy (client-side encryption).
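
The dependency of deduplication on identical client parameters can be illustrated with a small sketch. It covers only the splitting and hashing half of the picture (how a block gets its identity), not encryption, and it rests on simplifying assumptions: fixed-size chunks instead of a content-defined splitter, HMAC-SHA256 as the keyed hash, and made-up secrets and file contents. The function names are illustrative and are not Kopia APIs.

```python
import hashlib
import hmac


def split_fixed(data, chunk_size):
    """Split data into fixed-size chunks (a stand-in for a real splitter)."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def block_ids(data, chunk_size, hash_secret):
    """Return the set of block IDs a client would upload for this data."""
    return {
        hmac.new(hash_secret, chunk, hashlib.sha256).hexdigest()
        for chunk in split_fixed(data, chunk_size)
    }


shared_secret = b"repository-wide-secret"  # hypothetical, pushed by the orchestrator
file_a = b"A" * 4096 + b"B" * 4096
file_b = b"A" * 4096 + b"C" * 4096

# Same splitter and same hashing secret: the common chunk gets the same ID.
alice = block_ids(file_a, 4096, shared_secret)
bob = block_ids(file_b, 4096, shared_secret)
print(len(alice & bob))

# A private hashing secret means no block IDs in common, so no deduplication.
carol = block_ids(file_b, 4096, b"carol-private-secret")
print(len(alice & carol))
```

Alice and Bob share one block ID because they agree on every parameter; Carol, using her own secret, shares none. A mismatch in chunk size has the same effect, which is why centrally managed parameters matter for cross-client deduplication.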

Looking forward to your feedback/ideas!

@jkowalski
Contributor

Thanks for the write-up. This looks very interesting. I think there are big challenges to solve with key management here, but it certainly could be done.

kyzyl commented Feb 22, 2023

I'm curious if there are any more recent thoughts on this topic? Kopia is wonderful software, and the current kopia server implementation is very useful in a corporate / SMB type scenario, as clients don't need to be trusted with storage credentials. However, as OP points out, the current arrangement results in the kopia server node having root access to all snapshots from all clients. In my mind, this has two major implications:

  1. As outlined above, the privacy implications. Even if corporate scenarios don't always have an expectation of privacy for the clients, the clients' data could all be encrypted with keys managed by the org. This way the privacy situation is identical, if not less susceptible to snooping, but the data is still secured at all times once it leaves the clients.
  2. The kopia server nodes become a high-value target. Compromising that node leads to effectively MITMing all clients in the org in plaintext.

Beyond key management issues, there are also obviously implications for cross-client deduplication. Short of some very fancy encryption schemes, if the management node can't read all the clients' data, it can't deduplicate (although I am naive on this topic...). But perhaps that is a choice that some might be okay with, in exchange for knowing that the fallout from a compromised system is limited?

In my case I mitigate this by hosting the gRPC server (--no-ui) separately from the web server, IP-whitelisting the latter, and putting them both behind a hardened proxy like nginx. This way the management nodes are not exposed directly, and the UI is not exposed publicly at all. But this is hardly a perfect solution.

@github-actions github-actions bot added the stale label Jun 4, 2023
@github-actions github-actions bot closed this as not planned Jun 18, 2023