Explore deployment service and management API and configuration store options #13

Closed
2 tasks
christoferlof opened this issue Mar 9, 2021 · 2 comments


christoferlof commented Mar 9, 2021

This issue explores how to implement three interconnected components:

  • Deployment Service - reliable and auditable deployments of workspaces and services (aka resources)
  • Configuration Store - metadata and desired configuration for resources
  • Management API - API surface for apps and scripts to invoke the deployment service and read (and update) the configuration store.
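As a starting point for discussion, the relationship between the three components could be sketched roughly as below. Every class, method, and field name here (`ResourceRecord`, `ConfigurationStore`, `DeploymentService.deploy`, `ManagementApi`) is a hypothetical placeholder and not an agreed design; the store is a plain in-memory dictionary standing in for whatever backing store we choose.

```python
from dataclasses import dataclass
from typing import Dict, List, Protocol


@dataclass(frozen=True)
class ResourceRecord:
    """Metadata for one resource (hypothetical shape)."""
    resource_id: str
    workspace_id: str
    kind: str  # "workspace" or "service" in this sketch


class ConfigurationStore:
    """In-memory stand-in for the configuration store."""

    def __init__(self) -> None:
        self._records: Dict[str, ResourceRecord] = {}

    def upsert(self, record: ResourceRecord) -> None:
        self._records[record.resource_id] = record

    def list_by_workspace(self, workspace_id: str) -> List[ResourceRecord]:
        return [r for r in self._records.values() if r.workspace_id == workspace_id]


class DeploymentService(Protocol):
    """Anything that can start a deployment run and return a run id."""

    def deploy(self, record: ResourceRecord, initiated_by: str) -> str: ...


class ManagementApi:
    """API surface: writes desired config, invokes deployments, serves reads."""

    def __init__(self, store: ConfigurationStore, deployer: DeploymentService) -> None:
        self._store = store
        self._deployer = deployer

    def create_service(self, workspace_id: str, resource_id: str, initiated_by: str) -> str:
        record = ResourceRecord(resource_id, workspace_id, "service")
        self._store.upsert(record)  # record the desired configuration first
        return self._deployer.deploy(record, initiated_by)

    def list_services(self, workspace_id: str) -> List[str]:
        records = self._store.list_by_workspace(workspace_id)
        return [r.resource_id for r in records if r.kind == "service"]
```

The point of the sketch is only the direction of the dependencies: callers talk to the Management API, which owns reads and writes against the store and hands the actual work to the deployment service.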


High-level requirements

  • Repeatable and reliable deployment runs - Ability to re-run and retry in cases of intermittent errors etc.
  • Auditable deployment runs - We need to know who initiated a deployment run and when.
  • Declarative resource definitions
  • Open for extensibility - Allow others to extend with additional service types etc.
  • Management API needs to retrieve configuration data to serve requests from callers - e.g. list the services in workspace 'a'
  • Handle operations across several control planes, such as the identity provider, resource providers, and the resource data plane.
  • Clean up and removal - e.g. delete a workspace
  • Deployment services are managed by an Azure operations generalist, who monitors health and troubleshoots deployments.
  • Also see Design Management API surface #12
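To make the first three requirements above more concrete, here is a minimal sketch of a declarative resource definition plus an audit record for a deployment run that retries on intermittent errors. All names (`ResourceDefinition`, `DeploymentRun`, `TransientError`, `run_deployment`) are assumptions made for illustration; the `apply` callable stands in for whatever actually performs the deployment (Terraform or otherwise).

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, List


@dataclass(frozen=True)
class ResourceDefinition:
    """Declarative description of a resource (hypothetical shape)."""
    resource_id: str
    resource_type: str  # e.g. "workspace" or "service"
    properties: dict = field(default_factory=dict)


@dataclass
class DeploymentRun:
    """Audit record: who initiated the run, when, and what happened."""
    definition: ResourceDefinition
    initiated_by: str
    started_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    attempts: int = 0
    status: str = "pending"
    log: List[str] = field(default_factory=list)


class TransientError(Exception):
    """Stands in for an intermittent failure that is worth retrying."""


def run_deployment(
    run: DeploymentRun,
    apply: Callable[[ResourceDefinition], None],
    max_attempts: int = 3,
) -> DeploymentRun:
    """Apply the definition, retrying transient failures up to max_attempts."""
    for attempt in range(1, max_attempts + 1):
        run.attempts = attempt
        try:
            apply(run.definition)
        except TransientError as exc:
            run.log.append(f"attempt {attempt} failed: {exc}")
            continue
        run.status = "succeeded"
        run.log.append(f"attempt {attempt} succeeded")
        return run
    run.status = "failed"
    return run
```

Because the definition is immutable and the run record carries its own log, re-running the same definition is safe to reason about, provided `apply` itself is idempotent, which this sketch assumes rather than enforces.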

Edge cases we need to cater for

  • Simultaneous updates of a shared resource - Example: Two workspace deployment runs are trying to update the same shared Firewall with their required rules
  • Hide the eventual consistency of the system from users - show the actual resource state and not the desired state. Example: a resource that is being updated or deleted.
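One possible way to handle the shared Firewall case is optimistic concurrency: each run reads the current rules together with a version, and a write only succeeds if the version is unchanged. The sketch below illustrates that idea with an in-memory store; `SharedResourceStore`, `VersionConflict`, and `add_rules` are hypothetical names, and a real implementation would rely on whatever conditional-write mechanism the chosen backing store offers. It covers only the first edge case, not the actual-versus-desired state question.

```python
import threading
from typing import FrozenSet, Iterable, Tuple


class VersionConflict(Exception):
    """Raised when a write is based on a stale version of the resource."""


class SharedResourceStore:
    """In-memory stand-in for a store with compare-and-swap semantics."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._rules: FrozenSet[str] = frozenset()
        self._version = 0

    def read(self) -> Tuple[FrozenSet[str], int]:
        with self._lock:
            return self._rules, self._version

    def write(self, rules: Iterable[str], expected_version: int) -> int:
        with self._lock:
            if expected_version != self._version:
                raise VersionConflict(
                    f"expected version {expected_version}, found {self._version}"
                )
            self._rules = frozenset(rules)
            self._version += 1
            return self._version


def add_rules(store: SharedResourceStore, new_rules: Iterable[str], max_retries: int = 5) -> None:
    """Merge new_rules into the shared rule set, re-reading on conflict."""
    wanted = frozenset(new_rules)
    for _ in range(max_retries):
        current, version = store.read()
        try:
            store.write(current | wanted, version)
            return
        except VersionConflict:
            continue  # another run won the race; merge against the fresh state
    raise RuntimeError("gave up after repeated version conflicts")
```

With this approach neither run overwrites the other's rules: the loser of the race re-reads and merges. The alternative would be to serialize all updates to shared resources through a single queue, which is simpler to reason about but limits concurrency.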

Implementation

  • Review past implementations
  • Create spike

Ideas

Given the above requirements, Terraform and GitHub Actions would probably take us a long way. Let's discuss.

@christoferlof christoferlof added this to the March 2021.1 milestone Mar 9, 2021
@christoferlof christoferlof added the design Figure out how we should do something label Mar 9, 2021

jjcollinge commented Mar 10, 2021

A few other requirements:

  • Must support a mechanism to expose the deployment outputs back to the management API, or at least support a way for the management API to read them from another source.
  • Must support a mechanism to handle secrets securely i.e. return secret references rather than the secret in plaintext.
  • Must provide a feedback mechanism for seeing the progress and status of async operations.
  • Must not drop operations - all operations should finalize (Succeed or Fail).
  • Must try to finalize to a state that is consistent with the API's view of the world.
  • Must support operations (create, update, delete) on resources concurrently.
  • Must be resilient to component failures (including API throttling).
  • Must support referencing dependent resource operation output data.
  • Must scale to support many services within a workspace.

I'll probably add a few more as I think about them.
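A rough sketch of how a few of these could fit together: an operation record with an explicit status, a runner that always finalizes to Succeeded or Failed, exponential backoff when throttled, and outputs that carry a secret reference instead of the secret value. The names `Operation`, `OperationState`, `Throttled`, and `execute`, and the `secretref://` URI scheme, are all invented for this example.

```python
import enum
import time
from dataclasses import dataclass, field
from typing import Callable, Dict


class OperationState(enum.Enum):
    ACCEPTED = "Accepted"
    RUNNING = "Running"
    SUCCEEDED = "Succeeded"
    FAILED = "Failed"


class Throttled(Exception):
    """Stand-in for an API throttling response."""


@dataclass
class Operation:
    """Status record that the management API could expose to callers."""
    operation_id: str
    state: OperationState = OperationState.ACCEPTED
    outputs: Dict[str, str] = field(default_factory=dict)
    error: str = ""


def execute(
    op: Operation,
    action: Callable[[], Dict[str, str]],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Operation:
    """Run action, backing off on throttling; always end in a final state."""
    op.state = OperationState.RUNNING
    try:
        for attempt in range(max_attempts):
            try:
                op.outputs = action()
                op.state = OperationState.SUCCEEDED
                return op
            except Throttled:
                if attempt < max_attempts - 1:
                    sleep(base_delay * 2 ** attempt)  # exponential backoff
        op.error = "throttled on every attempt"
    except Exception as exc:  # any other failure still finalizes the operation
        op.error = str(exc)
    op.state = OperationState.FAILED
    return op
```

An action would return something like `{"db_password": "secretref://vault/db-password"}`, so the management API can hand out the reference and leave resolving it to a caller with access to the secret store. This sketch does not survive a crash of the runner itself; making sure no operation is dropped in that case would need the operation record to be persisted before the work starts.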

TessFerrandez commented

The decision was made that, at a high level, we will re-use much of the design from an internal engagement by the Clarke team, but remove the dependencies on CAF and move to Python where possible.
