This report is effectively a request to revive #654, which would pave the way for users to provide their own --shared cache implementations. Right now there is a skeleton framework for HTTP posting, under -fcloud, but it is largely unused. I am essentially proposing changing that to an opt-in implementation, provided via an API or potentially even in a third-party package.
The rationale for closing the original PR was that anything beyond an HTTP implementation for remote storage is probably unnecessary, but I am not sure that is true. For example, I may simply want to store all of the objects for my build system inside memcached, which has a very simple wire protocol to implement (sketched below). Requiring an HTTP proxy in front of it seems unnecessary for many use cases, which alone makes this a very reasonable request, IMO! Moreover, these systems provide their own mechanisms for authentication and authorization, which means Shake would not have to handle that either.
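To make that concrete, here is a rough sketch of how little code get/set against memcached's text protocol needs, assuming an already-connected socket and the network and bytestring packages. memcachedPut and memcachedGet are names I made up for illustration; nothing here exists in Shake today.

{-# LANGUAGE OverloadedStrings #-}
import qualified Data.ByteString.Char8 as BS
import Network.Socket (Socket)
import Network.Socket.ByteString (sendAll, recv)

-- Store one build object: "set <key> <flags> <exptime> <bytes>\r\n<data>\r\n"
memcachedPut :: Socket -> BS.ByteString -> BS.ByteString -> IO ()
memcachedPut sock key val = do
    sendAll sock $ BS.concat
        ["set ", key, " 0 0 ", BS.pack (show (BS.length val)), "\r\n", val, "\r\n"]
    _ <- recv sock 4096 -- expect "STORED\r\n"
    return ()

-- Fetch one build object; for brevity, assume the whole reply arrives in one recv.
memcachedGet :: Socket -> BS.ByteString -> IO (Maybe BS.ByteString)
memcachedGet sock key = do
    sendAll sock $ BS.concat ["get ", key, "\r\n"]
    reply <- recv sock 1048576
    let (header, rest) = BS.breakSubstring "\r\n" reply
    case BS.words header of
        -- reply header is "VALUE <key> <flags> <bytes>", then the data block
        ["VALUE", _, _, len] | Just (n, _) <- BS.readInt len
            -> return $ Just $ BS.take n $ BS.drop 2 rest
        _   -> return Nothing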
Along the same lines, I do not even know what "an HTTP implementation" means! There are many possible authentication and storage schemes for objects over HTTP, so it is important to pin down exactly what is on offer. For instance, the Shake implementation as it stands does not support HTTPS, which would be a non-starter on its own. But will it also support AWSv4 request signing if I want to use S3? And GCS, and so on?
As I understand it, the essential storage model underneath is a key-value map, and yes, that is easy to expose over HTTP, but the hard parts are everything around it.
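To put that in code: the entire contract a storage backend would have to satisfy could be as small as the record below. The SharedCache name and its fields are hypothetical, purely to illustrate the shape; the point is that all the hard parts live behind it.

import qualified Data.ByteString as BS

-- Hypothetical interface: the key identifies a rule's inputs, the value is
-- the serialised outputs. Authentication, transport, retries and so on all
-- live behind these two functions, where the user can deal with them.
data SharedCache = SharedCache
    { cacheLoad  :: BS.ByteString -> IO (Maybe BS.ByteString)
    , cacheStore :: BS.ByteString -> BS.ByteString -> IO ()
    }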
For example, at $WORK we use SSO to manage authentication across all systems and developer tooling in the fleet; there are no unauthenticated access points. How will Shake handle this use case? We might want to cache the object files from developer builds to a storage endpoint such as S3, and download them again later. With the current Shake implementation this can probably never work, or it would always require S3 in some particular configuration. But if I'm writing a build system at work, I can take shortcuts: for example, integrating the build system with a third-party service that can issue temporary scoped tokens. That simply isn't possible as things stand.
Here is a high-level sketch of what an API might look like. It should always be a user-specified set of endpoints, registered as URI scheme handlers. For instance:
import qualified Development.Shake.Cloud as Cloud
-- this version, using Nothing, always includes a `file://` URI, so you can test shared builds locally.
myShakeOptions = shakeOptions
{ shakeShared = Nothing
}
-- this version specifies the default `file://` URI explicitly
-- use with: shake --shared=file://$(pwd)/build.shared
myShakeOptions = shakeOptions
{ shakeShared = Just
[ ("file", (\fsdir -> Cloud.localFilesystemCache fsdir))
]
}
-- this version specifies the default `file://` URI explicitly, and also an HTTPS URI that uses HTTP basic auth
-- use with: shake --shared=https://secret.example-corp.com/shake-cache/
-- maybe expose 'user' and 'pass' as environment variables!
myShakeOptions = shakeOptions
{ shakeShared = Just
[ ("file", (\fsdir -> Cloud.localFilesystemCache fsdir))
, ("https", (\url -> Cloud.httpBasicAuthCache (Secure 443) user pass url))
]
}
-- an s3 endpoint example
-- use with: shake --shared=s3://shake-objects-bucket?region=us-east-1&endpoint=s3.us-east-1.wasabisys.com
myShakeOptions = shakeOptions
{ shakeShared = Just
[ ("s3", (\s3dir -> Cloud.s3Cache Authorized s3dir))
]
}

So any build system can simply register a handler function for its own <scheme>:// URIs and receive the passed parameters, then decide what to do with them. Several basic implementations (filesystem, S3, HTTP basic auth) could be included with Shake itself for ease of use.
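In terms of types, shakeShared would then be roughly a scheme-to-handler association list, and Shake's only job is to dispatch on the scheme of the --shared argument. A hypothetical sketch, reusing the SharedCache record from above:

-- Hypothetical: a handler takes the remainder of the URI (path, query
-- parameters) and builds a cache implementation from it.
type SchemeHandler = String -> IO SharedCache

-- Shake's side of the bargain: split the --shared argument on "://" and
-- hand the remainder to whichever handler the user registered.
resolveShared :: [(String, SchemeHandler)] -> String -> IO SharedCache
resolveShared handlers uri = case break (== ':') uri of
    (scheme, ':':'/':'/':rest)
        | Just handler <- lookup scheme handlers -> handler rest
    _ -> fail $ "shake --shared: no handler registered for " ++ uri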
I think this basic API, while not necessarily set in stone, should be pretty good. There are other improvements you could make; the most notable would be to switch to a bracket-inspired API that provides an explicit context for the shared cache implementation to use. That would allow keeping one connection open to service many cache requests as they come in.
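Sketching that variant with the same hypothetical names: the handler acquires its resources once (a TLS connection, or a temporary scoped token from a token service), serves every cache request through them, and releases them when the build finishes. The Connection type and its operations below are placeholders for whatever a real backend needs.

import Control.Exception (bracket)

-- Placeholder resource: a real backend might hold a TLS connection or a
-- scoped credential here.
data Connection = Connection

openConnection :: String -> IO Connection
openConnection _url = return Connection

closeConnection :: Connection -> IO ()
closeConnection _conn = return ()

cacheFromConnection :: Connection -> SharedCache
cacheFromConnection _conn = SharedCache (\_ -> return Nothing) (\_ _ -> return ())

-- Bracket-style handler: acquire once, reuse for every cache request, and
-- release at the end of the build, even on exceptions.
withHttpsCache :: String -> (SharedCache -> IO a) -> IO a
withHttpsCache url act =
    bracket (openConnection url) closeConnection (act . cacheFromConnection)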
However, I think this basic feature is essential in the long term, and even if the current HTTP implementation plans come to fruition, you will probably end up coming back to this sooner or later anyway!