Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Gzip package documentation when sending to Pursuit #642

Closed
thomashoneyman opened this issue Jul 31, 2023 · 12 comments · Fixed by #659
Closed

Gzip package documentation when sending to Pursuit #642

thomashoneyman opened this issue Jul 31, 2023 · 12 comments · Fixed by #659
Labels
alpha bug Something isn't working good first issue Good for newcomers

Comments

@thomashoneyman
Copy link
Member

The registry publishes packages to Pursuit by sending the JSON output of the compiler's purs publish command in the body of a POST request:

-- | Handle Pursuit by executing HTTP requests using the provided auth token.
handleAff :: forall r a. GitHubToken -> Pursuit a -> Run (LOG + AFF + r) a
handleAff (GitHubToken token) = case _ of
Publish payload reply -> do
Log.debug "Pushing to Pursuit..."
result <- Run.liftAff $ withRetryRequest'
{ content: Just $ RequestBody.json payload
, headers:
[ RequestHeader.Accept MediaType.applicationJSON
, RequestHeader.RequestHeader "Authorization" ("token " <> token)
]
, method: Left Method.POST
, username: Nothing
, withCredentials: false
, password: Nothing
, responseFormat: ResponseFormat.string
, timeout: Nothing
, url: "https://pursuit.purescript.org/packages"
}

We're consistently seeing 502 error codes from Pursuit for some packages on upload. These are really 413 Content Too Large errors (they're getting garbled on their way through nginx). Pursuit won't accept package uploads larger than 1.5mb. See, for example:

However, I've noticed that some packages fail when being published via the registry, but succeed when published via Pulp. The difference: Pulp sends a gzipped response to Pursuit, and we don't.

First Pulp gzips the JSON from purs publish

https://github.com/purescript-contrib/pulp/blob/78f7aa6ae76337ea6f8362cac03fb1c8ee1858cd/src/Pulp/Publish.purs#L68

https://github.com/purescript-contrib/pulp/blob/78f7aa6ae76337ea6f8362cac03fb1c8ee1858cd/src/Pulp/Publish.purs#L170-L171

Then it passes that to the publish docs function:

https://github.com/purescript-contrib/pulp/blob/78f7aa6ae76337ea6f8362cac03fb1c8ee1858cd/src/Pulp/Publish.purs#L85

https://github.com/purescript-contrib/pulp/blob/78f7aa6ae76337ea6f8362cac03fb1c8ee1858cd/src/Pulp/Publish.purs#L326-L327

In contrast, we're just sending the JSON directly. We should gzip the JSON as well.

Implementation

We have a few options for the implementation. To actually gzip the data we could:

  1. Adapt the Pulp implementation
    This probably isn't our best bet because Pulp relies on a number of NPM libraries we don't want to bring in (concat-stream, etc.). I'd like to avoid pulling new JS dependencies unless absolutely necessary. But we can at least take a look at what they're doing.

  2. Use purescript-node-zlib by @JordanMartinez for the createGzip function, or write our own pure JS gzip function using the underlying Node code.
    This is a recent library Jordan published which gives us access to the zlib Node module (same as is used by Pulp) and has functions like createGzip already available. We could potentially write some glue around this to gzip the input JSON. Or we could write a thin FFI around createGzip that just gzips an input string.

  3. Use the gzip CLI
    We have access to the gzip CLI via Nix; this is already used, for example, in the implementation of the Registry.App.CLI.Tar module. We could potentially introduce a Registry.App.CLI.Gzip module that exposes a gzipCLI function and a gzip :: String -> Aff String Buffer function that uses gzip with the "-" argument to read/write over stdin/stdout.

Once we have the gzipped data we can send it with an appropriate Content-Encoding header in the Affjax request. (Presumably we'd send it as a RequestBody.string or RequestBody.blob — not sure which we should use.)

@thomashoneyman thomashoneyman added bug Something isn't working good first issue Good for newcomers alpha labels Jul 31, 2023
@thomashoneyman
Copy link
Member Author

@f-f feel free to chime in if you have preferences around how this gets implemented

@f-f
Copy link
Member

f-f commented Jul 31, 2023

Any of these sound great! I'm not too concerned about this since it's hopefully a quite temporary solution until we integrate Pursuit

@CharlesTaylor7
Copy link
Contributor

I volunteer to work on this!
Looks like there's enough context in the issue body to get started, but I'll ask questions if I get stuck.

@thomashoneyman
Copy link
Member Author

thomashoneyman commented Aug 23, 2023

Thanks! Hopefully we can write a thin FFI implementation via native Node functions and just use that — you could write the bindings in the foreign workspace and the relevant tests there as well — to avoid bringing in more dependencies.

I think you'll want to wait on #643 before actually using gzip in the main application. But we could get the binding there in the meantime.

@CharlesTaylor7
Copy link
Contributor

@thomashoneyman
Thanks for the context.
In that I case I'll just start on #643 instead.

@thomashoneyman
Copy link
Member Author

This is now unblocked!

@CharlesTaylor7
Copy link
Contributor

@thomashoneyman
Hey, I initially volunteered on the assumption I'd be plugging in Jordan's existing gzip library. If this involves writing my own lighter weight gzip function + testing it, that's a bit outside my comfort zone.

I'm going to withdraw and let someone else tackle this.

@flip111
Copy link
Contributor

flip111 commented Sep 2, 2023

Hi @CharlesTaylor7 the library seemed to have moved perhaps you can use this function https://pursuit.purescript.org/packages/purescript-node-zlib/0.4.0/docs/Node.Zlib#v:createGzip

@pete-murphy
Copy link
Contributor

Hopefully we can write a thin FFI implementation via native Node functions and just use that — you could write the bindings in the foreign workspace and the relevant tests there as well — to avoid bringing in more dependencies.

I'm going to withdraw and let someone else tackle this.

Hey, I could pick this up if it's still relevant!

I'm just familiarizing myself with the node:zlib library, but it seems like we could just FFI zlib.gzip instead of zlib.createGzip which might simplify things a little bit. If I'm understanding correctly, I think createGzip would be useful if we didn't have the payload in memory or wanted to send a streaming request body over the network, but those don't seem to be requirements here. zlib.gzip on the other hand seems like it would be for just gzip-ing an in-memory string into a buffer asynchronously (so gzip :: String -> Aff Buffer could be the signature of our FFI wrapper). Let me know if I'm mistaken!

@thomashoneyman
Copy link
Member Author

That sounds about right to me! I have little experience with it, but from some light reading I think we can just gzip a string to a buffer and send that along.

@thomashoneyman
Copy link
Member Author

I would be happy to test some payloads sent to Pursuit as part of reviewing your work

@pete-murphy
Copy link
Contributor

Opened a PR here #659. It's made a bit messy by the fact that Buffer can't be passed directly as fetch body, open to suggestions about a better approach there 🙂

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
alpha bug Something isn't working good first issue Good for newcomers
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants