
[BUG] cache and registry in combination makes fetch unstable #23

Closed
pgsandstrom opened this issue Feb 17, 2020 · 12 comments · May be fixed by nghia23d/tattooz#1

@pgsandstrom pgsandstrom commented Feb 17, 2020

What / Why

Whenever I have a cache and registry specified in the conf object, some fetches fail. It's not 100%, but somewhere around 50% of the fetches fail. It only happens if I have both a cache and a registry. I have removed all other configuration options.

How

My conf object:
{cache: '/tmp/new-cache', registry: 'my-company-internal-registry'}

My usage of npm-registry-fetch:
const json = (await npmRegistryFetch.json(dependencyName, conf))

Current Behavior

Sometimes I receive this response:
{"code":"FETCH_ERROR","errno":"FETCH_ERROR","type":"invalid-json"}

It never happens unless both registry and cache have been specified. In my testing I have done thousands of fetches, and the failure reproduces very reliably.

I'd be happy to supply more information if needed, or perform testing if it can help.


@isaacs isaacs commented Feb 18, 2020

That's really weird that specifying both would cause trouble, because the registry gets a default value anyway.

What's my-company-internal-registry? Maybe it's doing something unique that's messing with the cache in a way that the public registry doesn't?


@pgsandstrom pgsandstrom commented Feb 18, 2020

It's a Nexus repository (https://www.sonatype.com/product-nexus-repository). Its configuration seems to be quite standard. This is at a largish company and no one has detected anything weird with it. But since you mentioned the default registry, I did more testing.

This does produce errors:
conf = { cache: '/tmp/extra-new-cache', registry: 'my-company-internal-registry' }

But this does not produce errors:
conf = { cache: '/tmp/extra-new-cache', registry: 'https://registry.npmjs.org' }

So the registry is the source of the issue. But it still seems really strange that it bugs out so completely in this scenario, but works well otherwise. I mean, my npm cache seems to be working perfectly when I use the npm client in my day-to-day work. But I don't know enough about the npm cache to say in what way the registry could be buggy. Is there a sensible way to debug this?


@isaacs isaacs commented Mar 2, 2020

Can you please update your test to show the raw output?

Ie, instead of this:

const json = (await npmRegistryFetch.json(dependencyName, conf))

Do this:

```js
const response = await npmRegistryFetch(dependencyName, conf)
const body = await response.text()
let json
try {
  json = JSON.parse(body)
} catch (er) {
  console.error('JSON failed', response.status, response.headers, body)
  throw er
}
```

And then share the output.


@pgsandstrom pgsandstrom commented Mar 10, 2020

response.headers was an object and the body was too large to be logged to console. So the exact code I ran was like this:

```js
console.error(
  'JSON failed',
  response.status,
  JSON.stringify(response.headers),
  body.substr(0, 1000),
)
```

All the responses I got were similar to this:

JSON failed 200 {} io.com","bugs":{"url":"https://github.com/Microsoft/vsce/issues"},"keywords":["vscode","vsce","extension"],"contributors":[{"name":"Microsoft Corporation"}],"author":{"name":"Microsoft Corporation"},"license":"MIT","main":"out/main.js","bin":{"vsce":"out/vsce"},"scripts":{"test":"mocha","prepublish":"gulp compile && mocha"},"engines":{"node":">= 0.12"},"dependencies":{"commander":"^2.8.1","denodeify":"^1.2.1","glob":"^5.0.14","lodash":"^3.10.1","mime":"^1.3.4","minimatch":"^2.0.10","osenv":"^0.1.3","read":"^1.0.7","tmp":"0.0.27","url-join":"0.0.1","vso-node-api":"^0.5.0","yazl":"^2.2.2"},"devDependencies":{"event-stream":"^3.3.1","gulp":"^3.9.0","gulp-filter":"^3.0.1","gulp-tsb":"^1.6.0","mocha":"^2.3.3","rimraf":"^2.4.3","source-map-support":"^0.3.2","xml2js":"^0.4.12"},"gitHead":"1af60269feede8d74fcd52fde591c9754781ec22","_id":"vsce@0.7.1","_shasum":"d2d430805a46e7693e0cd824a535e883920ab31d","_from":".","_npmVersion":"3.3.10","_nodeVersion":"0.12.7","_npmUser":{"name":"joaomoreno.ms"

It's really strange; it is as if they are cut off at the beginning. I guess that explains why the JSON can't be parsed.

I wrote some responses to file and inspected them, they all looked like this. Here is a pastebin of an entire response: https://pastebin.com/E7456cXz


@isaacs isaacs commented Mar 10, 2020

Ah, can you run it again, but don't JSON.stringify() the response.headers? Fetch's Headers is a Map-like object, so it'll JSON.stringify as just an empty object, but the headers might give us some indication of whether we're incorrectly handling a chunked encoding or some other thing.
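[Editor's note: one way to get a readable log out of a Map-like Headers object is to convert it to a plain object first. This is a sketch under the assumption of a WHATWG-style Headers with an `entries()` iterator of `[name, value]` pairs; `headersToObject` is a name made up here, not part of any of these libraries.]

```js
// Fetch's Headers stringifies to "{}" because its entries live in a
// Map-like store rather than on plain properties. Converting it to a
// plain object first makes JSON.stringify useful for logging.
function headersToObject(headers) {
  return Object.fromEntries(headers.entries())
}

// e.g. console.error('JSON failed', response.status,
//        JSON.stringify(headersToObject(response.headers)))
```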


@isaacs isaacs commented Mar 10, 2020

Cutting down the body is fine. And yeah, it's definitely being cut off for some reason.


@pgsandstrom pgsandstrom commented Mar 22, 2020

Maybe I'm missing something, but if I leave the headers as is then only [object Headers] is printed. However, I did manage to stringify it with an ugly hax. This is what I got:

{"cache-control":"public, max-age=300","cf-cache-status":"REVALIDATED","cf-ray":"5781460dbe9dd13b-GOT","connection":"keep-alive","content-encoding":"gzip","content-type":"application/json","date":"Sun, 22 Mar 2020 16:26:19 GMT","etag":"W/\"f217f19f947ba9ddec66a105b8085819\"","expect-ct":"max-age=604800, report-uri=\"https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct\"","last-modified":"Mon, 04 Feb 2019 10:34:16 GMT","server":"cloudflare","set-cookie":"__cfduid=<redacted>; expires=Tue, 21-Apr-20 16:26:19 GMT; path=/; domain=.npmjs.org; HttpOnly; SameSite=Lax","transfer-encoding":"chunked","vary":"accept-encoding, accept","x-fetch-attempts":"1"}


@isaacs isaacs commented May 4, 2020

Update on this! I was finally able to find a way to reproduce this issue, and track it down to this bit of code in make-fetch-happen's cache.js, where we tee into the cache write stream if the response doesn't fit in memory:

```js
      const tee = new Minipass()
      const cacheStream = cacache.put.stream(
        cachePath,
        ckey,
        cacheOpts
      )
      // XXX problem is right here:
      // if the cacheStream is slow, and there is more than one chunk
      // queued for consumption, then the first chunk is lost.  witaf?
      tee.pipe(cacheStream)
      // this works fine, though:
      // tee.on('data', d => cacheStream.write(d))
      // tee.on('end', () => cacheStream.end())
      // this fails in the same way (ie, not a cacache issue, just a minipass tee issue)
      // const cacheStream = new Minipass()
      // setTimeout(() => cacheStream.resume())
      cacheStream.promise().then(cacheWriteResolve, cacheWriteReject)
      newBody.unshift(tee)
```

I think this might be a bug in how Minipass handles multiple pipe destinations, so I'm going to dig into that. If so, it's a bug lurking for later anyway. If I can prove that minipass is behaving properly, then it means it's somewhere further up the stack, maybe in minipass-pipeline or minipass-flush (though those modules are super simple and just rely on the Minipass semantics, so I'm skeptical that they're to blame here).


@isaacs isaacs commented May 4, 2020

For posterity: the repro case is, in the npm v7 branch that uses arborist, doing npm i tap@14 --before=2020-01-01 with an empty cache, so that it has to fetch a large number of full-metadata packuments and cache them all. The fs streams get backed up, and the responses are large, so it hits this code path where the tee target is slower than the incoming or "main" pipeline of data.

Going to try to reduce it down to just a set of minipass streams in a synthetic environment today or tomorrow. In the meantime, I can float the "no backpressure pipe" patch on make-fetch-happen, since it's really not too much of a hazard to let the cache writes queue up in memory a bit anyway.


@isaacs isaacs commented May 4, 2020

Oh, wild! It's not just dropping the first chunk, it's getting the full data, but out of order!

```
 FAIL  test/cache.js
 ✖ should have same body

  --- expected
  +++ actual
  @@ -1,1 +1,1 @@
  -"hello, world!"
  +"ld!hello, wor"

  test: "test/cache.js put method stream (not in memory): caches correctly"
```

isaacs added a commit to npm/make-fetch-happen that referenced this issue May 4, 2020
Something weird is going on here.  This SHOULD be fine as a simple
pipe(), but for some reason, backpressure from the cache stream can
cause the pipeline to drop the first chunk of data, resulting in invalid
JSON.  Until that is fixed, just write into the cache without any
backpressure.

The only hazard is that, if the fs is truly very slow, and the rest of
the consumption pipeline is very fast, then we'll back up into memory
and use more than we ought to, rather than pushing back on the incoming
stream.  However, this isn't likely to ever be a problem due to how npm
does HTTP.  Either it's fetching a JSON response, or a tarball (which is
also either unpacking to disk, or streaming directly to a tarball file
on disk).  So, if the disk is slow, and it's a tarball request, we're
likely to get backpressure from the main pipeline anyway.  It can only
become a problem if the JSON response is large enough to span multiple
chunks, and also the fs is loaded enough to start slowing down.  In the
JSON response case, we're going to load the whole thing in memory
anyway, so nothing is made particularly *worse* by this lack of
backpressure.

It is possible that the root cause of this bug exists either in cacache,
minipass-pipeline, or minipass itself.  But since we don't do a
multi-pipe tee stream anywhere else in npm's stack, this is the only
spot where it can make itself known.

Re: npm/npm-registry-fetch#23
isaacs added a commit that referenced this issue May 4, 2020

@isaacs isaacs commented May 4, 2020

Ok, update to npm-registry-fetch 8.0.2, and it should not hit the problem any more. I'm leaving this issue open, though, as the root cause still has to be determined and fixed properly in either cacache, minipass-pipeline, or minipass.

@isaacs isaacs closed this in 4c5a106 May 13, 2020
@isaacs isaacs reopened this May 13, 2020
@isaacs isaacs closed this in 3b6c5d0 May 13, 2020

@isaacs isaacs commented May 13, 2020

Alright! Removed the kludge in make-fetch-happen, and fixed this properly at the root.

That was a beast to track down.

isaacs added a commit to isaacs/minipass-pipeline that referenced this issue May 13, 2020
If the pipeline as a whole is not flowing, then it should return `false`
from any write operation.  Since the Pipeline listens to the tail
stream's `data` event, the streams in the pipeline always are in flowing
mode.  However, the Pipeline itself may not be, so it would return
`true` from writes inappropriately, allowing data to be buffered up in
the Pipeline excessively.

This would not cause any significant issues in most cases, except
excess memory usage.

Discovered while debugging npm/npm-registry-fetch#23