
CONTENT_DOWNLOAD_MISMATCH on prematurely closed streams #69

Closed · rexxars opened this issue Oct 24, 2017 · 17 comments

Labels: api: storage · 🚨 This issue needs some love. · triage me

@rexxars commented Oct 24, 2017

Issue

When you use bucket.file(...).createReadStream() and pipe it to a target stream, and that target stream is ended or closed before the Google Cloud Storage client has finished reading the remote file, validation fails because the partially received data is compared against the remote hash.

I've created a repository containing a simplified example of the problem; a rough sketch of the failure mode follows. In the real world, we experienced this problem when an HTTP server behind a reverse proxy was piping from GCS to the HTTP response and the upstream proxy closed the socket.
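
A rough sketch of what the reproduction boils down to (the bucket and file names are placeholders; the linked repository has the real code):

const {Writable} = require('stream');
const storage = require('@google-cloud/storage')();

const file = storage.bucket('some-bucket').file('some-large-file.bin');

// A consumer that goes away after the first chunk, like an HTTP
// response whose upstream proxy closed the socket.
const target = new Writable({
  write(chunk, encoding, callback) {
    this.destroy(new Error('premature close'));
    callback();
  },
});

file.createReadStream()
  .pipe(target)
  .on('error', (err) => console.error('target error:', err.message));

// A while later, the read stream fails validation, because only part
// of the data was received before the pipe broke.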

Environment details

  • OS: Ubuntu Linux 16.04
  • Node.js version: 8.7.0
  • npm version: 5.4.2
  • @google-cloud/storage version: 1.4.0

Steps to reproduce

  1. git clone git@github.com:rexxars/gcs-premature-eos-error.git
  2. cd gcs-premature-eos-error
  3. npm install
  4. Define the following environment variables (alternatively, modify test.js directly):
    • GCS_KEY_PATH - Path to a Google Cloud key file with access to a bucket
    • GCS_BUCKET - Name of Google Cloud Storage bucket to fetch from
    • GCS_FILE_PATH - Path to a file inside of the bucket
  5. npm test

After 60 seconds, the script will exit with the following error:

Error: The downloaded data did not match the data from the server. To be sure the content is the same, you should download the file again.
    at onComplete (.../node_modules/@google-cloud/storage/src/file.js:638:29)
    at .../node_modules/@google-cloud/storage/src/bucket.js:1439:7
    at Object.handleResp (.../node_modules/@google-cloud/common/src/util.js:135:3)
    at .../node_modules/@google-cloud/common/src/util.js:465:12
    at Request.onResponse [as _callback] (.../node_modules/retry-request/index.js:179:7)
    at Request.self.callback (.../node_modules/request/request.js:186:22)
    at emitTwo (events.js:125:13)
    at Request.emit (events.js:213:7)
    at Request.<anonymous> (.../node_modules/request/request.js:1163:10)
    at emitOne (events.js:115:13)
stephenplusplus self-assigned this Oct 25, 2017
@stephenplusplus (Contributor) commented:

Thanks for the reproduction case! I have received the error from the test, and will be looking into it.

@stephenplusplus (Contributor) commented:

A quick question about:

In the real-world, we experienced this problem when an HTTP server behind a reverse proxy was piping from GCS to the HTTP response, and the upstream proxy closed the socket.

When the upstream proxy closed the socket, did you get an error event? I believe the right thing to do at that point is to close the GCS stream by calling .end().

Alternatively, this can be handled for you by using pumpify:

const pumpify = require('pumpify');
const brake = require('brake'); // throughput throttle, as in the repro repo

pumpify(sourceStream, brake(10), targetStream)
  .on('error', (err) => {
    // err === expected error about the socket breaking.
    // sourceStream (the GCS stream) was auto-closed.
  });

@rexxars (Author) commented Oct 25, 2017

I am already using pump to handle the piping. When this situation occurs, pump calls the callback with a premature close error, and after a little while the content mismatch error is emitted.

Since we're piping directly to the end-user HTTP response, there isn't any good way to retry the fetch, so I've simply disabled the validation: there's really nothing I can do when this situation occurs, even if it were an actual content mismatch.

I still thought you might want to check whether there is something you can do on your end to stop the error from being emitted, but I couldn't manage to reproduce the case 100%: using pump solves the test case I created, but somehow doesn't solve the problem we have in the real-world case :/

It seems that detecting prematurely closed streams is non-trivial.
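
For reference, a sketch of the pattern described above (serveFile and its arguments are illustrative; validation is disabled through the createReadStream options):

const pump = require('pump');

function serveFile(file, res) {
  // No retry is possible once bytes have hit the response, so skip
  // the hash check entirely.
  const source = file.createReadStream({validation: false});

  pump(source, res, (err) => {
    if (err) {
      // On a premature close, pump surfaces the error here once and
      // destroys both streams.
      console.error('stream failed:', err.message);
    }
  });
}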

@stephenplusplus (Contributor) commented:

It still seems like there should only be one error emitted since you're using pump. Is the GCS stream being piped anywhere else?

@joelharkes commented:

We have the same issue.

We create a read stream and pipe() it directly to a socket, and these errors pop up. Is there any way to fix this properly?

We only started getting these errors after updating to the latest version of @google-cloud/storage.

@stephenplusplus (Contributor) commented:

Could you provide reproduction code? When you get an error from any other stream, the GCS stream needs to be ended as well.
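
For illustration, that manual teardown looks roughly like this (socket stands in for whatever destination is being piped into):

const gcsStream = file.createReadStream();

gcsStream.pipe(socket);

socket.on('error', (err) => {
  // .pipe() does not propagate destination errors back to the source,
  // so tear down the GCS stream explicitly to stop the download (and
  // with it, the eventual hash comparison).
  gcsStream.destroy(err);
});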

@joelharkes commented:

We seem to be experiencing the same thing, but we can only reproduce it when running in a Docker container on Google Kubernetes Engine, and it always happens with the same file (96.03 KB). Somehow it happens differently on different clients, so I'm wondering if it might have to do with chunking.

We also see the same issue if we pipe createReadStream() through a transform:

const readStream = file.createReadStream();
const transform = new MetaDatTransform(); // our custom Transform stream

const dest = readStream.pipe(transform);

// .pipe() does not forward source errors, so propagate them manually:
readStream.on('error', (err) => {
  dest.destroy(err);
});

return dest;

I was able to get rid of this error by first reading the stream to the end and then reading it manually into a new stream, roughly as sketched below (a bit of a workaround, but it worked).
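
A sketch of that workaround, assuming file.download() (which buffers the entire object in memory) is acceptable for the file sizes involved:

const {PassThrough} = require('stream');

async function bufferedStream(file) {
  // Download (and validate) the complete file first...
  const [contents] = await file.download();

  // ...then serve it from an in-memory stream, which can be closed
  // early without tripping any hash check.
  const out = new PassThrough();
  out.end(contents);
  return out;
}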

@joelharkes commented Oct 27, 2017

I can't seem to reproduce it now :/ It only seems to happen when an OPH scanner requests the file (directly over a TCP socket), but I wonder how that could influence the Google library into throwing this error.

This is what we do with the read stream:

const rs = file.createReadStream();
rs.pause();
rs.on('error', () => { /* log the error, close the TCP socket */ });
rs.on('data', function (buffer) {
  readLength += buffer.length;
});
rs.on('end', function () { /* close the TCP socket */ });
self.respond('150 Opening ' + self.mode.toUpperCase() + ' mode data connection', () => {
  rs.pipe(passiveSocket);
  rs.resume();
});

(This code is from the nodeftpd library: https://github.com/nodeftpd/nodeftpd)

@stephenplusplus (Contributor) commented:

@joelharkes @rexxars any updates on how this is going / reproduction code?

@rexxars (Author) commented Nov 13, 2017

Sorry, we were using this as part of a relatively large and complex application, and I found it hard to reproduce in an isolated fashion. Having said that, I find Node streams notoriously hard to get right, so I can't rule out that I was doing something wrong.

We disabled the verification, since there's no way to retry the request when piping directly to an HTTP response. Sorry I can't be of more help here.

@joelharkes commented Nov 13, 2017

Same here. We had:

  • Errors in combination with HTTP sockets to an OPH scanner.
  • Errors when using a stream transformer.

We fixed the first by downgrading to storage version 1.2, and the second by first converting the download to a promise and then transforming the resulting string, roughly as sketched below.
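
A sketch of that promise-based fix (transformMetadata is a hypothetical stand-in for our actual transform):

async function readAndTransform(file) {
  const [contents] = await file.download(); // buffers the whole file
  return transformMetadata(contents.toString('utf8'));
}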

We couldn't find the source of the problem by debugging through the node modules. Maybe it's in one of the sub-dependencies.

@stephenplusplus (Contributor) commented:

Having said that, I find Node streams notoriously hard to get right

That is definitely true. Feel free to ping me if you ever need an extra set of eyes on a problem.

I'm going to close this issue for now because, as noted above, streams are hard to get right, and the blame could lie at any one of many points in an application, which makes this very hard to debug. If it's possible to create a reproduction case, I'll be happy to keep digging.

@DavideArgellati commented:
Hey @stephenplusplus, I get the same error with a Cloud Function where I react to upload events and try to append files to a zip in the bucket. If I put a timeout between creating the write stream for the zip file and starting to write to it, it's fine; otherwise I get that error. This happens only when I open the write stream in 'a' mode, not 'w'.

@stephenplusplus (Contributor) commented:

@DavideArgellati would you mind opening a new issue with reproduction code I can try?

@joelharkes commented:

Cool, good to know there was a 5000 ms timeout that was removed.

We also found that downloading and uploading the same file at the same time could produce this message.

Downloads were done with streams, by clients with slower connections and small buffers, so it could take a while to download a file. (I'm not sure whether this could be related to the 5000 ms timeout as well.)

Changing the upload schedule seems to have reduced or removed these problems.

@jlindberg-oss commented:

I am getting CONTENT_DOWNLOAD_MISMATCH every time I upload and download an image.

Using the completely vanilla upload and download samples here:
https://github.com/googleapis/nodejs-storage/blob/master/samples/files.js#L101
https://github.com/googleapis/nodejs-storage/blob/master/samples/files.js#L155
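
Roughly, those samples boil down to the following (bucketName and the paths are placeholders, inside an async context):

// Upload a local file to the bucket:
await storage.bucket(bucketName).upload(localImagePath);

// Download it back to disk:
await storage
  .bucket(bucketName)
  .file(remoteImageName)
  .download({destination: localDownloadPath});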

If I try any non-image file, it always works. If I try any image file, I always get CONTENT_DOWNLOAD_MISMATCH when downloading.

I thought maybe I needed to set contentType explicitly, so I followed
https://stackoverflow.com/questions/41229918/google-storage-nodejs-upload-mime-type, but that didn't help.

This is a super vanilla setup with no concurrency, running on Electron on OS X. Does anyone know what the problem could be here?

@jlindberg-oss commented:

By the way, the download is actually succeeding, in that the correct file is saved to disk, so this is just a false positive.
