storage: nil pointer crash in writer goroutine after CloseWithError #4167
Extracted/adapted from a more complex reproduction, but more or less:
c := storage.NewClient(...)
w := c.NewWriter(...)
The write is aborted.
The process crashes:
It looks like
So it looks like the caller has called
So it appears that the writer was somehow still running after it was closed, which meant it was then poised to crash when the client was closed out from under it?
Thanks for the detailed writeup and for your proposed fix!
I wasn't able to reproduce the issue locally with a couple of versions of your repro. Your reasoning seems correct, but I have a couple of follow-up questions to better understand what's going on:
My reproduction was unfortunately a bit more complicated: I was running several processes across a cluster of machines, all writing files to GCS while under heavy sustained CPU load. They'd usually crash within about 5 minutes, sometimes as long as 10 minutes, after successfully writing 2-10k files. Sorry, I realize that isn't exactly a simple, reliable local reproduction, but it was reliable in that it always crashed before finishing the overall task.
Yes and no: I was calling
No -- after switching to only calling
While it was still happening, I did a little quick and dirty println debugging, specifically I added a package var
My shot-in-the-dark guess is that this is down to extreme CPU contention and whether the
I also ran it a few times with the patch in #4168 to add
EDIT: never mind, I think this more recent one -- since my application changes to avoid calling
Thanks for following up and sorry for the delay (I was on vacation and then had a busy week last week). Just wanted to check in: has it continued to work alright for you in the intervening 10 days or so?
I think I need to come up with some kind of repro myself for this (or at least do a little more research) before following up with your PR. I'm reluctant to mess with
Yeah, since removing
(also, just while I was poking around in writer.go, I think it looked like
Thanks for following up! Good to hear that strategy has been working better.
I think you're right about the check on