Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Beef up and harden import/export tests #7140

Merged
merged 3 commits into from Apr 10, 2020
Merged

Conversation

ribasushi
Copy link
Contributor

@ribasushi ribasushi commented Apr 10, 2020

Fixes ( or at least makes cleaner ) #7123

At this point I am convinced that this comment I made way back is in fact real: https://github.com/ipfs/go-ipfs/pull/7011/files/aa5019413741f2d34c9723d376591cf8c4b6b465#diff-15172926c4147422f472139313467179R330-R341
We did switch away from gclock a while back, and likely other changes I made cause it to trigger less frequently. Reproducing this locally is nigh impossible however :(

So instead the timeout+QUIT contraptions should tell us more next time it happens on the slower circleci boxes.

@ribasushi ribasushi force-pushed the chore/harden_impexp_test branch 4 times, most recently from 7f52d1b to 8dc2a2a Compare April 10, 2020 14:04
@ribasushi ribasushi marked this pull request as ready for review April 10, 2020 14:22
@Stebalien
Copy link
Member

Are we talking about two separate issues here? There's:

  1. The sharness issue.
  2. The GC issue.

Your tests prove that this issue can't be related to GC locks by successfully verifying that none of the pinned blocks in the CARs are garbage collected. If this were a GC race, we'd be deleting pinned data.

We know what at least part of the GC issue is. The issue is that we're garbage collecting parts of the pin set. We could be generating and garbage collecting additional data, but we'd have to look into that by looking at the blocks GC is deleting.

@ribasushi
Copy link
Contributor Author

@Stebalien not a GC race - a GC deadlock. I am 99% sure this is what you observed in 7123. What I am modifying is that if we trigger a generous timeout: we will dump a full stacktrace where I "expect nothing" otherwise.

The other thing ( ephemeral pinsets showing up ) is tracked in a different ticket.

@Stebalien
Copy link
Member

Ah, got it.

The bug we were running into there was that we were invoking the CoreAPI while already holding the PinLock. However, we could definitely have a different deadlock somewhere.

@Stebalien
Copy link
Member

Doing some spelunking: I don't think it's a starvation issue. That is, repeated GC should eventually allow a dag import to proceed.

@@ -0,0 +1,46 @@
From 6a43df1919da862f56c9af867a28bf2d7858e5ca Mon Sep 17 00:00:00 2001
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that the behavior of sharness is surprising, but I'm concerned that this will cause more problems than it fixes. It already has a bunch of special cases to avoid rejecting valid code.

I'd rather document this in the sharness README and move on unless we can find a more reliable way to do this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nobody will ever discover the sharness README :)
No worries though - I will pull this commit out, and use it locally instead.
Unless you are open to have it gated on an environment variable...?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Force-pushed a removal ( it was the first commit )

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally, we'd trigger it when we get a failure. However, in this case, that wouldn't have helped regardless because we weren't checking for errors.

You're right that nobody will read the README... Honestly, it's not terrible given that we can catch and deal with any false positives. What if we include it but put in a comment explaining exactly what it does, where it comes from (it's non-obvious that we're patching it in), and that it's a heuristic and that we should remove it if it causes problems?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll open a separate PR once I had a chance to write up better prose. The the error will link to that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now let's merge the rest to catch the lock if it happens again ( all the "$( stuff, while annoying was a red herring )

@Stebalien Stebalien merged commit 0c619ef into master Apr 10, 2020
@Stebalien Stebalien deleted the chore/harden_impexp_test branch April 10, 2020 18:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants