ceph: update object bucket provisioner library #6699

Merged: 1 commit into rook:master from update-lib-bucket-provisioner, Dec 3, 2020

Conversation

BlaineEXE
Member

@BlaineEXE BlaineEXE commented Nov 24, 2020

The library for object bucket provisioning is updated to fix errors
during provisioning.

Signed-off-by: Blaine Gardner <blaine.gardner@redhat.com>

Description of your changes:

Which issue is resolved by this Pull Request:
Resolves #6650

Checklist:

  • Commit Message Formatting: Commit titles and messages follow guidelines in the developer guide.
  • Skip Tests for Docs: Add the flag for skipping the build if this is only a documentation change. See here for the flag.
  • Skip Unrelated Tests: Add a flag to run tests for a specific storage provider. See test options.
  • Reviewed the developer guide on Submitting a Pull Request
  • Documentation has been updated, if necessary.
  • Unit tests have been added, if necessary.
  • Integration tests have been added, if necessary.
  • Pending release notes updated with breaking and/or notable changes, if necessary.
  • Upgrade from previous release is tested and upgrade user guide is updated, if necessary.
  • Code generation (make codegen) has been run to update object specifications, if necessary.

@mergify mergify bot added the ceph label Nov 24, 2020
@BlaineEXE
Member Author

BlaineEXE commented Nov 24, 2020

@acejam how comfortable do you feel building Rook containers locally and testing them in your environment? I believe this should fix #6650 for you, though I will have to wait for kube-object-storage/lib-bucket-provisioner#198 to merge before Rook can be patched. If you don't feel comfortable, I can figure out how to get a Rook build onto Docker Hub for you to test, if you'd be willing.

go.mod (outdated review thread, resolved)
@@ -74,6 +74,7 @@ func (p Provisioner) Provision(options *apibkt.BucketOptions) (*bktv1alpha1.Obje

	s3svc, err := cephObject.NewS3Agent(p.accessKeyID, p.secretAccessKey, p.getObjectStoreEndpoint(), true)
	if err != nil {
		p.deleteOBCResourceLogError("")
Member

Can't we just defer instead of these multiple calls?

Member Author

Yes, but we only want to do this on error... I suppose we could do what the lib-bucket-provisioner code does and have an if err != nil block in the deferred function...

Member Author

Also, given that some of the cleanup calls use p.bucketName, I'd rather duplicate the lines and be clearer than have a complicated cleanup call. But let me know if you still think we should defer cleanup.

@BlaineEXE
Member Author

I think I still want to update the object integration test to verify that the OBC makes it into the bound state and does not regress to perpetually "Pending" like before.

leseb
leseb previously requested changes Dec 2, 2020
cluster/examples/kubernetes/ceph/crds.yaml (review thread, resolved)
@BlaineEXE BlaineEXE force-pushed the update-lib-bucket-provisioner branch from 88c5fe2 to 83f977c Compare December 2, 2020 17:23
@BlaineEXE
Member Author

BlaineEXE commented Dec 2, 2020

There seems to be a race condition that is getting caught by the integration tests but not on my local system. From what I can tell, the OBC should be getting re-resolved once spec.objectBucketName is filled in after the OB is created. The spec.objectBucketName determines whether the OBC is already bound or not. For some reason, either (1) the OBC is starting to get re-resolved while the initial provision is still in progress, or (2) the OBC isn't keeping the change to spec.objectBucketName... maybe an issue with etcd being slow to commit changes?

I'm adding extra logging temporarily to the object bucket provisioner library to determine when a new request is being queued and what the OBC looks like when work on it starts.

Update: This was a result of forgetting to put the objectBucketName property into the manifests in the integration tests. 🤦‍♀️

@BlaineEXE BlaineEXE force-pushed the update-lib-bucket-provisioner branch from 573ccd4 to 90b89a3 Compare December 2, 2020 22:55
// resource being modified to refer to its OB.
timeToWaitForReversion := 2 * time.Minute
timeToCheckForReversion := timeBucketCreateVerified.Add(timeToWaitForReversion)
for time.Now().Before(timeToCheckForReversion) {
Member

Is this for loop the same as a single 2-minute sleep? Why not just do this?
time.Sleep(2 * time.Minute)
Actually, we really should not have a two-minute sleep in the test. That's pretty expensive when the whole suite is less than 15 minutes. I would suggest we not test for this revert bug in the integration tests.

Member Author

I didn't want to just blindly wait 2 minutes, so I check every 5 seconds whether it's been 2 minutes since the OBC was verified as created (timeBucketCreateVerified).

Member Author

The best thing to do might be to make the object and bucket test its own test outside of the smoke suite. Maybe a separate PR? I really think we should do a better job of adding regression tests into our integration tests.

Member

This is where we really do need a "daily" test suite that will run regression tests that may take a lot longer to run. For development we really need to keep the test suites shorter though.

Ideally we would add a test to prevent every regression that we find. In reality, it's frequently just not feasible to do that, so we have to strike a balance based on the cost of creating the test.

Member Author

I'm open to adding this regression test in a separate PR with the ability to enable it only with a REGRESSION_TEST env var or something, but I really don't want to just drop the responsibility of writing regression tests. Maybe I can work with Seb to get another GH action to run regressions daily and on release builds?

Member

@travisn travisn Dec 3, 2020

Yes, a separate PR sounds good for the tests so we can get this fix into the release.

@BlaineEXE BlaineEXE force-pushed the update-lib-bucket-provisioner branch 2 times, most recently from 89ee2cd to 3b4ee6c Compare December 3, 2020 21:08
@travisn travisn dismissed leseb’s stale review December 3, 2020 21:21

Feedback addressed

@BlaineEXE BlaineEXE merged commit bc08f51 into rook:master Dec 3, 2020
@BlaineEXE BlaineEXE deleted the update-lib-bucket-provisioner branch December 3, 2020 21:47
Labels
bug, ceph, object-bucket-claims
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ObjectBucketClaim is not properly created using 1.5.0: resource name may not be empty
4 participants