
Storage Support for GCS and Azure #1221

Merged: 9 commits into develop on May 27, 2022

Conversation

@mbolt35 (Collaborator) commented May 24, 2022:

What does this PR change?

  • Adds storage.Storage implementations for GCS and Azure

How will this PR impact users?

  • Adds support for shared configuration and backups over GCS and Azure.

How was this PR tested?

  • Tested live in a GCS environment for GCS storage
  • Azure Storage still needs testing...

Does this PR require changes to documentation?

  • Yes, will update Storage Documentation and ETL Backup Documentation.

@mbolt35 added the labels "next release" (This PR/issue is expected to be merged/addressed in the next release) and "v1.94" on May 24, 2022
@michaelmdresser (Collaborator) left a comment:

By and large I think this looks good. This bucket interaction is pretty complex. I'm very close to approving, just a few questions.

Also, there are quite a few unwrapped errors that might make future debugging challenging.

Comment on lines +50 to +53
IdleConnTimeout: model.Duration(90 * time.Second),
ResponseHeaderTimeout: model.Duration(2 * time.Minute),
TLSHandshakeTimeout: model.Duration(10 * time.Second),
ExpectContinueTimeout: model.Duration(1 * time.Second),
Collaborator:

Why aren't these time.Duration?

Collaborator (Author):

Note: this is used to match the Thanos bucket configuration format (forked from Thanos).

My guess is that they leverage Prometheus's model.Duration, which can parse a wider variety of duration strings (likely the same reason we forked the Go duration code to add support for custom units):

https://github.com/kubecost/cost-model/blob/868089f78585c40307bacb61d9aa54e39bc9170a/pkg/util/timeutil/timeutil.go#L95-L213
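
For context on the distinction being discussed, here is a minimal sketch (assuming the forked config uses github.com/prometheus/common/model, as Thanos does) of why model.Duration is handy for parsing config values that time.Duration rejects:

    package main

    import (
        "fmt"
        "time"

        "github.com/prometheus/common/model"
    )

    func main() {
        // time.ParseDuration rejects day/week units...
        if _, err := time.ParseDuration("2d"); err != nil {
            fmt.Println("time.ParseDuration:", err)
        }

        // ...while the Prometheus model.Duration accepts them, which matters
        // when the value comes from a Thanos-style YAML config.
        d, err := model.ParseDuration("2d")
        if err != nil {
            panic(err)
        }
        fmt.Println("model.ParseDuration:", time.Duration(d)) // 48h0m0s
    }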

Collaborator:

Works for me!

Comment on lines +62 to +69
// Disable `ForceLog` in Azure storage module
// As of the time of this patch, the logging function in the storage module isn't correctly
// detecting expected REST errors like 404 and so outputs them to syslog along with a stacktrace.
// https://github.com/Azure/azure-storage-blob-go/issues/214
//
// This needs to be done at startup because the underlying variable is not thread safe.
// https://github.com/Azure/azure-pipeline-go/blob/dc95902f1d32034f8f743ccc6c3f2eb36b84da27/pipeline/core.go#L276-L283
pipeline.SetForceLogEnabled(false)
Collaborator:

🤦‍♂️
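
Since the quoted comment stresses that the flag must be flipped before anything else touches the pipeline, here is a minimal sketch of the startup pattern (placing it in an init function is my illustration; the PR may call it elsewhere during startup):

    package storage

    import (
        "github.com/Azure/azure-pipeline-go/pipeline"
    )

    // init runs once, before any goroutines can use the Azure pipeline, so
    // mutating the package-level (non-thread-safe) force-log flag here avoids
    // the data race the comment above warns about.
    func init() {
        pipeline.SetForceLogEnabled(false)
    }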

pkg/storage/azurestorage.go (outdated review thread, resolved)
if conf.UserAssignedID == "" {
if conf.StorageAccountName == "" ||
conf.StorageAccountKey == "" {
errMsg = append(errMsg, "invalid Azure storage configuration")
Collaborator:

Could this be a little more descriptive?

Collaborator (Author):

Possibly. I know you're not a fan of most of this configuration stuff, but it's all 1-1 with the Thanos bucket storage config. The parsing code, how it handles auth, etc... The entire point is to leverage the same bucket storage configuration format that Thanos uses to avoid having to replicate buckets AND/OR auth schemes.

Collaborator:

Gotcha, I wasn't sure how much, if anything, was original authorship. I'll trust the robustness of the Thanos code, and we always have room to change in the future.
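
Purely as an illustration of what a more descriptive message could look like (the Config/validate scaffolding here is mine, not the PR's, and the real struct has more fields):

    package storage

    // Config mirrors only the fields referenced in the quoted snippet.
    type Config struct {
        StorageAccountName string
        StorageAccountKey  string
        UserAssignedID     string
    }

    // validate shows one possible, more descriptive wording; the PR keeps the
    // shorter Thanos-style message.
    func validate(conf Config) []string {
        var errMsg []string
        if conf.UserAssignedID == "" {
            if conf.StorageAccountName == "" || conf.StorageAccountKey == "" {
                errMsg = append(errMsg,
                    "invalid Azure storage configuration: storage account name and key must both be set when no user-assigned identity is configured")
            }
        }
        return errMsg
    }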

name = trimLeading(name)
ctx := context.Background()

log.Infof("AzureStorage::Read(%s)", name)
Collaborator:

Probably a Debugf? Same question for the other methods, like Write.

Collaborator (Author):

I considered it, and probably should reduce the level at some point. It's only for Read/Write/Delete I think.

func (b *AzureStorage) getBlobReader(ctx context.Context, name string, offset, length int64) (io.ReadCloser, error) {
log.Debugf("Getting blob: %s, offset: %d, length: %d", name, offset, length)
if name == "" {
return nil, errors.New("X-Ms-Error-Code: [EmptyContainerName]")
Collaborator:

Is there a particular reason this string format is being used?

Collaborator (Author):

No idea; it's forked Thanos code :)

Comment on lines +592 to +593
Timeout: 30 * time.Second,
KeepAlive: 30 * time.Second,
Collaborator:

Are these long enough? Especially the Timeout.

Collaborator (Author):

I hope we're not requiring more than 30s to upload a single ETL file, but these are also just the default Thanos settings.

Collaborator:

Makes sense!
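
If, as in the Thanos defaults, those two values sit on a net.Dialer, the Timeout bounds connection establishment rather than the upload itself, so a slow ETL upload would not be cut off by it. A sketch of that wiring (the helper name is mine, not the PR's):

    package storage

    import (
        "net"
        "net/http"
        "time"
    )

    // newDefaultTransport is an illustrative helper showing how the quoted
    // defaults are typically attached to an HTTP transport.
    func newDefaultTransport() *http.Transport {
        return &http.Transport{
            DialContext: (&net.Dialer{
                Timeout:   30 * time.Second, // max time to establish the TCP connection
                KeepAlive: 30 * time.Second, // keep-alive probe interval for idle connections
            }).DialContext,
        }
    }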

Comment on lines +3 to +5
// Fork from Thanos GCS Bucket support to reuse configuration options
// Licensed under the Apache License 2.0.
// https://github.com/thanos-io/thanos/blob/main/pkg/objstore/gcs/gcs.go
Collaborator:

Is this all we need for attribution? Should we reference a commit it came from, or any specific authors?

Collaborator:

Any specific changes you made to the fork worth pointing out, to help us poor reviewers?

Collaborator (Author):

Hahaha! I see that this was the last thing you noticed as well. It's the same for all the bucket storage implementations. Based on what I have read concerning Apache 2, this is all that is required, but I would love a confirmation there.

The fork was mainly to stay 1:1 with the Thanos bucket configuration for all of these bucket types, so any functionality that could be configured in the Thanos format was forked and reused. However, their "Bucket" API isn't the same as ours at all; it's designed around specific Thanos use cases (like recursively diving into sub-directories). So most of the storage.Storage implementation is "guided" by their implementation, but it's mostly different. The Azure implementations are probably the closest, since the blob storage APIs for Azure are a complete disaster.
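
For reviewers unfamiliar with the target interface, here is a hypothetical sketch of the storage.Storage surface being implemented, inferred only from the methods mentioned in this thread (Read/Write/Delete); the actual interface in pkg/storage may differ:

    package storage

    // Storage is a hypothetical reconstruction of the interface this PR
    // implements for GCS and Azure; anything beyond Read/Write/Delete is
    // not confirmed by the discussion above.
    type Storage interface {
        // Read returns the full contents of the named object.
        Read(name string) ([]byte, error)
        // Write stores data under the given name, overwriting any existing object.
        Write(name string, data []byte) error
        // Delete removes the named object.
        Delete(name string) error
    }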

@nikovacevic (Collaborator) left a comment:

Good stuff! Sounds like we may still want to test Azure? Once that, the logging levels, and Michael's review are addressed, I'd be happy to approve.

pkg/storage/gcsstorage.go (outdated review thread, resolved)
mbolt35 and others added 3 commits May 26, 2022 21:36
@mbolt35 (Collaborator, Author) commented May 27, 2022:

Azure was tested on a live environment and monitoring reported greens for everything, so we're good to go! Pulling the trigger on both PRs.

@mbolt35 merged commit b5d01f3 into develop on May 27, 2022
@Adam-Stack-PM (Contributor):
@mbolt35 Great work.

@michaelmdresser deleted the bolt/storage-impls branch on June 23, 2023
Labels: next release, v1.94
4 participants