Storage Support for GCS and Azure #1221
Conversation
By and large I think this looks good. This bucket interaction is pretty complex. I'm very close to approving, just a few questions.
Also, there are quite a few unwrapped errors that might make future debugging challenging.
IdleConnTimeout:       model.Duration(90 * time.Second),
ResponseHeaderTimeout: model.Duration(2 * time.Minute),
TLSHandshakeTimeout:   model.Duration(10 * time.Second),
ExpectContinueTimeout: model.Duration(1 * time.Second),
Why aren't these `time.Duration`?
Note: this is used to match the Thanos bucket configuration format (forked from Thanos).
My guess is that they leverage the Prometheus Duration type, which can parse a larger variety of duration strings (likely analogous to the way we forked the Go duration code to add support for custom units).
Works for me!
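For context on why the stricter stdlib type doesn't fit here, a small sketch (the `parseOK` helper is hypothetical, added just for illustration) showing what Go's standard `time.ParseDuration` does and doesn't accept:

```go
package main

import (
	"fmt"
	"time"
)

// parseOK reports whether Go's standard time.ParseDuration accepts s.
func parseOK(s string) bool {
	_, err := time.ParseDuration(s)
	return err == nil
}

func main() {
	// The stdlib parser accepts ns/us/ms/s/m/h units...
	fmt.Println(parseOK("90s"), parseOK("2m")) // true true

	// ...but rejects day-style units such as "1d", which the Prometheus
	// model.Duration type supports. Staying 1-1 with the Thanos config
	// format means keeping the more permissive type.
	fmt.Println(parseOK("1d")) // false
}
```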
// Disable `ForceLog` in the Azure storage module.
// As of the time of this patch, the logging function in the storage module isn't correctly
// detecting expected REST errors like 404, and so outputs them to syslog along with a stacktrace.
// https://github.com/Azure/azure-storage-blob-go/issues/214
//
// This needs to be done at startup because the underlying variable is not thread safe.
// https://github.com/Azure/azure-pipeline-go/blob/dc95902f1d32034f8f743ccc6c3f2eb36b84da27/pipeline/core.go#L276-L283
pipeline.SetForceLogEnabled(false)
🤦‍♂️
if conf.UserAssignedID == "" {
	if conf.StorageAccountName == "" ||
		conf.StorageAccountKey == "" {
		errMsg = append(errMsg, "invalid Azure storage configuration")
Could this be a little more descriptive?
Possibly. I know you're not a fan of most of this configuration stuff, but it's all 1-1 with the Thanos bucket storage config. The parsing code, how it handles auth, etc... The entire point is to leverage the same bucket storage configuration format that Thanos uses to avoid having to replicate buckets AND/OR auth schemes.
Gotcha, I wasn't sure how much, if anything, was original authorship. I'll trust the robustness of the Thanos code, and we always have room to change in the future.
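If the message were ever made more descriptive, one option is one message per missing field rather than a single generic string. A hypothetical sketch (the `AzureConfig` struct and `validate` function below are illustrative, not the PR's actual code):

```go
package main

import "fmt"

// AzureConfig mirrors the fields discussed in the review; this struct
// and the validate function are hypothetical sketches for illustration.
type AzureConfig struct {
	UserAssignedID     string
	StorageAccountName string
	StorageAccountKey  string
}

// validate returns one message per missing field instead of a single
// generic "invalid Azure storage configuration" string.
func validate(conf AzureConfig) []string {
	var errMsg []string
	if conf.UserAssignedID == "" {
		if conf.StorageAccountName == "" {
			errMsg = append(errMsg, "azure: storage account name is required when no user-assigned identity is set")
		}
		if conf.StorageAccountKey == "" {
			errMsg = append(errMsg, "azure: storage account key is required when no user-assigned identity is set")
		}
	}
	return errMsg
}

func main() {
	for _, msg := range validate(AzureConfig{}) {
		fmt.Println(msg)
	}
}
```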
pkg/storage/azurestorage.go (Outdated)

name = trimLeading(name)
ctx := context.Background()

log.Infof("AzureStorage::Read(%s)", name)
Probably a `Debugf`? Same question for the other methods, like `Write`.
I considered it, and probably should reduce the level at some point. It's only for Read/Write/Delete I think.
func (b *AzureStorage) getBlobReader(ctx context.Context, name string, offset, length int64) (io.ReadCloser, error) {
	log.Debugf("Getting blob: %s, offset: %d, length: %d", name, offset, length)
	if name == "" {
		return nil, errors.New("X-Ms-Error-Code: [EmptyContainerName]")
Is there a particular reason this string format is being used?
No idea, thanos forked code :)
Timeout:   30 * time.Second,
KeepAlive: 30 * time.Second,
Are these long enough? Especially the Timeout.
I hope we're not requiring more than 30s to upload a single ETL file, but these are also just the default Thanos settings.
Makes sense!
// Fork from Thanos GCS Bucket support to reuse configuration options
// Licensed under the Apache License 2.0.
// https://github.com/thanos-io/thanos/blob/main/pkg/objstore/gcs/gcs.go
Is this all we need for attribution? Should we reference a commit it came from, or any specific authors?
Any specific changes you made to the fork worth pointing out, to help us poor reviewers?
Hahaha! I see that this was the last thing you noticed as well. It's the same for all the bucket storage implementations. Based on what I have read concerning Apache 2, this is all that is required, but I would love a confirmation there.
Mainly, the fork was to ensure 1-1 parity with the Thanos bucket configuration for all of these bucket types. So, any functionality which could be configured in the Thanos format was forked and used. However, their "Bucket" API isn't the same as ours at all; it's designed around specific Thanos use-cases (like recursively diving into sub-directories), so most of the implementation of the storage.Storage interface is "guided" by their implementation, but it's mostly different. The Azure implementations are probably the closest, since the blob storage APIs for Azure are a complete disaster.
Good stuff! Sounds like we maybe want to test Azure still? Besides that, addressing logging, and addressing Michael's review, I'd be happy to approve.
Co-authored-by: Michael Dresser <michael@kubecost.com>
Azure was tested on a live environment and monitoring reported greens for everything, so we're good to go! Pulling the trigger on both PRs.

@mbolt35 Great work.
What does this PR change?

- storage.Storage implementations for GCS and Azure

How will this PR impact users?

How was this PR tested?

Does this PR require changes to documentation?