Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add custom S3 CSV billing export endpoint #1991

Merged
merged 4 commits into from Aug 28, 2023

Conversation

klaasjand
Copy link
Contributor

@klaasjand klaasjand commented Jun 20, 2023

What does this PR change?

  • This PR adds support for custom S3 endpoints such as Minio for exporting CSV billing data.

Does this PR relate to any other PRs?

  • No

How will this PR impact users?

  • Adds ability to use custom S3 endpoints for exporting billing data.

Does this PR address any GitHub or Zendesk issues?

  • No

How was this PR tested?

  • Manually via k8s deployment (via Helm chart).
  • Tested with empty bucket, export file create.
  • Tested with non-empty bucket, export file update.
  • Tested with non-empty bucket, export file no updates needed.
  • Tested with invalid S3 path.
  • Tested with invalid secret key.
  • Ran duration test.

Does this PR require changes to documentation?

  • Yes, export path: see example in code, re-use AWS credential variables.

Have you labeled this PR and its corresponding Issue as "next release" if it should be part of the next OpenCost release? If not, why not?

Signed-off-by: Klaas Jan Dijksterhuis <klaasjand@users.noreply.github.com>
@klaasjand klaasjand changed the title Add custom s3 export endpoint Add custom S3 CSV billing export endpoint Jun 26, 2023
@mattray mattray added the opencost OpenCost issues vs. external/downstream label Aug 1, 2023
Copy link
Contributor

@r2k1 r2k1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall I think it's a nice addition. I'd like just to see a non-conflicting way to configure it.

@@ -44,13 +46,91 @@ func NewFileManager(path string) (FileManager, error) {
return NewGCSStorageFile(path)
case strings.Contains(path, "blob.core.windows.net"):
return NewAzureBlobFile(path)
case strings.HasPrefix(path, "http://"), strings.HasPrefix(path, "https://"):
Copy link
Contributor

@r2k1 r2k1 Aug 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With this change s3 protocol will be assumed for any http(s) link.
It will require a breaking change for a potential extension in the future like adding a new destination type.

I would suggest instead using one of the 3 next approaches:

  1. Introduce new protocol prefix string, like customs3://endpoint.com/my-file.csv or minio://endpoint.com/my-file.csv
  2. Add domain filter like, minio.com (probably the least flexible option)
  3. Extend existing S3File struct to support new destination. (I'm not familiar with minio, but it looks like you're using existing aws library and the same method calls.). Despite this being my preferred option, there might be potential challenges that I may not fully comprehend.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm leaning towards option 1 because I'd rather don't want to mess up the current AWS implementation too much or add extra complexity to it. I would like to hear your thoughts on the following. Introduce a new protocol prefix string sort of like you mentioned, e.g. customs3://fqdn:port/bucket-name/path/to/file.csv and always enforce secure https so this translates to: https://fqdn:port/bucket-name/path/to/file.csv.

Copy link
Contributor

@r2k1 r2k1 Aug 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good to me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed the protocol prefix string/scheme to alts3 (alternative S3) so alts3://fqdn:port/bucket-name/path/to/file.csv translates to https://fqdn:port/bucket-name/path/to/file.csv.

@@ -7,6 +7,7 @@ import (
"io"
"net/url"
"os"
gp "path"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: I would rather change a variable name instead of creating an alias for a standard package.

Makes code harder to understand.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed the alias and renamed variables/arguments.

return nil, fmt.Errorf("invalid s3 path: %s", path)
}

const defaultRegion = "us-east-1"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you need to hardcode the region?
I suspect this won't be correct for the majority of the people. If it's a required parameter I would rather fail with an error early and let the user fix it.

Copy link
Contributor Author

@klaasjand klaasjand Aug 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you need to hardcode the region? I suspect this won't be correct for the majority of the people. If it's a required parameter I would rather fail with an error early and let the user fix it.

For the region I used this: minio/minio#15063 which is also described in the Minio documentation but both are based on using the AWS cli and not the SDK. I realize that's quite a bold assumption. After reading this: https://aws.github.io/aws-sdk-go-v2/docs/configuring-sdk/endpoints/ it's probably better to make it customizable as there is already an environment variable present for that AWS_REGION.

Copy link
Contributor Author

@klaasjand klaasjand Aug 16, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After reviewing the AWS SDK docs for endpoint configuration I changed the implementation which made it a little bit more straightforward. See also: https://aws.github.io/aws-sdk-go-v2/docs/configuring-sdk/endpoints/

We are using the v1 endpoint because OpenCost currently uses github.com/aws/aws-sdk-go-v2/service/s3 v1.31.0, from v1.38.0 and above we have to switch to the v2 endpoint, see: https://aws.github.io/aws-sdk-go-v2/docs/configuring-sdk/endpoints/#migration also added a comment about it in the code.

s3Client: s3.NewFromConfig(cfg),
bucket: bucket, // bucket
key: key, // path/to/file.csv
}, nil
Copy link
Contributor

@r2k1 r2k1 Aug 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you just return S3File{} here? I don't see the difference.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you just return S3File{} here? I don't see the difference.

You're right, my initial thought was to keep the original AWS S3 and the custom S3 code completely separated and have each their own Upload and Download implementation. But in retrospect it's probably better to reuse the code.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see arguments for having it separate as well.
I don't have a strong opinion on this. I think either way is fine.

… sdk based s3 client endpoint construction

Signed-off-by: Klaas Jan Dijksterhuis <klaasjand@users.noreply.github.com>
…rt alias for path

Signed-off-by: Klaas Jan Dijksterhuis <klaasjand@users.noreply.github.com>
Copy link
Contributor

@r2k1 r2k1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@r2k1
Copy link
Contributor

r2k1 commented Aug 16, 2023

@mattray Can you merge it?

@mattray mattray merged commit 326cf1b into opencost:develop Aug 28, 2023
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
opencost OpenCost issues vs. external/downstream
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants