Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add virtual hosted-style support for signurl gen. #6183

Merged
merged 9 commits into from
Oct 1, 2019

Conversation

houglum
Copy link

@houglum houglum commented Aug 27, 2019

Adds new methods to produce a SignUrlOption that results in generated
signed URLs using virtual-hosted-style URLs (i.e.
mybucket.storage.googleapis.com/myobject instead of
storage.googleapis.com/mybucket/myobject).

One option allows specifying the virtual hostname explicitly (for the
case where someone might have a custom subdomain, with a bucket of the
same name, CNAME'd to c.storage.googleapis.com), while the other will
implicitly construct the hostname using the bucket from the passed-in
BlobInfo.

Addresses part of https://issuetracker.google.com/issues/130190655.

@googlebot googlebot added the cla: yes This human has signed the Contributor License Agreement. label Aug 27, 2019
@houglum
Copy link
Author

houglum commented Aug 27, 2019

@frankyn Here is the implementation for generating signed URLs using virtual-hosted style hostnames. I haven't added any tests to the conformance tests library yet; I wasn't sure what order I should add things in, given that the conformance test data is shared across client libraries, but none of the other client libraries have this functionality yet. Please advise :)

I manually tested this using a separate project:
https://github.com/houglum/maven-multimodule-google-cloud-java

...specifically, this module shows the new methods being used:
https://github.com/houglum/maven-multimodule-google-cloud-java/blob/ba198b910693522b5ba8de6b6e118ec8dc6de7ee/main-module/src/main/java/com/houglum/signurlexample/MainModule.java#L53

Adds new methods to produce a SignUrlOption that results in generated
signed URLs using virtual-hosted-style URLs (i.e.
mybucket.storage.googleapis.com instead of
storage.googleapis.com/mybucket).

One option allows specifying the virtual hostname explicitly (for the
case where someone might have a custom subdomain, with a bucket of the
same name, CNAME'd to c.storage.googleapis.com), while the other will
implicitly construct the hostname using the bucket from the passed-in
BlobInfo.

Addresses part of https://issuetracker.google.com/issues/130190655.
@codecov
Copy link

codecov bot commented Sep 3, 2019

Codecov Report

Merging #6183 into master will decrease coverage by 0.66%.
The diff coverage is 62.9%.

Impacted file tree graph

@@             Coverage Diff              @@
##             master    #6183      +/-   ##
============================================
- Coverage     47.51%   46.85%   -0.67%     
- Complexity    27460    28130     +670     
============================================
  Files          2526     2601      +75     
  Lines        274681   287320   +12639     
  Branches      31402    33375    +1973     
============================================
+ Hits         130522   134629    +4107     
- Misses       134527   142447    +7920     
- Partials       9632    10244     +612
Impacted Files Coverage Δ Complexity Δ
...n/java/com/google/cloud/storage/SignatureInfo.java 78.51% <ø> (ø) 10 <0> (ø) ⬇️
...rc/main/java/com/google/cloud/storage/Storage.java 79.39% <60%> (-0.33%) 0 <0> (ø)
...ain/java/com/google/cloud/storage/StorageImpl.java 76.72% <63.15%> (-1.66%) 108 <10> (+8)
.../stub/mutaterows/MutateRowsUserFacingCallable.java 0% <0%> (-100%) 0% <0%> (-2%)
...igtable/gaxx/retrying/ApiResultRetryAlgorithm.java 15.38% <0%> (-15.39%) 3% <0%> (-3%)
...src/main/java/com/google/cloud/bigquery/JobId.java 81.81% <0%> (-12.92%) 10% <0%> (ø)
...c/main/java/com/google/cloud/bigquery/TableId.java 84% <0%> (-7.67%) 14% <0%> (ø)
...c/main/java/com/google/cloud/bigquery/ModelId.java 72% <0%> (-7.17%) 13% <0%> (ø)
...main/java/com/google/cloud/bigquery/RoutineId.java 70.37% <0%> (-6.56%) 13% <0%> (ø)
...main/java/com/google/cloud/bigquery/DatasetId.java 81.25% <0%> (-5.42%) 13% <0%> (ø)
... and 286 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 1cd99a8...fc6e31e. Read the comment docs.

@houglum
Copy link
Author

houglum commented Sep 3, 2019

Friendly ping! Do either of you have any objections (usability, style, etc.) to the new methods?

I've already gotten some user feedback after a previous draft, and as a result, I added in the method that allows creating the URL implicitly (not having to specify the full base URL as an argument).

The only thing I thought about changing here was removing support for this for use with V2 signing, since it's not actually enforced for the signature (you always have to sign using the resource string /<bucketname>/<objectname>, and once you have the URL, you can change the URL's host style and the link still works). But, it was easy enough to add logic for, so I kept it. The only downside I can think of is that supporting this for V2 signatures doesn't steer people as strongly toward using V4 signatures. Thoughts?

Copy link
Member

@frankyn frankyn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Starting discussion on design choice.

@@ -1035,7 +1035,8 @@ public static BlobListOption fields(BlobField... fields) {
EXT_HEADERS,
SERVICE_ACCOUNT_CRED,
SIGNATURE_VERSION,
HOST_NAME
HOST_NAME,
VIRTUAL_HOST_NAME
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@houglum, instead of creating a new option. Could you update the generated URL instead to use the virtual host name?

If it's considered better practice to use bucket.storage.googleapis.com/, I'd push for implementations to use this format instead.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think changing the resulting URLs might break some users, e.g. those who have special networking rules set up? What if a user is utilizing VPC and has a rule to allow requests with the host set to storage.googleapis.com? Suddenly, their requests to signed URLs begin to fail because they're using the host somebucket.storage.googleapis.com.

I do agree that it's better practice to use the virtual-hosted style of URL though. If you think the potential to break existing users is minimal, I'd be fine with changing the default URL format to virtual-hosted style. If we do that, do we want to add an option to allow people to use the old path-style format?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI for those following along: We've written up a doc to summarize our options and get input on what's the most desirable approach here.

This allows creating a signed URL in path-style (bucket in the path
component of the URI instead of in the hostname, i.e. virtual
hosted-style). Note that this was already the default behavior, but
adding this method allows us to more easily switch the default behavior
in the future; virtual hosted-style is considered best practice.
@houglum
Copy link
Author

houglum commented Sep 16, 2019

@frankyn - I added the withPathStyle() decorator, as discussed in the design doc mentioned above. Looks like all the tests passed except the code formatter -- thought I ran it, but I guess not. I'll re-run it and send a follow-up commit.

Copy link
Member

@frankyn frankyn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mainly have concerns with the naming of virtual host and path style. Otherwise it's looking solid to me.

PTAL and let me know what you think.

HOST_NAME
HOST_NAME,
PATH_STYLE,
VIRTUAL_HOST_NAME
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the names, are these expected from product?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Followed up on Hangouts; pending feedback in the design doc. However, I've gone ahead and renamed these methods/options to be VIRTUAL_HOSTED_STYLE (and withVirtualHostedStyle) and PATH_STYLE (and withPathStyle).

*
* @see <a href="https://cloud.google.com/storage/docs/request-endpoints">Request Endpoints</a>
*/
public static SignUrlOption withVirtualHostName(String virtualHostName) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not convinced that this overload is required.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably don't need it unless someone asks; I was mostly trying to plan ahead in case anyone needed support for CNAME-based signed URLs. I went ahead and removed it for now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah gotcha, what about withHostName()? Doesn't this cover this case?

Copy link
Author

@houglum houglum Sep 23, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, I thought withHostName() only was useful for devs, e.g. sending requests to staging/test endpoints... meaning this would be interchangeable with anywhere you'd use storage.googleapis.com in a URL. But looking at #3140, it seems this was written to be used with CDNs.

But, what's confusing is... from all the info I can find (our docs, along with external CDN docs) on CNAME'ing and using CDNs with GCS, you're supposed to make the CDN or CNAME record to point at <bucket_name>.storage.googleapis.com (virtual hosted-style URL)... but from reading the code, it's producing path-style URLs (https://myurl.somealias.com/this-bucket-name-is-in-the-path/object-name).

From a semantics standpoint, this should be used in the manner I mentioned at the top. However, I'm not actually sure how it's being used by the user(s) that requested this functionality. I'll comment and ask on the issue they filed, but I'm not sure how likely I am to get a reply 🙃

Note that if they are using it in the manner that's being tested (bucket-in-path), then whenever we switch the default URL style to virtual hosted-style (in the future), it will break them.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From looking at the tests again, when using withHostName(String), we're definitely expecting to replace the normal hostname and for the bucket to be in the path portion of the URL; this applies for scenarios where we're using withHostName() and generating a path-style, V2-signed URL. None of the other scenarios are currently tested. That being said, I've done manual testing with other hostnames (i.e. against our test environment) and am confident that what I've got coded up is correct behavior. If users need a different host header to be used (but note that from all cases I can find, this would cause the signature to become invalid), they can override it using the withExtHeaders() method.

  • For V2 and V4 path-style URLs, the hostname should change, and the bucket should be in the path.
    • It looks like we don't even bother signing the "host" header for V2 URLs. But for V4 URLs, the "host" header should be set to the new hostname. We haven't been doing that, but I just added a fix for that in another commit. This is another option combination that we need a test for.
  • For V2 and V4 virtual hosted-style URLs, the hostname should have the bucket name (plus a period), as supplied in the BlobInfo, prepended to it, and the bucket should not appear in the path. We currently have no tests for virtual hosted-style URLs, but I've manually tested them with V2 and V4 signing, both with and without an alternative hostname (I used our test environment and its hostname for testing).
    • Again, V2 URLs don't care about the host since we don't sign it. But V4 URLs should set the host to the virtual hosted-style hostname; I've added a fix to do this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for finding the bug with cname usage with v4 signature and the host header.

IIUC, with the overloads you suggested the following would be simplified with the example below.

Without overloads

Storage storage = StorageOptions.getDefaultInstance().getService();

String bucketName = "my-bucket";
String objectName = "my-object";

// Define resource
BlobInfo blobinfo = BlobInfo.newBuilder(BlobId.of(bucketName, objectName)).build();

// Generate Signed URL
URL url =
    storage.signUrl(blobinfo, 15, TimeUnit.MINUTES,
                               Storage.SignUrlOption.withV4Signature(),
                               Storage.SignUrlOption.withHostName("storage.example.com"),
                               Storage.SignUrlOption.withVirtualHostName());

// Effectively being:
// <bucket_name>.storage.example.com

With Overloads

Storage storage = StorageOptions.getDefaultInstance().getService();

String bucketName = "my-bucket";
String objectName = "my-object";

// Define resource
BlobInfo blobinfo = BlobInfo.newBuilder(BlobId.of(bucketName, objectName)).build();

// Generate Signed URL
URL url =
    storage.signUrl(blobinfo, 15, TimeUnit.MINUTES,
                               Storage.SignUrlOption.withV4Signature(),
                               Storage.SignUrlOption.withVirtualHostName("storage.example.com"));

// Effectively using:
// <bucket_name>.storage.example.com

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "with overloads" part is currently incorrect; what I had before (supplying a hostname string arg to the withVirtualHostedStyle method) would be the correct way to do the "with overloads" part, since the bucket name is the hostname, and you wouldn't want to prepend the bucket name again, e.g. ending up with subdomain.domain.tld.subdomain.domain.tld instead of just subdomain.domain.tld for your hostname.

I took out the overload in the last couple commits; the virtual hosted style currently assumes you won't be using a CNAME hostname. But, we can add the overload in a subsequent commit. Seems worth breaking that functionality out into its own PR. Thoughts?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Acking we discussed this yesterday and splitting off overloads into a separate request is a follow-up action.

The Signature V2 doesn't require us to sign the "host" header, but the
Signature V4 process requires it. We need to ensure we're populating and
signing the same host header value as is used in the request URL.
Copy link
Member

@frankyn frankyn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-- Additionally add an integration test.

* Generate a path-style URL, which places the bucket name in the path portion of the URL
* instead of in the hostname. Note that this cannot be used alongside {@code
* withVirtualHostName()}. Virtual hosted-style URLs, which can be used via the {@code
* withVirtualHostName()} method, should generally be preferred instead.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add an example of using withPathStyle() in these docs.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added it in the signUrl method's docs, since there were already examples of other options there.

* obtained from the resource passed in. Note that this cannot be used alongside {@code
* withHostName()}. For V4 signing, this also sets the "host" header in the canonicalized
* extension headers to the virtual hosted-style host, unless that header is supplied via the
* {@code withExtHeaders()} method.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add an example of using withPathStyle() in these docs.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added it in the signUrl method's docs, since there were already examples of other options there.

I discovered a bug with one of my previous commits, where I was
incorrectly determining whether to use Signature Version 2. This commit
makes two new (private) methods that determine the correct default
option to use, taking default values into account.
@houglum
Copy link
Author

houglum commented Sep 26, 2019

I amended the existing integration tests to loop over all the URL styles (path, virtual hosted) and test using each of them. The tests uncovered a small bug, which I added fixes for, encapsulated in a couple new methods with TODOs for issue #6362.

@frankyn
Copy link
Member

frankyn commented Sep 27, 2019

Thanks @houglum, we'd need PM checkoff when v4 should be default and work backwards with the documented internal migration plan.

@houglum
Copy link
Author

houglum commented Sep 27, 2019

Yep, I was just saying that for whenever we end up doing the switch, I've isolated a lot of the default-based logic into methods, so it should be easier down the line :)

I think this is ready to be merged, unless you have any more questions/suggestions. Looks like the only presubmit failure is from the Cloud Spanner tests, which seem unrelated.

Copy link
Member

@frankyn frankyn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One requested change, otherwise it lgtm.

* <p>Example of creating a signed URL that is valid for 2 weeks, using the default credentials
* for signing the URL.
* <p>Example of creating a signed URL that is valid for 1 week, using the default credentials for
* signing the URL, the default signing method (V2), and the default URL style (path-style):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

V2 can go longer than 1 week. (also supported no expiration too)

* BlobInfo.newBuilder(bucketName, blobName).build(),
* 1, TimeUnit.DAYS,
* Storage.SignUrlOption.withVirtualHostedStyle());
* }</pre>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cold you provide an example of a result so people could distinguish easier?

Copy link
Author

@houglum houglum Sep 30, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added example URLs in the other docstrings (for withPathStyle() and withVirtualHostedStyle()). It seems like the specifics belong there, rather than in the signUrl() docstring. Thoughts?

Copy link
Member

@frankyn frankyn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks @houglum for your patience.

@frankyn frankyn merged commit 6db5a9f into googleapis:master Oct 1, 2019
houglum added a commit to houglum/maven-multimodule-google-cloud-java that referenced this pull request Oct 2, 2019
googleapis/google-cloud-java#6183 was merged
into google-cloud-java, and it changed some of the syntax (e.g. the
method name for virtual hosted urls changed, and explicitly specifying
the virtual hostname is no longer possible with that method). The syntax
updates have been reflected in the main-module examples.

This also updates the google-cloud-java submodule to use the merged
"master" branch ("master" of my fork is up-to-date with "master" of the
googleapis upstream as of today), along with changing main module to
depend on the newly-updated version google-cloud-storage
(1.93.1-SNAPSHOT).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla: yes This human has signed the Contributor License Agreement.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants