Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

invalid character '\x1f' looking for beginning of value when uploading via java apis #142

Closed
DanieleSassoli opened this issue Nov 28, 2019 · 28 comments
Labels

Comments

@DanieleSassoli
Copy link

DanieleSassoli commented Nov 28, 2019

Hi, I'm trying to create a bucket in the fake gcs server using the java apis(which don't seem to be documented, happy to contribute with some docs once I get this working) but getting

 invalid character '\x1f' looking for beginning of value
[info]   at com.google.cloud.storage.spi.v1.HttpStorageRpc.translate(HttpStorageRpc.java:229)
[info]   at com.google.cloud.storage.spi.v1.HttpStorageRpc.create(HttpStorageRpc.java:272)
[info]   at com.google.cloud.storage.StorageImpl$2.call(StorageImpl.java:121)
[info]   at com.google.cloud.storage.StorageImpl$2.call(StorageImpl.java:118)
[info]   at com.google.api.gax.retrying.DirectRetryingExecutor.submit(DirectRetryingExecutor.java:105)
[info]   at com.google.cloud.RetryHelper.run(RetryHelper.java:76)
[info]   at com.google.cloud.RetryHelper.runWithRetries(RetryHelper.java:50)
[info]   at com.google.cloud.storage.StorageImpl.create(StorageImpl.java:117)

This is the code:

    val transportFactory = new HttpTransportFactory {
      override def create(): HttpTransport =
        new NetHttpTransport.Builder().doNotValidateCertificate().build()
    }

    val storage = StorageOptions.newBuilder
      .setCredentials(NoCredentials.getInstance)
      .setTransportOptions(HttpTransportOptions.newBuilder().setHttpTransportFactory(transportFactory).build())
      .setHost(s"https://$host:$port")
      .setProjectId(projectId)
      .build()
      .getService

    storage.create(BucketInfo.of(bucketName))

Can't really figure out what I'm doing wrong, does anybody have any idea?

@DanieleSassoli
Copy link
Author

DanieleSassoli commented Nov 28, 2019

Just an update on this, as a workaround I decided to try to create a bucket with a file when starting up the container, as documented in the read me, after which I tried doing storage.list(bucketName) which worked, this makes me think that the StorageOptions are correct and something isn't quire right with the server when creating the bucket?

I'm now not able to create files in it due to #84 but that's a different issue.

@fsouza
Copy link
Owner

fsouza commented Dec 9, 2019

Hey @DanieleSassoli, sorry for the delayed response. Yeah currently we don't support the GRPC API. I haven't looked into how much work that would be, do you have an idea? If not, I can take a look over the holidays.

@fsouza
Copy link
Owner

fsouza commented Dec 9, 2019

Opened #146 to track support for the grpc api

@dnatic09
Copy link

dnatic09 commented Dec 9, 2020

Hey Team. I have been investigating this issue. I am leaning towards a gzip compression issue. I am digging through the Java code and it clearly looks like HTTP. Furthermore, the \x1f is likely a character you would see in a gzipped POST payload over HTTP. See screenshot attached.

image

@dnatic09
Copy link

dnatic09 commented Dec 9, 2020

@fsouza , a follow-up. I simply hacked the debugger and set this.disableGzipContent to true. The request WORKED! So, the Fake GCS server is failing to decode gzipped content. Either we need to find a way to disable gzip encoding in the SDK or the GCS server needs to parse the Content-Encoding HTTP header and decompress the data. Can we do this?

@flylo
Copy link

flylo commented Dec 10, 2020

@dnatic09 I've made a bit of progress based on your finding.

I was able to patch the HttpStorageRpc class and use reflection to add a request initializer to the google HTTP client:

public class StorageEmulatorRpc extends HttpStorageRpc {
    public StorageEmulatorRpc(StorageOptions options) {
        super(options);
        try {
            var field = FieldUtils.getField(super.getClass(), "storage", true);
            field.setAccessible(true);
            var transportOptions = (HttpTransportOptions) options.getTransportOptions();
            var transport = transportOptions.getHttpTransportFactory().create();
            var initializer = transportOptions.getHttpRequestInitializer(options);
            var storage = new Storage.Builder(transport, new JacksonFactory(), initializer)
                    .setRootUrl(options.getHost())
                    .setGoogleClientRequestInitializer(request -> request.setDisableGZipContent(true))
                    .setApplicationName(options.getApplicationName())
                    .build();
            field.set(this, storage);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}

I then specified a custom ServiceRpcFactory alongside the custom TransportFactory defined above:

var builder = StorageOptions.getDefaultInstance().toBuilder();
builder.setHost(clientOptions.host());
builder.setServiceRpcFactory(StorageEmulatorRpc::new);
builder.setTransportOptions(HttpTransportOptions.newBuilder()
                    .setHttpTransportFactory(new InsecureHttpTransportFactory())
                    .build());
return builder.build().getService();

This allowed me to successfully create both buckets and blobs using the generated storage client:

time="2020-12-10T05:11:11Z" level=info msg="172.17.0.1 - - [10/Dec/2020:05:11:11 +0000] \"POST /storage/v1/b?project=some-project&projection=full HTTP/1.1\" 200 126"
time="2020-12-10T05:11:41Z" level=info msg="172.17.0.1 - - [10/Dec/2020:05:11:41 +0000] \"POST /upload/storage/v1/b/some-bucket/o?projection=full&uploadType=multipart HTTP/1.1\" 200 367"

Obviously it would be superior to just handle gzip on the server side, but I'm going to continue testing with this patch tomorrow to see if I can get this emulator working in a java environment.

@dnatic09
Copy link

@flylo , great work here! I think we may get some traction. Getting Google to accept a PR will likely be difficult. On the flip side, there are only a handful of methods that need to be changed on the server to check for the Content-Encoding header, verify that the content is indeed GZIP encoded, and then inline-decode the content prior to JSON parsing.

I am not familiar in GO, but I imagine it will not be difficult to change. If I have spare time, I'll try it out and submit a PR against this repo.

@flylo
Copy link

flylo commented Dec 10, 2020

@dnatic09 Sorry for the confusion... when I said "server-side", I meant the server inside of this GCS emulator. The problem I'm running into now is this...

The response we get from the emulator for creating a StorageObject is missing the bucket-name:

result = {LinkedHashMap@6402}  size = 9
 "name" -> "some/path/to/file.prototxt"
 "contentType" -> "application/text"
 "contentEncoding" -> ""
 "crc32c" -> "RYvTUg=="
 "md5Hash" -> "koO04f0Ocki0M2g8mfUwqg=="
 "acl" -> {ArrayList@6426}  size = 1
 "created" -> "2020-12-10T14:14:19.488323Z"
 "updated" -> "2020-12-10T14:14:19.488358Z"
 "deleted" -> "0001-01-01T00:00:00Z"

Which results in a NullPointerException when we try to parse the response on the receiving end:

java.lang.NullPointerException
	at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:878)
	at com.google.cloud.storage.BlobId.of(BlobId.java:114)
	at com.google.cloud.storage.BlobId.fromPb(BlobId.java:118)
	at com.google.cloud.storage.BlobInfo.fromPb(BlobInfo.java:1160)
	at com.google.cloud.storage.Blob.fromPb(Blob.java:958)
	at com.google.cloud.storage.StorageImpl.internalCreate(StorageImpl.java:215)
	at com.google.cloud.storage.StorageImpl.create(StorageImpl.java:171)

Specifically, it fails the checkNotNull(bucket) in the following line:

return new BlobId(checkNotNull(bucket), checkNotNull(name), generation);

I assume the fix here is simple, but I've also never used GO and will need some time to read the code. @fsouza any ideas?

@dnatic09
Copy link

@flylo , I ran into the same problem. The funny thing is that you get an NPE, but the file is actually in the fake GCS server and can be pulled. I already tested it. We should definitely help fix it, but there is definitely a path forward here.

@flylo
Copy link

flylo commented Dec 10, 2020

Have made it a little further, but now we're running into issues where the golang server is returning uppercased field names that the java JSON parser doesn't catch:

objectAccessControl = {ObjectAccessControl@4371}  size = 6
 "Entity" -> "projectOwner"
 "EntityID" -> ""
 "Role" -> "OWNER"
 "Domain" -> ""
 "Email" -> ""
 "ProjectTeam" -> {Object@4393} 

I'm looking for a way to set global JSON-serialization settings but haven't found it yet.

@dnatic09
Copy link

@flylo , GOT IT !

Follow this:

Kind string `json:"kind"`
ID string `json:"id"`
Name string `json:"name"`
Versioning *bucketVersioning `json:"versioning,omitempty"`
TimeCreated string `json:"timeCreated,omitempty"`

Then, you can change how the field is serialized. We do this and the casing will align!

@flylo
Copy link

flylo commented Dec 10, 2020

@dnatic09 nice! Yeah if there's no way to specify a global serialization option (like how you'd inject an ObjectMapper into your REST server in java), then we'll have to patch some of the google-owned structs. I patched ACLRule struct and added my own serialization names, which un-blocked the specific test-case I was working on. That being said, I'm sure there's others. @fsouza if you know of any way to change that casing globally, then we could make this java-compatible with minimal tweaking.

@dnatic09
Copy link

@flylo , I think Java users around the world would be WAY better off with just create-bucket and put-file functions working. The rest can be tacked on later. Your help is MUCH appreciated. There is no way I could have gotten to this anytime soon. I am so backed up with work.

@fsouza
Copy link
Owner

fsouza commented Dec 22, 2020

hey @flylo @dnatic09 thanks for looking into this and apologies for being absent. So a quick of notes / questions:

  • yes we can definitely support gzip, I don't think it would be too hard (opened Support for gzip #388 to track it)
  • regarding the JSON fields, it's kind of a mistake that we're embedding a third-party field in our response, we should fix that (moreover, I just had a quick look and the package we're using is actually deprecated, so a double no-no). That being said, it should it still work, since the Go SDK declares the json struct tags? Or am I missing something? Which request are you sending?

(as a side note, there isn't a way to change that behavior globally, it's one of the issues with Go's struct tags and why I am calling it a mistake to use a third-party struct in our JSON response - our struct is not exported, so we can change this without introducing a breaking change though)

@flylo
Copy link

flylo commented Dec 23, 2020

@fsouza I'm not sure why that was happening if the google go library defines the struct tags... Maybe it has something to do with the fact that you're serializing an array of those ObjectAccessControl instances? That's the only difference I could see.

@dnatic09
Copy link

@fsouza , I noticed that you closed #388 . Was that on purpose or did you take care of it under a different issue?

@fsouza
Copy link
Owner

fsouza commented Dec 23, 2020

@fsouza , I noticed that you closed #388 . Was that on purpose or did you take care of it under a different issue?

Yes, gzip should work now as of f1a575d

Can you give it a shot and let me know if you have any issues?

@dnatic09
Copy link

dnatic09 commented Dec 23, 2020

@fsouza , can you trigger a DockerHub push?

(Push image 1.22.0 as a docker tag in DockerHub)

EDIT: I think I can use latest .

@fsouza
Copy link
Owner

fsouza commented Dec 23, 2020

@dnatic09 ohh it should be there, it's supposed to be automatic. I'll check what's up with github actions.

@dnatic09
Copy link

@fsouza , no joy.
image

I purged latest from my MacBook. Verified the new digest.

REPOSITORY                          TAG           DIGEST                                                                    IMAGE ID       CREATED         SIZE
fsouza/fake-gcs-server              latest        sha256:d464a2c6a6a15c0012aa4b91e8164b6a3c3ace802fbf529e0d5772438b993e2b   c123c1811390   14 hours ago    23MB```

@fsouza
Copy link
Owner

fsouza commented Dec 23, 2020

@dnatic09 do you have a small reproducer that I can run locally? This way I can figure something out that will definitely work (+ maybe add it as a runnable sample in CI).

I'm afk right now, but I'll try to fix this later today once I get to the computer (and figure out what's up with the automatic tagging)

Thanks for checking back!

@dnatic09
Copy link

I can make a reproducer overnight. I will message back when it is ready.

@dnatic09
Copy link

@fsouza (@flylo ), I got it.

https://github.com/dnatic09/Fake-Gcs-Test

Just clone and run mvn test .

One test will pass (rigged CURL).
One test will fail due to GZIP (using library).

@dnatic09
Copy link

dnatic09 commented Jan 4, 2021

Hey team. Just checking in. Any updates?

@fsouza
Copy link
Owner

fsouza commented Jan 4, 2021

Hey @dnatic09, sorry ended-up taking an extended break but I'm back this week. I'll look at your example later today or tomorrow and see if I can figure out what's up.

@dnatic09
Copy link

dnatic09 commented Jan 4, 2021

No worries. Today is my first day back into work since Christmas. Completely understand. Thank you again for engaging on this.

@fsouza fsouza added the bug label Feb 1, 2021
@fsouza
Copy link
Owner

fsouza commented Feb 1, 2021

Ohhh I see what's the problem here. I enabled compressed responses, but compressed requests are not standard and we have to implement it manually. Ugh.

@fsouza
Copy link
Owner

fsouza commented Feb 1, 2021

@dnatic09 tests are passing now 😁

[INFO] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 12.397 s - in dnatic.FakeGcsTests
[INFO]
[INFO] Results:
[INFO]
[INFO] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  14.186 s
[INFO] Finished at: 2021-01-31T22:37:20-05:00
[INFO] ------------------------------------------------------------------------

I was able to include a test for this, so will skip adding a Java example. Closing this issue for now, feel free to reopen or to open a new one if something doesn't work for you.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants