
problems with creating binary on custom s3 instance #1314

Closed
BlackLotus opened this issue Mar 19, 2018 · 4 comments

BlackLotus commented Mar 19, 2018

I know this isn't officially supported, but ever since you upgraded to ModeShape 5.4.0 it has been theoretically possible.
I added a custom S3 instance by creating a repository.json with

        "username" : "${aws.accessKeyId}",
        "password" : "${aws.secretKey}",
        "bucketName" : "${aws.bucket}",
        "endPoint" : "${aws.endpoint}"

in it, and tested it against a Ceph S3 server by adding

 -Daws.bucket=fcrepo -Daws.accessKeyId=XXXXXXXXXX \
-Daws.secretKey=YYYYYYYYYYYYYYYYYYYYYYYYYYYYYY \
-Daws.endpoint=http://localhost:7480

But the binary upload doesn't work. Here is the traffic log:

HEAD / HTTP/1.1
Host: fcrepo.localhost:7480
x-amz-content-sha256: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
Authorization: AWS4-HMAC-SHA256 Credential=XXXXXXXXXX/20180319/us-east-1/s3/aws4_request, SignedHeaders=amz-sdk-invocation-id;amz-sdk-retry;content-type;host;user-agent;x-amz-content-sha256;x-amz-date, Signature=88dc9e55a92ba296cf13492ffabd2a86ff779ee67bfd92a048c22e64569c8fc9
X-Amz-Date: 20180319T175017Z
User-Agent: aws-sdk-java/1.11.95 Linux/4.15.9-1-ARCH OpenJDK_64-Bit_Server_VM/25.162-b12/1.8.0_162
amz-sdk-invocation-id: 18d273db-2e74-e2fc-12fc-bfbf78523662
amz-sdk-retry: 0/0/500
Content-Type: application/octet-stream
Connection: Keep-Alive

HTTP/1.1 200 OK
x-amz-request-id: tx000000000000000000128-005aaff859-bf992-default
Content-Type: application/xml
Content-Length: 0
Date: Mon, 19 Mar 2018 17:50:17 GMT
Connection: Keep-Alive

HEAD /972a0a71a4996707f3f4df56bb1cf7eae59f7539 HTTP/1.1
Host: fcrepo.localhost:7480
x-amz-content-sha256: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
Authorization: AWS4-HMAC-SHA256 Credential=XXXXXXXXXX/20180319/us-east-1/s3/aws4_request, SignedHeaders=amz-sdk-invocation-id;amz-sdk-retry;content-type;host;user-agent;x-amz-content-sha256;x-amz-date, Signature=5bc26040d68afed442758762cff31ed91a5d1c0cf5459018cdaae508d6543ec5
X-Amz-Date: 20180319T175028Z
User-Agent: aws-sdk-java/1.11.95 Linux/4.15.9-1-ARCH OpenJDK_64-Bit_Server_VM/25.162-b12/1.8.0_162
amz-sdk-invocation-id: 076b989f-a7ba-108a-4845-b895b742cfdc
amz-sdk-retry: 0/0/500
Content-Type: application/octet-stream
Connection: Keep-Alive

HTTP/1.1 404 Not Found
Content-Length: 252
x-amz-request-id: tx000000000000000000129-005aaff864-bf992-default
Accept-Ranges: bytes
Content-Type: application/xml
Date: Mon, 19 Mar 2018 17:50:28 GMT
Connection: Keep-Alive

and here is the fcrepo error log:

org.modeshape.jcr.value.binary.BinaryStoreException: com.amazonaws.ResetException: Failed to reset the input stream;  If the request involves an input stream, the maximum stream buffer size can be configured via request.getRequestClientOptions().setReadLimit(int)
    at org.modeshape.jcr.value.binary.S3BinaryStore.storeValue(S3BinaryStore.java:231)
    at org.modeshape.jcr.value.binary.AbstractBinaryStore.storeValue(AbstractBinaryStore.java:251)
    at org.modeshape.jcr.value.binary.BinaryStoreValueFactory.create(BinaryStoreValueFactory.java:257)
    at org.modeshape.jcr.value.binary.BinaryStoreValueFactory.create(BinaryStoreValueFactory.java:49)
    at org.modeshape.jcr.JcrValueFactory.createBinary(JcrValueFactory.java:149)
    at org.modeshape.jcr.JcrValueFactory.createBinary(JcrValueFactory.java:41)
    at org.fcrepo.kernel.modeshape.FedoraBinaryImpl.setContent(FedoraBinaryImpl.java:187)
    at org.fcrepo.http.api.ContentExposingResource.replaceResourceBinaryWithStream(ContentExposingResource.java:784)
    at org.fcrepo.http.api.FedoraLdp.createObject(FedoraLdp.java:608)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory$1.invoke(ResourceMethodInvocationHandlerFactory.java:81)
    at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:144)
    at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:161)
    at org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$ResponseOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:160)
    at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:99)
    at org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:389)
    at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:347)
    at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:102)
    at org.glassfish.jersey.server.ServerRuntime$2.run(ServerRuntime.java:326)
    at org.glassfish.jersey.internal.Errors$1.call(Errors.java:271)
    at org.glassfish.jersey.internal.Errors$1.call(Errors.java:267)
    at org.glassfish.jersey.internal.Errors.process(Errors.java:315)
    at org.glassfish.jersey.internal.Errors.process(Errors.java:297)
    at org.glassfish.jersey.internal.Errors.process(Errors.java:267)
    at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:317)
    at org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:305)
    at org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:1154)
    at org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:473)
    at org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:427)
    at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:388)
    at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:341)
    at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:228)
    at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:816)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:583)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:513)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1113)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:511)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1047)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
    at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:109)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:119)
    at org.eclipse.jetty.server.Server.handle(Server.java:517)
    at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:302)
    at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:242)
    at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:238)
    at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
    at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:57)
    at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceAndRun(ExecuteProduceConsume.java:213)
    at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:147)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:654)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:572)
    at java.lang.Thread.run(Thread.java:748)
Caused by: com.amazonaws.ResetException: Failed to reset the input stream;  If the request involves an input stream, the maximum stream buffer size can be configured via request.getRequestClientOptions().setReadLimit(int)
    at com.amazonaws.services.s3.internal.AWSS3V4Signer.getContentLength(AWSS3V4Signer.java:196)
    at com.amazonaws.services.s3.internal.AWSS3V4Signer.calculateContentHash(AWSS3V4Signer.java:103)
    at com.amazonaws.auth.AWS4Signer.sign(AWS4Signer.java:213)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1155)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1030)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:742)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:716)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:699)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:667)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:649)
    at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:513)
    at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4194)
    at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4141)
    at com.amazonaws.services.s3.AmazonS3Client.putObject(AmazonS3Client.java:1723)
    at com.amazonaws.services.s3.AmazonS3Client.putObject(AmazonS3Client.java:1586)
    at org.modeshape.jcr.value.binary.S3BinaryStore.storeValue(S3BinaryStore.java:220)
    ... 58 more
Caused by: java.io.IOException: Resetting to invalid mark
    at java.io.BufferedInputStream.reset(BufferedInputStream.java:448)
    at org.modeshape.jcr.value.binary.SharedLockingInputStream$8.call(SharedLockingInputStream.java:238)
    at org.modeshape.jcr.value.binary.SharedLockingInputStream$8.call(SharedLockingInputStream.java:233)
    at org.modeshape.jcr.value.binary.SharedLockingInputStream.doOperation(SharedLockingInputStream.java:263)
    at org.modeshape.jcr.value.binary.SharedLockingInputStream.reset(SharedLockingInputStream.java:233)
    at com.amazonaws.internal.SdkFilterInputStream.reset(SdkFilterInputStream.java:102)
    at com.amazonaws.internal.SdkFilterInputStream.reset(SdkFilterInputStream.java:102)
    at com.amazonaws.services.s3.internal.MD5DigestCalculatingInputStream.reset(MD5DigestCalculatingInputStream.java:76)
    at com.amazonaws.internal.SdkFilterInputStream.reset(SdkFilterInputStream.java:102)
    at com.amazonaws.event.ProgressInputStream.reset(ProgressInputStream.java:168)
    at com.amazonaws.internal.SdkFilterInputStream.reset(SdkFilterInputStream.java:102)
    at com.amazonaws.services.s3.internal.AWSS3V4Signer.getContentLength(AWSS3V4Signer.java:194)
    ... 73 more

My guess is that this is a ModeShape problem, but since ModeShape doesn't have an issue tracker I thought I'd ask here first.
Oh, and here is the console log:

INFO 18:57:44.567 (FedoraLdp) GET resource ''
INFO 18:57:59.459 (FedoraLdp) Ingest with path: /f8/fa/34/a0/f8fa34a0-4910-4d95-873f-32c64c2c2df2
WARN 18:57:59.539 (AmazonS3Client) No content length specified for stream data.  Stream contents will be buffered in memory and could result in out of memory errors.
ERROR 18:57:59.549 (RepositoryExceptionMapper) Caught a repository exception: com.amazonaws.ResetException: Failed to reset the input stream;  If the request involves an input stream, the maximum stream buffer size can be configured via request.getRequestClientOptions().setReadLimit(int)
INFO 18:59:21.168 (FedoraLdp) Ingest with path: /0a/ea/6e/f7/0aea6ef7-15d4-4aa1-9680-bbd8f86d26fd
WARN 18:59:21.246 (AmazonS3Client) No content length specified for stream data.  Stream contents will be buffered in memory and could result in out of memory errors.
ERROR 18:59:21.249 (RepositoryExceptionMapper) Caught a repository exception: com.amazonaws.ResetException: Failed to reset the input stream;  If the request involves an input stream, the maximum stream buffer size can be configured via request.getRequestClientOptions().setReadLimit(int)
INFO 18:59:29.117 (FedoraLdp) Ingest with path: /d5/18/43/ca/d51843ca-9722-46a2-852d-11207cd42182
WARN 18:59:29.170 (AmazonS3Client) No content length specified for stream data.  Stream contents will be buffered in memory and could result in out of memory errors.
ERROR 18:59:29.173 (RepositoryExceptionMapper) Caught a repository exception: com.amazonaws.ResetException: Failed to reset the input stream;  If the request involves an input stream, the maximum stream buffer size can be configured via request.getRequestClientOptions().setReadLimit(int)
INFO 19:01:21.320 (FedoraLdp) GET resource ''
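The innermost "Resetting to invalid mark" above is plain java.io behaviour, not S3-specific: a BufferedInputStream discards its mark once more bytes are consumed than the read limit passed to mark(), and any later reset() throws. The stack trace shows the AWS SDK rewinding the stream in the signer (AWSS3V4Signer.getContentLength), and since no content length is known (note the "No content length specified for stream data" warning), it reads far past the mark and the rewind fails. A minimal stdlib sketch of that failure mode, assuming nothing beyond the JDK (MarkResetDemo is a hypothetical name, not fcrepo or ModeShape code):

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;

public class MarkResetDemo {

    // Marks the stream with a tiny read limit, reads well past it,
    // then attempts to reset. Returns "reset ok" on success, or the
    // IOException message when the mark has been invalidated.
    static String readPastMarkAndReset() throws IOException {
        byte[] data = new byte[10_000]; // larger than the default 8 KiB buffer
        BufferedInputStream in =
                new BufferedInputStream(new ByteArrayInputStream(data));
        in.mark(1); // mark only survives 1 byte of subsequent reading
        byte[] sink = new byte[data.length];
        int n = 0;
        while (n < sink.length) {
            int r = in.read(sink, n, sink.length - n);
            if (r < 0) break;
            n += r;
        }
        try {
            in.reset(); // mark was dropped once reading passed the limit
            return "reset ok";
        } catch (IOException e) {
            return e.getMessage(); // "Resetting to invalid mark" on OpenJDK
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readPastMarkAndReset());
    }
}
```

This matches the hint in the exception message: supplying a content length up front (or raising the buffer via request.getRequestClientOptions().setReadLimit(int), as the SDK suggests) would keep the rewind within the marked region.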
@dbernstein
Contributor

@BlackLotus: A few questions: 1) Are you able to reliably reproduce the issue? 2) Have you tried this against S3 proper? Just looking at the stack trace, it appears that there was a stream error in transferring between the S3 client and the S3-compatible service. That is the root issue. A secondary issue appears to be that the JCR cleanup in the wake of the failure didn't occur for some reason.

It would be helpful if you would create an issue on https://jira.duraspace.org/ with a clear recipe for reproducing the error, against an environment that we could reasonably reproduce (i.e., against AWS S3).

@BlackLotus
Author

OK, sorry for the delayed response; since I wasn't using AWS, I didn't think I could reproduce the problem with AWS. So here is what I did.

First I got an AWS account and simply tried fcrepo with it: it worked.
Next I tried to find out what Amazon S3 was doing differently from Ceph. For this I configured an nginx reverse proxy and set up my local DNS to forward every request to 127.0.0.1 instead of the Amazon server. My expectation was that I could easily monitor the traffic going to Amazon and the traffic going to Ceph, and spot either the difference in behaviour or the offending request that didn't work with Ceph. What I got instead was the same problem with Amazon S3. I tried to mimic Amazon S3's behaviour as well as I could, and after failing I tried to follow https://aws.amazon.com/de/blogs/compute/nginx-reverse-proxy-sidecar-container-on-amazon-ecs/
Anyway, this is one of the nginx configs I came up with:

    gzip_proxied any;
    gzip_types text/plain application/json;
    gzip_min_length 1000;

    server {
        listen 80;
        server_name .s3-proxy .s3.us-east-1.amazonaws.com;
        client_max_body_size 10g;

        access_log /var/log/nginx/s3proxy-access.log postdata;
        error_log /var/log/nginx/s3proxy-error.log;

        proxy_set_header Host $host;
        proxy_buffering off;

        location / {
            fastcgi_pass_header Authorization;
            fastcgi_pass_request_headers on;
            include fastcgi_params;
            proxy_pass https://52.216.101.21;
            proxy_pass_header Server;
            proxy_http_version 1.1;
            proxy_set_header Upgrade $http_upgrade;
            proxy_set_header Connection 'upgrade';
            proxy_set_header Host $host;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_cache_bypass $http_upgrade;
        }
    }

I also tried other reverse-proxy solutions without success. I will create an issue at jira.duraspace.org later, but I would really like to understand why this only seems to work with the original S3.

@emudojo

emudojo commented Apr 6, 2021

I thought the custom endpoint flag was not allowed. Where can I set it? I would like to test this as well, but with MinIO.

@Surfrdan
Contributor

Fedora 5 is no longer in active development, as Fedora 6 is now the LTS version of Fedora. It looks like the Samvera/Valkyrie development was completed to support this, though. Closing this now.
