GCS repository snapshot fails intermittently on some shards "Failed to check if blob exists" java.io.IOException: insufficient data written #26636

Closed
hoffoo opened this Issue Sep 13, 2017 · 11 comments

hoffoo commented Sep 13, 2017

Elasticsearch version (bin/elasticsearch --version): 5.5.1

Plugins installed: [repository-gcs discovery-gce]

JVM version (java -version): 1.8.0_131

OS version (uname -a if on a Unix-like system): Linux XXX 4.10.0-27-generic #30~16.04.2-Ubuntu SMP Thu Jun 29 16:07:46 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

Description of the problem including expected versus actual behavior:

Creating a snapshot fails on certain shards, while retrying with a new snapshot works. For me it fails on about 10% of shards: testing with 51 shards, 4 failed in the last test, 2 when I retried, and finally 0 on the third try.

The exception is IndexShardSnapshotFailedException[BlobStoreException[Failed to check if blob [__79.part4] exists]; nested: SocketTimeoutException[Read timed out];]; nested: BlobStoreException[Failed to check if blob [__79.part4] exists]; nested: SocketTimeoutException[Read timed out];
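
To make a failure like the one above easier to triage, the innermost exception can be pulled out of the failure reason string. This is only a sketch: it assumes the `nested: Name[...]` layout seen in the message above, and `root_cause` is a hypothetical helper, not part of any Elasticsearch client.

```python
import re

# Matches the exception class name that follows each "nested: " marker.
_NESTED = re.compile(r"nested: ([A-Za-z_.$]+)\[")


def root_cause(reason):
    """Return the class name of the last nested exception, or None.

    Assumes the "Outer[...]; nested: Inner[...];" layout shown in the
    failure message above; other message formats are not handled.
    """
    names = _NESTED.findall(reason)
    return names[-1] if names else None


reason = (
    "IndexShardSnapshotFailedException[BlobStoreException[Failed to check "
    "if blob [__79.part4] exists]; nested: SocketTimeoutException[Read "
    "timed out];]; nested: BlobStoreException[Failed to check if blob "
    "[__79.part4] exists]; nested: SocketTimeoutException[Read timed out];"
)
print(root_cause(reason))
```

A timeout at the bottom of the chain suggests a transient network problem, which is the kind of failure where a retry is plausible.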

This is using GCS cold storage.

I see that there are further options I can give the plugin, mainly http.connect_timeout and http.read_timeout, but I'm not sure whether they are relevant to the exception below: java.io.IOException: insufficient data written
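
For reference, here is a sketch of how those two timeouts might be passed as repository settings. The setting names are the ones mentioned above; the `60s` values and the helper name `gcs_repository_body` are assumptions for illustration, and this thread does not establish whether longer timeouts help with this particular exception.

```python
def gcs_repository_body(bucket, read_timeout="60s", connect_timeout="60s"):
    """Build a repository definition for the repository-gcs plugin.

    The timeout values are illustrative assumptions, not recommendations.
    """
    return {
        "type": "gcs",
        "settings": {
            "bucket": bucket,
            "compress": True,
            "http.read_timeout": read_timeout,
            "http.connect_timeout": connect_timeout,
        },
    }


body = gcs_repository_body("XXXX")
# With elasticsearch-py this body could then be registered with, e.g.:
# es.snapshot.create_repository(repository="gcs", body=body)
```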

I wouldn't mind this failing if I could detect it and retry. Could I do this by deleting the snapshot and recreating it? From what I understand, the shards that were backed up successfully would not be deleted if I did this?
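
One way to detect a failure and retry could look like the sketch below. It assumes the response to a snapshot request made with `wait_for_completion=true` carries a `snapshot.state` field and a `snapshot.shards.failed` count. `create_snapshot` is a hypothetical callable standing in for the real client call, and each attempt uses a fresh snapshot name rather than deleting anything.

```python
import time


def snapshot_succeeded(response):
    """True if the snapshot finished with no failed shards."""
    info = response["snapshot"]
    return info["state"] == "SUCCESS" and info["shards"]["failed"] == 0


def snapshot_with_retry(create_snapshot, base_name, attempts=3, delay=0.0):
    """Call create_snapshot(name) until one attempt has no failed shards.

    create_snapshot is a hypothetical callable that takes a snapshot name,
    waits for completion, and returns the response as a dict.
    Returns the name of the successful snapshot, or None if all attempts fail.
    """
    for attempt in range(1, attempts + 1):
        name = "%s-try%d" % (base_name, attempt)
        if snapshot_succeeded(create_snapshot(name)):
            return name
        time.sleep(delay)  # pause before the next attempt
    return None
```

With elasticsearch-py, `create_snapshot` could wrap something like `es.snapshot.create(repository=repo, snapshot=name, wait_for_completion=True)`. Partial snapshots left behind by failed attempts stay in the repository and would need separate cleanup.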

Steps to reproduce:

  1. Create a gcs snapshot with these settings {"gcs":{"type":"gcs","settings":{"bucket":"XXXX","compress":"true"}}}

Provide logs (if relevant):

org.elasticsearch.index.snapshots.IndexShardSnapshotFailedException: Failed to perform snapshot (index files)
	at org.elasticsearch.repositories.blobstore.BlobStoreRepository$SnapshotContext.snapshot(BlobStoreRepository.java:1377) ~[elasticsearch-5.5.1.jar:5.5.1]
	at org.elasticsearch.repositories.blobstore.BlobStoreRepository.snapshotShard(BlobStoreRepository.java:972) ~[elasticsearch-5.5.1.jar:5.5.1]
	at org.elasticsearch.snapshots.SnapshotShardsService.snapshot(SnapshotShardsService.java:382) ~[elasticsearch-5.5.1.jar:5.5.1]
	at org.elasticsearch.snapshots.SnapshotShardsService.access$200(SnapshotShardsService.java:88) ~[elasticsearch-5.5.1.jar:5.5.1]
	at org.elasticsearch.snapshots.SnapshotShardsService$1.doRun(SnapshotShardsService.java:335) [elasticsearch-5.5.1.jar:5.5.1]
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:638) [elasticsearch-5.5.1.jar:5.5.1]
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-5.5.1.jar:5.5.1]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_131]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_131]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_131]
Caused by: java.io.IOException: insufficient data written
	at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.close(HttpURLConnection.java:3540) ~[?:?]
	at com.google.api.client.http.javanet.NetHttpRequest.execute(NetHttpRequest.java:81) ~[?:?]
	at com.google.api.client.http.HttpRequest.execute(HttpRequest.java:972) ~[?:?]
	at com.google.api.client.googleapis.media.MediaHttpUploader.executeCurrentRequestWithoutGZip(MediaHttpUploader.java:545) ~[?:?]
	at com.google.api.client.googleapis.media.MediaHttpUploader.resumableUpload(MediaHttpUploader.java:417) ~[?:?]
	at com.google.api.client.googleapis.media.MediaHttpUploader.upload(MediaHttpUploader.java:336) ~[?:?]
	at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:427) ~[?:?]
	at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:352) ~[?:?]

Member

imotov commented Sep 14, 2017

@tlrx this looks very GCE-specific and I am traveling until the end of the week. Could you take a look?

Member

imotov commented Sep 19, 2017

According to googleapis/google-http-java-client#333, the error message "insufficient data written" that we are seeing hides the real issue. It seems the situation was improved in google-http-java-client v1.24. We are still on 1.20.0, which is almost 2 years old. @tlrx, should we upgrade the plugin to the latest?

Member

tlrx commented Sep 20, 2017

@imotov Sorry, I didn't manage to find the time to look at this. I think it makes sense to upgrade the dependency but we'll have to wait for 1.24 to be released.

Member

imotov commented Sep 20, 2017

You are right. For some reason I thought they already had, but it looks like we will have to wait quite a bit.

Member

imotov commented Sep 25, 2017

@tlrx can you think of any way to move this forward besides waiting for the 1.24 release? The patch was merged almost a year ago and it still hasn't made it into any release.

Member

tlrx commented Sep 26, 2017

@imotov They recently updated the development version of the lib so I asked here if they will release 1.23 soon. Let's wait a bit for an answer, ok?

bw2 commented Nov 5, 2017

I'm currently also blocked by this.

bw2 commented Nov 5, 2017

Using the following jar versions

bash-4.3# ls -1 repository-gcs/
commons-codec-1.10.jar
commons-logging-1.1.3.jar
google-api-client-1.21.0.jar
google-api-services-storage-v1-rev66-1.21.0.jar
google-http-client-1.21.0.jar
google-http-client-jackson2-1.21.0.jar
google-oauth-client-1.21.0.jar
httpclient-4.5.2.jar
httpcore-4.4.5.jar
repository-gcs-5.6.3.jar
...

bw2 commented Nov 6, 2017

Increasing max_snapshot_bytes_per_sec as shown below seems to increase the percentage of shards that snapshot successfully, but I still get the error on 1 or 2 shards of large indices (300 GB+):

# bucket, base_path, snapshot_repo and es (an Elasticsearch client) are defined elsewhere.
body = {
    "type": "gcs",
    "settings": {
        "bucket": bucket,
        "base_path": base_path,
        "compress": True,
        "chunk_size": "100mb",
        # Throttle limit raised so high that snapshots are effectively unthrottled.
        "max_snapshot_bytes_per_sec": "1tb",
    },
}
es.snapshot.create_repository(repository=snapshot_repo, body=body)

imotov removed their assignment Nov 6, 2017

tlrx added a commit to tlrx/elasticsearch that referenced this issue Nov 14, 2017

Update Google SDK to version 1.23
This commit updates the google-api-client library to version 1.23.

Closes elastic#26636

tlrx added a commit that referenced this issue Nov 15, 2017

Update Google SDK to version 1.23 (#27381)
This commit updates the google-api-client library to version 1.23.0.

Related to #26636


Member

tlrx commented Nov 15, 2017

Thanks @bw2 and @hoffoo for your feedback.

A new version (1.23.0) of google-http-client was released in October 2017. I updated the versions used in the repository-gcs and discovery-gce plugins in #27381. Version 1.23.0 includes the change from googleapis/google-http-java-client#333, so the underlying exception should bubble up instead of being hidden by the "insufficient data written" exception.

At this stage, I suggest closing this issue for now and waiting for more tests and feedback on plugins that use the new version of google-http-java-client. The update will be released in Elasticsearch 6.0.1 (and potentially in 5.6.5, if that version is released).

tlrx closed this Nov 15, 2017

bw2 commented Nov 15, 2017

Thanks @tlrx
