GKE Workload Identity support for GCS backups repository plugin. #5230
I followed the GKE documentation:
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: elasticsearch-gcs-sample
  namespace: test-gcs
spec:
  version: 7.15.2
  nodeSets:
  - name: default
    config:
      node.roles: ["master", "data", "ingest", "ml"]
      node.store.allow_mmap: false
    podTemplate:
      metadata:
        labels:
          # additional labels for pods
          foo: bar
      spec:
        initContainers:
        - name: install-plugins
          command:
          - sh
          - -c
          - |
            bin/elasticsearch-plugin install --batch repository-gcs
        automountServiceAccountToken: true
        serviceAccountName: gcs-sa
    count: 3
{
  "acknowledged" : true
}
{
  "snapshot" : {
    "snapshot" : "my_snapshot",
    "uuid" : "p3FNqAoJQ9Wv9e4Fe3FFsg",
    "repository" : "my_gcs_repository",
    "version_id" : 7150299,
    "version" : "7.15.2",
    "indices" : [
      ".ds-ilm-history-5-2022.01.05-000001",
      ".kibana-event-log-7.15.2-000001",
      ".kibana_7.15.2_001",
      ".apm-custom-link",
      "kibana_sample_data_flights",
      ".geoip_databases",
      "kibana_sample_data_ecommerce",
      ".tasks",
      ".kibana_security_session_1",
      "kibana_sample_data_logs",
      ".kibana_task_manager_7.15.2_001",
      ".apm-agent-configuration",
      ".security-7"
    ],
    "data_streams" : [
      "ilm-history-5"
    ],
...
It seems to be working as expected. Do you think there is anything else to do to support Workload Identity?
Hm. This is probably what I need:
I will try to specify it in my config.
Thank you @barkbay!
@barkbay this would likely be very useful in our docs; would you mind documenting it?
I found another detail. Adding this was not enough:
It looks like Elasticsearch also has to be upgraded to at least 7.13.0 because of elastic/elasticsearch#71239, which was included in 7.13.0. My cluster was on 7.6.x, so it didn't work even after I added those parameters to the Elasticsearch custom resource. @barkbay thanks again!
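The 7.13.0 minimum reported above can be checked before relying on Workload Identity. The following is a minimal Java sketch, assuming plain numeric `major.minor[.patch]` version strings with an optional `-qualifier` suffix; the class and method names (`WorkloadIdentityVersionCheck`, `parse`, `supportsWorkloadIdentity`) are hypothetical and not part of ECK or Elasticsearch.

```java
import java.util.Arrays;

/**
 * Hypothetical helper, not part of ECK or Elasticsearch: checks whether an Elasticsearch
 * version string is at least 7.13.0, the minimum reported in this thread for using GKE
 * Workload Identity with the repository-gcs plugin.
 */
final class WorkloadIdentityVersionCheck {
    // Minimum version mentioned in the thread (elastic/elasticsearch#71239 shipped in 7.13.0).
    private static final int[] MIN_VERSION = {7, 13, 0};

    private WorkloadIdentityVersionCheck() {}

    /** Parses "major.minor[.patch][-qualifier]"; missing components default to 0. */
    static int[] parse(String version) {
        String core = version.split("-", 2)[0]; // drop qualifiers such as "-SNAPSHOT"
        String[] parts = core.split("\\.");
        int[] result = new int[3];
        for (int i = 0; i < result.length && i < parts.length; i++) {
            result[i] = Integer.parseInt(parts[i]);
        }
        return result;
    }

    /** Compares component by component, so 7.6.2 < 7.13.0 <= 7.15.2. */
    static boolean supportsWorkloadIdentity(String version) {
        return Arrays.compare(parse(version), MIN_VERSION) >= 0;
    }
}
```

For example, `supportsWorkloadIdentity("7.6.2")` returns `false`, consistent with the 7.6.x cluster above, while `supportsWorkloadIdentity("7.15.2")` returns `true`. Comparing parsed integers rather than raw strings avoids the trap where `"7.6"` sorts after `"7.13"` lexically.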
I did some additional testing today. Shards of a searchable snapshot index fail to recover from the GCS repository with `401 Unauthorized`:
{"type": "server", "timestamp": "2022-01-14T13:52:37,369Z", "level": "WARN", "component": "o.e.c.r.a.AllocationService", "cluster.name": "elasticsearch-sample", "node.name": "elasticsearch-sample-es-all-2", "message": "failing shard [failed shard, shard [restored-metricbeat-7.15.0-2022.01.14-000001][0], node[A4ho5S_YQuGpIkEEkB-ymg], [P], recovery_source[snapshot recovery [YPBPtXOnTu-dIpSWnyQz_Q] from my_gcs_repository:2022.01.14-metricbeat-7.15.0-2022.01.14-000001-metricbeat-0zxvjnt8s9cu8kcsymsx3a/gzA1H59gRLGPozqnWZvfqQ], s[INITIALIZING], a[id=RgS957QlQUS9_-PrYJ7fGQ], unassigned_info[[reason=ALLOCATION_FAILED], at[2022-01-14T13:52:37.175Z], failed_attempts[4], failed_nodes[[A4ho5S_YQuGpIkEEkB-ymg]], delayed=false, details[failed shard on node [A4ho5S_YQuGpIkEEkB-ymg]: failed recovery, failure RecoveryFailedException[[restored-metricbeat-7.15.0-2022.01.14-000001][0]: Recovery failed on {elasticsearch-sample-es-cold-0}{A4ho5S_YQuGpIkEEkB-ymg}{QW91sl1ERlSVj26-2vZMgg}{10.XX.XX.XX}{10.XX.XX.XX:9300}{c}{k8s_node_name=gke-barkbay-auto-gcs-default-pool-xxxxx, xpack.installed=true, transform.node=false}]; nested: NotSerializableExceptionWrapper[storage_exception: 401 Unauthorized\nGET https://storage.googleapis.com/download/storage/v1/b/my-gcs-repository/o/indices%2F9t8ndNlyQXuauWtDKx0NVw%2F0%2Fsnap-gzA1H59gRLGPozqnWZvfqQ.dat?alt=media\nAnonymous caller does not have storage.objects.get access to the Google Cloud Storage object.]; nested: IOException[401 Unauthorized\nGET https://storage.googleapis.com/download/storage/v1/b/my-gcs-repository/o/indices%2F9t8ndNlyQXuauWtDKx0NVw%2F0%2Fsnap-gzA1H59gRLGPozqnWZvfqQ.dat?alt=media\nAnonymous caller does not have storage.objects.get access to the Google Cloud Storage object.]; ], allocation_status[fetching_shard_data]], expected_shard_size[46031607], message [failed recovery], failure [RecoveryFailedException[[restored-metricbeat-7.15.0-2022.01.14-000001][0]: Recovery failed on {elasticsearch-sample-es-cold-0}{A4ho5S_YQuGpIkEEkB-ymg}{QW91sl1ERlSVj26-2vZMgg}{10.XX.XX.XX}{10.XX.XX.XX:9300}{c}{k8s_node_name=gke-barkbay-auto-gcs-default-pool-xxxxx, xpack.installed=true, transform.node=false}]; nested: NotSerializableExceptionWrapper[storage_exception: 401 Unauthorized\nGET https://storage.googleapis.com/download/storage/v1/b/my-gcs-repository/o/indices%2F9t8ndNlyQXuauWtDKx0NVw%2F0%2Fsnap-gzA1H59gRLGPozqnWZvfqQ.dat?alt=media\nAnonymous caller does not have storage.objects.get access to the Google Cloud Storage object.]; nested: IOException[401 Unauthorized\nGET https://storage.googleapis.com/download/storage/v1/b/my-gcs-repository/o/indices%2F9t8ndNlyQXuauWtDKx0NVw%2F0%2Fsnap-gzA1H59gRLGPozqnWZvfqQ.dat?alt=media\nAnonymous caller does not have storage.objects.get access to the Google Cloud Storage object.]; ], markAsStale [true]]", "cluster.uuid": "lKo5VXDGRGeIg6aQAIEZ5g", "node.id": "UJb53T7TTVq3Z65PAmdrkA" ,
"stacktrace": ["org.elasticsearch.indices.recovery.RecoveryFailedException: [restored-metricbeat-7.15.0-2022.01.14-000001][0]: Recovery failed on {elasticsearch-sample-es-cold-0}{A4ho5S_YQuGpIkEEkB-ymg}{QW91sl1ERlSVj26-2vZMgg}{10.XX.XX.XX}{10.XX.XX.XX:9300}{c}{k8s_node_name=gke-barkbay-auto-gcs-default-pool-xxxxx, xpack.installed=true, transform.node=false}",
"at org.elasticsearch.index.shard.IndexShard.lambda$executeRecovery$21(IndexShard.java:3234) ~[elasticsearch-7.16.0.jar:7.16.0]",
"at org.elasticsearch.action.ActionListener$1.onFailure(ActionListener.java:144) ~[elasticsearch-7.16.0.jar:7.16.0]",
"at org.elasticsearch.index.shard.StoreRecovery.recoverFromRepository(StoreRecovery.java:306) ~[elasticsearch-7.16.0.jar:7.16.0]",
"at org.elasticsearch.index.shard.IndexShard.restoreFromRepository(IndexShard.java:2358) ~[elasticsearch-7.16.0.jar:7.16.0]",
"at org.elasticsearch.index.shard.IndexShard.lambda$startRecovery$17(IndexShard.java:3160) ~[elasticsearch-7.16.0.jar:7.16.0]",
"at org.elasticsearch.action.ActionRunnable$2.doRun(ActionRunnable.java:62) ~[elasticsearch-7.16.0.jar:7.16.0]",
"at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:771) ~[elasticsearch-7.16.0.jar:7.16.0]",
"at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26) ~[elasticsearch-7.16.0.jar:7.16.0]",
"at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]",
"at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]",
"at java.lang.Thread.run(Thread.java:833) [?:?]",
"Caused by: org.elasticsearch.common.io.stream.NotSerializableExceptionWrapper: storage_exception: 401 Unauthorized",
"GET https://storage.googleapis.com/download/storage/v1/b/my-gcs-repository/o/indices%2F9t8ndNlyQXuauWtDKx0NVw%2F0%2Fsnap-gzA1H59gRLGPozqnWZvfqQ.dat?alt=media",
"Anonymous caller does not have storage.objects.get access to the Google Cloud Storage object.",
"at com.google.cloud.storage.StorageException.translate(StorageException.java:97) ~[?:?]",
"at org.elasticsearch.repositories.gcs.GoogleCloudStorageRetryingInputStream.lambda$openStream$3(GoogleCloudStorageRetryingInputStream.java:122) ~[?:?]",
"at com.google.api.gax.retrying.DirectRetryingExecutor.submit(DirectRetryingExecutor.java:105) ~[?:?]",
"at com.google.cloud.RetryHelper.run(RetryHelper.java:76) ~[?:?]",
"at com.google.cloud.RetryHelper.runWithRetries(RetryHelper.java:50) ~[?:?]",
"at org.elasticsearch.repositories.gcs.GoogleCloudStorageRetryingInputStream.openStream(GoogleCloudStorageRetryingInputStream.java:104) ~[?:?]",
"at org.elasticsearch.repositories.gcs.GoogleCloudStorageRetryingInputStream.<init>(GoogleCloudStorageRetryingInputStream.java:83) ~[?:?]",
"at org.elasticsearch.repositories.gcs.GoogleCloudStorageRetryingInputStream.<init>(GoogleCloudStorageRetryingInputStream.java:65) ~[?:?]",
"at org.elasticsearch.repositories.gcs.GoogleCloudStorageBlobStore.readBlob(GoogleCloudStorageBlobStore.java:210) ~[?:?]",
"at org.elasticsearch.repositories.gcs.GoogleCloudStorageBlobContainer.readBlob(GoogleCloudStorageBlobContainer.java:63) ~[?:?]",
"at org.elasticsearch.common.blobstore.support.FilterBlobContainer.readBlob(FilterBlobContainer.java:48) ~[elasticsearch-7.16.0.jar:7.16.0]",
"at org.elasticsearch.xpack.searchablesnapshots.store.SearchableSnapshotDirectory$RateLimitingBlobContainer.readBlob(SearchableSnapshotDirectory.java:762) ~[?:?]",
"at org.elasticsearch.repositories.blobstore.ChecksumBlobStoreFormat.read(ChecksumBlobStoreFormat.java:88) ~[elasticsearch-7.16.0.jar:7.16.0]",
"at org.elasticsearch.repositories.blobstore.BlobStoreRepository.loadShardSnapshot(BlobStoreRepository.java:3367) ~[elasticsearch-7.16.0.jar:7.16.0]",
"at org.elasticsearch.xpack.searchablesnapshots.store.SearchableSnapshotDirectory.lambda$create$15(SearchableSnapshotDirectory.java:666) ~[?:?]",
"at org.elasticsearch.common.util.LazyInitializable.maybeCompute(LazyInitializable.java:92) ~[elasticsearch-7.16.0.jar:7.16.0]",
"at org.elasticsearch.common.util.LazyInitializable.getOrCompute(LazyInitializable.java:70) ~[elasticsearch-7.16.0.jar:7.16.0]",
"at org.elasticsearch.xpack.searchablesnapshots.store.SearchableSnapshotDirectory.loadSnapshot(SearchableSnapshotDirectory.java:227) ~[?:?]",
"at org.elasticsearch.xpack.searchablesnapshots.allocation.SearchableSnapshotIndexEventListener.ensureSnapshotIsLoaded(SearchableSnapshotIndexEventListener.java:75) ~[?:?]",
"at org.elasticsearch.xpack.searchablesnapshots.allocation.SearchableSnapshotIndexEventListener.beforeIndexShardRecovery(SearchableSnapshotIndexEventListener.java:67) ~[?:?]",
"at org.elasticsearch.index.CompositeIndexEventListener.beforeIndexShardRecovery(CompositeIndexEventListener.java:276) ~[elasticsearch-7.16.0.jar:7.16.0]",
"at org.elasticsearch.index.shard.IndexShard.preRecovery(IndexShard.java:1733) ~[elasticsearch-7.16.0.jar:7.16.0]",
"at org.elasticsearch.index.shard.StoreRecovery.restore(StoreRecovery.java:505) ~[elasticsearch-7.16.0.jar:7.16.0]",
"at org.elasticsearch.index.shard.StoreRecovery.recoverFromRepository(StoreRecovery.java:301) ~[elasticsearch-7.16.0.jar:7.16.0]",
"... 8 more",
"Caused by: java.io.IOException: 401 Unauthorized",
I'll try to investigate later.
The root cause is the `failed to load default project id` warning:
{
  "type": "server",
  "timestamp": "2022-01-17T16:09:49,963Z",
  "level": "WARN",
  "component": "o.e.r.g.GoogleCloudStorageService",
  "cluster.name": "elasticsearch-sample",
  "node.name": "elasticsearch-sample-es-cold-1",
  "message": "failed to load default project id",
  "cluster.uuid": "lKo5VXDGRGeIg6aQAIEZ5g",
  "node.id": "lC0hhssQRfW9fdow0zYZJg",
  "stacktrace": ["java.security.AccessControlException: access denied (\"java.lang.RuntimePermission\" \"accessDeclaredMembers\")",
    "at java.security.AccessControlContext.checkPermission(AccessControlContext.java:485) ~[?:?]",
    "...",
    "at org.elasticsearch.repositories.gcs.GoogleCloudStorageBlobContainer.readBlob(GoogleCloudStorageBlobContainer.java:63) [repository-gcs-7.16.1.jar:7.16.1]",
    "at org.elasticsearch.common.blobstore.support.FilterBlobContainer.readBlob(FilterBlobContainer.java:48) [elasticsearch-7.16.1.jar:7.16.1]",
    "at org.elasticsearch.xpack.searchablesnapshots.store.SearchableSnapshotDirectory$RateLimitingBlobContainer.readBlob(SearchableSnapshotDirectory.java:763) [searchable-snapshots-7.16.1.jar:7.16.1]"
  ]
}
Searchable snapshots call `GoogleCloudStorageBlobContainer#readBlob` directly, without wrapping the call in a privileged block for the Security Manager. Because of that, the client fails to get Compute Engine credentials. Regular snapshot/restore works because it makes a privileged call to `GoogleCloudStorageBlobStore.writeBlob` during repository verification. The simplest fix is to make sure `ServiceOptions.getDefaultProjectId` and `GoogleCredentials::getApplicationDefault` are called in a privileged block under the SecurityManager (which they should be anyway, because they perform network calls). Unfortunately, we can't write an integration test for the issue, because the test framework verifies the repository automatically, which works around the bug. Writing a unit test also seems impossible, because `ComputeEngineCredentials#getMetadataServerUrl` relies on the `GCE_METADATA_HOST` environment variable. See elastic/cloud-on-k8s#5230. Resolves elastic#82702
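The fix described in this commit message amounts to running credential lookups that perform network I/O inside `AccessController.doPrivileged`. With a Security Manager installed, the permission check then stops at the privileged call, so callers of that code (such as the searchable snapshots module) no longer need the permission themselves. A minimal Java sketch of such a wrapper follows; `PrivilegedIo` and `run` are hypothetical names, not Elasticsearch's actual code. The trailing comments show how `ServiceOptions.getDefaultProjectId` and `GoogleCredentials.getApplicationDefault` could be passed in, assuming the Google Cloud client libraries are on the classpath; the sketch itself depends only on the JDK.

```java
import java.io.IOException;
import java.security.AccessController;
import java.security.PrivilegedActionException;
import java.security.PrivilegedExceptionAction;

/**
 * Hypothetical helper (not Elasticsearch's actual code) that runs an I/O action in a
 * privileged block. With a SecurityManager installed, the stack walk for permission
 * checks stops at the doPrivileged call, so callers of run() do not need the permission.
 * Without a SecurityManager, doPrivileged simply runs the action.
 */
final class PrivilegedIo {
    private PrivilegedIo() {}

    @SuppressWarnings("removal") // AccessController is deprecated for removal since Java 17
    static <T> T run(PrivilegedExceptionAction<T> action) throws IOException {
        try {
            return AccessController.doPrivileged(action);
        } catch (PrivilegedActionException e) {
            // doPrivileged wraps checked exceptions: rethrow IOException, wrap anything else.
            Exception cause = e.getException();
            if (cause instanceof IOException) {
                throw (IOException) cause;
            }
            throw new IOException(cause);
        }
    }

    // Example call sites, assuming the Google Cloud client libraries are on the classpath:
    //   String projectId = PrivilegedIo.run(ServiceOptions::getDefaultProjectId);
    //   GoogleCredentials credentials = PrivilegedIo.run(GoogleCredentials::getApplicationDefault);
}
```

Unwrapping `PrivilegedActionException` keeps the caller's error handling unchanged: an `IOException` from the metadata server surfaces as the same `IOException` it would have been without the privileged block. Unchecked exceptions are not wrapped by `doPrivileged` and propagate as-is.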
Hi.
Is there any plan to add support for GKE Workload Identity to the GCS backup repository plugin?
I know that there is a way to configure credentials for the GCS repository plugin using service account credential files, but it creates additional toil. Workload Identity is currently the way Google recommends for authenticating GKE workloads to Google Cloud services: https://cloud.google.com/kubernetes-engine/docs/concepts/workload-identity
I'm asking here because all the other discussions I found are outdated and have very little info:
Thanks.