
dask pod service account access to non public storage (s3, gs buckets) #485

Open
scottyhq opened this issue Dec 2, 2019 · 9 comments

@scottyhq
Member

scottyhq commented Dec 2, 2019

For better security and cost savings we are moving towards non-public (requester pays) buckets for data storage. To access these buckets on AWS we recently reconfigured the hubs to assign an IAM Role to a Kubernetes Service Account. Specifically, the daskkubernetes service account gets an IAM role with a policy for accessing specific buckets in the same region. The daskkubernetes service account gets assigned to JupyterHub users in the pangeo helm chart here: https://github.com/pangeo-data/helm-chart/blob/56dc755ed0b56ad00571373d70c7fe0eaae5d556/pangeo/values.yaml#L25

This works great for pulling data into a Jupyter session, but we're currently encountering errors when loading data with dask workers via s3fs/fsspec. The errors do not clearly point to a permissions problem, e.g. `returned non-zero exit status 127.` and `KilledWorker: ('zarr-df194f82d92e97d5d5e60f0de5da8a42', <Worker 'tcp://192.168.169.195:33807', memory: 0, processing: 3>)`.

I think the root of the issue is that dask worker pods are currently assigned the default service account and therefore do not have permissions to access non-public pangeo datasets.
kubectl get pod -o yaml -n binder-staging dask-scottyhq-pangeo-binder-test-xg8nlaic-f8372c69-9mmg6m | grep service

    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
  serviceAccount: default
  serviceAccountName: default

One solution is linking cloud-provider permissions to the default service account, but should we instead create a new service account exclusively for dask worker pods?
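If we go the dedicated-service-account route, it might look something like the manifest below. This is a hypothetical sketch: the account name, namespace, and role ARN are placeholders, and the annotation follows the EKS "IAM Roles for Service Accounts" (IRSA) convention.

```yaml
# Hypothetical ServiceAccount for dask worker pods only.
# The role ARN is a placeholder, not a real Pangeo resource.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: dask-worker
  namespace: binder-staging
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/pangeo-bucket-access
```

Worker pods would then set `serviceAccountName: dask-worker` in their pod spec to pick up the role.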

pinging @jacobtomlinson @TomAugspurger and @martindurant per @rsignell-usgs and @jhamman 's suggestion

@TomAugspurger
Member

I think the root of the issue is that dask worker pods are currently assigned the default service account and therefore do not have permissions to access non-public pangeo datasets.

To verify this, you could try

def func():
    # function to load the data. Something like
    import s3fs
    fs = s3fs.S3FileSystem()  # rely on the service account
    fs.open("path/to/private/object")

In theory, `func()` should work on the client, but `client.run(func)` would fail.

@scottyhq
Member Author

scottyhq commented Dec 2, 2019

Thanks @TomAugspurger - forgot to include a code block! Here is output from your test case run on the aws-uswest2 hub:

(s3fs=0.4, dask=2.8.1, botocore=1.13.29)

def func():
    import s3fs
    # function to load the data. Something like
    fs = s3fs.S3FileSystem()  # rely on the service account
    fs.open("pangeo-data-uswest2/esip/NWM2/2017")

client.run(func)
/srv/conda/envs/notebook/lib/python3.7/site-packages/botocore/auth.py in add_auth()
    355     def add_auth(self, request):
    356         if self.credentials is None:
--> 357             raise NoCredentialsError
    358         datetime_now = datetime.datetime.utcnow()
    359         request.context['timestamp'] = datetime_now.strftime(SIGV4_TIMESTAMP)

NoCredentialsError: Unable to locate credentials

Note also that the AWS docs suggest a minimum CLI version of 1.16.283 to resolve credentials via the service account, which seems to install botocore 1.13.19.

@martindurant

It would make sense to me if the dask workers and the normal user interactive pods had the same ownership and permissions. The only difference is that a dask worker would not normally want to create new pods (but it perhaps could).

Is the above situation with dask-kubernetes or dask-gateway?

@scottyhq
Member Author

scottyhq commented Dec 2, 2019

It would make sense to me if the dask workers and the normal user interactive pods had the same ownership and permissions. The only difference is that a dask worker would not normally want to create new pods (but it perhaps could).

Agreed. Is it possible for any dask pods created by a user pod to inherit the same service account? A short-term easy fix is to assign all dask pods the daskkubernetes service account in some dask config setting (here? https://github.com/pangeo-data/pangeo-stacks/blob/master/base-notebook/binder/dask_config.yaml#L31). But further down the line it would be useful for each user to have a unique service account / iam role (for granular permissions and cost-tracking), and then it would be best for dask pods to inherit.
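The short-term fix could be a few lines in that dask config file. This is a sketch, assuming the `kubernetes.worker-template` layout dask-kubernetes reads from its YAML config; the account name is the one from the helm chart:

```yaml
# Hypothetical addition to dask_config.yaml: give worker pods the same
# service account the notebook pods already use.
kubernetes:
  worker-template:
    spec:
      serviceAccountName: daskkubernetes
```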

Is the above situation with dask-kubernetes or dask-gateway?

dask-kubernetes.

Still haven't tried with dask-gateway. Maybe @jhamman has?

@martindurant

I suspect dask-gateway does the right thing here, and yes, I know that trials are underway, but I don't know how far they have progressed. @jcrist would also know both these things.

@jcrist
Member

jcrist commented Dec 2, 2019

A short-term easy fix is to assign all dask pods the daskkubernetes service account in some dask config setting (here? https://github.com/pangeo-data/pangeo-stacks/blob/master/base-notebook/binder/dask_config.yaml#L31).

Yeah, that should work. This wouldn't be any less secure than the status-quo, and should get things working for now.

But further down the line it would be useful for each user to have a unique service account / iam role (for granular permissions and cost-tracking), and then it would be best for dask pods to inherit.

This should be doable with dask-gateway, but nothing is builtin. How would you map usernames to IAM roles/service accounts? If there's a way to do this where dask-gateway doesn't need to store and manage this mapping then this should be fairly easy to hack up with no additional changes to the gateway core itself.

@scottyhq
Member Author

scottyhq commented Dec 2, 2019

How would you map usernames to IAM roles/service accounts? If there's a way to do this where dask-gateway doesn't need to store and manage this mapping then this should be fairly easy to hack up with no additional changes to the gateway core itself.

I don't think there is a straightforward way to do this currently in Zero2JupyterHubK8s config. See dask/dask-kubernetes#202 (comment) and jupyterhub/kubespawner#304.

  1. If 304 linked above is merged, it would be straightforward to create a per-user IAM Role as part of a pod startup script and link it to the service account in the per-user namespace https://docs.aws.amazon.com/eks/latest/userguide/specify-service-account-role.html

  2. Alternatively, it seems possible to make an 'assume role' API call as part of a startup script and inject temporary credentials as environment variables (see "A secrets friendly version of extraEnv", jupyterhub/zero-to-jupyterhub-k8s#1103)
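Option 2 above could be sketched roughly as follows. This is hypothetical: the role ARN and session name are placeholders, and it assumes the pod already has base credentials allowing `sts:AssumeRole`. Mapping the STS response onto the environment variables botocore reads is the only part shown in full.

```python
# Hypothetical startup-script sketch: assume a per-user IAM role and
# expose the temporary credentials via the standard AWS env vars.
import os


def creds_to_env(credentials):
    """Map an STS AssumeRole 'Credentials' dict to the env vars botocore reads."""
    return {
        "AWS_ACCESS_KEY_ID": credentials["AccessKeyId"],
        "AWS_SECRET_ACCESS_KEY": credentials["SecretAccessKey"],
        "AWS_SESSION_TOKEN": credentials["SessionToken"],
    }


def assume_user_role(role_arn, session_name="dask-user"):
    """Assume the role and inject its temporary credentials into the environment."""
    import boto3  # requires boto3 and valid base credentials

    sts = boto3.client("sts")
    resp = sts.assume_role(RoleArn=role_arn, RoleSessionName=session_name)
    os.environ.update(creds_to_env(resp["Credentials"]))
```

Because the credentials land in environment variables, they would be inherited by dask worker pods only if the worker template copies them through, which is the limitation #1103 is about.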

@jcrist
Member

jcrist commented Dec 3, 2019

I think you could do this right now by configuring a post_auth_hook to create a new serviceaccount/IAM role for the user (if not already created). The serviceaccount could then be configured for the notebook by adding a modify_pod_hook (alternatively these could be combined into just a modify_pod_hook, probably fine either way). This would allow jupyterhub to manage creating the service accounts per user. I don't think a separate namespace per user would be needed at all here, but I may be wrong.

@scottyhq
Member Author

scottyhq commented Jan 6, 2020

In a recent chat with @yuvipanda, he pointed me to a nice model for provisioning per-user policies and buckets on GCP, which would be relevant once we get around to trying some of the approaches suggested in this issue: https://github.com/berkeley-dsep-infra/datahub/blob/staging/images/hub/sparklyspawner/sparklyspawner/__init__.py
