Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Switch to using kubernetes_asyncio client #744

Open
yuvipanda opened this issue Dec 2, 2018 · 4 comments
Open

Switch to using kubernetes_asyncio client #744

yuvipanda opened this issue Dec 2, 2018 · 4 comments

Comments

@yuvipanda
Copy link
Collaborator

Currently we use the official kubernetes python API Client. This is a purely synchronous client, and we must use threadpools / threads to use it from our tornado app.

We should experiment with using https://github.com/tomplus/kubernetes_asyncio instead, which is the asynchronous version of the library. This should (hopefully) simplify our code and let us stop thinking about threads.

@betatim
Copy link
Member

betatim commented Dec 2, 2018

We should consider how well supported the async k8s library will be (in a year or two from now). Right now it seems like a hobby project with one main contributor.

In the past we have had some problems with leaks/things breaking because the official library had bugs/made changes which causes trouble "in real time", compared to getting thread pools right for which you have infinite amounts of time. I prefer that we struggle during coding to in real time. Especially as the thread pool pattern is well established and setup in our code base already.

Is there a particular bug that would be fixed by switching or feature it would enable?

Right now my assessment is:

  • would be cool from a coder's perspective
  • introduces a risky new dependency (low maintainer count)
  • reduces compatibility with other cloud k8s deployments(??)

Should we put this on our "to watch" list and assess how the project has progressed in attracting more maintainers in a few weeks or months?

@yuvipanda
Copy link
Collaborator Author

I stepped back from this a little bit and realized what really needs to happen is a port of the 'SharedInformer' concept from the k8s go client to Python. It is an encapsulation of a list + watch that handles coalescing, retries and resyncs for you. The existence of this library is why most k8s related stuff is in Go, and why they function pretty well. We've fixed a number of issues in kubespawner where we (mostly @minrk) learnt why we need an encapsulation like this.

More information on the SharedInformer is at:

I don't believe it is possible for a human being to write a correct implementation of this while mixing threads & async.

re: compatibility, I don't think that's a problem. When running inside a k8s cluster, clients always use in-cluster. Minikube uses certs here, which are also supported. The only thing not natively supported is directly talking from your local computer to an AKS kubernetes cluster, which isn't something BinderHub can do anyway now. In the future, everything is moving to using the 'exec' provider (where an external command is executed for auth tokens). This library seems to support that too.

Maintainership is the only long term issue, but I'm personally committed to contributing to this too if necessary. The following things make me not too concerned about it:

  1. It is generated from the same official repository as the official client: add script to create kubernetes_asyncio python client kubernetes-client/gen#60
  2. It's using asyncio support in swagger-gen to do most of its work, and this is maintained upstream in swagger-gen
  3. The async client for kubernetes will always be a separate package, since the official one wants to always be synchronous (and be compatible with 2.7 for the foreseeable future) support for asyncio kubernetes-client/python#323

Kubespawner is a bit more complex than binderhub, so I wanted to try this out on binderhub to gain experience running it to see how it goes. BinderHub's use of the k8s API is fairly minimal, and I am 100% sure there are bugs here we aren't noticing.

Since kubernetes_asyncio wants to be backwards compatible with the synchronous library, the changes are somewhat minimal. I've started work on this with #745. I need to figure out all the threads in use and make sure I've converted them right, and understand enough of the testing infrastructure to fix that too.

In parallel, I'm also working on a separate library that should implement sharedinformers - https://github.com/yuvipanda/kubernetes-informers. This should have thorough tests for various issues that might arise. I'm happy to wait for that library to mature before using that in BinderHub if that sounds better.

/cc @minrk @tomplus

@tomplus
Copy link

tomplus commented Dec 3, 2018

@yuvipanda Thanks for nice introduction 👍

Yes, my library is very similar to the official library which you're using. It's also generated from a specification but it uses the asyncio variant of python module from swagger-codegen. I always contribute to both repository to make these libraries almost identical. Now it exists in a separate repository (as owners of the official library suggested) but I hope it'll be incorporated to kubernetes-client in the future. I think your use case can help with it by showing that such library is also necessary for community.
Please let me know if you need any help with integration.

@yuvipanda
Copy link
Collaborator Author

Thanks, @tomplus!

I think I want to get https://github.com/yuvipanda/kubernetes-informers to a stable place first. It has been more complicated than I originally thought.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants