This repository has been archived by the owner on Oct 21, 2020. It is now read-only.

Dynamic provisioner for Hostpath #15

Closed
tamalsaha opened this issue Mar 6, 2017 · 24 comments

Comments

@tamalsaha
Contributor

Hi,
I am interested in a hostpath provisioner that can work with a StatefulSet with replicas > 1. I had a conversation in the Slack api-machinery room, and they pointed me to this project. Here is my conversation: appscode/k8s-addons#14 (comment)

I am new to writing my own PV provisioner. Do you think what I am asking for is possible? How do I take the demo hostpath provisioner in this repo and turn it into a proper one?

Thanks.

@wongma7
Contributor

wongma7 commented Mar 6, 2017

Yes, I think it's possible. Don't know how easy it will be though.

First consider that if you want the hostpath provisioner to handle pre-creating + deleting directories (MkdirAll & RemoveAll) on the host when corresponding PVs are created + deleted (with reclaim policy Delete), an instance of it will need to run on every node and mount every hostPath. And the logic in Delete will have to be improved.

If you don't care about that, if you are willing to handle cleaning up PVs' data yourself (effectively reclaim policy Retain), then you can throw away the assumption that each hostPath provisioner needs to actually mount its node's hostPath. So you can run just one hostpath provisioner for the whole cluster that just creates PVs and ignores creating/deleting directories. You will need to remove the RemoveAll and MkdirAll and such from the code.

Now, the hard part: the scheduling. To be honest, I don't know much about how to achieve this, but I believe you have to extend the scheduler to be aware of the node hostname labels on the PV that Clayton is talking about. If you go with the first option, where you run an instance of the hostPath provisioner on every node, then obviously each provisioner labels its PVs with the node it's running on. If you go with the second option, then I guess you can just round-robin.
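To make the per-node (DaemonSet) variant concrete, here is a minimal sketch of what its Provision step could look like, written against current client-go types. The directory root, the label key, and the function shape are my own assumptions for illustration, not something this repo prescribes; a real implementation would wire this into the lib/controller Provisioner interface.

```go
package hostpath

import (
	"os"
	"path/filepath"

	v1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

const (
	pvRoot       = "/mnt/hostpath"          // assumed root directory mounted from the host
	nodeLabelKey = "kubernetes.io/hostname" // assumed label key used to pin the PV to a node
)

// provisionOnNode creates the backing directory for a new PV on this node and
// returns a PV object labeled with the node name, so a scheduler extension can
// later match pods that claim it to the same node.
func provisionOnNode(pvName, nodeName string, capacity resource.Quantity) (*v1.PersistentVolume, error) {
	dir := filepath.Join(pvRoot, pvName)
	if err := os.MkdirAll(dir, 0777); err != nil {
		return nil, err
	}
	return &v1.PersistentVolume{
		ObjectMeta: metav1.ObjectMeta{
			Name:   pvName,
			Labels: map[string]string{nodeLabelKey: nodeName},
		},
		Spec: v1.PersistentVolumeSpec{
			PersistentVolumeReclaimPolicy: v1.PersistentVolumeReclaimDelete,
			AccessModes:                   []v1.PersistentVolumeAccessMode{v1.ReadWriteOnce},
			Capacity:                      v1.ResourceList{v1.ResourceStorage: capacity},
			PersistentVolumeSource: v1.PersistentVolumeSource{
				HostPath: &v1.HostPathVolumeSource{Path: dir},
			},
		},
	}, nil
}
```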

@wongma7
Contributor

wongma7 commented Mar 6, 2017

Also, pvDir is hardcoded; you'll want to change it: https://github.com/kubernetes-incubator/external-storage/blob/master/docs/demo/hostpath-provisioner/hostpath-provisioner.go#L59

Plus, the identity is not very useful as-is; if you understand the principle of it, you can remove or improve it as you see fit.
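For illustration, a minimal sketch of making that directory configurable with a flag; the flag name and default here are assumptions, not anything the demo defines:

```go
package main

import "flag"

// pvDir replaces the hardcoded constant in the demo provisioner.
var pvDir = flag.String("pv-dir", "/tmp/hostpath-provisioner",
	"root directory on the host under which per-PV directories are created")

func main() {
	flag.Parse()
	// ... construct the provisioner with *pvDir instead of the hardcoded path ...
	_ = *pvDir
}
```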

@tamalsaha
Contributor Author

@wongma7, thanks for the detailed explanation. I have mostly read the code. I think I understand your comments regarding the retention policy; those behaviors can be parameterized, and users can choose between them via parameters passed to the StorageClass. In one mode they can use a Deployment with a single pod; in the other mode they can run a DaemonSet and manage host paths.

Regarding scheduling, I am also new to this level of detail. I read the articles:

From these articles I don't see any mention of the default scheduler picking nodes based on labels of a PV. It is possible to write a custom scheduler, but that raises the level of complexity. I will ask in the sig-scheduling Google group.

@wongma7
Contributor

wongma7 commented Mar 7, 2017

Yes, you can do whatever you want re: retention policy; making it a StorageClass parameter sounds good. Note that the controller will always pass options.ReclaimPolicy=Delete to Provision, but you can ignore it; it's only there because upstream hasn't decided the "official" way to do retention policy yet. But again, if you want to set it to Retain and do the 'Deployment'-style thing, then sure, why not.
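For concreteness, a minimal sketch of reading such a StorageClass parameter in the provisioner; the parameter name "reclaimPolicy" is an assumption, not an agreed convention:

```go
package hostpath

import (
	"fmt"

	v1 "k8s.io/api/core/v1"
)

// reclaimPolicyFromParams picks the reclaim policy to stamp on created PVs from
// a StorageClass parameter, ignoring whatever options.ReclaimPolicy the
// controller passes in.
func reclaimPolicyFromParams(params map[string]string) (v1.PersistentVolumeReclaimPolicy, error) {
	switch params["reclaimPolicy"] {
	case "", "Delete":
		return v1.PersistentVolumeReclaimDelete, nil
	case "Retain":
		return v1.PersistentVolumeReclaimRetain, nil
	default:
		return "", fmt.Errorf("unsupported reclaimPolicy %q", params["reclaimPolicy"])
	}
}
```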

Re: the scheduler, please also check out https://kubernetes.io/docs/admin/multiple-schedulers/. I believe you need to write a predicate that follows pod -> PVC -> PV and looks at a node label on the PV to determine which node the pod can be scheduled to. It doesn't look too complex, but hostPath PV support is nonexistent in kube, so you may have to write a scheduler and deploy it alongside the default scheduler. (Or patch the existing scheduler, but I believe that's far less desirable.)

@tamalsaha
Contributor Author

tamalsaha commented Mar 7, 2017

It doesn't look too complex, but hostPath PV support is nonexistent in kube, so you may have to write a scheduler and deploy it alongside the default scheduler. (Or patch the existing scheduler, but I believe that's far less desirable.)

The existing scheduler will work fine as-is as long as StatefulSets are not scaled up after being scaled down. Say I have a StatefulSet with replicas = 3 that also uses PVs. Later I reduce the number of replicas to 2, which deletes pod-2. Then, if I increase the replicas back to 3, I see that the original PVC, if it still exists, will be reused. If this is required behavior, it is not possible without extending the default scheduler for the hostpath provisioner. Do you know who can give me a definitive answer?

@tamalsaha
Contributor Author

tamalsaha commented Mar 8, 2017

@wongma7, regardless of the scheduler issue, do you think the hostpath provisioner can be moved from a demo to a supported provisioner? I am willing to send PRs if that is necessary.

@tamalsaha
Contributor Author

tamalsaha commented Mar 8, 2017

@wongma7
Contributor

wongma7 commented Mar 8, 2017

Yes, it's coming, though it will be a while (another release or two?) before it is a reality.

In the meantime I'm reluctant to make the hostpath provisioner any more official. It would have to come with a big disclaimer plus the scheduling caveat. PRs are always welcome, but the scheduling issue, unless solved, will make it hard to use in typical clusters.

@tamalsaha
Contributor Author

tamalsaha commented Mar 8, 2017

Yeah. That's at least another 6 months to wait for an official solution; I think we want something sooner. I have read the scheduler code, and I think the actual change necessary is fairly simple. Currently the default scheduler sets PodSpec.NodeName once the scheduling decision is made. The custom scheduler also needs to apply a scheduler.beta.example.com/nodename: <nodename> annotation to the PVs that can be bound (directly or via PVC). During the filtering process, I need to check this annotation in addition to checking PodSpec.NodeName.
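A minimal sketch of that extra filter check, assuming the annotation key above and that the scheduler has already resolved the pod's PVCs to bound PVs (the helper name is made up):

```go
package scheduler

import v1 "k8s.io/api/core/v1"

// nodeNameAnnotation mirrors the annotation proposed above.
const nodeNameAnnotation = "scheduler.beta.example.com/nodename"

// podFitsHostPathPVs is the extra filter check: a pod fits a node only if every
// hostPath PV it is bound to is either not pinned yet or pinned to this node.
// Resolving pod -> PVC -> PV is left to the scheduler's existing caches.
func podFitsHostPathPVs(pvs []*v1.PersistentVolume, nodeName string) bool {
	for _, pv := range pvs {
		if pv.Spec.HostPath == nil {
			continue // only hostPath PVs carry node pinning
		}
		pinned, ok := pv.Annotations[nodeNameAnnotation]
		if !ok || pinned == "" {
			continue // not pinned yet; any node is acceptable
		}
		if pinned != nodeName {
			return false
		}
	}
	return true
}
```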

There is an edge case. Say the StatefulSet was reduced from replicas = 3 to replicas = 2, and then the node where replica 3 was originally scheduled was removed from the cluster. The correct action in this scenario will be left to the user via a scheduler.beta.example.com/reassign-missing-node: true (default false) annotation on the PodSpec. The scheduler can either fail to schedule the pod or reassign it to a new node. If the scheduler fails to assign the pod, the user can manually fix the underlying issue, remove the nodename annotation from the PV, and then reschedule the pod.

Obviously I have to figure out how to build and distribute the patched scheduler.

@ghost

ghost commented Mar 9, 2017

So if we implement this, we can use hostPaths across the cluster as PVs (I mean PVs that are local to the pod's node), but without the pod scheduling guarantees, isolation, etc. of the "local persistent volumes" proposal at kubernetes/community#306?

My only priorities for the coming local PV are:

  • defining a PVC in my YAMLs as "local" without knowing how the underlying PV achieves that (works on a bare metal cluster, minikube, ...)
  • support for matching labels of a PVC to try to bind a previously used PV
  • using a node's local SSDs or HDDs
  • durability (retain)
  • not installing anything on the host OS; installing over just Kubernetes
  • not having to deal with permissions when we get a volume through a PVC, like hostPath wanting the user to be root or needing some hack

I am not in a hurry for the isolation, resource limitation, security, and ephemeral storage aspects of the coming proposal.

I would like to contribute once our goals here are clear, because we don't want to wait for local PVs to be streamlined.

@tamalsaha
Contributor Author

tamalsaha commented Mar 9, 2017

@thenamli, I have started from the v1.5.4 tag, here: https://github.com/appscode/kubernetes/commits/hostpath

So far I have figured out how to build and push the binary. My docker image : https://hub.docker.com/r/appscode/hostpath-scheduler/tags/

I am following: https://kubernetes.io/docs/admin/multiple-schedulers/

@tamalsaha
Contributor Author

/cc: @sadlil, @aerokite

@eparis

eparis commented Mar 9, 2017

Reuse of a PVC/PV after a StatefulSet is scaled down and back up is expected behavior, as I recall.

@tamalsaha
Contributor Author

tamalsaha commented Mar 9, 2017

@thenamli, I have a bare-bones implementation in place. The code compiles; I need to do actual testing. If you want to do a code review, that would be helpful.

Diff: kubernetes/kubernetes@release-1.5...appscode:hostpath

@wongma7
Contributor

wongma7 commented Mar 15, 2017

So apparently there may be a way to co-opt the zone predicate of the scheduler to get hostPath PVs working... Disclaimer: this is extremely hacky, the zone annotations are still in beta to begin with, there's no guarantee that if this works now it will continue to work later, and you try this at your own risk.

  1. Label each node with its own "failure-domain.beta.kubernetes.io/zone".
  2. Label each PV your provisioner creates with a "failure-domain.beta.kubernetes.io/zone".
  3. When your StatefulSet creates PVCs asking for your provisioner's PVs, the scheduler respects the label and schedules the StatefulSet pods to their respective PVs' "zones", aka nodes.

So again, this is extremely hacky, and my official recommendation is still to write and use your own scheduler (which is already laxer than upstream's official recommendation to wait patiently for them to implement local storage). But I thought it was interesting and worth sharing. :)
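A rough sketch of step 2, assuming step 1 labels each node with its own name as its "zone"; this is purely to illustrate the hack, not a recommendation:

```go
package hostpath

import (
	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// zoneLabel is the beta zone key the default scheduler's zone predicate compares
// between a PV and the node a pod lands on.
const zoneLabel = "failure-domain.beta.kubernetes.io/zone"

// newZonePinnedPV builds a hostPath PV whose "zone" is simply the node name, so
// the stock zone predicate keeps consuming pods on that node.
func newZonePinnedPV(pvName, nodeName, hostDir string) *v1.PersistentVolume {
	return &v1.PersistentVolume{
		ObjectMeta: metav1.ObjectMeta{
			Name:   pvName,
			Labels: map[string]string{zoneLabel: nodeName},
		},
		Spec: v1.PersistentVolumeSpec{
			PersistentVolumeSource: v1.PersistentVolumeSource{
				HostPath: &v1.HostPathVolumeSource{Path: hostDir},
			},
		},
	}
}
```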

@tamalsaha
Contributor Author

tamalsaha commented Mar 15, 2017

Thanks, @wongma7. I think writing a scheduler might be the best option. Unfortunately, the new official solution for local storage will not solve my problem :( kubernetes/community#306 (comment)

The new local storage will add PV-label-based scheduling to the scheduler, so I might not have to keep my forked scheduler for the long term. But in my case, where my nodes have a single primary partition, I have to use my custom hostpath provisioner for durable local storage.

@ghost

ghost commented Mar 20, 2017

@tamalsaha, I read your messages there in the local PV discussion. Unfortunately, every usage scenario outside of GCE and AWS seems hacky at best.

At first I thought I could contribute to k8s to get better support on bare metal. But neither contributing to mainstream k8s nor maintaining my own scheduler gives a clear and complete solution in the near future. It will take at least a year for k8s to mature in the persistent storage area.

I am still looking forward to contributing to k8s in this area.

So I ended up deciding to use the network storage service of our bare metal provider and to write a provisioner for it soon.


As a metaphor: you are in the business of selling shoes. Then you decide to build your own factory. Then you realize your factory needs more electricity.

Then you realize the dam nearby is not enough, and people are building a new nuclear reactor near the lake for this. Then you find out that the nuclear reactor has some problems for your needs.

In the end, you find yourself producing coolant systems, control rods, etc. for the nuclear reactor, when all you wanted to do was sell shoes. :)

@tamalsaha
Contributor Author

tamalsaha commented Mar 20, 2017

So I ended up deciding to use the network storage service of our bare metal provider and to write a provisioner for it soon.

If you share what you come up with, that would be great!

Btw, I smiled at your metaphor out of pain. This is most definitely how we became contributors to so many open source projects. 😭

@wongma7
Contributor

wongma7 commented Mar 31, 2017

folks, https://docs.google.com/document/d/1so67pZPtBwv3uBg9d3pk4VLzfn9qtuZrbauv1DnNDSk/edit?ts=58dd9c72#heading=h.syg5p31g5672 may interest you as well.

It won't have dynamic provisioning in the first phase (1.7), but if you want to steer it in the right direction I encourage you to comment!

Seems there is a lot of movement around local storage :)

@tamalsaha
Contributor Author

I don't have read access to this file.

@wongma7
Contributor

wongma7 commented Mar 31, 2017

@tamalsaha, join/subscribe to the sig-storage mailing list; the original post is here: https://groups.google.com/forum/#!topic/kubernetes-sig-storage/mYZ7hTmOO7U

@nailgun

nailgun commented May 18, 2017

While dynamic provisioning is not in upstream, I continued @tamalsaha's work and modified the example hostPath provisioner to support multiple storage classes: https://github.com/nailgun/k8s-hostpath-provisioner
There is also a custom scheduler based on @tamalsaha's work, but it is simpler because it depends on this custom provisioner.

@tamalsaha
Contributor Author

tamalsaha commented May 19, 2017

Awesome, @nailgun! I am going to give it a try.

wongma7 added a commit to wongma7/external-storage that referenced this issue Jul 12, 2017
@tamalsaha
Contributor Author

Based on my conversation with @msau42, it seems that the new Local PV feature will eventually address this use case. I am looking forward to that. Closing this one.

princerachit pushed a commit to princerachit/external-storage that referenced this issue Jul 11, 2018
yangruiray pushed a commit to yangruiray/external-storage that referenced this issue Jul 19, 2019
StorageClass parameters to CSI CreateVolume call
yangruiray pushed a commit to yangruiray/external-storage that referenced this issue Jul 19, 2019
build.make: Replace 'return' to 'exit' to fix shellcheck error