Add proposal for volume spec reconstruction #650
Conversation
* Versioning the meta-data is offloaded to the plugin.

## Meta-data location:
Store the meta-data file \<volume name\>.json in the plugin path.
controller-manager can run on several machines in hot-standby HA, where one controller-manager is the master (and runs the controllers) and the other controller-managers just wait for the master to die so they can quickly take over. The metadata won't be available to a new master on another machine if you store it on the filesystem. I am afraid this metadata must be stored in the API server.
The information will be stored as part of the MountDevice call, which is only executed on worker nodes. Do we need this information and spec reconstruction in the controller-manager code path too?
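As a rough illustration of that node-side flow (and of the "Meta-data location" section quoted above), here is a minimal Go sketch of persisting a per-volume JSON file at MountDevice time. The `VolumeMetadata` fields, the `Save` helper, and the file layout are assumptions made for illustration, not part of the proposal:

```go
package volumemeta

import (
	"encoding/json"
	"os"
	"path/filepath"
)

// VolumeMetadata is a hypothetical record the node could write during
// MountDevice so kubelet can reconstruct the volume spec after a restart.
type VolumeMetadata struct {
	VolumeName string `json:"volumeName"` // PV name or in-line volume name
	PluginName string `json:"pluginName"` // e.g. "kubernetes.io/gce-pd"
	PodUID     string `json:"podUID"`     // owning pod, if the spec was in-line
}

// Save writes <volume name>.json into the plugin path, following the
// "Meta-data location" idea in the proposal text.
func Save(pluginPath string, meta VolumeMetadata) error {
	data, err := json.Marshal(meta)
	if err != nil {
		return err
	}
	return os.WriteFile(filepath.Join(pluginPath, meta.VolumeName+".json"), data, 0600)
}
```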
And would you please add a note that this will be stored only on nodes and not on the controller-manager?
Sure will do.
Can we just introduce a standard 'provider_info' key in JSON format and let the plugin determine what it needs to store there on its own? In addition, it would be up to the driver to reconnect if needed, and it would use that provider_info data.
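A small sketch of what such an opaque key could look like, building on the metadata record above; the struct and field names are hypothetical, not an existing API:

```go
package volumemeta

import "encoding/json"

// VolumeRecord keeps provider_info opaque to Kubernetes: the plugin/driver
// decides what to store there, how to version it, and how to use it when
// reconnecting. All field names here are illustrative.
type VolumeRecord struct {
	VolumeName   string          `json:"volumeName"`
	PluginName   string          `json:"pluginName"`
	ProviderInfo json.RawMessage `json:"provider_info,omitempty"` // plugin-defined blob
}
```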
CC @kubernetes/sig-storage-feature-requests @kubernetes/sig-storage-misc

I think option 3 is a non-starter. That will lead to fragmentation and runs counter to our end goal of giving the k8s kubelet a reliable way to reconstruct lost internal state. Option 1 is insufficient for all the information we may need, and even option 2 does not go far enough. The information we need for kubelet volume reconstruction is spread across 3 API objects:

* Pod
* PersistentVolumeClaim
* PersistentVolume

A 4th option I did not see proposed is to figure out what fields Kubernetes needs and store only those. I admit that would be more difficult to implement and harder to maintain (you need to worry about backwards compatibility, you need to find a way to encode volume-plugin-specific information, etc.). So, it sounds like the easiest option would be to just save the entirety of the involved objects (pod and PV) so that we are sure to have everything we need at reconstruction time. At a minimum we'd need to store some part of the Pod API object and the PV object (if not a direct-reference volume). But, like you pointed out, we need to worry about versioning, and the easiest way around that would be to store the entire object (including version). Not sure if this is kosher, however, especially for pod objects.
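To make the "store the entire object, including version" idea concrete, here is a minimal Go sketch that serializes a PersistentVolume with its apiVersion and kind so the version travels with the checkpointed data. The `SavePV` helper and file path are assumptions, and mutating TypeMeta in place is just a shortcut for the example:

```go
package checkpoint

import (
	"encoding/json"
	"os"

	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// SavePV checkpoints the whole PV object. Because TypeMeta is filled in,
// the written JSON records apiVersion and kind, which is what would let a
// newer kubelet convert the stored object when reading it back.
func SavePV(path string, pv *v1.PersistentVolume) error {
	pv.TypeMeta = metav1.TypeMeta{Kind: "PersistentVolume", APIVersion: "v1"}
	data, err := json.Marshal(pv)
	if err != nil {
		return err
	}
	return os.WriteFile(path, data, 0600)
}
```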
    
@saad-ali: These labels do not exist in this repository:

In response to this:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
    
@shilpamayanna PTAL
    
Thanks @saad-ali. I prefer storing the PV object too, but if the spec is inlined in the pod spec, we have to store the entire pod spec. I don't like that part. Let's pick a choice before it's too late. Btw, @shilpamayanna is working on implementing this support.
    
@saad-ali PTAL. Waiting on you for LGTM.
    
The freeze date for 1.7 was June 1, so we missed the boat. Let's prioritize it high for 1.8. In the meantime, please make sure that any bug fixes that need to go in for 1.7 to mitigate kubernetes/kubernetes#44737 are in the master/1.7 branches.
    
@saad-ali PTAL. Waiting on you for LGTM.
    
Spoke with @thockin for some ideas. His suggestion was to use Kubernetes Component Config, which gives us versioned APIs, backward compatibility, deprecation policies, documentation, etc., for arbitrary configuration files. He was also leaning towards a per-volume hook. My concern with a per-volume hook is whether it would be sufficient to capture all the information that the volume code would need (PV name, pod volume name, etc.). I feel like we may need to store the entire PV, PVC, and a lot (if not all) of the pod (volumes, mounts, etc.). Tim's suggestion was to store only what we need and create our own versioned API to do that. The problem with building our own API is that it is one more thing to maintain, and as the external k8s volume API (or internal volume state management) changes, we'd need to make changes (and maintain backwards compatibility) in this layer. @shilpamayanna what are your thoughts?
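For contrast with checkpointing whole API objects, a purpose-built versioned checkpoint type in the spirit of Component Config might look roughly like this. Every type and field name below is a hypothetical placeholder, not an existing Kubernetes API:

```go
package checkpoint

// VolumeCheckpoint is a sketch of a kubelet-owned, versioned format that
// stores only what volume reconstruction needs. Versioning it like an API
// (apiVersion/kind) is what gives a path for conversion and deprecation,
// at the cost of maintaining the format ourselves.
type VolumeCheckpoint struct {
	APIVersion string `json:"apiVersion"` // e.g. "volumecheckpoint.kubelet.k8s.io/v1alpha1"
	Kind       string `json:"kind"`       // "VolumeCheckpoint"

	PVName        string `json:"pvName,omitempty"`
	PVCName       string `json:"pvcName,omitempty"`
	PodUID        string `json:"podUID"`
	PodVolumeName string `json:"podVolumeName"`
	PluginName    string `json:"pluginName"`
}
```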
    
Kubelet checkpointing is a similar concept for all of kubelet: kubernetes/kubernetes#489
    
There are a few things I want to discuss based on the meeting.

1. Disable the reconstruct function for some volume plugins
I think disabling reconstructVolume for some plugins is still needed, so it would be better to disable reconstructVolume for glusterfs and some others (I will double check).

2. UniqueVolumeName issue
We also checked that for non-attachable volumes the unique volume name can actually be reconstructed correctly. It only uses the pod name, plugin name, and volume spec name, which are used in constructing the mount path: https://github.com/kubernetes/kubernetes/blob/886e04f1fffbb04faf8a9f9ee141143b2684ae68/pkg/volume/util/volumehelper/volumehelper.go#L66 For attachable volume plugins, most plugins work fine, e.g. GCE PD and AWS EBS. There was an issue for flex volume which is fixed by kubernetes/kubernetes#46136.

3. Reconstruct state at master controller
On the master controller side, we recover state from the node status "VolumesAttached" field, kubernetes/kubernetes#39732. There might be corner cases where the node status is not in sync with the real world.

Now I kind of wonder what important issues we haven't addressed. Please comment if I missed any points. Thanks!
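To illustrate the UniqueVolumeName point, here is an approximate Go sketch of rebuilding the name for a non-attachable volume from values that are already encoded in the mount path. The exact format lives in pkg/volume/util/volumehelper, so treat this as an approximation rather than the real helper:

```go
package reconstruct

import "fmt"

// uniqueVolumeNameForNonAttachable rebuilds a unique volume name from the
// pod UID, plugin name, and volume spec name, all of which kubelet can
// recover from the mount path after a restart.
func uniqueVolumeNameForNonAttachable(podUID, pluginName, volumeSpecName string) string {
	return fmt.Sprintf("%s/%v-%s", pluginName, podUID, volumeSpecName)
}
```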
    
For now, we will not implement GetVolumeName in flex volume, to avoid volumes getting unmounted, and we will update the flex volume documentation with the limitations of inline volumes in the pod spec.
    
This PR hasn't been active in 90 days. Closing this PR. Please reopen if you would like to work towards merging this change, if/when the PR is ready for the next round of review. You can add the 'keep-open' label to prevent this from happening again, or add a comment to keep it open for another 90 days.
    
    
Add proposal for volume spec reconstruction.
Addresses: kubernetes/kubernetes#44737