Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix resourcVersion = 0 in cacher #13910

Merged
merged 1 commit into from Sep 17, 2015

Conversation

wojtek-t
Copy link
Member

@k8s-bot
Copy link

k8s-bot commented Sep 14, 2015

GCE e2e build/test passed for commit c7b7f55baed2d02217289aacf39e03e3bbcd360c.

@liggitt
Copy link
Member

liggitt commented Sep 14, 2015

Does this restore the previous behavior where you get modified events for all objects that exist when the watch was started? That was what made watch from 0 useful

@liggitt
Copy link
Member

liggitt commented Sep 14, 2015

cc @stefwalter

@wojtek-t
Copy link
Member Author

Does this restore the previous behavior where you get modified events for all objects that exist when the watch was started?

@lavalamp
It doesn't (in fact I just noticed that it was the case before - btw it seems very misleading for me to have completely different behavior for resourceVersion = 0 and resourceVersion != 0).

But yeah... I will try to fix that.

@derekwaynecarr derekwaynecarr self-assigned this Sep 14, 2015
@derekwaynecarr
Copy link
Member

I spent more time debugging this to make sure I was not mistaken.

if I do the following direct to etcd on a running Kubernetes system:

curl '127.0.0.1:4001/v2/keys/registry?recursive=true&wait=true'

It waits until there was a node change event from Kubernetes, and only then returns the action for a Node update. I do not get any other data in the system. Meaning I do not get all the existing data.

I think the change proposed here actually is consistent with the behavior that we used to have.

@wojtek-t
Copy link
Member Author

@derekwaynecarr - I agree it's what etcd does (and I agree it's the most intuitive solution).
However, it's not what we had in v1.0 Kubernetes version (we had what @liggitt wrote above). So in my opinion we should restore it to the previous bahavior for backward compatibility. WDYT?

@derekwaynecarr
Copy link
Member

@wojtek-t - I am testing what we actually had in 1.0 right now, will comment on original issue.

@smarterclayton
Copy link
Contributor

We need to restore to the previous behavior - we can deprecate and remove
it for the v2 api if we can reach consensus for clients. In the future,
the event history of an object could be very very long, so there are some
advantages to moving away from the list pattern.

On Mon, Sep 14, 2015 at 10:33 AM, Wojciech Tyczynski <
notifications@github.com> wrote:

@derekwaynecarr https://github.com/derekwaynecarr - I agree it's what
etcd does (and I agree it's the most intuitive solution).
However, it's not what we had in v1.0 Kubernetes version (we had what
@liggitt https://github.com/liggitt wrote above). So in my opinion we
should restore it to the previous bahavior for backward compatibility. WDYT?


Reply to this email directly or view it on GitHub
#13910 (comment)
.

Clayton Coleman | Lead Engineer, OpenShift

@derekwaynecarr
Copy link
Member

I tested on all the stuff in question as I was confused myself tracking etcd and kube versions.

see #13809 (comment)

I think we need to special case resourceVersion=0 to internally kube does LIST and then WATCH from the last observed watch state to maintain consistency with 1.0 release.

@lavalamp
Copy link
Member

Yeah, this is not the old behavior. IMO this behavior makes more sense, but since the point of this is to maintain compatibility, we should stick to the old behavior, which was list, send each item individually, then watch. With no way for user to tell where one ended and the other began.

@k8s-github-robot k8s-github-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Sep 14, 2015
@k8s-github-robot
Copy link
Contributor

Labelling this PR as size/M

@wojtek-t
Copy link
Member Author

I agree - will fix that PR tomorrow.

@wojtek-t
Copy link
Member Author

I've updated the PR to be compatible with what we have in v1.0.

PTAL

@k8s-bot
Copy link

k8s-bot commented Sep 15, 2015

GCE e2e build/test passed for commit 0007794f68239f54b8803baeb08dba16f9e2f607.

@stefwalter
Copy link
Contributor

I can see items like this showing up from the watch while smoke testing this:

$ curl http://localhost:8080/api/v1/watch/nodes
{"type":"get","object":{"kind":"Node","apiVersion":"v1","metadata":{"name":"127.0.0.1","selfLink":"/api/v1/nodes/127.0.0.1","uid":"a3729acf-5b8f-11e5-86cb-10c37bdb8410","resourceVersion":"126","creationTimestamp":"2015-09-15T09:53:38Z","labels":{"kubernetes.io/hostname":"127.0.0.1"}},"spec":{"externalID":"127.0.0.1"},"status":{"capacity":{"cpu":"8","memory":"32817760Ki","pods":"40"},"conditions":[{"type":"Ready","status":"True","lastHeartbeatTime":"2015-09-15T09:58:09Z","lastTransitionTime":"2015-09-15T09:53:38Z","reason":"KubeletReady","message":"kubelet is posting ready status"}],"addresses":[{"type":"LegacyHostIP","address":"127.0.0.1"},{"type":"InternalIP","address":"127.0.0.1"}],"nodeInfo":{"machineID":"ef00b79be229463cbb844c3e715de96c\n\noxx","systemUUID":"60CAA732-DAD7-DD11-8C4C-10C37BDB8410","bootID":"cdfff127-3fe4-4b28-be19-8b344604dd0a","kernelVersion":"4.0.4-301.fc22.x86_64","osImage":"Fedora 22 (Twenty Two)","containerRuntimeVersion":"docker://1.7.1.fc22","kubeletVersion":"v1.1.0-alpha.1.722+0007794f68239f-dirty","kubeProxyVersion":"v1.1.0-alpha.1.722+0007794f68239f-dirty"}}}}
^C

I don't think "get" is a valid EventType. The docs say "the type of watch event; may be ADDED, MODIFIED, DELETED, or ERROR"

@wojtek-t
Copy link
Member Author

Yes - I initially thought that it was the case before. Should be fixed now.

@stefwalter
Copy link
Contributor

Works well now.

A tiny nit ... the previous behavior was to list the initial set of items with "MODIFIED" rather than "ADDED".

@k8s-bot
Copy link

k8s-bot commented Sep 15, 2015

GCE e2e build/test passed for commit fa1a4a5eca364a918b549ab80b14334b8ca14915.

// However, to keep backward compatibility, we additionally need to return the
// current state and only then start watching from that point.
//
// TODO: In v2 api, we should stop returning the current state
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Open an issue to track change of WATCH behavior and and link to it here.

Reference the issue on the umbrella after #8190

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@wojtek-t
Copy link
Member Author

@derekwaynecarr - PTAL

@k8s-bot
Copy link

k8s-bot commented Sep 15, 2015

GCE e2e build/test failed for commit 12eaf67.

@derekwaynecarr
Copy link
Member

@k8s-bot test this

// current state and only then start watching from that point.
//
// TODO: In v2 api, we should stop returning the current state - #13969.
allItems := w.store.List()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this list go out to hit etcd? I guess it doesn't matter since this feature can't be used safely anyway, but if it doesn't go out to hit etcd there's no guarantee the combined state returned by the list would be a complete/correct one as of any particular resource version.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe that the List() will work against the current state in the backing cache.Store and not hit etcd.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait, why is this feature not safe? The original implementation was safe -
we called etcd to do a list, then watched from the list timestamp. How is
the new version less able to deliver safety?

On Tue, Sep 15, 2015 at 1:30 PM, Daniel Smith notifications@github.com
wrote:

In pkg/storage/watch_cache.go
#13910 (comment)
:

@@ -250,6 +250,20 @@ func (w *watchCache) GetAllEventsSinceThreadUnsafe(resourceVersion uint64) ([]wa
if size > 0 {
oldest = w.cache[w.startIndex%w.capacity].resourceVersion
}

  • if resourceVersion == 0 {
  •   // resourceVersion = 0 means that we don't require any specific starting point
    
  •   // and we would like to start watching from ~now.
    
  •   // However, to keep backward compatibility, we additionally need to return the
    
  •   // current state and only then start watching from that point.
    
  •   //
    
  •   // TODO: In v2 api, we should stop returning the current state - #13969.
    
  •   allItems := w.store.List()
    

Does this list go out to hit etcd? I guess it doesn't matter since this
feature can't be used safely anyway, but if it doesn't go out to hit etcd
there's no guarantee the combined state returned by the list would be a
complete/correct one as of any particular resource version.


Reply to this email directly or view it on GitHub
https://github.com/kubernetes/kubernetes/pull/13910/files#r39539558.

Clayton Coleman | Lead Engineer, OpenShift

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I have the same question. I can tell @lavalamp doesn't like this method, but I haven't heard a concrete reason an internal list followed by watch of the list's resourceVersion is unsafe

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect he is referring to a potential race between the backing cache.Store and the watch cache. So if the current items are enumerated from the cache.store, and a watch is initiated, its possible that in the time between that a change in etcd was missed. I think to fix that you would need to know the resource version of the most recent item for when w.store.List() returned its value.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I say it's not "safe" because at no point in time can the client be certain it has a complete or correct list. I get that for many uses this safety isn't important.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is safe (due to how it is implemented).
Basically, the backing store contains a correct state at some point in time. The way it is implemented is that the cacher is locking watchCache, gets the state from that point in time (using list on the backing store) and starts watch from exactly that point. Every new watch event delivered to the backing store will be propagated to that watcher,
So there is no risk that some event will not be delivered.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I say it's not "safe" because at no point in time can the client be certain it has a complete or correct list. I get that for many uses this safety isn't important.

I don't understand it btw. It is guaranteed that this list is a correct list from some point in time (as long as we don't have multi-object transactions, which is true for now).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wojtek-t thank you for the clarification.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I see! Yes, this sounds correct. In theory, etcd would provide the same
list at some resourceVersion. Thanks for explaining.

On Tue, Sep 15, 2015 at 11:32 PM, Wojciech Tyczynski <
notifications@github.com> wrote:

In pkg/storage/watch_cache.go
#13910 (comment)
:

@@ -250,6 +250,20 @@ func (w *watchCache) GetAllEventsSinceThreadUnsafe(resourceVersion uint64) ([]wa
if size > 0 {
oldest = w.cache[w.startIndex%w.capacity].resourceVersion
}

  • if resourceVersion == 0 {
  •   // resourceVersion = 0 means that we don't require any specific starting point
    
  •   // and we would like to start watching from ~now.
    
  •   // However, to keep backward compatibility, we additionally need to return the
    
  •   // current state and only then start watching from that point.
    
  •   //
    
  •   // TODO: In v2 api, we should stop returning the current state - #13969.
    
  •   allItems := w.store.List()
    

It is safe (due to how it is implemented).
Basically, the backing store contains a correct state at some point in
time. The way it is implemented is that the cacher is locking watchCache,
gets the state from that point in time (using list on the backing store)
and starts watch from exactly that point. Every new watch event delivered
to the backing store will be propagated to that watcher,
So there is no risk that some event will not be delivered.


Reply to this email directly or view it on GitHub
https://github.com/kubernetes/kubernetes/pull/13910/files#r39597444.

@k8s-bot
Copy link

k8s-bot commented Sep 15, 2015

GCE e2e build/test passed for commit 12eaf67.

@derekwaynecarr derekwaynecarr added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 16, 2015
@derekwaynecarr
Copy link
Member

LGTM

@k8s-github-robot
Copy link
Contributor

@k8s-bot test this [submit-queue is verifying that this PR is safe to merge]

@k8s-bot
Copy link

k8s-bot commented Sep 17, 2015

GCE e2e build/test failed for commit 12eaf67.

@wojtek-t
Copy link
Member Author

@k8s-bot test this please

@k8s-bot
Copy link

k8s-bot commented Sep 17, 2015

GCE e2e build/test passed for commit 12eaf67.

@k8s-github-robot
Copy link
Contributor

@k8s-bot test this [submit-queue is verifying that this PR is safe to merge]

@k8s-bot
Copy link

k8s-bot commented Sep 17, 2015

GCE e2e build/test passed for commit 12eaf67.

@k8s-github-robot
Copy link
Contributor

Automatic merge from SubmitQueue

k8s-github-robot added a commit that referenced this pull request Sep 17, 2015
@k8s-github-robot k8s-github-robot merged commit 7cd7aae into kubernetes:master Sep 17, 2015
@liggitt liggitt mentioned this pull request Sep 18, 2015
13 tasks
@wojtek-t wojtek-t deleted the watch_resoure_version branch September 30, 2015 11:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
lgtm "Looks good to me", indicates that a PR is ready to be merged. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

10 participants