New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable changing times on volume attach/detach reconciling sync to fixing impact to AWS #39551

Merged
merged 2 commits into from Jan 10, 2017

Conversation

@chrislovecnm
Member

chrislovecnm commented Jan 6, 2017

#What this PR does / why we need it:

We are currently blocked by API timeouts with PV volumes. See #39526. This is a workaround, not a fix.

Special notes for your reviewer:

A second PR will be dropped with CLI cobra options in it, but we are starting with increasing the reconciliation periods. I am dropping this without major testing and will test on our AWS account. Will be marked WIP until I run smoke tests.

Release note:

Provide kubernetes-controller-manager flags to control volume attach/detach reconciler sync.  The duration of the syncs can be controlled, and the syncs can be shut off as well. 
@k8s-reviewable

This comment has been minimized.

Show comment
Hide comment
@k8s-reviewable

k8s-reviewable Jan 6, 2017

This change is Reviewable

k8s-reviewable commented Jan 6, 2017

This change is Reviewable

@patrickmcclory

this doesn't actually fix the underlying problem but will unblock users who are seeing AWS API call flooding due to volume checks.

@chrislovecnm

This comment has been minimized.

Show comment
Hide comment
@chrislovecnm

chrislovecnm Jan 6, 2017

Member

Will let this build run, and then wire in the cobra options. @saad-ali has asked for both in the same PR.

Member

chrislovecnm commented Jan 6, 2017

Will let this build run, and then wire in the cobra options. @saad-ali has asked for both in the same PR.

@chrislovecnm

This comment has been minimized.

Show comment
Hide comment
@chrislovecnm

chrislovecnm Jan 7, 2017

Member

A couple of unit tests are failing on osx. I am going to run some build tests tonight. Also need to update openapi.

Member

chrislovecnm commented Jan 7, 2017

A couple of unit tests are failing on osx. I am going to run some build tests tonight. Also need to update openapi.

@chrislovecnm

This comment has been minimized.

Show comment
Hide comment
@chrislovecnm

chrislovecnm Jan 7, 2017

Member

And I broke the build. Will fix on linux ... SGTM

Member

chrislovecnm commented Jan 7, 2017

And I broke the build. Will fix on linux ... SGTM

@chrislovecnm

This comment has been minimized.

Show comment
Hide comment
@chrislovecnm

chrislovecnm Jan 7, 2017

Member

Review please @kris-nova

Member

chrislovecnm commented Jan 7, 2017

Review please @kris-nova

@chrislovecnm

This comment has been minimized.

Show comment
Hide comment
@chrislovecnm

chrislovecnm Jan 7, 2017

Member

@jingxu97 / @saad-ali I noticed that ebf796a#diff-2b533d4422a255f4bd8521394e285679 shows that we have openapi auto-generated. What is the magic command for that??

Member

chrislovecnm commented Jan 7, 2017

@jingxu97 / @saad-ali I noticed that ebf796a#diff-2b533d4422a255f4bd8521394e285679 shows that we have openapi auto-generated. What is the magic command for that??

// reconcilerSyncDuration is the amount of time the reconciler sync states loop
// wait between successive executions
reconcilerSyncDuration time.Duration = 5 * time.Second

This comment has been minimized.

@liggitt

liggitt Jan 7, 2017

Member

Is this changing from 5 seconds to 5 minutes?

@liggitt

liggitt Jan 7, 2017

Member

Is this changing from 5 seconds to 5 minutes?

This comment has been minimized.

@chrislovecnm

chrislovecnm Jan 7, 2017

Member

Yes, and @saad-ali are discussing it. @saad-ali is recommending 1 min, and my question is what is the implication of 5 min. Read the referencing issue for more information why we are backing this off.

@chrislovecnm

chrislovecnm Jan 7, 2017

Member

Yes, and @saad-ali are discussing it. @saad-ali is recommending 1 min, and my question is what is the implication of 5 min. Read the referencing issue for more information why we are backing this off.

This comment has been minimized.

@liggitt

liggitt Jan 7, 2017

Member

If reconciliation has varying cost for different volume types, is a single tuning interval across all types what we want?

@liggitt

liggitt Jan 7, 2017

Member

If reconciliation has varying cost for different volume types, is a single tuning interval across all types what we want?

This comment has been minimized.

@jingxu97

jingxu97 Jan 7, 2017

Contributor

make reconcilerSyncDuration time very long might cause problem discussed in issue #33760 when state is out of sync. It explained the main reason we added sync loop here. Since this PR makes configurable, I think 1 min is good enough for most of the cases. This is a tradeoff between overhead and reducing the window of state out sync.

@jingxu97

jingxu97 Jan 7, 2017

Contributor

make reconcilerSyncDuration time very long might cause problem discussed in issue #33760 when state is out of sync. It explained the main reason we added sync loop here. Since this PR makes configurable, I think 1 min is good enough for most of the cases. This is a tradeoff between overhead and reducing the window of state out sync.

This comment has been minimized.

@chrislovecnm

chrislovecnm Jan 7, 2017

Member

What would you recommend @liggitt? We are planning to drop this in as an emergency release on Tuesday.

As I mentioned in #39526

TLDR;

Anyone that is using 1.4.6 or 1.4.7 in AWS will exceed their rate limits with as little as 20 PV attached to a cluster. We have one account that is at about 24k API calls per hour because of timeouts. This makes PV unusable on AWS.

@chrislovecnm

chrislovecnm Jan 7, 2017

Member

What would you recommend @liggitt? We are planning to drop this in as an emergency release on Tuesday.

As I mentioned in #39526

TLDR;

Anyone that is using 1.4.6 or 1.4.7 in AWS will exceed their rate limits with as little as 20 PV attached to a cluster. We have one account that is at about 24k API calls per hour because of timeouts. This makes PV unusable on AWS.

This comment has been minimized.

@smarterclayton

smarterclayton Jan 9, 2017

Contributor

I agree with Jordan, as a concerned lead I have no idea what the impacts or risks to this change are, and all I'm seeing is "might", "could", and "flake" on both sides of the fix / existing state.

Anything that needs to get cherry-picked that is important had better have a clear definition so people can understand them.

Please add that - removing the label until someone can summarize.

@smarterclayton

smarterclayton Jan 9, 2017

Contributor

I agree with Jordan, as a concerned lead I have no idea what the impacts or risks to this change are, and all I'm seeing is "might", "could", and "flake" on both sides of the fix / existing state.

Anything that needs to get cherry-picked that is important had better have a clear definition so people can understand them.

Please add that - removing the label until someone can summarize.

This comment has been minimized.

@smarterclayton

smarterclayton Jan 9, 2017

Contributor

Specifically, a comment was made:

this increases the change of mounting the wrong volume to the pod

which is incredibly terrifying.

@smarterclayton

smarterclayton Jan 9, 2017

Contributor

Specifically, a comment was made:

this increases the change of mounting the wrong volume to the pod

which is incredibly terrifying.

This comment has been minimized.

@gnufied

gnufied Jan 9, 2017

Member

The wrong volume will be only mounted if:

  1. Volume is detached from pod, outside of k8s system (like AWS console)
  2. Node is rebooted.

#1 is not a huge concern because - usually we don't recommend users to detach volumes attached to nodes/pods.

#2 is bit of a problem and I am damned for saying this - but as long as time it takes for a reboot to complete is more than time specified here - we would be perhaps okay. But even original fix isn't a complete fix for #33760 as commented by @jingxu97

@gnufied

gnufied Jan 9, 2017

Member

The wrong volume will be only mounted if:

  1. Volume is detached from pod, outside of k8s system (like AWS console)
  2. Node is rebooted.

#1 is not a huge concern because - usually we don't recommend users to detach volumes attached to nodes/pods.

#2 is bit of a problem and I am damned for saying this - but as long as time it takes for a reboot to complete is more than time specified here - we would be perhaps okay. But even original fix isn't a complete fix for #33760 as commented by @jingxu97

This comment has been minimized.

@smarterclayton

smarterclayton Jan 9, 2017

Contributor

Was that an "OR" or an "AND"? If it's an AND, that resolves my concern (because I agree 1 is PIBKAC). If it's an OR, then that's a really important thing to document, and very scary, and should be part of the flag documentation.

@smarterclayton

smarterclayton Jan 9, 2017

Contributor

Was that an "OR" or an "AND"? If it's an AND, that resolves my concern (because I agree 1 is PIBKAC). If it's an OR, then that's a really important thing to document, and very scary, and should be part of the flag documentation.

This comment has been minimized.

@jingxu97

jingxu97 Jan 9, 2017

Contributor

It is an "OR". @gnufied lists two possible scenarios that the problem might happen.

@jingxu97

jingxu97 Jan 9, 2017

Contributor

It is an "OR". @gnufied lists two possible scenarios that the problem might happen.

Show outdated Hide outdated pkg/apis/componentconfig/types.go
Show outdated Hide outdated pkg/apis/componentconfig/types.go
// reconcilerSyncDuration is the amount of time the reconciler sync states loop
// wait between successive executions
reconcilerSyncDuration time.Duration = 5 * time.Second

This comment has been minimized.

@chrislovecnm

chrislovecnm Jan 7, 2017

Member

Yes, and @saad-ali are discussing it. @saad-ali is recommending 1 min, and my question is what is the implication of 5 min. Read the referencing issue for more information why we are backing this off.

@chrislovecnm

chrislovecnm Jan 7, 2017

Member

Yes, and @saad-ali are discussing it. @saad-ali is recommending 1 min, and my question is what is the implication of 5 min. Read the referencing issue for more information why we are backing this off.

Show outdated Hide outdated pkg/controller/volume/attachdetach/reconciler/reconciler.go
@chrislovecnm

This comment has been minimized.

Show comment
Hide comment
@chrislovecnm

chrislovecnm Jan 7, 2017

Member

@k8s-bot unit test this

Member

chrislovecnm commented Jan 7, 2017

@k8s-bot unit test this

@chrislovecnm

This comment has been minimized.

Show comment
Hide comment
@chrislovecnm

chrislovecnm Jan 7, 2017

Member

@saad-ali / @jingxu97 I am leaving this in WIP, until I run some base smoke test tomorrow. Will update the issue when I would like a final review.

Member

chrislovecnm commented Jan 7, 2017

@saad-ali / @jingxu97 I am leaving this in WIP, until I run some base smoke test tomorrow. Will update the issue when I would like a final review.

@jingxu97

This comment has been minimized.

Show comment
Hide comment
@jingxu97

jingxu97 Jan 7, 2017

Contributor
Contributor

jingxu97 commented Jan 7, 2017

@chrislovecnm

This comment has been minimized.

Show comment
Hide comment
@chrislovecnm

chrislovecnm Jan 7, 2017

Member

Review status: 0 of 8 files reviewed at latest revision, 10 unresolved discussions.


pkg/controller/volume/attachdetach/attach_detach_controller.go, line 68 at r1 (raw file):

Previously, chrislovecnm (Chris Love) wrote…

@saad-ali that is the second PR that I wanted to drop. Since I am not super familiar with wiring in the cobra options.

Updated PR with cobra comments.


Comments from Reviewable

Member

chrislovecnm commented Jan 7, 2017

Review status: 0 of 8 files reviewed at latest revision, 10 unresolved discussions.


pkg/controller/volume/attachdetach/attach_detach_controller.go, line 68 at r1 (raw file):

Previously, chrislovecnm (Chris Love) wrote…

@saad-ali that is the second PR that I wanted to drop. Since I am not super familiar with wiring in the cobra options.

Updated PR with cobra comments.


Comments from Reviewable

@chrislovecnm

This comment has been minimized.

Show comment
Hide comment
@chrislovecnm

chrislovecnm Jan 7, 2017

Member

Thanks!


Review status: 0 of 8 files reviewed at latest revision, 10 unresolved discussions.


Comments from Reviewable

Member

chrislovecnm commented Jan 7, 2017

Thanks!


Review status: 0 of 8 files reviewed at latest revision, 10 unresolved discussions.


Comments from Reviewable

@chrislovecnm

This comment has been minimized.

Show comment
Hide comment
@chrislovecnm

chrislovecnm Jan 7, 2017

Member

Review status: 0 of 8 files reviewed at latest revision, 10 unresolved discussions.


pkg/apis/componentconfig/types.go, line 779 at r3 (raw file):

Previously, chrislovecnm (Chris Love) wrote…

@saad-ali care to comment?

Jin responded to this.


Comments from Reviewable

Member

chrislovecnm commented Jan 7, 2017

Review status: 0 of 8 files reviewed at latest revision, 10 unresolved discussions.


pkg/apis/componentconfig/types.go, line 779 at r3 (raw file):

Previously, chrislovecnm (Chris Love) wrote…

@saad-ali care to comment?

Jin responded to this.


Comments from Reviewable

@chrislovecnm

This comment has been minimized.

Show comment
Hide comment
@chrislovecnm

chrislovecnm Jan 7, 2017

Member

Review status: 0 of 8 files reviewed at latest revision, 10 unresolved discussions, some commit checks failed.


cmd/cloud-controller-manager/app/options/options.go, line 84 at r4 (raw file):

Previously, liggitt (Jordan Liggitt) wrote…

Yes, follow the patten of the existing vars. Pass the current value in as the default for the flag. Define the default you want in NewCloudControllerManagerServer

Fixed. Committing after e2e.


Comments from Reviewable

Member

chrislovecnm commented Jan 7, 2017

Review status: 0 of 8 files reviewed at latest revision, 10 unresolved discussions, some commit checks failed.


cmd/cloud-controller-manager/app/options/options.go, line 84 at r4 (raw file):

Previously, liggitt (Jordan Liggitt) wrote…

Yes, follow the patten of the existing vars. Pass the current value in as the default for the flag. Define the default you want in NewCloudControllerManagerServer

Fixed. Committing after e2e.


Comments from Reviewable

@jingxu97

This comment has been minimized.

Show comment
Hide comment
@jingxu97

jingxu97 Jan 7, 2017

Contributor
Contributor

jingxu97 commented Jan 7, 2017

@chrislovecnm

This comment has been minimized.

Show comment
Hide comment
@chrislovecnm

chrislovecnm Jan 7, 2017

Member

@k8s-bot unit test this

Member

chrislovecnm commented Jan 7, 2017

@k8s-bot unit test this

@chrislovecnm

This comment has been minimized.

Show comment
Hide comment
@chrislovecnm

chrislovecnm Jan 7, 2017

Member

@jingxu97 / @saad-ali & et al - aws e2e is failing on PetSets/StatefulSets, which is a bit concerning. Trying to get a test running.

I do need some help figuring out what the unit/integration tests are not happy. They are passing locally.

Member

chrislovecnm commented Jan 7, 2017

@jingxu97 / @saad-ali & et al - aws e2e is failing on PetSets/StatefulSets, which is a bit concerning. Trying to get a test running.

I do need some help figuring out what the unit/integration tests are not happy. They are passing locally.

@chrislovecnm

This comment has been minimized.

Show comment
Hide comment
@chrislovecnm

chrislovecnm Jan 8, 2017

Member

@saad-ali 1 min still seems a bit high. Can we look at 5 min? What is the consequence of 5 min?

Member

chrislovecnm commented Jan 8, 2017

@saad-ali 1 min still seems a bit high. Can we look at 5 min? What is the consequence of 5 min?

@chrislovecnm

This comment has been minimized.

Show comment
Hide comment
@chrislovecnm

chrislovecnm Jan 8, 2017

Member

Review status: 0 of 9 files reviewed at latest revision, 11 unresolved discussions.


pkg/controller/volume/attachdetach/reconciler/reconciler.go, line 101 at r2 (raw file):

Previously, chrislovecnm (Chris Love) wrote…

Fixed

Actually, I had a bug, thank goodness for AWS e2e, it caught that I updated the wrong cobra options file and the cloud-controller-manager. Now it is passing! AWS e2e for the win.


Comments from Reviewable

Member

chrislovecnm commented Jan 8, 2017

Review status: 0 of 9 files reviewed at latest revision, 11 unresolved discussions.


pkg/controller/volume/attachdetach/reconciler/reconciler.go, line 101 at r2 (raw file):

Previously, chrislovecnm (Chris Love) wrote…

Fixed

Actually, I had a bug, thank goodness for AWS e2e, it caught that I updated the wrong cobra options file and the cloud-controller-manager. Now it is passing! AWS e2e for the win.


Comments from Reviewable

@chrislovecnm

This comment has been minimized.

Show comment
Hide comment
@chrislovecnm

chrislovecnm Jan 8, 2017

Member

Review status: 0 of 9 files reviewed at latest revision, 11 unresolved discussions.


cmd/cloud-controller-manager/app/options/options.go, line 84 at r2 (raw file):

Previously, chrislovecnm (Chris Love) wrote…

Understand that, but I alway look for the big red button. We are shutting down important code, and accidentally shutting it down would not be nice.

You ok with this? What are your thoughts?


Comments from Reviewable

Member

chrislovecnm commented Jan 8, 2017

Review status: 0 of 9 files reviewed at latest revision, 11 unresolved discussions.


cmd/cloud-controller-manager/app/options/options.go, line 84 at r2 (raw file):

Previously, chrislovecnm (Chris Love) wrote…

Understand that, but I alway look for the big red button. We are shutting down important code, and accidentally shutting it down would not be nice.

You ok with this? What are your thoughts?


Comments from Reviewable

@chrislovecnm

This comment has been minimized.

Show comment
Hide comment
@chrislovecnm

chrislovecnm Jan 8, 2017

Member

Review status: 0 of 9 files reviewed at latest revision, 11 unresolved discussions.


cmd/cloud-controller-manager/app/options/options.go, line 85 at r5 (raw file):

Previously, liggitt (Jordan Liggitt) wrote…

Make it a duration, which you set with a unit (e.g. "1m", etc)

Done, and we have a check that it cannot be < 1 second. Which is insane small amount of time, but we are checking for it so that we don't create a tight loop.


Comments from Reviewable

Member

chrislovecnm commented Jan 8, 2017

Review status: 0 of 9 files reviewed at latest revision, 11 unresolved discussions.


cmd/cloud-controller-manager/app/options/options.go, line 85 at r5 (raw file):

Previously, liggitt (Jordan Liggitt) wrote…

Make it a duration, which you set with a unit (e.g. "1m", etc)

Done, and we have a check that it cannot be < 1 second. Which is insane small amount of time, but we are checking for it so that we don't create a tight loop.


Comments from Reviewable

@chrislovecnm

This comment has been minimized.

Show comment
Hide comment
@chrislovecnm

chrislovecnm Jan 8, 2017

Member

Reviewed 4 of 8 files at r4, 1 of 3 files at r5, 1 of 1 files at r6, 4 of 4 files at r7.
Review status: all files reviewed at latest revision, 11 unresolved discussions.


Comments from Reviewable

Member

chrislovecnm commented Jan 8, 2017

Reviewed 4 of 8 files at r4, 1 of 3 files at r5, 1 of 1 files at r6, 4 of 4 files at r7.
Review status: all files reviewed at latest revision, 11 unresolved discussions.


Comments from Reviewable

@chrislovecnm chrislovecnm changed the title from [WIP] Increasing times on reconciling volumes fixing impact to AWS. to Increasing times on reconciling volumes fixing impact to AWS. Jan 8, 2017

@chrislovecnm

This comment has been minimized.

Show comment
Hide comment
@chrislovecnm

chrislovecnm Jan 8, 2017

Member

@saad-ali this is no longer WIP. We need to discuss 1 vs 5 min, before we merge. I have tested on a cluster we 70 pvc mounted. I need to cherry pick this myself into 1.4.x stream and run more tests. 1.6 is running like a champ.

Member

chrislovecnm commented Jan 8, 2017

@saad-ali this is no longer WIP. We need to discuss 1 vs 5 min, before we merge. I have tested on a cluster we 70 pvc mounted. I need to cherry pick this myself into 1.4.x stream and run more tests. 1.6 is running like a champ.

@gnufied

This comment has been minimized.

Show comment
Hide comment
@gnufied

gnufied Jan 9, 2017

Member

That 2x number of calls is a BUG imo and it affects only AWS. getAWSInstance() has same details as e.describeInstance - but we may not have to fix that since @justinsb already opened a PR that removes all that stuff.

Member

gnufied commented Jan 9, 2017

That 2x number of calls is a BUG imo and it affects only AWS. getAWSInstance() has same details as e.describeInstance - but we may not have to fix that since @justinsb already opened a PR that removes all that stuff.

@gnufied

This comment has been minimized.

Show comment
Hide comment
@gnufied

gnufied Jan 9, 2017

Member

Option 1: We leave the sync rate at 5 seconds, and we add the flag to allow people to customize it for > their own deployments. They can increase it at their own risk.
Pro: Current default deployment remains in a "safe" state.
Con: Any cluster of significant enough size hits quota limit.

I like this option a lot.

We have had multiple reports that as few as 20 PV start to cause API limit errors in AWS. So this approach will cause anyone in AWS issues.

That may be so, but we can point people to relevant documentation about - how to fix it (i.e how to increase default sync duration). And this workaround may not be required once #39564 lands.

Member

gnufied commented Jan 9, 2017

Option 1: We leave the sync rate at 5 seconds, and we add the flag to allow people to customize it for > their own deployments. They can increase it at their own risk.
Pro: Current default deployment remains in a "safe" state.
Con: Any cluster of significant enough size hits quota limit.

I like this option a lot.

We have had multiple reports that as few as 20 PV start to cause API limit errors in AWS. So this approach will cause anyone in AWS issues.

That may be so, but we can point people to relevant documentation about - how to fix it (i.e how to increase default sync duration). And this workaround may not be required once #39564 lands.

@saad-ali

This comment has been minimized.

Show comment
Hide comment
@saad-ali

saad-ali Jan 9, 2017

Member

Meeting to discuss options at 1:30 PM PT
If you didn't get the invite and want to join, hangout link will be: https://plus.google.com/hangouts/_/google.com/k8s-aws-storage

Member

saad-ali commented Jan 9, 2017

Meeting to discuss options at 1:30 PM PT
If you didn't get the invite and want to join, hangout link will be: https://plus.google.com/hangouts/_/google.com/k8s-aws-storage

@chrislovecnm

This comment has been minimized.

Show comment
Hide comment
@chrislovecnm

chrislovecnm Jan 9, 2017

Member

@saad-ali per the meeting we are going to set the default time to 5s, and let install tools set it higher. We are in the process of deprecating kube-up.sh support for AWS anyway, so we probably will let that lie.

Do you want me to rename the flag and variable as well

	fs.DurationVar(&s.ReconcilerSyncLoopPeriod.Duration, "reconcile-sync-loop-period", s.ReconcilerSyncLoopPeriod.Duration, "The reconciler sync wait time between volume attach detach.")

We did not have the correct name in it.

Member

chrislovecnm commented Jan 9, 2017

@saad-ali per the meeting we are going to set the default time to 5s, and let install tools set it higher. We are in the process of deprecating kube-up.sh support for AWS anyway, so we probably will let that lie.

Do you want me to rename the flag and variable as well

	fs.DurationVar(&s.ReconcilerSyncLoopPeriod.Duration, "reconcile-sync-loop-period", s.ReconcilerSyncLoopPeriod.Duration, "The reconciler sync wait time between volume attach detach.")

We did not have the correct name in it.

@liggitt

This comment has been minimized.

Show comment
Hide comment
@liggitt

liggitt Jan 9, 2017

Member

Do you want me to rename the flag and variable as well

Yes, please, especially the CLI flag that is getting exposed. Both should include the attach/detach qualifier (maybe something like --attach-detach-sync-period?)

Member

liggitt commented Jan 9, 2017

Do you want me to rename the flag and variable as well

Yes, please, especially the CLI flag that is getting exposed. Both should include the attach/detach qualifier (maybe something like --attach-detach-sync-period?)

@chrislovecnm

This comment has been minimized.

Show comment
Hide comment
@chrislovecnm

chrislovecnm Jan 9, 2017

Member

Yes, please, especially the CLI flag that is getting exposed. Both should include the attach/detach qualifier (maybe something like --attach-detach-sync-period?)

We have
--disable-attach-detach-reconcile
So
--attach-detach-reconcile-period
Makes sense to me.

I am not going to update the variable names as frankly, I don't want to touch this code much. I will update the flag name, the duration, and the flag options hack file.

Member

chrislovecnm commented Jan 9, 2017

Yes, please, especially the CLI flag that is getting exposed. Both should include the attach/detach qualifier (maybe something like --attach-detach-sync-period?)

We have
--disable-attach-detach-reconcile
So
--attach-detach-reconcile-period
Makes sense to me.

I am not going to update the variable names as frankly, I don't want to touch this code much. I will update the flag name, the duration, and the flag options hack file.

@saad-ali

This comment has been minimized.

Show comment
Hide comment
@saad-ali

saad-ali Jan 9, 2017

Member

Do you want me to rename the flag and variable as well
Yes, please, especially the CLI flag that is getting exposed. Both should include the attach/detach qualifier (maybe something like --attach-detach-sync-period?)

+1

We have
--disable-attach-detach-reconcile
So
--attach-detach-reconcile-period
Makes sense to me.

I am not going to update the variable names as frankly, I don't want to touch this code much. I will update the flag name, the duration, and the flag options hack file.

Should include the word sync in both (not just reconcile).

Member

saad-ali commented Jan 9, 2017

Do you want me to rename the flag and variable as well
Yes, please, especially the CLI flag that is getting exposed. Both should include the attach/detach qualifier (maybe something like --attach-detach-sync-period?)

+1

We have
--disable-attach-detach-reconcile
So
--attach-detach-reconcile-period
Makes sense to me.

I am not going to update the variable names as frankly, I don't want to touch this code much. I will update the flag name, the duration, and the flag options hack file.

Should include the word sync in both (not just reconcile).

@chrislovecnm

This comment has been minimized.

Show comment
Hide comment
@chrislovecnm

chrislovecnm Jan 9, 2017

Member

I am noticing that with a build locally
pkg/generated/bindata.go
Is claiming that it needs a commit. I assume that I have missed that file?

Member

chrislovecnm commented Jan 9, 2017

I am noticing that with a build locally
pkg/generated/bindata.go
Is claiming that it needs a commit. I assume that I have missed that file?

@chrislovecnm

This comment has been minimized.

Show comment
Hide comment
@chrislovecnm

chrislovecnm Jan 9, 2017

Member

@saad-ali I have a couple of other tweaks I will push in after this. Fixing times on two tests, and removing dead code in reconciler. I am letting e2e run now, but will be in shortly.

Member

chrislovecnm commented Jan 9, 2017

@saad-ali I have a couple of other tweaks I will push in after this. Fixing times on two tests, and removing dead code in reconciler. I am letting e2e run now, but will be in shortly.

@saad-ali

This comment has been minimized.

Show comment
Hide comment
@saad-ali

saad-ali Jan 9, 2017

Member

@saad-ali I have a couple of other tweaks I will push in after this. Fixing times on two tests, and removing dead code in reconciler. I am letting e2e run now, but will be in shortly.

Ack.

Member

saad-ali commented Jan 9, 2017

@saad-ali I have a couple of other tweaks I will push in after this. Fixing times on two tests, and removing dead code in reconciler. I am letting e2e run now, but will be in shortly.

Ack.

The capability to control duration via controller-manager flags,
and the option to shut off reconciliation.
@chrislovecnm

This comment has been minimized.

Show comment
Hide comment
@chrislovecnm

chrislovecnm Jan 9, 2017

Member

@saad-ali last e2e was successful, now waiting on this one to run. The release notes are up to date now, and the changes that were requested are in.

Member

chrislovecnm commented Jan 9, 2017

@saad-ali last e2e was successful, now waiting on this one to run. The release notes are up to date now, and the changes that were requested are in.

@saad-ali

This comment has been minimized.

Show comment
Hide comment
@saad-ali

saad-ali Jan 9, 2017

Member

Cool, I'll take another look

Member

saad-ali commented Jan 9, 2017

Cool, I'll take another look

@k8s-ci-robot

This comment has been minimized.

Show comment
Hide comment
@k8s-ci-robot

k8s-ci-robot Jan 10, 2017

Contributor

Jenkins unit/integration failed for commit a973c38. Full PR test history.

The magic incantation to run this job again is @k8s-bot unit test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

Contributor

k8s-ci-robot commented Jan 10, 2017

Jenkins unit/integration failed for commit a973c38. Full PR test history.

The magic incantation to run this job again is @k8s-bot unit test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@chrislovecnm

This comment has been minimized.

Show comment
Hide comment
@chrislovecnm

chrislovecnm Jan 10, 2017

Member

https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/pr-logs/pull/39551/pull-kubernetes-unit/12293/ <- same open API stuff that was failing over the weekend. No idea what is up

Member

chrislovecnm commented Jan 10, 2017

https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/pr-logs/pull/39551/pull-kubernetes-unit/12293/ <- same open API stuff that was failing over the weekend. No idea what is up

@k8s-ci-robot

This comment has been minimized.

Show comment
Hide comment
@k8s-ci-robot

k8s-ci-robot Jan 10, 2017

Contributor

Jenkins Bazel Build failed for commit ac49139. Full PR test history.

The magic incantation to run this job again is @k8s-bot bazel test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

Contributor

k8s-ci-robot commented Jan 10, 2017

Jenkins Bazel Build failed for commit ac49139. Full PR test history.

The magic incantation to run this job again is @k8s-bot bazel test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@saad-ali

This comment has been minimized.

Show comment
Hide comment
@saad-ali

saad-ali Jan 10, 2017

Member

https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/pr-logs/pull/39551/pull-kubernetes-unit/12293/ <- same open API stuff that was failing over the weekend. No idea what is up

Failing UT issue looks like #39604 which is a marked a flake, so it should pass if we run it often enough.

Member

saad-ali commented Jan 10, 2017

https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/pr-logs/pull/39551/pull-kubernetes-unit/12293/ <- same open API stuff that was failing over the weekend. No idea what is up

Failing UT issue looks like #39604 which is a marked a flake, so it should pass if we run it often enough.

@saad-ali

This comment has been minimized.

Show comment
Hide comment
@saad-ali

saad-ali Jan 10, 2017

Member

@k8s-bot bazel test this

Member

saad-ali commented Jan 10, 2017

@k8s-bot bazel test this

@saad-ali

This comment has been minimized.

Show comment
Hide comment
@saad-ali

saad-ali Jan 10, 2017

Member

/lgtm

Member

saad-ali commented Jan 10, 2017

/lgtm

@saad-ali

This comment has been minimized.

Show comment
Hide comment
@saad-ali

saad-ali Jan 10, 2017

Member

Marking P0 to get merged ASAP for tomorrow's release.

Member

saad-ali commented Jan 10, 2017

Marking P0 to get merged ASAP for tomorrow's release.

@k8s-merge-robot

This comment has been minimized.

Show comment
Hide comment
@k8s-merge-robot

k8s-merge-robot Jan 10, 2017

Contributor

Automatic merge from submit-queue (batch tested with PRs 39628, 39551, 38746, 38352, 39607)

Contributor

k8s-merge-robot commented Jan 10, 2017

Automatic merge from submit-queue (batch tested with PRs 39628, 39551, 38746, 38352, 39607)

@k8s-merge-robot k8s-merge-robot merged commit 7c3fff1 into kubernetes:master Jan 10, 2017

14 of 15 checks passed

code-review/reviewable 9 files, 23 discussions left (chrislovecnm, gnufied, jingxu97, kris-nova, liggitt, saad-ali)
Details
Jenkins Bazel Build Build succeeded.
Details
Jenkins CRI GCE Node e2e Build succeeded.
Details
Jenkins GCE Node e2e Build succeeded.
Details
Jenkins GCE e2e Build succeeded.
Details
Jenkins GCE etcd3 e2e Build succeeded.
Details
Jenkins GCI GCE e2e Build succeeded.
Details
Jenkins GCI GKE smoke e2e Build succeeded.
Details
Jenkins GKE smoke e2e Build succeeded.
Details
Jenkins Kubemark GCE e2e Build succeeded.
Details
Jenkins kops AWS e2e Build succeeded.
Details
Jenkins unit/integration Build succeeded.
Details
Jenkins verification Build succeeded.
Details
Submit Queue Queued to run github e2e tests a second time.
Details
cla/linuxfoundation chrislovecnm authorized
Details
@saad-ali

This comment has been minimized.

Show comment
Hide comment
@saad-ali

saad-ali Jan 10, 2017

Member

Chris this is merged. Can you please prepare the cherry pick to 1.5 and once that is complete we can do the 1.4 cherry pick

Member

saad-ali commented Jan 10, 2017

Chris this is merged. Can you please prepare the cherry pick to 1.5 and once that is complete we can do the 1.4 cherry pick

jessfraz added a commit that referenced this pull request Jan 11, 2017

Merge pull request #39700 from saad-ali/automated-cherry-pick-of-#39551
…-upstream-release-1.4

Automated cherry pick of #39551 upstream release 1.4

k8s-merge-robot added a commit that referenced this pull request Jan 11, 2017

Merge pull request #39697 from saad-ali/automated-cherry-pick-of-#39551
…-upstream-release-1.5

Automatic merge from submit-queue

Automated cherry pick of #39551 upstream release 1.5

Automated cherry pick of #39551 ("Increasing times on reconciling volumes fixing impact to AWS") to upstream release 1.5

@saad-ali saad-ali changed the title from Increasing times on reconciling volumes fixing impact to AWS. to Enable changing times on volume attach/detach reconciling sync to fixing impact to AWS Jan 12, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment