Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added flannel migration FV. #406

Merged
merged 1 commit into from Sep 10, 2019

Conversation

@song-jiang
Copy link
Member

song-jiang commented Sep 8, 2019

Description

Added Flannel migration FV.

Todos

  • Tests
  • Documentation
  • Release note

Release Note

None required
@@ -107,8 +107,8 @@ func (m ipamMigrator) InitialiseIPPoolAndFelixConfig() error {
}

// Create Calico IPAM blocks for a Kubernetes node.
func (m ipamMigrator) SetupCalicoIPAMForNode(node *v1.Node) error {
log.Infof("Start setting up Calico IPAM for node %s.", node.Name)
func (m ipamMigrator) SetupCalicoIPAMForNode(node *v1.Node, index int) error {

This comment has been minimized.

Copy link
@caseydavenport

caseydavenport Sep 9, 2019

Member

Is passing the index just for logging purposes?

This comment has been minimized.

Copy link
@song-jiang

song-jiang Sep 9, 2019

Author Member

It is for log and FV. I got an FV test case to check node ordering by capturing node xxx[index 2] for example.

@@ -214,7 +231,7 @@ func (c *flannelMigrationController) processNewNode(node *v1.Node) {
}

log.Infof("Start processing new node %s.", node.Name)
err = c.ipamMigrator.SetupCalicoIPAMForNode(node)
err = c.ipamMigrator.SetupCalicoIPAMForNode(node, -1)

This comment has been minimized.

Copy link
@caseydavenport

caseydavenport Sep 9, 2019

Member

Isn't this going to produce weird log messages which say we're processing index -1?

I feel like this will be confusing for users and future readers of this code. At a minimum we should comment what this value means and why it is here, but if possible I think it would be good to do something more clear (like moving that log with the index out of SetupCalicoIPAMForNode so the calling code can log out an appropriate message).

This comment has been minimized.

Copy link
@song-jiang

song-jiang Sep 10, 2019

Author Member

Agree. It is a bit hacky to do that. I will make changes as you suggested.

func (m *networkMigrator) setupCalicoNetworkForNode(node *v1.Node) error {
log.Infof("Setting node label to disable Flannel daemonset pod on %s.", node.Name)
func (m *networkMigrator) setupCalicoNetworkForNode(node *v1.Node, index int) error {
log.Infof("Setting node label to disable Flannel daemonset pod on node %s[index %d].", node.Name, index)

This comment has been minimized.

Copy link
@caseydavenport

caseydavenport Sep 9, 2019

Member

Same for this - can we move the log into the loop below so that we don't need to pass the index just for logging purposes?

// Add 3 seconds delay before main thread starts, this is to make sure FV can add watch channels
// before controller logs out to stderr.
// (container.Run returns after 'docker ps' shows the container, the polling interval is 1 second.)
// Add 60 seconds delay before main thread exits, this is to make sure controller is still running

This comment has been minimized.

Copy link
@caseydavenport

caseydavenport Sep 9, 2019

Member

I don't really understand this comment - we don't seem to need this for our other FV tests for kube-controllers. Maybe we should chat about this more so I understand.

This comment has been minimized.

Copy link
@song-jiang

song-jiang Sep 10, 2019

Author Member

Sure. It is hard to describe the issue as a comment in the code.
kube-controllers focus more on the consistency between k8s resources and calico resources. migration-controller is more about applying correct steps to existing k8s resources. Most of the migration controller test cases rely on watching logs to validate the workflow of the controller. It is not perfect but probably is the most efficient way to test different flows, error handlings etc. without adding more complicated code to output those flows to some external storage and do validations.

So a typical test case for migration-controller would look like this

1. c := container.Run(`migration-controller with env`)
2. w := c.AddWatchStderr(Regex("target string"))
3. Eventually(w is closed) at timeout of 20 seconds.
4. c.Stop()

The problem we have is container.Run will return after the container is listed by docker ps. The felix FV code polling status of docker ps at a 1 second interval. So it is possible that at the time container.Run returns, the migration-controller has been running for 1 seconds and exits. Adding a watch after container is stopped does not work. To fix it I added 3 seconds delay so we can add watch before the migration-controller logs out anything.

Adding a 60 seconds delay before exits is to make sure container is still running and be stopped correctly after Eventually timeout.

We could improve Felix FV code by providing interface to do something like

1. c := container.Create(`migration-controller with env`)
2. w := c. AddWatchStderr(Regex("target string"))
3. c.Start()
4. Eventually(w is closed) at timeout of 20 seconds.
4. c.Stop()

However, I thought it would be too risky to make such changes since we are so close for a major release.

}, 30*time.Second, 1*time.Second).Should(BeNil())

// Apply the necessary CRDs.
out, err := apiserver.ExecOutput("kubectl", "apply", "-f", "crds.yaml")

This comment has been minimized.

Copy link
@caseydavenport

caseydavenport Sep 9, 2019

Member

We seem to support a CRDS_FILE environment variable.

"-v", os.Getenv("CRDS_FILE")+":/crds.yaml",

I don't seem to see where we're using crds.yaml for our other tests.... do they not need this file?

This comment has been minimized.

Copy link
@song-jiang

song-jiang Sep 10, 2019

Author Member

I think we do use crds.yaml for every test cases. Migration controller FV creates api-server and apply crds.yaml inside BeforeEach for all test cases.
https://github.com/song-jiang/kube-controllers/blob/song-f2c-fv/tests/fv/flannel_migration_test.go#L122

})

Context("Healthcheck FV tests", func() {
It("should pass health check", func() {

This comment has been minimized.

Copy link
@caseydavenport

caseydavenport Sep 9, 2019

Member

Huh, I didn't realize we had health checks in the migration controller. I can't seem to find that code - could you point me at it?

This comment has been minimized.

Copy link
@song-jiang

song-jiang Sep 10, 2019

Author Member

Migration controller does not has it's own health checks but it use standard checks for api server connection.
https://github.com/projectcalico/kube-controllers/blob/master/cmd/kube-controllers/main.go#L183

Migration controller manifest enables pod health check.

          readinessProbe:
            exec:
              command:
              - /usr/bin/check-status
              - -r

Context("Node ordering FV tests", func() {
checkNodeOrdering := func() {
// Set watch for node index.
w0 := migrationController.WatchStderrFor(regexp.MustCompile(`.*node-2\[index 0\].*`))

This comment has been minimized.

Copy link
@caseydavenport

caseydavenport Sep 9, 2019

Member

Is this the best way to tell when the migration controller has finished a node?

Can't we check for the labels on that node to know when it was completed? I'm just a bit skeptical of tying our tests to specific wording in our log messages.

This comment has been minimized.

Copy link
@caseydavenport

caseydavenport Sep 9, 2019

Member

Ah, I guess it's doing more than just making sure they all complete - it is also checking that the node matches the index. I guess this is OK.

This comment has been minimized.

Copy link
@song-jiang

song-jiang Sep 10, 2019

Author Member

Yes. It is about node been processed in correct order.

Copy link
Member

caseydavenport left a comment

@song-jiang just a few comments and queries, but overall this looks fine. I think we can merge without me taking another look.

@song-jiang song-jiang force-pushed the song-jiang:song-f2c-fv branch from 35af72f to b91e420 Sep 10, 2019
@song-jiang

This comment has been minimized.

Copy link
Member Author

song-jiang commented Sep 10, 2019

Rebased and squashed.

@song-jiang song-jiang merged commit a7d3adf into projectcalico:master Sep 10, 2019
2 checks passed
2 checks passed
license/cla Contributor License Agreement is signed.
Details
semaphoreci The build passed on Semaphore.
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants
You can’t perform that action at this time.