Skip to content

Conversation

@deitch
Copy link
Contributor

@deitch deitch commented Apr 15, 2019

What this PR does / why we need it:

Add Packet support to the kubermatic machine-controller

Special notes for your reviewer:

Followed the document instructions at here .

I am fairly certain we missed something. Also not sure the easiest way to test it. Would request we do the following:

  1. Have someone do a quick look at it to see if there is something glaringly obvious.
  2. Have someone write here the best way to test just this against Packet.
Add Packet cloud support.

@kubermatic-bot kubermatic-bot added the release-note Denotes a PR that will be considered when it comes time to generate release notes. label Apr 15, 2019
@kubermatic-bot
Copy link
Contributor

Hi @deitch. Thanks for your PR.

I'm waiting for a kubermatic member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@kubermatic-bot kubermatic-bot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Apr 15, 2019
@alvaroaleman
Copy link
Contributor

/ok-to-test

@kubermatic-bot kubermatic-bot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Apr 15, 2019
@deitch
Copy link
Contributor Author

deitch commented Apr 15, 2019

Fixed formatting issues.

@deitch
Copy link
Contributor Author

deitch commented Apr 15, 2019

/retest

@deitch
Copy link
Contributor Author

deitch commented Apr 15, 2019

/retest

@deitch deitch force-pushed the packet-provider branch 5 times, most recently from f20ace1 to ec9d34f Compare April 15, 2019 12:31
@deitch deitch force-pushed the packet-provider branch 4 times, most recently from 55a10ed to 0f95c62 Compare April 15, 2019 14:49
@deitch
Copy link
Contributor Author

deitch commented Apr 15, 2019

/retest

@alvaroaleman
Copy link
Contributor

The tests fail btw because of

# github.com/kubermatic/machine-controller/pkg/cloudprovider/provider/packet
pkg/cloudprovider/provider/packet/provider.go:18:2: imported and not used: "github.com/kubermatic/machine-controller/vendor/k8s.io/apimachinery/pkg/runtime"

@alvaroaleman alvaroaleman self-assigned this Apr 16, 2019
@alvaroaleman alvaroaleman added the sig/cluster-management Denotes a PR or issue as being assigned to SIG Cluster Management. label Apr 16, 2019
@deitch
Copy link
Contributor Author

deitch commented Apr 18, 2019

The tests fail btw because of

Strange that it didn't pick it up locally. Oh well.

@deitch
Copy link
Contributor Author

deitch commented Apr 18, 2019

/retest

@deitch
Copy link
Contributor Author

deitch commented Apr 22, 2019

Rebased with correct userdata based on #504 . Restarting packet tests

@deitch
Copy link
Contributor Author

deitch commented Apr 22, 2019

/test pull-machine-controller-e2e-packet

@alvaroaleman
Copy link
Contributor

The tests btw failed because of some gcr issue that broke the kubeadm installation:

[ERROR ImagePull]: failed to pull image k8s.gcr.io/kube-scheduler:v1.14.0: output: Error response from daemon: Get https://k8s.gcr.io/v2/kube-scheduler/manifests/v1.14.0: Get https://k8s.gcr.io/v2/token?scope=repository%3Akube-scheduler%3Apull&service=k8s.gcr.io: net/http: TLS handshake timeout

and

[ERROR ImagePull]: failed to pull image k8s.gcr.io/kube-controller-manager:v1.14.0: output: Error response from daemon: Get https://k8s.gcr.io/v2/kube-controller-manager/manifests/v1.14.0: Get https://k8s.gcr.io/v2/token?scope=repository%3Akube-controller-manager%3Apull&service=k8s.gcr.io: net/http: TLS handshake timeout

c, _, _, err := p.getConfig(machine.Spec.ProviderSpec)
if err == nil {
labels["size"] = c.InstanceType
labels["facilities"] = strings.Join(c.Facilities, ",")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not completely happy about this, as it doesn't actually tell the user in which facility the instance ended up being if there was more than one submitted. But I am not sure what a better solution is, as doing an API call every time MachineMetricsLabels gets called is definitely too expensive

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. It is an interesting feature of Packet's API that you can specify one, multiple or "any" facility. See here:

The facilities attribute specifies in what datacenter you wish to create the device.

You can either specify a single facility { "facility": "f1" } , or you can instruct to create the device in the best available datacenter { "facility": "any" }. Additionally it is possible to set a prioritized location selection.

For example { "facility": ["f3", "f2", "any"] } will try to assign to the facility f3, if there are no available f2, and so on. If "any" is not specified for "facility", the request will fail unless it can assign in the selected locations.

When you actually do Create() the device, it returns a device as well. I don't know if that returns the requested facilities or actual facility. I have some go code with a quick experiment to find out.

If it does, could we not just retrieve it, since we save the device at the end of Create()?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Confirmed. The return from the call does tell you what facility it is in, and we save that:

func (p *provider) Create(machine *v1alpha1.Machine, _ *cloud.MachineCreateDeleteData, userdata string) (instance.Instance, error) {
        // ...
        device, res, err := client.Devices.Create(serverCreateOpts)  
        // This "device" contains it in `device.Facility.Name`
        // ...
        // and we return it here
        return &packetDevice{device: device}, nil
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its not that easy, because we would need to store this state somewhere, probably in .Status.ProviderStatus. We can leave that as a follow-up thought.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. We do have it, though, so that is a good first step. It is easier to find a place to store something we have, than to find a way to get that which we do not. :-)

return &packetDevice{device: device}, nil
}

return nil, cloudprovidererrors.ErrInstanceNotFound
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the API can not find the instance it does return an empty error and a nil device...?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, did I get it backwards? Oops. Fixing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, wait, that makes sense. It returns nil for the device and cloudprovidererrors.ErrInstanceNotFound for the error. I took this from all of the other cloud providers, see here, here, here, etc.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay it actually does work, because getDeviceByTag returns a nil error if it can't find the device. I personally find that subpar, because this means all users of p.getPacketDevice must check if the returned device is non-nil (Which they currently don't do). What about making getDeviceByTag return an error if it cant find the device?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am having a hard time following your logic. I will try to work it through here.

  1. getDeviceByTag() (internal only): gets the device that matches the tag, or nil if none is found. Returns a non-nil error if there is a real error.
  2. p.getPacketDevice() (internal only): calls getDeviceByTag(). It really just wraps getDeviceByTag() while providing some additional convenience (I probably could subsume getDeviceByTag() into it, since it no longer is used elsewhere, but I like small, easy-to-understand functions). It returns the device if found, nil if not; error only if there was an error.
  3. The internal convenience function p.getPacketDevice() is consumed by p.Get() and p.MigrateUID(), which check that...

Ah, I see. You are saying that the API from p.getPacketDevice() is "return nil if no device found", but not everyone is respecting that, particularly p.MigrateUID().

OK, that is a bug in my implementation of p.MigrateUID(). I will fix that.

I would like to leave p.getPacketDevice() as is, as I tend to view 3 distinct use cases:

  • found the device: device set; err = nil
  • had a comms or auth error: device = nil; err set
  • no error but no matching device: device = nil; err = nil.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree except for no error but no matching device: device = nil; err = nil.. That is counterintuitive, if I get no error I expect the device to be non-nil. Also it makes for redundant nil-checks everywhere p.getPacketDevice is used.

Where in the Hetzner or AWS code do you see a nil error return when no instance was found?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree except for no error but no matching device: device = nil; err = nil.. That is counterintuitive, if I get no error I expect the device to be non-nil

To each their own. Since this is internal to the provider, we can agree to disagree.

Where in the Hetzner or AWS code do you see a nil error return when no instance was found?

In MigrateUID() AWS here and Hetzner here

Copy link
Contributor

@alvaroaleman alvaroaleman Apr 23, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MigrateUID is a special case and only returns a nil error if p.Get returned err == cloudprovidererrors.ErrInstanceNotFound.

This is really prone to result in bugs in the future. Why would I assume that if I call p.getPacketDevice() and err == nil device can also be nil and has to be checked separately?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like what would you think of a rest api that returned for a GET call a http/200 and no body?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would I assume that if I call p.getPacketDevice() and err == nil device can also be nil and has to be checked separately?

That is pretty common, actually.

Like what would you think of a rest api that returned for a GET call a http/200 and no body?

It depends. GET /items/1234 should return a 404 for not found, and a 200 for found. But GET /items or a search like GET /items?name=jim should return 404 only if the path GET /items does not exist. If none matches name=jim, then it should return a 200 with an empty set. That has been a longstanding debate among REST implementers/builders for about as long as Fielding's paper has been out (almost 30 years), but I always came down on that side.

I think, though, that we are spending too much time on this. It is a purely internal private function. It works, is clearly defined. Let's just leave it?

@deitch deitch force-pushed the packet-provider branch 2 times, most recently from ee23c95 to aef31e3 Compare April 23, 2019 08:24
@deitch
Copy link
Contributor Author

deitch commented Apr 23, 2019

Formatting was messed up. I fixed and repushed.

@alvaroaleman
Copy link
Contributor

/test pull-machine-controller-e2e-packet

@alvaroaleman
Copy link
Contributor

/lgtm
/approve
/hold

so @mrIncompetent can have a look

@kubermatic-bot kubermatic-bot added lgtm Indicates that a PR is ready to be merged. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Apr 24, 2019
Copy link
Contributor

@mrIncompetent mrIncompetent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice PR :)
Just a few smaller things and we're ready to merge

@kubermatic-bot kubermatic-bot removed the lgtm Indicates that a PR is ready to be merged. label Apr 24, 2019
@mrIncompetent
Copy link
Contributor

/lgtm
/approve

/hold cancel

@kubermatic-bot kubermatic-bot added lgtm Indicates that a PR is ready to be merged. and removed do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. labels Apr 24, 2019
@kubermatic-bot
Copy link
Contributor

LGTM label has been added.

Git tree hash: 150b8612ee30d68633a2efcfc6d672c6d0219b19

@kubermatic-bot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: alvaroaleman, mrIncompetent

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [alvaroaleman,mrIncompetent]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@mrIncompetent
Copy link
Contributor

/retest

@kubermatic-bot kubermatic-bot merged commit b9af737 into kubermatic:master Apr 24, 2019
@deitch deitch deleted the packet-provider branch April 24, 2019 17:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/cluster-management Denotes a PR or issue as being assigned to SIG Cluster Management. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants