This repository has been archived by the owner on May 6, 2020. It is now read-only.

Packet: Switch to official Flatcar images from Packet #28

Merged
pothos merged 6 commits into master from johananl/packet-no-ipxe on Aug 27, 2019

Conversation

@johananl
Contributor

johananl commented Jun 20, 2019

This PR switches from iPXE-booting Flatcar to using official Flatcar images from Packet.

Also fixes #27.

We should merge this only after Flatcar is generally available in all Packet facilities.

Testing

At the moment official Flatcar images are available only in the sjc1 and nrt1 facilities in preview mode, so this PR can be tested only using these facilities.

Please use the updated docs from this PR when testing since the PR includes variable changes.

Contributor

@rata left a comment

It looks fine. There are some edge cases and some functionality (like RAID arrays, and which disk the OS is installed to when the server has several) that we may want to pay special attention to. But it seems great :)

Contributor

@invidian left a comment

Looks like @rata's review covered all the issues, which are:

  • check on which disk the OS is installed (see the sketch below)
  • restore the RAID building functionality; maybe we move it to the Ignition config?

I tested it with lokoctl and everything seems to work, so LGTM.
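For reference, here is a minimal sketch (not part of this PR) of how one could check from a running node which disk the OS was installed to; the hostname and device names below are made up:

core@node ~ $ findmnt -n -o SOURCE /        # partition backing the root filesystem
/dev/sda9
core@node ~ $ lsblk -n -o PKNAME /dev/sda9  # whole disk that partition lives on
sda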

@invidian
Contributor

invidian commented Jun 21, 2019

core@test-pool-1-worker-0 ~ $ sudo fdisk -l /dev/sda
Disk /dev/sda: 111.8 GiB, 120034123776 bytes, 234441648 sectors
Disk model: SSDSCKJB120G7R  
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: 18D4FFCD-565C-41F3-A945-C038B5C61C5F

Device       Start       End   Sectors   Size Type
/dev/sda1     4096    266239    262144   128M EFI System
/dev/sda2   266240    270335      4096     2M BIOS boot
/dev/sda3   270336   2367487   2097152     1G unknown
/dev/sda4  2367488   4464639   2097152     1G unknown
/dev/sda6  4464640   4726783    262144   128M Linux filesystem
/dev/sda7  4726784   4857855    131072    64M unknown
/dev/sda9  4857856 234441614 229583759 109.5G unknown
core@test-pool-1-worker-0 ~ $ sudo fdisk -l /dev/sdb
Disk /dev/sdb: 111.8 GiB, 120034123776 bytes, 234441648 sectors
Disk model: SSDSCKJB120G7R  
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
core@test-pool-1-worker-0 ~ $ sudo fdisk -l /dev/sdc
Disk /dev/sdc: 447.1 GiB, 480103981056 bytes, 937703088 sectors
Disk model: SSDSC2KB480G7R  
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
core@test-pool-1-worker-0 ~ $ sudo fdisk -l /dev/sdd
Disk /dev/sdd: 447.1 GiB, 480103981056 bytes, 937703088 sectors
Disk model: SSDSC2KB480G7R  
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes

Disk layout for a c2.medium worker after installation.

My only concern is that we have two disks for the OS but don't use RAID 1... In the case of CoreOS, though, I believe that doesn't really matter, since we shouldn't keep data on the OS disk anyway.

@invidian
Contributor

Since Flatcar images are not yet available in all regions and we still have some issues to address, I guess we should delay merging this PR. What do you think?

@dongsupark
Contributor

I've tested this PR with the following options:

  • Channel: stable, beta, alpha
  • Facility: ams1, ewr1, nrt1, sjc1
  • Type: t1.small.x86, c2.medium.x86

Everything works fine.
It's really great that it takes only 10-12 minutes until a working k8s cluster is provisioned.

I've seen the issue with multiple block devices on c2.medium.x86, as mentioned above.
I'm not completely sure, but it looks like that's not a common use case. I would say that should be a separate issue which we investigate further.

So +1 for merging this PR.

@rata
Contributor

rata commented Jul 16, 2019

@dongsupark I strongly disagree on merging this.

There are two things to address.

Let me elaborate a little: provisioning works with c2.medium.x86, but it installs to a "random" disk (this is the way the Packet Flatcar Linux image currently works). It's not just c2.medium.x86, but probably all instances with more than one disk. This basically breaks reasonable usage of node local storage on servers with more than one disk, and it would be a regression compared to the current master. I explained the problem in more detail in this comment: #28 (comment)

As that is out of our control (it's the way the current beta Packet images work) and:

  • it is currently working on master without using them
  • we already have a meeting with Packet and they will fix it soon

I think we should wait and merge this when it won't break existing users of node local storage.

What do you think? Or am I missing something?

@dongsupark
Contributor

OK, if you say so, I'm fine with waiting until Packet fixes the issues.
I'm just wondering if it's possible to simply fall back to the iPXE variant for c2.medium.x86, like it's done in the current master branch. That would be a fairly simple workaround.
That's why I'm not sure the multiple block devices issue is a hard blocker.

@rata
Contributor

rata commented Jul 16, 2019

@dongsupark It is not just c2.medium but any server with more than one disk. I think t1.small.x86 and s1.small.x86 may be the only ones with a single disk (I had a quick look at the site, not sure).

But yeah, that is a valid workaround too. Not sure if it's worth the complexity, but maybe that's just because I'm too lazy. I'm okay with that temporary approach :)

@indradhanush
Contributor

indradhanush commented Jul 19, 2019

Came here to say that I just provisioned a 1-controller, 5-worker cluster from commit a59aa87 of this PR and it just worked! 🎉

Edit: The cluster uses t2.small nodes.

@pothos force-pushed the johananl/packet-no-ipxe branch 5 times, most recently from f4248f3 to a2aa822 on August 15, 2019 16:33
Contributor

@rata left a comment

LGTM. All my comments seem to have been addressed (I haven't tested myself, though), and Packet reported that the images are available in all regions.

@kosyfrances
Contributor

@johananl while this PR is pending merge, can you please help ensure that the branch stays up to date with master in the meantime?

@invidian
Contributor

@kosyfrances since @johananl is usually busy, maybe you can rebase and test it yourself if you need it? It seems the new RAID configuration options need to be ported here. You can push to the PR branch without problems.

Did we confirm with Packet that the OS is now always installed on the smallest disk?

@johananl
Contributor Author

johananl commented Aug 26, 2019

Packet regions tested:

  • yyz1
  • mrs1
  • dfw1
  • hkg1
  • ams1
  • dfw2
  • syd1
  • sea1
  • sjc1 (tested by CI)
  • nrt1
  • ewr1
  • atl1
  • lax1
  • sin1
  • ord1
  • iad1
  • fra2

@pothos
Contributor

pothos commented Aug 26, 2019

So, the variables for SSD and HDD RAID are present but not used now, right?

@johananl
Contributor Author

So, the variables for SSD and HDD RAID are present but not used now, right?

Oh, I just realized we don't have install.yaml.tmpl anymore. I guess it doesn't make sense to have these vars in this PR then.

Any idea what we should do here?

@johananl
Contributor Author

Summarizing offline discussion with @pothos:

We should port the RAID-related logic from install.yaml.tmpl to worker.yaml.tmpl so that the setup_raid_* vars take effect.

We should consider the refactoring in #54 when doing so.

@pothos
Contributor

pothos commented Aug 26, 2019

I updated my last commit to include the disk-type-specific RAID setup. While porting I fixed some bugs and am testing it now. If it's worth the effort to rebase this here once more, I can also make PRs for the spotted bugs (commented on in #48).

@invidian changed the title from "[WIP] Packet: Switch to official Flatcar images from Packet" to "Packet: Switch to official Flatcar images from Packet" on Aug 27, 2019
Contributor

@invidian left a comment

Seems OK (didn't test) 👍

Contributor

@rata left a comment

Overall, LGTM. There is a backwards-incompatible change (the /mnt/mount thing) that we may want to think about before merging, IMHO.

@rata self-requested a review on August 27, 2019 10:37
The RAIDs were set up during the installation step,
which is not used anymore.

Port the RAID setup code from the deleted worker
installation template. Bugs are fixed and the code is
made more robust against accessing mounted drives.
It also got a bit simplified because lsblk can sort by
itself and the RAID mounting code does not need to scan
for drives but can directly reuse the information from the setup.
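To illustrate the kind of logic that was ported, here is a rough sketch only, not the actual worker.yaml.tmpl contents; the array name and mount point below are assumptions:

#!/bin/bash
# Rough sketch of a disk-type-specific RAID setup; illustration only.
set -euo pipefail

# Disk holding the OS, derived from the root filesystem's parent device.
os_disk="$(lsblk -n -o PKNAME "$(findmnt -n -o SOURCE /)")"

# All non-rotational (SSD) whole disks except the OS disk; lsblk sorts by name.
mapfile -t disks < <(lsblk -d -n -o NAME,ROTA,TYPE -x NAME \
  | awk -v os="$os_disk" '$3 == "disk" && $2 == 0 && $1 != os {print "/dev/"$1}')

# Nothing to do on single-disk machines such as t1.small.x86.
[ "${#disks[@]}" -gt 0 ] || exit 0

# Build a RAID 0 array from the remaining disks and mount it for node local storage.
mdadm --create /dev/md/node-local --level=0 --force \
  --raid-devices="${#disks[@]}" "${disks[@]}"
mkfs.ext4 /dev/md/node-local
mkdir -p /mnt/node-local
mount /dev/md/node-local /mnt/node-local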
Contributor

@rata left a comment

LGTM!

I'd love to simplify that logic in the future, though. Maybe we can create an issue? There might be some paths to explore, like adding local disk info to the metadata (/run/metadata/flatcar)?
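A purely hypothetical sketch of that idea, assuming Packet disk information were exposed in /run/metadata/flatcar; the PACKET_DATA_DISK_* variables below do not exist today:

# Hypothetical: if local disk info were added to the metadata file, the RAID
# unit could source it instead of scanning block devices itself.
. /run/metadata/flatcar
mdadm --create /dev/md/node-local --level=0 --force \
  --raid-devices=2 "${PACKET_DATA_DISK_0}" "${PACKET_DATA_DISK_1}"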

@johananl
Contributor Author

Thanks for helping, @pothos.

I guess my only remaining concerns before merging are:

  • We didn't test all Packet regions.
  • We don't have automatic updates of new Flatcar versions on Packet, so new clusters will use an older image.

As far as I'm concerned we can merge since we've tested our most "important" regions and we can probably fix problems in others by pinging Packet after merging.

Regarding Flatcar updates: I'm personally fine with using a slightly older Flatcar image for clusters until we have an automated publishing mechanism for images on Packet.

@pothos
Contributor

pothos commented Aug 27, 2019

I think the update issue was solved because there is a mirror script running regularly and the installer fetches the image from the local mirror while installing.

@dongsupark
Contributor

I've just tested this PR in ewr1. It works well.
So it's confirmed that it works in all 4 major regions.
It still installs Flatcar stable 2079.6.0, which is a little behind.
Anyway, I don't think that's a blocker, as there's nothing we can do about it.
👍 for merging it.

@pothos
Contributor

pothos commented Aug 27, 2019

I'm testing it in all available regions now (I don't see all of them in the list). FRA2, DFW2 and HKG1 had the latest stable version. If we observe that some regions don't catch up in the next few days, we can report it.
For SYD1, SIN1 and Seattle (SEA1? it's gone from the list now) I got provisioning failures – maybe it's a general thing; I used the web UI, which doesn't show the reason.

@pothos merged commit a459e97 into master on Aug 27, 2019
@pothos deleted the johananl/packet-no-ipxe branch on August 27, 2019 19:25
@rata
Contributor

rata commented Aug 28, 2019

I tested all the regions a few days ago (last Friday?) and it worked in all but SIN1; I reported it and they fixed it. So I feel confident about that too :-)

rata added a commit that referenced this pull request Sep 26, 2019
This RAID 0 array should be created and mounted before the kubelet
starts, as pods may need to use the node local storage to run on this
node.

This dependency was lost in this PR:

	#28

Before this PR, the RAID 0 array was created when iPXE booting and
therefore already existed when the node was rebooted after
flatcar-install. After this PR there is no iPXE boot, so the RAID 0 array
is created in the same boot in which the kubelet starts, but we missed
adding the dependency that was implicit before (as the different systemd
units were executed on different boots, and therefore the array was always
created by the time the kubelet was about to start).

This PR is just a cleanup to have the dependency, but note that this
doesn't fix node local storage; the previous commit in this PR is
needed for that.
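An illustrative sketch of the kind of ordering the commit restores; the unit, file and mount names here are assumptions, not the actual templates:

# Illustrative only: make kubelet wait for the node-local-storage mount
# (here /mnt/node-local, escaped as mnt-node\x2dlocal.mount) that the RAID
# setup creates.
mkdir -p /etc/systemd/system/kubelet.service.d
cat <<'EOF' > /etc/systemd/system/kubelet.service.d/10-wait-for-storage.conf
[Unit]
Requires=mnt-node\x2dlocal.mount
After=mnt-node\x2dlocal.mount
EOF
systemctl daemon-reload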
Successfully merging this pull request may close these issues.

Packet "phone home" is failing