SystemDisk and DATA partition #4041

Closed
sergelogvinov opened this issue Aug 9, 2021 · 34 comments

@sergelogvinov (Contributor)

Feature Request

Maybe this case is only useful on bare-metal setups.

I have only a 1 TB disk on the server, and I want to store some data (cache, database replica, ZFS cache, and other cases) on this disk.
Sometimes bad things happen with containerd/kubelet, and a very easy way to solve it is to just format the EPHEMERAL store. But then we can lose the data (a case from Slack).

So, proposal:

Add a feature to create a special (DATA) partition on the system disk.

  install:
    dataMountPoint: /var/data
    ephemeralDiskSize: 64GB

    diskSelector:
      size: ">128GB"
    bootloader: true
  systemDiskEncryption:
    state:
      provider: luks2
      keys:
        - nodeID: {}
          slot: 0
    ephemeral:
      provider: luks2
      keys:
        - nodeID: {}
          slot: 0
    data:
      provider: luks2
      keys:
        - nodeID: {}
          slot: 0

I've added 2 new keys:

  • ephemeralDiskSize - if present, the installer resizes the EPHEMERAL partition to this size and allocates all remaining free space to the DATA partition (e.g. on a 1 TB disk with ephemeralDiskSize: 64GB, DATA gets roughly the remaining ~930 GB)
  • dataMountPoint - if present, the installer formats the DATA partition and mounts it

And it remains possible to encrypt the DATA store too.

Thanks.

@sergelogvinov (Contributor, Author)

UPD.

  install:
    dataMountPoint: /var/data
    osSize: 64GB # RenameMe

    diskSelector:
      size: ">128GB"
    bootloader: true
  systemDiskEncryption:
    state:
      provider: luks2
      keys:
        - nodeID: {}
          slot: 0
    ephemeral:
      provider: luks2
      keys:
        - nodeID: {}
          slot: 0
    data:
      provider: luks2
      keys:
        - nodeID: {}
          slot: 0

osSize (or another name) - the full size of all system partitions (BOOT+META+STATE+EPHEMERAL).
Defining the full size of the system helps with upgrades if the number of partitions changes.

@hobyte

hobyte commented May 8, 2022

Is there any update on this feature request? In siderolabs/go-blockdevice#50 (comment), the data partition is deferred to v0.14, so is this feature still in development, or is it usable?

@smira (Member)

smira commented May 10, 2022

there's still no clear decision on whether we want to have a DATA partition on the system disk or not

@nickbp

nickbp commented Aug 30, 2022

I was surprised to discover that I couldn't just configure this as part of the machine.disks config:

    - device: /dev/sda
      partitions:
      - mountpoint: /
        size: 64 GB
      - mountpoint: /var/mnt/data
        # size: rest of disk

(This technically validates as-is today, but doesn't seem to get anywhere and if anything seems to trash the system)

@hobyte

hobyte commented Aug 30, 2022

I noticed a guide for local storage in the Talos docs. How does this guide relate to this feature request?
As far as I understand, the guide just mounts a directory on an existing partition into the kubelet container, while this feature request is about creating a new partition for local storage only.
Am I right with this assumption?
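
For context, the guide's approach looks roughly like this: a bind mount of a host directory into the kubelet, not a new partition (the path is only an example):

machine:
  kubelet:
    extraMounts:
      - destination: /var/local-storage # example host path on the existing EPHEMERAL partition
        type: bind
        source: /var/local-storage
        options:
          - bind
          - rshared
          - rw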

@smira (Member)

smira commented Sep 2, 2022

I believe this request is to have part of the Talos system disk not owned by Talos but given to the workloads.

@smira (Member)

smira commented Sep 19, 2022

machine:
  install:
     ephemeral:
        size: 64GiB
     data:
        size: <use-remaining-space>
        mountPoint: /var/data # optional, if not specified, don't format it
  systemDiskEncryption:
    data:
      provider: luks2
      keys:
        - nodeID: {}
          slot: 0        

@davralin

@smira does this mean that this feature has been implemented?
And that would be an empty partition, or a pre-formatted drive?
Curious if I could point rook-ceph at /dev/sda4 or something like that...

@smira (Member)

smira commented Sep 21, 2022

no, the feature hasn't been implemented; I'm just adding some planning notes to understand what we're looking for. the design is not final yet, and there's no commitment on dates yet.

@vorburger (Contributor)

#2213 is about formatting partitions, which could perhaps be a prerequisite for the part of this FR about giving parts of the system disk to workloads.

@pl4nty

pl4nty commented Sep 4, 2023

is this still in design? it'd be pretty useful for edge devices like NUCs

@smira (Member)

smira commented Sep 5, 2023

yes, still in design; the best thing is to drop your detailed use-cases into this ticket

@runningman84

The use case is edge devices like Intel NUCs, which may have just one rather big NVMe device. Talos should only create a smaller system partition and leave the remaining space for things like Longhorn or OpenEBS.

@pl4nty

pl4nty commented Sep 10, 2023

my current use-case isn't commercial, but the edge systems I've worked with usually have single-drive or mirrored-drive configurations. a few quick thoughts:

  • additional drives add procurement/sustainment costs and require specific device SKUs
  • prevents migration to Talos on existing single-drive hardware
  • edge workloads don't often need much storage, so sharing the drive would be fine
  • Talos is otherwise well-suited to the edge environments I've worked with - lightweight (vs RKE2 or Tanzu), supported and simple to sustain (vs k3s), security as a first-class citizen (especially support for airgapped networks)

Chick-Fil-A also has some decent write-ups on their edge k8s NUCs.

@davralin

davralin commented Sep 20, 2023

I think the primary use-cases are homelabs and edge sites.

  • Homelabs with small NUCs (1x M.2 drive, 1x SATA drive)
  • Edge devices with only one deployed node with several disks

I am in the first category, but at work we would seriously consider the second if it were possible.
That would mean the data partition should, at least as an option, be unformatted, so that it could be presented to rook-ceph.

  • edge workloads don't often need much storage, so sharing the drive would be fine

And also, the added redundancy at the drive-level is very nice (if your PVC supports it).

A very special case for me: I have one node running on the free Oracle Cloud instance, which is one huge instance where I can't use external storage without leaving the free tier.
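
For illustration, pointing Rook at such an unformatted partition might look like this (abridged; node name and device path are hypothetical, and a real CephCluster spec needs more fields):

apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  # cephVersion, mon, dataDirHostPath, etc. omitted
  storage:
    useAllDevices: false
    nodes:
      - name: node-1          # hypothetical node name
        devices:
          - name: /dev/sda4   # the unformatted DATA partition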

@bcspragu

bcspragu commented Oct 2, 2023

Chiming in here with a use case. I've been using Talos in a homelab in a 1 x control plane, 3 x worker setup, and I'm migrating that to a single-node NUC/mini PC configuration. Like @davralin, the mini PC (Beelink GTR6) has:

  • 1x M.2 NVMe drive (512 GB in my case)
  • 1x M.2 SATA drive (2 TB in my case)

Since the SATA drive is limited to a few hundred MB/s, I'd like to use 200-300 GB of the NVMe for things that benefit from the faster drive, like databases and frequently accessed files, and leave the SATA drive for storage/backups/etc.

@jamcole

jamcole commented Oct 11, 2023

I have been using 3 NUC-like devices with MicroOS and k3s, with Longhorn.io for storage, and plan to use a similar setup for more upcoming small-site installs. It's a similar level of immutability, with only the Longhorn storage needing to be backed up.

Each NUC-like device already has 1 TB of NVMe storage, and some devices simply don't have room for more storage, so it's hard to justify requiring additional SSDs just to use Talos in this setup.

@jamesagarside

Thought I'd add my thoughts here.
I currently run Talos on Raspberry Pis (so not commercial). I install Talos on the SD card (64 GB) and have a 1 TB NVMe drive attached via USB, which I'd like to use for EPHEMERAL & DATA, as SD cards are notoriously slow and crumble under high-I/O workloads.

I've tried mounting the NVMe at /var/, but the kubelet fails to start. Has anyone had a similar issue? I'd like any data which needs to persist to be stored on the faster NVMe drive.

From what I can tell this issue captures part of this desire.
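
For what it's worth, mounting the NVMe as an extra data disk (rather than replacing /var itself) should already work via machine.disks; as far as I know, user mountpoints have to live under /var/mnt (device path is an example):

machine:
  disks:
    - device: /dev/nvme0n1 # example device path
      partitions:
        - mountpoint: /var/mnt/data

This doesn't move EPHEMERAL itself, which is part of what this issue covers.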

@Ulrar

Ulrar commented Oct 29, 2023

Trying to set up backups for piraeus-operator / linstor and discovered that they only support uploading snapshots to S3 (or elsewhere) for LVM/ZFS-backed volumes, so I guess the FILE_THIN-backed ones we can create on Talos can't be backed up properly. Would be awesome if Talos could leave a partition to serve for LVM.

@sergelogvinov (Contributor, Author)

Trying to set up backups for piraeus-operator / linstor and discovered that they only support uploading snapshots to S3 (or elsewhere) for LVM/ZFS-backed volumes, so I guess the FILE_THIN-backed ones we can create on Talos can't be backed up properly. Would be awesome if Talos could leave a partition to serve for LVM.

Hello, you can dd the system image to the disk and manually add a partition at the end (for LVM). In this case you can lose the upgrade function: Talos can wipe the whole partition table during an upgrade.
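
A rough sketch of that approach, assuming sgdisk (the image name and partition number are illustrative; untested, and as noted an upgrade may wipe it):

dd if=metal-amd64.raw of=/dev/sda bs=4M status=progress
sgdisk -e /dev/sda                            # move the backup GPT header to the real end of the disk
sgdisk -n 0:0:0 -t 0:8e00 -c 0:lvm /dev/sda   # add an LVM partition in the remaining free space
pvcreate /dev/sda6                            # the partition number depends on the image layout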

@Kuresov

Kuresov commented Jan 1, 2024

Chiming in: we're also using Intel NUCs and similar commercially available devices at the edge. It would be great if Talos would reserve some configurable partition of the system drive for itself and leave the rest for applications. Most of our devices have a single large(ish) drive in them.

@btrepp

btrepp commented Jan 5, 2024

I'll add a non-storage-provider use-case.

I've written the Tailscale extension for Talos, and this actually works great, but you do hit interesting edge cases in philosophies. Talos assumes it can wipe its disks at any time; Tailscale maintains a bit of state (private keys) to identify devices. While Tailscale can be configured to be more ephemeral, it then ends up being dynamic in IPs and names.

Ideally, reserving a small (say, 100 MB) partition for system extensions to store things would be great. That way Tailscale can activate itself if needed, but also have its data persist over upgrades.

@stevefan1999-personal

stevefan1999-personal commented Jan 21, 2024

This hasn't been implemented for 2 years, and it would really help when deploying Talos on small VPS hosts where storage space is fixed and very likely scarce. A real-life example would be a 4C8G VPS with only a 200 GB SSD and no option to add an extra data disk.

If we can have an extra data disk attached to the VPS, of course we can run Talos well. But if we can't, we need some special tweaks. Installing Talos directly on a pre-carved partition likely wouldn't work, but the approach I theorized probably would: create an LVM PV on the physical drive, carve out a system logical volume and a data logical volume, and install Talos only on the system logical volume, leaving the data logical volume available to be mounted by other applications.

This, however, requires a recent version of GRUB and a kernel that can boot off LVM, and hopefully a Talos upgrade won't clobber the partition scheme. It also has implications for Ceph usage: Ceph currently won't accept OSD creation on an existing LVM logical volume, to prevent LVM deadlock (alas, that would be stacking LVM-on-LVM as far as I can tell, and it already smells bad).

So, I guess you probably can't run Rook on it unless you cheat a little with the following workaround: create two raw GPT partitions (example: on a 200 GB disk), a system partition (example: 75 GB, LVM) and a data partition (example: 125 GB, raw), and create an LVM PV/VG/LV on the system partition only, leaving the data partition to be detected by Ceph, which manages its own LVM there. (See the sketch at the end of this comment.)

This scheme is obviously inflexible, as it means a fixed system partition that needs careful planning beforehand; to scale it, a full duplicate-and-migrate of the PV is needed. But it should theoretically work out nicely with Ceph.

--
Oh, I just realized Talos also manages the bootloader and the EFI partition. This is getting a little complicated, I think, as you would also have to preserve a small GRUB section to chainload onto the Talos bootloader inside LVM.
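
A concrete sketch of the two-partition workaround above, using the example sizes (untested, and Talos doesn't support installing onto an LVM logical volume today):

sgdisk -n 1:0:+75G -t 1:8e00 -c 1:system /dev/sda   # 75 GB partition to become the LVM PV
sgdisk -n 2:0:0    -t 2:8300 -c 2:data   /dev/sda   # remaining ~125 GB left raw for Ceph
pvcreate /dev/sda1
vgcreate vg0 /dev/sda1
lvcreate -l 100%FREE -n system vg0                  # the system LV Talos would install onto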

@smira (Member)

smira commented Jan 22, 2024

There are some changes in planning for Talos 1.7, hold on :)

@alexvanderberkel

Are those changes already visible in the alpha release?

@smira (Member)

smira commented Feb 14, 2024

Not yet

@stevefan1999-personal

Not yet

Maybe give a brief description of what would be done?

@ianatha

ianatha commented Feb 14, 2024

@smira (& @utkuozdemir & @frezbo -- we briefly chatted during the community meeting) first of all, thank you for all your work.

@stevefan1999-personal, from the community meeting, I got the impression that #4041 (improved user control over the EPHEMERAL partition), which is related to, but different from, #8016 (structured /var), is still in the very early stages. (I'm not affiliated with Sidero Labs/Talos in any capacity.)

@ianatha

ianatha commented Feb 14, 2024

Also, I wanted to summarize the mega-threads regarding these interrelated issues and say that 80% of what people seem to need is for Talos not to take over the entirety of a disk. (I'm biased, in that I also would like that feature.)

I'd like to suggest a maxEphemeralPartitionSize parameter that restricts the size of that partition and leaves the rest of the main disk unmanaged (sketched below).

(I operate a non-profit/community cluster with mostly donated/reused/recycled hardware, which is extremely heterogeneous. I use Rook-Ceph, and considering we're all about using computational resources thriftily, it hurts that there's a lot of storage in EPHEMERAL partitions that I can't take advantage of.)
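
A sketch of where such a parameter might live in today's config (the name and placement are purely hypothetical, not an existing Talos option):

machine:
  install:
    # hypothetical: cap the EPHEMERAL partition and leave the rest of the disk unmanaged
    maxEphemeralPartitionSize: 256GiB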

@smira (Member)

smira commented Feb 15, 2024

We will publish the design document to an issue and link it from #8010 once it's ready.

@kenlasko

I did see some tasks relating to multiple partitions in #8010 on the 1.7.0 release page, but when I checked today they were gone. Looks like it won't make it into the 1.7.0 release.

@smira (Member)

smira commented Feb 29, 2024

Please follow #8367

github-actions bot

This issue is stale because it has been open 180 days with no activity. Remove stale label or comment or this will be closed in 7 days.

@github-actions github-actions bot added the Stale label Aug 28, 2024

github-actions bot commented Sep 2, 2024

This issue was closed because it has been stalled for 7 days with no activity.

@github-actions github-actions bot closed this as not planned (won't fix, can't repro, duplicate, stale) Sep 2, 2024