[Fix GCP plugin] Introduce public GCE Bootstrap Image #1

Closed

@tewfik-ghariani commented Mar 18, 2020

Background

After reading some docs about running NixOS on GCE, I've seen that we currently have to bootstrap our own image from an object publicly shared in Google Cloud Storage, as described here: https://nixos.wiki/wiki/Install_NixOS_on_GCE

At the moment, the nixops-gcp plugin is considered broken specifically because of this 'bootstrap-image' resource.

Proposed Solution

However, according to the GCP docs, we can bake images in a given GCP account and then make those images public.
https://cloud.google.com/compute/docs/images/managing-access-custom-images#share-images-publicly

Even better, we can create image families, which make it easier to manage individual images. There is no need to specify the exact image name per se, just the family. Under the hood, the maintainer can deprecate old images and keep the 'latest' one up to date.
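
To illustrate how a family behaves, here is a small Python sketch: a family lookup returns the newest image in that family that has not been deprecated. The `Image` record, the image names, and the `resolve_family` helper are hypothetical and exist only for this illustration; the real resolution happens on the GCP side.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Image:
    """Hypothetical, simplified stand-in for a GCE image record."""
    name: str
    family: str
    created: int          # sortable creation time (illustrative value)
    deprecated: bool = False


def resolve_family(images: Iterable[Image], family: str) -> Optional[Image]:
    """Return the newest non-deprecated image in `family`, or None."""
    candidates = [i for i in images if i.family == family and not i.deprecated]
    return max(candidates, key=lambda i: i.created, default=None)


# Illustrative data: the maintainer deprecated the older image.
images = [
    Image("nixos-1809-old", "nixos-1809", created=1, deprecated=True),
    Image("nixos-1809-new", "nixos-1809", created=2),
]
print(resolve_family(images, "nixos-1809").name)
```

Because users only reference the family, the maintainer can publish a newer image and deprecate the old one without any change on the user side.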

Validation

I started by testing the whole scenario using gcloud commands and it worked like a charm!

Building an image from source

$ gcloud compute images create nixos-18091228a4c4cbb613c-x86-64-linux \
    --source-uri gs://nixos-cloud-images/nixos-image-18.09.1228.a4c4cbb613c-x86_64-linux.raw.tar.gz \
    --family=nixos-1809
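
The image name and family used in the command above appear to follow a pattern derived from the tarball name (dots removed, underscores turned into hyphens, `nixos-` prefix). The Python sketch below encodes that pattern. It is inferred from this single example, so treat it as an assumption rather than an established convention; the helper names are hypothetical.

```python
PREFIX = "nixos-image-"
SUFFIX = ".raw.tar.gz"


def _version_part(source_uri: str) -> str:
    """Extract e.g. '18.09.1228.a4c4cbb613c-x86_64-linux' from the URI."""
    base = source_uri.rsplit("/", 1)[-1]
    if not (base.startswith(PREFIX) and base.endswith(SUFFIX)):
        raise ValueError(f"unexpected tarball name: {base}")
    return base[len(PREFIX):-len(SUFFIX)]


def image_name(source_uri: str) -> str:
    """Derive an image name containing only letters, digits and hyphens."""
    version = _version_part(source_uri)
    return "nixos-" + version.replace(".", "").replace("_", "-")


def image_family(source_uri: str) -> str:
    """Derive the family from the release, e.g. '18.09' -> 'nixos-1809'."""
    major, minor = _version_part(source_uri).split(".")[:2]
    return f"nixos-{major}{minor}"


uri = ("gs://nixos-cloud-images/"
       "nixos-image-18.09.1228.a4c4cbb613c-x86_64-linux.raw.tar.gz")
print(image_name(uri), image_family(uri))
```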

Making the image public

$ gcloud compute images add-iam-policy-binding nixos-18091228a4c4cbb613c-x86-64-linux \
    --member='allAuthenticatedUsers' \
    --role='roles/compute.imageUser'

The image can then be used publicly, so nixops users no longer need to provision their own 'bootstrap-image' resource for every deployment.

$ gcloud compute instances create test-nixos-18 \
    --image-family=nixos-1809 \
    --zone=europe-west1-c \
    --image-project=predictix-operations

Implementation

I implemented the change described above in this codebase, and the GCE plugin should now work.

The way it works

  • Look up the public image based on family name and project
  • Create a volume from that image (no need to copy it in every deployment)
  • Proceed as usual and use that volume as bootDisk

A summary of what has been achieved overall

  • Deprecated the automatic provisioning of gce-image resource.
  • Introduced bootstrap images as public image families.
  • Added a 'publicImageProject' option for blockDeviceMapping and gceDisk to specify the project from which publicly available image families are retrieved.
  • Tightened exception handling and error messages

Small Additions

  • Display GCE volume size in 'nixops-info'
  • Remove Uniqueness condition on fileSystemsOption
  • Update some GCP documentation links

ToDo

  • Update the 'image-family' to point to the 'gce-images.nix' file under nixpkgs
  image-family = import ./gce-images.nix;
  # To be changed to
  # image-family = import <nixpkgs/nixos/modules/virtualisation/gce-images.nix>;

Please let me know what you think about this. If you have any suggestions or recommendations, feel free to share them.
cc @PsyanticY @AmineChikhaoui

cc @rbvermaa Can we consider this as part of NixOS/nixpkgs#6991?

@@ -193,7 +200,7 @@ let
options = {
gce = mkOption {
default = null;
type = with types; uniq (nullOr (submodule gceDiskOptions));

why remove uniq ?

Contributor Author

So that we don't end up in a situation where we have to redefine all options.
Example:
gce.nix

      fileSystems."/data" = {
        fsType = "xfs";
        options = [ "noatime" "nodiratime" ];
        autoFormat = true;
        formatOptions = "-f";
        gce.disk = null;
        gce.disk_name = null;
        gce.size = lib.mkDefault 500;
        gce.diskType = "ssd";
        gce.encrypt = encrypt;
      };

partitions-gce.nix

 {
   fileSystems."/data".gce.disk      = lib.mkForce resources.gceDisks."${machine}-data";
   fileSystems."/data".gce.disk_name = lib.mkForce "data";
 } 

This error is raised

error: The unique option `fileSystems./data.gce' is defined multiple times, in `<unknown-file>' and `/nixops/partitions-gce.nix'.
(use '--show-trace' to show detailed location information)

Workaround while uniq is in place (every option has to be redefined):

      fileSystems."/data".gce = lib.mkOverride 10
      {
        disk      = resources.gceDisks."${machine}-data";
        disk_name = "data";
        size      = volumeSize;
        diskType  = "ssd";
        encrypt   = encrypt;
      };
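
To show why dropping `uniq` helps, here is a deliberately simplified Python model of merging without it: each sub-option is resolved on its own, and the definition with the lowest priority number wins. The priority numbers match the NixOS module system (`mkForce` is `mkOverride 50`, plain definitions are 100, `mkDefault` is `mkOverride 1000`). Everything else, including the `merge_submodule` helper and the sample values, is a hypothetical illustration and not the actual module-system code.

```python
from typing import Any, Dict, List, Tuple

# Priority numbers used by the NixOS module system (lower number wins).
MK_FORCE, PLAIN, MK_DEFAULT = 50, 100, 1000

# One definition file: sub-option name -> (priority, value)
Definition = Dict[str, Tuple[int, Any]]


def merge_submodule(definitions: List[Definition]) -> Dict[str, Any]:
    """Resolve each sub-option independently by priority (simplified)."""
    by_key: Dict[str, List[Tuple[int, Any]]] = {}
    for definition in definitions:
        for key, entry in definition.items():
            by_key.setdefault(key, []).append(entry)

    merged: Dict[str, Any] = {}
    for key, entries in by_key.items():
        top = min(prio for prio, _ in entries)
        winners = [value for prio, value in entries if prio == top]
        # Equal-priority definitions must agree, otherwise it is a conflict.
        if any(value != winners[0] for value in winners[1:]):
            raise ValueError(f"conflicting definitions for {key!r}")
        merged[key] = winners[0]
    return merged


# Mirrors gce.nix and partitions-gce.nix from the example above.
gce_nix = {
    "disk": (PLAIN, None),
    "disk_name": (PLAIN, None),
    "size": (MK_DEFAULT, 500),
    "diskType": (PLAIN, "ssd"),
}
partitions_gce_nix = {
    "disk": (MK_FORCE, "machine-data"),   # placeholder for the disk resource
    "disk_name": (MK_FORCE, "data"),
}
merged = merge_submodule([gce_nix, partitions_gce_nix])
print(merged)
```

In this model the second file only overrides `disk` and `disk_name`, and the remaining options keep the values from the first file.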

"18.09" = "nixos-1809";

latest = self."18.09";
project = "predictix-operations";
Member

This should be updated to a "trusted" upstream like nixos/nix-community.

Contributor Author

Definitely. I added the steps to prepare the images to the description:

Building an image from source

$ gcloud compute images create nixos-18091228a4c4cbb613c-x86-64-linux \
    --source-uri gs://nixos-cloud-images/nixos-image-18.09.1228.a4c4cbb613c-x86_64-linux.raw.tar.gz \
    --family=nixos-1809

Making the image public

$ gcloud compute images add-iam-policy-binding nixos-18091228a4c4cbb613c-x86-64-linux \
    --member='allAuthenticatedUsers' \
    --role='roles/compute.imageUser'

@AmineChikhaoui should be familiar with these operations

Member

I think we should update the create-gce.sh script in nixpkgs to add those steps. Thanks for researching this!
@adisbladis yeah, we'll have to host this under a NixOS account. I'm looking into that. Also, we're missing a couple of releases in the GCP images, so I will start with 20.03 and see if it works.

@@ -66,7 +66,8 @@ def parse_block_device(xml):
  'disk': self.get_option_value(xml, 'disk', 'resource', optional = True),
  'disk_name': opt_disk_name(self.get_option_value(xml, 'disk_name', str, optional = True)),
  'snapshot': self.get_option_value(xml, 'snapshot', str, optional = True),
- 'image': self.get_option_value(xml, 'image', 'resource', optional = True),
+ 'image': self.get_option_value(xml, 'image', str, optional = True),
+ 'publicImageProject' : self.get_option_value(xml, 'publicImageProject', str, optional = True),
Member

What happens if this is None(null)?

Contributor Author

This must be optional: when its value is None, the old method is used, which creates the disk from our own bootstrap image.

                if v['publicImageProject']:
                    try:
                        img = self.connect().ex_get_image_from_family(
                                  image_family=v['image'],
                                  ex_project_list=[v['publicImageProject']],
                                  ex_standard_projects=False,
                              )
               .......................
                .........................
                else:
                    img = v['image']
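
The branch above can be sketched as a standalone Python function. The call to `ex_get_image_from_family` uses the same keyword arguments as the excerpt. The function name `resolve_boot_image`, the `ImageNotFound` error, and the `FakeDriver` test double are hypothetical, and the error handling is an assumption because the excerpt elides that part.

```python
from typing import Any, Dict, Optional, Tuple


class ImageNotFound(Exception):
    """Hypothetical error type used only in this sketch."""


def resolve_boot_image(driver: Any, image: str,
                       public_image_project: Optional[str] = None) -> Any:
    """Look up a public image family, or fall back to the given image."""
    if not public_image_project:
        # Old method: the caller supplies its own bootstrap image.
        return image
    try:
        return driver.ex_get_image_from_family(
            image_family=image,
            ex_project_list=[public_image_project],
            ex_standard_projects=False,
        )
    except Exception as exc:  # assumption: the real exception type is elided
        raise ImageNotFound(
            f"no image in family {image!r} of project {public_image_project!r}"
        ) from exc


class FakeDriver:
    """Test double standing in for the libcloud GCE driver."""

    def __init__(self, families: Dict[Tuple[str, str], Any]) -> None:
        self.families = families

    def ex_get_image_from_family(self, image_family, ex_project_list=None,
                                 ex_standard_projects=True):
        for project in ex_project_list or []:
            if (project, image_family) in self.families:
                return self.families[(project, image_family)]
        raise LookupError(image_family)


driver = FakeDriver({("predictix-operations", "nixos-1809"): "public-image"})
print(resolve_boot_image(driver, "nixos-1809", "predictix-operations"))
print(resolve_boot_image(driver, "my-bootstrap-image"))
```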

@@ -211,6 +218,12 @@ let
};
};

nixosVersion = builtins.substring 0 5 (config.system.nixos.version or config.system.nixosVersion);
Member

Just noting this is very similar to how nixops-aws works: https://github.com/NixOS/nixops-aws/blob/master/nix/ec2.nix#L166
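
As a rough illustration of that lookup, the Python sketch below mirrors `builtins.substring 0 5` by taking the first five characters of the NixOS version and mapping them to an image family, using the values from gce-images.nix in this PR. Falling back to `latest` for an unknown release is an assumption borrowed from the nixops-aws approach, and `image_family_for` is a hypothetical helper name.

```python
# Values taken from gce-images.nix as shown in this PR.
GCE_IMAGES = {"18.09": "nixos-1809"}
GCE_IMAGES["latest"] = GCE_IMAGES["18.09"]
PROJECT = "predictix-operations"


def image_family_for(nixos_version: str) -> str:
    """Map a full NixOS version string to an image family."""
    release = nixos_version[:5]  # mirrors builtins.substring 0 5
    # Assumption: unknown releases fall back to the 'latest' family.
    return GCE_IMAGES.get(release, GCE_IMAGES["latest"])


print(image_family_for("18.09.1228.a4c4cbb613c"), PROJECT)
```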

@adisbladis
Member

@AmineChikhaoui Could you check this PR?

@@ -0,0 +1,7 @@
let self = {
Member

I think we can have this file in nixpkgs, somewhat like ec2-amis.nix, and export an attribute set:

{ imageFamily = "nixos-20-03";
  project = "nixos-org";
}

That way, image updates are automatic and require no changes to nixops-gce.

nix/gce.nix Outdated

image-family = import ./gce-images.nix;
# To be changed to
# image-family = import <nixpkgs/nixos/modules/virtualisation/gce-images.nix>;
Member

oh you made a comment about it already :)
nitpick: image-family -> imageFamily

Contributor Author

Done: fd88dea, 3322656

adisbladis and others added 23 commits June 1, 2020 13:31
The imageOptions contains the following :
 - image : either image-resource or the name of an existing image
 - family : the image family to be used
 - project : the parent project containing the image/imageFamily
Note that we have to specify either name or family.
Default = null
Then adjusted the image option to inherit from the imageOptions submodule in
- gceDiskOptions in gce.nix
- gce-disk.nix
Finally, updated the bootstrapImage gce option accordingly
- Adding config as GCEMachinesOptions in GCEDefinition
- Updating GCEDefinition attributes by specifying gce as parent attribute
- adding the scheduling options
- Correcting the instanceServiceAccount attrs ( email and scopes )
- nixosRelease as attr
- Using disk name while attaching a GCE Disk
- fileSystems backend option made optional
- Finally, adding a temporary check over gcp_common options
Add mypy types & compat with latest nixops master
A somehow trivial example of frontend & backend machines
alongside their respectively attached frontend-volume and
backend-volume GCE disks
+ updating poetry.lock
Moving out the method to retrieve image in gcp_common
Enhancing it and adding a couple of exceptions
Making it available for both gce.py and gce_disk.py
Adding some assertions in gce.nix and gce-disk.nix
Removal of publicImageProject attribute
Usage of ImageOptions class
Upgrading libcloud
ignoring libcloud mypy annotations
@tewfik-ghariani tewfik-ghariani marked this pull request as ready for review June 16, 2020 14:59
@tewfik-ghariani
Contributor Author

Hello again :))

Following a meeting with @AmineChikhaoui and based on his remarks, I updated the code to make it more intuitive to create root disks or separate disks from a public image.

The usage syntax is as simple as the following:

machine = {
  deployment.gce = {
    bootstrapImage = {    
      name = "base-image-bootstrap";
      family = null;
      project = "nixos-org";
    };
  };
};
resources.gceDisks.main-volume = {
  image = {
    name = null;
    family = "super-family";
    project = "another-project";
  };
};

I've completed my work based on the new changes related to types and options per #7

Change log

  • Added an example of Machines + GCE Disks.
  • Introduced imageOptions and defined it as submodule.

The imageOptions contains the following :

  • image : either image-resource or the name of an existing image
  • family : the image family to be used
  • project : the parent project containing the image/imageFamily

Note that either name or family has to be specified.
Default value = {}
I then adjusted the image option to inherit from the imageOptions submodule in

  • gceDiskOptions in gce.nix
  • gce-disk.nix

Finally, I updated the bootstrapImage gce option accordingly.

  • Created a new common retrieve_gce_image method to fetch the GCENodeImage object.
  • Removed the publicImageProject attribute
  • Upgraded libcloud
  • Ignored libcloud mypy annotations
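
The "either name or family" rule for an image specification can be sketched in Python as follows. The `ImageSpec` class is hypothetical and is not the plugin's ImageOptions class. Reading "either" as "exactly one of the two" is an assumption based on the usage example, where one of the fields is always null.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ImageSpec:
    """Hypothetical mirror of the name/family/project image options."""
    name: Optional[str] = None
    family: Optional[str] = None
    project: Optional[str] = None

    def __post_init__(self) -> None:
        # Assumption: exactly one of 'name' and 'family' must be set.
        if (self.name is None) == (self.family is None):
            raise ValueError("specify exactly one of 'name' or 'family'")


# Mirrors the two usage examples from this comment.
bootstrap = ImageSpec(name="base-image-bootstrap", project="nixos-org")
main_volume = ImageSpec(family="super-family", project="another-project")
print(bootstrap, main_volume)
```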

Full list of changes : master...tewfik-ghariani:gcp-plugin

cc @AmineChikhaoui , @adisbladis can you review please? Or do you prefer if I create a new PR?

@tewfik-ghariani
Contributor Author

For some reason, the changes are no longer visible and the commit log has become confusing. Maybe that's because the second PR containing the original changes was already merged and this branch was rebased onto master.

In any case, I will close this PR and create a new one.

Cheers :))

@nixos-discourse

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/nixops-flake-gce/22355/2
