cinder-csi: Adds support for managing backups (kubernetes#2473) #2480

Merged on Feb 8, 2024 (1 commit)

Sebastian-RG
Contributor

@Sebastian-RG Sebastian-RG commented Nov 25, 2023

What this PR does / why we need it:
This PR allows creating and deleting Cinder backups from Kubernetes via VolumeSnapshot objects whose VolumeSnapshotClass sets parameters.type to "backup".

Backups are different from snapshots in that they are usually stored off-site, for example via S3 or NFS.
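
As a rough illustration of how the parameters.type switch described above could be interpreted, here is a minimal Go sketch. The function and constant names are hypothetical and not taken from the PR; only the key "type" and the values "snapshot" and "backup" come from this description. Falling back to "snapshot" when the key is missing is an assumption.

```go
package main

import "fmt"

// Hypothetical names for this sketch. The key "type" and the values
// "snapshot" and "backup" are the ones mentioned in the PR description.
const (
	snapshotTypeKey = "type"
	typeSnapshot    = "snapshot"
	typeBackup      = "backup"
)

// snapshotKind decides which Cinder object a VolumeSnapshot maps to, based on
// the VolumeSnapshotClass parameters. Defaulting to "snapshot" when the key is
// absent or empty is an assumption of this sketch.
func snapshotKind(params map[string]string) (string, error) {
	v, ok := params[snapshotTypeKey]
	if !ok || v == "" {
		return typeSnapshot, nil
	}
	switch v {
	case typeSnapshot, typeBackup:
		return v, nil
	default:
		return "", fmt.Errorf("unsupported %q parameter value %q", snapshotTypeKey, v)
	}
}
```

With parameters of {"type": "backup"} the sketch selects the backup path; any other non-empty value is rejected instead of being silently treated as a snapshot.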

Which issue this PR fixes(if applicable):
fixes #2473

Special notes for reviewers:
Create a VolumeSnapshotClass that has parameters.type equal to "backup"

apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: csi-cinder-snapclass
driver: cinder.csi.openstack.org
deletionPolicy: Delete
parameters:
  force-create: "false"
  type: "backup"

Create a VolumeSnapshot that has a Cinder PVC as its source. This creates a backup. Deleting the VolumeSnapshot also deletes the backup.

apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: new-snapshot-demo
spec:
  volumeSnapshotClassName: csi-cinder-snapclass
  source:
    persistentVolumeClaimName: pvc-snapshot-demo

Create a PVC that has the previous VolumeSnapshot as the source.
The requested size must be greater than the backup size.
This creates a volume populated with the backup data.

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: snapshot-demo-restore
spec:
  storageClassName: csi-sc-cinderplugin
  dataSource:
    name: new-snapshot-demo
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 2Gi
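
The size rule for the restore PVC can be sketched as a small Go check. This is only an illustration under assumptions: the function name is hypothetical, the backup size is assumed to be reported in GiB, and an equal size is accepted here even though the note above only says "greater than".

```go
package main

import "fmt"

// bytesPerGiB converts a size in GiB to bytes.
const bytesPerGiB int64 = 1 << 30

// validateRestoreSize rejects a restore request whose requested capacity is
// smaller than the backup it is restored from. Accepting an equal size is an
// assumption of this sketch, as is the backup size being expressed in GiB.
func validateRestoreSize(requestedBytes int64, backupSizeGiB int) error {
	minBytes := int64(backupSizeGiB) * bytesPerGiB
	if requestedBytes < minBytes {
		return fmt.Errorf("requested size %d bytes is smaller than backup size %d GiB",
			requestedBytes, backupSizeGiB)
	}
	return nil
}
```

For the 2Gi request in the PVC above, a 1 GiB backup would pass this check and a 3 GiB backup would be rejected.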

Release note:

[cinder-csi-plugin] Adds support for managing cinder backups via volumeSnapshot objects by adding a parameter to the corresponding volumeSnapshotClass

@k8s-ci-robot k8s-ci-robot added the release-note-none Denotes a PR that doesn't merit a release note. label Nov 25, 2023

linux-foundation-easycla bot commented Nov 25, 2023

CLA Signed

The committers listed above are authorized under a signed CLA.

  • ✅ login: Sebastian-RG / name: Sebastian Rojas (21fc3fd)

@k8s-ci-robot k8s-ci-robot added the cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. label Nov 25, 2023
@k8s-ci-robot
Contributor

Welcome @Sebastian-RG!

It looks like this is your first PR to kubernetes/cloud-provider-openstack 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes/cloud-provider-openstack has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Nov 25, 2023
@k8s-ci-robot
Contributor

Hi @Sebastian-RG. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Nov 25, 2023
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels Nov 25, 2023
@jichenjc
Contributor

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Nov 27, 2023
@jichenjc
Contributor

jichenjc commented Nov 27, 2023

force-create: "false"
type: "snapshot"

Is there a typo here? Should this be "backup"?

@Sebastian-RG
Contributor Author

It seems the E2E test failed, but I can't tell how it's related to the changes I made 🤔

+ lib/databases/mysql:configure_database_mysql:107 :   '[' mysql == mariadb ']'
+ lib/databases/mysql:configure_database_mysql:108 :   sudo mysqladmin -u root password password
mysqladmin: connect to server at 'localhost' failed
error: 'Access denied for user 'root'@'localhost' (using password: NO)'
+ lib/databases/mysql:configure_database_mysql:108 :   true
+ lib/databases/mysql:configure_database_mysql:113 :   is_ubuntu 
Nov 27 18:19:19 devstack neutron-server[63269]: DEBUG dbcounter [-] [63269] Writing DB stats neutron:SELECT=99,neutron:UPDATE=1 {{(pid=63269) stat_writer /usr/local/lib/python3.10/dist-packages/dbcounter.py:114}}
Nov 27 18:19:19 devstack neutron-server[63269]: WARNING neutron.plugins.ml2.drivers.ovn.mech_driver.ovsdb.ovn_client [None req-7f03ad2a-0a07-4a62-9942-c63275e34d16 None None] No hosting information found for port 7a747b1a-aa6d-4d21-b14f-765b9978e33b: RuntimeError: No hosting information found for port 7a747b1a-aa6d-4d21-b14f-765b9978e33b
Nov 27 18:19:19 devstack neutron-server[63268]: DEBUG ovsdbapp.backend.ovs_idl.transaction [-] Running txn n=1 command(idx=0): DbSetCommand(_result=None, table=Logical_Switch_Port, record=03cac3e3-48cb-46dc-95ff-acae066ef47a, col_values=(('external_ids', {'neutron:host_id': 'devstack'}),), if_exists=True) {{(pid=63268) do_commit /usr/local/lib/python3.10/dist-packages/ovsdbapp/backend/ovs_idl/transaction.py:89}} 

@Sebastian-RG
Contributor Author

I think I found the error
https://prow.k8s.io/view/gs/kubernetes-jenkins/pr-logs/pull/cloud-provider-openstack/2480/openstack-cloud-csi-cinder-e2e-test/1729199407322107904#1:build-log.txt%3A20647

+ ::                                       :   curl -sSL https://cloud-images.ubuntu.com/releases/focal/release/ubuntu-20.04-server-cloudimg-amd64.img -o ubuntu-focal.img
curl: (35) OpenSSL SSL_connect: Connection reset by peer in connection to cloud-images.ubuntu.com:443 
+ ::                                       :   openstack image create ubuntu-focal --container-format bare --disk-format qcow2 --public --file ubuntu-focal.img
'ubuntu-focal.img' is not a valid file

This looks like a flake. It then causes
https://prow.k8s.io/view/gs/kubernetes-jenkins/pr-logs/pull/cloud-provider-openstack/2480/openstack-cloud-csi-cinder-e2e-test/1729199407322107904#1:build-log.txt%3A20713

+ ::                                       :   openstack server create k3s-master --image ubuntu-focal --flavor ds2G --key-name k3s_keypair --nic port-id=4d6ff2d4-407f-460f-8ce4-10af6db5428a --user-data /home/stack/devstack/init_k3s.yaml --wait
No Image found for ubuntu-focal

Rerun the test?

@Sebastian-RG
Contributor Author

/retest

@kayrus
Contributor

kayrus commented Nov 30, 2023

@Sebastian-RG can you fill in the release notes in the initial PR message?

@k8s-ci-robot k8s-ci-robot added the release-note Denotes a PR that will be considered when it comes time to generate release notes. label Dec 1, 2023
@Sebastian-RG
Contributor Author

/retest-required

@Sebastian-RG
Contributor Author

/retest

Comment on lines +134 to +143
if err != nil {
	// If there is an error getting the backup as well, fail.
	return nil, status.Errorf(codes.NotFound, "VolumeContentSource Snapshot or Backup with ID %s not found", snapshotID)
}
Contributor

I don't think the API will behave differently if cinder-backup is not deployed. In that case the API will still return a 404, as the GET call doesn't RPC down to cinder-backup and just looks the items up in the DB: https://opendev.org/openstack/cinder/src/branch/master/cinder/backup/api.py#L71-L76.

If backups are optional, wouldn't it be better to put all the backup operations behind an if based on existence of the backup API extension: https://docs.openstack.org/api-ref/block-storage/v3/#list-known-api-extensions ?
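
The lookup order under discussion (try the ID as a snapshot, then as a backup, then fail with NotFound) can be sketched in Go as follows. This is a simplified illustration, not the PR's code: the sourceLookup interface, the errNotFound sentinel, the string return values, and the fakeCloud type are all hypothetical stand-ins, and the real controller returns gRPC status errors.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound stands in for a Cinder 404 in this sketch.
var errNotFound = errors.New("not found")

// sourceLookup is a hypothetical, trimmed-down view of the cloud client.
// The return types are simplified for illustration.
type sourceLookup interface {
	GetSnapshotByID(id string) (string, error)
	GetBackupByID(id string) (string, error)
}

// resolveContentSource reports whether id names a snapshot or a backup.
// The backup lookup only runs when backupsEnabled is true.
func resolveContentSource(c sourceLookup, id string, backupsEnabled bool) (string, error) {
	_, err := c.GetSnapshotByID(id)
	if err == nil {
		return "snapshot", nil
	}
	if !errors.Is(err, errNotFound) {
		return "", err // unexpected failure: do not fall through to backups
	}
	if backupsEnabled {
		_, err = c.GetBackupByID(id)
		if err == nil {
			return "backup", nil
		}
		if !errors.Is(err, errNotFound) {
			return "", err
		}
	}
	return "", fmt.Errorf("VolumeContentSource Snapshot or Backup with ID %s: %w", id, errNotFound)
}

// fakeCloud is an in-memory stand-in used only to exercise the sketch.
type fakeCloud struct {
	snapshots map[string]bool
	backups   map[string]bool
}

func (f fakeCloud) GetSnapshotByID(id string) (string, error) {
	if f.snapshots[id] {
		return id, nil
	}
	return "", errNotFound
}

func (f fakeCloud) GetBackupByID(id string) (string, error) {
	if f.backups[id] {
		return id, nil
	}
	return "", errNotFound
}
```

Gating the second lookup behind a flag keeps the backup path optional, which is the direction suggested in this comment.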

@@ -413,8 +594,18 @@ func (cs *controllerServer) DeleteSnapshot(ctx context.Context, req *csi.DeleteS
		return nil, status.Error(codes.InvalidArgument, "Snapshot ID must be provided in DeleteSnapshot request")
	}

	// If volumeSnapshot object was linked to a cinder backup, delete the backup.
	back, err := cs.Cloud.GetBackupByID(id)
Contributor

As I explained above, GET call on backups will not tell us anything about cinder-backup being present in the deployment, but now I think it does make some sense to probe for backup existence, because on DELETE call we'll crash here: https://opendev.org/openstack/cinder/src/branch/master/cinder/backup/api.py#L120. Thanks, please mark this one as resolved.

@Sebastian-RG
Contributor Author

@dulek @kayrus Added a check BackupsAreEnabled() to see if a cluster supports backups. Please tell me if any issues are pending.


func (os *OpenStack) BackupsAreEnabled() (bool, error) {
	// Check if the backup service is enabled
	allPages, err := services.List(os.blockstorage, services.ListOpts{Binary: backupBinary}).AllPages()
Contributor

This is an admin-only API, so it will not work when the user is a regular tenant of the cloud, which makes it unacceptable for CSI. I had rather proposed checking whether the backups API extension is enabled, but I see gophercloud currently doesn't have such an API. Let's make this method return true, nil for now. I'll take a look at how to add the new gophercloud API, and we'll be able to fix this in a follow-up patch.

Contributor Author

There is no way of checking if it's enabled or not by getting the extensions. I've tested environments where backups are not enabled and OpenStack still lists the Backups and CreateBackup extensions :(

Listing services was the only reliable way I found to tell if backups are enabled.

I also thought about just listing all backups and seeing if it returns an error but it just returns an empty list. I'll set it to return true, nil as you say

Contributor

I know, admins tend to forget about configuring the extensions correctly. Well, we can't do much about environments that return incorrect data, but it's probably the thing to look at. Anyway, it's not a big deal at this stage.
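
The interim outcome of this thread, having the method return true, nil until a tenant-safe probe exists, might look like the Go sketch below. The OpenStack struct here is an empty hypothetical stand-in; the real type holds service clients that are omitted.

```go
package main

// OpenStack is a hypothetical stand-in for the plugin's cloud client.
// The real type carries service clients that are left out of this sketch.
type OpenStack struct{}

// BackupsAreEnabled reports whether Cinder backups can be used.
// Following the review discussion, it returns true unconditionally for now:
// listing services requires admin rights, and the extension list was reported
// as unreliable in some environments.
func (os *OpenStack) BackupsAreEnabled() (bool, error) {
	return true, nil
}
```

Keeping the (bool, error) signature means callers stay unchanged if a real check is added in a follow-up patch.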

	var sourceVolID string
	var sourceBackupID string
	var backupsAreEnabled bool
	backupsAreEnabled, err = cloud.BackupsAreEnabled()
Contributor

Could we cache this as a part of controllerServer struct so that it's only calling API once, on service startup? At a glance I'm not sure it's possible…?

Contributor Author

I think the added delay for a createVolume or createSnapshot action should not be noticeable. Also if we do this at controller startup it might be hard to debug when adding backups capability to an existing cluster.

Contributor

This can be done as a second step, I think.
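
One possible shape for the caching idea raised in this thread is sketched below in Go. It is not part of the PR: the cachedBackupCheck type and the probe function type are hypothetical, and the sketch caches on first successful use instead of at service startup.

```go
package main

import "sync"

// backupProbe is a hypothetical function type with the same shape as
// BackupsAreEnabled.
type backupProbe func() (bool, error)

// cachedBackupCheck remembers the first successful probe result so that the
// underlying API is not called on every CreateVolume or CreateSnapshot.
type cachedBackupCheck struct {
	mu      sync.Mutex
	probe   backupProbe
	known   bool
	enabled bool
}

// Enabled returns the cached result, probing only while no result is known.
// Errors are not cached, so a failed probe is retried on the next call.
func (c *cachedBackupCheck) Enabled() (bool, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.known {
		return c.enabled, nil
	}
	enabled, err := c.probe()
	if err != nil {
		return false, err
	}
	c.known, c.enabled = true, enabled
	return enabled, nil
}
```

The trade-off matches the concern raised in the reply above: once a result is cached, the controller would need a restart to notice that backups were added to an existing cluster.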

@dulek
Contributor

dulek commented Jan 19, 2024

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 19, 2024
@dulek
Contributor

dulek commented Jan 19, 2024

I'll let @kayrus and @zetaab have another look before lifting the hold.

@Sebastian-RG
Contributor Author

Hi @kayrus, @zetaab. Do you have any additional comments?

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 2, 2024
Signed-off-by: Sebastian-RG <fullmetalliferous@gmail.com>
@k8s-ci-robot k8s-ci-robot removed lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Feb 7, 2024
@Sebastian-RG
Contributor Author

/retest

@dulek
Contributor

dulek commented Feb 8, 2024

/retest

It should be good now.

@dulek
Contributor

dulek commented Feb 8, 2024

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 8, 2024
@zetaab
Member

zetaab commented Feb 8, 2024

@Sebastian-RG thanks for driving this through!

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 8, 2024
@k8s-ci-robot k8s-ci-robot merged commit e87f506 into kubernetes:master Feb 8, 2024
6 checks passed
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[cinder-csi-plugin] Programatically create cinder backups from Kubernetes via CSI
7 participants