
add a nova tempest ceph job #685

Merged
merged 1 commit into openstack-k8s-operators:main on Apr 19, 2024

Conversation

SeanMooney
Contributor

@SeanMooney SeanMooney commented Feb 15, 2024

This second job variant tries to enable Ceph for Nova
and Cinder storage.

Depends-On: openstack-k8s-operators/ci-framework#1515
Closes: OSPRH-94
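
For context, the job variant being added would look roughly like this in .zuul.yaml (a sketch assembled from the job names and scenario files shown later in this review; the parent job and exact variable layout are assumptions):

- job:
    name: nova-operator-tempest-multinode-ceph
    parent: nova-operator-tempest-multinode
    vars:
      cifmw_extras:
        - '@scenarios/centos-9/ci.yml'
        - '@scenarios/centos-9/multinode-ci.yml'
        - '@scenarios/centos-9/ceph_backends.yml'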


Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/a5de9450096445708d4df09ae0c9859f

✔️ nova-operator-content-provider SUCCESS in 52m 19s
❌ nova-operator-tempest-multinode-ceph FAILURE in 25m 19s

@SeanMooney
Contributor Author

check-rdo

.zuul.yaml (review thread, resolved)

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/d85037b367014d3797bcc293f96fc347

✔️ nova-operator-content-provider SUCCESS in 1h 59m 19s
❌ nova-operator-tempest-multinode-ceph FAILURE in 19m 17s

.zuul.yaml (outdated)
cifmw_extras:
  - '@scenarios/centos-9/ci.yml'
  - '@scenarios/centos-9/multinode-ci.yml'
  - '@scenarios/centos-9/ceph_backends.yml'
Contributor Author

Note: I do not want to use https://github.com/openstack-k8s-operators/ci-framework/blob/5c138d4f734e600cde2e78859da21ab9fa835d33/scenarios/centos-9/ceph_backends.yml directly, but this can be a path to a file in this repo.

So I'll refactor this in the next version, as I do not want to enable Manila, and there are other side effects of that file that we may not want.

This is just a test to ensure we have a working baseline; then I'll create a vars file in this repo with just the bits we actually need/want.
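
Something like the following is what I have in mind (a hypothetical sketch: the repo-local file name and how its path resolves are assumptions, and the trimmed vars file itself would carry only the Ceph settings copied out of ceph_backends.yml that Nova and Cinder actually need):

cifmw_extras:
  - '@scenarios/centos-9/ci.yml'
  - '@scenarios/centos-9/multinode-ci.yml'
  # hypothetical trimmed vars file kept in this repo, with only the
  # Nova/Cinder Ceph bits and no Manila
  - '@ci/nova-ceph-backends.yml'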


Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/9de5dd8b2be04ee6a3db4314a3b15632

✔️ nova-operator-content-provider SUCCESS in 2h 43m 17s
❌ nova-operator-tempest-multinode-ceph FAILURE in 2h 24m 10s


Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/2620dc92a4e44f80ac1c8ff181c848b8

✔️ nova-operator-content-provider SUCCESS in 2h 31m 56s
❌ nova-operator-tempest-multinode-ceph FAILURE in 2h 10m 52s

@fultonj fultonj self-requested a review March 4, 2024 17:33
@SeanMooney
Contributor Author

With openstack-k8s-operators/ci-framework@da6e49e this might actually pass some of the volume tests, as the control plane part is now present.

However, I likely need to include the new post_deploy hooks: openstack-k8s-operators/ci-framework@da6e49e#diff-abdda9c1414e9e7da469561ed643339dc8b1604b76fdbe17a127b35f919304b3

post_deploy:
  - name: Kustomize OpenStack CR with Ceph
    type: playbook
    source: control_plane_ceph_backends.yml
  - name: Kustomize and update Control Plane
    type: playbook
    source: control_plane_kustomize_deploy.yml

So this might be close to working.

.zuul.yaml (review thread, resolved)

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/332ac4e7240d48bca8a65c9d707ba3e6

✔️ nova-operator-content-provider SUCCESS in 2h 36m 18s
❌ nova-operator-tempest-multinode-ceph FAILURE in 2h 16m 22s


Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/27646b26e0184307a5e83611e60cc3a4

✔️ nova-operator-content-provider SUCCESS in 2h 33m 58s
❌ nova-operator-tempest-multinode-ceph FAILURE in 2h 13m 50s

@SeanMooney
Contributor Author

check-rdo. Let's enable the nova extra config to get this working.


Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/5a3fd5cc9e72400c83f342cec2706d4a

✔️ nova-operator-content-provider SUCCESS in 3h 06m 09s
❌ nova-operator-tempest-multinode-ceph FAILURE in 2h 44m 45s

@SeanMooney
Contributor Author

check-rdo

@SeanMooney SeanMooney changed the title from "[WIP] add a nova tempest ceph job" to "add a nova tempest ceph job" on Apr 15, 2024

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/8b2e73f5ac36455daf91d7cdb304625b

✔️ nova-operator-content-provider SUCCESS in 53m 17s
❌ nova-operator-tempest-multinode-ceph FAILURE in 32m 42s

@SeanMooney
Contributor Author

check-rdo 6b896a47cbb1ef7aab7ee504af1e372a59d845fd has been reverted


Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/f4f3567bc0f546d89304fdd3c028c2ee

✔️ nova-operator-content-provider SUCCESS in 2h 09m 06s
❌ nova-operator-tempest-multinode-ceph FAILURE in 1h 47m 55s


Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/cdf1b5db09744d2a8090ea5509480fbe

✔️ openstack-k8s-operators-content-provider SUCCESS in 2h 52m 35s
✔️ nova-operator-kuttl SUCCESS in 37m 57s
❌ nova-operator-tempest-multinode FAILURE in 2h 05m 29s
❌ nova-operator-tempest-multinode-ceph FAILURE in 2h 25m 52s

This second job variant tries to enable Ceph for Nova
and Cinder storage.
@SeanMooney
Contributor Author

For now I'm going to disable the failing Ceph tests.
The non-Ceph job failures look like they might be random failures due to CI stability issues.

If the block migration tests fail again, I'll disable them temporarily so we can move forward with Ceph testing.
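
Concretely, disabling a test here means extending the job's Tempest exclude list, roughly like this (cifmw_test_operator_tempest_exclude_list is my assumption about the ci-framework variable name; the test IDs are two of the ones discussed later in this review):

cifmw_test_operator_tempest_exclude_list: |
  tempest.scenario.test_minimum_basic.TestMinimumBasicScenario.test_minimum_basic_scenario
  tempest.api.compute.admin.test_live_migration.LiveMigrationTest.test_live_migration_with_trunk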


Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/e0a36aba795e47a7bd1b1cd3a900254d

✔️ openstack-k8s-operators-content-provider SUCCESS in 2h 39m 02s
nova-operator-kuttl MERGE_CONFLICT in 3s
❌ nova-operator-tempest-multinode FAILURE in 1h 58m 07s
❌ nova-operator-tempest-multinode-ceph FAILURE in 2h 10m 57s

@SeanMooney
Contributor Author

SeanMooney commented Apr 18, 2024

So the normal job failed with:

[instance: 6785c8c7-78ae-4dad-b70d-632626d34659] Live Migration failure: authentication failed: Failed to verify peer's certificate: libvirt.libvirtError: authentication failed: Failed to verify peer's certificate

Looking at the generated config, we have the change that disables post-copy:

https://logserver.rdoproject.org/85/685/b9abdc4b4cf699e7174aa15e7bc0092500403b48/github-check/nova-operator-tempest-multinode/46222bc/controller/ci-framework-data/logs/38.102.83.233/openstack/config/nova/02-nova-host-specific.conf

So that does not seem to be sufficient.

@olliewalsh do you think this would be fixed by openstack-k8s-operators/edpm-ansible#628?
I suspect not, but I'm not sure why this only happens when doing explicit block live migration; however, the same tests passed for you with that, so it might?

@SeanMooney
Contributor Author

check-rdo

@olliewalsh

@olliewalsh do you think this would be fixed by openstack-k8s-operators/edpm-ansible#628? I suspect not, but I'm not sure why this only happens when doing explicit block live migration; however, the same tests passed for you with that, so it might?

Do you have the dataplane PR that sets the CN & Subject in the cert? openstack-k8s-operators/dataplane-operator#827


Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/f9461755ab3a4b9aa3203449b020c554

✔️ openstack-k8s-operators-content-provider SUCCESS in 2h 22m 29s
✔️ nova-operator-kuttl SUCCESS in 37m 10s
✔️ nova-operator-tempest-multinode SUCCESS in 2h 00m 13s
❌ nova-operator-tempest-multinode-ceph FAILURE in 1h 59m 43s

@olliewalsh

@olliewalsh do you think this would be fixed by openstack-k8s-operators/edpm-ansible#628? I suspect not, but I'm not sure why this only happens when doing explicit block live migration; however, the same tests passed for you with that, so it might?

Do you have the dataplane PR that sets the CN & Subject in the cert? openstack-k8s-operators/dataplane-operator#827

Yeah, the certs look bad. I don't see a CN or Subject in the cert:

$ certtool -i --infile clientcert.pem 
X.509 Certificate Information:
	Version: 3
	Serial Number (hex): 00c38b093e17d8ee8c4c9a4e00da09e180
	Issuer: CN=rootca-internal
	Validity:
		Not Before: Wed Apr 17 13:17:20 UTC 2024
		Not After: Thu Apr 17 13:17:20 UTC 2025
	Subject:
	Subject Public Key Algorithm: RSA
	Algorithm Security Level: Medium (2048 bits)
		Modulus (bits 2048):
			00:b8:1a:c7:22:86:00:be:d0:d8:db:e8:fe:76:7f:ad
			95:3d:69:b3:07:14:a2:c3:1f:3e:d9:85:83:bb:88:97
			7e:ca:1d:1f:2a:a9:f7:c6:a8:d0:66:5a:88:dc:6a:fe
			62:4e:6c:95:9f:50:8f:0a:ab:30:88:cb:bb:f4:01:ff
			10:a5:54:9b:03:38:c9:bb:15:3b:30:36:59:56:98:79
			fa:6c:3e:68:40:d5:d5:73:04:72:ae:89:57:d8:84:ee
			28:fd:bc:c3:f8:c5:5b:81:ab:ed:7b:d7:9b:6d:59:0e
			97:63:4c:99:ce:08:6b:c8:2e:d8:2f:9e:39:aa:9e:48
			80:b4:5e:7c:32:18:7b:35:41:3e:4e:b8:b7:15:d9:a8
			4e:94:bb:57:bf:66:7a:0c:15:70:c2:0d:87:23:06:32
			2d:e0:10:aa:10:56:26:81:ae:6b:09:55:07:17:e0:37
			de:9e:15:3c:9e:2d:16:08:37:35:b2:e8:7d:03:29:db
			4b:ae:97:90:bf:57:1a:09:85:b6:3d:40:6b:b8:8a:d4
			d5:c1:d6:40:96:c0:a4:42:c9:55:3c:cb:28:20:56:5b
			7d:3e:b5:a9:e3:f3:50:97:22:ce:84:a8:69:b1:06:06
			f9:17:b4:6d:18:92:88:51:59:48:ac:b9:9a:f5:56:bc
			0f
		Exponent (bits 24):
			01:00:01
	Extensions:
		Key Usage (critical):
			Digital signature.
			Key encipherment.
		Key Purpose (not critical):
			TLS WWW Server.
			TLS WWW Client.
		Basic Constraints (critical):
			Certificate Authority (CA): FALSE
		Authority Key Identifier (not critical):
			4d25f820bc0d8f9f5787203be57fa00a42d9a788
		Subject Alternative Name (critical):
			DNSname: compute-0.ci-rdo.local
			IPAddress: 192.168.122.100
	Signature Algorithm: ECDSA-SHA256
	Signature:
		30:44:02:20:39:82:f2:5a:3c:eb:21:64:03:5d:bf:0e
		15:77:ae:fc:aa:cb:50:e1:84:6c:34:2d:25:a0:c1:a3
		bc:82:76:e4:02:20:5c:47:ea:68:4f:28:c9:ff:33:3c
		86:75:39:c4:c8:27:55:66:6b:37:0d:16:7e:01:c2:64
		1a:c4:85:fb:b3:39
Other Information:
	Fingerprint:
		sha1:875acb5edd783c1983f0a1d86d6ad67c3426c979
		sha256:a9bab1b2644dbd07330e6b771a6be183deaf824811f8f2e666c8e4cfd9a1ea25
	Public Key ID:
		sha1:0db186f1441093d1975d3530b4d2a2ca13d53dc1
		sha256:bb9957b77a9612b536ee421f3dc848cfe0f6617cce512fdcb2cf43419d37bebc
	Public Key PIN:
		pin-sha256:u5lXt3qWErU27kIfPchIz+D2YXzOUS/css9DQZ03vrw=

-----BEGIN CERTIFICATE-----
MIICcTCCAhigAwIBAgIRAMOLCT4X2O6MTJpOANoJ4YAwCgYIKoZIzj0EAwIwGjEY
MBYGA1UEAxMPcm9vdGNhLWludGVybmFsMB4XDTI0MDQxNzEzMTcyMFoXDTI1MDQx
NzEzMTcyMFowADCCASIwDQYJKoZIhvcNAQEBBQADggEPADCCAQoCggEBALgaxyKG
AL7Q2Nvo/nZ/rZU9abMHFKLDHz7ZhYO7iJd+yh0fKqn3xqjQZlqI3Gr+Yk5slZ9Q
jwqrMIjLu/QB/xClVJsDOMm7FTswNllWmHn6bD5oQNXVcwRyrolX2ITuKP28w/jF
W4Gr7XvXm21ZDpdjTJnOCGvILtgvnjmqnkiAtF58Mhh7NUE+Tri3FdmoTpS7V79m
egwVcMINhyMGMi3gEKoQViaBrmsJVQcX4DfenhU8ni0WCDc1suh9AynbS66XkL9X
GgmFtj1Aa7iK1NXB1kCWwKRCyVU8yyggVlt9PrWp4/NQlyLOhKhpsQYG+Re0bRiS
iFFZSKy5mvVWvA8CAwEAAaOBjTCBijAOBgNVHQ8BAf8EBAMCBaAwHQYDVR0lBBYw
FAYIKwYBBQUHAwEGCCsGAQUFBwMCMAwGA1UdEwEB/wQCMAAwHwYDVR0jBBgwFoAU
TSX4ILwNj59XhyA75X+gCkLZp4gwKgYDVR0RAQH/BCAwHoIWY29tcHV0ZS0wLmNp
LXJkby5sb2NhbIcEwKh6ZDAKBggqhkjOPQQDAgNHADBEAiA5gvJaPOshZANdvw4V
d678qstQ4YRsNC0loMGjvIJ25AIgXEfqaE8oyf8zPIZ1OcTIJ1VmazcNFn4BwmQa
xIX7szk=
-----END CERTIFICATE-----

@SeanMooney
Contributor Author

OK, the block migration tests passed on the last iteration, so I guess that picked up openstack-k8s-operators/dataplane-operator#827. I just need to review the nova-operator-tempest-multinode-ceph failures, but I think the TLS issues were resolved by merging openstack-k8s-operators/dataplane-operator#827, based on the last run.

Thanks for pointing that out.

@SeanMooney
Contributor Author

OK, the Ceph job failed because nova-compute could not start:

/etc/ssh/ssh_known_hosts: no such file or directory

The service list does not appear to include the service that creates the known-hosts file:

https://logserver.rdoproject.org/85/685/b9abdc4b4cf699e7174aa15e7bc0092500403b48/github-check/nova-operator-tempest-multinode-ceph/1771996/controller/ci-framework-data/logs/openstack-k8s-operators-openstack-must-gather/namespaces/openstack/crs/openstackdataplanenodesets.dataplane.openstack.org/openstack-edpm-ipam.yaml

services:
  - repo-setup
  - bootstrap
  - configure-network
  - validate-network
  - install-os
  - ceph-hci-pre
  - configure-os
  - run-os
  - reboot-os
  - install-certs
  - ceph-client
  - ovn
  - neutron-metadata
  - libvirt
  - nova-custom-ceph
tlsEnabled: true

So that should be a simple fix.
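
If it really is just the missing service, the nodeset would gain one entry in its service list, along these lines (ssh-known-hosts is my assumption about the name of the dataplane-operator service that writes /etc/ssh/ssh_known_hosts; its position in the list is illustrative):

services:
  # (repo-setup through install-certs unchanged)
  - install-certs
  - ssh-known-hosts  # assumed service name; generates /etc/ssh/ssh_known_hosts
  - ceph-client
  # (remaining services unchanged)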

@SeanMooney
Contributor Author

check-rdo

Contributor

@gibizer gibizer left a comment


I'm OK to merge this, but let's troubleshoot some of the disabled tests, or at least document why they do not work:

tempest.scenario.test_minimum_basic.TestMinimumBasicScenario.test_minimum_basic_scenario
tempest.scenario.test_stamp_pattern
tempest.api.compute.admin.test_live_migration.LiveMigrationTest.test_live_migration_with_trunk
tempest.scenario.test_network_advanced_server_ops.TestNetworkAdvancedServerOps.test_server_connectivity_live_migration
Contributor

I feel this should work

Contributor Author

Yes, it should. This was disabled before in the non-Ceph job.
I believe this was originally disabled because there were some issues with OVN and network isolation causing ARP issues with Tempest.

I think that has been resolved, but we have not removed this exclusion.

I'm planning to start enabling more tests in follow-ups, and I'll look into removing some of these exclusions.

tempest.scenario.test_network_basic_ops.TestNetworkBasicOps.test_mtu_sized_frames
tempest.scenario.test_encrypted_cinder_volumes.TestEncryptedCinderVolumes
crypt
tempest.api.compute.servers.test_server_actions.ServerActionsV293TestJSON.test_rebuild_volume_backed_server
Contributor

what is the issue with it?

Contributor Author

I don't recall.

It might have been an intermittent failure related to some of the other bugs I hit.

My guess, though, is that since we are disabling ServerStableDeviceRescue, that implies we are not configuring the Tempest images with the hw_rescue_bus image property. If I recall correctly, BFV rebuild requires the stable device bus feature, so this is likely failing because of that.

We have service_available.cinder false in the multinode jobs today, so this test is not running there.

Hopefully it won't take long to enable Cinder LVM there, but when I do, I expect this to fail there too for the same reason as in the Ceph job, so I'll try to address this for both. See the sketch below for the image properties involved.
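
For reference, stable device rescue expects the guest image to carry the stable-rescue properties, i.e. as image metadata (standard Glance property names; how our CI would apply them to the Tempest image is left open here):

hw_rescue_bus: virtio
hw_rescue_device: disk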

Contributor

openshift-ci bot commented Apr 19, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: gibizer, SeanMooney

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-bot openshift-merge-bot bot merged commit ee2905b into openstack-k8s-operators:main Apr 19, 2024
7 checks passed