[ARM platform]Remote snapshotter test failed on pulling image due to microVM time out of sync #737

Open
fangn2 opened this issue Apr 7, 2023 · 2 comments
Labels
kind/bug Something isn't working

Comments

@fangn2
Contributor

fangn2 commented Apr 7, 2023

On the ARM64 platform, the remote snapshotter test fails while pulling an image because the microVM clock is out of sync.
Failed CI build link.
When CI runs TestSnapshotterMetrics_Isolated, which pulls an image and unpacks it in the remote snapshotter, the pull fails with the error below (the same image pulls fine on the ARM host with ctr).

=== RUN   TestSnapshotterMetrics_Isolated
    metrics_integ_test.go:54:
        	Error Trace:	metrics_integ_test.go:54
        	Error:      	Received unexpected error:
        	            	Failed to pull image on microVM[0]: failed to extract layer sha256:fdb3c0ecba2ee0b2b39f778f7da3beb4ee4c75f6f4b8a083211b12971fde4ad6: failed to mount /var/lib/containerd/tmpmounts/containerd-mount3440161598: no such file or directory: unknown
        	Test:       	TestSnapshotterMetrics_Isolated

I checked containerd.log for the test and found the error "x509: certificate has expired or is not yet valid: current time 2022-08-07T13:25:13Z is before 2023-02-21T00:00:00Z", which suggests the microVM clock is not synced correctly.

{\"key\":\"0/1/extract-101125884-ByGC sha256:fdb3c0ecba2ee0b2b39f778f7da3beb4ee4c75f6f4b8a083211b12971fde4ad6\",\"level\":\"info\",\"mountpoint\":\"/var/lib/containerd-stargz-grpc/snapshotter/snapshots/1/fs\",\"msg\":\"Received status code: 401 Unauthorized. Refreshing creds...\",\"parent\":\"\",\"src\":\"ghcr.io/firecracker-microvm/firecracker-containerd/amazonlinux:latest-esgz/sha256:efc8b66d208d6eaa2e24799081e13c035f25ad585cec5d478845a744f98324b8\",\"time\":\"2022-08-07T13:25:13.491954298Z\"}" jailer=noop runtime=aws.firecracker vmID=0 vmm_stream=stdout
time="2023-04-06T18:05:06.129064764Z" level=debug msg="[    5.441207] containerd-stargz-grpc[744]: {\"error\":\"Get \\\"https://pkg-containers.githubusercontent.com/ghcr1/blobs/sha256:efc8b66d208d6eaa2e24799081e13c035f25ad585cec5d478845a744f98324b8?se=2023-04-06T18%3A15%3A00Z\\u0026sig=8ccbSSbYFbIlb%2Fr9eh5ptZlKm5W1pkw50WfKqVLTY4A%3D\\u0026sp=r\\u0026spr=https\\u0026sr=b\\u0026sv=2019-12-12\\\": x509: certificate has expired or is not yet valid: current time 2022-08-07T13:25:13Z is before 2023-02-21T00:00:00Z\"
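To put a number on the skew, the two timestamps in the x509 error above can be compared directly. This is only a minimal sketch: the helper name guest_clock_lag is made up for illustration and is not part of firecracker-containerd, and it assumes the error text keeps the "current time ... is before ..." wording shown in the log.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Optional

# Matches the "current time <T1> is before <T2>" part of the x509 error.
# Assumption: both timestamps are UTC with whole seconds, as in the log above.
_STAMP = r"(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z)"
_X509_BEFORE_RE = re.compile(r"current time " + _STAMP + r" is before " + _STAMP)


def _parse_utc(stamp: str) -> datetime:
    """Parse a timestamp such as 2022-08-07T13:25:13Z into an aware datetime."""
    return datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def guest_clock_lag(message: str) -> Optional[timedelta]:
    """Return how far the reported current time is behind the certificate's
    not-before time, or None if the message does not contain that pattern."""
    match = _X509_BEFORE_RE.search(message)
    if match is None:
        return None
    guest_now = _parse_utc(match.group(1))
    not_before = _parse_utc(match.group(2))
    return not_before - guest_now


if __name__ == "__main__":
    error = (
        "x509: certificate has expired or is not yet valid: "
        "current time 2022-08-07T13:25:13Z is before 2023-02-21T00:00:00Z"
    )
    print(guest_clock_lag(error))
```

For the error in this issue the lag is a little over 197 days, so the guest clock is well outside the certificate's validity window and every TLS request from the microVM will fail the same way.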

I tried setting the microVM time in the commit, but it had no effect: the rerun still showed the old time.
The strange part is that the AMD platform build has the correct time in the microVM.

@swagatbora90 swagatbora90 added the kind/bug Something isn't working label Apr 10, 2023
@BinSquare

Guest kernels need to be compiled with KVM_PTP support, which is the mechanism used for clock sync.

CONFIG_PTP_1588_CLOCK=y
CONFIG_PTP_1588_CLOCK_KVM=y

All of the arm microVM kernel configs are missing CONFIG_PTP_1588_CLOCK_KVM=y, whereas all of the x86 configs have it. The discrepancy exists because the 4.14 arm64 kernel lacks the feature, which has been upstream since 5.3. There is a good discussion of the same problem as experienced by kata-containers: kata-containers/packaging#693
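A quick way to confirm which configs lack the options is to scan each config file for the two lines above. The following is a rough sketch, not a tool from the repo: the function name missing_builtin_options is hypothetical, and it only treats "=y" as enabled (a "=m" module build would be reported as missing).

```python
from typing import Iterable, List

# The two options quoted above; both must be built in ("=y").
REQUIRED_OPTIONS = ("CONFIG_PTP_1588_CLOCK", "CONFIG_PTP_1588_CLOCK_KVM")


def missing_builtin_options(
    config_text: str, required: Iterable[str] = REQUIRED_OPTIONS
) -> List[str]:
    """Return the required options that are not set to 'y' in a kernel config."""
    enabled = set()
    for raw_line in config_text.splitlines():
        line = raw_line.strip()
        # Skip blanks and comments such as "# CONFIG_FOO is not set".
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if sep and value == "y":
            enabled.add(name)
    return [option for option in required if option not in enabled]


if __name__ == "__main__":
    import sys

    # Usage: python check_ptp_config.py <path-to-kernel-config> ...
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as handle:
            missing = missing_builtin_options(handle.read())
        print(f"{path}: {'OK' if not missing else 'missing ' + ', '.join(missing)}")
```

Running it against tools/kernel-configs/microvm-kernel-aarch64-4.14.config and the x86 configs should make the difference described above visible in one pass.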

The CI build logs show that the failed build was also using the 4.14 kernel.

default-vmlinux.bin: OK
chmod 0400 default-vmlinux.bin
_submodules/firecracker/tools/devtool -y build_kernel --config tools/kernel-configs/microvm-kernel-aarch64-4.14.config

The fix for this issue needs two parts:

  1. The guest kernel configs need the missing option, CONFIG_PTP_1588_CLOCK_KVM=y.
  2. Whichever kernel we choose needs to include the ptp_kvm commit: https://github.com/torvalds/linux/blob/16a8829130ca22666ac6236178a6233208d425c3/Documentation/virt/kvm/arm/ptp_kvm.rst#L4
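Once both parts are in place, one way to verify the result from inside the guest is to check whether the kernel registered any PTP clock at all. The sketch below assumes the standard Linux sysfs layout (/sys/class/ptp/ptpN/clock_name); the function name list_ptp_clocks is hypothetical, and an empty result only says that no PTP device was registered, not why.

```python
from pathlib import Path
from typing import Dict


def list_ptp_clocks(sysfs_ptp_dir: str = "/sys/class/ptp") -> Dict[str, str]:
    """Map each PTP device (e.g. 'ptp0') to the name its driver reports.

    Returns an empty dict when the directory does not exist, which is what
    a guest kernel without any PTP driver would look like.
    """
    root = Path(sysfs_ptp_dir)
    if not root.is_dir():
        return {}
    clocks = {}
    for entry in sorted(root.iterdir()):
        name_file = entry / "clock_name"
        if name_file.is_file():
            clocks[entry.name] = name_file.read_text().strip()
    return clocks


if __name__ == "__main__":
    found = list_ptp_clocks()
    if not found:
        print("no PTP clocks registered")
    for device, clock_name in found.items():
        print(f"{device}: {clock_name}")
```

If the list is empty on a kernel built with both config options, that points at the second part of the fix (the kernel itself lacking ptp_kvm support on arm64) rather than at the config.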

@fangn2
Contributor Author

fangn2 commented May 10, 2023

@BinSquare Thanks for taking a look at the issue.
The two-part solution makes sense to me. I did try compiling the kernel with

CONFIG_PTP_1588_CLOCK=y
CONFIG_PTP_1588_CLOCK_KVM=y

But since the kernel doesn't have the ptp_kvm patch, the config change made no difference.
