Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Firecracker incorrect clock #1869

Closed
jeromegn opened this issue Jun 9, 2023 · 9 comments · Fixed by #1873
Closed

Firecracker incorrect clock #1869

jeromegn opened this issue Jun 9, 2023 · 9 comments · Fixed by #1873

Comments

@jeromegn
Copy link

jeromegn commented Jun 9, 2023

I'm not sure how to debug this, there aren't many docs about firecracker usage, but the clock is very wrong when using firecracker.

It appears to be stuck on Aug 16th 2021, on boot. I can't figure out why exactly. Firecracker supports KVM clock (pv), but I don't know if nanos is detecting it.

❯ ops profile
Ops version: 0.1.37
Nanos version: 0.0
Qemu version: 8.0.0
Arch: linux
Virtualized: false
❯ firecracker --version
Firecracker v0.24.6

Using the C program defined here: #357 (comment)

I get:

❯ ops build hello -c ops.json --ip-address 10.0.2.2 --type firecracker && sudo rm /tmp/firecracker.socket; sudo firecracker --config-file fc.json --api-sock /tmp/firecracker.socket
 100% |████████████████████████████████████████|         
Bootable image file:/home/jerome/.ops/images/hello
warning: ACPI MADT not found, default to 1 processor
en1: assigned 10.0.2.2
Boot time   = Mon Aug 16 00:00:00 2021

Monotonic Time  = Thu Jan  1 00:00:00 1970
Current (boot)  = 1629072000
Current (mono)  = 0
Current time    = Mon Aug 16 00:00:00 2021
Real Time       = Mon Aug 16 00:00:00 2021

en1: assigned FE80::A8FC:FF:FE00:1
Monotonic Time  = Thu Jan  1 00:00:02 1970
Current (boot)  = 1629072000
Current (mono)  = 2
Current time    = Mon Aug 16 00:00:02 2021
Real Time       = Mon Aug 16 00:00:02 2021

Monotonic Time  = Thu Jan  1 00:00:04 1970
Current (boot)  = 1629072000
Current (mono)  = 4
Current time    = Mon Aug 16 00:00:04 2021
Real Time       = Mon Aug 16 00:00:04 2021

Monotonic Time  = Thu Jan  1 00:00:06 1970
Current (boot)  = 1629072000
Current (mono)  = 6
Current time    = Mon Aug 16 00:00:06 2021
Real Time       = Mon Aug 16 00:00:06 2021
fc.json
{
    "boot-source": {
      "kernel_image_path": "/home/jerome/.ops/0.1.43/kernel.img",
      "boot_args": "console=ttyS0 reboot=k panic=1 pci=off random.trust_cpu=on"
    },
    "drives": [
      {
        "drive_id": "rootfs",
        "path_on_host": "/home/jerome/.ops/images/hello",
        "is_root_device": true,
        "is_read_only": false
      }
    ],
    "network-interfaces": [
      {
        "iface_id": "eth0",
        "guest_mac": "AA:FC:00:00:00:01",
        "host_dev_name": "tap0"
      }
    ],
    "machine-config": {
      "vcpu_count": 1,
      "mem_size_mib": 512,
      "ht_enabled": false
    },
    "logger": {
        "log_path": "logs.fifo",
        "level": "info"
      }
  }
ops.json
{
    "Env": {
      "PORT": "8080"
    },
    "Klibs": ["ntp.dbg"],
    "ManifestPassthrough": {
      "ntp_servers": ["0.us.pool.ntp.org:123"],
      "ntp_poll_min": "4",
      "ntp_poll_max": "6"
    }
  }

I'm only trying the ntp klib as an attempt to fix it, but that doesn't help either.

A few questions:

  • Is there a way to debug this in a firecracker environment? I couldn't get any of the debugging examples to work for me in there.
  • What is the list of supported boot_args for the kernel?
@eyberg
Copy link
Contributor

eyberg commented Jun 9, 2023

can confirm what you're seeing; we'll take a look

as for boot_args I wasn't aware we were actually honoring those but I guess we are in some cases

@jeromegn
Copy link
Author

Thanks!

For completeness, I also tried with firecracker 1.3.3 and the latest cloud-hypervisor w/ the same result.

@rinor
Copy link
Contributor

rinor commented Jun 12, 2023

@jeromegn while not sure where that date-time comes from and why (both fc and ch), but if you want to test/use ntp klib, you need to have the application running for a bit longer, since it takes about 53-54 seconds for the ntp klib to collect samples and apply the clock changes.

From tests some time ago, on my env tests,

  • on firecracker the ntp didn't work, it looped here (that condition never became false) and system frooze
    while (rtc_gettimeofday() != seconds) {
  • on cloud-hypervisor it should work

@jeromegn
Copy link
Author

@rinor interesting. I don't think I tried the cloud-hypervisor + ntp combo. Even if that works, ~1 minute to become time-synced is way too long. During that time, no TLS handshakes can happen (or maybe no TLS handshakes using certificates only valid after a date later than August 16th 2021).

I did notice a freeze when using firecracker, but didn't link that to ntp.

francescolavra added a commit that referenced this issue Jun 14, 2023
Nanos can run on platforms where the RTC peripheral is read-only
(e.g. riscv-virt), is not initialized with the system time from the
host (e.g. cloud-hypervisor), or is not available at all (e.g.
AWS Firecracker). This change set allows the system time to be
retrieved during boot from an alternative source (e.g. the wall
clock time made available by the paravirtualized clock), and allows
changing the system time even when the RTC peripheral is
unavailable (in this case, the time update will not persist across
a VM reset).
The register_platform_clock_now() function now takes an additional
parameter with the value of the offset between the system time and
the monotonic time: if such offset is available (i.e. non-zero), it
is used to initialize the rtc_offset field of the VDSO structure,
otherwise the kernel falls back to trying to derive the offset from
the RTC time. On platforms with the pvclock peripheral, the RTC
offset is available via the wall clock time.
The x86 RTC driver is being amended to be able to handle cases
where the RTC is not available: in these cases, rtc_gettimeofday()
returns 0, and rtc_settimeofday() is a no-op.

Closes #1869
@francescolavra
Copy link
Member

francescolavra commented Jun 14, 2023

#1873 fixes the incorrect clock issue on both Firecracker and cloud-hypervisor, and also fixes the kernel freeze when using Firecracker with the ntp klib. If anyone wants to try it out without waiting for it to be merged in the master branch, you can do so by adding the --nanos-version f0fe064 flag to the Ops command line to create an image
@jeromegn as for your questions:

  • firecracker does not have an embedded gdb server, so it's not possible to debug directly the Nanos kernel with gdb like we do with Qemu; a gdb server implementation has been proposed in Gdb server #2168 firecracker-microvm/firecracker#2333 a couple of years ago, but it never got merged
  • at the moment, the only boot_args we support are those inserted by firecracker to describe the topology of virtio_mmio devices (such as the disk and network interfaces); we don't support user-specified args such as in your example "console=ttyS0 reboot=k panic=1 pci=off random.trust_cpu=on"

@jeromegn
Copy link
Author

Thank you! I should be able to continue my research into nanos with this. I'll try it out later today.

If there's no debugging server available, would there be a way to output verbose logs at runtime (especially boot time) from the nanos kernel? Somewhat like the debug Linux kernel boot arg.

Concerning boot args: I added the trust_cpu one, but wasn't sure it would do anything. The rest of the args was taken from the ops docs for firecracker.

@francescolavra
Copy link
Member

Nanos supports various tracing options to output verbose logs at runtime, see https://docs.ops.city/ops/configuration and specifically the trace, ltrace and NoTrace sections. But most of this output results from the interaction between the kernel and the user program, so does not cover what happens during boot before the user program begins execution. To get verbose logs at boot time one has to recompile the kernel after adding *_DEBUG preprocessor defines in various source files (e.g. uncommenting the INIT_DEBUG line at

//#define INIT_DEBUG
).

@jeromegn
Copy link
Author

at the moment, the only boot_args we support are those inserted by firecracker to describe the topology of virtio_mmio devices (such as the disk and network interfaces)

That made me wonder if there's support for vsock devices? I thought there wasn't, but I noticed there's 1 reference to "VSOCK" in this codebase. However, trying to bind w/ a AF_VSOCK family socket does not work. Maybe I'm just missing a vsock device config in the boot arg?

@francescolavra
Copy link
Member

No, vsock devices are not supported yet.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants