Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] Firecracker cannot shut down successfully without API #4176

Closed
3 tasks done
mkroening opened this issue Oct 15, 2023 · 15 comments
Closed
3 tasks done

[Bug] Firecracker cannot shut down successfully without API #4176

mkroening opened this issue Oct 15, 2023 · 15 comments
Assignees

Comments

@mkroening
Copy link

mkroening commented Oct 15, 2023

Describe the bug

When running Firecracker without API, it prints Error: RunWithoutApiError(Shutdown(Ok)) and returns with a non-zero exit code, although the kernel shut down cleanly, which is also indicated by the error message.

To Reproduce

Launch any kernel without API. This has been tested with Hermit. It should work the same with any other kernel.

  1. Create firecracker-config.json:

    {
      "boot-source": {
        "kernel_image_path": "rusty-loader-x86_64-fc",
        "initrd_path": "hello_world",
        "boot_args": ""
      },
      "drives": [],
      "machine-config": {
        "vcpu_count": 1,
        "mem_size_mib": 256,
        "smt": false
      }
    }
    
  2. Launch Firecracker:

    firecracker --no-api --config-file firecracker-config.json --log-path firecracker.log --level Info --show-level --show-log-origin
    
  3. Compare output and logs from 1.4.1 with 1.5.0:

    • 1.4.1 exit code: 0

    • 1.5.0 exit code: 1

    • 1.4.1 output:

      [kernel messages]
      
    • 1.5.0 output:

      2023-10-15T22:41:46.865613581 [anonymous-instance:main] Running Firecracker v1.5.0
      [kernel messages]
      Error: RunWithoutApiError(Shutdown(Ok))
      
    • 1.4.1 log:

      Running Firecracker v1.4.1
      2023-10-15T22:41:27.341970753 [anonymous-instance:main:INFO:src/vmm/src/device_manager/mmio.rs:393] Artificially kick devices.
      2023-10-15T22:41:27.342026016 [anonymous-instance:main:INFO:src/firecracker/src/main.rs:509] Successfully started microvm that was configured from one single json
      2023-10-15T22:41:27.367335451 [anonymous-instance:fc_vcpu 0:INFO:src/vmm/src/vstate/vcpu/mod.rs:497] Received KVM_EXIT_SHUTDOWN signal
      2023-10-15T22:41:27.367358340 [anonymous-instance:main:INFO:src/vmm/src/lib.rs:842] Vmm is stopping.
      2023-10-15T22:41:27.367426447 [anonymous-instance:main:INFO:src/vmm/src/lib.rs:842] Vmm is stopping.
      
    • 1.5.0 log:

      2023-10-15T22:41:46.865746248 [anonymous-instance:main:INFO:src/vmm/src/logger/logging.rs:93] Running Firecracker v0.1.0
      2023-10-15T22:41:46.876801479 [anonymous-instance:main:INFO:src/vmm/src/device_manager/mmio.rs:392] Artificially kick devices.
      2023-10-15T22:41:46.876843674 [anonymous-instance:main:INFO:src/firecracker/src/main.rs:581] Successfully started microvm that was configured from one single json
      2023-10-15T22:41:46.903474283 [anonymous-instance:fc_vcpu 0:INFO:src/vmm/src/vstate/vcpu/mod.rs:470] Received KVM_EXIT_SHUTDOWN signal
      2023-10-15T22:41:46.903502831 [anonymous-instance:main:INFO:src/vmm/src/lib.rs:806] Vmm is stopping.
      2023-10-15T22:41:46.903586030 [anonymous-instance:main:INFO:src/vmm/src/lib.rs:806] Vmm is stopping.
      2023-10-15T22:41:46.909890194 [anonymous-instance:main:ERROR:src/firecracker/src/main.rs:96] RunWithoutApiError error: MicroVMStopped without an error: Ok
      

Expected behaviour

Return a zero exit code.

Additional context

This is a regression from 1.4.1 to 1.5.0 from 820c5ac from #3994.

This code is the culprit, as it can never return Ok(()):

    // Run the EventManager that drives everything in the microVM.
    loop {
        event_manager
            .run()
            .expect("Failed to start the event manager");

        if let Some(exit_code) = vmm.lock().unwrap().shutdown_exit_code() {
            return Err(RunWithoutApiError::Shutdown(exit_code));
        }
    }

A fix should be simple.
I can open a PR if you like. :)

Checks

  • Have you searched the Firecracker Issues database for similar problems?
  • Have you read the existing relevant Firecracker documentation?
  • Are you certain the bug being reported is a Firecracker issue?
@mkroening mkroening changed the title [Bug] firecracker shuts down successfully with non-zero exit code without API [Bug] Firecracker cannot shut down successfully without API Oct 15, 2023
@JonathanWoollett-Light
Copy link
Contributor

JonathanWoollett-Light commented Oct 16, 2023

Minimal verifiable example

Updated

git clone https://github.com/firecracker-microvm/firecracker
cd firecracker
git switch firecracker-v1.5
cargo build --release
echo "{
  \"boot-source\": {
    \"kernel_image_path\": \"rusty-loader-x86_64-fc\",
    \"initrd_path\": \"hermit-rs-template\",
    \"boot_args\": \"\"
  },
  \"drives\": [],
  \"machine-config\": {
    \"vcpu_count\": 1,
    \"mem_size_mib\": 256,
    \"smt\": false
  }
}" > firecracker-config.json
echo "" > firecracker.log
wget https://github.com/hermit-os/loader/releases/download/v0.4.4/rusty-loader-x86_64-fc
wget https://github.com/firecracker-microvm/firecracker/files/12915940/hermit-rs-template.zip
unzip hermit-rs-template.zip
sudo ./build/cargo_target/release/firecracker \
    --no-api \
    --config-file firecracker-config.json \
    --log-path firecracker.log \
    --level Info --show-level \
    --show-log-origin
echo $?
jcawl@u08fff4fc3ef156:~/Projects/firecracker$ sudo ./build/cargo_target/release/firecracker \
    --no-api \
    --config-file firecracker-config.json \
    --log-path firecracker.log \
    --level Info --show-level \
    --show-log-origin
2023-10-16T13:06:02.792042618 [anonymous-instance:main] Running Firecracker v1.5.0
[LOADER][INFO] Loader: [0x100000 - 0x116018]
[LOADER][INFO] Found boot_params at 0x7000
[LOADER][WARN] Mapping 1 4KiB pages from 0x7000..0x8000 to 0x7000..0x8000
[LOADER][INFO] Found Linux kernel boot flag and header magic! Probably booting in firecracker.
[LOADER][INFO] Number of e820-entries: 3
[LOADER][INFO] e820-entry-table at 0x72d0
[LOADER][INFO] Initrd: Address 0xfadc000, Size 0x523cc8
[LOADER][INFO] Intialize PhysAlloc with 0x800000
[LOADER][INFO] Found an ELF module at 0xfadc000
[LOADER][WARN] Mapping 292 4KiB pages from 0xfadc000..0xfc00000 to 0xfadc000..0xfc00000
[LOADER][WARN] Mapping 2 2MiB pages from 0xfc00000..0x10000000 to 0xfc00000..0x10000000
[LOADER][INFO] Parsing kernel from ELF at 0xfadc000..0xffffcc8 (5389512 B)
[LOADER][WARN] Mapping 2 2MiB pages from 0xa00000..0xe00000 to 0xa00000..0xe00000
[LOADER][INFO] Loading kernel to 0xa00000
[LOADER][INFO] TLS is at 0xca8138..0xca81b0 (120 B)
[LOADER][WARN] Mapping 1 4KiB pages from 0x20000..0x21000 to 0x20000..0x21000
[LOADER][INFO] Found command line at 0x20000
[LOADER][INFO] Use kernel stack:  [0x118000 - 0x120000]
[LOADER][WARN] Mapping 8 4KiB pages from 0x118000..0x120000 to 0x118000..0x120000
[LOADER][INFO] Number of e820-entries: 3
[LOADER][INFO] e820-Entry with index 0: Address 0x0, Size 0x9fc00, Type 0x1
[LOADER][INFO] e820-Entry with index 1: Address 0x9fc00, Size 0x400, Type 0x2
[LOADER][INFO] e820-Entry with index 2: Address 0x100000, Size 0xff00000, Type 0x1
[LOADER][INFO] Found available RAM: [0x9fc00 - 0x10000000]
[LOADER][INFO] BootInfo located at 0x115000
[LOADER][INFO] Jumping to HermitCore Application Entry Point at 0xc11b10
[0][INFO] Welcome to Hermit 0.6.5
[0][INFO] Kernel starts at a00000
[0][INFO] BSS starts at 0xdd0db8
[0][INFO] tls_info = Some(
    TlsInfo {
        start: 0xca8138,
        filesz: 0x18,
        memsz: 0x78,
        align: 0x8,
    },
)
[0][INFO] Enable FSGSBASE support
[0][INFO] Total memory size: 256 MB
[0][INFO] Kernel region: [a00000 - e00000]
[0][INFO] A pure Rust application is running on top of Hermit!
[0][INFO] Heap: size 216 MB, start address e00000
[0][INFO] Heap is located at 0xe00000..0xe600000 (0 Bytes unmapped)
[0][INFO] 
[0][INFO] ===================== PHYSICAL MEMORY FREE LIST ======================
[0][INFO] 0x0000000E600000 - 0x00000010000000
[0][INFO] ======================================================================
[0][INFO] 
[0][INFO] 
[0][INFO] ================== KERNEL VIRTUAL MEMORY FREE LIST ===================
[0][INFO] 0x0000000E600000 - 0x00800000000000
[0][INFO] ======================================================================
[0][INFO] 
[0][INFO] 
[0][INFO] ========================== CPU INFORMATION ===========================
[0][INFO] Model:                   AMD EPYC
[0][INFO] Frequency:               2094 MHz (from Measurement)
[0][INFO] SpeedStep Technology:    Not Available
[0][INFO] Features:                MMX SSE SSE2 SSE3 SSSE3 SSE4.1 SSE4.2 AVX AESNI RDRAND FMA MOVBE MCE FXSR XSAVE RDTSCP CLFLUSH TSC-DEADLINE X2APIC HYPERVISOR AVX2 BMI1 BMI2 FSGSBASE RDSEED 
[0][INFO] Physical Address Width:  48 bits
[0][INFO] Linear Address Width:    48 bits
[0][INFO] Supports 1GiB Pages:     Yes
[0][INFO] ======================================================================
[0][INFO] 
[0][INFO] Hermit booted on 1999-11-29 23:59:59.93544 +00:00:00
[0][PANIC] panicked at src/arch/x86_64/kernel/acpi.rs:471:30:
Hermit requires an ACPI-compliant system: ()
[0][INFO] Number of interrupts
[0][INFO] Shutting down system
[0][WARN] ACPI Power Off is not available
Error: RunWithoutApiError(Shutdown(Ok))
jcawl@u08fff4fc3ef156:~/Projects/firecracker$ echo $?
1
jcawl@u08fff4fc3ef156:~/Projects/firecracker$ 

@mkroening Can you please provide links for rusty-loader-x86_64-fc and hello_world so these can be wget.

    "kernel_image_path": "rusty-loader-x86_64-fc",
    "initrd_path": "hello_world",

@mkroening
Copy link
Author

Thanks for looking into this! :)

@JonathanWoollett-Light
Copy link
Contributor

Fixed with

impl From<MainError> for ExitCode {
    fn from(value: MainError) -> Self {
        let exit_code = match value {
            MainError::ParseArguments(_) => FcExitCode::ArgParsing,
            MainError::InvalidLogLevel(_) => FcExitCode::BadConfiguration,
            MainError::RunWithApi(ApiServerError::MicroVMStoppedWithoutError(code)) => code,
            MainError::RunWithApi(ApiServerError::MicroVMStoppedWithError(code)) => code,
+          MainError::RunWithoutApiError(RunWithoutApiError::Shutdown(code)) => code,
            _ => FcExitCode::GenericError,
        };

        ExitCode::from(exit_code as u8)
    }
}

e.g.

jcawl@u08fff4fc3ef156:~/Projects/firecracker$ sudo ./build/cargo_target/release/firecracker \
    --no-api \
    --config-file firecracker-config.json \
    --log-path firecracker.log \
    --level Info --show-level \
    --show-log-origin
2023-10-16T13:14:11.669670984 [anonymous-instance:main] Running Firecracker v1.5.0
[LOADER][INFO] Loader: [0x100000 - 0x116018]
[LOADER][INFO] Found boot_params at 0x7000
[LOADER][WARN] Mapping 1 4KiB pages from 0x7000..0x8000 to 0x7000..0x8000
[LOADER][INFO] Found Linux kernel boot flag and header magic! Probably booting in firecracker.
[LOADER][INFO] Number of e820-entries: 3
[LOADER][INFO] e820-entry-table at 0x72d0
[LOADER][INFO] Initrd: Address 0xfadc000, Size 0x523cc8
[LOADER][INFO] Intialize PhysAlloc with 0x800000
[LOADER][INFO] Found an ELF module at 0xfadc000
[LOADER][WARN] Mapping 292 4KiB pages from 0xfadc000..0xfc00000 to 0xfadc000..0xfc00000
[LOADER][WARN] Mapping 2 2MiB pages from 0xfc00000..0x10000000 to 0xfc00000..0x10000000
[LOADER][INFO] Parsing kernel from ELF at 0xfadc000..0xffffcc8 (5389512 B)
[LOADER][WARN] Mapping 2 2MiB pages from 0xa00000..0xe00000 to 0xa00000..0xe00000
[LOADER][INFO] Loading kernel to 0xa00000
[LOADER][INFO] TLS is at 0xca8138..0xca81b0 (120 B)
[LOADER][WARN] Mapping 1 4KiB pages from 0x20000..0x21000 to 0x20000..0x21000
[LOADER][INFO] Found command line at 0x20000
[LOADER][INFO] Use kernel stack:  [0x118000 - 0x120000]
[LOADER][WARN] Mapping 8 4KiB pages from 0x118000..0x120000 to 0x118000..0x120000
[LOADER][INFO] Number of e820-entries: 3
[LOADER][INFO] e820-Entry with index 0: Address 0x0, Size 0x9fc00, Type 0x1
[LOADER][INFO] e820-Entry with index 1: Address 0x9fc00, Size 0x400, Type 0x2
[LOADER][INFO] e820-Entry with index 2: Address 0x100000, Size 0xff00000, Type 0x1
[LOADER][INFO] Found available RAM: [0x9fc00 - 0x10000000]
[LOADER][INFO] BootInfo located at 0x115000
[LOADER][INFO] Jumping to HermitCore Application Entry Point at 0xc11b10
[0][INFO] Welcome to Hermit 0.6.5
[0][INFO] Kernel starts at a00000
[0][INFO] BSS starts at 0xdd0db8
[0][INFO] tls_info = Some(
    TlsInfo {
        start: 0xca8138,
        filesz: 0x18,
        memsz: 0x78,
        align: 0x8,
    },
)
[0][INFO] Enable FSGSBASE support
[0][INFO] Total memory size: 256 MB
[0][INFO] Kernel region: [a00000 - e00000]
[0][INFO] A pure Rust application is running on top of Hermit!
[0][INFO] Heap: size 216 MB, start address e00000
[0][INFO] Heap is located at 0xe00000..0xe600000 (0 Bytes unmapped)
[0][INFO] 
[0][INFO] ===================== PHYSICAL MEMORY FREE LIST ======================
[0][INFO] 0x0000000E600000 - 0x00000010000000
[0][INFO] ======================================================================
[0][INFO] 
[0][INFO] 
[0][INFO] ================== KERNEL VIRTUAL MEMORY FREE LIST ===================
[0][INFO] 0x0000000E600000 - 0x00800000000000
[0][INFO] ======================================================================
[0][INFO] 
[0][INFO] 
[0][INFO] ========================== CPU INFORMATION ===========================
[0][INFO] Model:                   AMD EPYC
[0][INFO] Frequency:               2095 MHz (from Measurement)
[0][INFO] SpeedStep Technology:    Not Available
[0][INFO] Features:                MMX SSE SSE2 SSE3 SSSE3 SSE4.1 SSE4.2 AVX AESNI RDRAND FMA MOVBE MCE FXSR XSAVE RDTSCP CLFLUSH TSC-DEADLINE X2APIC HYPERVISOR AVX2 BMI1 BMI2 FSGSBASE RDSEED 
[0][INFO] Physical Address Width:  48 bits
[0][INFO] Linear Address Width:    48 bits
[0][INFO] Supports 1GiB Pages:     Yes
[0][INFO] ======================================================================
[0][INFO] 
[0][INFO] Hermit booted on 1999-11-29 23:59:59.936184 +00:00:00
[0][PANIC] panicked at src/arch/x86_64/kernel/acpi.rs:471:30:
Hermit requires an ACPI-compliant system: ()
[0][INFO] Number of interrupts
[0][INFO] Shutting down system
[0][WARN] ACPI Power Off is not available
Error: RunWithoutApiError(Shutdown(Ok))
jcawl@u08fff4fc3ef156:~/Projects/firecracker$ echo $?
0
jcawl@u08fff4fc3ef156:~/Projects/firecracker$ 

@JonathanWoollett-Light
Copy link
Contributor

Posted fixes in #4178 and #4179.

@mkroening
Copy link
Author

Thanks a lot!

The status code looks good, but it still prints Error: RunWithoutApiError(Shutdown(Ok)).

@JonathanWoollett-Light
Copy link
Contributor

JonathanWoollett-Light commented Oct 16, 2023

Yeah this also applies to other cases. I think changing this further can be another discussion, I agree it is non-ideal.

For now we do not recommend pattern matching on logs, they are meant for humans and often change between releases.

What do you think about a solution like:

fn main() -> ExitCode {
    let result = main_exec();
    if let Err(err) = result {
        let exit_code = ExitCode::from(err);
        if exit_code != ExitCode:SUCCESS {
            error!("{err}");
            eprintln!("Error: {err:?}");
        }
        exit_code
    } else {
        ExitCode::SUCCESS
    }
}

This is still quite awkward, it feels more like a work-around than fixing the problem.

@bchalios
Copy link
Contributor

Although it is true we do not suggest that users should check log messages, I agree that printing something like "Error: whatever" can be quite confusing. Is this difficult to fix?

@mkroening
Copy link
Author

What do you think about a solution like:

How about fixing run_without_api to return Ok on success instead?

    // Run the EventManager that drives everything in the microVM.
    loop {
        event_manager
            .run()
            .expect("Failed to start the event manager");

        if let Some(exit_code) = vmm.lock().unwrap().shutdown_exit_code() {
            return match exit_code {
                ExitCode:SUCCESS => Ok(()),
                exit_code => Err(RunWithoutApiError::Shutdown(exit_code))
            };
        }
    }

@JonathanWoollett-Light
Copy link
Contributor

@mkroening I've posted a draft PR that refactors the error propagation similar to your suggestion #4180.

@mkroening
Copy link
Author

Thanks! :)

@JonathanWoollett-Light
Copy link
Contributor

@mkroening PRs have been merged fixing this on main and the v1.5 branch.

@mkroening
Copy link
Author

Awesome, thanks a lot! When do you expect 1.5.1 to be released? For now, we downgraded to 1.4.1. :)

@roypat
Copy link
Contributor

roypat commented Nov 7, 2023

Awesome, thanks a lot! When do you expect 1.5.1 to be released? For now, we downgraded to 1.4.1. :)

Hi @mkroening, we expect to start working on preparing the 1.5.1 release in two weeks

@bchalios
Copy link
Contributor

Hi @mkroening! We have released now 1.5.1 (and recently 1.6.0). Could you validate that these releases solve the problem?

@mkroening
Copy link
Author

Both of those releases work wonderfully (see hermit-os/kernel#1000).

Thanks a lot! :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants