Gdb server #2168 #2333

kumargu
Contributor

kumargu commented Dec 9, 2020

Reason for This PR

Initial implementation of a GDB server for guest debugging (#2168)

Description of Changes

  • Created a new package implementing the gdbstub interface for client-server communication

  • Implemented the following functionality: breakpoints, single-stepping, and reading of registers and memory

  • Implemented virtual-to-physical address translation, used by every guest memory access performed as part of the functionality above

  • Created a new thread for handling client requests, plus two channels for communication between the vCPU thread and the GDB server thread

  • Added a "--debugger" command-line option for the firecracker executable

  • This functionality could also be added to rust-vmm.
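
The thread-and-channels split described in the bullets above can be sketched with `std::sync::mpsc`. This is a minimal, hypothetical model, not the PR's code: the message types (`GdbRequest`, `VcpuResponse`), the fake register and memory contents, and the function names are assumptions. One channel carries requests from the GDB server thread to a paused vCPU thread, and the other carries the replies back.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::thread;

/// Hypothetical requests sent from the GDB server thread to the vCPU thread.
#[derive(Debug)]
pub enum GdbRequest {
    ReadRegisters,
    ReadMemory { addr: usize, len: usize },
    Resume,
}

/// Hypothetical replies sent from the vCPU thread back to the GDB server thread.
#[derive(Debug, PartialEq)]
pub enum VcpuResponse {
    Registers(Vec<u64>),
    /// `None` if the requested range lies outside the fake guest memory.
    Memory(Option<Vec<u8>>),
    Resumed,
}

/// Stand-in for a paused vCPU thread: it serves debugger requests until it is
/// told to resume (a real vCPU would then go back to running the guest).
pub fn vcpu_loop(requests: Receiver<GdbRequest>, responses: Sender<VcpuResponse>) {
    // Fake guest state, for illustration only.
    let regs: Vec<u64> = vec![0x1000_0000, 0x2, 0x3];
    let guest_mem: Vec<u8> = (0u8..=255).collect();

    for req in requests {
        let (reply, resume) = match req {
            GdbRequest::ReadRegisters => (VcpuResponse::Registers(regs.clone()), false),
            GdbRequest::ReadMemory { addr, len } => {
                let bytes = addr
                    .checked_add(len)
                    .and_then(|end| guest_mem.get(addr..end))
                    .map(|s| s.to_vec());
                (VcpuResponse::Memory(bytes), false)
            }
            GdbRequest::Resume => (VcpuResponse::Resumed, true),
        };
        // Stop if the GDB server thread hung up, or after acknowledging a resume.
        if responses.send(reply).is_err() || resume {
            return;
        }
    }
}

/// GDB-server side: send each request on one channel, block on the reply from the other.
pub fn debug_session(requests: Vec<GdbRequest>) -> Vec<VcpuResponse> {
    let (req_tx, req_rx) = channel::<GdbRequest>();
    let (resp_tx, resp_rx) = channel::<VcpuResponse>();
    let vcpu = thread::spawn(move || vcpu_loop(req_rx, resp_tx));

    let mut replies = Vec::new();
    for req in requests {
        if req_tx.send(req).is_err() {
            break; // the vCPU thread has already exited (e.g. after Resume)
        }
        match resp_rx.recv() {
            Ok(reply) => replies.push(reply),
            Err(_) => break,
        }
    }
    drop(req_tx); // closing the request channel also ends vcpu_loop
    vcpu.join().unwrap();
    replies
}
```

Keeping requests and replies on separate channels means neither thread needs a lock around shared vCPU state; the GDB thread simply waits for the vCPU thread to answer.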

License Acceptance

By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache 2.0 license.

PR Checklist

[Author TODO: Meet these criteria.]
[Reviewer TODO: Verify that these criteria are met. Request changes if not]

  • All commits in this PR are signed (git commit -s).
  • The reason for this PR is clearly provided (issue no. or explanation).
  • The description of changes is clear and encompassing.
  • Any required documentation changes (code and docs) are included in this PR.
  • Any newly added unsafe code is properly documented.
  • Any API changes are reflected in firecracker/swagger.yaml.
  • Any user-facing changes are mentioned in CHANGELOG.md.
  • All added/changed functionality is tested.

kumargu changed the title from "Baciumar/gdb server" to "[WIP] Rebase-Baciumar/gdb server" Dec 9, 2020
@gbionescu

Hi @kumargu.

The GDB changes should target the firecracker-microvm:feature/gdb_server branch.

@kumargu
Contributor Author

kumargu commented Dec 9, 2020

Yes, I will point it to the feature branch. I just want to make sure everything builds with the change from #2319, which is not in the feature branch yet.

kumargu force-pushed the baciumar/gdb_server branch 8 times, most recently from fd53103 to ae7f807, December 9, 2020 19:32
kumargu force-pushed the baciumar/gdb_server branch 3 times, most recently from 16a29b0 to 69eba2c, December 29, 2020 12:03
@dianpopa
Contributor

Hi @kumargu

Are you still interested in contributing to this PR? Is there anything we can do to help you get unblocked?

@kumargu
Contributor Author

kumargu commented Feb 10, 2021

Hi,
Sorry, I got a little busy with other things.
I had a sync-up with @gc-plp. As a next step, we will fix the build issues. The build is failing because aarch64 is not supported yet. We will add flags to support only x86 for now and add aarch64 support in a separate PR.
I will raise a PR to fix the build issues and hopefully have a working build by this weekend.
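
As a rough illustration of the x86-only build flags mentioned above, here is a sketch using Rust's standard `#[cfg(target_arch = "...")]` conditional compilation. The module and function names are hypothetical, and the PR may gate things differently: on x86_64 the real module is compiled, and on every other architecture (e.g. aarch64) a stub reports that the debugger is unsupported, so the build no longer fails there.

```rust
/// Hypothetical GDB server module, compiled only on x86_64 for now.
#[cfg(target_arch = "x86_64")]
mod gdb_server {
    pub fn start(port: u16) -> Result<String, String> {
        // The real implementation would spawn the GDB server thread here.
        Ok(format!("GDB server would listen on 0.0.0.0:{}", port))
    }
}

/// Stub used on every other architecture until support is added.
#[cfg(not(target_arch = "x86_64"))]
mod gdb_server {
    pub fn start(_port: u16) -> Result<String, String> {
        Err("the GDB server is only supported on x86_64".to_string())
    }
}

/// Hypothetical entry point the `--debugger` option could call; the cfg
/// attributes above decide which implementation gets compiled in.
pub fn start_debugger(port: u16) -> Result<String, String> {
    gdb_server::start(port)
}
```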

kumargu force-pushed the baciumar/gdb_server branch 9 times, most recently from 42f69f2 to cce931a, February 11, 2021 20:04
@gbionescu

Hi @kumargu. It seems the commits need to be signed. Please sign them and resubmit.

@kumargu
Contributor Author

kumargu commented Feb 17, 2021

Ack.

kumargu changed the base branch from master to feature/gdb_server February 26, 2021 15:00
```rust
};

pub use gdbstub::GdbStubError;
#[allow(unused_imports)]
```

Can we remove #[allow(unused_imports)]?

The point of explicit imports is to make it clear what is used from other crates, but here we're just importing everything.


```rust
extern crate vm_memory;

#[allow(dead_code, unused_imports)]
```

Same for allow(dead_code).

```rust
#[allow(dead_code)]
fn wait_for_tcp(port: u16) -> DynResult<TcpStream> {
    let sockaddr = format!("0.0.0.0:{}", port);
    eprintln!("Waiting for a GDB connection on {:?}...", sockaddr);
```

Instead of using eprintln!, we could send these messages to the logger.

```rust
let mut debugger = GdbStub::new(connection);
match debugger.run(&mut target)? {
    DisconnectReason::Disconnect => {
        println!("Disconnected from GDB.");
```

The println! statements could also be directed through the logger.

```rust
pub fn insert_bp(&mut self, linear_addr: u64, translate: bool) -> Result<(), DebuggerError> {
    // Opcode specific to x86 architecture that triggers a trap when
    // encountered during cpu execution
    let int3: u8 = 0xCC;
```

I'd put this as a constant somewhere.
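
A minimal sketch of that suggestion, assuming guest memory can be modeled as a plain byte slice; `X86_INT3_OPCODE`, `insert_sw_breakpoint` and `remove_sw_breakpoint` are hypothetical names. The opcode becomes a named module-level constant, and inserting a software breakpoint returns the byte it overwrote so it can be restored later.

```rust
/// x86 `INT3` instruction: executing it raises a breakpoint trap (#BP).
pub const X86_INT3_OPCODE: u8 = 0xCC;

/// Writes INT3 at `addr` and returns the byte it replaced,
/// or `None` if `addr` is outside `mem`.
pub fn insert_sw_breakpoint(mem: &mut [u8], addr: usize) -> Option<u8> {
    let slot = mem.get_mut(addr)?;
    let original = *slot;
    *slot = X86_INT3_OPCODE;
    Some(original)
}

/// Restores the byte saved by `insert_sw_breakpoint`; returns `false`
/// if `addr` is outside `mem`.
pub fn remove_sw_breakpoint(mem: &mut [u8], addr: usize, original: u8) -> bool {
    match mem.get_mut(addr) {
        Some(slot) => {
            *slot = original;
            true
        }
        None => false,
    }
}
```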

```rust
    let mut opcode: Option<u8> = None;
    if self.breakpoints_linear.contains_key(&linear_addr) {
        let val = self.breakpoints_linear.get_mut(&linear_addr).unwrap();
        if !val.2 {
```

What do these tuples represent? Can we add a comment explaining the logic?
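
One way to address this comment is to replace the anonymous tuple with a named struct. The field meanings below are assumptions inferred only from the snippet (`val.2` is a bool and `opcode` is an `Option<u8>`); the actual tuple layout is not shown here, so `BreakpointEntry`, `Breakpoints` and their fields are hypothetical. The sketch also uses `HashMap::entry` instead of `contains_key` followed by `get_mut().unwrap()`.

```rust
use std::collections::HashMap;

/// Hypothetical named replacement for the tuple stored in `breakpoints_linear`.
#[derive(Debug, Clone, PartialEq)]
pub struct BreakpointEntry {
    /// Guest-physical address the linear address translated to.
    pub phys_addr: u64,
    /// Original instruction byte overwritten by INT3, once saved.
    pub saved_opcode: Option<u8>,
    /// Whether INT3 is currently written to guest memory.
    pub active: bool,
}

#[derive(Default)]
pub struct Breakpoints {
    by_linear_addr: HashMap<u64, BreakpointEntry>,
}

impl Breakpoints {
    /// Records or re-arms a breakpoint. Returns `true` if INT3 now needs to be
    /// written to guest memory, `false` if the breakpoint was already active.
    pub fn insert(&mut self, linear_addr: u64, phys_addr: u64, original: u8) -> bool {
        let entry = self.by_linear_addr.entry(linear_addr).or_insert(BreakpointEntry {
            phys_addr,
            saved_opcode: None,
            active: false,
        });
        if entry.active {
            return false;
        }
        if entry.saved_opcode.is_none() {
            entry.saved_opcode = Some(original);
        }
        entry.active = true;
        true
    }

    /// Deactivates a breakpoint and returns the byte to write back, if any.
    pub fn disarm(&mut self, linear_addr: u64) -> Option<u8> {
        let entry = self.by_linear_addr.get_mut(&linear_addr)?;
        if !entry.active {
            return None;
        }
        entry.active = false;
        entry.saved_opcode
    }

    pub fn get(&self, linear_addr: u64) -> Option<&BreakpointEntry> {
        self.by_linear_addr.get(&linear_addr)
    }
}
```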

gbionescu mentioned this pull request Mar 5, 2021
alindima added the Status: Author label and removed the Status: Awaiting review label Mar 17, 2021
@kumargu
Copy link
Contributor Author

kumargu commented Mar 22, 2021

Sorry for the delay; I will address the comments this week.

alindima mentioned this pull request May 26, 2021
@LegNeato

Anything I can do to help here? I might have some free time to help push this over the line if you'll point me in the right direction.

@gbionescu

Hi @LegNeato!

We are open to external contributors here as we don't have any work planned for this functionality.

The to-do items here would be:

  1. Making page walking work. To translate virtual addresses to physical ones, the PoC in this PR uses a workaround: it picks up the sections from the ELF file and uses them to do the translation. Ideally, we want to avoid that.

  2. Multi-vCPU support: the PoC only works for single-vCPU microVMs, and we'd like to support multi-vCPU microVMs as well.

  3. ARM support. Debugging ARM microVMs would be useful!

  4. Hardware breakpoint support. The PoC only uses software breakpoints.

  5. Enabling the GDB server code only when a special target is built, e.g. firecracker-gdb.

The items above are not enumerated by priority.
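
For item 4, here is a hedged sketch of the register arithmetic involved. x86 provides four debug address registers (DR0-DR3), and DR7 enables them. For an instruction (execute) breakpoint, the slot's R/W and LEN fields both stay 0b00, so only its enable bit has to be set. The sketch uses the global-enable bits G0-G3 (bits 1, 3, 5, 7). The helper names are hypothetical, and the plumbing that would hand these values to KVM (e.g. through the KVM_SET_GUEST_DEBUG ioctl) is left out.

```rust
/// Number of x86 hardware breakpoint slots (DR0-DR3).
pub const HW_BP_SLOTS: usize = 4;

/// Builds a DR7 value enabling execute breakpoints in the given slots.
/// Returns `None` if a slot index is out of range.
pub fn dr7_for_exec_breakpoints(slots: &[usize]) -> Option<u64> {
    let mut dr7: u64 = 0;
    for &slot in slots {
        if slot >= HW_BP_SLOTS {
            return None;
        }
        // Gn (global enable) for slot n is bit 2n + 1.
        dr7 |= 1u64 << (2 * slot + 1);
        // R/Wn (bits 16+4n..17+4n) and LENn (bits 18+4n..19+4n) stay 0b00,
        // meaning "break on instruction execution".
    }
    Some(dr7)
}

/// Places up to four addresses in DR0-DR3 and returns (DR0..DR3, DR7).
/// Returns `None` if more than four breakpoints are requested.
pub fn program_hw_breakpoints(addrs: &[u64]) -> Option<([u64; HW_BP_SLOTS], u64)> {
    if addrs.len() > HW_BP_SLOTS {
        return None;
    }
    let mut dr = [0u64; HW_BP_SLOTS];
    dr[..addrs.len()].copy_from_slice(addrs);
    let slots: Vec<usize> = (0..addrs.len()).collect();
    Some((dr, dr7_for_exec_breakpoints(&slots)?))
}
```

The hard limit of four slots is the main design constraint: a GDB server would still need software breakpoints as a fallback once all four are in use.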

I'd say that (1), (5) and, optionally, (2) are must-haves before we can start working towards merging the PR.
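
For item (5), one possible approach is a Cargo feature, so that the debugger is compiled only into a dedicated build. This is an assumption, not something decided in this thread. The feature name `gdb` and the function below are hypothetical; `#[cfg(feature = "...")]` and `cargo build --features ...` are the standard Rust/Cargo mechanisms.

```rust
// Hypothetical Cargo.toml entry:
//   [features]
//   gdb = []
// A firecracker-gdb style binary would be built with `cargo build --features gdb`.

/// Compiled only when the hypothetical `gdb` feature is enabled.
#[cfg(feature = "gdb")]
pub fn maybe_start_gdb_server(port: Option<u16>) -> Result<bool, String> {
    // The real server would be spawned on its own thread here.
    Ok(port.is_some())
}

/// Default build: the debugger code is not part of the binary at all.
#[cfg(not(feature = "gdb"))]
pub fn maybe_start_gdb_server(port: Option<u16>) -> Result<bool, String> {
    match port {
        Some(p) => Err(format!("cannot listen on port {}: built without GDB support", p)),
        None => Ok(false),
    }
}
```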

If you're interested in working on any of these items, I propose that we discuss them here or on our Slack server together with @kumargu.

Let us know what you think!

@kumargu
Contributor Author

kumargu commented Aug 10, 2021

I compared the translated addresses produced by the current mechanism (page walks + reading the phdrs) against those produced by multi-level page walks alone.
The translated addresses match for (a) identity-mapped pages and (b) 1G/2M pages, but the translation seems to fail in the other cases (4K pages).

Some logs:

```
Waiting for a GDB connection on "0.0.0.0:8443"...
Debugger connected from 127.0.0.1:59334
level 4 vaddr 10001f0 table-addr 9000 mask ffffffffff000 ent a023 offset 0
level 3 vaddr 10001f0 table-addr a023 mask ffffffffff000 ent b023 offset 0
level 2 vaddr 10001f0 table-addr b023 mask ffffffffff000 ent 10000a3 offset 40
Linear Address is: 16777712. Translated addresses using phdrs headers: 16777712 Translated Address using page walks: 16777712 Page size 2097152
level 4 vaddr ffffffff8100005d table-addr 9000 mask ffffffffff000 ent 0 offset ff8
level 3 vaddr ffffffff8100005d table-addr 0 mask ffffffffff000 ent 0 offset ff0
level 2 vaddr ffffffff8100005d table-addr 0 mask ffffffffff000 ent 0 offset 40
level 1 vaddr ffffffff8100005d table-addr 0 mask ffffffffff000 ent 0 offset 0
Linear Address is: 18446744071578845277. Translated addresses using phdrs headers: 16777309 Translated Address using page walks: 93 Page size 4096
level 4 vaddr ffffffff81ce5951 table-addr 9000 mask ffffffffff000 ent 0 offset ff8
level 3 vaddr ffffffff81ce5951 table-addr 0 mask ffffffffff000 ent 0 offset ff0
level 2 vaddr ffffffff81ce5951 table-addr 0 mask ffffffffff000 ent 0 offset 70
level 1 vaddr ffffffff81ce5951 table-addr 0 mask ffffffffff000 ent 0 offset 728
Linear Address is: 18446744071592368465. Translated addresses using phdrs headers: 30300497 Translated Address using page walks: 2385 Page size 4096
level 4 vaddr ffffffff81ce5966 table-addr 1d08000 mask ffffffffff000 ent 1c0c067 offset ff8
level 3 vaddr ffffffff81ce5966 table-addr 1c0c067 mask ffffffffff000 ent 1c0d063 offset ff0
level 2 vaddr ffffffff81ce5966 table-addr 1c0d063 mask ffffffffff000 ent 1c001e3 offset 70
Linear Address is: 18446744071592368486. Translated addresses using phdrs headers: 30300518 Translated Address using page walks: 30300518 Page size 2097152
level 4 vaddr ffffffff81ce5974 table-addr 1d08000 mask ffffffffff000 ent 1c0c067 offset ff8
level 3 vaddr ffffffff81ce5974 table-addr 1c0c067 mask ffffffffff000 ent 1c0d063 offset ff0
level 2 vaddr ffffffff81ce5974 table-addr 1c0d063 mask ffffffffff000 ent 1c001e3 offset 70
Linear Address is: 18446744071592368500. Translated addresses using phdrs headers: 30300532 Translated Address using page walks: 30300532 Page size 2097152
level 4 vaddr ffffffff81ce5982 table-addr 1d08000 mask ffffffffff000 ent 1c0c067 offset ff8
level 3 vaddr ffffffff81ce5982 table-addr 1c0c067 mask ffffffffff000 ent 1c0d063 offset ff0
level 2 vaddr ffffffff81ce5982 table-addr 1c0d063 mask ffffffffff000 ent 1c001e3 offset 70
Linear Address is: 18446744071592368514. Translated addresses using phdrs headers: 30300546 Translated Address using page walks: 30300546 Page size 2097152
level 4 vaddr ffffffff81ce5990 table-addr 1d08000 mask ffffffffff000 ent 1c0c067 offset ff8
level 3 vaddr ffffffff81ce5990 table-addr 1c0c067 mask ffffffffff000 ent 1c0d063 offset ff0
level 2 vaddr ffffffff81ce5990 table-addr 1c0d063 mask ffffffffff000 ent 1c001e3 offset 70
Linear Address is: 18446744071592368528. Translated addresses using phdrs headers: 30300560 Translated Address using page walks: 30300560 Page size 2097152
level 4 vaddr 3ff table-addr 1d08000 mask ffffffffff000 ent 0 offset 0
level 3 vaddr 3ff table-addr 0 mask ffffffffff000 ent 0 offset 0
level 2 vaddr 3ff table-addr 0 mask ffffffffff000 ent 0 offset 0
level 1 vaddr 3ff table-addr 0 mask ffffffffff000 ent 0 offset 0
```
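
The walk behind these log lines can be sketched as follows. This is a minimal, hypothetical model: guest memory is a sparse `HashMap` from guest-physical address to 64-bit entry, and `walk_4_level` is an illustrative name, not the PR's function. Index extraction follows standard x86-64 4-level paging: 9 index bits per level, bit 7 (PS) marking a 1G or 2M leaf, and the address mask 0x000f_ffff_ffff_f000, which is the `mask ffffffffff000` printed above. Unlike the logged walk, which carries on with `table-addr 0` after reading a zero entry, the sketch stops as soon as an entry's present bit (bit 0) is clear.

```rust
use std::collections::HashMap;

/// Physical-address bits of a page-table entry (bits 12..51).
const PTE_ADDR_MASK: u64 = 0x000f_ffff_ffff_f000;
/// Present bit (bit 0).
const PTE_PRESENT: u64 = 1;
/// Page Size bit (bit 7): set in a level-3 (1G) or level-2 (2M) leaf entry.
const PTE_PAGE_SIZE: u64 = 1 << 7;

#[derive(Debug, PartialEq)]
pub enum WalkError {
    /// The entry at this level has its present bit cleared.
    NotPresent { level: u32 },
    /// A page-table entry could not be read from (the model of) guest memory.
    Unmapped { addr: u64 },
}

/// Translates `vaddr` by walking 4-level page tables rooted at `cr3`.
/// Returns (guest-physical address, page size in bytes).
pub fn walk_4_level(mem: &HashMap<u64, u64>, cr3: u64, vaddr: u64) -> Result<(u64, u64), WalkError> {
    let mut table = cr3 & PTE_ADDR_MASK;
    for level in (1..=4u32).rev() {
        // Level 4 indexes with bits 39..47, level 1 with bits 12..20.
        let shift = 12 + 9 * (level - 1);
        let offset = ((vaddr >> shift) & 0x1ff) * 8;
        let entry_addr = table + offset;
        let entry = *mem.get(&entry_addr).ok_or(WalkError::Unmapped { addr: entry_addr })?;

        if entry & PTE_PRESENT == 0 {
            return Err(WalkError::NotPresent { level });
        }
        // Leaf: 1G page at level 3, 2M page at level 2, or 4K page at level 1.
        if level == 1 || ((level == 2 || level == 3) && entry & PTE_PAGE_SIZE != 0) {
            let page_size = 1u64 << shift;
            let frame = entry & PTE_ADDR_MASK & !(page_size - 1);
            return Ok((frame | (vaddr & (page_size - 1)), page_size));
        }
        table = entry & PTE_ADDR_MASK;
    }
    unreachable!("level 1 always returns")
}
```

Feeding it the entries printed above reproduces the matching cases: 0x10001f0 under table root 0x9000 translates to 16777712 with a 2097152-byte page, and 0xffffffff81ce5966 under root 0x1d08000 translates to 30300518. 0xffffffff8100005d under root 0x9000 stops with `NotPresent { level: 4 }` instead of producing 93.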

Breakpoints:

```
(gdb) target remote 127.0.0.1:8443
Remote debugging using 127.0.0.1:8443
0x0000000000000000 in irq_stack_union ()
(gdb) continue
Continuing.

Program received signal SIGTRAP, Trace/breakpoint trap.
0x0000000001000000 in ?? ()
(gdb) b *0x10001f0
Breakpoint 1 at 0x10001f0
(gdb) next
Cannot find bounds of current function
(gdb) b *0xffffffff8100005d
Breakpoint 2 at 0xffffffff8100005d
(gdb) b boot_cpu_init
Breakpoint 3 at 0xffffffff81ce5951
(gdb) continue
Continuing.

Breakpoint 1, 0x00000000010001f0 in ?? ()
(gdb) continue
Continuing.

Breakpoint 2, 0xffffffff8100005d in secondary_startup_64 ()
(gdb) continue
Continuing.

Breakpoint 3, 0xffffffff81ce5951 in boot_cpu_init ()
(gdb) next
Single stepping until exit from function boot_cpu_init,
which has no line number information.
0xffffffff81cd0b5d in start_kernel ()
```

@gbionescu

Great to see this move forward @kumargu!

@kumargu
Contributor Author

kumargu commented Aug 11, 2021

I did some further analysis 🧐

Translation through page walks fails only for addresses used during the early boot phase, because there is no notion of virtual addresses at that point. To handle this, baciumar@ had implemented some nice logic that looks at the ELF headers and adjusts the addresses, similar to what the Linux kernel does [link].

We should discuss whether we need this ELF-header parsing logic, or whether we should simply report an error when Firecracker GDB users try to access memory regions used in the early boot phase.
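
The ELF-header approach reduces to a per-segment offset: for an address inside a loadable (PT_LOAD) segment, paddr = vaddr - p_vaddr + p_paddr, using the standard ELF program header fields. The sketch below assumes the segments have already been parsed out of the kernel image into the hypothetical `LoadSegment` struct, and the segment values in the example are illustrative. They were chosen so that the results match the "Translated addresses using phdrs headers" values in the logs above: 0xffffffff8100005d maps to 16777309 and 0xffffffff81ce5951 maps to 30300497.

```rust
/// Hypothetical, pre-parsed PT_LOAD program header.
#[derive(Debug, Clone, Copy)]
pub struct LoadSegment {
    pub p_vaddr: u64,
    pub p_paddr: u64,
    pub p_memsz: u64,
}

/// Translates a kernel virtual address using the segment that contains it.
/// Returns `None` if no segment covers `vaddr`; a caller could then fall back
/// to a page walk or report an error.
pub fn translate_via_phdrs(segments: &[LoadSegment], vaddr: u64) -> Option<u64> {
    segments
        .iter()
        .find(|s| vaddr >= s.p_vaddr && vaddr - s.p_vaddr < s.p_memsz)
        .map(|s| vaddr - s.p_vaddr + s.p_paddr)
}
```

The trade-off discussed above is visible here: the lookup needs the guest kernel's ELF file on the host side, whereas a page walk needs only guest memory and CR3.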

@gbionescu

@kumargu so page walking works after the mapping is done?

@kumargu
Contributor Author

kumargu commented Aug 11, 2021

@gc-plp Yes, it works after the mapping is done. A small check was missing; I will update the PR.

@kumargu
Contributor Author

kumargu commented Aug 17, 2021

Update:

  • Address translation through page walks works once paging is enabled, i.e. after the early boot process.
  • With the current implementation, if someone puts a breakpoint within the memory regions used by the early boot process, we will report an error. We will later work on solving this problem, or reuse the logic that reads the ELF headers, if we see that many people need this feature.

Next steps:

  • Add unit tests for the page walking logic.
  • Some code clean-ups.
  • Build a separate target for GDB, essentially isolating it from Firecracker's target build.

@dianpopa
Contributor

We will close this PR for the moment. @kumargu, feel free to reopen it if you have the bandwidth to resume work here.

@Notselwyn

Notselwyn commented Jan 8, 2024

Is there a roadmap / list of code changes required for gdb-server to work on Firecracker?
