Update to modern QEMU! #570

Closed
7 tasks
AndrewFasano opened this issue Mar 20, 2020 · 59 comments

@AndrewFasano
Member

AndrewFasano commented Mar 20, 2020

We've been talking about this for a bit but haven't started work on it. Creating this issue to track progress.

We're currently forked off of QEMU at version 2.9.1. We should update to 4+. At the time of writing, the latest version is 4.1.

We'll likely need to disable MTTCG to avoid significantly changing the record/replay model.

The main tasks I see for now are:

  • Actually do the git merge and handle merge conflicts to create a (likely broken) branch with all the commits
  • Get callbacks to run
  • Capture correct recordings
  • Replay recordings correctly
  • Library mode: build qemu as a library
  • Pypanda testing
  • Extensive testing
@AndrewFasano AndrewFasano added this to the qemu4 milestone Mar 20, 2020
@janbbeck
Contributor

Just to give it my 2 cents:
I recently upgraded qira from qemu 2.x to qemu 4.0 instead of 4.1. At least for that effort, the changes after 4.0 were getting so severe that porting to 4.0 before going to 4.1 made sense.
This may, of course, not be true for PANDA, but it may be easier to port PANDA to some intermediate version of qemu than going to 4.1 all at once.

@hanetzer
Contributor

Judging by the amount of monkeypatching I needed to do to get PANDA as it currently
exists to build on my system, I almost think it would be easier to collect what is different
between PANDA and the QEMU version it's based on and just rewrite it against QEMU 4.

@AndrewFasano
Member Author

@janbbeck Thanks for the tip. 4.0 sounds like it might be a more realistic target.

@hanetzer It's unlikely that any of us have time to do a full rewrite given that there are 2k commits to PANDA since we forked. If we had time, that would certainly be the cleanest way. Also, you shouldn't need to do any monkeypatching. If you've had build issues, can you open an issue? Our CI has been able to build PANDA on clean machines without any problems.

@hanetzer
Contributor

@AndrewFasano I'm running on gentoo ~amd64 (read: bleeding on the edge),
so there's a lot of new gcc diagnostics and such that make the build fail in creative ways.

While I have a human looking at this, I'll open an issue about what I'm trying to
do and what is/isn't/idk working.

@AndrewFasano
Member Author

@hanetzer Ah, okay, that makes more sense then. We just fixed some gcc7-related errors in the last few weeks but that's still probably not new enough for you.

@hanetzer
Contributor

@AndrewFasano yeah. I'm currently working under a deadline and this
is the last tool I can think of to help me do what I want, but once I either
fail the task or succeed I'm more than willing to help put in the work on
the update. Think you could eyeball #577 and toss me some suggestions?

@janbbeck
Contributor

fwiw, panda builds just fine for me with gcc 7, 8, and 9 under ubuntu 19.04, as long as werror is suppressed. no monkeying necessary, using this type of configure:

./configure --target-list=x86_64-softmmu,i386-softmmu,arm-softmmu,ppc-softmmu --prefix=/home/jan/Downloads/panda/panda/scripts/panda/build/install --python=/usr/bin/python2 --disable-vhost-net --extra-cflags=-DXC_WANT_COMPAT_DEVICEMODEL_API --extra-cflags=-DOSI_PROC_EVENTS --extra-cflags=-DOSI_MAX_PROC=256 --extra-cflags=-DOSI_LINUX_PSDEBUG --extra-cflags=-Wformat-truncation=0 --disable-werror --cc=gcc-9 --cxx=g++-9 --host-cc=gcc-9
  • just change to compiler of your choice.

I just rebuilt panda with gcc 7, 8 and 9 and booted each time into a ubuntu live dvd and noticed no issues. HTH

@nathanjackson
Contributor

Bumping versions is hard, but not impossible. About a year ago, I worked on bringing PANDA up to 2.9.1 from an in-development version of QEMU 2.9. That itself was an undertaking, but mostly because I had to figure out what MTTCG changes were made and favor the PANDA version.

I would recommend bumping versions incrementally. I know you really want QEMU 4 (or maybe even now, QEMU 5) but I suspect this won't be easy.

I'd follow this approach, as it's what I did to bring PANDA to 2.9.1.

  1. Pick a version, recommend 2.10.2.
  2. Make a list of commits that need to be merged in.
  3. Merge and bisect over the commits to be merged to see where stuff breaks.
  4. Test, etc.
  5. Rinse and repeat for 2.11.2, ... 5.0, etc.

While this is somewhat tedious there's probably a fair amount of automation you could do and you'll be in a working state at each step of the way and avoid huge merge conflicts.
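
As a rough illustration of the automation mentioned above, the list of merge targets could be derived from the upstream release tags. This is only a sketch: the `v<major>.<minor>.<patch>` tag format, the example tag list, and the helper names `parse_version` and `merge_plan` are assumptions made for illustration, not part of PANDA or QEMU tooling.

```python
import re

def parse_version(tag):
    """Turn a tag such as 'v2.10.2' into (2, 10, 2); return None for anything else (e.g. release candidates)."""
    m = re.fullmatch(r"v(\d+)\.(\d+)\.(\d+)", tag)
    return tuple(int(p) for p in m.groups()) if m else None

def merge_plan(tags, current):
    """Return the last patch release of each (major, minor) series newer than `current`, oldest first."""
    latest = {}
    for tag in tags:
        version = parse_version(tag)
        if version is None or version <= current:
            continue  # skip non-release tags and anything already merged
        series = version[:2]
        if series not in latest or version > latest[series]:
            latest[series] = version
    return ["v%d.%d.%d" % latest[series] for series in sorted(latest)]

if __name__ == "__main__":
    # Illustrative tag list (assumption); in practice it could come from `git tag --list`.
    example_tags = ["v2.9.0", "v2.9.1", "v2.10.0", "v2.10.2",
                    "v2.11.0", "v2.11.2", "v2.12.0-rc1", "v3.0.0"]
    for step in merge_plan(example_tags, current=(2, 9, 1)):
        print(step)
```

Each entry in the resulting list would then be one merge-bisect-test round, so the tree is in a working state before the next version is attempted.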

@janbbeck
Contributor

I would like to point out that one of the major reasons I stopped at 4.0.0 for the qira port is that after that they added a plugin architecture of some sort that greatly changed the source files. It sure appeared tailored for the sort of thing qira and panda do. It looked a lot to me like the correct thing to do from that point forward is re-do qira as a plugin - and that was more than I was willing to bite off :p
I am bringing this up, because maybe - just maybe - the qemu plugin stuff is actually the way panda should be done in newer versions of qemu. If that is the case, I am sure there will be no end of discussion between porting piecemeal or just biting the bullet and making a plugin.

But I think it's worth a look before deciding to go piece by piece to 4.0.0 and beyond.

Thoughts?

@janbbeck
Contributor

@AndrewFasano
Member Author

I've looked at the tcg plugins a little and I'm really excited that qemu is finally starting to support using it for analysis, but I think there's still a long way to go before it could support everything we use/provide with panda. The biggest issue I see for now is that the TCG plugins only support passive observation so you can't modify a system during its execution.

@janbbeck
Contributor

That is good to know! I suspect the code changes in qemu to support the plugins make porting panda painful....

@AndrewFasano
Member Author

I guess most of our plugins do passive analysis, but a lot of my research involves mutating guest state so I'm probably a bit biased when I consider that shortcoming to be a dealbreaker.

@janbbeck
Contributor

janbbeck commented Mar 25, 2020

Is there some particular shortcoming that necessitates a move away from the current qemu base?

@AndrewFasano
Member Author

@mariusmue first brought it up so he might want to chime in. I mainly want support for more machine types and whatever stability/performance fixes they've made.

@mariusmue
Contributor

mariusmue commented Mar 26, 2020

Hi all,
In my opinion, an update to a newer version of QEMU primarily benefits emulation and analysis of non-x86-based systems. As @AndrewFasano mentioned, there are just more machines, as well as architectures, implemented. Furthermore, when it comes to avatar-related changes, being on modern versions of QEMU would make it easier to sync changes between the two frameworks, but this is just a minor point.

MTTCG is luckily guarded well behind preprocessor macros, so it should be easy to keep it deactivated. Furthermore, I think somewhere around QEMU 3.0 there was a tremendous change/cleanup in the QAPI; I personally like the new organization better.

When it comes to tcg-plugins, yes, they are great for passive monitoring of VMState. In case PANDA wants to enforce a strict separation between record and analysis, I would suggest that recording be re-implemented on top of this API (if possible), as this would allow recording on stock builds of QEMU as a drop-in solution. In theory, this should even allow for recordings independent of the QEMU version, but I think various problems may arise in practice (e.g., different peripheral implementations).
In any case, the actual analysis/replay instances of PANDA would still need to hook at various places in the codebase, but distinguishing whether we are in record or replay mode should become an artifact of the past, allowing for a cleaner codebase that is also easier to migrate to upcoming versions of QEMU.

Hence, if these changes are going to happen, I would not plead for a complete rewrite of PANDA, but for identifying and minimizing the locations PANDA actually hooks into QEMU's core logic.

This was referenced Apr 9, 2020
@glueckself

I'm merging Qemu 5.0 rc1 into PANDA as part of my thesis.
I hope that going from rc1 to the release version won't be a big problem.

I jumped right into it (I do question myself if that was the smart way) and I have the conflicts down to cpus.c and softmmu_template.h, and whatever errors the compiler will throw at me when I first build it. There were a few parts that were drastically changed, mostly around the TCG.

I'm waiting on the reply from my supervisor about how and when we can publish the source, but I guess I'll know more next week and by then I think I'll also have the merge commit.

@AndrewFasano
Member Author

Hey @glueckself, that's great to hear! After you finish the merge, I'd expect there to be lots of bugs (unless you're really good at merging) as PANDA usually needs additional updates when core QEMU things change. Once you have a merge commit, I think there are a number of us who would be willing to help track down bugs if you're able to share!

@m000
Contributor

m000 commented Apr 11, 2020

Some thoughts about the (welcome) migration to QEMU 4 codebase.

Since PANDA will continue following the development of QEMU, maybe it would be helpful to label PANDA releases based on the underlying QEMU codebase? E.g. current version would be PANDA2 (based on QEMU 2.x) codebase. Next version would be PANDA4 (based on QEMU 4.x). This would make it a bit easier to discuss issues while both versions are in use.

I can see this issue growing longer and longer. Maybe a separate branch should be created while stabilizing/working out bugs with the QEMU4-based panda? Then issues related to the branch can be reported individually. A new QEMU4 or PANDA4 label for the issue tracker can be used to filter issues quickly.

Finally, it would probably be good to also announce a draft time-plan for the deprecation of the current code-base. This would encourage the community to migrate any incompatible code to the new version.

@glueckself

If my supervisor is ok with me publishing the code on Github, I can fork this repository. The fork would then provide a separate issue tracker. When the migration is completed, you could merge my fork back here.
Please note that I'm migrating PANDA to Qemu 5.

@AndrewFasano
Member Author

@m000 I like your suggestions - then we get to jump from PANDA2 to PANDA4 and skip all the work of PANDA3 ;) I think we should wait until we have the new version at least partly working before we plan any deprecation timelines.

I think we'll take a look at @glueckself's code if/when that's available and then pull it into a branch on this repo, instead of tracking it in a separate repo. QEMU 5 sounds great if you're able to get it to work. Then we can go right up to PANDA5 :)

@glueckself

I've now uploaded the branch containing the merge, however, I'm still fixing compiler errors. I have never merged anything this big before, so I expect there will be some mistakes in there. I would appreciate any feedback. :)
I also haven't looked into testing of Qemu and PANDA yet, I think that'll be the next step after getting it to compile.

Regarding the checklist: my goal is to get record/replay running (and probably only for i386 and arm). After that, I'll probably have to switch over to my thesis (btw, it's only a BSc thesis).

@nathanjackson
Contributor

There will be a public regression testing framework available soon.

I just looked at your branch: 23610 commits ahead, 654 commits behind panda-re:master.... Godspeed.

@glueckself

glueckself commented Apr 17, 2020

I've got qemu-system-i386 to build, however, some of the "fixes" I made are... a bit ugly (especially ef6a13e). It manages to boot PC-MOS/386. And, with a workaround, Linux. But only a test image. Debian hangs on some udev soft lockups.

Issues I've discovered so far:

  • It's very slow. I've tried to look into it, but I couldn't find anything.
  • The main thread has 100% CPU load (might be what causes it to be slow).
  • The CD-ROM device times out. This causes Linux to fail to boot. The workaround is to start qemu with -nodefaults -vga std (I'm not sure if there is a way to remove only the cdrom drive). Haven't looked into it yet.
  • Soft lock ups in the guest.
  • It crashes when starting with -llvm or on begin_record.
  • The qemu tests/qtest/boot-serial-test hangs.

I've marked some places where I don't understand what's going on (or what should be going on) with "//TODO: panda:". I'll try to sort out as many as I can. Also, there are a lot of warnings. I haven't looked into them yet, probably there are bad ones in there.

My next goal is to get everything to compile and clean up the warnings and TODOs.

I have problems with the C/C++ mixing. Qemu started to use some C-specific constructs (e.g. __builtin_types_compatible_p() in include/qemu/atomic.h) that g++ doesn't support. I'm not sure if C linkage can solve that (i.e. if there is an extern "C" missing somewhere) or if that has to be implemented for C++. Would it be possible for someone to support me there?

UPDATE: Now everything builds. However, there is also one more commit of questionable quality. Also, the include dependencies are not set up properly, so make must be run with -j at least twice, relying on a race condition to create plog.pb.h.

@nathanjackson
Contributor

@glueckself I would highly recommend incrementally merging, even within a QEMU version. I would try getting PANDA to the next released version of QEMU first (2.10.X I think). Tracking down these issues will not be easy, certainly not something where someone on the internet can just point you in the right direction.

@glueckself

@nathanjackson do you think that I'm at a dead end here? Because I've got it to build everything by now and my next step would be to clean stuff up. I guess most of the issues are somewhere near a "//TODO: panda:" comment and some are pointed to by compiler warnings that I've ignored for now. I don't think I would resolve most of those conflicts any better if they came one-by-one instead of all-at-once.
To be honest, I would hate to have to throw everything away (however, I do prefer to throw it away instead of looking for bugs for the next two years).

@github-actions

github-actions bot commented Nov 8, 2021

This issue has gone stale! If you believe it is still a problem, please comment on this issue or it will be closed in 30 days

@AndrewFasano
Member Author

Still working on this! I've been exploring re-implementing some of the core PANDA features on top of QEMU 6 in https://github.com/andrewfasano/futurepanda by expanding their plugin interface to add support for various PANDA capabilities. So far I've implemented PPP-style callbacks and access to reading guest registers. You can see example plugins in the panda directory, e.g., syscalls3.

Not sure if that will pan out, and a lot of key PANDA features (record/replay, LLVM, taint, pypanda/panda-rs) are out of scope, at least for now. But maybe there'd be some value in supporting two versions of PANDA concurrently: 1) this version where you get more features, but it's based off older qemu, and 2) a version designed for live-analyses which is up to date with upstream, but lacks some features. Moving plugins out of tree (#1088) might mean that we could support the same plugins across both versions (though that would be a fair amount of work, given how upstream has reimplemented/renamed a bunch of APIs and we'd need to build some shim layers).

@rjzak

rjzak commented Dec 1, 2021

This might resolve #1077, extern C vs. glib templates. Broken in Pop_OS (Ubuntu) 21.04 and Ubuntu 21.10.

@github-actions

github-actions bot commented Jan 31, 2022

This issue has gone stale! If you believe it is still a problem, please comment on this issue or it will be closed in 30 days

[Manual edit: I just disabled the stale issue bot -Andrew]

@rjzak

rjzak commented Jan 31, 2022

Bump.... still an issue even if stale.

@Manouchehri
Contributor

I know this might not be too helpful anymore, but if you're trying to merge in a ton of commits, I would highly recommend using git bisect with a test script (e.g. do a clean full build and run a test target) to see which changes break functionality in PANDA.
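
A minimal sketch of such a test script, written for `git bisect run`, is shown below. `git bisect run` treats exit code 0 as good, 125 as "skip this commit", and other codes from 1 to 127 as bad. Mapping a failed build to "skip" rather than "bad" is a design choice made here, and the function names and the build/test commands passed on the command line are placeholders, not PANDA's actual targets.

```python
import subprocess
import sys

# Exit codes understood by `git bisect run`.
GOOD, BAD, SKIP = 0, 1, 125

def classify(build_rc, test_rc):
    """Map build and test return codes onto a `git bisect run` verdict."""
    if build_rc != 0:
        return SKIP  # the commit does not build, so it cannot be judged either way
    return GOOD if test_rc == 0 else BAD

def run(cmd):
    """Run a shell command and return its exit status."""
    return subprocess.call(cmd, shell=True)

def main(build_cmd, test_cmd):
    """Build, run the test only if the build succeeded, and return the verdict."""
    build_rc = run(build_cmd)
    test_rc = run(test_cmd) if build_rc == 0 else None
    return classify(build_rc, test_rc)

if __name__ == "__main__" and len(sys.argv) == 3:
    # Both arguments are placeholders for the project's real clean build and test commands.
    sys.exit(main(sys.argv[1], sys.argv[2]))
```

Saved under a hypothetical name such as `bisect_check.py`, it could be invoked as `git bisect run python3 bisect_check.py "<build command>" "<test command>"` after marking one known-good and one known-bad commit.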

@fishfacegit

Hi,
I just wanted to ask what the status of the qemu upgrade is?
Since malware has greatly improved at detecting VMs, it would be great to have a modern qemu to counter rdtsc reads on vm exit and similar instructions via Hyper-V Enlightenment.
From panda-re's perspective, as far as I understand, you cannot record what is not executed.
Best Regards

@AndrewFasano
Member Author

AndrewFasano commented Mar 27, 2023

We are actively developing a new version of PANDA atop qemu 8 using TCG plugins. We’re working with upstream to merge a new version of the PPP interface to allow plugins to interact with one another which we see as the first major blocker for PANDA-like functionality. After we hopefully get that merged, we’ll need to upstream patches to allow plugins to access guest registers and memory. Then we can start porting and upstreaming plugins as well. We currently have syscalls, osi_linux, and stringsearch ported to the new interface. And a limited pypanda interface. We haven’t yet tackled the record/replay system or LLVM IR as required by the taint system.

If you’re interested in joining the discussion with the QEMU developers, our latest patch series is at the following link. So far the discussion has only been between us and the maintainer, so other perspectives might be appreciated.

https://www.mail-archive.com/qemu-devel@nongnu.org/msg926042.html

Unfortunately our funding for this work runs out in a few days so we’re not sure what the future of this will be, but the work we've done so far is available here, and if upstream merges the QPP stuff we’ll try getting some of our API changes and plugins upstreamed too

@ghost

ghost commented Mar 30, 2023

@AndrewFasano I want to confirm: is PANDA based on the 2.9.1 stable branch of the QEMU repository?
Having looked at it, the newer QEMU 7.2.0 is very similar to that version; the differences are mostly code additions and deletions, or structural changes between versions.
QEMU's functionality is essentially the same in all versions. From this point of view, PANDA can be updated to the new QEMU version.
The PANDA project is very good, but because of the project's direction and the old QEMU base, its activity is low.
Open PANDA, QEMU 7.2.0, and the original QEMU version PANDA is based on in VS Code. With the search function, for example searching for the 'panda_' keyword, it is very easy to see where these keywords are used and update them.
Alternatively, use a text comparison tool to compare the three trees. Combined with the ability to read QEMU code, updating to QEMU 7.2.0 is not a problem.
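
The keyword search described above could also be scripted instead of done in an editor. The following is a rough sketch only: the function name `find_hooks` is hypothetical, and it assumes that PANDA's own code sits in a top-level `panda` directory that should be excluded, so that only references inside the emulator sources are reported.

```python
import os

def find_hooks(root, keyword="panda_", skip_top=("panda", ".git")):
    """Return {relative path: [line numbers]} for files under `root` that mention `keyword`.

    Directories listed in `skip_top` are ignored only at the top level of `root`.
    """
    hits = {}
    for dirpath, dirnames, filenames in os.walk(root):
        if dirpath == root:
            # Prune in place so os.walk does not descend into the skipped directories.
            dirnames[:] = [d for d in dirnames if d not in skip_top]
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8", errors="ignore") as fh:
                    lines = [n for n, text in enumerate(fh, 1) if keyword in text]
            except OSError:
                continue  # unreadable file; ignore it
            if lines:
                hits[os.path.relpath(path, root)] = lines
    return hits

if __name__ == "__main__":
    # Scan the current directory and print each file with the lines that mention the keyword.
    for path, lines in sorted(find_hooks(".").items()):
        print(path, lines)
```

The output is only a starting list of files to review; it says nothing about how hard each hook is to port.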

As for the QEMU application, it can be compiled into a .so or another type of dynamic link library. A further step would be to optimize the PANDA project structure.

Project optimization + new version of qemu + core panda features = more active commits

@XVilka

XVilka commented Apr 23, 2023

> We are actively developing a new version of PANDA atop qemu 8 using TCG plugins. We’re working with upstream to merge a new version of the PPP interface to allow plugins to interact with one another which we see as the first major blocker for PANDA-like functionality. After we hopefully get that merged, we’ll need to upstream patches to allow plugins to access guest registers and memory. Then we can start porting and upstreaming plugins as well. We currently have syscalls, osi_linux, and stringsearch ported to the new interface. And a limited pypanda interface. We haven’t yet tackled the record/replay system or LLVM IR as required by the taint system.
>
> If you’re interested in joining the discussion with the QEMU developers, our latest patch series is at the following link. So far the discussion has only been between us and the maintainer, so other perspectives might be appreciated.
>
> https://www.mail-archive.com/qemu-devel@nongnu.org/msg926042.html
>
> Unfortunately our funding for this work runs out in a few days so we’re not sure what the future of this will be, but the work we've done so far is available here, and if upstream merges the QPP stuff we’ll try getting some of our API changes and plugins upstreamed too

Seems the patches were largely ignored. I suspect it was because developers were focused on 8.0 release. Now, when the merge window is open again, probably it makes sense to resend them? Or ping at the very least.

EDITED, nevermind, I see there was some initial feedback that remained unaddressed. Then it makes sense to send the second version of patches: https://lore.kernel.org/qemu-devel/20221213213757.4123265-1-fasano@mit.edu/

@XVilka

XVilka commented Aug 22, 2023

@AndrewFasano you might be interested in this patch series, which adds reading registers from TCG plugins in a clean way (partially reviewed and approved): https://lore.kernel.org/qemu-devel/20230912071206.30751-1-akihiko.odaki@daynix.com/T/#t

@AndrewFasano
Member Author

Thanks for pointing that out, I think we could use that API to provide some example plugins to upstream with our PR! Should help a bunch and maybe we'll finally be able to get our initial changes merged :)

@cctv130

cctv130 commented Oct 23, 2023

@AndrewFasano I am dazzled by a large number of branches and intricate project structure, and I have lost confidence in submitting code for this project. Can we optimize this project structure, separate qemu, and keep pure panda?

@AndrewFasano
Member Author

@cctv130 I'd love to refactor PANDA to move things like the plugins into a different repo from the customized emulator code. I think it would be easier to maintain the emulation code if all the plugin code/commits were kept separately. You can see a proposal I wrote in #947 and chime in there if you have any more specific suggestions. Perhaps we'll re-open the issue if there's sufficient interest. But for now, none of the other contributors to the project got on board with the idea so we haven't pursued it yet and the issue went stale. I'm hoping with our ongoing qemu port we'll do a better job keeping plugins independent from the core emulator logic.

In terms of branches, the only one you'd need to look at or think about is the main branch, dev. With the exception of changes we've made to the emulator logic, almost all PANDA code in the repo lives in the panda subdirectory.

@cctv130

cctv130 commented Oct 24, 2023

@AndrewFasano My suggestion is to start a chat room, for example on Telegram, so that we can discuss problems together. I looked through the commit history of the top contributors: most were very active in PANDA's early period, but it seems that PANDA has been abandoned over the past year. The reason for giving up is clear. If we follow the TCG route and TCG has no obstacles, I think we can immediately remove the messy branches and update the project structure. That would give the new PANDA a clear route, and attracting more developers to work on PANDA would not be a problem.

@cctv130

cctv130 commented Oct 25, 2023

@AndrewFasano
I compared PANDA with QEMU 8.1 by deleting the panda folder under the PANDA project directory and then searching for the remaining PANDA-related files. Although the two versions are very different, they have many similarities: no matter how new the QEMU version is, its bottom layer changes little, and even where it changes, the idea behind the implementation stays the same. Others say that the new version has changed a lot, and I agree, but PANDA's bottom layer makes only a few QEMU core calls, so the other QEMU changes do not need to be considered.

@AndrewFasano
Member Author

Superseded by #1383

@AndrewFasano AndrewFasano unpinned this issue Apr 1, 2024