@@ -816,6 +816,203 @@ This plugin can limit the number of Instructions Per Second that are executed::
The lower the number the more accurate time will be, but the less efficient the plugin.
Defaults to ips/10
+ Uftrace
+ .......
+
+ ``contrib/plugins/uftrace.c``
+
+ This plugin generates a binary trace compatible with
+ `uftrace <https://github.com/namhyung/uftrace>`_.
+
+ The plugin supports aarch64 and x64, and works in both user and system mode.
+ This makes it possible to trace a full system boot, which is usually not
+ possible.
+
+ In user mode, the memory mapping is copied directly from ``/proc/self/maps`` at
+ the end of execution. Uftrace should be able to retrieve symbols by itself,
+ without any additional step.
+ In system mode, the default memory mapping is empty, and you can generate
+ one (and the associated symbols) using ``contrib/plugins/uftrace_symbols.py``.
+ Symbols must be present in the ELF binaries.
+
+ The plugin tracks the call stack using frame pointer analysis. Thus, your
+ program and its dependencies must be compiled using ``-fno-omit-frame-pointer
+ -mno-omit-leaf-frame-pointer``. In 2024, `Ubuntu and Fedora enabled them by
+ default again on x64
+ <https://www.brendangregg.com/blog/2024-03-17/the-return-of-the-frame-pointers.html>`_.
+ On aarch64, this is less of a problem, as frame pointers are usually part of
+ the ABI, except for leaf functions. That's true for user space applications,
+ but not necessarily for bare metal code. You can read this `section
+ <uftrace_build_system_example>` to easily build a system with frame pointers.
+
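To illustrate what frame pointer analysis means, here is a minimal sketch of
unwinding a call stack from a simulated stack. It is not the plugin's code. It
assumes that each frame record stores the caller's frame pointer at ``[fp]``
and the return address at ``[fp + 8]``, which is the usual layout on aarch64
and x64 when frame pointers are enabled. The ``unwind`` helper, the addresses
and the function names are all made up for the example.

```python
# Minimal sketch of frame-pointer unwinding on a simulated stack.
# Assumption: each frame record holds the caller's frame pointer at [fp]
# and the return address at [fp + 8]. All addresses below are made up.

def unwind(memory, fp, max_depth=64):
    """Return the return addresses found by following frame records."""
    return_addresses = []
    while fp != 0 and len(return_addresses) < max_depth:
        caller_fp = memory[fp]
        return_addresses.append(memory[fp + 8])
        # The stack grows down, so callers live at higher addresses.
        # Stop if the chain does not move upward (corrupted record).
        if caller_fp != 0 and caller_fp <= fp:
            break
        fp = caller_fp
    return return_addresses


# Simulated stack for main -> parse -> read_token (innermost frame first).
memory = {
    0x7000: 0x7020, 0x7008: 0x400200,  # read_token: returns into parse
    0x7020: 0x7040, 0x7028: 0x400100,  # parse: returns into main
    0x7040: 0x0,    0x7048: 0x400010,  # main: outermost frame
}
call_stack = unwind(memory, fp=0x7000)
print([hex(address) for address in call_stack])
```

A function compiled without a frame pointer does not create such a record, so
it is skipped when walking the chain. This is why the compiler flags above
matter.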
+ When tracing long scenarios (> 1 min), the generated trace can become very
+ large, making it hard to extract data from it. In this case, a simple solution
+ is to trace execution while generating a timestamped output log using
+ ``qemu-system-aarch64 ... | ts "%s"``. Then, ``uftrace --time-range=start~end``
+ can be used to restrict the trace to that part of the execution.
+
+ Performance-wise, the overhead compared to normal TCG execution is around 5x
+ to 15x.
+
+ .. list-table:: Uftrace plugin arguments
+   :widths: 20 80
+   :header-rows: 1
+
+   * - Option
+     - Description
+   * - trace-privilege-level=[on|off]
+     - Generate separate traces for each privilege level (Exception Level +
+       Security State on aarch64, Rings on x64).
+
+ .. list-table:: uftrace_symbols.py arguments
+   :widths: 20 80
+   :header-rows: 1
+
+   * - Option
+     - Description
+   * - elf_file [elf_file ...]
+     - Path to an ELF file. Use /path/to/file:0xdeadbeef to add a mapping offset.
+   * - --prefix-symbols
+     - Prepend the binary name to symbols.
+
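The ``/path/to/file:0xdeadbeef`` syntax can be read as a path followed by an
optional offset. The sketch below shows one way to split such an argument. It
is a hypothetical helper written for illustration, not the parsing code used
by ``uftrace_symbols.py``, and it assumes the offset is written in hexadecimal
with a ``0x`` prefix.

```python
# Hypothetical helper: split "path[:0xOFFSET]" into its two parts.
# This is not taken from uftrace_symbols.py; it only illustrates the syntax.

def parse_elf_argument(argument):
    """Return (path, offset); the offset defaults to 0 when absent."""
    path, separator, offset_text = argument.rpartition(":")
    if separator and offset_text.lower().startswith("0x"):
        return path, int(offset_text, 16)
    return argument, 0


for argument in ["u-boot/u-boot:0x60000000", "linux/vmlinux"]:
    path, offset = parse_elf_argument(argument)
    print(f"{path} mapped at offset {offset:#x}")
```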
+ Example user trace
+ ++++++++++++++++++
+
+ As an example, we can trace QEMU itself running git::
+
+   $ ./build/qemu-aarch64 -plugin \
+     build/contrib/plugins/libuftrace.so \
+     ./build/qemu-aarch64 /usr/bin/git --help
+
+   # and generate a chrome trace directly
+   $ uftrace dump --chrome | gzip > ~/qemu_aarch64_git_help.json.gz
+
+ For convenience, you can download this trace: `qemu_aarch64_git_help.json.gz
+ <https://fileserver.linaro.org/s/N8X8fnZ5yGRZLsT/download/qemu_aarch64_git_help.json.gz>`_.
+ Open it on https://ui.perfetto.dev/. You can zoom and pan using the w, a, s, d
+ keys. Some sequences taken from this trace:
+
+ - Loading program and its interpreter
+
+ .. image:: https://fileserver.linaro.org/s/fie8JgX76yyL5cq/preview
+    :height: 200px
+
+ - open syscall
+
+ .. image:: https://fileserver.linaro.org/s/rsXPTeZZPza4PcE/preview
+    :height: 200px
+
+ - TB creation
+
+ .. image:: https://fileserver.linaro.org/s/GXY6NKMw5EeRCew/preview
+    :height: 200px
+
+ It's usually better to use ``uftrace record`` directly. However, tracing
+ binaries through qemu-user can be convenient when you don't want to recompile
+ them (``uftrace record`` requires instrumentation), as long as symbols are
+ present.
+
+ Example system trace
+ ++++++++++++++++++++
+
+ A full example trace (chrome trace, generated with the instructions below) of
+ a system boot can be found `here
+ <https://fileserver.linaro.org/s/WsemLboPEzo24nw/download/aarch64_boot.json.gz>`_.
+ Download it and open it on https://ui.perfetto.dev/. You can see the code
+ executed at all privilege levels, and zoom and pan using the w, a, s, d keys.
+ Below are some sequences taken from this trace:
+
+ - First two stages of the boot sequence in Arm Trusted Firmware (EL3 and S-EL1)
+
+ .. image:: https://fileserver.linaro.org/s/kkxBS552W7nYESX/preview
+    :height: 200px
+
+ - U-Boot initialization (until code relocation, after which we can't track it)
+
+ .. image:: https://fileserver.linaro.org/s/LKTgsXNZFi5GFNC/preview
+    :height: 200px
+
+ - Stat and open syscalls in kernel
+
+ .. image:: https://fileserver.linaro.org/s/dXe4MfraKg2F476/preview
+    :height: 200px
+
+ - Timer interrupt
+
+ .. image:: https://fileserver.linaro.org/s/TM5yobYzJtP7P3C/preview
+    :height: 200px
+
+ - Poweroff sequence (from kernel back to firmware, NS-EL2 to EL3)
+
+ .. image:: https://fileserver.linaro.org/s/oR2PtyGKJrqnfRf/preview
+    :height: 200px
+
+ Build and run system example
+ ++++++++++++++++++++++++++++
+
+ .. _uftrace_build_system_example:
+
+ Building a full system image with frame pointers is not trivial.
+
+ We provide a `simple way <https://github.com/pbo-linaro/qemu-linux-stack>`_ to
+ build an aarch64 system, combining Arm Trusted Firmware, U-Boot, Linux kernel
+ and Debian userland. It's based on containers (``podman`` only) and
+ ``qemu-user-static (binfmt)`` to make sure it's easily reproducible and does
+ not depend on the machine where you build it.
+
+ You can follow the exact same instructions for an x64 system, combining edk2,
+ Linux, and Ubuntu, simply by switching to the
+ `x86_64 <https://github.com/pbo-linaro/qemu-linux-stack/tree/x86_64>`_ branch.
+
+ To build the system::
+
+   # Install dependencies
+   $ sudo apt install -y podman qemu-user-static
+
+   $ git clone https://github.com/pbo-linaro/qemu-linux-stack
+   $ cd qemu-linux-stack
+   $ ./build.sh
+
+   # system can be started using:
+   $ ./run.sh /path/to/qemu-system-aarch64
+
+ To generate a uftrace for a system boot from that::
+
+   # run true and poweroff the system
+   $ env INIT=true ./run.sh path/to/qemu-system-aarch64 \
+     -plugin path/to/contrib/plugins/libuftrace.so,trace-privilege-level=on
+
+   # generate symbols and memory mapping
+   $ path/to/contrib/plugins/uftrace_symbols.py \
+     --prefix-symbols \
+     arm-trusted-firmware/build/qemu/debug/bl1/bl1.elf \
+     arm-trusted-firmware/build/qemu/debug/bl2/bl2.elf \
+     arm-trusted-firmware/build/qemu/debug/bl31/bl31.elf \
+     u-boot/u-boot:0x60000000 \
+     linux/vmlinux
+
+   # inspect trace with
+   $ uftrace replay
+
+ Uftrace can filter the trace, and dump flamegraphs or a chrome trace.
+ The latter is a convenient way to visualize the boot process::
+
+   $ uftrace dump --chrome > boot.json
+   # Open your browser, and load boot.json on https://ui.perfetto.dev/.
+
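A chrome trace can also be inspected programmatically. The following sketch
sums the time spent in each function. It assumes the JSON follows the Chrome
Trace Event layout: a top-level ``traceEvents`` list whose entries carry
``ph`` (``B`` for begin, ``E`` for end), ``name``, ``ts`` and ``tid`` fields.
Check this against your own ``boot.json`` before relying on it. The
``summarize`` helper and the sample data are made up for illustration.

```python
import json
from collections import defaultdict

# Assumption: events use the Chrome Trace Event fields "ph", "name", "ts"
# and "tid". The sample below is hand-written, not real plugin output.

def summarize(events):
    """Return {name: [call_count, inclusive_time]} from B/E event pairs."""
    stacks = defaultdict(list)            # one call stack per thread id
    totals = defaultdict(lambda: [0, 0])
    for event in events:
        stack = stacks[event.get("tid", 0)]
        if event["ph"] == "B":
            stack.append((event["name"], event["ts"]))
        elif event["ph"] == "E" and stack:
            name, begin = stack.pop()
            totals[name][0] += 1
            totals[name][1] += event["ts"] - begin
    return dict(totals)


sample = json.loads("""
{"traceEvents": [
  {"ph": "B", "tid": 1, "ts": 0,  "name": "main"},
  {"ph": "B", "tid": 1, "ts": 2,  "name": "init"},
  {"ph": "E", "tid": 1, "ts": 5,  "name": "init"},
  {"ph": "B", "tid": 1, "ts": 6,  "name": "init"},
  {"ph": "E", "tid": 1, "ts": 7,  "name": "init"},
  {"ph": "E", "tid": 1, "ts": 10, "name": "main"}
]}
""")
summary = summarize(sample["traceEvents"])
for name, (calls, total) in sorted(summary.items(), key=lambda kv: -kv[1][1]):
    print(f"{name}: {calls} call(s), {total} time units")
```

For a real trace, replace the sample with the content of ``boot.json`` loaded
through ``json.load``. Large traces may need a lot of memory with this
approach.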
+ Long chrome traces can't be opened easily, so it can be useful to generate
+ them around a particular point of execution::
+
+   # execute qemu and timestamp output log
+   $ env INIT=true ./run.sh path/to/qemu-system-aarch64 \
+     -plugin path/to/contrib/plugins/libuftrace.so,trace-privilege-level=on |&
+     ts "%s" | tee exec.log
+
+   $ cat exec.log | grep 'Run /init'
+   1753122320 [ 11.834391] Run /init as init process
+   # init was launched at 1753122320
+
+   # generate trace around init execution (2 seconds):
+   $ uftrace dump --chrome --time-range=1753122320~1753122322 > init.json
+
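The manual lookup above can be scripted. This sketch finds the first log line
matching a pattern and builds the matching ``--time-range`` value. It assumes
the log was produced with ``ts "%s"``, so each line starts with the epoch time
in seconds. The ``time_range_for`` helper is hypothetical, and the first
sample log line is made up.

```python
import re

# Assumption: every log line starts with an epoch timestamp in seconds,
# as produced by `ts "%s"`. The first sample line below is invented.

def time_range_for(log_lines, pattern, duration_seconds):
    """Return 'start~end' for the first line matching pattern, or None."""
    for line in log_lines:
        if re.search(pattern, line):
            start = int(line.split()[0])
            return f"{start}~{start + duration_seconds}"
    return None


log_lines = [
    "1753122319 [   11.790000] Freeing unused kernel memory",
    "1753122320 [   11.834391] Run /init as init process",
]
time_range = time_range_for(log_lines, r"Run /init", 2)
print(f"uftrace dump --chrome --time-range={time_range} > init.json")
```

With a real log, pass the lines of ``exec.log`` instead of the sample list.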
Other emulation features
------------------------