Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

oxenstored patches v2 #2

Open
wants to merge 20 commits into
base: master
Choose a base branch
from
Open

oxenstored patches v2 #2

wants to merge 20 commits into from

Conversation

edwintorok
Copy link
Owner

No description provided.

Signed-off-by: Edwin Török <edvin.torok@citrix.com>
Based on initial work by Christian Lindig

Doing oxenstored development, especially fuzzing/unit tests requires
an incremental and fast build system.

Dune is the preferred upstream build system for OCaml, and has been in
use by the XAPI project for years.
Is is incremental and also generates editor integration files (.merlin).

Usage:
./xs-reconfigure.sh
cd tools/ocaml
make clean
make check

There are some other convenience targets as well:
make dune-clean
make dune-syntax-check
make dune-build-oxenstored

There are some files that are generated by Make, these are created
by a 'dune-pre' target, they are too closely tied to make and
cannot yet be generated by Dune itself.

The various Makefile targets are used as entrypoints into Dune
that set the needed env vars (for C include files and libraries)
and ensure that the generated files are available.

The unit tests do not require Xen to be available, so add mock
eventchn and xenctrl libraries for the unit test to use,
and copy the non-system specific modules from xenstored/ to
xenstored/test/.

Xenstored had to be split into Xenstored and Xenstored_main,
so that we can use the functions defined in Xenstored without
actually starting up the daemon in a unit test.
Similarly argument parsing had to be delayed until after daemon startup.

Also had to disable setrlimit when running as non-root in poll.ml.

Signed-off-by: Edwin Török <edvin.torok@citrix.com>
To run the unit tests these dependencies need to be available.
The developer can either install them themselves using opam,
or we can add them as subdirs here.

Dune will automatically pick the libraries from the system or build
it from the subdir as needed, no changes to the dune files are needed.

The duniverse/ subdir was generated by using the 'opam monorepo' command:
https://github.com/ocamllabs/opam-monorepo

This wrote a lockfile (xen.opam.locked) containing tarball sources
and hashes, and then opam monorepo pull downloaded the sources.

Signed-off-by: Edwin Török <edvin.torok@citrix.com>
This is implemented by C xenstored as live update dump format.
oxenstored already has its own (text-based) dump format, but for
compatibility implement one compatible with C xenstored.

This will also be useful in the future for non-cooperative guest live migration.

docs/designs/xenstore-migration.md documents the format

For now this always dumps integers in big endian order, because even old
versions of OCaml have support for that.
The binary format supports both little and big endian orders, so this
should be compatible.

To dump in little endian or native endian order we would
require OCaml 4.08+.

Signed-off-by: Edwin Török <edvin.torok@citrix.com>
Do not dump -1, it'll trigger an assertion, use 0xFF.. instead.

Signed-off-by: Edwin Török <edvin.torok@citrix.com>
oxenstored already had support for loading a partial dump from a text format.
Add support for the binary format too.
We no longer dump the text format, but we support loading the text format for
backwards compatibility purposes.
(a version of oxenstored supporting live-update with the old text format has been
released as part of the security series)

Signed-off-by: Edwin Török <edvin.torok@citrix.com>
The configuration file can contain typos or various errors that could prevent
live update from succeeding (e.g. a flag only valid on a different version).
Unknown entries in the config file would be ignored on startup normally,
add a strict --config-test that live-update can use to check that the config file
is valid *for the new binary*.

Signed-off-by: Edwin Török <edvin.torok@citrix.com>
Based on ideas from qcstm, implemented using Crowbar.

Quickcheck-style property tests that uses AFL for quickly
exploring various values that trigger bugs in the code.

This is structured/guided fuzzing: we read an arbitrary random number,
and use it to generate some valid looking xenstore trees and commands.

There are 2 instances of xenstored: one that runs the live update
command, and one that ignores it.
Live-update should be a no-op wrt to xenstored state: this is our
quicheck property.

When any mismatch is identified it prints the input
(tree+xenstore commands), and a diff of the output:
the internal xenstore tree state + quotas.

afl-cmin can be used to further minimize the testcase.
Crowbar (AFL persistent mode Quickcheck integration) is used due to
speed: this very easily gets us a multi-core parallelizable test.

Currently the Transaction tests fail, which is why live updates with
active transactions are rejected. These tests are commented out.

There is also some incomplete code here that attempts to find functional
bugs in xenstored by interpeting xenstore commands in a simpler way and
comparing states.

This will build the fuzzer and run it single core for sanity test:
make container-fuzz-sanity-test

This will run it multicore (requires all dependencies installed on the host,
including ocaml-bun, the multi-core AFL runner):
make dune-oxenstored-fuzz

'make check' will also run the fuzzer but with input supplied by OCaml's
random number generator, and for a very small number of iterations
(few thousand). This doesn't require any external tools (no AFL, bun).

On failure it prints a base64 encoding of the fuzzer state that can be
used to reproduce the failure instantly, which is very useful for
debugging: one can iterate on the failed fuzzer state until it is fixed,
and then run the fuzzer again to find next failure.

The unit tests here require OCaml 4.06, but the rest of the codebase
doesn't (yet).

See https://lore.kernel.org/xen-devel/cbb2742191e9c1303fdfd95feef4d829ecf33a0d.camel@citrix.com/
for previous discussion of OCaml version.

Signed-off-by: Edwin Török <edvin.torok@citrix.com>
Also expose these macros in a header file that can be reused by
the upcoming grant table code.

Signed-off-by: Edwin Török <edvin.torok@citrix.com>
OCaml memory allocation functions use words as units,
unless explicitly documented otherwise.
Thus we were allocating more memory than necessary,
caml_alloc should've been called with the parameter '2',
but was called with a lot more.
To account for future changes in the struct keep using sizeof,
but round up and convert to number of words.

For OCaml 1 word = sizeof(value)

The Wsize_bsize macro converts bytes to words.

Signed-off-by: Edwin Török <edvin.torok@citrix.com>
This also handles mmap errors better by using the `uerror` helper
to raise a proper exception using `errno`.

Changed type of `len` from `int` to `size_t`: at construction time we
ensure the length is >= 0, so we can reflect this by using an unsigned
type. The type is unsigned at the C API level, and a negative integer
would just get translated to a very large unsigned number otherwise.

mmap also takes off_t and size_t, so using int64 would be more generic
here, however we only ever use this interface to map rings, so keeping
the `int` sizes is fine.
OCaml itself only uses `ints` for mapping bigarrays, and int64 for just
the offset.

Signed-off-by: Edwin Török <edvin.torok@citrix.com>
These functions can potentially take some time,
so allow other OCaml code to proceed meanwhile (if any).

Signed-off-by: Edwin Török <edvin.torok@citrix.com>
Upstream URL: https://github.com/mirage/ocaml-gnt
Mirage is part of the Xen project and the license is compatible,
copyright headers are retained.

Changes from upstream:
* cut down dependencies: dropped Lwt, replaced Io_page with Xenmmap
* only import Gnttab and not Gntshr

This is for xenstored's use only which needs a way to grant map
the xenstore ring without using xenctrl.

The gnt code is added into libs/mmap because it uses mmap_stubs.h.
Also this makes it possible to mock out gnttab in the unit tests:
replace it with code that just mmaps /dev/zero.
For the mocking to work gnt.ml needs to be in a dir other than xenstored/.

Signed-off-by: Edwin Török <edvin.torok@citrix.com>
Xenmmap.mmap_interface is created from multiple places:
* via mmap(), which needs to be unmap()-ed
* xc_map_foreign_range
* xengnttab_map_grant_ref

Signed-off-by: Edwin Török <edvin.torok@citrix.com>
This is an oxenstored port of the following C xenstored commit:
38eeb38 tools/xenstored: Drop mapping of the ring via foreign map

Now only Xenctrl.domain_getinfo remains as the last use of unstable xenctrl interface
in oxenstored.

Depends on: tools/ocaml: safer Xenmmap interface
(without it the code would build but the wrong unmap function would get
 called on domain destruction)

CC: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Edwin Török <edvin.torok@citrix.com>
This is a port of the following C xenstored commit
122b522 tools/xenstore: don't store domU's mfn of ring page in xenstored

Backwards compat: accept a domain dump both with and without MFN.

CC: Juergen Gross <jgross@suse.com>
Signed-off-by: Edwin Török <edvin.torok@citrix.com>
@edwintorok edwintorok mentioned this pull request May 11, 2021
Xenmmap is only modified by the ring functions,
these functions are unused.

Signed-off-by: Edwin Török <edvin.torok@citrix.com>
end

module IntSet = Set.Make (Int)
module IntMap = Map.Make (Int)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using Int means the tests will only work on 4.08+ instead of 4.06 as stated in the commit

…e tested

All the containers available in containerize on 64-bit systems were 64-bit.
32-bit builds are still semi-supported, so add an entry to use the
32-bit stretch container to test 32-bit builds on 64-bit systems.

Signed-off-by: Edwin Török <edvin.torok@citrix.com>
Fix build using old version of make: it requires LIBS to be available
when referencing it.
edwintorok pushed a commit that referenced this pull request Jun 18, 2021
ASAN reported one issue when Live Updating Xenstored:

=================================================================
==873==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7ffc194f53e0 at pc 0x555c6b323292 bp 0x7ffc194f5340 sp 0x7ffc194f5338
WRITE of size 1 at 0x7ffc194f53e0 thread T0
    #0 0x555c6b323291 in dump_state_node_perms xen/tools/xenstore/xenstored_core.c:2468
    #1 0x555c6b32746e in dump_state_special_node xen/tools/xenstore/xenstored_domain.c:1257
    #2 0x555c6b32a702 in dump_state_special_nodes xen/tools/xenstore/xenstored_domain.c:1273
    #3 0x555c6b32ddb3 in lu_dump_state xen/tools/xenstore/xenstored_control.c:521
    #4 0x555c6b32e380 in do_lu_start xen/tools/xenstore/xenstored_control.c:660
    #5 0x555c6b31b461 in call_delayed xen/tools/xenstore/xenstored_core.c:278
    #6 0x555c6b32275e in main xen/tools/xenstore/xenstored_core.c:2357
    #7 0x7f95eecf3d09 in __libc_start_main ../csu/libc-start.c:308
    #8 0x555c6b3197e9 in _start (/usr/local/sbin/xenstored+0xc7e9)

Address 0x7ffc194f53e0 is located in stack of thread T0 at offset 80 in frame
    #0 0x555c6b32713e in dump_state_special_node xen/tools/xenstore/xenstored_domain.c:1232

  This frame has 2 object(s):
    [32, 40) 'head' (line 1233)
    [64, 80) 'sn' (line 1234) <== Memory access at offset 80 overflows this variable

This is happening because the callers are passing a pointer to a variable
allocated on the stack. However, the field perms is a dynamic array, so
Xenstored will end up to read outside of the variable.

Rework the code so the permissions are written one by one in the fd.

Fixes: ed6eebf ("tools/xenstore: dump the xenstore state for live update")
Signed-off-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Luca Fancellu <luca.fancellu@arm.com>
edwintorok pushed a commit that referenced this pull request Apr 28, 2022
…ning NULL

If we are in libxl_list_vcpu() and we are returning NULL, let's avoid
touching the output parameter *nr_vcpus_out, which the caller should
have initialized to 0.

The current behavior could be problematic if are creating a domain and,
in the meantime, an existing one is destroyed when we have already done
some steps of the loop. At which point, we'd return a NULL list of vcpus
but with something different than 0 as the number of vcpus in that list.
And this can cause troubles in the callers (e.g., nr_vcpus_on_nodes()),
when they do a libxl_vcpuinfo_list_free().

Crashes due to this are rare and difficult to reproduce, but have been
observed, with stack traces looking like this one:

#0  libxl_bitmap_dispose (map=map@entry=0x50) at libxl_utils.c:626
#1  0x00007fe72c993a32 in libxl_vcpuinfo_dispose (p=p@entry=0x38) at _libxl_types.c:692
#2  0x00007fe72c94e3c4 in libxl_vcpuinfo_list_free (list=0x0, nr=<optimized out>) at libxl_utils.c:1059
#3  0x00007fe72c9528bf in nr_vcpus_on_nodes (vcpus_on_node=0x7fe71000eb60, suitable_cpumap=0x7fe721df0d38, tinfo_elements=48, tinfo=0x7fe7101b3900, gc=0x7fe7101bbfa0) at libxl_numa.c:258
#4  libxl__get_numa_candidate (gc=gc@entry=0x7fe7100033a0, min_free_memkb=4233216, min_cpus=4, min_nodes=min_nodes@entry=0, max_nodes=max_nodes@entry=0, suitable_cpumap=suitable_cpumap@entry=0x7fe721df0d38, numa_cmpf=0x7fe72c940110 <numa_cmpf>, cndt_out=0x7fe721df0cf0, cndt_found=0x7fe721df0cb4) at libxl_numa.c:394
#5  0x00007fe72c94152b in numa_place_domain (d_config=0x7fe721df11b0, domid=975, gc=0x7fe7100033a0) at libxl_dom.c:209
#6  libxl__build_pre (gc=gc@entry=0x7fe7100033a0, domid=domid@entry=975, d_config=d_config@entry=0x7fe721df11b0, state=state@entry=0x7fe710077700) at libxl_dom.c:436
#7  0x00007fe72c92c4a5 in libxl__domain_build (gc=0x7fe7100033a0, d_config=d_config@entry=0x7fe721df11b0, domid=975, state=0x7fe710077700) at libxl_create.c:444
#8  0x00007fe72c92de8b in domcreate_bootloader_done (egc=0x7fe721df0f60, bl=0x7fe7100778c0, rc=<optimized out>) at libxl_create.c:1222
#9  0x00007fe72c980425 in libxl__bootloader_run (egc=egc@entry=0x7fe721df0f60, bl=bl@entry=0x7fe7100778c0) at libxl_bootloader.c:403
#10 0x00007fe72c92f281 in initiate_domain_create (egc=egc@entry=0x7fe721df0f60, dcs=dcs@entry=0x7fe7100771b0) at libxl_create.c:1159
#11 0x00007fe72c92f456 in do_domain_create (ctx=ctx@entry=0x7fe71001c840, d_config=d_config@entry=0x7fe721df11b0, domid=domid@entry=0x7fe721df10a8, restore_fd=restore_fd@entry=-1, send_back_fd=send_back_fd@entry=-1, params=params@entry=0x0, ao_how=0x0, aop_console_how=0x7fe721df10f0) at libxl_create.c:1856
#12 0x00007fe72c92f776 in libxl_domain_create_new (ctx=0x7fe71001c840, d_config=d_config@entry=0x7fe721df11b0, domid=domid@entry=0x7fe721df10a8, ao_how=ao_how@entry=0x0, aop_console_how=aop_console_how@entry=0x7fe721df10f0) at libxl_create.c:2075

Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
Tested-by: James Fehlig <jfehlig@suse.com>
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
edwintorok pushed a commit that referenced this pull request Apr 28, 2022
At the moment, Xen does not decode any of the arm64 instructions. This
means that when hsr_dabt.isv == 0, Xen cannot handle those instructions.
This will lead to Xen to abort the guests (from which those instructions
originate).

With this patch, Xen is able to decode ldr/str post indexing instructions.
These are a subset of instructions for which hsr_dabt.isv == 0.

The following instructions are now supported by Xen :-
1.      ldr     x2,    [x1],    #8
2.      ldr     w2,    [x1],    #-4
3.      ldr     x2,    [x1],    #-8
4.      ldr     w2,    [x1],    #4
5.      ldrh    w2,    [x1],    #2
6.      ldrb    w2,    [x1],    #1
7.      str     x2,    [x1],    #8
8.      str     w2,    [x1],    #-4
9.      strh    w2,    [x1],    #2
10.     strb    w2,    [x1],    #1

In the subsequent patch, decode_arm64() will get invoked when
hsr_dabt.isv == 0.

Signed-off-by: Ayan Kumar Halder <ayankuma@xilinx.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
edwintorok pushed a commit that referenced this pull request Jan 12, 2023
There was a question raised recently about the requirements for
checking in a patch which was originally written by one maintainer,
then picked up and modified by a second maintainer, and which they now both
agree should be checked in.

It was proposed that in that case, the following set of tags would suffice:

   Signed-off-by: First Author <...>
   Signed-off-by: Second Author <...>
   Reviewed-by: First Author <...>

The rationale was as follows:

1. The patch will be a mix of code, whose copyright is owned by the
various authors (or the companies they work for).  It's important to
keep this information around in the event, for instance, of a license
change or something else requiring knowledge of the copyright owner.

2. The Signed-off-by of the Second Author approves not only their own
code, but First Author's code; the Reviewed-by of the First Author
approves not only their own code, but the Second Author's code.  Thus
all the code has been approved by a maintainer, as well as someone who
was not the author.

In support of this, several arguments were put forward:

* We shouldn't make it harder for maintainers to get their code in
  than for non-maintainers

* The system we set up should not add pointless bureaucracy; nor
  discourage collaboration; nor encourage contributors to get around
  the rules by dropping important information.  (For instance, by
  removing the first SoB, so that the patch appears to have been
  written entirely by Second Author.)

Concerns were raised about two maintainers from the same company
colluding to get a patch in from their company; but such maintainers
could already collude, by working on the patch in secret, and posting
it publicly with only a single author's SoB, and having the other
person review it.

There's also something slightly strange about adding "Reviewed-by" to
code that you've written; but in the end you're reviewing not only the
code itself, but the final arrangement of it.  There's no need to
overcomplicate things.

Encode this in MAINTAINERS as follows:

* Refine the wording of requirement #2 in the check-in policy; such
that *each change* must have approval from someone other than *the
person who wrote it*.

* Add a paragraph explicitly stating that the multiple-SoB-approval
  system satisfies the requirements, and why.

Signed-off-by: George Dunlap <george.dunlap@cloud.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Julien Grall <jgrall@amazon.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants