x86 config #76

mmcloughlin · 2023-12-09T22:18:33Z

In #75 (comment) you suggested that you've had some success using ISLA with the x86 model.

I have used isla-sail to get x86.ir from sail-x86-from-acl2.

Do you have a config/x86.toml somewhere? Or can you advise on how I might create one?

Thanks!

The text was updated successfully, but these errors were encountered:

mmcloughlin · 2023-12-09T22:51:53Z

I've been hacking around but it's clearly not doing the right thing yet.

My x86.toml is as follows. A bunch of this is copied from plain.toml and almost certainly wrong, like ifetch.

pc = "rip"

ifetch = "Read_ifetch"

read_exclusives = []
write_exclusives = []

[[toolchain]]
name = "default"
assembler = "as"
objdump = "objdump"
nm = "nm"
linker = "ld"

[mmu]
page_table_base = "0x300000"
page_size = "4096"
s2_page_table_base = "0x300000"
s2_page_size = "4096"

# This section contains the base address for loading the code for each
# thread in a litmus test, and the stride which is the distance
# between each thread in bytes. The overall range for thread memory is
# the half-open range [base,top)"
[threads]
base = "0x400000"
top = "0x500000"
stride = "0x10000"

[symbolic_addrs]
base = "0x600000"
top = "0x600000"
stride = "0x10"

[registers]
ignore = []

[registers.defaults]

# A map from register names in the litmus to Sail register specifiers
# (roughtly corresponding to l-expressions in Sail, i.e. subscripting
# R[n] and field accesses R.field are allowed.)
[registers.renames]

[reads]

[writes]

[cache_ops]

I can run:

./isla/target/debug/isla-footprint --arch sail-x86-from-acl2/model/x86.ir --config isla/configs/x86.toml -i 'and $0x1,%al' --function main -s

It appears to start doing something. But I'm guessing at a minimum I need a version of isla_main.sail for the x86 model.

Any help appreciated!

Alasdair · 2023-12-09T23:18:56Z

Yes, the isla-footprint executable tries to run the Sail function isla_footprint(opcode), which probably doesn't exist. I'll have to look and see if I have a working config anywhere.

mmcloughlin · 2023-12-10T01:54:54Z

I'll have to look and see if I have a working config anywhere.

Thanks! That would be much appreciated :)

I wanted to encapsulate the build steps and required versions for generating x86.ir, so if it's useful to anybody:

#!/usr/bin/env bash

set -euxo pipefail

OCAML_COMPILER_VERSION="4.10.0"
ISLA_VERSION="4134411c0f463807edd788f67b3db34c1899d9b6"
SAIL_VERSION="eb8af69724828181bf0e91f1728399fe8a81e6f0"
SAIL_X86_VERSION="bfc016d354d902bf78f1e2e4a28cc82dd6d5cbfe"

# Working directory.
workdir=$(mktemp -d)
cd "${workdir}"

outputs="${workdir}/outputs"
mkdir -p "${outputs}"

# Download dependencies.
function github_download() {
    local org="$1"
    local repo="$2"
    local version="$3"

    archive="https://github.com/${org}/${repo}/archive/${version}.tar.gz"
    mkdir -p "${repo}"
    wget -O- "${archive}" | tar xzf - --strip-components 1 -C "${repo}"
}

github_download "rems-project" "isla" "${ISLA_VERSION}"
github_download "rems-project" "sail" "${SAIL_VERSION}"
github_download "rems-project" "sail-x86-from-acl2" "${SAIL_X86_VERSION}"

# Setup opam.
export OPAMROOT="${workdir}/opam"
mkdir -p "${OPAMROOT}"
opam init --yes --compiler="${OCAML_COMPILER_VERSION}"
eval $(opam env)

opam repo add rems 'https://github.com/rems-project/opam-repository.git'

# Pin and install sail.
opam pin --yes add sail

# Build isla-sail.
ISLA_SAIL_PATH=$(realpath isla/isla-sail)
make -C "${ISLA_SAIL_PATH}" all
export PATH="${ISLA_SAIL_PATH}:${PATH}"

# Build x86 IR.
make -C sail-x86-from-acl2/model x86.ir
cp sail-x86-from-acl2/model/x86.ir "${outputs}/"

# Show outputs.
tree "${outputs}"

bacam · 2023-12-11T10:47:32Z

I've got one that I used for test generation, but it's on my home computer so it'll be this evening before I can look it out. I don't think there's a nice entry point for isla-footprint though, because it expects to call a single decode-and-execute function with the opcode, whereas the model interleaves fetch and decode. If you want to see the top-level structure of the model for yourself, model/main.sail is the place to start.

bacam · 2023-12-11T23:01:27Z

I pushed the one I had here, but it's really barebones.

mmcloughlin · 2023-12-13T17:43:49Z

I pushed the one I had here, but it's really barebones.

Thanks for sharing. As you say the big remaining question is how to define the isla footprint entrypoint.

I don't think there's a nice entry point for isla-footprint though, because it expects to call a single decode-and-execute function with the opcode, whereas the model interleaves fetch and decode. If you want to see the top-level structure of the model for yourself, model/main.sail is the place to start.

I am still getting up to speed with this project and trying to see if it will suit our use case, where x86 support is a critical. Are you able to give me a sense of how much of a problem the interleaved fetch and decode will be for using ISLA footprint? In particular, would that be a blocker for using the x86 model with islaris? Have you used x86 with islaris?

Thanks for the quick responses to all my issues!

Alasdair · 2023-12-13T18:03:22Z

I think a footprint function that just writes the instruction to memory before executing it would work fine in practice. The fetch and execute parts of the trace are delimited by an instruction announce event, so it might just be a case of having to filter out anything prior to the instruction announce.

mmcloughlin · 2023-12-18T17:08:02Z

I've been looking into this a bit and just wanted to check I was on the right track.

My first attempt (please don't laugh) was:

val isla_footprint : bits(32) -> unit

function isla_footprint(opcode) = {
	let ip : sbits(48) = 0x000000400000;
	write_rip(ip);
	wml32(ip, opcode);
	x86_fetch_decode_execute();
}

This fails with errors like:

...
[1  ]: Error Type("subrange_internal (cannot extract) B129 { tag: false, bits: 292, len: 16 } 23 16", SourceLoc { file: 3, line1: 704, char1: 25, line2: 704, char2: 66 })
[1  ]:   zwb @ 104
[1  ]:   zwml32 @ 56
[1  ]:   zisla_footprint @ 5
[0  ]: Error Type("subrange_internal (cannot extract) B129 { tag: false, bits: 292, len: 16 } 23 16", SourceLoc { file: 3, line1: 704, char1: 25, line2: 704, char2: 66 })
[0  ]:   zwb @ 104
[0  ]:   zwml32 @ 56
[0  ]:   zisla_footprint @ 5
...

It's not surprising to me that this didn't work. I imagine there's a bunch of memory setup required that I don't have here. I'm also not sure how the fetch step handles variable length instructions, and therefore if I need to write nop instruction bytes past the end of the 32-bit opcode here. Is there a version of this approach that would work?

I'm now trying to make this work with an approach borrowed from isla-testgen. Specifically I am part way through writing Rust code that uses ISLA libraries to:

Initialize: call initialise_64_bit_mode
Setup registers: set symbolic values for all general-purpose registers
Set initial rip value and configure memory
Setup opcode: constrain the rip read to have a fixed value
Execute: call x86_fetch_decode_execute

I don't have this finished but I'm feeling more comfortable with the APIs now, so I think I can see it through. But I wanted to make sure I'm not barking up the wrong tree.

Please let me know how you would approach this. Thanks!

mmcloughlin · 2023-12-20T16:29:10Z

Another attempt I've made at this was to setup a concrete memory range starting at INIT_PC and assign rip to a Val::Bits(B::from_u64(INIT_PC)). However, this fails very quickly. The first memory read by get_prefixes (called from x86_fetch_decode_execute) succeeds with a concrete byte, but the next memory read is symbolic and errors with Possible symbolic address overlap.

[log]: Memory range: [0x401000, 0x411000) concrete
[log]: Read: Enum(EnumMember { enum_id: EnumId { id: Name { id: 62 } }, member: 0 }) Bits(B64 { len: 64, bits: 4198400 }) I128(1) false
[log]: Read concrete: [144]
[log]: Read: Enum(EnumMember { enum_id: EnumId { id: Name { id: 62 } }, member: 0 }) Symbolic(Sym { id: 88 }) I128(1) false
[log]: Overlapping satisfiable address: Some(Bits64(B64 { len: 64, bits: 4259840 }))
[log]: Symbol 88 taints: ["Name { id: 517 }"]
[0  ]: Error BadRead("Possible symbolic address overlap")
[0  ]:   zmemi @ 16
[0  ]:   zrm_low_32 @ 14
[0  ]:   zrm_low_64 @ 5
[0  ]:   zia32e_la_to_pa_pml4_table @ 61
[0  ]:   zia32e_la_to_pa @ 128
[0  ]:   zrb @ 58
[0  ]:   zload_bytes_from_ea @ 58
[0  ]:   zrme08 @ 5
[0  ]:   zget_prefixes @ 26
[0  ]:   zx86_fetch_decode_execute @ 25
error: Bad read Possible symbolic address overlap
Model: k!35 -> #x0000000000000000
k!30 -> false
k!18 -> false
k!17 -> false

Any pointers appreciated. Thanks.

mmcloughlin · 2023-12-24T22:06:28Z

I have something that produces a trace, though I'm still not sure it's the optimal approach. I'm using a limited version of the method in isla-testgen, specifically setting up memory backed by a SMT byte array, constraining rip to a fixed address, and then constraining the read at that address to equal the desired opcode.

Beyond that, the following command-line was very helpful:

https://github.com/rems-project/isla-testgen/blob/a9759247bfdff9c9c39d95a4cfd85318e5bf50fe/c86-command#L1

Specifically it was critical to default the following registers:

[registers.defaults]
app_view = true
ms_reg = false
fault_reg = false
log_register_writes = false

In addition, it's also necessary to linearize some of the helper functions bool_to_bits, bits_to_bool, ... as listed in the c86 command-line above.

Even with simplification the trace for a simple register add ends up being quite bloated. Regardless, I'm hopeful I can get what I need from it. I'm glad to have got something apparently working, but I suspect there might be a cleaner way so I'm still interested to get your input.

Happy Christmas!

bacam · 2024-01-08T15:50:02Z

I think the reason that you're getting bloated traces is that the test generator doesn't attempt to simplify them at all, whereas footprint has several simplification options that help a lot. It probably wouldn't be too hard to add them.

Another option would be to add a mode to isla-footprint where it sets up a small concrete region of memory with the opcode in it (or a symbolic code region, which is what the test generator does to allow the PC to vary).

The definition of isla_footprint you were trying above has two problems: first, the opcodes on x86 aren't a fixed size (it appears to be trying to treat a 16-bit opcode as a 32-bit one in the error you included), and second that Isla won't use the written values by default in a subsequent read because there may be concurrent writes from another processor, or weird cache behaviour for self-modifying code.

mmcloughlin · 2024-01-08T16:15:38Z

I think the reason that you're getting bloated traces is that the test generator doesn't attempt to simplify them at all, whereas footprint has several simplification options that help a lot. It probably wouldn't be too hard to add them.

I'm running the following functions on the traces:

                simplify::remove_extra_register_fields(&mut events);
                simplify::remove_repeated_register_reads(&mut events);
                simplify::remove_unused_register_assumptions(&mut events);
                simplify::remove_unused(&mut events);
                simplify::propagate_forwards_used_once(&mut events);
                simplify::commute_extract(&mut events);
                simplify::eval(&mut events);

Would you recommend any others? Or are you referring to something else?

Another option would be to add a mode to isla-footprint where it sets up a small concrete region of memory with the opcode in it (or a symbolic code region, which is what the test generator does to allow the PC to vary).

I did try setting up a concrete region, but I got an error since the x86 decoder reads the opcode in multiple reads and the subsequent ones are symbolic. See my note above #76 (comment).

The definition of isla_footprint you were trying above has two problems: first, the opcodes on x86 aren't a fixed size (it appears to be trying to treat a 16-bit opcode as a 32-bit one in the error you included), and second that Isla won't use the written values by default in a subsequent read because there may be concurrent writes from another processor, or weird cache behaviour for self-modifying code.

Right, yes I was aware of the variable-length x86 opcodes but I had hoped that just for a quick check I would be able to test with an opcode of at most 32 bits. I think your second problem there is the bigger issue. My code at the moment uses set_client_info in a similar way to the test generator.

Alasdair · 2024-01-08T16:28:37Z

Do you have a link to any code that can reproduce the issue in that comment? I'm back from Christmas vacation, so I could look into it.

[log]: Symbol 88 taints: ["Name { id: 517 }"]

Means there is some symbolic thing that is affecting the address. I should probably make it so it prints the actual name rather than the id, but you can do isla-footprint -A <arch_file> --debug-id 517 ... and it should print the offending name. The other thing is the symbolic address range check is maybe being too conservative, like if you have a symbolic range X .. Y and a concrete range Z .. W where X < Z and W < Y it might complain.

bacam · 2024-01-08T17:37:43Z

Would you recommend any others? Or are you referring to something else?

No, you've already got them all.

Right, yes I was aware of the variable-length x86 opcodes but I had hoped that just for a quick check I would be able to test with an opcode of at most 32 bits. I think your second problem there is the bigger issue. My code at the moment uses set_client_info in a similar way to the test generator.

Padding the opcode up to 32 bits should get rid of that initial error and there is an upper bound to x86 instructions, so it could work in general. For the memory callbacks you might need to fiddle with the SmtKinds a bit - it's set up by default to avoid writing to code regions. You might be able to make all reads SmtKind::ReadData to allow that.

mmcloughlin · 2024-01-08T22:04:35Z

Do you have a link to any code that can reproduce the issue in that comment?

Sorry I can't easily recover that version of the code. I was hacking and slashing a lot to get something to work for x86. The extremely messy code I have working right now is:

https://gist.github.com/mmcloughlin/943d338e0126cdbb62f3d205f2b1c289

The config and example trace output are included in this gist too. The architecture IR file was derived from the script in #76 (comment).

(Note I have hard-coded the opcode in this version of the code. In practice the opcode is coming from another Rust library I'm working with, but I removed that to make it simpler for you to work with the code if you'd like to.)

If you have any suggestions for how to simplify or improve that code I'd be very interested.

In particular, it would also be nice to have an approach that works for all architectures. Currently I have two different Rust programs for AArch64 and x86. So a unified approach might be either:

Try to find a version of isla_footprint SAIL function that would work for x86
Adopt the same approach for AArch64 of writing the opcode to memory and setting the program counter (testgen code suggests this works)

Thanks!

Alasdair · 2024-01-09T18:17:33Z

I think I understand what the issue is now. There are a few ways I could go about addressing it so I'll have to think about which one is best.

Alasdair · 2024-01-10T16:23:44Z

I am wondering if n64_bit_modep is returning a symbolic value. Maybe some registers need to be set so it knows we are in 64-bit mode.

bacam · 2024-01-10T16:41:56Z

That should be handled by running initialise_64_bit_mode at the start, which the code in the gist appears to do.

Alasdair · 2024-01-10T17:06:18Z

I did add a commit that might help today. I think the issue is that initialise_64_bit_mode only sets the lower 12 bits of the register, so the overall value remains symbolic. From the trace:

(read-reg |msrs| nil (_ vec v0 v1 v2 v3 v4 v5 v6))
(define-const v9 (bvor (bvand v0 #xfffffffffffff000) ((_ zero_extend 52) (bvor (bvand ((_ extract 11 0) v0) #xbff) #x400)))) ; 0:0 - 0:0
(write-reg |msrs| nil (_ vec v9 v1 v2 v3 v4 v5 v6))

Then when the add_to_iptr attempts to increment the program counter it uses this register to determine part of the address and it ends up as a symbolic value.

With the commit I added today, if we have a symbolic address with only has a single satisfiable value it will treat it as concrete which might solve this issue, assuming add_to_iptr only relies on the low 12 bits.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

x86 config #76

x86 config #76

mmcloughlin commented Dec 9, 2023

mmcloughlin commented Dec 9, 2023

Alasdair commented Dec 9, 2023

mmcloughlin commented Dec 10, 2023

bacam commented Dec 11, 2023

bacam commented Dec 11, 2023

mmcloughlin commented Dec 13, 2023

Alasdair commented Dec 13, 2023

mmcloughlin commented Dec 18, 2023 •

edited

Loading

mmcloughlin commented Dec 20, 2023

mmcloughlin commented Dec 24, 2023

bacam commented Jan 8, 2024

mmcloughlin commented Jan 8, 2024

Alasdair commented Jan 8, 2024

bacam commented Jan 8, 2024

mmcloughlin commented Jan 8, 2024 •

edited

Loading

Alasdair commented Jan 9, 2024

Alasdair commented Jan 10, 2024

bacam commented Jan 10, 2024

Alasdair commented Jan 10, 2024

x86 config #76

x86 config #76

Comments

mmcloughlin commented Dec 9, 2023

mmcloughlin commented Dec 9, 2023

Alasdair commented Dec 9, 2023

mmcloughlin commented Dec 10, 2023

bacam commented Dec 11, 2023

bacam commented Dec 11, 2023

mmcloughlin commented Dec 13, 2023

Alasdair commented Dec 13, 2023

mmcloughlin commented Dec 18, 2023 • edited Loading

mmcloughlin commented Dec 20, 2023

mmcloughlin commented Dec 24, 2023

bacam commented Jan 8, 2024

mmcloughlin commented Jan 8, 2024

Alasdair commented Jan 8, 2024

bacam commented Jan 8, 2024

mmcloughlin commented Jan 8, 2024 • edited Loading

Alasdair commented Jan 9, 2024

Alasdair commented Jan 10, 2024

bacam commented Jan 10, 2024

Alasdair commented Jan 10, 2024

mmcloughlin commented Dec 18, 2023 •

edited

Loading

mmcloughlin commented Jan 8, 2024 •

edited

Loading