# Getting Started

This tutorial introduces the TritonDSE library, which provides a high-level,
easy-to-use interface for Triton.

<div class="alert alert-block alert-warning">
<b>Disclaimer:</b> This library is experimental. Most of the code has been written to meet PASTIS objectives. Thus, it lacks many
features and the API is subject to change.
</div>

TritonDSE works in pure emulation, so in theory it has to model all of the program's side effects (syscalls, etc.).
This is not possible, so it works in a very opportunistic manner. As of now, the limitations are the following:

* Limited to Linux ELF binaries
* Only supports a subset of libc function calls *(no C++ runtime functions)*
* No modeling of syscalls *(they have to be modeled manually)*
* TritonDSE has the same weaknesses as Triton in modeling floating-point operations or some specific instruction sets.

## I. Loading a program

For the purpose of this tutorial, any sample can be used, but we are going to use
the following crackme, which can be downloaded [here](figs/crackme_xor). We first
create a ``Program`` object, which is merely a thin wrapper around [LIEF](https://lief.quarkslab.com).

In [1]:
from tritondse import Program

p = Program("crackme_xor")

print(p.architecture, p.endianness)
print(hex(p.entry_point))

Architecture.X86_64 ENDIANNESS.LITTLE
0x400460
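
``Program`` delegates parsing to LIEF. To make concrete where ``architecture``, ``endianness`` and ``entry_point`` come from, here is a minimal stdlib-only sketch that reads them straight from the ELF header. It only illustrates the ELF file format, is not how ``Program`` or LIEF is implemented, and knows only a handful of `e_machine` values; `parse_elf_header` is a hypothetical helper.

```python
import struct

# Subset of e_machine values from the ELF specification
MACHINES = {0x03: "X86", 0x28: "ARM", 0x3E: "X86_64", 0xB7: "AARCH64"}


def parse_elf_header(data: bytes) -> dict:
    """Extract architecture, endianness and entry point from raw ELF bytes."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    is_64 = data[4] == 2          # EI_CLASS: 1 = 32-bit, 2 = 64-bit
    little = data[5] == 1         # EI_DATA: 1 = little-endian, 2 = big-endian
    order = "<" if little else ">"
    # e_type and e_machine (2 bytes each) follow the 16-byte e_ident array
    _e_type, e_machine = struct.unpack_from(order + "HH", data, 16)
    # e_version (4 bytes) sits at offset 20, so e_entry starts at offset 24
    (e_entry,) = struct.unpack_from(order + ("Q" if is_64 else "I"), data, 24)
    return {
        "architecture": MACHINES.get(e_machine, hex(e_machine)),
        "endianness": "LITTLE" if little else "BIG",
        "entry_point": e_entry,
    }
```

Called on the first 64 bytes of `crackme_xor`, it should agree with the ``Program`` output above.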


By default, ``Program`` only exposes the few fields
required to load the program, such as segments
and imported functions. Its main utility is the
ability to retrieve a function object *(as a LIEF object)*:

In [7]:
p.find_function("main")

<lief.Function at 0x7f3e9c31f7b0>

In [8]:
for seg_addr, seg_data in p.memory_segments():
    print(f"0x{seg_addr:x} size:{len(seg_data)}")

0x400000 size:2036
0x600e10 size:576


For any specific processing that requires the LIEF ``Binary`` object itself, it can be retrieved *(even though it is not directly exposed)*:

In [9]:
p._binary

<lief.ELF.Binary at 0x7f3e9c2b8c30>

## II. Creating a process

The whole execution state of a program is managed by the ``ProcessState`` class,
which mainly represents a program loaded in memory, backed by an underlying
``TritonContext`` object. It basically represents a process with all its
runtime data *(mapped memory, file descriptors, etc.)*.

In [5]:
from tritondse import ProcessState

pstate = ProcessState(thread_scheduling=100, time_inc_coefficient=0.001)

We now have a blank process state. We don't need to load the program into it
manually, as the ``SymbolicExecutor`` will do it for us.

## III. Single Execution

Now we need to load the program into the process state and then start it
at its entry point, providing a concrete input that will be injected either
into *stdin* or into argv, depending on the need. Fortunately, all of this is done transparently
by ``SymbolicExecutor``.

In [6]:
from tritondse import SymbolicExecutor, Config, Seed

config = Config()
seed = Seed(b"Hello world")

executor = SymbolicExecutor(config, pstate, p, seed)

This object is in charge of performing a single execution with
the given configuration on the process state using the given program.

The ``run`` method takes care of loading the program, performing
the dynamic relocations (PLT, etc.) and then starting emulation from
the entry point.
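
Part of this setup is redirecting imported functions to Python models. The log of the exploration in section VII shows lines such as "Hooking puts at 0x601018", an address inside the second loaded segment (0x600e10 - 0x601050), which suggests that GOT entries are patched. The toy model below sketches that common emulation technique under this assumption; it is not TritonDSE's actual code, and `HookTable`, `STUB_BASE` and `puts_model` are hypothetical names.

```python
from typing import Callable, Dict, List

STUB_BASE = 0x10000000   # hypothetical region where no real code is mapped


class HookTable:
    """Toy model: redirect imported functions to Python callbacks via the GOT."""

    def __init__(self, memory: Dict[int, int]):
        self.memory = memory                      # address -> pointer-sized value
        self.stubs: Dict[int, Callable] = {}      # stub address -> Python handler

    def hook(self, got_slot: int, handler: Callable) -> int:
        stub = STUB_BASE + 8 * len(self.stubs)    # one unique fake address per import
        self.memory[got_slot] = stub              # the PLT jump now lands on the stub
        self.stubs[stub] = handler
        return stub

    def on_pc(self, pc: int, *args):
        """Called when emulation reaches `pc`: run the Python model of a stub."""
        handler = self.stubs.get(pc)
        if handler is None:
            raise LookupError(f"no hook at {pc:#x}")
        return handler(*args)


printed: List[bytes] = []


def puts_model(s: bytes) -> int:
    printed.append(s + b"\n")                     # puts appends a newline
    return 0                                      # a non-negative value means success


memory: Dict[int, int] = {}
hooks = HookTable(memory)
hooks.hook(0x601018, puts_model)                  # GOT slot of puts, as in the log
hooks.on_pc(memory[0x601018], b"fail")            # indirect call through that slot
print(printed)                                    # [b'fail\n']
```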

In [7]:
executor.run()



We have now successfully performed a single run of our program.
After execution, the ``ProcessState`` has been updated to represent
the program's state after execution, and a ``CoverageSingleRun`` object has been produced,
which represents the coverage generated by the execution.

In [8]:
executor.coverage.total_instruction_executed

23
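
``total_instruction_executed`` counts every executed instruction, repetitions included. Coverage is a different notion: in the log of section VII, the second run executes 81 instructions while the global instruction coverage only grows from 61 to 62. The sketch below reduces a trace of `(address, size)` pairs to instruction and edge coverage. `compute_coverage` is hypothetical, and defining an edge as a non-sequential control transfer is an assumption that may not match TritonDSE's definition.

```python
from typing import Iterable, Set, Tuple


def compute_coverage(trace: Iterable[Tuple[int, int]]):
    """trace: (address, size) of each executed instruction, in execution order."""
    total = 0
    instructions: Set[int] = set()
    edges: Set[Tuple[int, int]] = set()
    prev = None
    for addr, size in trace:
        total += 1                       # every execution counts, even repeats
        instructions.add(addr)           # unique addresses only
        # Control did not fall through to the next instruction: record an edge
        if prev is not None and prev[0] + prev[1] != addr:
            edges.add((prev[0], addr))
        prev = (addr, size)
    return total, instructions, edges


# A tiny loop: 0x1005 jumps back to 0x1002 once
total, insns, edges = compute_coverage(
    [(0x1000, 2), (0x1002, 3), (0x1005, 2), (0x1002, 3), (0x1005, 2)]
)
print(total, len(insns), [(hex(a), hex(b)) for a, b in edges])  # 5 3 [('0x1005', '0x1002')]
```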

In [9]:
executor.exitcode

255

## IV. Manipulating concrete state

A process can be manipulated and modified at any time during the execution. Both the concrete state and symbolic state can be modified.

Process information:

In [10]:
from tritondse.types import Architecture

In [11]:
print(f"arch: {pstate.architecture.name}  ptrsize:{pstate.ptr_bit_size}")

arch: X86_64  ptrsize:64


#### a. Reading, writing registers (function API)

Most of the API allows addressing a register either by an enum identifier *(Triton's)* or directly by its name.

In [12]:
pstate.registers.rax

rax:64 bv[63..0]

A ``ProcessState`` also provides aliases to access the program counter, base pointer, stack pointer and return register in a portable way.

In [13]:
pstate.program_counter_register, \
pstate.base_pointer_register, \
pstate.stack_pointer_register, \
pstate.return_register

(rip:64 bv[63..0], rbp:64 bv[63..0], rsp:64 bv[63..0], rax:64 bv[63..0])
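
These aliases let code refer to register roles rather than architecture-specific names. A minimal sketch of such a lookup, using a hypothetical table: the X86_64 row matches the output above, the other rows are standard calling-convention choices given for illustration only, and this says nothing about which architectures TritonDSE supports.

```python
# Hypothetical table: (program counter, base pointer, stack pointer, return register)
REGISTER_ALIASES = {
    "X86": ("eip", "ebp", "esp", "eax"),
    "X86_64": ("rip", "rbp", "rsp", "rax"),
    "ARM32": ("pc", "r11", "sp", "r0"),      # r11 is the frame pointer by convention in ARM mode
    "AARCH64": ("pc", "x29", "sp", "x0"),    # x29 is the frame pointer
}


def register_aliases(arch: str) -> dict:
    """Resolve architecture-neutral register roles to concrete register names."""
    pc, bp, sp, ret = REGISTER_ALIASES[arch]
    return {
        "program_counter_register": pc,
        "base_pointer_register": bp,
        "stack_pointer_register": sp,
        "return_register": ret,
    }


print(register_aliases("X86_64"))
```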

Both concrete and symbolic values can then be modified using a function-style API.

In [14]:
pstate.write_register(pstate.registers.rax, 0xdeadbeef)

hex(pstate.read_register(pstate.registers.rax))

'0xdeadbeef'

#### b. Reading, writing registers (Pythonic API)

To ease manipulation of the registers' concrete values, the ``ProcessState`` introduces a ``cpu`` attribute that transparently updates the underlying Triton context.

In [15]:
pstate.cpu.rax

3735928559

In [16]:
pstate.cpu.rax += 4

pstate.cpu.rax

3735928563

In [17]:
print(f"RIP: 0x{pstate.cpu.program_counter:x}")

RIP: 0x400489
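
The ``cpu`` attribute behaves like plain Python attributes backed by the register accessors. A minimal sketch of that pattern, assuming forwarding through `__getattr__`/`__setattr__`; `CpuProxy` is hypothetical, a dict stands in for the Triton context, and TritonDSE's real implementation may differ (for instance, it may not mask values to the register width).

```python
class CpuProxy:
    """Attribute-style register access forwarding to read/write callbacks (toy)."""

    def __init__(self, read_register, write_register, bits=64, aliases=None):
        # object.__setattr__ bypasses our own __setattr__ for internal fields
        object.__setattr__(self, "_read", read_register)
        object.__setattr__(self, "_write", write_register)
        object.__setattr__(self, "_mask", (1 << bits) - 1)
        object.__setattr__(self, "_aliases", aliases or {})

    def __getattr__(self, name):
        # Only reached when normal lookup fails, i.e. for register names
        return self._read(self._aliases.get(name, name))

    def __setattr__(self, name, value):
        # Truncate to the register width, like a hardware register would
        self._write(self._aliases.get(name, name), value & self._mask)


regs = {"rax": 0xdeadbeef, "rip": 0x400489}
cpu = CpuProxy(regs.__getitem__, regs.__setitem__, aliases={"program_counter": "rip"})
cpu.rax += 4                               # __getattr__ reads, __setattr__ writes back
print(cpu.rax)                             # 3735928563
print(f"RIP: 0x{cpu.program_counter:x}")   # RIP: 0x400489
```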


#### c. Reading, writing memory

When manipulating memory, what differs is whether we read or write bytes or integers.
For integers, endianness is taken into account when reading from or
writing to memory.

In [18]:
pstate.read_memory_int(p.entry_point, 4)  # Size in bytes

2303323441

In [19]:
hex(pstate.read_memory_ptr(p.entry_point))  # Reads a pointer-sized integer, equivalent to read_memory_int(X, pstate.ptr_size)

'0x89485ed18949ed31'

In [20]:
pstate.read_memory_bytes(p.entry_point, 8)

b'1\xedI\x89\xd1^H\x89'
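
The three reads above are consistent with each other: the integer reads are just the raw bytes interpreted in the target's byte order. This can be checked in plain Python with the bytes printed above, without TritonDSE.

```python
# Bytes returned by read_memory_bytes(p.entry_point, 8) above
raw = b"1\xedI\x89\xd1^H\x89"

# read_memory_int(addr, 4) on a little-endian target: least significant byte first
as_int32 = int.from_bytes(raw[:4], "little")
# read_memory_ptr(addr) on x86-64: 8 bytes (the pointer size), same byte order
as_ptr = int.from_bytes(raw, "little")

print(as_int32)                                   # 2303323441
print(hex(as_ptr))                                # 0x89485ed18949ed31
print(hex(int.from_bytes(raw[:4], "big")))        # 0x31ed4989 on a big-endian target

# Writing goes the other way: integer -> bytes in the target byte order
assert as_int32.to_bytes(4, "little") == raw[:4]
```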

The same functions exist for writing: `write_memory_int`, `write_memory_ptr`, `write_memory_bytes`.

A `ProcessState` object also enables checking whether an address is mapped in memory:

In [21]:
pstate.is_valid_memory_mapping(p.entry_point), pstate.is_valid_memory_mapping(0)

(True, False)
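
Checking a mapping amounts to finding the segment that contains the address. A minimal sketch with a toy segment map, using the sizes printed by `memory_segments()` earlier (they are consistent with the "Loading 0x400000 - 0x4007f4" and "Loading 0x600e10 - 0x601050" lines of the exploration log). `SparseMemory` is hypothetical and assumes non-overlapping segments; TritonDSE's internal memory representation may differ.

```python
import bisect
from typing import Dict, List


class SparseMemory:
    """Toy memory map made of non-overlapping segments."""

    def __init__(self) -> None:
        self._starts: List[int] = []             # sorted segment start addresses
        self._data: Dict[int, bytearray] = {}    # start address -> segment content

    def map(self, start: int, content: bytes) -> None:
        bisect.insort(self._starts, start)
        self._data[start] = bytearray(content)

    def is_valid_memory_mapping(self, addr: int) -> bool:
        # Find the last segment starting at or before addr, then check its end
        i = bisect.bisect_right(self._starts, addr) - 1
        if i < 0:
            return False
        start = self._starts[i]
        return addr < start + len(self._data[start])


mem = SparseMemory()
mem.map(0x400000, bytes(2036))   # sizes printed by memory_segments() above
mem.map(0x600e10, bytes(576))
print(mem.is_valid_memory_mapping(0x400460), mem.is_valid_memory_mapping(0))  # True False
```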

## V. Manipulating symbolic state

Both symbolic registers and symbolic memory can be read and written with an API similar to that of the concrete state.

One should be careful when manipulating the symbolic state to keep it consistent with the concrete state, in order to remain sound.

In [22]:
new_sym = pstate.actx.bv(32, 64)  # new symbolic expression representing a constant

pstate.write_symbolic_register(pstate.registers.rax, new_sym)  # the expression can be either an AstNode or a SymbolicExpression Triton object

pstate.read_symbolic_register(pstate.registers.rax)

(define-fun ref!122 () (_ BitVec 64) (_ bv32 64)) ; assign rax: 

The same can be done on memory with `read_symbolic_memory_byte`, `read_symbolic_memory_bytes`, `read_symbolic_memory_int` and their writing equivalents.

<div class="alert alert-block alert-warning">
<b>Disclaimer:</b> Writing an arbitrary symbolic value into a register or memory might break soundness, as well as the dependency on previous definitions of the variable. In standard usage, a user is not supposed to modify symbolic values, but rather to concretize values or add new constraints to the path predicate.
</div>

When the concrete value of a register *(or memory)* is used to produce side effects on the system, we usually have to
concretize the value in order to remain sound with respect to the execution. This is done with `concretize_register`, which
forces the symbolic value to match the current concrete value.

In [23]:
pstate.concretize_register(pstate.registers.rax)

We can also push our own constraints into the symbolic state.

In [25]:
sym_rax = pstate.read_symbolic_register(pstate.registers.rax)

constraint = sym_rax.getAst() == 4
print(constraint)

pstate.push_constraint(constraint)

(= (_ bv32 64) (_ bv4 64))
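
Conceptually, concretization can be expressed as adding the constraint `expr == current concrete value` to the path predicate, next to any constraint pushed by hand (we assume this is what `concretize_register` amounts to; the actual implementation may differ). The toy model below illustrates both operations in plain Python, without Triton: a "symbolic" value is a function of the input, and a brute-force enumeration stands in for the SMT solver. `PathPredicate` and its methods are hypothetical names.

```python
from itertools import product
from typing import Callable, Iterator, List

Expr = Callable[[bytes], int]          # "symbolic" value: input -> value
Constraint = Callable[[bytes], bool]   # predicate over the input


class PathPredicate:
    """Toy path predicate: a conjunction of Python predicates over the input."""

    def __init__(self) -> None:
        self.constraints: List[Constraint] = []

    def push_constraint(self, constraint: Constraint) -> None:
        self.constraints.append(constraint)

    def concretize(self, expr: Expr, concrete_value: int) -> None:
        # Concretizing = constraining the expression to its current concrete value
        self.push_constraint(lambda inp: expr(inp) == concrete_value)

    def models(self, length: int, alphabet: bytes) -> Iterator[bytes]:
        """Enumerate satisfying inputs by brute force (tiny domains only)."""
        for combo in product(alphabet, repeat=length):
            inp = bytes(combo)
            if all(c(inp) for c in self.constraints):
                yield inp


def rax(inp: bytes) -> int:
    return inp[0] + inp[1]                  # rax depends on two input bytes


pp = PathPredicate()
pp.concretize(rax, rax(b"\x01\x02"))         # current run: rax == 3
print(list(pp.models(2, bytes(range(4)))))   # [b'\x00\x03', b'\x01\x02', b'\x02\x01', b'\x03\x00']
pp.push_constraint(lambda inp: rax(inp) == 4)
print(list(pp.models(2, bytes(range(4)))))   # [] : rax cannot be both 3 and 4
```

In the same way, the constraint printed above, `(= (_ bv32 64) (_ bv4 64))`, compares two constants and can never hold, so once it is pushed the path predicate is unsatisfiable.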


## VI. Configuration

As seen before, a `SymbolicExecutor` takes a `Config` object as input.
It holds multiple parameters that are used during execution and exploration.
These parameters are the following:


* symbolize_argv (bool): Symbolize parameters given on the command line
* symbolize_stdin (bool): Symbolize reads on ``stdin``
* pipe_stdout (bool): Pipe the program stdout to Python's stdout
* pipe_stderr (bool): Pipe the program stderr to Python's stderr
* skip_sleep_routine (bool): Whether to emulate sleep routines or to skip them
* smt_timeout (int): Timeout for a single SMT query in milliseconds
* execution_timeout (int): Timeout of a single execution *(in secs)*
* exploration_timeout (int): Overall timeout of the exploration *(in secs)*
* exploration_limit (int): Number of execution iterations *(0 means unlimited)*
* thread_scheduling (int): Number of instructions to execute before switching to the next thread
* smt_queries_limit (int): Limit of SMT queries to perform for a single execution
* coverage_strategy (CoverageStrategy): Coverage strategy to apply for the whole exploration
* branch_solving_strategy (BranchCheckStrategy): Branch solving strategy to apply for a single execution
* debug (bool): Whether to enable debug logging
* workspace (str): Workspace directory to use
* program_argv (List[str]): Concrete program arguments as given on the command line
* time_inc_coefficient (float): Execution time of each instruction *(for rdtsc)*

In [26]:
c = Config()
c.symbolize_argv = True
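
Two of these parameters describe emulated time rather than limits. One plausible reading, sketched below under that assumption: `thread_scheduling` is the quantum of a round-robin scheduler, and `time_inc_coefficient` turns an instruction count into a virtual clock, for example for `rdtsc`. `round_robin` and `virtual_time` are hypothetical helpers, not TritonDSE functions.

```python
from itertools import cycle
from typing import Iterator, Sequence


def round_robin(threads: Sequence[str], thread_scheduling: int, total: int) -> Iterator[str]:
    """Yield the thread executing each instruction, switching every N instructions."""
    order = cycle(threads)
    current = next(order)
    for i in range(total):
        if i and i % thread_scheduling == 0:
            current = next(order)       # quantum exhausted: move to the next thread
        yield current


def virtual_time(instructions_executed: int, time_inc_coefficient: float) -> float:
    """Elapsed 'time' if every instruction costs time_inc_coefficient."""
    return instructions_executed * time_inc_coefficient


print(list(round_robin(["t0", "t1"], 2, 6)))  # ['t0', 't0', 't1', 't1', 't0', 't0']
print(virtual_time(500, 0.001))
```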

## VII. Exploration

Now that we have performed a single run, let's try to explore the program by symbolizing
`argv`, to see how many different paths we can take.

In [1]:
from tritondse import SymbolicExplorator, SymbolicExecutor, ProcessState, Seed, Config, CoverageStrategy, Program

import logging
logging.basicConfig(level=logging.DEBUG)

# Load the program
p = Program("crackme_xor")

dse = SymbolicExplorator(Config(symbolize_argv=True, debug=True, pipe_stdout=True), p)

# create a dummy seed representing argv and add it to inputs
seed = Seed(b"./crackme AAAAAAAAAAAAAAA")
dse.add_input_seed(seed)

dse.explore()

DEBUG:root:Creating the /tmp/triton_workspace/1620994430/corpus directory
DEBUG:root:Creating the /tmp/triton_workspace/1620994430/crashes directory
DEBUG:root:Creating the /tmp/triton_workspace/1620994430/hangs directory
DEBUG:root:Creating the /tmp/triton_workspace/1620994430/worklist directory
DEBUG:root:Creating the /tmp/triton_workspace/1620994430/metadata directory
DEBUG:root:Seed 78fd4aa0744187fcda352908d6263e3b.00000019.tritondse.cov dumped [NEW]
INFO:root:Pick-up seed: 78fd4aa0744187fcda352908d6263e3b.00000019.tritondse.cov (fresh: True)
INFO:root:Initialize ProcessState with thread scheduling: 200
DEBUG:root:Loading program crackme_xor [4]
DEBUG:root:Loading 0x400000 - 0x4007f4
DEBUG:root:Loading 0x600e10 - 0x601050
DEBUG:root:Hooking puts at 0x601018
DEBUG:root:Hooking __libc_start_main at 0x601020
INFO:root:Starting emulation
DEBUG:root:Enter external routine: __libc_start_main
DEBUG:root:__libc_start_main hooked
DEBUG:root:argc = 2
DEBUG:root:argv[0] = b'./crackme'
DEBUG:r

fail


INFO:root:hit 0x400489: hlt instruction stop.
INFO:root:Emulation done [ret:0]  (time:0.02s)
INFO:root:Instructions executed: 61  symbolic branches: 1
INFO:root:Memory usage: 93.25Mb
INFO:root:Seed 78fd4aa0744187fcda352908d6263e3b generate new coverage
INFO:root:Query n°1, solve:0x004005a2 (time: 0.06s) [SAT]
INFO:root:New seed model ba39b0af614b34616b62e732d2cd2c3f.00000019.tritondse.cov dumped [NEW]
INFO:root:Corpus:1 Crash:0
INFO:root:Seed Scheduler: worklist:1 Coverage objectives:1  (fresh:0)
INFO:root:Coverage instruction:61 edges:0
INFO:root:Elapsed time: 0m0s

INFO:root:Pick-up seed: ba39b0af614b34616b62e732d2cd2c3f.00000019.tritondse.cov (fresh: False)
INFO:root:Initialize ProcessState with thread scheduling: 200
DEBUG:root:Loading program crackme_xor [4]
DEBUG:root:Loading 0x400000 - 0x4007f4
DEBUG:root:Loading 0x600e10 - 0x601050
DEBUG:root:Hooking puts at 0x601018
DEBUG:root:Hooking __libc_start_main at 0x601020
INFO:root:Starting emulation
DEBUG:root:Enter external

fail


INFO:root:hit 0x400489: hlt instruction stop.
INFO:root:Emulation done [ret:0]  (time:0.04s)
INFO:root:Instructions executed: 81  symbolic branches: 2
INFO:root:Memory usage: 106.82Mb
INFO:root:Corpus:2 Crash:0
INFO:root:Seed Scheduler: worklist:0 Coverage objectives:0  (fresh:0)
INFO:root:Coverage instruction:62 edges:0
INFO:root:Elapsed time: 0m0s

INFO:root:Branches reverted: 1  Branches still fail: 0
INFO:root:Total time of the exploration: 0m0s


<ExplorationStatus.IDLE: 2>

In [2]:
dse.execution_count

2

We have now completed a very simple exploration, in which we covered two distinct paths.
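
The log shows the shape of the exploration loop: pick a seed, run it, solve a branch condition ("Query n°1 ... SAT"), dump the resulting model as a new seed, and repeat until the worklist is empty. Below is a deliberately simplified, stdlib-only sketch of such a loop on a hypothetical XOR crackme: branch conditions are plain Python predicates on single input bytes, and a brute-force search stands in for the SMT solver. `run`, `solve`, `explore`, `KEY` and `ENCODED` are hypothetical; unlike this sketch, the real tool schedules seeds according to coverage objectives, and this toy assumes the seed is at least as long as the password.

```python
from typing import Callable, List, Optional, Tuple

# (input byte index, branch predicate on that byte, outcome observed)
Branch = Tuple[int, Callable[[int], bool], bool]

KEY = 0x42                                   # hypothetical XOR key
ENCODED = bytes(c ^ KEY for c in b"ok")      # hypothetical encoded password


def run(data: bytes) -> Tuple[List[Branch], bool]:
    """Toy target: compares each input byte, XORed with KEY, to ENCODED."""
    trace: List[Branch] = []
    for i, expected in enumerate(ENCODED):
        predicate = lambda b, e=expected: (b ^ KEY) == e
        taken = predicate(data[i])
        trace.append((i, predicate, taken))
        if not taken:
            return trace, False              # the crackme would print "fail"
    return trace, True


def solve(constraints: List[Branch], base: bytes) -> Optional[bytes]:
    """Brute-force solver: each predicate depends on a single byte."""
    model = bytearray(base)
    for idx in {i for i, _, _ in constraints}:
        wanted = [(p, t) for i, p, t in constraints if i == idx]
        for b in range(256):
            if all(p(b) == t for p, t in wanted):
                model[idx] = b
                break
        else:
            return None                      # UNSAT
    return bytes(model)


def explore(seed: bytes) -> List[bytes]:
    worklist, corpus, seen = [seed], [], set()
    while worklist:
        data = worklist.pop()
        trace, _ = run(data)
        path = tuple(taken for _, _, taken in trace)
        if path in seen:                     # this path was already explored
            continue
        seen.add(path)
        corpus.append(data)
        # Negate each branch in turn, keeping the prefix of the path intact
        for k, (i, predicate, taken) in enumerate(trace):
            model = solve(trace[:k] + [(i, predicate, not taken)], data)
            if model is not None:
                worklist.append(model)
    return corpus


corpus = explore(b"AA")
print(corpus)   # [b'AA', b'oA', b'ok']
```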

## VIII. Workspace & Corpus

All inputs, crashes and various metadata are stored in a workspace. Unless explicitly specified,
the workspace is created in */tmp/triton_workspace/[timestamp]*. If a workspace directory is given
via the `Config`, it is loaded instead *(which enables restarting an interrupted run)*.

The whole generated corpus and crashes are now available in this directory.
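
The DEBUG log of section VII shows the workspace layout (`corpus`, `crashes`, `hangs`, `worklist`, `metadata`) and seed files named like `78fd4aa0744187fcda352908d6263e3b.00000019.tritondse.cov`. The middle field matches the seed length: `./crackme AAAAAAAAAAAAAAA` is 25 bytes, i.e. 0x19. The sketch below reproduces that layout; `create_workspace` and `dump_seed` are hypothetical, and the use of an MD5 hex digest for the first field is an assumption based only on its 32-hex-digit length.

```python
import hashlib
import tempfile
import time
from pathlib import Path
from typing import Optional

# Subdirectories created in the DEBUG log of section VII
SUBDIRS = ("corpus", "crashes", "hangs", "worklist", "metadata")
DEFAULT_BASE = Path("/tmp/triton_workspace")


def create_workspace(root: Optional[Path] = None) -> Path:
    """Create the workspace layout, or reuse it if it already exists."""
    if root is None:
        root = DEFAULT_BASE / str(int(time.time()))   # timestamped default location
    for name in SUBDIRS:
        (root / name).mkdir(parents=True, exist_ok=True)
    return root


def dump_seed(workspace: Path, content: bytes, kind: str = "corpus") -> Path:
    # Assumed naming scheme: <md5 hex>.<size as 8 hex digits>.tritondse.cov
    name = f"{hashlib.md5(content).hexdigest()}.{len(content):08x}.tritondse.cov"
    path = workspace / kind / name
    path.write_bytes(content)
    return path


# Demo in a throwaway directory rather than /tmp/triton_workspace
with tempfile.TemporaryDirectory() as tmp:
    ws = create_workspace(Path(tmp) / "demo")
    seed_file = dump_seed(ws, b"./crackme AAAAAAAAAAAAAAA")
    layout = sorted(p.name for p in ws.iterdir())
    seed_name = seed_file.name
print(layout)                       # ['corpus', 'crashes', 'hangs', 'metadata', 'worklist']
print(seed_name.split(".", 1)[1])   # 00000019.tritondse.cov
```

Since `create_workspace` uses `exist_ok=True`, pointing it at an existing directory reuses it, mirroring how a workspace given through the `Config` lets an interrupted run be resumed.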