# Welcome to REVEN's Analysis Python API!

This notebook demonstrates the Analysis Python API of REVEN.

You can execute the cells that contain Python code using the Ctrl+Enter (execute and stay in the same cell) or the Shift+Enter (execute and go to the next cell) shortcuts.

Please execute the cells in order, as they depend on each other.

In this notebook, you can check the state of a code cell by looking at the `In [ ]:` on the left of the cell. If the brackets are empty, the cell has not been executed; if they contain a number (e.g. `In [24]:`), the execution is finished; and if they contain a star (e.g. `In [*]:`), the cell is currently executing.

Once you are done running this notebook, and if this demo contains a tutorial notebook associated with the demo's trace, you can [go back to notebook selection](./) and choose the tutorial notebook.

In [None]:
# Try executing this cell with Shift+Enter or Ctrl+Enter!
print("Hello REVEN!")
2 + 40

In [None]:
# API imports

import reven2  # analysis API
import reven2.types as types   # shortcut when reading a specific type
from reven2.preview.project_manager import ProjectManager  # get access to the Project Manager

In [None]:
# You can explore the API's documentation by suffixing any object, function, ... with `?`
reven2?

# This will open a help window when you execute this cell

# Connecting to a server



To use the Python API, you have to connect to a *REVEN server* started on the scenario you want to analyze. To do this, you must provide the *host* and *port* of your REVEN server:

In [None]:
# Connecting to a reven server
hostname = "127.0.0.1"
port = 1337
server = reven2.RevenServer(hostname, port)
server

If you are using the Python API from the same machine as the REVEN server itself, then the host is `"localhost"` (or `127.0.0.1`); otherwise it is the address of your server. To find the port, go to the Analyze page for the scenario you want to connect to: the port number is displayed in the label above the buttons (`REVEN running on port xxxx`):

![Project manager find port in analyze](img/quasar_reven_port.png)

Alternatively, you can find the port in the Active sessions list:

![Project manager find port in sessions](img/quasar_reven_sessions.png)

Finally, if you have an Axion client connected to your REVEN server, you can find the port in the titlebar of the Axion window:

![Axion find port in title](img/axion_titlebar.png)

## Connecting to a server from the scenario's name

**NOTE:** This section only applies to REVEN enterprise edition. 

**NOTE:** This feature is not provided in the demo.

You can use a feature of the Workflow API to get a connection to a server from the scenario's name, rather than by specifying a port:

```ipython
>>> from reven2.preview.project_manager import ProjectManager
>>> pm = ProjectManager("http://localhost:8880")  # URL to the REVEN Project Manager
>>> connection = pm.connect("cve-2016-7255")  # No need to specify "13370"
>>> server = connection.server
>>> server
Reven server (localhost:13370) [connected]
```

This is useful, as the server port will typically change at each reopening of the scenario, while the scenario name remains the same.

If no server is open for that particular scenario when executing the `ProjectManager.connect` method call, then a new one will be started.


In the demo sessions, the port will always be **1337**.

# Root object of the API, tree of objects



The `RevenServer` instance serves as the root object of the API from where you can access all the features of the API. The following diagram gives a high-level view of the Python API:

![high level diagram of the API](img/archi.png)

For instance, from there you can get the execution trace and ask for the total number of transitions in the trace:

In [None]:
# Getting the trace object
trace = server.trace
# Getting the number of transitions in the trace
trace.transition_count

# Main concepts



## Getting a point in time

As is visible in Axion, all instructions are identified by a single unique integer, called the *transition id*. The transition id starts at 0 for the first instruction in the trace, and is incremented by 1 for each consecutive instruction.

Note: We are using the term "transition" rather than "instruction" here, because technically, not all "transitions" in the trace are "instructions": when an interrupt or a fault occurs, it is also denoted by a `Transition` that changed the `Context`, although no `Instruction` was executed. Similarly, instructions that execute only partially (because they were interrupted by e.g. a page fault) are not considered normal `Instructions`. You can think of a transition as a generalized instruction, i.e. something that modifies the context.

### Getting a transition

You can get interesting transition numbers from Axion's instruction view:

![Finding a transition in Axion](img/axion_transition.png)

In [None]:
# Getting a transition
transition = trace.transition(1234)
# Displays the transition as seen in Axion
print(transition)
# Is this transition an instruction?
print(transition.instruction is not None)

# Note that the transition you get is going to be different from the one in the screenshot,
# since this uses the demo's trace, which is likely to be different from the one used in the
# screenshot.

### Getting a context

A `Transition` represents a *change* in the trace, while a `Context` represents a *state* in the trace.

From a transition, you can get either the context before the transition was applied, or the context after the transition was applied:

In [None]:
# Comparing rip before and after executing an instruction
ctx_before = transition.context_before()
ctx_after = transition.context_after()

print("{:#x}".format(ctx_before.read(reven2.arch.x64.rip)))
print("{:#x}".format(ctx_after.read(reven2.arch.x64.rip)))

# Directly getting a context from the trace object
print(trace.context_before(0x1234) == trace.transition(0x1234).context_before())

# Getting a transition back from a context
print(transition.context_before().transition_after() == transition)

## Reading a context

A common operation on a `Context` instance is to read the state of the CPU registers as well as memory.

The API provides the `read` method on `Context`, that allows reading from a source.

### Getting a register or an address

To read from a register source, you can reference elements exposed by the `arch` package:

In [None]:
# For convenience, we recommend this import
import reven2.arch.x64 as regs

ctx = transition.context_before()
print(ctx.read(regs.rax))

print(ctx.read(regs.al))

# Are we in kernel land?
print(ctx.read(regs.cs) & 3 == 0)
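
The kernel-land check above relies on the x86 convention that the two low bits of the `cs` selector hold the current privilege level (0 for kernel, 3 for user). Here is a minimal sketch of that check in plain Python, independent of the REVEN API. The helper names are hypothetical, and the sample selector values are assumptions (`0x10` and `0x33` are typical on 64-bit Windows):

```python
def privilege_level(cs_selector):
    """Return the privilege level encoded in the two low bits of a cs selector."""
    return cs_selector & 0b11


def is_kernel_land(cs_selector):
    """True when the selector denotes ring 0."""
    return privilege_level(cs_selector) == 0


# Sample selector values (assumptions, see above)
print(is_kernel_land(0x10))
print(is_kernel_land(0x33))
```

In a real session, you would pass the value returned by `ctx.read(regs.cs)` to such a helper.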

To read from a source address, use the `address` module to construct addresses:

In [None]:
# Convenience import of useful types from the address module
from reven2.address import LogicalAddress, LinearAddress, PhysicalAddress

# Comparing the bytes at RIP in memory with the bytes of the instruction
rip = ctx.read(regs.rip)
instruction = transition.instruction
ctx.read(LogicalAddress(rip, regs.cs), instruction.size) == instruction.raw

*Find a transition that writes memory (for instance by looking at the instruction view in 
Axion), and try to read the manipulated memory before and after it gets written, using the API!*

In [None]:
# Fill this cell...

# Transition at which the memory is written to
mem_write_transition = trace.transition(???)
ctx_before_mem_write_transition = mem_write_transition.context_before()
ctx_after_mem_write_transition = mem_write_transition.context_after()

# Address that is written to
written_address = LogicalAddress(???)

# Size to read, in bytes
address_size = ???

print("{:#x}".format(ctx_before_mem_write_transition.read(written_address, address_size)))
print("{:#x}".format(ctx_after_mem_write_transition.read(written_address, address_size)))

### Reading as a type

The `types` package of the API provides classes and instances dedicated to the representation of data types. They allow you to read a register or some memory as a specific data type.

In [None]:
# Convenience import of the types package to the root namespace
from reven2 import types

# Reading rax as various integer types
print("U8={}".format(ctx.read(regs.rax, types.U8)))
print("U16={}".format(ctx.read(regs.rax, types.U16)))
print("I8={}".format(ctx.read(regs.rax, types.I8)))

# Reading in a different endianness (default is little endian)
print("U16, big-endian={}".format(ctx.read(regs.rax, types.BigEndian(types.U16))))
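
To build an intuition for what these typed reads return, the following sketch reproduces the arithmetic in plain Python. It assumes that a typed read of a register decodes its low-order bytes; the `reinterpret` helper and the sample `rax` value are hypothetical, and this is not how REVEN implements the feature:

```python
def reinterpret(value, size, signed=False, byteorder="little"):
    """Take the `size` low-order bytes of `value` and decode them as an integer."""
    mask = (1 << (8 * size)) - 1
    raw = (value & mask).to_bytes(size, "little")
    return int.from_bytes(raw, byteorder, signed=signed)


rax = 0x11223344556680F0  # made-up register value

u8 = reinterpret(rax, 1)                       # low byte 0xF0, unsigned
u16 = reinterpret(rax, 2)                      # low two bytes 0x80F0, unsigned
i8 = reinterpret(rax, 1, signed=True)          # low byte 0xF0, two's complement
u16_be = reinterpret(rax, 2, byteorder="big")  # same two bytes, read as F0 80

print("U8={} U16={} I8={} U16, big-endian={}".format(u8, u16, i8, u16_be))
```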

*Find a String in memory (either in UTF8 or in UTF16), and then try to read this memory as a String.*

In [None]:
# Fill this cell...

# Transition number where the string is in memory
ctx_with_string = trace.context_before(???)

# Address where the string starts
string_address = LogicalAddress(???)

# The string's encoding: one of types.Encoding.Utf16 or types.Encoding.Utf8
encoding = ???

# Maximum characters to look up: if the string is not NUL-terminated, the string's size,
# otherwise, some "big enough" value (like 1000)
max_character_count = ???

ctx_with_string.read(string_address, types.CString(encoding, max_character_count=max_character_count))

In [None]:
# Run this cell after having filled the previous cell

# Reading the same memory as a small array of bytes
ctx_with_string.read(string_address, types.Array(types.U8, 4))

*Find a context where there is a pointer in some register (for instance in `rcx` or `rdx`),
then read its pointed-to value.*

In [None]:
# Fill this cell...

# Transition number where a register contains a pointer
ctx_with_ptr = trace.context_before(???)

# The register that contains the pointer (e.g., regs.rcx)
ptr_source = regs.???

# The type of the pointee object (e.g., types.U64)
pointee_type = types.???

# Dereferencing our pointer in two steps
ptr_addr = LogicalAddress(ctx_with_ptr.read(ptr_source, types.USize))
print(ctx_with_ptr.read(ptr_addr, pointee_type))

# or, dereferencing our pointer in a single step
print(ctx_with_ptr.deref(ptr_source, types.Pointer(pointee_type)))

## Identifying points of interest



One of the first tasks you need to perform during an analysis is finding an interesting point from where to start the analysis. The API provides some tools designed to identify these *points of interest*.

### Getting and using symbol information



A typical starting point for an analysis is to search for points where a specific *symbol* is executed. In the API, this is done in two steps:

1. Identify the symbol in the available symbols of the trace.
2. Search for the identified symbol.

For the first step, you need to recover the OS Specific Information (OSSI) instance tied to your `RevenServer` instance:

In [None]:
# recovering the OSSI object
ossi = server.ossi

Note that for the OSSI feature to work in the API, the necessary OSSI resources must have been generated. Otherwise, several of the called methods may fail with an exception. Please refer to the documentation of each method for more information.

From there you can use the methods of the `Ossi` instance to get the binaries that were executed in the trace, and all the symbols of these binaries.

Note that each of these methods, like all API methods that return several results, returns a [Python generator object](https://docs.python.org/2/library/stdtypes.html#generator-types).
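
If you are not used to generators, keep in mind that they are lazy and can be consumed only once. The sketch below illustrates this with a hypothetical stand-in generator (`fake_results` is not part of the API):

```python
from itertools import islice


def fake_results(count):
    """Hypothetical stand-in for an API method that returns several results."""
    for index in range(count):
        yield "result_{}".format(index)


results = fake_results(5)

first = next(results)                # consumes the first result only
next_two = list(islice(results, 2))  # consumes the two following results
remaining = list(results)            # consumes everything that is left
exhausted = list(results)            # the generator is now empty

print(first, next_two, remaining, exhausted)
```

Call the method again if you need to iterate a second time, or store the results with `list(...)` when there are few enough of them.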

In [None]:
# Getting the first binary named "ntoskrnl.exe" in the list of executed binaries in the trace
ntoskrnl = next(ossi.executed_binaries("ntoskrnl.exe"))
print(ntoskrnl)

# Getting the list of the symbols in "ntoskrnl.exe" containing "NtCreateFile"
nt_create_files = list(ntoskrnl.symbols("NtCreateFile"))
print(nt_create_files)

Once you have a symbol or a binary, you can use the search feature to look for contexts whose `rip` location matches the symbol or binary.

In [None]:
# Getting the first context inside of the first call to `NtCreateFile` in the trace
create_file_ctx = next(trace.search.symbol(nt_create_files[0]))
print(create_file_ctx)

**Note:** If the previous cell fails with `StopIteration`, then this means that there is no call to `NtCreateFile` in this scenario! You can retry the previous cell, but looking for a different symbol (you can look in Axion for called symbols).
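
If you prefer to handle the "no result" case without an exception, Python's built-in `next` accepts a default value. Here is a minimal sketch using hypothetical stand-ins for the search results (`first_or_none` is not part of the API):

```python
def first_or_none(results):
    """Return the first item of an iterable, or None if it is empty."""
    return next(iter(results), None)


# Hypothetical stand-ins for search results: one empty, one with two hits
no_hits = iter([])
two_hits = iter(["context_a", "context_b"])

print(first_or_none(no_hits))
print(first_or_none(two_hits))
```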

In [None]:
# Getting the first context executing the `ntoskrnl` binary
ntoskrnl_binary = next(ossi.executed_binaries("ntoskrnl"))
ntoskrnl_ctx = next(trace.search.binary(ntoskrnl_binary))
print(ntoskrnl_ctx)

**Note:** If the previous cell fails with `StopIteration`, then this means that `ntoskrnl` is never executed in this scenario (this would be surprising)! You can retry the previous cell, but looking for a different binary (you can look in Axion for executed binaries).

For any context, you can request the current OSSI location and process:

In [None]:
# Checking that the current symbol is NtCreateFile
print(create_file_ctx.ossi.location())

# Getting the current process
print(create_file_ctx.ossi.process())

**Note:** Keep in mind that when the current symbol is unknown (missing PDB, JIT code, shellcode, ...), `ossi.location().symbol` can be `None`. Similarly, when the whole location is unknown, `ossi.location()` itself returns `None`.
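
When you print locations in a loop, it helps to handle both `None` cases explicitly. The following sketch uses hypothetical stand-in objects rather than real API objects: the `symbol` attribute mirrors the one mentioned above, while the `binary` attribute and the `describe_location` helper are assumptions made for illustration:

```python
from types import SimpleNamespace


def describe_location(location):
    """Return a printable description of a location that may be None or lack a symbol."""
    if location is None:
        return "<unknown location>"
    if location.symbol is None:
        return "{}!<unknown symbol>".format(location.binary)
    return "{}!{}".format(location.binary, location.symbol)


# Hypothetical stand-ins for the three situations
known = SimpleNamespace(binary="ntoskrnl.exe", symbol="NtCreateFile")
no_symbol = SimpleNamespace(binary="ntoskrnl.exe", symbol=None)

for location in (known, no_symbol, None):
    print(describe_location(location))
```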

### Searching for executed addresses in the trace

If you don't have a symbol attached to your address, you can also search for a specific address using the search function.

*Find an address that is executed in your trace (e.g. by looking in Axion), and then
find the first context executing this address using the API.*

In [None]:
# Fill this cell...

# Some address that was executed in your trace (go look in Axion, e.g. 0x7ff72169c730)
address = ???

executed_ctx = next(trace.search.pc(address))
print(executed_ctx)

### Searching for strings in the trace

You can use the strings feature to search for points in the trace where strings are first accessed or created.

In [None]:
# Getting the first of all the strings in the trace
first_string = next(trace.strings())
print(first_string)

# Looking for strings containing a specific substring
filtered_string = next(trace.strings(first_string.data))
print(filtered_string)

# Getting the list of memory accesses for the string
for access in first_string.memory_accesses():
    print(access)

### Custom iteration in the trace

Another way of searching for interesting points is to iterate over contexts or transitions, inspecting each one for the information you are looking for.

**Beware** that if you iterate on a large portion of the trace, it may take a **very long time** to complete, so prefer the predefined search APIs that use optimized indexes whenever it is possible.

Remember that, in this notebook, you can check the state of a cell by looking at the `In [ ]:` on the left of the cell. If the brackets are empty, the cell has not been executed; if they contain a number (e.g. `In [24]:`), the execution is finished; and if they contain a star (e.g. `In [*]:`), the cell is currently executing.

If a cell is taking too long to execute, you can cancel its execution by using the `Kernel > Interrupt` menu option (or the "stop" square icon in the toolbar if displayed).
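
Rather than interrupting a cell by hand, you can also give a manual iteration an explicit budget. Here is a minimal sketch in plain Python; the `with_budget` helper is hypothetical and not part of the API:

```python
import time
from itertools import islice


def with_budget(items, max_items=None, max_seconds=None):
    """Yield from `items` until the item count or the time budget is exhausted."""
    deadline = None if max_seconds is None else time.monotonic() + max_seconds
    for item in islice(items, max_items):
        if deadline is not None and time.monotonic() > deadline:
            return
        yield item


# Inspect at most 1000 ids out of a large range
inspected = list(with_budget(range(1000000), max_items=1000))
print(len(inspected), inspected[-1])
```

To bound the work actually performed, wrap the range of transition ids being scanned rather than the results of the search.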



In [None]:
# Running this cell may take some time!

def find_mnemonic(trace, mnemonic, from_transition=None, to_transition=None):
    for i in range(from_transition.id if from_transition is not None else 0,
                   to_transition.id if to_transition is not None else trace.transition_count):
        t = trace.transition(i)
        if t.instruction is not None and mnemonic in t.instruction.mnemonic:
            yield t

rep_transition = next(find_mnemonic(trace, "rep"))
print(rep_transition)

Combining the predefined search APIs with manual iteration allows you to iterate over a smaller portion of the trace to extract useful information:

In [None]:
# Finding all files that are created in a call to NtCreateFile
def read_filename(ctx):
    # filename is stored in a UNICODE_STRING structure,
    # which is stored inside of an object_attribute structure,
    # a pointer to which is stored as third argument (r8) to the call
    object_attribute_addr = ctx.read(regs.r8, types.USize)
    # the pointer to the unicode string is stored as third member at offset 0x10 of object_attribute
    punicode_addr = object_attribute_addr + 0x10
    unicode_addr = ctx.read(LogicalAddress(punicode_addr), types.USize)
    # the length is stored as first member of UNICODE_STRING, at offset 0x0
    unicode_length = ctx.read(LogicalAddress(unicode_addr) + 0, types.U16)
    # the buffer is stored as third member of UNICODE_STRING, at offset 0x8
    buffer_addr = ctx.read(LogicalAddress(unicode_addr) + 8, types.USize)
    filename = ctx.read(LogicalAddress(buffer_addr),
                        types.CString(encoding=types.Encoding.Utf16, max_size=unicode_length))
    return filename

for (index, ctx) in enumerate(trace.search.symbol(nt_create_files[0])):
    if index > 5:
        break
    print("{}: {}".format(ctx, read_filename(ctx)))
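
The offsets used in `read_filename` follow the 64-bit layout of `UNICODE_STRING`: a 2-byte `Length`, a 2-byte `MaximumLength`, 4 bytes of padding, then an 8-byte `Buffer` pointer. The sketch below decodes that layout from a synthetic byte buffer with the standard `struct` module, without using the REVEN API. The buffer contents and the `parse_unicode_string` helper are made up for illustration:

```python
import struct


def parse_unicode_string(memory, offset):
    """Decode a 64-bit UNICODE_STRING header located at `offset` in `memory`."""
    length, maximum_length = struct.unpack_from("<HH", memory, offset)
    (buffer_addr,) = struct.unpack_from("<Q", memory, offset + 8)
    return length, maximum_length, buffer_addr


text = "C:\\demo.txt"
encoded = text.encode("utf-16-le")

# Synthetic "memory": the header at offset 0, the character buffer at offset 0x10
header = struct.pack("<HHIQ", len(encoded), len(encoded) + 2, 0, 0x10)
memory = header + encoded + b"\x00\x00"

length, maximum_length, buffer_addr = parse_unicode_string(memory, 0)
filename = memory[buffer_addr:buffer_addr + length].decode("utf-16-le")
print(length, hex(buffer_addr), filename)
```

Note that `Length` counts bytes, not characters, so a UTF-16 string of 11 characters has a `Length` of 22.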

## Moving in the trace

Once you have identified point(s) of interest, the next step in the analysis is to navigate by following data from these points.

The API provides several features that can be used to do so.

### Using the memory history

The main way to use the [*memory history*](http://doc.tetrane.com/latest/Axion/Axion-Views.html#memory-history) in the trace is to use the `Trace.memory_accesses` method. This method allows you to look for the next access to some memory range, starting from a transition and in a given direction.

*Find a virtual address whose accesses you'd like to see. For instance, go look in Axion for some buffer that gets written to and/or read from.*

In [None]:
# Fill the cell...

# Choosing a memory range to track
# Some transition where the target buffer is mapped
mapped_transition = trace.transition(???)
# Address of the beginning of the buffer
address = LogicalAddress(???)
# Size of the tracked buffer
size = ???

# Get the next memory access to this location from a transition where the address is mapped
next_access = next(trace.memory_accesses(address, size,
                    from_transition=mapped_transition))
print(next_access)


If you get a `StopIteration` exception after executing the cell above, it means that the selected memory buffer doesn't have any access after `mapped_transition` in this trace.

In [None]:
# You can also look in the backward direction
previous_access = next(trace.memory_accesses(address, size, 
                                             from_transition=mapped_transition,
                                             is_forward=False))
print(previous_access)
print()
# Getting all accesses to that memory range in the trace
for access in trace.memory_accesses(next_access.physical_address, size):
    print(access)

If you get a `StopIteration` exception after executing the cell above, it means that the selected memory buffer doesn't have any access before `mapped_transition` in this trace.

Note that the memory history works with physical addresses under the hood. Although it accepts virtual addresses as input, the range of virtual addresses is translated to physical ranges before querying the memory history. As a result, the virtual address range needs to be mapped at the context of the translation for the call to succeed.
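
To see why mapping matters, consider that a virtual range crossing a page boundary may correspond to several discontiguous physical ranges. The sketch below illustrates the idea in plain Python, assuming 4 KiB pages and a hypothetical page table represented as a dictionary. It is not REVEN's implementation:

```python
PAGE_SIZE = 0x1000  # assumption: 4 KiB pages


def physical_ranges(virtual_address, size, page_table):
    """Split a virtual range at page boundaries and translate each piece.

    `page_table` is a hypothetical dict: virtual page base -> physical page base.
    A KeyError is raised if any page of the range is not mapped.
    """
    ranges = []
    end = virtual_address + size
    while virtual_address < end:
        page_base = virtual_address & ~(PAGE_SIZE - 1)
        chunk = min(end, page_base + PAGE_SIZE) - virtual_address
        physical = page_table[page_base] + (virtual_address - page_base)
        ranges.append((physical, chunk))
        virtual_address += chunk
    return ranges


# Two adjacent virtual pages backed by non-adjacent physical pages
page_table = {0x7000: 0x1A000, 0x8000: 0x3000}
translated = physical_ranges(0x7FF0, 0x20, page_table)
print([(hex(address), size) for address, size in translated])
```

In this sketch, asking for a range that touches an unmapped page raises an error, which mirrors the requirement stated above.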

A secondary method to use is the `Transition.memory_accesses` method that provides all the memory accesses that occurred at a given transition.

In [None]:
# Get a list of all accesses at transition 0
[(access.virtual_address, access.size) for access in trace.transition(0).memory_accesses()]

### Using the backtrace

For any context, you can get the associated call stack by using the `Context.stack` property:

In [None]:
# Getting the call stack
rep_ctx = rep_transition.context_before()
stack = rep_ctx.stack
print(stack)
print()

# Displaying a human-readable backtrace
print(stack)

From there, you can use the backtrace to navigate in at least two ways:

* By going back to the caller of the current frame.



In [None]:
# Finding the caller transition, if it exists
print(next(stack.frames()).creation_transition)

* By going back to the previous stack. This allows for instance to switch from kernel land to user land, or to find/skip syscalls when necessary.

In [None]:
stack.prev_stack()

# Feature overview


The following table offers a summarized comparison between widgets and features of Axion and Python API methods:




| Widget | API |
|--------|-----|
| CPU    | `Context.read` |
| Instruction view | `Transition`, `Context.ossi.location`, `Context.ossi.process` |
| Hex dump | `Context.read` |
| Memory History | `Trace.memory_accesses`, `Transition.memory_accesses` |
| Search | `Trace.search` |
| Backtrace | `Context.stack` |
| String | `Trace.strings` |
| Taint | Available in preview: `preview.taint` |

# This is the end of this demo!



Thank you for reading this notebook to the end! 

For further information about the Python API, you can refer to the following resources:

* We have [Python API analysis scripts](https://github.com/tetrane/reven2-resources/tree/master/automation/analysis) available on [our GitHub](https://github.com/tetrane).
* The [latest text-only version of this guide](http://doc.tetrane.com/latest/Python-API/Index.html) is available in [our documentation](http://doc.tetrane.com).
* The full [Python API reference documentation](http://doc.tetrane.com/latest/python-doc/reven2.html) is also available.

If this demo contains a specific tutorial notebook, now is the time to go back to [notebook selection](./) and choose the tutorial notebook.

In any case, feel free to modify the existing cells and to use the API!