
hail0hydra/exploit-development-resources

 
 


Exploit-development

A repository for exploit development learners.
Pre-requisites: C programming and Operating System basics

Exploit:
An exploit is a piece of software, a chunk of data, or a sequence of commands that takes advantage of a bug or vulnerability in order to cause unintended or unanticipated behavior to occur on computer software.

First of all, an exploit developer should know how a computer executes binaries. A programmer starts by writing human-readable source code.
A computer cannot execute source code directly, which is why a compiler toolchain converts the human-readable code into machine code.

The compilation process:
Human readable code => Preprocessor => Preprocessed code
Preprocessed code => Compiler => Assembly code
Assembly code => Assembler => Object file (relocatable binary)
Object files => Linker => Executable (linked binary)

Compilation process

i) A programmer writes a C program (let's assume a hello world program in C). You may notice #include <stdio.h> in the first line. It tells the preprocessor to insert the contents of the standard input/output header stdio.h, which declares functions such as printf, so the program can call printf to print hello world on the screen. stdio.h is a header file, not the preprocessor itself: the programmer does not define printf, the header only declares it, and the ready-made definition lives in the C standard library that is linked in at the final step. Expanding #include directives and macros in the programmer's code is called preprocessing.
Note: You can tell gcc to stop after preprocessing with the -E flag (gcc -E yourprogram.c).
ii) The next step is to compile the preprocessed code into assembly instructions. You can do this with the -S flag in gcc (gcc -S yourprogram.c). A file with the extension .s will be created; that is the assembly version of hello world.

gcc_to_compile

iii) The next step is to assemble the compiled file into a relocatable object file. This can be done with the -c flag in gcc (gcc -c yourprogram.c). A file with the extension .o will be created.

gcc_to_assemble

iv) The final step is linking: the linker combines our object file with other object files and libraries into an executable file.
By default, gcc performs all of these steps (gcc yourprogram.c -o yourprogram).
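As a rough illustration of the steps above, here is a small source file that can be fed to each gcc stage. The file name hello.c, the GREETING macro and the functions greeting and say_hello are made up for this sketch; only the -E, -S and -c flags come from the text.

```c
/* hello.c: a small source file for trying out each gcc stage.
 *
 *   gcc -E hello.c -o hello.i   preprocess only (expands #include and #define)
 *   gcc -S hello.c -o hello.s   stop after compiling to assembly
 *   gcc -c hello.c -o hello.o   stop after assembling to a relocatable object
 *
 * Linking into an executable additionally needs a main function, e.g.
 *   int main(void) { return say_hello(stdout) < 0; }
 */
#include <assert.h>
#include <stdio.h>

#define GREETING "hello world"   /* macro: replaced by the preprocessor */

/* Returns the greeting text; after `gcc -E` the macro name is gone
 * and only the string literal remains in the output. */
const char *greeting(void) {
    return GREETING;
}

/* Prints the greeting followed by a newline to `out`.
 * Returns the number of characters written, or a negative value on error. */
int say_hello(FILE *out) {
    assert(out != NULL);
    return fprintf(out, "%s\n", greeting());
}
```

Looking at hello.i after the -E step shows the declarations pulled in from stdio.h ahead of the two functions, which is a quick way to see what preprocessing actually does.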

Registers:
A register is a high-speed storage area inside the processor. The CPU uses registers to store data and to perform operations such as addition and subtraction on the stored values. In the 32-bit x86 architecture, the general-purpose registers are eax (accumulator), ebx (base), ecx (counter), edx (data), esi (source index), edi (destination index), esp (stack pointer) and ebp (base pointer); eip (instruction pointer) holds the address of the next instruction to execute. The size of a register is measured in bits. On 32-bit x86 machines these registers are 32 bits wide.

Some units to know:
Bit = Binary digit (it is either one or zero)
Byte = 8 bits
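Word = The natural word length depends on the architecture of the CPU (for instance, a 32-bit CPU has a native word length of 32 bits). In x86 assembly and debuggers, however, "word" always means 16 bits for historical reasons
Dword = 2 words (double word, 32 bits on x86)
Qword = 4 words (quad word, 64 bits on x86)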
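The units can be written down as fixed-width C types. This sketch assumes the x86 assembler convention in which a word is fixed at 16 bits regardless of the CPU's native width; the typedef names and the bits_in helper are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names that follow the x86 assembler convention. */
typedef uint8_t  byte_t;    /*  8 bits */
typedef uint16_t word_t;    /* 16 bits */
typedef uint32_t dword_t;   /* 32 bits: 2 words */
typedef uint64_t qword_t;   /* 64 bits: 4 words */

/* Converts a size in bytes (as returned by sizeof) into a size in bits. */
size_t bits_in(size_t size_in_bytes) {
    assert(CHAR_BIT == 8);   /* assumption: 8-bit bytes, true on x86 */
    return size_in_bytes * CHAR_BIT;
}
```

For example, bits_in(sizeof(dword_t)) gives 32, the width of eax and the other 32-bit registers.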

Endianness:
Endianness is primarily expressed in little-endian or big-endian. A big-endian system stores the most significant byte of a word at the smallest memory address and the least significant byte at the largest. A little-endian system, in contrast, stores the least-significant byte at the smallest address. Endianness may also be used to describe the order in which the bits are transmitted over a communication channel, e.g., big-endian in a communications channel transmits the most significant bits first.
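To make the two byte orders concrete, the following sketch stores a 32-bit value into a byte array both ways and checks the byte order of the host. The function names are made up for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 if the host stores the least significant byte first. */
int is_little_endian(void) {
    uint32_t value = 1;
    uint8_t first_byte;
    memcpy(&first_byte, &value, 1);   /* byte at the smallest address */
    return first_byte == 1;
}

/* Stores value with the least significant byte at out[0] (little-endian). */
void store_le32(uint32_t value, uint8_t out[4]) {
    assert(out != NULL);
    for (int i = 0; i < 4; i++) {
        out[i] = (uint8_t)(value >> (8 * i));
    }
}

/* Stores value with the most significant byte at out[0] (big-endian). */
void store_be32(uint32_t value, uint8_t out[4]) {
    assert(out != NULL);
    for (int i = 0; i < 4; i++) {
        out[i] = (uint8_t)(value >> (8 * (3 - i)));
    }
}
```

With store_le32(0xdeadbeef, buf) the array holds ef be ad de, which is the order in which an address has to be written into an exploit payload for a little-endian target such as x86.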

Little Endian Big Endian

The stack:
A stack is a data structure that follows Last In First Out (LIFO) order; its operations are push and pop. The program stack is a region of memory that the operating system allocates to a running binary for temporary storage such as local variables, function arguments and return addresses.
We can either push a value onto the top of the stack or pop a value off the top of the stack. On x86 the stack grows toward lower addresses. A special register, esp (stack pointer), always points to the top of the stack; in other words, the value of esp is always the address of the top element of the stack.
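A toy model of push and pop may help here. It is only a sketch: the toy_stack type, its 16-slot size and the use of an array index in place of a real address are assumptions made for illustration. It does mirror the fact that the x86 stack grows toward lower addresses, so push decrements esp and pop increments it.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_SLOTS 16   /* arbitrary size for the toy stack */

typedef struct {
    uint32_t mem[STACK_SLOTS];
    int esp;   /* index of the top element; STACK_SLOTS means "empty" */
} toy_stack;

/* An empty stack has esp just past the highest slot. */
void stack_init(toy_stack *s) {
    s->esp = STACK_SLOTS;
}

/* push: move esp down one slot, then store the value there. */
void push(toy_stack *s, uint32_t value) {
    assert(s->esp > 0);              /* no room left would be an overflow */
    s->esp--;
    s->mem[s->esp] = value;
}

/* pop: read the value esp points at, then move esp back up one slot. */
uint32_t pop(toy_stack *s) {
    assert(s->esp < STACK_SLOTS);    /* popping an empty stack is an error */
    uint32_t value = s->mem[s->esp];
    s->esp++;
    return value;
}
```

Pushing 1, 2 and 3 and then popping three times returns 3, 2 and 1, which is the LIFO order described above.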

The stack

Assembly language:
Assembly language is a low-level, architecture-dependent programming language. A program is a set of instructions (written as mnemonics) that are executed in sequential order. An exploit developer should at least have an idea of what assembly language looks like and how it works. Assembly code is converted into machine code by an assembler.

To learn about the most common instructions required for an exploit developer, refer to this assembly instructions pdf

Calling conventions:
Note: To exploit buffer overflows, the calling conventions should be understood thoroughly.
CDECL:
i) Arguments are passed on the stack in Right-to-Left order, and return values are passed in eax.
ii) The calling function cleans the stack. This allows CDECL functions to have variable-length argument lists (aka variadic functions). For this reason the number of arguments is not appended to the name of the function by the compiler, and the assembler and the linker are therefore unable to determine if an incorrect number of arguments is used.
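A variadic function shows why caller cleanup matters: only the caller knows how many arguments it pushed. The sum_ints function below is a hypothetical example, and it compiles on any platform since the C standard's stdarg.h hides the actual calling convention.

```c
#include <assert.h>
#include <stdarg.h>

/* Adds up `count` int arguments that follow the count itself.
 * The callee cannot check how many arguments were really passed;
 * it simply trusts `count`, just as printf trusts its format string. */
int sum_ints(int count, ...) {
    assert(count >= 0);
    va_list args;
    va_start(args, count);
    int total = 0;
    for (int i = 0; i < count; i++) {
        total += va_arg(args, int);   /* fetch the next int argument */
    }
    va_end(args);
    return total;
}
```

Calling sum_ints(3, 1, 2, 3) and sum_ints(2, 10, 20) both work with the same compiled function, because under CDECL each call site removes exactly the arguments it pushed.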

STDCALL:
STDCALL, also known as "WINAPI" (and a few other names, depending on where you are reading it) is used almost exclusively by Microsoft as the standard calling convention for the Win32 API. Since STDCALL is strictly defined by Microsoft, all compilers that implement it do it the same way.
i) STDCALL passes arguments right-to-left, and returns the value in eax. (The Microsoft documentation erroneously claimed that arguments are passed left-to-right, but this is not the case.)
ii) The called function cleans the stack, unlike CDECL. This means that STDCALL doesn't allow variable-length argument lists.

By default, gcc uses CDECL on 32-bit x86.
CDECL function call in detail:
Let's assume the CPU is currently executing instructions in a function named bar, which calls foo(1, 2, 3);

push 3
push 2
push 1
call _foo
add esp,0xc

cdecl calling convention


One chunk of memory contains the stack frame. The EBP and ESP registers point to the bottom and top of the current stack frame, respectively, so the processor can figure out where the stack is.
Another chunk of memory contains the CPU instructions being executed. The EIP register contains the memory address of the next instruction to execute. To advance through straight-line code, the CPU just increments EIP past each instruction. The call instruction and all the jump instructions work by manipulating EIP. In the diagrams, EIP points to the instruction we're about to execute.
When bar wants to call foo, the first step is putting the function arguments on the stack where foo can find them. They’re pushed onto the stack in reverse order:
push 3
push 2
push 1
Which means, now the stack looks like this:

cdecl calling convention


Next bar issues the call instruction, which does two things:
Push the address of the instruction after call (the return address) onto the stack.
Jump to _foo (by moving the address of _foo into EIP).
Now the stack looks like this:

cdecl calling convention


Okay, we’re officially in foo now. Next step is the function prologue to set up a new stack frame:
push ebp
mov ebp,esp

cdecl calling convention


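Now we can execute the body of foo. We can access its parameters because they're at a predictable location on the stack relative to EBP: ebp + 0x8, ebp + 0xc, and ebp + 0x10 for the first, second and third argument, respectively (ebp + 0x0 holds the saved EBP and ebp + 0x4 holds the return address).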
Once we’ve done some things in foo, and placed a return value in EAX, it’s time to return to bar. Except for that return value, we want everything on the stack to be exactly the same as it was before the call. The first step is to run the function epilogue to restore the old stack frame:

mov esp,ebp ; deallocate any local variables on the stack
pop ebp ; restore old EBP
The stack now looks exactly the same as it did right after the call instruction, before the function prologue. That means the return address is on top of the stack again.
Then we execute the ret instruction, which pops the top value off the stack and jumps to it unconditionally (i.e. copies it into EIP).

cdecl calling convention


Now we just have to remove the function arguments from the stack, and we're done. No need to pop them off one by one; we can just adjust the value of ESP: add esp,0xc (three 4-byte arguments occupy 0xc bytes).
Now the stack has been restored to exactly the way it was before the call, and we can proceed with the rest of bar.
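The whole sequence can be replayed in a toy model. This is a sketch under stated assumptions, not real machine behavior: the toy_cpu type, the 256-byte stack, the addresses 0x1000 and 0x2000, and the body of foo (which combines its arguments as a*100 + b*10 + c so the argument order is visible in the result) are all invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_BYTES    256u      /* arbitrary toy stack size */
#define RETURN_ADDRESS 0x1000u   /* made-up address of the instruction after call */
#define FOO_ADDRESS    0x2000u   /* made-up address of _foo */

typedef struct {
    uint32_t mem[STACK_BYTES / 4];   /* stack memory, one slot per 4 bytes */
    uint32_t esp, ebp, eip, eax;     /* esp and ebp are byte offsets into mem */
} toy_cpu;

void cpu_init(toy_cpu *cpu) {
    cpu->esp = cpu->ebp = STACK_BYTES;   /* empty stack */
    cpu->eip = cpu->eax = 0;
}

void push32(toy_cpu *cpu, uint32_t value) {
    assert(cpu->esp >= 4);
    cpu->esp -= 4;                       /* the stack grows downward */
    cpu->mem[cpu->esp / 4] = value;
}

uint32_t pop32(toy_cpu *cpu) {
    assert(cpu->esp + 4 <= STACK_BYTES);
    uint32_t value = cpu->mem[cpu->esp / 4];
    cpu->esp += 4;
    return value;
}

uint32_t load32(const toy_cpu *cpu, uint32_t addr) {
    assert(addr % 4 == 0 && addr < STACK_BYTES);
    return cpu->mem[addr / 4];
}

/* The callee: prologue, body, epilogue, ret. */
void foo(toy_cpu *cpu) {
    push32(cpu, cpu->ebp);                      /* push ebp */
    cpu->ebp = cpu->esp;                        /* mov ebp,esp */
    uint32_t a = load32(cpu, cpu->ebp + 0x8);   /* first argument */
    uint32_t b = load32(cpu, cpu->ebp + 0xc);   /* second argument */
    uint32_t c = load32(cpu, cpu->ebp + 0x10);  /* third argument */
    cpu->eax = a * 100 + b * 10 + c;            /* return value goes in eax */
    cpu->esp = cpu->ebp;                        /* mov esp,ebp */
    cpu->ebp = pop32(cpu);                      /* pop ebp */
    cpu->eip = pop32(cpu);                      /* ret */
}

/* The caller's side of a CDECL call to foo(a, b, c). */
uint32_t call_foo_cdecl(toy_cpu *cpu, uint32_t a, uint32_t b, uint32_t c) {
    push32(cpu, c);                  /* arguments go right to left */
    push32(cpu, b);
    push32(cpu, a);
    push32(cpu, RETURN_ADDRESS);     /* call _foo: push the return address... */
    cpu->eip = FOO_ADDRESS;          /* ...and jump to _foo */
    foo(cpu);
    cpu->esp += 0xc;                 /* add esp,0xc: the caller cleans up */
    return cpu->eax;
}
```

After cpu_init, call_foo_cdecl(&cpu, 1, 2, 3) returns 123 and leaves esp and ebp at their starting values. The return address sits in an ordinary stack slot at ebp + 0x4 while foo runs, which is why overwriting stack memory can redirect EIP.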

Image Source: norasandler.com

Some useful links

Corelan.be

Opensecuritytraining.info

Securitytube.net

Massimiliano Tomassoli's blog

Samsclass.info

Securitysift.com

COURSES

Corelan

Offensive Security

SANS

VULNERABLE APPLICATIONS

Exploit-exercises.com

EXPLOITS DATABASE
