# Introduction
In the classic buffer overflow exploit, an attacker hijacks the execution flow of a
vulnerable process that runs with elevated privileges and makes it spawn a new shell
("/bin/sh") with those privileges. In practice, this means calling the execve syscall.
This article walks you through what exactly the execve syscall does at both the *C*
level and the *assembly* level.
# What does execve do at the *C* level
Generally speaking, execve replaces the program running in the current process with a
new one, which is how a new command gets executed. It is declared as follows:

```c
#include <unistd.h>

/**
 * filename: full pathname of the executable file; no PATH search is performed.
 * argv: all arguments of the command, starting with the command name itself.
 * envp: environment of the new program; leave it alone for now.
 * Both argv and envp should be terminated by a null pointer.
 * example: ls -a
 * filename = "/bin/ls", argv = {"ls", "-a", NULL}
 */
int execve(const char *filename, char *const argv[],
           char *const envp[]);
```
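
To make the prototype concrete, here is a minimal sketch that runs a command through execve in a child process and collects its exit status. The helper name `run_program` is an assumption made for illustration, and the empty environment is a simplification; it is not part of the original example.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Run `path` with a NULL-terminated argv in a child process.
 * Returns the child's exit status, or -1 if the child could not be
 * created or did not exit normally.
 * run_program is a hypothetical helper name used only for this sketch.
 */
int run_program(const char *path, char *const argv[])
{
    char *const envp[] = { NULL };   /* empty but valid environment */
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;                   /* fork failed */

    if (pid == 0) {
        execve(path, argv, envp);    /* returns only if it failed */
        perror("execve");
        _exit(127);                  /* conventional "could not exec" status */
    }

    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling `run_program("/bin/ls", (char *const[]){ "ls", "-a", NULL })` would correspond to the `ls -a` example in the comment above. The fork is only there so that the calling program survives; execve itself replaces whatever process calls it.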

The final step of an attack is to start a new shell, so the C program looks
like this (we leave the argument *envp* as NULL here for simplicity):

```c
#include <stdio.h>
#include <unistd.h>

void  main()
{
        char * name[2];
        name[0] = "/bin/sh";
        name[1] = NULL;
        execve(name[0], name, NULL);
}
```

Save it as *exec.c*, then compile and run it:

```shell
wangxin@ubuntu:~/buffer_overflow$ gcc -o exec exec.c
wangxin@ubuntu:~/buffer_overflow$ ./exec
$
```
We successfully get a new shell.
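
The program above ignores the return value of execve, which is fine when the call succeeds because a successful execve never returns. As a hedged sketch of the failure case, the hypothetical helper `exec_or_errno` below (a name assumed for illustration) reports why the call failed:

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Try to replace the current process with `path`.
 * On success this function never returns. On failure it returns the
 * errno value set by execve.
 * exec_or_errno is a hypothetical helper name used only for this sketch.
 */
int exec_or_errno(const char *path)
{
    char *const argv[] = { (char *)path, NULL };
    char *const envp[] = { NULL };
    int saved;

    execve(path, argv, envp);

    /* Reaching this line means execve failed. */
    saved = errno;
    fprintf(stderr, "execve(%s): %s\n", path, strerror(saved));
    return saved;
}
```

For a path that does not exist, the returned value should be `ENOENT`. Calling it with "/bin/sh" would instead turn the calling process into a shell, just like *exec.c*.
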
# What does execve do at the *assembly* level
We continue with "/bin/sh" as the example and now look at what happens at the
assembly level. We use GDB, a very helpful tool, to disassemble our compiled program.
Before that, let's look at the compile command:

```shell
gcc -m32 -mpreferred-stack-boundary=2 -fno-stack-protector -Wall -static -g -o exec exec.c
```  

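The options in the compile command look complicated, but they are there to
*simplify* the disassembly we get in GDB: `-m32` builds 32-bit code,
`-mpreferred-stack-boundary=2` keeps the stack aligned to 4 bytes so no extra
alignment code is emitted, `-fno-stack-protector` removes the stack canary checks,
`-static` links glibc into the binary so that execve itself can be disassembled,
`-Wall` enables warnings, and `-g` adds debug information. Disassembling gives: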

```
(gdb) disass main
Dump of assembler code for function main:
   0x08048bbc <+0>:	push   %ebp
   0x08048bbd <+1>:	mov    %esp,%ebp
   0x08048bbf <+3>:	sub    $0x8,%esp
   0x08048bc2 <+6>:	movl   $0x80bbf88,-0x8(%ebp)
   0x08048bc9 <+13>:	movl   $0x0,-0x4(%ebp)
   0x08048bd0 <+20>:	mov    -0x8(%ebp),%eax
   0x08048bd3 <+23>:	push   $0x0
   0x08048bd5 <+25>:	lea    -0x8(%ebp),%edx
   0x08048bd8 <+28>:	push   %edx
   0x08048bd9 <+29>:	push   %eax
   0x08048bda <+30>:	call   0x806c2d0 <execve>
   0x08048bdf <+35>:	add    $0xc,%esp
   0x08048be2 <+38>:	nop
   0x08048be3 <+39>:	leave  
   0x08048be4 <+40>:	ret    
End of assembler dump.
(gdb) disass execve
Dump of assembler code for function execve:
   0x0806c2d0 <+0>:	push   %ebx
   0x0806c2d1 <+1>:	mov    0x10(%esp),%edx
   0x0806c2d5 <+5>:	mov    0xc(%esp),%ecx
   0x0806c2d9 <+9>:	mov    0x8(%esp),%ebx
   0x0806c2dd <+13>:	mov    $0xb,%eax
   0x0806c2e2 <+18>:	call   *0x80ecab0
   0x0806c2e8 <+24>:	pop    %ebx
   0x0806c2e9 <+25>:	cmp    $0xfffff001,%eax
   0x0806c2ee <+30>:	jae    0x806fdc0 <__syscall_error>
   0x0806c2f4 <+36>:	ret    
End of assembler dump.
```
Before the call at <+18>, the execve wrapper loads its three arguments into ebx, ecx
and edx and puts 0xb (11, the execve syscall number on 32-bit x86) into eax.
The instruction at 0x0806c2e2 <+18> is "call \*0x80ecab0". In AT&T syntax the star
marks an indirect call: the target is not 0x80ecab0 itself but the address stored
at 0x80ecab0. It looks obscure, but it is just the call that invokes the system call,
which is execve here. Traditionally, the assembly instruction to invoke a system call
is "int 0x80", which raises a software interrupt with interrupt number 0x80.
Nowadays things have become more sophisticated.
Let's see what happens if we disassemble at 0x80ecab0:

```
(gdb) disass 0x80ecab0
Dump of assembler code for function _dl_sysinfo:
   0x080ecab0 <+0>:	lock in (%dx),%eax
   0x080ecab2 <+2>:	push   %es
   0x080ecab3 <+3>:	or     %al,-0x6c(%eax)
End of assembler dump.
(gdb) disass _dl_sysinfo
Dump of assembler code for function _dl_sysinfo_int80:
   0x0806edf0 <+0>:	int    $0x80
   0x0806edf2 <+2>:	ret    
End of assembler dump.
```
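The first dump looks like nonsense because \_dl_sysinfo is a pointer variable, not
code: GDB decoded the pointer bytes f0 ed 06 08 (0x0806edf0 in little-endian order) as
instructions. Dumping the same memory as data shows the pointer directly: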

```
(gdb) x/2x 0x80ecab0
0x80ecab0 <_dl_sysinfo>:	0x0806edf0	0x08099440
(gdb) disass 0x0806edf0
Dump of assembler code for function _dl_sysinfo_int80:
   0x0806edf0 <+0>:	int    $0x80
   0x0806edf2 <+2>:	ret    
End of assembler dump.
```
We can just regard it as "int 0x80".
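
In C terms, this indirection is simply a call through a function pointer. The sketch below is a rough analogy, assuming hypothetical stand-in names (`entry_int80`, `sysinfo_entry`, `invoke_via_pointer`); it does not perform a real system call.

```c
/*
 * Hypothetical stand-in for _dl_sysinfo_int80: the code that would
 * finally execute "int $0x80". Here it only echoes the number it got.
 */
static int entry_int80(int number)
{
    return number;
}

/*
 * Hypothetical stand-in for _dl_sysinfo: a pointer variable, not code.
 * Its storage holds an address, which is why disassembling it gives
 * meaningless instructions.
 */
static int (*sysinfo_entry)(int) = entry_int80;

/* Mirrors "call *0x80ecab0": load the pointer, then call its target. */
int invoke_via_pointer(int number)
{
    return sysinfo_entry(number);
}
```

Because the caller only knows the pointer, the entry code can be swapped by assigning a different function to `sysinfo_entry` without touching any call site.
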
> The Linux kernel and glibc have a mechanism to choose between the different ways
> to invoke a system call. The kernel sets up a virtual shared library for each
> process, it's called the VDSO (virtual dynamic shared object), which you can
> see in the output of `cat /proc/<pid>/maps`. If the kernel is old and doesn't
> provide a VDSO, glibc provides a default implementation for \_dl_sysinfo.

```
.hidden _dl_sysinfo_int80:
int $0x80
ret
```
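
The register setup seen in the execve wrapper (syscall number in eax, arguments in ebx, ecx, edx) can also be reached from C without the wrapper by using the generic `syscall()` function from glibc. This is a sketch under the assumption of a Linux system with glibc; `raw_execve` and `run_shell_raw` are hypothetical names chosen for illustration.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Invoke execve by its syscall number, bypassing the glibc execve wrapper.
 * SYS_execve is 11 (0xb) when building for 32-bit x86; other
 * architectures use different numbers.
 */
long raw_execve(const char *path, char *const argv[], char *const envp[])
{
    return syscall(SYS_execve, path, argv, envp);
}

/*
 * Run "/bin/sh -c <script>" through raw_execve in a child process and
 * return the shell's exit status, or -1 on error.
 */
int run_shell_raw(const char *script)
{
    char *const argv[] = { "sh", "-c", (char *)script, NULL };
    char *const envp[] = { NULL };
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        raw_execve("/bin/sh", argv, envp);
        _exit(127);                  /* reached only if the syscall failed */
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, `run_shell_raw("exit 3")` should return 3. How `syscall()` enters the kernel depends on the architecture and the glibc build, so this shows the calling convention at the C level rather than the exact instructions.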

## Under the hood
Now it's time to go back and see what *every single instruction* does
before the system call is invoked.

```
0x08048bbc <+0>:	push   %ebp
;Enter the main frame: save the caller's EBP.
0x08048bbd <+1>:	mov    %esp,%ebp
;Point EBP at the new main frame.
0x08048bbf <+3>:	sub    $0x8,%esp
;Reserve 8 bytes for char *name[2]; each pointer is 4 bytes long.
0x08048bc2 <+6>:	movl   $0x80bbf88,-0x8(%ebp)
;0x80bbf88 is the addr of the string "/bin/sh"; store it at ebp-8, i.e. name[0].
0x08048bc9 <+13>:	movl   $0x0,-0x4(%ebp)
;0x0 is NULL; this corresponds to name[1] = NULL.
0x08048bd0 <+20>:	mov    -0x8(%ebp),%eax
;Load the addr of "/bin/sh" stored at (ebp-8) into eax.

;***The following instructions push the arguments of execve onto the stack***
;***in reverse order (last argument first).***
0x08048bd3 <+23>:	push   $0x0
;Push NULL onto the stack, which corresponds to the third argument envp.
0x08048bd5 <+25>:	lea    -0x8(%ebp),%edx
;Note: lea puts the addr of the name array (ebp-8), not its contents, in edx.
0x08048bd8 <+28>:	push   %edx
;Push edx onto the stack: the second argument argv.
0x08048bd9 <+29>:	push   %eax
;Push the addr of "/bin/sh" onto the stack: the first argument filename.
```
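
To check that these pushes line up with the offsets used inside execve (0x8, 0xc and 0x10 from esp after its own `push %ebx`), the stack can be modelled with a small array. This is a toy simulation, not real stack access: `toy_stack`, `simulate_call` and the address passed for the name array are assumptions made for illustration, while 0x08048bdf is the return address taken from the disassembly of main.

```c
#include <stdint.h>

#define STACK_WORDS 16

/* A toy 32-bit stack that grows toward lower indices, like x86. */
struct toy_stack {
    uint32_t words[STACK_WORDS];
    int esp;                     /* index of the top word */
};

static void push(struct toy_stack *s, uint32_t value)
{
    s->words[--s->esp] = value;
}

/* Read the word at byte offset `off` from the top, like off(%esp). */
static uint32_t at_esp(const struct toy_stack *s, int off)
{
    return s->words[s->esp + off / 4];
}

struct regs { uint32_t ebx, ecx, edx; };

/* Replay main's pushes and the loads at the start of execve. */
struct regs simulate_call(uint32_t string_addr, uint32_t name_addr)
{
    struct toy_stack s = { .esp = STACK_WORDS };   /* empty stack */
    struct regs r;

    push(&s, 0);               /* main <+23>: push $0x0, envp */
    push(&s, name_addr);       /* main <+28>: push %edx, argv */
    push(&s, string_addr);     /* main <+29>: push %eax, filename */
    push(&s, 0x08048bdfu);     /* main <+30>: call pushes the return addr */
    push(&s, 0);               /* execve <+0>: push %ebx (value assumed 0) */

    r.edx = at_esp(&s, 0x10);  /* execve <+1>: envp */
    r.ecx = at_esp(&s, 0x0c);  /* execve <+5>: argv */
    r.ebx = at_esp(&s, 0x08);  /* execve <+9>: filename */
    return r;
}
```

With `simulate_call(0x080bbf88, 0xffffd000)`, where the second value is a made-up address for the name array, ebx ends up holding the string address, ecx the array address and edx zero, which is the register layout the kernel expects for execve on 32-bit x86.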