cpp vtables Part1 Basics

jiaxw32 edited this page Nov 5, 2023 · 1 revision

C++ vtables - Part 1 - Basics

Original article

In this mini-series of posts we’ll explore how clang implements vtables and RTTI. This part starts with some basic classes; later parts cover multiple inheritance and virtual inheritance.

Please note that this mini-series involves digging, via gdb, into the binaries generated for our code samples. This is somewhat low-level(ish), but I’ll do all the heavy lifting for you. I don’t expect many future posts to be this low-level.

Disclaimer: everything written here is implementation-specific, may change in any future version, and should not be relied on. We look into this for educational reasons only.

Cool, let’s start.

Part 1 - vtables - Basics

Let’s examine the following code:

#include <iostream>
using namespace std;

class NonVirtualClass {
 public:
  void foo() {}
};

class VirtualClass {
 public:
  virtual void foo() {}
};

int main() {
  cout << "Size of NonVirtualClass: " << sizeof(NonVirtualClass) << endl;
  cout << "Size of VirtualClass: " << sizeof(VirtualClass) << endl;
}

$ # compile and run main.cpp
$ clang++ main.cpp && ./a.out
Size of NonVirtualClass: 1
Size of VirtualClass: 8

NonVirtualClass has a size of 1 because C++ requires every complete object to have a non-zero size, so that distinct objects get distinct addresses. However, this is not important right now.

VirtualClass’s size is 8 on a 64-bit machine. Why? Because it contains a hidden pointer to a vtable. vtables are static lookup tables that the compiler creates for each class that declares or inherits virtual functions. This post series is about their contents and how they are used.
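
The size claims above can also be checked at compile time. The following is a minimal sketch, not part of the original post: `VirtualWithInt` and the two helper functions are hypothetical names added for illustration, and the results assume a mainstream ABI in which the vtable pointer is the only hidden member.

```cpp
#include <cstddef>
#include <type_traits>

class NonVirtualClass {
 public:
  void foo() {}
};

class VirtualClass {
 public:
  virtual void foo() {}
};

// Hypothetical extra class (not in the original post): a data member
// stored next to the hidden vtable pointer.
class VirtualWithInt {
 public:
  virtual void foo() {}
  int value = 0;
};

// A class is "polymorphic" when it declares or inherits a virtual function.
static_assert(!std::is_polymorphic<NonVirtualClass>::value, "no virtual functions");
static_assert(std::is_polymorphic<VirtualClass>::value, "has a virtual function");

// True when the only storage in VirtualClass is one pointer-sized slot.
constexpr bool vptr_is_only_member() {
  return sizeof(VirtualClass) == sizeof(void*);
}

// Bytes occupied by VirtualWithInt beyond its vtable pointer
// (the int plus any alignment padding).
constexpr std::size_t bytes_after_vptr() {
  return sizeof(VirtualWithInt) - sizeof(void*);
}
```

On a typical 64-bit target `bytes_after_vptr()` is likely to be 8 rather than 4, because the object is padded to the alignment of the pointer.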

To get a deeper understanding of what vtables look like, let’s explore the following code with gdb and find out how the memory is laid out:

#include <iostream>

class Parent {
 public:
  virtual void Foo() {}
  virtual void FooNotOverridden() {}
};

class Derived : public Parent {
 public:
  void Foo() override {}
};

int main() {
  Parent p1, p2;
  Derived d1, d2;

  std::cout << "done" << std::endl;
}

$ # compile our code with debug symbols and start debugging using gdb
$ clang++ -std=c++14 -stdlib=libc++ -g main.cpp && gdb ./a.out
...
(gdb) # ask gdb to automatically demangle C++ symbols
(gdb) set print asm-demangle on
(gdb) set print demangle on
(gdb) # set breakpoint at main
(gdb) b main
Breakpoint 1 at 0x4009ac: file main.cpp, line 15.
(gdb) run
Starting program: /home/shmike/cpp/a.out

Breakpoint 1, main () at main.cpp:15
15	  Parent p1, p2;
(gdb) # skip to next line
(gdb) n
16	  Derived d1, d2;
(gdb) # skip to next line
(gdb) n
18	  std::cout << "done" << std::endl;
(gdb) # print p1, p2, d1, d2 - we'll talk about what the output means soon
(gdb) p p1
$1 = {_vptr$Parent = 0x400bb8 <vtable for Parent+16>}
(gdb) p p2
$2 = {_vptr$Parent = 0x400bb8 <vtable for Parent+16>}
(gdb) p d1
$3 = {<Parent> = {_vptr$Parent = 0x400b50 <vtable for Derived+16>}, <No data fields>}
(gdb) p d2
$4 = {<Parent> = {_vptr$Parent = 0x400b50 <vtable for Derived+16>}, <No data fields>}

Here’s what we learned from the above:

  • Even though the classes have no data members, each object holds a hidden pointer to a vtable;
  • p1 and p2 share the same vtable. vtables are static data, one per type;
  • d1 and d2 inherit the vtable pointer from Parent, but in their case it points to Derived’s vtable;
  • All vtable pointers point 16 (0x10) bytes into their vtable. We’ll also discuss this later.
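
The same observations can be reproduced without a debugger. This is a rough sketch under one loud assumption: the vtable pointer sits in the first pointer-sized bytes of the object, as the gdb output suggests for clang. The C++ standard does not promise this, and `read_vptr` and `check_vptrs` are hypothetical helpers, not part of the original post.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

class Parent {
 public:
  virtual void Foo() {}
  virtual void FooNotOverridden() {}
};

class Derived : public Parent {
 public:
  void Foo() override {}
};

// Assumption: the vtable pointer occupies the first pointer-sized bytes of
// the object (implementation-specific, matches the gdb output above).
std::uintptr_t read_vptr(const Parent& obj) {
  static_assert(sizeof(std::uintptr_t) == sizeof(void*),
                "pointer-sized integer expected");
  std::uintptr_t vptr = 0;
  std::memcpy(&vptr, static_cast<const void*>(&obj), sizeof(vptr));
  return vptr;
}

// Mirrors the bullet points derived from the gdb session.
void check_vptrs() {
  Parent p1, p2;
  Derived d1, d2;
  assert(read_vptr(p1) == read_vptr(p2));  // same type, same vtable
  assert(read_vptr(d1) == read_vptr(d2));
  assert(read_vptr(p1) != read_vptr(d1));  // different type, different vtable
}
```

Calling `check_vptrs()` from `main` should pass silently on the kind of setup used in this post; the actual pointer values will differ from run to run and from machine to machine.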

Let’s continue our gdb session and look at the contents of the vtables. I’ll use the x command, which dumps memory to the screen, and ask it to print 300 bytes in hex format starting at 0x400b40. Why this address? Because above we saw that Derived’s vtable pointer is 0x400b50, and gdb labels that address vtable for Derived+16 (16 == 0x10), so the vtable itself starts at 0x400b50 - 0x10 = 0x400b40.

(gdb) x/300xb 0x400b40
0x400b40 <vtable for Derived>:	    0x00	0x00	0x00	0x00	0x00	0x00	0x00	0x00
0x400b48 <vtable for Derived+8>:	0x90	0x0b	0x40	0x00	0x00	0x00	0x00	0x00
0x400b50 <vtable for Derived+16>:	0x80	0x0a	0x40	0x00	0x00	0x00	0x00	0x00
0x400b58 <vtable for Derived+24>:	0x90	0x0a	0x40	0x00	0x00	0x00	0x00	0x00

0x400b60 <typeinfo name for Derived>:	0x37	0x44	0x65	0x72	0x69	0x76	0x65	0x64
0x400b68 <typeinfo name for Derived+8>:	0x00	0x36	0x50	0x61	0x72	0x65	0x6e	0x74
0x400b70 <typeinfo name for Parent+7>:	0x00	0x00	0x00	0x00	0x00	0x00	0x00	0x00

0x400b78 <typeinfo for Parent>:	    0x90	0x20	0x60	0x00	0x00	0x00	0x00	0x00
0x400b80 <typeinfo for Parent+8>:	0x69	0x0b	0x40	0x00	0x00	0x00	0x00	0x00
0x400b88:	                        0x00	0x00	0x00	0x00	0x00	0x00	0x00	0x00

0x400b90 <typeinfo for Derived>:	0x10	0x22	0x60	0x00	0x00	0x00	0x00	0x00
0x400b98 <typeinfo for Derived+8>:	0x60	0x0b	0x40	0x00	0x00	0x00	0x00	0x00
0x400ba0 <typeinfo for Derived+16>:	0x78	0x0b	0x40	0x00	0x00	0x00	0x00	0x00

0x400ba8 <vtable for Parent>:	    0x00	0x00	0x00	0x00	0x00	0x00	0x00	0x00
0x400bb0 <vtable for Parent+8>:	    0x78	0x0b	0x40	0x00	0x00	0x00	0x00	0x00
0x400bb8 <vtable for Parent+16>:	0xa0	0x0a	0x40	0x00	0x00	0x00	0x00	0x00
0x400bc0 <vtable for Parent+24>:	0x90	0x0a	0x40	0x00	0x00	0x00	0x00	0x00
...
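
Each row of the dump is one 8-byte slot printed byte by byte, lowest address first. Since the machine in this session is little-endian, the first byte is the least significant one. As a small sketch of how the rows turn into the addresses used in the rest of the post (`from_little_endian` and the constant names are hypothetical helpers):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Combines 8 bytes, in the order gdb prints them (lowest address first),
// into one 64-bit value. Little-endian: bytes[0] is the least significant.
std::uint64_t from_little_endian(const unsigned char (&bytes)[8]) {
  std::uint64_t value = 0;
  for (std::size_t i = 0; i < 8; ++i) {
    value |= static_cast<std::uint64_t>(bytes[i]) << (8 * i);
  }
  return value;
}

// The row at <vtable for Derived+8> from the dump above.
const unsigned char kDerivedTypeinfoSlot[8] = {0x90, 0x0b, 0x40, 0x00,
                                               0x00, 0x00, 0x00, 0x00};
// The row at <vtable for Derived+16> from the dump above.
const unsigned char kDerivedFirstMethodSlot[8] = {0x80, 0x0a, 0x40, 0x00,
                                                  0x00, 0x00, 0x00, 0x00};

void check_dump_rows() {
  // 0x400b90 is where the dump shows <typeinfo for Derived>.
  assert(from_little_endian(kDerivedTypeinfoSlot) == 0x400b90);
  assert(from_little_endian(kDerivedFirstMethodSlot) == 0x400a80);
}
```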

Note: we’re looking at demangled symbols. If you really want to know, in the mangled names _ZTV is the prefix for a vtable, _ZTS is the prefix for a type string (name), and _ZTI is the prefix for type info.
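
To see the prefixes in action, mangled names can be demangled programmatically. The sketch below assumes a toolchain that ships `<cxxabi.h>` (GCC or clang with libstdc++ or libc++abi); `demangle` and `check_prefixes` are hypothetical helpers, and the mangled strings are built by hand from the prefixes plus the length-prefixed class names seen in the dump.

```cpp
#include <cxxabi.h>

#include <cassert>
#include <cstdlib>
#include <string>

// Returns the demangled form of `mangled`, or the input unchanged if the
// demangler rejects it.
std::string demangle(const std::string& mangled) {
  int status = 0;
  char* text = abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status);
  std::string result =
      (status == 0 && text != nullptr) ? std::string(text) : mangled;
  std::free(text);  // the buffer is malloc'ed; free(nullptr) is a no-op
  return result;
}

void check_prefixes() {
  assert(demangle("_ZTV7Derived") == "vtable for Derived");
  assert(demangle("_ZTS6Parent") == "typeinfo name for Parent");
  assert(demangle("_ZTI6Parent") == "typeinfo for Parent");
}
```

The demangled strings are the same labels gdb prints next to the addresses in the memory dump.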

Here’s Parent’s vtable layout:

| Address | Value | Meaning |
|---|---|---|
| 0x400ba8 | 0x0 | top_offset (more on this later) |
| 0x400bb0 | 0x400b78 | Pointer to typeinfo for Parent (also part of the above memory dump) |
| 0x400bb8 | 0x400aa0 | Pointer to Parent::Foo(). Parent’s _vptr points here. |
| 0x400bc0 | 0x400a90 | Pointer to Parent::FooNotOverridden() |

Here’s Derived’s vtable layout:

| Address | Value | Meaning |
|---|---|---|
| 0x400b40 | 0x0 | top_offset (more on this later) |
| 0x400b48 | 0x400b90 | Pointer to typeinfo for Derived (also part of the above memory dump) |
| 0x400b50 | 0x400a80 | Pointer to Derived::Foo(). Derived’s _vptr points here. |
| 0x400b58 | 0x400a90 | Pointer to Parent::FooNotOverridden() (same as Parent’s) |

  • step1
(gdb) # find out what debug symbol we have for address 0x400aa0
(gdb) info symbol 0x400aa0
Parent::Foo() in section .text of a.out
  • step2
(gdb) info symbol 0x400a90
Parent::FooNotOverridden() in section .text of a.out
  • step3
(gdb) info symbol 0x400a80
Derived::Foo() in section .text of a.out
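
The two layouts above can be modeled by hand with plain structs and function pointers. This is only an illustrative model, not what clang actually emits: every name in it (`VTable`, `Object`, `Call`, and so on) is hypothetical, and the typeinfo slot is left empty.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

struct Object;  // stand-in for a polymorphic object
using Method = const char* (*)(const Object*);

// Hypothetical model of one vtable, following the tables above.
struct VTable {
  std::ptrdiff_t top_offset;  // 0 for both classes in this post
  const void* typeinfo;       // would point to the typeinfo record (unused here)
  Method methods[2];          // [0] = Foo, [1] = FooNotOverridden
};

struct Object {
  const Method* vptr;  // points at methods[0], not at the start of the table
};

const char* ParentFoo(const Object*) { return "Parent::Foo"; }
const char* ParentFooNotOverridden(const Object*) {
  return "Parent::FooNotOverridden";
}
const char* DerivedFoo(const Object*) { return "Derived::Foo"; }

const VTable kParentVTable = {0, nullptr, {ParentFoo, ParentFooNotOverridden}};
// Derived replaces slot 0 and reuses Parent's function in slot 1.
const VTable kDerivedVTable = {0, nullptr, {DerivedFoo, ParentFooNotOverridden}};

// A "virtual call": index into whichever table the object points at.
const char* Call(const Object& obj, std::size_t slot) {
  return obj.vptr[slot](&obj);
}

void check_model() {
  Object p = {kParentVTable.methods};
  Object d = {kDerivedVTable.methods};
  assert(std::strcmp(Call(p, 0), "Parent::Foo") == 0);
  assert(std::strcmp(Call(d, 0), "Derived::Foo") == 0);
  assert(std::strcmp(Call(d, 1), "Parent::FooNotOverridden") == 0);
}
```

The design point the model tries to capture is that the caller never needs to know the object's concrete type: it only needs the slot index, which is fixed by the declaration order in Parent.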

Remember that the vtable pointer in Derived points 16 bytes into the vtable? The first two slots hold top_offset and the typeinfo pointer, so the vtable pointer lands on the third slot, which holds the first method pointer. Want the 3rd method? No problem: add 2 * sizeof(void*) to the vtable pointer. Want the typeinfo record? Read the pointer stored just before the one the vtable pointer refers to.
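
The pointer arithmetic described here can be tried on a real object. The sketch below is heavily implementation-specific and for exploration only: it assumes the layout shown in this post (vtable pointer in the first word of the object, typeinfo pointer one slot before the first method, top_offset two slots before) and RTTI being enabled. `read_word`, `vtable_slot` and `check_slots` are hypothetical helpers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <typeinfo>

class Parent {
 public:
  virtual void Foo() {}
  virtual void FooNotOverridden() {}
};

class Derived : public Parent {
 public:
  void Foo() override {}
};

// Reads one pointer-sized word from raw memory.
std::uintptr_t read_word(const void* address) {
  std::uintptr_t word = 0;
  std::memcpy(&word, address, sizeof(word));
  return word;
}

// Returns the vtable slot `index`, counted from where the object's vtable
// pointer points. Negative indices reach the typeinfo pointer and top_offset.
// Assumption: the vtable pointer is the first word of the object.
std::uintptr_t vtable_slot(const Parent& obj, std::ptrdiff_t index) {
  const char* vptr = reinterpret_cast<const char*>(read_word(&obj));
  return read_word(vptr + index * static_cast<std::ptrdiff_t>(sizeof(void*)));
}

void check_slots() {
  Parent p;
  Derived d;
  // Slot 1 is FooNotOverridden, shared by both vtables.
  assert(vtable_slot(p, 1) == vtable_slot(d, 1));
  // One slot back: pointer to the typeinfo record.
  assert(vtable_slot(p, -1) == reinterpret_cast<std::uintptr_t>(&typeid(Parent)));
  assert(vtable_slot(d, -1) == reinterpret_cast<std::uintptr_t>(&typeid(Derived)));
  // Two slots back: top_offset, which is 0 for these classes.
  assert(vtable_slot(d, -2) == 0);
}
```

None of this is guaranteed by the language, so code like this belongs in experiments, not in production.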

Moving on: what about the layout of the typeinfo records?

Parent’s:

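| Address | Value | Meaning |
|---|---|---|
| 0x400b78 | 0x602090 | Pointer into the vtable for __cxxabiv1::__class_type_info, a helper class for type_info methods |
| 0x400b80 | 0x400b69 | String representing type name |
| 0x400b88 | 0x0 | 0: Parent has no base class, so there is no parent typeinfo pointer (gdb shows no symbol at this address, which suggests these bytes are padding after the record) |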

And here’s Derived’s typeinfo record:

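| Address | Value | Meaning |
|---|---|---|
| 0x400b90 | 0x602210 | Pointer into the vtable for __cxxabiv1::__si_class_type_info, a helper class for type_info methods |
| 0x400b98 | 0x400b60 | String representing type name |
| 0x400ba0 | 0x400b78 | Pointer to Parent’s typeinfo record |

  • step1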
(gdb) info symbol 0x602090
vtable for __cxxabiv1::__class_type_info@@CXXABI_1.3 + 16 in section .bss of a.out
  • step2
(gdb) x/s 0x400b69
0x400b69 <typeinfo name for Parent>:	"6Parent"
  • step3
(gdb) info symbol 0x602210
vtable for __cxxabiv1::__si_class_type_info@@CXXABI_1.3 + 16 in section .bss of a.out
  • step4
(gdb) x/s 0x400b60
0x400b60 <typeinfo name for Derived>:	"7Derived"
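
The typeinfo records are what the standard `typeid` operator hands back, so the strings found with gdb can also be read from code. A minimal sketch follows; the exact strings returned by `name()` are an assumption tied to the name mangling used by clang and GCC, and `dynamic_type_name` and `check_typeinfo` are hypothetical helpers.

```cpp
#include <cassert>
#include <cstring>
#include <typeinfo>

class Parent {
 public:
  virtual void Foo() {}
  virtual void FooNotOverridden() {}
};

class Derived : public Parent {
 public:
  void Foo() override {}
};

// Name stored in the typeinfo record of the object's dynamic type.
// For polymorphic types, typeid looks the record up through the vtable.
const char* dynamic_type_name(const Parent& obj) { return typeid(obj).name(); }

void check_typeinfo() {
  Derived d;
  const Parent& as_parent = d;
  // The static type is Parent, but the record found at run time is Derived's.
  assert(typeid(as_parent) == typeid(Derived));
  // Assumption: Itanium-style mangled names, as in the gdb output above.
  assert(std::strcmp(typeid(Parent).name(), "6Parent") == 0);
  assert(std::strcmp(dynamic_type_name(d), "7Derived") == 0);
}
```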

If you want to read more about __si_class_type_info, you can find some info here, and also here.

This exhausts my gdb skills, and also concludes this post. I assume some people will find this too low-level, or maybe just not actionable. If so, I’d recommend skipping parts 2 and 3 and jumping straight to part 4.
