Skip to content

Commit

Permalink
[DFSan] Change shadow and origin memory layouts to match MSan.
Browse files Browse the repository at this point in the history
Previously on x86_64:

  +--------------------+ 0x800000000000 (top of memory)
  | application memory |
  +--------------------+ 0x700000008000 (kAppAddr)
  |                    |
  |       unused       |
  |                    |
  +--------------------+ 0x300000000000 (kUnusedAddr)
  |       origin       |
  +--------------------+ 0x200000008000 (kOriginAddr)
  |       unused       |
  +--------------------+ 0x200000000000
  |   shadow memory    |
  +--------------------+ 0x100000008000 (kShadowAddr)
  |       unused       |
  +--------------------+ 0x000000010000
  | reserved by kernel |
  +--------------------+ 0x000000000000

  MEM_TO_SHADOW(mem) = mem & ~0x600000000000
  SHADOW_TO_ORIGIN(shadow) = kOriginAddr - kShadowAddr + shadow

Now for x86_64:

  +--------------------+ 0x800000000000 (top of memory)
  |    application 3   |
  +--------------------+ 0x700000000000
  |      invalid       |
  +--------------------+ 0x610000000000
  |      origin 1      |
  +--------------------+ 0x600000000000
  |    application 2   |
  +--------------------+ 0x510000000000
  |      shadow 1      |
  +--------------------+ 0x500000000000
  |      invalid       |
  +--------------------+ 0x400000000000
  |      origin 3      |
  +--------------------+ 0x300000000000
  |      shadow 3      |
  +--------------------+ 0x200000000000
  |      origin 2      |
  +--------------------+ 0x110000000000
  |      invalid       |
  +--------------------+ 0x100000000000
  |      shadow 2      |
  +--------------------+ 0x010000000000
  |    application 1   |
  +--------------------+ 0x000000000000

  MEM_TO_SHADOW(mem) = mem ^ 0x500000000000
  SHADOW_TO_ORIGIN(shadow) = shadow + 0x100000000000

Reviewed By: stephan.yichao.zhao, gbalats

Differential Revision: https://reviews.llvm.org/D104896
  • Loading branch information
browneee committed Jun 26, 2021
1 parent 2e9c75d commit 45f6d55
Show file tree
Hide file tree
Showing 13 changed files with 417 additions and 231 deletions.
36 changes: 26 additions & 10 deletions clang/docs/DataFlowSanitizerDesign.rst
Expand Up @@ -76,25 +76,41 @@ The following is the memory layout for Linux/x86\_64:
+---------------+---------------+--------------------+
| Start | End | Use |
+===============+===============+====================+
| 0x700000008000|0x800000000000 | application memory |
| 0x700000000000|0x800000000000 | application 3 |
+---------------+---------------+--------------------+
| 0x300000000000|0x700000008000 | unused |
| 0x610000000000|0x700000000000 | unused |
+---------------+---------------+--------------------+
| 0x200000008000|0x300000000000 | origin |
| 0x600000000000|0x610000000000 | origin 1 |
+---------------+---------------+--------------------+
| 0x200000000000|0x200000008000 | unused |
| 0x510000000000|0x600000000000 | application 2 |
+---------------+---------------+--------------------+
| 0x100000008000|0x200000000000 | shadow memory |
| 0x500000000000|0x510000000000 | shadow 1 |
+---------------+---------------+--------------------+
| 0x000000010000|0x100000008000 | unused |
| 0x400000000000|0x500000000000 | unused |
+---------------+---------------+--------------------+
| 0x000000000000|0x000000010000 | reserved by kernel |
| 0x300000000000|0x400000000000 | origin 3 |
+---------------+---------------+--------------------+
| 0x200000000000|0x300000000000 | shadow 3 |
+---------------+---------------+--------------------+
| 0x110000000000|0x200000000000 | origin 2 |
+---------------+---------------+--------------------+
| 0x100000000000|0x110000000000 | unused |
+---------------+---------------+--------------------+
| 0x010000000000|0x100000000000 | shadow 2 |
+---------------+---------------+--------------------+
| 0x000000000000|0x010000000000 | application 1 |
+---------------+---------------+--------------------+

Each byte of application memory corresponds to a single byte of shadow
memory, which is used to store its taint label. As for LLVM SSA
registers, we have not found it necessary to associate a label with
each byte or bit of data, as some other tools do. Instead, labels are
memory, which is used to store its taint label. We map memory, shadow, and
origin regions to each other with these masks and offsets:

* shadow_addr = memory_addr ^ 0x500000000000

* origin_addr = shadow_addr + 0x100000000000

As for LLVM SSA registers, we have not found it necessary to associate a label
with each byte or bit of data, as some other tools do. Instead, labels are
associated directly with registers. Loads will result in a union of
all shadow labels corresponding to bytes loaded, and stores will
result in a copy of the label of the stored value to the shadow of all
Expand Down
232 changes: 184 additions & 48 deletions compiler-rt/lib/dfsan/dfsan.cpp
Expand Up @@ -64,30 +64,34 @@ int __dfsan_get_track_origins() {

// On Linux/x86_64, memory is laid out as follows:
//
// +--------------------+ 0x800000000000 (top of memory)
// | application memory |
// +--------------------+ 0x700000008000 (kAppAddr)
// | |
// | unused |
// | |
// +--------------------+ 0x300000000000 (kUnusedAddr)
// | origin |
// +--------------------+ 0x200000008000 (kOriginAddr)
// | unused |
// +--------------------+ 0x200000000000
// | shadow memory |
// +--------------------+ 0x100000008000 (kShadowAddr)
// | unused |
// +--------------------+ 0x000000010000
// | reserved by kernel |
// +--------------------+ 0x000000000000
// +--------------------+ 0x800000000000 (top of memory)
// | application 3 |
// +--------------------+ 0x700000000000
// | invalid |
// +--------------------+ 0x610000000000
// | origin 1 |
// +--------------------+ 0x600000000000
// | application 2 |
// +--------------------+ 0x510000000000
// | shadow 1 |
// +--------------------+ 0x500000000000
// | invalid |
// +--------------------+ 0x400000000000
// | origin 3 |
// +--------------------+ 0x300000000000
// | shadow 3 |
// +--------------------+ 0x200000000000
// | origin 2 |
// +--------------------+ 0x110000000000
// | invalid |
// +--------------------+ 0x100000000000
// | shadow 2 |
// +--------------------+ 0x010000000000
// | application 1 |
// +--------------------+ 0x000000000000
//
// To derive a shadow memory address from an application memory address, bits
// 45-46 are cleared to bring the address into the range
// [0x100000008000,0x200000000000). See the function shadow_for below.
//
//

// MEM_TO_SHADOW(mem) = mem ^ 0x500000000000
// SHADOW_TO_ORIGIN(shadow) = shadow + 0x100000000000

extern "C" SANITIZER_INTERFACE_ATTRIBUTE
dfsan_label __dfsan_union_load(const dfsan_label *ls, uptr n) {
Expand Down Expand Up @@ -160,14 +164,12 @@ static uptr OriginAlignDown(uptr u) { return u & kOriginAlignMask; }
static dfsan_origin GetOriginIfTainted(uptr addr, uptr size) {
for (uptr i = 0; i < size; ++i, ++addr) {
dfsan_label *s = shadow_for((void *)addr);
if (!is_shadow_addr_valid((uptr)s)) {
// The current DFSan memory layout is not always correct. For example,
// addresses (0, 0x10000) are mapped to (0, 0x10000). Before fixing the
// issue, we ignore such addresses.
continue;
}
if (*s)

if (*s) {
// Validate address region.
CHECK(MEM_IS_SHADOW(s));
return *(dfsan_origin *)origin_for((void *)addr);
}
}
return 0;
}
Expand Down Expand Up @@ -317,10 +319,12 @@ static void ReverseCopyOrigin(const void *dst, const void *src, uptr size,
// operation.
static void MoveOrigin(const void *dst, const void *src, uptr size,
StackTrace *stack) {
if (!has_valid_shadow_addr(dst) ||
!has_valid_shadow_addr((void *)((uptr)dst + size)) ||
!has_valid_shadow_addr(src) ||
!has_valid_shadow_addr((void *)((uptr)src + size))) {
// Validate address regions.
if (!MEM_IS_SHADOW(shadow_for(dst)) ||
!MEM_IS_SHADOW(shadow_for((void *)((uptr)dst + size))) ||
!MEM_IS_SHADOW(shadow_for(src)) ||
!MEM_IS_SHADOW(shadow_for((void *)((uptr)src + size)))) {
CHECK(false);
return;
}
// If destination origin range overlaps with source origin range, move
Expand Down Expand Up @@ -833,8 +837,149 @@ void dfsan_clear_thread_local_state() {
}

extern "C" void dfsan_flush() {
if (!MmapFixedSuperNoReserve(ShadowAddr(), UnusedAddr() - ShadowAddr()))
Die();
const uptr maxVirtualAddress = GetMaxUserVirtualAddress();
for (unsigned i = 0; i < kMemoryLayoutSize; ++i) {
uptr start = kMemoryLayout[i].start;
uptr end = kMemoryLayout[i].end;
uptr size = end - start;
MappingDesc::Type type = kMemoryLayout[i].type;

if (type != MappingDesc::SHADOW && type != MappingDesc::ORIGIN)
continue;

// Check if the segment should be mapped based on platform constraints.
if (start >= maxVirtualAddress)
continue;

if (!MmapFixedSuperNoReserve(start, size, kMemoryLayout[i].name)) {
Printf("FATAL: DataFlowSanitizer: failed to clear memory region\n");
Die();
}
}
}

// TODO: CheckMemoryLayoutSanity is based on msan.
// Consider refactoring these into a shared implementation.
static void CheckMemoryLayoutSanity() {
uptr prev_end = 0;
for (unsigned i = 0; i < kMemoryLayoutSize; ++i) {
uptr start = kMemoryLayout[i].start;
uptr end = kMemoryLayout[i].end;
MappingDesc::Type type = kMemoryLayout[i].type;
CHECK_LT(start, end);
CHECK_EQ(prev_end, start);
CHECK(addr_is_type(start, type));
CHECK(addr_is_type((start + end) / 2, type));
CHECK(addr_is_type(end - 1, type));
if (type == MappingDesc::APP) {
uptr addr = start;
CHECK(MEM_IS_SHADOW(MEM_TO_SHADOW(addr)));
CHECK(MEM_IS_ORIGIN(MEM_TO_ORIGIN(addr)));
CHECK_EQ(MEM_TO_ORIGIN(addr), SHADOW_TO_ORIGIN(MEM_TO_SHADOW(addr)));

addr = (start + end) / 2;
CHECK(MEM_IS_SHADOW(MEM_TO_SHADOW(addr)));
CHECK(MEM_IS_ORIGIN(MEM_TO_ORIGIN(addr)));
CHECK_EQ(MEM_TO_ORIGIN(addr), SHADOW_TO_ORIGIN(MEM_TO_SHADOW(addr)));

addr = end - 1;
CHECK(MEM_IS_SHADOW(MEM_TO_SHADOW(addr)));
CHECK(MEM_IS_ORIGIN(MEM_TO_ORIGIN(addr)));
CHECK_EQ(MEM_TO_ORIGIN(addr), SHADOW_TO_ORIGIN(MEM_TO_SHADOW(addr)));
}
prev_end = end;
}
}

// TODO: CheckMemoryRangeAvailability is based on msan.
// Consider refactoring these into a shared implementation.
static bool CheckMemoryRangeAvailability(uptr beg, uptr size) {
if (size > 0) {
uptr end = beg + size - 1;
if (!MemoryRangeIsAvailable(beg, end)) {
Printf("FATAL: Memory range %p - %p is not available.\n", beg, end);
return false;
}
}
return true;
}

// TODO: ProtectMemoryRange is based on msan.
// Consider refactoring these into a shared implementation.
static bool ProtectMemoryRange(uptr beg, uptr size, const char *name) {
if (size > 0) {
void *addr = MmapFixedNoAccess(beg, size, name);
if (beg == 0 && addr) {
// Depending on the kernel configuration, we may not be able to protect
// the page at address zero.
uptr gap = 16 * GetPageSizeCached();
beg += gap;
size -= gap;
addr = MmapFixedNoAccess(beg, size, name);
}
if ((uptr)addr != beg) {
uptr end = beg + size - 1;
Printf("FATAL: Cannot protect memory range %p - %p (%s).\n", beg, end,
name);
return false;
}
}
return true;
}

// TODO: InitShadow is based on msan.
// Consider refactoring these into a shared implementation.
bool InitShadow(bool init_origins) {
// Let user know mapping parameters first.
VPrintf(1, "dfsan_init %p\n", &__dfsan::dfsan_init);
for (unsigned i = 0; i < kMemoryLayoutSize; ++i)
VPrintf(1, "%s: %zx - %zx\n", kMemoryLayout[i].name, kMemoryLayout[i].start,
kMemoryLayout[i].end - 1);

CheckMemoryLayoutSanity();

if (!MEM_IS_APP(&__dfsan::dfsan_init)) {
Printf("FATAL: Code %p is out of application range. Non-PIE build?\n",
(uptr)&__dfsan::dfsan_init);
return false;
}

const uptr maxVirtualAddress = GetMaxUserVirtualAddress();

for (unsigned i = 0; i < kMemoryLayoutSize; ++i) {
uptr start = kMemoryLayout[i].start;
uptr end = kMemoryLayout[i].end;
uptr size = end - start;
MappingDesc::Type type = kMemoryLayout[i].type;

// Check if the segment should be mapped based on platform constraints.
if (start >= maxVirtualAddress)
continue;

bool map = type == MappingDesc::SHADOW ||
(init_origins && type == MappingDesc::ORIGIN);
bool protect = type == MappingDesc::INVALID ||
(!init_origins && type == MappingDesc::ORIGIN);
CHECK(!(map && protect));
if (!map && !protect)
CHECK(type == MappingDesc::APP);
if (map) {
if (!CheckMemoryRangeAvailability(start, size))
return false;
if (!MmapFixedSuperNoReserve(start, size, kMemoryLayout[i].name))
return false;
if (common_flags()->use_madv_dontdump)
DontDumpShadowMemory(start, size);
}
if (protect) {
if (!CheckMemoryRangeAvailability(start, size))
return false;
if (!ProtectMemoryRange(start, size, kMemoryLayout[i].name))
return false;
}
}

return true;
}

static void DFsanInit(int argc, char **argv, char **envp) {
Expand All @@ -848,18 +993,9 @@ static void DFsanInit(int argc, char **argv, char **envp) {

InitializeFlags();

dfsan_flush();
if (common_flags()->use_madv_dontdump)
DontDumpShadowMemory(ShadowAddr(), UnusedAddr() - ShadowAddr());

// Protect the region of memory we don't use, to preserve the one-to-one
// mapping from application to shadow memory. But if ASLR is disabled, Linux
// will load our executable in the middle of our unused region. This mostly
// works so long as the program doesn't use too much memory. We support this
// case by disabling memory protection when ASLR is disabled.
uptr init_addr = (uptr)&DFsanInit;
if (!(init_addr >= UnusedAddr() && init_addr < AppAddr()))
MmapFixedNoAccess(UnusedAddr(), AppAddr() - UnusedAddr());
CheckASLR();

InitShadow(__dfsan_get_track_origins());

initialize_interceptors();

Expand Down
24 changes: 2 additions & 22 deletions compiler-rt/lib/dfsan/dfsan.h
Expand Up @@ -61,16 +61,14 @@ extern bool dfsan_init_is_running;
void initialize_interceptors();

inline dfsan_label *shadow_for(void *ptr) {
return (dfsan_label *)(((uptr)ptr) & ShadowMask());
return (dfsan_label *)MEM_TO_SHADOW(ptr);
}

inline const dfsan_label *shadow_for(const void *ptr) {
return shadow_for(const_cast<void *>(ptr));
}

inline uptr unaligned_origin_for(uptr ptr) {
return OriginAddr() - ShadowAddr() + (ptr & ShadowMask());
}
inline uptr unaligned_origin_for(uptr ptr) { return MEM_TO_ORIGIN(ptr); }

inline dfsan_origin *origin_for(void *ptr) {
auto aligned_addr = unaligned_origin_for(reinterpret_cast<uptr>(ptr)) &
Expand All @@ -82,24 +80,6 @@ inline const dfsan_origin *origin_for(const void *ptr) {
return origin_for(const_cast<void *>(ptr));
}

inline bool is_shadow_addr_valid(uptr shadow_addr) {
return (uptr)shadow_addr >= ShadowAddr() && (uptr)shadow_addr < OriginAddr();
}

inline bool has_valid_shadow_addr(const void *ptr) {
const dfsan_label *ptr_s = shadow_for(ptr);
return is_shadow_addr_valid((uptr)ptr_s);
}

inline bool is_origin_addr_valid(uptr origin_addr) {
return (uptr)origin_addr >= OriginAddr() && (uptr)origin_addr < UnusedAddr();
}

inline bool has_valid_origin_addr(const void *ptr) {
const dfsan_origin *ptr_orig = origin_for(ptr);
return is_origin_addr_valid((uptr)ptr_orig);
}

void dfsan_copy_memory(void *dst, const void *src, uptr size);

void dfsan_allocator_init();
Expand Down
8 changes: 2 additions & 6 deletions compiler-rt/lib/dfsan/dfsan_allocator.cpp
Expand Up @@ -33,15 +33,11 @@ struct DFsanMapUnmapCallback {
void OnUnmap(uptr p, uptr size) const { dfsan_set_label(0, (void *)p, size); }
};

static const uptr kAllocatorSpace = 0x700000000000ULL;
static const uptr kMaxAllowedMallocSize = 8UL << 30;

struct AP64 { // Allocator64 parameters. Deliberately using a short name.
// TODO: DFSan assumes application memory starts from 0x700000008000. For
// unknown reason, the sanitizer allocator does not support any start address
// between 0x701000000000 and 0x700000008000. After switching to fast8labels
// mode, DFSan memory layout will be changed to the same to MSan's. Then we
// set the start address to 0x700000000000 as MSan.
static const uptr kSpaceBeg = 0x701000000000ULL;
static const uptr kSpaceBeg = kAllocatorSpace;
static const uptr kSpaceSize = 0x40000000000; // 4T.
static const uptr kMetadataSize = sizeof(Metadata);
typedef DefaultSizeClassMap SizeClassMap;
Expand Down

0 comments on commit 45f6d55

Please sign in to comment.