-
Notifications
You must be signed in to change notification settings - Fork 0
lucas-rami/Kernel
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
# Pebbles Kernel - Operating Systems Design and Implementation - Project 3 ## 1 Features ### 1.1 ZFOD The Pebbles Kernel does ZFOD for all frames. When a program is loaded into memory for the first time, it figures out the size of each region from the elf file and allocates necessary frames at that instant. But, when a program requests more stack_space or more user space in general through the new_pages system call, we mark the page as requested by setting bit 9(10th bit from the right) of the page table entry for this address. We also set the frame address as that of the zeroed out frame throughout the kernel which is initialized when the kernel boots up. The permissions for this page is set to read only so there is no actual frame allocation until we encounter the first write for this address. When we receive the first write for any such requested memory, a page fault occurs and we check if the page had been requested earlier by looking at bit 9 in the page table entry and if that is the case we allocate a new zeroed out frame and update the page table entry and modify the permissions to read/write. ZFOD helps us in cases where the user allocates much more memory than he actually needs. In that case, we will never have to waste time allocating and zeroing out the whole frame. We can just wait for the user to write to the memory to do that thereby deferring the expensive operation and only doing it when absolutely required. ### 1.2 Software Exception Handler The Pebbles kernel allows the user to install a software exception handler where the user gets a chance to handle any exceptions as they please. The kernel does this by giving the user a "swexn" system call which can be used to register this software exception handler. During any exception, the kernel has the responsibility of passing all the register values as it is on the exception handler stack specified by the user at the time of registration of the handler. The kernel makes the exception stack with the appropriate register values in the ureg data structure for the exception handler to use and run its logic. On a page fault, once we are sure that the address causing the page fault isn't requested by the user on an earlier instant, we create this exception stack for the registered handler and switch to running that in the user space. There, the user code handles the exception where they can reserve more frames using the syscall "new_pages" if the user program needs it. ## 2 Syscalls ### 2.1 Fork The fork system call copies the memory regions from the parent to the child task. Our kernel implementation rejects a call to fork if it is done by a task with more than one thread running. On calling fork, the invoking task creates a new kernel stack, pcb and tcb for the child thread and then proceeds to copy the whole memory regions mapped to the child thread. As all threads actually use memory above a particular threshold(USER_MEM_START) and the memory region till that address being reserved for the kernel is directly mapped, we do not want to waste page tables for this directly mapped address space. As a result, we create the page tables for this memory region only once when we are creating the first task and then on forking a new child, only the pointers to these page tables are stored in the page directories of the child thereby saving us space worth 4 page sizes for each new task. After copying the memory regions, we create a new stack for the child thread so it can start running at the exact same point as the parent but just change the value of the eax register which contains the return value of fork which is used to distinguish between a child and a parent. A key point here is that we maintain a count of free kernel frames. Before copying, we ensure that there are at least the number of frames being currently used by the parent so that we do not run out of memory later. This count includes the frames that have been requested but haven't been actually allocated along with the number of frames that the task is actually using. If we don't have enough free frames in the system, we fail the fork system call. ### 2.2 Exec The exec system call loads in a new program and overwrites the memory region being used by a current task. Our implementation reject a call to exec if it is done by a task with more than one thread running. Exec validates that the arguments(executable name and the argument vector) are actually part of the current task's memory and then loads in the program from the ELF file setting up its page tables. The next step is to create the stack for the execed task such that its main wrapper gets the argument vector passed as a parameter to the exec system call. Once we are sure exec will pass, we free the address space of the invoking task and set up the kernel stack so that we can run the execed task next. Our implementation modifies the values on the same kernel stack and after an iret, the execed task starts its execution. Exec also de-registers any software exception handlers that were registered earlier on the invoking thread. ### 2.3 Swexn The swexn system call allows the user to handle exceptions through user code. The user has to specify the handler function, the exception stack and the register values it wants the thread to have once it has exited swexn. After doing some validation on the arguments passed, the system call registers/de-registers the exception handler based on the arguments. Registering is as simple as just storing the pointers to exception stack, the exception handler and a void* parameter for the handler. If the value of ureg passed is not NULL, the system call makes sure that the thread exits with the register values as specified in the argument. ### 2.4 Wait The wait system call is used to reap and cleanup the memory being used by zombie tasks. The system call first validates the argument that it has received and then goes on to check if the current task has any zombie children. If so, the current thread deques the pcb of the zombie task and extracts the exit status and original thread id of the zombie task as the wait call should fill in the status in the argument passed if it isn't NULL and should return the thread id of the original thread of the task that exited. Once we are done extracting the information we need, we free the pcb of the reaped task and then the kernel stack completing our cleanup for the task. Before freeing the kernel stack, we ensure that we cleanup any memory that is in the garbage collector queue as the garbage collector queue might contain pointers stored on the kernel stack of the last thread. Once we are done emptying the garbage collector queue, we can safely delete the kernel stack for the last thread. If there is no zombie child present for this task, we add the current thread's tcb to the queue of waiting threads for this task and deschedule this thread blocking it until a child vanishes. Once, a thread is woken up by the child task exiting, it again repeats the drill and extracts any information it needs from the pcb of the reaped task and then proceeds to free the pcb and the kernel stack of the last thread ### 2.5 Vanish The vanish system call is called by any thread to stop executing itself. If the thread isn't the last thread for this task, the system call simply decrements the number of threads for this task, adds its tcb and the kernel stack to the garbage collector queue after freeing all the entries in the garbage collector queue. This is done so that in case wait isn't called for a long time, with each vanish system call we will at least delete the tcb and hence free up some sort of space in the kernel. After this, the tcb is deleted from the hash table and this thread is blocked by running the next thread but not putting this thread at the end of the runnable queue again. If in fact the thread calling vanish is the last thread of the task, we make the parent of all our running children as init. Then, we free up the memory being used by this task by freeing up the user space memory that this task was using. We delete everything including the page table entries and the directory by switching to init's page table directory as the kernel memory is directly mapped for all threads. Next, we delete the linked list we created for keeping track of all the memory regions allocated using new_pages. Next, we check if the current task has any zombie children present. If so, we append our zombie list to that of init so that init can reap them later. Next we take a lock on the parent of the current task and remove this task from the running children list of the parent. Then, we check if there is at least one thread of my parent task waiting to reap a task. If yes, we dequeue the tcb of the waiting thread from the waiting threads queue of the parent and make it runnable after filling in the values that it would need. Otherwise, we add ourselves to the zombie queue of our parent hoping that it will reap this task sometime in the future. ### 2.6 New_pages The new_pages system call can be used by the user to reserve more memory for itself. As our kernel does ZFOD, during new pages, we do not actually allocate a physical frame to the user to read/write data on. We have a zerof filled frame for the entire kernel and every address requested through new_pages initially has the address of this frame in its page table entry with only read permissions. This is done as many users allocate much more memory than they ever user and deferring the action of allocating and zeroing out a frame makes the implementation more efficient. During this system call, we set bit 9 in the page table entry for all addresses being requested marking the page as one that the user has asked for in advance. If we get a page fault on such addresses, we allocate the frame that the user had asked for earlier and re-run the instruction. ### 2.7 Remove_pages The remove_pages system call is used by the user to remove/deallocate memory it requested through new_pages. As only the starting address is given as a parameter to remove_pages, we have to maintain a linked list of all addresses allocated through the new_pages system call and the size of that region. We search through this linked list to find out if the argument passed to remove_pages is an actual memory region allocated through new_pages. If yes, we remove that node from the linked list and then go on to reset bit 9 and the page table entry for this address. This ensures that once remove_pages is called and then the user, tries to access the same memory address we should treat it as a valid page fault and not allocate a frame for it considering it as something that the user has requested before. ### 2.8 Page fault handler The page fault handler has been made an INTERRUPT_GATE specially because we use a lot of register values such as cr2 during our handling of the page fault and if a page fault occurs while we are handling the first page fault, we might end up in a race where the cr2 value might get overwritten. To avoid this data race, we have made page fault an INTERRUPT_GATE instead of a TRAP_GATE. In our kernel, if a page fault happens, there can be three scenarios. First, the page was requested by a new_pages system call and the user is trying to write on it. In this scenario, we allocate a frame and return from the page fault handler running the same instruction again. Second, the page was not requested by new_pages system call but we have a software exception handler installed and we call that with the proper stack and it takes care of the rest. Third, the page wasn't requested through the new_pages system call and there is no exception handler either. In this case, this is a valid page fault and we print an error message on the screen and set the status to -2 and then call vanish for this thread. ### 2.9 Sleep The sleep system calls deschedules the calling thread for at least n ticks, where n is provided by the function's caller. The kernel maintains a queue of sleeping threads, which is examined by the timer callback function on each timer interrupt. The key goal was to make this callback function runs as fast as possible, avoiding linear search in the sleeping threads queue. When a thread calls sleep(), the kernel will traverse the queue of sleeping threads and look at how much ticks remain before the timer wakes up each thread. The kernel will insert the invoking thread in the queue so that all threads closer to the HEAD have less ticks to wait than it, and so that all threads closer to the TAIl have more ticks to wait than it. Hence, at any time, the queue contains the threads in increasing number of ticks remaining before being woken up by the timer handler. Of course, this queue might be modifed bu the timer callback function, so sleep() disable interrupts before traversing the queue, seeking for the good position for the new thread. We trade this drop in performance (linear traversal with interrupts disabled) with a really fast timer callback function. The timer callback is efficient because it knows whether a thread should be woken up by looking at only one variable, which is fast and is what happens most of the time. When a thread should be woken up on a tick, the callback function needs only to look at the HEAD element in the queue of sleeping threads as it is sorted in increasing order of remaining sleeping time. ### 2.10 Readline The readline() system call reads the next line from the console and copies it into a user provided buffer. When the function is called, the kernel checks that the buffer lies within writable user-space memory, otherwise writing to it would cause a Page Fault. The invoking thread then has to lock a mutex before proceeding since each thread should waits for its turn to access the input stream. When the invoking thread is able to take the lock it fills a readline_t data structure with information about its buffer (address and length) as well as its TCB. The invoking thread then deschedules itself and yields to the keyboard consumer kernel thread. This thread reads the queue of key events and buffers read characters until a '\n' is typed in, at which point it commits the content of its own buffer into the user-provided buffer. Characters not consumed by this procedure are made available for the next call (they will be typed in the shell automatically during the next call to readline()). The keyboard consumer thread then deschedules itself and yields to the thread descheduled on readline(), which will release the lock on readline() and return from the system call.
About
From scratch implementation of a kernel for x86 architecture
Resources
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published