# Pebbles Kernel - Operating Systems Design and Implementation - Project 3

## 1 Features

### 1.1 ZFOD
The Pebbles kernel does ZFOD for all frames. When a program is first loaded
into memory, the kernel computes the size of each region from the ELF file and
allocates the necessary frames immediately. However, when a program requests
more stack space, or more user space in general, through the new_pages system
call, we mark the page as requested by setting bit 9 (the 10th bit from the
right) of the page table entry for that address. We also point the entry at a
single zero-filled frame, shared kernel-wide, which is initialized when the
kernel boots. The permissions for this page are set to read-only, so no frame
is actually allocated until the first write to the address. On that first
write, a page fault occurs; we check bit 9 of the page table entry to see
whether the page was requested earlier, and if so we allocate a new
zero-filled frame, update the page table entry, and change the permissions to
read/write.

ZFOD helps in cases where the user allocates much more memory than it actually
needs. We never waste time allocating and zeroing a whole frame up front;
instead we defer that expensive operation until the user actually writes to
the memory, doing it only when absolutely required.
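
A minimal sketch of this scheme, assuming a 32-bit x86 page table entry;
`PTE_ZFOD`, `zero_frame`, and `alloc_zeroed_frame` are illustrative names, not
the kernel's actual identifiers:

```c
#include <stdint.h>

#define PTE_PRESENT 0x001
#define PTE_WRITE   0x002
#define PTE_USER    0x004
#define PTE_ZFOD    0x200  /* bit 9, one of the bits x86 leaves to the OS */

extern uint32_t zero_frame;               /* physical address of the shared zeroed frame */
extern uint32_t alloc_zeroed_frame(void); /* hypothetical frame allocator */

/* Mark a page as "requested": map it read-only onto the shared zero frame. */
void zfod_reserve(uint32_t *pte) {
    *pte = zero_frame | PTE_PRESENT | PTE_USER | PTE_ZFOD;
}

/* On a write fault, check bit 9 and back the page with a real frame. */
int zfod_fault(uint32_t *pte) {
    if (!(*pte & PTE_ZFOD))
        return -1;  /* not a requested page: genuine fault */
    *pte = alloc_zeroed_frame() | PTE_PRESENT | PTE_WRITE | PTE_USER;
    /* the caller must invalidate the TLB entry (invlpg) before returning */
    return 0;
}
```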

### 1.2 Software Exception Handler
The Pebbles kernel allows the user to install a software exception handler,
giving user code a chance to handle exceptions as it pleases. The kernel
exposes this through the "swexn" system call, which registers the handler.
When an exception occurs, the kernel is responsible for passing all register
values, exactly as they were at the time of the exception, on the exception
handler stack the user specified when registering the handler. The kernel
builds this exception stack, filling a ureg data structure with the
appropriate register values for the handler to use in its logic. On a page
fault, once we are sure that the faulting address was not previously requested
by the user, we create the exception stack for the registered handler and
switch to running it in user space. There the user code handles the exception;
for instance, it can reserve more frames with the "new_pages" system call if
the program needs them.
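
To illustrate, here is a rough sketch of how such an exception stack could be
laid out, assuming a 32-bit x86 target and a handler of the form
`void handler(void *arg, ureg_t *ureg)`; the trimmed-down `ureg_t` and the
helper below are illustrative only:

```c
#include <stdint.h>

/* Trimmed-down stand-in for the full ureg_t, which carries the complete
 * register set (segments, general-purpose registers, eip/eflags/esp,
 * the fault cause and cr2). */
typedef struct ureg {
    unsigned int cause;  /* exception number */
    unsigned int cr2;    /* faulting address, for page faults */
    unsigned int eip, eflags, esp;
} ureg_t;

/* Lay out the user's exception stack, from high to low addresses:
 *   [ureg_t copy][&ureg][arg][dummy return address] <- new esp
 * so the handler sees handler(arg, &ureg) on entry. */
uint32_t build_swexn_stack(uint32_t esp3, void *arg, const ureg_t *regs) {
    uint32_t sp = esp3;
    sp -= sizeof(ureg_t);
    *(ureg_t *)sp = *regs;                    /* register state at the fault */
    uint32_t ureg_ptr = sp;
    sp -= 4; *(uint32_t *)sp = ureg_ptr;      /* second handler argument */
    sp -= 4; *(uint32_t *)sp = (uint32_t)arg; /* first handler argument */
    sp -= 4; *(uint32_t *)sp = 0;             /* dummy return address */
    return sp;                                /* esp for entering the handler */
}
```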


## 2 Syscalls

### 2.1 Fork

The fork system call copies the memory regions from the parent to the child
task. Our kernel rejects a call to fork made by a task with more than one
thread running. On fork, the invoking task creates a new kernel stack, PCB,
and TCB for the child thread, then copies all the memory regions mapped into
the parent over to the child. All threads use memory above a particular
threshold (USER_MEM_START), and the region below that address, reserved for
the kernel, is direct-mapped, so we do not want to waste page tables on this
direct-mapped address space. We therefore create the page tables for this
region only once, when creating the first task; on every fork, only pointers
to these shared page tables are stored in the child's page directory, saving
four pages' worth of space per new task. After copying the memory regions, we
create a new kernel stack for the child thread so it starts running at the
exact same point as the parent, changing only the value of the eax register,
which holds fork's return value and is used to distinguish the child from the
parent.

A key point here is that we maintain a count of free kernel frames. Before
copying, we ensure that at least as many frames are free as the parent
currently uses, so we do not run out of memory partway through. This count
covers both the frames the task actually uses and the frames it has requested
but that have not yet been allocated. If there are not enough free frames in
the system, the fork system call fails.
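
A sketch of this check; `free_frame_count` and `task_frame_count` are
hypothetical names for the kernel's bookkeeping:

```c
typedef struct pcb pcb_t;

extern unsigned int free_frame_count;                /* frames not yet handed out */
extern unsigned int task_frame_count(pcb_t *parent); /* used + ZFOD-requested */

/* Fail fork early rather than running out of frames mid-copy. */
int fork_check_memory(pcb_t *parent) {
    if (free_frame_count < task_frame_count(parent))
        return -1;
    return 0;
}
```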


### 2.2 Exec

The exec system call loads a new program, overwriting the memory region used
by the current task. Our implementation rejects a call to exec made by a task
with more than one thread running. Exec first validates that the arguments
(the executable name and the argument vector) actually lie within the current
task's memory, then loads the program from the ELF file and sets up its page
tables. The next step is to build the stack for the execed task so that its
main wrapper receives the argument vector passed to the exec system call.
Only once we are sure exec will succeed do we free the address space of the
invoking task and set up the kernel stack to run the execed task next. Our
implementation modifies the values on the same kernel stack, and after an
iret the execed task starts executing. Exec also deregisters any software
exception handler that was previously registered by the invoking thread.
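
The important property is the ordering: nothing belonging to the old task is
destroyed until the new image is known to load. A sketch of that ordering,
with illustrative helper names:

```c
extern int  validate_user_string(const char *s);
extern int  validate_user_argvec(char **argvec);
extern int  load_elf_and_build_page_tables(const char *name);
extern void build_user_stack(char **argvec);
extern void deregister_swexn_handler(void);
extern void free_old_address_space(void);
extern void rewrite_kernel_stack_and_iret(void); /* does not return */

int exec_sketch(const char *name, char **argvec) {
    if (!validate_user_string(name) || !validate_user_argvec(argvec))
        return -1;            /* arguments must lie in the task's memory */
    if (load_elf_and_build_page_tables(name) < 0)
        return -1;            /* old address space still intact here */
    build_user_stack(argvec); /* main wrapper gets the argument vector */
    deregister_swexn_handler();
    free_old_address_space(); /* point of no return */
    rewrite_kernel_stack_and_iret();
    return 0;                 /* unreachable */
}
```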

### 2.3 Swexn

The swexn system call allows the user to handle exceptions in user code. The
user specifies the handler function, the exception stack, and the register
values the thread should have once it exits swexn. After validating the
arguments, the system call registers or deregisters the exception handler
accordingly. Registering is as simple as storing the pointers to the
exception stack, the exception handler, and a void* argument for the handler.
If the ureg pointer passed in is not NULL, the system call makes the thread
exit with the register values specified in that argument.
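
A sketch of that registration state and logic (all names illustrative):

```c
typedef struct ureg ureg_t;
typedef void (*swexn_handler_t)(void *arg, ureg_t *ureg);

typedef struct swexn {
    void           *esp3;    /* user-provided exception stack */
    swexn_handler_t handler; /* NULL when no handler is installed */
    void           *arg;     /* opaque pointer handed back to the handler */
} swexn_t;

extern swexn_t *current_swexn(void);         /* per-thread handler state */
extern int adopt_registers(ureg_t *newureg); /* validate, then install on exit */

int swexn_sketch(void *esp3, swexn_handler_t h, void *arg, ureg_t *newureg) {
    swexn_t *s = current_swexn();
    if (esp3 == NULL || h == NULL) {
        s->handler = NULL;               /* deregister */
    } else {
        s->esp3 = esp3;
        s->handler = h;
        s->arg = arg;
    }
    if (newureg != NULL)
        return adopt_registers(newureg); /* thread exits with these values */
    return 0;
}
```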

### 2.4 Wait

The wait system call reaps zombie tasks and cleans up the memory they used.
The system call first validates its argument and then checks whether the
current task has any zombie children. If so, the current thread dequeues the
zombie task's PCB and extracts its exit status and original thread ID: wait
must fill in the status through the argument pointer if it is not NULL, and
must return the thread ID of the original thread of the task that exited.
Once we have extracted the information we need, we free the PCB of the reaped
task and then its kernel stack, completing the cleanup. Before freeing the
kernel stack, we drain the garbage collector queue, since it might contain
pointers into the kernel stack of the last thread; once the queue is empty,
we can safely delete that kernel stack. If the task has no zombie child, we
add the current thread's TCB to the task's queue of waiting threads and
deschedule the thread, blocking it until a child vanishes. When a child task
exits and wakes the thread up, it repeats the same drill: it extracts the
information it needs from the reaped task's PCB, then frees the PCB and the
kernel stack of the last thread.
### 2.5 Vanish

The vanish system call is called by a thread to stop executing. If the thread
is not the last thread of its task, the system call simply decrements the
task's thread count and adds the thread's TCB and kernel stack to the garbage
collector queue, after first freeing all entries already in that queue. This
way, even if wait is not called for a long time, each vanish at least deletes
a TCB and frees some space in the kernel. The TCB is then removed from the
hash table, and the thread is blocked by running the next thread without
putting this one back on the runnable queue. If the thread calling vanish is
in fact the last thread of the task, we reparent all of its running children
to init. We then free the memory the task was using by releasing its
user-space memory; we can delete everything, including the page table entries
and the page directory, by switching to init's page directory, since kernel
memory is direct-mapped for all threads. Next, we delete the linked list we
kept to track the memory regions allocated through new_pages. We then check
whether the current task has any zombie children; if so, we append our zombie
list to init's so that init can reap them later. Next, we take a lock on the
parent of the current task and remove this task from the parent's list of
running children. If at least one thread of the parent task is waiting to
reap a child, we dequeue its TCB from the parent's waiting-threads queue and
make it runnable after filling in the values it will need. Otherwise, we add
ourselves to the parent's zombie queue, counting on it to reap this task
sometime in the future.
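
A sketch of that final handoff to the parent (all types and helpers here are
illustrative):

```c
typedef struct pcb pcb_t;
typedef struct tcb tcb_t;

extern pcb_t *parent_of(pcb_t *task);
extern void   lock_parent(pcb_t *parent);
extern void   unlock_parent(pcb_t *parent);
extern void   remove_running_child(pcb_t *parent, pcb_t *child);
extern tcb_t *dequeue_waiting_thread(pcb_t *parent);
extern void   fill_reap_info(tcb_t *waiter, pcb_t *zombie);
extern void   make_runnable(tcb_t *t);
extern void   enqueue_zombie(pcb_t *parent, pcb_t *zombie);

/* Last-thread handoff at the end of vanish. */
void vanish_notify_parent(pcb_t *me) {
    pcb_t *parent = parent_of(me);
    lock_parent(parent);
    remove_running_child(parent, me);
    tcb_t *waiter = dequeue_waiting_thread(parent);
    if (waiter != NULL) {
        fill_reap_info(waiter, me);  /* exit status and original thread id */
        make_runnable(waiter);
    } else {
        enqueue_zombie(parent, me);  /* parent will reap us later */
    }
    unlock_parent(parent);
}
```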

### 2.6 New_pages

The new_pages system call lets the user reserve more memory. Because our
kernel does ZFOD, new_pages does not actually allocate a physical frame for
the user to read and write. The kernel keeps a single zero-filled frame, and
every address requested through new_pages initially has the address of this
frame in its page table entry, with read-only permissions. Many programs
allocate much more memory than they ever use, so deferring the allocation and
zeroing of a frame makes the implementation more efficient. During this
system call, we set bit 9 in the page table entry of every address being
requested, marking the page as one the user has asked for in advance. If we
later get a page fault on such an address, we allocate the frame the user
asked for and re-run the faulting instruction.
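
A sketch of the marking loop, reusing the `zfod_reserve` helper from the ZFOD
sketch in section 1.1 (all names illustrative):

```c
#include <stdint.h>

#define PAGE_SIZE 4096

extern int       range_is_free(uint32_t base, uint32_t len);
extern void      region_list_add(uint32_t base, uint32_t len);
extern uint32_t *pte_for(uint32_t addr);      /* page table entry for an address */
extern void      zfod_reserve(uint32_t *pte); /* from the ZFOD sketch above */

int new_pages_sketch(uint32_t base, uint32_t len) {
    if (base % PAGE_SIZE != 0 || len % PAGE_SIZE != 0)
        return -1;
    if (!range_is_free(base, len))
        return -1;              /* some page is already mapped */
    region_list_add(base, len); /* remembered so remove_pages can find it */
    for (uint32_t addr = base; addr < base + len; addr += PAGE_SIZE)
        zfod_reserve(pte_for(addr));
    return 0;
}
```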

### 2.7 Remove_pages

The remove_pages system call removes/deallocates memory the user requested
through new_pages. Since only the starting address is passed to remove_pages,
we maintain a linked list of every address allocated through new_pages along
with the size of each region. We search this list to determine whether the
argument passed to remove_pages is really the base of a region allocated
through new_pages. If it is, we remove that node from the list and then clear
bit 9 and the page table entry for each address in the region. This ensures
that if the user accesses the same address after calling remove_pages, we
treat it as a genuine page fault rather than allocating a frame as if the
user had requested the memory before.
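
A sketch of the lookup and teardown; the region node and helpers are
illustrative:

```c
#include <stdint.h>
#include <stdlib.h>

#define PAGE_SIZE 4096

typedef struct region {
    uint32_t base, len;
    struct region *next;
} region_t;

extern uint32_t *pte_for(uint32_t addr);

/* Only an exact base recorded by new_pages may be removed. */
int remove_pages_sketch(region_t **head, uint32_t base) {
    for (region_t **p = head; *p != NULL; p = &(*p)->next) {
        if ((*p)->base == base) {
            region_t *r = *p;
            for (uint32_t a = r->base; a < r->base + r->len; a += PAGE_SIZE)
                *pte_for(a) = 0; /* clears bit 9 with the rest of the entry */
            *p = r->next;
            free(r);
            return 0;
        }
    }
    return -1; /* not a region allocated through new_pages */
}
```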

### 2.8 Page fault handler

The page fault handler is installed as an INTERRUPT_GATE, specifically
because we read registers such as cr2 while handling a page fault; if another
page fault occurred while we were handling the first one, the cr2 value could
get overwritten. To avoid this race, we made the page fault handler an
INTERRUPT_GATE instead of a TRAP_GATE. In our kernel, a page fault falls into
one of three scenarios. First, the page was requested through the new_pages
system call and the user is writing to it: we allocate a frame and return
from the handler, re-running the same instruction. Second, the page was not
requested through new_pages but a software exception handler is installed: we
call the handler with the proper stack and it takes care of the rest. Third,
the page was not requested through new_pages and no exception handler is
installed: this is a genuine page fault, so we print an error message on the
screen, set the exit status to -2, and call vanish on the thread.
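
A sketch of that three-way dispatch, reusing `zfod_fault` from the ZFOD
sketch in section 1.1; the remaining helpers are illustrative:

```c
#include <stdint.h>

extern uint32_t *pte_for(uint32_t addr);
extern int  zfod_fault(uint32_t *pte); /* from the ZFOD sketch in section 1.1 */
extern int  swexn_installed(void);
extern void run_swexn_handler(uint32_t cr2);
extern void print_fault_message(uint32_t cr2);
extern void set_status(int status);
extern void vanish(void);

void page_fault_sketch(uint32_t cr2) {
    if (zfod_fault(pte_for(cr2)) == 0)
        return;                 /* case 1: frame allocated, re-run the instruction */
    if (swexn_installed()) {
        run_swexn_handler(cr2); /* case 2: build the ureg stack, enter the handler */
        return;
    }
    print_fault_message(cr2);   /* case 3: genuine fault */
    set_status(-2);
    vanish();
}
```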

### 2.9 Sleep

The sleep system call deschedules the calling thread for at least n ticks,
where n is provided by the caller. The kernel maintains a queue of sleeping
threads, which the timer callback examines on each timer interrupt. The key
goal was to make this callback run as fast as possible, avoiding any linear
search through the sleeping-threads queue.

When a thread calls sleep(), the kernel traverses the queue of sleeping
threads and looks at how many ticks remain before the timer wakes up each
one. The kernel inserts the invoking thread into the queue so that every
thread closer to the HEAD has fewer ticks left to wait, and every thread
closer to the TAIL has more. Hence, at any time, the queue holds the threads
in increasing order of ticks remaining before the timer handler wakes them.
Of course, this queue might be modified by the timer callback, so sleep()
disables interrupts before traversing the queue looking for the right
position for the new thread. We trade this drop in performance (a linear
traversal with interrupts disabled) for a very fast timer callback.

The timer callback is efficient because it can tell whether any thread should
be woken up by looking at a single variable, which is fast and is what
happens most of the time. When a thread does need to be woken up on a tick,
the callback only has to look at the HEAD element of the sleeping-threads
queue, since the queue is sorted in increasing order of remaining sleep time.
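
A sketch of the two sides of this trade-off (names illustrative):

```c
typedef struct tcb tcb_t;

typedef struct sleeper {
    unsigned int wakeup_tick; /* absolute tick at which to wake the thread */
    tcb_t *tcb;
    struct sleeper *next;
} sleeper_t;

extern sleeper_t *sleep_head; /* sorted by increasing wakeup_tick */
extern volatile unsigned int ticks;
extern void disable_interrupts(void);
extern void enable_interrupts(void);
extern void make_runnable(tcb_t *t);

/* sleep() side: linear insertion, with interrupts disabled. */
void sleep_insert(sleeper_t *node) {
    disable_interrupts(); /* the timer callback walks this list too */
    sleeper_t **p = &sleep_head;
    while (*p != NULL && (*p)->wakeup_tick <= node->wakeup_tick)
        p = &(*p)->next;
    node->next = *p;
    *p = node;
    enable_interrupts();
}

/* Timer side: one comparison against the head on the common path. */
void timer_tick(void) {
    ticks++;
    while (sleep_head != NULL && sleep_head->wakeup_tick <= ticks) {
        sleeper_t *s = sleep_head;
        sleep_head = s->next;
        make_runnable(s->tcb);
    }
}
```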

### 2.10 Readline

The readline() system call reads the next line from the console and copies it
into a user-provided buffer. When the call is made, the kernel checks that
the buffer lies within writable user-space memory; otherwise, writing to it
would cause a page fault. The invoking thread then has to take a mutex before
proceeding, since each thread must wait for its turn to access the input
stream. Once the invoking thread holds the lock, it fills a readline_t data
structure with information about its buffer (address and length) as well as
its TCB. The invoking thread then deschedules itself and yields to the
keyboard consumer kernel thread. That thread reads the queue of key events
and buffers the characters it reads until a '\n' is typed, at which point it
commits the contents of its own buffer into the user-provided buffer.
Characters not consumed by this procedure are kept for the next call (they
will be typed into the shell automatically during the next call to
readline()). The keyboard consumer thread then deschedules itself and yields
back to the thread that was descheduled in readline(), which releases the
lock and returns from the system call.
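
A sketch of the readline_t handed to the keyboard consumer thread (field
names are illustrative):

```c
typedef struct tcb tcb_t;

/* Filled in by the invoking thread once it holds the readline mutex. */
typedef struct readline {
    char  *buf;  /* user buffer, already validated as writable user memory */
    int    len;  /* maximum number of bytes to copy into buf */
    tcb_t *tcb;  /* thread to wake once a full line has been committed */
} readline_t;
```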


