
13. Abstraction of Process

Jose edited this page Jun 23, 2021 · 14 revisions

In this chapter, we will see how we actually define a user process - the basic unit of an isolated running entity. The essential idea is to virtualize the CPU and allow multiple processes to time-share the CPU.

If you have studied the OSTEP book, you will notice that the order is reversed here: we first build support for virtualizing the memory (paging + heap memory allocator), then implement the support for virtualizing the CPU (user processes + scheduling). This is due to practical issues: concretizing the abstraction of processes requires a functioning virtual memory system and a way to allocate chunks of memory on the kernel heap.

Main References of This Chapter

Scan through them before going forth:

  • TODO: TODO ✭
  • TODO

The Abstraction of Process

So far, our kernel is still in its booting phase and is using the 16KiB booting stack to perform initialization. We now move towards a user-mode/kernel-mode execution pattern with the concept of processes. A process is the basic unit of a running entity in an operating system, e.g., a shell, a GUI desktop, a text dumper, a web browser, etc.

To execute a program as a process, the program gets loaded into memory from a compiled executable file (in some executable format recognizable by the OS, e.g., ELF) on some persistent storage (figure 4.1 from OSTEP by Remzi & Andrea):

FIGURE

The kernel needs to maintain the following set of information for each process in its process control block (PCB):

  • Address space: each process runs in its separate virtual address space (which could be very sparse and large); the kernel needs to allocate some space on kernel heap to hold the process's page directory/tables, and also take care of handling page faults and allocating frames for valid pages.
  • Context registers: the values of the CPU registers saved the last time the process was de-scheduled. (Explanation of time-sharing and context switching in later chapters.)
  • Kernel stack: the process normally runs in user mode, but when it wants to do something that involves a shared resource (e.g., any kind of I/O such as printing to the terminal, getting its process ID, allocating more memory beyond what has been mapped for it, etc.), it makes a system call (software interrupt); sometimes it also gets interrupted by hardware (hardware interrupt), e.g., the timer. In both cases it switches to kernel-mode execution with kernel privileges, during which it uses a kernel function stack allocated somewhere on the kernel heap. (Explanation of execution mode and system calls in later chapters.)
  • I/O-related information: (Explanation in the persistence chapters.)
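
As a rough illustration of how the fields listed above might be grouped, here is a minimal sketch of a PCB in C. It assumes a 32-bit x86 target and a one-page kernel stack per process. All names (`process_t`, `context_t`, `KSTACK_SIZE`, `process_init`, `process_kstack_top`) are hypothetical and chosen for illustration only, and the saved register set is modeled loosely on xv6 rather than taken from Hux.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: each process gets one 4KiB page as its kernel stack. */
#define KSTACK_SIZE 4096

/* Registers saved when the process is de-scheduled (32-bit x86 assumed). */
typedef struct {
    uint32_t edi;
    uint32_t esi;
    uint32_t ebx;
    uint32_t ebp;
    uint32_t eip;   /* where execution resumes */
} context_t;

/* Hypothetical process control block holding the per-process state. */
typedef struct {
    int32_t   pid;      /* process ID */
    uint32_t *pgdir;    /* address space: page directory on the kernel heap */
    context_t context;  /* context registers */
    uint8_t  *kstack;   /* lowest address of the kernel stack */
} process_t;

/* Fill in a PCB; the saved context starts out all zeros. */
void process_init(process_t *proc, int32_t pid,
                  uint32_t *pgdir, uint8_t *kstack)
{
    *proc = (process_t){ .pid = pid, .pgdir = pgdir, .kstack = kstack };
}

/* x86 stacks grow downward, so the initial stack top is the high end. */
uint8_t *process_kstack_top(const process_t *proc)
{
    return proc->kstack + KSTACK_SIZE;
}
```

In a real kernel, `pgdir` and `kstack` would point to memory obtained from the kernel heap allocator, and I/O-related fields would be added once the kernel supports persistence.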

For now, we assume that Hux only supports a single-core CPU. Running on multiple CPUs or symmetric multiprocessors (SMPs) is definitely more challenging, as it involves careful locking and coordination to get things right. See xv6 if you are interested in what a minimal SMP kernel looks like.

CPU State & Process Control Blocks (PCB)

TODO

The init Process

Most operating systems, after setting up the virtual memory system and other necessary support, will load the binary of an init process and switch to user-mode/kernel-mode execution at that point (the booting stack is no longer used from then on). The init process then creates basic user processes that provide the user interface (UI, e.g., GUI or shell) and other daemon processes. This is when the user gets a chance to interact with the OS, log in, and do useful stuff (forking more user processes from the shell).

Our Hux kernel does not have file system support yet. One problem thus arises: how does the kernel get the content of a piece of completely independent binary code (the compiled initcode program)? To achieve this, we have to tweak our compilation & linking process as follows (see this Stack Overflow post on xv6 for more):

  1. Write an initcode program and compile it to binary, separately from the kernel image;
  2. In the kernel image linking process, use the -b binary flag to embed the compiled initcode binary into the kernel image (i.e., put it somewhere locatable);
  3. The linker, though this is not prominently documented, exposes three symbols named _binary_<objfile>_start, _binary_<objfile>_end, and _binary_<objfile>_size. These three symbols can be declared extern in the kernel code. With the embedded file named initcode, the bytes from _binary_initcode_start up to (but not including) _binary_initcode_end are the binary content of initcode.
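
Step 3 above can be sketched in C as follows. This is a minimal sketch, assuming the file is passed to the linker simply as `initcode`, so that the generated symbols carry exactly that name. The helpers `blob_size` and `blob_copy` are hypothetical names introduced here for illustration. They take the start and end addresses as parameters so that they can also be exercised on an ordinary byte buffer.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Symbols assumed to be generated by the linker for an embedded file
 * named `initcode`. Declaring them as arrays makes each name evaluate
 * directly to an address.
 */
extern const uint8_t _binary_initcode_start[];
extern const uint8_t _binary_initcode_end[];

/* Number of bytes in the blob delimited by [start, end). */
size_t blob_size(const uint8_t *start, const uint8_t *end)
{
    return (size_t)(end - start);
}

/*
 * Copy the blob into dst. Returns the number of bytes copied, or 0 if
 * the blob does not fit into cap bytes.
 */
size_t blob_copy(uint8_t *dst, size_t cap,
                 const uint8_t *start, const uint8_t *end)
{
    size_t len = blob_size(start, end);
    if (len > cap)
        return 0;
    for (size_t i = 0; i < len; i++)
        dst[i] = start[i];
    return len;
}
```

Inside a kernel image that actually embeds the file, a call such as `blob_size(_binary_initcode_start, _binary_initcode_end)` would give the length of the initcode binary. Computing the length from the start and end addresses avoids relying on the _size symbol, whose value is encoded as the symbol's address rather than stored in memory.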

The initcode Assembly

TODO

Embedding Into Kernel Image

TODO

Start Running

In our kernel, we then need to context switch the CPU to run the binary content of initcode in user mode. For now, let's just try running initcode in kernel mode. We will explain user mode and context switching in later chapters.

TODO

Progress So Far

TODO

Current repo structure:

TODO