Pharo Virtual Machine

In this article we present a short overview of the Virtual Machine (VM) used by Pharo. This VM offers a series of interesting capabilities that highlight the power of Smalltalk. It is also an excellent VM for research and for production use, thanks to its rich history and promising future.

What is a Virtual Machine?

Pharo executes on top of a virtual machine: the Pharo VM. It shares this trait with the original Smalltalk-80 and with many of the Smalltalk versions that have enriched the history of programming languages. Running on top of a VM provides a common runtime environment across operating systems and platforms. The VM abstracts and encapsulates the differences between machines, operating systems and platforms, which allows the same Pharo program to run on all of them.
When a Pharo method is compiled, it is compiled to bytecode. Bytecode is a binary representation of the program; we can see it as the custom machine code of the VM. It encodes all the operations of the language in a compact way: most operations are encoded in a single byte, which is the origin of the name bytecode. The bytecode is also designed around the fact that our language has objects and instances. Each Pharo instruction is encoded as one or more bytecodes. For example, there are operations to access an instance variable, to send a message, and to access temporary variables. The produced bytecode then runs on any Pharo VM without needing to be changed for a given platform: the same program runs on a Windows, Mac, Linux or Raspberry Pi machine.
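
To make the idea of bytecode as a "custom machine code" more concrete, here is a minimal sketch in Python of a toy instruction set and a loop that executes it using a stack of values. The opcode numbers, the `Point` class and the `interpret` function are all hypothetical names chosen for illustration; they are not Pharo's real bytecode set or VM code.

```python
# Hypothetical opcodes for illustration only; NOT Pharo's real bytecode numbers.
PUSH_INST_VAR = 0x00   # operand: index of the instance variable to push
PUSH_TEMP = 0x01       # operand: index of the temporary variable to push
SEND = 0x02            # operands: selector index, argument count
RETURN_TOP = 0x03      # no operand: return the value on top of the stack


class Point:
    """Toy receiver whose instance variables live in a list of slots."""
    def __init__(self, x, y):
        self.slots = [x, y]


def interpret(bytecode, receiver, temps, selectors):
    """Run a method's bytecode against a receiver using a value stack."""
    stack = []
    pc = 0  # program counter: index of the next byte to decode
    while True:
        opcode = bytecode[pc]
        if opcode == PUSH_INST_VAR:
            stack.append(receiver.slots[bytecode[pc + 1]])
            pc += 2
        elif opcode == PUSH_TEMP:
            stack.append(temps[bytecode[pc + 1]])
            pc += 2
        elif opcode == SEND:
            selector = selectors[bytecode[pc + 1]]
            argc = bytecode[pc + 2]
            args = [stack.pop() for _ in range(argc)][::-1]
            target = stack.pop()
            # Python's own method lookup stands in for a message send here.
            stack.append(getattr(target, selector)(*args))
            pc += 3
        elif opcode == RETURN_TOP:
            return stack.pop()
        else:
            raise ValueError(f"unknown opcode {opcode:#x}")


# Bytecode for a method equivalent to "^ x + delta", where x is
# instance variable 0 and delta is temporary variable 0.
method = bytes([PUSH_INST_VAR, 0, PUSH_TEMP, 0, SEND, 0, 1, RETURN_TOP])
result = interpret(method, Point(3, 4), temps=[10], selectors=["__add__"])
print(result)  # 13
```

The whole method fits in eight bytes, and nothing in those bytes depends on the processor or operating system; only the interpreter loop would need to be ported.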

The VM also encapsulates all the low-level operations, allowing programmers to concentrate on higher-level concepts and produce great software with less effort. For example, the VM handles all network communication and presents a single way of using it, hiding the differences between the network operations of different operating systems.

Pharo uses its own VM because this allows us to control the whole execution of the language. It is also important to keep the VM completely open source and available for use in commercial applications, open source projects and academia.

Levels of Interactions

An Overview of the Pharo VM

As said before, the VM is responsible for running our program, which is compiled to bytecode. To do so, the VM has different components that collaborate. These components are heavily interdependent, and it is sometimes difficult to draw a clear boundary between them, but we will try to identify them by their responsibilities:

  • Bytecode Interpreter: The bytecode interpreter is a key component for executing our program. It reads the bytecode of a method and performs the task each bytecode requires. It is a stack-based machine, so all operations pass through a stack of values. For example, if the bytecode is “Read Instance Variable 0”, it reads the first instance variable of the receiver of the message and pushes it onto the stack. Subsequent operations take that value and operate on it. Another example is the bytecode “Send Message”, which sends a specific message to an object. The receiver of the new message and all of its arguments are taken from the stack. The interpreter is the part of the VM that actually “executes” the operations our program performs.

  • Object Memory Representation: The interpreter manipulates our objects: it sends messages to them, creates new ones and stores them in memory. How these instances are represented in memory is therefore another important component of the VM. A good memory model allows the interpreter to execute faster and to use less memory. However, these two goals usually pull against each other, so a good memory representation has to balance the speed of accessing objects against the space they take. The memory representation covers how objects are structured: whether they have a header, how the instance variables are stored, and so on. For example, in Pharo each object needs to know which class it belongs to. A possible solution is to store a pointer to the class in the header of the object. However, on modern 64-bit architectures such a pointer takes 64 bits (8 bytes) per object. This is a lot of overhead per object, so the Pharo memory model uses 22 bits to encode a class id instead. This decision reduces memory use but requires a look-up in a class table; it is a clear example of this difficult balance.

  • Garbage Collector: Pharo has automatic memory management, and the garbage collector is the component that performs it. It has to keep track of the memory that our program (and the rest of the Pharo environment) is using, and decide when memory can be returned to the operating system and when more must be requested. In modern operating systems our program does not run alone; it shares the machine with other programs running at the same time. It therefore has to be a good “citizen” and use only the resources it needs. Everybody has seen programs that take more resources than they need and end up crashing the operating system. So the garbage collector runs periodically, checking which objects are still reachable and used by the program. If an unused object is found, it is freed, allowing other objects to be instantiated in its place. Another important responsibility is to keep a list of free space, so that the VM knows at all times which memory is usable and does not need to ask the operating system for more. Finally, during the execution of our program the memory becomes fragmented: there are many small free pieces of memory, none of them big enough to hold some objects. So the garbage collector also compacts the memory, moving the objects so that the free pieces end up together and the VM has a single big piece of free memory.

  • Primitives: Real-world programs need operations that reach outside our nice object environment. For example, we want to make a network request, call the operating system to open a window, or read the user's input from the console. In a VM all these operations are implemented as primitives. A primitive is a routine that communicates with the outside world. In Pharo, a primitive is a method that is not implemented in Pharo but at the VM level. When a message is sent and such a method is activated, the VM routine is what gets executed. For example, opening a file executes a primitive. This primitive calls the operating system to open the file and stores the file descriptor that identifies the open file for the operating system. Every basic operation on files requires talking to the operating system, so these operations are implemented as primitives of the virtual machine.

  • Optimizations: Finally, the VM includes another important component: a set of optimizations. It is possible to implement a VM with just the previous components, but it will be slow. A VM meant for production implements a series of optimizations. A simple one concerns method lookup. When a message is sent to an object, the method to execute is searched for along the whole class hierarchy of that object, so a simple method cache greatly improves execution speed. One of the most important optimizations is Just-in-Time (JIT) compilation. This technique allows the VM to transform a method from its bytecode into a machine code representation. Once the method is compiled to machine code, it can be executed directly by the processor, which is faster than executing it in the interpreter. Other optimizations can also be applied during this transformation. As the transformation takes time and is machine dependent, we want to apply it only to methods that are executed many times. As we have seen before, in the VM everything is about balancing.
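
The class id trade-off described under Object Memory Representation can be sketched as follows. This is only an illustration in Python: the 22-bit width comes from the text, but the position of the field inside the header, the `other_bits` parameter and the contents of `class_table` are assumptions made for the example, not the real header layout.

```python
CLASS_INDEX_BITS = 22                          # width stated in the text
CLASS_INDEX_MASK = (1 << CLASS_INDEX_BITS) - 1

# Hypothetical class table: index -> class name. Index 0 is left unused here.
class_table = [None, "SmallInteger", "String", "Point"]


def make_header(class_index, other_bits=0):
    """Pack a class index into the low 22 bits of a 64-bit header word.

    Where the field sits and what other_bits means are assumptions
    of this sketch, not the real header format.
    """
    if not 0 <= class_index <= CLASS_INDEX_MASK:
        raise ValueError("class index does not fit in 22 bits")
    return ((other_bits << CLASS_INDEX_BITS) | class_index) & 0xFFFFFFFFFFFFFFFF


def class_of(header):
    """Recover the class: extract the index, then look it up in the table."""
    return class_table[header & CLASS_INDEX_MASK]


header = make_header(class_index=3, other_bits=0b101)
print(class_of(header))        # Point
print(CLASS_INDEX_MASK + 1)    # 4194304 distinct class indices fit in 22 bits
```

The saving is 42 bits per object compared with a full 64-bit class pointer, and the price is the extra indexing step in `class_of`.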
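
The three garbage collector duties described above (finding reachable objects, freeing the rest, and compacting) can be illustrated with a deliberately simplified sketch. It assumes a heap modelled as a fixed-size Python list of slots, with hypothetical names `Obj`, `mark` and `sweep_and_compact`; a real collector works on raw memory and must also update every pointer to an object it moves, which Python references hide from us here.

```python
class Obj:
    """Toy heap object: a name plus references to other objects."""
    def __init__(self, name):
        self.name = name
        self.refs = []
        self.marked = False


def mark(roots):
    """Mark every object reachable from the roots."""
    pending = list(roots)
    while pending:
        obj = pending.pop()
        if not obj.marked:
            obj.marked = True
            pending.extend(obj.refs)


def sweep_and_compact(heap):
    """Free unmarked slots, then slide survivors to the front of the heap.

    The heap is a fixed-size list of slots; None means a free slot.
    Returns the number of free slots after the collection.
    """
    live = [o for o in heap if o is not None and o.marked]
    for o in live:
        o.marked = False                # reset for the next collection cycle
    free = len(heap) - len(live)
    heap[:] = live + [None] * free      # survivors first, one free area after
    return free


a, b, c, d = Obj("a"), Obj("b"), Obj("c"), Obj("d")
a.refs.append(c)                        # a -> c; b and d are unreachable
heap = [a, b, None, c, d, None]
mark(roots=[a])
free_slots = sweep_and_compact(heap)
print([o.name if o else None for o in heap], free_slots)
# ['a', 'c', None, None, None, None] 4
```

Before the collection the two free slots were separated; afterwards the four free slots are contiguous, so a larger object could be allocated in them.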
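
The file example given under Primitives can be sketched as a table that maps primitive numbers to native routines. The primitive numbers, the `primitive_table` dictionary and the `call_primitive` function are hypothetical; only the `os` and `tempfile` calls are real Python library functions, and they stand in for the operating system calls a VM would make.

```python
import os
import tempfile

# Hypothetical primitive numbers; the real VM's numbering is different.
PRIM_FILE_OPEN = 1
PRIM_FILE_CLOSE = 2

# Each primitive is a native routine that talks to the operating system.
primitive_table = {
    PRIM_FILE_OPEN: lambda path: os.open(path, os.O_RDONLY),
    PRIM_FILE_CLOSE: lambda fd: os.close(fd),
}


def call_primitive(number, *args):
    """Dispatch to the native routine registered under this number."""
    return primitive_table[number](*args)


# Create a small file so that the example is self-contained.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
    path = tmp.name

fd = call_primitive(PRIM_FILE_OPEN, path)   # returns an OS file descriptor
data = os.read(fd, 5)
call_primitive(PRIM_FILE_CLOSE, fd)
os.remove(path)
print(data)  # b'hello'
```

The language-level code only ever sees the descriptor returned by the primitive; how the file is actually opened is the VM's concern.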
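
The method cache mentioned under Optimizations can be sketched in a few lines. `ToyClass`, `lookup` and `cached_lookup` are hypothetical names for this illustration, and the cache is a plain dictionary; this is not how the Pharo VM implements its cache, only the general idea.

```python
class ToyClass:
    """Hypothetical stand-in for a class: a method dictionary plus a superclass."""
    def __init__(self, name, superclass=None, methods=None):
        self.name = name
        self.superclass = superclass
        self.methods = methods or {}


lookups_walked = 0   # counts full hierarchy walks, to show the cache's effect
method_cache = {}    # (class, selector) -> method


def lookup(cls, selector):
    """Slow path: walk up the superclass chain until the selector is found."""
    global lookups_walked
    lookups_walked += 1
    current = cls
    while current is not None:
        if selector in current.methods:
            return current.methods[selector]
        current = current.superclass
    raise AttributeError(f"{cls.name} does not understand #{selector}")


def cached_lookup(cls, selector):
    """Fast path: consult the cache first and fill it on a miss."""
    key = (cls, selector)
    method = method_cache.get(key)
    if method is None:
        method = lookup(cls, selector)
        method_cache[key] = method
    return method


obj = ToyClass("Object", methods={"printString": lambda: "an Object"})
shape = ToyClass("Shape", superclass=obj)
circle = ToyClass("Circle", superclass=shape)

for _ in range(1000):
    cached_lookup(circle, "printString")
print(lookups_walked)  # 1: only the first send walked the hierarchy
```

A cache like this has to be invalidated whenever a method is added, removed or changed, which is one more instance of the balancing the article keeps returning to.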

A little history, and a view of the future.

The Pharo VM is the product of continuous evolution over a long period of time. It includes elements that were introduced in the original versions of ST-80 and that have been evolving ever since, alongside modern techniques such as Just-in-Time compilation and automatic garbage collection. It started as a fork of the Squeak VM and includes development performed by the OpenSmalltalk-VM community. This project has been active since 1995 and includes contributions from many different developers. We forked from this project not for technical reasons but for philosophical ones. Our fork is guided by the aim of democratizing access to the VM: having a truly productive, open VM for everyone, which anyone can modify. We are proud of the past but open to a brighter future.

One of the main advantages of the Pharo VM is that it is written in Smalltalk and can be run like any other Smalltalk program. This allows us to write tests, to simulate the execution of the VM, to modularize it, to use all the existing Pharo tools and to document it like any other program. The VM is a really big program with many complex design decisions, and it requires a lot of specific knowledge, but our goal is to open it up and make it more accessible to all Pharo users. Put simply, we want to eliminate all the accidental complexity so we can focus on what is really important.

To reach this goal we have started a process to extract and publish all the knowledge that has been embedded in the VM during its long life. We are writing tests and documentation and cleaning up the old code. During development, it is possible to execute the VM like any other Pharo program, and we are using this ability to write tests for it. Having tests eases the modification of the VM, reducing the risk of changes and opening it up to further modifications.

We have improved the build process and applied techniques that are standard in any large software application. We are convinced that the VM should be open to the community, and that it has to allow the community and the language to continue evolving and to cover their needs.

Documentation is not the only thing we have planned for the future. We are pushing a truly open-source version for 64-bit ARM, updated versions for Windows, Linux and Mac, and new ways of interacting with the operating system and other libraries. Our whole roadmap is open and available in the repository of the VM.

Links: