Perfection is achieved, not when there is nothing more to add, but when there is nothing left to take away.
—Antoine de Saint-Exupery
μpb (or more commonly, “upb”) is an implementation of Protocol Buffers, the serialization format that Google released in mid-2008. The Greek letter mu (μ) is the SI prefix for “micro”, which reflects the goal of keeping upb as small as possible while still providing a great deal of flexibility and functionality.
upb is written in 2,300 SLOC of C and compiles to just under 30 KB of object code on x86.
The Google implementation of Protocol Buffers is open source, released under a liberal license (BSD). Other people have also written implementations, such as protobuf-c. So why did I write a completely new implementation from scratch, and why should anybody use it?
I will give two main reasons, besides the goal of minimalism (which has either already won you over or failed to pique your interest):
upb is designed for maximum flexibility. What this means is that it gives you as a programmer more choices about how you want to store and process your data. Specifically:
memcpy() calls are expensive when overused, especially once cache effects are taken into account. Deep in upb’s design is a recognition of this fact, along with interfaces that let you manage memory intelligently. For example, upb can make strings reference the original protobuf data (rather than copying it), and upb’s memory management interface lets you reuse submessages instead of destroying and reallocating them.
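To make the string-referencing idea concrete, here is a minimal sketch in C. It is not upb's actual API: the `str_view` type and the `read_string_nocopy` function are hypothetical names invented for illustration, and the sketch assumes a one-byte length prefix (lengths under 128) to stay short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical type, not upb's API: a string that borrows bytes from the
 * input buffer instead of owning a copy of them. */
typedef struct {
  const char *data; /* points into the original protobuf buffer */
  size_t len;
} str_view;

/* Reads a length-delimited value at buf[*pos]: a one-byte length followed by
 * that many bytes. Multi-byte lengths are left out of this sketch.
 * Returns 1 and advances *pos on success, 0 on truncated input. */
int read_string_nocopy(const char *buf, size_t size, size_t *pos,
                       str_view *out) {
  if (*pos >= size) return 0;
  uint8_t len = (uint8_t)buf[*pos];
  if (len & 0x80) return 0;            /* would need a multi-byte varint */
  if (size - *pos - 1 < len) return 0; /* body runs past the buffer */
  out->data = buf + *pos + 1;          /* no memcpy: just remember where */
  out->len = len;
  *pos += 1 + (size_t)len;
  return 1;
}

/* Small self-check showing that the view aliases the input buffer. */
void str_view_example(void) {
  const char buf[] = {0x05, 'h', 'e', 'l', 'l', 'o'};
  size_t pos = 0;
  str_view s;
  assert(read_string_nocopy(buf, sizeof buf, &pos, &s));
  assert(s.data == buf + 1 && s.len == 5);
  assert(memcmp(s.data, "hello", 5) == 0);
}
```

The trade-off in a design like this is lifetime: the view is only valid while the input buffer is alive, so the caller has to keep the buffer around or copy the bytes before releasing it.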
upb is designed to be a toolbox of paradigms for manipulating protocol buffer data. upb is built in layers, and every layer is available for clients to use as they see fit.
In addition, there are (or will be) several different code generation strategies, for compiled languages that wish to use generated code.
Despite this promise, Protocol Buffers haven’t seen much adoption in dynamic languages because the existing implementations aren’t very efficient. upb was designed from the outset to be an ideal implementation for supporting very fast Protocol Buffers implementations for dynamic languages.
One key part of this strategy was making the table-driven parsing code path (the mode of operation that doesn’t require you to generate and compile C or C++ for each message) as fast as possible. A compile step in the development cycle is inconvenient for users of dynamic languages.
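As a rough illustration of what "table-driven" means here, the sketch below parses a message using a field table built from data rather than from generated code. It is a simplified sketch, not upb's parser: `field_entry`, `parse_table`, and `example_msg` are hypothetical names, and only varint fields (wire type 0) are handled. The varint and tag layout follow the standard Protocol Buffers wire format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor row: one per field, built at run time. */
typedef struct {
  uint32_t number; /* protobuf field number */
  size_t offset;   /* where the value lives inside the message struct */
} field_entry;

/* Decodes a base-128 varint. Returns bytes consumed, or 0 on error. */
size_t read_varint(const uint8_t *p, const uint8_t *end, uint64_t *out) {
  uint64_t v = 0;
  for (size_t i = 0; i < 10 && p + i < end; i++) {
    v |= (uint64_t)(p[i] & 0x7f) << (7 * i);
    if (!(p[i] & 0x80)) {
      *out = v;
      return i + 1;
    }
  }
  return 0; /* truncated or longer than 10 bytes */
}

/* Generic parser driven entirely by the table. Handles wire type 0 only.
 * Returns 1 on success, 0 on malformed or unsupported input. */
int parse_table(const uint8_t *buf, size_t size, const field_entry *table,
                size_t nfields, void *msg) {
  const uint8_t *p = buf, *end = buf + size;
  while (p < end) {
    uint64_t tag, val;
    size_t n = read_varint(p, end, &tag);
    if (n == 0) return 0;
    p += n;
    if ((tag & 7) != 0) return 0; /* other wire types omitted in this sketch */
    n = read_varint(p, end, &val);
    if (n == 0) return 0;
    p += n;
    for (size_t i = 0; i < nfields; i++) {
      if (table[i].number == (tag >> 3)) {
        *(uint64_t *)((char *)msg + table[i].offset) = val;
        break;
      }
    }
    /* Fields not found in the table are silently skipped. */
  }
  return 1;
}

/* Example message layout and its table (both hypothetical). */
typedef struct {
  uint64_t id;
  uint64_t count;
} example_msg;

const field_entry example_table[] = {
    {1, offsetof(example_msg, id)},
    {2, offsetof(example_msg, count)},
};
```

Because the table is ordinary data, a dynamic-language binding could build it at run time from a message definition and call the same `parse_table` routine for every message type, with no per-message compile step.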
Another important piece is a set of memory-management interfaces that can integrate with the memory managers of dynamic languages. This is no easy task, because each language runtime does memory management differently. Some use reference counting, some use garbage collection, some use a combination, and the interfaces for interacting with the memory managers are different for every runtime. A key goal of upb was to design a memory management scheme that could gracefully integrate with all of these.
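One common way to approach this kind of integration is to have the library call back into the host runtime for ownership decisions. The sketch below shows that general pattern only. It is an assumption about how such a scheme could look, not a description of upb's interface: `mm_hooks`, `parent_msg`, and the `rc_*` functions are all hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical hook table: the host runtime tells the library how to retain
 * and release an object, so the library never frees anything itself. */
typedef struct {
  void (*ref)(void *obj);
  void (*unref)(void *obj);
} mm_hooks;

/* One possible host: plain reference counting. */
typedef struct {
  int refcount;
  int payload;
} rc_obj;

int rc_live_objects = 0; /* bookkeeping for demonstration only */

rc_obj *rc_new(int payload) {
  rc_obj *o = malloc(sizeof *o);
  if (!o) return NULL;
  o->refcount = 1;
  o->payload = payload;
  rc_live_objects++;
  return o;
}

void rc_ref(void *obj) { ((rc_obj *)obj)->refcount++; }

void rc_unref(void *obj) {
  rc_obj *o = obj;
  if (--o->refcount == 0) {
    rc_live_objects--;
    free(o);
  }
}

const mm_hooks rc_hooks = {rc_ref, rc_unref};

/* Library-side code sees only the hooks, never the host's object layout. */
typedef struct {
  const mm_hooks *mm;
  void *submsg;
} parent_msg;

void parent_set_submsg(parent_msg *p, void *sub) {
  if (sub) p->mm->ref(sub);               /* retain new value first */
  if (p->submsg) p->mm->unref(p->submsg); /* then release the old one */
  p->submsg = sub;
}
```

A garbage-collected runtime could plug in different hooks, for example ones that register and unregister GC roots, without any change to the library-side code.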