Using C structs as a kind of namespace mechanism to reduce global symbol bloat

Mark C. Miller edited this page Aug 8, 2019 · 1 revision

The discussion here is similar to material described elsewhere

Well designed libraries have source code distributed among a number of different compilation modules (e.g. source files).

A problem with this for C libraries is that symbols that are not really part of the public API wind up polluting the global namespace solely so that different compilation modules can find and share them.

An example in Silo is the use of built-in compression libraries. These are not part of Silo’s public API. However, the compression libraries’ symbols wind up in Silo’s global namespace. This presents two problems. First, it represents unnecessary symbol bloat. Worse, if Silo were ever linked into an executable where those same libraries were being used from a different source, we would suffer symbol collisions.

Note that the use of compiler flags such as GCC’s -fvisibility to control symbol visibility can only address some of the issues mentioned here. In particular, in C code, symbols that need to be shared among different compilation modules wind up in the one and only global namespace.
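To make the limitation concrete, here is a minimal sketch of per-symbol visibility control, assuming GCC or Clang. The names gs_internal_helper, gs_public_entry, GS_HIDDEN and GS_EXPORT are hypothetical and not taken from Silo. Hidden visibility keeps a symbol out of a shared library's exported symbols, but the function still has external linkage, so in a static archive it can still collide with a same-named symbol from another library.

```c
/* Sketch only: the visibility attribute is a GCC/Clang extension. */
#if defined(__GNUC__)
#define GS_HIDDEN __attribute__((visibility("hidden")))
#define GS_EXPORT __attribute__((visibility("default")))
#else
#define GS_HIDDEN
#define GS_EXPORT
#endif

/* Shared between compilation modules of the library, so it cannot be
   static. Hidden visibility keeps it out of a shared library's exports,
   but it remains an ordinary external symbol at static link time. */
GS_HIDDEN int gs_internal_helper(int a)
{
    return a + 1;
}

/* Part of the (hypothetical) public API, so it stays exported. */
GS_EXPORT int gs_public_entry(int a)
{
    return 2 * gs_internal_helper(a);
}
```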

In integrating Peter Lindstrom’s ZFP compression library into Silo, I have decided to take a different route and solve both of these problems. Ultimately, what I am demonstrating here is just another way of name-mangling to avoid symbol collision. But, it has the added benefit that the mangled names are containerized in a struct, much like a C++ namespace does so that total symbol count in the global namespace is also reduced.

The solution is basically to create a single, uber struct-valued symbol where each member of the struct is a function pointer.

  1. Change all public functions in all the compilation modules to static storage class.
    • This makes these symbols private to each compilation module
    • I suppose another way of doing this, with GNU compilers anyway, is to set the default symbol visibility to hidden using the -fvisibility=hidden flag so that no symbols, by default, appear in the global namespace
  2. Wrap the function and variable declarations in the library’s public header file in an uber, constant-valued struct type (demonstrated below), where each member of the struct is a function pointer for one of the functions in the library’s interface
  3. In any caller of the library, call the library’s method via a macro that unravels the struct pointer and member function pointer dereferencing and then calls the given function

Example demonstrated below.

Original library header file, libgs.h


int gs_a(int a, double b);
void gs_b(int a);

And the library source file. . .

#include <libgs.h>
#include <stdio.h>

int gs_a(int a, double b)
{
    printf("In gs_a\n");
    return a*b;
}

void gs_b(int a)
{
    printf("In gs_b\n");
    return;
}

And, finally the caller/client code…


#include <libgs.h>
int main()
{
    gs_a(1,2.0);
    gs_b(1);
    return 0;
}

In this case, both gs_a and gs_b will appear in the global namespace.

The goal is to put these functions inside a container namespace.

Here are the modified header and source files


struct gs_ {

    int ADD_FUNC(gs_a,(int a, double b));
    void ADD_FUNC(gs_b,(int a));

};
extern struct gs_ gs;


where ADD_FUNC, which must be defined ahead of the struct in the header, is

#define ADD_FUNC(Func, Args) (*Func) Args

And, the libgs implementation. Note the static declarations on the functions. The struct gs is the only symbol left in the global namespace.

#include <libgs.h>
#include <stdio.h>
static int gs_a(int a, double b)
{
    printf("In gs_a\n");
    return a*b;
}

static void gs_b(int a)
{
    printf("In gs_b\n");
    return;
}

/* Populate the struct declared in libgs.h with pointers to the static functions */
struct gs_ gs = {gs_a, gs_b};

And the caller/client code


#include <libgs.h>

int main()
{
    gs.gs_a(1,2.0);
    gs.gs_b(1);
    return 0;
}
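For readers who want to try the idea in one file, the following is a minimal sketch rather than the actual Silo code. It collapses the header and the implementation into a single compilation module, makes the struct const so callers cannot overwrite the function pointers, and uses C99 designated initializers. The struct tag gs_funcs and the counter gs_b_calls are hypothetical additions for illustration.

```c
/* --- what would live in libgs.h --- */
struct gs_funcs {
    int  (*gs_a)(int a, double b);
    void (*gs_b)(int a);
};
extern const struct gs_funcs gs;   /* the only global symbol */

/* --- what would live in libgs.c --- */
static int gs_a(int a, double b)
{
    return (int)(a * b);
}

/* Hypothetical counter, present only so gs_b has an observable effect. */
static int gs_b_calls = 0;

static void gs_b(int a)
{
    gs_b_calls += a;
}

/* Member names live in their own namespace, so they can match the
   names of the static functions they point to. */
const struct gs_funcs gs = {
    .gs_a = gs_a,
    .gs_b = gs_b
};
```

A caller then writes gs.gs_a(3, 2.0) exactly as in the client code above. Because the struct is const, an accidental assignment such as gs.gs_a = 0 is rejected at compile time.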

Cyrus’ Thoughts

  • To clarify, this is just to bottle things internally? GS would be used by you inside of Silo to call new lib routines — not by folks using the Silo API?
    (You would only use it to bottle symbols you are using that you don’t want to be part of the public interface)

> MCM: Yes, that’s right. But, after talking with Peter Lindstrom, I got rid of the GS() macro. Now, all the functions are just members of the uber-gs struct. And, that struct is the only symbol that is in the global namespace.

  • a few concerns:
    • this requires us to modify the library we want to use, which seems like a potential maintenance pain
    • MCM: Actually, I got 80% of the way through Peter’s ZFP library (see below), without major changes to the lib.
    • will this make debugging harder, having the function pointer as part of this mix?
    • MCM: Nope. Debugging is the same. You still set breakpoints on functions named as you’d expect.
  • For collisions b/c the names are too simple, it would be better if the code we want to link to followed some minimal standards that avoid name collisions.
  • MCM: Yes and no. Prepending a consistent moniker to all public symbols helps in providing a pseudo-sorta namespace thingy for C symbols. However, it does nothing to reduce the actual symbol count of the global namespace. The approach developed here does that (again, see below for experiences doing this with ZFP).
  • For example: if hdf5, as a C-API, didn’t use unique prefixes for its functions, and simply called its functions “open”, “write”, and “read” that would be horrible.
  • MCM: Agreed.
  • Would we want to rebottle their public API to use it or ask them to resolve this?
  • MCM: I think it depends on whether or not you care about polluting the global namespace with symbols from internally used libs.
  • I see the concern about a client using a TPL directly and in Silo also having those symbols, but this could happen now, so is it a real concern? It seems like the client code’s build system could resolve this.
  • MCM: Hmm. This cannot happen now. If you are thinking of PDBLite, it defines a different symbol set with lite_ prepending all symbols. We did this specifically to avoid collision with the real PDB library.

Experiences Retrofitting ZFP

So, I applied these concepts to Peter Lindstrom’s ZFP, version 0.5.0, and I capture my experiences here.

First, I started with the expectation it would be possible to do this by adjusting only the ZFP header file slightly. In fact, my original intention was to adjust ZFP’s header file so that the ZFP library could be compiled either way. The problem with this approach is in the population of the uber struct that serves as the function namespace. If all of ZFP’s public API were in a single compilation module (an unreasonable expectation in general), then the uber struct could be both declared and defined using some macro magic on the header file. An example of a portion of the original and modified ZFP header file is shown below.

The key difference is the introduction of a CPP macro, DEF_FUNC, to split and wrap the API function declarations. Each function declaration is split into 3 parts: the return type, the function’s name and its formal argument list. In the above example, note the commas, the double closing parentheses and the lack of a terminating semicolon on the function declaration lines.

Using some macro magic, this same header file is included twice in the corresponding source code file implementing ZFP. The first time is a normal inclusion to declare the functions. The second time it is included, the same header file is used to initialize a const uber struct containing function pointers to each of these API functions. The macro magic below implements this.

/* Minor adjustment to typical guard to prevent multiple inclusion 
    so that this header file can be included twice in the ZFP
    implementation source */
#if defined(INIT_C_STRUCTSPACE) || !defined(ZFP_H)
#define ZFP_H
.
.
.
/* Macro magic to define the zfp uber struct and the DEF_FUNC macro
    to either declare or initialize the members of the zfp uber struct */
#ifdef INIT_C_STRUCTSPACE
#define DEF_FUNC(Ret, Func, Args) (Ret(*)Args) Func,
 struct zfp_structspace zfp = {
#else
#define DEF_FUNC(Ret,Func,Args) Ret (*Func) Args;
 struct zfp_structspace {
#  endif
.
.
.
/* ZFP API function declarations wrapped with DEF_FUNC logic */
.
.
.

/* Terminate either declaration or initialization of zfp uber struct */
};
extern struct zfp_structspace zfp;

Next, in the corresponding zfp.c source file implementing the ZFP API, we have this additional logic at the end of the source file


#define INIT_C_STRUCTSPACE
#include "zfp.h"

This second inclusion of the zfp header file, this time with INIT_C_STRUCTSPACE defined, winds up creating the code that initializes the zfp uber struct with all the function pointer values.
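The declare-then-initialize trick can be tried in a single file. The sketch below is an illustration under assumptions, not the ZFP code: it swaps the double inclusion of the header for an X-macro list, which gives the same two expansions of one set of (return type, name, argument list) triples. The names GS_API, DECLARE_MEMBER, INIT_MEMBER, gs_structspace, gs_add and gs_scale are all hypothetical.

```c
/* One list of (return type, name, argument list) triples. */
#define GS_API(DEF_FUNC)                              \
    DEF_FUNC(int,    gs_add,   (int a, int b))        \
    DEF_FUNC(double, gs_scale, (double x, double s))

/* Expansion 1: declare one function-pointer member per API entry. */
#define DECLARE_MEMBER(Ret, Func, Args) Ret (*Func) Args;
struct gs_structspace {
    GS_API(DECLARE_MEMBER)
};
extern const struct gs_structspace gs;

/* Static implementations, invisible outside this compilation module. */
static int gs_add(int a, int b) { return a + b; }
static double gs_scale(double x, double s) { return x * s; }

/* Expansion 2: emit one initializer per API entry, in the same order.
   No cast is used here, so the compiler checks that each static
   function matches the type of the member it initializes. */
#define INIT_MEMBER(Ret, Func, Args) Func,
const struct gs_structspace gs = {
    GS_API(INIT_MEMBER)
};
```

Because both expansions come from the same list, the members and their initializers cannot drift out of order, which is the same property the double inclusion relies on.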

Finally, all functions in zfp.c must be declared static so they are not visible outside zfp.o. The only symbol visible outside the zfp compilation module is the zfp uber struct, whose members hold pointers to all the static functions. Once this is done, none of ZFP’s API functions appear in the global symbol table. The only way to call those functions is through the zfp uber struct, as in. . .

zfp.zfp_stream_open(…);
zfp.zfp_stream_mode(…);

What If The Public API Is Defined Across Multiple Compilation Modules?

Ok, so the above description works great. But, only if the public API is defined in a single compilation module. This isn’t always the best way to do business. Typically, the public symbols of a C-Language API are defined in multiple compilation modules. This means that there is now no single place to write a static initializer-like block of code to populate all the function pointers of the uber struct. Not even designated initializers (https://gcc.gnu.org/onlinedocs/gcc/Designated-Inits.html) can help. Is it possible to initialize portions of the struct in each compilation module? Yes, but not using static initialization as is done in the above example. In this more general case, you need to write a special initialization function that populates the corresponding members of the uber struct with the given compilation module’s public API symbols.
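A minimal sketch of this run-time initialization follows, with two compilation modules simulated in one file. It is an illustration under assumptions, not the ZFP code: the names gs_structspace, gs_encode, gs_decode, gs_encoder_init, gs_decoder_init and gs_init are hypothetical.

```c
#include <stddef.h>

/* --- shared header: the uber struct can no longer be const --- */
struct gs_structspace {
    int (*gs_encode)(int x);   /* implemented in the "encoder" module */
    int (*gs_decode)(int x);   /* implemented in the "decoder" module */
};
struct gs_structspace gs;      /* zero-initialized until gs_init() runs */

/* --- encoder module --- */
static int gs_encode(int x) { return x + 1; }
void gs_encoder_init(void) { gs.gs_encode = gs_encode; }

/* --- decoder module --- */
static int gs_decode(int x) { return x - 1; }
void gs_decoder_init(void) { gs.gs_decode = gs_decode; }

/* --- top-level module: the one call the client is obligated to make --- */
void gs_init(void)
{
    gs_encoder_init();
    gs_decoder_init();
}
```

The cost is visible in the sketch: each module contributes one global initializer symbol, the struct loses its const qualifier, and calling through gs before gs_init() has run dereferences a null function pointer.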

In applying this approach to ZFP, I wound up doing more hand coding of the individual compilation modules than I would have liked. In addition, because of the way the ZFP source code was structured, I wound up having to define ~30 such initializer functions, all of which by necessity are part of the global namespace. Calling the initialization function also becomes something the ZFP client is obligated to do. It becomes a new API call.

So, at this point, I did some hand coding on ZFP, reducing it from the original 14 compilation modules to 4: bitstream.c, encoder.c, decoder.c and zfp.c. Each compilation module has an initialization function, zfpbs_init(), zfp_encoder_init(), zfp_decoder_init() and zfp_init() respectively, and zfp_init() turns around and calls the others.

After making these changes, I then needed to adjust ZFP’s example code to use the modified API, which now accesses all the methods through the uber struct.

Conclusions

I reduced public symbol count from ~125 to 4. In addition, I ensured that if Silo is ever linked in a context where the original ZFP library is also being used, there will be no chance of symbol collisions.

Designing into a Library vs. Retrofitting a Library

  • It’s easier to design this kind of feature into a library from the start rather than having to go back and retrofit the library.
  • It is possible to design a library so that it can be compiled either way. But, callers of the library will need to make a choice on how they intend to call the library.
  • It would also be possible, at compile time, to select the symbol name of the uber struct the library is wrapped with.
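The last two points can be sketched together. This is one possible arrangement under assumptions, not something Silo or ZFP does: a hypothetical build-time macro GS_STRUCTSPACE both enables the struct mode and names the uber struct (for example, compiling with -DGS_STRUCTSPACE=silo_gs), and a hypothetical GS_CALL macro lets the same caller code work in either mode.

```c
/* If GS_STRUCTSPACE is defined, it must expand to the desired symbol
   name of the uber struct. If it is not defined, the library is built
   the conventional way with ordinary global functions. */
#ifdef GS_STRUCTSPACE
#define GS_LINKAGE static
#define GS_CALL(Func) GS_STRUCTSPACE.Func
#else
#define GS_LINKAGE
#define GS_CALL(Func) Func
#endif

/* Library functions: static in struct mode, global otherwise. */
GS_LINKAGE int gs_a(int a, double b) { return (int)(a * b); }
GS_LINKAGE int gs_b(int a) { return -a; }

#ifdef GS_STRUCTSPACE
struct gs_funcs {
    int (*gs_a)(int a, double b);
    int (*gs_b)(int a);
};
/* The struct's symbol name is whatever the build selected. */
const struct gs_funcs GS_STRUCTSPACE = { gs_a, gs_b };
#endif
```

A caller writes GS_CALL(gs_a)(2, 3.0), which expands to gs_a(2, 3.0) in the conventional build and to silo_gs.gs_a(2, 3.0) when built with -DGS_STRUCTSPACE=silo_gs. The caller still has to be compiled with the same setting as the library, which is the choice the second bullet refers to.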