Stability of std::typeinfo and impact on cross-compiler, cross invocation pool compatibility #776
Comments
That's all true, type_num is not guaranteed to be stable between compilers or compiler versions. Also, I think (not 100% sure) that it's possible for hash_code to differ between invocations of the same binary. Besides the layout, we also check things like the alignment of basic types and the data encoding (little endian vs. big endian), but we do not check the compiler.

I think there is no easy solution for auto-generated type ids. The safest approach would probably be to provide your own type_num implementation for each type (with some hardcoded value). You might also use some simple "heuristic" to define the type_value. For example, if you're interested in versioning the layout, you could have something like this:
The type_num will change once the size of T changes (for example, when a new member is added). It will not protect you from reordering of members etc., but it might be enough for some cases.
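A minimal sketch of the two options mentioned above: a size-based default and a hardcoded per-type value. The `sketch` namespace, the `Record` and `Config` types, and the value `1001` are hypothetical and only illustrate the idea; this is a standalone stand-in, not the library's actual `type_num()`.

```cpp
#include <cstdint>

namespace sketch {

// Hypothetical stand-in for a type_num() default: the id is derived from
// sizeof(T), so it changes whenever a change to T alters its size.
template <typename T>
constexpr std::uint64_t type_num()
{
    return sizeof(T);
}

// Example type that relies on the size-based default.
struct Record {
    std::uint64_t id;
    double value;
};

// Example type that gets a hand-assigned id.
struct Config {
    std::uint32_t flags;
};

// Explicit specialization: the id is a hardcoded value chosen by the
// programmer, independent of the compiler, the type name and the size.
template <>
constexpr std::uint64_t type_num<Config>()
{
    return 1001;
}

} // namespace sketch
```

Note that with the size-based default, two unrelated types of the same size get the same number, and reordering members is not detected.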
Here's the source for hash_code in GCC:

    size_t hash_code() const noexcept
    {
    #  if !__GXX_MERGED_TYPEINFO_NAMES
      return _Hash_bytes(name(), __builtin_strlen(name()),
                         static_cast<size_t>(0xc70f6907UL));
    #  else
      return reinterpret_cast<size_t>(__name);
    #  endif
    }

It seems in both branches of the

GCC and Clang both use the Itanium ABI naming spec. So essentially, changing the name of a class/struct (in Clang/GCC) will mean a new ABI name, and libpmemobj will lose the reference. For MSVC, it looks like it's just the undecorated namespace and class name alone.

I wrote the attached to try out GCC, Clang and MSVC:

    #include <iostream>
    #include <typeinfo>

    class MyClass {
    public:
        explicit MyClass(int id): member(id) {
            std::cout << std::endl;
        }
        int member;
    };

    int main() {
        MyClass c(1);
        // Print the implementation-defined type name and its hash code.
        std::cout << typeid(c).name() << " - " << typeid(c).hash_code() << std::endl;
        return 0;
    }

Results:
So theoretically a pool should work between GCC and Clang binaries, but not MSVC. I don't know if there are differences in fundamental type implementations between Clang and GCC that would cause issues?

One consideration for libpmemobj++ would be that using its own hash of typeid::name() (+ size?) would give more stability and control if compilers change their hash implementation. But then it would cause issues with backward compatibility with existing pools.

I do also like the idea of being able to provide your own type IDs; however, the type_id is so deep in the implementation that I'm not sure what that might look like. Plus you still need the default for the fundamental/standard types.

Another pool validation test could be to get the hash_code of a known object (like p) and compare it to a known value. This would catch a change in the hash_code implementation and therefore an incompatible pool.

In general though, I am thinking of ways this could be leveraged to help with layout updates, e.g. how to update the implementation of a class without invalidating the pool layout. Will post about that separately.
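As a rough illustration of hashing the type name (plus size) with a hash the library controls, here is a minimal sketch using 64-bit FNV-1a. The `sketch` namespace and the `fnv1a` and `stable_type_num` names are hypothetical, and FNV-1a is just one possible choice of hash. The result still depends on `typeid(T).name()`, which is implementation-defined, so this only removes the dependence on the standard library's `hash_code` algorithm.

```cpp
#include <cstdint>
#include <typeinfo>

namespace sketch {

// 64-bit FNV-1a over a NUL-terminated string. The algorithm is fixed here,
// so the result does not depend on the standard library's hash_code().
inline std::uint64_t fnv1a(const char *s)
{
    std::uint64_t h = 14695981039346656037ULL; // FNV-1a 64-bit offset basis
    for (; *s != '\0'; ++s) {
        h ^= static_cast<unsigned char>(*s);
        h *= 1099511628211ULL; // FNV 64-bit prime
    }
    return h;
}

// Hypothetical type id: hash of the implementation-defined type name,
// with sizeof(T) folded in as one extra mixing step.
template <typename T>
std::uint64_t stable_type_num()
{
    std::uint64_t h = fnv1a(typeid(T).name());
    h ^= static_cast<std::uint64_t>(sizeof(T));
    h *= 1099511628211ULL;
    return h;
}

} // namespace sketch
```

The same function could serve as the validation check described above: store `stable_type_num<KnownType>()` in the pool when it is created and compare it against a freshly computed value when the pool is opened.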
Currently it's quite easy to provide your own id for a specific type: you can just specialize the type_num() function. However, you can't easily replace the implementation for all types (if you want to use make_persistent).

With the Itanium ABI, the name will also change when you add a template parameter to a class (even a defaulted one), and you'd probably want to be able to extend the type with new template parameters.

In libpmemobj-cpp we don't really rely on type_nums for compatibility. We use additional flags inside of our classes (like here: https://github.com/pmem/libpmemobj-cpp/blob/master/include/libpmemobj%2B%2B/container/concurrent_hash_map.hpp#L2166-#L2186). I'd like to have a more generic mechanism, but I'm not sure we should rely on the gcc/clang implementation of typeid at the libpmemobj-cpp level. This would mean we have to check for pool compatibility even in cases where users do not wish to use type numbers.

Maybe the best way would be to change the implementation of type_num() in libpmemobj to just return the size (or something which is always 100% compatible), and if users wish to have a more detailed id they should specialize the function (and use offsetof + sizeof on members and hash it somehow)?
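One way the "offsetof + sizeof on members and hash it somehow" idea could look is sketched below. Everything here is hypothetical: the `sketch` namespace, the `field`, `mix` and `layout_id` helpers, the `PointV1`/`PointV2` types, and the standalone `type_num` template, which only mimics the shape of a per-type specialization and is not the library's function.

```cpp
#include <cstddef>
#include <cstdint>
#include <initializer_list>

namespace sketch {

// Offset and size of one member, as reported by offsetof and sizeof.
struct field {
    std::size_t offset;
    std::size_t size;
};

// Mix a 64-bit value into the running hash, byte by byte (FNV-1a style).
inline std::uint64_t mix(std::uint64_t h, std::uint64_t v)
{
    for (int i = 0; i < 8; ++i) {
        h ^= (v >> (8 * i)) & 0xffu;
        h *= 1099511628211ULL; // FNV 64-bit prime
    }
    return h;
}

// Fingerprint of a layout: total size plus each member's offset and size.
inline std::uint64_t layout_id(std::size_t total_size,
                               std::initializer_list<field> fields)
{
    std::uint64_t h = mix(14695981039346656037ULL, total_size);
    for (const field &f : fields) {
        h = mix(h, f.offset);
        h = mix(h, f.size);
    }
    return h;
}

struct PointV1 { std::int32_t x; std::int64_t y; };
struct PointV2 { std::int64_t y; std::int32_t x; }; // members reordered

// Hypothetical per-type id; only the specializations below are defined.
template <typename T>
std::uint64_t type_num();

template <>
inline std::uint64_t type_num<PointV1>()
{
    return layout_id(sizeof(PointV1),
                     {{offsetof(PointV1, x), sizeof(std::int32_t)},
                      {offsetof(PointV1, y), sizeof(std::int64_t)}});
}

template <>
inline std::uint64_t type_num<PointV2>()
{
    return layout_id(sizeof(PointV2),
                     {{offsetof(PointV2, x), sizeof(std::int32_t)},
                      {offsetof(PointV2, y), sizeof(std::int64_t)}});
}

} // namespace sketch
```

Unlike a size-only id, this fingerprint changes when members are reordered, at the cost of listing the members by hand for every type.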
That's what I was wondering... how to specialise type_id but also keep using make_persistent.
Is this at the libpmemobj level or the libpmemobj++ level? It seems type numbers are a requirement for all libpmemobj++ usage, since they're used by make_persistent? In a way it's useful that the type_id is predictable from the name, since it might allow us to do the object swap that I outline in #783. However, there is also the need to make sure the object layout is what we expect it to be. How to meet both needs?
Yeah, it's pretty hard to come up with some general solution.
It's not really a requirement: you could just set the type_num to 0 for all types and it will work. There is a mode of allocation (with no metadata header) which does not support type nums at all. You can read about this here: https://pmem.io/pmdk/manpages/linux/master/libpmemobj/pmemobj_ctl_get.3 under POBJ_HEADER_NONE
For accurate automatic compatibility, you'd need compile-time reflection to reproducibly compute
Libpmemobj++ uses a type identifier to identify object types and passes it to libpmemobj. The type ID is sourced from here in common.hpp.
Looking at std::type_info::hash_code, it notes that type_info is implementation-specific. This is not normally an issue for volatile-memory-based programs since it's RTTI. However, since we are effectively persisting the hash_code beyond program runtime, does it become a risk for us?
It seems this means that:
Are there any checks beyond the layout string that confirm whether the pool layout is based on GCC, Clang, MSVC etc.?
Another concern is this:
While the hash-code collision issue is possible but hopefully unlikely, the bigger concern is what might drive a change between invocations? Could that include even the same binary? Or between two different compiles of the same "program"?
It seems one workaround might be to hash the type_info::name() string, as that might be more stable across compiles and compilers. E.g. from std::type_info::name:
However, then you run the risk of the programmer changing the class name and losing the reference to the original type identifier.
This might allow some options or workarounds to allow updates to object implementations (ref https://github.com/axomem/nucleus/issues/22) too.