-
Notifications
You must be signed in to change notification settings - Fork 931
opal/class: enable use of opal classes after opal_class_finalize #517
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This commit enables the use of opal classes after a call to opal_class_finalize. This is needed to support re-initializing opal after calling opal_finalize_util (). This is needed to support the following without leaking: MPI_T_init_thread (); MPI_T_finalize (); MPI_Init (); MPI_Finalize (); Before this commit the above code would crash in MPI_Init because the constructor array for some class was freed by opal_class_finalize (). The fix is to turn the cls_initialized member of opal_class_t into an init epoch identifier and compare it against opal_class_init_epoch instead of 1. On each call to opal_class_finalize the opal_class_init_epoch counter is incremented forcing re-initialization of classes after finalize. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
|
It looks like an extremely convoluted solution. Why is setting |
|
@bosilca Because the class may be in a dynamic component that has been dlclose'd. I couldn't find a way around that. |
|
I still like the idea of having a destructor function in opal. I feel it is a cleaner solution to the problem but I know there is some resistance to the idea. This fixes the underlying problem without a destructor. |
|
|
Refer to this link for build results (access rights to CI server needed): |
|
But if the component has been closed I don't see how your proposed patch fixes the problem. The destructor approach seems somehow cleaner, but still a stretch for a environment that clearly defines initialization and finalization functions. Why not extending the class structure to contain a pointer to the component that defines it. This will create a link between the registered classes and the component that registered them. Upon registration a class increases the ref_count of it's parent class, as well as the ref_count of the component holding the class definition. Upon class destruction, we do the opposite. The ref_count on the component will prevent the system from unloading them for as long as they have a class registered. |
|
If a dlclose'd component is re-loaded its classes will again have cls_initialized set to 0 and they will go through the class initialization. That has always worked and that behavior will not change. This commit ensures that we get the same behavior from classes not in dlclose'd components (opal/class, etc) by ensuring their cls_initialized member is stale without having to see which classes can still be accessed. As for adding a pointer to the component. That will require changing OBJ_CLASS_INSTANCE to include a pointer to a component (if there is one). That is a lot of work but may be the correct solution in the end. |
found at: http://www.open-mpi.org/community/lists/devel/2014/07/15140.php The finalize function is set using either the linker option -fini or __attribute__((destructor)) depending on compiler support. I have confirmed that this hybrid approach works with all the major compilers. The attribute is supported by gcc, clang, llvm, xlc, and icc. The fini function will support pgi. If a compiler/linker combination does not support either the destructor or fini function a message will be printed on re-init indicating it is not supported (an improvement over the current behavior-- SEGV). I moved the following to the destructor function: - Class system finalize. This solves a bug when MPI_T_finalize is called before MPI_Init. The only downside to this change is we will leave the footprint of the opal class system after MPI_Finalize. This footprint should be relatively small. This is an alternative to open-mpi#517 but the two PRs are not mutually-exclusive (with some modifications). This commit should also be safe for 1.8.x as it does not change internal or external ABI (open-mpi#517 changes internal ABI). Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This commit is related to an RFC from June 2014. Disscussion can be found at: http://www.open-mpi.org/community/lists/devel/2014/07/15140.php The finalize function is set using either the linker option -fini or __attribute__((destructor)) depending on compiler support. I have confirmed that this hybrid approach works with all the major compilers. The attribute is supported by gcc, clang, llvm, xlc, and icc. The fini function will support pgi. If a compiler/linker combination does not support either the destructor or fini function a message will be printed on re-init indicating it is not supported (an improvement over the current behavior-- SEGV). I moved the following to the destructor function: - Class system finalize. This solves a bug when MPI_T_finalize is called before MPI_Init. The only downside to this change is we will leave the footprint of the opal class system after MPI_Finalize. This footprint should be relatively small. This is an alternative to open-mpi#517 but the two PRs are not mutually-exclusive (with some modifications). This commit should also be safe for 1.8.x as it does not change internal or external ABI (open-mpi#517 changes internal ABI). Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This commit is related to an RFC from June 2014. Disscussion can be found at: http://www.open-mpi.org/community/lists/devel/2014/07/15140.php The finalize function is set using either the linker option -fini or __attribute__((destructor)) depending on compiler support. I have confirmed that this hybrid approach works with all the major compilers. The attribute is supported by gcc, clang, llvm, xlc, and icc. The fini function will support pgi. If a compiler/linker combination does not support either the destructor or fini function a message will be printed on re-init indicating it is not supported (an improvement over the current behavior-- SEGV). I moved the following to the destructor function: - Class system finalize. This solves a bug when MPI_T_finalize is called before MPI_Init. The only downside to this change is we will leave the footprint of the opal class system after MPI_Finalize. This footprint should be relatively small. This is an alternative to open-mpi#517 but the two PRs are not mutually-exclusive (with some modifications). This commit should also be safe for 1.8.x as it does not change internal or external ABI (open-mpi#517 changes internal ABI). Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This commit is related to an RFC from June 2014. Disscussion can be found at: http://www.open-mpi.org/community/lists/devel/2014/07/15140.php The finalize function is set using either the linker option -fini or __attribute__((destructor)) depending on compiler support. I have confirmed that this hybrid approach works with all the major compilers. The attribute is supported by gcc, clang, llvm, xlc, and icc. The fini function will support pgi. If a compiler/linker combination does not support either the destructor or fini function a message will be printed on re-init indicating it is not supported (an improvement over the current behavior-- SEGV). I moved the following to the destructor function: - Class system finalize. This solves a bug when MPI_T_finalize is called before MPI_Init. The only downside to this change is we will leave the footprint of the opal class system after MPI_Finalize. This footprint should be relatively small. This is an alternative to open-mpi#517 but the two PRs are not mutually-exclusive (with some modifications). This commit should also be safe for 1.8.x as it does not change internal or external ABI (open-mpi#517 changes internal ABI). Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This commit is related to an RFC from June 2014. Disscussion can be found at: http://www.open-mpi.org/community/lists/devel/2014/07/15140.php The finalize function is set using either the linker option -fini or __attribute__((destructor)) depending on compiler support. I have confirmed that this hybrid approach works with all the major compilers. The attribute is supported by gcc, clang, llvm, xlc, and icc. The fini function will support pgi. If a compiler/linker combination does not support either the destructor or fini function a message will be printed on re-init indicating it is not supported (an improvement over the current behavior-- SEGV). I moved the following to the destructor function: - Class system finalize. This solves a bug when MPI_T_finalize is called before MPI_Init. The only downside to this change is we will leave the footprint of the opal class system after MPI_Finalize. This footprint should be relatively small. This is an alternative to open-mpi#517 but the two PRs are not mutually-exclusive (with some modifications). This commit should also be safe for 1.8.x as it does not change internal or external ABI (open-mpi#517 changes internal ABI). Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This commit is related to an RFC from June 2014. Disscussion can be found at: http://www.open-mpi.org/community/lists/devel/2014/07/15140.php The finalize function is set using either the linker option -fini or __attribute__((destructor)) depending on compiler support. I have confirmed that this hybrid approach works with all the major compilers. The attribute is supported by gcc, clang, llvm, xlc, and icc. The fini function will support pgi. If a compiler/linker combination does not support either the destructor or fini function a message will be printed on re-init indicating it is not supported (an improvement over the current behavior-- SEGV). I moved the following to the destructor function: - Class system finalize. This solves a bug when MPI_T_finalize is called before MPI_Init. The only downside to this change is we will leave the footprint of the opal class system after MPI_Finalize. This footprint should be relatively small. This is an alternative to open-mpi#517 but the two PRs are not mutually-exclusive (with some modifications). This commit should also be safe for 1.8.x as it does not change internal or external ABI (open-mpi#517 changes internal ABI). Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
|
Can one of the admins verify this patch? |
|
Who or what is an "admin"?? |
|
I would still like to see this make it into master but I would like to verify that there are no performance implications of this change. I doubt that there are but we do have a lot of OBJ_CONSTRUCT calls in the critical path. |
This commit is related to an RFC from June 2014. Disscussion can be found at: http://www.open-mpi.org/community/lists/devel/2014/07/15140.php The finalize function is set using either the linker option -fini or __attribute__((destructor)) depending on compiler support. I have confirmed that this hybrid approach works with all the major compilers. The attribute is supported by gcc, clang, llvm, xlc, and icc. The fini function will support pgi. If a compiler/linker combination does not support either the destructor or fini function a message will be printed on re-init indicating it is not supported (an improvement over the current behavior-- SEGV). I moved the following to the destructor function: - Class system finalize. This solves a bug when MPI_T_finalize is called before MPI_Init. The only downside to this change is we will leave the footprint of the opal class system after MPI_Finalize. This footprint should be relatively small. This is an alternative to open-mpi#517 but the two PRs are not mutually-exclusive (with some modifications). This commit should also be safe for 1.8.x as it does not change internal or external ABI (open-mpi#517 changes internal ABI). Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
|
bot:retest |
|
@hjelmn Where are we on this? |
|
This PR works fine for what it was intended for. The problem is that now that we have an opal destructor it is no longer necessary except in the small number of cases where there is neither a destructor attribute or a fini function. This can happen on OS X with the PGI compiler. |
|
@bosilca What's your current feeling on this one? |
|
It does have it's benefit. Go for it 👍 ! |
|
:bot:retest |
opal/class: enable use of opal classes after opal_class_finalize
V2.x window updates
This commit enables the use of opal classes after a call to
opal_class_finalize. This is needed to support re-initializing opal
after calling opal_finalize_util (). This is needed to support the
following without leaking:
MPI_T_init_thread ();
MPI_T_finalize ();
MPI_Init ();
MPI_Finalize ();
Before this commit the above code would crash in MPI_Init because the
constructor array for some class was freed by opal_class_finalize ().
The fix is to turn the cls_initialized member of opal_class_t into an
init epoch identifier and compare it against opal_class_init_epoch
instead of 1. On each call to opal_class_finalize the
opal_class_init_epoch counter is incremented forcing re-initialization
of classes after finalize.
Signed-off-by: Nathan Hjelm hjelmn@lanl.gov