Bug for ctypes when calling name-conflicting functions in shared lib #118892

Closed
mingzeng8 opened this issue May 10, 2024 · 7 comments
Labels
topic-ctypes type-bug An unexpected behavior, bug, or error

Comments

@mingzeng8

mingzeng8 commented May 10, 2024

Bug report

Bug description:

Hi,
I'm new here, so apologies if this isn't written properly.
I found a bug when using ctypes on Ubuntu 22.04.4 LTS (64-bit); I tested with Python 3.10.12 and 3.12.2. The attachment shows how it occurs.
show_ctypes_bug.tar.gz
Unpack this file, go to the folder, and run show_bug.sh. The script first creates the shared library libfoo.so and links it into the executable main. Running main gives the expected output. The script then runs warper.py, which uses ctypes to call the function foo(), and the results are unexpected. The output is:

Excuting main
In func main.
In func foo.
In func hcreate.
In func hdestroy.
In func whateverelse.

Excuting python warper.py
In func foo.
In func whateverelse.

main correctly calls hcreate, hdestroy, and whateverelse, but python warper.py does not call the hcreate and hdestroy defined in libfoo.so. The cause is that the user-defined names hcreate and hdestroy conflict with functions in the system library. The normally linked executable main handles the conflict correctly, but under ctypes, foo() appears to call the system library's functions instead of its own.

The test files are as follows:
foo.c

#include <stdio.h>
int hcreate(){
    printf("In func hcreate.\n");
    return 0;
}

int hdestroy(){
    printf("In func hdestroy.\n");
    return 0;
}

int whateverelse(){
    printf("In func whateverelse.\n");
    return 0;
}

int foo(){
    printf("In func foo.\n");
    hcreate();
    hdestroy();
    whateverelse();
    return 0;
}

main.c

#include <stdio.h>
int foo();

int main(int argc, char** argv){
    printf("In func main.\n");
    foo();
    return 0;
}

warper.py

from ctypes import *

dll = cdll.LoadLibrary("./libfoo.so")
dll.foo()

show_bug.sh

rm *.o
rm *.so
gcc -c foo.c
gcc -c main.c

gcc foo.o -shared -o libfoo.so
gcc main.o -o main -L. -lfoo
#ar rcs libfoo.a foo.o
#gcc main.o -o main -L. -l:libfoo.a

echo
export LD_LIBRARY_PATH=.
echo Excuting main
./main
echo
echo Excuting python warper.py
python warper.py

CPython versions tested on:

3.12

Operating systems tested on:

Linux

@mingzeng8 mingzeng8 added the type-bug An unexpected behavior, bug, or error label May 10, 2024
@eryksun
Contributor

eryksun commented May 10, 2024

This is the expected behavior for the given implementation of libfoo. Here are some options to explore:

  • If you want hcreate(), hdestroy() and whateverelse() to be local to the compilation unit, declare them as static.
  • With glibc, if you need the symbols outside of the compilation unit, mark the exported function with default visibility using __attribute__((visibility("default"))) int foo(). Then compile libfoo with the symbol visibility option -fvisibility=hidden.
  • With glibc, if you want all of the functions to be exported, you can compile with -fvisibility=protected to ensure that the libfoo always uses its own protected symbols. However, that's discouraged, as discussed by Drepper in "How To Write Shared Libraries".
  • Another option on Linux is to load the shared library using ctypes.CDLL('./libfoo.so', mode=RTLD_DEEPBIND). Using RTLD_DEEPBIND (8) places the lookup scope of the symbols in the shared object ahead of the global scope. This is not standardized by POSIX. I think RTLD_DEEPBIND is also supported by FreeBSD, but not macOS or OpenBSD. This might actually be similar to the default symbol lookup behavior on macOS, but I don't know. OpenBSD has RTLD_SELF.
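The effect of these options can be compared without the attached tarball. The following is a hypothetical harness, not part of the original report. It writes a cut-down foo.c to a temporary directory, builds it with gcc, and calls foo() from a fresh Python subprocess so the library's printf output can be captured. It assumes Linux with glibc and gcc on PATH, and it adds -fPIC, which the original show_bug.sh did not pass. To avoid calling glibc's hcreate() with a garbage argument, it uses getpid() as the conflicting name instead. Like hcreate(), getpid() is already defined by libc before libfoo.so is loaded, but libc's version is safe to call with no arguments.

```python
import ctypes
import os
import shutil
import subprocess
import sys
import tempfile

# Cut-down foo.c (assumption: getpid() stands in for hcreate()/hdestroy(),
# because libc's getpid() is harmless to call without arguments).
FOO_C = r"""
#include <stdio.h>
int getpid(void) { printf("In func getpid.\n"); return 0; }
__attribute__((visibility("default")))
int foo(void) {
    printf("In func foo.\n");
    getpid();
    return 0;
}
"""

# Runs in a child interpreter so that C-level stdout can be captured.
CHILD = "import ctypes, sys; ctypes.CDLL(sys.argv[1], mode=int(sys.argv[2])).foo()"


def run_foo(mode, extra_cflags=()):
    """Build libfoo.so with extra_cflags, load it with `mode`, call foo(),
    and return the lines foo() printed."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "foo.c")
        lib = os.path.join(tmp, "libfoo.so")
        with open(src, "w") as f:
            f.write(FOO_C)
        subprocess.run(["gcc", "-shared", "-fPIC", *extra_cflags, "-o", lib, src],
                       check=True)
        result = subprocess.run([sys.executable, "-c", CHILD, lib, str(mode)],
                                capture_output=True, text=True, check=True)
        return result.stdout.splitlines()


if __name__ == "__main__" and shutil.which("gcc") and hasattr(os, "RTLD_DEEPBIND"):
    print("default:            ", run_foo(ctypes.DEFAULT_MODE))
    print("RTLD_DEEPBIND:      ", run_foo(os.RTLD_DEEPBIND))
    print("-fvisibility=hidden:", run_foo(ctypes.DEFAULT_MODE, ["-fvisibility=hidden"]))
```

With the default mode, the call to getpid() inside foo() should bind to libc's copy, so only "In func foo." is printed. Both RTLD_DEEPBIND and a -fvisibility=hidden build should make foo() call the library's own getpid().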

@mingzeng8
Author

Hi, thanks a lot for your review. Here are my replies:

  1. Declaring them static solves the problem in this example, but it can't be used in the real case I am facing, because hcreate() and hdestroy() come from an old library named SDDS (https://www.aps.anl.gov/Accelerator-Operations-Physics/Software). This attachment mimics the project (to use it: unpack, go to the folder, and run show_bug.sh).
    Mimic_ELEGANT.tar.gz
  2. Using just the -fvisibility=hidden option solves this problem. In the real situation, I have to add this compiler flag to the SDDS configuration.
  3. The -fvisibility=protected compiler option works for the simple example, but it triggers an error when building a shared library from the old project: /usr/bin/ld: motion.o: relocation R_X86_64_PC32 against protected symbol `_Z25stochastic_laserModulatorPddd' can not be used when making a shared object
  4. I think the best solution would be telling ctypes to search for the names inside the shared lib first. In my case, using mode=RTLD_DEEPBIND gives an error:
    NameError: name 'RTLD_DEEPBIND' is not defined
    Maybe this is because my glibc version is 2.31, so I cannot use this mode unless I upgrade my system.

My expectation is that functions loaded by ctypes should search for names inside the shared lib first, just as the main executable does. In the real situation, hcreate() from the SDDS library returns a pointer, and get_beamline() operates on the structure that pointer points to. Because the function loaded by ctypes does not call the correct hcreate(), the program exits with the message "Terminated by SIGSEGV" and no further useful information. Meanwhile, the regular executable built by gcc does not have this problem. It took me a week to find the actual cause.

I think it would be good for ctypes to search for names inside the shared library first by default, or at least to warn users about name conflicts. I do physics research myself, and I did not know that hcreate() and hdestroy() are in the standard library. Even if I had known, I would not have expected a function loaded by ctypes to call the standard library's versions instead of the functions defined in the same shared library. This behavior can be dangerous, because it differs from what a regular executable does.

@eryksun
Contributor

eryksun commented May 11, 2024

I think the best solution would be telling ctypes to search for the names inside the shared lib first. In my case, using mode=RTLD_DEEPBIND gives an error:

On Linux, the value of the mode constant RTLD_DEEPBIND is 8. This constant is available as os.RTLD_DEEPBIND.
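The NameError comes from the fact that ctypes itself defines only RTLD_GLOBAL and RTLD_LOCAL, so `from ctypes import *` never brings RTLD_DEEPBIND into scope. A small helper that takes the constant from os and fails clearly where it is missing might look like this; the name deepbind_mode() is hypothetical, not part of ctypes.

```python
import ctypes
import os


def deepbind_mode():
    """Return a dlopen() mode for ctypes.CDLL that includes RTLD_DEEPBIND.

    ctypes only defines RTLD_GLOBAL and RTLD_LOCAL, which is why a bare
    RTLD_DEEPBIND raises NameError; the constant is exposed by os instead.
    """
    deepbind = getattr(os, "RTLD_DEEPBIND", None)
    if deepbind is None:
        # e.g. macOS, or a libc whose headers do not define RTLD_DEEPBIND
        raise NotImplementedError("RTLD_DEEPBIND is not available on this platform")
    return ctypes.RTLD_LOCAL | deepbind


# Usage, with the library from the report:
#   lib = ctypes.CDLL("./libfoo.so", mode=deepbind_mode())
#   lib.foo()
```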

My expectation is that functions loaded by ctypes should search for names inside the shared lib first, just as the main executable does.

POSIX dlsym() does work like that. Thus calling dll.hcreate() works as expected if hcreate is exported by the shared library. The issue is symbol lookup for the relocation references used by foo() when hcreate, hdestroy, and whateverelse are unprotected global symbols instead of local symbols (i.e., when they are neither declared static nor given "hidden" visibility). By default, when the library is loaded via dlopen(), these relocations are resolved against symbols already loaded in the process (e.g. the Python process), per what's called "load order". With RTLD_DEEPBIND, however, the loader on Linux prefers the shared object's own exported symbols instead. Here's the exact language specified by POSIX for the default behavior of dlopen():

Symbols introduced into the process image through calls to dlopen() may be used in relocation activities. Symbols so introduced may duplicate symbols already defined by the program or previous dlopen() operations. To resolve the ambiguities such a situation might present, the resolution of a symbol reference to symbol definition is based on a symbol resolution order. Two such resolution orders are defined: load order and dependency order. Load order establishes an ordering among symbol definitions, such that the first definition loaded (including definitions from the process image file and any dependent executable object files loaded with it) has priority over executable object files added later (by dlopen()). Load ordering is used in relocation processing [my emphasis]. Dependency ordering uses a breadth-first order starting with a given executable object file, then all of its dependencies, then any dependents of those, iterating until all dependencies are satisfied. With the exception of the global symbol table handle obtained via a dlopen() operation with a null pointer as the file argument, dependency ordering is used by the dlsym() function. Load ordering is used in dlsym() operations upon the global symbol table handle.
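The load-order effect can be observed from Python itself. ctypes.CDLL(None) wraps dlopen(NULL), the global symbol table handle mentioned at the end of the passage, so looking a name up on it shows whether the process already defines that symbol before libfoo.so is loaded. This is a minimal sketch assuming Linux; the helper name defined_before_load() is illustrative only.

```python
import ctypes
import sys


def defined_before_load(name):
    """Return True if `name` already resolves in this process's global scope.

    ctypes.CDLL(None) wraps dlopen(NULL), so the lookup covers the executable
    and the libraries loaded with it (libc included), in load order.
    Missing symbols make ctypes raise AttributeError, so hasattr() works.
    """
    return hasattr(ctypes.CDLL(None), name)


if sys.platform.startswith("linux"):
    for name in ("hcreate", "hdestroy", "whateverelse"):
        print(name, defined_before_load(name))
```

On a glibc system, hcreate and hdestroy should already be found (they come from libc), while whateverelse should not, which matches the reported output: only whateverelse() ran from libfoo.so under the default load.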

@eryksun
Contributor

eryksun commented May 11, 2024

Maybe this is because my glibc version is 2.31, so I cannot use this mode unless I upgrade my system.

I forgot to address this. RTLD_DEEPBIND was added in glibc 2.3.4, which was released in 2004. glibc 2.31 was released in 2020.
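To check which glibc the interpreter actually runs against, the standard library's platform.libc_ver() reports it. The comparison below is a hypothetical helper that encodes the 2.3.4 cutoff stated in this comment; it parses dotted version strings into tuples so that, for example, 2.31 compares above 2.3.4.

```python
import platform

# RTLD_DEEPBIND appeared in glibc 2.3.4, per the comment above.
DEEPBIND_MIN_GLIBC = (2, 3, 4)


def parse_version(text):
    """Turn a dotted version string such as '2.31' into a tuple like (2, 31)."""
    return tuple(int(part) for part in text.split("."))


def glibc_has_deepbind(version):
    """True if a glibc of this version supports RTLD_DEEPBIND."""
    return parse_version(version) >= DEEPBIND_MIN_GLIBC


if __name__ == "__main__":
    libc, version = platform.libc_ver()
    if libc == "glibc":
        print(f"glibc {version}: RTLD_DEEPBIND supported = {glibc_has_deepbind(version)}")
```

Tuple comparison handles differing lengths: (2, 3) is less than (2, 3, 4), so a bare "2.3" is correctly treated as too old.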

@mingzeng8
Author

Thanks a lot for your detailed explanation. I've learned a lot. The mode option does not work, unfortunately.

On Linux, the value of the mode constant RTLD_DEEPBIND is 8. This constant is available as os.RTLD_DEEPBIND.

In my example Mimic_ELEGANT.tar.gz, setting mode=8 or mode=os.RTLD_DEEPBIND in the ctypes.CDLL() call in warper.py does not solve the problem: the function loaded by ctypes still does not call the intended hcreate() and hdestroy(). The output of show_bug.sh is:

Executing main
In func get_beamline.
In func hcreate.
hcreate() has returned 0.
In func hdestroy.
hdestroy() has returned 0.
In func whateverelse.
whateverelse() has returned 0.
get_beamline() has returned 0.

Executing python warper.py
In func get_beamline.
hcreate() has returned 1.
hdestroy() has returned 0.
In func whateverelse.
whateverelse() has returned 0.
get_beamline() has returned 0.

python warper.py still does not call the hcreate() and hdestroy() defined inside the shared lib.

@eryksun
Contributor

eryksun commented May 11, 2024

Use the following for warper.py:

import os
from ctypes import *

lib = CDLL("./libelegant.so", mode=os.RTLD_DEEPBIND)
lib.elegant_main()

There was an initial cdll.LoadLibrary("./libelegant.so") that didn't belong there. cdll.LoadLibrary() does a default load, which defeats the purpose of using RTLD_DEEPBIND in the subsequent CDLL load.
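The reason a prior default load defeats RTLD_DEEPBIND is that dlopen() on a library that is already loaded does not map a second copy: it increments the reference count and returns the existing handle, so the bindings made at the first load stay in effect (as far as we know, glibc does not redo relocation processing for the later RTLD_DEEPBIND request). This can be checked by comparing the handles ctypes receives. The sketch below relies on CDLL._handle, a private ctypes attribute, and uses libm.so.6 as an assumed stand-in for libelegant.so on a glibc system.

```python
import ctypes
import os
import sys


def same_handle(path, first_mode, second_mode):
    """Load `path` twice with different modes; True if dlopen() reused the handle.

    CDLL._handle is a private ctypes attribute holding the raw dlopen()
    handle; it is used here only for illustration.
    """
    first = ctypes.CDLL(path, mode=first_mode)
    second = ctypes.CDLL(path, mode=second_mode)
    return first._handle == second._handle


if __name__ == "__main__" and sys.platform.startswith("linux") and hasattr(os, "RTLD_DEEPBIND"):
    # libm.so.6 stands in for libelegant.so (assumption: a glibc system).
    print(same_handle("libm.so.6", ctypes.DEFAULT_MODE, os.RTLD_DEEPBIND))
```

If this prints True, the second CDLL() call received the object that was already relocated under the default mode, which is why the RTLD_DEEPBIND load must be the first load of the library in the process.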

Here's the output I get:

$ ./show_bug.sh 

Executing main
In func get_beamline.
In func hcreate.
hcreate() has returned 0.
In func hdestroy.
hdestroy() has returned 0.
In func whateverelse.
whateverelse() has returned 0.
get_beamline() has returned 0.

Executing python warper.py
In func get_beamline.
In func hcreate.
hcreate() has returned 0.
In func hdestroy.
hdestroy() has returned 0.
In func whateverelse.
whateverelse() has returned 0.
get_beamline() has returned 0.

@mingzeng8
Author

Hi @eryksun, this solves my problem perfectly. I really appreciate it. Although mode=8 is not the default, I will probably recommend that my colleagues always add it.
I think we can close this issue.

@eryksun eryksun closed this as not planned May 11, 2024
2 participants