Fork overview, and thoughts to improve basis for chance of upstreaming #1

mara004 · 2023-11-11T16:03:35Z

See below for an overview of this fork. Note, this writeup is a non-exhaustive work in progress.

This information may be valuable for working towards a basis that could be merged back into upstream at some point, though this seems fairly hypothetical for the near term, given time constraints and mismatched design intents (e.g. regarding backwards compatibility).

However, this fork of ctypesgen may be a good starting point for any active future development, with a significantly overhauled code base that should be nicer to work with.

Selection of improvements from this fork

Removal of bloated old string classes that scream technical debt.
Enforcement of explicit string encoding/decoding. (We might want to add back implicit string handling as opt-in in the future, see below.)
Note, the old string classes are incompatible with some Python releases of the 3.7/3.8 branches.
See also "Incompatibility with python 3.8.1" (pypdfium2#76), "String seems incorrect" (ctypesgen/ctypesgen#77), "bpo-16575: Add checks for unions passed by value to functions" (python/cpython#16799), and "Make string auto-conversion optional & use leaner strings classes" (ctypesgen/ctypesgen#177).
Bloated old library loader replaced with new lean library loader that is more explicit/controllable.
See also "Are library loader classes actually necessary?" (ctypesgen/ctypesgen#176), and 569dc4b for some oversights/peculiarities in the old library loader.
Resolve . to the module directory, not the caller's CWD. Don't add compile libdirs to runtime.
Preventing the assignment of invalid/non-existent struct fields by correcting the __slots__ declaration. This fix should be fairly easy for upstream to pick. See also "Correct definition of __slots__ on structs (must be defined in class body)" (ctypesgen/ctypesgen#183).
Implemented relative imports with --link-modules, and library handle sharing with --no-embed-preamble. Removed incorrect POINTER override. This properly fixes "Recognize structs from common header files in different wrapper modules" (ctypesgen/ctypesgen#86, shared headers), and allows dividing the bindings to a library into multiple outputs (e.g. translating each header to a separate Python file).
More powerful/flexible means of control over symbol inclusion via --symbol-rules.
Pre-processor auto-detection and significant improvements to call style (see 7559e81).
Removed questionable UNCHECKED wrapper from preamble.
Do not bypass c_void_p -> int auto-conversion (see Readme or commit for background).
Propagate exception if no output members were found. (Previously would have been a warning, but the if-check was defunct.)
New style-related printer options that allow disabling symbol if-guards¹ and macro guards.
Proper newline concept for the python printer, see a538742.
Free library handles after use, to allow for in-session deletion of DLLs. This makes it possible to activate a formerly skipped test case on Windows.
Internal code cleanups and test suite improvements.
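
To illustrate the __slots__ point from the list above, here is a minimal sketch using plain ctypes. The struct names and fields are made up for the example and are not taken from generated bindings. It assumes that the old output assigned __slots__ after the class had been created, which has no effect, whereas declaring it in the class body removes the instance __dict__ so that unknown attributes are rejected.

```python
import ctypes


class OldStyle(ctypes.Structure):
    pass

# Assigned after class creation: just an ordinary class attribute, so
# instances still have a __dict__ and accept arbitrary attributes.
OldStyle.__slots__ = ["x", "y"]
OldStyle._fields_ = [("x", ctypes.c_int), ("y", ctypes.c_int)]


class NewStyle(ctypes.Structure):
    # Declared in the class body: instances get no __dict__.
    __slots__ = ["x", "y"]

NewStyle._fields_ = [("x", ctypes.c_int), ("y", ctypes.c_int)]


def accepts_unknown_field(struct_type):
    """Return True if assigning an undeclared attribute silently succeeds."""
    obj = struct_type()
    try:
        obj.z = 1  # 'z' is not a declared field
    except AttributeError:
        return False
    return True


print(accepts_unknown_field(OldStyle))  # True
print(accepts_unknown_field(NewStyle))  # False
```

Declared fields keep working in both variants; only typos and non-existent members are caught in the second one.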

Small, self-contained fixes have usually been submitted upstream and may have been merged.

Fixed the conflicting names resolver never actually being called: see 105a3c6 and "Fix conflicting names resolver being never called" (ctypesgen/ctypesgen#193). Unfortunately, what the code does is still poor.

Points to consider

Restoring implicit UTF-8 string encoding/decoding as optional?

ctypesgen originally did implicit UTF-8 encoding/decoding of in/out strings.
While that tends to be bad practice and callers had better handle strings explicitly instead, it would seem reasonable to retain an optional backward compatibility layer for existing callers.
I also imagine it might be convenient for a library that consistently uses UTF-8 for everything.

Adding the old string classes back is certainly not an option for us. However, it may be possible to create a lean replacement. See "Make string auto-conversion optional & use leaner strings classes" (ctypesgen/ctypesgen#177) for a suggestion (copy below), or "String seems incorrect" (ctypesgen/ctypesgen#77, comment) for an alternative draft by @olsonse.
Note that in/out must be handled in a single class.

The Windows-specific stdcall convention

Our fork lost it for simplicity while rewriting the library loader. It should be fairly easy to add back; the open questions are how to test it (as this lies beyond our use case) and how to integrate it nicely.

Does the calling convention really have to be decided on function level with two library handles for cdecl/stdcall, or would it be sufficient to decide at library level, with a single handle? Is there any example of a single library actually exporting functions with different calling conventions?
Note that the ctypes API is designed around deciding at library handle level, not at function level, which suggests the expected use case is a library ABI with homogeneous calling convention.

Possible resolution: Added an option to take a caller-given dll class. It requires a small user interaction and does not support mixed calling conventions, but seems like a nice bloatless way to support a pure stdcall binary.
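
As a rough sketch of the "caller-given DLL class" idea: ctypes ties the calling convention to the loader class (ctypes.CDLL for cdecl, ctypes.WinDLL for stdcall on Windows), so a loader only needs to accept that class as a parameter. The function name `load_library` and its `dllclass` parameter are hypothetical and do not reflect the fork's actual interface.

```python
import ctypes
import sys


def load_library(path, dllclass=ctypes.CDLL):
    """Open `path` with the given ctypes loader class (hypothetical helper).

    The calling convention is decided once, for the whole library handle:
    ctypes.CDLL means cdecl, ctypes.WinDLL means stdcall (Windows only).
    """
    return dllclass(path)


# ctypes.WinDLL only exists on Windows, so it is looked up conditionally.
stdcall_class = ctypes.WinDLL if sys.platform == "win32" else None

# Usage for a pure stdcall binary on Windows (library name is a placeholder):
# lib = load_library("example.dll", dllclass=ctypes.WinDLL)
```

This keeps a single handle per library and, as noted above, does not cover a binary that mixes calling conventions.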

Removal of support for multiple libraries in one bindings file

This feature was a significant complexity burden in some code areas, including pollution around symbols in printer/output code. For now we decided to remove it - callers can use --no-embed-preamble and --link-modules to create separate bindings files. This also encourages individual/explicit rather than unified loader config.

However, see "Recognize structs from common header files in different wrapper modules" (ctypesgen/ctypesgen#86, comment) for some interesting considerations regarding a possible cleaner re-implementation.

Other notes

Shifts in design intent: We would prefer to stick with plain ctypes as much as possible and avoid cluttering the bindings with custom wrappers.
CLI: We changed the command-line interface from action=append to action=extend and nargs=+/*. This implied switching headers from positional to flag argument to avoid confusion/interference with flags that take multiple arguments. There are more CLI changes not listed here, see diff for details.
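
The difference between the two argparse styles can be sketched as follows. The flag names are placeholders chosen for the example and are not necessarily the fork's real options; action="extend" requires Python 3.8 or later.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="demo")
    # Old style: exactly one value per flag occurrence.
    parser.add_argument("--old-flag", action="append", default=[])
    # New style: several values per occurrence, merged into one flat list.
    parser.add_argument("--headers", nargs="+", action="extend", default=[])
    parser.add_argument("--libs", nargs="*", action="extend", default=[])
    return parser


args = build_parser().parse_args(
    ["--headers", "a.h", "b.h", "--libs", "foo", "--headers", "c.h"]
)
print(args.headers)  # ['a.h', 'b.h', 'c.h']
print(args.libs)     # ['foo']
```

Had the headers remained positional, a header placed right after `--libs foo` would have been consumed as another value of `--libs`, which is the kind of interference the switch to a flag argument avoids.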

Done tasks

Restored test suite usability by adapting to fork changes.
Restored macro guards as opt-out

¹ Note, this is meant for use with inherently ABI-correct packaging only.


mara004 · 2023-11-11T16:53:32Z

Dumping my lean string class replacement draft below as it's not very well visible in the PR diff.
This may be a slightly updated version.

import ctypes


class _wraps_c_char_p:
    # keeps both the raw ctypes object and its bytes value
    def __init__(self, raw, value):
        self.raw = raw
        self.value = value

    # provided for clarity, not actually necessary due to __getattr__ wrapper below
    def decode(self, encoding="utf-8", errors="strict"):
        return self.value.decode(encoding, errors=errors)

    def __str__(self):
        return self.decode()

    # forward any other attribute access to the underlying bytes object
    def __getattr__(self, attr):
        return getattr(self.value, attr)


class String(ctypes.c_char_p):
    # out: wrap non-NULL return values, pass NULL through as None
    @classmethod
    def _check_retval_(cls, result):
        value = result.value
        return value if value is None else _wraps_c_char_p(result, value)

    # in: implicitly encode str arguments as UTF-8
    @classmethod
    def from_param(cls, obj):
        if isinstance(obj, str):
            obj = obj.encode("utf-8")
        return super().from_param(obj)

mara004 · 2023-12-04T15:53:01Z

Another improvement that comes to my mind for autostrings would be making the kind of encoding configurable.

For example, pdfium mostly uses UTF-16LE, so autostrings with this encoding might actually be convenient for pypdfium2. Formally, though, a default encoding remains a problem: it would still be a concern with any APIs that use other encodings, like UTF-8 or ASCII.
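
A configurable encoding could be sketched as a small class factory. The names `make_autostring` and `AutoString` are hypothetical and not part of ctypesgen. The sketch covers only the input direction, and because it builds on c_char_p it is only suitable for byte-oriented encodings such as UTF-8, ASCII or Latin-1. UTF-16LE data contains zero bytes, so it would likely need a different base type and its own terminator handling.

```python
import ctypes


def make_autostring(encoding="utf-8"):
    """Create a c_char_p subclass that encodes str arguments with `encoding`.

    Hypothetical helper for illustration; input direction only.
    """

    class AutoString(ctypes.c_char_p):
        @classmethod
        def from_param(cls, obj):
            # Encode str implicitly; bytes and None pass through unchanged.
            if isinstance(obj, str):
                obj = obj.encode(encoding)
            return super().from_param(obj)

    AutoString.encoding = encoding
    return AutoString


# One type per encoding; each would be used in a function's argtypes.
Utf8String = make_autostring("utf-8")
Latin1String = make_autostring("latin-1")
```

With this shape, the encoding is chosen once when the bindings are set up rather than hard-coded, but a single library-wide default still does not help with APIs that mix encodings.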

mara004 pinned this issue Nov 11, 2023

mara004 changed the title ~~Improving basis for chance of merging back into upstream~~ Fork overview, and thoughts to improve basis for chance of upstreaming Nov 11, 2023

mara004 mentioned this issue Dec 4, 2023

Make string auto-conversion optional & use leaner strings classes ctypesgen/ctypesgen#177

Open

This was referenced Dec 16, 2023

Include statements included in target header files ctypesgen/ctypesgen#87

Open

Discussing possible fusion with pypdfium2-team development fork ctypesgen/ctypesgen#195

Open

mara004 mentioned this issue Jan 8, 2024

Ideas #4

Open
