[nanvix] E: Remove lxml built-in shim from cpython #22
Open
esaurez wants to merge 1 commit into
lxml is a third-party package, not a CPython stdlib module, and the way it was integrated on Nanvix diverges from how upstream CPython loads third-party extensions.

Upstream ships no lxml at all: on Linux `pip install lxml` builds native CPython extension modules (lxml/etree.cpython-<plat>.so in site-packages, carrying DT_NEEDED for libxml2/libxslt) which the standard ExtensionFileLoader dlopens. No CPython modifications, no Setup.local entry, no C trampoline.

Nanvix instead carried a band-aid: because makesetup cannot express dotted module names, the Cython output was registered under the flat name `_lxml_etree` via Modules/Setup.local, fronted by a C trampoline (lxml_etree_builtin.c) that forwarded PyInit, plus a Python bridge (lxml/etree.py) re-exporting it under the dotted lxml.etree name, plus a system shared library (liblxml_etree.so) separate from the extension. That is two artifacts where Linux has one, and several layers of glue that exist only to route a third-party package through the stdlib build machinery.

This removes the entire band-aid from CPython. lxml will return the upstream way once the nanvix/lxml port emits native CPython extension modules into site-packages -- at which point CPython's standard importer loads lxml.etree directly with zero CPython-side changes. Tracked in nanvix-todo/lxml-port-ship-as-native-cpython-extensions.md.

Removed
-------

- Modules/lxml_etree_builtin.c, Modules/lxml_elementpath_builtin.c (the flat-name PyInit trampolines).
- .nanvix/setup_local.py: _lxml_etree / _lxml_elementpath entries.
- .nanvix/lxml.py: deleted. Its generate_setup_local() (general Setup.local generation, not lxml-specific despite the file name) is moved into .nanvix/setup_local.py next to render_setup_local(); the unused clear_setup_local() and the lxml runtime staging (stage_lxml_runtime / _ETREE_SHIM) are dropped.
- .nanvix/build.py / package.py / test.py: drop the lxml module import, the lxml runtime staging calls, and the standalone lxml.etree import/parse smoke snippet.
- .nanvix/z.py: drop lxml / libxml2 / libxslt from _DEP_EXPECTED_LIBS and the now-dead python-packages/ payload extraction (only lxml used it; the native-extension form will not).
- .nanvix/nanvix.toml: drop libxml2 / libxslt / lxml dependency pins.
- .nanvix/config.py: drop test_nanvix_lxml from the regrtest list.
- Lib/test/test_nanvix_lxml.py: deleted.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Summary
Removes the lxml built-in shim from CPython entirely. lxml is a third-party package, not a CPython stdlib module, and the way it was integrated on Nanvix diverges from how upstream CPython loads third-party extensions. This PR aligns Nanvix with upstream: CPython should contain no lxml at all.
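As an illustration of the difference between the two loading paths (a sketch, not part of this PR), the snippet below reports which importer would handle a given module. The helper name `loader_kind` is hypothetical; it relies only on the standard `importlib.util.find_spec`. On a stock Linux CPython with lxml installed via pip, `lxml.etree` would be expected to resolve to `ExtensionFileLoader`, whereas a module compiled into the interpreter (as the flat-name shim was) resolves through `BuiltinImporter`.

```python
import importlib.machinery
import importlib.util


def loader_kind(module_name):
    """Return the class name of the loader that would import module_name.

    Returns None when the module (or its parent package) cannot be found.
    Note: for dotted names, find_spec imports the parent package first.
    """
    try:
        spec = importlib.util.find_spec(module_name)
    except (ImportError, ValueError):
        # ImportError: a parent package is missing.
        # ValueError: the module is loaded but has no usable __spec__.
        return None
    if spec is None or spec.loader is None:
        return None
    loader = spec.loader
    # BuiltinImporter is used as a class; file-based loaders are instances.
    cls = loader if isinstance(loader, type) else type(loader)
    return cls.__name__


if __name__ == "__main__":
    # File suffixes that ExtensionFileLoader accepts on this platform.
    print(importlib.machinery.EXTENSION_SUFFIXES)
    for name in ("sys", "json", "lxml.etree"):
        print(name, "->", loader_kind(name))
```

Running this before and after the native-extension port lands is one way to confirm that `lxml.etree` is being picked up from site-packages rather than from anything built into the interpreter.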
How upstream CPython handles lxml
It doesn't: lxml is third-party. On Linux, `pip install lxml` builds native CPython extension modules (`lxml/etree.cpython-<plat>.so` in site-packages, carrying `DT_NEEDED` for `libxml2`/`libxslt`), which CPython's standard `ExtensionFileLoader` dlopens. Zero CPython modifications, no `Setup.local` entry, no C trampoline, no Python bridge.
The band-aid being removed
Because `makesetup` cannot express dotted module names, Nanvix registered the Cython output under the flat name `_lxml_etree` via `Modules/Setup.local`, fronted by a C trampoline (`lxml_etree_builtin.c`) forwarding `PyInit`, plus a Python bridge (`lxml/etree.py`) re-exporting it under the dotted `lxml.etree` name, plus a separate system shared library (`liblxml_etree.so`). That is two artifacts where Linux has one, and several glue layers that exist only to route a third-party package through the stdlib build machinery.
What this restores
lxml returns the upstream way once the `nanvix/lxml` port emits native CPython extension modules into site-packages, at which point CPython's standard importer loads `lxml.etree` directly with zero CPython-side changes. This is tracked as a follow-up change to the `nanvix/lxml` port repo.
Removed
- `Modules/lxml_etree_builtin.c`, `Modules/lxml_elementpath_builtin.c` (the flat-name `PyInit` trampolines).
- `.nanvix/setup_local.py`: `_lxml_etree` / `_lxml_elementpath` entries.
- `.nanvix/lxml.py`: deleted. Its `generate_setup_local()` (general `Setup.local` generation, not lxml-specific despite the file name) is moved into `.nanvix/setup_local.py` next to `render_setup_local()`; the unused `clear_setup_local()` and the lxml runtime staging (`stage_lxml_runtime` / `_ETREE_SHIM`) are dropped.
- `.nanvix/build.py` / `package.py` / `test.py`: drop the lxml module import, the lxml runtime staging calls, and the standalone `lxml.etree` import/parse smoke snippet.
- `.nanvix/z.py`: drop `lxml` / `libxml2` / `libxslt` from `_DEP_EXPECTED_LIBS` and the now-dead `python-packages/` payload extraction (only lxml used it; the native-extension form will not).
- `.nanvix/nanvix.toml`: drop `libxml2` / `libxslt` / `lxml` dependency pins.
- `.nanvix/config.py`: drop `test_nanvix_lxml` from the regrtest list.
- `Lib/test/test_nanvix_lxml.py`: deleted.
Base / relationship to other PRs
Base: `feat/wave5-pr-a-stdlib-so` ([nanvix] E: Build stdlib extensions as .so nanvix/cpython#732). Independent of the `_ssl`/`_hashlib`/`_ctypes` externalization in [nanvix] E: Build _ssl/_hashlib/_ctypes as .so nanvix/cpython#738; this is a sibling cleanup, not stacked on it.
Validation
- `Modules/Setup.local` renders cleanly with exactly one `*static*` and one `*shared*` marker, no lxml entries.
- `./z lint` (black + pyright) clean.
- `pre-commit run` clean on all changed files.
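The first validation item could be automated along these lines. This is a minimal sketch under stated assumptions: `check_setup_local` is a hypothetical helper (it is not a function in `.nanvix/setup_local.py`), it assumes each marker sits on its own line, and the sample module entry is made up for the demonstration.

```python
def check_setup_local(text):
    """Return a list of problems found in rendered Setup.local text.

    An empty list means: exactly one *static* marker, exactly one
    *shared* marker, and no non-comment line mentioning lxml.
    """
    problems = []
    lines = [line.strip() for line in text.splitlines()]

    # Each marker is expected exactly once, on a line of its own.
    for marker in ("*static*", "*shared*"):
        count = lines.count(marker)
        if count != 1:
            problems.append(f"expected one {marker} marker, found {count}")

    # Any remaining lxml entry means the shim was not fully removed.
    for number, line in enumerate(lines, start=1):
        if line.startswith("#"):
            continue  # comments may still mention lxml
        if "lxml" in line.lower():
            problems.append(f"line {number}: lxml entry: {line}")

    return problems


if __name__ == "__main__":
    # Hypothetical rendered content, for illustration only.
    sample = "*static*\nexamplemod examplemod.c\n*shared*\n"
    print(check_setup_local(sample))
```

Returning a list of messages rather than raising on the first failure keeps the check usable both as a test assertion (`assert not check_setup_local(text)`) and as a lint-style report.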