This repository has been archived by the owner on Apr 29, 2023. It is now read-only.

Assorted fixes: Regex, reproducibility, and more. #97

Open. Wants to merge 12 commits into base: next


Conversation

jonfoster (Contributor) commented Mar 26, 2018

Hi,

Here are some assorted fixes I've made to PyXB.

More details are in the individual commit messages.

Thanks to Eurofins Digital Testing for paying me to work on most of these changes.

Kind regards,

Jon

Rework XML regex code to follow the standards.  It now completely parses the XML regex and generates a corresponding Python regex, instead of relying on the syntaxes being "close enough".
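
As a rough illustration of why the two syntaxes are not "close enough", here is a minimal sketch (not the code from this PR). It assumes a hypothetical helper `xsd_to_python_regex` and covers only a few differences: XML Schema patterns are implicitly anchored, `^` and `$` are ordinary characters outside character classes, and `\i` / `\c` are name-character escapes that Python lacks. The ASCII-only expansions of `\i` and `\c` are a simplifying assumption; character class subtraction and `\p{...}` are not handled.

```python
import re

# ASCII-only approximations of the XSD escapes \i and \c (an assumption;
# the real definitions cover the full Unicode XML name character ranges).
_NAME_START = "A-Za-z_:"
_NAME_CHAR = _NAME_START + "0-9.\\-"


def xsd_to_python_regex(pattern):
    """Translate a small subset of XSD regex syntax into Python syntax."""
    out = []
    in_class = False
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            nxt = pattern[i + 1]
            if nxt in "ic":
                body = _NAME_START if nxt == "i" else _NAME_CHAR
                # Inside [...] only the class body is needed.
                out.append(body if in_class else "[" + body + "]")
            else:
                # Other escapes are passed through unchanged (approximation).
                out.append(ch + nxt)
            i += 2
            continue
        if ch == "[" and not in_class:
            in_class = True
            out.append(ch)
            # A leading ^ negates the class in both syntaxes.
            if i + 1 < len(pattern) and pattern[i + 1] == "^":
                out.append("^")
                i += 1
        elif ch == "]" and in_class:
            in_class = False
            out.append(ch)
        elif ch in "^$" and not in_class:
            # Literal characters in XSD, anchors in Python: escape them.
            out.append("\\" + ch)
        else:
            out.append(ch)
        i += 1
    return "".join(out)


def xsd_matches(pattern, text):
    # XSD patterns must match the whole string, hence fullmatch.
    return re.fullmatch(xsd_to_python_regex(pattern), text) is not None
```
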
Use setuptools instead of distutils, for wheel support.
This fixes a warning when running setup.py, due to the version number having an invalid format.
Really old versions of Python are no longer supported by PyXB anyway, so there's no need to keep complicated code just to support them.

Also use UUID4 rather than UUID1 for generating UUIDs, because there are privacy issues with people's MAC address being embedded in UUIDs in documents.
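
A small sketch of the privacy difference, using only the standard `uuid` module. The MAC address value below is made up for the example; by default `uuid.uuid1()` would use the machine's real hardware address when one is available.

```python
import uuid

# Hypothetical MAC address, used so the example does not leak a real one.
fake_mac = 0x0242AC110002

# A version 1 UUID carries the node (MAC address) in its last 48 bits.
u1 = uuid.uuid1(node=fake_mac)
print(hex(u1.node))   # the MAC address can be read straight back out

# A version 4 UUID is built from random bits and has no host information.
u4 = uuid.uuid4()
print(u4.version)
```
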
Make it possible to configure pyxbgen to generate bindings in a reproducible way.  "Reproducible" means that the same input results in exactly the same output, byte-for-byte, without depending on:

 * the time the bindings were built
 * the path to the working directory they're built in
 * the MAC address of the build PC (embedded in a UUID)
 * newly generated UUIDs (guaranteed to be different every time)
 * the iteration order of Python dicts/sets
 * the path separator (e.g. "/") used by the OS that the bindings are built on.
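
For the last two items, the usual approach can be sketched as follows. This is a generic illustration, not the code in this PR; the helper names `emit_names` and `portable_path` are hypothetical, and treating every backslash as a separator is an assumption that only holds for Windows-style paths.

```python
def emit_names(names):
    """Write out a set of names in sorted order, so the result does not
    depend on set/dict iteration order."""
    return "\n".join(sorted(names))


def portable_path(path):
    """Normalise Windows-style separators to "/" so the same relative path
    is recorded regardless of the OS the bindings are built on."""
    return path.replace("\\", "/")
```
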

Before this commit, the PyXB bindings changed every time they were built because they depended on all of the above.  After this commit, you can pass options to pyxbgen to get reproducible bindings.  The new options are:

 * --strip-file-paths - doesn't store paths to XSD files in bindings.
 * --no-timestamp - doesn't write build timestamp and Python version in comments in bindings.
 * --generation-uid= - specifies the Generation UID rather than randomly generating one.
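
A build script might assemble the command line along these lines. This is only a sketch: the helper name, schema path, module name and UID value are placeholders, the exact value format expected by `--generation-uid=` is an assumption, and `-u` / `-m` are pyxbgen's existing schema-location and module-name options.

```python
def reproducible_pyxbgen_args(schema_path, module_name, generation_uid):
    """Build an argument list for a reproducible pyxbgen run (not executed here)."""
    return [
        "pyxbgen",
        "-u", schema_path,        # schema location
        "-m", module_name,        # name of the generated binding module
        "--strip-file-paths",     # do not store paths to XSD files
        "--no-timestamp",         # no build timestamp / Python version comments
        "--generation-uid=" + generation_uid,  # fixed UID instead of a random one
    ]


# Placeholder values; the UID format shown is an assumption.
args = reproducible_pyxbgen_args(
    "schemas/po1.xsd", "po1", "00000000-0000-4000-8000-000000000001"
)
print(" ".join(args))
```

The list could then be passed to `subprocess.run(args, check=True)` from a build script.
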

There are two reasons why people want this:

 1) If bindings are committed to version control, then when bindings are rebuilt you want to see what actually changed and not see loads of spurious differences.

 2) If bindings are not committed to version control but are generated by a build system, then careful development teams want every developer to use exactly the same bindings as each other and as the build system produces.

pabigot (Owner) commented Mar 27, 2018

The changes look reasonable and likely to benefit users.

However, I really don't have the time or interest necessary to merge them or, more importantly, undertake to resolve any issues that arise from them.

At this point it seems you're in a better position than I am to maintain and evolve PyXB. If you're interested in taking over the project, please email me directly.

johanvdw commented May 8, 2018

A comment:
--strip-file-paths
Rather than stripping the whole path, it may be better to provide a relative path. It is possible that the XSD schema uses different folders, potentially with the same filename in some of them.
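
To make the concern concrete, here is a small sketch with made-up paths. It assumes the stripped form keeps only the file name, which may not be exactly what the option does, and it uses `posixpath` so the result is the same on every OS.

```python
import posixpath

# Hypothetical layout: two schemas with the same file name in different folders.
schema_root = "/work/project/schemas"
schemas = [
    "/work/project/schemas/common/types.xsd",
    "/work/project/schemas/orders/types.xsd",
]

# Keeping only the file name makes the two schemas indistinguishable.
stripped = [posixpath.basename(p) for p in schemas]

# A path relative to a chosen root stays unique and still does not depend
# on where the working directory lives on the build machine.
relative = [posixpath.relpath(p, schema_root) for p in schemas]

print(stripped)
print(relative)
```
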
