Potential problem for commit finder when packages do not use semver. #706

Closed
jenstroeger opened this issue Apr 15, 2024 · 9 comments · Fixed by #709

@jenstroeger
Contributor

jenstroeger commented Apr 15, 2024

Package versioning in the Python ecosystem doesn’t always follow semantic versioning; in fact, alternative versioning schemes are documented here (see footnote 1). For example, for final release versions:

The release segment consists of one or more non-negative integer values, separated by dots:

N(.N)*

Final releases within a project MUST be numbered in a consistently increasing fashion, otherwise automated tools will not be able to upgrade them correctly.

and then

Date based release segments are also permitted.

For example, black and flake8-bugbear are popular packages that follow a date-based version scheme. It might be interesting to pull metadata of existing packages from the PyPI API and correlate a package’s release date with its version number, and perhaps look for other non-semver versioning schemes (see footnote 2) 🤔

With that in mind, I suspect the commit finder might need to be reviewed?
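As a rough illustration of the correlation idea above, here is a minimal sketch. It assumes the input has the shape of the `releases` object returned by the PyPI JSON API (version string mapped to a list of file entries with an `upload_time` field). The function names and the sample data are hypothetical, and the sample timestamps are illustrative rather than fetched from PyPI.

```python
from datetime import datetime


def looks_date_based(version: str, upload_time: str) -> bool:
    """Heuristic: does the first version segment equal the upload year?

    Accepts both four-digit (2024) and two-digit (24) year segments.
    """
    first = version.split(".")[0]
    if not first.isdigit():
        return False
    year = datetime.fromisoformat(upload_time).year
    return int(first) in (year, year % 100)


def date_based_ratio(releases: dict[str, list[dict]]) -> float:
    """Fraction of releases whose version looks derived from the upload year."""
    hits = total = 0
    for version, files in releases.items():
        if not files:
            continue  # a release without files carries no upload time
        total += 1
        hits += looks_date_based(version, files[0]["upload_time"])
    return hits / total if total else 0.0


# Illustrative data only, shaped like the PyPI JSON "releases" object.
calendar_style = {
    "24.1.0": [{"upload_time": "2024-01-26T07:07:21"}],
    "23.12.1": [{"upload_time": "2023-12-23T02:10:00"}],
}
semver_style = {
    "2.31.0": [{"upload_time": "2023-05-22T15:12:44"}],
    "2.30.0": [{"upload_time": "2023-05-03T16:50:00"}],
}

print(date_based_ratio(calendar_style))
print(date_based_ratio(semver_style))
```

This is only a heuristic: a semver project whose major version happens to match the year would be misclassified, so a ratio over many releases is more telling than a single hit.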

Footnotes

  1. See also romantic versioning and sentimental versioning.

  2. Unfortunately, the current pyproject.toml spec doesn’t seem to contain a “versioning scheme” entry, so Macaron might be stuck with a few heuristics-based wild-wild guesses…

@behnazh-w behnazh-w added the commit-finder The issues related to commit finder label Apr 15, 2024
@behnazh-w
Member

behnazh-w commented Apr 15, 2024

Macaron's commit finder can correctly find the commit for black and flake8-bugbear even though they follow a date-based version pattern. But these versions fail to communicate breaking changes via major versions. It would be nice to have a check in Macaron that reports if a project does not follow semantic versioning?

@jenstroeger
Contributor Author

jenstroeger commented Apr 15, 2024

@behnazh-w have you tried pytz? It supposedly follows IANA’s timezone versioning (link, docs), a year followed by a lower-case letter, though it seems that as of 2013.6 pytz has modified its versioning scheme to a more Python-compatible approach (here), replacing the lower-case letter with a number.

@behnazh-w
Member

@behnazh-w have you tried pytz which supposedly follows IANA’s timezone versioning (link, docs) — a year followed by a lower-case letter.

No I haven't tried this package. @benmss what do you think?

@jenstroeger
Contributor Author

It would be nice to have a check in Macaron that reports if a project does not follow semantic versioning?

That would be useful information in the final report.

If my project depends on packages that don’t programmatically communicate breaking changes, then that needs to be considered when declaring my project’s deps. So, if Macaron could alert me to such funky deps, that’d be great indeed.

@benmss
Member

benmss commented Apr 15, 2024

It appears the commit finder does not work with the IANA format currently. Thanks @jenstroeger for bringing this to our attention.

For the 2013.6 and beyond versions it works fine.

@jenstroeger
Contributor Author

I suspect that any versioning scheme that follows an incremental, number-based approach (see the outdated Python PEP 440 and beyond) might just work, but arbitrary repos and packages, especially those from other language ecosystems, might not. See also here.

@jenstroeger
Contributor Author

@benmss looks like this issue can be closed?

Did you sample a (large) set of package metadata from PyPI and check what version strings they’re using? For example

> curl -s https://pypi.org/pypi/pytz/json | jq '.releases | keys[]'
"2004a"
"2004b"
"2004b.2"
"2004d"
"2005a"
...

gives you all pytz releases. However, you might have more luck digging through the Project Metadata Table.
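To get a feel for which version formats show up in such a sample, one could bucket the version strings with a couple of regular expressions. The sketch below is only that: the bucket names and patterns are my own assumptions, the "numeric" bucket covers just the final-release form N(.N)* quoted earlier, and everything else (pre-releases, post-releases, and so on) lands in "other".

```python
import re
from collections import Counter

# Final-release form N(.N)*, e.g. 1.0.0 or 2013.6
NUMERIC = re.compile(r"^\d+(\.\d+)*$")
# Year followed by a lower-case letter, optionally with a numeric suffix,
# e.g. 2004a or 2004b.2 (as seen in the pytz release list)
YEAR_LETTER = re.compile(r"^\d{4}[a-z](\.\d+)?$")


def classify(version: str) -> str:
    """Assign a version string to a coarse, hypothetical scheme bucket."""
    if NUMERIC.match(version):
        return "numeric"
    if YEAR_LETTER.match(version):
        return "year-letter"
    return "other"


def summarize(versions: list[str]) -> Counter:
    """Count how many version strings fall into each bucket."""
    return Counter(classify(v) for v in versions)


sample = ["2004a", "2004b", "2004b.2", "2004d", "2005a", "2013.6"]
print(summarize(sample))
```

Running this over the keys of `.releases` for many packages would show how common the non-numeric formats actually are.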

One more question:

# Detect versions that end with a zero, so the zero can be made optional.
has_trailing_zero = len(split) > 2 and split[-1] == "0"

Why does that trailing zero receive special treatment?

@benmss
Member

benmss commented May 2, 2024

For now, only the pytz package was tested, as that particular format was triggering a bug. A handful of other Python packages have been tested previously with no issues. Larger-scale testing has been done for other languages; it has not been done for Python yet, but it is planned.

why does that trailing zero receive special treatment?

This is because some versions have one or more additional zeros when compared to their tag.
E.g. Version 1.0.0 vs. Tag 1.0, or Version 2.0.0 vs Tag v2
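
To illustrate what "making the zero optional" could look like, here is a minimal sketch that builds a tag-matching regex in which trailing zero segments of the version are optional. This is not Macaron's actual implementation; `tag_pattern` is a hypothetical helper, and accepting an optional leading "v" is an assumption taken from the "Tag v2" example.

```python
import re


def tag_pattern(version: str) -> re.Pattern[str]:
    """Build a regex matching tags that may omit the version's trailing zeros."""
    parts = version.split(".")

    # Count trailing "0" segments, always keeping the first segment required.
    n_optional = 0
    while n_optional < len(parts) - 1 and parts[-1 - n_optional] == "0":
        n_optional += 1

    required = parts[: len(parts) - n_optional]
    core = r"\.".join(re.escape(p) for p in required)

    # Nest the optional groups so zeros can only be dropped from the end:
    # two optional zeros become (?:\.0(?:\.0)?)?
    optional = ""
    for _ in range(n_optional):
        optional = rf"(?:\.0{optional})?"

    return re.compile(rf"^v?{core}{optional}$")


print(bool(tag_pattern("1.0.0").match("1.0")))  # version 1.0.0 vs. tag 1.0
print(bool(tag_pattern("2.0.0").match("v2")))   # version 2.0.0 vs. tag v2
print(bool(tag_pattern("1.0.0").match("1.1")))  # different version
```

Note that this only treats a segment that is exactly "0" as optional, so a segment such as "000" would not be covered.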

It would be nice to have a check in Macaron that reports if a project does not follow semantic versioning?

@behnazh-w @jenstroeger I have created a new issue for this, and I encourage discussion of that there.

@benmss benmss closed this as completed May 2, 2024
@jenstroeger
Contributor Author

jenstroeger commented May 2, 2024

This is because some versions have one or more additional zeros when compared to their tag. E.g. Version 1.0.0 vs. Tag 1.0, or Version 2.0.0 vs Tag v2

Hmm 🤔

I ask because the RHS len(split) > 2 and split[-1] == "0" “feels” hacky and fragile to me — what if split[-1] is "000"?

Instead, I think Python provides interesting iterators (see footnote 1) that you can use to compare two version strings for equality (I assume that’s your primary goal). Now let’s assume that a version string is composed of numbers separated by, e.g., a "." and that all alphanumeric prefix and suffix fluff has been stripped. Then

>>> from itertools import zip_longest
>>> 
>>> v = "2.0.0.0"  # A version string.
>>> t = "2"  # A tag string representing a version.
>>> all(int(sv) == int(st) for sv, st in zip_longest(v.split("."), t.split("."), fillvalue="0"))
True

compares these two version strings, no matter how many fragments each has or how many zeroes each fragment is made of.
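
The same comparison can be wrapped in a small function that also performs the stripping assumed above. This is a sketch under that assumption: `numeric_core` and `same_version` are hypothetical names, and "stripping" here simply means taking the first dotted run of digits found in the string.

```python
import re
from itertools import zip_longest

# First dotted run of digits in a string, e.g. "2.0" in "release-2.0".
_NUMERIC_CORE = re.compile(r"\d+(?:\.\d+)*")


def numeric_core(text: str) -> str:
    """Return the numeric part of a version or tag, dropping surrounding text."""
    match = _NUMERIC_CORE.search(text)
    if match is None:
        raise ValueError(f"no numeric version found in {text!r}")
    return match.group()


def same_version(version: str, tag: str) -> bool:
    """Compare segment by segment, padding the shorter side with zeros."""
    v_parts = numeric_core(version).split(".")
    t_parts = numeric_core(tag).split(".")
    return all(
        int(a) == int(b)
        for a, b in zip_longest(v_parts, t_parts, fillvalue="0")
    )


print(same_version("2.0.0.0", "v2"))
print(same_version("1.000", "1.0"))
print(same_version("1.2", "1.20"))
```

One limitation worth noting: because everything outside the numeric core is dropped, letter-suffixed versions such as 2004a and 2004b would compare as equal, so this approach does not cover the IANA-style format discussed earlier.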

I have created a new issue for this, and I encourage discussion of that #728.

Thank you, I’ll take a look.

Footnotes

  1. Take a look at the built-in itertools package, and the useful more-itertools package.
