Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Versions cannot contain spaces #23

Open
russellbanks opened this issue Feb 3, 2024 · 10 comments
Open

Versions cannot contain spaces #23

russellbanks opened this issue Feb 3, 2024 · 10 comments

Comments

@russellbanks
Copy link
Contributor

I had a user report of an error with parsing a version:

Both of these are due to the same issue of being unable to parse a version with spaces in. The version in question looks like 22.01 ZS v1.5.5 R2 (See winget-pkgs/manifests/m/mcmilk/7zip-zstd). It works correctly if the spaces are replaced with a delimiter, such as -.

Versioning::from_str("22.01 ZS v1.5.5 R2").unwrap(); // IllegalVersioning("22.01 ZS v1.5.5 R2")
@fosskers
Copy link
Owner

fosskers commented Feb 3, 2024

Spaces! My lord what has the world come to.

Okay I've looked at the associated issues on your repository, as well as the 7zip releases page. It seems that the title of the releases uses spaces, but the git tag has proper - deliminations. I gather your tool is pulling from their releases via the title, then? Could you fill me in with some of the background information about how these version numbers are gathered and how they're used for further processing?

Accepting spaces in version numbers (even in the Mess type) may add some serious complexity, especially since the parsing behaviour of all three subtypes (SemVer, Version, and Mess) propagate up to the parent, Versioning. If I allow spaces, then the line between what's a version and what isn't (given say a line of words) becomes very blurry.

@russellbanks
Copy link
Contributor Author

russellbanks commented Feb 3, 2024

My tool allows users to add or update applications to winget (Microsoft's package manager for Windows). Each application has some metadata files and in those there is a package version (Example). The package versions at winget-pkgs/manifests/m/mcmilk/7zip-zstd are already pre-existing versions for 7zip-zstd in winget. This is where my tool gets the versions from (and also a new version from the user if they're adding a new version). Because Microsoft has very little limitations on what is a valid version, my tool needs to be able to parse and order any version of any app that is submitted to winget so that I can know which metadata file is the latest to build upon.

In Microsoft's schema for this, they actually just have a regex for validating the version:

"PackageVersion": {
  "type": "string",
  "pattern": "^[^\\\\/:\\*\\?\"<>\\|\\x01-\\x1f]+$",
  "maxLength": 128,
  "description": "The package version"
},

In short, the regex just says it can't contain some specific control characters and beyond that, what constitutes a version can be anything, and winget will attempt to order it regardless.

@fosskers
Copy link
Owner

The following test passes:

    #[test]
    fn mess_7zip() {
        cmp_messes("22.01-ZS-v1.5.5-R2", "22.01-ZS-v1.5.6-R2");
    }

    fn cmp_messes(a: &str, b: &str) {
        let x = Mess::new(a).unwrap();
        let y = Mess::new(b).unwrap();

        assert!(x < y, "{} < {}", x, y);
    }

But so does:

        cmp_messes("22.01-ZS-v1.5.5-R2", "24.02-ZS-v1.6.0");

Since even with the - injected in, these versions are deep into lawlessness seemingly mixing calver, semver, and other noise, and so must be parsed as a Mess. Note that here, the calver section basically dominates the comparison, but this may be sufficient for your purposes for comparing at least this particular edge case.

I think it's best to remain firm that whitespaces shouldn't be accepted in version numbers, even mostly lawless ones as we see here. If they are present, I think it's reasonable to do some preprocessing in downstream code before parsing.

@russellbanks
Copy link
Contributor Author

Preprocessing is unfeasible in my case as I take in a version straight from a clap parameter and serialize it into Yaml. Even if I were to do some changes to the version, I need it to be the exact same as what the user inputted to what value is serialized in the end. If it were more feasible, it would also add a lot of unnecessary complexity just to workaround this issue of mess not working with spaces.

I'd really appreciate it if you could add support for spaces - it would arguably make this library more robust with the intention of being able to parse any messy version and still be able to compare it.

@fosskers
Copy link
Owner

Let's discuss this. Were I to support spaces in Mess, there are two breaking changes that come to mind:

  1. The addition of a variant in the Sep enum. These kinds of changes, while technically breaking, almost certainly affect no one but me. I doubt anyone is actually pattern matching on Sep anywhere in real code.
  2. The behaviour of the exposed Mess::parse nom parser. Since this could technically be plugged into any downstream parser combination. A sentence like Hello 1.2.3 Rust that expected the structure word <> version <> word would now fail, with the Rust now being parsed as part of the version. Similar to the above: is anybody actually doing this anywhere? I don't know, but it might not be impossible to individually check: https://crates.io/crates/versions/reverse_dependencies

@russellbanks
Copy link
Contributor Author

  1. The addition of a variant in the Sep enum.

I do think it's unlikely that anyone is matching over Sep. If anyone were, they could either use _ when matching or add the space seperator to their match. However, I've checked the 18 packages in the reverse dependencies of this crate on crates.io by searching the code on their respective GitHub repositories for versions:: and none of them import or match Sep.

  1. The behaviour of the exposed Mess::parse nom parser.

Again, from checking the reverse dependencies, no one appears to be using this. The usages I saw were mostly Versioning, SemVer, and a couple usages of Chunk. Granted these are only ones that are on crates.io, but although this may technically be breaking, I can't see it having an impact or even affecting any code bases.

@russellbanks
Copy link
Contributor Author

I've since had another user report an instance of an app having a version with spaces in it: 7.3 2022-02-28 r5338 (sf-7.3-1) - russellbanks/Komac#447 (comment)

@fosskers
Copy link
Owner

Thanks for the follow-up. I do intend to solve this as soon as I'm able.

@9999years
Copy link

9999years commented Jun 21, 2024

Hello, I'm running into this as well. I'm using Versioning to compare PostgreSQL version numbers, to determine if an upgrade is needed.

The output looks a bit like this:

$ psql postgres --pset format=csv -c 'SHOW server_version'
server_version
14.12 (Homebrew)

Here, Homebrew is a suffix that can basically be disregarded. (I probably wouldn't want 14.12 (Homebrew) to compare greater or less than 14.12 (NixOS), for instance.)

I can work around this by removing the suffix, but I thought I'd make a comment here.

It might be nice to have a "permissive parser" which ignores or otherwise segments out extra information like this.

@fosskers
Copy link
Owner

Hi there. In this case I'd recommend a preemptive call to https://doc.rust-lang.org/std/primitive.str.html#method.split_once to remove anything past the first whitespace, and then call Versioning::new.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants