Handle ambiguous name/version splits in nix #854

Closed
AMDmi3 opened this issue May 18, 2019 · 27 comments

Comments

@AMDmi3
Member

AMDmi3 commented May 18, 2019

Nix metadata format does not split package names from versions (e.g. "name": "foo-1.0"). On top of that, both name and version may contain hyphens, so there is no way to split them reliably. We use a heuristic (take the largest part from the right that starts with a digit), but some names are still processed incorrectly (liblqr-1-VER, python3.6-3to2-VER, polkit-qt-1-qt5-VER etc.). I'm running into a lot of these lately, so this needs to be fixed.

@ryantm @volth could nix dump format be extended to provide separate name and/or version field(s)?

@AMDmi3
Member Author

AMDmi3 commented May 18, 2019

There is an ongoing migration to supply each package with pname and version

That's great! How much of it has made its way into the unstable dump? I'm already seeing some "version"s, but no "pname"s yet. That should still be enough to split name reliably; I'm checking it right now.

Nix has a builtin function to split the name, but it seems that it would fail on your examples

It turns out that I'm already using it, it's even mentioned in the source comment that references https://github.com/NixOS/nix/blob/master/src/libexpr/names.cc#L19.

It would be nice if repology listed packages with unparseable names in its list of problems.
It should result in quick fixes; people love to fix a list of problems.

Unfortunately, problems support is currently quite limited; however, this information can be dumped to the update log (available from https://repology.org/repositories/updates#nix_unstable).

@AMDmi3
Member Author

AMDmi3 commented May 18, 2019

Update: I've tried to use "version" in the parser

There are some inconsistencies:

xzoom-0.3.24, version 0.3
riot-desktop-1.1.0, version empty

However, many more packages were fixed:

-nix_unstable fuse 7z-ng-git-2014-06-08
+nix_unstable fusefs:7z-ng 2014-06-08
-nix_unstable lisp-trivial-utf 8-20111001-darcs
+nix_unstable lisp-trivial-utf-8 20111001-darcs
-nix_unstable wmii-hg 2012-12-09
+nix_unstable wmii hg-2012-12-09

Some packages with version not starting with number are now parsed (like lisp-drakma v2.0.4).
Some packages are now parsed with hg/git as version prefix, not package name suffix, which is good for matching them with other repos.

It would be nice if repology listed packages with unparseable names in its list of problems.

It doesn't make sense to dump all potentially ambiguous names: these are basically everything with two or more hyphens, and there are too many of them.

AMDmi3 added a commit to repology/repology-rules that referenced this issue May 18, 2019
AMDmi3 added a commit that referenced this issue May 18, 2019
- Switch to PackageMaker context manager
- Try to use new "version" field to reliably split package name and
  version
- Improve logging
@AMDmi3
Member Author

AMDmi3 commented May 18, 2019

I meant, to mark the packages without pname and version as problematic.

Well, as of now that's all of them, so it doesn't make much sense. It does make sense, though, to report the most suspicious ones. I've found that most of the packages I've had to add exceptions for match the '-[0-9]+[a-z]' regexp. I'm logging these now.
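
For illustration, a minimal sketch of such a check in Python. Only the regexp itself comes from the comment above; the function name is hypothetical, and the version 1.1.1 is a made-up stand-in for the VER placeholder used earlier in this thread. This is not the actual Repology parser code.

```python
import re

# Regexp quoted above: a hyphen, one or more digits, then a lowercase letter.
SUSPICIOUS = re.compile(r"-[0-9]+[a-z]")


def is_suspicious(name: str) -> bool:
    """Return True if the name contains a hyphen-digits-letter run."""
    return SUSPICIOUS.search(name) is not None


# "python3.6-3to2-1.1.1" uses a made-up version in place of VER.
for name in ("python3.6-3to2-1.1.1", "fuse-7z-ng-git-2014-06-08", "abcl-1.5.0"):
    print(name, is_suspicious(name))
```

Note that such a check only flags candidates for logging; a name like wine-4.14-staging is not matched by it.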

I'm deploying an update with a "version"-aware parser and extended logging, so keep an eye on the update log.

@mvp

mvp commented Sep 10, 2019

Note that pname and version are explicitly present in nix metadata.
For example, my package uhubctl is incorrectly detected as:

pname: uhubctl-unstable
version: 2019-07-31

when it should be:

pname: uhubctl
version: unstable-2019-07-31

However, if you follow the link to the nix package from https://repology.org/project/uhubctl/versions, it has pname and version specified separately:
https://github.com/NixOS/nixpkgs/blob/master/pkgs/tools/misc/uhubctl/default.nix#L7-L8

@AMDmi3
Member Author

AMDmi3 commented Sep 10, 2019

Note that pname and version are explicitly present in nix metadata.

Not in the json which repology uses. @volth ping? It also hasn't been updated since Aug 30th.

% curl --silent https://nixos.org/nixpkgs/packages-unstable.json.gz | gunzip | jq .packages.uhubctl
{
  "name": "uhubctl-unstable-2019-07-31",
  "system": "x86_64-linux",
  "meta": {
    "available": true,
    "description": "Utility to control USB power per-port on smart USB hubs",
    "homepage": "https://github.com/mvp/uhubctl",
    "license": {
      "fullName": "GNU General Public License v2.0 only",
      "shortName": "gpl2",
      "spdxId": "GPL-2.0-only",
      "url": "http://spdx.org/licenses/GPL-2.0-only.html"
    },
    "maintainers": [
      {
        "email": "pavol@rusnak.io",
        "github": "prusnak",
        "githubId": 42201,
        "keys": [
          {
            "fingerprint": "86E6 792F C27B FD47 8860  C110 91F3 B339 B9A0 2A3D",
            "longkeyid": "rsa4096/0x91F3B339B9A02A3D"
          }
        ],
        "name": "Pavol Rusnak"
      }
    ],
    "name": "uhubctl-unstable-2019-07-31",
    "outputsToInstall": [
      "out"
    ],
    "platforms": [
      "aarch64-linux",
      "armv5tel-linux",
      "armv6l-linux",
      "armv7l-linux",
      "mipsel-linux",
      "i686-linux",
      "x86_64-linux",
      "powerpc64le-linux",
      "riscv32-linux",
      "riscv64-linux",
      "x86_64-darwin",
      "i686-darwin",
      "aarch64-darwin",
      "armv7a-darwin"
    ],
    "position": "pkgs/tools/misc/uhubctl/default.nix:23"
  }
}

@mvp

mvp commented Sep 10, 2019

Perhaps nix can add 2 more fields to json output: pname and version?
That should be still backwards compatible and solve this problem for repology.

@AMDmi3
Member Author

AMDmi3 commented Sep 10, 2019

As mentioned in the discussion above, these fields are (or at least should be) present in the json. For instance, abcl:

{
  "name": "abcl-1.5.0",
  "system": "x86_64-linux",
  "meta": {
    "available": true,
    "description": "A JVM-based Common Lisp implementation",
    "homepage": "https://common-lisp.net/project/armedbear/",
    "license": {
      "fullName": "GNU General Public License v3.0 only",
      "shortName": "gpl3",
      "spdxId": "GPL-3.0-only",
      "url": "http://spdx.org/licenses/GPL-3.0-only.html"
    },
    "maintainers": [
      {
        "email": "7c6f434c@mail.ru",
        "github": "7c6f434c",
        "githubId": 1891350,
        "name": "Michael Raskin"
      }
    ],
    "name": "abcl-1.5.0",
    "outputsToInstall": [
      "out"
    ],
    "platforms": [
      "aarch64-linux",
      "armv5tel-linux",
      "armv6l-linux",
      "armv7l-linux",
      "mipsel-linux",
      "i686-linux",
      "x86_64-linux",
      "powerpc64le-linux",
      "riscv32-linux",
      "riscv64-linux"
    ],
    "position": "pkgs/development/compilers/abcl/default.nix:34",
    "version": "1.5.0"
  }
}

Not sure why it's not present for uhubctl.
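
A rough way to see how many entries lack the field is to scan the parsed dump. The sketch below assumes the dump has already been loaded (e.g. with json.load) and that the mapping under its "packages" key is passed in, as in the jq query above. The function name is hypothetical and the sample entries are trimmed-down copies of the two dumps quoted in this thread.

```python
from typing import Any, Dict, List


def attrs_missing_version(packages: Dict[str, Dict[str, Any]]) -> List[str]:
    """Return attribute names whose meta has no non-empty "version"."""
    return sorted(
        attr
        for attr, entry in packages.items()
        if not entry.get("meta", {}).get("version")
    )


# Trimmed-down entries from the uhubctl and abcl dumps quoted above.
SAMPLE = {
    "uhubctl": {"name": "uhubctl-unstable-2019-07-31", "meta": {}},
    "abcl": {"name": "abcl-1.5.0", "meta": {"version": "1.5.0"}},
}

print(attrs_missing_version(SAMPLE))
```

An empty version string (as reported for riot-desktop earlier) is treated the same as a missing one here.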

Also, notably, there are no pnames in the json at all. While it seems like we can split name/version reliably by trimming version from name, it doesn't work in all cases in practice:

https://repology.org/log/1418803

...
winePackages.staging: ERROR: name "wine-4.14-staging" does not end with version "4.14"
wine-staging: ERROR: name "wine-4.14-staging" does not end with version "4.14"
wineWowPackages.staging: ERROR: name "wine-wow-4.14-staging" does not end with version "4.14"
...
xzoom: ERROR: name "xzoom-0.3.24" does not end with version "0.3"
...
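
The "trim version from name" approach and the failures logged above can be sketched as follows. This is an illustrative reimplementation, not the actual parser code; the function name is hypothetical and the error text simply mirrors the log lines quoted above.

```python
from typing import Tuple


def split_by_version(name: str, version: str) -> Tuple[str, str]:
    """Split "pname-version" using a known version; raise if it does not fit."""
    suffix = "-" + version
    if not version or not name.endswith(suffix):
        raise ValueError(f'name "{name}" does not end with version "{version}"')
    return name[: -len(suffix)], version


# Name/version pairs taken from the dumps and log quoted in this thread.
for name, version in (
    ("abcl-1.5.0", "1.5.0"),
    ("wine-4.14-staging", "4.14"),
    ("xzoom-0.3.24", "0.3"),
):
    try:
        print(split_by_version(name, version))
    except ValueError as error:
        print("ERROR:", error)
```

Only the first pair splits cleanly; the other two fail because the version is not a suffix of the name.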

@mvp

mvp commented Sep 11, 2019

Can you use attribute name as package name, and work out version from it?
https://nixos.org/nixos/packages.html?channel=nixpkgs-unstable&query=uhubctl

Note that the full json https://nixos.org/nixpkgs/packages-unstable.json.gz has the package name as a key in its hierarchy, e.g.:

    "uhubctl": {
      "name": "uhubctl-unstable-2019-07-31",
      "system": "x86_64-linux",
      "meta": {
        "available": true,
        "description": "Utility to control USB power per-port on smart USB hubs",
        "homepage": "https://github.com/mvp/uhubctl",
        "license": {
          "fullName": "GNU General Public License v2.0 only",
          "shortName": "gpl2",
          "spdxId": "GPL-2.0-only",
          "url": "http://spdx.org/licenses/GPL-2.0-only.html"
        },
        "maintainers": [
          {
            "email": "pavol@rusnak.io",
            "github": "prusnak",
            "githubId": 42201,
            "keys": [
              {
                "fingerprint": "86E6 792F C27B FD47 8860  C110 91F3 B339 B9A0 2A3D",
                "longkeyid": "rsa4096/0x91F3B339B9A02A3D"
              }
            ],
            "name": "Pavol Rusnak"
          }
        ],
        "name": "uhubctl-unstable-2019-07-31",
        "outputsToInstall": [
          "out"
        ],
        "platforms": [
          "aarch64-linux",
          "armv5tel-linux",
          "armv6l-linux",
          "armv7l-linux",
          "mipsel-linux",
          "i686-linux",
          "x86_64-linux",
          "powerpc64le-linux",
          "riscv32-linux",
          "riscv64-linux",
          "x86_64-darwin",
          "i686-darwin",
          "aarch64-darwin",
          "armv7a-darwin"
        ],
        "position": "pkgs/tools/misc/uhubctl/default.nix:23"
      }
    },

This way your script can know that the actual package name is uhubctl, and thus everything after the dash that follows it in name must be the version.

@AMDmi3
Member Author

AMDmi3 commented Sep 11, 2019

You'd still need to extract the version part from name, and that won't work because the attribute name is different from the name part of name. Besides, attribute names contain even more garbage than package names.
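
To illustrate the mismatch, here is a minimal sketch of the suggested approach (strip the attribute name from the front of name and treat the rest as the version). The function name is hypothetical; the attribute/name pairs are taken from the dump and the log quoted earlier in this thread.

```python
from typing import Optional


def version_from_attr(attr: str, name: str) -> Optional[str]:
    """Return the part of name after "<attr>-", or None if attr is not a prefix."""
    prefix = attr + "-"
    if name.startswith(prefix) and len(name) > len(prefix):
        return name[len(prefix):]
    return None


print(version_from_attr("uhubctl", "uhubctl-unstable-2019-07-31"))
print(version_from_attr("winePackages.staging", "wine-4.14-staging"))
print(version_from_attr("wine-staging", "wine-4.14-staging"))
```

It works for uhubctl, but returns None for both wine attributes because neither attribute name is a prefix of the derivation name.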

@AMDmi3
Member Author

AMDmi3 commented Sep 11, 2019

I recently added pname to the majority of nixpkgs derivations, so pname should be reliable.

This is great; I'm waiting for it to be properly exported via the dump.

I wonder, would it be possible for pnames in the dump not to contain addendums such as asciidoc-full, asciidoc-full-with-plugins or arm-trusted-firmware-sun50iw1p1, arm-trusted-firmware-sun50i-h6, ...? These are so numerous that I refuse to merge them any more.

@jtojnar
Contributor

jtojnar commented Sep 11, 2019

I wonder, would it be possible for pnames in the dump not to contain addendums such as asciidoc-full, asciidoc-full-with-plugins or arm-trusted-firmware-sun50iw1p1, arm-trusted-firmware-sun50i-h6, ...? These are so numerous that I refuse to merge them any more.

We need different names as those are used to distinguish packages by nix-env. Hmm, perhaps we should add something like meta.repologyBasePackage for them.

@jtojnar
Contributor

jtojnar commented Sep 11, 2019

I believe those predate the introduction of the pname version split. Especially since it is done with this highly unidiomatic invocation. We should move that name modification to the package expression itself so that the package name is always the same when doing .override { enableStandardFeatures = true; }.

I consider our introduction of pname and version attributes to be an abstraction over the low-level package-agnostic Nix derivations. Ideally, there would be no name in Nixpkgs expressions, only in the low-levels of mkDerivation mapping it onto Nix’s derivation primitive. (And I have heard some interest in introducing native package primitive to Nix, which could, in the future, allow us to drop name altogether.)

By the way, in many packages we currently have unstable as a part of version, rather than of pname as suggested in https://nixos.org/nixpkgs/manual/#sec-package-naming. I know I am guilty of promulgating this and we should probably fix that before switching Repology to pname & version.

Maybe we should discuss these issues in Nixpkgs issue tracker instead.

@jtojnar
Contributor

jtojnar commented Sep 11, 2019

There is no introduction of pname version split ;) Each derivation has a name, even if it does not represent a package (as repology sees a package). For example fetchurl and fetchpatch and runCommand and buildEnv are Nix derivations too, and they do have a name, but no pname nor version.

I am aware that there is no actual package primitive but we do have packages on conceptual level and I would consider the introduction of pname support to mkDerivation instrumental in the imaginary breaking away of package from derivation. I consider the fact that a package is also a derivation with a name an implementation detail.

@AMDmi3
Member Author

AMDmi3 commented Sep 11, 2019

We need different names as those are used to distinguish packages by nix-env

I believe that since pname/version were introduced just recently, nix-env only cares about name, so pname could still contain the upstream name, and name may consist of more parts than just pname and version, as volth suggests:

name is not always equal to pname+"-"+version, it is just a default value.

pname = "asciidoc";
version = "1.0";
name = "${pname}-full-with-plugins-${version}";

Hmm, perhaps we should add something like meta.repologyBasePackage for them.

If it can't be pname, just basePackage or basename or alike if it's possible. I don't believe there's anything specific to Repology here and I don't want repos to introduce any Repology specific things. Something close to upstream project name has a lot more uses than just Repology.

@jtojnar
Contributor

jtojnar commented Sep 11, 2019

I believe that since pname/version were introduced just recently, nix-env only cares about name, so pname could still contain the upstream name

We could make nix-env use pname, but currently that attribute is a Nixpkgs-only concept. It would be more systematic to open an RFC to make Nix aware of pname rather than handle it ad-hoc in different places.

name may consist of more parts than just pname and version as volth suggests

As I explained above, I am not very fond of this idea since pname is the name of the package, not the project; and the cases where name is different than pname are historical relics.

  • name – Derivation name; an internal detail of Nix language; for packages, it is typically ${pname}-${version}.
  • pname – Package name; Nix does not recognize this, only as a part of ${name} before the first dash that is followed by a number; used for finding the package with nix-env (for performance reasons, you are better off using the attribute path, though)

We could redefine the values as follows:

  • name – Derivation name; an internal detail of Nix language; for packages, it is typically ${pname}-${variant}-${version}.
  • pname – Project name; when variant is not specified this is also a package name.
  • variant – Build configuration name; used to distinguish different package variants in nix-env
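
As a sketch of how a derivation name would be composed under the proposed scheme (written in Python rather than Nix purely for illustration; the function name is hypothetical), using the asciidoc example from earlier in this thread:

```python
from typing import Optional


def derivation_name(pname: str, version: str, variant: Optional[str] = None) -> str:
    """Join pname, optional variant and version with dashes."""
    parts = [pname, variant, version]
    return "-".join(part for part in parts if part)


print(derivation_name("asciidoc", "1.0"))
print(derivation_name("asciidoc", "1.0", variant="full-with-plugins"))
```

With the variant kept in its own field, pname stays equal to the project name for every build configuration.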

Alternately, we could consider dropping the variant names from package names, as they are primarily used by the slow legacy nix-env (everything now uses attribute paths). The only other place they figure are Nix store paths but it is not granular enough to describe the expression anyway.

Hmm, perhaps we should add something like meta.repologyBasePackage for them.

If it can't be pname, just basePackage or basename or alike if it's possible. I don't believe there's anything specific to Repology here and I don't want repos to introduce any Repology specific things. Something close to upstream project name has a lot more uses than just Repology.

I agree. Something like canonicalPackage could be actually useful for our auto-update infrastructure.

@AMDmi3
Member Author

AMDmi3 commented Sep 11, 2019

We could redefine the values as follows:

  • name – Derivation name; an internal detail of Nix language; for packages, it is typically ${pname}-${variant}-${version}.
  • pname – Project name; when variant is not specified this is also a package name.
  • variant – Build configuration name; used to distinguish different package variants in nix-env

For Repology this is the most suitable (as long as pname is published). Am I missing something, or is separate canonicalPackage not needed in this schema because there's variant available?

@jtojnar
Contributor

jtojnar commented Sep 11, 2019

Yeah, we have two parallel ways for selecting packages:

You can let Nix traverse the whole of nixpkgs and try to find a derivation (think JSON document with the package data) with a matching pname portion of the name attribute. This is very slow, but it is still supported for legacy reasons.

Or you can get the derivation directly if you know the attribute path that leads to it (e.g. pkgs.python3Packages.numpy).

Obtaining any attribute from the derivation is easy, but getting the derivation from pname is inefficient.

Updaters will want to know the attribute path or the canonical derivation itself (Nix is lazily evaluated) to spare themselves the slowness of the first method.

@jtojnar
Contributor

jtojnar commented Sep 11, 2019

Actually, what about using the attribute paths (the keys in the JSON file) instead of derivation names for Repology?

@AMDmi3
Member Author

AMDmi3 commented Sep 11, 2019

Actually, what about using the attribute paths (the keys in the JSON file) instead of derivation names for Repology?

As I've mentioned above, these contain even more garbage than names.

AMDmi3 added a commit that referenced this issue Nov 7, 2019
@AMDmi3
Member Author

AMDmi3 commented Nov 12, 2019

So, I've somehow resolved this issue: I've refactored the parser a bit, and it's now simpler and more straightforward in detecting bad data. The update is no longer blocked on separate pname/version from nix, but I'm ready to switch to them as soon as they are provided. All incorrectly named packages are now dropped and logged. There are about 100 of them, so not too many, and they are all listed in the parse log (https://repology.org/repositories/updates#nix_unstable). The log contains much less noise now too (repology/repology-rules#297).

While here, I'd like to mention that the new mechanism for handling package names in all their variety (#931) is now in place, and it could make use of the attribute name mentioned by @jtojnar, so I wanted to ask about its meaning (and how it differs from pname). In short, Repology now stores a set of names with different purposes for each package:

  1. a string to derive project name from
  2. a string to show to the user as a package name (sometimes more human readable names are available, such as Firefox Browser)
  3. an identifier used to track package in time, e.g. a most stable name (e.g. one not susceptible to changes like py36-foo → py37-foo, or foo → foo-client, or foo → foo-compat0)
  4. a set of names used to refer to the package from the outside (Implement endpoints based on original package name repology-webapp#66), currently:
    • source package name
    • binary package name
    • generic name (useful when neither of above is applicable)

I wonder if attribute name would be useful as 3 or some of 4.
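
To make the list above concrete, here is a rough sketch of such a record as a Python dataclass. All field names are illustrative assumptions and do not reflect Repology's actual schema; the example only shows where a nix attribute name could slot in if it were used for item 3 and for the generic name in item 4.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PackageNames:
    """Illustrative container for the name kinds listed above (hypothetical fields)."""

    project_name_seed: str  # 1. string the project name is derived from
    visible_name: str  # 2. name shown to the user
    track_name: str  # 3. most stable identifier, used to track the package in time
    src_name: Optional[str] = None  # 4. source package name
    bin_name: Optional[str] = None  # 4. binary package name
    generic_name: Optional[str] = None  # 4. generic name


# Assumption: the attribute name "uhubctl" is used for items 3 and 4 (generic).
example = PackageNames(
    project_name_seed="uhubctl",
    visible_name="uhubctl",
    track_name="uhubctl",
    generic_name="uhubctl",
)
print(example)
```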

@AMDmi3 AMDmi3 closed this as completed Nov 12, 2019
@jtojnar
Contributor

jtojnar commented Nov 14, 2019

make use of the attribute name mentioned by @jtojnar, so I wanted to ask about its meaning (and how it differs from pname).

Nixpkgs is structured as nested attribute sets, in Nix parlance (very similar to Python dictionaries or JSON objects). The attribute name (or rather attribute path) denotes which attributes (keys) you need to traverse to get to a desired package. Most packages are in the top-level attribute set, but some things like Haskell and Python libraries, part of the GNOME platform packages, Qt libraries… are nested under a common attribute set in the top level (e.g. gnome3.gnome-boxes or python37Packages.setuptools).
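
Taking the Python dictionary analogy literally, resolving an attribute path can be sketched like this. The nested dictionary and its placeholder string values are made up for illustration, the function name is hypothetical, and the sketch assumes no attribute name itself contains a dot.

```python
from typing import Any, Mapping


def resolve(pkgs: Mapping[str, Any], attr_path: str) -> Any:
    """Walk nested mappings along a dot-separated attribute path."""
    node: Any = pkgs
    for key in attr_path.split("."):
        node = node[key]  # raises KeyError if the path does not exist
    return node


# Placeholder strings stand in for real derivations.
PKGS = {
    "gnome3": {"gnome-boxes": "<derivation gnome-boxes>"},
    "python37Packages": {"setuptools": "<derivation setuptools>"},
}

print(resolve(PKGS, "gnome3.gnome-boxes"))
print(resolve(PKGS, "python37Packages.setuptools"))
```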

The attribute path is used to unambiguously refer to a package in the package set. There are aliases (e.g. python3Packages = python3.pkgs, python3 = python37, bustle = haskellPackages.bustle, openal = openalSoft), so the mapping from attribute paths to packages is not injective.

For technical reasons, we sometimes also have multiple variants of a single project built with different configure flags, for example:

  • poppler, libsForQt5.poppler, poppler_gi and poppler_utils
  • python27Packages.setuptools and python37Packages.setuptools
  • libpulseaudio, pulseaudio and pulseaudioFull

Those will all be listed in the generated JSON file and may not necessarily have a different name attribute (for instance, they are libpulseaudio-12.2, pulseaudio-12.2 and pulseaudio-12.2, respectively for the aforementioned PulseAudio packages).
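
A small sketch of what that means for a consumer of the dump: grouping attribute paths by their name attribute shows that name alone does not identify an entry. The function name is hypothetical; the data is the PulseAudio example just mentioned.

```python
from collections import defaultdict
from typing import Dict, List


def group_by_name(names_by_attr: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each derivation name to the sorted attribute paths that carry it."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for attr, name in sorted(names_by_attr.items()):
        groups[name].append(attr)
    return dict(groups)


# Attribute path -> name attribute, as listed above.
PULSEAUDIO = {
    "libpulseaudio": "libpulseaudio-12.2",
    "pulseaudio": "pulseaudio-12.2",
    "pulseaudioFull": "pulseaudio-12.2",
}

for name, attrs in group_by_name(PULSEAUDIO).items():
    print(name, attrs)
```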

So I agree that attribute name would be potentially useful as 3. But as we do not have a notion of a canonical attribute path, the JSON dump chooses python37Packages.setuptools instead of more proper python3.pkgs.setuptools for reasons unknown to me.

And attribute paths can still change. For example, Nixpkgs did not allow dashes in attribute names in the past, so we used underscores instead. Now that the policy is somewhat relaxed, we are gradually moving packages to paths more in line with the upstream name.

But yeah, the attribute path should still be the most stable identifier.


Regarding pname: Nix does not have a concept of project name or package version. There is just a name attribute in a derivation (something like a concrete realization of a package) that was probably designed as a hint to disambiguate the installation paths (hashes) under /nix/store.

Since the hash of the package expression and the transitive closure of expressions of its dependencies is the primary disambiguator, the name can be basically anything but commonly we use the ${runtime}-${runtime version}-${project name}-${configuration variant}-${project version} (parts omitted when not needed). So the luajitPackages.nvim-client attribute path points to a derivation with a name attribute luajit-2.1.0-beta3-nvim-client-0.2.0-1, that is nvim-client version 0.2.0-1 for luajit version 2.1.0-beta3. We have the same project parametrized with a different runtime under lua53Packages.nvim-client and it has a name lua5.3-nvim-client-0.2.0-1.

Of course, the heuristic for extracting the version from a derivation name (basically an equivalent of a regex like (?P<pname>.+?)(?:-(?P<version>[0-9].+))?) is no match for an unstructured attribute like name.
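
To see the heuristic fail on the names above, the regex can be tried directly in Python. The pattern is the one quoted in the previous paragraph; matching it against the whole string with fullmatch and the function name are assumptions of this sketch.

```python
import re
from typing import Optional, Tuple

# Regex quoted above: pname is everything before the first "-<digit>".
NAME_RE = re.compile(r"(?P<pname>.+?)(?:-(?P<version>[0-9].+))?")


def split_name(name: str) -> Tuple[str, Optional[str]]:
    """Split a derivation name into (pname, version) with the heuristic regex."""
    match = NAME_RE.fullmatch(name)
    if match is None:
        raise ValueError(f"cannot split {name!r}")
    return match.group("pname"), match.group("version")


for name in (
    "lua5.3-nvim-client-0.2.0-1",
    "luajit-2.1.0-beta3-nvim-client-0.2.0-1",
    "uhubctl-unstable-2019-07-31",
):
    print(split_name(name))
```

The lua5.3 name splits as intended, the luajit one is cut right after "luajit", and uhubctl ends up with "unstable" in the pname, which is the misdetection reported earlier in this thread.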


For 4, yeah, the attribute paths would fit the generic name, as it is what we use to refer to packages within nixpkgs.

The closest thing we have to a source package is a drv file, which contains instructions on what other derivations to obtain and how to build them into a store path, but those are instantiated locally from the expression located by an attribute path. As for binary packages, we just check whether our binary cache already contains a store path for the derivation with that hash, and download it to the store if it does.

abathur added a commit to abathur/nixpkgs that referenced this issue Sep 13, 2022
Two items in resholve's mkDerivation are causing trouble for
some ecosystem tools:

1. I didn't pass through the original package's meta, which breaks the
   ability of at least nixos package search and r-ryantm to find the
   right source file (in the latter case breaking auto updates).

2. I was prepending "resholved-" to the pname, which at least nixos
   package search picks up as the package's name. Repology also tries
   to do this, but their current nix updater will prefer to get this
   data from the name. For now, this means changing to name will not
   stop repology from picking up the `resholved-<package>` names.

   Repology's code makes it clear that they *want* to use the pname/
   version, so I was inclined to settle with what I've got for now,
   but thiagokokada clarified that we aren't just waiting for nixpkgs
   fixes, but because Nix itself isn't exporting the pname/version in
   its JSON. See also:

   - repology/repology-updater#854
   - repology/repology-updater@9313110121df5

   For now, at least, I'll switch to appending "-unresholved" to the
   inner derivation's pname.
abathur added a commit to abathur/resholve that referenced this issue Sep 14, 2022
Two items in resholve's mkDerivation are causing trouble for
some Nix ecosystem tools:

1. I didn't pass through the original package's meta, which breaks the
   ability of at least nixos package search and r-ryantm to find the
   right source file (in the latter case breaking auto updates).

2. I was prepending "resholved-" to the pname, which at least nixos
   package search picks up as the package's name. Repology also tries
   to do this, but their current nix updater will prefer to get this
   data from the name. For now, this means changing to name will not
   stop repology from picking up the `resholved-<package>` names.

   Repology's code makes it clear that they *want* to use the pname/
   version, so I was inclined to settle with what I've got for now,
   but thiagokokada clarified that we aren't just waiting for nixpkgs
   fixes, but because Nix itself isn't exporting the pname/version in
   its JSON. See also:

   - repology/repology-updater#854
   - repology/repology-updater@9313110121df5

   For now, at least, I'll switch to appending "-unresholved" to the
   inner derivation's pname.

Closes #86.

4 participants
@mvp @AMDmi3 @jtojnar and others