Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add changes required by Nimble lock file support #12104

Merged
merged 1 commit into from Jul 15, 2021

Conversation

bobeff
Copy link
Contributor

@bobeff bobeff commented Sep 1, 2019

Implemented support for Nimble local cache with package directories a checksum of the package at the end of their names. Now the compiler supports package paths in the form:

  • /path_to_nimble_cache_dir/pkgs/package_name-1.2.3-FEBADEAEA2345E777F0F6F8433F7F0A52EDD5D1B

  • /path_to_nimble_cache_dir/pkgs/package_name-#head-042D4BE2B90ED0672E717D71850ABDB0A2D19CD2

  • /path_to_nimble_cache_dir/pkgs/package_name-#branch-name-DBC1F902CB79946E990E38AF51F0BAD36ACFABD9

Related to nim-lang/nimble#127

@Araq
Copy link
Member

Araq commented Sep 2, 2019

Seems to do way too much, since these hashes are ignored, why not ignore them in the parsing step and leave the rest of the code as it is?

@dom96
Copy link
Contributor

dom96 commented Sep 2, 2019

What are you intending to use these hashes for? Why do they need to be appended to the package directories?

@bobeff
Copy link
Contributor Author

bobeff commented Sep 3, 2019

Seems to do way too much, since these hashes are ignored, why not ignore them in the parsing step and leave the rest of the code as it is?

@Araq The hashes are used by Nimble itself. If you look at the old Nimble and compiler code before my changes, you will find two nearly identical pieces of code:

I just merged them. Nimble already heavily relies on imports of the compiler code.

The implementation also supports the old format of the package directories for backward compatibility of the compiler with older Nimble versions.

@bobeff
Copy link
Contributor Author

bobeff commented Sep 3, 2019

What are you intending to use these hashes for? Why do they need to be appended to the package directories?

@dom96 If you look at the specification of the feature checksum is needed for each package to determine that it is the exactly same version as the required in the lock file in order to achieve reproducible builds. Even for tagged versions Git tags are not enough because they are not immutable.

I decided to store the checksum in the package directory, because this way the required by the lock file package can be found very fast without entering in every directory to search for it. Also this way you can have two or more different branch versions

  • package_name-#branch_name-efba335dccf2631d7ac2740109142b92beb3b465
  • package_name-#branch_name-8f995e59d6fc1012b3c1509fcb0ef0a75cb3610c

and different projects to depend to different library versions from the same branch.

To store the checksum in the directory name is common practice in other package managers. See Nix for example.

@bobeff
Copy link
Contributor Author

bobeff commented Sep 3, 2019

@Araq and @dom96 I don't know why in the first place the compiler have to know about Nimble. When I had changed the package directory names structure the compiler simply stopped to work properly and I had to fix it. If you want, I can try to remove this dependency.

@zah
Copy link
Member

zah commented Sep 3, 2019

My recommendation to @bobeff was to reduce the reliance on the NimbleDir directory in the compiler and instead to supply the correct paths to the locked-down versions of the packages every time you invoke Nim through a command such as nimble build (i.e. all the paths such as ~/.nimble/pkgs/somelib-somehash can be given as regular -p:<path> dirs).

He has taken the approach here, because the suggested alternative has proven to require more significant changes, but I still think it's worth pursuing.

If the question is "Why do we need hashes in the first place?", the answer is that the recent stories of package hijacking in other eco-systems now make this additional security measure a requirement. We must ensure that a dependency in the lockfile can't be silently replaced by a malicious version. The layout of the Nimble cache is just an optimisation that makes it easier to resolve your dependencies more quickly in everyday Nimble operations.

@zah
Copy link
Member

zah commented Sep 3, 2019

As a counter-point, I should mention that there are few additional concerns that suggest that the compiler should be more aware of the Lockfiles.

  1. Tools like nimsuggest and nim check should work out of the box. The scheme I recommended above requires you to use nimble as a starting point when you invoke such commands.

  2. @yglukhov has pointed out that a more advanced package system may allow for imports to be resolved in a more context-dependent way.

When you have the following diamond dependecies:

A -> B
A -> C
C -> D 1.0
B -> D 2.0

The package D must be resolved to a different path when imported from C or from B. A similar concern arises when using different packages sharing the same name. It's possible to solve this problem through adding a slightly more advanced alternative to the -p: option, but another way would be to teach the compiler how to handle the contents of the Lockfiles.

@stefantalpalaru
Copy link
Contributor

all the paths such as ~/.nimble/pkgs/somelib-somehash can be given as regular -p:<path> dirs

You'll hit some command line size limits there. Windows command lines seem limited to 2047 characters: https://support.microsoft.com/en-us/help/830473/command-prompt-cmd-exe-command-line-string-limitation

@zah
Copy link
Member

zah commented Sep 3, 2019

@stefantalpalaru, that's right. This is usually handled by adding support for command files. In Nim, you can argue that a temp cfg file can fit that role, but not needing temp files is certainly more elegant.

@mratsim
Copy link
Collaborator

mratsim commented Sep 3, 2019

If hashes are used in the path, only the first 8 characters should be used from the hash:

@arnetheduck
Copy link
Contributor

generally it seems wrong that the compiler knows this much about the package manager - the paths/extended-module-names could easily be passed in (allowing other package mgmt strategies etc)

nimsuggest and friends could easily be aware of nimble or lock files instead of burdening the compiler with this knowledge.

otherwise, what's the point of splitting package manager and compiler? this split is already a bit half-baked in nim, with the nim compiler doing all kinds of things that are typically left to outside tools (rst template processing to multiple formats, ctags etc etc!!??)

a better and more well-defined interface between compiler and package manager would make the compiler much easier to understand - fewer "magic" paths that it looks in, so when something goes wrong, one has a fair chance of understanding what it was.

@cheatfate
Copy link
Member

Or compiler can ask package manager about modules somehow, e.g. by specific call with arguments.

@arnetheduck
Copy link
Contributor

Or compiler can ask package manager about modules somehow, e.g. by specific call with arguments.

asking creates the same kind of two-way information flow between package manager and compiler which represents a similar kind of complexity - each project must be aware of the other which is what one would want to avoid

@zah
Copy link
Member

zah commented Sep 3, 2019

nimsuggest and friends could easily be aware of nimble or lock files instead of burdening the compiler with this knowledge.

nim check is the compiler and nimsuggest is just a very slightly modified version of the compiler codebase.

@arnetheduck
Copy link
Contributor

nimsuggest and friends could easily be aware of nimble or lock files instead of burdening the compiler with this knowledge.

nim check is the compiler and nimsuggest is just a very slightly modified version of the compiler codebase.

yes, unfortunately - the compiler code is quite a lot harder to read because of this, with the extra when jungle that kicks in sometimes.

nimble check and nimble suggest would be the canonical way to do checks in a package-manager-enhanced world, wrapping the existing commands with the pm-enhanced paths, keeping concerns separated.

@alehander92
Copy link
Contributor

alehander92 commented Sep 3, 2019 via email

@arnetheduck
Copy link
Contributor

Not if you have a protocol that other package managers can use

indeed, but this is orthogonal - if the nim compiler exposes an api that is sufficiently rich, any package manager can use it (which is nice / a good thing) - but if that API is two-way, it is inherently more complex because of how data flows.

@Araq
Copy link
Member

Araq commented Sep 3, 2019

Well guess what, we figured it out too, see nim-lang/nimble#654

@Araq
Copy link
Member

Araq commented Sep 3, 2019

Nimble already heavily relies on imports of the compiler code.

The newer Nimble doesn't and nimscriptsupport.nim is all dead code afaik.

@genotrance
Copy link
Contributor

If the Nim compiler has to be touched, I'd rather see #654 implemented as part of this work.

Meanwhile, Nimble should continue to be backwards compatible thru this change. As of today, you can run the latest Nimble even with 0.19.6. Perhaps lock file support won't work with an older compiler but existing functionality shouldn't break.

@dom96
Copy link
Contributor

dom96 commented Sep 3, 2019

Whoa, this discussion has gone a bit off topic. Please consider moving such discussions to separate threads/issues. You will likely find that many of these have already been discussed in Nimble's issue tracker (I saw a lot of things mentioned that are already implemented and/or planned).

As far as my original question. I understand why hashes are a necessity for lock files support, what I am wondering is why you are tacking on another hash to the pkg file paths when these paths can already include a Git/Hg commit hash: package_name-#commit_hash. You should not need this additional mechanism.

@dom96
Copy link
Contributor

dom96 commented Sep 3, 2019

Btw just to clear up some misconceptions: Nimble already passes paths explicitly to Nim when calling nimble build/nimble c. Nim is only aware of Nimble to make it easier for Nim programmers to get started (creating a Nimble project can be considered too much friction for quick scripts), perhaps you disagree and that's fine, we can argue about changing this, but let's do that in a separate issue.

@bobeff
Copy link
Contributor Author

bobeff commented Sep 3, 2019

why you are tacking on another hash to the pkg file paths when these paths can already include a Git/Hg commit hash

@dom96

  • The one reason is that two commits can have exactly the same content as a result of history rewriting.
  • The other one is that in the future we can implement a tarball download method. In the case of GitHub the downloaded file still contains a short version of the Git commit in its name and most probably by using the GitHub API, we can obtain even the full one for given tag or branch if needed, but maybe someone will want to download tarballs from some other place where there are no such capabilities.

One possible optimization is that we can reuse the top level tree hash of a Git commit and to implement exactly the same algorithm which Git uses, when another download method is being used. Git and Mercurial use different methods for calculating their tree hashes. I don't implement it in this way because:

  • It is a bit harder by requiring me to figure out how Git calculates its tree hashes for example by looking at its source code.
  • It will favor one of the used VCSes, although this will be a practical solution after Git is the most widely used one now, and most probably it will stay such in the foreseeable future.
  • After implementing a tarball download method I expect it to be faster than cloning a repository and because of this, it to become the main used method, although as it seems there is a way to query a Git tree hash remotely at least when using GitHub.

I still can implement this optimization as a final step before the pull request, if there is a strong interest in it.

@bobeff
Copy link
Contributor Author

bobeff commented Sep 4, 2019

but let's do that in a separate issue.

@Araq I also think that this is out of the scope of the lock files feature. After all despite that the code of the procedure has become somehow more clumsy, I didn't add anything new as a dependency between Nim and Nimble.

I propose for now to make the directory names to contain only the first few characters of the hash as it was suggested above, if this is the final decision? If so, the question is how many are appropriate:

  • mratsim proposed the first 8.
  • It seems that the first 7 are GitHub standard for short commit links.
  • git log --oneline gives the first 9.
  • For some reason Nix uses:

160-bit truncations of SHA-256 hashes encoded in a base-32 notation

@dom96
Copy link
Contributor

dom96 commented Sep 4, 2019

The one reason is that two commits can have exactly the same content as a result of history rewriting.

How does this affect us in a practical sense? Can you elaborate on why this is a problem?

but maybe someone will want to download tarballs from some other place where there are no such capabilities.

Even in this case, you can easily reuse this mechanism, no? You just calculate a hash for your tarballed package and reuse the same #hash suffix. The only problem is that the hash of a tarball won't match the commit hash if downloaded, but I think that's a minor annoyance. Let's not overcomplicate things for a potential future feature.

@zah
Copy link
Member

zah commented Sep 5, 2019

@dom96, I've had several discussions with @bobeff regarding the hashing algorithms being used. An important property that we need is that the hash should not depend on the download method being used. It should depend only on the contents of the package. In other words, the same package obtained through different methods should resolve to the same hash.

It's possible to use a git tree hash to identify the contents (this should not be confused with a git commit hash), but this would require us to implement a fully compliant tree hashing algorithms that works exactly as git (to be used when you download a tarball). In return, we'll be able to skip some of the hashing when working with locally developed git repos.

What I recommended to @bobeff in the end is to use a scheme similar to multihash, so that we'll be able to add extra hashing schemes such as git tree hashes or mercurial manifest hashes in the future, but for now we'll launch with a single hashing algorithms defined by Nimble itself.

@dom96
Copy link
Contributor

dom96 commented Sep 5, 2019

In other words, the same package obtained through different methods should resolve to the same hash.

Okay, but I don't understand why this is necessary, can you explain?

@zah
Copy link
Member

zah commented Sep 6, 2019

I would say that having this property has a long tail of various benefits, but here is one particular way to see why it's necessary:

  1. The hash in the lock file should match the used hash in the cache (otherwise, it's not cheap to verify that the dependencies in the lock file are satisfied).

  2. The hash in the lock file must authenticate the contents of all downloaded code (otherwise, the package hijacking attacks would be possible)

  3. Nimble should be able to transparently switch to Github tarballs because this will give us better reliability (git clones fail more often on flaky connections) and speed (a tarball is often smaller than a git clone). To authenticate a tarball, you need a hash that depends only on the contents of the package. Since we need to specify the format of the hash in the lock file now and points 1 and 2 require us to use the same hash throughout the system, we arrive at the currently proposed solution.

@bobeff
Copy link
Contributor Author

bobeff commented Sep 6, 2019

The newer Nimble doesn't and nimscriptsupport.nim is all dead code afaik.

@Araq You are right. I forgot that I even removed this file in nim-lang/nimble@134b799 :D

@dom96
Copy link
Contributor

dom96 commented Sep 7, 2019

Okay, I'm inclined to accept this, however I want to see the lock file implementation in Nimble first in order to understand precisely how these hashes are going to be used (this PR shouldn't block you). It might very well turn out that we can do this differently and no changes in the compiler will be necessary, let's postpone this PR until lock files in Nimble are ready.

@bobeff bobeff force-pushed the feature/nimble-lockfile-support branch from d9a6246 to 1757aad Compare June 11, 2021 15:43
@bobeff bobeff force-pushed the feature/nimble-lockfile-support branch 2 times, most recently from 6623149 to 9a05edb Compare July 1, 2021 22:37
@bobeff
Copy link
Contributor Author

bobeff commented Jul 3, 2021

Could you merge this to enable CI for the Nimble lock file support branch?

@dom96
Copy link
Contributor

dom96 commented Jul 4, 2021

@bobeff you can change the Github Actions file to point at this branch until we can merge this.

@bobeff bobeff force-pushed the feature/nimble-lockfile-support branch from 9a05edb to 7902a1f Compare July 5, 2021 06:38
lib/std/sha1.nim Show resolved Hide resolved
@bobeff bobeff force-pushed the feature/nimble-lockfile-support branch from 7902a1f to 89907c2 Compare July 5, 2021 12:22
lib/std/sha1.nim Outdated Show resolved Hide resolved
@bobeff bobeff force-pushed the feature/nimble-lockfile-support branch from 89907c2 to d5e4998 Compare July 6, 2021 11:15
@bobeff bobeff closed this Jul 6, 2021
@bobeff bobeff reopened this Jul 6, 2021
@bobeff
Copy link
Contributor Author

bobeff commented Jul 6, 2021

I'm not sure why the CI is failing. It seems to be not related to the pull request.

compiler/nimblecmd.nim Outdated Show resolved Hide resolved
@arnetheduck
Copy link
Contributor

arnetheduck commented Jul 6, 2021

There still seem to be several unaddressed comments here: ie locking the algorithms that the package manager must use to introduce compile paths, the use of sha1 vs a multihash scheme that would allow, among other things, to use git hashes directly in the future etc.

@arnetheduck
Copy link
Contributor

ie what's version selection doing the compiler? why does the compiler have to know about package versioning schemes at all? this is clearly in the domain of a package manager

@Araq
Copy link
Member

Araq commented Jul 7, 2021

ie what's version selection doing the compiler? why does the compiler have to know about package versioning schemes at all? this is clearly in the domain of a package manager

Correct but this is kept for backwards compatibility. At least that's my understanding.

@bobeff bobeff force-pushed the feature/nimble-lockfile-support branch 5 times, most recently from c57bd0f to 7f92950 Compare July 11, 2021 17:37
Implemented support for Nimble local cache with package directories with
a checksum of the package at the end of their names. Now the compiler
supports package paths in the form:

 * /path_to_nimble_cache_dir/pkgs/package_name-1.2.3-
FEBADEAEA2345E777F0F6F8433F7F0A52EDD5D1B

 * /path_to_nimble_cache_dir/pkgs/package_name-#head-
042D4BE2B90ED0672E717D71850ABDB0A2D19CD2

 * /path_to_nimble_cache_dir/pkgs/package_name-#branch-name-
DBC1F902CB79946E990E38AF51F0BAD36ACFABD9

Related to nim-lang/nimble#127
@bobeff bobeff force-pushed the feature/nimble-lockfile-support branch from 7f92950 to 1da1285 Compare July 12, 2021 21:48
@Araq Araq merged commit 5e66804 into nim-lang:devel Jul 15, 2021
@Araq Araq removed the Postponed label Jul 15, 2021
@bobeff bobeff deleted the feature/nimble-lockfile-support branch July 16, 2021 14:52
PMunch pushed a commit to PMunch/Nim that referenced this pull request Mar 28, 2022
Implemented support for Nimble local cache with package directories with
a checksum of the package at the end of their names. Now the compiler
supports package paths in the form:

 * /path_to_nimble_cache_dir/pkgs/package_name-1.2.3-
FEBADEAEA2345E777F0F6F8433F7F0A52EDD5D1B

 * /path_to_nimble_cache_dir/pkgs/package_name-#head-
042D4BE2B90ED0672E717D71850ABDB0A2D19CD2

 * /path_to_nimble_cache_dir/pkgs/package_name-#branch-name-
DBC1F902CB79946E990E38AF51F0BAD36ACFABD9

Related to nim-lang/nimble#127
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet