
Pkg3: immutability of compatibility #14

Open
StefanKarpinski opened this issue Nov 15, 2016 · 91 comments

@StefanKarpinski
Sponsor Member

Continuing half of the discussion on #3.

@StefanKarpinski StefanKarpinski changed the title Pkg3: immutability of compatibility constraints Pkg3: immutability of compatibility Nov 15, 2016
@StefanKarpinski
Sponsor Member Author

If we allow compatibility of versions to be mutated after the fact (as we do now in METADATA), one major issue is that it will be impossible, when compatibility has been modified later, to know what the state of compatibility constraints on versions actually were when versions were resolved. This could hide resolution bugs and generally makes understanding the system harder.

One possible solution is for each modification of compatibility constraints to increment a build number of a version or something like that, so 1.2.3 is the version with its original compatibility, while 1.2.3+1 would be a version with potentially modified compatibility or other metadata changes, which would get its own metadata in the registry, but share the same source tree.
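A sketch of the bookkeeping this implies, assuming the registry treats the +N suffix as a metadata revision counter so that only the newest revision of each source version is live. Note that SemVer itself assigns build metadata no precedence, so this ordering would be a registry convention, and all names here are illustrative:

```python
# Hypothetical helpers; the "+N as registry revision counter" reading is an
# assumption from the discussion above, not standard SemVer precedence.
def parse(v):
    """Split '1.2.3+1' into ((1, 2, 3), 1); a bare '1.2.3' has revision 0."""
    core, _, build = v.partition("+")
    major, minor, patch = map(int, core.split("."))
    return (major, minor, patch), int(build or 0)

def latest_revisions(versions):
    """For each source version, only the highest revision stays live:
    1.2.3+1 carries its own metadata but shares 1.2.3's source tree."""
    live = {}
    for v in versions:
        core, build = parse(v)
        if core not in live or build > parse(live[core])[1]:
            live[core] = v
    return sorted(live.values())

print(latest_revisions(["1.2.3", "1.2.3+1", "1.2.4"]))  # ['1.2.3+1', '1.2.4']
```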

At that point, however, I have to question why 1.2.3+1 wouldn't simply be called 1.2.4. The main objection seems to be that it's annoying / hard to create patches and package maintainers often aren't as responsive as we'd like. Which makes me think that we should make this kind of patch update easier, and possible without the package maintainer's involvement.

@StefanKarpinski
Sponsor Member Author

In particular, patches don't need to be made on the main repository of a project, they can be made on a fork as long as they are eventually upstreamed back to the main repo.


@tkelman

tkelman commented Nov 15, 2016

The reason for distinguishing a compatibility-only change from a patch change is that you may need to make the former long after the fact when there have already been later patch releases.

The version history of metadata currently would allow you to reconstruct the state of compatibility (assuming no local metadata modifications have been made), though which commits of metadata are used is not recorded long term.

@StefanKarpinski
Sponsor Member Author

The reason for distinguishing a compatibility-only change from a patch change is that you may need to make the former long after the fact when there have already been later patch releases.

If the latest patch release always supersedes previous ones in the same major-minor series, then you can always just make a new patch. The only way needing 1.2.3+1 rather than 1.2.19 makes sense is if you want a version with compatibility fixes but without any bug fixes. That seems like a somewhat implausible situation. How would this be necessary? If such a situation did occur, we could always allow publishing 1.2.3+1 with updated compatibility but without bug fixes.

The version history of metadata currently would allow you to reconstruct the state of compatibility (assuming no local metadata modifications have been made), though which commits of metadata are used is not recorded long term.

That means we'd have to record the state of all registries in the environment, which ties the meaning of an environment to the history of registries in a way that we are (or at least I am) trying to avoid. If version compatibility is immutable (in either 1.2.3+1 or 1.2.4 form), then you can always tell, just by looking at the compatibility info for those versions, whether they are correct. You can't tell if they were optimal at the time, but you can verify correctness.

@tkelman

tkelman commented Nov 15, 2016

If the latest patch release always supersedes previous ones in the same major-minor series

This is not a good idea, as I've said before - there's not a lot of precedent for allowing code changes to completely supersede old versions. If there's going to be a second class of dependency resolution for complete replacement, then it should not allow code changes. People break their API in bugfix releases even if we tell them not to, and downstream packages are going to need to be able to use APIs that only existed in early patch releases. And this situation might not be noticed immediately, so there could be enough later patch and minor releases that there isn't room to fix the situation by making a new set of renumbered releases.

@StefanKarpinski
Sponsor Member Author

So are you ok with the idea of version metadata – especially compatibility – being immutable, but having 1.2.3+1 supersede 1.2.3 with no source code changes, only metadata changes?

@tkelman

tkelman commented Nov 15, 2016

Yes, that seems like a mostly equivalent way of accomplishing the same thing as modifying compatibility in metadata. It records more history permanently (not just in git history), maybe that could be useful though.

@tkelman

tkelman commented Nov 15, 2016

I do think we should keep a log of version history used by local registry copies over time, so you could feasibly implement an "undo" of a global update operation. That's a separate issue though.

@StefanKarpinski
Sponsor Member Author

Or are you entirely against the idea that version metadata be immutable?

@martinholters
Member

Creating such a metadata-only update would be simplified if the metadata was only part of the registry, not the package itself, i.e. 1.2.3+1 could have the same hashes stored as 1.2.3. Actually, it would have to, to enforce the "no source code changes" policy. This would a) allow easy automatic verification of this policy and b) simplify metadata-only updates by non-package-maintainers.

Would that be an option? (Or is that already the idea and I misread the proposal?)
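That verification is cheap if the registry records a source-tree hash per version: a +N revision is valid only when it points at exactly the same tree as its base version. A minimal sketch, with a hypothetical record layout:

```python
# Hypothetical registry records: each version maps to the hash of the
# source tree it was tagged from, plus its compatibility metadata.
registry_entries = {
    "1.2.3":   {"tree": "a1b2c3", "compat": {"A": "1.2"}},
    "1.2.3+1": {"tree": "a1b2c3", "compat": {"A": "1.3"}},  # metadata-only revision
}

def base_version(v):
    return v.split("+", 1)[0]

def check_metadata_only(entries):
    """Enforce the 'no source code changes' policy: every +N revision must
    reference exactly the same source tree as its base version."""
    for v, entry in entries.items():
        if "+" in v and entry["tree"] != entries[base_version(v)]["tree"]:
            raise ValueError(f"{v} changes source relative to {base_version(v)}")

check_metadata_only(registry_entries)  # passes: identical tree hashes
```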

@simonbyrne
Contributor

simonbyrne commented Nov 16, 2016

The example I gave in the other thread illustrates why patches are insufficient:

  1. Pkg B v2.0.0 depends on v1.2 of Pkg A
  2. Pkg C v3.0.0 depends on v1.2 of Pkg A
  3. Pkg A v1.3.0 is tagged with new features
  4. Pkg B v2.1.0 is tagged using features of Pkg A v1.3.0, but forgets to update the version requirement
  5. Pkg B v2.1.1 is tagged fixing this.

Now user installs Pkg B and Pkg C: the end result would be:

  • Pkg A v1.2.x (as this is the latest version compatible with Pkg C)
  • Pkg B v2.1.0 (as this is the latest version compatible with Pkg A v1.2)
  • Pkg C v3.0.0

which would be broken.

@StefanKarpinski
Sponsor Member Author

StefanKarpinski commented Nov 16, 2016

@martinholters: Yes, having compatibility info not live in the package repo is definitely a possibility, but it would make it harder for unregistered packages to participate in version resolution. Since making unregistered packages easier to work with was one of the major requests for Pkg3, that's a bit of a problem. Also, if we move compatibility info out of the package itself, where does the developer edit it? The obvious answer is in the registry but I feel like that's not tremendously obvious or developer-friendly.

@simonbyrne: This wouldn't be the result under what I've proposed since the existence of Pkg B v2.1.1 would prevent resolution from ever choosing Pkg B v2.1.0 – that's what "strongly favor the latest patch release" is meant to convey. Instead you would get A v1.2.x, B v2.0.0 and C v3.0.0. In the other approach being discussed here, B v2.1.0+1 would fix B v2.1.0's dependencies and would similarly hide B v2.1.0 from consideration when resolving new versions.
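The two outcomes can be checked with a toy brute-force resolver over @simonbyrne's example; the registry layout and algorithm here are illustrative assumptions, not Pkg3's actual resolver:

```python
from itertools import product

# Toy registry for the example above. Versions are (major, minor, patch)
# tuples; each entry declares which major.minor series of A it needs.
registry = {
    "A": {(1, 2, 0): {}, (1, 3, 0): {}},
    "B": {
        (2, 0, 0): {"A": (1, 2)},
        (2, 1, 0): {"A": (1, 2)},  # wrong: actually uses A v1.3 features
        (2, 1, 1): {"A": (1, 3)},  # the tag that fixes the requirement
    },
    "C": {(3, 0, 0): {"A": (1, 2)}},
}

def visible(versions, supersede_patches):
    """With patch-supersedence, only the newest patch in each major.minor
    series is eligible; older patches like B v2.1.0 are masked."""
    if not supersede_patches:
        return list(versions)
    newest = {}
    for v in versions:
        if v[:2] not in newest or v > newest[v[:2]]:
            newest[v[:2]] = v
    return list(newest.values())

def resolve(pkgs, supersede_patches):
    """Brute force: try assignments newest-first and return the first one
    whose declared requirements are all satisfied."""
    choices = [sorted(visible(registry[p], supersede_patches), reverse=True)
               for p in pkgs]
    for combo in product(*choices):
        pick = dict(zip(pkgs, combo))
        if all(pick[dep][:2] == series
               for p in pkgs
               for dep, series in registry[p][pick[p]].items()):
            return pick
    return None

print(resolve(["A", "B", "C"], supersede_patches=False)["B"])  # (2, 1, 0) - broken
print(resolve(["A", "B", "C"], supersede_patches=True)["B"])   # (2, 0, 0) - works
```

Without supersedence the resolver picks the broken B v2.1.0 exactly as described; with it, B v2.1.0 is masked by v2.1.1 and resolution falls back to B v2.0.0.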

@StefanKarpinski
Sponsor Member Author

StefanKarpinski commented Nov 16, 2016

The core of @tkelman's objection (assuming he's not against the idea of immutable version metadata entirely, which would be good to get an answer on) seems to be that updating version metadata via new patches allows metadata fixes to be mixed with bug fixes – well, technically arbitrary source code changes, since people may not just fix bugs in patch versions. But if people stick to bug fixes in patches, this won't be a problem: why would you want a buggier version? Yes, people will screw up bug fixes, but then the appropriate action is to make another patch that fixes the fix.

Fixing version metadata for 1.2.3 by releasing 1.2.4 is less flexible than adding another level of metadata-changes-only versioning like 1.2.3+1. So why not just add another layer and semantically separate metadata changes from code changes of any kind? One reason is that semantic versioning already has three layers of versioning, which is already a lot to deal with and reason about, and adding another one seems complicated and unnecessary. At the level of practical development, people only use branches corresponding to major/minor versions: patches occur on branches with names like release-1.2 – if you want to make a new 1.2.x release, you tag the tip of release-1.2. How would this workflow change with metadata-only changes like 1.2.3+1? You would need a branch for each patch release: you'd make metadata-only fixes on release-1.2.3, and you'd need a branch like that for every single release. That just seems ridiculous. If you make metadata fixes via new patch releases, mixed in with other bug fixes, then the current workflow doesn't change at all – just fix version metadata on the release-1.2 branch and tag a new patch.

My perspective is that we want to design the package manager so that making patch versions that do anything besides fixing bugs is problematic. This will actively encourage package developers to only fix bugs in patches. Two features of the proposed design that encourage this are:

  1. Having newer patches fully supersede older ones with the same major/minor version.
  2. Not allowing version dependencies to specify versions at patch granularity.

Both of these design choices assume that patches with the same major/minor version are equivalent aside from metadata updates and bug fixes. If a package maintainer violates this assumption by adding or removing functionality in a patch, it will cause problems. Problems lead to complaints, which will provide feedback to the maintainer and help them learn that this is bad practice and not do it in the future. This is not based on some sort of groundless optimism that people will do things correctly on their own, it's based on the principle that people respond to feedback and that we can design a system that actively causes people to receive corrective feedback. Is this limiting the ways that package developers can version their packages and have things work smoothly? Yes, but I think that's a good thing.

@tkelman

tkelman commented Nov 16, 2016

If a compatibility-only change can be done only at the registry level without needing the source to change at all, then there's no need for a branch for a compatibility revision.

Designing the system to be intentionally rigid and inherently flawed in the face of a behavior that people will commonly do (a recent example, changing the type of a single parameter of a single function - that breaks the api but seems like a minor change), and in a way that cannot be easily fixed once newer versions have been published, is why I think this goal is a bad idea.

The core job of a package manager is to ensure that if source has been published as a release version, it is possible to depend on it. Demoting the patch level of versioning from this is unnecessary, adds friction to the system, and doesn't gain us anything. Downstream users are the ones who face problems from versioning mistakes, and they are incapable of fixing or working around them without cooperation from the upstream author, or without forking the package and re-releasing a new series of different version numbers. We don't gain enough for this to be worth it.

@tkelman

tkelman commented Nov 16, 2016

What qualifies as a bugfix is not always clear cut either. In fixing one bug, you can often accidentally (or intentionally!) break something else that downstream users were depending on. And these issues don't get identified immediately. By the time some of these issues are found, the upstream author may have moved on to a newer release series, that the downstream users don't have time to upgrade to right away (especially if there was a past release that worked fine for them). What option does downstream have to get their code working again? They could publish a fork without any of the more recent releases, but why have we made them go to that trouble when a patch level upper bound would serve the exact same purpose?

@StefanKarpinski
Sponsor Member Author

The problem with having registry-only compatibility changes is that it:

  1. makes compatibility confusing since there are multiple conflicting – and changing – sources of what a version's compatibility actually is, and it
  2. makes registered and unregistered packages work completely differently – registered packages have a mechanism for amending compatibility while unregistered ones don't.

The process I'm proposing is straightforward and the same for registered or unregistered packages: keep definitive compatibility info in Config.toml; when compatibility needs to be adjusted, just edit Config.toml on the appropriate release branch, commit the changes and publish the tip of the release branch as a new patch.

Preferring the latest patch for version resolution doesn't make it impossible to use older patches, nor does it force users to upgrade to the latest patch – if what they're using works, no problem:

  • if you're already using v2.1.0 and it works, no problem
  • if an environment records v2.1.0 and you run it, you get v2.1.0
  • if you install or upgrade, then yes, you’ll always get v2.1.1 instead of v2.1.0
  • but you can still explicitly ask for v2.1.0, e.g. with pkg> add A = 2.1.0

The example you allude to (where was this?) with a changed type parameter is a simple broken patch. The correct fix in such a situation, if you depend on the package, is to exclude that specific broken patch, which solves the problem; if you're the package maintainer, the fix is to revert the part of the change that broke compatibility for someone and make a new patch release. Neither is a big problem.

I would love an actual problematic case that can't be handled with what I'm proposing, instead of general arguments about what package managers should or shouldn't do. If there's some problem scenario, I want to know about it. The kind of example @simonbyrne presented is exactly what I'm talking about (hopefully my answer to that is convincing to him). The Compat example in #3 is also exactly what I'm talking about: the fact that minor updates to packages with many dependents (Compat being the most extreme example) would force patching of all dependents is a devastating problem with my original proposal, hence #15 (comment).

@tkelman

tkelman commented Nov 16, 2016

The problem is that the "broken patch" is broken from the perspective of downstream users who were using the old API, but intended as a new API by the upstream author. Upstream isn't going to revert it. Downstream then needs to indicate that all future patches are broken. That's not possible in this proposal; every new upstream release would break the downstream until downstream gets a chance to add another broken patch to their list.

It's not possible for compatibility to be set in stone and never change - compatibility depends on the entire set of possible interacting versions of dependencies, it always changes as new versions get released.

@tkelman

tkelman commented Nov 16, 2016

You are proposing making it impossible to declare version compatibility bounds at patch granularity. That's necessary in the case above, where

  1. Package B depends on package A, which is at, say, v1.3.3 when package B gets written (and it relies on a feature that was new in 1.3.0)
  2. Package A breaks API between versions 1.3.5 and 1.3.6
  3. Package A makes many more 1.3.x releases, several 1.4.y, and has started on 2.0.0
  4. Package B gets a report that it doesn't work any more with package A v1.4.3

Assuming the author of package B can remember or recover from environment info which version of package A did work, there's no way in this proposal of reflecting its requirements, since it can't express an upper bound excluding the A v1.3.6 that caused the problem. It could say every patch from 1.3.6 on is broken, but if those have to be listed individually then the list becomes incorrect as soon as an additional 1.3.17 backport gets released. The most practical solution to immediately get a working version of its dependency is to republish a fork of the old version of package A.
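The gap between enumerating broken patches and a real upper bound is easy to see on hypothetical version data: the enumerated list silently admits a backported 1.3.17, while an upper bound below 1.3.6 keeps excluding it:

```python
# Hypothetical version data for the scenario above; versions are
# (major, minor, patch) tuples.
def in_range(v, lo, hi):
    """Half-open range lo <= v < hi, expressible at patch granularity."""
    return lo <= v < hi

available = [(1, 3, p) for p in range(17)] + [(1, 4, 0), (1, 4, 3), (2, 0, 0)]

# An upper bound excludes the breaking 1.3.6 and everything after it,
# including patches that don't exist yet.
ok_range = [v for v in available if in_range(v, (1, 3, 0), (1, 3, 6))]

# Enumerated exclusions, written back when 1.3.16 was the newest patch:
broken = {(1, 3, p) for p in range(6, 17)}
ok_list = [v for v in available if v[:2] == (1, 3) and v not in broken]

assert ok_range == ok_list == [(1, 3, p) for p in range(6)]

# A later 1.3.17 backport silently defeats the enumerated list...
available.append((1, 3, 17))
ok_list = [v for v in available if v[:2] == (1, 3) and v not in broken]
assert (1, 3, 17) in ok_list       # wrongly admitted as compatible

# ...while the range bound still excludes it.
ok_range = [v for v in available if in_range(v, (1, 3, 0), (1, 3, 6))]
assert (1, 3, 17) not in ok_range  # still correctly excluded
```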

What problem is solved by disallowing requirements at patch granularity, and disallowing expressing requirements as ranges?

@StefanKarpinski
Sponsor Member Author

The subject of this issue is immutability of compatibility, which is orthogonal to patch granularity. I was trying to unmuddy the discussion by splitting #3 into this issue and #15, which would be a better place to discuss patch granularity, although that's explicitly about the opposite complaint: that the granularity is too fine, which I already conceded.

@tkelman

tkelman commented Nov 17, 2016

Splitting a discussion without posting to that effect in the discussion itself isn't terribly effective.

Compatibility constraints are either correct, too tight, or too loose with respect to the time and set of available dependency versions when you state them. As new versions become available, a previously correct set of constraints can become too tight if it doesn't include working versions, too loose if it does not indicate new breakage, or remain correct. Compatibility claims that were too tight or too loose when they were first made may need to be amended after the fact.

If making personal registries is simple, then I don't think it's worth worrying about how to amend compatibility for unregistered packages. Source releases should be immutable, compatibility often needs to be amended, so compatibility should be tracked outside of the source. If you need to amend compatibility for an unregistered package, then create a personal registry to track it.

@tbreloff

I really hope that package management and compatibility can be managed outside of the actual codebase as much as possible. In fact, I wish that we didn't use git tags at all. Forcing package authors to add new commits (and tag them) just to fix a dependency resolution is ridiculous. Please let's put all requirements outside of the actual package repo. Let a core group of people manage those dependencies for the curated metadata, with advice from authors. Private metadatas will be easier to manage as well.


@JeffreySarnoff

+1.618 for allowing me to become unconcerned with anything git related

@tkelman

tkelman commented Nov 17, 2016

@tbreloff package authors need to be responsible for dependency versioning. What features are you using, when things break how do you fix or work around them, etc. That comes with the territory of having dependencies. If you get any help you're lucky, but you can't expect other people to do this for you.

An outside-of-the-source copy of the dependency information may need to take priority here though, as in the existing system where metadata is used for registered packages, the package's copy of REQUIRE isn't actually used except at tag time to populate the initial content.

A compatibility-only revision release could be a mechanism for this, but it needs to be possible to do that for any published release, not just the latest within a minor series. Compatibility is about the rest of the world with respect to a fixed version of a package - we shouldn't be mixing the release numbering or resolution mechanism for outside-world compatibility within the same system (and constraints) that we use for a package's own source.

@tbreloff

So then maybe what I'd like is a little more subtle. It would be nice if the larger community had a mechanism to tag and fix dependencies in place of authors that don't have the time or knowledge to keep up with the process. How many times a day do you have to tell people exactly what they need to do and how to do it in order to properly register or tag? Wouldn't it be easier for everyone involved if you just did it yourself? You're the one with commit access to metadata, so why go through the silly and pointless steps that make it seem like the author has anything valuable to add? I'd be happy with v1.2+ and v1.2.3+ if it means problems are immediately solved by the people who understand the right way to solve them.

tl;dr Manage as much as possible from within metadata(s) without necessarily requiring the author


@StefanKarpinski
Sponsor Member Author

The notion that you can build a functioning ecosystem of reusable software without authors thinking about versioning at all strikes me as incredibly implausible, not to mention totally unscalable. Who's going to be spending all of their time figuring out how to version every single registered package? Your answer here seems to be "I dunno, but not me." If you want to develop software that way, that's cool – then don't register your packages. What I'm proposing will support unregistered packages much better, but it won't change the fact that following along with whatever happens to be on master on a set of packages will not be a good way to build systems that don't break all the time.

@tbreloff

without authors thinking about versioning at all

Of course there's a middle ground. Authors think about the high level versioning, but not necessarily the gritty details (that frequently are due to other packages out of their control). Those details should either be handled by automation or by expert guidance, depending on the situation.

Your answer here seems to be "I dunno, but not me."

When it comes to curated metadata repos, if I'm not a curator then the final responsibility is not mine. Package authors can guide versioning (and should be encouraged to do as much as possible themselves), but this mentality that curators should never make changes to the thing they're curating, and should instead enact social pressure on package authors until they make the exact change that the curator could have made in the first place... it's just stupid. I want to see the curation as disjoint from the code.

following along with whatever happens to be on master on a set of packages will not be a good way to build systems that don't break all the time.

I couldn't agree more, which is why I care so much about making it dirt-simple to "do the right thing".

@JeffreySarnoff

JeffreySarnoff commented Nov 17, 2016

@StefanKarpinski @tbreloff Each of you is right, in important measure.

I have seen the need for handholding in the less well traveled regions of the deep end of the pool increase superlinearly. @tkelman The work you do helping us deal with tags and git when it goes on a bender is probably more informative than predictive.

This summer and next fall I expect a flood of new and very active involvement in Julia. Something is going to feel the extra weight. 🚶‍♂️ (mmph, 😢) "I do not want to play with git" (😢, mmph)

between update and upgrade. ?uplift

@StefanKarpinski
Sponsor Member Author

@simonbyrne: yes, this is probably a good idea. Registry-signed tags make sense too.

@tbreloff

registry itself maintain a fork of all the repositories

The other benefit is that the community could decide to tag/release without requiring the package author. There have been many times that people would have stepped up and tagged something while the author is on vacation (or whatever).

@simonbyrne
Contributor

So as I understand it, a typical release process might look something like:

  1. Package author requests new release via some registry API
  2. Registry performs checks. If it fails we notify the author somehow
  3. Registry pulls data into its fork, tags and signs the tag.
  4. Registry contents are updated.
  5. All dependent packages are also checked for compatibility with the new package: their Config.toml files are updated to reflect the outcomes of this check.

Is that what you had in mind?

(these points are intentionally a bit vague, in particular point 5, but that is probably best discussed in a different issue)

@tbreloff

That sounds pretty reasonable @simonbyrne. And my point above was that "Package author requests new release" could just as easily be "community requests new release" without any hiccups (with the social understanding that we should default to the author's wishes whenever feasible).

@StefanKarpinski
Sponsor Member Author

Yes, roughly, although I might order it like this instead:

  1. Package author requests new release via some registry API
  2. Registry pulls git data into its fork
  3. Registry performs checks. If it fails we notify the author somehow
  4. Registry tags and signs the tag
  5. Registry contents are updated
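As a sketch, the five steps could be wired together like this; every name below is a hypothetical stub for illustration, not a proposed Pkg3 API:

```python
from types import SimpleNamespace

# Stand-in stubs for the registry-side machinery; all hypothetical.
def fetch_into_registry_fork(pkg, commit):
    pass  # 2. pull the git data into the registry's fork

def run_checks(pkg, version, commit):
    return SimpleNamespace(ok=True, messages=[])  # 3. run registration checks

def notify_author(pkg, report):
    pass  # tell the author which checks failed

def sign_and_tag(pkg, version, commit):
    pass  # 4. the registry tags and signs the tag

def publish_registry_entry(pkg, version):
    pass  # 5. update the registry contents

def register(pkg, version, commit):
    """Run the steps in the order listed above; stop early if checks fail."""
    fetch_into_registry_fork(pkg, commit)
    report = run_checks(pkg, version, commit)
    if not report.ok:
        notify_author(pkg, report)
        return False
    sign_and_tag(pkg, version, commit)
    publish_registry_entry(pkg, version)
    return True
```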

One issue with tagging is that IIRC, tags are only transmitted via push/pull, not via pull request, so it's still unclear how to get the tag into the origin repo. For GitHub repos, we could use the tag create API but that doesn't address non-GitHub repos. For those, I suppose we could either have platform-specific APIs or ask the repository owners to pull tags from the registry fork.

I'm also not sure where the best point for checking compatibility is. It could be part of the checks step – if it's a patch release, it shouldn't break any packages that depend on it. We could verify that before accepting a version.

@StefanKarpinski
Sponsor Member Author

Also, note that git tags are usually for commits, not trees, so if we use tree tags (which is possible), it will be a bit unusual. We may want to tag a commit for convenience but associate the version with a tree rather than a commit.

@tkelman

tkelman commented Nov 22, 2016

If the checks fail, you'd need to back out pulling into the registry fork and redo it after the author addresses the issues.

This is getting to be a lot of machinery to expect small organizations to maintain their own instances of.

@StefanKarpinski
Sponsor Member Author

Why would you need to back anything out? Git commits are immutable.

@tkelman

tkelman commented Nov 22, 2016

Not everyone has enabled branch protection - people do occasionally force push to master of packages. They shouldn't be doing that, but if they do we wouldn't want it to mess up the registry's fork.

@StefanKarpinski
Sponsor Member Author

Force pushing a branch doesn't destroy commits, it just changes the commit that a branch points at.

@tkelman

tkelman commented Nov 22, 2016

Depends exactly what "pulls git data into its fork" means then, and where the checks happen. If checks happen in a completely from-scratch clone wherever it's running and don't push anything back to the github copy of the fork unless the checks pass, then it's fine. Pulling into an existing clone's master after a force push is where things can go wrong.

@StefanKarpinski
Sponsor Member Author

I will make the wrong choice so that we can argue about it.

@simonbyrne
Contributor

Pulling into an existing clone's master after a force push is where things can go wrong.

I think "pull" may be the wrong word here: for the metadata fork, I envision the process as something like the following:

git fetch upstream
git checkout HASH
# run tests
# if tests pass
git tag -s -m "..." vX.Y.Z
git push registry vX.Y.Z

(here upstream and registry are the respective remotes). In other words, no branches are involved. This doesn't solve the problem of getting the tags back to upstream, but I don't know if that is such a big deal as the user won't be pulling from it.

I'm not sure about the commit vs tree hash issue, but my experience has been that trees are often harder to work with as they're not really a "user facing" feature of git.

Also, I'm not really sure how we would handle non-git sources either.

@simonbyrne
Contributor

One other thing to think about: who "owns" the version numbers? In what I outlined above, it would be the registry, not the package (as emphasised by the fact that it is the registry signing the tag).

I'm not sure how this would work in the case of a package being in multiple registries (who decides whether or not it is a valid version?)

@tkelman

tkelman commented Nov 22, 2016

I will make the wrong choice so that we can argue about it.

Was that really necessary? "This sort of response is not constructive" either.

It's fairly obvious that having two possible sources for a fact is more complicated and confusing than only having a single possible source for it.

We haven't actually solved this problem if everything is duplicated in both the registry and the package. One should take priority over the other. If we design this whole system to ensure they're equal in most normal usage, you still need to pick which to use in case of local divergence or development. Local development probably points to preferring the package's copy, but how local development is supposed to fit with the rest of Pkg3 has not yet been described here.

One of the copies of this information is a duplicate and somewhat redundant. It sounds like we're moving towards a very registry-driven design. In use cases other than local development, the package's copy (and upstreaming registry-driven compatibility changes back to it) is fairly vestigial. You want to be able to do dependency resolution without having to first download every version of every package. How would version resolution work on an unregistered package? Right now, unregistered packages have no versions - how would Pkg3 change that?

Archiving past versions is a good idea, but doing so by having every registry also maintain git forks of all its packages is making our "github as cdn" abuse worse.

@StefanKarpinski
Sponsor Member Author

Yeah, tagging versions is complicated. We may need a "two-phase commit" process.

@StefanKarpinski
Sponsor Member Author

StefanKarpinski commented Nov 22, 2016

I will make the wrong choice so that we can argue about it.

Was that really necessary? "This sort of response is not constructive" either.

My point is that your attitude to this discussion has been fundamentally uncharitable and contentious. In this particular instance, there are two ways to do a thing, and instead of giving me the benefit of the doubt that I'm not a moron and will pick the one that works, you assume that I'll do the wrong thing and then argue with me based on that assumption. This attitude is frustrating, comes across as disrespectful, and mires us in unnecessary arguments instead of collaborative exploration of the solution space to find something that addresses everyone's concerns.

We haven't actually solved this problem if everything is duplicated in both the registry and the package. One should take priority over the other.

Replicating immutable data isn't a problem. That's the principle behind git and most other successful distributed data stores. Having multiple copies is only a problem if they are mutable.

It sounds like we're moving towards a very registry-driven design.

Quite the opposite. If anything, the package repository is primary and registries are just collections of immutable, append-only metadata about package versions, copied from the packages.

How would version resolution work on an unregistered package? Right now, unregistered packages have no versions - how would Pkg3 change that?

This is a good question. I was considering just using tags for versions in unregistered packages. But of course, you generally don't want to bother tagging versions if your package isn't registered, so I'm not sure what the point is. Instead, I think one would just use an environment file in the git repo to synchronize unregistered packages in lock-step (a la MetaPkg), but their dependencies on registered packages can be looser via compatibility constraints in the unregistered package repos.
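For concreteness, such an environment file might look something like the following. Everything here is hypothetical — the file name, keys, and syntax were not fixed by this discussion — but it shows the split between lock-step pins on unregistered packages and looser constraints on registered ones:

```toml
# Env.toml — hypothetical environment file at the root of an unregistered repo.

# Unregistered dependencies are synchronized in lock-step by pinning exact commits.
[pinned]
OtherUnregistered = { url = "https://github.com/someone/OtherUnregistered.jl", commit = "8c2f1e0" }

# Registered dependencies use ordinary (looser) compatibility constraints.
[compat]
JSON = "0.8"
```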

Archiving past versions is a good idea, but doing so by having every registry also maintain git forks of all packages is making our "github as cdn" abuse worse.

How else would you do this? If you want to keep an archive of a package's git history you have to make a fork of it in case it goes away at some point. Using git for source delivery has problems, but that's an orthogonal issue.

@StefanKarpinski
Sponsor Member Author

StefanKarpinski commented Nov 22, 2016

Maybe we should separate the two jobs of a registry:

  1. Validation: checking that a proposed version makes sense – that it satisfies certain requirements and checks.
  2. Collection: keeping package and version metadata in a centralized location.

The former is the part that requires intelligence and automation while the latter is dead simple.

@tbreloff

And don't forget #3: the user API. Make it dirt-simple for everyone involved to follow best practices... so then they might.

I agree these can be designed separately.


@tkelman

tkelman commented Nov 22, 2016

There are many more than 2 ways to do something that is "intentionally a bit vague" and unclearly specified. I've been contentiously arguing against aspects of the design that I don't think will work. Several of which it looks like we've moved away from, but it took discussion. Take it at technical face value, please.

Dependency resolution can require global information, which is why registries contain compatibility information for all past versions. Getting the equivalent set of information if the package copy is the primary source would require either downloading all versions, or getting information out of git for many versions simultaneously in a way that we don't currently do anywhere to my knowledge. The latter would make the goal of allowing packages to not have to be git repositories less feasible.
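For reference, pulling one metadata file out of many versions simultaneously is at least mechanically possible with git plumbing, no checkouts required — the open question is whether it's practical at registry scale. A self-contained sketch (using REQUIRE as the metadata file is just the current convention):

```shell
#!/bin/sh
# Read each tagged version's metadata file straight from the object
# store, without ever checking a version out.
set -e
repo=$(mktemp -d); cd "$repo"
git init -q .
for v in 1 2; do
    echo "julia 0.$v" > REQUIRE
    git add REQUIRE
    git -c user.email=a@b -c user.name=t commit -q -m "release 0.$v"
    git tag "v0.$v"
done
for tag in $(git tag); do
    printf '%s: %s\n' "$tag" "$(git show "$tag:REQUIRE")"
done
# prints:
#   v0.1: julia 0.1
#   v0.2: julia 0.2
```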

If we're only archiving releases that get published to a registry, then why would the git history be needed? If packages are immutable after installation then they can just be source tarballs, and an archive can work like most conventional package managers, just a collection of source release snapshots.

@StefanKarpinski
Sponsor Member Author

I was actually thinking of separating them entirely. I.e. first you submit a proposed version to various validation services: services that check things like that the proposed version metadata is well-formed, that its tests pass, that it works with various versions of its dependencies, and that it doesn't break various versions of its dependents. Once you've gotten an ok/error from a validation service or services, you can go to a registry and submit that; the check at the registry is then just that a sufficient set of validations has passed. I can even imagine private packages being submitted to cloud-hosted validation services and then registered privately. The set of validations that a version has passed can be attributes of the version; people can filter packages/versions based on the validations they have passed.
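A minimal sketch of that two-phase shape — every name and check below is made up; real validation services would run tests, lint metadata, try dependents, and so on:

```shell
#!/bin/sh
# Hypothetical two-phase flow: independent validation services "stamp" a
# proposed version, and the registry admits it only if the required
# stamps are present.
set -e
stampdir=$(mktemp -d)

validate() {  # validate <package> <version> <check-name>
    # A real service would actually run the named check here.
    echo "$3" >> "$stampdir/$1-$2.stamps"
}

register() {  # register <package> <version>
    for check in metadata-ok tests-pass; do
        grep -qx "$check" "$stampdir/$1-$2.stamps" 2>/dev/null || {
            echo "rejected: $1 v$2 is missing $check"
            return 1
        }
    done
    echo "registered: $1 v$2"
}

validate Example 1.2.3 metadata-ok
register Example 1.2.3 || true   # rejected: Example v1.2.3 is missing tests-pass
validate Example 1.2.3 tests-pass
register Example 1.2.3           # registered: Example v1.2.3
```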

@StefanKarpinski
Sponsor Member Author

StefanKarpinski commented Nov 23, 2016

If we're only archiving releases that get published to a registry, then why would the git history be needed? If packages are immutable after installation then they can just be source tarballs, and an archive can work like most conventional package managers, just a collection of source release snapshots.

If someone deletes their git repo, we want to be able to make another full git repo the new source of the package. We need a fork to do that. I'm not sure why you're arguing this point.

@StefanKarpinski
Sponsor Member Author

I'm not sure what your point about global version information is.

@tkelman

tkelman commented Nov 23, 2016

Don't we also want to make Pkg3 robust against the "package developer force pushed over master" scenario? So tags need not all be linear or have common descendants? We'd want it to be possible to restart development from a non-git copy of a deleted repo with a fresh git init from scratch, wouldn't we? (Or the "rebased to remove large old history" situation that has come up a few times.)

The scheme of propagating tags through forks sounds overly complex and unnecessary, and a lot to set up to run a registry. And now we have multiple mutable remotes for any given package - this could get confusing in terms of issue and PR management, if all the downloads are coming from a fork that users should actually ignore.

The point about global version information is that the head copy of a package's compatibility contains less information than the registry's copy. Except for the author at tag time, everyone else could delete the package's copy and not notice. "Package is primary" is the remaining item of dispute here, afaict.

@StefanKarpinski
Sponsor Member Author

I agree that propagating tags through forks is complicated and maybe impractical. We'll have to see. The main thing we need is copies of the git history for the commits behind various tagged versions, but that could be a separate process from registration.

@tkelman

tkelman commented Nov 23, 2016

If we have a reliable registry-controlled mechanism of obtaining a copy of the release snapshot source with a matching checksum, does it actually need a copy of the git history? Thanks to github it's oddly easier to get straightforward hosting of a full git repo (up to its size limits, anyway) than it is to host arbitrary non-git source snapshots, but I wonder whether we're letting that ease of use drive the design decisions.

@martinholters
Member

Wouldn't future support of non-git-based packages be problematic if releasing a version would include cloning its git history? Ok, of course one could replace that with "cloning its version history in whatever VCS is being used", but that would make registries much more complicated, as they would have to accommodate every VCS used by packages they want to register.

@tkelman

tkelman commented Nov 23, 2016

We should move this aspect of discussion to its own issue, but I think it's totally reasonable today to require that Julia packages must have a git repo (or git mirror of something else) as the development source of record. What we should try to keep feasible is allowing the flexibility of downloading release tags at install time to users' systems in a form other than a full git clone though.

@simonbyrne
Contributor

Archiving past versions is a good idea, but doing so by having every registry also maintain git forks of all packages is making our "github as cdn" abuse worse.

As I understand it, GitHub is fairly intelligent about not unnecessarily replicating data across forks (thanks to git's immutable objects), so I don't think this is really an issue.
