Atomic package distribution ideas #988

Open
bdrewery opened this issue Aug 26, 2014 · 13 comments

@bdrewery
Member

Logging here so I don't forget and for discussion.

Currently there is a race for users when packages are published. If a user runs pkg upgrade and gets the repo files, and the repository is then modified (let's assume atomically) before they start fetching packages, their job will likely fail, as the packages they are trying to fetch no longer exist or have new dependencies that require new solving.

This problem exists for all pkg repositories, not only FreeBSD.org.

My idea is that when we build packages we name them as their cached name (i.e. the package has a hash in its name), just as we do when they are fetched. Then when publishing we do not delete any packages, only add new ones. After a certain period of time (hours?) we can delete old packages. This would prevent users from hitting a mismatch, as they will always find the packages their catalog expects to be there. Without hashed filenames the user may get a broken package installed, because our packages are not reproducible: it may have the same version and rev as the older one the client expected, but now be linked against a different dependency.

Actually, since we check the checksums of fetched packages, this situation is detected and pkg bails out with an error. It's still a bit scary to the user when it is just a side effect of publishing the repository.
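
As a rough illustration of the add-only scheme described above, the Python sketch below publishes packages under hash-bearing names and deletes superseded ones only after a grace period. The name format, the `retired` bookkeeping dict, and the functions `hashed_name`, `publish`, and `prune` are all hypothetical; none of this reflects how pkg or any existing repository tool behaves.

```python
import hashlib
import os
import shutil


def hashed_name(pkg_path):
    """Return '<stem>-<10 hex chars of sha256><ext>' (an assumed name format)."""
    with open(pkg_path, "rb") as fh:
        digest = hashlib.sha256(fh.read()).hexdigest()
    stem, ext = os.path.splitext(os.path.basename(pkg_path))
    return f"{stem}-{digest[:10]}{ext}"


def publish(build_dir, repo_dir, retired, now):
    """Add new packages under hashed names; never delete anything here.

    `retired` maps a file name to the time it dropped out of the catalogue.
    Returns the set of file names in the current catalogue.
    """
    os.makedirs(repo_dir, exist_ok=True)
    current = set()
    for name in sorted(os.listdir(build_dir)):
        src = os.path.join(build_dir, name)
        dst_name = hashed_name(src)
        current.add(dst_name)
        dst = os.path.join(repo_dir, dst_name)
        if not os.path.exists(dst):
            shutil.copy(src, dst)
    for name in os.listdir(repo_dir):
        if name in current:
            retired.pop(name, None)      # back in the catalogue
        else:
            retired.setdefault(name, now)  # remember when it was superseded
    return current


def prune(repo_dir, retired, grace_seconds, now):
    """Delete packages that were superseded at least grace_seconds ago."""
    removed = []
    for name, since in sorted(retired.items()):
        if now - since >= grace_seconds:
            os.remove(os.path.join(repo_dir, name))
            removed.append(name)
    for name in removed:
        del retired[name]
    return removed
```

A client holding an older catalogue would keep finding its files until `prune` runs with a `now` past the grace period. The cost is the out-of-band bookkeeping that the `retired` dict stands in for here.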

I'd prefer to find a solution that does not require out-of-band management of packages so I am not liking this idea much...

My previous idea was to make a "smart repository" where pkg binds itself to a specific URL advertised by the repository. Then the publisher can just make a new directory and remove the old when they please. Pkg would continue using the known URL until the next pkg update. I still like the "smart repository" idea more. I thought I had an issue for it but cannot find it.

@bapt @DarkHelmet433

@DarkHelmet433
Member

Thinking purely from the perspective of pkgsync, I could handle either of these. For what it's worth, there is a symlink to an internal setname variant that is followed by the server:
root@pkg0.isc:/home/pkgsync/pkg-mirror/freebsd:11:x86:32 # ls -l
lrwxr-xr-x 1 5211 5211 43 Aug 15 17:55 latest -> ../pool/freebsd:11:x86:32/latest.1408112311
...
It would take a bit of care, but the specific URLs could be exposed if you wanted to go the "smart repo" way.
I don't know what is easiest for personal/private repos, but for pkg.f.o either should be doable.

@davidchisnall
Member

The race could also be made harmless by refetching the repo metadata file after fetching the package files: if the two copies don't match, then it's probably a good idea to retry the fetch. Of course, if you're on a really slow connection then it may always take longer to get the packages you want than it takes the beefy machines to build them...
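
A minimal sketch of that check, assuming a caller-supplied `fetch(url)` callable that returns bytes. The callable and the function name `fetch_and_verify` are stand-ins for illustration, not libfetch or any pkg API.

```python
import hashlib


def fetch_and_verify(fetch, catalogue_url, package_urls):
    """Fetch packages, then refetch the catalogue to see whether it changed.

    `fetch` is a hypothetical callable taking a URL and returning bytes.
    Returns (packages, consistent); the caller should retry the whole
    job when consistent is False.
    """
    before = hashlib.sha256(fetch(catalogue_url)).hexdigest()
    packages = {url: fetch(url) for url in package_urls}
    after = hashlib.sha256(fetch(catalogue_url)).hexdigest()
    return packages, before == after
```

Comparing digests of the metadata file keeps the check cheap, since that file is far smaller than the packages themselves.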

@bdrewery
Member Author

I think the simplest would be to get /latest to do an actual redirect to the pool directory, then get pkg update to notice the redirect (maybe difficult with libfetch) and bind that repo to the given URL, so long as the URL in the repo config still matches what it was when pkg update was run. If that is the case, then always use the redirected link until it is changed by a pkg update.

Or we store a file `latest/.latest_redirect' which contains '../pool/.../', and have pkg fetch that file additionally and bind to it using similar logic.

Poudriere uses these symlinks currently:

lrwxr-xr-x  1 root  wheel  16 Mar 26 03:23 .latest@ -> .real_1427358217
drwxr-xr-x  4 root  wheel   9 Mar 23 11:33 .real_1427128420/
drwxr-xr-x  4 root  wheel   9 Mar 23 13:29 .real_1427134759/
drwxr-xr-x  4 root  wheel   9 Mar 23 13:34 .real_1427135455/
drwxr-xr-x  4 root  wheel   9 Mar 25 22:07 .real_1427339272/
drwxr-xr-x  4 root  wheel   9 Mar 26 03:23 .real_1427358217/
lrwxr-xr-x  1 root  wheel  11 Dec  3  2013 All@ -> .latest/All
lrwxr-xr-x  1 root  wheel  14 Dec  3  2013 Latest@ -> .latest/Latest
lrwxr-xr-x  1 root  wheel  19 Dec  3  2013 digests.txz@ -> .latest/digests.txz
lrwxr-xr-x  1 root  wheel  16 Sep 17  2014 meta.txz@ -> .latest/meta.txz
lrwxr-xr-x  1 root  wheel  23 Dec  3  2013 packagesite.txz@ -> .latest/packagesite.txz

In this scheme it is currently URL/All/ which redirects to URL/.latest/All/ which goes through the .real symlink.
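
For illustration, one way a publisher could repoint a `.latest`-style symlink without a window in which it is missing is to create the new link under a temporary name and rename it over the old one. The sketch below assumes a POSIX filesystem; `publish_snapshot` and the `.latest.tmp` name are made up for this example and are not taken from poudriere.

```python
import os


def publish_snapshot(repo_root, timestamp):
    """Create .real_<timestamp> and atomically repoint .latest at it."""
    real = f".real_{timestamp}"
    os.makedirs(os.path.join(repo_root, real, "All"))
    # ... packages and catalogue files would be written into `real` here ...

    tmp_link = os.path.join(repo_root, ".latest.tmp")
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)  # leftover from an interrupted publish
    os.symlink(real, tmp_link)
    # rename(2) replaces the old symlink in one step on POSIX systems,
    # so readers see either the old target or the new one.
    os.replace(tmp_link, os.path.join(repo_root, ".latest"))
    return real
```

The swap itself being atomic does not remove the race discussed in this issue: a client that already downloaded the old catalogue still resolves package paths through the new `.latest` target afterwards.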

I'd like some solution that works with both poudriere and pkg.freebsd.org.

@bdrewery
Member Author

Any URL binding needs to leave out the hostname it was redirected to, so that it does not bind to a specific mirror.
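
A small sketch of that rule: keep the scheme and host from the repo config and adopt only the path from the redirect target. The function name `bind_path` and the example hostnames are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def bind_path(configured_url, redirected_url):
    """Combine the configured scheme/host with the redirected path only."""
    conf = urlsplit(configured_url)
    redir = urlsplit(redirected_url)
    return urlunsplit((conf.scheme, conf.netloc, redir.path, "", ""))
```

With a configured URL of `http://pkg.example.org/repo/latest` and a redirect to `http://mirror1.example.org/repo/pool/snap.1/`, this yields `http://pkg.example.org/repo/pool/snap.1/`, so later requests can still be spread across mirrors.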

@bdrewery
Member Author

Of course, any URL binding would need to sanity check that the URL it was given is still valid before fetching packages, and if not, do a full pkg update. Given that David's idea is quite similar, it might be simplest to go with his check for now.
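
The binding rules so far (reuse the bound URL only while the repo config is unchanged and the URL is still valid, otherwise do a full update) could be sketched roughly as below. `Binding`, `choose_binding`, and the `still_valid` and `full_update` callables are all hypothetical names, not part of pkg.

```python
from dataclasses import dataclass


@dataclass
class Binding:
    configured_url: str  # URL in the repo config when the binding was made
    bound_url: str       # snapshot URL learned during the last update


def choose_binding(binding, configured_url, still_valid, full_update):
    """Reuse `binding` if it is still trustworthy, else rebuild it.

    still_valid(url) -> bool and full_update(url) -> Binding are supplied
    by the caller; they stand in for a cheap existence check and for a
    full catalogue update respectively.
    """
    if (
        binding is not None
        and binding.configured_url == configured_url
        and still_valid(binding.bound_url)
    ):
        return binding
    return full_update(configured_url)
```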

@bdrewery
Member Author

Any file solution or repo metadata usage would need to be signed, which would require running pkg repo to update/create the file.

@bdrewery
Member Author

bdrewery commented Apr 4, 2015

Here is a proof of concept of @davidchisnall's idea.

https://people.freebsd.org/~bdrewery/pkg-refetch.patch

This only covers pkg fetch so far, for discussion. I'll update the other clients as well if we're OK with this.

Before:

# pkg fetch -ya
Updating myrepo repository catalogue...
Fetching meta.txz: 100%    588 B   0.6kB/s    00:01
Fetching packagesite.txz: 100%  311 KiB 318.6kB/s    00:01
Processing entries: 100%
myrepo repository update completed. 1222 packages processed
The following packages will be fetched:
New packages to be FETCHED:
[...]
The process will require 1 GiB more space.
1 GiB to be downloaded.
Fetching irssi-0.8.17_1.txz: 100%  566 KiB 580.3kB/s    00:01
pkg: cached package irssi-0.8.17_1: size mismatch, fetching from remote
Fetching irssi-0.8.17_1.txz: 100%  566 KiB 580.3kB/s    00:01
pkg: cached package irssi-0.8.17_1: size mismatch, cannot continue

After:

# pkg fetch -ya
Updating myrepo repository catalogue...
Fetching meta.txz: 100%    588 B   0.6kB/s    00:01
Fetching packagesite.txz: 100%  311 KiB 318.6kB/s    00:01
Processing entries: 100%
myrepo repository update completed. 1222 packages processed
The following packages will be fetched:
New packages to be FETCHED:
[...]
The process will require 1 GiB more space.
1 GiB to be downloaded.
pkg: cached package irssi-0.8.17_1: size mismatch, fetching from remote
Fetching irssi-0.8.17_1.txz: 100%  566 KiB 580.3kB/s    00:01
pkg: cached package irssi-0.8.17_1: size mismatch, cannot continue
[I meant to add a msg here explaining the refetch.]
Updating myrepo repository catalogue...
Fetching meta.txz: 100%    588 B   0.6kB/s    00:01
Fetching packagesite.txz: 100%  311 KiB 318.6kB/s    00:01
Processing entries: 100%
myrepo repository update completed. 1222 packages processed
The following packages will be fetched:
New packages to be FETCHED:
[...]
The process will require 1 GiB more space.
1 GiB to be downloaded.
Fetching irssi-0.8.17_1.txz: 100%  566 KiB 580.3kB/s    00:01
Fetching irssi-devel-20140530_2.txz: 100%  492 KiB 504.5kB/s    00:01
Fetching irssi-fish-1.00.r5.txz: 100%   52 KiB  54.2kB/s    00:01
Fetching irssi-otr-1.0.0_3.txz: 100%   56 KiB  58.3kB/s    00:01
Fetching irssi-scripts-20131030.txz: 100%  529 KiB 541.9kB/s    00:01
[...]

This succeeds in the 2nd run.

@bdrewery
Member Author

bdrewery commented Apr 4, 2015

The patch above retries the update forever, but never mind that. It's just a PoC.

@bapt
Member

bapt commented Apr 4, 2015

I like it

@davidchisnall
Member

goto retry: looks like quite a gratuitous way of implementing a loop. Making it a simple do...while() loop would be cleaner to me. You could also add a counter and give up after n attempts, printing a helpful error message.
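
The bounded-retry shape suggested here might look like the following sketch. It is written in Python rather than pkg's C, so a `for` loop stands in for do...while(); `RepositoryChanged`, `fetch_with_retries`, and both callables are hypothetical and do not come from the patch.

```python
class RepositoryChanged(Exception):
    """Raised by the hypothetical fetch step on a catalogue mismatch."""


def fetch_with_retries(update_catalogue, fetch_packages, max_attempts=3):
    """Update and fetch, retrying a limited number of times on a mismatch."""
    last_error = None
    for _attempt in range(max_attempts):
        update_catalogue()
        try:
            return fetch_packages()
        except RepositoryChanged as exc:
            last_error = exc  # repository moved under us; go around again
    raise RuntimeError(
        f"repository changed during each of {max_attempts} attempts; "
        "it may be mid-publish, try again later"
    ) from last_error
```

Capping the attempts avoids the endless update loop while still hiding a single publish event from the user.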

@bdrewery
Member Author

bdrewery commented Apr 6, 2015

Sure. Being a PoC I did not want to add reindent noise into it.

@bdrewery
Member Author

The current status is that we need a volunteer to submit a patch that is a full solution. Mine was only for pkg-fetch rather than all needed places. I don't have time to work on this.

@ngie-eign

ngie-eign commented Jan 24, 2019

> Thinking purely from the perspective of pkgsync, I could handle either of these. For what it's worth, there is a symlink to an internal setname variant that is followed by the server:
> root@pkg0.isc:/home/pkgsync/pkg-mirror/freebsd:11:x86:32 # ls -l
> lrwxr-xr-x 1 5211 5211 43 Aug 15 17:55 latest -> ../pool/freebsd:11:x86:32/latest.1408112311
> ...
> It would take a bit of care, but the specific URLs could be exposed if you wanted to go the "smart repo" way.
> I don't know what is easiest for personal/private repos, but for pkg.f.o either should be doable.

Question: where does pkgsync live?

I think @bdrewery's PoC is a good step toward fixing this trivial, annoying scenario, and if possible, it would be nice if it were in the next release. There are multiple people who run into this particular problem at least once a month (in the developers@ group), which means there are likely more folks who are running into this issue in the wild.
