Unable to upgrade fresh F37 installation #405

Closed
oscfdezdz opened this issue Jan 29, 2023 · 18 comments
Labels
bug Something isn't working f37 Related to Fedora 37 kinoite Also affect Fedora Kinoite


@oscfdezdz

To Reproduce
Please describe the steps needed to reproduce the bug:

  1. Write Fedora Silverblue 37 image to a USB with Fedora Media Writer
  2. Install Fedora Silverblue 37
  3. Run rpm-ostree upgrade
  4. Get error: While pulling fedora/37/x86_64/silverblue: Server returned HTTP 404

Expected behavior
Upgrade the system.

OS version:

```
$ rpm-ostree status -b
State: idle
BootedDeployment:
● fedora:fedora/37/x86_64/silverblue
                  Version: 37.1.7 (2022-11-05T06:01:00Z)
               BaseCommit: bfe9de223c9a4ba4a793d3e01f6b09024c919685ee73c896af767958725cac79
             GPGSignature: Valid signature by ACB5EE4E831C74BB7C168D27F55AD3FB5323552A
      RemovedBasePackages: firefox firefox-langpacks 106.0.1-1.fc37 gnome-terminal-nautilus gnome-terminal 3.45.90-1.fc37 gnome-tour 43.0-1.fc37
          LayeredPackages: ddcutil gnome-console langpacks-es
```
@oscfdezdz oscfdezdz added the bug Something isn't working label Jan 29, 2023
@box-dev

box-dev commented Jan 29, 2023

Same for me on 37.20230127.0. Looks like the server has issues!

May be unrelated, but when gnome-software fired up on login just now, it went crazy and wrote out something like 40 GB of data, swamping I/O. Coincidence, or a buggy reaction to a failure to check for updates?

@scottsweb

Also having this issue today with version 37.20221231.0. `rpm-ostree upgrade` is throwing:

```
error: While pulling fedora/37/x86_64/silverblue: Server returned HTTP 404
```

@henriquepicanco

I tried to install Silverblue at 9:30 am (2023-01-29, GMT-03) and had the same issue!

@mberlinger3

Having the same issue here. Surprisingly, `rpm-ostree refresh-md -f` works fine... I guess the server is just doing some maintenance.

@cgwalters

cgwalters commented Jan 29, 2023

Exciting; I don't have admin privileges on the repo but I'm guessing this is related to recent pruner runs. I saw coreos/fedora-coreos-releng-automation@9ef521a go by and I suspect that the prod ref was aliased and got removed?

I think it should just be a matter of re-creating the alias to fedora/37/x86_64/updates/silverblue.
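To make the alias theory concrete, here is a toy Python model of a production ref that is only an alias for `fedora/37/x86_64/updates/silverblue`. It is hypothetical: `resolve_ref` and the dict-based `refs`/`aliases` layout are illustrations, not ostree's storage format. If the alias entry is removed, the production ref resolves to nothing, and re-creating the alias restores it.

```python
def resolve_ref(refs, aliases, name, max_hops=8):
    """Resolve `name` to a commit checksum, following alias entries.

    Hypothetical toy model: `refs` maps real ref names to checksums and
    `aliases` maps an alias name to the ref it stands for. Returns None
    when the name resolves to nothing.
    """
    for _ in range(max_hops):
        if name in refs:
            return refs[name]
        if name not in aliases:
            return None
        name = aliases[name]
    raise ValueError(f"alias chain longer than {max_hops} hops")


PROD = "fedora/37/x86_64/silverblue"
UPDATES = "fedora/37/x86_64/updates/silverblue"

refs = {UPDATES: "f3e6e1df"}   # shortened checksum, illustrative only
aliases = {PROD: UPDATES}      # the production ref is just an alias

print(resolve_ref(refs, aliases, PROD))  # f3e6e1df
del aliases[PROD]                        # the alias gets removed...
print(resolve_ref(refs, aliases, PROD))  # None
aliases[PROD] = UPDATES                  # ...and is re-created
print(resolve_ref(refs, aliases, PROD))  # f3e6e1df
```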

@mmBesar

mmBesar commented Jan 29, 2023

I have the same problem!

@kpbaks

kpbaks commented Jan 29, 2023

Same issue when trying to rebase from Kinoite 36 to 37:

```
error: While pulling fedora/37/x86_64/kinoite: Server returned HTTP 404
```

@travier travier added f37 Related to Fedora 37 kinoite Also affect Fedora Kinoite labels Jan 29, 2023
@nullusionist

Same issue here trying to upgrade Silverblue (37):

```
error: While pulling fedora/37/x86_64/silverblue: Server returned HTTP 404
```

@dustymabe

Hmm. I just ran `rpm-ostree upgrade` and it didn't give me a 404. Are people still able to observe this problem?

@dustymabe

dustymabe commented Jan 29, 2023

> Exciting; I don't have admin privileges on the repo but I'm guessing this is related to recent pruner runs. I saw coreos/fedora-coreos-releng-automation@9ef521a go by and I suspect that the prod ref was aliased and got removed?

At least the logs from the pruner for F37 SB would indicate otherwise:

```
2023-01-29 00:56:00,676 INFO fedora-ostree-pruner - Skipping ref fedora/37/x86_64/silverblue in repo /mnt/koji/ostree/repo. Policy is to keep all commits.
2023-01-29 00:56:00,677 INFO fedora-ostree-pruner - Skipping ref fedora/37/x86_64/updates/silverblue in repo /mnt/koji/ostree/repo. Policy is to keep all commits.
2023-01-29 00:56:00,679 INFO fedora-ostree-pruner - Pruning the fedora/37/x86_64/testing/silverblue ref in repo /mnt/koji/ostree/repo to time:90
2023-01-29 00:56:00,679 INFO fedora-ostree-pruner - Running command: ['ostree', 'prune', '--repo', '/mnt/koji/ostree/repo', '--commit-only', '--only-branch', 'fedora/37/x86_64/testing/silverblue', '--refs-only', '--keep-younger-than=90 days ago']
Total (commit only) objects: 26227
Deleted 181 objects, 13.2 MB freed
```
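The log lines map a per-ref policy to either a skip or an `ostree prune` invocation. Here is a minimal Python sketch of that mapping, reconstructed only from the log output above. The policy strings `keep-all` and `time:<days>` and the helper name `prune_argv` are assumptions, not the pruner's actual code; only the argv layout is copied from the log.

```python
def prune_argv(repo, ref, policy):
    """Return the argv to prune `ref`, or None when the policy keeps everything.

    Hypothetical helper: `policy` is 'keep-all' or 'time:<days>'.
    The argv layout mirrors the command in the pruner log.
    """
    if policy == "keep-all":
        return None                          # "Policy is to keep all commits."
    kind, sep, days = policy.partition(":")
    if kind != "time" or not sep:
        raise ValueError(f"unsupported policy: {policy!r}")
    return [
        "ostree", "prune", "--repo", repo,
        "--commit-only",                     # only consider commit objects
        "--only-branch", ref,                # limit pruning to this one ref
        "--refs-only",                       # reachability computed from refs
        f"--keep-younger-than={int(days)} days ago",
    ]
```

Each non-`None` result could then be executed with `subprocess.run(argv, check=True)`. Keeping the argv construction separate from execution makes the policy logic easy to check against logs like the ones above.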

Though there was a later error when running `ostree summary -u`, which may have led to some issues:

```
2023-01-29 01:11:02,625 INFO fedora-ostree-pruner - Deleting the fedora/30/x86_64/testing/silverblue ref in repo /mnt/koji/ostree/repo.
2023-01-29 01:11:02,625 INFO fedora-ostree-pruner - Running command: ['ostree', 'refs', '--repo', '/mnt/koji/ostree/repo', '--delete', 'fedora/30/x86_64/testing/silverblue']
2023-01-29 01:11:02,684 INFO fedora-ostree-pruner - Running command: ['ostree', 'summary', '--repo', '/mnt/koji/ostree/repo', '-u']
error: No such metadata object 619554b37ce12caccb9425779aca709a40fe0050d0a74fc692e34b0052a5c9f0.commit
2023-01-29 01:11:03,001 ERROR fedora-ostree-pruner - Running command returned bad exitcode: 1
```
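The `ostree summary -u` error means a ref still points at a commit whose object is gone. The checksum in the error is the same one `fedora/37/aarch64/kinoite` resolves to in the repair transcript. The Python sketch below looks for such dangling refs by reading a repo directory directly. It assumes the usual on-disk layout: ref files under `refs/heads/` hold a checksum, and loose commit objects live at `objects/<first 2 hex>/<remaining 62 hex>.commit`. It ignores remote and mirror refs. `find_dangling_refs` is a hypothetical helper; `ostree fsck` is the supported consistency check.

```python
import os


def commit_object_path(repo, checksum):
    # Loose objects are sharded by the first two hex characters of the checksum.
    return os.path.join(repo, "objects", checksum[:2], checksum[2:] + ".commit")


def find_dangling_refs(repo):
    """Return sorted (ref, checksum) pairs under refs/heads whose commit is missing."""
    heads = os.path.join(repo, "refs", "heads")
    dangling = []
    for dirpath, _dirnames, filenames in os.walk(heads):
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path) as f:            # open() follows symlinks
                checksum = f.read().strip()
            if not os.path.exists(commit_object_path(repo, checksum)):
                ref = os.path.relpath(path, heads).replace(os.sep, "/")
                dangling.append((ref, checksum))
    return sorted(dangling)
```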

We've seen this error before and @jlebon and I have been trying to understand what causes it.

Though, as mentioned in my previous comment I'm not observing any 404 errors now.

@scottsweb

It is still 404ing for me. Perhaps it is an issue with a particular mirror?

@dustymabe

I think maybe the reason I wasn't getting a 404 is because I actually updated my local system yesterday and it was completely up to date, so I didn't need to pull any content from the mirror.

I'm investigating further.

@dustymabe

dustymabe commented Jan 29, 2023

OK I think I've resolved the issue by re-importing the commits from the compose repo:

```
sh-5.2$ ostree --repo=/mnt/koji/ostree/repo rev-parse fedora/37/x86_64/silverblue
f3e6e1dff39f0c33b2a314a18051c7bf896addfe6ae3ecef1ad97672a6f12fdd
sh-5.2$ ostree --repo=/mnt/koji/compose/ostree/repo rev-parse fedora/37/x86_64/silverblue
f3e6e1dff39f0c33b2a314a18051c7bf896addfe6ae3ecef1ad97672a6f12fdd
sh-5.2$
sh-5.2$ ostree --repo=/mnt/koji/ostree/repo pull-local /mnt/koji/compose/ostree/repo f3e6e1dff39f0c33b2a314a18051c7bf896addfe6ae3ecef1ad97672a6f12fdd
1 metadata, 0 content objects imported; 0 bytes content written
## Also needed to copy over the signature
sh-5.2$ ostree --repo=/mnt/koji/ostree/repo pull-local /mnt/koji/compose/ostree/repo 214dfba21c68bba96ba1e94d3d4a7ddb8bae658a7cc92b7313b4088caa5a816e
Scanning metadata: 7056

sh-5.2$ ostree --repo=/mnt/koji/ostree/repo rev-parse fedora/37/aarch64/silverblue
50fd6768f076e0bdd6e049b2c28d97432fbdd5026825754fbe7cb31fa227278f
sh-5.2$ ostree --repo=/mnt/koji/compose/ostree/repo rev-parse fedora/37/aarch64/silverblue
50fd6768f076e0bdd6e049b2c28d97432fbdd5026825754fbe7cb31fa227278f
sh-5.2$
sh-5.2$ ostree --repo=/mnt/koji/ostree/repo pull-local /mnt/koji/compose/ostree/repo fedora/37/aarch64/silverblue
1 metadata, 0 content objects imported; 0 bytes content written
```

The `fedora/37/ppc64le/silverblue` ref seemed fine.

Similarly for Kinoite:

```
sh-5.2$ ostree --repo=/mnt/koji/ostree/repo rev-parse fedora/37/x86_64/kinoite
3c16959fb552440f79c0ddc7e4f68587ef9111bb8c6df9d9502e69c55d4e9577
sh-5.2$ ostree --repo=/mnt/koji/compose/ostree/repo rev-parse fedora/37/x86_64/kinoite
3c16959fb552440f79c0ddc7e4f68587ef9111bb8c6df9d9502e69c55d4e9577
sh-5.2$
sh-5.2$ ostree --repo=/mnt/koji/ostree/repo pull-local /mnt/koji/compose/ostree/repo fedora/37/x86_64/kinoite
Scanning metadata: 10988
# not sure why this was needed
sh-5.2$ ostree --repo=/mnt/koji/ostree/repo pull-local /mnt/koji/compose/ostree/repo 3cf1fe94e740bd5c4918268b12eca5e8adf93f5407cb222a5c4f4b464a7d75b8
1 metadata, 0 content objects imported; 0 bytes content written

sh-5.2$ ostree --repo=/mnt/koji/ostree/repo rev-parse fedora/37/aarch64/kinoite
619554b37ce12caccb9425779aca709a40fe0050d0a74fc692e34b0052a5c9f0
sh-5.2$ ostree --repo=/mnt/koji/compose/ostree/repo rev-parse fedora/37/aarch64/kinoite
619554b37ce12caccb9425779aca709a40fe0050d0a74fc692e34b0052a5c9f0

sh-5.2$ ostree --repo=/mnt/koji/ostree/repo pull-local /mnt/koji/compose/ostree/repo fedora/37/aarch64/kinoite
1 metadata, 0 content objects imported; 0 bytes content written
# not sure why this was needed
sh-5.2$ ostree --repo=/mnt/koji/ostree/repo pull-local /mnt/koji/compose/ostree/repo 6f6ad875fc14d19703a511d34f59c56e7a2a16e47457c0c386ca4a7c005ef3ce
```

The `fedora/37/ppc64le/kinoite` ref seemed fine.
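The manual check above follows one rule. For each ref, compare `rev-parse` in the production and compose repos. If they agree but production lacks the commit object, `pull-local` that checksum from the compose repo. A Python sketch of that rule as a planner follows. `plan_repairs` and the caller-supplied `has_commit(repo, checksum)` check are hypothetical. Only the `ostree` commands and repo paths come from the transcript.

```python
import subprocess

PROD_REPO = "/mnt/koji/ostree/repo"
COMPOSE_REPO = "/mnt/koji/compose/ostree/repo"


def ostree_rev_parse(repo, ref):
    """Run `ostree --repo=REPO rev-parse REF`, as in the transcript."""
    result = subprocess.run(
        ["ostree", f"--repo={repo}", "rev-parse", ref],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def plan_repairs(refs, has_commit, rev_parse=ostree_rev_parse,
                 prod=PROD_REPO, compose=COMPOSE_REPO):
    """Return (pull-local argv lists, refs whose checksums disagree).

    A ref is repaired only when it resolves to the same checksum in both
    repos but the commit object is missing from prod.
    """
    repairs, mismatched = [], []
    for ref in refs:
        have = rev_parse(prod, ref)
        want = rev_parse(compose, ref)
        if have != want:
            mismatched.append(ref)  # needs a human look, not a blind copy
        elif not has_commit(prod, have):
            repairs.append(["ostree", f"--repo={prod}", "pull-local", compose, have])
    return repairs, mismatched
```

Each planned argv could be run with `subprocess.run(argv, check=True)`. The transcript also needed extra `pull-local` calls for other checksums (one noted as the signature, two whose purpose was unclear). This sketch does not attempt to find those.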

@mmBesar

mmBesar commented Jan 29, 2023

It's working fine now, just updated my system without an issue.
Thanks.

@travier
Member

travier commented Jan 29, 2023

Thanks @dustymabe for the Sunday fix!

@nullusionist
Copy link

Double thanks @dustymabe! Confirm upgrade is functional here.

@scottsweb

Can confirm all is well for me too. Thanks @dustymabe! 🙇

@travier travier closed this as completed Jan 30, 2023
jlebon added a commit to jlebon/ostree that referenced this issue Jan 30, 2023
When we calculate the reachability set in `ostree prune`, we do this
without any locking. This means that between the time we build the set
and when we call `ostree_repo_prune_from_reachable`, new content
might've been added. This then causes us to immediately prune that
content since it's not in the now outdated set.

Fix this by calculating the set under an exclusive lock.

I think this is what happened in
fedora-silverblue/issue-tracker#405. While
the pruner was running, the `new-updates-sync` script[1] was importing
content into the repo. The newly imported commits were immediately
deleted by the many `ostree prune --commit-only` calls the pruner does,
breaking the refs.

[1] https://pagure.io/fedora-infra/ansible/blob/35b35127e444/f/roles/bodhi2/backend/files/new-updates-sync#_18
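The race in this commit message can be reproduced with a toy Python model. `Repo`, `prune_unlocked`, and `prune_locked` are hypothetical. A `threading.Lock` stands in for the repo's exclusive lock, and only ref tips count as reachable. The `window` hook marks the gap between building the reachability set and deleting. An import that lands there loses its commit, unless the importer has to wait for the lock.

```python
import threading


class Repo:
    """Hypothetical toy repo: refs map ref names to commit ids; objects holds stored commits."""

    def __init__(self):
        self.refs = {}
        self.objects = set()
        self.lock = threading.Lock()  # stands in for an exclusive repo lock

    def import_commit(self, ref, commit):
        # Importer (think new-updates-sync): store the commit, then move the ref.
        with self.lock:
            self.objects.add(commit)
            self.refs[ref] = commit


def prune_unlocked(repo, window=lambda: None):
    keep = set(repo.refs.values())  # 1. reachability set (ref tips only here)
    window()                        # concurrent writers can run here
    repo.objects &= keep            # 2. delete whatever is not in the stale set


def prune_locked(repo, window=lambda: None):
    with repo.lock:                 # writers block until both steps are done
        keep = set(repo.refs.values())
        window()
        repo.objects &= keep


REF = "fedora/37/x86_64/silverblue"

# Race: the import lands between building the set and deleting.
racy = Repo()
racy.import_commit(REF, "old")
prune_unlocked(racy, window=lambda: racy.import_commit(REF, "new"))
print(racy.refs[REF] in racy.objects)  # False: the ref points at a deleted commit

# Fix: the importer thread blocks on the lock until pruning has finished.
safe = Repo()
safe.import_commit(REF, "old")
importer = threading.Thread(target=safe.import_commit, args=(REF, "new"))
prune_locked(safe, window=importer.start)
importer.join()
print(safe.refs[REF] in safe.objects)  # True
```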
@jlebon
Member

jlebon commented Jan 30, 2023

Just to close the loop on this, I think ostreedev/ostree#2808 should fix the underlying issue so this doesn't happen again.

jlebon added a commit to jlebon/ostree that referenced this issue Jan 30, 2023
When we calculate the reachability set in `ostree prune`, we do this
without any locking. This means that between the time we build the set
and when we call `ostree_repo_prune_from_reachable`, new content
might've been added. This then causes us to immediately prune that
content since it's not in the now outdated set.

Fix this by first listing the objects eligible for pruning, then
calculating the set, and then passing both the set and the object list
to the prune API.

I think this is what happened in
fedora-silverblue/issue-tracker#405. While
the pruner was running, the `new-updates-sync` script[1] was importing
content into the repo. The newly imported commits were immediately
deleted by the many `ostree prune --commit-only` calls the pruner does,
breaking the refs.

[1] https://pagure.io/fedora-infra/ansible/blob/35b35127e444/f/roles/bodhi2/backend/files/new-updates-sync#_18
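The revised approach avoids holding a lock over the whole computation. It snapshots the prune candidates first, so anything imported afterwards is never a candidate. Here is a toy Python sketch of that ordering. It is hypothetical: `prune_from_snapshot`, its hooks, and the set/dict repo model are illustrations, not the ostree prune API. It assumes an import stores its object and moves its ref at one point in time. Because the toy tracks no parent history, a superseded tip is simply treated as unreachable.

```python
REF = "fedora/37/x86_64/kinoite"


def noop():
    pass


def prune_from_snapshot(objects, refs, before_reachability=noop, before_delete=noop):
    """Toy prune: snapshot candidates, then compute reachability, then delete.

    `objects` is a set of commit ids and `refs` maps ref names to commit ids.
    The two hooks mark where a concurrent import could land.
    """
    candidates = set(objects)           # 1. list objects eligible for pruning
    before_reachability()
    reachable = set(refs.values())      # 2. compute the reachability set
    before_delete()
    doomed = candidates - reachable     # 3. delete only listed, unreachable objects
    objects.difference_update(doomed)
    return doomed


def ref_survives(gap):
    objects, refs = {"old"}, {REF: "old"}

    def import_new():
        # Simulated import: store the new commit and move the ref to it.
        objects.add("new")
        refs[REF] = "new"

    prune_from_snapshot(objects, refs, **{gap: import_new})
    return refs[REF] in objects


print(ref_survives("before_reachability"), ref_survives("before_delete"))  # True True
```

Compared with the lock-based version, the importer never waits. The price is that an object imported mid-prune simply survives until the next run.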
dustymabe added a commit to dustymabe/fedora-coreos-releng-automation that referenced this issue Jan 31, 2023
Sometimes we use the pods for the importer and pruner to inspect and
fix issues with the OSTree repos since it's the most efficient way
for us to gain read/write access to the repos. Recently I was restoring
some data that was lost in a recent prune [1] and the `ostree pull`
operation I was running failed with an obscure error message:

```
error: Commit 1eb251bced7652f2f486c15447a7bf00238ef0ea172b4f214d2684ecfbeb2c40:
GPG: GPG: Failed to import key: GPGME: System error w/o errno
```

It turns out the many gpg processes that get forked during a pull
operation to verify the signatures for the commits don't get cleaned
up:

```
1000720+   11737       1  0 14:59 pts/4    00:00:00 [gpg] <defunct>
1000720+   11740       1  0 14:59 pts/4    00:00:00 [gpg] <defunct>
1000720+   11743       1  0 14:59 pts/4    00:00:00 [gpg] <defunct>
1000720+   11746       1  0 14:59 pts/4    00:00:00 [gpg] <defunct>
...
```

Let's use `dumb-init` to reap these defunct processes.

[1] fedora-silverblue/issue-tracker#405 (comment)
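The `ps` output shows the defunct `gpg` processes with parent PID 1. Whatever runs as PID 1 in the pod therefore has to reap them, and `dumb-init` does that. The core reaping loop can be sketched in Python with `os.waitpid(-1, os.WNOHANG)` (POSIX only). `reap_exited_children` is a hypothetical name. Signal forwarding, the other part of what `dumb-init` does, is not shown.

```python
import os


def reap_exited_children():
    """Reap every child that has already exited; return their PIDs.

    Non-blocking: stops as soon as no exited child is waiting to be collected.
    """
    reaped = []
    while True:
        try:
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break          # no children at all
        if pid == 0:
            break          # children exist, but none have exited yet
        reaped.append(pid)
    return reaped
```

An init process would call this from a `SIGCHLD` handler or a loop. Until some process waits on an exited child, it stays in the process table as `<defunct>`.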
dustymabe added a commit to coreos/fedora-coreos-releng-automation that referenced this issue Jan 31, 2023