Fail To Restore ImageStreamTags: Error retrieving cluster version of imagestreamtags #204
Comments
Does anyone have any thoughts on this? Let me know if there is any other info I can provide. If this isn't the right forum for this question, let me know where a better place would be. Thanks.
This is the right forum. Thanks!
We will be attempting to reproduce this.
@kaovilai any luck with reproducing this?
We had some discussions around what could be happening, but AFAICT no one has reproduced it yet. Are you able to provide us with a reproducing case? Or is this only happening in one specific environment?
Other than the provided
We would like steps to reproduce from a clean new cluster if possible. As provided, we only know what's being restored; we don't know what's in the cluster during backup or prior to restore.
@kaovilai I am able to reproduce this issue on CRC 2.27 with OADP 1.1.6, by setting the backupImages field to false in the DPA (see the YAML below).

Is it not possible to restore an image stream and its image stream tags without backing up images, even though the image stream only references a public image, like the one here?

Also, I am under the impression that image backups only work with S3. Is that true? In our case, we aren't using S3.

Version info:

➜ crc version
CRC version: 2.27.0+71615e
OpenShift version: 4.13.12
Podman version: 4.4.4
➜ k -n openshift-adp get csv
NAME DISPLAY VERSION REPLACES PHASE
oadp-operator.v1.1.6 OADP Operator 1.1.6 oadp-operator.v1.1.5 Succeeded

DPA YAML:
apiVersion: oadp.openshift.io/v1alpha1
kind: DataProtectionApplication
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      <snipped>
  creationTimestamp: "2023-10-04T15:49:35Z"
  generation: 3
  name: velero
  namespace: openshift-adp
  resourceVersion: "415206"
  uid: e123e94a-6d1b-4e1a-ac49-efb1ef0ab1fd
spec:
  backupImages: false
  backupLocations:
  - velero:
      config:
        profile: default
        region: ***
      credential:
        key: cloud
        name: cloud-credentials
      default: true
      objectStorage:
        bucket: ***
        prefix: ***
      provider: aws
  configuration:
    restic:
      enable: false
    velero:
      defaultPlugins:
      - openshift
      - aws
status:
  conditions:
  - lastTransitionTime: "2023-10-04T15:49:35Z"
    message: Reconcile complete
    reason: Complete
    status: "True"
    type: Reconciled

Velero backup and restore YAML:

apiVersion: velero.io/v1
kind: Backup
metadata:
  annotations:
    velero.io/source-cluster-k8s-gitversion: v1.26.7+0ef5eae
    velero.io/source-cluster-k8s-major-version: "1"
    velero.io/source-cluster-k8s-minor-version: "26"
  creationTimestamp: "2023-10-05T19:44:36Z"
  generateName: backup-
  generation: 5
  labels:
    velero.io/storage-location: velero-1
  name: backup-cnc7h
  namespace: openshift-adp
  resourceVersion: "416565"
  uid: 3aa8c6a1-3231-42ea-8531-4437d93fa445
spec:
  csiSnapshotTimeout: 10m0s
  defaultVolumesToRestic: false
  excludedResources: []
  hooks: {}
  includedNamespaces:
  - isim-dev
  includedResources: []
  storageLocation: velero-1
  ttl: 720h0m0s
status:
  completionTimestamp: "2023-10-05T19:44:59Z"
  expiration: "2023-11-04T19:44:37Z"
  formatVersion: 1.1.0
  phase: Completed
  progress:
    itemsBackedUp: 45
    totalItems: 45
  startTimestamp: "2023-10-05T19:44:37Z"
  version: 1
---
apiVersion: velero.io/v1
kind: Restore
metadata:
  creationTimestamp: "2023-10-05T19:47:35Z"
  generateName: restore-
  generation: 5
  name: restore-4d6bn
  namespace: openshift-adp
  resourceVersion: "417228"
  uid: 79eed167-70a1-4a0e-88dd-a571ca589a34
spec:
  backupName: backup-cnc7h
  excludedResources:
  - persistentvolumeclaims
  - persistentvolumes
  - services
  - nodes
  - events
  - events.events.k8s.io
  - backups.velero.io
  - restores.velero.io
  - resticrepositories.velero.io
  - csinodes.storage.k8s.io
  - volumeattachments.storage.k8s.io
  includeClusterResources: false
  includedResources: []
  namespaceMapping:
    isim-dev: isim-dev-restored
  restorePVs: false
status:
  completionTimestamp: "2023-10-05T19:47:37Z"
  errors: 12
  phase: PartiallyFailed
  progress:
    itemsRestored: 45
    totalItems: 45
  startTimestamp: "2023-10-05T19:47:35Z"
  warnings: 6
@ihcsim "Is it not possible to restore image stream and its image stream tags without backing up images, even though the image stream only references a public image, like the one here?" The image copy functionality only copies tags for which the |
Right. Turning off backupImages stops the imagestream image copy logic, but it doesn't prevent the imagestream from being included as a kube resource in the backup.
It should work for some configurations of GCP and Azure as well; however, S3 is the most tested configuration. Still, having backupImages false should mean the imagestream functions don't do any internal image copying. It should be the same as if the openshift velero plugin isn't added. Can you try to repeat the issue without the openshift plugin?
Yes, this is what I'd expect, but it's not what I am seeing.
As in just plain Velero? How does that help with my original issue, which involves OADP?
FWIW, I notice that in my example image stream, the
Even with OADP installing Velero, you should be able to remove openshift from the DPA.
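For reference, a minimal sketch of what that test DPA could look like, assuming the DataProtectionApplication shown above with only the openshift entry dropped from defaultPlugins; this is an illustrative, unverified configuration and only the fields relevant to the repro are shown:

apiVersion: oadp.openshift.io/v1alpha1
kind: DataProtectionApplication
metadata:
  name: velero
  namespace: openshift-adp
spec:
  backupImages: false
  configuration:
    velero:
      defaultPlugins:
      - aws        # openshift plugin removed for the repro attempt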
@kaovilai without the openshift plugin it won't be exactly the same. In that case we'll attempt to restore kube resources for internal images as well, which may result in imagestreamtags created for images that don't actually exist.
@ihcsim
Root cause analysis:

// Get retrieves an image that has been tagged by stream and tag. `id` is of the format <stream name>:<tag>.
func (r *REST) Get(ctx context.Context, id string, options *metav1.GetOptions) (runtime.Object, error) {
    name, tag, err := nameAndTag(id)
    if err != nil {
        return nil, err
    }
    imageStream, err := r.imageStreamRegistry.GetImageStream(ctx, name, options)
    if err != nil {
        return nil, err
    }
    image, err := r.imageFor(ctx, tag, imageStream) // <-- this is returning an IsNotFoundError
    if err != nil {
        return nil, err
    }
    return newISTag(tag, imageStream, image, false)
}

which calls LatestTaggedImage, and the failure is caused by the TagEvent not yet existing (due to network speeds, etc.) while the imagestreamtag creation is still being processed:

// LatestTaggedImage returns the most recent TagEvent for the specified image
// repository and tag. Will resolve lookups for the empty tag. Returns nil
// if tag isn't present in stream.status.tags.
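In other words, a restored ImageStream can exist while status.tags has no TagEvent for the requested tag yet, so a Get on the imagestreamtag transiently returns NotFound. A minimal, hypothetical Go sketch of waiting out that window follows; this is not the actual Velero or plugin code, and the client wiring, poll interval, and timeout are assumptions:

package istagwait

import (
    "context"
    "time"

    imagev1client "github.com/openshift/client-go/image/clientset/versioned/typed/image/v1"
    apierrors "k8s.io/apimachinery/pkg/api/errors"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/apimachinery/pkg/util/wait"
)

// waitForImageStreamTag polls until the imagestreamtag <stream>:<tag> can be
// retrieved, tolerating the window where the image stream already exists but
// its TagEvent has not been recorded in status.tags yet.
func waitForImageStreamTag(ctx context.Context, c imagev1client.ImageV1Interface, namespace, istag string, timeout time.Duration) error {
    return wait.PollUntilContextTimeout(ctx, 2*time.Second, timeout, true, func(ctx context.Context) (bool, error) {
        _, err := c.ImageStreamTags(namespace).Get(ctx, istag, metav1.GetOptions{})
        if apierrors.IsNotFound(err) {
            return false, nil // tag not materialized yet; keep polling
        }
        if err != nil {
            return false, err // unexpected error; give up
        }
        return true, nil // tag is retrievable; done
    })
}

The timeout parameter is where a cap such as the 3 to 5 minutes mentioned later in this thread would be applied.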
Should be resolved by vmware-tanzu/velero#6949
@kaovilai The changes in your PR fix the issue. For some larger images with multiple tags (such as the OpenShift Postgres sample), we had to bump the retry cap to between 3 and 5 minutes on different clusters. I don't know if we want to make this a configurable parameter, or if we can just pick a higher cap like 5 minutes and then document this issue in the README. LMKWYT.
Ack. Depending on whether Velero is receptive, we might still have to make a hack in the openshift plugin.
@kaovilai If we do it in the plugin, I guess we won't need to worry about retries or waiting. If the "dry run" create fails with AlreadyExists, we just return, discarding the istag.
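A rough, hypothetical sketch of that idea, reusing the client types and imports from the previous sketch (imagev1 here would refer to github.com/openshift/api/image/v1); this is not the actual openshift-velero-plugin code:

// createOrSkipImageStreamTag does a dry-run create of the imagestreamtag and
// discards it if the server reports it already exists, so no retrying or
// waiting is needed.
func createOrSkipImageStreamTag(ctx context.Context, c imagev1client.ImageV1Interface, namespace string, istag *imagev1.ImageStreamTag) error {
    _, err := c.ImageStreamTags(namespace).Create(ctx, istag, metav1.CreateOptions{
        DryRun: []string{metav1.DryRunAll},
    })
    if apierrors.IsAlreadyExists(err) {
        return nil // tag already present; discard this item
    }
    return err
}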
Closed via vmware-tanzu/velero#7004
While attempting to restore imagestreamtags using OADP 1.1.3 on OCP 4.13.9, my restore fails with the following errors:
Error retrieving cluster version of imagestreamtags.image.openshift.io
This is what the relevant log lines look like:
Although the Restore was completed with the PartiallyFailed phase, I could see that the image streams and image stream tags were restored to the target namespace:
I am just wondering why Velero would report the imagestreamtags as not found here, when it does look like the imagestreamtags are restored.
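For reference, the restored objects can be listed with something along these lines, using the isim-dev-restored target namespace from the Restore's namespaceMapping above:

➜ oc get imagestreams,imagestreamtags -n isim-dev-restored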
The error happens to all the tags of the image:
FWIW, the pgsql imagestream used for testing is one from the out-of-box openshift namespace, without any customization in the spec. I also tried with different images, and all failed with the same error.