shawkins
Collaborator

@csviri I misread your working draft at some point and thought you were tracking a single latest revision - that's all I need to remove tombstone tracking. If we are not relying on comparable resource versions, then I think we can simply reuse the informer last synced version to reason about all puts - not just creates.

I've left the latest check as non-compiling as that will line up with changes you have in other prs.

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress label (indicates that a PR should not merge because it is a work in progress) Oct 10, 2025

boolean moveAhead = false;
String latest = managedInformerEventSource.getLastSyncResourceVersion(resourceId.getNamespace()).orElse(null);
if (latest != null && latest > newResource.getMetadata().getResourceVersion()) {
Collaborator

Is it guaranteed that those events (maybe for different resource names) come in order in watches?

Collaborator

But I guess if we are going to compare the resource versions in informers, that does not matter.

Collaborator Author

My understanding is that ordering is guaranteed. So this is applicable not only to getting rid of tombstones, but to putting anything into the cache in general - added that as another simplification.

Also the javadoc on getLastSyncResourceVersion could use some improvement. Updates to the cache are made prior to async event processing. So the underlying cache state should be correct up to the given resourceVersion at the time we check getLastSyncResourceVersion.
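
For illustration, the "latest" check could be sketched roughly as below, assuming resource versions are decimal strings (which is what makes them comparable in the first place). `ResourceVersionSketch`, `compare` and `informerIsAhead` are hypothetical names, not SDK code; only `getLastSyncResourceVersion` comes from the snippet above.

```java
import java.math.BigInteger;

// Hypothetical helper, not SDK code: treats resource versions as decimal numbers.
final class ResourceVersionSketch {

  // Returns a negative value, zero or a positive value, like Comparator.compare.
  // Throws IllegalArgumentException when either version is not a decimal string.
  static int compare(String left, String right) {
    return parse(left).compareTo(parse(right));
  }

  // True when the informer has already synced past the candidate version, so the
  // informer cache should already reflect it. A null lastSynced means "nothing synced yet".
  static boolean informerIsAhead(String lastSynced, String candidate) {
    return lastSynced != null && compare(lastSynced, candidate) > 0;
  }

  private static BigInteger parse(String version) {
    if (version == null
        || version.isEmpty()
        || !version.chars().allMatch(c -> c >= '0' && c <= '9')) {
      throw new IllegalArgumentException("Non-comparable resource version: " + version);
    }
    return new BigInteger(version);
  }
}
```

Comparing as numbers rather than as strings matters here: "10" sorts before "9" lexicographically, but is the newer version.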

Collaborator

My understanding is that ordering is guaranteed. So this is applicable not only to getting rid of tombstones, but to putting anything into the cache in general - added that as another simplification.

I see. I know that it is true for a single resource; my question is whether events still come in order when there are multiple resources of the same type. I guess yes, but I can also take a quick look at a cluster today and watch some events.

Thank you, this is great!!

Collaborator

@csviri csviri Oct 13, 2025

(Also, it would be great if you could join us at the community meeting tomorrow so we could discuss strategy. I'm wondering whether these improvements should go into 5.2 or rather into the next release, 5.3, so that we can iterate and discuss more without pressure, since it might be good to do a minor release soonish.)

Collaborator Author

Is that on zoom at 15:00 CEST?

Collaborator

yes

"Temporarily moving ahead to target version {} for resource id: {}",
newResource.getMetadata().getResourceVersion(),
resourceId);
cache.put(resourceId, newResource);
Collaborator Author

This should be safe to do without further conditions as long as there is no chance that the operator sdk is making effectively concurrent updates to the same resource - if that's possible, then we still need to compare the cache item to what is being put.
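
If concurrent updates to the same resource do turn out to be possible, the extra comparison could look something like the sketch below. It assumes numeric resource versions and uses a hypothetical `GuardedTemporaryCache` keyed by a plain string id; none of these names exist in the SDK.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for the temporary cache: resource id -> numeric resource version.
final class GuardedTemporaryCache {
  private final Map<String, Long> versions = new HashMap<>();

  // Stores the candidate only if nothing equal or newer is cached; returns true if stored.
  synchronized boolean putIfNewer(String resourceId, long resourceVersion) {
    Long cached = versions.get(resourceId);
    if (cached != null && cached >= resourceVersion) {
      return false; // a concurrent update already moved the cache ahead
    }
    versions.put(resourceId, resourceVersion);
    return true;
  }

  synchronized Optional<Long> get(String resourceId) {
    return Optional.ofNullable(versions.get(resourceId));
  }
}
```

With this guard, a late put of an older version is simply dropped instead of overwriting the newer entry.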

Collaborator Author

Converted to a comment in the code

var res = cache.get(resourceID);
Optional<R> resource = temporaryResourceCache.getResourceFromCache(resourceID);
if (resource.isPresent()) {
if (resource.isPresent() && res.filter(r -> r.getMetadata().getResourceVersion() < resource.get().getMetadata().getResourceVersion()).isPresent()) {
Collaborator Author

If events are processed async from the cache update, it's possible for the informer cache to be more up-to-date than the temporary resource cache.
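
A minimal sketch of that "serve whichever copy is newer" rule, assuming numeric resource versions. `NewestCopySketch` and `versionToServe` are hypothetical names, and the handling of a missing informer entry (the temporary copy is still served) is my assumption, not something the diff above establishes.

```java
import java.util.Optional;

// Hypothetical helper, not SDK code: picks between the informer and temporary cache copies.
final class NewestCopySketch {

  // Serves the temporary copy only when it is strictly newer than the informer copy.
  // Assumption: if the informer has no entry at all, the temporary copy is served.
  static Optional<Long> versionToServe(
      Optional<Long> informerVersion, Optional<Long> temporaryVersion) {
    boolean temporaryIsNewer =
        temporaryVersion.isPresent()
            && informerVersion.map(i -> i < temporaryVersion.get()).orElse(true);
    return temporaryIsNewer ? temporaryVersion : informerVersion;
  }
}
```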


@csviri
Collaborator

csviri commented Oct 14, 2025

Added a branch for 5.3 (as we discussed at the community meeting) to target issues related to resource version comparison:
#2996

thank you!

@shawkins shawkins force-pushed the tombstone_removal branch 4 times, most recently from edbe2b6 to 65649f1 Compare October 19, 2025 19:02
@shawkins
Collaborator Author

@csviri just wanted to double check - what if anything do you want in next? I was thinking that only the annotation removal would go into 5.3.

I have enabled parse versions by default, which does need some refinement - perhaps a better name, and we lack a configuration value for it on the primary resources. If it is not enabled, the temporary resource cache won't do anything.

There's no smart exception handling yet. If non-comparable versions are seen, then you'll just get an exception - the message could mention disabling the feature.

@csviri
Collaborator

csviri commented Oct 20, 2025

just wanted to double check - what if anything do you want in next? I was thinking that only the annotation removal would go into 5.3.

Although this will work, in almost all cases, since:

There's no smart exception handling yet. If non-comparable versions are seen, then you'll just get an exception - the message could mention disabling the feature.

We might want to ship this as a whole in 5.3.
see also: #3011

That way we also have more time to refine it. We can then release 5.3 with those features quite quickly, without having to wait for other features. What do you think?

@csviri
Collaborator

csviri commented Oct 20, 2025

Am I right that this PR is the same as #3010?

Should we close one of those?

Collaborator

@csviri csviri left a comment

Added just one comment. I lean towards 5.3 regarding this feature, and towards completely removing the flag for parsing for that version.
Otherwise LGTM

public synchronized void putAddedResource(T newResource) {
putResource(newResource, null);
(unknownState
|| PrimaryUpdateAndCacheUtils.compareResourceVersions(resource, cached) > 0)
Collaborator Author

Answering #3010 (comment) here - yes we can switch to >= because we're checking canSkipEvent first. I was starting to think more about the implications of the annotation removal and wondering if we should hold on to the resource that was created until it was obsolete, but we'll discuss that more on the annotation removal pr.

@shawkins
Collaborator Author

Am I right that this PR is the same as #3010?

Should we close one of those?

Closed the other one for now - if enabling parse versions seems like it should be separated, then it will be reopened with all the other changes stripped out.

@shawkins
Collaborator Author

Added just one comment. I lean towards 5.3 regarding this feature, and towards completely removing the flag for parsing for that version. Otherwise LGTM

I don't think we can remove the flag if you want to allow for non-conforming sources. The configuration would also need to be expanded to the controller itself.

@csviri
Collaborator

csviri commented Oct 20, 2025

Added just one comment. I lean towards 5.3 regarding this feature, and towards completely removing the flag for parsing for that version. Otherwise LGTM

I don't think we can remove the flag if you want to allow for non-conforming sources. The configuration would also need to be expanded to the controller itself.

My idea is that we would detect non-conforming sources dynamically, so if the comparison algorithm throws an error we can flip a flag and stop comparing the resources.
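
A rough sketch of that dynamic detection, under stated assumptions: `AdaptiveVersionComparison` is a hypothetical class, versions are parsed with `Long.parseLong`, and `NumberFormatException` stands in for the `NonComparableResourceVersionException` mentioned elsewhere in this thread.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch, not SDK code: disables comparison after the first non-comparable version.
final class AdaptiveVersionComparison {
  private final AtomicBoolean comparisonEnabled = new AtomicBoolean(true);

  // Returns true only when comparison is still enabled and candidate is strictly newer.
  boolean isNewer(String candidate, String cached) {
    if (!comparisonEnabled.get()) {
      return false; // source was detected as non-conforming earlier
    }
    try {
      return Long.parseLong(candidate) > Long.parseLong(cached);
    } catch (NumberFormatException e) {
      comparisonEnabled.set(false); // flip the flag and stop comparing from now on
      return false;
    }
  }

  boolean isComparisonEnabled() {
    return comparisonEnabled.get();
  }
}
```

One consequence of this design is that a single malformed version permanently switches the source to the non-comparing behaviour, which may or may not be what we want.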

@csviri
Collaborator

csviri commented Oct 20, 2025

But for the sake of simplicity we could configure it on InformerEventSources for now (maybe as the next PR).

@shawkins
Collaborator Author

My idea is that we would detect non-conforming sources dynamically, so if the comparison algorithm throws an error we can flip a flag and stop comparing the resources.

That could still be two modes of operation:

  • strict (the default)
  • lenient

Related to that in this pr there is: https://github.com/operator-framework/java-operator-sdk/pull/2989/files#diff-0b20da91b3c31881f7f78a0ecffefc3811e1916f8e401a047d335fffa8593b71R91-R97

Which is lenient, or it could be done as a breaking change and throw an exception.
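
To make the two modes concrete, here is a minimal sketch. `ComparisonModeSketch` and its `Mode` enum are hypothetical, versions are assumed to parse as `long`, and `IllegalStateException` is only a placeholder for whatever exception type the SDK ends up throwing.

```java
import java.util.OptionalInt;

// Hypothetical sketch, not SDK code: strict mode throws, lenient mode reports "unknown".
final class ComparisonModeSketch {
  enum Mode { STRICT, LENIENT }

  // Returns the comparison result, or an empty OptionalInt in lenient mode
  // when either version cannot be parsed (including null).
  static OptionalInt compare(String left, String right, Mode mode) {
    try {
      return OptionalInt.of(Long.compare(Long.parseLong(left), Long.parseLong(right)));
    } catch (NumberFormatException e) {
      if (mode == Mode.STRICT) {
        throw new IllegalStateException(
            "Non-comparable resource versions '" + left + "' and '" + right
                + "'; consider disabling the comparison", e);
      }
      return OptionalInt.empty();
    }
  }
}
```

In lenient mode the caller has to decide what an empty result means, for example falling back to the behaviour used when the feature is disabled.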

@csviri
Collaborator

csviri commented Oct 21, 2025

  • lenient

Yes, I think lenient is fine, but maybe this would require more discussion (therefore I propose to also move this PR against the 5.3 branch, and to that version).

Related to that in this pr there is: https://github.com/operator-framework/java-operator-sdk/pull/2989/files#diff-0b20da91b3c31881f7f78a0ecffefc3811e1916f8e401a047d335fffa8593b71R91-R97

I guess the null check is not enough; we would rather need a validation of that version, and also to catch the NonComparableResourceVersionException at

 public synchronized void putAddedResource(T newResource) {
    putResource(newResource, null);
            (unknownState
                    || PrimaryUpdateAndCacheUtils.compareResourceVersions(resource, cached) >= 0)
                ? null
                : cached);
  }

But the configuration, if we want to have it explicit, should not be on the controller level but rather on the InformerEventSource + ControllerEventSource level.

Otherwise this looks great!!

@shawkins
Collaborator Author

Yes, I think lenient is fine, but maybe this would require more discussion (therefore I propose to also move this PR against the 5.3 branch, and to that version).

Ok, I'll target the 5.3 branch.

I guess the null check is not enough; we would rather need a validation of that version

Null - seems like a simple programmatic error, which was already present in the external example
Non-null, but invalid - seems like a non-conforming source, so it seems fine to allow the NonComparableResourceVersionException

and catching the NonComparableResourceVersionException also at

Doesn't it make more sense to start with enabled / disabled behavior, then add a lenient mode later if users want that instead of strictly disabled?

@csviri
Collaborator

csviri commented Oct 21, 2025

Doesn't it make more sense to start with enabled / disabled behavior, then add a lenient mode later if users want that instead of strictly disabled?

I'm fine with that, as long as it is per InformerEventSource.
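
As a sketch of what "per InformerEventSource" could mean in practice: each source carries its own enabled / disabled flag, and a disabled source never touches its temporary cache (matching the earlier note that the temporary resource cache won't do anything when the feature is off). `EventSourceSketch` and its methods are hypothetical and do not reflect the actual SDK configuration API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical event source: the flag lives on the source itself, not on the controller.
final class EventSourceSketch {
  private final boolean compareResourceVersions;
  private final Map<String, String> temporaryCache = new HashMap<>();

  EventSourceSketch(boolean compareResourceVersions) {
    this.compareResourceVersions = compareResourceVersions;
  }

  // Called after the operator's own write; a disabled source skips the temporary cache.
  void recordOwnUpdate(String resourceId, String resourceVersion) {
    if (compareResourceVersions) {
      temporaryCache.put(resourceId, resourceVersion);
    }
  }

  Optional<String> temporaryVersion(String resourceId) {
    return Optional.ofNullable(temporaryCache.get(resourceId));
  }
}
```

Two sources in the same controller can then behave differently, for example one watching a conforming API server resource and one watching a non-conforming aggregated API.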

Signed-off-by: Steve Hawkins <shawkins@redhat.com>
@shawkins shawkins closed this Oct 21, 2025
@shawkins shawkins reopened this Oct 21, 2025
@shawkins shawkins closed this Oct 21, 2025
@shawkins shawkins mentioned this pull request Oct 21, 2025