Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support Kubernetes config fail-fast and retry #873

Merged
merged 7 commits into from Oct 26, 2021

Conversation

isikerhan
Copy link
Contributor

Hello,

This PR adds support for fail fast and retry loading Kubernetes ConfigMaps and Secrets to resolve issues #769 and #441. There are some PRs around about these, but they seem to be stalled.

Thank you.

@isikerhan isikerhan changed the title Kubernetes Config Fail-fast and Retry Support Support Kubernetes config fail-fast and retry Sep 28, 2021
@wind57
Copy link
Contributor

wind57 commented Sep 29, 2021

this is very nice! I would very much like to review once I have a bit more time.

@ryanjbaxter
Copy link
Contributor

This looks really good to me. Would you mind pulling in the latest changes to main so we can try and run the ci build again?

@ryanjbaxter ryanjbaxter added this to In progress in 2020.0.x via automation Sep 30, 2021
@isikerhan
Copy link
Contributor Author

This looks really good to me. Would you mind pulling in the latest changes to main so we can try and run the ci build again?

Pulled in the latest changes, and the build succeeded. @ryanjbaxter

@@ -578,17 +595,30 @@ the `Secret` named `s1` would be looked up in the namespace that the application
See <<namespace-resolution,namespace-resolution>> to get a better understanding of how the namespace
of the application is resolved.

<<config-map-fail-fast,Similar to the `ConfigMaps`>>; if you want your application to fail to start
when it is unable to load `Secrets` property sources, you can set `spring.cloud.kubernetes.secrets.fail-fast=true`.
Copy link
Contributor

@wind57 wind57 Oct 1, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am sorry, but this looks very confusing to me. We say: "if I can't read a Secret and spring.cloud.kubernetes.secrets.fail-fast=true, we will fail without retrying". So let's take a step back and see what we do now, without this PR in place:

  • if we can't read a Secret, we will ignore this Exception and continue to start the application.

This is what would happen if spring.cloud.kubernetes.secrets.fail-fast=false (which is the default), which is good because it means we preserve backwards compatibility in user expectations.

Now if we enable spring.cloud.kubernetes.secrets.fail-fast=true and we can't read a property, we fail - which is again very nice and without surprises.

Now, we want to enable retry and for that we say:

  • spring.cloud.kubernetes.secrets.fail-fast=true
  • put retry and aop on classpath.

Isn't this confusing to you? The different between retry or not is something on the classpath. Both cases mandate the presence of spring.cloud.kubernetes.secrets.fail-fast=true, let me emphasize the fail-fast in the naming, which, depending on what you have on the classpath is not a "fail fast"?

As a user of this I would expect a fail-fast with true/false and a retry with true/false.

I am missing something?


This btw has some more implications that I have not yet reviewed entirely. For example the: ConditionalOnConfigMapPropertiesRetryEnabled. Look at the name ...RetryEnabled with a name = "fail-fast", havingValue = "true". So both retry and a fail fast? As an exercise (to see if I am not the real problem here), I explained this to 2 of my colleagues at work and they had the same confusion.

I bet there is something I miss here, so please advise.

Copy link
Contributor

@wind57 wind57 Oct 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have been thinking about this and as a matter of fact found a real case in our repository.

We have spring-cloud-kubernetes and spring-retry in the same project. If we merge this the way it is and for whatever reason I would like to say "I'd like to fail if I can't read a Secret without ever retrying", I will be stuck as I can't do that. Essentially:

  • I already have spring-retry and aop on the classpath because of business implications.
  • I would specify spring.cloud.kubernetes.secrets.fail-fast=true

This case will indeed be documented, no question about it. At the same time, I can't opt out of it.

Unless there is an explicit flag to control both the fail and retries, someone might not like this. We can implement them the same way we implement "Normalized Sources" where one flag takes precedence over the other.

So we could have two flags: fail-on-error and retry-on-failure (or some better names). If fail-on-error=false, it becomes irrelevant what the value of retry-on-failure is (we could log a warn message for this case, as we do in some other logically related things). If fail-on-error=false of course it matters what retry-on-failure is.

I do not think that what I am proposing is much different what is already done. Just to add one more flag (those @ConditionalOnClass still stay).

What do you guys think? Of course there is always the option to leave as-is and take a small enhancement task.

Copy link
Contributor Author

@isikerhan isikerhan Oct 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding a separate flag for retry was something I've considered during the development. Spring Cloud Config Client and Spring Cloud Kubernetes Config do pretty much the same thing. One loads config from a config server, the other loads them from the Kubernetes objects. And I would expect them to behave as similar as possible as a user. So I ended up sticking with a very similar design to the config client to keep the ecosystem consistent. The name "fail-fast" also comes from there :) I don't know, I may be wrong with this approach. Btw I don't think that adding one more flag is such a big deal and will create a huge inconsistency. IMHO, we can still change the design unless we get far away from the way other modules do similar things. What do you think @ryanjbaxter?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds like you came to the conclusion that adding a flag to disable retry when fail fast was true and spring retry was on the classpath was a good option to have?

@Inherited
@ConditionalOnKubernetesEnabled
@ConditionalOnKubernetesConfigEnabled
@ConditionalOnProperty(prefix = ConfigMapConfigProperties.PREFIX, name = "fail-fast", havingValue = "true")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what if we extract @ConditionalOnProperty(prefix = ConfigMapConfigProperties.PREFIX, name = "fail-fast", havingValue = "true") to its own annotation called ConditionalOnKubernetesConfigRetryEnabled and use that here:

@ConditionalOnKubernetesEnabled
@ConditionalOnKubernetesConfigEnabled
@ConditionalOnKubernetesConfigRetryEnabled

reads pretty awesome, to me. what do you think?

Copy link
Contributor Author

@isikerhan isikerhan Oct 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, the names @ConditionalOnConfigMapPropertiesRetryEnabled and @ConditionalOnKubernetesConfigRetryEnabled both refer to the same logical condition. However, I like the name you suggested; it's clearer and a bit shorter. Maybe we can rename @ConditionalOnConfigMapPropertiesRetryEnabled itself to @ConditionalOnKubernetesConfigRetryEnabled and extract @ConditionalOnProperty(prefix = ConfigMapConfigProperties.PREFIX, name = "fail-fast", havingValue = "true") to something like @ConditionalOnKubernetesConfigFailFastEnabled. It would look like the following:

ConditionalOnKubernetesConfigRetryEnabled .java:

@ConditionalOnKubernetesEnabled
@ConditionalOnKubernetesConfigEnabled
@ConditionalOnKubernetesConfigFailFastEnabled
public @interface ConditionalOnKubernetesConfigRetryEnabled {
// ...

ConditionalOnKubernetesConfigFailFastEnabled.java

@ConditionalOnProperty(prefix = ConfigMapConfigProperties.PREFIX, name = "fail-fast", havingValue = "true")
public @interface ConditionalOnKubernetesConfigFailFastEnabled {
// ...

The only thing that I don't like with this approach is the extracted condition @ConditionalOnKubernetesConfigFailFastEnabled will not check if Kubernetes itself or Kubernetes config is enabled or not. This could make it a bit useless when someone wants it to use in another place since it does not actually check if "Kubernetes config fail-fast" feature is enabled, it only checks if the property *.fail-fast is equal to true.

Copy link
Contributor

@wind57 wind57 Oct 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ConditionalOnKubernetesEnabled
@ConditionalOnKubernetesConfigEnabled
@ConditionalOnKubernetesConfigFailFastEnabled
public @interface ConditionalOnKubernetesConfigRetryEnabled {

this! very much this; good name, I like it more than what I proposed initially.

As to your concern, look at ConditionalOnKubernetesConfigEnabled it does not check for kubernetes either. If you want to be extra safe, document it properly explaining that it does not mandate kubernetes or config, but otherwise ConditionalOnKubernetesConfigEnabled simply states exactly what it does, no other guarantees, as such. imho, we should be good.

btw, if you agree to pick this change, we should also rename @Bean("configMapPropertiesRetryInterceptor") and @Bean("secretsPropertiesRetryInterceptor"). Just a heads-up.

return retryOperationsInterceptor(configProperties.getRetry());
}

@Bean("secretsPropertiesRetryInterceptor")
Copy link
Contributor

@wind57 wind57 Oct 3, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not understand initially why you have @Bean("secretsPropertiesRetryInterceptor"), but darn this is interesting. I don't know why but I would have expected for @Retryable to take an array of interceptors (I actually never thought about it in this manner)? And take the first one present in context? This way, you could have had two clean beans here:


@Bean
@ConditionalOnSecretsPropertiesRetryEnabled
public RetryOperationsInterceptor secretsPropertiesRetryInterceptor(SecretsConfigProperties configProperties) {
	return retryOperationsInterceptor(configProperties.getRetry());
}

@Bean
@ConditionalOnSecretsPropertiesRetryDisabled
public RetryOperationsInterceptor secretsPropertiesNoRetryInterceptor() {
	return RetryInterceptorBuilder.stateless().retryPolicy(new NeverRetryPolicy()).build();
}

and @Retryable(interceptors = { "secretsPropertiesRetryInterceptor", "secretsPropertiesNoRetryInterceptor" }).

This is pure theoretical and I do not know what I am missing, but looking at this hack here, I would have wanted that to be a reality.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, @Retryable doesn't work like that. It takes a single interceptor bean name and tries to get the bean with that name from the bean factory if the bean name is not empty. And if the bean itself is not present in the bean factory, then it fails with NoSuchBeanDefinitionException. (see AnnotationAwareRetryOperationsInterceptor.java) I'd expect it to work normally without retrying in this case but apparently, it doesn't work like that.

Copy link
Contributor

@wind57 wind57 Oct 3, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or like that, yes. Either way, this idea you made here is nice, thank you for the follow-up.

@Conditional(ConditionalOnKubernetesConfigPropertiesRetryEnabled.OnKubernetesConfigPropertiesRetryEnabled.class)
public @interface ConditionalOnKubernetesConfigPropertiesRetryEnabled {

class OnKubernetesConfigPropertiesRetryEnabled extends AnyNestedCondition {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like I am starting to sound like a broken record here :| what do you think renaming this to : ConditionalOnKubernetesConfigOrSecretsRetryEnabled?

@Conditional(ConditionalOnConfigMapPropertiesRetryDisabled.OnConfigMapPropertiesRetryDisabled.class)
public @interface ConditionalOnConfigMapPropertiesRetryDisabled {

class OnConfigMapPropertiesRetryDisabled extends NoneNestedConditions {
Copy link
Contributor

@wind57 wind57 Oct 4, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the first time I have used or reviewed some code with NoneNestedConditions, this in turn sparked my interest. I removed this part of the code below (after cloning your repo locally) :

@ConditionalOnConfigMapPropertiesRetryEnabled
static class OnConfigMapPropertiesRetryEnabled {

}

did a mvn clean package on the whole project and to my surprise no test failed. :) Interesting. Took a while, but I found no way to make such a test. Ideally, such a test should be present in KubernetesBootstrapConfigurationTests::SecretsFailFastEnabled::shouldDefineRequiredBeans. The problem goes back to the one that we already talked before, that @Retryable and the need for the same interceptor name for the case when we want and not, to retry.

On the other hand, this exercise made me go through the code and tests in much more detail and appreciate the effort you have put into this.

Copy link
Contributor Author

@isikerhan isikerhan Oct 4, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This actually is about the order of the bean definitions in KubernetesBootstrapConfiguration.RetryConfiguration. It'd work properly even without annotating configMapPropertiesRetryInterceptorNoRetry method with @ConditionalOnConfigMapPropertiesRetryDisabled just beacuse of the order they're defined. If you change the bean definition order like below, some tests must start failing as you expected:

@Bean("configMapPropertiesRetryInterceptor")
@ConditionalOnConfigMapPropertiesRetryDisabled
public RetryOperationsInterceptor configMapPropertiesRetryInterceptorNoRetry() {
	return RetryInterceptorBuilder.stateless().retryPolicy(new NeverRetryPolicy()).build();
}

@Bean
@ConditionalOnConfigMapPropertiesRetryEnabled
public RetryOperationsInterceptor configMapPropertiesRetryInterceptor(
		ConfigMapConfigProperties configProperties) {
	return retryOperationsInterceptor(configProperties.getRetry());
}

At this point, if you put the code block that you removed back, the tests must start running properly again. With a correct ordering, @ConditionalOnConfigMapPropertiesRetryDisabled may seem a little bit useless, but I don't really like to rely on the order of something. @ConditionalOnConfigMapPropertiesRetryDisabled is just for extra safety.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, I had to take a closer look today and this is the same conclusion I came to. thank you.

@ConditionalOnKubernetesEnabled
@ConditionalOnKubernetesSecretsEnabled
@ConditionalOnProperty(prefix = SecretsConfigProperties.PREFIX, name = "fail-fast", havingValue = "true")
public @interface ConditionalOnSecretsPropertiesRetryEnabled {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

its probably obvious, but the same renamings we talked about for ConfigMap are going to be applied here too. Just making sure we do not miss it :)

@@ -80,6 +84,12 @@ protected abstract MapPropertySource getMapPropertySource(String applicationName
return null;
}

@Override
@Retryable(interceptor = "configMapPropertiesRetryInterceptor")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you are overriding this only for the sake of @Retryable. You are doing this as a safety net, correct? So that just in case we ever use locateCollection, it will have that @Retryable? If so, this is a great idea (I would have missed it if it was me doing this PR!).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you and yes. I'm overriding locateCollection to mark it with @Retryable.

*/
@Configuration(proxyBeanMethods = false)
@ConditionalOnKubernetesEnabled
@EnableConfigurationProperties({ ConfigMapConfigProperties.class, SecretsConfigProperties.class })
public class KubernetesBootstrapConfiguration {

@ConditionalOnKubernetesConfigPropertiesRetryEnabled
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just a word of appreciation that I liked this pattern: activate the Config for either of the conditionals, but allow to disable them individually. Nice.

@@ -64,6 +67,11 @@ public Fabric8SecretsPropertySource(KubernetesClient client, String name, String

}
catch (Exception e) {
if (failFast) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here is the concern I have and would like to hear your thoughts. Suppose we are reading ConfigMapA, at the same time we have enabled fail-fast, be that with or without retry, it is irrelevant to the point I am trying to make.

Now let's suppose that this ConfigMapA is read perfectly fine on the first attempt, no question here. At the same time, because of our flow, we will be trying to read ConfigMapA-kubernetes map. This happens because we add the kubernetes profile in the AbstractKubernetesProfileEnvironmentPostProcessor::postProcessEnvironment and in KubernetesClientConfigMapPropertySource::getData we generate and try to read from such a map (same idea applies for fabric8).

Obviously such a ConfigMap is not present, in the vast majority of cases (if not all). We, on the other hand, will fail the application start-up, because of fail-fast. Users are not going to be happy and I guess even surprised.

There are two points here:

  • there is a PR that I introduced that allows you to opt out of such ConfigMapName-<ActiveProfiles> readings, but it is not yet in main nor it will entirely fix this issue here. It allows to opt out of all ConfigMap readings. So if you have a ConfigMapB, you can't say : "opt-out of ConfigMapA-kubernetes, but allow ConfigMapB-kubernetes"; neither do I think we want such granularity in that PR.

  • Initially, I thought that may be we should just ignore every failures related to XXX-<ActiveProfile>, but what if someone already relies on that? And this touches my first point above, of course. What if there are users that have ConfigMapMyApp and ConfigMapMyApp-PROD. When they were deploying to non PROD environments, the application would have started just fine (it only logged an Exception). If we merge this PR, and they enable retry (which is exactly what this PR wants them to do), it will now fail, because ConfigMapMyApp-PROD does not exist in non-prod environments.

I will spent some more time trying to figure out an idea on how to approach this correctly (imho), but of course, your thoughts are absolutely a must too.

Copy link
Contributor Author

@isikerhan isikerhan Oct 4, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, we don't fail when a ConfigMap or a Secret is not found. (Since Kubernetes Java clients do not throw exception on 404) Instead, we create an empty PropertySource as far as I see. So, our application still won't fail just because a ConfigMap or a Secret is missing even we enable fail-fast. Btw, it's obvious that retrying to load nonexistent ConfigMap/Secret is pointless even we have retry enabled.

The question here is: "Should we fail on not found?" IMHO this question belongs to another topic. Actually, I considered adding a failure policy (or a simple boolean flag like required) per config/secrets source like below:

spring:
  cloud:
    kubernetes:
      config:
        fail-fast: true # when this is set to false, all failure-policies will be normalized to "never"
        sources:
         - name: some-config-map
           failure-policy: fail-on-not-found # fails when the config map is not present or any error occurs while loading it
         - name: another-config-map
           failure-policy: fail-on-error # (default) fail only when an error occurs while loading the config map. ignore if it's not found
         - name: optional-config-map
           failure-policy: never # ignore not found or any errors

But the same problem you mentioned above emerges with that design: When do we consider a config source as "not found"? Should we consider a config source as not found when any of the ConfigMaps with names MyConfigMap, MyConfigMap-<profile>, is not found? Or is it enough to say a config source "is found" if we are able to load any of these? I ended up leaving this behavior as-is.

IMHO we can leave resolving these questions to another feature since we don't actually break anything with this PR (if I'm not mistaken). What do you think?

Copy link
Contributor

@wind57 wind57 Oct 4, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is what I get when I comment without trying and solely based on my memory. I definitely used to see these errors on start-up, but I do not remember the context anymore :( (may be they were present before we upgraded the fabric8 client) I am really sorry for this comment without actually trying it first - I did now, for both clients and we are good indeed.

Teşekkürler!

I am more than OK with this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No worries. And thank you very much for the review!

@wind57
Copy link
Contributor

wind57 commented Oct 11, 2021

I took a look over the weekend (sorry, lack of time lately) and I am OK with the changes. Looks like a very nice PR overall, but mind that my opinion is not very important, as I am just a contributor like you.

@isikerhan
Copy link
Contributor Author

@ryanjbaxter I guess we're good to go. Can you take a look and merge this if everything is OK?

@ryanjbaxter
Copy link
Contributor

Appreciate all the work here from you both @isikerhan and @wind57.

Now that I have some time to breath after getting some of other work on Spring Cloud Kubernetes done I am wondering how you think this functionality works with the Config Server on K8s that I added?

https://docs.spring.io/spring-cloud-kubernetes/docs/2.1.0-SNAPSHOT/reference/html/#spring-cloud-kubernetes-configserver

The config server already has this retry and fail-fast ability built into it. So when you are using the config server on k8s you could enable all this functionality without any of the code in the PR.

Not saying this PR is not useful, but just wanted to get your thoughts.

@isikerhan
Copy link
Contributor Author

isikerhan commented Oct 22, 2021

Appreciate all the work here from you both @isikerhan and @wind57.

Now that I have some time to breath after getting some of other work on Spring Cloud Kubernetes done I am wondering how you think this functionality works with the Config Server on K8s that I added?

https://docs.spring.io/spring-cloud-kubernetes/docs/2.1.0-SNAPSHOT/reference/html/#spring-cloud-kubernetes-configserver

The config server already has this retry and fail-fast ability built into it. So when you are using the config server on k8s you could enable all this functionality without any of the code in the PR.

Not saying this PR is not useful, but just wanted to get your thoughts.

I think we can offer the both options. People may not want to be forced to migrate to SCK Config Server or deploy an extra component just to have retry and fail-fast abilities. IMHO It'd be good to have the ability to retry / fail-fast for SCK config map and secrets property sources as well if these two features don't break each other (they don't as far as i see).

@spencergibb
Copy link
Member

I may be mistaken, but I think this behavior differs significantly from config client/server. With config client/server fail fast happens only if config server could not be contacted. If I understand correctly, this will fail for any property source.

@spencergibb
Copy link
Member

After chatting with @ryanjbaxter I was confused about how to get config maps/secrets using the k8s api, so my concern is not really one.

@ryanjbaxter ryanjbaxter added this to In progress in 2021.0.0-RC1 via automation Oct 26, 2021
@ryanjbaxter ryanjbaxter removed this from In progress in 2020.0.x Oct 26, 2021
@ryanjbaxter ryanjbaxter added this to the 2.1.0-RC1 milestone Oct 26, 2021
@ryanjbaxter ryanjbaxter merged commit d338511 into spring-cloud:main Oct 26, 2021
2021.0.0-RC1 automation moved this from In progress to Done Oct 26, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
2021.0.0-RC1
  
Done
5 participants