Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Kubernetes Service Catalog interop : idempotent on duplicate create service #17

Closed
poblin-orange opened this issue Feb 7, 2020 · 6 comments · Fixed by #27
Closed

Kubernetes Service Catalog interop : idempotent on duplicate create service #17

poblin-orange opened this issue Feb 7, 2020 · 6 comments · Fixed by #27
Labels
bug Something isn't working enhancement New feature or request

Comments

@poblin-orange
Copy link
Member

poblin-orange commented Feb 7, 2020

Expected behavior

Expecting Openshift service catalog to be able to consume services fronted by OSBCMDB.

Observed behavior

Openshift issues multiple PUT to create a service instance. Seems compliant with OSB spec.

.kubernetes-retired/service-catalog#1639

The broker rejects the duplicate PUT, giving provisionning error on OpenShift v3.9.51 ( Kubernetes v1.9.1+a0ce1bc657 )

cc @pdechamboux

@poblin-orange poblin-orange added bug Something isn't working enhancement New feature or request labels Feb 7, 2020
@gberche-orange
Copy link
Member

Symptom when concurrent provisionning request is sent by K8S

   2020-02-20T17:28:00.44+0100 [APP/PROC/WEB/0] OUT org.cloudfoundry.client.v2.ClientV2Exception: CF-ServiceInstanceNameTaken(60002): The service instance name is taken: 95a4a8e5-52df-4810-82f0-d04366efbd5c
   2020-02-20T17:28:00.44+0100 [APP/PROC/WEB/0] OUT     at org.cloudfoundry.reactor.util.ErrorPayloadMapper.lambda$null$0(ErrorPayloadMapper.java:46) ~[cloudfoundry-client-reactor-3.16.0.RELEASE.jar!/:na]
   2020-02-20T17:28:00.44+0100 [APP/PROC/WEB/0] OUT     at org.cloudfoundry.reactor.util.ErrorPayloadMapper.lambda$null$9(ErrorPayloadMapper.java:98) ~[cloudfoundry-client-reactor-3.16.0.RELEASE.jar!/:na]
   2020-02-20T17:28:00.44+0100 [APP/PROC/WEB/0] OUT     at reactor.core.publisher.MonoFlatMap$FlatMapMain.onNext(MonoFlatMap.java:118) ~[reactor-core-3.2.12.RELEASE.jar!/:3.2.12.RELEASE]
   2020-02-20T17:28:00.44+0100 [APP/PROC/WEB/0] OUT     at reactor.core.publisher.FluxOnAssembly$OnAssemblySubscriber.onNext(FluxOnAssembly.java:345) ~[reactor-core-3.2.12.RELEASE.jar!/:3.2.12.RELEASE]
   2020-02-20T17:28:00.44+0100 [APP/PROC/WEB/0] OUT     at reactor.core.publisher.FluxSwitchIfEmpty$SwitchIfEmptySubscriber.onNext(FluxSwitchIfEmpty.java:67) ~[reactor-core-3.2.12.RELEASE.jar!/:3.2.12.RELEA

@gberche-orange
Copy link
Member

gberche-orange commented Feb 21, 2020

Desired approach:

  • if there is a provisioning in progress with same id then
    • look up the service instance's plan/service/params / status
    • if not same, return 409 (conflict)
    • otherwise return 202 accepted
  • otherwise
    • look up the service instance's plan/service/params / status
    • if not same, return 409 (conflict)
    • otherwise return 200

This is costly to do because the StateRepository is not currently accessible to CloudFoundryDeployer but only the WorkflowServiceInstanceService

Alternative more affordable degraded approach

  • if there is an operation (hopefully create, but also update or delete) in progress with same id then
    • return 202 accepted (step 1)
  • otherwise try create a new service instance
    • if creation fails due to existing service instance, fail (instead of returning 200 OK) (step 2)
    • if creation fails due to other error, fail
    • otherwise return 202 accepted

=> we sacrifice the cases where

  • a custom OSB client is submitting duplicate requests and will receive 202 accepted instead of 409 conflict or 200 OK

@gberche-orange
Copy link
Member

gberche-orange commented Feb 26, 2020

Pb with initial implementation in ec163e7: race condition between servlet threads (reading the state repository) and reactor threads (writing to state repository async)

Alternatives:

    1. move the state repository write to the servlet thread
    1. don't use service instance state repository

Option 1) was successfully tested on openshift

@gberche-orange
Copy link
Member

gberche-orange commented Feb 27, 2020

Same problem arises with service bindings:

  • openshift creates dup service binding requests
  • osb-cmdb successfully returns the 1st service binding with a 200 and credentials
  • osb-cmdb fails to create a duplicate service key, polutes logs and fails with a 500 error
  • openshift returns a sucessful service binding.
org.cloudfoundry.client.v2.ClientV2Exception: CF-ServiceKeyNameTaken(360001): The service key name is taken: a6c72b10-9749-4395-b257-bfab24c49d53

Log pollution of level ERROR:

19:02:33.230: [APP/PROC/WEB.0] 2020-02-26 18:02:33.229 ERROR 27 --- [ry-client-nio-3] o.s.c.appbroker.deployer.DeployerClient  : Error creating backing service key a6c72b10-9749-4395-b257-bfab24c49d53 for service 6a9aac92-a6e4-4141-a1c9-15dc0cf577bb with error 'CF-ServiceKeyNameTaken(360001): The service key name is taken: a6c72b10-9749-4395-b257-bfab24c49d53'
[...]
19:02:33.238: [APP/PROC/WEB.0] 2020-02-26 18:02:33.235 ERROR 27 --- [ry-client-nio-3] d.DefaultBackingServicesProvisionService : Error creating backing services keys [BackingServiceKey{serviceInstanceName='6a9aac92-a6e4-4141-a1c9-15dc0cf577bb', name='a6c72b10-9749-4395-b257-bfab24c49d53', plan='null', parameters={}, properties={keepTargetSpaceOnDelete=true, target=p-mysql}, parametersTransformers=[], rebindOnUpdate=false}] with error 'CF-ServiceKeyNameTaken(360001): The service key name is taken: a6c72b10-9749-4395-b257-bfab24c49d53'

19:02:33.238: [APP/PROC/WEB.0] 2020-02-26 18:02:33.237 ERROR 27 --- [ry-client-nio-3] ppDeploymentCreateServiceBindingWorkflow : Error creating backing services keysfor p-mysql/10mb with error 'CF-ServiceKeyNameTaken(360001): The service key name is taken: a6c72b10-9749-4395-b257-bfab24c49d53'

19:02:33.245: [APP/PROC/WEB.0] 2020-02-26 18:02:33.244 ERROR 27 --- [ry-client-nio-3] s.c.ServiceBrokerWebFluxExceptionHandler : Unknown exception handled: 

The service binding specs mentions at https://github.com/openservicebrokerapi/servicebroker/blob/master/spec.md#binding in the response-6 section:

Status Code Description
200 OK SHOULD be returned if the Service Binding already exists and the requested parameters are identical to the existing Service Binding. The expected response body is below.
201 Created MUST be returned if the Service Binding was created as a result of this request. The expected response body is below.
202 Accepted MUST be returned if the binding is in progress. The operation string MUST match that returned for the original request. This triggers the Platform to poll the Polling Last Operation for Service Bindings endpoint for operation status. Information regarding the Service Binding (i.e. credentials) MUST NOT be returned in this response. Note that a re-sent PUT request MUST return a 202 Accepted, not a 200 OK, if the Service Binding is not yet fully created.

However, the osb-cmdb service binding are not yet asynchronous (nor is there a scab support for async service bindings), hence returning a 202 for dup binding requests isn't an option.

Possible alternatives:

  1. Keep returning a 500 error and try to reduce log pollution
    1. Avoid creating duplicate service keys using the state repository to detect inflight requests
    2. throw a ServiceBrokerException with a helpful diagnostic message
    3. Just add error handling for org.cloudfoundry.client.v2.ClientV2Exception: CF-ServiceKeyNameTaken(360001)
  2. Keep returning a 500 error and ignore log pollution

Opted to 2) Keep returning a 500 error and ignore log pollution

@gberche-orange
Copy link
Member

gberche-orange commented Feb 28, 2020

See related openshift 3.9 documentation at https://docs.openshift.com/container-platform/3.9/architecture/service_catalog/index.html

Pointers and procedure to refresh service catalog in Openshift:

Procedure to register a service broker with basic auth (see inspiration)

# enable oc completion
$ source <(oc completion bash)

$ kubectl create secret generic osb-cmdb-0-auth \
--from-literal username=redacted-user \
--from-literal password=redacted-password

# Alternatively in a spec file as documented into https://kubernetes.io/docs/concepts/configuration/secret/#creating-a-secret-manually

$ cat secret.yaml
apiVersion: v1
kind: Secret
metadata:
  name: osb-cmdb-0-auth
type: Opaque
data:
  username: *****=
  password: ********
$ kubectl apply -f ./secret.yaml
secret "osb-cmdb-0-auth" created

# Seems to fail to make the broker appear in openshift UI
$ svcat register osb-cmdb-0-broker \
--url https://ocb-cmdb-0.redacted-domain \
--scope cluster \
--basic-secret osb-cmdb-0-auth
--skip-tls

# Alternatively
$ cat broker.yml
apiVersion: servicecatalog.k8s.io/v1beta1
kind: ClusterServiceBroker
metadata:
  name: osbcmdb-broker
spec:
  url: https://osb-cmdb-broker-0.redacted-domain
  insecureSkipTLSVerify: true
  authInfo:
    basic:
      secretRef:
        namespace: interco-kermit-fp
        name: osb-cmdb-0-auth
$ oc create -f broker.yml
clusterservicebroker "osbcmdb-broker" created

Other diagnostic commands

$ svcat get brokers 
$ svcat describe broker osb-cmdb-0-broker
[...]
Status:   Ready - Successfully fetched catalog entries from broker @ 2020-04-09 15:35:47 +0000 UTC   

$ svcat get classes

# Errors
$ svcat get class p-mysql-osb --scope all
Error: class 'p-mysql-osb' not found in cluster scope

# works, filter by date
$ oc get ClusterServiceClass
[...]
e1c17bcf-011d-4a34-9ee3-357a28cd7c77   15h
f8fdfaa9-fdc1-4b68-8791-e55a1419a360   15h

# corresponds to p-mysql-osb"
$ oc get ClusterServiceClass --output json f8fdfaa9-fdc1-4b68-8791-e55a1419a360
[...]
"externalName": "p-mysql-osb",

svcat get plans
svcat get plans -o json

# troubleshoot invalid auth in secrets
$ oc get Secret osbcmdb-broker-5-auth-secret --namespace=interco-kermit-fp-demo --output=yaml | grep password
  password: redacted-base64-password
$ echo -n "redacted-base64-password" | base64 --decode | hexdump 

Cleanup/recovery/purge commands: See related kubernetes-retired/service-catalog#2268

$ svcat get instances 
      NAME                  NAMESPACE               CLASS    PLAN                   STATUS                   
+---------------+--------------------------------+---------+------+-----------------------------------------+
  p-mysql-ffkv6   cloudfoundry-service-instances   p-mysql   10mb   DeprovisionBlockedByExistingCredentials  
  p-mysql-wbv8r   cloudfoundry-service-instances   p-mysql   10mb   DeprovisionBlockedByExistingCredentials  

$ svcat deprovision p-mysql-ffkv6 --abandon 
This action is not reversible and may cause you to be charged for the broker resources that are abandoned. If you have any bindings for this instance, please delete them manually with svcat unbind --abandon --name bindingName
Are you sure? [y|n]: 
y
Error: deprovision request failed (serviceinstances.servicecatalog.k8s.io "p-mysql-ffkv6" not found)

$ svcat get instances 
      NAME                  NAMESPACE               CLASS    PLAN                   STATUS                   
+---------------+--------------------------------+---------+------+-----------------------------------------+
  p-mysql-wbv8r   cloudfoundry-service-instances   p-mysql   10mb   DeprovisionBlockedByExistingCredentials

@gberche-orange
Copy link
Member

Started an upstream PR in spring-cloud/spring-cloud-app-broker#343 but there is remaining effort needed to diagnose test failure. Pausing for now as I'm out of budget for this upstream PR.

image

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants