eberlep
Collaborator

eberlep commented Jul 27, 2022

No description provided.

@eberlep eberlep linked an issue Jul 27, 2022 that may be closed by this pull request
Collaborator Author

eberlep commented Sep 1, 2022

Another idea: for standby databases, select the first pod directly and don't use any spilo-role label. In a perfect world, standby databases should consist of a single instance anyway (which, by definition, will always be called ...-0). If there happen to be more instances, we would not round-robin between them (as would be the case when simply selecting application=spilo pods), and if the standby were scaled down, existing connections to it would be kept open.
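
To make the idea concrete, here is a minimal Python sketch of how such a selector could be chosen. It is only an illustration and not the operator's actual code: the function name `build_service_selector` and its parameters are hypothetical, and it assumes the StatefulSet carries the same name as the cluster, so that the first pod is `<cluster-name>-0`. The `statefulset.kubernetes.io/pod-name` label is the one Kubernetes sets on every StatefulSet pod.

```python
from typing import Dict


def build_service_selector(cluster_name: str, team: str, is_standby: bool) -> Dict[str, str]:
    """Hypothetical helper: pick the label selector for the database service."""
    # Labels shared by all pods of the cluster.
    selector = {
        "application": "spilo",
        "cluster-name": cluster_name,
        "team": team,
    }
    if is_standby:
        # Standby database: pin the service to the first StatefulSet pod
        # (assumed to be named <cluster-name>-0) instead of using spilo-role.
        selector["statefulset.kubernetes.io/pod-name"] = f"{cluster_name}-0"
    else:
        # Primary database: follow whichever pod is labelled as master.
        selector["spilo-role"] = "master"
    return selector


if __name__ == "__main__":
    print(build_service_selector("mycluster", "myteam", is_standby=True))
    print(build_service_selector("mycluster", "myteam", is_standby=False))
```

Because the selector names exactly one pod, the service never spreads connections over several standby instances, and removing pods with a higher ordinal does not change the endpoint.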

Collaborator Author

eberlep commented Sep 1, 2022

Tested with a database with 2 pods:

 $ kgp -o wide
NAME                                 READY   STATUS    RESTARTS   AGE   IP            NODE           NOMINATED NODE   READINESS GATES   SPILO-ROLE
pgfits-philipptest5b769db3d-0        3/3     Running   0          39m   10.244.2.38   shoot--t6zvn   <none>           <none>            replica
pgfits-philipptest5b769db3d-1        3/3     Running   0          34m   10.244.2.39   shoot--t6zvn   <none>           <none>            master
postgres-operator-67565dcbc6-6dqs2   1/1     Running   4          21h   10.244.2.7    shoot--t6zvn   <none>           <none>
 $ k exec -c postgres -ti pgfits-philipptest5b769db3d-0 -- su postgres -c "patronictl list"
+ Cluster: pgfits-philipptest5b769db3d (7138085200722174034) --+---------+----+-----------+
| Member                        | Host        | Role           | State   | TL | Lag in MB |
+-------------------------------+-------------+----------------+---------+----+-----------+
| pgfits-philipptest5b769db3d-0 | 10.244.2.38 | Sync Standby   | running | 18 |         0 |
| pgfits-philipptest5b769db3d-1 | 10.244.2.39 | Standby Leader | running | 18 |           |
+-------------------------------+-------------+----------------+---------+----+-----------+

With ENABLE_LEGACY_STANDBY_SELECTOR set to true:

The service points to pod -1, which internally is the Standby Leader.

$ k describe svc pgfits-philipptest5b769db3d-external 
[..]
Selector:                 application=spilo,cluster-name=pgfits-philipptest5b769db3d,spilo-role=master,team=pgfits
[...]
Endpoints:                10.244.2.39:5432

With ENABLE_LEGACY_STANDBY_SELECTOR set to false / removed:

The service points to pod -0, which in this case is not the Standby Leader but a Sync Standby (if it were a plain replica, there could even be some lag). When downscaling, existing connections are kept and no reconnect happens. After downscaling, everything is correct.

 $ k describe svc pgfits-philipptest5b769db3d-external 
[..]
Selector:                 application=spilo,cluster-name=pgfits-philipptest5b769db3d,statefulset.kubernetes.io/pod-name=pgfits-philipptest5b769db3d-0,team=pgfits
[..]
Endpoints:                10.244.2.38:5432

After promotion:

The service points to pod -1, and statefulset.kubernetes.io/pod-name was successfully removed from the selector.

 $ k describe svc pgfits-philipptest5b769db3d-external 
[..]
Selector:                 application=spilo,cluster-name=pgfits-philipptest5b769db3d,spilo-role=master,team=pgfits
[..]
Endpoints:                10.244.2.39:5432
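
The three results above follow from how equality-based label selectors work: a pod becomes an endpoint only if it carries every key=value pair of the selector. The following Python sketch replays that matching for the two pods of the test cluster. It is an illustration only; the helper names `parse_selector` and `matching_ips` are made up, and the pod labels are reconstructed from the output above (the `application`, `cluster-name` and `team` labels on the pods are assumed to match the selector values).

```python
from typing import Dict, List

CLUSTER = "pgfits-philipptest5b769db3d"


def parse_selector(text: str) -> Dict[str, str]:
    """Parse a comma-separated key=value list as printed in the Selector line."""
    return dict(item.split("=", 1) for item in text.split(","))


def matching_ips(selector: Dict[str, str], pods: List[dict]) -> List[str]:
    """Return the IPs of all pods whose labels contain every selector pair."""
    return [
        pod["ip"]
        for pod in pods
        if all(pod["labels"].get(key) == value for key, value in selector.items())
    ]


def make_pod(ordinal: int, ip: str, role: str) -> dict:
    """Build a pod record with the labels assumed for this test cluster."""
    name = f"{CLUSTER}-{ordinal}"
    labels = {
        "application": "spilo",
        "cluster-name": CLUSTER,
        "team": "pgfits",
        "spilo-role": role,
        "statefulset.kubernetes.io/pod-name": name,
    }
    return {"name": name, "ip": ip, "labels": labels}


# State of the test cluster: pod -0 is labelled replica, pod -1 master.
PODS = [make_pod(0, "10.244.2.38", "replica"), make_pod(1, "10.244.2.39", "master")]

role_selector = parse_selector(
    f"application=spilo,cluster-name={CLUSTER},spilo-role=master,team=pgfits"
)
pod_name_selector = parse_selector(
    f"application=spilo,cluster-name={CLUSTER},"
    f"statefulset.kubernetes.io/pod-name={CLUSTER}-0,team=pgfits"
)

if __name__ == "__main__":
    print(matching_ips(role_selector, PODS))      # ['10.244.2.39']
    print(matching_ips(pod_name_selector, PODS))  # ['10.244.2.38']
```

The role-based selector (legacy mode, and again after promotion) resolves to pod -1, while the pod-name selector resolves to pod -0 regardless of which pod currently holds the master label.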

@eberlep eberlep marked this pull request as ready for review September 1, 2022 14:24
@eberlep eberlep merged commit 5b05650 into main Oct 14, 2022
@eberlep eberlep deleted the broaden-svc-selector-2 branch June 1, 2023 13:08
Development

Successfully merging this pull request may close these issues.

Broaden Service Selector