Skip to content
This repository has been archived by the owner on Apr 4, 2023. It is now read-only.

Remove the loadbalanced CQL service and document an alternative way to connect. #278

Merged
merged 1 commit into from
Mar 13, 2018

Conversation

wallrj
Copy link
Member

@wallrj wallrj commented Mar 8, 2018

In no particular order:

  • Removed the CQL service.
  • Added a headless nodes service.
  • Renamed the seedprovider service to seeds.
  • Added documentation explaining the two headless services and how to connect a CQL client.
  • Refactored the two headless services into a single service control
  • Removed the service update code, which was not tested and which added unecessary complication.
  • Navigator will now only touch the services if they are missing.
  • Updated the E2E tests so that cql_connect always attempts to connect to the nodes service.
  • Removed the ServiceName from the NodePool statefulsets because it no longer made sense with multiple NodePools / StatefulSets. The SS servicename is supposed to match a service dedicated to that statefulset, not a single seedprovider service. We probably should dynamically create a service for each nodepool.
  • Removed the, also broken, CASSANDRA_SEEDS configuration which was pointing seedProviderServiceName rather than at a service name matching the name of the statefulset.
  • In E2E tests Reverted to a better mechanism for simulating node failure. Decommission leaves the node in a decommissioned state causing the C* process to immediately exit.
  • Added CONSISTENCY checks to the E2E CQL queries to verify that both C* nodes are reachable and have the test data.

Fixes: #232

Release note:

NONE

@wallrj wallrj changed the title WIP: Remove CQL service Remove the loadbalanced CQL service and document an alternative way to connect. Mar 9, 2018
@munnerz
Copy link
Contributor

munnerz commented Mar 13, 2018

Removed the, also broken, CASSANDRA_SEEDS configuration which was pointing seedProviderServiceName rather than at a service name matching the name of the statefulset.

Perhaps I misunderstand this bullet - when a node bootstraps, surely it should (attempt to) connect to all seeds in the cluster, not just its own node pool, in order to start?

@wallrj
Copy link
Member Author

wallrj commented Mar 13, 2018

In #278 (comment) @munnerz wrote:

Perhaps I misunderstand this bullet - when a node bootstraps, surely it should (attempt to) connect to all seeds in the cluster, not just its own node pool, in order to start?

There are two things:

  • As far as we can see, this environment variable is obsolete, now that we're using the Kubernetes seed provider. The CASSANDRA_SEEDS env variable results in a Cassandra configuration stanza which is specifically for the SimpleSeedProvider.
  • The StatefulSet.Spec.ServiceName had previously been set to a single, cluster wide service name. So when there are multiple node pools and pods from multiple statefulsets, the DNS names no longer reflect the controlling statefulset...instead they'd be something like euwest1a-0.cass-seeds.my-namespace.srv.cluster.local which works (somehow)....but
  • The domain name label cass-seeds is no longer ideal, because there will only be DNS names for pods that have been labelled as seeds.
  • Instead we want stable DNS names for all the pods in all statefulsets, not just for the seed pods.

I think we should follow this up with a branch that enhances CreateNodePool action so that it also creates a headless service per nodepool....and that dynamically created service can be the name we use in the statefulset.spec.serviceName

@jetstack-ci-bot
Copy link
Contributor

@wallrj PR needs rebase

…o connect.

In no particular order:
* Removed the CQL service.
* Added a headless `nodes` service.
* Renamed the `seedprovider` service to `seeds`.
* Added documentation explaining the two headless services and how to connect a CQL client.
* Refactored the two headless services into a single service control
* Removed the service update code, which was not tested and which added unecessary complication.
* Navigator will now only touch the services if they are missing.
* Updated the E2E tests so that `cql_connect` always attempts to connect to the `nodes` service.
* Removed the ServiceName from the NodePool statefulsets because it no longer made sense with multiple NodePools / StatefulSets. The SS servicename is supposed to match a service dedicated to that statefulset, not a single seedprovider service. We probably should dynamically create a service for each nodepool.
* Removed the, also broken, CASSANDRA_SEEDS configuration which was pointing `seedProviderServiceName` rather than at a service name matching the name of the statefulset.
* In E2E tests Reverted to a better mechanism for simulating node failure. Decommission leaves the node in a decommissioned state causing the C* process to immediately exit.
* Added CONSISTENCY checks to the E2E CQL queries to verify that both C* nodes are reachable and have the test data.

Fixes: jetstack#232
Copy link
Contributor

@kragniz kragniz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

session = cluster.connect()
rows = session.execute('SELECT ... FROM ...')
for row in rows:
print row
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not python3 :o

@jetstack-bot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: kragniz

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@jetstack-ci-bot
Copy link
Contributor

/test all [submit-queue is verifying that this PR is safe to merge]

@jetstack-ci-bot
Copy link
Contributor

Automatic merge from submit-queue.

@jetstack-ci-bot jetstack-ci-bot merged commit 3a7e381 into jetstack:master Mar 13, 2018
@wallrj wallrj deleted the 232-remove-cql branch March 13, 2018 19:05
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants