RUBY-1563 Implement polling SRV records for mongos discovery #1276

Closed
wants to merge 1 commit

Conversation

@saghm (Contributor) commented Feb 22, 2019

As part of this work, I abstracted some of the SRV polling functionality previously in Mongo::URI::SRVProtocol into the Mongo::SRV module to facilitate reuse in the monitoring code.

@saghm marked this pull request as ready for review February 22, 2019 21:32

@p-mongo (Contributor) left a comment

There should also be user documentation written stating that the driver now polls SRV records for changes.

begin
  @records = @resolver.get_records(@records.hostname)
rescue Resolv::ResolvTimeout => e
  log_warn("Timed out trying to resolve hostname #{@records.hostname}")

Contributor:

Please also include the exception class and message in these warnings.
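
A minimal sketch of what this might look like, reusing the rescue block quoted above; the exact message format is illustrative:

    begin
      @records = @resolver.get_records(@records.hostname)
    rescue Resolv::ResolvTimeout => e
      # Include the exception class and message so the cause of the failure
      # is visible in the log output, not just the hostname.
      log_warn("Timed out trying to resolve hostname #{@records.hostname}: #{e.class}: #{e.message}")
    end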

# limitations under the License.

module Mongo
  module SRV

Contributor:

Since we decided to camel case module names like this, Srv?


Contributor Author:

This would be inconsistent with existing user-facing APIs, which we can't change without breaking backwards compatibility:

class NoSRVRecords < Error; end

def initialize(options, monitoring, cluster)
  super(options, monitoring, cluster)

  unless options[:srv_records].nil?

Contributor:

This option should be documented.
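
A rough sketch of what such documentation could look like, assuming the driver's usual YARD conventions; the option type and wording are assumptions, not the final text:

    # @option options [ Array<String> ] :srv_records The SRV records returned
    #   by the initial DNS lookup. When present, the cluster starts an SRV
    #   monitor that polls these records for mongos additions and removals.
    def initialize(options, monitoring, cluster)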

module Mongo
  module SRV

    class Monitor

Contributor:

This and all other classes need to either be marked api private or have docstrings written for them.
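
A minimal sketch of the two alternatives, using standard YARD tags; the class description is illustrative:

    module Mongo
      module SRV
        # Periodically polls the SRV records of the seedlist hostname and
        # updates the topology with any added or removed mongos hosts.
        #
        # @api private
        class Monitor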

@p-mongo (Contributor) commented Feb 22, 2019

There need to be tests written verifying that adding and removing hosts via SRV polling publishes the correct SDAM events. In particular, there must be topology change events published, which I don't believe will currently happen when removing hosts.

Per the spec language, the hosts should be added first and removed second.

https://github.com/mongodb/specifications/blob/master/source/polling-srv-records-for-mongos-discovery/polling-srv-records-for-mongos-discovery.rst
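
The tests requested above might take roughly this shape, assuming RSpec and the driver's SDAM monitoring topics; the subscriber class and the remove_host_via_srv_poll helper are hypothetical:

    # Hypothetical subscriber that records published topology change events.
    class TopologyChangeRecorder
      attr_reader :events

      def initialize
        @events = []
      end

      def succeeded(event)
        @events << event
      end
    end

    it 'publishes a topology changed event when SRV polling removes a host' do
      recorder = TopologyChangeRecorder.new
      client.subscribe(Mongo::Monitoring::TOPOLOGY_CHANGED, recorder)

      # Hypothetical helper: stubs the SRV resolver so the next poll returns
      # a smaller host list, then waits for the monitor to apply it.
      remove_host_via_srv_poll

      expect(recorder.events).not_to be_empty
    end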

  self
end

def validate_record!(record_host, hostname)

Contributor:

While I see spec tests for various host names, I would specifically like to also see unit tests for this method when it is given record_host and hostname values with and without trailing dots, for example, something like foo.bar. and bar.

Can you please also add a docstring explaining what this method is supposed to do, as it is impossible to tell from the signature?

@saghm (Contributor Author) commented Feb 25, 2019

This method is copied verbatim from Mongo::URI::SRVProtocol; it implements the requirement from the initial DNS seedlist discovery spec that the driver should raise an error when the SRV lookup returns a record with a different domain name (in the fifth paragraph of this section):

    Drivers MUST raise an error and MUST NOT initiate a connection to any returned host name which does not share the same {domainname}.
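
The trailing-dot unit tests requested above might look roughly like this; the expected outcomes and the Mongo::Error::MismatchedDomain error class are assumptions based on the surrounding discussion:

    describe '#validate_record!' do
      let(:monitor) { described_class.new(options, monitoring, cluster) }

      it 'accepts a record host whose trailing dot is stripped before comparison' do
        # 'foo.bar.' and 'bar' share the domain 'bar' once the trailing dot
        # is normalized away, so no error is expected here.
        expect { monitor.validate_record!('foo.bar.', 'bar') }.not_to raise_error
      end

      it 'raises when the record host does not share the domain' do
        expect {
          monitor.validate_record!('foo.baz', 'bar')
        }.to raise_error(Mongo::Error::MismatchedDomain)
      end
    end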

@p-mongo (Contributor) commented Feb 22, 2019

@saghm Can you please create a separate PR with this change only:

    I abstracted some of the SRV polling functionality previously in Mongo::URI::SRVProtocol into the Mongo::SRV module to facilitate reuse in the monitoring

@p-mongo (Contributor) commented Feb 23, 2019

Per #1280, monitoring_io: false should result in the SRV monitoring being omitted.

I am not seeing where in the diff the monitors are closed. Killing threads in finalizers is insufficient.

Also, monitors are attached to topologies, but we consider topologies lightweight (as well as immutable) and replace them any time a server description changes, for example. This means 1) adding a server to a sharded topology would recreate the monitor, possibly creating some difficult-to-diagnose interactions, and 2) there are going to be a ton of monitors alive for any client that experiences multiple SDAM events.
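
One possible alternative to killing threads in finalizers is an explicit stop method driven by a flag, roughly like this; the method and variable names are illustrative:

    class Monitor
      def run!
        @thread = Thread.new do
          until @stop
            scan!                    # hypothetical: one SRV lookup plus topology update
            sleep(rescan_interval)   # hypothetical: interval derived from the SRV TTL
          end
        end
      end

      # Called when the cluster/client is closed, instead of relying on a
      # finalizer to kill the polling thread.
      def stop!
        @stop = true
        @thread.wakeup rescue nil   # ignore if the thread has already exited
        @thread.join
      end
    end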

@p-mongo (Contributor) commented Feb 23, 2019

The spec also requires that SRV rescans are done for unknown topologies.

@p-mongo (Contributor) commented Feb 24, 2019

Server addition and removal in the SDAM flow is done under the SDAM flow lock. Therefore, when adding or removing servers as part of SRV monitoring, these operations either need to be done under the same lock, or an explanation needs to be added for why this locking is not required.
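
A minimal sketch of the first option, assuming the cluster exposes the mutex used by the SDAM flow; the sdam_flow_lock accessor and apply_srv_hosts helper are hypothetical:

    def update_topology(srv_hosts)
      # Take the same lock the SDAM flow uses so that SRV-driven server
      # additions and removals cannot interleave with description updates.
      cluster.sdam_flow_lock.synchronize do
        apply_srv_hosts(srv_hosts)
      end
    end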

@saghm (Contributor Author) commented Feb 25, 2019

    Per the spec language, the hosts should be added first and removed second.

You linked to the entire spec, so I'm not sure which part you're talking about. I don't see anything specifying the order of hosts being added and removed, and I don't see why it would matter.

    I am not seeing where in the diff the monitors are closed. Killing threads in finalizers is insufficient.

What do you suggest instead? Killing a thread to stop it from running seems pretty standard to me.

@p-mongo (Contributor) commented Feb 25, 2019

The spec says:

For all verified host names, as returned through the DNS SRV query, the driver:

    MUST add each valid new host to the topology as Unknown
    MUST remove all hosts that are part of the topology, but are no longer in the returned set of valid hosts

Furthermore, we currently consider a topology with no servers (addresses) in it a terminal state. Therefore:

  1. The logic that treats an empty topology as a terminal state should be audited to see whether it needs to be updated for SRV monitoring. Since SRV monitoring must happen for unknown topologies, it seems to me that an empty topology is now a state that the driver can transition out of.
  2. There need to be tests added that make the topology empty and then non-empty via SRV monitoring.
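
Following the order in which the spec lists the two requirements quoted above, the update could be sketched roughly as follows; apply_srv_hosts and the host:port string representation are assumptions:

    def apply_srv_hosts(srv_hosts)
      existing = cluster.addresses.map(&:to_s)

      # Add each valid new host to the topology first; a newly added server
      # starts out in the Unknown state until it is scanned.
      srv_hosts.each do |host|
        cluster.add(host) unless existing.include?(host)
      end

      # Then remove hosts that are no longer in the returned set of valid hosts.
      existing.each do |address|
        cluster.remove(address) unless srv_hosts.include?(address)
      end
    end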

@saghm (Contributor Author) commented Feb 25, 2019

PR for abstracting SRV behavior: #1285

@p-mongo (Contributor) commented Sep 12, 2019

Superseded by #1441.

@p-mongo closed this Sep 12, 2019