Ml/new control #130

Closed
wants to merge 11 commits

Conversation

Michal-Leszczynski
Collaborator

No description provided.

@mmatczuk
Contributor

ATM it's quite a mess.

It looks like you are trying to put too many things into one chunk of work and you got lost in your own mess.

@mmatczuk
Contributor

All that session stuff is not needed here.

Let's just add the Cluster + events handling; it seems that the control conn can be just Conn.
We would send queries to the control conn from Cluster.

Please do think about exported vs. unexported fields.
Please respect object boundaries!!

transport/cluster.go
	c.refresher <- struct{}{}
}

func (c *Cluster) StopEventHandling() {
Contributor

Shouldn't this be called Close?

Contributor

Is there a StartEventHandling?

Contributor

@Michal-Leszczynski there are fundamental bugs here!!

  • handleEvents is not prepared for this to be closed
  • you are closing something you do not own
  • conn will panic when you close it

Collaborator Author

@Michal-Leszczynski Michal-Leszczynski Mar 11, 2022

That's true, and I actually spotted this bug thanks to TestClusterIntegration!
I already fixed it but forgot to delete this now-useless function. In the current form, closing the control connection makes the event handling loop terminate.

Contributor

See the final statement #130 (review)

Contributor

@mmatczuk mmatczuk left a comment

Left some comments inline.

One more general comment: use if err := instead of if err =.

There is a major flaw in the closing workflow.

Namely, if c.control.Close() is called there is a goroutine leak, and calling StopEventHandling does not really stop anything and can cause a panic when conn tries to handle events.

	if err := c.refreshTopology(); err != nil {
		log.Printf("failed to refresh topology: %s", err.Error())
		c.control.Close()

My suggestion:

  • Cluster should have a Close() function that effectively closes everything - all the Stop-this/Stop-that functions should be unexported or removed altogether
  • cluster::events should be created in NewCluster
  • cluster::events should be passed to Conn::registerEvents as an input parameter - it's fully operated by the cluster, not the conn - better idea later
  • connReader needs special handling of the event stream, because putting to cluster::events can block
  • it would be better to pass a function to Conn::registerEvents that is called on EVENT, instead of registering a handler like in normal request sending
  • with that, Cluster has real, full control of cluster::events
  • when you put to cluster::events it should be done via select with a default case - if event processing is too slow, this lets you DROP the event, log it and NOT BLOCK connReader (see the sketch after this list)
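
A minimal sketch of that non-blocking put, assuming a Cluster that owns an events channel and a placeholder event type (the names are illustrative, not the actual driver API):

package transport

import "log"

// event is a placeholder for a parsed EVENT frame.
type event struct{}

// Cluster owns the events channel; Conn only calls the handler it was given.
type Cluster struct {
	events chan event
}

// handleEvent is the function handed down to Conn::registerEvents.
// The send never blocks connReader: if the buffer is full, the event
// is dropped and logged instead.
func (c *Cluster) handleEvent(ev event) {
	select {
	case c.events <- ev:
	default:
		log.Printf("event channel full, dropping event %v", ev)
	}
}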

Otherwise good piece of code.

Looking forward to the next version.

@Michal-Leszczynski
Collaborator Author

Michal-Leszczynski commented Mar 12, 2022

I'm not sure whether it's worth worrying about blocking when putting a response into the cluster::events channel. I would assume that events are received rarely and the chance of blocking on the channel is really small (we could set the channel size to 1024, and then I imagine it would be pretty much impossible to block it under regular conditions).

But passing cluster::events down to the connection and dropping the registered handler seems like a good idea, and I will proceed with this approach (together with the special event handling in connReader).

- cluster is responsible for refreshing topology on its own and when requested.
- cluster handles incoming events.
- added a simple cluster integration test for creating a cluster and handling simple events.
- slight changes to event so that it is easier to parse.
- connection can now register for events.
- removed useless TODO comments.
- nodeStatus is no longer a pointer; removed setters and getters.
- moved setEventHandler closer to the registerEvents method.
- log instead of fatalf in case of parsing an unknown event type.
- fixed handling of responses that were incorrect in terms of interpretation of data (like an unsupported event type).
- cluster::events is created in NewCluster.
- instead of registering a handler for events, we now pass down a function responsible for that.
- this function is non-blocking.
- prevented assigning tokens to an already created token because of the possibility of a dirty read.
- closing the refresher channel is now done when exiting the handle events loop, since it's the only place where we send on this channel.
@Michal-Leszczynski
Collaborator Author

Currently we have two separate goroutines in cluster: loop and handleEvents. For now handleEvents only delegates work to loop or makes some minor changes in cluster, so it does not seem that important.
In the future we will need to request some metadata from the cluster (also connected to the schema change event), and we will want to refresh this data on errors or when encountering unknown tables or keyspaces, but this task does not really fall under handleEvents' jurisdiction.

My idea is to have another loop responsible for handling metadata refresh requests, just like the loop responsible for handling topology refresh requests. In this scenario, handling events is done just by passing some function (which makes some minor changes in cluster or delegates work directly to one of those two loops) to register events.

Another general problem arises from the fact that if we have two goroutines and each of them wants to reopen the control connection in case of an error, we have to synchronize that. I see three possible solutions:

  • a mutex guarding the connection
  • two connections, one for each loop
  • one loop handling reopen-control-connection requests (then one loop delegates this work to the other)

The last one may seem a little out of place, but it is also the least problematic.

I will try to read more Rust to get a better perspective, but I would like to hear your opinion on this approach.

@Michal-Leszczynski
Collaborator Author

@mmatczuk
I added a new ml/delete_event_loop PR to demonstrate my idea.

@mmatczuk
Contributor

@Michal-Leszczynski

Regarding passing cluster::events:

  • note that the blocking issue exists and the event needs to be sent with select
  • please do not make the buffer too deep

Regarding goroutines, it is best to have just one loop function and use select to read from many channels.
There is no need for any synchronization then; it's synchronized by message passing.
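
A rough sketch of that single-loop shape, with made-up channel and helper names purely for illustration:

package transport

// event is a placeholder for a parsed EVENT frame.
type event struct{}

type Cluster struct {
	events    chan event
	refresher chan struct{}
	closed    chan struct{}
}

func (c *Cluster) handleEvent(ev event)   {}              // apply the change to cluster state
func (c *Cluster) refreshTopology() error { return nil }  // re-query peers, rebuild topology

// loop is the only goroutine that mutates cluster state, so no extra
// synchronization is needed: all work arrives over channels and is
// serialized here.
func (c *Cluster) loop() {
	for {
		select {
		case ev := <-c.events:
			c.handleEvent(ev)
		case <-c.refresher:
			_ = c.refreshTopology()
		case <-c.closed:
			return
		}
	}
}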

@mmatczuk
Contributor

I had a look at the commit with the new approach.

  • You can use c.handleEvent as the input; there is no need to return a new function (sketched below)
  • I like the idea with the type of action, because then there is a single channel - the code is easier
  • Yet Conn and Cluster should still communicate by message passing
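
For illustration, the handler-as-parameter shape could look roughly like this (field and method names are assumptions, not the actual driver code):

package transport

// event is a placeholder for a parsed EVENT frame.
type event struct{}

type Conn struct {
	eventHandler func(event)
}

// registerEvents stores the handler (and would also send the REGISTER
// request, omitted here); connReader then calls it for every EVENT frame.
func (conn *Conn) registerEvents(handler func(event)) {
	conn.eventHandler = handler
}

type Cluster struct {
	control *Conn
}

func (c *Cluster) handleEvent(ev event) {} // forwards the event to the loop

// The method value can be passed directly - no wrapper closure is needed.
func (c *Cluster) register() {
	c.control.registerEvents(c.handleEvent)
}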

@mmatczuk
Contributor

Overall I think that many channels and a single loop is more idiomatic and easier to maintain.

@Michal-Leszczynski
Collaborator Author

With many channels we can set their buffer length to 1, so that multiple instances of the same request (like refresh topology) are handled by just one refresh-topology action.
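
A small sketch of that coalescing behaviour (again with assumed names):

package transport

type Cluster struct {
	refresher chan struct{} // created with make(chan struct{}, 1)
}

// RequestRefreshTopology queues at most one pending refresh: if one is
// already waiting, the extra request is dropped, so repeated calls collapse
// into a single refreshTopology run inside the loop.
func (c *Cluster) RequestRefreshTopology() {
	select {
	case c.refresher <- struct{}{}:
	default:
	}
}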

@mmatczuk
Contributor

That is correct.

@Michal-Leszczynski
Collaborator Author

The corrected version, squashed and rebased onto main, is on the ml/delete_event_loop PR. Can we merge it from there?

@mmatczuk
Contributor

Yes.

@mmatczuk mmatczuk closed this Mar 18, 2022
@Kulezi Kulezi deleted the ml/newControl branch August 30, 2022 13:29