Ml/new control #130

Closed
wants to merge 11 commits

Conversation

Michal-Leszczynski
Collaborator

No description provided.

@mmatczuk
Contributor

ATM it's quite a mess.

It looks like you are trying to put too many things into one chunk of work and you got lost in your own mess.

@mmatczuk
Contributor

All that session stuff is not needed here.

Let's just add the Cluster + events handling; it seems that the control conn can be just Conn.
We would send queries to the control conn from Cluster.

Please do think about exported vs. unexported fields.
Please respect object boundaries!!

transport/cluster.go
	c.refresher <- struct{}{}
}

func (c *Cluster) StopEventHandling() {
Contributor

Shouldn't this be called Close?

Contributor

Is there a StartEventHandling?

Contributor

@Michal-Leszczynski there are fundamental bugs here!!

  • handleEvents is not prepared for this to be closed
  • you are closing something you do not own
  • conn will panic when you close it

Collaborator Author

@Michal-Leszczynski Michal-Leszczynski Mar 11, 2022

That's true, and I actually spotted this bug thanks to TestClusterIntegration!
I already fixed it but forgot to delete this now-useless function. In the current form, closing the control connection makes the event handling loop terminate.

Contributor

See the final statement #130 (review)

Contributor

@mmatczuk mmatczuk left a comment

Left some comments inline.

One more general comment: use if err := instead of if err =.

There is a major flaw in the closing workflow.

Namely, if c.control.Close() is called there is a goroutine leak, and calling StopEventHandling does not really stop anything and can cause a panic when conn tries to handle events.

	if err := c.refreshTopology(); err != nil {
		log.Printf("failed to refresh topology: %s", err.Error())
		c.control.Close()

My suggestion:

  • Cluster should have a Close() function that effectively closes everything - all the Stop-this/Stop-that functions should be unexported or removed altogether
  • cluster::events should be created in NewCluster
  • cluster::events should be passed to Conn::registerEvents as an input parameter - it's fully operated by the cluster, not the conn - better idea later
  • connReader needs special handling of the event stream, because putting to cluster::events can block
  • it would be better to pass a function to Conn::registerEvents that is called on EVENT, instead of registering a handler like in normal request sending
  • with that, Cluster has real, full control of cluster::events
  • when you put to cluster::events it should be done via select with a default case - if event processing is too slow, this lets you DROP the event, log it and NOT BLOCK connReader (see the sketch after this list)
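
A minimal sketch of that non-blocking put, assuming a Cluster that owns an events channel and a placeholder event type (the names are illustrative, not the actual driver API):

package transport

import "log"

// event is a placeholder for a parsed EVENT frame.
type event struct{}

// Cluster owns the events channel; Conn only calls the handler it was given.
type Cluster struct {
	events chan event
}

// handleEvent is the function handed down to Conn::registerEvents.
// The send never blocks connReader: if the buffer is full, the event
// is dropped and logged instead.
func (c *Cluster) handleEvent(ev event) {
	select {
	case c.events <- ev:
	default:
		log.Printf("event channel full, dropping event %v", ev)
	}
}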

Otherwise good piece of code.

Looking forward to the next version.

@Michal-Leszczynski
Collaborator Author

Michal-Leszczynski commented Mar 12, 2022

I'm not sure whether it's worth worrying about blocking when putting a response into the cluster::events channel. I would assume that events are received rarely and the chance of blocking on the channel is really small (we could set the channel size to 1024, and then I imagine it would be pretty much impossible to block it under regular conditions).

But passing cluster::events down to the connection and dropping the registered handler seems like a good idea, and I will proceed with this approach (together with the special event handling in connReader).

- cluster is responsible for refreshing topology on its own and when requested.
- cluster handles incoming events.
- added a simple cluster integration test for creating a cluster and handling simple events.
- slight changes to event so that it is easier to parse.
- connection can now register for events.
- removed useless TODO comments.
- nodeStatus is no longer a pointer; removed setters and getters.
- moved setEventHandler closer to the registerEvents method.
- log instead of fatalf in case of parsing an unknown event type.
- fixed handling of responses that were incorrect in terms of interpretation of data (like an unsupported event type).
- cluster::events is created in NewCluster.
- instead of registering a handler for events, we now pass down a function responsible for that.
- this function is non-blocking.
- prevented assigning tokens to an already created token because of the possibility of a dirty read.
- closing the refresher channel is now done when exiting the handle events loop, since it's the only place where we send on this channel.
@Michal-Leszczynski
Collaborator Author

Currently we have two separate goroutines in cluster: loop and handleEvents. For now handleEvents only delegates work to loop or makes some minor changes in cluster, so it does not seem that important.
In the future we will need to request some metadata from the cluster (also connected to the schema change event), and we will want to refresh this data on errors or when encountering unknown tables or keyspaces, but this task does not really fall under handleEvents' jurisdiction.

My idea is to have another loop responsible for handling metadata refresh requests, just like the loop responsible for handling topology refresh requests. In this scenario, handling events is done just by passing some function (which makes some minor changes in cluster or delegates work directly to one of those two loops) to register events.

Another general problem arises from the fact that if we have two goroutines and each of them wants to reopen the control connection in case of an error, we have to synchronize that. I see three possible solutions:

  • a mutex guarding the connection
  • two connections, one for each loop
  • one loop handling reopen-control-connection requests (then one loop delegates this work to the other)

The last one may seem a little out of place, but it is also the least problematic.

I will try to read more Rust to get a better perspective, but I would like to hear your opinion on this approach.

@Michal-Leszczynski
Collaborator Author

@mmatczuk
I added a new ml/delete_event_loop PR to demonstrate my idea.

@mmatczuk
Contributor

@Michal-Leszczynski

Regarding passing cluster::events:

  • note that the blocking issue exists and the event needs to be sent with select
  • please do not make the buffer too deep

Regarding goroutines, it is best to have just one loop function and use select to read from many channels.
There is no need for any synchronization then; it's synchronized by message passing.
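
A rough sketch of that single-loop shape, with made-up channel and helper names purely for illustration:

package transport

// event is a placeholder for a parsed EVENT frame.
type event struct{}

type Cluster struct {
	events    chan event
	refresher chan struct{}
	closed    chan struct{}
}

func (c *Cluster) handleEvent(ev event)   {}              // apply the change to cluster state
func (c *Cluster) refreshTopology() error { return nil }  // re-query peers, rebuild topology

// loop is the only goroutine that mutates cluster state, so no extra
// synchronization is needed: all work arrives over channels and is
// serialized here.
func (c *Cluster) loop() {
	for {
		select {
		case ev := <-c.events:
			c.handleEvent(ev)
		case <-c.refresher:
			_ = c.refreshTopology()
		case <-c.closed:
			return
		}
	}
}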

@mmatczuk
Contributor

I had a look at the commit with the new approach.

  • You can use c.handleEvent as the input; there is no need to return a new function (sketched below)
  • I like the idea with the type of action, because then there is a single channel - the code is easier
  • Yet Conn and Cluster should still communicate by message passing
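
For illustration, the handler-as-parameter shape could look roughly like this (field and method names are assumptions, not the actual driver code):

package transport

// event is a placeholder for a parsed EVENT frame.
type event struct{}

type Conn struct {
	eventHandler func(event)
}

// registerEvents stores the handler (and would also send the REGISTER
// request, omitted here); connReader then calls it for every EVENT frame.
func (conn *Conn) registerEvents(handler func(event)) {
	conn.eventHandler = handler
}

type Cluster struct {
	control *Conn
}

func (c *Cluster) handleEvent(ev event) {} // forwards the event to the loop

// The method value can be passed directly - no wrapper closure is needed.
func (c *Cluster) register() {
	c.control.registerEvents(c.handleEvent)
}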

@mmatczuk
Contributor

Overall I think that many channels and a single loop is more idiomatic and easier to maintain.

@Michal-Leszczynski
Collaborator Author

With many channels we can set their buffer length to 1, so that multiple instances of the same request (like refresh topology) are handled by just one refresh-topology action.
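
A small sketch of that coalescing behaviour (again with assumed names):

package transport

type Cluster struct {
	refresher chan struct{} // created with make(chan struct{}, 1)
}

// RequestRefreshTopology queues at most one pending refresh: if one is
// already waiting, the extra request is dropped, so repeated calls collapse
// into a single refreshTopology run inside the loop.
func (c *Cluster) RequestRefreshTopology() {
	select {
	case c.refresher <- struct{}{}:
	default:
	}
}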

@mmatczuk
Contributor

That is correct.

@Michal-Leszczynski
Collaborator Author

The corrected version, squashed and rebased onto main, is on the ml/delete_event_loop PR. Can we merge it from there?

@mmatczuk
Contributor

Yes.

@mmatczuk mmatczuk closed this Mar 18, 2022
@Kulezi Kulezi deleted the ml/newControl branch August 30, 2022 13:29