Skip to content
This repository has been archived by the owner on Apr 15, 2018. It is now read-only.

Consul support #1

Closed
juanjovazquez opened this issue Oct 17, 2015 · 28 comments
Closed

Consul support #1

juanjovazquez opened this issue Oct 17, 2015 · 28 comments

Comments

@juanjovazquez
Copy link
Contributor

Excellent work!. Thanks for sharing this. Do you have any plans to support consul in addition to etcd in the near future?. Would you accept PRs in this regard?.

@hseeberger
Copy link
Owner

@juanjovazquez, thanks for your kind words. So far have no big plans, but PRs are always very welcome!

In order to support different coordination services I think we need to abstract over the actual interactions with etcd, consul and potential others – we probably should split the project into conscructr-core, constructr-etcd, constructr-consul, etc.

I'm not familiar with consul. Do you think the current design of the state machine would also work for consul? In particular, does it offer some means of locking, e.g. via a CAS write (like etcd)?

@juanjovazquez
Copy link
Contributor Author

I think this should be feasible, but please let me check this out more carefully. I will keep you posted on the progress by means of this issue.

@hseeberger
Copy link
Owner

Yeah, let's use this issue to discuss possible approaches.

@juanjovazquez
Copy link
Contributor Author

@hseeberger, regarding this effort, I've started trying to get the same outcomes as with the etcd based code. In order to remain simple and to make easier the compare and constrast task between etcd and consul, I've just copy-pasted the ConstructrMachine module and modified to carry out the same things but the consul way. So, please sorry for code duplication and other ugly stuff you can probably identify.

Consul is more fine-grained than etcd when it comes to solve the use cases, i.e. locks, mutex, semaphores, leader election, etc. For example, in order to create a lock in the same way you did for the Locking stage, I had to carry out the following operations:

(1) Create a session with a ttl and delete behaviour . So, in this case, the ttl is linked to the session and not to the key, as it is in the case of etcd. Besides, the delete behaviour sets the behaviour that will be implemented when the session is associated with a key, i.e. the key will be deleted after the ttl.

(2) Create the lock key pointing out that this key will be acquired by the previously created session. No other session will be able to acquire this key until the session expires or is released. It's important to say that nothing prevents other process modifies this key not using the acquire parameter, but every other attempt to acquire the key with other session will fail.

This locking session will never be renewed so that the session will be deleted after the ttl.

A similar workflow is carried out during the AddingSelfstage where the key with the member address is acquired by a new session (not the same as locking stage) but in this case, we need to keep the session-id in the state data in order to be able to use it later for renewal purposes during the Refreshing stage. That's the reason why I modified the type of the state to be the following:

type StateData = (List[Address], Option[String])

where the optional second element in the tuple references this session-id.

I'm not used to FSM's so I've probably made mistakes but I didn't want to continue until know your opinion about the convenience of a code refactoring to accommodate both coordination services.

@hseeberger
Copy link
Owner

Thanks a lot. I'm currently at a conference, but will take a look later.

@hseeberger
Copy link
Owner

I took a quick look and noticed two things: First your approach of copying makes it terribly hard to see what has been changed. May I suggest you simply change the ConstructrMachine so it's possible to look at a diff? Second, I think only matching OK is not enough.

@juanjovazquez
Copy link
Contributor Author

Of course, you're right. I'll propose you a new version ASAP. Thanks!.

@hseeberger
Copy link
Owner

No worries, your contribution is highly appreciated!

@juanjovazquez
Copy link
Contributor Author

@hseeberger I hope you can explore it easier now. Please, feel free to ask me for any other improvement. Thanks in advance.

@hseeberger
Copy link
Owner

Thanks a lot. It's a pity that using Consul is so much harder than etcd. Before moving on we should add tests. The existing tests require a running etcd, I have to change that ... but in the meantime why don't you mimic the existing tests for Consul.

BTW, I think you don't have to go through rapture-json, i.e. can marshal/unmarshal directly. But we can figure that out later.

@hseeberger
Copy link
Owner

@juanjovazquez, just a heads-up that I'm currently working on supporting different clustered systems like Cassandra. This adds another dimension, hence supporting different coordination services (like Consul) will be affected and might probably be postponed.

@juanjovazquez
Copy link
Contributor Author

@hseeberger, no problem, I can always close my own 'private' versions to solve my more urgent use cases. As far as this project is concerned, what I'd like to know is whether the support of different coordination services is appealing to you or not. If so, I'm willing to work on it or even other members of my team. In that case, it would be important to me that you to participate in defining the kind of abstractions to carry out, the project structure and so on. I wouldn't like to force you in some direction that you don't want to take.

On the other hand, I'll work in the integration testing part ASAP. Regarding JSON parsing, I'm using your json marshaller in the context of some internal projects. I love it and I suppose that it's what you mean when talk about not sticking to rapture, isn't it?.

@hseeberger
Copy link
Owner

@juanjovazquez, supporting different coordination services is definitely appealing and getting contributions (like yours) even more. The only problem I'm facing right now is the very early state of this project and the numerous axes it could/should evolve which impose complexity that needs to be managed.

Please continue working on Consul support, in particular the tests. Once we know it's working, we can think about common abstractions.

Regarding JSON I'm using Rapture in addition to the standard spray-json-marshalling from Akka HTTP, because etcd returns nastily verbose objects which I didn't want to rebuild as case classes. Not sure if this is a good idea, though. In your case it looked to me that Rapture isn't adding a lot of value because the JSON returned by Consul seems to be pretty close to the "domain objects".

@hseeberger
Copy link
Owner

@juanjovazquez, I have introduced an abstraction for coordination backends like etcd and Consul. Please take a look at #5. Ideally you would implement Coordination for Consul and the tests of the constructr-akka module still work.

@juanjovazquez
Copy link
Contributor Author

@hseeberger, great!. We're working in the testing part for Consul but we'll update our fork first in order to take advantage of this abstraction. I've had to involve a member of my team in this effort and he's needing some time to catch up but I think we'll be able to propose a working Consul based version in the next days. Thanks!.

@hseeberger
Copy link
Owner

@juanjovazquez, sounds great! At this early stage the code isn't documented at all, so if you or your coworker have questions, don't hesitate to ask.

@juanjovazquez
Copy link
Contributor Author

@hseeberger, sure we'll have!. Thanks!

@hseeberger
Copy link
Owner

@juanjovazquez, I pushed some more changes and I'm really happy that the introduced abstractions (coordination, machine) work for Akka and Cassandra! I'm pretty sure we need to fine tune some details, but the overall design seems to work. Looking forward to seeing Consul support (in the constructr-coordination module).

@juanjovazquez
Copy link
Contributor Author

@hseeberger, I'm glad that the metaphor (a.k.a. ConstructrMachine) is working well. Congrats!. Up until now, all we'll probably need to add is a way to carry some metadata about the open session with the coordination service, i.e. Consul. As I mentioned before, in order to be able to refresh the node registry, we need to provide a sessionId whose ttl will be reset every time the refresh is called. In our previous poc, we were modelling this defining the machine state as follows:

type StateData = (List[Address], Option[String])

The optional String is a container for the sessionId that will be provided after the AddingSelf stage. How do you think we could model the State to make it more versatile?. What if we model it as some kind of metadata?. Something like:

type StateData = (List[Address], Option[Metadata])

where Metadata could be simply an alias for a Map:

type Metadata = Map[String, String]

WDYT?

@hseeberger
Copy link
Owner

@juanjovazquez, yeah, keeping some context of type Any (other coordination services might have different needs) might be a solution. But please keep in mind that now machine and coordination are separate components. Therefore I wonder whether we need to change the return types of Coordination, inparticular the anemic ones of lock, addSelf and refresh.

@juanjovazquez
Copy link
Contributor Author

@hseeberger, just a short update about our efforts on Consul. We already have the set of multinode tests working properly with Consul instead of etcd. Now, we're going to adapt the solution to the abstractions you've been working lately. We'll try to bring up to date our part asap!. Please, stay tuned!.

@hseeberger
Copy link
Owner

That sounds like great progress!

@juanjovazquez
Copy link
Contributor Author

@hseeberger, as you've probably already seen, we've created a PR according to the outcomes from this discussion thread. A couple of things to take into account.

First, we've had to slightly change the way we're starting the docker containers in tests in order to run them into docker-machine. We couldn't make them start properly in the other way.

Second, we cannot see what has happenned with the refreshing stage after your last changes in code. Please, could you pointing out how is currently the refreshing procedure?. As we understand, it's still necessary to refresh periodically the keys in the coordination service as they have been registered with a ttl, but unfortunately we cannot understand how you are doing now.

Thanks!

@juanjovazquez
Copy link
Contributor Author

Ok, I've just seen that you're commenting in the PR. Sorry...

@hseeberger
Copy link
Owner

No worries!

@juanjovazquez
Copy link
Contributor Author

@hseeberger, please find here #12 a new installment of the Consul PR. Feel free to comment and criticize ;)

@juanjovazquez
Copy link
Contributor Author

@hseeberger, please check out our last revision of #12. As you mentioned, we've squashed all revisions into a single commit. All tests are passing ;).

@hseeberger
Copy link
Owner

Fixed by #12.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants