Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cassandra: add locking to prevent concurrent migration execution #48

Closed
Pvlerick opened this issue Mar 30, 2018 · 16 comments
Closed

Cassandra: add locking to prevent concurrent migration execution #48

Pvlerick opened this issue Mar 30, 2018 · 16 comments

Comments

@Pvlerick
Copy link
Contributor

All in the title. I'll add that ASAP.

@Pvlerick
Copy link
Contributor Author

I'm a bit confused by the locking mechanism.
I don't find any use of the EnableClusterMode property and the Lock() methods on MetadataTable is only used in the tests.

I think it's a great feature, but the way it is implemented at the moment cannot be replicated in Cassandra since the fetch/release of the lock is done in a transactional manner. Locking in Cassandra would have been achieved by insert a row somewhere then deleting it when done, which is not possible with the current abstractions.

@lecaillon
Copy link
Owner

lecaillon commented Mar 30, 2018

EnableClusterMode is a feature toggle enabled by default and used to activate or not the locking mecanism:

if (EnableClusterMode)

The Lock() function is never used. I think we can remove it.

See the DatabaseHelper.TryAcquireApplicationLock() and the DatabaseHelper.ReleaseApplicationLock() to implement such a mecanism in Cassandra

@Pvlerick
Copy link
Contributor Author

OK. I'll start by removing the Lock function then.

I'll see if I can have a way to implement EnableClusterMode and ReleaseApplicationLock with Casssandra, but I'm not sure that will be possible at the database level. My initial idea was to insert a row in the log table used by Evolve to acquire the lock (so other processes will not run the migration if the lock is present in the table), but that's not at database level unfortunately.

I'll try to figure something out.

@lecaillon
Copy link
Owner

lecaillon commented Mar 30, 2018

You don't have to do anything with EnableClusterMode but more certainly with DatabaseHelper.TryAcquireApplicationLock() ;)

And if this function has to acquire a table level lock, I don't really see the problem (the naming perhaps ?).
Just be sure to release it in DatabaseHelper.ReleaseApplicationLock() whatever happens.

@Pvlerick
Copy link
Contributor Author

My bad, I meant TryAcquireApplicationLock and not EnableClusterMode in my previous comment :-)

The issue is that these methods are defined on the database, which doesn't know about schema (keyspace in Cassandra) nor table names; it's not like in SQL engines where there is a global lock mechanism.
The best option I have now is to add the schema and table as parameters to both methods so that they the DatabaseHelper object can retrieve the metadata table and insert the lock in it. I'm actually wondering if that wouldn't a good thing for the other locks as now they act at database level, for what I understand, and since they lock on a constant no two Evolve can run on that database even if they don't act on the same schema.

Or I fail to understand something, which is also entirely possible ;-)

@lecaillon
Copy link
Owner

lecaillon commented Mar 30, 2018

You are right. So it seems to have at least 2 ways of locking the database:

  • at a global level for all the actual relational databases Evolve supports
  • and at a table level, say at a "metadatable" level, for Cassandra and others (relationnal or not) i suppose

I know you can think of a beautiful abstraction @Pvlerick ;)

@lecaillon
Copy link
Owner

For the design of the new locking strategy, as we face it, there are use cases where it makes sense to have that mecanism to the database and to the metadatatable level.
Finally I don't like the idea of passing around a ref of the MetadataTable to the DatabaseHelper. And hide the fact that sometimes we use table locking.
It's a bit strange to call a TryAcquireApplicationLock() method that in reality lock a table ...

So why don't we keep the 'old' design with the table Lock() method and change it a bit to have a bool TryLock() and a new bool TryReleaseLock() method ?
In the end we juste have change their calls in the main Evolve class, somewhere around WaitForApplicationLock()

For the Cassandra locking part, to allow strong consistency, maybe we need that the metadatable be in a keyspace with a replication factor of 1. It should be at least encouraged or even perhaps forced.

What do you think @Pvlerick ?

@Pvlerick
Copy link
Contributor Author

@lecaillon if I step back a little bit, the fundamental requirement is to avoid having two of the same Evolve migrations running at the same time. We define here "same Evolve migrations" by the fact that they use the same metadata table.

The way this is achieved (ignoring Cassandra) is to use the locking mechanism provided by the different RDBMS (except SQLite that doesn't support it, I don't know much about it but I guess it's only ever used by one application so it is kind of pointless).
My understanding is that these locking mechanisms are at database level, but that you must provide a key that you lock on. As these keys are constants at the moment, this prevents two Evolve migrations happening at the same time even if they are not the "same Evolve migrations", right?

If this locking is moved (conceptually) at the metadata table level, it would mean that it would then be coupled to the metadata table, which would theoretically allow two different Evolve migrations to run at the same time for as long as the don't use the same metadata table (it might sound like a weird weird scenario, but it might be the case for a large database that is used by multiple applications, which is bad design but not so uncommon; we have one at work). For RDBMS that provide a locking mechanism, this could simply be achieved by locking on the metadata tables's name while for Cassandra the plan is to use lightweight transactions which give strong consistency.

Sorry, this is a bit longer that I intended, but I think in the end, your suggestion to have TryLock and TryReleaseLock is the best course of action. Would that mean dropping TryAcquireApplicationLock?

@lecaillon
Copy link
Owner

lecaillon commented Mar 31, 2018

Always interresting ;)

No, I meant use the TryAcquireApplicationLock AND the TryLock in the Evolve.cs WaitForApplicationLock() method.
Considering lock acquired, when both of the locking methods returned true.
For example in Cassandra TryAcquireApplicationLock () will always return true, and for SQLServer TryLock() will always return true too.

Depending the database only one locking method is implemented, the other one returns true.

lecaillon added a commit that referenced this issue Apr 3, 2018
lecaillon added a commit that referenced this issue Apr 3, 2018
@Pvlerick
Copy link
Contributor Author

Pvlerick commented Apr 3, 2018

Ok, I'll try using what you included in the latest commits (cf8128c and up).

I'm still wondering why we need to lock both at the application level and at the table level. Can't we just lock at the table level, conceptually, and use the advanced locking mechanisms of RDBMs that support is in there (such as SELECT GET_LOCK('{schema}-{metadatatable}', 0);) ?

@lecaillon
Copy link
Owner

lecaillon commented Apr 3, 2018

@Pvlerick
Finally table locking strategy works...
There is one limitation for databases that use table locking (ie: Cassandra): the schema creation is not protected by the lock. Because to lock a row in the metadatatable you have to create the schema before... So the Evolve.ManageSchemas() is outside the WaitForMetadataTableLock().
But because you use IF NOT EXISTS when you create keyspace it should be ok !

@lecaillon
Copy link
Owner

lecaillon commented Apr 3, 2018

@Pvlerick

I'm still wondering why we need to lock both at the application level and at the table level.

In fact only one strategy is defined by database. And I prefer the application level one when possible. Less prone to bug and more robust in my mind.

@Pvlerick
Copy link
Contributor Author

Pvlerick commented Apr 3, 2018

@lecaillon ah, I see, I didn't think of the scenario indeed; it is true that the application level lock ensures that the schema creation is protected. This would be an issue in the case of the first call to Evolve.

Regardless, the locking in Cassandra is now done, as soon as the master branch is stable I'll rebase on it and make a PR. I see that the error in Travis is from the Cassandra implementation, I'll have a look into it.

@lecaillon
Copy link
Owner

@Pvlerick
I'll take care of the Travis error later today. It is on "my" Cassandra msbuild drivers tests. Don't waste your time. ;)

@lecaillon
Copy link
Owner

lecaillon commented Apr 3, 2018

@Pvlerick
I think the Travis test in error is due to a missing mandatory field when you lock the table (perhaps description ...?)

@Pvlerick
Copy link
Contributor Author

Pvlerick commented Apr 3, 2018

Yes, I'm working on it. I'll add a commit to #50 to address that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
No open projects
Evolve 1.9.0
  
Done
Development

No branches or pull requests

2 participants