Review Transaction Semantics #153

Closed
gregrluck opened this issue Jun 19, 2013 · 21 comments

@gregrluck
Member

While the transaction section is quite complete, we still need a full pass to understand and specify how each operation interacts with transactions.

We will likely look at this after the first public review.

e.g.
What is the impact of transactions on expiry? How do they interact? For example, does expiry cause a transaction not to be committed? Could a getAndReplace fail if the entry has expired by commit time?
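
To make the expiry question concrete, here is a minimal sketch of one possible answer, not the specified behaviour. It assumes a policy where a buffered getAndReplace causes the commit to fail if the entry expired in the meantime. `ExpiringTxCache`, `Tx` and the explicit `nowMillis` clock parameter are hypothetical names for illustration and are not part of the JSR-107 API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy cache: not the JSR-107 API, just an illustration of the question.
class ExpiringTxCache {
    private static final class Entry {
        final String value;
        final long expiresAtMillis;
        Entry(String value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Entry> store = new HashMap<>();
    private final long ttlMillis;

    ExpiringTxCache(long ttlMillis) { this.ttlMillis = ttlMillis; }

    void put(String key, String value, long nowMillis) {
        store.put(key, new Entry(value, nowMillis + ttlMillis));
    }

    // Returns the live value, or null if the entry is absent or expired.
    String get(String key, long nowMillis) {
        Entry e = store.get(key);
        return (e == null || nowMillis >= e.expiresAtMillis) ? null : e.value;
    }

    // A transaction that buffers a single replace and applies it at commit time.
    final class Tx {
        private String key;
        private String newValue;

        // Like getAndReplace: the replace is only buffered if the key is currently mapped.
        String getAndReplace(String key, String newValue, long nowMillis) {
            String old = get(key, nowMillis);
            if (old != null) {
                this.key = key;
                this.newValue = newValue;
            }
            return old;
        }

        // Assumed policy: the commit fails if the entry expired after the replace was buffered.
        boolean commit(long nowMillis) {
            if (key == null) return true;                  // nothing buffered
            if (get(key, nowMillis) == null) return false; // expired in the meantime
            put(key, newValue, nowMillis);
            return true;
        }
    }

    Tx begin() { return new Tx(); }
}
```

The opposite policy (commit succeeds and silently drops the replace) is just as easy to write, which is why the interaction needs to be specified.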

@verydapeng

Can someone explain a typical use case for a cache participating in a transaction? IMO, a cache holds some final value that can be shared for a period of time ...

@verydapeng

thx ben

@Cotton-Ben

You're welcome. Also, there is some spirited discussion re: this subject in our community discussion forum ( https://groups.google.com/forum/#!forum/jsr107 )

@gregrluck
Member Author

Reviewed the Transaction section in the spec doc. Applied reformatting and fixed a couple of grammar errors. Raised #189 for missing exception types. We have a few people reviewing this area over the next few days. We know there are a lot of method interactions which need to be specified.

@brianoliver
Member

With respect to CacheWriters, in the specification we state: "The semantics of Transactional Consistency are implementation specific." This really needs some explanation. I think it's very incomplete.

For example, consider a distributed cache, backed by a cache writer to some underlying store, managed by a cluster of n servers. Assume a developer can start a transaction against the distributed cache. They manipulate many entries (across the cluster) and then commit.

Currently the API does not provide the ability for the backing Cache Writers to be "prepared" with the "preparation" of the Cache entries in the transaction. It's obvious that the transaction must be two (or three) phased, but as we don't have the ability to provide the CacheWriter (or Loader) with transaction information, and/or call prepare / commit, we have to assume that the CacheWriter is non-transactional!

This is a very significant challenge. The writing to the CacheWriters across the n servers can only occur in the "commit" phase and if there's a single failure (or timeout), the entire transaction is now corrupted.
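
The failure mode described above can be sketched as follows. This is a simplified, single-process illustration under stated assumptions: `SimpleWriter` is a hypothetical stand-in for a non-transactional cache writer, and `CommitPhaseWriteThrough.commit` is a hypothetical helper, neither of which is part of the JSR-107 API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a non-transactional cache writer on one server.
interface SimpleWriter {
    void write(String key, String value); // may throw a RuntimeException
}

class CommitPhaseWriteThrough {
    // Applies the buffered writes one by one during "commit".
    // Returns the keys that reached the store before a failure (all keys on success).
    static List<String> commit(Map<String, String> buffered, SimpleWriter writer) {
        List<String> written = new ArrayList<>();
        for (Map.Entry<String, String> e : buffered.entrySet()) {
            try {
                writer.write(e.getKey(), e.getValue());
                written.add(e.getKey());
            } catch (RuntimeException failure) {
                // The writer was never "prepared", so the earlier writes cannot be undone here.
                break;
            }
        }
        return written;
    }
}
```

If the writer fails on the second of three entries, the first entry is already in the underlying store while the other two are not, so the store holds a partial transaction.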

There are numerous possible solutions to this issue, none of which are very good.

  1. Don't allow CacheLoaders or CacheWriters to be used with Transactional Caches. This isn't too bad because if a developer is using a Transactional Cache, they are also Transacting against another resource, most likely a Database (or Messaging System). ie: The Cache should not be managing the underlying Database resource transactions. That could/should be done at the application level.

While this works, I can easily think of use-cases where this breaks down quickly.

  2. Define a new API for CacheLoaders and CacheWriters that Transactional Caches use. This API would provide the appropriate transactional context (sub-transactions of the application transaction), which CacheLoaders/Writers could then use. However, it starts to erode the consistency of the API we have created. eg: We'd need to create a new type of Configuration for Transactional Caches. This isn't ideal.

@brianoliver
Member

(as identified by Bill Shannon)

Currently the specification only allows transactions to be supported or not supported. Instead we should be able to determine whether local or XA transactions are supported, e.g. some implementations may not support both.
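
As a rough illustration of that suggestion, a capability query could distinguish the two modes instead of returning a single flag. The names `TransactionMode` and `TransactionCapabilities` are hypothetical and do not exist in the JSR-107 API; this is only a sketch of the shape such a query might take.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical names; the specification under discussion only exposes supported / not supported.
enum TransactionMode { LOCAL, XA }

class TransactionCapabilities {
    private final Set<TransactionMode> supported;

    TransactionCapabilities(Set<TransactionMode> supported) {
        // EnumSet.copyOf rejects an empty non-EnumSet collection, so handle that case explicitly.
        this.supported = supported.isEmpty()
                ? EnumSet.noneOf(TransactionMode.class)
                : EnumSet.copyOf(supported);
    }

    // True if the implementation supports the given transaction mode.
    boolean isSupported(TransactionMode mode) { return supported.contains(mode); }

    // Equivalent of the single supported / not supported flag.
    boolean anySupported() { return !supported.isEmpty(); }
}
```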

@Cotton-Ben

Brian, musing openly (and quickly), let me just throw out some of our growing concerns. We are now not sure that it is even possible to completely specify Transaction semantics given how we formally define "Cache" capability guarantees. The point you made about 'how do we test a Cache's transactional completeness given that a Cache, formally, has no type of durability guarantee?' is VERY VALID.

Our use case very definitely needs something that we have traditionally called a "Cache" to be at least ambitious about transactional capability. Though we once said otherwise, we now believe that we don't need a "Cache" to guarantee transactional completeness, especially wrt XA. We don't use XA and it introduces hellish complications. The important thing is that something that we call a "Cache" allows us to operate with accommodation for all of our use cases that are DIRTY_READ intolerant and PHANTOM_READ intolerant, i.e. we do need TX_ISOLATION completeness, but we do not need full XA capability completeness. The points you make about how striving to provide this completeness will basically disarm the whole efficacy of a Cache's core usefulness are gaining clarity with us.

We like Bill Shannon's consideration that we should be able to determine if local or XA transactions are supported. To heck with full TX completeness. Maybe clarify the semantics that we specify to consider this directly, possibly offering considerations for bridges to potential completeness? E.g. when Queues and Database participants do XA they enlist via API bridges like javax.sql.XADataSource and javax.jms.XAQueueConnection ... maybe we just specify (purposefully vaguely?) something like a javax.cache.XACacheSource as a bridge for implementations to consider providing XA? Thx, -Ben

@cruftex
Member

cruftex commented Aug 6, 2013

Hi,

I wrote up some points on transactions here:

http://www.headissue.com/pub/jsr107-review-20130724.html#transactions

However, I don't know whether this leads somewhere, just another try to sort things.

Interestingly, there are many ways to address the topic (expectations, needs) which don't perfectly fit together.

Let me try an "ask stupid questions" approach:

  • What implementations exist in real life, so it is worth a standard?
  • What exactly is expected from a cache when data is accessed/read during a transaction? Mustn't it be the latest value, because we are in a transaction? Depending on whether the cache has exclusive access to the data source via loader/writer or not, the cache behavior may need to change (or be dropped?).
  • What applications need transaction awareness and will communicate directly with the caching API, so this must be solved within the cache? How about collecting use cases for this?

My current conclusion is:

  • Caching with transactions makes sense if the data source is exclusively accessed via loader/writer by the cache, or the "cache" is the storage itself (e.g. with persistence, or a k/v in-memory store)
  • If the data is accessed non-exclusively, the coherency issues, which may also be "cross technology", will be hard or impossible to solve.
  • If accessed non-exclusively, everything above READ_COMMITTED means that no caching can occur
  • The whole set of transaction isolation rules needs to be rethought in terms of how they apply to a cache. E.g. if I commit some Cache.put operation, is it successful even if some entries were already evicted, or is eviction not allowed during a transaction?

Best,

Jens

@Cotton-Ben

What implementations exist in real life, so it is worth a standard?

Most of the big Caching providers (EhCache, Coherence, eXtreme Scale, Infinispan, etc. ) have indicated they will deliver JSR-107 compliant implementations that deliver the Transactions option.

What exactly is expected from a cache when data is accessed/read during a transaction? Mustn't it be the latest value, because we are in a transaction? Depending on whether the cache has exclusive access to the data source via loader/writer or not, the cache behavior may need to change (or be dropped?).

The 'Cache' (or at least what my team has traditionally called a 'Cache') needs to guarantee a full ACID capability for a user-demarcated set of operation(s) (on the 'Cache' operand(s)).

Delivering full ACID is a non-trivial challenge. For those use cases that only need the 'A' in ACID, JSR-107 provides the javax.cache.Cache.EntryProcessor<K,V,T> interface.
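
For readers unfamiliar with the idea, the kind of single-entry atomicity an entry processor offers can be sketched with plain JDK classes. This is only an analogy, assuming `ConcurrentHashMap.merge` as a stand-in for atomic per-key read-modify-write; it does not use the JSR-107 EntryProcessor interface, and `AtomicEntryUpdate` is a hypothetical name.

```java
import java.util.concurrent.ConcurrentHashMap;

// Analogy only: atomic read-modify-write on one key, without any multi-entry transaction.
class AtomicEntryUpdate {
    private final ConcurrentHashMap<String, Integer> counters = new ConcurrentHashMap<>();

    // The read, the addition and the write happen as one atomic step for this key.
    int increment(String key) {
        return counters.merge(key, 1, Integer::sum);
    }
}
```

This covers atomicity for one entry at a time. It says nothing about isolation across several entries, which is what the rest of this discussion is about.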

Note that I have qualified what we call a 'Cache'. As your white-paper (thank you, BTW) points out (and with which I agree):

This means, within a transaction context the semantics of a cache will be redefined, it is considered to act like a transactional storage and not like a cache. The reason for this is, that it is not allowed to "forget" during the transaction. The paradoxon is, when the transaction commits, removing the mappings of all entries involved (invalidating) seems fairly legal to me.

Yeah. EXACTLY. It gets really hairy if you try to axiomatically derive even the concept of a "Pure Cache" doing Transactions. I like this point from your white-paper.

What applications need transaction awareness and will communicate directly with the caching API, so this must be solved within the cache? How about collecting use cases for this?

For a cursory intro, see slides 38-43 at https://community.jboss.org/servlet/JiveServlet/download/827161-100215/%40JPMorgan%3D%3DJavaKnowledgeForum%3DFINAL52Data%20Locality%20Latency%20and%20Caching.pptx

For more detail, see our discussion here https://groups.google.com/forum/#!topic/jsr107/MP1ae96LMvM

If accessed non-exclusively, everything above READ_COMMITTED means that no caching can occur

If you are not being "Pure Cache" axiomatic, I disagree.

The whole set of transaction isolation rules needs to be rethought in terms of how they apply to a cache. E.g. if I commit some Cache.put operation, is it successful even if some entries were already evicted, or is eviction not allowed during a transaction?

I don't see it that way. We really need to only specify the isolation interface and semantics ... which IMHO are well done in Greg and Ludovic's most recent writing of Chapter 5.

But, again, your points about the axiomatic implications of a 'Pure Cache' being transactional are noted and appreciated.

@brianoliver
Member

Ben,

This statement:

Most of the big Caching providers (EhCache, Coherence, eXtreme Scale, Infinispan, etc. ) have indicated they will deliver JSR-107 compliant implementations that deliver the Transactions option.

Is simply not true. No vendor has or can commit to implementing this optional feature. Furthermore there is no notion of "compliance" in this space as there are no TCK tests. An implementation can't be "compliant" unless it passes all of the TCK tests.

Why are there no TCK tests? Simply because there is no agreement on the semantics of transactions, especially with respect to expiry and eviction.

-- Brian

@Cotton-Ben

My bad. You are (of course) correct ... as written, that sentence cannot be true. It should have said:

Most of the big Caching providers I have communicated with have indicated an ambition to deliver the JSR-107 Transactions option.

And with no valid TCK, I agree that any such ambition is moot.

What to do?

@brianoliver
Member

What to do?

I wish I knew, because I know the use-cases you're trying to solve. :-)

Perhaps it's a magic 🎱

I think more discussions will need to happen. It's good we're getting to them. It's a hard problem and we really appreciate the work, effort, and thoughts on this from everyone.

@cruftex
Member

cruftex commented Aug 8, 2013

If accessed non-exclusively, everything above READ_COMMITTED means that no caching can occur

If you are not being "Pure Cache" axiomatic, I disagree.

Ben, what do you mean by "Pure Cache" axiomatic?

BTW: I was wrong, though, because above READ_COMMITTED the cache can expect that a value does not change once read, so it can cache within the transaction...
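
A minimal sketch of that observation, assuming a repeatable-read style of isolation: once a transaction has read a key, later reads of that key inside the same transaction are served from a transaction-scoped read set. `RepeatableReadTx` is a hypothetical class for illustration, and the shared `Map` stands in for a data source that other parties may modify.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a transaction-scoped read cache over a shared store.
class RepeatableReadTx {
    private final Map<String, String> sharedStore;               // may be changed by others
    private final Map<String, String> readSet = new HashMap<>(); // values seen by this transaction

    RepeatableReadTx(Map<String, String> sharedStore) { this.sharedStore = sharedStore; }

    // The first read goes to the store; later reads of the same key come from the read set.
    String get(String key) {
        if (readSet.containsKey(key)) return readSet.get(key);
        String value = sharedStore.get(key);
        readSet.put(key, value);
        return value;
    }
}
```

The read set only lives as long as the transaction, so this is caching within a transaction and says nothing about sharing cached values across transactions.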

@Cotton-Ben

Ben, what do you mean by "Pure Cache" axiomatic?

Actually, I was taking some poetic licence by using that term ... no such term exists in the literature.

The idea is this: when mathematicians make statements re: algebras they are grounded in very intense rigor. They start from the axioms (fundamental truths - that apply in all frames of reference - wrt statements about the algebra's ({operators},{operands}) ... e.g. 0 = 0 is the 'reflexive' axiom in the algebra of Natural numbers). From these axioms, they then make "statements" about the algebra and categorize/promote these statements as they improve in quality and maturity, e.g. statements can evolve from conjecture-->lemma-->theorem-->law. These statements get promoted as they survive proof arguments (which depend explicitly on axiomatic bases).

If we apply this kind of rigor to a "Caching algebra" ... and we want to make quality statements about Caching's operators and operands, well we're in kind of trouble wrt to Transactions!

Let's say we have this conjecture: "Caches can be Transactionally sound and complete". And now we want to promote this Caching statement to be a theorem. Well, if one of Pure Caching's "axioms" (actually definition) is that "a Cache can evict an Entry at any time and has ZERO durability obligation" then our conjecture is in big trouble. Anyone can correctly counter this conjecture's efficacy by saying "Caches cannot be Transactionally sound and complete" because the statement is inconsistent with a Pure Cache's axioms.

So I am resigned to altering the original conjecture, modifying it to say "Something that I have historically called a Cache can be Transactionally sound and complete". Maybe that statement can get promoted to something of higher quality than conjecture, but a "Pure Cache" won't be bothered with this consideration. Can't do it.

I took a lot of license using the term "Pure Cache axiomatic" in the last post (but you get what I mean).

@Cotton-Ben

one of Pure Caching's "axioms" (actually definition) is that "a Cache can evict an Entry at any time and has ZERO durability obligation"

Brian, Greg - You guys may have answered this already, but is this statement accepted as fundamentally true with regard to a "Pure Cache"?

@brianoliver
Member

Yes. That's true. When we talk about Caches, we're always talking about caches that have those conditions.

@lorban
Member

lorban commented Aug 8, 2013

I would like to answer Brian Oliver on the comment he posted about a week ago (#153 (comment)).

I personally don't see any problem with transactional caches and cache writers/readers. If a cache is configured for local transactions, then the transaction's context is local to the cache (hence its name) and should not be propagated in any way to any other resource. If you want a transaction to span across your cache and some other resource(s), this is what XA is for: you should configure your cache as XA and make sure the other resources the cache writer/reader accesses are XA compliant too and will enlist in the XA transaction's context. This may require propagating the context with the use of suspend/resume if different threads are used, but that's standard JTA stuff.

I see no need for anything more than what we currently have.

@brianoliver
Member

:) It doesn't look like a problem until a vendor tries to implement them. Like many things that have been designed, on the surface they look ok, but the devil is in the details.

As pointed out, the challenge with the specification is that it basically forces implementations to support both if they are to support the "optional" transactions feature. It's been proposed that this be split into two parts, optionally supporting Local and XA.

For XA the APIs actually need to be changed - as confirmed by those implementors that attempt to do this. Changing the API would also make Local transactions a bit easier to implement, but then we have two different Caching APIs.

@Cotton-Ben

@lorban wrote: I see no need for anything more than what we currently have.

If the obligation of our spec is only to specify a sound/complete interface and limited semantics (leaving the devilish details solely to the implementation), then I agree with Ludovic. If so, what we have in Chapter 5 and Chapter 6 right now is perfectly fine. But, if it is the obligation of the spec to unburden the implementation by providing more semantic details (especially wrt XA specifics) then our spec may indeed need to say more.

@brianoliver wrote: It's been proposed that this be split into two parts, optionally supporting Local and XA.

This may be an ideal compromise.

I agree with Brian and Bill Shannon's proposal to do this split into two separate options (my agreement here may be a bit selfish .... both because (i) we don't use XA (ii) we now agree that XA betrays to some degree a Cache's fundamental efficacy).

If the devilish details of doing full XA would indeed derail any implementation's otherwise sound/complete local transactions capability from being JSR-107 compliant, then by all means let's make XA transactions and local transactions separately optional.

This proposal seems to safely liberate both spec and implementation from obligation to be "burdened".

Brian, Greg, would it be at all appropriate to put this proposal to an EG vote at https://groups.google.com/forum/#!forum/jsr107?

@gregrluck
Member Author

Removing transactions from V1
