
Store Promise<Response> instead of Response for HTTP API transactions #1624

Merged
Kegsay merged 9 commits into develop from kegan/idempotent-requests on Nov 14, 2016

2 participants
Contributor

Kegsay commented Nov 11, 2016

This fixes a race whereby:

  • User hits an endpoint.
  • No cached transaction, so the main code executes.
  • User hits the same endpoint.
  • No cached transaction, so the main code executes again.
  • Main code finishes executing, caches the response and returns (first request).
  • Main code finishes executing, caches the response and returns (second request).

This race is common in the wild when Synapse is struggling under load.
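A minimal sketch of the race, assuming a cache that stores only finished responses. It uses asyncio as a stand-in for the Twisted Deferreds Synapse actually uses, and `send_message`, `response_cache` and `on_put_naive` are hypothetical names, not Synapse code. The `await` between the cache check and the store is where the duplicate request slips in:

```python
import asyncio

executions = 0  # how many times the "main code" actually ran

async def send_message():
    # Hypothetical stand-in for the main code (e.g. persisting an event), slow under load.
    global executions
    executions += 1
    n = executions
    await asyncio.sleep(0.01)
    return 200, {"event_id": f"$event{n}"}

response_cache = {}  # txn_key -> finished (code, body) response

async def on_put_naive(txn_key):
    if txn_key in response_cache:
        return response_cache[txn_key]
    # Suspends here; the duplicate request runs the check above before anything is cached.
    response = await send_message()
    response_cache[txn_key] = response
    return response

async def main():
    # Two identical requests arriving back to back, e.g. a client retry under load.
    return await asyncio.gather(on_put_naive("txn1"), on_put_naive("txn1"))

first, second = asyncio.run(main())
```

Running it leaves `executions` at 2, and the two responses carry different event IDs for what the client considers a single transaction.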
This commit fixes the race by:

  • User hits an endpoint.
  • Caches the promise for the main code's result, then executes the main code.
  • User hits the same endpoint.
  • Yields on the same promise as the first request.
  • Main code finishes executing and returns, unblocking both requests.
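A minimal sketch of the fix, using asyncio tasks as a stand-in for the Twisted Deferreds Synapse actually uses; `send_message`, `in_flight` and `on_put_cached` are hypothetical names. The task is cached before anything is awaited, so there is no suspension point between the lookup and the store:

```python
import asyncio

executions = 0  # how many times the "main code" actually ran

async def send_message():
    # Hypothetical stand-in for the main code.
    global executions
    executions += 1
    n = executions
    await asyncio.sleep(0.01)
    return 200, {"event_id": f"$event{n}"}

in_flight = {}  # txn_key -> asyncio.Task running (or having run) the main code

async def on_put_cached(txn_key):
    task = in_flight.get(txn_key)
    if task is None:
        # Lookup and store happen with no await in between, so no other request can interleave.
        task = asyncio.ensure_future(send_message())
        in_flight[txn_key] = task
    # Every request for this key waits on the same task.
    return await task

async def main():
    return await asyncio.gather(on_put_cached("txn1"), on_put_cached("txn1"))

first, second = asyncio.run(main())
```

Here the main code runs once and both requests receive the same response.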

Now with bonus SyTests!

Kegsay added some commits Nov 10, 2016

Store Promise<Response> instead of Response for HTTP API transactions

Kegsay added the bug label Nov 11, 2016

Owner

erikjohnston commented Nov 11, 2016

I'm wondering if a nicer API would be something like:

self.transactions.fetch_or_execute(
    self.handler.do_foo, txn_id,
    arg1, arg2, arg3=arg3
)

where fetch_or_execute would use txn_id as the cache key and, if that key wasn't in the cache, call the given function with the remaining arguments. This has the advantage that you don't need to remember to both check and store explicitly.
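A sketch of that API, assuming asyncio stands in for Twisted; `TransactionCache` and `do_foo` are hypothetical names. It puts the key first, which is the argument order the merged code later in this thread uses (`fetch_or_execute_request(request, self.on_POST, request)`), and `fn` is only called on a cache miss, so callers never check and store separately:

```python
import asyncio

class TransactionCache:
    """Hypothetical sketch of fetch_or_execute; not Synapse's implementation."""

    def __init__(self):
        self._pending = {}  # txn_id -> asyncio.Task wrapping fn's coroutine

    def fetch_or_execute(self, txn_id, fn, *args, **kwargs):
        if txn_id not in self._pending:
            # Cache miss: run fn with the caller's arguments and remember the task.
            self._pending[txn_id] = asyncio.ensure_future(fn(*args, **kwargs))
        # Hit or miss, the caller gets something it can await for the result.
        return self._pending[txn_id]

calls = []  # records each real invocation of do_foo

async def do_foo(arg1, arg2, arg3=None):
    calls.append((arg1, arg2, arg3))
    await asyncio.sleep(0)
    return arg1 + arg2 + arg3

async def main():
    cache = TransactionCache()
    a = cache.fetch_or_execute("txn1", do_foo, 1, 2, arg3=3)
    b = cache.fetch_or_execute("txn1", do_foo, 1, 2, arg3=3)  # same txn_id: no new call
    return await a, await b

results = asyncio.run(main())
```

Both callers get the result 6, while `do_foo` runs only once.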

Owner

erikjohnston commented Nov 11, 2016

Also, it would be totally awesome if HttpTransactionCache had some Python tests. Maybe also move it to synapse.util?

Contributor

Kegsay commented Nov 11, 2016

fetch_or_execute doesn't appear to pass the request object which is used to select the key in the cache. Was this intentional?

Contributor

Kegsay commented Nov 11, 2016

Also, I don't mind moving it to util, but that feels like a downgrade in specificity, given that only the REST servlets use this class and, as mentioned above, it relies on a request object.

Contributor

Kegsay commented Nov 11, 2016

I'm guessing you're proposing I make it more generic (so txn_id is just the key in the cache)? Do we plan on using the generic form elsewhere?

Owner

erikjohnston commented Nov 11, 2016

> fetch_or_execute doesn't appear to pass the request object which is used to select the key in the cache. Was this intentional?

Nah, I just made it up.

> Also, I don't mind moving it to util, but that feels like a downgrade in specificity, given that only the REST servlets use this class and, as mentioned above, it relies on a request object.

At the very least it should be moved up, but generally I quite like helpers like this to live a bit separately, rather than being dumped alongside the REST servlets themselves.

> I'm guessing you're proposing I make it more generic (so txn_id is just the key in the cache)? Do we plan on using the generic form elsewhere?

Well, it is currently implemented in a generic fashion. I'm happy for the arg name to be txn_id or key or whatever.

Contributor

Kegsay commented Nov 11, 2016

> At the very least it should be moved up, but generally I quite like helpers like this to live a bit separately, rather than being dumped alongside the REST servlets themselves.

synapse/rest/client/transactions.py perhaps?

> Well, it is currently implemented in a generic fashion. I'm happy for the arg name to be txn_id or key or whatever.

Do you want the implementation to be generic, or are you happy with it in its current form (accepting request objects, which are then processed for their access_token for use as a key)?

Owner

erikjohnston commented Nov 11, 2016

> synapse/rest/client/transactions.py perhaps?

That's fine, I suppose.

> Do you want the implementation to be generic, or are you happy with it in its current form (accepting request objects, which are then processed for their access_token for use as a key)?

Oh, I misread. Yeah, ok, I guess the generation of the key is non-trivial. Though I'd still be tempted to move _get_key onto the v1 REST base class, as that would make the transaction class nicely self-contained and easily testable, rather than having it know about request objects.
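A sketch of that split, in which a hypothetical servlet base class owns key generation and the cache only ever sees opaque string keys. `Request` here is a stand-in with `path` and `access_token` attributes (an assumption; a real request carries the token in a query parameter or header), the path is made up, and the path-plus-token concatenation follows the thread's description of the access_token being concatenated into the key. The cache stores plain results rather than deferreds for brevity:

```python
from collections import namedtuple

# Hypothetical stand-in for a request object.
Request = namedtuple("Request", ["path", "access_token"])

class RestServletBase:
    """Hypothetical v1 REST base class that owns transaction-key generation."""

    def get_txn_key(self, request):
        # The path already contains the client's txn_id; appending the token
        # scopes it so two clients reusing the same txn_id do not collide.
        return request.path + "/" + request.access_token

class KeyedCache:
    """Request-agnostic cache: testable with plain strings."""

    def __init__(self):
        self._entries = {}  # key -> result

    def fetch_or_execute(self, key, fn, *args, **kwargs):
        if key not in self._entries:
            self._entries[key] = fn(*args, **kwargs)
        return self._entries[key]

servlet = RestServletBase()
path = "/rooms/!room:example.org/send/m.room.message/txn1"  # hypothetical path
key_alice = servlet.get_txn_key(Request(path, "alice_token"))
key_bob = servlet.get_txn_key(Request(path, "bob_token"))

cache = KeyedCache()
r1 = cache.fetch_or_execute(key_alice, lambda: "alice's event")
r2 = cache.fetch_or_execute(key_bob, lambda: "bob's event")
r3 = cache.fetch_or_execute(key_alice, lambda: "duplicate")  # cache hit: returns r1
```

Because `KeyedCache` never touches a request, a unit test can exercise it with nothing but strings and lambdas.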

Contributor

Kegsay commented Nov 11, 2016

SGTM

Contributor

Kegsay commented Nov 11, 2016

Hmmm. The old implementation was using transaction IDs as a way to prune the cache, but it means that you couldn't have multiple in-flight requests at the same time and get idempotency, which feels bad. I've removed that code in my fix, but now the cache will grow unbounded.

How do you propose I clear the cache? A periodic interval? Every 10 minutes? The generic form now just takes a key, so I can't do anything smarter, such as basing it on the given user (the access_token, which is now concatenated into the key).
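A simplified model of the trade-off described above, assuming that "using transaction IDs to prune the cache" means only the most recent transaction per access token is kept. Both classes are hypothetical and synchronous for brevity:

```python
class LatestTxnPerToken:
    """Assumed old behaviour: one slot per access token; a new txn_id evicts the old one."""

    def __init__(self):
        self._slots = {}  # access_token -> (txn_id, response)
        self.executions = 0

    def handle(self, access_token, txn_id):
        slot = self._slots.get(access_token)
        if slot is not None and slot[0] == txn_id:
            return slot[1]
        self.executions += 1
        response = "response-%d" % self.executions
        self._slots[access_token] = (txn_id, response)
        return response

class PerKeyCache:
    """New behaviour: every (token, txn_id) key is kept, so nothing is ever evicted."""

    def __init__(self):
        self._entries = {}  # (access_token, txn_id) -> response
        self.executions = 0

    def handle(self, access_token, txn_id):
        key = (access_token, txn_id)
        if key not in self._entries:
            self.executions += 1
            self._entries[key] = "response-%d" % self.executions
        return self._entries[key]

    def __len__(self):
        return len(self._entries)

# txn1 and txn2 are both sent, then txn1 is retried (e.g. after a client timeout).
old, new = LatestTxnPerToken(), PerKeyCache()
for cache in (old, new):
    for txn_id in ("txn1", "txn2", "txn1"):
        cache.handle("token", txn_id)
```

In this model the old cache re-executes the retried txn1 (three executions), while the per-key cache stays idempotent (two executions) but keeps one entry per transaction forever, which is why it needs some form of expiry.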

synapse/rest/client/transactions.py
+ of (response_code, response_dict).
+ """
+ try:
+ return self.transactions[txn_key]

erikjohnston Nov 12, 2016

Owner

I think you need a .observe() on the end


Kegsay Nov 14, 2016

Contributor

Done.

synapse/rest/client/transactions.py
+ deferred = fn(*args, **kwargs)
+ observable = ObservableDeferred(deferred)
+ self.transactions[txn_key] = observable
+ return observable

erikjohnston Nov 12, 2016

Owner

Ditto a .observe() here too


Kegsay Nov 14, 2016

Contributor

Done.
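The cache stores an ObservableDeferred, which wraps a Deferred rather than being one; as far as I understand Synapse's helper, each `.observe()` call returns a fresh Deferred that fires with the shared result without affecting the original's callback chain, which is presumably why the reviewer asks for it. A rough asyncio analogue is `asyncio.shield()`, which gives each caller its own awaitable so that cancelling it does not cancel the shared task. `ObservableTask` is a hypothetical name, and this is an analogy rather than Synapse's code:

```python
import asyncio

class ObservableTask:
    """Hypothetical asyncio analogue of ObservableDeferred."""

    def __init__(self, coro):
        self._task = asyncio.ensure_future(coro)

    def observe(self):
        # Each call returns a new awaitable; cancelling it leaves the shared task running.
        return asyncio.shield(self._task)

async def send_message():
    # Hypothetical stand-in for the main code.
    await asyncio.sleep(0.01)
    return 200, {"event_id": "$event1"}

async def main():
    observable = ObservableTask(send_message())
    first = observable.observe()
    second = observable.observe()
    first.cancel()  # e.g. the first client disconnects
    try:
        await first
    except asyncio.CancelledError:
        pass
    return await second, observable._task.cancelled()

result, shared_cancelled = asyncio.run(main())
```

The second observer still receives the response, and the shared task is not cancelled.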

synapse/rest/client/v1/room.py
+ observable = self.txns.fetch_or_execute_request(
+ request, self.on_POST, request
+ )
+ res = yield observable.observe()

erikjohnston Nov 12, 2016

Owner

Ah, I'd move this .observe() up into the actual cache to make things neater:

def on_PUT(self, request, txn_id):
    return self.txns.fetch_or_execute_request(
        request, self.on_POST, request
    )

Kegsay Nov 14, 2016

Contributor

Done.

Owner

erikjohnston commented Nov 12, 2016

> How do you propose I clear the cache? A periodic interval? Every 10 minutes? The generic form now just takes a key, so I can't do anything smarter, such as basing it on the given user (the access_token, which is now concatenated into the key).

For now, I'd probably expire entries after 30 mins (10 is probably a bit on the low side). Ideally, I'd guess we'd batch-persist these txn_ids to the DB so that they survive restarts, and then purge that table after a few hours or days.

Owner

erikjohnston commented Nov 12, 2016

(Also, a python test case for the HttpTransactionCache class would be awesome)

Kegsay added some commits Nov 14, 2016

Contributor

Kegsay commented Nov 14, 2016

> (Also, a python test case for the HttpTransactionCache class would be awesome)

Done.

For cleaning up entries, I'm periodically checking every 30 minutes and timestamping when each function was invoked, which means an entry actually stays in the cache for between 30 and 60 minutes. This feels simpler and less wasteful than registering a timeout for each entry in the cache, which has comparatively more function-call overhead.
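A sketch of that scheme, assuming a clock injected as a callable that returns milliseconds (in Synapse the sweep would presumably be scheduled on the homeserver's clock; here `cleanup()` is called by hand). It stores plain results rather than deferreds for brevity, and `ExpiringTxnCache` is a hypothetical name. Entries are timestamped when the function is invoked, and each sweep drops anything at least 30 minutes old, so an entry survives for 30 to 60 minutes depending on where it lands between sweeps:

```python
CLEANUP_PERIOD_MS = 30 * 60 * 1000  # sweep interval, also used as the max age

class ExpiringTxnCache:
    """Hypothetical sketch: timestamp on invocation, expire in periodic sweeps."""

    def __init__(self, clock):
        self._clock = clock  # callable returning the current time in ms
        self._entries = {}  # key -> (result, invoked_at_ms)

    def fetch_or_execute(self, key, fn, *args, **kwargs):
        if key not in self._entries:
            self._entries[key] = (fn(*args, **kwargs), self._clock())
        return self._entries[key][0]

    def cleanup(self):
        # Meant to run once per CLEANUP_PERIOD_MS: one pass instead of a timer per entry.
        now = self._clock()
        expired = [k for k, (_, ts) in self._entries.items() if now - ts >= CLEANUP_PERIOD_MS]
        for key in expired:
            del self._entries[key]

    def keys(self):
        return sorted(self._entries)

MINUTE_MS = 60 * 1000
now_ms = [0]
cache = ExpiringTxnCache(lambda: now_ms[0])

now_ms[0] = 1 * MINUTE_MS   # just after the sweep at t=0
cache.fetch_or_execute("early", lambda: "A")
now_ms[0] = 29 * MINUTE_MS  # just before the sweep at t=30
cache.fetch_or_execute("late", lambda: "B")

now_ms[0] = 30 * MINUTE_MS
cache.cleanup()
after_30 = cache.keys()  # both survive: ages 29 and 1 minutes
now_ms[0] = 60 * MINUTE_MS
cache.cleanup()
after_60 = cache.keys()  # both gone: "early" lived ~59 min, "late" ~31 min
```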

Contributor

Kegsay commented Nov 14, 2016

@erikjohnston PTAL

Also, are the Dendron tests just flaky or should I be worried? Looking at the previous builds on http://matrix.org/jenkins/job/SynapseSytestDendronCommit/ makes me think they're flaky, but I don't know.

Owner

erikjohnston commented Nov 14, 2016

> Also, are the Dendron tests just flaky or should I be worried? Looking at the previous builds on http://matrix.org/jenkins/job/SynapseSytestDendronCommit/ makes me think they're flaky, but I don't know.

Yes :(

Owner

erikjohnston commented Nov 14, 2016

LGTM

Kegsay merged commit 9355a5c into develop Nov 14, 2016

10 checks passed

Flake8 + Packaging (Commit) Build #2046 origin/kegan/idempotent-requests succeeded in 31 sec
Flake8 + Packaging (Merged PR) Build finished.
Sytest Dendron (Commit) Build #1113 origin/kegan/idempotent-requests succeeded in 11 min
Sytest Dendron (Merged PR) Build finished.
Sytest Postgres (Commit) Build #1952 origin/kegan/idempotent-requests succeeded in 10 min
Sytest Postgres (Merged PR) Build finished.
Sytest SQLite (Commit) Build #1997 origin/kegan/idempotent-requests succeeded in 7 min 7 sec
Sytest SQLite (Merged PR) Build finished.
Unit Tests (Commit) Build #2077 origin/kegan/idempotent-requests succeeded in 2 min 46 sec
Unit Tests (Merged PR) Build finished.

freelock referenced this pull request in matrix-org/matrix-js-sdk on Nov 14, 2016: room.currentState does not always get updated with new state #275 (Open)

richvdh deleted the kegan/idempotent-requests branch Dec 1, 2016
