
feat: did-rotate #2816

Merged
merged 29 commits into from
Mar 15, 2024

Conversation

@amanji (Contributor) commented Feb 27, 2024

Taken from #2494 (c/o @dbluhm)

This PR is an early implementation of the RFC 0794 DID Rotate protocol. This was an exercise to feel out any potential issues with the protocol as defined.

This is not yet a complete implementation. Notably missing right now is a way to actually trigger a DID rotation through an Admin API call. I also need to finish wiring up the handlers to the manager. And who could forget about tests. 🙂

Overall, the protocol is solid. I'll report my thoughts on the protocol on the PR to the Aries RFCs.
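For context, the heart of RFC 0794 is a `rotate` message carrying the new DID. A minimal sketch of what such a message might look like — the message-type URI and field names here are taken from my reading of the RFC and may not match ACA-Py's final implementation:

```python
from dataclasses import dataclass, field
from uuid import uuid4

# Assumed message type URI per RFC 0794; verify against the RFC text.
ROTATE_MSG_TYPE = "https://didcomm.org/did-rotate/1.0/rotate"

@dataclass
class Rotate:
    """Sketch of the rotate message: tells the other party our new DID."""

    to_did: str  # the DID the sender is rotating to
    id: str = field(default_factory=lambda: str(uuid4()))

    def serialize(self) -> dict:
        # Plaintext DIDComm-style envelope fields plus the protocol body.
        return {"@type": ROTATE_MSG_TYPE, "@id": self.id, "to_did": self.to_did}
```

The receiver would update its connection record with the new DID and acknowledge, after which messages to the old DID should no longer be produced.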

An ACA-Py specific concern I discovered in this process that probably deserves some discussion: invalidating the connection target cache when the DID Rotation is committed. As currently structured, we never really expect the connection targets for a connection to change after the protocol establishing it concludes. DID Rotation shakes that expectation up.

Right now, in my changes, I've added a clear_connection_targets_cache helper method to partially address this. It's partial because it only works when ACA-Py runs as a single instance with the default in-memory cache, or when Indicio's Redis cache plugin provides whole-cluster shared caching.

I think a more complete solution would be for ACA-Py to include both DIDs of the connection in the cache key. The reason I haven't done this on a first pass is that it would require restructuring how we use the Responder classes: the send methods would need to accept a whole ConnRecord rather than just a connection_id (or, individually, a their_did and my_did, I suppose) so we can extract the DIDs from it. Nine times out of ten, the connection record is already available to the caller of the send method, and my hypothesis is that in the remaining cases the change would not dramatically impact performance of critical operations — it is uncommon for us to know the connection_id without also having a connection record; maybe only a few places in the Admin API are like this.
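To illustrate the more complete solution, a hypothetical cache-key helper keyed on both DIDs might look like the following (the function name and key format are my own for illustration, not ACA-Py's):

```python
def connection_targets_cache_key(my_did: str, their_did: str) -> str:
    """Hypothetical cache key including both DIDs of the connection.

    After a rotation either DID changes, so lookups for the new pair
    naturally miss the stale entry; the old pair's targets simply age
    out of the cache, with no explicit cross-instance invalidation.
    """
    return f"connection_target::{my_did}::{their_did}"
```

For example, rotating from `did:example:old` to `did:example:new` yields a different key, so cached targets for the old pair are never consulted for post-rotation traffic.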

I'll do some more investigation on these points. Might be good to talk implications at the next ACA-PUG call.

cc @swcurran @TelegramSam
FYI @Jsyro this has ties back to updating the DIDs associated with connections that we've talked about recently.

dbluhm and others added 13 commits September 12, 2023 13:12
From record_keys_for_public_did. Peer DIDs are resolvable but not
necessarily public.
amanji marked this pull request as draft February 27, 2024 21:14
amanji mentioned this pull request Feb 27, 2024
amanji and others added 6 commits February 27, 2024 21:42
swcurran assigned swcurran and amanji and unassigned swcurran Mar 4, 2024
@amanji (Contributor, Author) commented Mar 4, 2024

Tagging @dbluhm for review, please.

@dbluhm dbluhm marked this pull request as ready for review March 4, 2024 18:45
@dbluhm (Member) left a comment

Looks pretty good! Some comments below. A general comment: we need integration tests for this protocol.

aries_cloudagent/protocols/did_rotate/v1_0/manager.py (outdated, resolved)
aries_cloudagent/protocols/did_rotate/v1_0/manager.py (outdated, resolved)
pyproject.toml (outdated, resolved)
amanji and others added 6 commits March 5, 2024 14:37
"""
# TODO: it would be better to include the DIDs of the connection in the
# target cache key. This solution only works with whole-cluster caching
# or a single instance with local caching.
Member:

I think we should discuss this point; if we don't make this change now, there can be unexpected behavior in clusters. Additionally, once the rotation is completed, there is no way to handle any potentially out of order messages. If we did make this change, the cached value for the previous DIDs would be around until the cache expires them. There would still be issues in clustered environments without whole cluster caching, though.
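To make the expiry point concrete, here is a minimal TTL cache sketch (an illustration, not ACA-Py's actual cache implementation): with DID-pair keys, entries for the pre-rotation pair stay readable until their TTL lapses, which is what would let late, out-of-order messages addressed to the old DID still resolve.

```python
import time

class TTLCache:
    """Minimal time-to-live cache for illustration only."""

    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (value, expiry timestamp)

    def set(self, key, value):
        self._store[key] = (value, time.monotonic() + self.ttl)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._store[key]  # lazily evict expired entries
            return None
        return value
```

Until the entry for the old DID pair expires, a handler receiving a straggler message keyed on the old pair would still find its targets; afterward the lookup misses and the message can only be treated as unroutable.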

Member:

If we can live with this for now, I think this PR is in a good state.

Contributor (Author):

TBH I'm not entirely familiar with how the different caching mechanisms work.

Member:

Yeah, I'm feeling like the right option right now is probably to log an issue to address handling potentially out of order messages received after a rotation. It's an unlikely edge case and I think it will be more valuable to keep this PR moving for now.

sonarcloud bot commented Mar 14, 2024

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
No data about Coverage
No data about Duplication

See analysis details on SonarCloud

@jamshale (Contributor) commented:
Not sure why this PR is getting so many random integration test failures. They're not the same as the ones I fixed recently. I might try to look at these other ones at some point. Re-running again.

@amanji (Contributor, Author) commented Mar 14, 2024

@jamshale Yeah the tests seem so finicky.

@jamshale (Contributor) commented:

> @jamshale Yeah the tests seem so finicky.

They seem to get worse sometimes and better at other times. The last failure was the one I thought I had fixed; I hadn't seen it in a while. Not sure whether adding more retries would help.

@dbluhm dbluhm merged commit 3eb0bf7 into hyperledger:main Mar 15, 2024
8 checks passed