Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Recover by creating a new epoch if someone only posted an exclude message #127

Merged
merged 21 commits into from
Jul 24, 2023

Conversation

Powersource
Copy link
Collaborator

@Powersource Powersource commented May 30, 2023

Based on #126

Progresses #123

Fixes #30

@Powersource Powersource changed the base branch from master to crash-readd May 30, 2023 10:52
Base automatically changed from crash-readd to master May 31, 2023 10:01
listeners.js Outdated Show resolved Hide resolved
listeners.js Outdated Show resolved Hide resolved
listeners.js Outdated Show resolved Hide resolved
@Powersource Powersource marked this pull request as ready for review May 31, 2023 14:19
@Powersource Powersource requested a review from mixmix May 31, 2023 14:19
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'On exclusion, recover if we fail to re-add anyone at all' seems to be flaking a bit on 'There was only 1 re-addition message from alice', it gets 2 instead of 1. Seems to happen more locally

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i wonder if alice's listener runs so fast that it accidentally creates a new epoch while exclusion is happening, ending up with 2 epochs and therefore 2 re-addition msgs

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also happened in ci now. but not because of the latest commit, it happened when i stashed that one too

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm no both messages seem to belong to the same epoch. could be more race conditions? maybe just a race condition in the reAdd function?

ok 10 alice excludes bob but crashes before re-adding herself and carol
readdmember res [
  {
    "previous": "%TC5NW1rAOkiQ/W4YTLhFrh4ASAsI7EcQ8SjOWSZQIRw=.sha256",
    "sequence": 3,
    "author": "@M3OAID01925gO7/l3ZjKbtbmUkXw0gvqRwr5n7hwwPk=.ed25519",
    "timestamp": 1685619076372,
    "hash": "sha256",
    "content": {
      "type": "group/add-member",
      "version": "v2",
      "secret": "47qTvz1zDX5aUBXZPfYx45ORSNXculWk8pUeirK+Hyg=",
      "root": "ssb:message/classic/8ZNj1MuaTVEwULaFtrlzlIKqFLcM6NxmiVODovwoY-0=",
      "creator": "ssb:feed/bendybutt-v1/R7oVP6uSBqSmyC-iQxX67S9Zl2498yxJk4NarqMaBtI=",
      "recps": [
        "ssb:identity/group/98leh3cs6xQmqOVAG90zdMI6WhO04KsaR4kkyds25oU=",
        "ssb:feed/bendybutt-v1/R7oVP6uSBqSmyC-iQxX67S9Zl2498yxJk4NarqMaBtI=",
        "ssb:feed/bendybutt-v1/vNKLsZwOzBfYvqaOeaBQqQ6dPy7HE9ogb3DnzlTaonM="
      ],
      "tangles": {
        "group": {
          "root": "ssb:message/classic/8ZNj1MuaTVEwULaFtrlzlIKqFLcM6NxmiVODovwoY-0=",
          "previous": [
            "ssb:message/classic/mFD-lXW4dmVUoYOL0ybqjI5G8fVAORgNHrvvN1JOuMY="
          ]
        },
        "members": {
          "root": "ssb:message/classic/mFD-lXW4dmVUoYOL0ybqjI5G8fVAORgNHrvvN1JOuMY=",
          "previous": [
            "ssb:message/classic/mFD-lXW4dmVUoYOL0ybqjI5G8fVAORgNHrvvN1JOuMY="
          ]
        }
      }
    },
    "signature": "ybDL5FGpyvTCtTedu0NVe0EiDVXuesmLaS3rQsSZi6WOmWObIBAx6Qtq+o+QndvGMV65s2DDgTd3CcmWraw6Bw==.sig.ed25519"
  }
]
readdmember res [
  {
    "previous": "%6lgUNyUwrYgtu9PPbrmTnlyrbJq0z9paj2oBkcjXSog=.sha256",
    "sequence": 4,
    "author": "@M3OAID01925gO7/l3ZjKbtbmUkXw0gvqRwr5n7hwwPk=.ed25519",
    "timestamp": 1685619076457,
    "hash": "sha256",
    "content": {
      "type": "group/add-member",
      "version": "v2",
      "secret": "47qTvz1zDX5aUBXZPfYx45ORSNXculWk8pUeirK+Hyg=",
      "root": "ssb:message/classic/8ZNj1MuaTVEwULaFtrlzlIKqFLcM6NxmiVODovwoY-0=",
      "creator": "ssb:feed/bendybutt-v1/R7oVP6uSBqSmyC-iQxX67S9Zl2498yxJk4NarqMaBtI=",
      "recps": [
        "ssb:identity/group/98leh3cs6xQmqOVAG90zdMI6WhO04KsaR4kkyds25oU=",
        "ssb:feed/bendybutt-v1/R7oVP6uSBqSmyC-iQxX67S9Zl2498yxJk4NarqMaBtI=",
        "ssb:feed/bendybutt-v1/vNKLsZwOzBfYvqaOeaBQqQ6dPy7HE9ogb3DnzlTaonM="
      ],
      "tangles": {
        "group": {
          "root": "ssb:message/classic/8ZNj1MuaTVEwULaFtrlzlIKqFLcM6NxmiVODovwoY-0=",
          "previous": [
            "ssb:message/classic/6lgUNyUwrYgtu9PPbrmTnlyrbJq0z9paj2oBkcjXSog="
          ]
        },
        "members": {
          "root": "ssb:message/classic/mFD-lXW4dmVUoYOL0ybqjI5G8fVAORgNHrvvN1JOuMY=",
          "previous": [
            "ssb:message/classic/6lgUNyUwrYgtu9PPbrmTnlyrbJq0z9paj2oBkcjXSog="
          ]
        }
      }
    },
    "signature": "JPGP2ppjawHR4TV+s4pVyyfudYNK449ua3G2mhu6Vc9F/yB8G9L6XJBgxIJuTaRuMsGGXv+MztmpDjpzun3ZBg==.sig.ed25519"
  }
]
ok 11 got num of additions after exclude
not ok 12 There was only 1 re-addition message from alice
  ---
    operator: equal
    expected: 1
    actual:   2
    at: Test.<anonymous> (/home/me/prj/ssb/ssb-tribes2/test/exclude-members.test.js:857:5)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed the timeout system a bit now so the default in tests is 50ms.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm now there's a new one

# Can exclude a person in a group with a lot of members
ok 1 tribes2 started
ok 2 created a group
ok 3 invited 10 peers
ok 4 invited 10 peers
ok 5 peers accept invites
ok 6 alice was able to exclude bob in a group with many members
key ssb:message/classic/0Im6p8Re4bMMTmgaRod2E85YCvMpz49WBPHFeE6a8FI= found, but is disconnected from main graph
[...]
key ssb:message/classic/0Im6p8Re4bMMTmgaRod2E85YCvMpz49WBPHFeE6a8FI= found, but is disconnected from main graph
not ok 7 bob is excluded from group
  ---
    operator: deepEqual
    expected: |-
      { id: 'ssb:identity/group/MpdBB51KjaHZ_sFCqIcB1nNE-b9jMkvr6gROxeNflj4=', excluded: true }
    actual: |-
      { id: 'ssb:identity/group/MpdBB51KjaHZ_sFCqIcB1nNE-b9jMkvr6gROxeNflj4=', excluded: undefined }
    at: Test.<anonymous> (/home/me/prj/ssb/ssb-tribes2/test/exclude-members.test.js:600:5)
    stack: |-
      Error: bob is excluded from group
          at Test.assert [as _assert] (/home/me/prj/ssb/ssb-tribes2/node_modules/.pnpm/tape@5.6.3/node_modules/tape/lib/test.js:312:48)
          at Test.bound [as _assert] (/home/me/prj/ssb/ssb-tribes2/node_modules/.pnpm/tape@5.6.3/node_modules/tape/lib/test.js:95:17)
          at Test.tapeDeepEqual (/home/me/prj/ssb/ssb-tribes2/node_modules/.pnpm/tape@5.6.3/node_modules/tape/lib/test.js:553:7)
          at Test.bound [as deepEquals] (/home/me/prj/ssb/ssb-tribes2/node_modules/.pnpm/tape@5.6.3/node_modules/tape/lib/test.js:95:17)
          at Test.<anonymous> (/home/me/prj/ssb/ssb-tribes2/test/exclude-members.test.js:600:5)
          at runMicrotasks (<anonymous>)
          at processTicksAndRejections (node:internal/process/task_queues:96:5)
  ...
not ok 8 Others got excluded from the group
  ---
    operator: fail
    at: <anonymous> (/home/me/prj/ssb/ssb-tribes2/test/exclude-members.test.js:619:20)
    stack: |-
      Error: Others got excluded from the group
          at Test.assert [as _assert] (/home/me/prj/ssb/ssb-tribes2/node_modules/.pnpm/tape@5.6.3/node_modules/tape/lib/test.js:312:48)
          at Test.bound [as _assert] (/home/me/prj/ssb/ssb-tribes2/node_modules/.pnpm/tape@5.6.3/node_modules/tape/lib/test.js:95:17)
          at Test.fail (/home/me/prj/ssb/ssb-tribes2/node_modules/.pnpm/tape@5.6.3/node_modules/tape/lib/test.js:406:7)
          at Test.bound [as fail] (/home/me/prj/ssb/ssb-tribes2/node_modules/.pnpm/tape@5.6.3/node_modules/tape/lib/test.js:95:17)
          at /home/me/prj/ssb/ssb-tribes2/test/exclude-members.test.js:619:20
          at runMicrotasks (<anonymous>)
          at processTicksAndRejections (node:internal/process/task_queues:96:5)
          at async Test.<anonymous> (/home/me/prj/ssb/ssb-tribes2/test/exclude-members.test.js:606:3)
  ...

1..8
# tests 8
# pass  6
# fail  2

Copy link
Collaborator Author

@Powersource Powersource Jun 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So bob thinks he's not excluded, and the other peers all seem to be creating new epochs, ending up with 11 sometimes...

Could definitely do more logging here to dig deeper.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mixmix feel free to take a look here. i can't decide if it's an antipattern to have the timeout be 0 in tests or having it be non-zero. and do these errors that pop up point to a design error in the recovery or is there a simple fix?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm when we ask people to create a new epoch we don't also tell them to re-add members, should do that at least

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re-adding people in the listener now too but didn't seem to affect this test

index.js Outdated Show resolved Hide resolved
listeners.js Show resolved Hide resolved
@mixmix mixmix mentioned this pull request Jun 7, 2023
@mixmix mixmix changed the base branch from master to createNewEpoch June 7, 2023 04:13
@mixmix mixmix changed the base branch from createNewEpoch to master June 7, 2023 04:20
@mixmix
Copy link
Member

mixmix commented Jun 7, 2023

OK, have split this into two PRs to simplify debugging:

  1. extract createNewEpoch #129 : passing
  2. crash-epoch-init (less) #130 : contains bugs

@Powersource
Copy link
Collaborator Author

@mixmix is #130 the same as this PR now?

Comment on lines 165 to 166
timeoutLow: 0.5,
timeoutHigh: 0.7,
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just wrote this in chat about this test:

hmm quick hunch in one test:

alice adds bob and carol
alice removes bob
carol creates a new epoch too for some reason
should be fine in theory but alice doesn't get that epoch, and when alice later adds david, it's only to her own epoch
then when alice gets carol's epoch, there's a conflict
alice picks carol's epoch as preferred since it has fewer members

-> wrong member list, missing david

easiest "fix" for all this is just to disable recovery in most tests (set timeout high (ie default)). then selectively enable in a few cases that we actually want to test

i might just also switch from this to doing the demo app. would be fun to show in berlin. and would show us what bugs actually exist IRL

@Powersource
Copy link
Collaborator Author

tests passing now (just linter failing). mostly since we disabled recovery in most tests, so, limited knowledge gained. wondering if we should just merge and see how this fares in the demo app. the situations we're testing just get more and more complex 😓 🤷

@Powersource
Copy link
Collaborator Author

demo app mostly working now, gonna try this branch out

@Powersource
Copy link
Collaborator Author

Doesn't seem to break stuff a lot but there's this bug that pops up (i think unrelated to this PR) that probably stops the code in this PR from running at all ssbc/ssb-tribes2-demo#7

@Powersource
Copy link
Collaborator Author

@mixmix see #135 where i (using the demo) tested this PR together with #134 that allows the code in this PR to get run (i think?). doesn't seem to break stuff, so i'd argue we just merge this and maybe polish more later

@Powersource Powersource requested a review from mixmix July 17, 2023 13:57
@mixmix
Copy link
Member

mixmix commented Jul 23, 2023

Fresh review (in progress)

Copy link
Member

@mixmix mixmix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Love it.

lib/exclude.js Outdated Show resolved Hide resolved
@@ -780,10 +788,10 @@ test('On exclusion, if we fail to re-add all people, someone else does that inst

test('On exclusion, recover if we fail to re-add anyone at all', async (t) => {
const run = Run(t)
const alice = Testbot({ name: 'alice', timeoutScale: 0 })
const alice = Testbot({ name: 'alice', ...lowTimeouts })
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🌶️ - IMO use the API that ssb-tribes uses, so people reading test code don't get confused by API

Suggested change
const alice = Testbot({ name: 'alice', ...lowTimeouts })
const alice = Testbot({ name: 'alice', tribes2: lowTimeouts })

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feel free to skip this change. It's just a recommended pattern

p(bob.close)(true),
p(carol.close)(true),
])
})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A beautiful test 🍑

@Powersource Powersource merged commit 169569e into master Jul 24, 2023
3 checks passed
@Powersource Powersource deleted the crash-epoch-init branch July 24, 2023 11:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Use npm module debug
2 participants