update ssb-conn to 0.15 #1242

staltz · 2020-01-25T19:52:43Z

This is mostly a small update to ssb-conn. Here are some highlights (from most to least relevant):

- mark old and failing peers as defunct in the DB
  - there are lots of dead pubs in a typical conn.json, and every one of them will be attempted for a connection at some point
  - I went through most of mine and realized that those with hundreds of failures were very likely dead pubs
  - this update detects pubs with 200+ failures and marks them as defunct in conn.json
  - defunct means the peer will never be attempted for a connection again by this scheduler (other schedulers that people implement could ignore this)
  - we can't just delete these peers from conn.json, because they would be re-added when your SSB app queries the flumelog for messages of type "pub"
  - but peers marked "defunct" have a bunch of fields deleted, so conn.json shrinks
  - for instance, mine went from 534 KB to 189 KB
- update scheduler: remove neverJustOne, it was hard to justify it
  - tiny update to the scheduler's behavior
  - before, when it picked a pub to connect with, it always picked two of them to maximize the chances of connecting
  - that was quite an arbitrary decision and didn't always make sense, so I removed it
- update ssb-conn-db with self-healing conn.json
  - conn.json files can get corrupted (see "Not that atomic, unfortunately", flumedb/atomic-file#4), so this update checks whether the file is corrupted and does its best to recover the contents
  - this mostly happens on mobile; I haven't seen it on desktop
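
As a rough illustration of the defunct rule in the first highlight, here is a minimal TypeScript sketch. It is not the ssb-conn-db implementation: the `ConnEntry` shape, the `failure` field name, the sample addresses, and the choice of which fields survive are assumptions made for the example. Only the 200-failure threshold and the idea of keeping a stripped-down entry (rather than deleting it) come from the description.

```typescript
// Hypothetical shape of one conn.json entry (field names are assumptions).
interface ConnEntry {
  key: string;
  failure?: number;   // failed attempts since the last success (assumed name)
  defunct?: boolean;
  [extra: string]: unknown;
}

type ConnDb = { [address: string]: ConnEntry };

// Threshold taken from the PR description ("200+ failures").
const DEFUNCT_THRESHOLD = 200;

// Returns a new DB where long-failing peers are replaced by a minimal
// "defunct" stub. The entry is kept (not deleted) so that a later scan of
// "pub" messages does not simply re-add the peer as if it were new.
function markDefunct(db: ConnDb): ConnDb {
  const next: ConnDb = {};
  for (const address of Object.keys(db)) {
    const entry = db[address];
    const failures = typeof entry.failure === "number" ? entry.failure : 0;
    next[address] =
      failures >= DEFUNCT_THRESHOLD
        ? { key: entry.key, defunct: true } // drop every other field
        : entry;
  }
  return next;
}

// A scheduler that honors the flag would skip defunct peers entirely.
function isEligible(entry: ConnEntry): boolean {
  return entry.defunct !== true;
}
```

Keeping only a small stub per defunct peer is what would make the file shrink while still remembering that the peer should be skipped.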

christianbundy · 2020-01-25T21:12:07Z

LGTM, thanks for this patch!

christianbundy · 2020-01-25T21:14:36Z

I'm particularly excited about defunct, that's a super welcome addition.

cinnamon-bun · 2020-02-18T20:03:40Z

@christianbundy @staltz

Re defunct, how long does it take to accumulate 200+ failures, and do they have to be consecutive?

I'm worried about scenarios like...

- I was traveling and couldn't connect to the pub in my house for a month so it became defunct
- I was offline for a month and all pubs became defunct
- I'm offline occasionally and all pubs ended up accumulating 200+ intermittent failures over time
- I learned about a pub from another feed. It wasn't on the global internet so I marked it defunct. But then I visited their hackerspace in person and could have connected to it, but it was already defunct

E.g. will this work well in a world of peers and pubs that are not part of the globally connected internet?

One solution could be: when we hear a feed mention a pub, make it un-defunct. Since the feed mentioned it, it's probably still alive, we just can't reach it right now.

black-puppydog · 2020-02-19T13:24:36Z

@cinnamon-bun

> when we hear a feed mention a pub

I'd say that does not really apply to pubs. Many people just try following them because they don't understand the invite system, or they try an old invite and it never follows back. But I do agree that any sign of life should reactivate the pub in conn.json. Usually that would mean a message that we see from the pub.

There's also the scenario of a pub just not being on 24/7. Think solar-powered rPi pub. So setting The Right Value ™️ for this is important. I'm already quite concerned about silent fracturing of the network due to undetected communication/gossip inhibition. This has the potential to lighten the work on the client, but it's important to make sure it doesn't prevent gossip from happening.

That all being said: thank you @staltz for this. I'm particularly happy about the self-healing. It's super important to improve robustness, or else we'll have to rely on out-of-band support on this here very technical platform for helping potentially very non-techy users. 👍

staltz · 2020-02-19T13:39:10Z

> Re defunct, how long does it take to accumulate 200+ failures, and do they have to be consecutive?

The count is the number of failed-to-connect events since the last succeeded-to-connect event. ssb-conn (like ssb-gossip before it) puts an exponential backoff timeout between attempt-to-connect events, so the more failures, the longer the timeout lasts, up to a maximum; it can be something like hours. And it doesn't happen consistently "every X hours", because if there is a quicker attempt-to-connect to another pub, then we don't even try the failed one.
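
To make the counting rule concrete, here is a small TypeScript sketch of a failure counter that resets on success, combined with a capped exponential backoff. It is an illustration only: the base delay, the cap, and all names (`BackoffState`, `afterAttempt`, `retryDelayMs`) are assumptions, not ssb-conn's actual code or parameters.

```typescript
// Per-peer connection state; the field name is an assumption for this sketch.
interface BackoffState {
  failures: number; // failed attempts since the last successful connection
}

const BACKOFF_BASE_MS = 1000;               // assumed delay before the first retry
const BACKOFF_MAX_MS = 24 * 60 * 60 * 1000; // assumed maximum timeout (24 hours)

// A success resets the count to zero; a failure increments it.
// So the count only ever measures *consecutive* failures.
function afterAttempt(state: BackoffState, succeeded: boolean): BackoffState {
  return { failures: succeeded ? 0 : state.failures + 1 };
}

// The delay doubles with every consecutive failure, up to the cap.
function retryDelayMs(state: BackoffState): number {
  const exponent = Math.min(state.failures, 40); // keep the power bounded
  return Math.min(BACKOFF_BASE_MS * Math.pow(2, exponent), BACKOFF_MAX_MS);
}
```

With this shape, a single successful connection at any failure count puts the peer back at the shortest delay.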

All this is to say that I believe "soon-defunct pubs" (e.g. failure count at 150) are attempted roughly every 24 hours, supposing that the SSB app is online for all of those 24 hours. So in my estimate the failure count would take about 200 days, or maybe even a whole year, to reach 200. I think it's reasonable to consider a pub defunct if it couldn't be reached after hundreds of attempts in a year.
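
That estimate can be checked with a back-of-the-envelope calculation. The sketch below assumes a delay that doubles after each failure, starting at one minute and capped at 24 hours; these numbers are illustrative guesses, not ssb-conn's real parameters, and the sketch ignores the scheduler preferring other pubs and any time the app spends offline. Under those assumptions, 200 consecutive failures take a bit over 190 days of continuous uptime.

```typescript
const ESTIMATE_MINUTE_MS = 60 * 1000;
const ESTIMATE_DAY_MS = 24 * 60 * ESTIMATE_MINUTE_MS;

// Sums the waiting time before each of the first `targetFailures` attempts,
// assuming every attempt fails and the app is online the whole time.
function daysToReachFailures(
  targetFailures: number,
  baseDelayMs: number, // assumed first delay
  maxDelayMs: number   // assumed cap on the delay
): number {
  let totalMs = 0;
  for (let failures = 0; failures < targetFailures; failures++) {
    const exponent = Math.min(failures, 40); // keep the power bounded
    totalMs += Math.min(baseDelayMs * Math.pow(2, exponent), maxDelayMs);
  }
  return totalMs / ESTIMATE_DAY_MS;
}

// Assumed parameters: 1 minute base delay, 24 hour cap.
const estimatedDays = daysToReachFailures(200, ESTIMATE_MINUTE_MS, ESTIMATE_DAY_MS);
console.log(estimatedDays.toFixed(1) + " days to reach 200 failures");
```

The cap dominates the result: once the delay saturates, each additional failure costs one full day, so the total is only weakly sensitive to the assumed base delay.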

> I was traveling and couldn't connect to the pub in my house for a month so it became defunct

If you are truly offline (no ethernet or wifi network interface active), then those pubs won't even be attempted to begin with, so their failure count won't be incremented. Even if they were attempted, I believe the count would go up by at most 50 in a month.

> I was offline for a month and all pubs became defunct

Same as above.

> I'm offline occasionally and all pubs ended up accumulating 200+ intermittent failures over time

If you're offline occasionally, and if the pub succeeds-to-connect when the failure count is (say) 120, then the failure count would reset immediately back to zero.

> E.g. will this work well in a world of peers and pubs that are not part of the globally connected internet?

Yes. On the other hand, regarding network partitions in general: suppose you are in China and, because of the Great Firewall, you can't connect to pubs in the US. In real life, whether a person is dead or now permanently lives (alive!) on Jupiter doesn't really matter to you: either way they are out of your reach, and therefore defunct.

> I learned about a pub from another feed. It wasn't on the global internet so I marked it defunct. But then I visited their hackerspace in person and could have connected to it, but it was already defunct
> ...
> One solution could be: when we hear a feed mention a pub, make it un-defunct. Since the feed mentioned it, it's probably still alive, we just can't reach it right now.

This is a really good point. If a pub is mentioned at time X after it was marked defunct at time Y (that is, X > Y), then I believe we should resurrect the peer. This would require storing a timeOfDeath timestamp when marking a peer as defunct. I opened issue ssbc/ssb-conn#14 for that.
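
The resurrection rule could look roughly like the TypeScript sketch below. This is only a guess at how ssbc/ssb-conn#14 might be approached: `timeOfDeath` is the field proposed above, while the `MaybeDefunctPeer` shape and the `resurrectIfMentioned` function are hypothetical names for illustration.

```typescript
// Hypothetical peer record; only `timeOfDeath` is named in the discussion.
interface MaybeDefunctPeer {
  defunct: boolean;
  timeOfDeath?: number; // ms since epoch, set when the peer was marked defunct (Y)
}

// mentionTimestamp is the time X at which some feed mentioned the pub.
// The peer is resurrected only when the mention is newer than its death (X > Y).
function resurrectIfMentioned(
  peer: MaybeDefunctPeer,
  mentionTimestamp: number
): MaybeDefunctPeer {
  const diedBeforeMention =
    peer.defunct &&
    peer.timeOfDeath !== undefined &&
    mentionTimestamp > peer.timeOfDeath;
  return diedBeforeMention ? { defunct: false } : peer;
}
```

Mentions older than the time of death are ignored, since they say nothing about whether the pub is still alive.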

staltz · 2020-02-21T19:23:31Z

@cinnamon-bun I was so wrong about defunct 😱. Apparently my Manyverse account now doesn't connect to anything at all. I looked at my conn.json * and lots (not all) of my peers are marked defunct. In hindsight, I should have known that storing unnecessary kilobytes and needlessly trying some pubs is a far smaller problem than having (many) false positives when declaring a peer defunct. I think a better implementation of defunctness will be timestamp-based: only mark a peer defunct if there are hundreds of failed connections and the last successful connection was ~1 year ago. But I'm considering making a hotfix in ssb-conn to just sidestep it for now.

* That said, I also seem to have a problem with my public feed: after 1 or 2 scrolls it doesn't load more messages, which suggests a database problem (like a JS error that fails silently without causing a crash) that kills JS execution but doesn't kill the app. That would also explain why ssb-conn doesn't run: no JS is running. Anyway, there is stuff to investigate and fix.
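
The timestamp-based rule proposed in this comment might be sketched as follows. All names and exact numbers are assumptions: "hundreds" is taken as the earlier 200 threshold, "~1 year" as 365 days, and peers with no recorded successful connection are conservatively left alone, which is a judgment call of this sketch rather than something stated in the thread.

```typescript
// Hypothetical per-peer history; field names are assumptions.
interface PeerHistory {
  failures: number;     // failed attempts since the last success
  lastSuccess?: number; // ms since epoch of the last successful connection
}

const MIN_FAILURES_FOR_DEFUNCT = 200;                // assumed meaning of "hundreds"
const SILENCE_LIMIT_MS = 365 * 24 * 60 * 60 * 1000;  // assumed meaning of "~1 year"

// Both conditions must hold, so many failures in a short time are not enough.
function isDefunctByTimestamp(peer: PeerHistory, now: number): boolean {
  if (peer.failures < MIN_FAILURES_FOR_DEFUNCT) return false;
  // Assumption: without a known last success we cannot measure the silence,
  // so err on the side of keeping the peer.
  if (peer.lastSuccess === undefined) return false;
  return now - peer.lastSuccess >= SILENCE_LIMIT_MS;
}
```

Requiring both conditions trades a larger conn.json for fewer false positives, which is the trade-off the comment argues for.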

christianbundy · 2020-02-21T19:52:10Z

FWIW I think Sami mentioned this problem where they aren't connecting to anyone anymore without using an invite or something. If you ping me when the hotfix is ready I can release a new Patchwork ASAP.

cinnamon-bun · 2020-02-22T22:53:53Z

@staltz Thanks for answering my worries! I apologize for posting them as a wall of questions like that, I wish I had expressed more gratitude. ❤️

Overall there's no way to know if something is truly dead or just not visible to us for a long time. I agree it makes sense to give up after a long time, I'm just worried that it's permanent. Maybe if defunct peers were still attempted once a month, or something, that would be more resilient.
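
The "still attempted once a month" idea could be expressed as a scheduling check like the TypeScript sketch below. It is a hypothetical illustration of the suggestion, not an existing ssb-conn feature: the `RetryablePeer` shape, the `lastAttempt` field, and the 30-day interval are all assumptions.

```typescript
// Hypothetical peer record for this sketch.
interface RetryablePeer {
  defunct: boolean;
  lastAttempt?: number; // ms since epoch of the most recent connection attempt
}

const DEFUNCT_RETRY_INTERVAL_MS = 30 * 24 * 60 * 60 * 1000; // assumed "once a month"

// Healthy peers are always candidates. Defunct peers become candidates again
// once enough time has passed since the last attempt, so defunct is never permanent.
function mayAttemptConnection(peer: RetryablePeer, now: number): boolean {
  if (!peer.defunct) return true;
  if (peer.lastAttempt === undefined) return true;
  return now - peer.lastAttempt >= DEFUNCT_RETRY_INTERVAL_MS;
}
```

A successful monthly attempt would then be the natural point to clear the defunct flag.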

update ssb-conn to 0.15

0f10938

christianbundy merged commit b5230c6 into master Jan 25, 2020

staltz mentioned this pull request Feb 19, 2020:

- Un-defunct pubs that got mentioned recently ssbc/ssb-conn#14 (Open)

staltz deleted the conn15 branch February 19, 2020 13:39

This was referenced Feb 21, 2020:

- Defunctness causes no pubs to be attempted ssbc/ssb-conn#15 (Closed)
- update ssb-conn to 0.16.2, fix defunctness mess-up #1253 (Merged)
