Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

stuck sync-ing #2

Open
ahdinosaur opened this issue Dec 19, 2019 · 12 comments
Open

stuck sync-ing #2

ahdinosaur opened this issue Dec 19, 2019 · 12 comments

Comments

@ahdinosaur
Copy link
Member

hi folx! (@pietgeursen) 😺

here's an issue for the situation i'm currently in:

  • i've installed the app, works great, no crashes
  • i followed (or at least attempted to follow) the on-boarding instructions
  • i did somewhat of an initial sync, and have some content (but not all content)
  • when Sunrise Social is running, i see the Sunrise peer from my Patchwork peer
  • i no longer get any new content, the latest i have now is the latest i always have, my Sunrise peer doesn't seem to be successfully replicating

what steps could i take to better debug this situation?

would i be able to start over my log, while keeping the same Sunrise identity?

cheers ❤️

@pietgeursen
Copy link
Member

As a first pass, you probably need to kill the app properly to close the server and restart it.

One way to reliably do this: press the square in Android to see all your running apps and then swipe up on the sunrise social app.

Re open the app and try again.

Heads up, because stuff is sorted by timestamp, it is possible the app is is syncing lots of other threads that are older than your latest and it appears like nothing is being added. But it is. It takes my phone 40-50mins to replicate down the 300± people I follow.

To help understand if replication is actually happening, use adb logcat | grep GoLog to see the server logs.

@black-puppydog
Copy link

I think I'm seeing the same.
Trying to parse logcat atm, with this command:

adb logcat | grep -F "`adb shell ps | grep sunrise  | tr -s [:space:] ' ' | cut -d' ' -f2`"

There's quite a lot of logging and I'm not sure it would all be helpful, but here's something that stuck out to me:
I wrote a new post on patchwork, then clicked the "disconnect" button on the rusty client:

image

That is, the green button on the side. Sure enough, soon after, one of the two seemed to re-connect to the other. And at that moment, I got this message in the logcat:

12-20 11:06:57.739 26362 26439 E GoLog   : level=debug time=2019-12-20T10:06:57.739406523Z caller=wants.go:204 module=sbot module=WantManager gauge=blobs.proc add=1
12-20 11:06:59.916 26362 26439 E GoLog   : level=warn time=2019-12-20T10:06:59.915726314Z caller=fetch.go:100 module=sbot plugin=gossip event="skipped updating of stored feed" err="consistency error: wrong stored message sequence for feed @6ilZq3kN0F+dXFHAPjAwMm87JEb/VdB+LC9eIMW3sa0=.ed25519. stored:&{274921829856 1 275122948880 {0 0 0}} indexed:8569" fr=6ilZ
12-20 11:07:00.806 26362 26445 E GoLog   : level=debug time=2019-12-20T10:07:00.805874231Z caller=fetch.go:85 module=sbot plugin=gossip event="feed fetch workers filled" n=24
12-20 11:07:00.860 26362 26439 E GoLog   : level=debug time=2019-12-20T10:07:00.860389335Z caller=fetch.go:209 module=sbot plugin=gossip event=gossiprx fr=MRiJ latest=2023 received=1 took=1.905711978s
12-20 11:07:00.864 26362 26439 E GoLog   : level=debug time=2019-12-20T10:07:00.864241939Z caller=fetch.go:87 module=sbot plugin=gossip event="workers done" err=null
12-20 11:07:00.864 26362 26439 E GoLog   : level=debug time=2019-12-20T10:07:00.864411731Z caller=handler.go:95 module=sbot plugin=gossip remote=MRiJ event=gossiprx msg="hops fetch done" count=234 took=3.180292708s

Not sure why it only includes a snippet, but MRiJ is a prefix of my profile's pubkey.

@cryptix
Copy link

cryptix commented Dec 20, 2019

12-20 11:06:59.916 26362 26439 E GoLog   :
level=warn
time=2019-12-20T10:06:59.915726314Z
caller=fetch.go:100
module=sbot
plugin=gossip
event="skipped updating of stored feed"
err="consistency error: wrong stored message sequence for feed @6ilZq3kN0F+dXFHAPjAwMm87JEb/VdB+LC9eIMW3sa0=.ed25519. stored:&{274921829856 1 275122948880 {0 0 0}} indexed:8569" fr=6ilZ

This error doesn't stop ALL replication but it suggests a corruption of a single feed which can't be updated any longer. This can happen if the bot crashes during writing messages... If the bot start and stops correctly this should never happen. I will make a note to handle this better. There is code to wipe a single feed but it's not hooked into this automatically.

@pietgeursen there is a FSCK method on Sbot that you can call to check for this, too.

Long story short, the only class of error that should stop fetching feeds is the broken pipe/connection disconnected category.

event=gossiprx msg="hops fetch done" count=234 took=3.180292708s where count is the number of fetched feeds.

Not sure if this number is too low but if your main feed doesn't update correctly it could stop further processing of contact messages which would explain missing content.

Not sure why it only includes a snippet

I'm converting to only logging prefixes for privacy reasons if the code is just for debugging. People running this on a server would basically produce logs of when they received what from who. I'm also trying to obfuscate IP addresses in the logs for similar reasons. Standard GDPR practice is to null the last two bytes of an IP address for instance.

@pietgeursen is there logging or a status page for how many messages are in graphql? would be interesting to know if all the messages made it into sqlite or not to get a better understanding of what is (not) happening.

@ahdinosaur a similar logcat dump would be useful to get an idea of where it hangs.

@black-puppydog
Copy link

I'm converting to only logging prefixes for privacy reasons if the code is just for debugging. People running this on a server would basically produce logs of when they received what from who.

That's very good thinking! 👍

Okay so if the inconsistency in one feed isn't the reason, then I'll just leave you with a full log and hope you see something I don't. :P
sunrise_social_start.log
This is the same dance: launch sunrise social, wait for it to show up in patchwork, click "disconnect", wait for connection to be reestablished.

@cryptix
Copy link

cryptix commented Dec 20, 2019

Oh, I should have made this clearer: This consistency error is bad and you sadly probably need to wipe the data folder and start fresh otherwise it will not be able to establish all your (friends) follows. I also fear that this is currently not possible without loosing the secret... 😰

@black-puppydog
Copy link

ah, yeah, that is clearer. That's fine, I didn't expect to keep this identity for long anyhow, for pretty much this reason :D

@ahdinosaur
Copy link
Member Author

As a first pass, you probably need to kill the app properly to close the server and restart it... Heads up, because stuff is sorted by timestamp, it is possible the app is is syncing lots of other threads that are older than your latest and it appears like nothing is being added. But it is.

yes i installed the app when it was first released, been hoping it would eventually work one of these restarts, but i keep seeing the same message from 30 November.

@ahdinosaur a similar logcat dump would be useful to get an idea of where it hangs.

my logs: sunrise-social-2019-12-22.log

maybe:

12-22 13:49:25.567   404   691 E GoLog   : level=error time=2019-12-22T00:49:25.566907253Z caller=handler.go:113 module=sbot plugin=gossip/hist event=gossiptx err="bad request: ssb/message: not a feed ref: feedRef: couldn't parse \"@S9lG7keeOfIAepU1LRD9XUDtuesbejRC2+3/FRaGZ3s=.sha256\": ssb: Invalid Ref Algo"

because then immediately after is:

12-22 13:49:27.192   404   689 E GoLog   : level=debug time=2019-12-22T00:49:27.192326159Z caller=handler.go:95 module=sbot plugin=gossip remote=6ilZ event=gossiprx msg="hops fetch done" count=588 took=2.35403137s

and this happens consistently every time, even when doing a manual disconnect and reconnect.

@cryptix
Copy link

cryptix commented Dec 22, 2019

Oh! Thanks @ahdinosaur! That just lighted a spark and might possibly should be a new issue in itself...

The first error with the .sha256 „keypair“ is from an incoming call asking for such a feed. The go-ssb implementation answers with an error to that but maybe I should silently close the stream? I think the error might make the remote disconnect, also stopping replication in the other direction.

@pietgeursen
Copy link
Member

@ahdinosaur Let's try something. I want to see what happens when we delete just the patchql database and let it rebuild

Kill the app.

Run

adb shell
run-as social.sunrise.app
rm -rf databases

Restart the app

If you want to delete the go logs and try and re-sync:

in the app dir

cd app_scuttlebutt/go_log
rm -rf indexes log sublogs

@ahdinosaur
Copy link
Member Author

$ adb shell
Phone:/ $ run-as social.sunrise.app
run-as: package not debuggable: social.sunrise.app

so i assume i'll need to build from source?

@pietgeursen
Copy link
Member

@ahdinosaur oh noes, I didn't know that was a thing.

Maybe if I made debug builds we could use them.

For now, just wait for a few days. I want to do a new release with cryptix's latest changes which might fix this issue.

When you install the next version, uninstall this version first.

Sorry!

@justinabrahms
Copy link
Contributor

justinabrahms commented Feb 23, 2020

I've just installed the app a few days ago, but hit a similar issue as described here. I also didn't have a debug build. I attempted to force it to be debuggable using these instructions, but was unsuccessful in assembling the APK again.

I tried to build it from source, but had two errors:

During make, the :app:kaptGenerateStubsDebugKotlin step fails with this error:

Failed to open zip file.
Gradle's dependency cache may be corrupt (this sometimes occurs after a network connection timeout.)
Re-download dependencies and sync project (requires network)

I re-downloaded it to no avail. There are no logs that I see in android studio that would indicate there's more information to be had.

Second, I thought that maybe ssb-patchql-android should be built first. That java project fails with: failed to load source for a dependency on ssb-patchql-core . My guess is that this is because I've not build the rust code yet. There's a missing dependency for that (ssb-patchql, which needs to be in an odd location). Doing a cargo build once that's installed also fails.

cargo build                              
    Updating crates.io index
    Updating git repository `https://github.com/sunrise-choir/flumedb-rs`                                
    Updating git repository `https://github.com/sunrise-choir/ssb-causal-sort`                           
    Updating crates.io index                                                                             
    Updating git repository `https://github.com/sunrise-choir/ssb-multiformats`
error: failed to load source for a dependency on `ssb-multiformats`                                      

Caused by:
  Unable to update https://github.com/sunrise-choir/ssb-multiformats#6ddcf582

Caused by:
  revspec '6ddcf582bd41dcf7755419a6c48061643efe15b1' not found; class=Reference (4); code=NotFound (-3)

At this point, I'm out of things to try. Any chance that next version is coming? :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants