search all silo posts for links to users' sites and send mentions #456

Closed
snarfed opened this Issue Sep 1, 2015 · 28 comments

4 participants
Owner

snarfed commented Sep 1, 2015

spun out of #51. from #51 (comment):

an idea for expanding this: search silos for any posts, from anyone, that link to the user's domain(s), and send wms for them too. these are effectively mentions.

silo support for this is mixed:

@snarfed snarfed added now listen labels Sep 1, 2015

@snarfed snarfed changed the title from seach all silo posts for links to users' sites and send mentions to search all silo posts for links to users' sites and send mentions Sep 1, 2015

@snarfed snarfed self-assigned this Sep 12, 2015

Owner

snarfed commented Sep 13, 2015

cc @kylewm in case you're interested in adding flickr search support... (see above)

Owner

snarfed commented Sep 14, 2015

the remaining part here is to send mention posts themselves, not just their responses. this needs a new post response type connected to the post mf2 handler.

snarfed added a commit that referenced this issue Sep 19, 2015

snarfed added a commit that referenced this issue Sep 19, 2015

snarfed added a commit to snarfed/granary that referenced this issue Sep 20, 2015

snarfed added a commit that referenced this issue Sep 20, 2015

snarfed added a commit that referenced this issue Sep 20, 2015

snarfed added a commit that referenced this issue Sep 20, 2015

Owner

snarfed commented Sep 20, 2015

finally soft launched this, and it worked well, but evidently has a memory leak, so i had to roll it back.

Exceeded soft private memory limit of 256 MB with 328 MB after servicing 2 requests total.

ugh.

there's FUD here and there about the sockets API maybe causing memory leaks due to badly handled range requests, but i can't tell how real it is or if it could be causing this. i suspect i've just been wasteful with memory, e.g. lots of string concatenations and copy.deepcopys, and it's finally time to pay the piper. whee, can't wait to heap profile. 😭

silver lining: at least i know the window of commits where the leak was introduced!

Owner

snarfed commented Sep 20, 2015

the little orange bump of 500s here is our instances flapping (OOMing, restarting, and OOMing again):

[image: chart]

here's a snippet of individual requests at peak flap. the red !!! ones are OOMs. not pretty!

[screenshot: 2015-09-20, 12:24:48 pm]

Owner

snarfed commented Sep 20, 2015

silver lining: it's working ok, at least! e.g. the top response here: https://www.brid.gy/twitter/kylewmahan#responses is this tweet: https://twitter.com/anarcho/status/643921641664200704 which propagated as a mention to https://kylewm.com/2015/09/repost-of-glenn-greenwald-the-new-revolving-door

Collaborator

kylewm commented Sep 21, 2015

wow, that mention is hidden behind a redirect too, pretty cool!

Owner

snarfed commented Sep 24, 2015

this has noticeably increased our poll latency:

[screenshot: 2015-09-24, 10:43:13 am]

the poll task queue is now ~90m behind. not a big deal, but definitely not ideal. hrmph. time to profile i guess.

Owner

snarfed commented Sep 24, 2015

some of this might be just because our slow poll frequency is once a day, so we're still working through the first set of search results for many users. that should be done by around noon PST. i'll revisit if latency is still consistently bad after that.

Owner

snarfed commented Sep 24, 2015

scratch that, we'll be caught up by ~1:30pm PST today, since we're ~90m behind. math!

Owner

snarfed commented Sep 25, 2015

poll latency is looking better now. averaging 5-10s, higher than ~4s before, but still reasonable.

[screenshot: 2015-09-25, 11:42:22 am]

Owner

snarfed commented Sep 25, 2015

the poll queue is still behind by 45m :/, but i'm hoping some of that was due to #490. i pushed out a change there (1ebfe1c) a few hours ago that adds a bunch of shortlink generator domains to the blacklist and checks the blacklist before searching for a domain, so i'm hoping that will help some too.
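
A minimal sketch of the "check the blacklist before searching" step described above. This is not the code from 1ebfe1c; the `BLACKLIST` entries and the helper names `in_blacklist` and `searchable_domains` are assumptions made up for illustration.

```python
from urllib.parse import urlparse

# Illustrative entries only; the real blacklist is not shown in this thread.
BLACKLIST = {"bit.ly", "t.co", "goo.gl"}


def in_blacklist(domain, blacklist=BLACKLIST):
    """True if domain is a blacklisted domain or a subdomain of one."""
    domain = domain.lower()
    return any(domain == b or domain.endswith("." + b) for b in blacklist)


def searchable_domains(urls, blacklist=BLACKLIST):
    """Return the unique, non-blacklisted domains of urls, in order."""
    domains = []
    for url in urls:
        domain = urlparse(url).netloc.lower()
        if domain and not in_blacklist(domain, blacklist) and domain not in domains:
            domains.append(domain)
    return domains


result = searchable_domains(
    ["http://bit.ly/abc", "https://snarfed.org/", "http://www.t.co/x"]
)
```

Dropping shortlink domains before the search runs means no silo API call is spent on a query that would match a huge number of unrelated posts.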

Owner

snarfed commented Sep 26, 2015

tentatively closing. this has been running in prod and stable for a few days. I'm sure there are more bugs left to fix, but we can open new issues for them.

Contributor

singpolyma commented Oct 22, 2015

Does brid.gy also turn @-mentions of my twitter username into webmentions to my domain? That would be similar to this and very nice

Owner

snarfed commented Oct 22, 2015

@singpolyma not right now, but that's an interesting feature request. just to confirm, you're proposing they'd be sent to your front page, e.g. target=https://singpolyma.net/?

Contributor

singpolyma commented Oct 22, 2015

@snarfed yes. or whatever URL is on my twitter profile

Owner

snarfed commented Dec 6, 2015

i currently craft search queries by stripping the scheme (i.e. http://), putting quotes around the remaining domain and path, and ORing all of those together, e.g. "snarfed.org" OR "instagram.com/snarfed". sadly, this has been returning both false positives and false negatives in both G+ and Twitter. :/

i added the scheme back to G+ searches in 485af73, and it looks like that cut out the false positives but didn't add any false negatives.
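
The query construction described above can be sketched roughly like this. It is an illustration, not bridgy's actual code: the function name `build_search_query` and the `keep_scheme` flag (standing in for the G+ change) are assumptions, and it expects absolute URLs with a scheme.

```python
from urllib.parse import urlparse


def build_search_query(urls, keep_scheme=False):
    """Return a quoted, OR-joined search query for the given URLs.

    Hypothetical helper. Strips the scheme by default and quotes the
    remaining domain and path; keep_scheme=True keeps the scheme.
    """
    terms = []
    for url in urls:
        parsed = urlparse(url)
        # domain plus path, without a trailing slash
        target = (parsed.netloc + parsed.path).rstrip("/")
        if keep_scheme:
            target = f"{parsed.scheme}://{target}"
        term = f'"{target}"'
        if term not in terms:  # skip duplicates, keep order
            terms.append(term)
    return " OR ".join(terms)


query = build_search_query(["http://snarfed.org/", "https://instagram.com/snarfed"])
```

With the two URLs above, `query` is `"snarfed.org" OR "instagram.com/snarfed"`, matching the example in the comment.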

still working on Twitter. here's some research so far for the example domain hypothes.is, including links to searches:

hrmph.

Owner

snarfed commented Dec 6, 2015

i'm now thinking about still using the "hypothes.is" style search for twitter and filtering out the false positives manually.
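
One way the manual filtering could look, as a sketch under stated assumptions: each search result is a dict with a hypothetical `urls` key holding its expanded links, and a result only counts if one of those links has the user's domain (or a subdomain of it) as its hostname. The names `links_to_domain` and `filter_false_positives` are made up for this example.

```python
from urllib.parse import urlparse


def links_to_domain(url, domain):
    """True if url's hostname is domain or a subdomain of it."""
    host = (urlparse(url).hostname or "").lower()
    domain = domain.lower()
    return host == domain or host.endswith("." + domain)


def filter_false_positives(posts, domains):
    """Keep only posts with at least one link to one of domains.

    posts: list of dicts with a hypothetical 'urls' key (expanded links).
    """
    return [
        post for post in posts
        if any(links_to_domain(u, d) for u in post.get("urls", []) for d in domains)
    ]


posts = [
    {"id": 1, "urls": ["https://hypothes.is/blog"]},
    {"id": 2, "urls": ["https://example.com/hypothes.is"]},  # text match only
    {"id": 3, "urls": []},
]
kept = filter_false_positives(posts, ["hypothes.is"])
```

This only removes false positives after the fact; it does not reduce the number of results the silo returns, which is the cost concern raised later in the thread.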

Owner

snarfed commented Dec 6, 2015

Contributor

singpolyma commented Dec 6, 2015

Filtering false positives seems like an essential thing to do. Trying to get as much as possible is probably the best, then filter after

Owner

snarfed commented Dec 6, 2015

i wish! sadly many users' domains are common words, or have common words in them, so their false positive rate can be 1K:1 or even 1M:1 for domains with words like blog or web. :/ and bridgy is approaching 1k twitter users, so I'd like to try to cut down that workload (and cost) a bit.

Contributor

singpolyma commented Dec 6, 2015

filter out common words and only search for the unique part maybe?

Owner

snarfed commented Dec 6, 2015

oh boy, and now i'm in the business of maintaining a stop word list and search query rewriter. :P you're definitely right, it's doable, i'm just not sure i want to take that plunge...

Contributor

singpolyma commented Dec 6, 2015

Sorry. Was a thought

Owner

snarfed commented Dec 6, 2015

np! definitely appreciated. 👬
