Full-text search does not appear to update, ever #410

Closed
Preskton opened this issue May 1, 2023 · 27 comments
Assignees
Labels
Infra Team (Requires review / feedback / etc. with infra team), Suggestions (New feature, service integration, or any other improvements)

Comments

@Preskton
Sponsor Contributor

Preskton commented May 1, 2023

Background

From the mastodon website:

Mastodon supports full-text search when Elasticsearch is available. Mastodon’s full-text search allows logged in users to find results from their own statuses, their mentions, their favourites, and their bookmarks. It deliberately does not allow searching for arbitrary strings in the entire database.

Previously, we offered full-text search against Hachyderm. Hachydermians have asked if we will reintroduce it. To do so, the infrastructure team will:

  • identify an infrastructure implementation
  • perform a cost analysis
  • make a final decision as to whether or not to re-launch this feature

Should we choose to reintroduce search, Hachydermians should be able to search for any keyword in toots per the Mastodon manual.

Related Issues

#385
#387
#300
#386

@Preskton Preskton added the "Suggestions" (New feature, service integration, or any other improvements) and "Infra Team" (Requires review / feedback / etc. with infra team) labels May 1, 2023
@Preskton
Sponsor Contributor Author

Mini-update here - we are working on establishing some base OS images & pipeline that make it easier for us to spin up new services (like ElasticSearch) -- but more importantly -- maintain and keep them secure in a sustainable way. Thanks for your patience, all, as we keep things rolling.

@prohr

prohr commented Jul 2, 2023

It's been a few months. Any updates?

@Preskton
Sponsor Contributor Author

Preskton commented Jul 2, 2023

server build is in progress. if things go well, hope to dark launch it tomorrow, wait a bit, then formally announce it.

@prohr

prohr commented Jul 2, 2023 via email

@Preskton
Sponsor Contributor Author

Preskton commented Jul 2, 2023

fyi the full text indices are building now. looks like it will take somewhere between 12-16 hours (best guess) for them to process through the backlog.

@Preskton
Sponsor Contributor Author

Preskton commented Jul 2, 2023

Looks like the index built way faster than expected. Full text search for historical and new toots should be good to go. We'll watch for a few days and then announce if everything continues to look good.

@prohr

prohr commented Jul 7, 2023

I can confirm that searching for bulk-loaded content (> 5 days old) works great, but am having trouble searching for more recent posts.

How long a built-in delay should we expect between when posts are created vs. when they become searchable?

The fact that I can't search for content as much as 3 days old suggests that something may be stalled along the content ingestion pipeline.

@Preskton
Sponsor Contributor Author

Preskton commented Jul 9, 2023

Hrm, I would expect "somewhat immediate", e.g. like maybe 10-15 minutes max. I'll poke and see if we can see what's up.

@prohr

prohr commented Jul 12, 2023

To help with your debugging, here's a reproducible test case. ~40 minutes ago, I posted the following:

https://hachyderm.io/@pevohr/110702125969608382

Yet I still get only 2 results when searching for the word "teeny":

  • a December post by me (w/teeny in the ALT text)
  • a May boost by me of a post from another instance

Hence my hunch that something's going wrong when ingesting new posts to be indexed.
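
For anyone who wants to script the test case above rather than repeat it by hand, here is a minimal sketch in Python. It assumes the standard Mastodon `GET /api/v2/search` endpoint and an access token with the `read:search` scope; the function names and the token are hypothetical, and this is not something the Hachyderm team has said they use.

```python
import json
import urllib.parse
import urllib.request


def build_search_url(instance: str, query: str) -> str:
    """Build a Mastodon v2 search URL restricted to statuses."""
    params = urllib.parse.urlencode({"q": query, "type": "statuses"})
    return f"https://{instance}/api/v2/search?{params}"


def status_in_results(response: dict, status_id: str) -> bool:
    """Return True if the given status ID appears among the returned statuses."""
    return any(s.get("id") == status_id for s in response.get("statuses", []))


def search_contains(instance: str, token: str, query: str, status_id: str) -> bool:
    """Run the search as the token's user and look for the status (needs network)."""
    request = urllib.request.Request(
        build_search_url(instance, query),
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request, timeout=30) as resp:
        return status_in_results(json.load(resp), status_id)
```

Calling `search_contains("hachyderm.io", token, "teeny", "110702125969608382")` some time after posting would return `False` for as long as the new post has not been indexed.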

@joelanman

joelanman commented Aug 9, 2023

I'm a bit confused about how text search is currently supposed to work, I'm currently only finding my starred posts, nothing else

UPDATE

I can search some of my posts but not others, not sure what the difference is

@prohr

prohr commented Aug 9, 2023

I'm a bit confused about how text search is currently supposed to work, I'm currently only finding my starred posts, nothing else

How it's supposed to work is described accurately in the OP.

The bug I've been asking @Preskton to chase is that it currently only seems to work for posts before they backfilled the index in early July. AFAICT, new activity since then isn't getting indexed at all.

@joelanman

ah nothing since July was the part I didn't know

@prohr

prohr commented Aug 9, 2023

Does that fit what you're seeing too? Then it's not just me. 😉

@joelanman

After a bit of debugging, that's right, nothing after July 2

@Preskton
Sponsor Contributor Author

Preskton commented Sep 1, 2023

Full-text search continues to be subject to some gremlin in our system that prevents updates. I'll be looking at this today to double-check configs & firewalls.
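
One way a config and firewall check like this could start is by confirming that the Mastodon hosts can reach Elasticsearch at all. The sketch below is only an illustration: it assumes the `ES_HOST` and `ES_PORT` settings from Mastodon's environment configuration, plain HTTP without authentication, and Elasticsearch's `_cluster/health` endpoint. The function names are made up, and the actual Hachyderm setup may differ.

```python
import json
import os
import urllib.request


def es_base_url(env=os.environ) -> str:
    """Build the Elasticsearch base URL from Mastodon-style settings.

    The localhost:9200 fallback is an assumption for illustration.
    """
    host = env.get("ES_HOST", "localhost")
    port = env.get("ES_PORT", "9200")
    return f"http://{host}:{port}"


def health_is_usable(health: dict) -> bool:
    """Green and yellow clusters have all primary shards assigned; red does not."""
    return health.get("status") in ("green", "yellow")


def fetch_cluster_health(base_url: str) -> dict:
    """Query _cluster/health (needs network access to the cluster)."""
    with urllib.request.urlopen(f"{base_url}/_cluster/health", timeout=10) as resp:
        return json.load(resp)
```

A timeout or refused connection from `fetch_cluster_health(es_base_url())` on a Sidekiq host would point at networking or firewall rules, while a `red` status would point at the cluster itself.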

@Preskton
Sponsor Contributor Author

Preskton commented Sep 1, 2023

Cross-linking this to mastodon/mastodon#20230, as this seems to be hitting us as well. Although it seems to be all indexing activity.

@Preskton Preskton changed the title from "Feature Request: Full-text search" to "Full-text search does not appear to update, ever" Sep 1, 2023
@Preskton Preskton self-assigned this Sep 1, 2023
@Preskton
Sponsor Contributor Author

Preskton commented Sep 2, 2023

Re-running tootctl search deploy - will take ~17 hours from current estimate.

@Preskton
Sponsor Contributor Author

Preskton commented Sep 2, 2023

This seems to have worked and refreshed the indices - but I'm not sure if real-time updates are happening.

image
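
A rough way to tell whether real-time updates are landing is to compare the document count of the statuses index at two points in time. The following is a sketch under assumptions: it uses Elasticsearch's `_count` API, assumes the index is named `statuses` (it may carry a prefix if `ES_PREFIX` is set), and the helper names are hypothetical.

```python
import json
import time
import urllib.request


def count_url(base_url: str, index: str) -> str:
    """URL of the Elasticsearch _count endpoint for one index."""
    return f"{base_url}/{index}/_count"


def index_is_growing(before: int, after: int) -> bool:
    """True if the index gained documents between the two samples."""
    return after > before


def fetch_count(base_url: str, index: str) -> int:
    """Read the current document count (needs network access to the cluster)."""
    with urllib.request.urlopen(count_url(base_url, index), timeout=10) as resp:
        return int(json.load(resp)["count"])


def watch_index(base_url: str, index: str = "statuses", wait_seconds: int = 600) -> bool:
    """Sample the count twice, wait_seconds apart, and report whether it grew."""
    before = fetch_count(base_url, index)
    time.sleep(wait_seconds)
    after = fetch_count(base_url, index)
    return index_is_growing(before, after)
```

Deletions also lower the count, so a flat or falling number is a hint rather than proof; on a busy instance, though, a count that never moves after a bulk load would be consistent with new posts not being indexed.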

@prohr

prohr commented Sep 2, 2023

I can confirm that content since the last bulk load in early July (including the linked test post above) up through yesterday is all available:

https://hachyderm.io/@pevohr/110985396128335885

However, real-time updates are still broken:

https://hachyderm.io/@pevohr/110996075411410901

Replaying my "teeny" test search now gets me three posts instead of two, but today's fourth post doesn't appear.

@prohr

prohr commented Sep 16, 2023

Sigh. Sometime recently all the bulk-loaded posts stopped being searchable too.

Possibly an unintended side effect of the upgrade from 4.1.4 to 4.1.7?

@Preskton
Sponsor Contributor Author

Hiya, @prohr - we had to disable FT search the other day as it was affecting sidekiq performance pretty badly. We are going to revisit after 4.2.0 which drops next week, as there are some significant changes there.

@prohr

prohr commented Sep 16, 2023

Gotcha. Figured it was something like that. Have the core devs given you any guidance on how to tune this feature so it actually has a chance of ... working?

@mariyadelano
Sponsor

Hi, wondering if there are any updates to this given that Mastodon 4.2.0 has been implemented now?

@Preskton
Sponsor Contributor Author

Preskton commented Oct 3, 2023

The saga continues - we are attempting to re-enable it today. :)

/cc https://github.com/hachyderm/infrastructure/issues/518

@Preskton
Sponsor Contributor Author

Preskton commented Oct 3, 2023

We've finished re-indexing -- took about 8 hours. I'm cautiously saying that we've re-enabled full-text search, but we'll watch it over the next week or so to judge impact on sidekiq queues.

@Preskton Preskton closed this as completed Oct 3, 2023
@prohr

prohr commented Oct 3, 2023

Confirmed that the old behavior seems to be working well (including dynamic updates) if you add the in:library qualifier.

Thanks!
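
For scripted searches, the `in:library` qualifier mentioned above can be appended to the query text before it is sent. This is a small sketch with a hypothetical helper name; it only manipulates the query string and assumes nothing about the search backend beyond the qualifier described in the comment.

```python
def with_library_scope(query: str) -> str:
    """Append in:library unless the query already carries an in: qualifier."""
    terms = query.split()
    if any(term.startswith("in:") for term in terms):
        return " ".join(terms)
    return " ".join(terms + ["in:library"])
```

For example, `with_library_scope("teeny")` produces `teeny in:library`, while a query that already contains an `in:` qualifier is returned with only its whitespace normalised.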

@joelanman

@Preskton thanks to everyone who worked on it, I really find search very useful!


4 participants