IndexingScheduler: fetch and import in batches #24285
For reference, a typical
vmst.io contributed their numbers, which peak around 600 queued statuses.
```ruby
redis.pipelined do |pipeline|
  ids.each { |id| pipeline.srem("chewy:queue:#{type.name}", id) }
end
```
I wonder if it wouldn't be better to pass all `ids` at once.
I also wonder if it wouldn't be better to rewrite the whole thing to use `spop` instead. On the one hand, a reason not to use it would be error handling, but on the other hand, if errors pile up, it does not make sense to let the Redis set grow uncontrollably either.
I wondered why the original didn't use the batch form of `SREM`. Maybe there's no reason?
The `SPOP` approach could work, and would prevent the queue from growing without bound, especially in scenarios where an error is unrecoverable.
The one wrinkle there is that the existing code handles short transient ES outages well, and I wouldn't want to change that property casually if instances are accidentally relying on it.
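To make this error-handling trade-off concrete, here is a minimal pure-Ruby sketch. It uses an in-memory `Set` as a stand-in for the Redis set and never talks to Redis; `fake_spop`, `drain_with_spop`, `drain_then_remove`, and the `import` callable are hypothetical names, not Mastodon or redis-rb APIs. It shows that popping IDs before importing loses them when the import fails, while removing them only after a successful import keeps them queued through a transient outage.

```ruby
require 'set'

# Hypothetical stand-in for SPOP with a count: remove and return up to
# `count` members. Real SPOP picks members at random; this takes the
# first ones so the example is deterministic.
def fake_spop(set, count)
  popped = set.first(count)
  set.subtract(popped)
  popped
end

# SPOP-style drain: IDs leave the queue *before* the import runs, so an
# exception raised by `import` loses the batch that was just popped.
def drain_with_spop(queue, batch_size, import)
  import.call(fake_spop(queue, batch_size)) until queue.empty?
end

# Read-then-SREM drain: IDs are removed only *after* `import` returns, so
# an exception leaves the batch in the queue for a later run.
def drain_then_remove(queue, batch_size, import)
  queue.to_a.each_slice(batch_size) do |ids|
    import.call(ids)
    queue.subtract(ids)
  end
end

failing_import = ->(_ids) { raise 'Elasticsearch unavailable' }

popped_queue = Set.new(1..5)
begin
  drain_with_spop(popped_queue, 2, failing_import)
rescue RuntimeError
  # a later run only sees what is still in the queue
end
popped_queue.size # => 3 (IDs 1 and 2 were lost)

kept_queue = Set.new(1..5)
begin
  drain_then_remove(kept_queue, 2, failing_import)
rescue RuntimeError
end
kept_queue.size # => 5 (nothing removed)
```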
FWIW, Universeodon didn't suffer any obvious ill effects from having 2M status IDs queued for indexing, and once the bug was fixed and this patch was applied, it cleared the backlog successfully.
Just pushed a change to make the import batch size larger and use batched `SREM`, since neither would change the behavior on error.
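A minimal sketch of the batched shape this PR describes, again using an in-memory `Set` instead of Redis. `each_scan_batch` is a hypothetical stand-in for repeated `SSCAN` calls (it simply slices a snapshot, whereas real `SSCAN` is cursor-based and treats `COUNT` as a hint); each fetched chunk is imported in slices of at most `IMPORT_BATCH_SIZE`, and each slice's IDs are removed in one call, analogous to a single multi-member `SREM`. The two sizes come from the PR; `process_queue` and the `import` callable are illustrative names.

```ruby
require 'set'

SCAN_BATCH_SIZE = 10_000 # IDs fetched per (simulated) SSCAN call
IMPORT_BATCH_SIZE = 1000 # IDs handed to a single bulk import

# Hypothetical stand-in for iterating a Redis set with SSCAN + COUNT.
# Real SSCAN is cursor-based and COUNT is only a hint; this slices a snapshot.
def each_scan_batch(queue, count, &block)
  queue.to_a.each_slice(count, &block)
end

# Fetch in bounded chunks, import in bounded slices, and remove each slice
# only after its import succeeds (one batched removal per slice).
def process_queue(queue, import)
  each_scan_batch(queue, SCAN_BATCH_SIZE) do |scanned|
    scanned.each_slice(IMPORT_BATCH_SIZE) do |ids|
      import.call(ids)
      queue.subtract(ids) # analogous to one SREM with many members
    end
  end
end

queue = Set.new(1..25_000)
batch_sizes = []
process_queue(queue, ->(ids) { batch_sizes << ids.size })
queue.empty?     # => true
batch_sizes.size # => 25
batch_sizes.max  # => 1000
```

With the roughly 2M IDs Universeodon had queued, these sizes work out to about 200 scan chunks and 2,000 bulk imports, each bounded in size, instead of one unbounded import; against real Redis the chunk count varies because `COUNT` is only a hint.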
`@@ -6,17 +6,17 @@ class Scheduler::IndexingScheduler`

```ruby
sidekiq_options retry: 0

IMPORT_BATCH_SIZE = 1000
```
Wonder if this would make sense as a namespaced environment variable
Maybe? Mastodon doesn't break out the majority of its limits and tuning constants, but I don't know if that's house style or what.
I think this kind of tuning is marginal enough to not warrant adding an environment variable. That can be revisited in the future if the need arises.
Co-authored-by: Claire <claire.github-309c@sitedethib.com>
@ClearlyClaire I noticed an error while running the version that's on
Co-authored-by: Claire <claire.github-309c@sitedethib.com> (cherry picked from commit 6f484fb)
This fixes an issue that @supernovae and I encountered on Universeodon: when a Redis set for Mastodon's Chewy import strategy (like `chewy:queue:StatusesIndex`) becomes extremely large (over 2M IDs in our case¹), trying to bulk-import so many objects at once causes the Sidekiq job to run out of memory and die after importing few to no objects and removing none of their IDs from the set. The job therefore makes no progress, and the set never shrinks.

This patch does two things to make `IndexingScheduler` more robust:

1. It replaces the single `SMEMBERS` command with several `SSCAN` commands (see `SCAN` for more detail), limiting the number of IDs returned from Redis at once to 10,000. Retrieving the whole set takes more Redis calls, but each response is smaller. The `COUNT` argument is treated as a hint, but the number of elements returned per call should be of the same order of magnitude for large collections. Deleting already-seen elements from a set during a scan should not cause issues with the scan; see the "Scan guarantees" section of the `SCAN` docs.
2. It imports the fetched IDs in batches of at most `IMPORT_BATCH_SIZE`. An `IMPORT_BATCH_SIZE` of 100 worked for us, but it is conservative, and increasing it up to `SCAN_BATCH_SIZE`, or increasing both, might result in higher throughput; it would be useful to know the size of `chewy:queue:StatusesIndex` on other large production instances to tune it. The important thing is that the import batch size is capped instead of being potentially unbounded.

TODOs to exit draft:
Footnotes
1. This was mostly due to a bug in Universeodon's extended search functionality. That bug has already been fixed in "Extended status and account search" (#24055), but the fix was inadvertently removed from Universeodon during a merge, causing status indexing to fail for some statuses. That said, sufficiently active instances with future extended search functionality may well end up with large `chewy:queue:StatusesIndex` sets through normal use, or if the Sidekiq process running `IndexingScheduler` is down for an extended period of time.