
LTL is not displayed after applying #11623 on some instances #11643

Closed
mayaeh opened this issue Aug 23, 2019 · 3 comments · Fixed by #11648

Comments

@mayaeh
Contributor

mayaeh commented Aug 23, 2019

Actual behaviour

After applying #11623, the LTL of my instance (taruntarun.net) could no longer be displayed and returned a “504 gateway timeout”.
It is currently operating normally because I temporarily reverted to the state before #11623.

However, my other two instances have no problem.

Perhaps some PostgreSQL task was running at the time.

These are my DB server load graphs.

[Screenshot 2019-08-23 21:48:56: DB server load graph]

[Screenshot 2019-08-23 21:49:09: DB server load graph]

I ran assets:precompile and db:migrate at 15:05 - 15:10 JST.
(At this point, I had not yet run tootctl cache clear.)

Then, at 15:10 - 15:12, I restarted Mastodon on the other VPS.
(My instance taruntarun.net runs mastodon-*.service and the DB (PostgreSQL and Redis) on separate VPSes.)

Then, when I reloaded the web browser, the LTL was not displayed.
After that, a “504 gateway timeout” was displayed.

At that point, the CPU usage and disk access of the DB server increased rapidly.

I then stopped mastodon-*.service again and ran tootctl cache clear.
After restarting mastodon-*.service, the situation did not improve.

Then, I ran rake db:rollback.

About two hours later (not shown on the graph), I stopped mastodon-*.service and PostgreSQL.

After that, I reverted the code to the state before #11623. Now it works fine.

Specifications

Equivalent to master branch 4ef8d8b ( 6aefe60 )

@Gargron
Member

Gargron commented Aug 23, 2019

Did you by any chance try to EXPLAIN the LTL SQL query? Maybe CREATE INDEX CONCURRENTLY tmp_statuses_kept ON statuses (deleted_at) would help...
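
For reference, a minimal sketch of trying that suggestion from psql; the index name and column come from the comment above, and the DROP statement is only my own cleanup step:

-- CREATE INDEX CONCURRENTLY cannot run inside a transaction block,
-- so run it directly from psql rather than inside a Rails migration.
CREATE INDEX CONCURRENTLY tmp_statuses_kept ON statuses (deleted_at);

-- Re-run EXPLAIN ANALYZE on the local timeline query afterwards
-- to see whether the planner actually uses the new index.

-- The throwaway index can later be dropped without locking writes:
DROP INDEX CONCURRENTLY IF EXISTS tmp_statuses_kept;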

@Gargron
Member

Gargron commented Aug 23, 2019

It seems like the federated timeline works as fast as before... It is specifically the query for the local timeline that is much slower...

Local timeline query:

Limit  (cost=39150.63..39150.68 rows=20 width=396) (actual time=6520.485..6520.489 rows=20 loops=1)
  ->  Sort  (cost=39150.63..39150.84 rows=84 width=396) (actual time=6520.484..6520.486 rows=20 loops=1)
        Sort Key: statuses.id DESC
        Sort Method: top-N heapsort  Memory: 33kB
        ->  Nested Loop Left Join  (cost=0.98..39148.39 rows=84 width=396) (actual time=0.332..6515.107 rows=16240 loops=1)
              Filter: (accounts.silenced_at IS NULL)
              Rows Removed by Filter: 544
              ->  Index Scan using index_statuses_20190820 on statuses  (cost=0.56..38891.15 rows=101 width=396) (actual time=0.326..6476.242 rows=16784 loops=1)
                    Index Cond: (visibility = 0)
                    Filter: ((local OR (uri IS NULL)) AND (reblog_of_id IS NULL) AND ((NOT reply) OR (in_reply_to_account_id = account_id)) AND (account_id <> ALL ('{61919,125171,111310,77076,100948,103833,1633,1181,505,19662,7254,12537,12749,884,3033,645}'::bigint[])))
                    Rows Removed by Filter: 7290950
              ->  Index Scan using index_accounts_on_id on accounts  (cost=0.42..2.54 rows=1 width=16) (actual time=0.002..0.002 rows=1 loops=16784)
                    Index Cond: (id = statuses.account_id)
Planning time: 0.232 ms
Execution time: 6520.514 ms

Compared to federated timeline query:

Limit  (cost=0.85..1214.70 rows=20 width=396) (actual time=0.068..0.854 rows=20 loops=1)
  ->  Nested Loop Left Join  (cost=0.85..891023.42 rows=14681 width=396) (actual time=0.067..0.851 rows=20 loops=1)
        Filter: (accounts.silenced_at IS NULL)
        Rows Removed by Filter: 29
        ->  Index Scan Backward using statuses_pkey on statuses  (cost=0.43..868835.08 rows=17692 width=396) (actual time=0.052..0.395 rows=49 loops=1)
              Filter: ((reblog_of_id IS NULL) AND (deleted_at IS NULL) AND ((NOT reply) OR (in_reply_to_account_id = account_id)) AND (visibility = 0) AND (account_id <> ALL ('{61919,125171,111310,77076,100948,103833,1633,1181,505,19662,7254,12537,12749,884,3033,645}'::bigint[])))
              Rows Removed by Filter: 84
        ->  Index Scan using index_accounts_on_id on accounts  (cost=0.42..1.24 rows=1 width=16) (actual time=0.008..0.009 rows=1 loops=49)
              Index Cond: (id = statuses.account_id)
Planning time: 0.784 ms
Execution time: 0.882 ms
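
For anyone who wants to reproduce this, the local timeline query can be approximated from the filter conditions shown in the plan above. This is a hand-written reconstruction, not the exact SQL Rails generates; the excluded account_id list is a placeholder for the viewer's blocks/mutes, and deleted_at IS NULL is not listed in the local plan's filter, presumably because the partial index already covers it:

EXPLAIN ANALYZE
SELECT statuses.*
FROM statuses
LEFT JOIN accounts ON accounts.id = statuses.account_id
WHERE (statuses.local OR statuses.uri IS NULL)               -- local statuses only
  AND statuses.visibility = 0                                -- public visibility
  AND statuses.reblog_of_id IS NULL                          -- no reblogs
  AND (NOT statuses.reply OR statuses.in_reply_to_account_id = statuses.account_id)
  AND statuses.deleted_at IS NULL
  AND statuses.account_id NOT IN (61919, 125171)             -- placeholder for excluded accounts
  AND accounts.silenced_at IS NULL
ORDER BY statuses.id DESC
LIMIT 20;

The key difference is visible in the plans: the local query walks index_statuses_20190820 and discards roughly 7.3 million rows ("Rows Removed by Filter: 7290950") before the top-20 sort, while the federated query scans statuses_pkey backwards and can stop after a few dozen rows.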

@Gargron
Member

Gargron commented Aug 23, 2019

The following index reduces the query time back to 0.133 ms:

add_index :statuses, [:id, :account_id], name: :index_statuses_on_local, where: '(local OR (uri IS NULL)) AND deleted_at IS NULL AND visibility = 0 AND reblog_of_id IS NULL AND ((NOT reply) OR (in_reply_to_account_id = account_id))', algorithm: :concurrently, order: { id: :desc }
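
For reference, that add_index call corresponds roughly to the following DDL (my own translation; the exact statement Rails emits may differ in quoting):

CREATE INDEX CONCURRENTLY index_statuses_on_local
  ON statuses (id DESC, account_id)
  WHERE (local OR uri IS NULL)
    AND deleted_at IS NULL
    AND visibility = 0
    AND reblog_of_id IS NULL
    AND ((NOT reply) OR (in_reply_to_account_id = account_id));

Because this partial index contains only rows the local timeline can return and is already ordered by id DESC, the planner can read the newest 20 entries straight off the index instead of filtering millions of non-matching statuses. Note that in a Rails migration, algorithm: :concurrently also requires disable_ddl_transaction! on the migration class.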
