Use associations to define rubygem reverse dependencies #4512

@segiddins segiddins commented Mar 6, 2024

This ends up speeding up the endpoint significantly, due to small differences in the generated query.

Before:

rubygems_development=# EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS)
SELECT "rubygems".*
FROM "rubygems"
      inner join versions as v on v.rubygem_id = rubygems.id
      inner join dependencies as d on d.version_id = v.id
      INNER JOIN "gem_downloads" ON "gem_downloads"."version_id" = 0
      AND "gem_downloads"."rubygem_id" = "rubygems"."id"
WHERE (
            v.indexed = 't'
            and v.position = 0
            and d.rubygem_id = 19983
      )
ORDER BY gem_downloads.count DESC
LIMIT 30 OFFSET 0;


 Limit  (cost=1.71..6999.10 rows=30 width=43) (actual time=103.210..676.526 rows=15 loops=1)
   Output: rubygems.id, rubygems.name, rubygems.created_at, rubygems.updated_at, rubygems.indexed, gem_downloads.count
   Buffers: shared hit=1756997
   ->  Nested Loop  (cost=1.71..335410.16 rows=1438 width=43) (actual time=103.199..676.513 rows=15 loops=1)
         Output: rubygems.id, rubygems.name, rubygems.created_at, rubygems.updated_at, rubygems.indexed, gem_downloads.count
         Join Filter: (v.rubygem_id = gem_downloads.rubygem_id)
         Rows Removed by Join Filter: 2919150
         Buffers: shared hit=1756997
         ->  Index Scan using index_gem_downloads_on_count on public.gem_downloads  (cost=0.43..82344.40 rows=193948 width=12) (actual time=0.045..480.088 rows=194611 loops=1)
               Output: gem_downloads.id, gem_downloads.rubygem_id, gem_downloads.version_id, gem_downloads.count
               Filter: (gem_downloads.version_id = 0)
               Rows Removed by Filter: 1605653
               Buffers: shared hit=1756288
         ->  Materialize  (cost=1.28..8691.49 rows=84 width=39) (actual time=0.000..0.000 rows=15 loops=194611)
               Output: rubygems.id, rubygems.name, rubygems.created_at, rubygems.updated_at, rubygems.indexed, v.rubygem_id
               Buffers: shared hit=709
               ->  Nested Loop  (cost=1.28..8691.07 rows=84 width=39) (actual time=1.087..2.204 rows=15 loops=1)
                     Output: rubygems.id, rubygems.name, rubygems.created_at, rubygems.updated_at, rubygems.indexed, v.rubygem_id
                     Inner Unique: true
                     Buffers: shared hit=709
                     ->  Nested Loop  (cost=0.86..8650.10 rows=84 width=4) (actual time=1.039..1.936 rows=15 loops=1)
                           Output: v.rubygem_id
                           Inner Unique: true
                           Buffers: shared hit=649
                           ->  Index Scan using index_dependencies_on_rubygem_id on public.dependencies d  (cost=0.43..2651.85 rows=738 width=4) (actual time=0.064..0.376 rows=171 loops=1)
                                 Output: d.id, d.requirements, d.created_at, d.updated_at, d.rubygem_id, d.version_id, d.scope, d.unresolved_name
                                 Index Cond: (d.rubygem_id = 19983)
                                 Buffers: shared hit=41
                           ->  Index Scan using versions_pkey on public.versions v  (cost=0.43..8.13 rows=1 width=8) (actual time=0.009..0.009 rows=0 loops=171)
                                 Output: v.id, v.authors, v.description, v.number, v.rubygem_id, v.built_at, v.updated_at, v.summary, v.platform, v.created_at, v.indexed, v.prerelease, v."position", v.latest, v.full_name, v.licenses, v.size, v.requirements, v.required_ruby_version, v.sha256, v.metadata, v.required_rubygems_version, v.yanked_at, v.info_checksum, v.yanked_info_checksum, v.pusher_id, v.canonical_number, v.cert_chain, v.pusher_api_key_id, v.gem_platform, v.gem_full_name, v.spec_sha256
                                 Index Cond: (v.id = d.version_id)
                                 Filter: (v.indexed AND (v."position" = 0))
                                 Rows Removed by Filter: 1
                                 Buffers: shared hit=608
                     ->  Index Scan using rubygems_pkey on public.rubygems  (cost=0.42..0.49 rows=1 width=35) (actual time=0.017..0.017 rows=1 loops=15)
                           Output: rubygems.id, rubygems.name, rubygems.created_at, rubygems.updated_at, rubygems.indexed
                           Index Cond: (rubygems.id = v.rubygem_id)
                           Buffers: shared hit=60
 Planning:
   Buffers: shared hit=765
 Planning Time: 9.089 ms
 Execution Time: 676.730 ms
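
The `Rows Removed by Join Filter` line in the plan above is where the time goes: the outer index scan on `gem_downloads` produces 194611 rows, and each of them is compared against the 15 materialized rows. A quick sanity check of that arithmetic, using only figures copied from the plan (the variable names are mine, not anything from the codebase):

```ruby
# Figures copied from the "Before" plan above.
outer_rows        = 194_611 # gem_downloads rows with version_id = 0 (outer index scan)
materialized_rows = 15      # rows in the Materialize node, rescanned once per outer row
matched_rows      = 15      # rows the nested loop finally returned

# Every outer row is checked against every materialized row.
comparisons = outer_rows * materialized_rows
rejected    = comparisons - matched_rows

puts "join filter comparisons:     #{comparisons}"
puts "rows removed by join filter: #{rejected}"
```

The `rejected` value matches the 2919150 that Postgres reports, so nearly all of the work in this plan is comparing pairs of rows that never join.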

After:

irb(main):002> puts Rubygem.find_by!(name: "chriseppstein-compass").reverse_dependencies.by_downloads.page(0).without_count.explain("analyze", "costs", "verbose", "buffers")
2024-03-05 18:23:23.332955 D [42334:8940 log_subscriber.rb:167] ActiveRecord::Base --   Rubygem Load (0.9ms)  SELECT "rubygems".* FROM "rubygems" WHERE "rubygems"."name" = $1 LIMIT $2  [["name", "chriseppstein-compass"], ["LIMIT", 1]]
2024-03-05 18:23:23.337780 D [42334:8940 log_subscriber.rb:167] ActiveRecord::Base --   Rubygem Load (2.3ms)  SELECT "rubygems".* FROM "rubygems" INNER JOIN "versions" ON "rubygems"."id" = "versions"."rubygem_id" INNER JOIN "dependencies" ON "versions"."id" = "dependencies"."version_id" INNER JOIN "gem_downloads" ON "gem_downloads"."version_id" = $1 AND "gem_downloads"."rubygem_id" = "rubygems"."id" WHERE "dependencies"."rubygem_id" = $2 AND "versions"."indexed" = $3 AND "versions"."position" = $4 ORDER BY gem_downloads.count DESC LIMIT $5 OFFSET $6  [["version_id", 0], ["rubygem_id", 19983], ["indexed", true], ["position", 0], ["LIMIT", 30], ["OFFSET", 0]]
EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS) SELECT "rubygems".* FROM "rubygems" INNER JOIN "versions" ON "rubygems"."id" = "versions"."rubygem_id" INNER JOIN "dependencies" ON "versions"."id" = "dependencies"."version_id" INNER JOIN "gem_downloads" ON "gem_downloads"."version_id" = $1 AND "gem_downloads"."rubygem_id" = "rubygems"."id" WHERE "dependencies"."rubygem_id" = $2 AND "versions"."indexed" = $3 AND "versions"."position" = $4 ORDER BY gem_downloads.count DESC LIMIT $5 OFFSET $6 [["version_id", 0], ["rubygem_id", 19983], ["indexed", true], ["position", 0], ["LIMIT", 30], ["OFFSET", 0]]
 QUERY PLAN

 Limit  (cost=8734.14..8734.21 rows=30 width=43) (actual time=0.439..0.443 rows=15 loops=1)
   Output: rubygems.id, rubygems.name, rubygems.created_at, rubygems.updated_at, rubygems.indexed, gem_downloads.count
   Buffers: shared hit=755
   ->  Sort  (cost=8734.14..8734.35 rows=84 width=43) (actual time=0.439..0.441 rows=15 loops=1)
         Output: rubygems.id, rubygems.name, rubygems.created_at, rubygems.updated_at, rubygems.indexed, gem_downloads.count
         Sort Key: gem_downloads.count DESC
         Sort Method: quicksort  Memory: 27kB
         Buffers: shared hit=755
         ->  Nested Loop  (cost=1.71..8731.65 rows=84 width=43) (actual time=0.155..0.431 rows=15 loops=1)
               Output: rubygems.id, rubygems.name, rubygems.created_at, rubygems.updated_at, rubygems.indexed, gem_downloads.count
               Inner Unique: true
               Buffers: shared hit=755
               ->  Nested Loop  (cost=1.28..8691.07 rows=84 width=39) (actual time=0.150..0.391 rows=15 loops=1)
                     Output: rubygems.id, rubygems.name, rubygems.created_at, rubygems.updated_at, rubygems.indexed, versions.rubygem_id
                     Inner Unique: true
                     Buffers: shared hit=709
                     ->  Nested Loop  (cost=0.86..8650.10 rows=84 width=4) (actual time=0.147..0.349 rows=15 loops=1)
                           Output: versions.rubygem_id
                           Inner Unique: true
                           Buffers: shared hit=649
                           ->  Index Scan using index_dependencies_on_rubygem_id on public.dependencies  (cost=0.43..2651.85 rows=738 width=4) (actual time=0.004..0.045 rows=171 loops=1)
                                 Output: dependencies.id, dependencies.requirements, dependencies.created_at, dependencies.updated_at, dependencies.rubygem_id, dependencies.version_id, dependencies.scope, dependencies.unresolved_name
                                 Index Cond: (dependencies.rubygem_id = 19983)
                                 Buffers: shared hit=41
                           ->  Index Scan using versions_pkey on public.versions  (cost=0.43..8.13 rows=1 width=8) (actual time=0.002..0.002 rows=0 loops=171)
                                 Output: versions.id, versions.authors, versions.description, versions.number, versions.rubygem_id, versions.built_at, versions.updated_at, versions.summary, versions.platform, versions.created_at, versions.indexed, versions.prerelease, versions."position", versions.latest, versions.full_name, versions.licenses, versions.size, versions.requirements, versions.required_ruby_version, versions.sha256, versions.metadata, versions.required_rubygems_version, versions.yanked_at, versions.info_checksum, versions.yanked_info_checksum, versions.pusher_id, versions.canonical_number, versions.cert_chain, versions.pusher_api_key_id, versions.gem_platform, versions.gem_full_name, versions.spec_sha256
                                 Index Cond: (versions.id = dependencies.version_id)
                                 Filter: (versions.indexed AND (versions."position" = 0))
                                 Rows Removed by Filter: 1
                                 Buffers: shared hit=608
                     ->  Index Scan using rubygems_pkey on public.rubygems  (cost=0.42..0.49 rows=1 width=35) (actual time=0.002..0.002 rows=1 loops=15)
                           Output: rubygems.id, rubygems.name, rubygems.created_at, rubygems.updated_at, rubygems.indexed
                           Index Cond: (rubygems.id = versions.rubygem_id)
                           Buffers: shared hit=60
               ->  Index Only Scan using index_gem_downloads_on_version_id_and_rubygem_id_and_count on public.gem_downloads  (cost=0.43..0.65 rows=4 width=12) (actual time=0.002..0.002 rows=1 loops=15)
                     Output: gem_downloads.version_id, gem_downloads.rubygem_id, gem_downloads.count
                     Index Cond: ((gem_downloads.version_id = 0) AND (gem_downloads.rubygem_id = rubygems.id))
                     Heap Fetches: 0
                     Buffers: shared hit=46
 Planning:
   Buffers: shared hit=90
 Planning Time: 1.495 ms
 Execution Time: 0.470 ms
(43 rows)
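
For intuition, here is a toy Ruby model of the two plan shapes. It is only a sketch: the `Download` struct, the ids, and the counts are invented for illustration, and none of this is the real rubygems.org code or a description of how Postgres executes joins internally. It just contrasts "walk everything in `count` order and filter" with "look up the few dependents by key, then sort".

```ruby
# Toy model only: Download, the ids and the counts are invented for illustration.
Download = Struct.new(:rubygem_id, :count)

downloads     = (1..1_000).map { |id| Download.new(id, id * 10) }
dependent_ids = [3, 250, 999] # hypothetical gems that depend on the target gem
limit         = 30            # larger than the number of matches, as in the plans above

# "Before" shape: walk every download row in count order and test each one
# against the small set of dependents. With fewer matches than the limit,
# the walk can never stop early.
def scan_in_count_order(downloads, dependent_ids, limit)
  visited = 0
  rows = []
  downloads.sort_by { |d| -d.count }.each do |d|
    break if rows.size == limit
    visited += 1
    rows << d if dependent_ids.include?(d.rubygem_id)
  end
  [rows, visited]
end

# "After" shape: one keyed lookup per dependent, then sort the few matches.
def lookup_then_sort(downloads_by_id, dependent_ids, limit)
  rows = dependent_ids.map { |id| downloads_by_id.fetch(id) }
  [rows.sort_by { |d| -d.count }.first(limit), dependent_ids.size]
end

scan_rows, scan_visited = scan_in_count_order(downloads, dependent_ids, limit)
downloads_by_id = downloads.to_h { |d| [d.rubygem_id, d] }
lookup_rows, lookup_visited = lookup_then_sort(downloads_by_id, dependent_ids, limit)

puts "scan:   #{scan_visited} rows visited"
puts "lookup: #{lookup_visited} rows visited"
puts "same result: #{scan_rows == lookup_rows}"
```

Both approaches return the same rows in the same order; the difference is how many rows have to be touched to get there, which is the same pattern the `Buffers` figures show in the two plans.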

codecov bot commented Mar 6, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 97.15%. Comparing base (f3d2cb3) to head (c2c255a).

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #4512      +/-   ##
==========================================
- Coverage   97.15%   97.15%   -0.01%     
==========================================
  Files         391      391              
  Lines        8261     8260       -1     
==========================================
- Hits         8026     8025       -1     
  Misses        235      235              


@segiddins segiddins force-pushed the segiddins/use-associations-to-define-rubygem-reverse-dependencies branch from 08da890 to c2c255a Compare March 6, 2024 02:20
@segiddins segiddins marked this pull request as ready for review March 9, 2024 06:53
@segiddins

Turns out that swapping the operands of that first ON clause gives us a roughly 1500x performance improvement. Not bad.
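
For reference, the ratio can be worked out directly from the two `EXPLAIN ANALYZE` outputs in the description. This is a small sketch using only the execution times and shared buffer hits reported there; the variable names are just for illustration.

```ruby
# Figures copied from the "Before" and "After" plans in the PR description.
before_ms      = 676.730   # Execution Time, before
after_ms       = 0.470     # Execution Time, after
before_buffers = 1_756_997 # top-level "Buffers: shared hit", before
after_buffers  = 755       # top-level "Buffers: shared hit", after

speedup      = before_ms / after_ms
buffer_ratio = before_buffers.to_f / after_buffers

puts format("execution time: %.0fx faster", speedup)
puts format("shared buffers: %.0fx fewer", buffer_ratio)
```

On these single local runs the execution time ratio comes out to about 1440x, in the same ballpark as the 1500x quoted above, and the query touches about 2327x fewer shared buffers.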

@segiddins

image

This definitely worked!
