Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New sql schema #53

Merged
merged 8 commits into from
Jul 20, 2020
Merged

New sql schema #53

merged 8 commits into from
Jul 20, 2020

Conversation

darix
Copy link
Member

@darix darix commented Jul 15, 2020

New proposed database schema

for storing file information and mirror mapping

all the mirr_* functions are not ported yet.

For the details see https://github.com/openSUSE/mirrorbrain/wiki/Roadmap

@lrupp
Copy link
Member

lrupp commented Jul 16, 2020

+1 from my side.
We started with the new Roadmap in 2019. Time to move forward...

@lrupp lrupp added the Prio1 We definitely want this. It's a major feature, and it's obvious that it would be useful to every label Jul 16, 2020
@lrupp lrupp added this to In progress in Roadmap 3.0 via automation Jul 16, 2020
@lrupp lrupp requested a review from andrii-suse July 16, 2020 08:09
@darix
Copy link
Member Author

darix commented Jul 16, 2020

the comment patch is now already in the opensuse branch so it can be ignored here.

for storing file information and mirror mapping

all the mirr_* functions are not ported yet.

For the details see https://github.com/openSUSE/mirrorbrain/wiki/Roadmap
and have consistent naming of all functions
Copy link
Collaborator

@andrii-suse andrii-suse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments are less important here, but we should build common understanding on some of the notes.

sql/migrations/schema-postgresql-move-to-mirrors.sql Outdated Show resolved Hide resolved
sql/migrations/schema-postgresql-move-to-mirrors.sql Outdated Show resolved Hide resolved
sql/migrations/schema-postgresql-move-to-mirrors.sql Outdated Show resolved Hide resolved

CREATE TABLE filemetadata
(
id integer GENERATED ALWAYS AS IDENTITY,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest going with bigint: currently max id in db is above 900M, which is close to 50% of int capacity. So limit may be reached in few years

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will do

sql/migrations/schema-postgresql-move-to-mirrors.sql Outdated Show resolved Hide resolved
DELETE FROM filemetadata
WHERE id IN (
SELECT filemetadata_id
FROM filemetadata_mirror_count
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Following query should do the same job and is much lighter, because it doesn't calculate exact counts (m.id will be NULL only for those rows, which don't have any mirror):

DELETE FROM filemetadata 
USING filemetadata AS f
LEFT OUTER JOIN mirrors AS m ON filemetadata_id = f.id
WHERE filemetadata.id = f.id AND m.id is NULL AND f.mtime < (now()-'3 months'::interval)

I know that the counts are pre-calculated anyway - but maybe we can simplify workflow and do not maintain MATERIALIZED VIEW at all?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

explain (analyze, buffers) delete from filemetadata using filemetadata AS f LEFT OUTER JOIN mirrors AS m ON filemetadata_id = f.id WHERE filemetadata_id = f.id AND  m.server_id is NULL AND f.mtime < (now()-'3 months'::interval);
                                                                     QUERY PLAN                                                                     
----------------------------------------------------------------------------------------------------------------------------------------------------
 Delete on filemetadata  (cost=1.00..1116601.47 rows=2862434 width=18) (actual time=0.638..0.638 rows=0 loops=1)
   Buffers: shared read=4
   ->  Nested Loop  (cost=1.00..1116601.47 rows=2862434 width=18) (actual time=0.638..0.638 rows=0 loops=1)
         Buffers: shared read=4
         ->  Nested Loop  (cost=1.00..16.55 rows=1 width=12) (actual time=0.637..0.637 rows=0 loops=1)
               Buffers: shared read=4
               ->  Index Scan using idx_mirrors_server_id on mirrors m  (cost=0.56..8.08 rows=1 width=10) (actual time=0.636..0.636 rows=0 loops=1)
                     Index Cond: (server_id IS NULL)
                     Buffers: shared read=4
               ->  Index Scan using pk_filemetadata on filemetadata f  (cost=0.44..8.46 rows=1 width=10) (never executed)
                     Index Cond: (id = m.filemetadata_id)
                     Filter: (mtime < (now() - '3 mons'::interval))
         ->  Seq Scan on filemetadata  (cost=0.00..888591.96 rows=22799296 width=6) (never executed)
 Planning Time: 5.143 ms
 JIT:
   Functions: 14
   Options: Inlining true, Optimization true, Expressions true, Deforming true
   Timing: Generation 1.544 ms, Inlining 0.000 ms, Optimization 0.000 ms, Emission 0.000 ms, Total 1.544 ms
 Execution Time: 2.353 ms
(19 rows)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh it probably should be WHERE filemetadata.id instead of WHERE filemetadata_id

sql/migrations/schema-postgresql-move-to-mirrors.sql Outdated Show resolved Hide resolved
sql/migrations/schema-postgresql-move-to-mirrors.sql Outdated Show resolved Hide resolved
sha1pieces bytea,
btih bytea,
pgp text,
zblocksize smallint,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should use INT instead of SMALLINT, which is mileted to 32K : zsync algorithm doesn't have practical limit for block size and e.g. block size of 1M is much suitable for huge files (see #47 )

zblocksize,
zhashlens,
zsums,
encode(zsums, 'hex') AS zsumshex
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need path column here as well, so the view can be queried without join

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean instead of subquery in the hash query

"WHERE file_id = (SELECT id " \
"FROM filearr " \
"WHERE path = %s) " \
we can just have WHERE path = %s if the view would have path

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@darix darix requested a review from andrii-suse July 20, 2020 00:49
@darix
Copy link
Member Author

darix commented Jul 20, 2020

all your changes should be addressed

@darix darix merged commit e6fd0bb into opensuse Jul 20, 2020
Roadmap 3.0 automation moved this from In progress to Done Jul 20, 2020
@darix darix deleted the new_sql_schema branch July 21, 2020 12:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Prio1 We definitely want this. It's a major feature, and it's obvious that it would be useful to every
Projects
Roadmap 3.0
  
Done
Development

Successfully merging this pull request may close these issues.

None yet

3 participants