
Put Postgres import and sql commands into a single script #124

Open · ijabz opened this issue Sep 6, 2020 · 3 comments

ijabz commented Sep 6, 2020

Put the following commands into a single script, since all of them need to be run every time to properly import the Discogs data into the database. I run importcsv.py with nohup; it takes some time, so when I find it has completed I don't want to then have to create the keys/indexes etc. and wait some more.

The script would need to check that each step has completed successfully before moving on to the next (see the sketch after the commands below).

python3 postgresql/psql.py < postgresql/sql/CreateTables.sql
python3 postgresql/importcsv.py /csvdir/*
python3 postgresql/psql.py < postgresql/sql/CreatePrimaryKeys.sql
python3 postgresql/psql.py < postgresql/sql/CreateFKConstraints.sql
python3 postgresql/psql.py < postgresql/sql/CreateIndexes.sql
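For illustration, a minimal sketch of such a wrapper, assuming a hypothetical file name (load_discogs.sh) and that /csvdir holds the exported csv files: with `set -euo pipefail` the script aborts as soon as any step exits with a non-zero status, which covers the "check the previous step worked" requirement for these commands.

```bash
#!/usr/bin/env bash
# load_discogs.sh (hypothetical name) -- runs the five steps above in order.
# Any step exiting non-zero aborts the whole run, so later steps never run
# against a half-imported database.
set -euo pipefail

CSV_DIR="${1:-/csvdir}"   # assumption: first argument (or /csvdir) holds the csv files

python3 postgresql/psql.py < postgresql/sql/CreateTables.sql
python3 postgresql/importcsv.py "$CSV_DIR"/*
python3 postgresql/psql.py < postgresql/sql/CreatePrimaryKeys.sql
python3 postgresql/psql.py < postgresql/sql/CreateFKConstraints.sql
python3 postgresql/psql.py < postgresql/sql/CreateIndexes.sql
```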

philipmat (Owner) commented

I have some concerns as to how having the indexes and keys present in the database before the import affects the speed of the import.

In most databases it tends to be slower, even in bulk load, because:

  • every row has to be PK-checked and FK-checked before insert, and a failed FK check would cause the entire import to fail;
  • having the FKs in place means the files need to be loaded in a precise order, with the files providing the FK targets first and the files referencing them last;
  • after every row (or batch of rows) inserted, the indexes may have to be updated, slowing the whole process down (building them once at the end is a single table scan).

By contrast, the current solution leaves a DB full of data at the end of the csv import; optimization (PK, FK, indexes) is then left at the latitude of the users.

All that aside, could you try the following two scenarios and see if the outcome is acceptable?

In a script:

Scenario A

  1. Run CreateTables.sql
  2. Import csv
  3. Run the rest of the sql scripts

Scenario B

  1. Run all sql scripts
  2. Import csv

Run each of the scripts above with the time command; if the results from Scenario B are positive and faster than Scenario A, I'd be happy to start looking into it.
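One way to compare the two, sketched under the same assumptions as the wrapper above (script name and csv path are hypothetical): parameterize the wrapper by scenario and run each variant under `time`, each against a freshly created, empty database.

```bash
#!/usr/bin/env bash
# time_scenario.sh (hypothetical name) -- run once per scenario, e.g.:
#   time ./time_scenario.sh A /csvdir
#   time ./time_scenario.sh B /csvdir
# Each run should start from a freshly created, empty database.
set -euo pipefail

SCENARIO="$1"            # "A" = keys/indexes after the import, "B" = before
CSV_DIR="${2:-/csvdir}"

run_sql() { python3 postgresql/psql.py < "postgresql/sql/$1"; }

run_sql CreateTables.sql

if [ "$SCENARIO" = "B" ]; then
    # Scenario B: primary keys, FK constraints and indexes exist before the bulk load
    run_sql CreatePrimaryKeys.sql
    run_sql CreateFKConstraints.sql
    run_sql CreateIndexes.sql
fi

python3 postgresql/importcsv.py "$CSV_DIR"/*

if [ "$SCENARIO" = "A" ]; then
    # Scenario A: keys, constraints and indexes are added only after the data is in
    run_sql CreatePrimaryKeys.sql
    run_sql CreateFKConstraints.sql
    run_sql CreateIndexes.sql
fi
```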


ijabz commented Sep 6, 2020

Hi, if you look at my commands you'll see the indexes etc. are created AFTER the import. I'm just suggesting that the user should be able to run one command that creates the db tables, imports the data, and then creates the indexes, keys and constraints, rather than having to run 5 different commands.


ijabz commented Sep 6, 2020

> … is then left at the latitude of the users.

There may be cases where users want to change this, but for most I cannot see why they would not want to add keys and indexes after the db import; otherwise the db will be almost unusable.
