fix parallel test error on update #246

Merged
merged 17 commits into from
Dec 17, 2023

ezra-varady
Collaborator

@ezra-varady ezra-varady commented Dec 12, 2023

closes #226
This is a workspace to try solutions to #226. Apologies if this is spamming test result notifications.

EDIT: It looks like the reindexing operations from the version update scripts are getting rolled into the regression tests. I'm pretty sure this is because the update is done in the test runner, which is run once for each individual test in the group. The original error comes from multiple parallel tests (runners) trying to reindex the same tables at once. Additionally, for some reason, if only one test is run, the output from the reindexing process is rolled into the regression test output. I don't know exactly how pg_regress handles output from the test runner versus the test itself, but it seems plausible to me that it merges them.
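
To illustrate the race described above, here is a minimal Python sketch (not the code in this PR) in which several parallel runners share a one-time guard, so the update-and-reindex step runs once instead of once per test. `UpdateOnce`, `run_update`, and `run_parallel_group` are hypothetical names, and threads stand in for the parallel test runners.

```python
import threading


class UpdateOnce:
    """Runs a setup step at most once, even when called from many runners."""

    def __init__(self, step):
        self._step = step
        self._lock = threading.Lock()
        self._done = False

    def ensure(self):
        # The lock serializes runners; only the first one performs the step.
        with self._lock:
            if not self._done:
                self._step()
                self._done = True


update_calls = []


def run_update():
    # Hypothetical stand-in for updating the extension and reindexing.
    update_calls.append("update")


def run_parallel_group(test_names, guard):
    results = []

    def runner(name):
        guard.ensure()        # every runner asks, only one actually updates
        results.append(name)  # stand-in for running the test itself

    threads = [threading.Thread(target=runner, args=(n,)) for n in test_names]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(results)


finished = run_parallel_group(["insert", "select", "reindex"], UpdateOnce(run_update))
```

Without the guard, each runner would call `run_update` itself, which is the situation where several runners try to reindex the same tables at once.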

@ezra-varady ezra-varady marked this pull request as ready for review December 13, 2023 08:32
@ezra-varady
Collaborator Author

I think this should solve the issue now. I didn't modify the workflow because I wasn't sure if we wanted the check to fail when the update tests fail. Otherwise, I think this is ready to merge.

@var77
Collaborator

var77 commented Dec 14, 2023

Looks good!
The update tests are failing with a crash. I think I may have already fixed this issue. Can you try rebasing onto main and see if it still crashes? If the tests pass, we can make the workflow fail when the update tests fail.

@ezra-varady
Collaborator Author

ezra-varady commented Dec 14, 2023

The branch is up to date with main, and I just pushed changes to make the workflow fail when update tests fail. If you have logs of the issue I can take a look; it sounds like I may still have some bugs to work through.

EDIT: Ah, I'm seeing it now. Maybe it's related to the DO block I introduced.

@var77
Collaborator

var77 commented Dec 14, 2023

> The branch is up to date with main, and I just pushed changes to make the workflow fail when update tests fail. If you have logs of the issue I can take a look; it sounds like I may still have some bugs to work through.
>
> EDIT: Ah, I'm seeing it now. Maybe it's related to the DO block I introduced.

Actually, I was having that issue when to_tag and from_tag were the same. I'm not sure whether that is the case here.
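
A guard for the case mentioned above could be as small as the following Python sketch. `plan_update` is a hypothetical helper, the tags are treated as opaque strings, and nothing here is taken from the repository's actual update-test script.

```python
def plan_update(from_tag, to_tag):
    """Return the (from, to) pair to test, or None when there is nothing to update."""
    if not from_tag or not to_tag:
        raise ValueError("both from_tag and to_tag must be set")
    if from_tag == to_tag:
        # Updating a version to itself exercises no update script, so skip it.
        return None
    return (from_tag, to_tag)
```

A caller would skip the update test whenever `plan_update` returns `None`.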

@ezra-varady
Collaborator Author

I can recreate this locally. I'm not totally sure what the issue is, but it seems like the update scripts may not be compatible with older versions of Postgres. The crash seems to come from violating an assertion in the scan. It also seems like the concurrent update may not have been the reindex but something else in the script, which is probably good news. Relevant section of logs below:

2023-12-14 19:27:41.020 UTC [2573] ezra@ldb_parallel ERROR:  extension "pageinspect" already exists
2023-12-14 19:27:41.020 UTC [2573] ezra@ldb_parallel STATEMENT:  CREATE EXTENSION pageinspect;
2023-12-14 19:27:41.037 UTC [2577] ezra@ldb_parallel ERROR:  extension "pageinspect" already exists
2023-12-14 19:27:41.037 UTC [2577] ezra@ldb_parallel STATEMENT:  CREATE EXTENSION pageinspect;
2023-12-14 19:27:41.038 UTC [2578] ezra@ldb_parallel ERROR:  extension "pageinspect" already exists
2023-12-14 19:27:41.038 UTC [2578] ezra@ldb_parallel STATEMENT:  CREATE EXTENSION pageinspect;
2023-12-14 19:27:41.047 UTC [2577] ezra@ldb_parallel ERROR:  tuple concurrently updated
2023-12-14 19:27:41.047 UTC [2577] ezra@ldb_parallel STATEMENT:  CREATE OR REPLACE FUNCTION ldb_get_indexes(tblname text)
        RETURNS TABLE(
            indexname name,
            size text,
            indexdef text,
            total_index_size text
        ) AS
        $BODY$
        BEGIN
            RETURN QUERY
            WITH total_size_data AS (
                SELECT
                    SUM(pg_relation_size(indexrelid)) as total_size
                FROM
                    pg_index
                WHERE
                    indisvalid
                    AND indrelid = tblname::regclass
            )
            SELECT
                idx.indexname,
                pg_size_pretty(pg_relation_size(idx.indexname::REGCLASS)) as size,
                idx.indexdef,
                pg_size_pretty(total_size_data.total_size) as total_index_size
            FROM
                pg_indexes idx,
                total_size_data
            WHERE
                idx.tablename = tblname;
        END;
        $BODY$
        LANGUAGE plpgsql;
2023-12-14 19:27:41.047 UTC [2578] ezra@ldb_parallel ERROR:  tuple concurrently updated
2023-12-14 19:27:41.047 UTC [2578] ezra@ldb_parallel STATEMENT:  CREATE OR REPLACE FUNCTION ldb_get_indexes(tblname text)
        RETURNS TABLE(
            indexname name,
            size text,
            indexdef text,
            total_index_size text
        ) AS
        $BODY$
        BEGIN
            RETURN QUERY
            WITH total_size_data AS (
                SELECT
                    SUM(pg_relation_size(indexrelid)) as total_size
                FROM
                    pg_index
                WHERE
                    indisvalid
                    AND indrelid = tblname::regclass
            )
            SELECT
                idx.indexname,
                pg_size_pretty(pg_relation_size(idx.indexname::REGCLASS)) as size,
                idx.indexdef,
                pg_size_pretty(total_size_data.total_size) as total_index_size
            FROM
                pg_indexes idx,
                total_size_data
            WHERE
                idx.tablename = tblname;
        END;
        $BODY$
        LANGUAGE plpgsql;
2023-12-14 19:27:41.061 UTC [2579] ezra@ldb_parallel ERROR:  extension "pageinspect" already exists
2023-12-14 19:27:41.061 UTC [2579] ezra@ldb_parallel STATEMENT:  CREATE EXTENSION pageinspect;
2023-12-14 19:27:42.383 UTC [2603] ezra@ldb_parallel ERROR:  function array_append(text[], name) does not exist at character 8
2023-12-14 19:27:42.383 UTC [2603] ezra@ldb_parallel HINT:  No function matches the given name and argument types. You might need to add explicit type casts.
2023-12-14 19:27:42.383 UTC [2603] ezra@ldb_parallel QUERY:  SELECT array_append(index_names, r.indexname)
2023-12-14 19:27:42.383 UTC [2603] ezra@ldb_parallel CONTEXT:  PL/pgSQL function inline_code_block line 129 at assignment
2023-12-14 19:27:42.383 UTC [2603] ezra@ldb_parallel STATEMENT:  SET client_min_messages=error; ALTER EXTENSION lantern UPDATE TO '0.0.10';
2023-12-14 19:27:43.387 UTC [2605] ezra@ldb_parallel ERROR:  function array_append(text[], name) does not exist at character 8
2023-12-14 19:27:43.387 UTC [2605] ezra@ldb_parallel HINT:  No function matches the given name and argument types. You might need to add explicit type casts.
2023-12-14 19:27:43.387 UTC [2605] ezra@ldb_parallel QUERY:  SELECT array_append(index_names, r.indexname)
2023-12-14 19:27:43.387 UTC [2605] ezra@ldb_parallel CONTEXT:  PL/pgSQL function inline_code_block line 129 at assignment
2023-12-14 19:27:43.387 UTC [2605] ezra@ldb_parallel STATEMENT:  SET client_min_messages=error; ALTER EXTENSION lantern UPDATE TO '0.0.10';
_parallel [local] SELECT: /home/ezra/lanterndb/src/hnsw/scan.c:48: ldb_ambeginscan: Assertion `headerp->magicNumber == LDB_WAL_MAGIC_NUMBER' failed.
2023-12-14 19:27:43.852 UTC [789] LOG:  server process (PID 2644) was terminated by signal 6: Aborted
2023-12-14 19:27:43.852 UTC [789] DETAIL:  Failed process was running: SELECT id FROM sift_base10k ORDER BY  v <-> '{21,24,5,0,0,26,22,6,16,16,10,9,0,18,114,19,13,13,9,1,2,53,111,19,39,32,5,0,4,9,10,13,6,10,8,0,2,130,77,4,2,0,0,0,3,130,130,11,130,0,0,0,0,37,130,84,130,5,0,1,17,11,4,28,17,39,3,3,30,77,28,3,20,0,0,1,49,125,13,7,130,6,0,0,0,5,11,61,130,2,0,1,12,84,48,73,1,12,2,0,31,57,9,2,16,12,1,0,32,36,0,1,63,6,3,1,0,0,24,51,9,0,0,0,0,44,88,48}'  ASC LIMIT 1;
2023-12-14 19:27:43.852 UTC [789] LOG:  terminating any other active server processes
2023-12-14 19:27:43.853 UTC [2626] ezra@ldb_parallel WARNING:  terminating connection because of crash of another server process
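
The excerpt above mixes several distinct errors. As a reading aid, here is a small Python sketch that groups `ERROR` lines by message and records which backend PIDs reported them. The regular expression assumes the log prefix format seen in this excerpt, and `summarize_errors` is a hypothetical helper; the sample lines are copied from the log above.

```python
import re
from collections import Counter

# Matches lines such as:
#   2023-12-14 19:27:41.047 UTC [2577] ezra@ldb_parallel ERROR:  tuple concurrently updated
ERROR_LINE = re.compile(r"^\S+ \S+ UTC \[(?P<pid>\d+)\] \S+ ERROR:\s+(?P<msg>.*)$")


def summarize_errors(log_text):
    """Count ERROR messages and collect the backend PIDs that reported each one."""
    counts = Counter()
    pids = {}
    for line in log_text.splitlines():
        match = ERROR_LINE.match(line.strip())
        if match is None:
            continue  # STATEMENT, HINT, and continuation lines are skipped
        msg = match.group("msg")
        counts[msg] += 1
        pids.setdefault(msg, set()).add(int(match.group("pid")))
    return counts, pids


sample = """\
2023-12-14 19:27:41.020 UTC [2573] ezra@ldb_parallel ERROR:  extension "pageinspect" already exists
2023-12-14 19:27:41.020 UTC [2573] ezra@ldb_parallel STATEMENT:  CREATE EXTENSION pageinspect;
2023-12-14 19:27:41.047 UTC [2577] ezra@ldb_parallel ERROR:  tuple concurrently updated
2023-12-14 19:27:41.047 UTC [2578] ezra@ldb_parallel ERROR:  tuple concurrently updated
2023-12-14 19:27:42.383 UTC [2603] ezra@ldb_parallel ERROR:  function array_append(text[], name) does not exist at character 8
"""

counts, pids = summarize_errors(sample)
```

Two different backends (PIDs 2577 and 2578) reporting "tuple concurrently updated" for the same `CREATE OR REPLACE FUNCTION` statement is consistent with parallel runners executing the same setup script at once.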

@var77
Collaborator

var77 commented Dec 14, 2023

It seems that, for some reason, the indexes are not reindexed after the upgrade. The magicNumber is changed here. I think that assertion might fail if an index is created before 0.0.8, the extension is updated to 0.0.8 (but the indexes are not reindexed), and a table scan is then run.
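
The failure mode described here can be modelled in a few lines. This is a toy Python sketch, not Lantern's C code: the header layout, the two magic values, and the function names are made up for illustration. Only the idea comes from the comment above: a scan rejects an index whose stored magic number does not match the one the current extension version expects.

```python
import struct

# Hypothetical magic values, for illustration only.
MAGIC_BEFORE_UPDATE = 0xAAAA0001
MAGIC_AFTER_UPDATE = 0xAAAA0002


class StaleIndexError(Exception):
    """Raised when an index header was written by an incompatible version."""


def write_header(magic):
    # Toy header: a single little-endian 32-bit magic number.
    return struct.pack("<I", magic)


def begin_scan(header, expected_magic):
    (magic,) = struct.unpack_from("<I", header)
    if magic != expected_magic:
        raise StaleIndexError("header magic mismatch: index must be rebuilt")
    return "scan ok"


# 1. The index is built by the old version.
header = write_header(MAGIC_BEFORE_UPDATE)

# 2. The extension is updated but the index is not rebuilt: the scan rejects it.
try:
    begin_scan(header, MAGIC_AFTER_UPDATE)
    stale_detected = False
except StaleIndexError:
    stale_detected = True

# 3. Reindexing rewrites the header with the new magic, and the scan succeeds.
header = write_header(MAGIC_AFTER_UPDATE)
result_after_reindex = begin_scan(header, MAGIC_AFTER_UPDATE)
```

In the log above the check is a C assertion rather than an exception, so a mismatch aborts the backend (signal 6) instead of returning an error to the client.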

@ezra-varady
Collaborator Author

@var77 I think it should work across all the versions now

Collaborator

@var77 var77 left a comment

Awesome!

left_cursor REFCURSOR;
left_row RECORD;

right_cursor REFCURSOR;
Contributor

There is quite a bit of repetition between this file and the common.sql in the non-parallel tests. Can we somehow avoid the copy and keep the file in just one of the two locations?

Collaborator Author

Yeah, that makes sense. There's no reason we can't use one file.

@Ngalstyan4 Ngalstyan4 merged commit c62fca4 into lanterndata:main Dec 17, 2023
32 checks passed
Development

Successfully merging this pull request may close these issues.

Parallel tests fail on upgrades
3 participants