Functional pipeline UI #8

kalensk · 2023-06-10T01:32:18Z

No description provided.

It's very important to make sure the master and standby master are synchronized before running Initialize and Execute. This is because Revert will rsync the standby master data directory onto the master data directory, which can cause severe master corruption if the standby master is not up-to-date. We have this check for GPDB 6X+, but it was seemingly missed for 5X because the catalog view gp_stat_replication does not exist on 5X. However, we're only checking the master and standby master replication status, so there's no need to use the cluster-wide gp_stat_replication view; we can simply use pg_stat_replication, which only reports the replication status of the individual segment (in this case, just the master segment).

We already check that the cluster is synchronized when creating the source cluster config file at the beginning of Initialize. However, the state of the source cluster could have changed any time afterwards between that early Initialize step and upgrading the master segment in Execute. If the cluster was out-of-sync and we run Revert after Execute, there is potential for severe catalog corruption. In particular, this is more likely to happen to the master and standby master since there's a GPDB-specific mechanism to bypass the synchronous WAL replication when the standby master is not responsive. To prevent this, we add a quick sanity check for cluster synchronization before the Initialize pg_upgrade checks and more importantly before doing the actual upgrade operation in Execute.
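As a rough illustration of the kind of check described above, the sketch below decides whether a standby looks synchronized from rows that would come out of pg_stat_replication. The `replicationRow` struct, its field names, and the criterion used (state is `streaming` and the sent and flushed WAL locations match) are assumptions made for this example, not the check gpupgrade actually performs.

```go
package main

import "fmt"

// replicationRow is a hypothetical, trimmed-down view of one row from
// pg_stat_replication. The field names are assumptions for illustration.
type replicationRow struct {
	State         string
	SentLocation  string
	FlushLocation string
}

// standbyIsSynchronized reports whether every replication row looks caught
// up. No rows at all is treated as "not synchronized", since that would mean
// no standby is currently streaming from the master.
func standbyIsSynchronized(rows []replicationRow) bool {
	if len(rows) == 0 {
		return false
	}
	for _, r := range rows {
		if r.State != "streaming" || r.SentLocation != r.FlushLocation {
			return false
		}
	}
	return true
}

func main() {
	inSync := []replicationRow{{State: "streaming", SentLocation: "0/C000000", FlushLocation: "0/C000000"}}
	lagging := []replicationRow{{State: "streaming", SentLocation: "0/C000000", FlushLocation: "0/B000000"}}

	fmt.Println(standbyIsSynchronized(inSync))  // true
	fmt.Println(standbyIsSynchronized(lagging)) // false
	fmt.Println(standbyIsSynchronized(nil))     // false
}
```

Keeping the decision in a pure function like this makes it cheap to call the same check both before the Initialize pg_upgrade checks and again right before the upgrade in Execute.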

Pass the DUMP_PATH environment variable with the path to the dump file when making the functional pipeline such as `make DUMP_PATH=dump/5X/dump.sql.xz functional-pipeline`

Set max_statement_mem to half of the total memory, and set statement_mem to 1/16th of max_statement_mem. For an n2-standard-16 instance with 64GB of memory this sets max_statement_mem to 32GB and statement_mem to 2GB.
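The arithmetic can be sketched as follows. The function name and the choice of megabytes as the unit are assumptions for this example; the pipeline itself may compute the values differently (for instance in a shell script).

```go
package main

import "fmt"

// statementMemSettings returns max_statement_mem and statement_mem in MB,
// using the ratios from the commit message: half of total memory, and
// 1/16th of that. The function name is hypothetical.
func statementMemSettings(totalMemMB int) (maxStatementMemMB, statementMemMB int) {
	maxStatementMemMB = totalMemMB / 2
	statementMemMB = maxStatementMemMB / 16
	return maxStatementMemMB, statementMemMB
}

func main() {
	// n2-standard-16 with 64GB of memory.
	maxMem, stmtMem := statementMemSettings(64 * 1024)
	fmt.Printf("max_statement_mem=%dMB statement_mem=%dMB\n", maxMem, stmtMem)
	// prints: max_statement_mem=32768MB statement_mem=2048MB
}
```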

Ensure resource filenames are unique with git branch name to avoid collisions when multiple pipelines are run. Specify BranchName and resource filenames during pipeline creation since the Concourse GCS resource does not guarantee pulling the latest resource. See issue frodenas/gcs-resource#47

When printing the duration just pass string rather than the timer object to keep things simple. No need for the function to have access to the entire object when it's not needed.

Remove AddClusters which is no longer used. Rename add_clusters.go to for clarity.

To be consistent with the online documentation and internal naming use --upgrade-id when running `gpupgrade config show --upgrade-id`.

Keep it simple! Since upgrade ID is only ever used as a string there is no reason not to just have it be that type. It also allows us to serialize it in config.json as a string rather than a random number making it easy to correlate and debug on customer systems.
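A minimal sketch of the idea, assuming a `Config` struct and an `upgrade_id` JSON key that are made up for this example (the real config.json layout is not shown in this PR):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// UpgradeID is a plain string type, so it serializes as a JSON string.
type UpgradeID string

// Config is a hypothetical stand-in for the real configuration struct.
type Config struct {
	UpgradeID UpgradeID `json:"upgrade_id"`
}

// encodeConfig marshals the config to its JSON representation.
func encodeConfig(c Config) (string, error) {
	b, err := json.Marshal(c)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	out, err := encodeConfig(Config{UpgradeID: "abc123"})
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // {"upgrade_id":"abc123"}
}
```

Because the value in the file is the same string a user would pass on the command line, it can be matched against logs and command output without any conversion.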

Add yearMonthDay to filename format.

Show coverage in unit and integration output.

Wait for both the work being done and the progress bar increments being rendered.

The progress bar is tied to the number of seed scripts executed rather than the number of generated scripts written, because the bar needs to know up front how many units of work there are. It therefore makes sense to increment the bar right after the seed scripts are executed, which is what the previous implementation did. However, there is a small section of code in which the bar is incremented before the generated script is fully written. The bar may then show as complete even though a little work remains in writing the file. In practice this is negligible, since even extremely large scripts are written very quickly on current systems.

Even so, some people find it easier to increment the bar after the script is written, even though this is technically not correct. Given how the code is structured, this results in the bar being incremented in two places: 1) after the seed scripts are run when there is no generated script output to write, and 2) after the seed scripts are run and the generated script is written. So the tradeoff is between having the increment scattered across two places and a negligible scenario in which the progress bar shows as complete while the file is still being written.
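One way to wait for both the work and the rendered increments is to use two wait groups: one for the workers and one for the goroutine that consumes increments. The sketch below is only an illustration under that assumption. `runWithProgress` is a hypothetical name, and a plain counter stands in for the real progress bar.

```go
package main

import (
	"fmt"
	"sync"
)

// runWithProgress runs every task concurrently and returns how many
// progress increments were rendered. It returns only after both the work
// and the rendering of every increment have finished.
func runWithProgress(tasks []func()) int {
	increments := make(chan struct{})
	rendered := 0

	// Renderer: consumes increments until the channel is closed.
	var renderWG sync.WaitGroup
	renderWG.Add(1)
	go func() {
		defer renderWG.Done()
		for range increments {
			rendered++ // stands in for incrementing a real progress bar
		}
	}()

	// Workers: do the work, then report one unit of progress.
	var workWG sync.WaitGroup
	for _, task := range tasks {
		workWG.Add(1)
		go func(task func()) {
			defer workWG.Done()
			task()
			increments <- struct{}{}
		}(task)
	}

	workWG.Wait()     // all work is done and all increments were sent
	close(increments) // let the renderer drain and exit
	renderWG.Wait()   // all increments have been rendered
	return rendered
}

func main() {
	noop := func() {}
	fmt.Println(runWithProgress([]func(){noop, noop, noop})) // 3
}
```

In this sketch each task reports progress exactly once, after it finishes, which avoids the bar showing as complete while work is still in flight.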

Consolidate all (non-unit) test files into the test directory. The motivation is that golang acceptance tests will soon be added to slowly replace the BATS tests. While working out where best to place the upcoming golang acceptance tests, we found that the existing integration tests were scattered across the repository.

Co-authored-by: Kevin Yeap <kyeap@vmware.com>

Previously, we were using a passed constraint on the terraform resource such as:

```
- get: terraform
  passed: [ generate-cluster ]
```

However, for long running jobs (3+ days) it appears Concourse would expire or roll off the specific terraform resource associated with the generate-cluster job. This would cause the destroy jobs to hang while "waiting for a suitable set of input versions":

`terraform - no satisfiable builds from passed jobs found for set of inputs`

Thus, pass the saved_cluster_env_files from GCP to the ccp_destroy task to avoid this issue.

The previous version of golangci/golangci-lint-action@v2 was showing the following warning: Warning: The `save-state` command is deprecated and will be disabled soon. Please upgrade to using Environment Files. For more information see: https://github.blog/changelog/2022-10-11-github-actions-deprecating-save-state-and-set-output-commands/

For extra validation post-upgrade for end-to-end tests run gpcheckcat. This is in addition to validating the standby and mirrors post-upgrade by failing over and failing back. Note, we do not run gpcheckcat or validate mirrors after the pg_upgrade acceptance tests or multi-host gpupgrade acceptance tests. This would require leaving the cluster intact after running the tests which is cumbersome during dev iteration and upon failures.

The timestamp in the log prefix was not being updated. The logger from the Go standard library sets the prefix once during initialization and does not re-evaluate it before each log message, so the timestamp stayed fixed. Use the built-in standard log timestamp instead.

Update log prefix to:
2023/06/07 14:44:11 gpupgrade:gpadmin:sdw-1:161174 [INFO]: State directory /home/gpadmin/.gpupgrade already present. Skipping.

from:
20230607:14:44:11 gpupgrade:gpadmin:sdw-1:161174-[INFO]:-State directory /home/gpadmin/.gpupgrade already present. Skipping.

Tsquery types cannot be part of a distribution key. When generating ALTER statements that modify tsquery datatype to text datatype there is no need to check if the tsquery column is being used as a distribution key. This was likely leftover code when gen_fix_tsquery_to_text.sql was copied from the script that handled deprecated name types.

Revert and finalize data migration scripts were generating duplicate tsquery partition table indexes. Indexes on partition tables using tsquery fall under the jurisdiction of two different data migration scripts:

1. recreate_partition_indexes_step
2. recreate_indexes_on_deprecated_built_in_types

To fix this, recreate_partition_indexes_step_1.sql will no longer generate CREATE INDEX statements for tsquery partition tables. This was decided based on the following considerations:

1. recreate_indexes_on_deprecated_built_in_types is a script with a narrower scope.
2. We do not want to create a script order dependency where a CREATE INDEX statement in recreate_partition_indexes_step depends on an ALTER statement in the corresponding gen_fix_tsquery_to_text/gen_change_tsquery_to_text.

Altering a column type when the column is a partition key is not allowed. For this reason, the initialize migration scripts for the deprecated tsquery type do not generate an ALTER statement for tables partitioned on a tsquery column. The revert and finalize migration scripts should likewise not generate an ALTER statement for tables partitioned on a tsquery column.

When recreating indexes on partition tables during revert and finalize, dropped child partition indexes are also brought back. This is a known behavior that has not been addressed, and tsquery partition index recreation is not immune to it.

Commit where tsquery ALTER statements were disabled in initialize for tables partitioned on a tsquery column: 1c455b8

Update the name parameter to Run Tests for clarity when viewing in the github actions page.

jimmyyih and others added 27 commits May 23, 2023 14:19

don't use debug builds for functional testing

197f9b0

collect stats for No. of User Defined Types

fe64565

functional pipeline: bump instance type and disk size

283a24a

specify DUMP_PATH when making functional pipeline

aa529e5

Pass the DUMP_PATH environment variable with the path to the dump file when making the functional pipeline such as `make DUMP_PATH=dump/5X/dump.sql.xz functional-pipeline`

functional pipeline: set GUCs

7c420b0

Set max_statement_mem to half of the total memory, and set statement_mem to 1/16th of max_statement_mem. For an n2-standard-16 instance with 64GB of memory this sets max_statement_mem to 32GB and statement_mem to 2GB.

pass duration string instead of timer

4a39471

When printing the duration just pass string rather than the timer object to keep things simple. No need for the function to have access to the entire object when it's not needed.

remove AddClusters dead code

69f3557

Remove AddClusters which is no longer used. Rename add_clusters.go to for clarity.

rename config show --id to --upgrade-id

864d0a2

To be consistent with the online documentation and internal naming use --upgrade-id when running `gpupgrade config show --upgrade-id`.

tweak functional README

0a671c2

Add yearMonthDay to filename format.

add --cover to make unit and make integration

a06af3b

Show coverage in unit and integration output.

output labels for number of UDFs and Types

9240ee4

Remove unused test_role2

3ce94f8

rename actions.yml to run_tests.yml

e246a67

Update the name parameter to Run Tests for clarity when viewing in the github actions page.

add functional pipeline github action

136f211

kalensk merged commit 0f5f120 into main Jun 10, 2023
