forked from greenplum-db/gpupgrade-archive
Functional pipeline UI #8
Merged
It is very important to make sure the master and standby master are synchronized before running Initialize and Execute. Revert will rsync the standby master data directory onto the master data directory, which can cause severe master corruption if the standby master is not up-to-date. We have this check for GPDB 6X+, but it was seemingly missed for 5X because the catalog view gp_stat_replication does not exist on 5X. However, we are only checking the master and standby master replication status, so there is no need to use the cluster-wide gp_stat_replication view; we can simply use pg_stat_replication, which only reports the replication status of the individual segment it is queried on (in this case, just the master segment).
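A minimal sketch of the kind of check described above, not the actual gpupgrade implementation. It assumes the standby counts as synchronized when pg_stat_replication on the master reports a row with `state = 'streaming'` and `sync_state = 'sync'`; the query text, the `replicationRow` type, and the `standbySynchronized` helper are all hypothetical names introduced for illustration.

```go
package main

import "fmt"

// standbyQuery is an assumed query to run against the master segment only.
// pg_stat_replication reports the WAL senders of the segment it is queried on.
const standbyQuery = `SELECT state, sync_state FROM pg_stat_replication`

// replicationRow holds the two columns this sketch assumes are relevant.
type replicationRow struct {
	State     string
	SyncState string
}

// standbySynchronized reports whether any row shows a standby that is
// streaming and in sync. An empty result means no standby is replicating.
func standbySynchronized(rows []replicationRow) bool {
	for _, r := range rows {
		if r.State == "streaming" && r.SyncState == "sync" {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println("query:", standbyQuery)

	// Rows would normally come from executing standbyQuery on the master.
	rows := []replicationRow{{State: "streaming", SyncState: "sync"}}
	fmt.Println("synchronized:", standbySynchronized(rows))
}
```

Keeping the decision in a pure function separates the policy (what counts as synchronized) from the database access, so the policy can be exercised without a running cluster.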
We already check that the cluster is synchronized when creating the source cluster config file at the beginning of Initialize. However, the state of the source cluster could have changed any time afterwards between that early Initialize step and upgrading the master segment in Execute. If the cluster was out-of-sync and we run Revert after Execute, there is potential for severe catalog corruption. In particular, this is more likely to happen to the master and standby master since there's a GPDB-specific mechanism to bypass the synchronous WAL replication when the standby master is not responsive. To prevent this, we add a quick sanity check for cluster synchronization before the Initialize pg_upgrade checks and more importantly before doing the actual upgrade operation in Execute.
Pass the DUMP_PATH environment variable with the path to the dump file when making the functional pipeline, for example: `make DUMP_PATH=dump/5X/dump.sql.xz functional-pipeline`
Set max_statement_mem to half of the total memory, and set statement_mem to 1/16th of max_statement_mem. For an n2-standard-16 instance with 64GB of memory this sets max_statement_mem to 32GB and statement_mem to 2GB.
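The arithmetic above can be sketched as follows. The `memSettings` type and `deriveMemSettings` function are hypothetical names used only for illustration; the pipeline itself may compute these values differently (for example in a shell script).

```go
package main

import "fmt"

// memSettings holds the two derived values, in megabytes.
type memSettings struct {
	MaxStatementMemMB int
	StatementMemMB    int
}

// deriveMemSettings applies the rule from the commit message:
// max_statement_mem is half of total memory, and statement_mem is
// 1/16th of max_statement_mem.
func deriveMemSettings(totalMemMB int) memSettings {
	maxMem := totalMemMB / 2
	return memSettings{
		MaxStatementMemMB: maxMem,
		StatementMemMB:    maxMem / 16,
	}
}

func main() {
	// An n2-standard-16 instance has 64GB of memory.
	s := deriveMemSettings(64 * 1024)
	fmt.Printf("max_statement_mem=%dMB statement_mem=%dMB\n",
		s.MaxStatementMemMB, s.StatementMemMB)
	// prints "max_statement_mem=32768MB statement_mem=2048MB"
}
```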
Ensure resource filenames are unique with git branch name to avoid collisions when multiple pipelines are run. Specify BranchName and resource filenames during pipeline creation since the Concourse GCS resource does not guarantee pulling the latest resource. See issue frodenas/gcs-resource#47
When printing the duration, pass a string rather than the timer object to keep things simple. The function does not need access to the entire object.
Remove AddClusters which is no longer used. Rename add_clusters.go to for clarity.
To be consistent with the online documentation and internal naming use --upgrade-id when running `gpupgrade config show --upgrade-id`.
Keep it simple! Since the upgrade ID is only ever used as a string, there is no reason not to make it that type. It also allows us to serialize it in config.json as a string rather than a random number, making it easy to correlate and debug on customer systems.
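As a rough illustration of the serialization point, the sketch below stores the upgrade ID as a plain string field. The `config` struct, its `upgrade_id` JSON key, and the example ID value are hypothetical; the real config.json layout is not shown in this pull request.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// config is a hypothetical, trimmed-down stand-in for config.json.
// UpgradeID is a plain string, so it is written to JSON exactly as it is
// shown to the user, with no numeric conversion in between.
type config struct {
	UpgradeID string `json:"upgrade_id"`
}

// marshalConfig returns the JSON encoding of c as a string.
func marshalConfig(c config) (string, error) {
	b, err := json.Marshal(c)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	out, err := marshalConfig(config{UpgradeID: "example-upgrade-id"})
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // prints {"upgrade_id":"example-upgrade-id"}
}
```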
Add yearMonthDay to filename format.
Show coverage in unit and integration output.
Need to wait for both the work being done and the progress bars being incremented (i.e., rendered). The progress bar is tied to the number of seed scripts executed rather than the number of generated scripts actually written, because the progress bar needs to know up front how many units of work there are. It makes sense to increment the bar right after the seed scripts are executed, since this is what the bar is tied to, and the previous implementation did that. However, there is a small section of code in which the bar is incremented before the generated script is fully written. This may show the bar as complete even though there is still a little work left in writing the file. In practice this is negligible, since even extremely large scripts are written very quickly on current systems. Even so, some people find it easier to increment the bar after the script is written, although this is technically not correct. Given the way the code is structured, this results in the bar being incremented in two places: 1) after the seed scripts are run when there is no generated script output to write, and 2) after the seed scripts are run and the generated script is written. So it is a tradeoff between having the increment scattered in two places and a negligible scenario where the progress bar could show as complete while the file is still being written.
Consolidate all (non-unit) test files into the test directory. The motivation is that golang acceptance tests will soon be added to slowly replace the BATS tests. While working out where best to place the upcoming golang acceptance tests, we discovered how the existing integration tests were organized.
Co-authored-by: Kevin Yeap <kyeap@vmware.com>
Previously, we were using a passed constraint on the terraform resource such as:

```
- get: terraform
  passed: [ generate-cluster ]
```

However, for long-running jobs (3+ days) it appears Concourse would expire or roll off the specific terraform resource version associated with the generate-cluster job. This would cause the destroy jobs to hang while "waiting for a suitable set of input versions": `terraform - no satisfiable builds from passed jobs found for set of inputs`. Thus, pass the saved_cluster_env_files from GCP to the ccp_destroy task to avoid this issue.
The previous version of golangci/golangci-lint-action@v2 was showing the following warning: Warning: The `save-state` command is deprecated and will be disabled soon. Please upgrade to using Environment Files. For more information see: https://github.blog/changelog/2022-10-11-github-actions-deprecating-save-state-and-set-output-commands/
For extra validation post-upgrade for end-to-end tests run gpcheckcat. This is in addition to validating the standby and mirrors post-upgrade by failing over and failing back. Note, we do not run gpcheckcat or validate mirrors after the pg_upgrade acceptance tests or multi-host gpupgrade acceptance tests. This would require leaving the cluster intact after running the tests which is cumbersome during dev iteration and upon failures.
The timestamp in the log prefix was not being updated. The logger from the Go standard library sets a static prefix during initialization and does not re-evaluate it before each log message, so the timestamp stayed fixed. Use the built-in standard log timestamp instead. Update the log prefix to:
    2023/06/07 14:44:11 gpupgrade:gpadmin:sdw-1:161174 [INFO]: State directory /home/gpadmin/.gpupgrade already present. Skipping.
from:
    20230607:14:44:11 gpupgrade:gpadmin:sdw-1:161174-[INFO]:-State directory /home/gpadmin/.gpupgrade already present. Skipping.
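One way to get a per-message timestamp in this layout with the standard `log` package is sketched below. It is an assumption about the approach, not the code from this commit: `newLogger`, `renderLine`, and `newFormat` are hypothetical names, and the `[INFO]` level is hard-coded for brevity. `log.LstdFlags` makes the library stamp each message at call time, and `log.Lmsgprefix` (Go 1.14+) places the prefix after the timestamp.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"log"
	"os"
	"regexp"
)

// newLogger lets the standard library stamp each message (log.LstdFlags)
// instead of baking a timestamp into the prefix once at construction time.
// log.Lmsgprefix moves the prefix from the start of the line to just
// before the message, so the timestamp comes first.
func newLogger(w io.Writer, user, host string, pid int) *log.Logger {
	prefix := fmt.Sprintf("gpupgrade:%s:%s:%d [INFO]: ", user, host, pid)
	return log.New(w, prefix, log.LstdFlags|log.Lmsgprefix)
}

// renderLine logs one message into a buffer and returns the formatted line.
func renderLine(msg string) string {
	var buf bytes.Buffer
	newLogger(&buf, "gpadmin", "sdw-1", 161174).Print(msg)
	return buf.String()
}

// newFormat matches "YYYY/MM/DD HH:MM:SS gpupgrade:user:host:pid [INFO]: message".
var newFormat = regexp.MustCompile(
	`^\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2} gpupgrade:\S+:\S+:\d+ \[INFO\]: .+\n$`)

func main() {
	logger := newLogger(os.Stdout, "gpadmin", "sdw-1", 161174)
	logger.Print("State directory /home/gpadmin/.gpupgrade already present. Skipping.")

	// Confirm a rendered line has the expected shape.
	fmt.Println(newFormat.MatchString(renderLine("example message")))
}
```

Because the timestamp is generated by the logger on every call, only the values that really are static (program, user, host, pid) remain in the prefix.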
Tsquery types cannot be part of a distribution key. When generating ALTER statements that change the tsquery datatype to the text datatype, there is no need to check whether the tsquery column is used as a distribution key. This was likely code left over from when gen_fix_tsquery_to_text.sql was copied from the script that handled deprecated name types.
Revert and finalize data migration scripts were generating duplicate tsquery partition table indexes. Indexes on partition tables using tsquery fall under the jurisdiction of two different data migration scripts:
1. recreate_partition_indexes_step
2. recreate_indexes_on_deprecated_built_in_types
To fix this, recreate_partition_indexes_step_1.sql will no longer generate CREATE INDEX statements for tsquery partition tables. This was decided based on the following considerations:
1. recreate_indexes_on_deprecated_built_in_types is a script with a narrower scope.
2. We do not want to create a script order dependency where a CREATE INDEX statement in recreate_partition_indexes_step depends on an ALTER statement in the corresponding gen_fix_tsquery_to_text/gen_change_tsquery_to_text
Altering a column type when the column is a partition key is not allowed. For this reason, the initialize migration scripts for the deprecated tsquery type do not generate an ALTER statement for tables partitioned on a tsquery column. The revert and finalize migration scripts should likewise not generate an ALTER statement for tables partitioned on a tsquery column. When recreating indexes on partition tables during revert and finalize, dropped child partition indexes are also brought back. This is a known behavior that has not been addressed, and tsquery partition index recreation is not immune to it. Commit where tsquery ALTER statements were disabled in initialize for tables partitioned on a tsquery column: 1c455b8
Update the name parameter to Run Tests for clarity when viewing on the GitHub Actions page.
No description provided.