
Add the ability to provide "image based recovery" for setting up new environments#394

Open
simonpane wants to merge 1 commit into google:master from pythian:rel/oratk55-42

Conversation

@simonpane
Collaborator

Change Description:

This PR includes the changes required to support provisioning new systems from storage-level snapshot clones, which may be applicable to a number of different use cases, mostly with the aim of reducing installation and configuration time.

Solution Overview:

System configuration, software installation and patching, and database creation times can be minimized by building new environments from physical storage/disk clones. The toolkit can be used to re-configure Oracle Restart and databases on cloned storage by specifying the new --disk-snapshot-clone command line argument to the install-oracle.sh script.

NOTE: Google Cloud Machine images cannot be used, as they are incompatible with attached Hyperdisk volumes. See limitations and restrictions.

In addition to the required enhancements for reconfiguring cloned Oracle software and/or databases, this PR includes:

  • Breaking reusable sets of tasks out of the large active-copy.yml task file into several new, smaller, and more purpose-oriented task files, most of which are called from several other current or upcoming tasks. Specifically, the new task files include: oracle-net-setup.yml, password-file-backup.yml, password-file-copy.yml.
  • A new db-state role for starting/stopping the Oracle database, putting the database into or taking it out of backup mode, and setting the database's auto-start configuration. This role is called by tasks from the existing plays but can also be used independently through the new calling db-state.sh shell script. (Additional details below.)

Possible Customer Use Cases:

  1. Customers can build a "golden image" of the entire system, including database data (e.g. a seed version of the application data), and then use snapshot restores to restore the entire system without needing to install the software or create the database. This is similar to a "machine image", although machine images do not support Hyperdisks.
  2. Customers can build a new system with a patched or upgraded OS, then snapshot all non-boot volumes and copy them to the system with the patched OS.
  3. Customers can prepare snapshot images of the patched Oracle software (with no database created), to be used later to provision new machines, saving the software installation and patching time before creating the database.
  4. Customers can "swing" an existing database from one VM server of a lower Oracle software release to another of an equal or higher software release. (Certain restrictions/caveats apply to this option, as described in detail below.)


Using the Toolkit with Disk Clones

New Shell Script and Ansible Role db-state

Creating disk storage snapshots of Oracle Database systems often requires changing the database state beforehand. For example, the database may have to be stopped or put into "backup mode" prior to taking snapshots, and re-started or taken out of "backup mode" once the snapshots have been taken.

To facilitate this, a new shell script called db-state.sh and a new Ansible role, also called db-state, have been added.

The db-state.sh script requires at least one of the (self-explanatory) desired-state arguments:

$ ./db-state.sh
Command used:
./db-state.sh

Please specify at least one action: --start-database, --stop-database, --enable-backup-mode, --disable-backup-mode, --enable-autostart, --disable-autostart

Additionally, an inventory file specifying the managed host target is mandatory:

$ ./db-state.sh --enable-backup-mode
Command used:
./db-state.sh --enable-backup-mode

Please specify the inventory file using --inventory-file <file_name>

A typical example workflow:

./db-state.sh --inventory-file inventory_files/inventory_db-server19c_ORCL --disable-autostart --enable-backup-mode
< run steps to take snapshot of source disks>
./db-state.sh --inventory-file inventory_files/inventory_db-server19c_ORCL --enable-autostart --disable-backup-mode

Or similarly:

./db-state.sh --inventory-file inventory_files/inventory_db-server19c_ORCL --disable-autostart --stop-database
< run steps to take snapshot of source disks>
./db-state.sh --inventory-file inventory_files/inventory_db-server19c_ORCL --enable-autostart --start-database
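The two workflows above can be scripted end-to-end. The following is a minimal sketch assuming Google Cloud persistent disks and the `gcloud compute disks snapshot` command; the disk names, zone, and inventory file are placeholders for your environment, and for safety the script defaults to a dry run that only prints the commands:

```shell
#!/usr/bin/env bash
# Hypothetical snapshot wrapper combining db-state.sh with gcloud disk snapshots.
# INVENTORY, ZONE, and DISKS are placeholders; DRY_RUN=1 (the default) only
# prints the commands instead of executing them.
set -euo pipefail

INVENTORY="inventory_files/inventory_db-server19c_ORCL"
ZONE="us-central1-a"
DISKS=("oracle-data-disk-1" "oracle-data-disk-2")
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "${DRY_RUN}" = "1" ]; then
    echo "DRY-RUN: $*"
  else
    "$@"
  fi
}

# 1. Disable auto-start and put the database into backup mode.
run ./db-state.sh --inventory-file "${INVENTORY}" --disable-autostart --enable-backup-mode

# 2. Snapshot each disk (snapshots are taken serially; see the Case 4 caveats
#    regarding control file consistency when database disks are involved).
for disk in "${DISKS[@]}"; do
  run gcloud compute disks snapshot "${disk}" --zone "${ZONE}" \
      --snapshot-names "${disk}-$(date +%Y%m%d%H%M%S)"
done

# 3. Take the database out of backup mode and re-enable auto-start.
run ./db-state.sh --inventory-file "${INVENTORY}" --enable-autostart --disable-backup-mode
```

Set DRY_RUN=0 only once the disk names, zone, and inventory file match your environment.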

IMPORTANT: When taking a snapshot of a boot device for a system that does not use Grid Infrastructure & Oracle Restart, be sure to set the database startup flag to N in the /etc/oratab file prior to taking the online or offline disk snapshot. This is critical to prevent an inadvertent startup on a cloned system. The db-state.sh script can, optionally, be used to make this change as shown in the previous examples.
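As an illustration of the oratab change, the following sketch flips the auto-start flag on a sample file (the SID and Oracle Home path are hypothetical); on a real host, point ORATAB at /etc/oratab instead:

```shell
# Sketch: set the auto-start flag to N for every non-comment /etc/oratab entry
# before snapshotting. Uses a sample file; the SID and path are hypothetical.
ORATAB="${ORATAB:-/tmp/oratab.sample}"

cat > "${ORATAB}" <<'EOF'
# /etc/oratab format: $ORACLE_SID:$ORACLE_HOME:<N|Y>
ORCL:/u01/app/oracle/product/19.0.0/dbhome_1:Y
EOF

# Change the trailing Y to N on non-comment lines only.
sed -i 's/^\([^#].*\):Y[[:space:]]*$/\1:N/' "${ORATAB}"
grep '^ORCL' "${ORATAB}"   # prints: ORCL:/u01/app/oracle/product/19.0.0/dbhome_1:N
```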

Toolkit Usage when using Disk Snapshot Clones

A new command line argument to the installation shell script, --disk-snapshot-clone, is essential when working with systems restored from disk snapshots. Other (preexisting) arguments may also be required depending on the restoration scenario.

Case 1: Restoring a system where all disks, including the boot device, were restored from snapshot clones:

The toolkit supports cloning all disks (which would include the boot device, disks where the Oracle software is installed, and disks containing the Oracle database files) to a new VM instance:


This may be useful in situations where you want to create a "golden image" of an entire system, including the OS boot device, the Oracle software, and the Oracle database (for example with initial or seed data), and restore that to another system at a later date.

Regardless of whether the restored system uses ASM, some degree of software reconfiguration (such as updating the host name in Oracle Net files) will be required, as the new server name likely differs from that of the server from which the disk snapshots were created.
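As an illustration of the kind of reconfiguration involved, the following sketch substitutes the current host name into a sample listener.ora. This is not the toolkit's actual implementation, and the file path and host names are hypothetical:

```shell
# Sketch: replace the source server's host name in an Oracle Net file.
# NETFILE and the host names are hypothetical placeholders.
NETFILE="${NETFILE:-/tmp/listener.ora.sample}"
OLD_HOST="old-db-server"
NEW_HOST="$(hostname -s 2>/dev/null || echo new-db-server)"

cat > "${NETFILE}" <<'EOF'
LISTENER =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = old-db-server)(PORT = 1521))
  )
EOF

# Swap in the new server's host name.
sed -i "s/HOST = ${OLD_HOST}/HOST = ${NEW_HOST}/" "${NETFILE}"
grep 'HOST =' "${NETFILE}"
```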

Run the toolkit using the --disk-snapshot-clone argument at a minimum. Example:

./install-oracle.sh \
  --instance-ip-addr 10.0.1.100 \
  --ora-version 19 \
  --ora-swlib-bucket gs://[storage_bucket] \
  --ora-data-mounts data_mounts_config.json \
  --ora-asm-disks asm_disk_config.json \
  --backup-dest "+RECO" \
  --disk-snapshot-clone

IMPORTANT: When used with --disk-snapshot-clone, the toolkit will not re-install the Oracle software or re-create the database. It takes only the steps necessary to reconfigure the system, which may include reconfiguring HAS in the case of installations that include Grid Infrastructure.

If confident that no OS adjustments are required (as the boot partition was cloned), then optionally limit the toolkit's tasks using the --install-sw and --config-db arguments:

./install-oracle.sh \
  --instance-ip-addr 10.0.1.100 \
  --ora-version 19 \
  --ora-swlib-bucket gs://[storage_bucket] \
  --ora-data-mounts data_mounts_config.json \
  --ora-asm-disks asm_disk_config.json \
  --backup-dest "+RECO" \
  --disk-snapshot-clone --install-sw --config-db

Case 2: Restoring a system where all disks, excluding the boot device, were restored from snapshot clones:

It may be desirable to attach all Oracle software disks and Oracle database data disks to an already-built VM instance, for example when cloning to an upgraded or patched OS is required:


Unlike in the previous scenario, since the boot device is not restored and consequently has no Oracle-specific configuration, the toolkit must be run as normal, but with the inclusion of the --disk-snapshot-clone argument:

./install-oracle.sh \
  --instance-ip-addr 10.0.1.100 \
  --ora-version 19 \
  --ora-swlib-bucket gs://[storage_bucket] \
  --ora-data-mounts data_mounts_config.json \
  --ora-asm-disks asm_disk_config.json \
  --backup-dest "+RECO" \
  --disk-snapshot-clone

Again, no special steps are required regardless of whether the snapshot source was an Oracle Database that was shut down (aka an "offline" or "cold" backup) or in backup mode (aka an "online" or "hot" backup).

Case 3: Restoring a system from a snapshot that included the software installation only, and no database

To save the time required to install and patch the Oracle software, a snapshot clone of only the boot disk and the Oracle software disk(s) can be restored to a new VM:


Or, if desired, only the Oracle software disk(s) can be cloned from a snapshot and attached to an existing VM with a pre-configured boot disk and OS:


In such scenarios, similar to the previous cases, the toolkit can be used to re-configure the Oracle software installation by using the --disk-snapshot-clone argument. However, since no Oracle Database exists on the restored system, this should be done including the --install-sw argument to limit the toolkit's scope. For example:

./install-oracle.sh \
  --instance-ip-addr 10.0.1.100 \
  --ora-version 19 \
  --ora-swlib-bucket gs://[storage_bucket] \
  --ora-data-mounts data_mounts_config.json \
  --ora-asm-disks asm_disk_config.json \
  --backup-dest "+RECO" \
  --disk-snapshot-clone --install-sw

If the system was built by attaching the disk with the Oracle software (again, without a database), but without also including the boot device, simply also include the --prep-host argument:

./install-oracle.sh \
  --instance-ip-addr 10.0.1.100 \
  --ora-version 19 \
  --ora-swlib-bucket gs://[storage_bucket] \
  --ora-data-mounts data_mounts_config.json \
  --ora-asm-disks asm_disk_config.json \
  --backup-dest "+RECO" \
  --disk-snapshot-clone --prep-host --install-sw

After the software has been reconfigured for the new host, the rest of the toolkit can be run as normal and, critically, without the --disk-snapshot-clone argument. For example:

./install-oracle.sh \
  --instance-ip-addr 10.0.1.100 \
  --ora-version 19 \
  --ora-swlib-bucket gs://[storage_bucket] \
  --ora-data-mounts data_mounts_config.json \
  --ora-asm-disks asm_disk_config.json \
  --backup-dest "+RECO" \
  --config-db

Case 4: Restoring only Oracle database disks

A more advanced scenario involves "swinging" an Oracle database from one server to another. This can be accomplished by snapshotting only the disk devices with Oracle database files and then cloning/attaching them to another server with a pre-installed OS and an already installed Oracle Home (software).

The toolkit can of course be used to configure the OS and install and patch the required Oracle software on the target server (e.g. using the --skip-database-config argument) in advance.


A number of limitations, cautions, or caveats must be considered with this more advanced workflow:

  1. Only clone the database to a new system of the same major Oracle software release (e.g. Oracle Database 19c or 26ai). Otherwise a database upgrade will be required.
  2. The Oracle software on the target system should be at the same, or at a higher patch level (e.g. cloning an Oracle 19.3 database to a 19.26 patch-level Oracle Home).
  3. Consider the location of the database's parameter file or server parameter file (spfile). If the parameter file is not part of the cloned disk devices, it may need to be manually copied or re-created on the target server. Copying or re-creating the parameter file in this scenario is not performed by the toolkit.
  4. Consider the location and names of the database control files. The database's parameter file may need to be updated to point to the correct location or file names of the database's control files on the cloned and attached devices. Updating the parameter file to accurately list the names of the control files is not performed by the toolkit.
  5. Before taking snapshots of the disk devices for the source database, the database should be shutdown to prevent version inconsistency between multiplexed control files. Even if the database is in "backup mode", control files can become inconsistent since storage devices/disks are typically snapshotted serially without any guaranteed inter-snapshot consistency. Creating and restoring from a control file backup to prevent inconsistent versions between control files is currently not performed by the toolkit.
  6. Before taking snapshots of the disk devices for the source database, database auto-start should be disabled to prevent it from automatically (inadvertently) starting on the cloned and restored system prior to making any required adjustments such as those mentioned above. Disabling auto-start can optionally be achieved by using the db-state.sh script with the --disable-autostart argument; this command works for databases both with and without Oracle Restart management.
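For caveat 4, the manual fix might look like the following sketch, which repoints the control_files parameter in a pfile to the mount point where the cloned disks are attached. All file names and paths here are hypothetical placeholders:

```shell
# Sketch: update control_files in a pfile after attaching cloned disks.
# PFILE and all paths are hypothetical; adapt to your environment.
PFILE="${PFILE:-/tmp/initORCL.ora.sample}"

cat > "${PFILE}" <<'EOF'
*.control_files='/old_mount/oradata/ORCL/control01.ctl','/old_mount/fra/ORCL/control02.ctl'
*.db_name='ORCL'
EOF

# Replace the old mount point with the one where the cloned disks are attached.
sed -i 's|/old_mount|/new_mount|g' "${PFILE}"
grep control_files "${PFILE}"
```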

Once these conditions have all been satisfied, the toolkit can be run with the --config-db option to complete the configuration. For example:

./install-oracle.sh \
  --instance-ip-addr 10.0.1.100 \
  --ora-version 19 \
  --ora-swlib-bucket gs://[storage_bucket] \
  --ora-data-mounts data_mounts_config.json \
  --ora-asm-disks asm_disk_config.json \
  --backup-dest "+RECO" \
  --config-db

NOTE: After cloning a database of a lower patch level to an Oracle home of a higher patch level, the datapatch utility can be used to update the database ("SQL update") to match the version of the software binaries. Running datapatch requires downtime and can take some time; consequently, it may be desirable to perform this step during a future scheduled outage window.

Running datapatch automatically on a cloned system is not performed by the toolkit.
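If run manually, the datapatch step might look like the following sketch. The Oracle Home path and SID are placeholders, and for safety the script defaults to a dry run that only prints the command:

```shell
#!/usr/bin/env bash
# Sketch: manually running datapatch after cloning a lower-patch-level database
# into a higher-patch-level Oracle Home. Run as the Oracle software owner during
# a maintenance window. ORACLE_HOME and ORACLE_SID are placeholder defaults;
# DRY_RUN=1 (the default) only prints the command.
export ORACLE_HOME="${ORACLE_HOME:-/u01/app/oracle/product/19.0.0/dbhome_1}"
export ORACLE_SID="${ORACLE_SID:-ORCL}"
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "${DRY_RUN}" = "1" ]; then
    echo "DRY-RUN: $*"
  else
    "$@"
  fi
}

# The database (and, for multitenant, all PDBs) must be open before datapatch runs.
run "${ORACLE_HOME}/OPatch/datapatch" -verbose
```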

@simonpane simonpane self-assigned this Dec 19, 2025
@google-oss-prow

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: simonpane
Once this PR has been reviewed and has the lgtm label, please assign mfielding for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@simonpane simonpane requested review from AlexBasinov and mfielding and removed request for mfielding December 19, 2025 23:29
