Allow brute-cleanup to work with RAC #67

mfielding · 2021-04-05T20:58:25Z

brute-cleanup was originally designed for single-node Oracle Restart installs; on RAC clusters I ran into a few issues, so I'm making a few fixes:

- Making the existing service shutdown and CRS deconfigure steps run only on non-RAC installs (detected by the presence or absence of a cluster_name in the inventory file)
- Adding a somewhat hacky RAC hard shutdown using the OHASD shutdown handler, plus rootcrs.sh. Why not use Oracle's deinstall? I'd prefer not to depend on a response file that may or may not reflect current reality, and would rather do as generic a deconfigure/shutdown as possible.
- Adding TFA (Trace File Analyzer) to the kill list, as well as removing its init script
- I ran into a case where umount failed on /u01 even though lsof showed nothing in use. A lazy unmount (umount -l) seems to resolve this by simply detaching the mount, and fits the context of a "brute cleanup", especially since we're going to zero out the device later anyway. Ansible's mount module doesn't support this option, so we're reverting to shell here, with the unfortunate side effect that nonexistent mounts will generate (ignored) error messages.
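
A minimal Python sketch of the unmount fallback described above, assuming the cleanup runs as root and that a plain umount should always be tried before the lazy one. The function name brute_umount and its injectable runner/ismount parameters are hypothetical (they only make the logic testable without touching real mounts). Checking os.path.ismount first is one way to avoid the ignored errors for nonexistent mounts that the shell approach produces.

```python
import os
import subprocess
from typing import Callable, List

# A runner takes an argv list and returns the process exit status.
Runner = Callable[[List[str]], int]


def run_quietly(cmd: List[str]) -> int:
    """Run cmd, discard its output, and return its exit status."""
    return subprocess.run(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    ).returncode


def brute_umount(
    mountpoint: str,
    runner: Runner = run_quietly,
    ismount: Callable[[str], bool] = os.path.ismount,
) -> str:
    """Unmount mountpoint, falling back to a lazy unmount if it is busy.

    Returns "not mounted", "unmounted", "lazy", or "failed".
    """
    # Skip paths that are not mount points instead of letting umount error out.
    if not ismount(mountpoint):
        return "not mounted"
    # Normal unmount first; this fails if the kernel still sees the fs as busy.
    if runner(["umount", mountpoint]) == 0:
        return "unmounted"
    # Lazy unmount: detach the filesystem from the tree now; the kernel
    # finishes cleanup once nothing references it any more.
    if runner(["umount", "-l", mountpoint]) == 0:
        return "lazy"
    return "failed"
```

In the /u01 case above, where the plain umount fails but the lazy one succeeds, brute_umount("/u01") would return "lazy".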

jcnars · 2021-04-20T18:50:47Z

Tested the cleanup branch's roles/brute-ora-cleanup/tasks/main.yml file on a non-RAC host and confirmed that the newly added commits work as intended.

The run is documented in this gpaste (internal):
https://paste.googleplex.com/6116459345346560

This LGTM from a non-RAC standpoint.
Aiming to test this on a RAC cluster in the coming days.

jcnars · 2021-04-27T17:04:14Z

I was able to run this successfully multiple times for a single-node RAC cleanup.
https://paste.googleplex.com/6236106396794880 (internal)

mfielding added 2 commits April 5, 2021 11:30

Merge pull request #66 from google/mount

74e730d

Remount /dev/shm if necessary

mfielding requested a review from jcnars April 5, 2021 20:58

mfielding added 4 commits April 8, 2021 22:21

Clean up shmsegs, semaphores, kernel modules

e8a883e

A few more objects can be left behind if the CRS deconfig scripts fail for some reason: shared memory segments, semaphores, and kernel modules (ACFS, ASMFD, etc.). In the spirit of a brute cleanup, let's yank the stragglers the hard way.
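
The shared memory and semaphore part of this can be sketched in Python as below. It assumes the leftover IPC objects are owned by the oracle or grid OS users (a hypothetical default; adjust for the install) and relies on the standard util-linux tools: ipcs -m / ipcs -s to list segments and semaphore arrays, and ipcrm -m / ipcrm -s to remove them by ID. The helper names are illustrative, not taken from the playbook.

```python
import subprocess
from typing import Iterable, List

# Hypothetical default: OS users that own Oracle database / Grid Infrastructure IPC objects.
ORACLE_OWNERS = ("oracle", "grid")


def ipc_ids(ipcs_output: str, owners: Iterable[str] = ORACLE_OWNERS) -> List[str]:
    """Return IDs of IPC objects owned by `owners` in `ipcs -m` or `ipcs -s` output.

    Data rows look like: key id owner perms ...; the key is printed in hex.
    """
    wanted = set(owners)
    ids = []
    for line in ipcs_output.splitlines():
        fields = line.split()
        # Title and header lines do not start with a 0x... key, so they are skipped.
        if len(fields) >= 3 and fields[0].startswith("0x") and fields[2] in wanted:
            ids.append(fields[1])
    return ids


def ipcrm_commands(kind: str, ipcs_output: str) -> List[List[str]]:
    """Build ipcrm argv lists; kind is "-m" (shared memory) or "-s" (semaphores)."""
    return [["ipcrm", kind, ipc_id] for ipc_id in ipc_ids(ipcs_output)]


def remove_ipc_stragglers() -> None:
    """List and remove leftover Oracle-owned segments and semaphore arrays."""
    for kind in ("-m", "-s"):
        listing = subprocess.run(
            ["ipcs", kind], capture_output=True, text=True, check=True
        ).stdout
        for cmd in ipcrm_commands(kind, listing):
            # Brute cleanup: ignore failures (e.g. the object vanished meanwhile).
            subprocess.run(cmd, check=False)
```

Parsing is kept separate from execution so the owner filter can be checked against sample ipcs output before anything is removed.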

Remove kernel modules after asmlib cleanup

b60c203

Forcibly unloading oracleasm will impact the ability to identify ASM devices, so delay kernel module removal until disks and asmlib are already removed.
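
Within the module removal step itself, the kernel refuses to unload a module while another loaded module still uses it, so order matters there too. A minimal Python sketch, assuming the Oracle modules share an "oracle" name prefix (for example oracleacfs, oracleadvm, oracleoks, oracleafd, oracleasm; treat these names as an assumption for your release), reads the "used by" column of /proc/modules and orders removals so each module goes after its users. The function names are hypothetical.

```python
from typing import Dict, List, Set


def parse_proc_modules(text: str, prefix: str = "oracle") -> Dict[str, List[str]]:
    """Map each loaded module named prefix* to the loaded modules that use it.

    /proc/modules rows are: name size refcount used_by state address ...
    where used_by is "-" or a comma-terminated list like "oracleacfs,oracleadvm,".
    """
    users: Dict[str, List[str]] = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4 or not fields[0].startswith(prefix):
            continue
        users[fields[0]] = [m for m in fields[3].split(",") if m and m != "-"]
    return users


def unload_order(users: Dict[str, List[str]]) -> List[str]:
    """Return modules ordered so each one comes after every module that uses it."""
    order: List[str] = []
    done: Set[str] = set()

    def visit(mod: str, path: Set[str]) -> None:
        # `path` guards against cycles, which should not occur in practice.
        if mod in done or mod in path:
            return
        for user in users.get(mod, []):
            # Only schedule users we are tracking; other modules are left alone.
            if user in users:
                visit(user, path | {mod})
        done.add(mod)
        order.append(mod)

    for mod in sorted(users):
        visit(mod, set())
    return order
```

Each name in the resulting list could then be passed to rmmod in turn, once the disk and ASMLib cleanup has finished.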

Clean up process kill

bd81e7e

Using pkill instead of a pipeline avoids error messages when these processes don't exist, and thus avoids the need for complicated return code processing.
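
The return-code handling that pkill simplifies can be sketched in Python as follows. pkill (procps) exits 0 when at least one process matched, 1 when none matched, and 2 or 3 on a syntax or fatal error, so a brute cleanup can treat 0 and 1 alike and only stop on the others. The kill_matching name, the SIGKILL choice, and matching the full command line with -f are assumptions for illustration, not necessarily what the playbook does.

```python
import subprocess
from typing import Callable, List

Runner = Callable[[List[str]], int]


def run_quietly(cmd: List[str]) -> int:
    """Run cmd, discard its output, and return its exit status."""
    return subprocess.run(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    ).returncode


def kill_matching(pattern: str, runner: Runner = run_quietly) -> bool:
    """Send SIGKILL to every process whose full command line matches pattern.

    Returns True if something matched, False if nothing did.
    Raises RuntimeError on pkill's error statuses (2 = syntax, 3 = fatal).
    """
    status = runner(["pkill", "-9", "-f", pattern])
    if status not in (0, 1):
        raise RuntimeError(f"pkill exited {status} for pattern {pattern!r}")
    return status == 0
```

Looping kill_matching over a kill list then needs no special casing for processes that are already gone.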

Merge branch 'master' into cleanup

ccfeedb

Merge branch 'master' into cleanup

eaa8726

jcnars approved these changes Apr 27, 2021

mfielding merged commit 21f1984 into master Apr 27, 2021

mfielding deleted the cleanup branch April 27, 2021 17:19
