aws-hypervisor: restructure instance-data into per-node subdirectories#72

Open
fonta-rh wants to merge 3 commits into openshift-eng:main from fonta-rh:instance-data-subdirs
Conversation

@fonta-rh
Contributor

@fonta-rh fonta-rh commented May 18, 2026

Summary

  • Add get_node_dir / get_shared_dir helpers to common.sh with migration fallback for backward compatibility
  • Move per-node files (aws-instance-id, public_address, ssh_user, etc.) into instance-data/node-0/ subdirectory; shared files (network_stack_name, to_be_removed_cf_stack_list) stay at root
  • Update 6 openshift-clusters scripts with dual-path existence checks (node-0/ first, flat fallback)
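
The migration fallback described above could look roughly like the following. This is a minimal sketch, not the code in common.sh: the `INSTANCE_DATA_DIR` variable name and the exact fallback test are assumptions made for illustration. Only the helper names, the `node-0` default, and the `aws-instance-id` marker come from this PR.

```shell
#!/usr/bin/env bash
# Sketch only: INSTANCE_DATA_DIR is an assumed name for the instance-data root.
INSTANCE_DATA_DIR="${INSTANCE_DATA_DIR:-./instance-data}"
NODE_ID="${NODE_ID:-node-0}"

# Shared files (network_stack_name, to_be_removed_cf_stack_list) stay at the root.
get_shared_dir() {
    printf '%s\n' "${INSTANCE_DATA_DIR}"
}

# Per-node files live under <root>/<NODE_ID>/. If that directory does not exist
# but a legacy flat aws-instance-id file does, return the flat path instead so
# that older layouts keep working.
get_node_dir() {
    local node_dir="${INSTANCE_DATA_DIR}/${NODE_ID}"
    if [[ ! -d "${node_dir}" && -f "${INSTANCE_DATA_DIR}/aws-instance-id" ]]; then
        printf '%s\n' "${INSTANCE_DATA_DIR}"
    else
        printf '%s\n' "${node_dir}"
    fi
}
```

A caller would then read per-node state with something like `cat "$(get_node_dir)/public_address"` without needing to know which layout is on disk.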

Part of the multi-node groundwork series (OCPEDGE-2608). Follow-up to #70 (split CF template).

Test plan

  • make shellcheck passes
  • make create writes per-node files to instance-data/node-0/ and shared files to instance-data/
  • make start / make stop still work via migration fallback (unported scripts)
  • make info reads instance data correctly via fallback
  • make destroy tears down both stacks and cleans up instance-data/ including node subdirs
  • openshift-clusters scripts (make clean, make deploy, etc.) find instance data in both layouts

🤖 Generated with Claude Code

Summary by CodeRabbit

Release Notes

  • New Features

    • Added support for multi-node deployments with configurable node identifiers and per-node state management.
    • Instance data and configuration now isolated per node across deployment operations.
  • Chores

    • Updated deployment scripts and templates to support multi-node configurations while maintaining backward compatibility with existing single-node setups.

Move per-node files (aws-instance-id, public_address, ssh_user, etc.)
into instance-data/node-0/ while keeping shared files (network_stack_name,
to_be_removed_cf_stack_list) at the root. This prepares the layout for
multi-node support (OCPEDGE-2608).

Add get_node_dir/get_shared_dir helpers to common.sh with a migration
fallback: if node-0/ doesn't exist but flat files do, get_node_dir
returns the flat path so unported scripts keep working.

Update openshift-clusters scripts with dual-path existence checks
(node-0/ first, flat fallback) for backward compatibility.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@openshift-ci openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 18, 2026
@openshift-ci

openshift-ci Bot commented May 18, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@coderabbitai

coderabbitai Bot commented May 18, 2026

Walkthrough

This PR lays groundwork for multi-node deployments: it introduces a NODE_ID configuration variable and directory-resolution helpers, and updates the AWS hypervisor and OpenShift cluster scripts to store instance data in node-scoped directories instead of a single flat directory, while keeping backward compatibility with legacy single-node deployments.

Changes

Multi-node deployment infrastructure

| Layer / File(s) | Summary |
| --- | --- |
| **Multi-node configuration and directory helpers**<br>`deploy/aws-hypervisor/instance.env.template`, `deploy/aws-hypervisor/scripts/common.sh` | `NODE_ID` environment variable (defaults to `node-0`) is introduced. New helper functions `get_shared_dir()` and `get_node_dir()` resolve node-specific paths with automatic fallback to the shared directory when a legacy `aws-instance-id` marker exists, supporting both new and old deployment layouts. |
| **Create script: per-node state storage**<br>`deploy/aws-hypervisor/scripts/create.sh` | Initializes `node_dir` and `shared_dir` paths. Capacity-reservation metadata, CloudFormation stack events, instance outputs (instance-id, public/private IPs, stack names), SSH credentials, and completion markers are all stored under `node_dir`. Network stack orchestration metadata is stored under `shared_dir`. |
| **Destroy script: per-node cleanup**<br>`deploy/aws-hypervisor/scripts/destroy.sh` | Reads instance data and capacity-reservation metadata from `node_dir`. Reservation cleanup is guarded by file existence. Final shared-directory cleanup checks whether the directory exists before attempting removal. |
| **Lifecycle operation scripts**<br>`deploy/aws-hypervisor/scripts/force-stop.sh`, `init.sh`, `inventory.sh`, `print_instance_data.sh`, `ssh.sh`, `start.sh`, `stop.sh` | All scripts consistently resolve instance connection details (instance-id, public_address, ssh_user) from `node_dir` instead of hardcoded shared paths, using the new helper functions. |
| **OpenShift cluster scripts: backward compatibility**<br>`deploy/openshift-clusters/scripts/clean-spoke.sh`, `clean.sh`, `deploy-cluster.sh`, `deploy-fencing-assisted.sh`, `full-clean.sh`, `patch-nodes.sh` | Instance-data detection now succeeds when either the new `node-0/aws-instance-id` path or the legacy `aws-instance-id` path exists, enabling coexistence of both deployment layouts. |
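
For illustration, the dual-path detection summarized for the OpenShift cluster scripts might be written as below. `has_instance_data` is a hypothetical helper name; the scripts in this PR may well inline the check rather than wrap it in a function. The two `aws-instance-id` paths are the ones named in the summary.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if instance data exists in either layout.
# $1 is the instance-data root directory.
has_instance_data() {
    local root="$1"
    # New layout first (node-0/ subdirectory), then the legacy flat layout.
    [[ -f "${root}/node-0/aws-instance-id" || -f "${root}/aws-instance-id" ]]
}
```

A script would typically gate its work on this, for example `has_instance_data "${root}" || { echo "no instance data found" >&2; exit 1; }`, where `root` stands for whatever path the script already uses for instance-data.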

🎯 3 (Moderate) | ⏱️ ~25 minutes

@openshift-ci

openshift-ci Bot commented May 18, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: fonta-rh

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 18, 2026
fonta-rh and others added 2 commits May 18, 2026 14:56
Scripts using hardcoded flat instance-data paths failed after create.sh
moved per-node files into node-0/ subdirectories. The migration fallback
in get_node_dir only helps scripts that call it — these didn't.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Cache get_node_dir/get_shared_dir in create.sh for consistency with
  all other scripts in the PR
- Move .done marker write inside directory guard in destroy.sh so
  cleanup is idempotent when instance-data was already removed
- Fix missing trailing newline in destroy.sh

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@fonta-rh fonta-rh marked this pull request as ready for review May 18, 2026 15:09
@openshift-ci openshift-ci Bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 18, 2026
@openshift-ci openshift-ci Bot requested review from eggfoobar and jerpeter1 May 18, 2026 15:09

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@deploy/openshift-clusters/scripts/clean-spoke.sh`:
- Around line 42-43: The INSTANCE_IP assignment can cause the script to exit
under set -o errexit if both files are missing; update the command that sets
INSTANCE_IP to fall back to an empty string instead of letting the subshell fail
(e.g. change the substitution to use "|| echo \"\"" so it becomes
INSTANCE_IP=$(cat ... 2>/dev/null || cat ... 2>/dev/null || echo "")). This
mirrors the SSH_USER fallback pattern and ensures later validation logic (used
after INSTANCE_IP) runs instead of the script exiting prematurely.
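
To make the errexit concern concrete: under `set -o errexit`, a plain assignment such as `VAR=$(cat missing-file)` takes the exit status of the command substitution, so the script stops at that line. The sketch below shows the suggested fallback chain. `read_instance_ip` and its `data_dir` argument are hypothetical names used only for this example; the `public_address` file name and the `node-0/` layout come from the PR.

```shell
#!/usr/bin/env bash
set -o errexit

# Hypothetical wrapper: print the public address from the new layout, then the
# flat layout, or an empty string if neither file is readable. The trailing
# `echo ""` makes the chain return success, so the caller's assignment does not
# trip errexit.
read_instance_ip() {
    local data_dir="$1"
    cat "${data_dir}/node-0/public_address" 2>/dev/null \
        || cat "${data_dir}/public_address" 2>/dev/null \
        || echo ""
}
```

With this shape, `INSTANCE_IP=$(read_instance_ip "${data_dir}")` always succeeds, and a later `[[ -z "${INSTANCE_IP}" ]]` check can report the missing file with a useful message instead of the script exiting silently.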
ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 7674ec47-98d7-41e1-92b7-3aeb56a3aadb

📥 Commits

Reviewing files that changed from the base of the PR and between 305cb4f and 41a102c.

📒 Files selected for processing (17)
  • deploy/aws-hypervisor/instance.env.template
  • deploy/aws-hypervisor/scripts/common.sh
  • deploy/aws-hypervisor/scripts/create.sh
  • deploy/aws-hypervisor/scripts/destroy.sh
  • deploy/aws-hypervisor/scripts/force-stop.sh
  • deploy/aws-hypervisor/scripts/init.sh
  • deploy/aws-hypervisor/scripts/inventory.sh
  • deploy/aws-hypervisor/scripts/print_instance_data.sh
  • deploy/aws-hypervisor/scripts/ssh.sh
  • deploy/aws-hypervisor/scripts/start.sh
  • deploy/aws-hypervisor/scripts/stop.sh
  • deploy/openshift-clusters/scripts/clean-spoke.sh
  • deploy/openshift-clusters/scripts/clean.sh
  • deploy/openshift-clusters/scripts/deploy-cluster.sh
  • deploy/openshift-clusters/scripts/deploy-fencing-assisted.sh
  • deploy/openshift-clusters/scripts/full-clean.sh
  • deploy/openshift-clusters/scripts/patch-nodes.sh
