Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MGMT-10542: Allow users to skip writing over specific disks - SaaS #4199

Merged
merged 1 commit into from
Aug 4, 2022

Conversation

omertuc
Copy link
Contributor

@omertuc omertuc commented Jul 28, 2022

tl;dr This is an implementation of a feature that gives users the
ability to tell Assisted to not format disks that it otherwise would.

Background

After the assisted-installer writes the OCP operating system to the
host's installation disk and then reboots the host, we can only hope
that the disk that will boot after that reboot is the installation disk
and not some other device.

After the reboot, one of the following three things could happen:

  1. The discovery ISO media gets booted, again. This is bad, but we can
    detect this and let the user know.

  2. Some disk other than the installation disk gets booted. This is bad,
    and there's nothing we can do about it. We can't even tell for sure if
    that happened.

  3. The installation disk boots and everything is fine and installation
    / cluster initialization proceeds as expected.

On UEFI based systems we can usually make (3) happen using efibootmgr,
on BIOS based systems we cannot influence the boot process in the same
way. However, there is one thing yet we try to do to avoid (2) - during
installation the assisted-service asks the assisted-installer to attempt
to format the boot partition of some of the disks that the
assisted-agent recognized as "Bootable" (these are disks that have an
MBR bootloader as their first sector).

By attempting to clear the boot sector of these disks, we hope that BIOS
will no longer recognize these disks as bootable and so will not try to
automatically boot from them, thus preventing (2).

The issue

ALL disks that we recognize as bootable (save for some special
exceptions) get formatted, all for the sake of ensuring the user won't
run into confusing boot order issues. This is an opinionated decision we
made to improve the ease of use of the Assisted Installer, but it comes
at a cost of destroying user disks, some of which could potentially be
important and might be used by the cluster / server hardware. Some users
may be unhappy with this decision and may want to have more control.

The solution

Give users an abiltiy to opt out of formatting particular disk(s). When
users choose to opt out of formatting, the UI (not as part of this PR)
will show them a warning explaining that it is on them to ensure they
won't run into any boot order issues, and it's up to them to take care
of it.

Implementation

Two new API changes have been introduced -

  • New read-only disks_to_be_formatted column added to every host. This
    column is managed by the service. It's a comma separated list of disks
    that are automatically planned to undergo formatting once installation
    begins, unless otherwise set to be skipped by skip_formatting_disks.
    The reason this new API was added is because we would like the new UI
    for this feature to also show the users which disks are planned to
    undergo formatting.

  • New user-updatable skip_formatting_disks column. Can be modified
    through a newly added host UPDATE API. The structure of this new
    UPDATE API is a list of disks, and for each disk, whether the user
    wants it removed-from or added-to the skip_formatting_disks list. In
    order to make sure that direct API users didn't make a typo (which
    could lead to potentially harmful formatting), this API would also
    validate that the disk ID that the user asked to skip actually appears
    in the inventory. This also ensures the user doesn't choose to skip a
    disk before the inventory arrives - as that could lead to the
    installation disk being chosen to be one of the skipped disks, which
    would be awkward.

Finally, when determining which disks the assisted-service should ask
the assisted-installer to format, the assisted-service will consult both
disks_to_be_formatted and skip_formatting_disks to decide on the
final list (the final list is a simple set(disks_to_be_formatted)
minus set(skip_formatting_disks))

Installation disk

I've also added a blocking validation to hosts which checks that the
installation disk of the host does not appear in the host's
skip_formatting_disks list. This validation would be the simplest way
to block the user from asking to "skip the formatting of the
installation disk" (which would obviously be non-sensical, as the
installation disk has to be formatted). Another way to solve this would
be to block the user during the host UPDATE API call, but that would be
mixing / tightly coupling the installation disk choice API with the new
skip formatting disks API. An "offline" validation happening in the
background is simpler.

Missing skip-formatting disks validation

If for some reason one of the disks that is set to be skipped no longer
appears in the inventory, a new validation will block installation. This
is to ensure that if the user reboots the system and somehow the disk
they asked to skip the formatting of changed ID, then they would have to
manually and consciously remove it from the skip_formatting_disks list.

List all the issues related to this PR

  • New Feature
  • Enhancement
  • Bug fix
  • Tests
  • Documentation
  • CI/CD

What environments does this code impact?

  • Automation (CI, tools, etc)
  • Cloud
  • Operator Managed Deployments
  • None

How was this code tested?

  • assisted-test-infra environment
  • dev-scripts environment
  • Reviewer's test appreciated
  • Waiting for CI to do a full test run
  • Manual (Elaborate on how it was tested)
    Manual API interaction, unit-tests
  • [] No tests needed

Assignees

/cc @
/cc @

Checklist

  • Title and description added to both, commit and PR.
  • Relevant issues have been associated (see CONTRIBUTING guide)
  • Reviewers have been listed
  • This change does not require a documentation update (docstring, docs, README, etc)
  • Does this change include unit-tests (note that code changes require unit-tests)

Reviewers Checklist

  • Are the title and description (in both PR and commit) meaningful and clear?
  • Is there a bug required (and linked) for this change?
  • Should this PR be backported?

@omertuc omertuc changed the title MGMT-10542: Allow users to skip writing over specific disks - SaaS WIP - MGMT-10542: Allow users to skip writing over specific disks - SaaS Jul 28, 2022
@openshift-ci openshift-ci bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jul 28, 2022
@omertuc omertuc force-pushed the skip branch 3 times, most recently from f649c8f to 302d959 Compare July 28, 2022 19:54
@openshift-ci openshift-ci bot requested review from flaper87 and tsorya July 28, 2022 19:56
@openshift-ci openshift-ci bot added the api-review Categorizes an issue or PR as actively needing an API review. label Jul 28, 2022
@openshift-ci
Copy link

openshift-ci bot commented Jul 28, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: omertuc

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 28, 2022
@codecov
Copy link

codecov bot commented Jul 28, 2022

Codecov Report

Merging #4199 (cab82a0) into master (cab82a0) will not change coverage.
The diff coverage is n/a.

❗ Current head cab82a0 differs from pull request most recent head 032b258. Consider uploading reports for the commit 032b258 to get more accurate results

Impacted file tree graph

@@           Coverage Diff           @@
##           master    #4199   +/-   ##
=======================================
  Coverage   65.86%   65.86%           
=======================================
  Files         188      188           
  Lines       26482    26482           
=======================================
  Hits        17443    17443           
  Misses       7442     7442           
  Partials     1597     1597           

@omertuc
Copy link
Contributor Author

omertuc commented Jul 29, 2022

/retest

@omertuc omertuc force-pushed the skip branch 3 times, most recently from 0d63dcf to 4db7f30 Compare July 30, 2022 14:51
@openshift-ci openshift-ci bot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jul 30, 2022
@omertuc omertuc force-pushed the skip branch 4 times, most recently from a100351 to 270f20b Compare July 31, 2022 21:30
@omertuc omertuc changed the title WIP - MGMT-10542: Allow users to skip writing over specific disks - SaaS MGMT-10542: Allow users to skip writing over specific disks - SaaS Jul 31, 2022
@openshift-ci openshift-ci bot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Jul 31, 2022
@omertuc omertuc force-pushed the skip branch 3 times, most recently from 59a253a to 520dd32 Compare August 3, 2022 09:25
@omertuc
Copy link
Contributor Author

omertuc commented Aug 3, 2022

/retest

@omertuc
Copy link
Contributor Author

omertuc commented Aug 3, 2022

/retest

Copy link
Contributor

@avishayt avishayt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few minor comments, but otherwise really nice job

swagger.yaml Outdated
Comment on lines 4069 to 4071
A comma seperated list of host disks that the service will avoid
formatting no matter what. This parameter can be modified via
v2UpdateHost
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
A comma seperated list of host disks that the service will avoid
formatting no matter what. This parameter can be modified via
v2UpdateHost
A comma-seperated list of host disks that the service will avoid
formatting.

swagger.yaml Outdated
Comment on lines 4060 to 4064
A comma separated list of disks that are automatically planned to
undergo formatting once installation begins, unless otherwise set to
be skipped by skip_formatting_disks. This means that this list also
includes disks that appear in skip_formatting_disks. This parameter
is managed by the service and cannot be modified by the user.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
A comma separated list of disks that are automatically planned to
undergo formatting once installation begins, unless otherwise set to
be skipped by skip_formatting_disks. This means that this list also
includes disks that appear in skip_formatting_disks. This parameter
is managed by the service and cannot be modified by the user.
A comma-separated list of disks will be formatted
once installation begins, unless otherwise set to
be skipped by skip_formatting_disks. This means that this list also
includes disks that appear in skip_formatting_disks. This property
is managed by the service and cannot be modified by the user.

func (v *validator) noSkipInstallationDisk(c *validationContext) (ValidationStatus, string) {
const (
successMessage string = "No request to skip formatting of the installation disk"
failureMessage string = "Requesting to skip the formatting of the installation disk is not allowed. The installation disk has to be formatted. Please either change this host's installation disk or don't skip the formatting of the installation disk"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
failureMessage string = "Requesting to skip the formatting of the installation disk is not allowed. The installation disk has to be formatted. Please either change this host's installation disk or don't skip the formatting of the installation disk"
failureMessage string = "Requesting to skip the formatting of the installation disk is not allowed. The installation disk must be formatted. Please either change this host's installation disk or do not skip the formatting of the installation disk"

return ValidationFailure, failureMessage
}

func (v *validator) noSkipMissingDisk(c *validationContext) (ValidationStatus, string) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should never happen because the identifiers that we now use are hardware-based, but I appreciate your paranoia :)

tl;dr This is an implementation of a feature that gives users the
ability to tell Assisted to not format disks that it otherwise would.

# Background

After the assisted-installer writes the OCP operating system to the
host's installation disk and then reboots the host, we can only hope
that the disk that will boot after that reboot is the installation disk
and not some other device.

After the reboot, one of the following three things could happen:

1) The discovery ISO media gets booted, again. This is bad, but we can
detect this and let the user know.

2) Some disk other than the installation disk gets booted. This is bad,
and there's nothing we can do about it. We can't even tell for sure if
that happened.

3) The installation disk boots and everything is fine and installation
/ cluster initialization proceeds as expected.

On UEFI based systems we can usually make (3) happen using `efibootmgr`,
on BIOS based systems we cannot influence the boot process in the same
way. However, there is one thing yet we try to do to avoid (2) - during
installation the assisted-service asks the assisted-installer to attempt
to format the boot partition of some of the disks that the
assisted-agent recognized as "Bootable" (these are disks that have an
MBR bootloader as their first sector).

By attempting to clear the boot sector of these disks, we hope that BIOS
will no longer recognize these disks as bootable and so will not try to
automatically boot from them, thus preventing (2).

# The issue

ALL disks that we recognize as bootable (save for some special
exceptions) get formatted, all for the sake of ensuring the user won't
run into confusing boot order issues. This is an opinionated decision we
made to improve the ease of use of the Assisted Installer, but it comes
at a cost of destroying user disks, some of which could potentially be
important and might be used by the cluster / server hardware. Some users
may be unhappy with this decision and may want to have more control.

# The solution

Give users an abiltiy to opt out of formatting particular disk(s). When
users choose to opt out of formatting, the UI (not as part of this PR)
will show them a warning explaining that it is on them to ensure they
won't run into any boot order issues, and it's up to them to take care
of it.

# Implementation

Two new API changes have been introduced -

* New read-only `disks_to_be_formatted` column added to every host. This
  column is managed by the service. It's a comma separated list of disks
  that are automatically planned to undergo formatting once installation
  begins, unless otherwise set to be skipped by `skip_formatting_disks`.
  The reason this new API was added is because we would like the new UI
  for this feature to also show the users which disks are planned to
  undergo formatting.

* New user-updatable `skip_formatting_disks` column. Can be modified
  through a newly added host UPDATE API. The structure of this new
  UPDATE API is a list of disks, and for each disk, whether the user
  wants it removed-from or added-to the `skip_formatting_disks` list. In
  order to make sure that direct API users didn't make a typo (which
  could lead to potentially harmful formatting), this API would also
  validate that the disk ID that the user asked to skip actually appears
  in the inventory. This also ensures the user doesn't choose to skip a
  disk before the inventory arrives - as that could lead to the
  installation disk being chosen to be one of the skipped disks, which
  would be awkward.

Finally, when determining which disks the assisted-service should ask
the assisted-installer to format, the assisted-service will consult both
`disks_to_be_formatted` and `skip_formatting_disks` to decide on the
final list (the final list is a simple `set(disks_to_be_formatted)`
minus `set(skip_formatting_disks)`)

# Installation disk

I've also added a blocking validation to hosts which checks that the
installation disk of the host does not appear in the host's
`skip_formatting_disks` list. This validation would be the simplest way
to block the user from asking to "skip the formatting of the
installation disk" (which would obviously be non-sensical, as the
installation disk has to be formatted). Another way to solve this would
be to block the user during the host UPDATE API call, but that would be
mixing / tightly coupling the installation disk choice API with the new
skip formatting disks API. An "offline" validation happening in the
background is simpler.

# Missing skip-formatting disks validation

If for some reason one of the disks that is set to be skipped no longer
appears in the inventory, a new validation will block installation. This
is to ensure that if the user reboots the system and somehow the disk
they asked to skip the formatting of changed ID, then they would have to
manually and consciously remove it from the `skip_formatting_disks` list.
@avishayt
Copy link
Contributor

avishayt commented Aug 4, 2022

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Aug 4, 2022
@gamli75
Copy link
Contributor

gamli75 commented Aug 4, 2022

/test edge-e2e-ai-operator-ztp

@openshift-ci-robot
Copy link

/retest-required

Remaining retests: 2 against base HEAD d31baa6 and 8 for PR HEAD 032b258 in total

@openshift-ci
Copy link

openshift-ci bot commented Aug 4, 2022

@omertuc: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-ci openshift-ci bot merged commit b0d095d into openshift:master Aug 4, 2022
swagger.yaml Show resolved Hide resolved
danielerez pushed a commit to danielerez/assisted-service that referenced this pull request Oct 15, 2023
…penshift#4199)

tl;dr This is an implementation of a feature that gives users the
ability to tell Assisted to not format disks that it otherwise would.

# Background

After the assisted-installer writes the OCP operating system to the
host's installation disk and then reboots the host, we can only hope
that the disk that will boot after that reboot is the installation disk
and not some other device.

After the reboot, one of the following three things could happen:

1) The discovery ISO media gets booted, again. This is bad, but we can
detect this and let the user know.

2) Some disk other than the installation disk gets booted. This is bad,
and there's nothing we can do about it. We can't even tell for sure if
that happened.

3) The installation disk boots and everything is fine and installation
/ cluster initialization proceeds as expected.

On UEFI based systems we can usually make (3) happen using `efibootmgr`,
on BIOS based systems we cannot influence the boot process in the same
way. However, there is one thing yet we try to do to avoid (2) - during
installation the assisted-service asks the assisted-installer to attempt
to format the boot partition of some of the disks that the
assisted-agent recognized as "Bootable" (these are disks that have an
MBR bootloader as their first sector).

By attempting to clear the boot sector of these disks, we hope that BIOS
will no longer recognize these disks as bootable and so will not try to
automatically boot from them, thus preventing (2).

# The issue

ALL disks that we recognize as bootable (save for some special
exceptions) get formatted, all for the sake of ensuring the user won't
run into confusing boot order issues. This is an opinionated decision we
made to improve the ease of use of the Assisted Installer, but it comes
at a cost of destroying user disks, some of which could potentially be
important and might be used by the cluster / server hardware. Some users
may be unhappy with this decision and may want to have more control.

# The solution

Give users an abiltiy to opt out of formatting particular disk(s). When
users choose to opt out of formatting, the UI (not as part of this PR)
will show them a warning explaining that it is on them to ensure they
won't run into any boot order issues, and it's up to them to take care
of it.

# Implementation

Two new API changes have been introduced -

* New read-only `disks_to_be_formatted` column added to every host. This
  column is managed by the service. It's a comma separated list of disks
  that are automatically planned to undergo formatting once installation
  begins, unless otherwise set to be skipped by `skip_formatting_disks`.
  The reason this new API was added is because we would like the new UI
  for this feature to also show the users which disks are planned to
  undergo formatting.

* New user-updatable `skip_formatting_disks` column. Can be modified
  through a newly added host UPDATE API. The structure of this new
  UPDATE API is a list of disks, and for each disk, whether the user
  wants it removed-from or added-to the `skip_formatting_disks` list. In
  order to make sure that direct API users didn't make a typo (which
  could lead to potentially harmful formatting), this API would also
  validate that the disk ID that the user asked to skip actually appears
  in the inventory. This also ensures the user doesn't choose to skip a
  disk before the inventory arrives - as that could lead to the
  installation disk being chosen to be one of the skipped disks, which
  would be awkward.

Finally, when determining which disks the assisted-service should ask
the assisted-installer to format, the assisted-service will consult both
`disks_to_be_formatted` and `skip_formatting_disks` to decide on the
final list (the final list is a simple `set(disks_to_be_formatted)`
minus `set(skip_formatting_disks)`)

# Installation disk

I've also added a blocking validation to hosts which checks that the
installation disk of the host does not appear in the host's
`skip_formatting_disks` list. This validation would be the simplest way
to block the user from asking to "skip the formatting of the
installation disk" (which would obviously be non-sensical, as the
installation disk has to be formatted). Another way to solve this would
be to block the user during the host UPDATE API call, but that would be
mixing / tightly coupling the installation disk choice API with the new
skip formatting disks API. An "offline" validation happening in the
background is simpler.

# Missing skip-formatting disks validation

If for some reason one of the disks that is set to be skipped no longer
appears in the inventory, a new validation will block installation. This
is to ensure that if the user reboots the system and somehow the disk
they asked to skip the formatting of changed ID, then they would have to
manually and consciously remove it from the `skip_formatting_disks` list.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api-review Categorizes an issue or PR as actively needing an API review. approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants