Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Accelerator profiles and Habana integration - phase one - draft #45

Merged

Conversation

chtyler
Copy link
Contributor

@chtyler chtyler commented Oct 18, 2023

Description

Initial draft on accelerator profiles and Habana integration with ODH.

How Has This Been Tested?

Merge criteria:

  • The commits are squashed in a cohesive manner and have meaningful messages.
  • Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
  • The developer has manually tested the changes and verified that the changes work

* SciPy 1.10
* Kfp-tekton 1.5
* PyTorch 2.0
* Elyra 3.15
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be also nice to have all these sorted alphabetically. But that's an optional thing and probably should be done in a separate commit. I'll try to propose this in upstream of the project too.

Copy link
Contributor Author

@chtyler chtyler Oct 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure Jan, because of time constraints, we can do this in a forthcoming release. I'd prefer to create a separate Jira for this work, just to keep it separate from this.

modules/creating-an-accelerator-profile.adoc Outdated Show resolved Hide resolved
modules/creating-an-accelerator-profile.adoc Outdated Show resolved Hide resolved
modules/deleting-an-accelerator-profile.adoc Outdated Show resolved Hide resolved
modules/deleting-an-accelerator-profile.adoc Outdated Show resolved Hide resolved
modules/deleting-an-accelerator-profile.adoc Outdated Show resolved Hide resolved
modules/enabling-habana-gaudi-devices.adoc Show resolved Hide resolved
@chtyler chtyler force-pushed the DS-11819-accelerator-profiles-phase-1 branch from 2c71b07 to 8e84015 Compare October 18, 2023 17:04
Copy link
Collaborator

@grainnejenningsRH grainnejenningsRH left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few questions and style guide suggestions.

modules/creating-a-project-workbench.adoc Outdated Show resolved Hide resolved
modules/creating-a-project-workbench.adoc Outdated Show resolved Hide resolved
modules/creating-an-accelerator-profile.adoc Outdated Show resolved Hide resolved
modules/options-for-notebook-server-environments.adoc Outdated Show resolved Hide resolved
modules/creating-a-project-workbench.adoc Outdated Show resolved Hide resolved
modules/configuring-a-recommended-accelerator-tag.adoc Outdated Show resolved Hide resolved
modules/configuring-a-recommended-accelerator-tag.adoc Outdated Show resolved Hide resolved
@chtyler
Copy link
Contributor Author

chtyler commented Oct 25, 2023

@FedeAlonso - please see the latest changes from the most recent commit. I have addressed your feedback. If you notice anything else, please let me know. @jstourac - your suggested changes should now be applied.

Copy link
Contributor

@jstourac jstourac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Chris, I've put some more comments. Hopefully it's all. Except Habana v1.11. We'll see what happens in next hours.

modules/enabling-habana-gaudi-devices.adoc Outdated Show resolved Hide resolved
modules/options-for-notebook-server-environments.adoc Outdated Show resolved Hide resolved

For accelerators that are new to your deployment, you must manually configure an accelerator profile for the accelerator. If your deployment contained NVIDIA GPUs before upgrading {productname-short}, an accelerator profile generates automatically after you upgrade to the latest version of {productname-short}, and resides in the `Instances` section of the `AcceleratorProfile` custom resource definition (CRD).

If you add NVIDIA GPUs to your deployment after upgrading {productname-short}, or you add them to your deployment after you install a new version of the {productname-short} Operator, you must manually create an accelerator profile for your NVIDIA GPUs.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1/ I think we should not mention NVIDIA GPU specifically in this doc unless we are giving some particular example. We should rather use some generic term like accelerator, hardware accelerator, hardware accelerator unit or similar?

2/ These two upgrade information - it's just a one time thing for the upgrade between two particular releases. We should put this note into some Upgrade Notes - that AcceleratorProfile will be created automatically if your deployment has had the hardware accelerator unit pre-configured before the upgrade already.

In general, here we should only have some message like this:

For hardware accelerators that are new to your deployment, you must manually configure an accelerator profile for that particular hardware accelerator. If your deployment contained a hardware accelerator before the upgrade, it will be preserved after the upgrade.

WDYT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have reworded the parapgraph based on your suggestion. I have used the term "accelerators".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will update the downstream RHODS upgrade guides with information such as:

For hardware accelerators that are new to your deployment, you must manually configure an accelerator profile for that particular hardware accelerator. If your deployment contained a hardware accelerator before the upgrade, it will be preserved after the upgrade.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! 🙂

modules/overview-of-accelerators.adoc Outdated Show resolved Hide resolved
modules/habana-gaudi-integration.adoc Outdated Show resolved Hide resolved
@chtyler chtyler force-pushed the DS-11819-accelerator-profiles-phase-1 branch from 01bc844 to ebbd80f Compare October 31, 2023 15:52
Copy link
Contributor

@jstourac jstourac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, some more comments. LGTM otherwise.

modules/options-for-notebook-server-environments.adoc Outdated Show resolved Hide resolved
modules/options-for-notebook-server-environments.adoc Outdated Show resolved Hide resolved
modules/options-for-notebook-server-environments.adoc Outdated Show resolved Hide resolved
@chtyler chtyler force-pushed the DS-11819-accelerator-profiles-phase-1 branch from 2c8386b to 403a740 Compare October 31, 2023 17:27
Copy link
Contributor

@jstourac jstourac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you 🙂

@chtyler
Copy link
Contributor Author

chtyler commented Oct 31, 2023

Merging.

@chtyler chtyler merged commit 2d39359 into opendatahub-io:main Oct 31, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants