Accelerator profiles and Habana integration - phase one - draft #45

chtyler · 2023-10-18T10:13:32Z

Description

Initial draft of the documentation for accelerator profiles and Habana integration with ODH.

How Has This Been Tested?

Merge criteria:

The commits are squashed in a cohesive manner and have meaningful messages.
Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
The developer has manually tested the changes and verified that the changes work.

modules/options-for-notebook-server-environments.adoc

jstourac · 2023-10-18T12:41:15Z

modules/options-for-notebook-server-environments.adoc

+* SciPy 1.10
+* Kfp-tekton 1.5
+* PyTorch 2.0
+* Elyra 3.15


It would also be nice to have all these sorted alphabetically. But that's optional and should probably be done in a separate commit. I'll try to propose this upstream in the project too.

Sure, Jan. Because of time constraints, we can do this in a forthcoming release. I'd prefer to create a separate Jira for this work, to keep it separate from this PR.

modules/adding-a-model-server-to-your-data-science-project.adoc

modules/creating-an-accelerator-profile.adoc

modules/deleting-an-accelerator-profile.adoc

modules/enabling-habana-gaudi-devices.adoc

modules/working-with-accelerator-profiles.adoc

grainnejenningsRH

A few questions and style guide suggestions.

modules/adding-a-model-server-to-your-data-science-project.adoc

modules/creating-a-project-workbench.adoc

modules/creating-an-accelerator-profile.adoc

modules/options-for-notebook-server-environments.adoc

modules/launching-jupyter-and-starting-a-notebook-server.adoc

modules/adding-a-model-server-to-your-data-science-project.adoc

modules/creating-a-project-workbench.adoc

modules/configuring-a-recommended-accelerator-tag.adoc

chtyler · 2023-10-25T20:22:21Z

@FedeAlonso, please see the latest changes in the most recent commit. I have addressed your feedback. If you notice anything else, please let me know. @jstourac, your suggested changes should now be applied.

jstourac

Hi Chris, I've added some more comments. Hopefully that's everything, except for Habana v1.11; we'll see what happens in the next few hours.

modules/enabling-habana-gaudi-devices.adoc

modules/options-for-notebook-server-environments.adoc

jstourac · 2023-10-27T13:23:19Z

modules/working-with-accelerator-profiles.adoc

+
+For accelerators that are new to your deployment, you must manually configure an accelerator profile for the accelerator. If your deployment contained NVIDIA GPUs before upgrading {productname-short}, an accelerator profile generates automatically after you upgrade to the latest version of {productname-short}, and resides in the `Instances` section of the `AcceleratorProfile` custom resource definition (CRD).
+
+If you add NVIDIA GPUs to your deployment after upgrading {productname-short}, or you add them to your deployment after you install a new version of the {productname-short} Operator, you must manually create an accelerator profile for your NVIDIA GPUs.


1/ I think we should not mention NVIDIA GPUs specifically in this doc unless we are giving a particular example. Should we instead use a generic term such as accelerator, hardware accelerator, hardware accelerator unit, or similar?

2/ The information in these two upgrade paragraphs is a one-time thing that applies only to the upgrade between two particular releases. We should put it into the upgrade notes instead: an AcceleratorProfile is created automatically if your deployment already had the hardware accelerator unit configured before the upgrade.

In general, this section should only contain a message like this:

For hardware accelerators that are new to your deployment, you must manually configure an accelerator profile for that particular hardware accelerator. If your deployment contained a hardware accelerator before the upgrade, it will be preserved after the upgrade.

WDYT?

I have reworded the paragraph based on your suggestion, using the term "accelerators".

I will update the downstream RHODS upgrade guides with information such as:

For hardware accelerators that are new to your deployment, you must manually configure an accelerator profile for that particular hardware accelerator. If your deployment contained a hardware accelerator before the upgrade, it will be preserved after the upgrade.

Thanks! 🙂

modules/overview-of-accelerators.adoc

modules/habana-gaudi-integration.adoc

jstourac

Thank you, I've added some more comments. LGTM otherwise.

modules/options-for-notebook-server-environments.adoc


jstourac

LGTM, thank you 🙂

chtyler · 2023-10-31T17:44:00Z

Merging.

chtyler requested a review from grainnejenningsRH October 18, 2023 10:43

jstourac reviewed Oct 18, 2023

modules/options-for-notebook-server-environments.adoc (outdated)

jstourac reviewed Oct 18, 2023

modules/options-for-notebook-server-environments.adoc (outdated)

jstourac reviewed Oct 18, 2023

FedeAlonso suggested changes Oct 18, 2023

chtyler force-pushed the DS-11819-accelerator-profiles-phase-1 branch from 2c71b07 to 8e84015 October 18, 2023 17:04

grainnejenningsRH requested changes Oct 20, 2023

grainnejenningsRH approved these changes Oct 26, 2023

FedeAlonso approved these changes Oct 26, 2023

jstourac reviewed Oct 27, 2023

chtyler force-pushed the DS-11819-accelerator-profiles-phase-1 branch from 01bc844 to ebbd80f October 31, 2023 15:52

jstourac suggested changes Oct 31, 2023

modules/options-for-notebook-server-environments.adoc (outdated)

modules/options-for-notebook-server-environments.adoc (outdated)

modules/options-for-notebook-server-environments.adoc (outdated)

chtyler added 5 commits October 31, 2023 17:26

Habana - phase one - draft (94a4537)

addressed peer review feedback for DS-11819 - Habana integration and accelerator profiles (66ff9cc)

DS-11819-accelerator-profiles-phase-1 removed references to 1-11 Habana operator - addressed further qe feedback (beb814e)

DS-11819 - corrected formatting issue in the assembly, fixed minor typo (f475f8c)

DS-11819 - corrected further qe feedback (403a740)

chtyler force-pushed the DS-11819-accelerator-profiles-phase-1 branch from 2c8386b to 403a740 October 31, 2023 17:27

jstourac approved these changes Oct 31, 2023

chtyler merged commit 2d39359 into opendatahub-io:main Oct 31, 2023
