[QA] Test the habana notebooks. #243

Closed
9 tasks done
harshad16 opened this issue Oct 6, 2023 · 5 comments
Labels
area/QA (This issue is assigned to Quality assurance), feature/habana

Comments

@harshad16
Member

harshad16 commented Oct 6, 2023

Tasks:

  • Find out how to get a cluster with HPUs available.
  • Implement quality assurance
    • Set up the cluster with RHODS (from Jenkins, using the IIB value for 1.34).
    • Make sure the Habana imagestream is present.
    • Spin up the Habana notebook with the HPU accelerator.
      • Follow these docs for setting up the accelerator profile: Docs 1, Docs 2
    • Test the Habana image with the tutorial.
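For the "cluster with HPU available" and "spin up the notebook with the HPU accelerator" steps, a quick sanity check is whether any node actually advertises Gaudi devices. A minimal Python sketch, assuming the Habana device plugin exposes them as the extended resource `habana.ai/gaudi` and that the input is the output of `oc get nodes -o json`; the function name `nodes_with_hpu` and the sample node data are made up for illustration.

```python
import json
from typing import Dict

# Assumption: Gaudi HPUs are advertised under this extended resource name
# by the Habana device plugin; change it if your cluster differs.
HPU_RESOURCE = "habana.ai/gaudi"


def nodes_with_hpu(nodes_json: str, resource: str = HPU_RESOURCE) -> Dict[str, int]:
    """Map node name -> allocatable HPU count, for nodes that expose HPUs.

    `nodes_json` is the text printed by `oc get nodes -o json`.
    """
    data = json.loads(nodes_json)
    found = {}
    for node in data.get("items", []):
        allocatable = node.get("status", {}).get("allocatable", {})
        # Allocatable values are quantity strings; extended resources are integers.
        count = int(allocatable.get(resource, "0"))
        if count > 0:
            found[node["metadata"]["name"]] = count
    return found


# Illustrative input only: two fake nodes, one of them with 8 HPUs.
sample = json.dumps({"items": [
    {"metadata": {"name": "worker-dl1"},
     "status": {"allocatable": {"cpu": "96", HPU_RESOURCE: "8"}}},
    {"metadata": {"name": "worker-cpu"},
     "status": {"allocatable": {"cpu": "16"}}},
]})

print(nodes_with_hpu(sample))  # {'worker-dl1': 8}
```

An empty result on a cluster that is supposed to have Habana hardware suggests the device plugin or operator is not set up yet, which is worth ruling out before debugging the notebook image itself.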

Follow this issue for additional checks: #355

harshad16 added the area/QA label on Oct 16, 2023
harshad16 changed the title from "Test the habana notebooks." to "[QA] Test the habana notebooks." on Oct 16, 2023
@jstourac
Contributor

jstourac commented Oct 18, 2023

Optional negative testing scenarios:

  • On a cluster without Habana, the initialization should fail.
  • On a cluster with a Habana operator that doesn't match our image, the initialization should fail.

We should also perform the check in a disconnected environment!
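The two negative scenarios boil down to a pre-flight check that must refuse to start. A minimal sketch of the expected pass/fail behaviour, written so it could be turned into an automated test; `preflight`, `HabanaInitError` and the rule that "matching" means the same major.minor version (operator 1.10.x for the v1.10 image) are assumptions for illustration, not how the image actually validates its environment. The disconnected-environment check is not covered here.

```python
from typing import Optional, Tuple


class HabanaInitError(RuntimeError):
    """Raised when the hypothetical pre-flight check rejects the cluster."""


def major_minor(version: str) -> Tuple[int, int]:
    """Parse 'v1.10', '1.10.0' or '1.10.0-494' into (1, 10)."""
    parts = version.lstrip("v").split(".")
    return int(parts[0]), int(parts[1].split("-")[0])


def preflight(hpu_count: int, operator_version: Optional[str], image_version: str) -> None:
    """Hypothetical check: refuse to start unless HPUs exist and versions match."""
    if hpu_count == 0:
        raise HabanaInitError("no Habana HPU available on this cluster")
    if operator_version is None:
        raise HabanaInitError("Habana operator is not installed")
    # Assumption: same major.minor counts as "matching our image".
    if major_minor(operator_version) != major_minor(image_version):
        raise HabanaInitError(
            f"operator {operator_version} does not match image {image_version}"
        )


def fails(**kwargs) -> bool:
    """True if preflight() rejects the given scenario."""
    try:
        preflight(**kwargs)
    except HabanaInitError:
        return True
    return False


# Negative scenarios from the comment above, plus one positive control.
print(fails(hpu_count=0, operator_version=None, image_version="v1.10"))      # True
print(fails(hpu_count=8, operator_version="1.11.0", image_version="v1.10"))  # True
print(fails(hpu_count=8, operator_version="1.10.0", image_version="v1.10"))  # False
```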

@jstourac
Contributor

jstourac commented Oct 18, 2023

Current status: the feature has been successfully checked with v1.10. Based on discussion, we also need to check v1.11; the relevant operator should be available in the latest 4.12.x OCP release.

Update: I have checked that even with the latest 4.12.39 OCP release, there is no Habana operator in version v1.11 yet.

Link to the operator for reference.

Automation - TBD.

@jstourac
Contributor

jstourac commented Oct 18, 2023

One more question came up: can users use Habana in the managed variant of the RHODS product? I'm not sure I fully understand all the differences between the self-managed and managed products. In the managed case on AWS, I suppose the customer should be able to perform all the configuration necessary to utilize the Habana hardware, correct? But I am not sure they can choose the particular location where the cluster is provisioned 🤔 (e.g. us-east-1). At the moment, Habana hardware is restricted to the us-east-1 and us-west-2 locations.


Update - info from Erwand:

To add to this, there are two main ways to get OpenShift on AWS:
- IPI install of OpenShift (lots of flexibility)
- Managed OpenShift (ROSA/OSD) install (less flexibility)

In the past, not all instance types have been supported by managed OpenShift.
I just verified, and it seems that ROSA at least supports the DL1 instance type:

❯ rosa list instance-types | grep dl
dl1.24xlarge       accelerated_computing  96         768.0 GiB

As such, it should be possible to make Habana work with both self-managed and managed installation types.
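The `grep` above works interactively; for an automated check (automation was still marked TBD earlier in this thread), the same listing could be parsed and combined with the region restriction mentioned above. A rough sketch that assumes the whitespace-separated column layout shown in the output above; the function names and the second sample row are illustrative, and the region set reflects only what was stated in this thread.

```python
from typing import Dict, List

# Per the discussion above (Oct 2023), Habana hardware on AWS was only
# offered in these regions; treat this set as a snapshot, not a rule.
HABANA_REGIONS = {"us-east-1", "us-west-2"}


def find_instance_types(rosa_output: str, prefix: str = "dl1") -> List[Dict[str, object]]:
    """Parse `rosa list instance-types` text; keep rows whose ID starts with prefix."""
    rows = []
    for line in rosa_output.splitlines():
        fields = line.split()
        # Assumed row layout: <id> <category> <cpu_cores> <memory value> <memory unit>
        if len(fields) >= 4 and fields[0].startswith(prefix):
            rows.append({
                "id": fields[0],
                "category": fields[1],
                "cpu_cores": int(fields[2]),
                "memory": " ".join(fields[3:]),
            })
    return rows


def habana_possible(rosa_output: str, region: str) -> bool:
    """True if a DL1 instance type is listed and the region is one that offers it."""
    return bool(find_instance_types(rosa_output)) and region in HABANA_REGIONS


# The first row is the one quoted above; the second is illustrative.
sample = """\
dl1.24xlarge       accelerated_computing  96         768.0 GiB
m5.xlarge          general_purpose        4          16.0 GiB
"""

print(find_instance_types(sample))
print(habana_possible(sample, "us-east-1"))  # True
print(habana_possible(sample, "eu-west-1"))  # False
```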

@jstourac
Contributor

jstourac commented Oct 30, 2023

For the record: for the RHODS 1.34 release, it was decided to go with the Habana v1.10 image only; see #297.

@harshad16
Member Author

Marking this as complete. Thanks for the work 💯
Additional work will be tracked here: #355

Projects
Status: Done
Status: No status
Archived in project
Development

No branches or pull requests

2 participants