
✨ Short term accelerated compute instance #4120

Closed
9 tasks done
bcrawford-moj opened this issue Apr 19, 2024 · 17 comments

bcrawford-moj commented Apr 19, 2024

Describe the feature request.

Short term provisioning of an accelerated compute instance.

Describe the context.

In the BOLD programme we are producing a publication on the number of prisoners with children. We have developed a methodology involving LLMs which checks whether prison case notes imply the prisoner has a child. The output will be an Official Statistics in Development report due for publication around the end of May.

We used the AP to run the LLM over the case notes, but it is quite slow (it takes about a week to churn through them), and the QA process has highlighted some changes we need to make.

Value / Purpose

This will allow us to meet our publication deadline.

We believe this will be the first time LLMs have been used in producing official statistics (and indeed one of our models was fine-tuned using generated labeled data, so we think it will also be the first time genAI has been used).

We are happy to have associated costs journaled to the BOLD programme.

User Types

Data scientists

Proposed solution:

  • Get resource requirements from requestor
  • Select instance size
  • Add GPU-enabled node-group
  • Amend Control Panel code to allow specifying GPU requirements
  • Create a vscode limited release with GPU
  • Docs updated
  • Tests Green
  • Follow-up stories raised
  • Code reviewed
Ed-Bajo added the data-platform-apps-and-tools label (This issue is owned by Data Platform Apps and Tools) Apr 23, 2024
AntFMoJ mentioned this issue Apr 25, 2024
AntFMoJ (Contributor) commented Apr 26, 2024

@bcrawford-moj do you have a sense of the resource requirements at this time?

e.g. do you have an estimate of the file size of the input data being processed by the LLM?

yznlp commented Apr 26, 2024

Hi @AntFMoJ I'm the one working with the model.

I'm not too familiar with how the infrastructure/resource provisioning works, but I imagine we just need something similar to what's available on the AP in terms of CPU and RAM, but with a GPU with 16GB, or even 8GB, of VRAM.

Reasoning: the entire model runs on the AP by splitting the data into smaller chunks. For the LLM/transformer component it's a fairly small model (44M parameters) which should fit comfortably on 8GB of VRAM. The inputs are only 128 tokens (~words) long, and we will be able to scale the model to the amount of VRAM available.
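As a rough sanity check on that claim (a sketch only: the 44M parameters and 128-token inputs are from above; the fp16 weights, batch size, and hidden size are illustrative assumptions):

```python
# Back-of-envelope VRAM estimate for the model described above.
PARAMS = 44_000_000      # 44M parameters (from the comment above)
BYTES_PER_PARAM = 2      # fp16 weights; use 4 for fp32
SEQ_LEN = 128            # input length (from the comment above)
BATCH = 256              # illustrative batch size
HIDDEN = 512             # illustrative hidden dimension
LAYER_FACTOR = 12        # rough multiplier for per-layer activations

weights_gb = PARAMS * BYTES_PER_PARAM / 1e9
activations_gb = BATCH * SEQ_LEN * HIDDEN * BYTES_PER_PARAM * LAYER_FACTOR / 1e9

print(f"weights ~ {weights_gb:.2f} GB, activations ~ {activations_gb:.2f} GB")
# ~0.09 GB of weights plus ~0.4 GB of activations at this batch size,
# comfortably inside 8 GB of VRAM, as suggested above.
```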

Thanks for your help and please let me know if something doesn't make sense :)

BrianEllwood (Contributor) commented:

Based on the user's resource requirements we would recommend a p3.2xlarge instance (1× Tesla V100 with 16GB of VRAM).

BrianEllwood (Contributor) commented:

Draft PR created to build the GPU node group.

BrianEllwood (Contributor) commented:

Hi, @yznlp @bcrawford-moj

Can you please confirm the name of the AMI you have been using in your testing?

Thanks

bcrawford-moj (Author) commented:

I'm not exactly sure what the AMI name is. Is it what would appear in the dropdown on the control panel?

BrianEllwood (Contributor) commented:

Thanks for getting back to me, we will need to investigate further.

AntFMoJ (Contributor) commented May 3, 2024

GPU node group created and tested; configured a pod to access GPU resources, which worked correctly, although this required the taint and label to be removed temporarily. The next step is to resolve an issue with the taint/daemonset interaction.
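For reference, a minimal sketch of the pod-spec side of this, using the Python kubernetes client. The taint key/value, node label, image tag, and namespace are illustrative assumptions, not the actual node group's configuration:

```python
# Sketch: schedule a pod onto a tainted GPU node group and request one GPU.
# The taint ("gpu=true:NoSchedule"), node label, and image tag are assumed
# values for illustration only.
from kubernetes import client, config

config.load_kube_config()

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="gpu-smoke-test"),
    spec=client.V1PodSpec(
        restart_policy="Never",
        node_selector={"gpu": "true"},  # assumed node-group label
        tolerations=[
            # Without this toleration the taint blocks scheduling, which is
            # why removing the taint temporarily made the test work.
            client.V1Toleration(
                key="gpu", operator="Equal", value="true", effect="NoSchedule"
            )
        ],
        containers=[
            client.V1Container(
                name="cuda-check",
                image="nvidia/cuda:12.4.0-base-ubuntu22.04",
                command=["nvidia-smi"],
                resources=client.V1ResourceRequirements(
                    limits={"nvidia.com/gpu": "1"}  # claim exactly one GPU
                ),
            )
        ],
    ),
)

client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)
```

Daemonsets that must run on the GPU nodes (e.g. the NVIDIA device plugin) need an equivalent toleration, which is the taint/daemonset interaction mentioned above.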

BrianEllwood (Contributor) commented:

GPU node group and pod creation tested successfully:

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.161.07             Driver Version: 535.161.07   CUDA Version: 12.4     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Tesla V100-SXM2-16GB           On  | 00000000:00:1E.0 Off |                    0 |
| N/A   30C    P0              23W / 300W |      0MiB / 16384MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+

jacobwoffenden changed the title from "Short term accelerated compute instance" to "✨ Short term accelerated compute instance" May 13, 2024
AntFMoJ (Contributor) commented May 13, 2024

vscode deployed on the GPU-enabled node pool from the control panel dev environment.
Running ollama in the vscode terminal shows the GPU resource can't be found; CUDA drivers need to be added to the vscode image to resolve this.
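A quick way to confirm that symptom from inside the container (a sketch using only the standard library, with an optional PyTorch check at the end):

```python
# Diagnostic for the "GPU not found" symptom: check whether the NVIDIA
# user-space libraries are visible inside the container.
import shutil
import subprocess

if shutil.which("nvidia-smi") is None:
    # nvidia-smi missing from PATH usually means the image was built without
    # the CUDA/driver user-space libraries, matching the issue described above.
    print("nvidia-smi not found: CUDA libraries missing from the image")
else:
    print(subprocess.run(["nvidia-smi", "-L"], capture_output=True, text=True).stdout)

# Framework-level check, if PyTorch happens to be installed in the image:
try:
    import torch
    print("torch.cuda.is_available():", torch.cuda.is_available())
except ImportError:
    pass
```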

jacobwoffenden (Member) commented:

Summary:

NVIDIA drivers aren't yet available for Ubuntu 24.04, so we've moved to NVIDIA's CUDA base image by cutting a new release from 1.2.0, the last Ubuntu 22.04 release (ministryofjustice/analytical-platform-visual-studio-code#69). This deploys and is able to run Ollama with GPU capability.

jacobwoffenden (Member) commented May 16, 2024

Notes:

With the current taints/tolerations, only one GPU-enabled workload is schedulable per node, meaning each p3.2xlarge deployed is severely underutilised (pod max memory is 12G of the ~64G available). We need to experiment with whether GPU sharing is possible; if not, we might as well increase the GPU release to use a lot more RAM.

EDIT:

https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/gpu-sharing.html

https://aws.amazon.com/blogs/containers/gpu-sharing-on-amazon-eks-with-nvidia-time-slicing-and-accelerated-ec2-instances/
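For reference, the time-slicing approach in those links comes down to a small config consumed by the NVIDIA device plugin, which then advertises one physical GPU as several allocatable nvidia.com/gpu resources. A minimal sketch of creating it with the Python kubernetes client (the namespace, ConfigMap name, and replica count are assumptions; the config schema follows the NVIDIA docs linked above):

```python
# Sketch: time-slicing config so one V100 is advertised as several
# schedulable nvidia.com/gpu resources. Namespace, ConfigMap name, and
# replica count are illustrative.
from kubernetes import client, config

TIME_SLICING_CONFIG = """\
version: v1
sharing:
  timeSlicing:
    resources:
      - name: nvidia.com/gpu
        replicas: 4   # one physical GPU appears as 4 allocatable GPUs
"""

config.load_kube_config()
client.CoreV1Api().create_namespaced_config_map(
    namespace="gpu-operator",  # assumed namespace
    body=client.V1ConfigMap(
        metadata=client.V1ObjectMeta(name="time-slicing-config"),
        data={"any": TIME_SLICING_CONFIG},
    ),
)
```

Note that time-slicing shares compute but not memory: workloads packed onto the same V100 still contend for its 16GB of VRAM.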


BrianEllwood (Contributor) commented:

Follow-on tickets:

  • Documentation
  • GPU node scale down

bcrawford-moj (Author) commented:

Thanks very much for your work on this! Could you give the following users access please:

@bcrawford-moj
@carolinetudor
@yznlp
@DaoudC
@wmartin-gss

AntFMoJ (Contributor) commented May 24, 2024

@bcrawford-moj we have added the users you provided above, you should now be able to open Visual Studio Code:1.2.0-nvidia-cuda-base (GPU-Enabled) in the control panel.

If you or any of the above users have any issues please let me know.

Just to note, there is initially a limit on how many users can use the GPU at a time, so only one or two of your team will be able to deploy the GPU-enabled VSCode on the control panel. We have raised a story to improve on this limitation and will update you as it progresses.

bcrawford-moj (Author) commented:

Thanks so much! Very excited to use this. FYI we have a lot of annual leave over the next few weeks, so I wouldn't expect that limitation to be an issue in the short term.
