
Feature/gpu support extended #87

Open

wants to merge 76 commits into base: develop
Conversation

@ksatzke ksatzke commented Oct 12, 2020

This PR adds the capability to execute Python KNIX functions in sandboxes using NVIDIA GPU resources, for both Ansible and Helm deployments of KNIX. GPU nodes are detected and configured automatically. The Kubernetes configuration required for deployments with GPU nodes is described in README_GPU_Installation.md.

Subsumes #11 and fixes #79.

ksatzke and others added 24 commits July 10, 2020 14:38
…nding values.yml capability definition example
Comment on lines 46 to 57
# Install dlib for CUDA
RUN git clone https://github.com/davisking/dlib.git
RUN mkdir -p /dlib/build

RUN cmake -H/dlib -B/dlib/build -DDLIB_USE_CUDA=1 -DUSE_AVX_INSTRUCTIONS=1
RUN cmake --build /dlib/build

RUN cd /dlib; python3 /dlib/setup.py install

# Install the face recognition package and tensorflow
RUN pip3 install face_recognition
RUN pip3 install tensorflow==2.1.0
Member
I am not sure why we need to install all these custom libraries for GPU usage.

If the workflows need these libraries, they should be specified in the function requirements.
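As the reviewer suggests, workflow-specific libraries could instead be declared per function. A hypothetical requirements list for the face-recognition example above might look like the following (package names are taken from the Dockerfile; the pinning is illustrative, and note that the Dockerfile builds dlib from source to pass explicit CUDA flags, which a plain pip install may not guarantee):

```
face_recognition
tensorflow==2.1.0
dlib
```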

@@ -449,7 +449,7 @@ def _get_state_names_and_resource(self, desired_state_type, wf_dict):
return state_list


-def add_workflow(self,name,filename=None):
+def add_workflow(self,name,filename=None, gpu_usage="None"):
Member

Should read: gpu_usage=None
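The distinction matters because the string "None" is truthy, so any later check like `if gpu_usage:` would treat the default as a GPU request. A minimal sketch of the problem (the guard and the return shape are hypothetical, not the actual KNIX code):

```python
def add_workflow(name, filename=None, gpu_usage=None):
    # Hypothetical guard: treat only a real value as a GPU request.
    # Had the default been the string "None", bool(gpu_usage) would be
    # True and the workflow would wrongly request GPU resources.
    wants_gpu = gpu_usage is not None
    return {"name": name, "filename": filename, "gpu": wants_gpu}

# The string default misfires because bool("None") is True.
assert bool("None") is True
```

With this signature, `add_workflow("wf")` reports no GPU request, while `add_workflow("wf", gpu_usage="50")` does.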

@@ -21,7 +21,7 @@ NAMES := $(YAML:%.yaml=%)
.PHONY: $(NAMES)
default: prepare_packages install

-install: init_once riak elasticsearch fluentbit datalayer sandbox management nginx
+install: init_once installnvidiadocker riak elasticsearch fluentbit datalayer frontend sandbox management nginx
Member
I think the 'frontend' component does not exist anymore.

What happens if the host does not have any NVIDIA GPUs? Will 'installnvidiadocker' still succeed?
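One way to make the install step safe on CPU-only hosts is to detect NVIDIA hardware before attempting it. A minimal sketch (the detection heuristics are assumptions for illustration, not the PR's actual Ansible logic):

```python
import shutil
import subprocess

def host_has_nvidia_gpu():
    """Best-effort NVIDIA GPU detection on a Linux host."""
    # If a driver is already installed, nvidia-smi is on the PATH.
    if shutil.which("nvidia-smi"):
        return True
    # Otherwise fall back to scanning the PCI bus, if lspci is available.
    lspci = shutil.which("lspci")
    if lspci:
        out = subprocess.run([lspci], capture_output=True, text=True).stdout
        return "NVIDIA" in out
    return False
```

An installer target could then skip the nvidia-docker setup entirely when this returns False, instead of failing partway through.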

@@ -107,6 +118,7 @@ image_java: \

 push: image image_java
 	$(call push_image,microfn/sandbox)
+	$(call push_image,microfn/sandbox_gpu)
Member

Should this also push microfn/sandbox_java_gpu?

The dependencies of the Makefile target also need to be updated.
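A sketch of what the reviewer may be asking for (the image_gpu and image_java_gpu target names are assumptions; only microfn/sandbox_gpu appears in the diff above):

```make
# Hypothetical: push both GPU image variants and make the target depend
# on the rules that build them.
push: image image_java image_gpu image_java_gpu
	$(call push_image,microfn/sandbox)
	$(call push_image,microfn/sandbox_gpu)
	$(call push_image,microfn/sandbox_java_gpu)
```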

gpu_hosts[hostname] = hostip

# instruct hosts to start the sandbox and deploy workflow
if runtime=="Java" or sandbox_image_name == "microfn/sandbox": # can use any host
Member

I thought we had the "microfn/sandbox_java_gpu" image?
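A sketch of the selection rule the reviewer is probing (the image set and function shape are assumptions based on the snippet above, not the actual management code):

```python
# Hypothetical: GPU images may only be scheduled on hosts that have
# NVIDIA GPUs; all other images, including the Java sandbox, can use
# any host. If a Java GPU image exists, checking only the runtime
# string "Java" would wrongly schedule it on CPU-only hosts.
GPU_IMAGES = {"microfn/sandbox_gpu", "microfn/sandbox_java_gpu"}

def eligible_hosts(sandbox_image_name, all_hosts, gpu_hosts):
    if sandbox_image_name in GPU_IMAGES:
        return gpu_hosts
    return all_hosts
```

For example, with hosts {"a", "b"} of which only "b" has a GPU, a request for microfn/sandbox_java_gpu would be restricted to "b", while microfn/sandbox could land on either.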

ksatzke and others added 25 commits January 12, 2021 13:54
@iakkus iakkus mentioned this pull request May 16, 2021
Successfully merging this pull request may close these issues:
Enable dynamic GPU scheduling
3 participants