Upgrade to CUDA 12 and latest versions #46
Conversation
Didn't we discuss removing tox? Using a tool that nobody on the team knows and that is used in only a single project is not a valid option, especially since everything can be achieved inside our environment with existing tools. Please remove it from the CI and from the instructions on how to run tests.
Feel free to keep it in your personal environment, but for the project we want to stay lean and within the scope of what other team members know.
I didn't do a full review.
  with:
    image: inference-pytorch-gpu
-   dockerfile: dockerfiles/pytorch/gpu/Dockerfile
+   dockerfile: dockerfiles/pytorch/Dockerfile
This looks to be the wrong place.
We have one image now. The only difference between GPU and CPU is the base image; the CUDA development image is the default.
e.g. for CPU:
build_args: "BASE_IMAGE=ubuntu:22.04"
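A minimal sketch of what the CPU build could then look like in the workflow, assuming it mirrors the GPU job above (the CPU image name and the exact inputs are illustrative assumptions, not taken from this PR). The Dockerfile would presumably declare an ARG BASE_IMAGE defaulting to the CUDA development image and consume it in its FROM line:

  with:
    image: inference-pytorch-cpu                # hypothetical CPU image name
    dockerfile: dockerfiles/pytorch/Dockerfile  # same single Dockerfile as the GPU build
    build_args: "BASE_IMAGE=ubuntu:22.04"       # overrides the default CUDA base image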
philschmid left a comment
Left some comments. Why are we using venv now? I thought we used plain Python? Isn't that why we removed conda?
-        converted_input = Conversation(
-            inputs["text"],
-            past_user_inputs=inputs.get("past_user_inputs", []),
-            generated_responses=inputs.get("generated_responses", []),
-        )
-        prediction = pipeline(converted_input, *args, **kwargs)
-        return {
-            "generated_text": prediction.generated_responses[-1],
-            "conversation": {
-                "past_user_inputs": prediction.past_user_inputs,
-                "generated_responses": prediction.generated_responses,
-            },
-        }
+        logging.info(f"Inputs: {inputs}")
+        logging.info(f"Args: {args}")
+        logging.info(f"KWArgs: {kwargs}")
+        prediction = pipeline(inputs, *args, **kwargs)
+        logging.info(f"Prediction: {prediction}")
+        return list(prediction)
If the conversational task is not needed in that format, can we remove the whole wrap_pipeline, since it's not used?
philschmid left a comment
Almost done. Left some suggestions on the Dockerfile and had one question related to the line length in one file.
Co-authored-by: Philipp Schmid <32632186+philschmid@users.noreply.github.com>
No description provided.