add support for H100s #111

Merged
merged 14 commits into from
Dec 10, 2023

thelinuxkid
Contributor

What does this PR do?

Adds support for H100 GPUs.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Was this discussed/approved via a GitHub issue or the Discord/Slack channel? Please add a link
    to it if that's the case.
  • Did you write any new necessary tests?

Who can review?

@arnavgarg1
@tgaddair

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

POC

image

@flozi00
Collaborator

flozi00 commented Dec 7, 2023

https://github.com/enricai/lorax/blob/58b3c0c7a97f3e43355af4965afc09668e58f571/Dockerfile#L40

Could you set the PyTorch version in Docker to 2.1.1 too?

@thelinuxkid
Contributor Author

https://github.com/enricai/lorax/blob/58b3c0c7a97f3e43355af4965afc09668e58f571/Dockerfile#L40

Could you set the PyTorch version in Docker to 2.1.1 too?

@flozi00 Done!!

@thelinuxkid
Contributor Author

Build issue should be fixed now

Contributor

@tgaddair tgaddair left a comment

LGTM! I'll go ahead and land, and then do some robustness tests after the new image goes out.

Contributor

@tgaddair tgaddair left a comment

@thelinuxkid based on the tests, it looks like we actually need to add peft back as a dependency.

@tgaddair
Contributor

tgaddair commented Dec 9, 2023

@thelinuxkid alternatively, would you mind giving me edit access to the branch / PR so I can make these changes while I test things out?

@tgaddair tgaddair mentioned this pull request Dec 9, 2023
@thelinuxkid
Contributor Author

@tgaddair just added, will wait for tests to see if they turn out OK this time

@tgaddair
Contributor

Hey @thelinuxkid, apologies, but can you add these to pyproject.toml as well and update the other dep files:

boto3 = "^1.28.34"
urllib3 = "<=1.26.18"
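For readers unfamiliar with the two constraint styles above, here is a minimal Python sketch of what they allow. It assumes Poetry-style semantics for pyproject.toml, where `^1.28.34` means `>=1.28.34,<2.0.0`. The helper names `parse_version` and `satisfies` are hypothetical and only handle plain numeric versions; they are not part of lorax or Poetry.

```python
import re


def parse_version(text):
    """Turn '1.28.34' into (1, 28, 34). Only plain numeric versions are handled."""
    if not re.fullmatch(r"\d+(\.\d+)*", text):
        raise ValueError(f"unsupported version string: {text!r}")
    parts = tuple(int(p) for p in text.split("."))
    # Pad short versions such as '2' or '1.26' to three components.
    return parts + (0,) * (3 - len(parts))


def satisfies(version, constraint):
    """Check a version against a caret ('^') or '<=' constraint (sketch only)."""
    v = parse_version(version)
    if constraint.startswith("^"):
        low = parse_version(constraint[1:])
        if low[0] == 0:
            # Caret rules differ for 0.x versions; deliberately not covered here.
            raise ValueError("caret with major version 0 is not handled in this sketch")
        # Assumed Poetry semantics: at least `low`, below the next major version.
        return low <= v < (low[0] + 1, 0, 0)
    if constraint.startswith("<="):
        return v <= parse_version(constraint[2:])
    raise ValueError(f"unsupported constraint: {constraint!r}")


if __name__ == "__main__":
    print(satisfies("1.33.0", "^1.28.34"))   # inside the caret range
    print(satisfies("2.0.7", "<=1.26.18"))   # above the upper bound
```

Under these assumptions, the urllib3 line keeps the project on the 1.x series, while the boto3 line accepts any 1.x release from 1.28.34 onward.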

@thelinuxkid
Contributor Author

@tgaddair Done! Also gave you access as maintainer to our fork

Contributor

@tgaddair tgaddair left a comment

Ran a few tests earlier today with these changes. Everything looks good on my end. Merging!

@tgaddair tgaddair merged commit 49af104 into predibase:main Dec 10, 2023
1 check failed
@thelinuxkid
Contributor Author

Thanks @tgaddair! When will this be available in the Docker image? I'm not able to run with flash attn successfully. I built it, but at runtime I still get an error saying it doesn't exist.
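One way to narrow down an error like this is to check whether the package is visible to the same Python interpreter that runs the server. The sketch below is not taken from the PR. It assumes the import name is `flash_attn` (the thread only says "flash attn"), and `module_available` is a hypothetical helper.

```python
import importlib.util


def module_available(name):
    """Return True if the import system of this interpreter can locate `name`."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # `flash_attn` is an assumed import name; adjust it if the build installs
    # the extension under a different module name.
    for name in ("torch", "flash_attn"):
        status = "found" if module_available(name) else "NOT found"
        print(f"{name}: {status}")
```

A "NOT found" result would suggest the build was installed into a different environment or image layer than the one used at runtime. It does not confirm that this is the cause of the error reported here.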

3 participants