Update xpu-max1100.yml with new config and add some tests #5668
base: master
Conversation
```shell
python -c "import torch; import intel_extension_for_pytorch; print('XPU available:', torch.xpu.is_available())"
apt-get update
apt-get install clinfo libaio-dev python3-pip -y
pip install torch==2.1.0.post2 -f https://developer.intel.com/ipex-whl-stable-xpu
```
Why do we need the versions from this index?
Versions from this index are closer to the intel_extension_for_pytorch release, and the wheels download reliably and quickly.
Apologies for the slow response on this - would it be possible to get torch from the main pytorch index and leave the Intel specific wheels from developer.intel?
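The split the reviewer is suggesting might look like the following sketch (the exact version pins are illustrative, not verified against this CI image):

```shell
# Sketch of the suggested split (versions illustrative):
# torch from the main PyPI index...
pip install torch==2.1.0
# ...and only the Intel-specific wheels from developer.intel
pip install intel-extension-for-pytorch-deepspeed==2.1.30 -f https://developer.intel.com/ipex-whl-stable-xpu
```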
Hi @Liangliang-Ma - could you resolve the merge conflicts so we can re-run and work on getting this merged?
@loadams Thanks for reminding me. I didn't notice that conflict.
.github/workflows/xpu-max1100.yml
Outdated
```shell
pip install intel-extension-for-pytorch-deepspeed==2.1.30 -f https://developer.intel.com/ipex-whl-stable-xpu
pip install oneccl_bind_pt==2.1.300+xpu -f https://developer.intel.com/ipex-whl-stable-xpu
pip install torchvision==0.16.0.post2 -f https://developer.intel.com/ipex-whl-stable-xpu
pip install py-cpuinfo pytest pytest-timeout tabulate tensorboard wandb transformers accelerate comet_ml mup numpy==1.26
```
Could we get these from the DeepSpeed requirements? e.g. `pip install .[dev,autotuning]` down below?
Yes sure. I removed redundant packages.
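A consolidated install step along the lines of the review suggestion might look like this sketch (which extras exist is defined by DeepSpeed's own setup.py, so the extras names here are taken from the reviewer's comment, not verified):

```shell
# Install DeepSpeed's declared dev/autotuning dependencies instead of
# listing individual packages in the workflow (sketch, per the review suggestion)
pip install .[dev,autotuning]
```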
```shell
cd tests/unit
pytest --verbose accelerator/*
pytest --verbose autotuning/*
pytest --verbose checkpoint/test_reshape_checkpoint.py
pytest --verbose checkpoint/test_moe_checkpoint.py
pytest --verbose checkpoint/test_shared_weights.py
pytest --verbose launcher/test_ds_arguments.py launcher/test_run.py
pytest --verbose moe/test_moe_tp.py
pytest --verbose model_parallelism/*
```
@Liangliang-Ma - how much total time do these add to the tests?
@loadams I modified the set of tests run in CI because of an error: FusedAdam in current DeepSpeed main is compatible with IPEX 2.3, but this CI environment is built for IPEX 2.1.3. IPEX 2.3 should be released in roughly a month; we can bring those tests back, and add more, then.
The workflow currently takes about 20 minutes to run. May I ask what total runtime would be considered appropriate?
This PR:
1. Changes the container.
2. Updates the software versions (to align with the Docker image's compiler).
3. Adds some tests.