
Update xpu-max1100.yml with new config and add some tests #5668

Open · wants to merge 8 commits into master
Conversation

Liangliang-Ma
Contributor

This PR:
1. Changes the container.
2. Updates the software versions (to align with the Docker image's compiler).
3. Adds some tests.
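
For context, a minimal sketch of how these changes might sit in .github/workflows/xpu-max1100.yml. The runner labels, container image tag, and step names are illustrative assumptions, not the PR's actual values:

# Hypothetical excerpt of xpu-max1100.yml; labels and image are placeholders
name: xpu-max1100
on:
  pull_request:
    branches: [master]
jobs:
  unit-tests:
    runs-on: [self-hosted, intel, xpu]   # assumed runner labels
    container:
      image: intel/intel-extension-for-pytorch:2.1.30-xpu   # assumed image tag
    steps:
      - uses: actions/checkout@v4
      - name: Install dependencies
        run: |
          apt-get update
          apt-get install clinfo libaio-dev python3-pip -y
          pip install torch==2.1.0.post2 -f https://developer.intel.com/ipex-whl-stable-xpu
      - name: Verify the XPU device is visible
        run: python -c "import torch; import intel_extension_for_pytorch; print('XPU available:', torch.xpu.is_available())"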

# Sanity-check that PyTorch can see the XPU device inside the container
python -c "import torch; import intel_extension_for_pytorch; print('XPU available:', torch.xpu.is_available())"
# Install system dependencies and the XPU build of torch
apt-get update
apt-get install clinfo libaio-dev python3-pip -y
pip install torch==2.1.0.post2 -f https://developer.intel.com/ipex-whl-stable-xpu
Contributor

Why do we need the versions from this index?

Contributor Author

Versions from this index are closer to the intel_extension_for_pytorch release, and the wheels can be downloaded stably and quickly.

Contributor

Apologies for the slow response on this - would it be possible to get torch from the main PyTorch index and keep only the Intel-specific wheels from developer.intel?
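
A sketch of the split being asked for here; whether this exact torch version exists on the default PyPI index, and whether it interoperates with the XPU IPEX wheels, are assumptions:

# torch from the main PyPI index (version is an assumption)
pip install torch==2.1.0
# Intel-specific wheels still from developer.intel
pip install intel_extension_for_pytorch==2.1.30+xpu -f https://developer.intel.com/ipex-whl-stable-xpu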

@loadams
Contributor

loadams commented Jul 2, 2024

Hi @Liangliang-Ma - could you resolve the merge conflicts so we can re-run and work on getting this merged?

@Liangliang-Ma
Contributor Author

@loadams Thanks for reminding me. I didn't notice that conflict.

# DeepSpeed-specific IPEX build, oneCCL bindings for PyTorch, and torchvision, all from Intel's wheel index
pip install intel-extension-for-pytorch-deepspeed==2.1.30 -f https://developer.intel.com/ipex-whl-stable-xpu
pip install oneccl_bind_pt==2.1.300+xpu -f https://developer.intel.com/ipex-whl-stable-xpu
pip install torchvision==0.16.0.post2 -f https://developer.intel.com/ipex-whl-stable-xpu
# Test and tooling dependencies
pip install py-cpuinfo pytest pytest-timeout tabulate tensorboard wandb transformers accelerate comet_ml mup numpy==1.26
Contributor

Could we get these from the DeepSpeed requirements? e.g. pip install .[dev,autotuning] down below?
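
For reference, the extras-based install being suggested would look roughly like this; whether DeepSpeed's dev/autotuning extras cover every package listed above is an assumption worth checking:

# From the DeepSpeed repository root: resolve test/tooling dependencies
# through the package's own extras instead of pinning them in the workflow
pip install .[dev,autotuning]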

Contributor Author

Yes, sure. I removed the redundant packages.

cd tests/unit
pytest --verbose accelerator/*
pytest --verbose autotuning/*
pytest --verbose checkpoint/test_reshape_checkpoint.py
pytest --verbose checkpoint/test_moe_checkpoint.py
pytest --verbose checkpoint/test_shared_weights.py
pytest --verbose launcher/test_ds_arguments.py launcher/test_run.py
pytest --verbose moe/test_moe_tp.py
pytest --verbose model_parallelism/*
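
Not part of the PR, but one way to answer the timing question below is pytest's standard --durations flag, which reports the slowest tests of a run, e.g.:

cd tests/unit
# List the 20 slowest tests in one of the added suites
pytest --verbose --durations=20 accelerator/*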
Contributor

@Liangliang-Ma - how much total time do these add to the tests?

Contributor Author

@loadams I modified the set of tests run in CI because of an error: FusedAdam in current DeepSpeed main is compatible with IPEX 2.3, but this CI environment is on 2.1.3. IPEX 2.3 should be released in roughly a month; we can bring these tests back, and add more, then.
The workflow now takes about 20 minutes to run. May I ask what total runtime would be appropriate?
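
A hedged sketch of how the removed tests could later be gated on the installed IPEX version rather than deleted outright; the version check and the test path are illustrative assumptions, not what this PR does:

# Hypothetical guard: run FusedAdam-dependent tests only on IPEX >= 2.3
if python -c "import sys, intel_extension_for_pytorch as ipex; from packaging import version; sys.exit(0 if version.parse(ipex.__version__.split('+')[0]) >= version.parse('2.3') else 1)"; then
    pytest --verbose ops/adam/*   # path is illustrative
fi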
