Update xpu-max1100.yml with new config and add some tests #5668
base: master
Conversation
```shell
python -c "import torch; import intel_extension_for_pytorch; print('XPU available:', torch.xpu.is_available())"
apt-get update
apt-get install clinfo libaio-dev python3-pip -y
pip install torch==2.1.0.post2 -f https://developer.intel.com/ipex-whl-stable-xpu
```
Why do we need the versions from this index?
Versions from this index are closer to the intel_extension_for_pytorch release, and the wheels download reliably and quickly.
Apologies for the slow response on this - would it be possible to get torch from the main pytorch index and leave the Intel specific wheels from developer.intel?
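The split the reviewer is suggesting might look like the following sketch (the exact version pins are illustrative, not verified against this CI image):

```shell
# Sketch of the suggested split (versions illustrative):
# torch from the main PyPI index...
pip install torch==2.1.0
# ...and only the Intel-specific wheels from developer.intel
pip install intel-extension-for-pytorch-deepspeed==2.1.30 -f https://developer.intel.com/ipex-whl-stable-xpu
```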
Hi @Liangliang-Ma - could you resolve the merge conflicts so we can re-run and work on getting this merged?
@loadams Thanks for reminding me. I didn't notice that conflict.
.github/workflows/xpu-max1100.yml
Outdated
```shell
pip install intel-extension-for-pytorch-deepspeed==2.1.30 -f https://developer.intel.com/ipex-whl-stable-xpu
pip install oneccl_bind_pt==2.1.300+xpu -f https://developer.intel.com/ipex-whl-stable-xpu
pip install torchvision==0.16.0.post2 -f https://developer.intel.com/ipex-whl-stable-xpu
pip install py-cpuinfo pytest pytest-timeout tabulate tensorboard wandb transformers accelerate comet_ml mup numpy==1.26
```
Could we get these from the DeepSpeed requirements? e.g. `pip install .[dev,autotuning]` down below?
Yes sure. I removed redundant packages.
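A consolidated install step along the lines of the review suggestion might look like this sketch (which extras exist is defined by DeepSpeed's own setup.py, so the extras names here are taken from the reviewer's comment, not verified):

```shell
# Install DeepSpeed's declared dev/autotuning dependencies instead of
# listing individual packages in the workflow (sketch, per the review suggestion)
pip install .[dev,autotuning]
```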
```shell
cd tests/unit
pytest --verbose accelerator/*
pytest --verbose autotuning/*
pytest --verbose checkpoint/test_reshape_checkpoint.py
pytest --verbose checkpoint/test_moe_checkpoint.py
pytest --verbose checkpoint/test_shared_weights.py
pytest --verbose launcher/test_ds_arguments.py launcher/test_run.py
pytest --verbose moe/test_moe_tp.py
pytest --verbose model_parallelism/*
```
@Liangliang-Ma - how much total time do these add to the tests?
@loadams I modified the set of tests run in CI because of an error: FusedAdam in current DeepSpeed main is compatible with IPEX 2.3, but this CI environment is built for IPEX 2.1.3. IPEX 2.3 should be released in roughly a month; we can bring those tests back, and add more, then.
The workflow currently takes about 20 minutes to run. May I ask what total runtime would be considered appropriate?
This PR:
1. Changes the container.
2. Updates the software versions (to align with the Docker image's compiler).
3. Adds some tests.