[RFC] Add Intel GPU support into PyTorch CI/CD 

## Motivation 
As mentioned in [[RFC] Intel GPU Upstreaming](https://github.com/pytorch/pytorch/issues/114723), integrating the new Intel GPU device and its associated features into PyTorch requires PyTorch CI/CD tests designed specifically for Intel GPU devices. These tests will ensure the quality of incoming pull requests and gate their acceptance accordingly.

## Design Philosophy 
To enable the new CI/CD parts, we aim to leverage the existing PyTorch CI/CD infrastructure wherever possible. Intel GPU related tests will be dispatched to Intel Developer Cloud (IDC) instances, which provide Intel GPU hardware as self-hosted runners. Based on our understanding of the current PyTorch CI/CD tests, we will divide the Intel GPU related tests into several categories: pull tests, inductor tests, and other tests. For inductor tests, we will extend the existing inductor test workflow to accommodate Intel GPU inductor testing. A new workflow will serve as the entry point for the other Intel GPU tests, mirroring the approach used for other devices. Overall, all Intel GPU tests follow the rules below.
- Docker-based builds and tests
- Multiple steps for both inductor and other tests
     + Base Docker image build on AWS instance runners, which provides the PyTorch build and test environment.
     + Wheel build on AWS instance runners directly.
     + Tests can be sharded and dispatched to IDC instance runners.
     
## Detail
### Entrance of pull test 
For the basic build test on pull requests, we will add an Intel GPU specific build job to `.github/workflows/pull.yml`, triggered by each pull request.
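
As a rough illustration, the added job could look like the sketch below. It reuses the job name, build environment, and Docker image name from the `xpu.yml` example in this RFC. Whether the pull workflow would use exactly these values is an assumption, and no test matrix is passed because this job only builds.

```yml
# Hypothetical excerpt of .github/workflows/pull.yml (sketch only).
# Names are borrowed from the xpu.yml example in this RFC, not from a merged change.
jobs:
  linux-jammy-xpu-py3_8-build:
    name: linux-jammy-xpu-py3.8
    uses: ./.github/workflows/_linux-build.yml
    with:
      build-environment: linux-jammy-xpu-py3.8
      docker-image-name: pytorch-linux-jammy-xpu-n-py3
```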

### Entrance of Inductor tests 
For Inductor tests, we will add Intel GPU specific tests to `.github/workflows/inductor.yml`, triggered by `ciflow/inductor`. To avoid breaking other inductor related PRs in the first stage, we plan to add a new ciflow label `ciflow/xpu` to `.github/pytorch-probot.yml` and run the Intel GPU inductor tests only for PRs that carry both `ciflow/inductor` and `ciflow/xpu`.
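
A minimal sketch of registering the label is shown below. It assumes that `.github/pytorch-probot.yml` lists the allowed ciflow labels under a `ciflow_push_tags` key; that key name should be checked against the current file. How the workflow then checks that both labels are present is not specified in this RFC and is left out of the sketch.

```yml
# Hypothetical excerpt of .github/pytorch-probot.yml (sketch only).
# The ciflow_push_tags key is assumed; verify it against the current file.
ciflow_push_tags:
- ciflow/inductor   # existing label that triggers the inductor workflow
- ciflow/xpu        # new label proposed in this RFC
```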

### Entrance of other tests 
The remaining Intel GPU related tests will be triggered by PRs labeled `ciflow/xpu` or run periodically on a schedule. To achieve this, we will add a new entry workflow `.github/workflows/xpu.yml` with content like the following.
```yml
name: xpu

on:
  push:
    branches:
      - main
      - release/*
    tags:
      - ciflow/xpu/*
  workflow_dispatch:
  schedule:
    - cron: 0 0 * * *  

concurrency:
  group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref_name }}-${{ github.ref_type == 'branch' && github.sha }}-${{ github.event_name == 'workflow_dispatch' }}-${{ github.event_name == 'schedule' }}
  cancel-in-progress: true

jobs:
  linux-jammy-xpu-py3_8-build:
    name: linux-jammy-xpu-py3.8
    uses: ./.github/workflows/_linux-build.yml
    with:
      build-environment: linux-jammy-xpu-py3.8
      docker-image-name: pytorch-linux-jammy-xpu-n-py3
      sync-tag: xpu-build
      test-matrix: |
        { include: [
          { config: "default", shard: 1, num_shards: 2, runner: "linux.idc.xpu" },
          { config: "default", shard: 2, num_shards: 2, runner: "linux.idc.xpu" },
        ]}

  linux-jammy-xpu-py3_8-test:
    name: linux-jammy-xpu-py3.8
    uses: ./.github/workflows/_xpu-test.yml
    needs: linux-jammy-xpu-py3_8-build
    with:
      build-environment: linux-jammy-xpu-py3.8
      docker-image: ${{ needs.linux-jammy-xpu-py3_8-build.outputs.docker-image }}
      test-matrix: ${{ needs.linux-jammy-xpu-py3_8-build.outputs.test-matrix }} 
```

### Build & Test

We will add an Intel GPU specific base image Dockerfile, `.ci/docker/ubuntu-xpu/Dockerfile`, and an Intel GPU section in the image build script `.ci/docker/build.sh` to support building the Intel GPU base image on `linux.2xlarge` runners.

For the PyTorch wheel build, unlike other devices, we currently need to dispatch it to IDC instance runners. We will reuse `.github/workflows/_linux-build.yml` with an Intel GPU specific build-environment and add an Intel GPU section to the PyTorch build script `.ci/pytorch/build.sh`.

For the tests, we will add a new Intel GPU test workflow `.github/workflows/_xpu-test.yml` and the necessary GitHub actions such as `setup-xpu`, `teardown-xpu`, etc. We will also add a new section to the test script `.ci/pytorch/test.sh` and a series of utility scripts for Intel GPU.
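
To make the shape of the test workflow concrete, here is a hedged skeleton of what `_xpu-test.yml` might look like as a reusable workflow. The three inputs match what the `xpu.yml` caller passes. Everything else is an assumption for illustration: the action paths under `.github/actions/`, the environment variable names, the `/workspace` mount point, and the `docker run` invocation. The sketch also assumes the `test-matrix` value arrives as valid JSON, and it omits downloading and installing the built wheel.

```yml
# Hypothetical skeleton of .github/workflows/_xpu-test.yml (sketch only).
name: xpu-test

on:
  workflow_call:
    inputs:
      build-environment:
        required: true
        type: string
      docker-image:
        required: true
        type: string
      test-matrix:
        required: true
        type: string

jobs:
  test:
    strategy:
      # One job per shard entry; assumes the build job emits the matrix as JSON.
      matrix: ${{ fromJSON(inputs.test-matrix) }}
      fail-fast: false
    # Each matrix entry names its runner, e.g. "linux.idc.xpu".
    runs-on: ${{ matrix.runner }}
    steps:
      - name: Checkout PyTorch
        uses: actions/checkout@v4

      - name: Setup XPU
        uses: ./.github/actions/setup-xpu      # assumed action location

      - name: Test
        env:
          # Variable names are assumptions about what the test script reads.
          BUILD_ENVIRONMENT: ${{ inputs.build-environment }}
          DOCKER_IMAGE: ${{ inputs.docker-image }}
          TEST_CONFIG: ${{ matrix.config }}
          SHARD_NUMBER: ${{ matrix.shard }}
          NUM_TEST_SHARDS: ${{ matrix.num_shards }}
        run: |
          # Expose the GPU device nodes and run the test script inside the base image.
          docker run --rm \
            --device=/dev/dri \
            -e BUILD_ENVIRONMENT -e TEST_CONFIG -e SHARD_NUMBER -e NUM_TEST_SHARDS \
            -v "${GITHUB_WORKSPACE}:/workspace" \
            -w /workspace \
            "${DOCKER_IMAGE}" \
            .ci/pytorch/test.sh

      - name: Teardown XPU
        if: always()                            # clean up even when tests fail
        uses: ./.github/actions/teardown-xpu    # assumed action location
```

Driving `runs-on` from the matrix keeps runner selection in the caller, so shards can be moved between runner pools without touching the reusable workflow.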


cc @seemethere @malfet @pytorch/pytorch-dev-infra @frank-wei @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10
