[v.2.3.0] Release Tracker #121760

Closed
atalman opened this issue Mar 12, 2024 · 34 comments
Labels
triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module

@atalman
Contributor

atalman commented Mar 12, 2024

🐛 Describe the bug

We cut a release branch for the 2.3.0 release.

Our plan from this point is roughly:

  • Phase 1 (until 4/1/24): work on finalizing the release branch
  • Phase 2 (after 4/1/24): perform extended integration/stability/performance testing based on Release Candidate builds.

This issue is for tracking cherry-picks to the release branch.

Cherry-Pick Criteria

Phase 1 (until 4/1/24):

Only low-risk changes may be cherry-picked from main:

  1. Fixes to regressions against the most recent minor release (e.g. 2.2.x for this release; see module: regression issue list)
  2. Critical fixes for: silent correctness, backwards compatibility, crashes, deadlocks, (large) memory leaks
  3. Critical fixes to new features introduced in the most recent minor release (e.g. 2.2.x for this release)
  4. Test/CI fixes
  5. Documentation improvements
  6. Compilation fixes or ifdefs required for different versions of the compilers or third-party libraries
  7. Release branch specific changes (e.g. change version identifiers)

Any other change requires special dispensation from the release managers (currently @atalman, @osalpekar, @huydhn, @malfet). If this applies to your change please write "Special Dispensation" in the "Criteria Category:" template below and explain.

Phase 2 (after 4/1/24):

Note that changes here require us to rebuild a Release Candidate and restart extended testing (likely delaying the release). Therefore, the only accepted changes are release-blocking critical fixes for: silent correctness, backwards compatibility, crashes, deadlocks, (large) memory leaks.

Changes will likely require a discussion with the larger release team over VC or Slack.

Cherry-Pick Process

  1. Ensure your PR has landed in main. This does not apply to release-branch specific changes (see Phase 1 criteria).

  2. Create (but do not land) a PR against the release branch.

    # Find the hash of the commit you want to cherry pick
    # (for example, abcdef12345)
    git log
    
    git fetch origin release/2.3
    git checkout release/2.3
    git cherry-pick -x abcdef12345
    
    # Submit a PR against 'release/2.3' either:
    # via the GitHub UI
    git push my-fork
    
    # via the GitHub CLI
    gh pr create --base release/2.3
  3. Make a request below with the following format:

Link to landed trunk PR (if applicable):
* 

Link to release branch PR:
* 

Criteria Category:
* 
  4. Someone from the release team will reply with approved / denied or ask for more information.
  5. If approved, someone from the release team will merge your PR once the tests pass. Do not land the release branch PR yourself.

NOTE: Our normal tools (ghstack / ghimport, etc.) do not work on the release branch.
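
The first two steps of the process above can be tried end to end in a throwaway repository. The following is only a minimal sketch: the branch names mirror this thread, while the temporary directory, the file name, the commit messages, and the demo identity are hypothetical stand-ins. It checks that the fix has landed on main with `git merge-base --is-ancestor`, cherry-picks it with `-x`, and then reads back the "(cherry picked from commit ...)" line that `-x` appends to the commit message.

```shell
#!/bin/sh
# Throwaway demo repository; nothing here touches a real PyTorch checkout.
set -eu
demo_dir=$(mktemp -d)
cd "$demo_dir"
git init -q .
git symbolic-ref HEAD refs/heads/main      # name the initial branch "main"
git config user.name "Demo User"           # hypothetical identity for the demo
git config user.email "demo@example.com"

# Base commit shared by main and the release branch.
echo "base" > file.txt
git add file.txt
git commit -q -m "Base commit"

# Cut the release branch from the base commit.
git branch release/2.3

# Land a fix on main (stands in for the landed trunk PR).
echo "fix" >> file.txt
git commit -q -am "Fix a regression"
fix_hash=$(git rev-parse HEAD)

# Step 1: confirm the commit is reachable from main before cherry-picking.
if ! git merge-base --is-ancestor "$fix_hash" main; then
    echo "commit has not landed on main" >&2
    exit 1
fi

# Step 2: cherry-pick onto the release branch; -x records the source hash.
git checkout -q release/2.3
git cherry-pick -x "$fix_hash" >/dev/null

# The new commit message names the original commit.
trailer=$(git log -1 --format=%B | grep "cherry picked from commit")
echo "$trailer"
```

Recording the source hash this way makes it straightforward to match a release branch commit back to the trunk commit it came from.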

A link to the HUD showing the branch CI status will be provided here:
HUD

Versions

2.3.0

atalman pinned this issue Mar 12, 2024
@snadampal
Collaborator

snadampal commented Mar 13, 2024

Link to main PR (if applicable):

Link to 2.3 release branch PR:

Criteria Category:
The PRs improve the torch.compile performance on aarch64 linux systems substantially.


@malfet: Please do not cherry-pick those changes. We should revert them in builder on trunk and coordinate with the oneDNN team to get those changes into the release branch and/or wait until the next release.

@atalman
Contributor Author

atalman commented Mar 14, 2024

Link to landed trunk PR (if applicable):

Link to release branch PR:

Criteria Category:

  • Critical fix: Without this we get mismatches between the GLIBC and GLIBCXX ABI used by conda packages vs pytorch.

@atalman merged

janeyx99 added the triaged label Mar 15, 2024
@NmomoN
Contributor

NmomoN commented Mar 18, 2024

Link to landed trunk PR (if applicable):

Link to release branch PR:

Criteria Category:

  • Fixes to new features being introduced in 2.3.0 release.

@zdevito


@atalman Please provide some details on what issue you are trying to address and what is the associated feature

@shunting314
Contributor

shunting314 commented Mar 18, 2024

Link to landed trunk PR (if applicable):
N/A

Link to release branch PR:

Criteria Category:

  • need this to make pytorch release work with the corresponding triton release

@atalman merged

@atalman
Contributor Author

atalman commented Mar 21, 2024

Link to landed trunk PR (if applicable):

  • N/A

Link to release branch PR:

Criteria Category:

  • need this to make pytorch release work with the corresponding triton release

@atalman merged

@wz337
Contributor

wz337 commented Mar 22, 2024

Link to landed trunk PR (if applicable):

Link to release branch PR:

Criteria Category:

  • Cherry-picking a revert. Bug fix.

@atalman merged

@SS-JIA
Contributor

SS-JIA commented Mar 25, 2024

Link to landed trunk PR (if applicable):

Link to release branch PR:

Criteria Category:

  • This does not quite fit into any of the cherry-pick criteria; however, it implements features required for the ExecuTorch alpha release. It is also low risk as it only impacts the PyTorch-Vulkan backend. cc: @huydhn

@guangy10
Contributor

guangy10 commented Mar 25, 2024

Link to landed trunk PR (if applicable):

Link to release branch PR:

Criteria Category:

A critical fix for ExecuTorch. Needs to be merged so that ExecuTorch v0.2 can be compatible with PyTorch v2.3. cc: @huydhn


@huydhn merged

@JacobSzwejbka
Contributor

JacobSzwejbka commented Mar 26, 2024

Link to landed trunk PR (if applicable):

#122246

Link to release branch PR:

#122655

Criteria Category:

A critical fix for ExecuTorch: a new pass that ExecuTorch uses to allow mutable state with custom ops. Needs to be merged so that ExecuTorch v0.2 can be compatible with PyTorch v2.3. cc: @huydhn


@huydhn merged

@JacobSzwejbka
Contributor

JacobSzwejbka commented Mar 26, 2024

Link to landed trunk PR (if applicable):

#121990

Link to release branch PR:

#122654

Criteria Category:

A critical fix for ExecuTorch: patching auto functionalized, which ExecuTorch relies on for exporting llama. Needs to be merged so that ExecuTorch v0.2 can be compatible with PyTorch v2.3. cc: @huydhn


@huydhn merged

@JacobSzwejbka
Contributor

JacobSzwejbka commented Mar 26, 2024

Link to landed trunk PR (if applicable):

#122683

Link to release branch PR:

#122721

Criteria Category:

A critical fix for ExecuTorch: makes the dead code elimination pass not trash the output of capture_pre_autograd_graph if it contains kv cache mutation. Needs to be merged so that ExecuTorch v0.2 can be compatible with PyTorch v2.3. cc: @huydhn


@huydhn merged

@jataylo
Collaborator

jataylo commented Mar 27, 2024

Link to landed trunk PR (if applicable):

  • Not applicable, release specific change.

Link to release branch PR:

Criteria Category:


@atalman merged

@guangy10
Contributor

guangy10 commented Mar 28, 2024

Link to landed trunk PR (if applicable):

Link to release branch PR:

Criteria Category:

A critical fix for ExecuTorch. W/o this PR, PyTorch v2.3 will not be compatible with ExecuTorch; even the simplest add module will fail. This PR adds new ops to native_functions.yaml, and they are covered by tests. It's relatively low risk as these ops are not used by many components except for ExecuTorch.


@atalman wrote: @guangy10 This all looks like feature work. We normally accept only critical fixes at this point:

cc: @malfet @albanD

@atalman I don't think it's feature work. I would categorize this as a "critical fix: compatibility" because it does break compatibility with ExecuTorch w/o this fix.


@huydhn merged

@SS-JIA
Contributor

SS-JIA commented Mar 28, 2024

Link to landed trunk PR (if applicable):

Link to release branch PR:

Criteria Category:

  • This does not quite fit into any of the cherry-pick criteria; however, it implements fixes to PyTorch-Vulkan required for the ExecuTorch alpha release. It is also low risk as it only impacts the PyTorch-Vulkan backend. cc: @huydhn

@atalman wrote: @SS-JIA For a cherry-pick to be considered for a release, it needs to include a unit test.


@SS-JIA wrote: @atalman the `linux-vulkan-focal-py3.11-clang10 / test` job tests PyTorch Vulkan and is passing on the PR. Is that sufficient?

@htyu
Contributor

htyu commented Mar 28, 2024

Link to landed trunk PR:

Link to release branch PR:

Criteria Category:
A critical fix for matmul performance on ROCm.


@atalman wrote: Please provide some unit tests for this cherry-pick

atalman added this to the 2.3.0 milestone Mar 28, 2024
@jithunnair-amd
Collaborator

jithunnair-amd commented Mar 29, 2024

Link to landed trunk PR (if applicable):

Link to release branch PR:

Criteria Category:

  1. A critical fix for Flash Attention performance on ROCm
  2. A fix for intermittent build issues observed on CI when building AOTriton

cc @xinyazhang @groenenboomj


@atalman merged

@chunyuan-w
Collaborator

chunyuan-w commented Apr 1, 2024

Link to landed trunk PR (if applicable):

Link to release branch PR:

Criteria Category:


@atalman merged

@wz337
Contributor

wz337 commented Apr 1, 2024

Link to landed trunk PR (if applicable):

Link to release branch PR:

Criteria Category:

  • Critical fix for DeviceMesh to avoid recreating NCCL communicators multiple times, which could lead to deadlocks.

@atalman merged

@wanchaol
Contributor

wanchaol commented Apr 1, 2024

Link to landed trunk PR (if applicable):

Link to release branch PR:

Criteria Category:

  • Critical fixes for: silent correctness on tensor subclasses

edit: @mikaylagawarecki added cherry pick of #122800 per discussion on #123106 since we are cherry picking #122755 as a critical fix for wrapper tensor subclasses


@huydhn merged

@huydhn
Contributor

huydhn commented Apr 2, 2024

Link to landed trunk PR (if applicable):

Link to release branch PR:

Criteria Category:

  • CI fix

@atalman merged

@ajindal1
Contributor

ajindal1 commented Apr 3, 2024

Link to landed trunk PR (if applicable):

Link to release branch PR:

Criteria Category:

  • Backward compatibility with ONNX Runtime Training.

@atalman merged

@atalman
Contributor Author

atalman commented Apr 3, 2024

@atalman
Contributor Author

atalman commented Apr 3, 2024

Link to landed trunk PR (if applicable):

Link to release branch PR:

Criteria Category:


@atalman merged

@atalman
Contributor Author

atalman commented Apr 4, 2024

@BowenBao
Collaborator

BowenBao commented Apr 4, 2024

Link to landed trunk PR (if applicable):

Link to release branch PR:

Criteria Category:

  • Critical fixes: we found that beartype 0.18, released on 4/2/2024, uncovers a type annotation issue in the current ONNX-related code and raises an error on every export call. This fix changes the behavior from raising an error to emitting a warning.

@atalman @huydhn merged

@atalman
Contributor Author

atalman commented Apr 4, 2024

@atalman
Contributor Author

atalman commented Apr 5, 2024

Link to landed trunk PR (if applicable):

Link to release branch PR:

Criteria Category:


@atalman merged

@malfet
Contributor

malfet commented Apr 5, 2024

Link to landed trunk PR (if applicable):

Link to release branch PR:

Criteria Category:


@atalman merged

@cyyever
Collaborator

cyyever commented Apr 6, 2024

Link to landed trunk PR (if applicable):

Link to release branch PR:

  • N/A

Criteria Category:

  • Critical fix: A bug fix for norm kernel when called with value = 1.

@atalman wrote: Please note, at this point we have already generated the final RC. Is there an issue associated with this PR?
@cyyever wrote: No, there is no such issue.

@WeizhuoZhang-intel
Contributor

WeizhuoZhang-intel commented Apr 11, 2024

Link to landed trunk PR (if applicable):

Link to release branch PR:

Criteria Category:

  • Critical fix: Add per-device model skipping in the Dynamo test for the torchbench suite. Some of those models have OOM issues and others do not currently support CPU, so skipping them will help improve the CPU-side pass rate when using runner.py.

@atalman wrote: Hi @WeizhuoZhang-intel, we have already built the final RC, and this only affects the torchbench suite. Is running this on nightly a possibility? This cherry-pick does not fit the criteria for critical fixes for: silent correctness, backwards compatibility, crashes, deadlocks, (large) memory leaks.

@huydhn
Contributor

huydhn commented Apr 12, 2024

Link to landed trunk PR (if applicable):

Link to release branch PR:

Criteria Category:

  • CI fix to unblock ExecuTorch build

@huydhn merged

kiukchung unpinned this issue Apr 22, 2024
atalman closed this as completed Apr 26, 2024