Changes from all commits (2834 commits)
f20a266
[easy] Update test/dynamo/test_utils.py (#151599)
masnesral Apr 17, 2025
1b267a5
Revert "[export] allow partially specifying keys for dynamic shapes d…
pytorchmergebot Apr 18, 2025
7ffa900
Replace perf-nightly-macos with inductor-perf-nightly-macos (#151698)
huydhn Apr 18, 2025
88b0553
[AMD] Remove fbcode limit for uuid (#151652)
xw285cornell Apr 18, 2025
b0f26e8
Use reusable binary docker build action for libtorch (#151488)
clee2000 Apr 18, 2025
a6e46fa
Use reusable binary docker build action for manywheel (#151489)
clee2000 Apr 18, 2025
6e7b6e8
[c10d][fr] Fix a bug when first rank is not zero in the script (#151683)
fduwjj Apr 18, 2025
b74be52
[CUDA][NVTX] Move nvtx3 code from cmake/public/cuda.cmake to cmake/De…
nWEIdia Apr 18, 2025
02dd096
[invoke_subgraph][fake tensor] Add finalizer on subgraph instead of t…
anijain2305 Apr 18, 2025
56d318b
[ONNX][Eazy] Update onnx program doc formatting and improve robustnes…
justinchuby Apr 18, 2025
bd77c3e
[easy] Update test/dynamo/test_structured_trace.py (#151606)
masnesral Apr 17, 2025
97d97ae
Revert "[dynamic shapes] guard_or_false for _reshape_view_helper, uti…
pytorchmergebot Apr 18, 2025
fc7d493
Overload Library::def rather than templating it (#151626)
swolchok Apr 18, 2025
704a504
Reserve vectors in FunctionSchema::cloneWithRealTypes (#151627)
swolchok Apr 18, 2025
313ceb4
Reserve vector in StringCordView ctor (#151628)
swolchok Apr 18, 2025
cac8d35
Use fmt::format for debug strings in Library init (#151629)
swolchok Apr 18, 2025
e48189c
Don't eagerly create AliasInfo in parseAliasDeclaration (#151630)
swolchok Apr 18, 2025
359e1d5
[Profiler] Remove Decref From Python Context (#151625)
sraikund16 Apr 18, 2025
adf5f38
Don't specialize min/max (#151347)
tugsbayasgalan Apr 18, 2025
cfc4d74
inductor.config.descriptive_names = False is not actually supported (…
exclamaforte Apr 19, 2025
843e4d1
[Benchmarking] Enable HF_GPT2 benchmarking on Metal (#151721)
malfet Apr 19, 2025
6261db7
Revert "inductor.config.descriptive_names = False is not actually sup…
pytorchmergebot Apr 19, 2025
414ce71
[Testing] Make test_add_complex3 run on different devices (#151732)
malfet Apr 19, 2025
2673ea4
Add api to enable/disable NaN detector per-PG (#151723)
wconstab Apr 18, 2025
f6c1cf0
[ROCm][TunableOp] Support submatrices in offline tuning (#151138)
naromero77amd Apr 19, 2025
92d0c40
Revert "Cache the value of torch_key in subproc (#151057)"
pytorchmergebot Apr 19, 2025
8e5fefe
[Easy] The event_id of torch.cuda.Event and torch.xpu.Event always is…
FFFrog Apr 19, 2025
92baeec
[Easy] Fix the function signature of torch.Event (#151221)
FFFrog Apr 19, 2025
1e1d0a4
[Easy][torch.Event] Fix and improve the docs of torch.Event (#151411)
FFFrog Apr 19, 2025
68f748a
Revert "[Testing] Make test_add_complex3 run on different devices (#1…
pytorchmergebot Apr 19, 2025
483e61b
[BE][Easy]: Simplify reversed call in graph matcher (#151674)
Skylion007 Apr 19, 2025
ed511cd
[Testing] Make test_add_complex3 run on different devices (#151732)
malfet Apr 19, 2025
9b74ea2
[Benchmarking] Run MPS benchmarks for [b]float16 (#151747)
malfet Apr 19, 2025
c448256
Revert "[Easy][torch.Event] Fix and improve the docs of torch.Event (…
pytorchmergebot Apr 19, 2025
48761e9
Revert "[Easy] Fix the function signature of torch.Event (#151221)"
pytorchmergebot Apr 19, 2025
a40e876
Support fp8 dtypes in assert_close (#150002)
exclamaforte Apr 20, 2025
6b45b6e
run lintrunner for Export d68846308 (#151725)
Camyll Apr 20, 2025
c3a7278
Use more efficient row/col computation (#151474)
aartbik Apr 20, 2025
470132c
[MPS] Add support for hermite_polynomial_he (inductor/eager). (#151754)
dcci Apr 20, 2025
fc2dd6d
[Inductor] Update should_decompose_mm condition for CPU (#151730)
hl475 Apr 21, 2025
9c2ac2b
[pytorch][triton] Enable warp spec for FlexAttention kernel (#150470)
Apr 21, 2025
8eb21df
consolidate ATen/test/dispatch_key_set_test.cpp with rest of Dispatch…
swolchok Apr 20, 2025
f7ddc51
[Easy] Fix the compilation warning of BlasKernel. (#151736)
FFFrog Apr 20, 2025
2a9afda
[Benchmarking] Add sam and stable_diffusion to MPS benchmarked models…
malfet Apr 21, 2025
bf28d1c
Expose bicubic mode for torch::nn::functional::grid_sample in LibTorc…
inventshah Apr 21, 2025
2eacdb9
Add OIDC permissions to xpu workflow (#151455)
zxiiro Apr 21, 2025
515a0f6
[ez] fix typo in comment (#151755)
bobrenjc93 Apr 20, 2025
33808f0
Revert "[Easy] The event_id of torch.cuda.Event and torch.xpu.Event a…
pytorchmergebot Apr 21, 2025
9374064
Revert "[Easy] Add more check for elapsedTime of torch.xxx.Event and …
pytorchmergebot Apr 21, 2025
cea43f7
[Testing] Unskip expm1 log1p for MPS (#151790)
malfet Apr 21, 2025
287998b
Run standalone compile tests on cpu/gpu (#151768)
oulgen Apr 21, 2025
67c2869
Unpack the output code in the standalone_compile (#151609)
oulgen Apr 21, 2025
0f8613b
Introduce unsafe way to mark functions as cacheable (#151603)
oulgen Apr 21, 2025
e2b1c06
[cutlass] Define GELU_taylor<float> only if CUTLASS version is <= 380…
henrylhtsang Apr 21, 2025
f37e138
[MPS] Enable log1p and sigmoid for int64 (#151791)
malfet Apr 21, 2025
fd04c79
Revert "[aot autograd][logging] Profile large missing gaps in compile…
pytorchmergebot Apr 21, 2025
d79144d
[BE] Move aarch64 docker build to larger node (#151808)
malfet Apr 21, 2025
9680016
[MergeBot] Update PullRequestResolved Regex (#151814)
malfet Apr 21, 2025
b7c7000
Ensure runners have the required prefix (#151815)
ZainRizvi Apr 21, 2025
2fb1326
Add dates to pages (#151602)
svekars Apr 21, 2025
1a6effc
[torch] Expose PCI info from CUDA device (#151672)
efiks Apr 21, 2025
191b023
Added to docs for out_dtype arg in torch gemms (#151704)
PaulZhang12 Apr 18, 2025
02cecd1
[inductor][test] Skip triton tests for MPS as well, also change reaso…
henrylhtsang Apr 18, 2025
1f0d764
stage 2 of depreate silent fallback of tuning gemm (#148622)
henrylhtsang Apr 18, 2025
352019b
[BE]: Better cleanup optimized code from #151474 (#151794)
Skylion007 Apr 21, 2025
25a1185
[symmem] Add some code comments to rendezvous code (#151716)
fduwjj Apr 21, 2025
c312d8c
[Dynamo] Clean up old torch function flag (#149711)
mlazos Apr 21, 2025
cd1317f
[export] suggest dynamic re-export in input constraints hook (#151624)
pianpwk Apr 21, 2025
6ea2e6a
Do not do proper const fold during tensorify_python_scalars (#151494)
laithsakka Apr 21, 2025
79a9447
FlexAttention add decorator for large test cases (#151459)
drisspg Apr 21, 2025
efdcc98
Back out "Do not propagate real tensor in extern kernel" (#151813)
yushangdi Apr 21, 2025
01f1cc4
Rename register_fake_profile to unsafe_generate_fake_kernels (#151797)
angelayi Apr 21, 2025
4d78e19
reroute index to fast implementation for indexing on 0th dimension (#…
ngimel Apr 21, 2025
99aeee2
[Inductor] Add Additional Configs for persistent+TMA version of Trito…
NikhilAPatel Apr 21, 2025
a35e73b
[c10] add #pragma once to leftright (#151710)
dolpm Apr 21, 2025
b3b1616
Add explict type info in the try-catch for dynamo logging (#151733)
houseroad Apr 21, 2025
80a3877
[easy] Fix test_dynamo_timed (#151816)
masnesral Apr 21, 2025
a02eae8
[dynamic shapes] guard_or_false for _reshape_view_helper, utils._infe…
pianpwk Apr 22, 2025
40cf49d
Revert "[Intel GPU] Allow XPU backend in Depthwise_conv2d&3d operator…
pytorchmergebot Apr 22, 2025
a4fdae5
Lift guard checking logic to AOTAutogradCache (#151563)
jamesjwu Apr 21, 2025
14e3ffb
Deprecate host allocator legacy APIs (#151437)
guangyey Apr 18, 2025
b7a7741
Non-deterministic alert in histc_cuda for floating types only (#151701)
amjames Apr 21, 2025
edba20b
[logging] Fix duration logging for dynamo_compile (#151749)
masnesral Apr 19, 2025
529f698
[logging] Put "everything" WaitCounters in dynamo_timed (#151757)
masnesral Apr 20, 2025
29811f6
[Inductor][FlexAttention] fix `vars_and_sizes` divisor error (#151634)
BoyuanFeng Apr 22, 2025
6f32712
[MKLDNN] Check that strides are positive (#151848)
malfet Apr 21, 2025
95abc0f
[c10d][fr] Fix another bug when we should continue when the op list i…
fduwjj Apr 22, 2025
0ff302e
Revert "reroute index to fast implementation for indexing on 0th dime…
pytorchmergebot Apr 22, 2025
e76c0b1
Revert "[dynamic shapes] guard_or_false for _reshape_view_helper, uti…
pytorchmergebot Apr 22, 2025
4a643af
[Hierarchical Compile] Fix small bug (#151293)
mlazos Apr 21, 2025
283884b
[Hierarchical Compile] Handle autocast ctx manager (#151294)
mlazos Apr 21, 2025
a09a3f4
[Hierarchical compile] Ensure output nodes are sorted last (#151295)
mlazos Apr 21, 2025
dfdf731
Do not generate long log messaged for suppressed data dependent error…
laithsakka Apr 21, 2025
73d9589
Do not log exception when recording is disabled or already recording …
laithsakka Apr 21, 2025
ccd0035
Log information about suppressed data dependent errors (#151041)
laithsakka Apr 21, 2025
3aeeb77
[Dynamo][Easy] Remove unreachable code (#151739)
shink Apr 22, 2025
159e2f9
[dynamo][ci] Fix recently broken test (#151877)
anijain2305 Apr 22, 2025
d778c92
[Metal][BE] Move atomic ops to c10/metal/atomic.h (#151868)
malfet Apr 22, 2025
c729f7d
[provenance_tracking][reland] Fix UT error and re-land `ExternKernel`…
YUNQIUGUO Apr 22, 2025
ed0d2eb
Revert "Non-deterministic alert in histc_cuda for floating types only…
pytorchmergebot Apr 22, 2025
f072bf2
Revert "faster gather implementation (#151490)"
pytorchmergebot Apr 22, 2025
4504910
Revert "[ez] Make relaxed constraint error message more user friendly…
pytorchmergebot Apr 22, 2025
3804aed
Revert "[Inductor] Add Additional Configs for persistent+TMA version …
pytorchmergebot Apr 22, 2025
5d316ce
Add device check for inputs (#151828)
yushangdi Apr 22, 2025
5fc1eb8
Add OIDC permissions to bazel workflow (#151456)
zxiiro Apr 22, 2025
06a3c3c
[Optimus][Observability] Improve tlparse logging (#151635)
mengluy0125 Apr 22, 2025
264e8fb
More fix for aot_export_module name collision during unlifting (#151684)
yushangdi Apr 22, 2025
2c27597
Infra for handling builtin ops (min, max, math.pow) (#151348)
tugsbayasgalan Apr 18, 2025
834a017
Optimize register_full_backward_hook description when all input no gr…
zeshengzong Apr 22, 2025
4bf0956
[EZ/Profiler] Update Submodule (#151843)
sraikund16 Apr 22, 2025
fa0f13b
Fix doc requirements install error (#151787)
zeshengzong Apr 22, 2025
982062d
Cache the value of torch_key in subproc (#151057)
oulgen Apr 11, 2025
fbd2952
[MPS] Move ops modifiers to testing utils so other tests can reuse (#…
qqaatw Apr 22, 2025
337caac
Use more efficient mask to index computation (#151372)
aartbik Apr 22, 2025
69ee6a9
[Sana][HybridCache] Fix bug in detect_attr_assignment (#151824)
tugsbayasgalan Apr 22, 2025
6cd1741
[ONNX] Update decomposition logic to loop over onnx registry (#151826)
titaiwangms Apr 22, 2025
8ca7953
[cutlass backend] delay construction of cutlass presets to when calle…
henrylhtsang Apr 22, 2025
d0d4e99
[associative_scan] Fixes for assoc_scan testcases (#149988)
bohnstingl Apr 22, 2025
0bb9b89
Revert "[compile][compile time traces] Add more dynamo traces (#151357)"
pytorchmergebot Apr 22, 2025
7e4b89a
fix spammy library deinit errors when user passes an invalid TORCH_LO…
bdhirsh Apr 22, 2025
596296f
[standalone_compile] Dynamic shape handling (#151788)
zou3519 Apr 22, 2025
a48ccf0
[Inductor] move alignment tests to a separate file (#151841)
shunting314 Apr 21, 2025
3380a46
Fix DTensorTestBase to barrier with device ids (#150896)
wanchaol Apr 21, 2025
2f74cff
Remove `reinterpret_cast`s with undefined behavior from stable/librar…
swolchok Apr 22, 2025
aaf71a4
Revert "Log information about suppressed data dependent errors (#1510…
pytorchmergebot Apr 22, 2025
459c62e
Revert "Do not log exception when recording is disabled or already re…
pytorchmergebot Apr 22, 2025
bc6c0bc
Revert "Do not generate long log messaged for suppressed data depende…
pytorchmergebot Apr 22, 2025
835413b
Revert "[Optimus][Observability] Improve tlparse logging (#151635)"
pytorchmergebot Apr 22, 2025
017a6bd
add min/max_seqlen to non_differentiable (#151750)
sumantro93 Apr 22, 2025
e05ac9b
Use folder tagged docker images for binary builds (#151706)
clee2000 Apr 22, 2025
b8f4dc5
[ROCm] opportunistic fastatomics for ReduceAdd operations for MI300 G…
pragupta Apr 22, 2025
3aecf2d
[MPS] Extend index_put to half precision floats (#151869)
malfet Apr 22, 2025
2f851ac
[MPSInductor] Implement `atomic_add` store mode (#151871)
malfet Apr 22, 2025
c0b70f9
[Testing] Enable `test_mutations_loop_fusion_mps` (#151872)
malfet Apr 22, 2025
43de9b7
Remove mention of magma-cuda in readme.md, refactor magma_conda insta…
atalman Apr 22, 2025
6a1b820
[export] Enable symint inputs for AdditionalInputs and ShapesCollecti…
angelayi Apr 22, 2025
a7ccd96
logging start of torch elastic workers. (#150849)
aschhabra Apr 22, 2025
f4ac9a1
[fx] Filter stacktrace (#151029)
angelayi Apr 22, 2025
c98340e
[autodeps2] Replace third-party/pyyaml with third-party/pypi/pyyaml (…
kkolur76 Apr 22, 2025
4f8adde
Speed up OperatorEntry construction by avoiding updateDispatchTableFu…
swolchok Apr 22, 2025
cd576fd
[torch][fx] Add support for EXIR dialect overload ops in normalize_fu…
dulinriley Apr 22, 2025
aa61707
Fix extra heap allocation in Source constructor (#151800)
swolchok Apr 22, 2025
334aab0
Updates NCCLConfig with QOS variable (#151821)
syed-ahmed Apr 22, 2025
72f711e
Revert "[inductor] Change minimum number of SMs to 60 to let Ada use …
pytorchmergebot Apr 23, 2025
49b7ffb
[MPS] Implement _print_Trunc_to_Int (#151964)
dcci Apr 23, 2025
015b526
[MPSInductor] Warn-cast double as floats (#151963)
malfet Apr 23, 2025
68a7501
[Inductor][CPP] Fix Codegen Issue when Parallel Reduction under the v…
leslie-fang-intel Apr 22, 2025
74074fe
[inductor] handle offset in ReinterpretView for alignment (#151859)
shunting314 Apr 22, 2025
13339ce
[dynamic shapes] bound_sympy for size-oblivious min/max reasoning (#1…
pianpwk Apr 23, 2025
cd021d0
Fix circular imports (#151939)
oulgen Apr 22, 2025
2530593
[Cutlass] Implement EVT example tensor creation (#150904)
mlazos Apr 22, 2025
78bbb46
Use /var/tmp instead of /tmp for torch cache directory on fbcode (#15…
oulgen Apr 23, 2025
f9bdfe9
[MegaCache] Return None on no compilation (#151921)
oulgen Apr 22, 2025
cc793e8
[StandaloneCompile] Autotune at compile time (#151922)
oulgen Apr 22, 2025
b37fa20
[FlexAttention] Fix device test instantation (#151846)
drisspg Apr 23, 2025
54f7361
[dynamic shapes] guard_or_false for _reshape_view_helper, utils._infe…
pianpwk Apr 23, 2025
b247e5d
[Inductor][CPU] Add GEMM templates for _weight_int4pack_mm_for_cpu wi…
Xia-Weiwen Apr 22, 2025
097faa9
[audio hash update] update the pinned audio hash (#151729)
pytorchupdatebot Apr 23, 2025
7c97720
[dynamic shapes] rewrite expand with guard_or_false (#150236)
pianpwk Apr 23, 2025
ee81fe4
Support regexes in dynamic sources allowlist (#151766)
bobrenjc93 Apr 23, 2025
62b5649
[Inductor] Test ND block pointers with dynamic shapes (#151646)
blaine-rister Apr 23, 2025
5b9df57
[dynamo] context manager/decorator for dynamo config patching during …
williamwen42 Apr 22, 2025
6d28d61
[CI] Remove protobuf from docker image (#151933)
clee2000 Apr 23, 2025
b32b002
[BE] Replace `std::runtime_error` with `TORCH_CHECK` [1/N] (#151880)
shink Apr 23, 2025
21b0ef5
[Easy] Remove redundant code (#151883)
FFFrog Apr 22, 2025
7310049
Revert "[FlexAttention] Fix device test instantation (#151846)"
pytorchmergebot Apr 23, 2025
dcc32ff
[CUDA][cuBLAS][cuBLASLt] Opt-in unified cuBLAS + cuBLASLt workspaces …
eqy Apr 23, 2025
e31e2d2
Turn on static cuda launcher in OSS (#151691)
jamesjwu Apr 22, 2025
0511467
[ROCm] AtomicAdd specialization on AMD for fp64. (#151724)
naromero77amd Apr 23, 2025
a560216
Update description for torch.random.fork_rng (#151881)
FFFrog Apr 23, 2025
9422e24
[MPS] Fix test_neg_index_mps (#151966)
dcci Apr 23, 2025
2ab752d
Make `torch.jit.Error` inherit from Exception (#151947)
alanhdu Apr 23, 2025
348272e
Revert "[invoke_subgraph][fake tensor] Add finalizer on subgraph inst…
pytorchmergebot Apr 23, 2025
9344da8
Revert "[fake tensor cache] Support index with non bool/int8 indices …
pytorchmergebot Apr 23, 2025
5f63789
[torchbind] fix error message when attr is a real tensor. (#151944)
ydwu4 Apr 23, 2025
aa285e6
Revert "[cutlass backend] delay construction of cutlass presets to wh…
pytorchmergebot Apr 23, 2025
3c1a17a
[Dynamo] Use LazyVariableTracker in base VT (#151847)
anijain2305 Apr 22, 2025
5acc3e2
[Inductor] Add Additional Configs for persistent+TMA version of Trito…
NikhilAPatel Apr 22, 2025
69e41ce
move find_hop_schema into _higher_order_ops/schema.py (#151147)
ydwu4 Apr 22, 2025
99ae7d4
Reland fast gather and index implementation (#151917)
ngimel Apr 23, 2025
c1f51cf
[map] defer importing AOTConfig and create_joint dependency (#151479)
ydwu4 Apr 23, 2025
98c53d8
Revert "[MPS] Fix test_neg_index_mps (#151966)"
pytorchmergebot Apr 23, 2025
5623285
Revert "Turn on static cuda launcher in OSS (#151691)"
pytorchmergebot Apr 23, 2025
dccb7a9
[pytorch] use a mutex in initialize_torch_libraries (#151938)
rmaz Apr 23, 2025
bd19173
[cutlass backend] Stop using GenerateSM80 for SM90 and SM100 (#150781)
henrylhtsang Apr 7, 2025
47ad351
[DRAFT] INitial version of sticky export (#151047)
tugsbayasgalan Apr 23, 2025
fd3d339
[dynamic shapes] be less aggressive with runtime assert CSE for bound…
pianpwk Apr 23, 2025
4d2d833
[CI] Update sleef submodule to v3.8 (#151955)
malfet Apr 23, 2025
8172397
Revert "Update torch-xpu-ops commit pin (#150827)"
pytorchmergebot Apr 24, 2025
f2cfeb2
[Environment Variable][7/N] Use thread-safe getenv functions (#140211)
cyyever Apr 24, 2025
2455ded
[FlexAttention] Fix device test instantation (#151846)
drisspg Apr 23, 2025
4e1d433
[FlexAttention] Remove Old Constraint on lastdim strides (#151959)
drisspg Apr 23, 2025
f39a1a4
Fix typos in meta.rst (#151979)
Stonesjtu Apr 24, 2025
c91acad
[Easy] Add more check for elapsedTime of torch.xxx.Event and torch.Ev…
FFFrog Apr 23, 2025
4ac2ee5
[sigmoid] memory planner C10 deps (#151275)
dolpm Apr 24, 2025
d703f06
[MPS] Adjust test_sum_dtypes so it can run on MPS. (#152064)
dcci Apr 24, 2025
2ee8de5
[dynamic shapes] user-code friendly statically_known_true, has_static…
pianpwk Apr 24, 2025
e2cf60f
[MPS] Fix test_neg_index_mps (#151966)
dcci Apr 24, 2025
43f1b60
Revert "[MPS] Adjust test_sum_dtypes so it can run on MPS. (#152064)"
pytorchmergebot Apr 24, 2025
5de92e6
Don't copy DynamicType argument to DynamicType::create (#151801)
swolchok Apr 24, 2025
fabbcdd
Create and use DynamicTypes for check in DispatchKeyExtractor::makeBi…
swolchok Apr 24, 2025
0559741
Fix return type of TypeFactoryBase<c10::DynamicType>::get (#151803)
swolchok Apr 24, 2025
89a85d0
Add & use Token::text_view() (which returns a string_view unlike text…
swolchok Apr 24, 2025
b237211
Fix easy missing moves in function_schema_parser (#151805)
swolchok Apr 24, 2025
68454b9
Fix a missed c10::TypeFactory::create spot in function_schema_parser …
swolchok Apr 24, 2025
76cc379
Fix missing moves in SchemaTypeParser::parseFakeAndRealType (#151807)
swolchok Apr 24, 2025
2a58d2a
StringCordView: make iterator fast when there is only one piece (#151…
swolchok Apr 24, 2025
2102b3b
[FSDP1] print fqns when debug FlatParamHandle (#151336)
weifengpy Apr 15, 2025
a389835
[MPS] Adjust test_sum_dtypes so it can run on MPS. (#152064)
dcci Apr 24, 2025
5e9bdc9
[MPS] layernorm forward kernel (#152010)
Isalia20 Apr 24, 2025
2ea8653
[vec128] Fix fmsub NEON defintion (#152075)
malfet Apr 24, 2025
78953ee
[pytorch] reland of [cutlass backend] delay construction of cutlass p…
henrylhtsang Apr 24, 2025
5b368fa
Add torch.cuda._compile_kernel() (#151484)
msaroufim Apr 24, 2025
5e320ee
[BE] follow autoformating and linter (#151507)
XilunWu Apr 24, 2025
3278ddd
[invoke_subgraph] Compile time traces (#151409)
anijain2305 Apr 24, 2025
41285f2
[invoke_subgraph][fake tensor] Add finalizer on subgraph instead of t…
anijain2305 Apr 24, 2025
1d73b64
[fake tensor cache] Support index with non bool/int8 indices (#151477)
anijain2305 Apr 24, 2025
d743a7b
[invoke_subgraph] Cache fake tensor if no unbacked symint in the outp…
anijain2305 Apr 24, 2025
3a170a8
Revert "[Cutlass] Implement EVT example tensor creation (#150904)"
pytorchmergebot Apr 24, 2025
56e67ba
Move verbose warning to warning_once (#152044)
wconstab Apr 23, 2025
0eb554e
Better error msg for too big to optimize (#151855)
yushangdi Apr 24, 2025
9c1bc9c
[fake tensor] Cache None, integer and SymInts in the output (#151961)
anijain2305 Apr 24, 2025
402d19c
add basic unit tests and noop config (#152036)
Lucaskabela Apr 23, 2025
03970df
Add functionality for installing free variables (#151134)
Lucaskabela Apr 23, 2025
81c4369
[dynamo] Add guard serialization for tensor matches. (#151318)
zhxchen17 Apr 23, 2025
ff075d0
Update docs dependencies for local build (#151796)
svekars Apr 24, 2025
b11c9e1
[CI][docker] Use install_cusparselt when possible in docker image (#…
clee2000 Apr 24, 2025
b1d055f
Revert "[dynamo] Add guard serialization for tensor matches. (#151318)"
pytorchmergebot Apr 24, 2025
6efc572
[CUDA][CPU] Bump system memory requirement for `test_cross_entropy_la…
eqy Apr 24, 2025
24bda01
Pin theme to a branch (#152046)
svekars Apr 24, 2025
92f125e
[export] improve error message for deserializing custom triton op (#1…
ydwu4 Apr 23, 2025
bd09d87
add Out Notes (#151306)
ILCSFNO Apr 24, 2025
dccc415
Include other accelerators in capturable docstr for optimizers (#149770)
janeyx99 Apr 24, 2025
8a9c66b
Improve stable library apis per Scott's feedback (#152040)
janeyx99 Apr 24, 2025
d78d2af
[CUDA][TF32] Account for TF32 in `test_corrcoef` (#151830)
eqy Apr 24, 2025
6ced5e6
Python 3.11 and 3.13 support for Windows Arm64 (#152109)
iremyux Apr 24, 2025
0413358
Non-deterministic alert in histc_cuda for floating types only (#151701)
amjames Apr 24, 2025
fc6e37c
[Inductor] Record Triton’s Base32 Cache Key in .best_config for Debug…
fulvius31 Apr 24, 2025
2089b22
[xpu] set aot device flags in cpp_extension (#149459)
jingxu10 Apr 24, 2025
d70490e
[Inductor][CPP] Optimize the epilogue for int8 GEMM Template (#152000)
leslie-fang-intel Apr 24, 2025
75c71ab
[Break XPU] generalize newly introduced device bias code in Inductor …
etaf Apr 24, 2025
8313bc2
Revert "Add OIDC permissions to bazel workflow (#151456)"
pytorchmergebot Apr 25, 2025
7f28c03
Adding fbgemm to whitelist (#152079)
jimone1 Apr 25, 2025
1a6d50d
Reducer: add check on received data to avoid segfault (#152143)
d4l3k Apr 25, 2025
e2c7ae5
[ONNX] Add group_norm support from opset 21 (#152138)
justinchuby Apr 25, 2025
dda0c95
[audio hash update] update the pinned audio hash (#152149)
pytorchupdatebot Apr 25, 2025
a936d59
[Cutlass] Implement EVT example tensor creation (#150904)
mlazos Apr 25, 2025
6120cc8
[executorch hash update] update the pinned executorch hash (#151728)
pytorchupdatebot Apr 25, 2025
7b9e7b6
Generate test reports for pytest when option is given
Flamefire Feb 6, 2025
a0b2972
Use correct value for passing --save-xml to subtest
Flamefire Feb 25, 2025
Note: the diff is too large to display in full; only the first 3000 changed files are loaded.
2 changes: 1 addition & 1 deletion .ci/aarch64_linux/aarch64_ci_build.sh
@@ -20,7 +20,7 @@ cd /
# on the mounted pytorch repo
git config --global --add safe.directory /pytorch
pip install -r /pytorch/requirements.txt
pip install auditwheel
pip install auditwheel==6.2.0
if [ "$DESIRED_CUDA" = "cpu" ]; then
echo "BASE_CUDA_VERSION is not set. Building cpu wheel."
#USE_PRIORITIZED_TEXT_FOR_LD for enable linker script optimization https://github.com/pytorch/pytorch/pull/121975/files
20 changes: 17 additions & 3 deletions .ci/aarch64_linux/aarch64_wheel_ci_build.py
@@ -39,7 +39,7 @@ def build_ArmComputeLibrary() -> None:
"clone",
"https://github.com/ARM-software/ComputeLibrary.git",
"-b",
"v24.09",
"v25.02",
"--depth",
"1",
"--shallow-submodules",
@@ -99,10 +99,14 @@ def update_wheel(wheel_path, desired_cuda) -> None:
if "126" in desired_cuda:
libs_to_copy += [
"/usr/local/cuda/lib64/libnvrtc-builtins.so.12.6",
"/usr/local/cuda/lib64/libcufile.so.0",
"/usr/local/cuda/lib64/libcufile_rdma.so.1",
]
elif "128" in desired_cuda:
libs_to_copy += [
"/usr/local/cuda/lib64/libnvrtc-builtins.so.12.8",
"/usr/local/cuda/lib64/libcufile.so.0",
"/usr/local/cuda/lib64/libcufile_rdma.so.1",
]
else:
libs_to_copy += [
@@ -132,6 +136,9 @@ def complete_wheel(folder: str) -> str:
"""
wheel_name = list_dir(f"/{folder}/dist")[0]

# Please note for cuda we don't run auditwheel since we use custom script to package
# the cuda dependencies to the wheel file using update_wheel() method.
# However we need to make sure filename reflects the correct Manylinux platform.
if "pytorch" in folder and not enable_cuda:
print("Repairing Wheel with AuditWheel")
check_call(["auditwheel", "repair", f"dist/{wheel_name}"], cwd=folder)
@@ -143,7 +150,14 @@
f"/{folder}/dist/{repaired_wheel_name}",
)
else:
repaired_wheel_name = wheel_name
repaired_wheel_name = wheel_name.replace(
"linux_aarch64", "manylinux_2_28_aarch64"
)
print(f"Renaming {wheel_name} wheel to {repaired_wheel_name}")
os.rename(
f"/{folder}/dist/{wheel_name}",
f"/{folder}/dist/{repaired_wheel_name}",
)

print(f"Copying {repaired_wheel_name} to artifacts")
shutil.copy2(
@@ -204,7 +218,7 @@ def parse_arguments():
else:
build_vars += f"BUILD_TEST=0 PYTORCH_BUILD_VERSION={version}.dev{build_date} PYTORCH_BUILD_NUMBER=1 "
elif branch.startswith(("v1.", "v2.")):
build_vars += f"BUILD_TEST=0 PYTORCH_BUILD_VERSION={branch[1:branch.find('-')]} PYTORCH_BUILD_NUMBER=1 "
build_vars += f"BUILD_TEST=0 PYTORCH_BUILD_VERSION={branch[1 : branch.find('-')]} PYTORCH_BUILD_NUMBER=1 "

if enable_mkldnn:
build_ArmComputeLibrary()
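The comment added in complete_wheel() above notes that CUDA wheels skip auditwheel (their CUDA dependencies are packaged by update_wheel()) and only need the platform tag in the filename changed. A minimal, self-contained sketch of that renaming step; the directory and wheel filename below are hypothetical, for illustration only:

```python
import os


def retag_cuda_wheel(dist_dir: str, wheel_name: str) -> str:
    # CUDA wheels are packaged by a custom script instead of auditwheel,
    # so only the platform tag in the filename needs to change.
    repaired_wheel_name = wheel_name.replace(
        "linux_aarch64", "manylinux_2_28_aarch64"
    )
    print(f"Renaming {wheel_name} wheel to {repaired_wheel_name}")
    os.rename(
        os.path.join(dist_dir, wheel_name),
        os.path.join(dist_dir, repaired_wheel_name),
    )
    return repaired_wheel_name


# Hypothetical usage:
# retag_cuda_wheel("/pytorch/dist", "torch-2.8.0.dev20250425-cp311-cp311-linux_aarch64.whl")
# -> "torch-2.8.0.dev20250425-cp311-cp311-manylinux_2_28_aarch64.whl"
```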
20 changes: 3 additions & 17 deletions .ci/aarch64_linux/build_aarch64_wheel.py
@@ -19,13 +19,11 @@

# AMI images for us-east-1, change the following based on your ~/.aws/config
os_amis = {
"ubuntu18_04": "ami-078eece1d8119409f", # login_name: ubuntu
"ubuntu20_04": "ami-052eac90edaa9d08f", # login_name: ubuntu
"ubuntu22_04": "ami-0c6c29c5125214c77", # login_name: ubuntu
"redhat8": "ami-0698b90665a2ddcf1", # login_name: ec2-user
}

ubuntu18_04_ami = os_amis["ubuntu18_04"]
ubuntu20_04_ami = os_amis["ubuntu20_04"]


@@ -329,7 +327,7 @@ def build_ArmComputeLibrary(host: RemoteHost, git_clone_flags: str = "") -> None
]
)
host.run_cmd(
f"git clone https://github.com/ARM-software/ComputeLibrary.git -b v24.09 {git_clone_flags}"
f"git clone https://github.com/ARM-software/ComputeLibrary.git -b v25.02 {git_clone_flags}"
)

host.run_cmd(f"cd ComputeLibrary && scons Werror=1 -j8 {acl_build_flags}")
@@ -659,18 +657,6 @@ def configure_system(
"sudo apt-get install -y python3-dev python3-yaml python3-setuptools python3-wheel python3-pip"
)
host.run_cmd("pip3 install dataclasses typing-extensions")
# Install and switch to gcc-8 on Ubuntu-18.04
if not host.using_docker() and host.ami == ubuntu18_04_ami and compiler == "gcc-8":
host.run_cmd("sudo apt-get install -y g++-8 gfortran-8")
host.run_cmd(
"sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-8 100"
)
host.run_cmd(
"sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-8 100"
)
host.run_cmd(
"sudo update-alternatives --install /usr/bin/gfortran gfortran /usr/bin/gfortran-8 100"
)
if not use_conda:
print("Installing Cython + numpy from PyPy")
host.run_cmd("sudo pip3 install Cython")
@@ -761,7 +747,7 @@ def start_build(
version = host.check_output("cat pytorch/version.txt").strip()[:-2]
build_vars += f"BUILD_TEST=0 PYTORCH_BUILD_VERSION={version}.dev{build_date} PYTORCH_BUILD_NUMBER=1"
if branch.startswith(("v1.", "v2.")):
build_vars += f"BUILD_TEST=0 PYTORCH_BUILD_VERSION={branch[1:branch.find('-')]} PYTORCH_BUILD_NUMBER=1"
build_vars += f"BUILD_TEST=0 PYTORCH_BUILD_VERSION={branch[1 : branch.find('-')]} PYTORCH_BUILD_NUMBER=1"
if host.using_docker():
build_vars += " CMAKE_SHARED_LINKER_FLAGS=-Wl,-z,max-page-size=0x10000"
if enable_mkldnn:
@@ -1026,7 +1012,7 @@ def parse_arguments():
install_condaforge_python(host, args.python_version)
sys.exit(0)

python_version = args.python_version if args.python_version is not None else "3.8"
python_version = args.python_version if args.python_version is not None else "3.9"

if args.use_torch_from_pypi:
configure_system(host, compiler=args.compiler, python_version=python_version)
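The PYTORCH_BUILD_VERSION expression reformatted in the hunk above derives the wheel version from a release branch name by dropping the leading "v" and everything from the first "-". A small illustrative sketch, assuming an rc-style branch name (the example name is hypothetical):

```python
def version_from_branch(branch: str) -> str:
    # "v2.7.0-rc3" -> "2.7.0"; assumes the branch contains a "-" suffix,
    # as release-candidate branches like v1.x/v2.x do.
    return branch[1 : branch.find("-")]


assert version_from_branch("v2.7.0-rc3") == "2.7.0"
```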
2 changes: 1 addition & 1 deletion .ci/docker/README.md
@@ -34,5 +34,5 @@ See `build.sh` for valid build environments (it's the giant switch).
./build.sh pytorch-linux-bionic-py3.8-gcc9 -t myimage:latest

# Set flags (see build.sh) and build image
sudo bash -c 'PROTOBUF=1 ./build.sh pytorch-linux-bionic-py3.8-gcc9 -t myimage:latest
sudo bash -c 'TRITON=1 ./build.sh pytorch-linux-bionic-py3.8-gcc9 -t myimage:latest
```
3 changes: 3 additions & 0 deletions .ci/docker/almalinux/Dockerfile
@@ -44,6 +44,9 @@ FROM base as cuda
ARG CUDA_VERSION=12.4
RUN rm -rf /usr/local/cuda-*
ADD ./common/install_cuda.sh install_cuda.sh
COPY ./common/install_nccl.sh install_nccl.sh
COPY ./ci_commit_pins/nccl-cu* /ci_commit_pins/
COPY ./common/install_cusparselt.sh install_cusparselt.sh
ENV CUDA_HOME=/usr/local/cuda-${CUDA_VERSION}
# Preserve CUDA_VERSION for the builds
ENV CUDA_VERSION=${CUDA_VERSION}
94 changes: 36 additions & 58 deletions .ci/docker/almalinux/build.sh
@@ -1,82 +1,60 @@
#!/usr/bin/env bash
# Script used only in CD pipeline

set -eou pipefail
set -exou pipefail

image="$1"
shift

if [ -z "${image}" ]; then
echo "Usage: $0 IMAGE"
echo "Usage: $0 IMAGENAME:ARCHTAG"
exit 1
fi

DOCKER_IMAGE_NAME="pytorch/${image}"
# Go from imagename:tag to tag
DOCKER_TAG_PREFIX=$(echo "${image}" | awk -F':' '{print $2}')

CUDA_VERSION=""
if [[ "${DOCKER_TAG_PREFIX}" == cuda* ]]; then
# extract cuda version from image name and tag. e.g. manylinux2_28-builder:cuda12.8 returns 12.8
CUDA_VERSION=$(echo "${DOCKER_TAG_PREFIX}" | awk -F'cuda' '{print $2}')
fi

export DOCKER_BUILDKIT=1
TOPDIR=$(git rev-parse --show-toplevel)

CUDA_VERSION=${CUDA_VERSION:-12.1}

case ${CUDA_VERSION} in
case ${DOCKER_TAG_PREFIX} in
cpu)
BASE_TARGET=base
DOCKER_TAG=cpu
;;
all)
BASE_TARGET=all_cuda
DOCKER_TAG=latest
cuda*)
BASE_TARGET=cuda${CUDA_VERSION}
;;
*)
BASE_TARGET=cuda${CUDA_VERSION}
DOCKER_TAG=cuda${CUDA_VERSION}
echo "ERROR: Unknown docker tag ${DOCKER_TAG_PREFIX}"
exit 1
;;
esac

# TODO: Remove LimitNOFILE=1048576 patch once https://github.com/pytorch/test-infra/issues/5712
# is resolved. This patch is required in order to fix timing out of Docker build on Amazon Linux 2023.
sudo sed -i s/LimitNOFILE=infinity/LimitNOFILE=1048576/ /usr/lib/systemd/system/docker.service
sudo systemctl daemon-reload
sudo systemctl restart docker

(
set -x
# TODO: Remove LimitNOFILE=1048576 patch once https://github.com/pytorch/test-infra/issues/5712
# is resolved. This patch is required in order to fix timing out of Docker build on Amazon Linux 2023.
sudo sed -i s/LimitNOFILE=infinity/LimitNOFILE=1048576/ /usr/lib/systemd/system/docker.service
sudo systemctl daemon-reload
sudo systemctl restart docker

docker build \
--target final \
--progress plain \
--build-arg "BASE_TARGET=${BASE_TARGET}" \
--build-arg "CUDA_VERSION=${CUDA_VERSION}" \
--build-arg "DEVTOOLSET_VERSION=11" \
-t ${DOCKER_IMAGE_NAME} \
$@ \
-f "${TOPDIR}/.ci/docker/almalinux/Dockerfile" \
${TOPDIR}/.ci/docker/
)

if [[ "${DOCKER_TAG}" =~ ^cuda* ]]; then
export DOCKER_BUILDKIT=1
TOPDIR=$(git rev-parse --show-toplevel)
tmp_tag=$(basename "$(mktemp -u)" | tr '[:upper:]' '[:lower:]')

docker build \
--target final \
--progress plain \
--build-arg "BASE_TARGET=${BASE_TARGET}" \
--build-arg "CUDA_VERSION=${CUDA_VERSION}" \
--build-arg "DEVTOOLSET_VERSION=11" \
-t ${tmp_tag} \
$@ \
-f "${TOPDIR}/.ci/docker/almalinux/Dockerfile" \
${TOPDIR}/.ci/docker/

if [ -n "${CUDA_VERSION}" ]; then
# Test that we're using the right CUDA compiler
(
set -x
docker run --rm "${DOCKER_IMAGE_NAME}" nvcc --version | grep "cuda_${CUDA_VERSION}"
)
fi

GITHUB_REF=${GITHUB_REF:-$(git symbolic-ref -q HEAD || git describe --tags --exact-match)}
GIT_BRANCH_NAME=${GITHUB_REF##*/}
GIT_COMMIT_SHA=${GITHUB_SHA:-$(git rev-parse HEAD)}
DOCKER_IMAGE_BRANCH_TAG=${DOCKER_IMAGE_NAME}-${GIT_BRANCH_NAME}
DOCKER_IMAGE_SHA_TAG=${DOCKER_IMAGE_NAME}-${GIT_COMMIT_SHA}
if [[ "${WITH_PUSH:-}" == true ]]; then
(
set -x
docker push "${DOCKER_IMAGE_NAME}"
if [[ -n ${GITHUB_REF} ]]; then
docker tag ${DOCKER_IMAGE_NAME} ${DOCKER_IMAGE_BRANCH_TAG}
docker tag ${DOCKER_IMAGE_NAME} ${DOCKER_IMAGE_SHA_TAG}
docker push "${DOCKER_IMAGE_BRANCH_TAG}"
docker push "${DOCKER_IMAGE_SHA_TAG}"
fi
)
docker run --rm "${tmp_tag}" nvcc --version | grep "cuda_${CUDA_VERSION}"
fi
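The rewritten .ci/docker/almalinux/build.sh above now derives BASE_TARGET and CUDA_VERSION from the image tag passed as IMAGENAME:ARCHTAG instead of from an environment variable. A sketch of just that parsing logic, re-expressed in Python purely for illustration (not part of the change itself):

```python
def parse_docker_tag(image: str) -> tuple[str, str]:
    # Mirrors the tag handling in build.sh; assumes an IMAGENAME:ARCHTAG argument.
    # Go from imagename:tag to tag, e.g. "manylinux2_28-builder:cuda12.8" -> "cuda12.8".
    tag_prefix = image.split(":", 1)[1]
    if tag_prefix == "cpu":
        return "base", ""
    if tag_prefix.startswith("cuda"):
        # "cuda12.8" -> "12.8"
        cuda_version = tag_prefix[len("cuda"):]
        return f"cuda{cuda_version}", cuda_version
    raise ValueError(f"Unknown docker tag {tag_prefix}")


assert parse_docker_tag("manylinux2_28-builder:cuda12.8") == ("cuda12.8", "12.8")
assert parse_docker_tag("almalinux-builder:cpu") == ("base", "")
```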