[pull] master from tensorflow:master#62

Merged
pull[bot] merged 22 commits into noaai:master from tensorflow:master
Jan 15, 2025

Conversation


@pull pull bot commented Jan 15, 2025

See Commits and Changes for more details.


Created by pull[bot] (v2.0.0-alpha.1)

Can you help keep this open source service alive? 💖 Please sponsor : )

thomasjoerg and others added 22 commits January 15, 2025 09:23
…er (NaNs go last).

PiperOrigin-RevId: 715826491
Moves
- byte_order.h
- crash_analysis.h
- dynamic_annotations.h
- grpc_credentials.h
- intrusive_ptr.h
- prefetch.h
- ram_file_system.h
- resource.h
- resource_loader.h
- rocm_rocdl_path.h
- stack_frame.h

PiperOrigin-RevId: 715828782
This method was renamed, but the staging function was kept; switch to the renamed variant.

PiperOrigin-RevId: 715859237
Support Lock and Unlock; instantiate the MLD CL environment as a singleton instance.
Added a CompileModel CPU test with OpenCL TensorBuffers as inputs and outputs.

PiperOrigin-RevId: 715860165
…saBufferInterval are inclusive. Update logging in MSA to indicate as much.

PiperOrigin-RevId: 715882309
We had the GIL released when constructing an nb::bytes object, which isn't allowed.

In passing, also avoid an unnecessary string copy.

PiperOrigin-RevId: 715886008
…calizer`

`DynamicDimensionInference` expects all conditional inputs/outputs to be tuplized so that it can easily add more inputs, and it `RET_CHECK`-fails otherwise; however, `ConditionalCanonicalizer` only canonicalizes the outputs. This CL changes the canonicalizer to tuplize the inputs of conditionals as well.

PiperOrigin-RevId: 715887964
…r::MemoryAllocators.

PiperOrigin-RevId: 715890862
PiperOrigin-RevId: 715898241
PiperOrigin-RevId: 715900904
PiperOrigin-RevId: 715902503
Imported from GitHub PR openxla/xla#21273

`ncclCommInitRankScalable` enables initializing communicators via multiple roots, which improves init performance at large scale.
The maximum number of ranks associated with a root rank when initializing a NCCL communicator can be tuned via `--xla_gpu_nccl_init_max_rank_per_root_ratio`. The default is 128 ranks per root.

Copybara import of the project:

--
98ef02dabc0bcb2c8206753bec4873c5f48e269f by Nicolas Castet <ncastet@nvidia.com>:

[XLA:GPU] Add support for NCCL ncclCommInitRankScalable API

--
f146a48fef5f1a1098b5c01ae79c5a0d9a9af8d7 by Nicolas Castet <ncastet@nvidia.com>:

Address review comments

--
dd6362af36a1f4d22532ad15b2007527898b5fa1 by Nicolas Castet <ncastet@nvidia.com>:

Add GpuCliqueKey::GetSubKeys unit test

Merging this change closes #21273

PiperOrigin-RevId: 715903412
+ Correctly (zero/value-)initialize PJRT_ExecuteOptions in tests and pjrt_c_api_client

```
If the number of initializer clauses is less than the number of members or
initializer list is completely empty, the remaining members are value-initialized
```

Context: openxla/xla#20429
PiperOrigin-RevId: 715906024
PiperOrigin-RevId: 715909749
…nge the function name

macOS name mangling changes the function name; use a less strict contains check that works on all platforms.

PiperOrigin-RevId: 715919685
…(dimensions whose size is 1).

It is meaningless to partition a dimension whose size is 1; doing so may insert redundant padding and unpadding. To avoid this, we replicate the sharding on these dimensions as a pre-processing step.

Take the following input as an example:
```
ENTRY entry {
  %constant.785 = f32[1,8] constant({{0,1,2,3,4,5,6,7}}), sharding={devices=[1,8]<=[8]}
  %slice.62 = f32[1,1] slice(%constant.785), slice={[0:1], [0:1]}, sharding={devices=[1,8]<=[8]}
  ROOT %reshape.779 = f32[] reshape(%slice.62), sharding={replicated}
}
```

Previous result, with redundant instructions:
```
ENTRY %entry_spmd () -> f32[] {
  %constant.8 = u32[8]{0} constant({0, 1, 2, 3, 4, 5, 6, 7})
  %partition-id = u32[] partition-id()
  %dynamic-slice.3 = u32[1]{0} dynamic-slice(u32[8]{0} %constant.8, u32[] %partition-id), dynamic_slice_sizes={1}
  %reshape.2 = u32[] reshape(u32[1]{0} %dynamic-slice.3)
  %constant.9 = u32[] constant(0)
  %compare = pred[] compare(u32[] %reshape.2, u32[] %constant.9), direction=EQ
  %broadcast = pred[1,1]{1,0} broadcast(pred[] %compare), dimensions={}
  %constant.0 = f32[1,8]{1,0} constant({ { 0, 1, 2, 3, 4, 5, 6, 7 } })
  %constant.1 = s32[] constant(0)
  %constant.2 = s32[8]{0} constant({0, 1, 2, 3, 4, 5, 6, 7})
  %dynamic-slice = s32[1]{0} dynamic-slice(s32[8]{0} %constant.2, u32[] %partition-id), dynamic_slice_sizes={1}
  %reshape = s32[] reshape(s32[1]{0} %dynamic-slice)
  %dynamic-slice.1 = f32[1,1]{1,0} dynamic-slice(f32[1,8]{1,0} %constant.0, s32[] %constant.1, s32[] %reshape), dynamic_slice_sizes={1,1}
  %copy = f32[1,1]{1,0} copy(f32[1,1]{1,0} %dynamic-slice.1)
  %constant.10 = f32[] constant(0)
  %broadcast.1 = f32[1,1]{1,0} broadcast(f32[] %constant.10), dimensions={}
  %select = f32[1,1]{1,0} select(pred[1,1]{1,0} %broadcast, f32[1,1]{1,0} %copy, f32[1,1]{1,0} %broadcast.1)
  %all-reduce = f32[1,1]{1,0} all-reduce(f32[1,1]{1,0} %select), channel_id=1, replica_groups={{0,1,2,3,4,5,6,7}}, use_global_device_ids=true, to_apply=%add.clone
  ROOT %reshape.3 = f32[] reshape(f32[1,1]{1,0} %all-reduce)
}
```

Result with this improvement:
```
ENTRY %entry_spmd () -> f32[] {
  %constant.0 = f32[1,8]{1,0} constant({ { 0, 1, 2, 3, 4, 5, 6, 7 } })
  %slice.0 = f32[1,1]{1,0} slice(f32[1,8]{1,0} %constant.0), slice={[0:1], [0:1]}
  ROOT %reshape.1 = f32[] reshape(f32[1,1]{1,0} %slice.0)
}
```

PiperOrigin-RevId: 715924899
PiperOrigin-RevId: 715934702
@pull pull bot added the ⤵️ pull label Jan 15, 2025
@pull pull bot merged commit 583b4aa into noaai:master Jan 15, 2025