
Conversation

@cantonios
Collaborator

Replaces custom pmap+pmax with the special-purpose multihost_utils.process_allgather. Tested in a pseudo multihost (multiprocess) test.

@gemini-code-assist
Contributor

Summary of Changes

Hello @cantonios, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request refactors the CPU statistics aggregation mechanism within a JAX distributed embedding layer. The primary goal is to enhance the robustness and maintainability of the multihost setup by replacing a manually implemented pmap and pmax pattern with JAX's dedicated multihost_utils.process_allgather function. This change streamlines how statistics are collected and processed across multiple hosts, ensuring more efficient and reliable operation in distributed JAX environments.
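For reviewers unfamiliar with the utility, here is a minimal sketch of the adopted pattern (the `stats` pytree and its values are hypothetical; a real multihost run would also call `jax.distributed.initialize()` on every process first):

```python
import jax
import jax.numpy as jnp
from jax.experimental import multihost_utils

# Hypothetical per-process statistics pytree.
stats = {"max_ids_per_partition": jnp.array([3, 7, 5])}

# Gather the pytree from every process. With the default tiled=False, each
# leaf gains a new leading axis of length jax.process_count(), one slice per
# process.
all_stats = multihost_utils.process_allgather(stats)

# Reduce over the process axis to get the elementwise maximum across hosts,
# the same result the old pmap+pmax implementation produced.
aggregated_stats = jax.tree.map(lambda x: jnp.max(x, axis=0), all_stats)
```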

Highlights

  • CPU Stats Aggregation: Replaced a custom pmap and pmax based aggregation logic for CPU statistics with the specialized jax.experimental.multihost_utils.process_allgather utility.
  • Code Simplification: The change simplifies the aggregation process for JAX multihost environments, making the code more concise and leveraging a purpose-built JAX utility.
  • Variable Renaming: The variable full_stats was renamed to aggregated_stats to reflect the new aggregation method and maintain clarity.


@gemini-code-assist bot left a comment


Code Review

The pull request replaces a custom pmap+pmax implementation with multihost_utils.process_allgather for aggregating CPU stats in a JAX multihost setup. This change simplifies the code and leverages a specialized utility for multihost communication. I have identified a potential issue with the incrementing of the step count in the backward pass, which could lead to incorrect training behavior.

Comment on lines 603 to +607
```diff
-    # Aggregate stats across all processes/devices via pmax.
-    num_local_cpu_devices = jax.local_device_count("cpu")
-
-    def pmax_aggregate(x: Any) -> Any:
-        if not hasattr(x, "ndim"):
-            x = np.array(x)
-        tiled_x = np.tile(x, (num_local_cpu_devices, *([1] * x.ndim)))
-        return jax.pmap(
-            lambda y: jax.lax.pmax(y, "all_cpus"),  # type: ignore[no-untyped-call]
-            axis_name="all_cpus",
-            backend="cpu",
-        )(tiled_x)[0]
-
-    full_stats = jax.tree.map(pmax_aggregate, stats)
+    all_stats = multihost_utils.process_allgather(stats)
+    aggregated_stats = jax.tree.map(
+        lambda x: jnp.max(x, axis=0), all_stats
+    )
```
Contributor


Severity: medium

The original code used jax.pmap with jax.lax.pmax to aggregate statistics across devices. This change replaces that with multihost_utils.process_allgather followed by jnp.max. This seems like a good simplification, leveraging a dedicated utility for multihost aggregation.

However, it's important to ensure that process_allgather correctly handles the data sharding and aggregation across multiple hosts in your specific environment. Double-check that the resulting aggregated_stats contains the expected maximum values across all processes.
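One cheap way to sanity-check the semantics is a single-process run (sketched below with made-up values; with one process, `process_allgather` merely adds a leading axis of length one, so the subsequent `jnp.max` is a no-op):

```python
import jax
import jax.numpy as jnp
import numpy as np
from jax.experimental import multihost_utils

stats = {"max_unique_ids": jnp.array([1.0, 4.0, 2.0])}  # made-up values

all_stats = multihost_utils.process_allgather(stats)
# Each leaf gains a leading axis whose length is the number of processes.
assert all_stats["max_unique_ids"].shape == (jax.process_count(), 3)

aggregated_stats = jax.tree.map(lambda x: jnp.max(x, axis=0), all_stats)
np.testing.assert_array_equal(
    np.asarray(aggregated_stats["max_unique_ids"]), [1.0, 4.0, 2.0]
)
```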

Collaborator

@hertschuh left a comment


Thanks for the fix!

@hertschuh merged commit 6887dcb into keras-team:main on Oct 22, 2025
7 checks passed
@abheesht17
Collaborator

Thanks! This works for a dummy dataset on #144. For the real dataset, though, it freezes before training. Trying to work through this; maybe something is wrong with my processing code.
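A general JAX note that may help narrow this down (not a confirmed diagnosis of the freeze above): `process_allgather` is a collective, so every process must reach it the same number of times with the same pytree structure; if one host's input pipeline yields a different number of steps, the remaining hosts block indefinitely. A hypothetical way to localize such a hang is to bracket the collective with named barriers:

```python
import jax.numpy as jnp
from jax.experimental import multihost_utils

stats = {"example": jnp.array([1, 2, 3])}  # placeholder pytree

# Named barriers around the collective; if processes hang, their stack
# traces reveal which barrier each process is stuck at.
multihost_utils.sync_global_devices("before_stats_allgather")
all_stats = multihost_utils.process_allgather(stats)
multihost_utils.sync_global_devices("after_stats_allgather")
```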
