Add Pathways Integration #31

Merged
jeffcarp merged 12 commits into main from pathways-1 on Feb 20, 2026

@divyashreepathihalli (Collaborator) commented Feb 19, 2026

  • Pathways Backend Integration: Introduced a new PathwaysBackend to support ML Pathways (LeaderWorkerSet) for remote job execution.
  • Automatic Backend Selection: The @keras_remote.run decorator now automatically selects the 'pathways' backend for multi-host TPU configurations, while retaining 'gke' as the default for other scenarios.
  • New Pathways Client: A dedicated 'pathways_client.py' module was added to handle the submission, monitoring, and cleanup of LeaderWorkerSet jobs on Kubernetes.
  • LeaderWorkerSet (LWS) Installation: Added utilities and constants to facilitate the installation of the LeaderWorkerSet Custom Resource Definition (CRD) on GKE clusters.
  • Pathways Example: Included a new example (pathways_example.py) demonstrating how to train a simple Keras model remotely using the Pathways integration with a JAX backend.

Example execution: [image]

Fixes #19

@divyashreepathihalli marked this pull request as draft February 19, 2026 21:17

@gemini-code-assist (Contributor)

Summary of Changes

This pull request adds support for ML Pathways (LeaderWorkerSet) as an alternative backend for keras_remote job execution. It lets users run distributed training on Pathways, particularly for multi-host TPU setups, through a new backend implementation, client-side logic for managing LeaderWorkerSet resources, and an updated 'run' decorator that auto-detects the backend.



Changelog
  • examples/pathways_example.py
    • Added a new example script demonstrating remote Keras model training using the Pathways backend with a JAX environment.
  • keras_remote/backend/execution.py
    • Imported 'pathways_client'.
    • Refactored 'GKEBackend' to inherit from a new 'BaseK8sBackend' class.
    • Introduced 'PathwaysBackend' class with methods for submitting, waiting for, and cleaning up LeaderWorkerSet jobs.
  • keras_remote/backend/pathways_client.py
    • Added a new module containing functions for interacting with the Kubernetes LeaderWorkerSet API, including '_get_lws_version', 'submit_pathways_job', 'wait_for_job', 'cleanup_job', and '_create_lws_spec'.
  • keras_remote/cli/constants.py
    • Added 'LWS_INSTALL_URL' constant pointing to the LeaderWorkerSet CRD manifest.
  • keras_remote/cli/infra/post_deploy.py
    • Imported 'LWS_INSTALL_URL'.
    • Added 'install_lws' function to apply the LeaderWorkerSet CRD to a Kubernetes cluster.
  • keras_remote/core/core.py
    • Imported 'PathwaysBackend' and 'accelerators'.
    • Modified the 'run' decorator to accept an optional 'backend' parameter.
    • Implemented logic within 'run' to automatically select 'pathways' backend for multi-host TPUs or fall back to 'gke'.
    • Added '_execute_on_pathways' function to delegate execution to the 'PathwaysBackend'.
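
For readers unfamiliar with LeaderWorkerSet, the manifest that a helper such as '_create_lws_spec' produces could look roughly like the sketch below. This is not the code from this PR: the function name build_lws_manifest, its parameters, the container name, and the hardcoded "v1" API version are assumptions made for illustration (the changelog suggests the real client detects the version via '_get_lws_version'). Only the apiVersion group, kind, and the replicas / leaderWorkerTemplate fields follow the public LeaderWorkerSet API.

```python
import copy


def build_lws_manifest(job_name, image, hosts_per_group, namespace="default"):
    """Return a LeaderWorkerSet manifest as a plain dict (hypothetical helper)."""
    # One shared pod template; the container name "trainer" is an assumption.
    pod_template = {
        "spec": {"containers": [{"name": "trainer", "image": image}]}
    }
    return {
        "apiVersion": "leaderworkerset.x-k8s.io/v1",  # version assumed
        "kind": "LeaderWorkerSet",
        "metadata": {"name": job_name, "namespace": namespace},
        "spec": {
            "replicas": 1,  # one leader/worker group
            "leaderWorkerTemplate": {
                # size counts every pod in the group, leader included
                "size": hosts_per_group,
                # deep copies so later edits to one template do not leak
                # into the other
                "leaderTemplate": copy.deepcopy(pod_template),
                "workerTemplate": copy.deepcopy(pod_template),
            },
        },
    }


manifest = build_lws_manifest("demo-job", "example.invalid/trainer:latest", 4)
print(manifest["spec"]["leaderWorkerTemplate"]["size"])
```

Keeping the leader and worker templates as separate copies makes it straightforward to diverge them later, for example if per-role settings turn out to be needed.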
@gemini-code-assist (bot, Contributor) left a comment

Code Review

This pull request introduces Pathways integration as a new backend for remote execution, which is a great addition. The changes are well-organized, with a new PathwaysBackend and pathways_client to handle LeaderWorkerSet (LWS) jobs on GKE. The auto-detection of the backend based on the accelerator type in core.py is a nice touch that aligns with the goal of minimizing user configuration. I have a couple of suggestions to improve maintainability and consistency.

@divyashreepathihalli marked this pull request as ready for review February 19, 2026 22:41

@JyotinderSingh (Collaborator) left a comment

Thanks for these changes! Left a few comments.

Also, I'm not very familiar with Pathways, but from my understanding we might need to add a couple of environment variables for collective communication (MEGASCALE_COORDINATOR_ADDRESS, TPU_WORKER_ID, etc.). The leader and worker templates are identical for now; they might need to carry this information.

Comment on lines +70 to +78
def install_lws():
    """Install the LeaderWorkerSet custom resource controller.

    This enables Pathways scheduling on the GKE cluster.
    """
    subprocess.run(
        ["kubectl", "apply", "--server-side", "-f", LWS_INSTALL_URL],
        check=True,
    )
Collaborator

We're not calling this function anywhere.

Collaborator Author

Ah, I missed adding this to up.py. Done.
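
A minimal sketch of what wiring install_lws into the provisioning flow could look like. The actual up.py is not shown in this thread, so run_post_deploy and the placeholder manifest URL are hypothetical. The command runner is injectable so that the wiring can be checked without a cluster or kubectl.

```python
import subprocess

# Placeholder only; the real constant lives in keras_remote/cli/constants.py.
LWS_INSTALL_URL = "https://example.invalid/leaderworkerset/manifests.yaml"


def install_lws(run=subprocess.run):
    """Apply the LeaderWorkerSet manifest with kubectl (mirrors the reviewed code)."""
    run(
        ["kubectl", "apply", "--server-side", "-f", LWS_INSTALL_URL],
        check=True,
    )


def run_post_deploy(run=subprocess.run):
    """Hypothetical post-deploy hook: the place where install_lws gets called."""
    install_lws(run=run)


# Dry run: a fake runner records the command instead of executing it.
recorded = []
run_post_deploy(run=lambda cmd, check: recorded.append(cmd))
print(recorded[0][:2])  # → ['kubectl', 'apply']
```

A dry run like this would have caught the issue raised above, since an uncalled install_lws leaves the recorded list empty.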

@divyashreepathihalli (Collaborator, Author) commented

Thanks for the review, Jyotinder. I updated the code and re-ran the example to verify execution.

@jeffcarp (Member) left a comment

Thanks! LGTM after comments.

history = model.fit(x, y, epochs=5, batch_size=32, validation_split=0.2)

print("\nTraining completed successfully on Pathways!")
return history.history
Member

Just to confirm, there are no user code changes to run on Pathways within their remote function? All it needs is backend="pathways"? That's pretty cool

Collaborator Author

Yeah. If backend is None, it auto-detects whether the user requested a multi-node TPU and picks the pathways backend.
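
The selection rule described here can be sketched as follows. This is an illustration under stated assumptions, not the logic in core.py: select_backend and its num_hosts argument are hypothetical (the PR derives multi-host status from the accelerator configuration), and it assumes an explicitly passed backend always takes precedence.

```python
def select_backend(backend, num_hosts):
    """Pick an execution backend (hypothetical helper, not the PR's code).

    backend: the user's explicit choice, or None to auto-detect.
    num_hosts: how many TPU hosts the requested accelerator spans.
    """
    if backend is not None:
        return backend  # assumption: an explicit choice always wins
    # Multi-host TPU slices go to Pathways; everything else stays on GKE.
    return "pathways" if num_hosts > 1 else "gke"


print(select_backend(None, 4))   # → pathways
print(select_backend(None, 1))   # → gke
print(select_backend("gke", 4))  # → gke
```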

        )
        logger.info(f"Deleted LeaderWorkerSet: {job_name}")
    except ApiException as e:
        if e.status != 404:
Member

What if it is 404? Should it be re-raised?

Collaborator Author

Updated this. A 404 Not Found during deletion means the LeaderWorkerSet has already been deleted. We should not re-raise it, as that would crash the cleanup routine.

@jeffcarp (Member) commented

Thanks!

@jeffcarp merged commit c6b4dad into main Feb 20, 2026
2 checks passed
@JyotinderSingh deleted the pathways-1 branch February 25, 2026 12:33
Development

Successfully merging this pull request may close these issues.

Integrate Pathways on top of GKE standard

3 participants