
sequence parallel #26

Merged
meichangsu1 merged 1 commit into dev from sp_ljl_dev on Feb 3, 2026

Conversation

@meichangsu1
Collaborator

use ray in cookbook single_controller_sp.py

@meichangsu1 meichangsu1 changed the base branch from main to dev February 3, 2026 03:11
@meichangsu1 meichangsu1 merged commit 7b4acd9 into dev Feb 3, 2026
@gemini-code-assist
Contributor

Summary of Changes

Hello @meichangsu1, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly expands the project's capabilities by establishing a robust, distributed training ecosystem for large-scale AI models. It introduces a client-server architecture for interacting with Ray-based services, alongside a rich set of tools for managing model parallelism, efficient data pipelines, and multimodal model support. The changes lay a strong foundation for scalable and flexible model development and deployment.

Highlights

  • Distributed Training Framework: Introduced a comprehensive distributed training framework leveraging Ray for orchestrating large language models (LLMs) and multimodal LLMs (MLLMs), supporting various parallelism strategies including Tensor Parallelism (TP), Pipeline Parallelism (PP), Data Parallelism (DP), Context Parallelism (CP), Expert Parallelism (EP), and Sequence Parallelism (SP).
  • Megatron-Core Integration: Deeply integrated Megatron-Core, providing MegatronModel and MultiLoraMegatronModel classes, along with a GPTBridge for seamless weight conversion between HuggingFace and Megatron formats, and custom tuners for Megatron-compatible LoRA.
  • Multi-LoRA and PEFT Support: Implemented robust support for multi-LoRA training, allowing multiple LoRA adapters to be managed and switched dynamically, with specialized LoraParallelLinear for Megatron-Core and enhanced MultiLoraTransformersModel.
  • Advanced Data Handling: Enhanced data loading and processing capabilities with new DataLoader features including retry mechanisms, device mesh-aware sampling, and advanced dataset types like IterableDataset, PackingDataset, and LazyDataset for efficient data streaming and binpacking.
  • Extensive Cookbook Examples: Added a wide array of cookbook examples demonstrating various training scenarios, including SFT, GRPO, and VLM LoRA training, across both Transformers and Megatron backends, with local and Ray execution modes.
  • Modular Architecture: Refactored the project into a more modular architecture with new base classes for TwinkleModel, Loss, Metric, InputProcessor, and Template, facilitating extensibility and maintainability.
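Since this PR is titled "sequence parallel," the core idea behind SP in the highlights above can be illustrated with a small sketch: each rank owns one contiguous slice of the token dimension, so activations for long sequences are split across devices. This is a minimal plain-Python illustration of the sharding arithmetic only; `shard_sequence` is a hypothetical helper for exposition, not part of the twinkle API.

```python
def shard_sequence(tokens, rank, world_size):
    """Return the contiguous slice of `tokens` owned by `rank`.

    When len(tokens) is not divisible by world_size, the first
    `rem` ranks each own one extra token, so every token is owned
    by exactly one rank.
    """
    n = len(tokens)
    base, rem = divmod(n, world_size)
    # Ranks [0, rem) own base + 1 tokens; the remaining ranks own base.
    start = rank * base + min(rank, rem)
    length = base + (1 if rank < rem else 0)
    return tokens[start:start + length]

tokens = list(range(10))
shards = [shard_sequence(tokens, r, 4) for r in range(4)]
# Concatenating all rank shards recovers the original sequence.
assert [t for s in shards for t in s] == tokens
```

In a real SP setup each rank would run attention/MLP layers on its slice and exchange activations via collectives; the sketch only shows how ownership of the token dimension is divided.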


Changelog
  • .github/copilot-instructions.md
    • Added new guidelines for AI coding agents.
  • .gitignore
    • Added new entries for src/twinkle_client subdirectories, .locks/, .qoder, and test_cookbook/ to ignore generated files and temporary artifacts.
  • .pre-commit-config.yaml
    • Updated versions for flake8, isort, yapf, and pre-commit-hooks to their latest stable releases.
    • Removed the fix-encoding-pragma hook.
  • ROADMAP.md
    • Added a new project roadmap outlining key development goals including SFT, Ray SFT, HTTP SFT, multimodal model support, Megatron integration, Multi-LoRA, and API optimizations.
  • client_tools/client_generator.py
    • Added a new script to automatically generate client-side Python wrappers for various twinkle modules, enabling client-server interaction via HTTP.
  • cookbook/client/tinker/megatron/lora.py
    • Added a new cookbook example demonstrating LoRA training for Megatron models using the Tinker client.
  • cookbook/client/tinker/megatron/server.py
    • Added a new Ray Serve configuration script for deploying Megatron models with the Tinker client.
  • cookbook/client/tinker/megatron/server_config.yaml
    • Added a new YAML configuration file for the Tinker client's Megatron server deployment.
  • cookbook/client/tinker/transformer/lora.py
    • Added a new cookbook example demonstrating LoRA training for Transformers models using the Tinker client.
  • cookbook/client/tinker/transformer/sample.py
    • Added a new cookbook example demonstrating sampling from Transformers models using the Tinker client.
  • cookbook/client/tinker/transformer/server.py
    • Added a new Ray Serve configuration script for deploying Transformers models with the Tinker client.
  • cookbook/client/tinker/transformer/server_config.yaml
    • Added a new YAML configuration file for the Tinker client's Transformers server deployment.
  • cookbook/client/twinkle/megatron/lora.py
    • Added a new cookbook example demonstrating LoRA training for Megatron models using the Twinkle client.
  • cookbook/client/twinkle/megatron/server.py
    • Added a new Ray Serve configuration script for deploying Megatron models with the Twinkle client.
  • cookbook/client/twinkle/megatron/server_config.yaml
    • Added a new YAML configuration file for the Twinkle client's Megatron server deployment.
  • cookbook/client/twinkle/transformer/grpo_lora.py
    • Added a new cookbook example demonstrating GRPO LoRA training for Transformers models using the Twinkle client.
  • cookbook/client/twinkle/transformer/lora.py
    • Added a new cookbook example demonstrating LoRA training for Transformers models using the Twinkle client.
  • cookbook/client/twinkle/transformer/server.py
    • Added a new Ray Serve configuration script for deploying Transformers models with the Twinkle client.
  • cookbook/client/twinkle/transformer/server_config.yaml
    • Added a new YAML configuration file for the Twinkle client's Transformers server deployment.
  • cookbook/grpo/lora.py
    • Added a new cookbook example for GRPO LoRA training using Transformers in local mode.
  • cookbook/grpo/lora_gpu.py
    • Added a new cookbook example for GRPO LoRA training on GPU, supporting local and Ray modes with VLLMSampler/TorchSampler.
  • cookbook/grpo/lora_npu.py
    • Added a new cookbook example for GRPO LoRA training on NPU, supporting Ray mode with VLLMSampler/TorchSampler.
  • cookbook/megatron/lora.py
    • Added a new cookbook example for Megatron-Core LoRA training, supporting local (torchrun) and Ray modes with sequence parallelism.
  • cookbook/megatron/moe_lora.py
    • Added a new cookbook example for Megatron-Core MoE LoRA training, supporting Expert Parallelism in local and Ray modes.
  • cookbook/megatron/vlm_lora.py
    • Added a new cookbook example for Megatron-Core VLM LoRA training.
  • cookbook/remote/tinker/lora.py
    • Added a new cookbook example for remote Tinker client LoRA training.
  • cookbook/remote/tinker/server.py
    • Added a new Ray Serve configuration script for remote Tinker client deployment.
  • cookbook/remote/tinker/server_config.yaml
    • Added a new YAML configuration file for the remote Tinker client server deployment.
  • cookbook/remote/twinkle/lora.py
    • Added a new cookbook example for remote Twinkle client LoRA training.
  • cookbook/remote/twinkle/server.py
    • Added a new Ray Serve configuration script for remote Twinkle client deployment.
  • cookbook/remote/twinkle/server_config.yaml
    • Added a new YAML configuration file for the remote Twinkle client server deployment.
  • cookbook/sft/full_sft.py
    • Added a new cookbook example for full SFT training.
  • cookbook/sft/local_dataset.py
    • Added a new cookbook example for SFT training with a local dataset.
  • cookbook/sft/lora_npu.py
    • Added a new cookbook example for LoRA SFT training on NPU.
  • cookbook/sft/multi_lora.py
    • Added a new cookbook example for multi-LoRA SFT training.
  • cookbook/sft/single_controller.py
    • Added a new cookbook example for single controller SFT training.
  • cookbook/sft/single_controller_sp.py
    • Added a new cookbook example for single controller SFT training with sequence parallelism.
  • cookbook/sft/single_program.py
    • Added a new cookbook example for single program SFT training.
  • cookbook/sft/single_program_full.py
    • Added a new cookbook example for full single program SFT training.
  • cookbook/sft/single_program_megatron.py
    • Added a new cookbook example for single program Megatron SFT training.
  • cookbook/sft/streaming_dataset.py
    • Added a new cookbook example for SFT training with a streaming dataset.
  • cookbook/sft/vlm_lora.py
    • Added a new cookbook example for VLM LoRA training with the Transformers backend.
  • examples/expert_parallel/train_qwen3_30b_ep_fsdp_demo.py
    • Added a new example for Qwen3-30B Expert Parallelism + FSDP2 training.
  • pyproject.toml
    • Added a new Poetry configuration file, defining project metadata, dependencies, and optional dependencies for various frameworks and tools.
  • src/twinkle/__init__.py
    • Updated lazy import structure to include model.moe and infra modules.
  • src/twinkle/data_format/__init__.py
    • Added a new module for defining data formats.
  • src/twinkle/data_format/input_feature.py
    • Added InputFeature TypedDict for standardized LLM/MLLM input features.
  • src/twinkle/data_format/message.py
    • Added ToolCall, Tool, and Message TypedDicts for conversational and tool-use data formats.
  • src/twinkle/data_format/trajectory.py
    • Added Trajectory TypedDict for representing interaction trajectories in RL algorithms.
  • src/twinkle/dataloader/__init__.py
    • Added DeviceMeshIterableFetcher to the dataloader module.
  • src/twinkle/dataloader/dataloader.py
    • Added a new DataLoader wrapper with retry mechanisms and device mesh-aware data distribution.
  • src/twinkle/dataloader/device_mesh_fetcher.py
    • Added DeviceMeshIterableFetcher for fetching data from iterable datasets based on device mesh configuration.
  • src/twinkle/dataloader/device_mesh_sampler.py
    • Added DeviceMeshSampler to shard data batches according to the current data parallel rank.
  • src/twinkle/dataloader/retry_sampler.py
    • Added RetrySampler to re-sample data points that fail during processing.
  • src/twinkle/dataset/__init__.py
    • Added IterableDataset, IterablePackingDataset, and PackingDataset to the dataset module.
  • src/twinkle/dataset/base.py
    • Added a new base Dataset class with DatasetMeta for loading, mapping, filtering, and mixing datasets.
  • src/twinkle/dataset/iterable_dataset.py
    • Added IterableDataset wrapper for streaming datasets.
  • src/twinkle/dataset/iterable_packing_dataset.py
    • Added IterablePackingDataset for efficient binpacking of streaming dataset rows.
  • src/twinkle/dataset/lazy_dataset.py
    • Added LazyDataset for lazy tokenization, particularly useful for multimodal datasets to prevent OOM errors.
  • src/twinkle/dataset/packing_dataset.py
    • Added PackingDataset for binpacking dataset rows into batches of optimal length.
  • src/twinkle/gym/__init__.py
    • Added a new module for Gym-like environments.
  • src/twinkle/gym/base.py
    • Added a base Gym class.
  • src/twinkle/hub/__init__.py
    • Added a new module for hub operations.
  • src/twinkle/hub/hub.py
    • Added HubOperation, MSHub, and HFHub classes for unified management of models and datasets from various hubs.
  • src/twinkle/infra/ray/__init__.py
    • Renamed to tests/infra/__init__.py.
  • src/twinkle/kernel/README.md
    • Added comprehensive documentation for the Twinkle Kernel Module, detailing layer-level and function-level kernel replacement.
  • src/twinkle/kernel/__init__.py
    • Updated kernel registration and application functions.
  • src/twinkle/kernel/base.py
    • Added base classes, environment variable handling, and device detection utilities for the kernel module.
  • src/twinkle/kernel/function.py
    • Added utilities for function-level kernel replacement and registration.
  • src/twinkle/kernel/layer.py
    • Added utilities for layer-level kernel replacement and registration.
  • src/twinkle/kernel/registry.py
    • Added a registry system for managing layer and function kernel registrations.
  • src/twinkle/loss/__init__.py
    • Expanded the loss module to include various new loss functions: GRPOLoss, GSPOLoss, SAPOLoss, CISPOLoss, BNPOLoss, DRGRPOLoss, VocabParallelCrossEntropyLoss, ChunkedCrossEntropyLoss, CosineSimilarityLoss, GenerativeRerankerLoss, ListwiseGenerativeRerankerLoss, ListwiseRerankerLoss, MSELoss, OnlineContrastiveLoss, RerankerLoss, and CrossEntropyLoss.
  • src/twinkle/loss/base.py
    • Added a base Loss abstract class.
  • src/twinkle/loss/chunked_cross_entropy.py
    • Added ChunkedCrossEntropyLoss for memory-efficient cross-entropy computation.
  • src/twinkle/loss/contrastive_loss.py
    • Added ContrastiveLoss for contrastive learning objectives.
  • src/twinkle/loss/cosine_similarity.py
    • Added CosineSimilarityLoss for similarity-based learning.
  • src/twinkle/loss/cross_entropy.py
    • Added CrossEntropyLoss for standard classification tasks.
  • src/twinkle/loss/generative_reranker.py
    • Added GenerativeRerankerLoss for generative reranking tasks.
  • src/twinkle/loss/grpo.py
    • Added GRPOLoss and its variants (GSPOLoss, SAPOLoss, CISPOLoss, BNPOLoss, DRGRPOLoss) for reinforcement learning policy optimization.
  • src/twinkle/loss/infonce.py
    • Added InfoNCELoss for information-theoretic contrastive learning.
  • src/twinkle/loss/listwise_generative_reranker.py
    • Added ListwiseGenerativeRerankerLoss for list-wise generative reranking.
  • src/twinkle/loss/listwise_reranker.py
    • Added ListwiseRerankerLoss for list-wise reranking tasks.
  • src/twinkle/loss/mse.py
    • Added MSELoss for regression tasks.
  • src/twinkle/loss/online_contrastive_loss.py
    • Added OnlineContrastiveLoss for online contrastive learning.
  • src/twinkle/loss/reranker.py
    • Added RerankerLoss for reranking tasks.
  • src/twinkle/loss/vocab_parallel_cross_entropy.py
    • Added VocabParallelCrossEntropyLoss for Megatron-Core's tensor parallel vocabulary sharding.
  • src/twinkle/loss_scale/__init__.py
    • Added a new module for loss scaling functionalities.
  • src/twinkle/loss_scale/base.py
    • Added a base LossScale class.
  • src/twinkle/metric/__init__.py
    • Added a new module for metrics.
  • src/twinkle/metric/accuracy.py
    • Added Accuracy metric for classification tasks.
  • src/twinkle/metric/base.py
    • Added a base Metric abstract class.
  • src/twinkle/metric/loss.py
    • Added LossMetric for tracking and aggregating loss values.
  • src/twinkle/metric/train_metric.py
    • Added TrainMetric for tracking training progress like learning rate and speed.
  • src/twinkle/model/__init__.py
    • Updated lazy import structure to include megatron and transformers submodules.
  • src/twinkle/model/base.py
    • Added a base TwinkleModel abstract class defining the common interface for all models.
  • src/twinkle/model/megatron/__init__.py
    • Added a new module for Megatron-Core models.
  • src/twinkle/model/megatron/args.py
    • Added TwinkleMegatronArgs for configuring Megatron models, compatible with Megatron's get_args().
  • src/twinkle/model/megatron/megatron.py
    • Added MegatronModel for training models with Megatron-Core, including distributed optimizer and LoRA support.
  • src/twinkle/model/megatron/model/__init__.py
    • Added components for Megatron model architecture.
  • src/twinkle/model/megatron/model/constant.py
    • Added constants for LLM and MLLM model types.
  • src/twinkle/model/megatron/model/gpt_bridge.py
    • Added GPTBridge for converting weights between HuggingFace and Megatron formats, and MultimodalGPTBridge for multimodal models.
  • src/twinkle/model/megatron/model/gpt_model.py
    • Added GPTModel (inheriting from Megatron-Core's GPTModel) with custom OutputLayerLinear and RoPE scaling support.
  • src/twinkle/model/megatron/model/gpts/__init__.py
    • Registered GPT model types for Megatron-Core.
  • src/twinkle/model/megatron/model/mm_gpt_model.py
    • Added MultimodalGPTModel for handling multimodal inputs within the Megatron framework.
  • src/twinkle/model/megatron/model/mm_gpts/__init__.py
    • Added components for multimodal GPT models.
  • src/twinkle/model/megatron/model/mm_gpts/qwen.py
    • Added Qwen2_5VL_Vit and Qwen2_5VLBridge for Qwen2.5-VL model support.
  • src/twinkle/model/megatron/model/mm_gpts/qwen3_vl.py
    • Added Qwen3VLTransformerBlock, Qwen3VLGPTModel, Qwen3OmniBridge, and Qwen3VL_Vit for Qwen3-VL model support.
  • src/twinkle/model/megatron/model/mm_gpts/utils.py
    • Added utilities for patching HuggingFace modules for Megatron compatibility.
  • src/twinkle/model/megatron/model/register.py
    • Added a registry system for Megatron models.
  • src/twinkle/model/megatron/model/rope.py
    • Added utilities for dynamic RoPE scaling and frequency updates.
  • src/twinkle/model/megatron/multi_lora_megatron.py
    • Added MultiLoraMegatronModel for multi-LoRA training with Megatron-Core, enabling dynamic adapter management.
  • src/twinkle/model/megatron/strategy/__init__.py
    • Added Megatron strategy module.
  • src/twinkle/model/megatron/strategy/megatron.py
    • Added MegatronStrategy for wrapping models with Megatron's Distributed Data Parallel (DDP) and handling parallelism configurations.
  • src/twinkle/model/megatron/tuners/__init__.py
    • Added tuners module.
  • src/twinkle/model/megatron/tuners/lora.py
    • Added LoraParallelLinear for Megatron-compatible LoRA layers, supporting tensor parallel sharding.
  • src/twinkle/model/megatron/tuners/utils.py
    • Added utility functions for Megatron-Core integration, including finding linear/router/embedding layers and patching deepcopy.
  • src/twinkle/model/megatron/utils/__init__.py
    • Added Megatron utilities module.
  • src/twinkle/model/megatron/utils/config.py
    • Added utilities for converting HuggingFace model configurations to Megatron-Core compatible arguments.
  • src/twinkle/model/megatron/utils/utils.py
    • Added utilities for splitting inputs for Context Parallelism.
  • src/twinkle/model/moe/__init__.py
    • Added a new module for Mixture-of-Experts (MoE) functionalities.
  • src/twinkle/model/moe/expert_parallel.py
    • Added apply_expert_parallel for sharding experts across devices in MoE models.
  • src/twinkle/model/transformers/__init__.py
    • Added Transformers model components.
  • src/twinkle/model/transformers/multi_lora_transformers.py
    • Added MultiLoraTransformersModel for multi-LoRA training with the HuggingFace Transformers backend.
  • src/twinkle/model/transformers/strategy/__init__.py
    • Added Transformers strategies module.
  • src/twinkle/model/transformers/strategy/accelerate.py
    • Added AccelerateStrategy for wrapping models using the Accelerate library.
  • src/twinkle/model/transformers/strategy/native_fsdp.py
    • Added NativeFSDPStrategy for FSDP2 with explicit process group control.
  • src/twinkle/plugin/__init__.py
    • Renamed to tests/kernel/__init__.py.
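
The changelog above describes a PackingDataset that binpacks dataset rows into batches of optimal length. The underlying idea can be sketched with a greedy first-fit packer over row lengths; this is an illustrative sketch only, and the actual twinkle implementation may use a different strategy and different names.

```python
def pack_rows(lengths, max_len):
    """Greedy first-fit binpacking: group row lengths into bins
    whose total length never exceeds max_len.

    Each bin corresponds to one packed training sample; packing
    short rows together reduces padding waste.
    """
    bins = []  # each bin is a list of row lengths
    for length in lengths:
        for b in bins:
            if sum(b) + length <= max_len:
                b.append(length)
                break
        else:
            # No existing bin has room; open a new one.
            bins.append([length])
    return bins

bins = pack_rows([512, 300, 700, 200, 120], max_len=1024)
# Every packed bin respects the length budget.
assert all(sum(b) <= 1024 for b in bins)
```

A streaming variant (as in IterablePackingDataset) would apply the same idea over a bounded buffer of pending rows rather than the full dataset.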
Activity
  • The pull request introduces a vast amount of new code, including new modules for data formats, dataloaders, datasets, Gym, Hub operations, kernel management, loss functions, metrics, and extensive model implementations for both Megatron-Core and HuggingFace Transformers.
  • Numerous cookbook examples have been added to demonstrate the usage of the new distributed training features, covering various parallelism strategies and LoRA configurations.
  • The project's build system has been updated with a new pyproject.toml file, and code quality tools have been revised in .pre-commit-config.yaml.
  • A ROADMAP.md file has been added, indicating a clear future direction for the project's development.
  • The core twinkle/__init__.py has been updated to reflect the new modular structure and lazy imports.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

  • Code Review (/gemini review): Performs a code review for the current pull request in its current state.
  • Pull Request Summary (/gemini summary): Provides a summary of the current pull request in its current state.
  • Comment (@gemini-code-assist): Responds in comments when explicitly tagged, both in pull request comments and review comments.
  • Help (/gemini help): Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request refactors the single_controller_sp.py example to use Ray for distributed execution, abstracting away the distributed setup into the twinkle library. The changes are generally good and improve the clarity and scalability of the example. I've found one area for improvement in the error handling within the new distributed initialization logic to prevent silent failures, which could be difficult to debug.

Comment on lines +501 to +502
except Exception:
pass
Contributor


Severity: high

Using a bare except Exception: pass can hide critical initialization errors. If dist.init_process_group fails, it will do so silently, leading to hard-to-diagnose failures later in the distributed workflow. It's much safer to log any exceptions that occur during this process to aid in debugging.

Suggested change:

- except Exception:
-     pass
+ except Exception as e:
+     import logging
+     logging.warning(f"Failed to initialize torch.distributed in Ray worker: {e}")
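
Expanding on this suggestion, one pattern that avoids silent failures while still tolerating initialization problems in a worker is to wrap the call in a small guard that logs the error and reports success to the caller. `guarded_init` below is a hypothetical helper sketched for illustration, not code from this PR; in practice `init_fn` would be `torch.distributed.init_process_group`.

```python
import logging

def guarded_init(init_fn, **kwargs):
    """Call a distributed init function, logging (not swallowing) failures.

    Returns True on success and False if initialization raised, so the
    caller can decide whether to proceed, retry, or abort rather than
    discovering the failure much later in the workflow.
    """
    try:
        init_fn(**kwargs)
        return True
    except Exception as e:
        logging.warning(
            "Failed to initialize torch.distributed in Ray worker: %s", e)
        return False

# Stand-in init function that fails, to show the guard in action:
def broken_init(**kwargs):
    raise RuntimeError("rendezvous timeout")

assert guarded_init(broken_init) is False
assert guarded_init(lambda **kwargs: None) is True
```

Returning a boolean (or re-raising after logging) keeps the failure visible at the call site, which is the substance of the review comment above.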
