
Tahoe-X1 Model Addition #290

Merged
maxiallard merged 53 commits into main from tahoe1x
Dec 9, 2025
Conversation

@maxiallard (Contributor) commented Nov 13, 2025

This pull request introduces support and documentation for the Tahoe-X1 model in the codebase, along with improvements to installation instructions and new tests for Tahoe-related components. The changes are grouped into documentation enhancements, workflow improvements, and new test coverage for Tahoe functionality.
IMPORTANT: Some package versions have been updated!

Documentation and Model Support:

  • Added detailed installation instructions for Helical with GPU support, Flash Attention, Mamba-SSM, Evo2, and Tahoe-X1, including CUDA compatibility notes and links to model cards. Also updated the README to reference the Tahoe-X1 model and its license. [1] [2] [3] [4] [5]

Workflow and Dependency Management:

  • Improved dependency installation steps in .github/workflows/main.yml by switching to python -m pip commands and adding comments for alternative installation scenarios, as well as using --no-cache-dir for mamba-ssm installation to ensure clean environments. [1] [2]
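The workflow changes described above might look roughly like the following GitHub Actions steps. This is an illustrative sketch, not the actual main.yml; the step names, comments, and package order are assumptions based on the summary:

```yaml
# Hypothetical excerpt of .github/workflows/main.yml reflecting the changes
# described above; exact step names and ordering are assumptions.
- name: Install dependencies
  run: |
    python -m pip install --upgrade pip
    python -m pip install .
    # Alternative scenario: install with GPU extras on a CUDA-capable runner
    # python -m pip install ".[gpu]"
- name: Install mamba-ssm
  run: |
    # --no-cache-dir avoids reusing stale cached wheels between CI runs
    python -m pip install --no-cache-dir mamba-ssm
```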

Testing and Validation for Tahoe-X1:

  • Added a comprehensive test suite for the TahoeConfig class to validate default and custom configuration parameters, immutability, and HuggingFace repo ID settings.
  • Introduced extensive tests for the Tahoe-X1 data collator, covering binning logic, gene expression processing, tokenization, sequence-length handling, and data validation to ensure robust preprocessing and batching behavior.

This pull request adds full integration and documentation for the Tahoe-X1 model within the Helical library. It introduces new configuration and usage examples, updates installation instructions, and ensures proper licensing and attribution for the Tahoe-X1 code. The changes make it possible to use Tahoe-X1 directly from Helical without installing the original Tahoe-X1 package separately, and provide clear guidance on its features and dependencies.
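A minimal, self-contained sketch of the kind of expression-value binning those collator tests exercise. The quantile scheme, bin count, and function name here are illustrative assumptions, not Tahoe-X1's actual preprocessing:

```python
# Illustrative quantile binning for gene-expression values, in the spirit of
# the collator's binning logic described above. The scheme and defaults are
# assumptions for demonstration, not Tahoe-X1's implementation.
from bisect import bisect_left

def bin_expression(values, n_bins=51):
    """Map expression values to integer bins.

    Bin 0 is reserved for zero (unexpressed) values; nonzero values are
    spread across bins 1..n_bins-1 using quantile edges.
    """
    nonzero = sorted(v for v in values if v > 0)
    if not nonzero:
        return [0] * len(values)
    # Quantile edges are computed over the nonzero values only
    edges = [nonzero[int(q * (len(nonzero) - 1) / (n_bins - 1))]
             for q in range(1, n_bins - 1)]
    return [0 if v <= 0 else 1 + bisect_left(edges, v) for v in values]
```

A collator test in this style would then assert that zeros stay in bin 0 and that binning preserves the ordering of nonzero expression values.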

Tahoe-X1 Model Integration:

  • Added the helical/models/tahoe directory, including Tahoe, TahoeConfig, and the tahoe_x1 submodule, allowing users to run Tahoe-X1 models natively in Helical. The integration is self-contained and follows Helical's API patterns. [1] [2]
  • Created a comprehensive README.md for the Tahoe-X1 integration, detailing usage, features, attention implementations, dependencies, and model details.
  • Provided a sample configuration file (examples/run_models/configs/tahoe_config.yaml) and a runnable example script (examples/run_models/run_tahoe.py) demonstrating how to use the Tahoe model and extract embeddings and attention weights. [1] [2]

Documentation and Installation Updates:

  • Updated the main README.md to include instructions for installing Helical with GPU and Flash Attention support, and added sections for installing Tahoe-X1 and other supported models. Also added references to Tahoe-X1 in the list of supported models and licenses. [1] [2] [3] [4] [5]

Licensing and Attribution:

  • Added the Apache 2.0 license for Tahoe-X1 in helical/models/tahoe/LICENSE and clarified copyright and adaptation details in the Tahoe-X1 README. [1] [2]

maxiallard marked this pull request as draft November 13, 2025 15:52
maxiallard requested a review from raschedh November 14, 2025 15:21
@raschedh (Contributor) commented:

Do we need the various loss functions and dataset classes if fine-tuning isn't supported? Also, no .ipynb?

maxiallard requested a review from bputzeys November 24, 2025 08:17
@raschedh (Contributor) commented Dec 2, 2025

Retry the tests when CUDA is available; they pass locally.

@raschedh (Contributor) commented Dec 3, 2025

Ignore the entirety of the llm_foundry folder; it added about 4k lines, but we have no new dependencies now.

@bputzeys (Collaborator) commented Dec 3, 2025

If you want, you can exclude llm_foundry and the other package files from our test coverage: https://github.com/helicalAI/helical/blob/release/.github/workflows/main.yml#L32
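The exclusion suggested here could look roughly like the following coverage.py configuration. The paths are assumptions pieced together from the folder names mentioned in this thread, not the repository's actual layout:

```ini
# Hypothetical .coveragerc excerpt; actual paths in helical may differ.
[run]
omit =
    helical/models/tahoe/tahoe_x1/llm_foundry/*
```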

maxiallard merged commit 66ab960 into main Dec 9, 2025
8 of 9 checks passed
maxiallard deleted the tahoe1x branch December 9, 2025 07:48