
Tahoe-X1 Model Addition #290

Merged
maxiallard merged 53 commits into main from tahoe1x
Dec 9, 2025
Conversation

@maxiallard (Contributor) commented Nov 13, 2025

This pull request introduces support and documentation for the Tahoe-X1 model in the codebase, along with improvements to installation instructions and new tests for Tahoe-related components. The changes are grouped into documentation enhancements, workflow improvements, and new test coverage for Tahoe functionality.
IMPORTANT: Some package versions have been updated!

Documentation and Model Support:

  • Added detailed installation instructions for Helical with GPU support, Flash Attention, Mamba-SSM, Evo2, and Tahoe-X1, including CUDA compatibility notes and links to model cards. Also updated the README to reference the Tahoe-X1 model and its license. [1] [2] [3] [4] [5]

Workflow and Dependency Management:

  • Improved dependency installation steps in .github/workflows/main.yml by switching to python -m pip commands and adding comments for alternative installation scenarios, as well as using --no-cache-dir for mamba-ssm installation to ensure clean environments. [1] [2]
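The workflow changes described above might look roughly like the following GitHub Actions steps. This is an illustrative sketch, not the actual main.yml; the step names, comments, and package order are assumptions based on the summary:

```yaml
# Hypothetical excerpt of .github/workflows/main.yml reflecting the changes
# described above; exact step names and ordering are assumptions.
- name: Install dependencies
  run: |
    python -m pip install --upgrade pip
    python -m pip install .
    # Alternative scenario: install with GPU extras on a CUDA-capable runner
    # python -m pip install ".[gpu]"
- name: Install mamba-ssm
  run: |
    # --no-cache-dir avoids reusing stale cached wheels between CI runs
    python -m pip install --no-cache-dir mamba-ssm
```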

Testing and Validation for Tahoe-X1:

  • Added a comprehensive test suite for the TahoeConfig class to validate default and custom configuration parameters, immutability, and HuggingFace repo ID settings.
  • Introduced extensive tests for the Tahoe-X1 data collator, covering binning logic, gene expression processing, tokenization, sequence-length handling, and data validation to ensure robust preprocessing and batching behavior.

This pull request adds full integration and documentation for the Tahoe-X1 model within the Helical library. It introduces new configuration and usage examples, updates installation instructions, and ensures proper licensing and attribution for the Tahoe-X1 code. The changes make it possible to use Tahoe-X1 directly from Helical without installing the original Tahoe-X1 package separately, and provide clear guidance on its features and dependencies.
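A minimal, self-contained sketch of the kind of expression-value binning those collator tests exercise. The quantile scheme, bin count, and function name here are illustrative assumptions, not Tahoe-X1's actual preprocessing:

```python
# Illustrative quantile binning for gene-expression values, in the spirit of
# the collator's binning logic described above. The scheme and defaults are
# assumptions for demonstration, not Tahoe-X1's implementation.
from bisect import bisect_left

def bin_expression(values, n_bins=51):
    """Map expression values to integer bins.

    Bin 0 is reserved for zero (unexpressed) values; nonzero values are
    spread across bins 1..n_bins-1 using quantile edges.
    """
    nonzero = sorted(v for v in values if v > 0)
    if not nonzero:
        return [0] * len(values)
    # Quantile edges are computed over the nonzero values only
    edges = [nonzero[int(q * (len(nonzero) - 1) / (n_bins - 1))]
             for q in range(1, n_bins - 1)]
    return [0 if v <= 0 else 1 + bisect_left(edges, v) for v in values]
```

A collator test in this style would then assert that zeros stay in bin 0 and that binning preserves the ordering of nonzero expression values.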

Tahoe-X1 Model Integration:

  • Added the helical/models/tahoe directory, including Tahoe, TahoeConfig, and the tahoe_x1 submodule, allowing users to run Tahoe-X1 models natively in Helical. The integration is self-contained and follows Helical's API patterns. [1] [2]
  • Created a comprehensive README.md for the Tahoe-X1 integration, detailing usage, features, attention implementations, dependencies, and model details.
  • Provided a sample configuration file (examples/run_models/configs/tahoe_config.yaml) and a runnable example script (examples/run_models/run_tahoe.py) demonstrating how to use the Tahoe model and extract embeddings and attention weights. [1] [2]

Documentation and Installation Updates:

  • Updated the main README.md to include instructions for installing Helical with GPU and Flash Attention support, and added sections for installing Tahoe-X1 and other supported models. Also added references to Tahoe-X1 in the list of supported models and licenses. [1] [2] [3] [4] [5]

Licensing and Attribution:

  • Added the Apache 2.0 license for Tahoe-X1 in helical/models/tahoe/LICENSE and clarified copyright and adaptation details in the Tahoe-X1 README. [1] [2]

maxiallard marked this pull request as draft November 13, 2025 15:52
maxiallard requested a review from raschedh November 14, 2025 15:21
@raschedh (Contributor) commented:

Do we need the various loss functions and dataset classes if fine-tuning isn't supported? Also, no .ipynb?

maxiallard requested a review from bputzeys November 24, 2025 08:17
@raschedh (Contributor) commented Dec 2, 2025

Retry the tests when CUDA is available; they pass locally.

@raschedh (Contributor) commented Dec 3, 2025

Ignore the entirety of the llm_foundry folder; it added about 4k lines, but we have no new dependencies now.

@bputzeys (Collaborator) commented Dec 3, 2025

If you want, you can exclude llm_foundry and the other package files from our test coverage: https://github.com/helicalAI/helical/blob/release/.github/workflows/main.yml#L32
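The exclusion suggested here could look roughly like the following coverage.py configuration. The paths are assumptions pieced together from the folder names mentioned in this thread, not the repository's actual layout:

```ini
# Hypothetical .coveragerc excerpt; actual paths in helical may differ.
[run]
omit =
    helical/models/tahoe/tahoe_x1/llm_foundry/*
```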

maxiallard merged commit 66ab960 into main Dec 9, 2025
8 of 9 checks passed
maxiallard deleted the tahoe1x branch December 9, 2025 07:48