feat(fast_seqfunc)✨: Add support for custom alphabets and integer sequences #5
Merged
ericmjl (Owner) commented on Mar 26, 2025
- Introduced the Alphabet class for handling custom sequence alphabets.
- Enhanced OneHotEmbedder to support custom alphabets.
- Added synthetic data generation for integer sequences.
- Implemented tests for new features and functionalities.
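The idea behind a custom alphabet feeding a one-hot embedder can be sketched as follows. This is a minimal illustration, not the PR's actual `Alphabet` or `OneHotEmbedder` API: the names `SimpleAlphabet`, `index_of`, and `one_hot` are hypothetical, and the sketch assumes every token in a sequence belongs to the alphabet.

```python
class SimpleAlphabet:
    """Hypothetical stand-in for a custom alphabet (not the PR's Alphabet class)."""

    def __init__(self, tokens):
        # Tokens may be characters ("ACGT") or integers (range(5)).
        self.tokens = list(tokens)
        self._index = {tok: i for i, tok in enumerate(self.tokens)}

    def __len__(self):
        return len(self.tokens)

    def index_of(self, token):
        # Raises KeyError for tokens outside the alphabet.
        return self._index[token]


def one_hot(sequence, alphabet):
    """Return one row per position, each row of length len(alphabet)."""
    rows = []
    for token in sequence:
        row = [0] * len(alphabet)
        row[alphabet.index_of(token)] = 1
        rows.append(row)
    return rows


dna = SimpleAlphabet("ACGT")
ints = SimpleAlphabet(range(5))

dna_matrix = one_hot("GATTACA", dna)      # 7 positions x 4 tokens
int_matrix = one_hot([0, 3, 3, 1], ints)  # 4 positions x 5 tokens
```

Because the alphabet only maps tokens to column indices, the same encoder handles character sequences and integer sequences without special cases.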
…eling
- Introduced a new example script demonstrating sequence-function modeling with mixed amino acids.
- The script includes data generation, model training, and prediction functionalities.
- Provides visualization and evaluation of model performance.

…dling and compatibility
- Introduced getter and setter for the alphabet property to enhance encapsulation.
- Updated tests to align with the new alphabet handling logic.
- Refactored test cases to dynamically calculate expected dimensions.

…lass
- Updated tests to dynamically verify alphabet size and token mappings.
- Improved parameterized tests for encoding and decoding sequences.
- Added checks for sequence padding and truncation scenarios.
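The padding and truncation scenarios mentioned in these commits might look roughly like the sketch below. The function names `encode_fixed` and `decode`, the `max_length` parameter, and the choice of `-` as the gap character are assumptions made for illustration; the PR's own encoding API may differ.

```python
GAP = "-"  # assumed gap/padding character


def encode_fixed(sequence, tokens, max_length, gap=GAP):
    """Encode a sequence as indices, truncated or gap-padded to max_length."""
    vocab = list(tokens) + [gap]  # the gap takes the last index
    token_to_idx = {tok: i for i, tok in enumerate(vocab)}
    clipped = list(sequence)[:max_length]                    # truncation
    padded = clipped + [gap] * (max_length - len(clipped))   # padding
    return [token_to_idx[tok] for tok in padded]


def decode(indices, tokens, gap=GAP):
    """Map indices back to tokens and drop any gap positions."""
    vocab = list(tokens) + [gap]
    return "".join(vocab[i] for i in indices if vocab[i] != gap)


short = encode_fixed("ACG", "ACGT", max_length=5)     # padded with gaps
long_ = encode_fixed("ACGTAC", "ACGT", max_length=4)  # truncated
```

A round trip through `decode` recovers the original sequence only when nothing was truncated, which is the kind of property a parameterized test can check.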
- Adjusted expected embedding shapes in tests to account for the updated token type count.
- Modified assertions to validate the new token type structure, including gap values and characters.
- Ensured all test cases align with the revised alphabet size and token handling.

- Updated the GitHub Actions workflow to include a test matrix for fast and slow tests.
- Introduced a new pytest marker for slow tests in the configuration.
- Added slow test markers to relevant test cases in the test suite.
…els directly
- Updated the prediction function to directly use scikit-learn models if available.
- Simplified the test cases to focus on embedding and serialization without PyCaret dependencies.
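Calling a scikit-learn style model directly usually comes down to checking for a `predict` method. The sketch below shows that pattern under stated assumptions: `predict_values`, its `fallback` parameter, and `ConstantModel` are hypothetical names, and the real logic in `core.py` is not shown in this PR summary.

```python
def predict_values(model, features, fallback=None):
    """Use model.predict() when present; otherwise defer to an optional fallback."""
    predict = getattr(model, "predict", None)
    if callable(predict):
        # scikit-learn estimators expose predict(X); call it directly.
        return list(predict(features))
    if fallback is not None:
        # Hypothetical hook for any other prediction backend.
        return list(fallback(model, features))
    raise TypeError("model has no predict() method and no fallback was given")


class ConstantModel:
    """Tiny stand-in exposing the scikit-learn style predict() method."""

    def __init__(self, value):
        self.value = value

    def predict(self, features):
        return [self.value for _ in features]


direct = predict_values(ConstantModel(1.5), [[0, 1], [1, 0]])
via_fallback = predict_values(object(), [[0, 1]], fallback=lambda m, X: [0.0] * len(X))
```

Duck-typing on `predict` keeps the tests free of heavier dependencies, since any object with that method can stand in for a trained model.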
…ction logic for improved maintainability
- Unified test execution logic in CI workflows to reduce redundancy.
- Simplified prediction logic in core.py by consolidating conditional branches.

…flow
- Removed manual test output capturing and exit code handling.
- Utilized pytest's cache for counting test results.
- Streamlined the test summary creation process.

- Updated the test result parsing logic to extract counts from pytest output.
- Replaced direct cache file parsing with output analysis for better reliability.
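Extracting counts from pytest output can be sketched as a small parser over the final summary line (for example `3 passed, 1 failed in 0.12s`). The workflow itself likely does this in a shell step; the Python version below, including the name `parse_pytest_counts`, is an illustrative assumption and only looks at the last non-empty line of the output.

```python
import re

# Matches fragments such as "3 passed" or "1 failed" in pytest's summary line.
SUMMARY_PATTERN = re.compile(
    r"\b(\d+) (passed|failed|skipped|deselected|xfailed|xpassed|errors?)\b"
)


def parse_pytest_counts(output):
    """Return a dict of outcome -> count taken from the last non-empty line."""
    lines = [line for line in output.splitlines() if line.strip()]
    if not lines:
        return {}
    counts = {}
    for number, outcome in SUMMARY_PATTERN.findall(lines[-1]):
        # pytest prints "error" or "errors"; store both under one key.
        key = "error" if outcome.startswith("error") else outcome
        counts[key] = int(number)
    return counts


sample = "collected 6 items\n\n===== 3 passed, 1 failed, 2 skipped in 0.12s ====="
summary = parse_pytest_counts(sample)
```

Reading the summary line avoids depending on the layout of pytest's cache directory, which is the reliability concern the commit describes.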