Skip to content

Merge to main#141

Merged
tastelikefeet merged 12 commits intomainfrom
release/0.2
Apr 6, 2026
Merged

Merge to main#141
tastelikefeet merged 12 commits intomainfrom
release/0.2

Conversation

@tastelikefeet
Copy link
Copy Markdown
Collaborator

PR type

  • Bug Fix
  • New Feature
  • Document Updates
  • More Models or Datasets Support

PR information

Write the detail information belongs to this PR.

Experiment results

Paste your experiment result here(if needed).

@tastelikefeet tastelikefeet merged commit 42c642b into main Apr 6, 2026
2 of 6 checks passed
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request prepares the project for version 0.2.0 by updating dependencies, Docker configurations, and installation scripts. Key changes include switching from tag-based to branch-based checkouts in the Dockerfile, adjusting GPU process counts in configurations, and introducing a TWINKLE_TRUST_REMOTE_CODE environment variable to control remote code execution during dataset loading. Review feedback identified critical logic errors in boolean environment variable parsing and missing os imports in src/twinkle/dataset/base.py, as well as a potential IndexError in the tensor conversion logic within src/twinkle/processor/base.py.

"""

def __init__(self, dataset_meta: DatasetMeta, **kwargs):
trust_remote_code = bool(os.environ.get('TWINKLE_TRUST_REMOTE_CODE', '1'))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

There are two issues with this line:

  1. Logic Error: In Python, bool() on any non-empty string returns True. Since os.environ.get returns a string, bool("0") will be True, meaning the environment variable cannot be used to set trust_remote_code to False as intended.
  2. NameError: The os module is not imported in this file (only os.path is), so accessing os.environ will raise a NameError at runtime.
Suggested change
trust_remote_code = bool(os.environ.get('TWINKLE_TRUST_REMOTE_CODE', '1'))
trust_remote_code = os.environ.get('TWINKLE_TRUST_REMOTE_CODE', '1') == '1'

Args:
dataset_meta: The dataset_meta information of the loaded dataset.
"""
trust_remote_code = bool(os.environ.get('TWINKLE_TRUST_REMOTE_CODE', '1'))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Same issues as in the constructor: the bool() conversion on a string does not work for boolean flags, and the os module is missing from the imports.

Suggested change
trust_remote_code = bool(os.environ.get('TWINKLE_TRUST_REMOTE_CODE', '1'))
trust_remote_code = os.environ.get('TWINKLE_TRUST_REMOTE_CODE', '1') == '1'

Comment on lines +100 to +101
elif (isinstance(value, list) and isinstance(value[0],
(int, float, np.number))) or key == 'position_ids':
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Accessing value[0] without verifying that the list is non-empty will raise an IndexError if value is []. It is safer to check the list's truthiness (which is False for empty lists) before indexing.

Suggested change
elif (isinstance(value, list) and isinstance(value[0],
(int, float, np.number))) or key == 'position_ids':
elif (isinstance(value, list) and value and isinstance(value[0],
(int, float, np.number))) or key == 'position_ids':

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant