Fix dataset concatenation #1193

Merged
merged 6 commits into from
Jun 20, 2024

Conversation

tastelikefeet
Collaborator

PR type

  • Bug Fix
  • New Feature
  • Document Updates
  • More Models or Datasets Support

PR information

Dataset concatenation may raise the following error:

_check_if_features_can_be_aligned
    raise ValueError(
ValueError: The features can't be aligned because the key history of features {'system': Value(dtype='null', id=None), 'history': Sequence(feature=Sequence(feature=Value(dtype='string', id=None), length=-1, id=None), length=-1, id=None), 'query': Value(dtype='string', id=None), 'response': Value(dtype='string', id=None)} has unexpected type - Sequence(feature=Sequence(feature=Value(dtype='string', id=None), length=-1, id=None), length=-1, id=None) (expected either Sequence(feature=Value(dtype='null', id=None), length=-1, id=None) or Value("null").

This happens because one dataset contains only empty or None values in a column (here, history), while another contains normal history values, so arrow_dataset treats the two columns as different types.

How to solve:

Reduce the columns after each dataset is instantiated and before the concatenation.
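A minimal sketch of the idea, not the code in this PR: it assumes that "reducing" means dropping any column whose values are all None or empty, and it works on plain column dicts so it runs without extra dependencies. The helper names (`is_empty`, `empty_columns`, `reduce_columns`) and the sample rows are hypothetical.

```python
from typing import Any, Dict, List


def is_empty(value: Any) -> bool:
    """Treat None and zero-length strings/containers as 'no data'."""
    return value is None or (
        isinstance(value, (str, list, tuple, dict)) and len(value) == 0
    )


def empty_columns(columns: Dict[str, List[Any]]) -> List[str]:
    """Return the names of columns in which every row is None or empty."""
    return [
        name for name, values in columns.items()
        if all(is_empty(v) for v in values)
    ]


def reduce_columns(columns: Dict[str, List[Any]]) -> Dict[str, List[Any]]:
    """Drop the all-empty columns so they cannot get a conflicting type."""
    drop = set(empty_columns(columns))
    return {name: values for name, values in columns.items() if name not in drop}


# Hypothetical sample data: the first dataset never has a real history value.
no_history = {
    "system": [None, None],
    "history": [[], None],
    "query": ["hi", "hello"],
    "response": ["hey", "hi there"],
}
with_history = {
    "system": [None],
    "history": [[["q1", "r1"]]],
    "query": ["q2"],
    "response": ["r2"],
}

# Reduce each dataset on its own, before any concatenation takes place.
reduced = [reduce_columns(no_history), reduce_columns(with_history)]
```

With Hugging Face `datasets`, the names returned by a check like `empty_columns` could be passed to `Dataset.remove_columns` on each dataset before calling `concatenate_datasets`. How a column that is then missing from one side gets filled in is worth verifying against the installed version.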


@tastelikefeet tastelikefeet merged commit 101109b into modelscope:main Jun 20, 2024
1 of 2 checks passed
tastelikefeet added a commit to tastelikefeet/swift that referenced this pull request Jun 21, 2024
…set-0620

* commit '295681590db2b41faab3b847d11a889088df1851':
  fix glm4v images (modelscope#1194)
  fix glm4v dataloader (modelscope#1183)
  Fix dataset concatenation (modelscope#1193)
tastelikefeet added a commit to tastelikefeet/swift that referenced this pull request Jun 21, 2024
* commit '295681590db2b41faab3b847d11a889088df1851':
  fix glm4v images (modelscope#1194)
  fix glm4v dataloader (modelscope#1183)
  Fix dataset concatenation (modelscope#1193)
hjh0119 pushed a commit to hjh0119/swift that referenced this pull request Jul 22, 2024
2 participants