Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding 2 new decoders: #1196

Merged
merged 5 commits into from
Mar 23, 2023
Merged

Adding 2 new decoders: #1196

merged 5 commits into from
Mar 23, 2023

Conversation

Narsil
Copy link
Collaborator

@Narsil Narsil commented Mar 23, 2023

  • Fuse will simply concatenate all tokens into 1 string
  • Strip will remove n char from left or right

Sequence(Replace("_", " "), Fuse(), Strip(1, 0)) should be what we want
for the Metaspace thing.

  • Note: Added a new dependency from better parsing of decoders.
    This is due to untagged enums which can match anything the MustBe
    ensure there's no issue between Fuse and ByteFallback.
    Since both are new the chances for backward incompatibility is low.

@HuggingFaceDocBuilderDev
Copy link

HuggingFaceDocBuilderDev commented Mar 23, 2023

The documentation is not available anymore as the PR was closed or merged.

Copy link
Contributor

@sgugger sgugger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding this!

assert Strip(left=0, right=0) is not None
assert isinstance(Strip(left=0, right=0), Decoder)
assert isinstance(Strip(left=0, right=0), Strip)
# assert isinstance(pickle.loads(pickle.dumps(Strip(left=0, right=0))), Strip)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, to remove or fix?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hoped it would go unoticed :)

Pickling is a nice touch, but cumbersome.

- Fuse will simply concatenate all tokens into 1 string
- Strip will remove n char from left or right

Sequence(Replace("_", " "), Fuse(), Strip(1, 0)) should be what we want
for the `Metaspace` thing.

- Note: Added a new dependency from better parsing of decoders.
This is due to untagged enums which can match anything the `MustBe`
ensure there's no issue between Fuse and ByteFallback.
Since both are new the chances for backward incompatibility is low.
@Narsil Narsil merged commit e4aea89 into main Mar 23, 2023
@Narsil Narsil deleted the lstrip_fuse_decoder branch March 23, 2023 23:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants