Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add SplitDelimiterBehavior to Punctuation constructor #657

Merged
merged 1 commit into from
Aug 13, 2021

Conversation

vladdy
Copy link
Contributor

@vladdy vladdy commented Mar 17, 2021

This change allows to customize SplitDelimiterBehavior for the Punctuation pre-tokenizer. For backwards compatibility, SplitDelimiterBehavior::Isolated is set the default. Python and NodeJS bindings were updated accordingly.

Resolves: #642

@n1t0
Copy link
Member

n1t0 commented Mar 18, 2021

Thank you for taking care of this @vladdy! Let me know if you need any help!

@n1t0
Copy link
Member

n1t0 commented May 20, 2021

Hey @vladdy! Do you want me to take over from here? If you'd like to continue but need some help with the node bindings or anything else, I'd be happy to provide you with some guidance. Let me know!

@vladdy
Copy link
Contributor Author

vladdy commented May 21, 2021

Sorry, got a bit busy at work! I'll try to get more progress within a few next days and let you know if I have any questions.

@vladdy vladdy marked this pull request as ready for review June 15, 2021 01:21
@vladdy
Copy link
Contributor Author

vladdy commented Jun 15, 2021

@n1t0 Thank you for your patience! This is ready for the review. Let me know if you want me to change/add anything.

If you are fine with the proposed changes, please approve running the workflows so that I could check their results.

Copy link
Member

@n1t0 n1t0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vladdy Thank you for taking care of this! From a quick glance, it looks great! I approved the workflows, and there just seems to be a typo.

Can you also modify the CHANGELOG.md file under bindings/python with this modification?

bindings/python/tests/bindings/test_pre_tokenizers.py Outdated Show resolved Hide resolved
@vladdy vladdy force-pushed the add-split-delimeter branch 7 times, most recently from f28177e to 6de3d23 Compare June 17, 2021 11:34
@vladdy
Copy link
Contributor Author

vladdy commented Jun 17, 2021

@n1t0 I've added a CHANGELOG entry under new 0.10.4 version. Please let me know if you had any other plans about it.

@vladdy vladdy requested a review from n1t0 June 17, 2021 11:35
bindings/python/CHANGELOG.md Outdated Show resolved Hide resolved
@vladdy
Copy link
Contributor Author

vladdy commented Jun 25, 2021

@n1t0, just checking if you still need me to change anything else here.

Copy link
Member

@n1t0 n1t0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @vladdy! Everything looks good

@n1t0 n1t0 merged commit e2bf8da into huggingface:master Aug 13, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add SplitDelimiterBehavior to Punctuation constructor
2 participants