
Convert custom user_data to token extension format for Japanese tokenizer #5652

Merged
honnibal merged 2 commits into explosion:master from adrianeboyd:feature/ja-tokenizer-features
Jun 29, 2020

Conversation

@adrianeboyd
Contributor

Description

Convert the `user_data` values set by the Japanese tokenizer so that they can be loaded as custom token extensions for `inflection`, `reading_form`, `sub_tokens`, and `lemma`.
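
For illustration only, here is a minimal sketch of what such a conversion could look like. It assumes that token extension values are stored in `user_data` under keys of the form `("._.", name, start_char, None)`; that key layout, the helper name `to_extension_user_data`, and the sample values are assumptions made for this example and are not taken from the PR diff.

```python
from typing import Any, Dict, List, Tuple

# Assumed key layout for a token extension value: ("._.", name, start_char, None)
ExtensionKey = Tuple[str, str, int, None]


def to_extension_user_data(
    name: str, values: List[Any], token_starts: List[int]
) -> Dict[ExtensionKey, Any]:
    """Re-key a per-token list of values by extension name and character offset.

    Hypothetical helper: `values[i]` belongs to the token that starts at
    character offset `token_starts[i]`.
    """
    if len(values) != len(token_starts):
        raise ValueError("expected exactly one value per token")
    return {
        ("._.", name, start, None): value
        for value, start in zip(values, token_starts)
    }


# Hypothetical sample: three tokens starting at character offsets 0, 2 and 3.
reading_forms = ["トウキョウ", "ニ", "イク"]
user_data = to_extension_user_data("reading_form", reading_forms, [0, 2, 3])
```

With a layout like this, a value is looked up by the extension name and the token's starting offset instead of by the token's position in a list.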

Types of change

Checklist

  • I have submitted the spaCy Contributor Agreement.
  • I ran the tests, and all new and existing tests passed.
  • My changes don't require a change to the documentation, or if they do, I've added all required information.

@adrianeboyd added the enhancement and lang / ja labels Jun 26, 2020
@adrianeboyd changed the title from "Convert custom user_data to token extension format" to "Convert custom user_data to token extension format for Japanese tokenizer" Jun 26, 2020
@honnibal honnibal merged commit 1dd3819 into explosion:master Jun 29, 2020
honnibal added a commit that referenced this pull request Jun 29, 2020
honnibal added a commit that referenced this pull request Jun 29, 2020
@Techno-coder

Was there a reason why this PR was reverted? I can't seem to find any discussions on it 😅. I would have thought accessing inflections by Token would be more natural.

@adrianeboyd
Contributor Author

Yes, I agree that it seems more natural, but we have a policy that the core spaCy library never sets custom extensions, so that there's no chance of it conflicting with users' custom extensions, which could have the same names.
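
The kind of conflict described here can be sketched with a toy registry that refuses to re-register an existing name unless forced. The `ExtensionRegistry` class and its methods are hypothetical and written only for this illustration; they are not spaCy's implementation.

```python
from typing import Any, Dict


class ExtensionRegistry:
    """Toy stand-in for a global registry of custom attribute names."""

    def __init__(self) -> None:
        self._extensions: Dict[str, Any] = {}

    def has_extension(self, name: str) -> bool:
        return name in self._extensions

    def set_extension(self, name: str, default: Any = None, force: bool = False) -> None:
        # Refuse to overwrite an existing name unless the caller insists.
        if name in self._extensions and not force:
            raise ValueError(f"extension {name!r} is already registered")
        self._extensions[name] = default


registry = ExtensionRegistry()

# A user registers their own "lemma" extension first.
registry.set_extension("lemma", default="user-defined")

# If library code then tried to register the same name, the two would clash.
conflict = False
try:
    registry.set_extension("lemma", default="library-defined")
except ValueError:
    conflict = True
```

Because a name can only be claimed once in such a registry, a library that registered names like `lemma` on its own could break user code that already relies on them.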
