Adds start and end character positions to tag structure - available to tag transformers #151

SoftwareEngineerChris

This change introduces tag character positions relative to the original string as part of the Tag structure. These can then be used by TagTransformer transformation functions.

The benefit may not be obvious at first, but I have found the positions useful for extracting content that isn't suitable for attributed string transformation from surrounding content that is.

For example, if the HTML being transformed is mostly transformable content but contains an iframe tag or a Twitter blockquote somewhere within it, the positions of these tags (opening and/or closing) make it possible to split the string, extract those elements, and treat them separately.
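To make the splitting idea concrete, here is a minimal sketch. It does not use the library's Tag type or this PR's actual property names: `PositionedTag`, `Segment`, and `split(_:around:)` are hypothetical names invented for illustration, and the tag range is located with a plain string search rather than by the parser.

```swift
import Foundation

// Hypothetical stand-in for a tag that carries its position in the original
// string. This is NOT the library's Tag structure.
struct PositionedTag {
    let name: String
    // From the start of the opening tag to the end of the closing tag.
    let range: Range<String.Index>
}

// Hypothetical result type: content that can be transformed into an
// attributed string versus content that must be handled separately.
enum Segment: Equatable {
    case transformable(String)
    case embedded(String)
}

// Cuts `html` into alternating segments around the given tag ranges.
func split(_ html: String, around tags: [PositionedTag]) -> [Segment] {
    var segments: [Segment] = []
    var cursor = html.startIndex
    for tag in tags.sorted(by: { $0.range.lowerBound < $1.range.lowerBound }) {
        // Skip tags that overlap a span that was already extracted.
        guard tag.range.lowerBound >= cursor else { continue }
        if cursor < tag.range.lowerBound {
            segments.append(.transformable(String(html[cursor..<tag.range.lowerBound])))
        }
        segments.append(.embedded(String(html[tag.range])))
        cursor = tag.range.upperBound
    }
    if cursor < html.endIndex {
        segments.append(.transformable(String(html[cursor...])))
    }
    return segments
}

let html = "<p>Intro</p><iframe src=\"x\"></iframe><p>Outro</p>"
// Assumption: in real use these positions would come from the parsed tags.
let open = html.range(of: "<iframe")!
let close = html.range(of: "</iframe>")!
let iframe = PositionedTag(name: "iframe", range: open.lowerBound..<close.upperBound)
let parts = split(html, around: [iframe])
```

Each `.transformable` segment could then be passed to the attributed string transformation, while each `.embedded` segment is rendered by whatever view suits it.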

The unit test uses emojis with variations so that the input contains multi-scalar grapheme clusters, which verifies that the String.Index values handle these correctly (via UTF-16).
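As an illustration of why grapheme clusters matter here, the sketch below converts a UTF-16 offset into a `String.Index` using the standard library initializer `String.Index(utf16Offset:in:)`. The sample string and offset are assumptions chosen for the example, not values taken from the PR's test.

```swift
// Thumbs-up (U+1F44D) plus a skin-tone modifier (U+1F3FD): one Character,
// but two surrogate pairs, so four UTF-16 code units.
let text = "\u{1F44D}\u{1F3FD}<b>hi</b>"

// Assumed position of the opening "<b>" tag, measured in UTF-16 code units.
let tagOffset = 4

// Convert the UTF-16 offset into an index that is valid for `text`.
let tagStart = String.Index(utf16Offset: tagOffset, in: text)

// The index lands on the "<" that opens the tag.
let characterAtTag = text[tagStart]

// Measured in Characters, the tag starts after just one grapheme cluster.
let characterDistance = text.distance(from: text.startIndex, to: tagStart)
```

Counting in Characters and counting in UTF-16 code units give different numbers for the same position (1 versus 4 here), which is why a test with plain ASCII input would not catch an offset mix-up.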

@psharanda
Owner

@SoftwareEngineerChris FYI, V5 was introduced recently and includes a new TagTuning API.

@psharanda psharanda closed this Jan 14, 2024