
nonTaggableTags option #32

Closed
dsmiley opened this issue Jul 31, 2014 · 0 comments

Comments

dsmiley (Member) commented Jul 31, 2014

Sometimes, when submitting HTML markup to be tagged, you don't want tagger tags to overlap certain elements (confusingly, HTML elements are also called "tags"). The "script" and "style" elements are already stripped out by Lucene's HTMLStripCharFilter. But you might also want to avoid tagging text inside "a" (anchor) elements, because your application is going to insert its own links and those must not interfere with (overlap) existing ones.

I'll add a nonTaggableTags option: a comma-delimited list of HTML element (tag) names such that, if one of those elements is found to overlap a candidate tagger tag, that tagger tag is omitted. For now, this option will only work when htmlOffsetAdjust is true, but it could easily be extended later to work with xmlOffsetAdjust too.
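To make the proposed filtering rule concrete, here is a minimal standalone sketch. It is not the tagger's actual implementation: the `NonTaggableFilter` and `Span` classes and the `parseNonTaggableTags`, `overlaps`, and `filterTags` methods are hypothetical, and it assumes both tagger tags and HTML element spans are available as half-open character ranges `[start, end)` in the original markup (which is what htmlOffsetAdjust is meant to provide).

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class NonTaggableFilter {

    /** Hypothetical helper: a named half-open character range [start, end) in the original markup. */
    static final class Span {
        final String name;
        final int start;
        final int end;

        Span(String name, int start, int end) {
            this.name = name;
            this.start = start;
            this.end = end;
        }
    }

    /** Parses a comma-delimited option value such as "a, span" into lower-case element names. */
    static Set<String> parseNonTaggableTags(String option) {
        Set<String> names = new HashSet<>();
        if (option == null) {
            return names;
        }
        for (String part : option.split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                names.add(trimmed.toLowerCase(Locale.ROOT));
            }
        }
        return names;
    }

    /** True if [aStart, aEnd) and [bStart, bEnd) share at least one character. */
    static boolean overlaps(int aStart, int aEnd, int bStart, int bEnd) {
        return aStart < bEnd && bStart < aEnd;
    }

    /** Drops every tagger tag that overlaps an element whose name is in nonTaggable. */
    static List<Span> filterTags(List<Span> tags, List<Span> elements, Set<String> nonTaggable) {
        List<Span> kept = new ArrayList<>();
        for (Span tag : tags) {
            boolean blocked = false;
            for (Span el : elements) {
                if (nonTaggable.contains(el.name.toLowerCase(Locale.ROOT))
                        && overlaps(tag.start, tag.end, el.start, el.end)) {
                    blocked = true;
                    break;
                }
            }
            if (!blocked) {
                kept.add(tag);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        String html = "see <a href=\"x\">New York</a> and Boston";
        // The "a" element spans from its opening "<" to the end of "</a>".
        int aStart = html.indexOf("<a");
        int aEnd = html.indexOf("</a>") + "</a>".length();
        List<Span> elements = List.of(new Span("a", aStart, aEnd));

        // Candidate tagger tags, with offsets into the original HTML.
        int ny = html.indexOf("New York");
        int boston = html.indexOf("Boston");
        List<Span> tags = List.of(
                new Span("New York", ny, ny + "New York".length()),
                new Span("Boston", boston, boston + "Boston".length()));

        Set<String> nonTaggable = parseNonTaggableTags("a");
        // Prints "Boston [33, 39)": the "New York" tag lies inside the <a> element and is omitted.
        for (Span tag : filterTags(tags, elements, nonTaggable)) {
            System.out.println(tag.name + " [" + tag.start + ", " + tag.end + ")");
        }
    }
}
```

Using half-open ranges means a tag that merely ends where a non-taggable element begins (or starts where one ends) is not treated as overlapping; only ranges that share at least one character cause the tag to be dropped.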

dsmiley added a commit that referenced this issue Jul 31, 2014
Closes #32.
RE the bug: it happens with input like `taggText<open> blah </open>`, where taggText is immediately adjacent to an opening element that follows it.
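The commit message doesn't say how the adjacency bug was fixed. The sketch below only illustrates one plausible way such a boundary bug can arise: comparing offsets inclusively rather than as half-open ranges. The `AdjacencyCheck` class and both methods are hypothetical, and offsets are assumed to be exclusive at the end, as is usual for Lucene token offsets.

```java
public class AdjacencyCheck {

    /** Inclusive comparison: treats ranges that merely touch as overlapping. */
    static boolean overlapsInclusive(int aStart, int aEnd, int bStart, int bEnd) {
        return aStart <= bEnd && bStart <= aEnd;
    }

    /** Half-open comparison: [aStart, aEnd) and [bStart, bEnd) overlap only if they share a character. */
    static boolean overlapsHalfOpen(int aStart, int aEnd, int bStart, int bEnd) {
        return aStart < bEnd && bStart < aEnd;
    }

    public static void main(String[] args) {
        String html = "taggText<open> blah </open>";
        int tagStart = 0;
        int tagEnd = "taggText".length();                         // 8, exclusive
        int elStart = html.indexOf("<open>");                     // 8
        int elEnd = html.indexOf("</open>") + "</open>".length(); // 27

        // The tag ends exactly where the element begins, so the two ranges only touch.
        System.out.println(overlapsInclusive(tagStart, tagEnd, elStart, elEnd)); // true: tag wrongly dropped
        System.out.println(overlapsHalfOpen(tagStart, tagEnd, elStart, elEnd));  // false: tag kept
    }
}
```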
dsmiley closed this as completed Jul 31, 2014