Closed
I currently have:
<body>
{ <p @text:content /> }
</body>
Obviously, this matches all p tags in body at any level. However, I want something like:
<body>
{ <p|h[1-6] @text:content /> }
</body>
or more explicitly:
<body>
{ <p|h1|h2|h3|h4|h5|h6 @text:content /> }
</body>
That is, I also want to match h1 through h6, not just p, and no other tags. This doesn't seem to be supported by hext at this time. It is an important and urgent use case for me: extracting text from an HTML article for machine learning purposes. Is there any way to do this?
Currently, to use hext for this purpose, I first have to replace all h1-h6 tags with p tags using string replacement, which is hacky and error-prone.
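For comparison, here is a rough sketch of the extraction I am after, done outside hext with Python's standard-library `html.parser` instead of rewriting tags as strings. The names `TextBlockExtractor` and `extract_text_blocks` are made up for illustration, and the sketch assumes the markup uses explicit closing tags (it does not handle implicitly closed `<p>` elements).

```python
from html.parser import HTMLParser

# Tags whose text should be collected; everything else is ignored.
WANTED = {"p", "h1", "h2", "h3", "h4", "h5", "h6"}


class TextBlockExtractor(HTMLParser):
    """Collects the text of p and h1-h6 elements found inside <body>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.blocks = []       # one string per matched element
        self._in_body = False
        self._depth = 0        # nesting depth of wanted tags
        self._parts = []       # text fragments of the current element

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self._in_body = True
        elif self._in_body and tag in WANTED:
            if self._depth == 0:
                self._parts = []
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == "body":
            self._in_body = False
        elif tag in WANTED and self._depth > 0:
            self._depth -= 1
            if self._depth == 0:
                # Join fragments and collapse runs of whitespace.
                text = " ".join("".join(self._parts).split())
                if text:
                    self.blocks.append(text)

    def handle_data(self, data):
        # Only keep text that sits inside a wanted element.
        if self._depth > 0:
            self._parts.append(data)


def extract_text_blocks(html):
    """Return the text of every p/h1-h6 element in body, in document order."""
    parser = TextBlockExtractor()
    parser.feed(html)
    parser.close()
    return parser.blocks
```

Calling `extract_text_blocks(page_source)` returns a list of strings in document order. This works, but it gives up hext's declarative templates, which is why I would prefer tag alternation in hext itself.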