Include custom HTML attributes #170

Open
wants to merge 35 commits into
base: master
35 commits
a21978f
html attribute block proposal
legenderrys Dec 9, 2022
f421f62
updated readme to describe this fork
legenderrys Dec 9, 2022
1b70e87
added new block token for HTMLAttributes, fix logic to work within sy…
legenderrys Dec 9, 2022
2ef77d1
added test, refined htmlAttribute class to serialize props, added aut…
legenderrys Dec 10, 2022
11d94f6
updated readme
legenderrys Dec 10, 2022
744f471
updated readme
legenderrys Dec 10, 2022
9c36c49
updated readme
legenderrys Dec 10, 2022
3e66005
✅w.i.p - experimenting with nested attribute id
legenderrys Dec 10, 2022
f56fc9f
added tabindex and unique ids for nested elements
legenderrys Dec 10, 2022
5b5f92e
wip
legenderrys Dec 10, 2022
9685f72
🎉 Implemented HtmlAttribute renderer - working with nested list elem…
legenderrys Dec 11, 2022
16400ec
clean up test, updated readme
legenderrys Dec 11, 2022
656ba0a
added configure method + doc strings
legenderrys Dec 11, 2022
0227f85
reverted to upstream/master readme
legenderrys Dec 18, 2022
6acbb2b
fixed whitespace code formatting + reverted to upstream/master
legenderrys Dec 18, 2022
ba1fbcb
added features doc + readme link
legenderrys Dec 18, 2022
41ae409
fixed doc string for HTMLAttrs + added Double spaces between class
legenderrys Dec 18, 2022
e8256f6
updated exception handling to propagate
legenderrys Dec 18, 2022
5cf4764
fixed return statement format
legenderrys Dec 18, 2022
4e5f947
changed expression for list types
legenderrys Dec 18, 2022
0236b95
fixed format string + updated default id value
legenderrys Dec 18, 2022
54960fc
updated condition check + propagate exceptions
legenderrys Dec 18, 2022
2fc495b
updated html attrs test output
legenderrys Dec 18, 2022
e68273d
change inheirtance + unused code cleanup
legenderrys Dec 18, 2022
af4fb87
remove old html attrs property
legenderrys Dec 18, 2022
e090547
updated test output format
legenderrys Dec 24, 2022
48c8134
added link to 508 definition
legenderrys Dec 30, 2022
cd6ff44
code clean up to exclude redundant extension values
legenderrys Dec 30, 2022
43bf8ed
fixed mapping_delimeter name spelling
legenderrys Dec 30, 2022
c6a9ed2
updated arguments passed into renderer
legenderrys Dec 30, 2022
a7828be
omit non-block elements that doesnt require 508 or attributes.
legenderrys Dec 30, 2022
389474c
remove render method for html_attr
legenderrys Jan 28, 2023
4cdc0c4
remove try / catch for performance
legenderrys Jan 28, 2023
89be175
fixed failing test
legenderrys Mar 25, 2024
82944df
Merge remote-tracking branch 'upstream/master'
legenderrys Oct 9, 2024
1 change: 1 addition & 0 deletions README.md
@@ -252,6 +252,7 @@ Copyright & License
[python-markdown]: https://github.com/waylan/Python-Markdown
[python-markdown2]: https://github.com/trentm/python-markdown2
[commonmark-py]: https://github.com/rtfd/CommonMark-py
[features]: features.md
[performance]: performance.md
[oilshell]: https://www.oilshell.org/blog/2018/02/14.html
[commonmark]: https://spec.commonmark.org/
86 changes: 86 additions & 0 deletions features.md
@@ -0,0 +1,86 @@
HTMLAttributesRenderer
Contributor

Nice docs!

I'd suggest changing the name of the file to something along the lines of "Extension to attach custom html attributes" to make it clearer what it's about. "Features" is very generic.

You could also describe the feature in a paragraph in the README.

Author


Perhaps we rename the file from features.md to extensions.md, and within extensions.md either link to the docs for each new extension or keep all extension details in that single file?

Collaborator


Yeah, we really should think this through; I think there are no docs like this in mistletoe yet... Maybe keeping most of the description right at the class itself (in pydoc) could also be the way (although I know I suggested creating a dedicated md myself at first)?

----------------------

This feature lets you write Markdown that renders with [508 compliant](https://www.section508.gov/manage/laws-and-policies/) HTML attributes.


**HTMLAttributesRenderer Block syntax**

Text inside `${...}` tells the HTMLAttributesRenderer which attributes to add and where to add them.

`${ ..................... }`

The optional ` > ` separator (whitespace included) splits the content into parent attributes and child attributes. Attributes on the left apply to the root parent element; attributes on the right apply to its children.

`${ id:some-parent > class:our-code our-love }`

Multiple attribute pairs are separated by a comma followed by a space (`, `).

`${ class:our-code our-love, aria-label:spread-love }`

Multiple values for one attribute are separated by a single space (` `).

`${ class:our-code our-love }`
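
The splitting rules for an attribute block can be sketched in a few lines of standalone Python. This is an illustration only, not the PR's implementation: `parse_attr_block` is a hypothetical helper name, and the delimiters are hard-coded to the defaults used in this document.

```python
def parse_attr_block(line: str) -> tuple:
    """Split a `${...}` line into (parent_attrs, child_attrs) dicts."""
    body = line.strip()
    if not (body.startswith("${") and body.endswith("}")):
        raise ValueError("not an attribute block")
    body = body[2:-1]  # drop the leading "${" and the trailing "}"

    def parse_pairs(part: str) -> dict:
        attrs = {}
        # Pairs are separated by ", "; key and value by the first ":".
        for pair in part.split(", "):
            key, sep, value = pair.strip().partition(":")
            if sep and key and value:
                attrs[key] = value
        return attrs

    # Everything left of " > " belongs to the parent, the rest to children.
    parent_part, _, child_part = body.partition(" > ")
    return parse_pairs(parent_part), parse_pairs(child_part)


parent, child = parse_attr_block("${ id:some-parent > class:our-code our-love }")
print(parent)
print(child)
```

Because only the first `:` splits a pair, values such as an `onclick` handler may themselves contain colons without confusing this sketch.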

example:


**How to Use HTMLAttributesRenderer**

```python
import mistletoe
from mistletoe.html_attributes_renderer import HTMLAttributesRenderer
txt = """\
${class:foobar}
# Mistletoe is Awesome

${id:todos, tabindex:100 > class:list-item}
- Push Code
- Get Groceries
  - Veggies
  - Fruits
    - apples
    - oranges
- Hang up the mistletoe

${class:img-sm}
![foo](https://cdn.rawgit.com/miyuchina/mistletoe/master/resources/logo.svg "toof")

${ > class:btn-link, onclick:event.preventDefault();console.log(this,'button clicked');}
[some link](https://cdn.rawgit.com/miyuchina/mistletoe/master/resources/logo.svg "toof")\
"""

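# Optional: configure the HTMLAttributes token
# (in this PR, configure() is defined on the token class, not on the renderer)
from mistletoe.block_token import HTMLAttributes
HTMLAttributes.configure({...})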

# Render the markdown into html
rendered = mistletoe.markdown(txt, HTMLAttributesRenderer)
```

OUTPUT

```html
<h1 class="foobar" id="mistletoe-is-awesome" tabindex="1">Mistletoe is Awesome</h1>
<ul id="todos" tabindex="100">
<li class="list-item" tabindex="1">Push Code</li>
<li class="list-item" tabindex="1">Get Groceries
<ul id="todos-0" tabindex="1">
<li class="list-item" tabindex="1">Veggies</li>
<li class="list-item" tabindex="1">Fruits
<ul id="todos-0-1" tabindex="1">
<li class="list-item" tabindex="1">apples</li>
<li class="list-item" tabindex="1">oranges</li>
</ul>
</li>
</ul>
</li>
<li class="list-item" tabindex="1">Hang up the mistletoe</li>
</ul>
<p class="img-sm" tabindex="1">
<img src="https://cdn.rawgit.com/miyuchina/mistletoe/master/resources/logo.svg" alt="foo" title="toof" tabindex="1" />
</p>
<p tabindex="1">
<a href="https://cdn.rawgit.com/miyuchina/mistletoe/master/resources/logo.svg" title="toof" class="btn-link"
onclick="event.preventDefault();console.log(this,'button clicked');" tabindex="1">some link</a>
</p>
```
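
To connect the input with the rendered output: each token's attribute dict ends up as a string of ` key="value"` pairs spliced into the opening tag. The sketch below shows that step under stated assumptions. `serialize_attrs` is a hypothetical name, a default `tabindex` of 1 is added when none is given (matching the output shown), and the `html.escape` call is an addition for illustration that the PR's `serialize` method does not perform.

```python
import html


def serialize_attrs(attrs: dict, auto_tabindex: bool = True) -> str:
    """Turn a dict of attributes into a string ready to splice into a tag."""
    merged = dict(attrs)  # copy so the caller's dict is not mutated
    if auto_tabindex:
        merged.setdefault("tabindex", 1)
    # Each pair is rendered with a leading space so it can follow the tag name.
    return "".join(
        f' {key}="{html.escape(str(value), quote=True)}"'
        for key, value in merged.items()
    )


print("<ul{}>".format(serialize_attrs({"id": "todos", "tabindex": "100"})))
print("<li{}>".format(serialize_attrs({"class": "list-item"})))
```

Copying the dict before adding the default avoids the add-then-delete handling of `tabindex` that a shared, mutable props dict would otherwise need.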
2 changes: 1 addition & 1 deletion mistletoe/base_renderer.py
@@ -202,4 +202,4 @@ def render_thematic_break(self, token: block_token.ThematicBreak) -> str:
        return self.render_inner(token)

    def render_document(self, token: block_token.Document) -> str:
        return self.render_inner(token)
        return self.render_inner(token)
Collaborator

@pbodnar pbodnar Oct 27, 2024

Take care to keep the EOL at the EOF everywhere. This will also decrease the number of changed files, like this one.
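
A missing final newline can be caught before pushing by scanning the touched files. The snippet below is a hypothetical helper, not part of mistletoe; the function name and the temporary demo files are assumptions made for illustration.

```python
import tempfile
from pathlib import Path


def missing_final_newline(path) -> bool:
    """Return True if a non-empty file does not end with a newline byte."""
    data = Path(path).read_bytes()
    return bool(data) and not data.endswith(b"\n")


# Demo on two throwaway files: one with and one without a trailing newline.
with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp) / "good.py"
    bad = Path(tmp) / "bad.py"
    good.write_bytes(b"x = 1\n")
    bad.write_bytes(b"x = 1")
    results = {p.name: missing_final_newline(p) for p in (good, bad)}

print(results)
```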

105 changes: 105 additions & 0 deletions mistletoe/block_token.py
@@ -1020,6 +1020,111 @@ def read(lines):
        return [next(lines)]


class HTMLAttributes(BlockToken):
Collaborator


In the meantime, we switched to "strict/simple camel case convention" globally, so it needs to be like this now:

Suggested change
class HTMLAttributes(BlockToken):
class HtmlAttributes(BlockToken):

The same applies to the new HTMLAttributesRenderer.

    """
    Block-level HTMLAttributes token.

    Attributes:
        raw_attr_str (str): the raw HTML attributes.
        parent_props (dict): parsed from raw_attr_str
        child_props (dict): parsed from raw_attr_str
    """

    # Configurable properties
    start_str = "${"
    end_str = "}"
    parent_child_partition_str = " > "
    mapping_delimeter = ":"
    allow_auto_ids = ['Heading']
    enable_auto_ids = False
    enable_auto_tabindex = True
    tabindex = 1
    id_index = -1

    def __init__(self, line: str):
        pattr,_,cattr = line.partition(self.parent_child_partition_str)
        self.raw_attr_str: str = line.strip()
        self.parent_props: dict = self.set_props(pattr)
        self.child_props: dict = self.set_props(cattr)

    def set_props(self, attr_str: str):
        """Parses raw attribute string into dicts"""
        def get_props(prop):
            if self.mapping_delimeter in prop:
                key, _, value = prop.partition(self.mapping_delimeter)
                return key, value
            return None, None
        attr_map = {}
        for prop in attr_str.split(', '):
            k, v = get_props(prop.strip())
            if k and v: attr_map[k] = v
        return attr_map

    def apply_props(self, token, is_child: bool = None):
        """Applies props recursively to parent and child tokens"""

        has_nested_children = self.check_for_children(token)
        token_props = self.parent_props if not is_child else self.child_props
        auto_id = self.get_auto_id(token)
        token.html_props = self.serialize(token_props, auto_id)
        if not has_nested_children:
            return
        for chld in token.children:
            is_child_key = not isinstance(chld, List)
            if not is_child_key:
                self.id_index += 1
                token_props = self.parent_props
                child_key = "{}-{}".format(token_props.get("id","item"), str(self.id_index))
                token_props['id'] = child_key
            self.apply_props(chld, is_child_key)

    def serialize(self, props: dict, auto_id: str = '') -> str:
        """Serializes the props into html attribute strings"""
        if auto_id and not props.get('id') and self.enable_auto_ids:
            props['id'] = auto_id
        if HTMLAttributes.enable_auto_tabindex:
            props['tabindex'] = props.get('tabindex', 1)
        propstr = "".join([f' {k}="{v}"' for k, v in props.items()])
        if HTMLAttributes.enable_auto_tabindex: del props['tabindex']
        return propstr

    @classmethod
    def configure(cls, options: dict) -> str:
        """Override default class configuration fields"""
        only_fields = ("start_str", "end_str", "parent_child_partition_str", "mapping_delimeter", "allow_auto_ids", "enable_auto_ids", "enable_auto_tabindex")
        for k, v in options.items():
            if k not in only_fields: continue
            setattr(cls, k, v)

    @classmethod
    def get_auto_id(cls, token) -> str:
        """Automatically generate ids for Heading elements or any specified token type"""
        allow_auto_id = hasattr(token, 'content') and cls.enable_auto_ids and token.__class__.__name__ in cls.allow_auto_ids
        auto_id = token.content.lower().replace(' ','-') if allow_auto_id else ''
        return auto_id


    @classmethod
    def check_for_children(cls, token):
        return hasattr(token, "children") and token.__class__.__name__ != "RawText"

    @classmethod
    def clear(cls):
        cls.id_index = -1
        cls.tabindex = 1

    @classmethod
    def start(cls, line):
        return line.strip().startswith(cls.start_str) and line.strip().endswith(cls.end_str)

    @classmethod
    def read(cls, lines):
        line = lines.peek()
        l = line.strip().lstrip(cls.start_str).rstrip(cls.end_str)
        next(lines)
        return l


class HtmlBlock(BlockToken):
    """
    Block-level HTML token.
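
One detail in the `read` method of the new token class may deserve a second look: `str.lstrip` and `str.rstrip` treat their argument as a set of characters rather than as a literal prefix or suffix, so an attribute value that itself ends in `}` can lose characters. The sketch below contrasts that behaviour with `str.removeprefix` and `str.removesuffix` (available from Python 3.9). `strip_markers` is a hypothetical helper, not code from the PR.

```python
def strip_markers(line: str, start: str = "${", end: str = "}") -> str:
    """Remove exactly one start marker and one end marker from the line."""
    return line.strip().removeprefix(start).removesuffix(end)


line = "${onclick:function(){}}"

# Character-set stripping: every trailing "}" is removed, not just the marker.
char_set_result = line.strip().lstrip("${").rstrip("}")

# Exact prefix/suffix removal keeps the closing brace that belongs to the value.
exact_result = strip_markers(line)

print(char_set_result)
print(exact_result)
```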
183 changes: 183 additions & 0 deletions mistletoe/html_attributes_renderer.py
@@ -0,0 +1,183 @@
"""
HTML Attributes renderer for mistletoe.
"""

import html
from mistletoe import block_token
from mistletoe import span_token
from mistletoe.block_token import HTMLAttributes
from mistletoe.html_renderer import HTMLRenderer


class HTMLAttributesRenderer(HTMLRenderer):
    """
    HTML Attributes renderer class.

    See mistletoe.html_renderer module for more info.
    """
    def __init__(self, *extras):
        """
        Args:
            extras (list): allows subclasses to add even more custom tokens.
        """
        super().__init__(HTMLAttributes, *extras)
        self.RENDERER_START = False

    def render(self, token):
        """
        Grabs the class name from input token and finds its corresponding
        render function.

        Basically a janky way to do polymorphism.

        Arguments:
            token: whose __class__.__name__ is in self.render_map.
        """
        # reconcile our htmlattributes
        if not self.RENDERER_START:
            self.reconcile_attrs(token)
        return self.render_map[token.__class__.__name__](token)

    def reconcile_attrs(self, doc_token):
        """Traverse token children while assigning html attributes if available"""
        self.RENDERER_START = True
        recon_tokens = []
        htmlAttributesToken: block_token.HTMLAttributes = None
        for token_type in doc_token.children:
            if 'HTMLAttributes' == token_type.__class__.__name__:
                htmlAttributesToken = token_type
                continue
            if htmlAttributesToken:
                htmlAttributesToken.apply_props(token_type)
                htmlAttributesToken.clear()
                htmlAttributesToken = None
            recon_tokens.append(token_type)
        doc_token.children = recon_tokens

    def render_html_attributes(self, token: block_token) -> str:
        return '' if not hasattr(token,'html_props') else token.html_props

    def render_image(self, token: span_token.Image) -> str:
        template = '<img src="{}" alt="{}"{}{attrs} />'
        title = ' title="{}"'.format(html.escape(token.title)) if token.title else ''
        attrs = self.render_html_attributes(token)
        return template.format(token.src, self.render_to_plain(token), title, attrs=attrs)

    def render_link(self, token: span_token.Link) -> str:
        template = '<a href="{target}"{title}{attr}>{inner}</a>'
        target = self.escape_url(token.target)
        if token.title:
            title = ' title="{}"'.format(html.escape(token.title))
        else:
            title = ''
        inner = self.render_inner(token)
        attr = '' if not hasattr(token,'html_props') else token.html_props
        return template.format(target=target, title=title, inner=inner, attr=attr)

    def render_auto_link(self, token: span_token.AutoLink) -> str:
        template = '<a href="{target}"{attr}>{inner}</a>'
        if token.mailto:
            target = 'mailto:{}'.format(token.target)
        else:
            target = self.escape_url(token.target)
        inner = self.render_inner(token)
        attr = '' if not hasattr(token,'html_props') else token.html_props
        return template.format(target=target, inner=inner, attr=attr)

    def render_heading(self, token: block_token.Heading) -> str:
        template = '<h{level}{attr}>{inner}</h{level}>'
        inner = self.render_inner(token)
        attr = '' if not hasattr(token,'html_props') else token.html_props
        return template.format(level=token.level, attr=attr, inner=inner)

    def render_quote(self, token: block_token.Quote) -> str:
        attr = '' if not hasattr(token,'html_props') else token.html_props
        elements = [f'<blockquote{attr}>']
        self._suppress_ptag_stack.append(False)
        elements.extend([self.render(child) for child in token.children])
        self._suppress_ptag_stack.pop()
        elements.append('</blockquote>')
        return '\n'.join(elements)

    def render_paragraph(self, token: block_token.Paragraph) -> str:
        if self._suppress_ptag_stack[-1]:
            return '{}'.format(self.render_inner(token))
        attrs = '' if not hasattr(token,'html_props') else token.html_props
        return '<p{attrs}>{}</p>'.format(self.render_inner(token), attrs=attrs)

    # def render_block_code(self, token: block_token.BlockCode) -> str:
    #     template = '<pre{attrs}><code{attr}>{inner}</code></pre>'
    #     if token.language:
    #         attr = ' class="{}"'.format('language-{}'.format(html.escape(token.language)))
    #     else:
    #         attr = ''
    #     inner = html.escape(token.children[0].content)
    #     attrs = '' if not hasattr(token,'html_props') else token.html_props
    #     return template.format(attr=attr, inner=inner, attrs=attrs)

    def render_list(self, token: block_token.List) -> str:
        template = '<{tag}{olattr}{attrs}>\n{inner}\n</{tag}>'
        attrs = '' if not hasattr(token,'html_props') else token.html_props
        tag = 'ol' if token.start is not None else 'ul'
        olattr = ' start="{}"'.format(token.start) if tag == 'ol' else ''
        self._suppress_ptag_stack.append(not token.loose)
        inner = '\n'.join([self.render(child) for child in token.children])
        self._suppress_ptag_stack.pop()
        return template.format(tag=tag, olattr=olattr, attrs=attrs, inner=inner)

    def render_list_item(self, token: block_token.ListItem) -> str:
        if len(token.children) == 0:
            return '<li></li>'
        inner = '\n'.join([self.render(child) for child in token.children])
        inner_template = '\n{}\n'
        if self._suppress_ptag_stack[-1]:
            if token.children[0].__class__.__name__ == 'Paragraph':
                inner_template = inner_template[1:]
            if token.children[-1].__class__.__name__ == 'Paragraph':
                inner_template = inner_template[:-1]
        attrs = '' if not hasattr(token,'html_props') else token.html_props
        return '<li{attrs}>{}</li>'.format(inner_template.format(inner), attrs=attrs)

    def render_table(self, token: block_token.Table) -> str:
        # This is actually gross and I wonder if there's a better way to do it.
        #
        # The primary difficulty seems to be passing down alignment options to
        # reach individual cells.
        template = '<table{attrs}>\n{inner}</table>'
        if hasattr(token, 'header'):
            head_template = '<thead>\n{inner}</thead>\n'
            head_inner = self.render_table_row(token.header, is_header=True)
            head_rendered = head_template.format(inner=head_inner)
        else: head_rendered = ''
        body_template = '<tbody>\n{inner}</tbody>\n'
        body_inner = self.render_inner(token)
        body_rendered = body_template.format(inner=body_inner)
        attrs = '' if not hasattr(token,'html_props') else token.html_props
        return template.format(inner=head_rendered+body_rendered, attrs=attrs)

    def render_table_row(self, token: block_token.TableRow, is_header=False) -> str:
        template = '<tr{attrs}>\n{inner}</tr>\n'
        inner = ''.join([self.render_table_cell(child, is_header)
                         for child in token.children])
        attrs = '' if not hasattr(token,'html_props') else token.html_props
        return template.format(inner=inner, attrs=attrs)

    def render_table_cell(self, token: block_token.TableCell, in_header=False) -> str:
        template = '<{tag}{attr}>{inner}</{tag}>\n'
        tag = 'th' if in_header else 'td'
        if token.align is None:
            align = 'left'
        elif token.align == 0:
            align = 'center'
        elif token.align == 1:
            align = 'right'
        attr = ' align="{}"'.format(align)
        inner = self.render_inner(token)
        return template.format(tag=tag, attr=attr, inner=inner)

    def render_document(self, token: block_token.Document) -> str:
        self.footnotes.update(token.footnotes)
        inner = '\n'.join([self.render(child) for child in token.children])
        doc_html = '{}\n'.format(inner) if inner else ''
        self.RENDERER_START = False
        return doc_html
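
The pass performed by `reconcile_attrs` can be illustrated in isolation: walk the sibling tokens once, remember the most recent attribute marker, attach its props to the next token, and drop the marker from the output. The classes below are hypothetical stand-ins for mistletoe tokens, used only so the sketch runs without the library, and `attach_markers` is likewise an assumed name.

```python
from dataclasses import dataclass, field


@dataclass
class AttrMarker:
    """Stand-in for the attribute block token that precedes an element."""
    props: dict


@dataclass
class Block:
    """Stand-in for any renderable block token."""
    name: str
    props: dict = field(default_factory=dict)


def attach_markers(children: list) -> list:
    """Attach each marker's props to the following block and drop the marker."""
    kept = []
    pending = None
    for child in children:
        if isinstance(child, AttrMarker):
            pending = child  # remember it for the next sibling
            continue
        if pending is not None:
            child.props = dict(pending.props)
            pending = None  # a marker applies to one block only
        kept.append(child)
    return kept


tokens = [AttrMarker({"class": "foobar"}), Block("heading"), Block("paragraph")]
result = attach_markers(tokens)
print([(block.name, block.props) for block in result])
```

Returning a new list rather than flagging state on the renderer keeps the pass independent of how many times `render` is later called.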