‼️ BREAKING: Change Token.attrs to a dict #144

Merged: 6 commits, Mar 17, 2021
Changes from 3 commits
1 change: 1 addition & 0 deletions .mypy.ini
@@ -1,4 +1,5 @@
[mypy]
show_error_codes = True
warn_unused_ignores = True
warn_redundant_casts = True
no_implicit_optional = True
12 changes: 4 additions & 8 deletions docs/architecture.md
@@ -118,10 +118,10 @@ vimeoRE = re.compile(r'^https?:\/\/(www\.)?vimeo.com\/(\d+)($|\/)')

def render_vimeo(self, tokens, idx, options, env):
token = tokens[idx]
aIndex = token.attrIndex('src')
if (vimeoRE.match(token.attrs[aIndex][1])):

ident = vimeoRE.match(token.attrs[aIndex][1])[2]
if vimeoRE.match(token.attrs["src"]):

ident = vimeoRE.match(token.attrs["src"])[2]

return ('<div class="embed-responsive embed-responsive-16by9">\n' +
' <iframe class="embed-responsive-item" src="//player.vimeo.com/video/' +
@@ -140,11 +140,7 @@ Here is another example, how to add `target="_blank"` to all links:
from markdown_it import MarkdownIt

def render_blank_link(self, tokens, idx, options, env):
aIndex = tokens[idx].attrIndex('target')
if (aIndex < 0):
tokens[idx].attrPush(['target', '_blank']) # add new attribute
else:
tokens[idx].attrs[aIndex][1] = '_blank' # replace value of existing attr
tokens[idx].attrSet("target", "_blank")

# pass token to default renderer.
return self.renderToken(tokens, idx, options, env)
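
For code written against the old list-of-lists format, the migration shown in these documentation hunks amounts to replacing index lookups with key lookups. The following is a minimal sketch using plain Python data structures rather than the real `Token` class; `old_attrs`, `new_attrs` and `attr_index` are illustrative names, not part of markdown-it-py.

```python
# Illustration only: plain data structures standing in for Token.attrs.
old_attrs = [["src", "https://vimeo.com/123"], ["alt", ""]]  # list-of-lists format
new_attrs = {"src": "https://vimeo.com/123", "alt": ""}      # dict format


def attr_index(attrs, name):
    """Hypothetical helper mimicking an index-based lookup (-1 if absent)."""
    for i, (key, _value) in enumerate(attrs):
        if key == name:
            return i
    return -1


# Old style: find the index, then read position 1 of the pair.
old_src = old_attrs[attr_index(old_attrs, "src")][1]

# New style: a direct key lookup; .get() avoids a KeyError for absent attributes.
new_src = new_attrs["src"]
missing = new_attrs.get("target")
```

Using `.get()` (or the token's own accessor methods) is the safer choice whenever an attribute may be absent, since the index-based pattern silently read the last pair when the lookup returned `-1`.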
72 changes: 34 additions & 38 deletions docs/using.md
@@ -27,17 +27,17 @@ then these are converted to other formats using 'renderers'.

The simplest way to understand how text will be parsed is using:

```{code-cell}
```{code-cell} python
from pprint import pprint
from markdown_it import MarkdownIt
```

```{code-cell}
```{code-cell} python
md = MarkdownIt()
md.render("some *text*")
```

```{code-cell}
```{code-cell} python
for token in md.parse("some *text*"):
print(token)
print()
@@ -59,48 +59,48 @@ You can define this configuration *via* directly supplying a dictionary or a pre
Compared to `commonmark`, it enables the table, strikethrough and linkify components.
**Important**, to use this configuration you must have `linkify-it-py` installed.

```{code-cell}
```{code-cell} python
from markdown_it.presets import zero
zero.make()
```

```{code-cell}
```{code-cell} python
md = MarkdownIt("zero")
md.options
```

You can also override specific options:

```{code-cell}
```{code-cell} python
md = MarkdownIt("zero", {"maxNesting": 99})
md.options
```

```{code-cell}
```{code-cell} python
pprint(md.get_active_rules())
```

You can find all the parsing rules in the source code:
`parser_core.py`, `parser_block.py`,
`parser_inline.py`.

```{code-cell}
```{code-cell} python
pprint(md.get_all_rules())
```

Any of the parsing rules can be enabled/disabled, and these methods are "chainable":

```{code-cell}
```{code-cell} python
md.render("- __*emphasise this*__")
```

```{code-cell}
```{code-cell} python
md.enable(["list", "emphasis"]).render("- __*emphasise this*__")
```

You can temporarily modify rules with the `reset_rules` context manager.

```{code-cell}
```{code-cell} python
with md.reset_rules():
md.disable("emphasis")
print(md.render("__*emphasise this*__"))
@@ -109,7 +109,7 @@ md.render("__*emphasise this*__")

Additionally `renderInline` runs the parser with all block syntax rules disabled.

```{code-cell}
```{code-cell} python
md.renderInline("__*emphasise this*__")
```

@@ -140,7 +140,7 @@ The `smartquotes` and `replacements` components are intended to improve typograp

Both of these components require typography to be turned on, as well as the components enabled:

```{code-cell}
```{code-cell} python
md = MarkdownIt("commonmark", {"typographer": True})
md.enable(["replacements", "smartquotes"])
md.render("'single quotes' (c)")
@@ -151,7 +151,7 @@ md.render("'single quotes' (c)")
The `linkify` component requires that [linkify-it-py](https://github.com/tsutsu3/linkify-it-py) be installed (e.g. *via* `pip install markdown-it-py[linkify]`).
This allows URI autolinks to be identified, without the need for enclosing in `<>` brackets:

```{code-cell}
```{code-cell} python
md = MarkdownIt("commonmark", {"linkify": True})
md.enable(["linkify"])
md.render("github.com")
@@ -161,7 +161,7 @@ md.render("github.com")

Plugins load collections of additional syntax rules and render methods into the parser

```{code-cell}
```{code-cell} python
from markdown_it import MarkdownIt
from markdown_it.extensions.front_matter import front_matter_plugin
from markdown_it.extensions.footnote import footnote_plugin
@@ -194,7 +194,7 @@ md.render(text)

Before rendering, the text is parsed to a flat token stream of block level syntax elements, with nesting defined by opening (1) and closing (-1) attributes:

```{code-cell}
```{code-cell} python
md = MarkdownIt("commonmark")
tokens = md.parse("""
Here's some *text*
@@ -208,37 +208,37 @@ Here's some *text*
Naturally all openings should eventually be closed,
such that:

```{code-cell}
```{code-cell} python
sum([t.nesting for t in tokens]) == 0
```

All tokens are the same class, which can also be created outside the parser:

```{code-cell}
```{code-cell} python
tokens[0]
```

```{code-cell}
```{code-cell} python
from markdown_it.token import Token
token = Token("paragraph_open", "p", 1, block=True, map=[1, 2])
token == tokens[0]
```

The `'inline'` type token contains the inline tokens as children:

```{code-cell}
```{code-cell} python
tokens[1]
```

You can serialize a token (and its children) to a JSONable dictionary using:

```{code-cell}
```{code-cell} python
print(tokens[1].as_dict())
```

This dictionary can also be deserialized:

```{code-cell}
```{code-cell} python
Token.from_dict(tokens[1].as_dict())
```

@@ -251,7 +251,7 @@ Token.from_dict(tokens[1].as_dict())
In some use cases it may be useful to convert the token stream into a syntax tree,
with opening/closing tokens collapsed into a single token that contains children.

```{code-cell}
```{code-cell} python
from markdown_it.tree import SyntaxTreeNode

md = MarkdownIt("commonmark")
@@ -271,11 +271,11 @@ print(node.pretty(indent=2, show_text=True))

You can then use methods to traverse the tree

```{code-cell}
```{code-cell} python
node.children
```

```{code-cell}
```{code-cell} python
print(node[0])
node[0].next_sibling
```
@@ -299,7 +299,7 @@ def function(renderer, tokens, idx, options, env):

You can inject render methods into the instantiated render class.

```{code-cell}
```{code-cell} python
md = MarkdownIt("commonmark")

def render_em_open(self, tokens, idx, options, env):
@@ -316,7 +316,7 @@ Also `add_render_rule` method is specific to Python, rather than adding directly

You can also subclass a render and add the method there:

```{code-cell}
```{code-cell} python
from markdown_it.renderer import RendererHTML

class MyRenderer(RendererHTML):
@@ -329,7 +329,7 @@ md.render("*a*")

Plugins can support multiple render types, using the `__output__` attribute (this is currently a Python only feature).

```{code-cell}
```{code-cell} python
from markdown_it.renderer import RendererHTML

class MyRenderer1(RendererHTML):
@@ -355,18 +355,18 @@ print(md.render("*a*"))

Here's a more concrete example; let's replace images with vimeo links to player's iframe:

```{code-cell}
```{code-cell} python
import re
from markdown_it import MarkdownIt

vimeoRE = re.compile(r'^https?:\/\/(www\.)?vimeo.com\/(\d+)($|\/)')

def render_vimeo(self, tokens, idx, options, env):
token = tokens[idx]
aIndex = token.attrIndex('src')
if (vimeoRE.match(token.attrs[aIndex][1])):

ident = vimeoRE.match(token.attrs[aIndex][1])[2]
if vimeoRE.match(token.attrs["src"]):

ident = vimeoRE.match(token.attrs["src"])[2]

return ('<div class="embed-responsive embed-responsive-16by9">\n' +
' <iframe class="embed-responsive-item" src="//player.vimeo.com/video/' +
@@ -381,15 +381,11 @@ print(md.render("![](https://www.vimeo.com/123)"))

Here is another example, how to add `target="_blank"` to all links:

```{code-cell}
```{code-cell} python
from markdown_it import MarkdownIt

def render_blank_link(self, tokens, idx, options, env):
aIndex = tokens[idx].attrIndex('target')
if (aIndex < 0):
tokens[idx].attrPush(['target', '_blank']) # add new attribute
else:
tokens[idx].attrs[aIndex][1] = '_blank' # replace value of existing attr
tokens[idx].attrSet("target", "_blank")

# pass token to default renderer.
return self.renderToken(tokens, idx, options, env)
8 changes: 7 additions & 1 deletion markdown_it/port.yaml
@@ -8,9 +8,15 @@
- `len` -> `length`
- `str` -> `string`
- |
Convert JS for loops -to while loops
Convert JS `for` loops to `while` loops
this is generally the main difference between the codes,
because in python you can't do e.g. `for {i=1;i<x;i++} {}`
- |
`Token.attrs` is a dictionary, instead of a list of lists.
Upstream the list format is only used to guarantee order: https://github.com/markdown-it/markdown-it/issues/142,
but in Python 3.7+ order of dictionaries is guaranteed.
One should anyhow use the `attrGet`, `attrSet`, `attrPush` and `attrJoin` methods
to manipulate `Token.attrs`, which have an identical signature to those upstream.
- Use python version of `charCodeAt`
- |
Reduce use of charCodeAt() by storing char codes in a srcCharCodes attribute for state
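
The `Token.attrs` note in this hunk can be illustrated with a small stand-in class. This is only a sketch, assuming the helper methods behave like their upstream namesakes (set overwrites, join appends with a space separator, get returns `None` when the attribute is missing); `MiniToken` is hypothetical and is not the library's `Token`.

```python
from typing import Dict, Optional, Union

AttrValue = Union[str, int, float]


class MiniToken:
    """Hypothetical stand-in for Token, holding attributes in a dict."""

    def __init__(self, attrs: Optional[Dict[str, AttrValue]] = None) -> None:
        self.attrs: Dict[str, AttrValue] = dict(attrs or {})

    def attrSet(self, name: str, value: AttrValue) -> None:
        # Overwrites in place; an existing key keeps its original position.
        self.attrs[name] = value

    def attrGet(self, name: str) -> Optional[AttrValue]:
        # Returns None when the attribute is absent.
        return self.attrs.get(name)

    def attrJoin(self, name: str, value: str) -> None:
        # Append to an existing value, separated by a space, or create it.
        if name in self.attrs:
            self.attrs[name] = f"{self.attrs[name]} {value}"
        else:
            self.attrs[name] = value


token = MiniToken({"href": "https://example.com", "class": "external"})
token.attrSet("target", "_blank")             # new key, appended last
token.attrJoin("class", "btn")                # existing key, value extended
token.attrSet("href", "https://example.org")  # existing key, position kept
```

Because dictionaries preserve insertion order in Python 3.7+, the attribute order after these calls is still `href`, `class`, `target`, which is the property the upstream list format was meant to guarantee.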
37 changes: 11 additions & 26 deletions markdown_it/renderer.py
@@ -5,6 +5,7 @@ class Renderer
copy of rules. Those can be rewritten with ease. Also, you can add new
rules if you create plugin and adds new token types.
"""
import copy
import inspect
from typing import Optional, Sequence

@@ -153,19 +154,10 @@ def renderToken(
@staticmethod
def renderAttrs(token: Token) -> str:
"""Render token attributes to string."""
if not token.attrs:
return ""

result = ""

for token_attr in token.attrs:
result += (
" "
+ escapeHtml(str(token_attr[0]))
+ '="'
+ escapeHtml(str(token_attr[1]))
+ '"'
)
for key, value in token.attrItems():
result += " " + escapeHtml(str(key)) + '="' + escapeHtml(str(value)) + '"'

return result
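
The simplified loop in `renderAttrs` can be sketched outside the library as follows. This is an approximation under stated assumptions: `render_attrs` is a hypothetical free function, and the standard library's `html.escape` is swapped in for the project's `escapeHtml`, which may escape a slightly different set of characters.

```python
from html import escape
from typing import Mapping, Union


def render_attrs(attrs: Mapping[str, Union[str, int, float]]) -> str:
    """Render a mapping as ' key="value"' pairs, in insertion order."""
    return "".join(
        f' {escape(str(key))}="{escape(str(value))}"' for key, value in attrs.items()
    )


# Non-string values (such as an integer list start) are stringified first.
rendered = render_attrs({"start": 3, "class": 'a "quoted" name'})

# An empty mapping yields an empty string, so no separate guard is needed.
empty = render_attrs({})
```

Iterating over key/value pairs removes the need for the `token_attr[0]` / `token_attr[1]` indexing used with the list-of-lists format.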

@@ -241,17 +233,9 @@ def fence(self, tokens: Sequence[Token], idx: int, options, env) -> str:
# May be, one day we will add .deepClone() for token and simplify this part, but
# now we prefer to keep things local.
if info:
i = token.attrIndex("class")
tmpAttrs = token.attrs[:] if token.attrs else []

if i < 0:
tmpAttrs.append(["class", options.langPrefix + langName])
else:
tmpAttrs[i] = tmpAttrs[i][:]
tmpAttrs[i][1] += " " + options.langPrefix + langName

# Fake token just to render attributes
tmpToken = Token(type="", tag="", nesting=0, attrs=tmpAttrs)
tmpToken = Token(type="", tag="", nesting=0, attrs=copy.copy(token.attrs))
Review comment (Member):

dict also has dict.copy() method if you want to avoid import copy 😄

Reply (Member Author):

done 👍

tmpToken.attrJoin("class", options.langPrefix + langName)

return (
"<pre><code"
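
A shallow copy (whether via `copy.copy` or `dict.copy()`) appears sufficient for the temporary token in `fence`, because joining a class name rebinds a key rather than mutating a shared value. The sketch below uses plain dictionaries and assumes attribute values are immutable strings or numbers; the variable names are illustrative.

```python
original = {"class": "highlight", "id": "code-1"}

# dict.copy() is shallow, which suffices when the values are immutable
# (strings and numbers), as assumed here.
tmp = original.copy()

# Equivalent in spirit to joining a language class onto the copy.
lang_class = "language-python"
tmp["class"] = f"{tmp['class']} {lang_class}" if "class" in tmp else lang_class
```

The original mapping is left untouched, so the real token's attributes are not altered by rendering.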
@@ -271,16 +255,17 @@ def fence(self, tokens: Sequence[Token], idx: int, options, env) -> str:

def image(self, tokens: Sequence[Token], idx: int, options, env) -> str:
token = tokens[idx]
assert token.attrs is not None, '"image" token\'s attrs must not be `None`'

# "alt" attr MUST be set, even if empty. Because it's mandatory and
# should be placed on proper position for tests.
#

assert (
token.attrs and "alt" in token.attrs
), '"image" token\'s attrs must contain `alt`'

# Replace content with actual value

token.attrs[token.attrIndex("alt")][1] = self.renderInlineAsText(
token.children, options, env
)
token.attrSet("alt", self.renderInlineAsText(token.children, options, env))

return self.renderToken(tokens, idx, options, env)
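
The assertion that `alt` is already present matters for attribute ordering: overwriting an existing dictionary key keeps its position, whereas adding a new key appends it at the end. A short sketch with plain dictionaries (the attribute values are made up for illustration):

```python
# Case 1: "alt" already exists, so overwriting keeps it between src and title.
attrs = {"src": "cat.png", "alt": "", "title": "A cat"}
attrs["alt"] = "a sleeping cat"
order_after_overwrite = list(attrs)

# Case 2: "alt" is missing, so setting it appends it after title.
late = {"src": "cat.png", "title": "A cat"}
late["alt"] = "a sleeping cat"
order_after_insert = list(late)
```

This is why the parser is expected to create the `alt` attribute up front, even if empty, and the renderer only replaces its value.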

2 changes: 1 addition & 1 deletion markdown_it/rules_block/list.py
@@ -168,7 +168,7 @@ def list_block(state: StateBlock, startLine: int, endLine: int, silent: bool):
if isOrdered:
token = state.push("ordered_list_open", "ol", 1)
if markerValue != 1:
token.attrs = [["start", markerValue]]
token.attrs = {"start": markerValue}

else:
token = state.push("bullet_list_open", "ul", 1)