Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

‼️ BREAKING: Change Token.attrs to a dict #144

Merged
merged 6 commits into from
Mar 17, 2021
Merged

Conversation

chrisjsewell
Copy link
Member

@chrisjsewell chrisjsewell commented Mar 17, 2021

As discussed in #102, upstream the list format is only used to guarantee order: markdown-it/markdown-it#142, but in Python 3.7+ dictionary order is now guaranteed by the specification (in Python 3.6 is also preserved as an implementation detail).
This change improves typing and performance.

To minimize how breaking this change is, auto-conversion is done on Token initiation, i.e. you can still use Token("type", "tag", 0, attrs=[["key", "value"]]),
and also Token.as_dict(as_upstream=True) converts the dict back to null/list, so that they can still be directly compared to those produced in the debug tab of https://markdown-it.github.io/.

One should anyhow generally use the attrGet, attrSet, attrPush and attrJoin methods
to manipulate Token.attrs, which all have an identical signature to those upstream.

I also added the meta_serializer option to Token.as_dict, which now ensures that this method is always able to produce valid JSON.

@codecov
Copy link

codecov bot commented Mar 17, 2021

Codecov Report

Merging #144 (cc75081) into master (00a28a6) will increase coverage by 0.03%.
The diff coverage is 93.22%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #144      +/-   ##
==========================================
+ Coverage   96.38%   96.41%   +0.03%     
==========================================
  Files          72       72              
  Lines        3205     3204       -1     
==========================================
  Hits         3089     3089              
+ Misses        116      115       -1     
Flag Coverage Δ
pytests 96.41% <93.22%> (+0.03%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
markdown_it/tree.py 92.35% <75.00%> (-0.63%) ⬇️
markdown_it/token.py 92.55% <92.30%> (+0.59%) ⬆️
markdown_it/renderer.py 100.00% <100.00%> (+1.80%) ⬆️
markdown_it/rules_block/list.py 98.84% <100.00%> (ø)
markdown_it/rules_block/table.py 100.00% <100.00%> (ø)
markdown_it/rules_core/linkify.py 98.87% <100.00%> (ø)
markdown_it/rules_inline/autolink.py 96.22% <100.00%> (ø)
markdown_it/rules_inline/image.py 98.86% <100.00%> (ø)
markdown_it/rules_inline/link.py 98.83% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 00a28a6...cc75081. Read the comment docs.

@chrisjsewell
Copy link
Member Author

Should finally be the last thing before a v1.0.0a1 release @choldgraf!

Copy link
Member

@hukkin hukkin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work!

I added some mainly type annotation related suggestions.

# Fake token just to render attributes
tmpToken = Token(type="", tag="", nesting=0, attrs=tmpAttrs)
tmpToken = Token(type="", tag="", nesting=0, attrs=copy.copy(token.attrs))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dict also has dict.copy() method if you want to avoid import copy 😄

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done 👍

markdown_it/renderer.py Outdated Show resolved Hide resolved
import warnings

import attr


def convert_attrs(value: Any) -> Any:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't the typing be something like

Suggested change
def convert_attrs(value: Any) -> Any:
def convert_attrs(value: Union[List[list], Dict[str, Union[str, float]], None]) -> Dict[str, Union[str, float]]:

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say no, because conversion happens before validation in attrs, so technically it should be able to take anything and not fail

# Html attributes. Format: `[ [ name1, value1 ], [ name2, value2 ] ]`
attrs: Optional[List[list]] = attr.ib(default=None)
# Html attributes. Note this differs from the upstream "list of lists" format
attrs: Dict[str, Union[str, int, float]] = attr.ib(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

int is implied by float so is not required. Is float used anywhere however? I know int is.

Suggested change
attrs: Dict[str, Union[str, int, float]] = attr.ib(
attrs: Dict[str, Union[str, float]] = attr.ib(

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually would be less painful to just use Any instead of the Union[str, float]] in case someone gets an item directly without attrGet. I've added suggestions also elsewhere to annotate read API with Any and write API with Union[str, float] because of this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit confused why you suggest this, because isinstance(1, float) equates to False so int are certainly not a subclass of float.
As you mention, int is used in core otherwise really I was thinking that perhaps it could just be str only, since all these values always anyway have to be converted to strings in HTML attributes <div key="value">.
I can't think of any reason, or HTML tag, that would require it not to be one of str, int, or float and this restriction ensures that Token is always "JSONable"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is in PEP 484 (https://www.python.org/dev/peps/pep-0484/#the-numeric-tower):

when an argument is annotated as having type float, an argument of type int is acceptable

The argument typing system is not one to one with isinstance, it is more "duck typed".

it could just be str

I like this idea but only if we take the liberty to diverge from upstream and actually always set str values only, i.e. convert before assigning.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The argument typing system is not one to one with isinstance, it is more "duck typed".

Fair, although I think it is better for human readability/understanding to include int specifically. Bit weird IMO, that typing should diverge from isinstance

but only if we take the liberty to diverge from upstream

yeh thats why I ultimately decided against it, just not worth the hassle

markdown_it/token.py Outdated Show resolved Hide resolved
markdown_it/token.py Outdated Show resolved Hide resolved
@@ -272,13 +272,13 @@ def tag(self) -> str:
return self._attribute_token().tag

@property
def attrs(self) -> Dict[str, Any]:
def attrs(self) -> Dict[str, Union[str, int, float]]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def attrs(self) -> Dict[str, Union[str, int, float]]:
def attrs(self) -> Dict[str, Any]:

return dict(token_attrs) # type: ignore
return self._attribute_token().attrs

def attrGet(self, name: str) -> Any:
Copy link
Member

@hukkin hukkin Mar 17, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My preference would be to not add this method, but use attrs.get(name) or attrs[name] instead. Much more pythonic, and since this class is new API, there is nothing forcing us to copy the Token classes attrs setter and getter methods (which only exist because attrs was not a mapping in JS upstream).

@hukkin
Copy link
Member

hukkin commented Mar 17, 2021

Should finally be the last thing before a v1.0.0a1 release

If you planning v1 already, could you have a look at #66 before releasing?

Out of all the issues we have, I think that is the only one including anything breaking, so if we decide to do it would be nice to do so before v1 I think

chrisjsewell and others added 3 commits March 17, 2021 12:02
Co-authored-by: Taneli Hukkinen <hukkinj1@users.noreply.github.com>
@chrisjsewell chrisjsewell merged commit 6204674 into master Mar 17, 2021
@chrisjsewell chrisjsewell deleted the Token.attrs branch March 17, 2021 11:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants