Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Document side-effects of renderers' initialisation #56

Closed
Rogdham opened this issue Aug 31, 2018 · 6 comments
Closed

Document side-effects of renderers' initialisation #56

Rogdham opened this issue Aug 31, 2018 · 6 comments
Assignees

Comments

@Rogdham
Copy link
Contributor

Rogdham commented Aug 31, 2018

Hello, this is possibly an issue concerning the doc and not the code.

  • Parsing outside of the renderer's context manager:
d = Document('a <b> c')
with HTMLRenderer() as r:
    print(r.render(d))  # <p>a &lt;b&gt; c</p>
  • Parsing inside of the renderer's context manager:
with HTMLRenderer() as r:
    d = Document('a <b> c')
    print(r.render(d))  # <p>a <b> c</p>

Not sure where the difference in output comes from. CommonMark asks for the second output though, which seems to be what is performed in mistletoe.markdown and by the mistletoe command line.


$ python -V
Python 3.7.0
$ pip freeze
mistletoe==0.7.1
@Rogdham
Copy link
Contributor Author

Rogdham commented Aug 31, 2018

It seems that in fact it is not the use of a context manager that changes things, but the initialisation of the HTMLRenderer before the call to Document:

HTMLRenderer()
d = Document('a <b> c')
with HTMLRenderer() as r:
    print(r.render(d))  # <p>a <b> c</p>

@Rogdham Rogdham changed the title Parsing outside of renderer's context manager HTMLRenderer initialisation side-effect on HTML parsing Aug 31, 2018
@Rogdham
Copy link
Contributor Author

Rogdham commented Aug 31, 2018

Ok so as to why this is happening, if I'm not mistaken:

  • HTMLRenderer.__init__ calls BaseRenderer.__init__ with (HTMLBlock, HTMLSpan) as extra
  • BaseRenderer.__init__ will then register HTMLBlock and HTMLSpan with e.g. block_token.add_token
  • so HTMLBlock and HTMLSpan will now be tokenized

And if the renderers are always called inside a context manager, it's all good since the __exit__ takes care of the cleaning up.

I suggest updating the README to always use the mistletoe.markdown function, and never call the renderers directly. This may avoid future confusion.

Moreover, this means that the ASTRenderer and LaTeXRenderer do not comply with CommonMark since the do not register HTMLBlock and HTMLSpan. I guess it makes sense for LaTeXRenderer, not sure for ASTRenderer. Maybe add a note somewhere about that?

@Rogdham Rogdham changed the title HTMLRenderer initialisation side-effect on HTML parsing Document side-effects of renderers' initialisation Aug 31, 2018
@miyuchina miyuchina self-assigned this Sep 3, 2018
@miyuchina
Copy link
Owner

Hey there, thanks for this!

I agree with your points, and yes it is a bit confusing to have side-effects in renderer initializations. I'm actually considering getting rid of this behavior, and thus removing the necessity of using context managers entirely. Would that be a better solution? I'll also allow HTMLBlock and HTMLSpan to be tokenized by default in ASTRenderer.

I'll see if I can do so fast enough, but if not I'll add a note to the README.

Thanks again!

@Rogdham
Copy link
Contributor Author

Rogdham commented Sep 3, 2018

I'm actually considering getting rid of this behavior, and thus removing the necessity of using context managers entirely. Would that be a better solution?

I guess so.

For what it's worth, here is what I was expecting when I first encountered missletoe last week:

  • Specify which tokens to be generated when doing the parsing phase: given that it seems to be done by Document, I would expect to write something like Document(md, extra_tokens=[GithubWiki])
  • Then, the returned value could be rendered by different renderers without needing to parse the markdown text again
  • At that point, it would not be needed to tell the renderer about extra_tokens: I would expect the renderer to just throw an exception if it doesn't know about some token type

So something like this:

md = 'Hello, *world*!'
tokens = Document(md, extra_tokens=[GithubWiki])
print(ASTRenderer().render(tokens))
print(HTMLRenderer().render(tokens))

In other words, I would expect the parsing and the rendering phases to be independent. Since the types of tokens has to be specified in the parsing phase, it seems odds to me to specify it in the rendering phase instead.

Just my two cents. Obviously, you do what you want with the project, and what I mentioned above is likely to change the API and break backward compatibility.

I'll also allow HTMLBlock and HTMLSpan to be tokenized by default in ASTRenderer.

I don't really have an opinion on the matter, except that it should probably be documented if that's not the case.

That make me think of the following: it may be a good thing to be able to specify some tokens not to be generated. For example, if I want to do CommonMark and nothing more, how would I disable tables and footnotes? I'm not sure if that's really an important feature though.

@miyuchina
Copy link
Owner

The interesting thing about this is that Document is not meant to be the only entry point to parsing. My original intention is that users are free to swap out Document entirely for their own class, thus having a greater control over parsing (see, e.g., the amount of Markdown-specific logic in Document.__init__).

But separating parsing and rendering phases is a very good suggestion. I'll see what I can do with that.

... it may be a good thing to be able to specify some tokens not to be generated.

There's a nuclear option:

class MyRenderer(HTMLRenderer):
    def __init__(self):
        # ...
        block_token._token_types = []
        span_token._token_types = []
        # ...

... although manipulating an underscored variable is undocumented, and will probably change based on our discussion in this issue.

@pbodnar
Copy link
Collaborator

pbodnar commented Oct 30, 2021

Thanks for your suggestions. :)

So I've tried to document the side effects as originally requested. See f5ea6d6. I hope this will suffice for now.

Regarding separating parsing and rendering phases, I hope we will get to this one day, but let's do this in a separate ticket.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants