options & fixes #18

Closed
wants to merge 13 commits into
from

Conversation

Projects
None yet
5 participants
Contributor

cadorn commented Jun 16, 2011

Added support for options and made some fixes.

Contributor

cadorn commented Jun 16, 2011

FYI, I am using markdown in this project: https://github.com/pinf/docs-js

@@ -197,7 +197,7 @@ Markdown.prototype.processBlock = function processBlock( block, next ) {
var res = cbs[ ord[i] ].call( this, block, next );
if ( res ) {
//D:this.debug(" matched");
- if ( !res instanceof Array || ( res.length > 0 && !( res[0] instanceof Array ) ) )
+ if ( !isArray(res) || ( res.length > 0 && !( isArray(res[0]) ) ) )
@ashb

ashb Jun 16, 2011

Collaborator

How portable is this change? If possible we'd like to keep this working in browsers too.

Also when is isArray(x) true but x instanceof Array false?

@ashb

ashb Jun 16, 2011

Collaborator

My bad - I just noticed this is added further on in the patch.

@cadorn

cadorn Jun 16, 2011

Contributor

It's portable. I use it with nodejs, jetpack and browser. The isArray was taken from narwhal/util.

lib/markdown.js
+ name: jsonml[ 1 ].anchor
+ }, ""];
+ delete jsonml[ 1 ].anchor;
+ }
@ashb

ashb Jun 16, 2011

Collaborator

Can you give me an example of the MD input and the html output for this case?

@cadorn

cadorn Jun 16, 2011

Contributor

In options.preprocessTreeNode(jsonml, references) I do:

        case "header":
            var m = jsonml[2].match(/^(.*?)\s+\[([^\]]*)\]$/);
            if (m) {
                // TODO: better args splitting
                var args = {},
                    parts
                m[2].split(",").forEach(function(pair)
                {
                    parts = pair.split("=");
                    args[parts[0]] = parts[1];
                });
                jsonml[2] = m[1];
                if (typeof args.anchor !== "undefined") {
                    jsonml[1].anchor = args.anchor;
                }
            }
            break;

which matches:

Heading [anchor=name]
=======

I needed the change in markdown to add the anchor attribute properly.

@ashb

ashb Jun 19, 2011

Collaborator

Two things:

  1. Why not have the output be <h1 id="name">Heading</h1> (#fragment will search for id's as well as by anchor name tags)
  2. I'm really not sure this belongs in here: its specific to your use case (though I'm sure it would be useful for others) but perhaps more importantly
    • it's not behaviour in base markdown/other implementations
    • this can be achieved using the preprocessTreeNode callback, can't it?
    • The Maruku dialiect has a similar functionality and is already partially/mostly(?) implemented

I'm not against the idea of doing this - I'm just slightly concerned about setting precedent by implementing 'new features' (i.e. ones not found else where in the core library - the whole point of this library is to be able to tweak and extend the behaviour without having to re-write the whole parsing code.

@cadorn

cadorn Jun 20, 2011

Contributor
  1. I did not know that #fragment will search for IDs as well. So <h1 id="name"> is fine with me.
  2. I think the library should support any type of custom markup as not every use-case aims to be 100% syntax compatible with all the other markdown implementations. The header case I had to implement at that point in the code as there was no other way to do it. I took a look at the Maruku syntax and it covers the anchored header case so I will just use that.
Contributor

cadorn commented Jun 17, 2011

Added support for HTML blocks that stay untouched.

Collaborator

ashb commented Jun 19, 2011

Can you please keep a pull request to a single feature and not group together multiple unrelated commits.

Edit: Was this something you did manually or did github automatically add the 'cadron added some commits' part?

Contributor

cadorn commented Jun 20, 2011

As soon as I push more to master the pull request gets updated. To avoid this I would need to create a branch for every new feature. Kind of a big hassle right now as these are all minor things. I was hoping we can discuss any issues and you can then merge once I fixed it all. Would be nice if github had a feature to split out a pull request into multiple pull requests by creating branches in the background.

If you want I can split this one out or I can start with the next pull request.

Contributor

cadorn commented Jul 4, 2011

The header anchor change has been reversed. A bunch of other minor fixes have been added.

kragen commented on 50e2299 Jul 28, 2011

So, while on one hand I really want this feature for my application of markdown-js for, on the other hand I really want a way to filter the HTML to keep out things like the following:

  • unclosed <blockquote>
  • <script>
  • <a onmouseover>
  • <a href="jscript:...">
  • <a href="mocha:...">
  • <a href=" javascript:...">
  • <iframe>
  • <img width=1 height=1 src="http://...">
  • other things not mentioned here.

I think I'd also be a little happier with something other than false as the tag for as-is blocks. It is JSON-serializable, I suppose... I don't suppose there's a JSONML spec for this kind of thing, is there? Last I checked, the JSONML spec wasn't even clear as to whether the contents of JSONML elements were supposed to be CDATA or PCDATA.

I think another thing that we run into trouble with is entity handling. I ought to be able to write &copy; 2011 Kragen Javier Sitaker in a Markdown document and have the © entity get passed through to the output (as you can see that it is in this comment). And the list from the spec, "<span>, <cite>, or <del>", is just a list of examples, not a complete list of span-level HTML tags; the intent is that any span-level HTML tag can be used in those contexts.

What this adds up to is that we probably need to run all the strings that are the contents of paragraphs, list items, or headers, through a more or less actual HTML parser that can be supplied with whitelists of tags, attributes, and URL schemes, so that it can successfully pass through the subset of well-formed HTML that's right for the application in question. In modern browsers, we could actually use DOMParser, but in Node we might have to use our own. It probably doesn't have to be quite as robust as a browser's parser, since many applications (and basically all applications that use some arbitrary subset of HTML) will give the user a chance to preview and fix their Markdown, so if it barfs on overlapping span-level tags (as GitHub sort of does: <b>overlapping <i>span-level</b> tags</i>) or unclosed tags, it's not a big deal.

I'm not proposing that you should do all this work for me; I was just checking out the network to see if someone had already done it. It looks like you're the one that's come closest. Would it be useful to you if I did what I'm describing? Would it remove the need for the code in this commit for you?

Edited to add: I copied the above comment onto the latest issue requesting this feature.

Owner

cadorn replied Jul 28, 2011

kragen commented on d5af0fd Jul 28, 2011

Can you explain the connection between the commit comment and the code change?

Owner

cadorn replied Jul 28, 2011

This function looks for the XX:YY pattern on the first line of the document to parse out document-wide variables.

I am using NOTE: Hello World in some headings where the heading content is processed against all blocks. For some reason block.lineNumber is undefined when the heading is pared. This assumes the lineNumber is actually set for the first line.

Cool, thanks! I wonder if there's a way to get the lineNumber to always be set — surely this won't be the only piece of code that assumes it's set.

kragen commented Jul 28, 2011

I didn't realize there was a pull request for this branch. As I pointed out on issue 16, and less explicitly above, the as-is-HTML implementation above opens XSS security holes in applications that are currently using markdown-js to parse untrusted input, so I hope that you don't pull this branch!

Contributor

cadorn commented Jul 28, 2011

@kragen This branch contains a lot of fixes. Feel free to overhaul the HTML support and I can remove it from this pull request.

kragen commented Jul 28, 2011

Sounds like I'm committed now ;)

arikon commented Sep 18, 2012

Do you plan to merge?

Collaborator

ashb commented Sep 18, 2012

As of right now this doesn't cleanly merge - if you take a look at it and get it into the state where it does, sure.

Owner

evilstreak commented Aug 22, 2013

Several of these commits have been merged in and there are some other more up-to-date pull requests dealing with HTML support.

@evilstreak evilstreak closed this Aug 22, 2013

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment