Reflink-like extension #1562
Comments
Can you provide some markdown and resulting html to illustrate what you are looking for? |
This is an example paragraph with a reference[^ref].
[...] below
[^ref]: This is the cite reference that will be listed at bottom of the article.
Would either expose a list of
<p>This is an example paragraph with a reference<sup id="backref:ref"><a href="#ref:ref">2</a></sup></p>
<!-- [...] -->
<hr />
<ul>
<li id="ref:ref">This is the cite reference that will be listed at bottom of the article.
<a href="#backref:ref">↩</a></li>
</ul>
Edit: The goal is not to integrate such a feature into marked, but rather to ask what the best way would be to integrate such a feature (including the complexity of references) into the parsing pipeline. |
There are three ways you can change the output of marked:
It would probably be better to combine these approaches: do some preprocessing of the markdown (like parsing and removing the footnotes) before sending it to marked, then convert the references to links in the tokens, and add the footnotes back after marked is done rendering the rest. |
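A minimal sketch of the preprocessing step suggested above, assuming each footnote definition is a single line of the form `[^name]: text`. The helper name `extractFootnotes` and the shape of its return value are hypothetical, and nothing here calls marked itself; the returned `body` is what would then be handed to the parser.

```javascript
// Hypothetical helper: split single-line footnote definitions out of the
// markdown source so that only the remaining body is passed to the parser.
function extractFootnotes(markdown) {
  const definitionRe = /^\[\^([^\]]+)\]:[ \t]*(.*)$/;
  const footnotes = [];
  const bodyLines = [];

  for (const line of markdown.split('\n')) {
    const match = definitionRe.exec(line);
    if (match) {
      // Keep the definition aside instead of rendering it in place.
      footnotes.push({ name: match[1], text: match[2] });
    } else {
      bodyLines.push(line);
    }
  }
  return { body: bodyLines.join('\n'), footnotes };
}

const extracted = extractFootnotes(
  'A paragraph with a reference[^ref].\n\n[^ref]: The cited text.'
);
console.log(extracted.body);
console.log(extracted.footnotes);
```

Multi-line definitions are deliberately out of scope in this sketch; they would need a block-aware pass rather than a line-by-line scan.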
I managed to make a custom method in front of Marked, that probably can be improved. I decided to use the following format:
I originally wanted to be able to handle multi-line blocks, but that would require treating a paragraph as a single reference block, i.e. treating two successive lines with a reference block as a single line. Basically, to say:
Do you think it makes sense?
function marked(text) {
// Reference definition
const refblockRe = /^\^([\w\-]+): (.+)$/gm;
// Reference link
const reflinkRe = /\[\^([\w\-]+)\]/g;
// Defined references
const refs = [];
// New token list (stripped of reference blocks)
const editedToks = [];
// Lexing to remove paragraph-level blocks
const toks = _marked.lexer(text);
editedToks.links = toks.links;
for (const tok of toks) {
if (tok.type !== 'paragraph'
|| !tok.text.match(refblockRe)) {
editedToks.push(tok);
continue;
}
let matches;
while ((matches = refblockRe.exec(tok.text)) !== null) {
refs.push({
id: refs.length + 1,
selector: matches[1],
paragraph: _marked(matches[2]),
});
}
}
let parsedHtml = _marked.parser(editedToks);
const errors = {
refToUndefinedSelector: [], // Reference link to undefined block
unusedSelector: [], // Reference block that is never linked
};
// Every block that is defined, then used.
const usedSelectors = [];
// Every reference link that should be transformed or removed in the HTML
const reflinkTransformations = [];
// Parse and replace reflinks
let match;
while ((match = reflinkRe.exec(parsedHtml)) !== null) {
const selector = match[1];
const ref = refs.find(ref => ref.selector === selector);
// If reference to undefined selector
// (no blockref for corresponding selector)
if (!ref) {
errors.refToUndefinedSelector.push(selector);
reflinkTransformations.push({
mode: 'delete',
startIndex: match.index,
length: match[0].length, // length of the matched link only, not of the whole input
});
continue;
}
usedSelectors.push(selector);
reflinkTransformations.push({
mode: 'replace',
startIndex: match.index,
length: match[0].length,
id: ref.id,
selector: ref.selector,
});
}
// Check for unused selectors
errors.unusedSelector = refs
.filter(ref => !usedSelectors.includes(ref.selector));
// Inverse-order browse to apply transformations without breaking indexes
for (const transformation of reflinkTransformations.sort((a, b) => b.startIndex - a.startIndex)) {
const {id, selector} = transformation;
const replacementValue = transformation.mode === 'replace'
? `<sup id="backref:${selector}"><a href="#ref:${selector}">${id}</a></sup>`
: '';
const before = parsedHtml.slice(0, transformation.startIndex);
const after = parsedHtml.slice(transformation.startIndex + transformation.length, parsedHtml.length);
parsedHtml = before + replacementValue + after;
}
return {
references: refs,
html: parsedHtml,
errors,
};
}
Edit: I don't know if it is possible, but it'd be nice to be able to somehow "inject" custom routines, much like middlewares, into the compiler, to simplify extension. |
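The idea of applying replacements from the end of the string towards the start, so that earlier offsets stay valid, can be sketched in isolation. This is only an illustration: `applyEdits` and the `{ startIndex, length, value }` edit shape are hypothetical names, not part of marked.

```javascript
// Apply {startIndex, length, value} edits to a string. Edits are applied from
// the highest start index to the lowest, so offsets recorded against the
// original text are still correct when each edit is applied.
function applyEdits(text, edits) {
  const ordered = [...edits].sort((a, b) => b.startIndex - a.startIndex);
  let result = text;
  for (const { startIndex, length, value } of ordered) {
    result = result.slice(0, startIndex) + value + result.slice(startIndex + length);
  }
  return result;
}

// Record one edit per reference link found in the input.
const input = 'See [^a] and [^b].';
const edits = [];
const linkRe = /\[\^([\w-]+)\]/g;
let m;
while ((m = linkRe.exec(input)) !== null) {
  edits.push({ startIndex: m.index, length: m[0].length, value: `<sup>${m[1]}</sup>` });
}

console.log(applyEdits(input, edits)); // See <sup>a</sup> and <sup>b</sup>.
```

Sorting by position rather than by any other field matters here: an edit applied at a lower index changes the length of everything after it.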
It looks like there is a spec for footnotes at markdownguide.org that uses It looks like you have the right idea with replacing
We have talked about adding some sort of If you want to create a PR I would be happy to review it. 😁 👍 |
The issue with the markdownguide.org spec is that it's interpreted as a link, something I didn't like.
That is already done, see:
refs.push({
id: refs.length + 1,
selector: matches[1],
paragraph: _marked(matches[2]),
});
I may look into that once I have a bit more time to myself. Generators (two yields) or passing the lexer / parser would do for the extension? (should be discussed in another thread). |
I tried to convert my code to instead use the That would mean that the lexer would trust those blocks as links. Except that the current lexer implementation doesn't parse multi-word links (logical), so I'd need to internally change the lexer for that purpose. Such a change would probably mean I'd add a IMHO, some breaking changes should ultimately be done:
For now, I stay with my single-line |
Here is an implementation of footnotes that follows the spec.
This code removes the footnotes from the block tokens, including multi-line footnotes, and changes the references to html before parsing the tokens. After parsing it adds the footnotes back to the html. This code is in no way complete. There are probably edge cases that will fail but this should be a good start.
const marked = require('marked');
const markdown = `
Here's a simple footnote,[^1] and here's a longer one.[^bignote]

[^1]: This is the first footnote.

[^bignote]: Here's one with multiple paragraphs and code.

    Indent paragraphs to include them in the footnote.

    \`{ my code }\`

    Add as many paragraphs as you like.
`;
const footnotes = [];
const newTokens = [];
const footnoteTest = /^\[\^[^\]]+\]: /;
const footnoteMatch = /^\[\^([^\]]+)\]: ([\s\S]*)$/;
const referenceTest = /\[\^([^\]]+)\](?!\()/g;
// get block tokens
const tokens = marked.lexer(markdown);
// remove footnotes from tokens
for (let i = 0; i < tokens.length; i++) {
if (tokens[i].type !== 'paragraph' || !footnoteTest.test(tokens[i].text)) {
newTokens.push(tokens[i]);
continue;
}
const match = tokens[i].text.match(footnoteMatch);
const name = match[1].replace(/\W/g, '-');
let note = match[2];
// multiline notes will be considered indented code blocks
if (i + 2 < tokens.length && tokens[i + 2].type === 'code' && tokens[i + 2].codeBlockStyle === 'indented') {
note += '\n\n' + tokens[i + 2].text;
i += 2;
}
footnotes.push({
name,
note: `${marked(note)} <a href="#fnref:${name}">↩</a>`
});
}
// change references to superscript links
for (let i = 0; i < newTokens.length; i++) {
if (newTokens[i].type === 'paragraph' || newTokens[i].type === 'text') {
newTokens[i].text = newTokens[i].text.replace(referenceTest, (ref, value) => {
const name = value.replace(/\W/g, '-');
let code = ref;
for (let j = 0; j < footnotes.length; j++) {
if (footnotes[j].name === name) {
code = `<sup id="fnref:${name}"><a href="#fn:${name}">${j + 1}</a></sup>`;
break;
}
}
return code;
});
}
}
newTokens.links = tokens.links;
let html = marked.parser(newTokens);
// add footnotes back to html
if (footnotes.length > 0) {
html += `
<hr />
<ol>
<li>${footnotes.map(f => f.note).join('</li>\n <li>')}</li>
</ol>
`;
}
console.log(html); |
Hi @UziTech, thanks for providing this solution, I just tried and it doesn't work for me using the latest version. Is that possibly related to the token changes? |
Yes, the tokens returned by |
@cyanzhong @UziTech I updated that code to work with the newer token structure.
const marked = require('marked');
const markdown = `
Here's a simple footnote,[^1] and here's a longer one.[^bignote]

[^1]: This is the first footnote.

[^bignote]: Here's one with multiple paragraphs and code.
\`my code\`
Indent paragraphs to include them in the footnote.
Add as many paragraphs as you like.
`;
const footnotes = [];
const newTokens = [];
const footnoteTest = /^\[\^[^\]]+\]: /;
const footnoteMatch = /^\[\^([^\]]+)\]: ([\s\S]*)$/;
const referenceTest = /\[\^([^\]]+)\](?!\()/g;
// get block tokens
const tokens = marked.lexer(markdown);
// Check footnote
function checkFootnote (token) {
if (token.type !== 'paragraph' || !footnoteTest.test(token.text)) {
return;
}
const match = token.text.match(footnoteMatch);
const name = match[1].replace(/\W/g, '-');
let note = match[2];
footnotes.push({
name,
note: `${marked(note)} <a href="#fnref:${name}">↩</a>`
});
// remove footnotes from tokens
token.toDelete = true;
};
function checkReference(token)
{
if( token.type === 'paragraph' || token.type === 'text' )
{
token.text = token.text.replace(referenceTest, (ref, value) => {
const name = value.replace(/\W/g, '-');
let code = ref;
for (let j = 0; j < footnotes.length; j++) {
if (footnotes[j].name === name) {
code = `<sup id="fnref:${name}"><a href="#fn:${name}">${j + 1}</a></sup>`;
break;
}
}
return code;
});
if( token.type === 'paragraph')
{
// Override children
token.tokens = marked.lexer(token.text)[0].tokens;
}
}
}
function visit (tokens, fn)
{
for( var token of tokens )
{
fn( token );
// Visit children
if( token.tokens )
{
visit( token.tokens, fn)
}
}
}
visit( tokens, (token) => { checkFootnote(token); });
// Remove tokens from AST, starting with top-level
let workList = [ tokens ];
do {
let tokenList = workList.pop();
for(var i = tokenList.length-1; i >= 0 ; i--){
if(tokenList[i].toDelete){
tokenList.splice(i, 1);
}
else if( tokenList[i].tokens )
{
workList.push( tokenList[i].tokens );
}
}
} while( workList.length != 0 )
visit( tokens, (token) => { checkReference(token); });
let html = marked.parser(tokens);
if (footnotes.length > 0)
{
html += `
<hr />
<ol>
<li>${footnotes.map(f => f.note).join('</li>\n <li>')}</li>
</ol>
`;
}
console.log(html);
This is the output:
<p>Here's a simple footnote,<sup id="fnref:1"><a href="#fn:1">1</a></sup> and here's a longer one.<sup id="fnref:bignote"><a href="#fn:bignote">2</a></sup></p>
<hr />
<ol>
<li><p>This is the first footnote.</p>
<a href="#fnref:1">↩</a></li>
<li><p>Here's one with multiple paragraphs and code.
<code>my code</code>
Indent paragraphs to include them in the footnote.
Add as many paragraphs as you like.</p>
<a href="#fnref:bignote">↩</a></li>
</ol> |
Here is some faster code (~30% more ops/sec) that uses Marked's
Warning: Unlike the above code, this does not guarantee a footnote exists programmatically per reference.
const footnoteMatch = /^\[\^([^\]]+)\]:([\s\S]*)$/;
const referenceMatch = /\[\^([^\]]+)\](?!\()/g;
const referencePrefix = "marked-fnref";
const footnotePrefix = "marked-fn";
const footnoteTemplate = (ref, text) => {
return `<sup id="${footnotePrefix}:${ref}">${ref}</sup>${text}`;
};
const referenceTemplate = ref => {
return `<sup id="${referencePrefix}:${ref}"><a href="#${footnotePrefix}:${ref}">${ref}</a></sup>`;
};
const interpolateReferences = (text) => {
return text.replace(referenceMatch, (_, ref) => {
return referenceTemplate(ref);
});
}
const interpolateFootnotes = (text) => {
return text.replace(footnoteMatch, (_, value, text) => {
return footnoteTemplate(value, text);
});
}
const renderer = {
paragraph(text) {
return marked.Renderer.prototype.paragraph.apply(null, [
interpolateReferences(interpolateFootnotes(text))
]);
},
text(text) {
return marked.Renderer.prototype.text.apply(null, [
interpolateReferences(interpolateFootnotes(text))
]);
}
};
marked.use({ renderer });
If you want to parse footnotes in other locations, just use the following template and place this in the renderer object.
[token_type](text) {
return marked.Renderer.prototype[token_type].apply(null, [
interpolateReferences(interpolateFootnotes(text))
]);
} |
@jun-sheaf Thanks for your solution! It works great! |
Thanks a lot @jun-sheaf, can also confirm this works like a charm. Here's a version that will additionally add a section "References" (styleable with css class "marked-footnotes", see "footnoteContainerTemplate" below) around the footnotes on the bottom (I'm used to other markdown implementations doing this).
const footnoteMatch = /^\[\^([^\]]+)\]:([\s\S]*)$/;
const referenceMatch = /\[\^([^\]]+)\](?!\()/g;
const referencePrefix = "marked-fnref";
const footnotePrefix = "marked-fn";
const footnoteTemplate = (ref, text) => {
return `<sup id="${footnotePrefix}:${ref}">${ref}</sup>${text}`;
};
const footnoteContainerTemplate = (text) => {
return `<div class="marked-footnotes"><h2>References</h2>${text}</div>`
}
const referenceTemplate = ref => {
return `<sup id="${referencePrefix}:${ref}"><a href="#${footnotePrefix}:${ref}">${ref}</a></sup>`;
};
const interpolateReferences = (text) => {
return text.replace(referenceMatch, (_, ref) => {
return referenceTemplate(ref);
});
}
const interpolateFootnotes = (text) => {
const found = text.match(footnoteMatch)
if (found) {
const replacedText = text.replace(footnoteMatch, (_, value, text) => {
return footnoteTemplate(value, text);
});
return footnoteContainerTemplate(replacedText)
}
return text
}
const renderer = {
paragraph(text) {
return marked.Renderer.prototype.paragraph.apply(null, [
interpolateReferences(interpolateFootnotes(text))
]);
},
text(text) {
return marked.Renderer.prototype.text.apply(null, [
interpolateReferences(interpolateFootnotes(text))
]);
}
};
marked.use({ renderer }); |
@talkdirty I'm trying to use your footnotes example #1562 (comment) with docsify. Unfortunately, it creates multiple "References" headings in my case within the same document, one for each footnote. From what I see, renderer.text gets called with the separate footnote paragraphs. Any idea?
Some footnote[^1] and[^2] here.
[^1]: and here's the footnote paragraph.
[^2]: and even more foot notes. |
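One possible way around the repeated heading, sketched without marked itself: collect footnote paragraphs while rendering and emit the container once at the end. `createFootnoteCollector` and its two methods are hypothetical names; the `marked-fn` and `marked-footnotes` identifiers are borrowed from the snippets above. How this would be wired into a renderer, or into docsify, is left open.

```javascript
// Hypothetical collector: footnote definitions are stored while paragraphs
// are processed, and the wrapper element is produced exactly once afterwards.
// Note: the footnote text is inserted as-is, without HTML escaping.
function createFootnoteCollector() {
  const definitionRe = /^\[\^([^\]]+)\]:\s*([\s\S]*)$/;
  const notes = [];
  return {
    // Returns '' for a footnote paragraph (it is collected instead),
    // or the text unchanged for a regular paragraph.
    paragraph(text) {
      const match = definitionRe.exec(text);
      if (!match) return text;
      notes.push({ ref: match[1], text: match[2] });
      return '';
    },
    // Builds a single container holding every collected footnote.
    render() {
      if (notes.length === 0) return '';
      const items = notes
        .map(n => `<li id="marked-fn:${n.ref}">${n.text}</li>`)
        .join('');
      return `<div class="marked-footnotes"><h2>References</h2><ol>${items}</ol></div>`;
    },
  };
}

const collector = createFootnoteCollector();
const paragraphs = [
  'Some footnote[^1] and[^2] here.',
  "[^1]: and here's the footnote paragraph.",
  '[^2]: and even more foot notes.',
];
const bodyHtml = paragraphs.map(p => collector.paragraph(p)).filter(Boolean).join('\n');
const html = bodyHtml + '\n' + collector.render();
console.log(html);
```

The trade-off is that the collector holds state, so a fresh one is needed for each document that is rendered.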
... Worked perfectly for me! Thanks so much, @talkdirty! Here's a slightly updated TypeScript version, in which I added an
import { marked } from 'marked';
const footnoteMatch = /^\[\^([^\]]+)\]:([\s\S]*)$/;
const referenceMatch = /\[\^([^\]]+)\](?!\()/g;
const referencePrefix = 'marked-fnref';
const footnotePrefix = 'marked-fn';
const footnoteTemplate = (ref: string, text: string) => {
return `<li id="${footnotePrefix}:${ref}">${marked.parseInline(
text
)} <a href="#${referencePrefix}:${ref}">↩</a></li>`;
};
const footnoteContainerTemplate = (text: string) => {
return `<div class="marked-footnotes"><h2>Footnotes</h2><ol>${text}</ol></div>`;
};
const referenceTemplate = (ref: string) => {
return `<sup id="${referencePrefix}:${ref}"><a href="#${footnotePrefix}:${ref}">${ref}</a></sup>`;
};
const interpolateReferences = (text: string) => {
return text.replace(referenceMatch, (_, ref) => {
return referenceTemplate(ref);
});
};
const interpolateFootnotes = (text: string) => {
const found = text.match(footnoteMatch);
if (!found) {
return text;
}
const replacedText = text.replace(footnoteMatch, (_, value, text) => {
return footnoteTemplate(value, text);
});
return footnoteContainerTemplate(replacedText);
};
export const footnotes: Partial<Omit<marked.Renderer<false>, 'options'>> = {
paragraph(text) {
return marked.Renderer.prototype.paragraph.apply(null, [interpolateReferences(interpolateFootnotes(text))]);
},
text(text) {
return marked.Renderer.prototype.text.apply(null, [interpolateReferences(interpolateFootnotes(text))]);
},
};
I define my marked extensions in separate files from each other, but of course it was imported and used by marked the usual way:
marked.use({ renderer: footnotes });
Note that I also had to tweak the TypeScript types, since |
Tried the code in #1562 (comment) - but it does not resolve footnotes in unnumbered lists, how to get that? Also if you use a URL in the footnote text it gets broken on a new line (actually, it seems that |
I've made some updates to my codebase since then, and simplified my footnote extension(s). Give the following a shot:
import { marked } from 'marked';
const fnRefRE = /^\[\^([^\]]+)\](?!:)/;
const fnRE = /^\[\^([^\]]+)\]: /;
export const FootnoteRefExtension: marked.RendererExtension | marked.TokenizerExtension = {
name: 'FootnoteRefExtension',
level: 'inline',
start(src) {
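// fnRefRE is anchored, so it can only match at index 0 (and `0 || -1` is -1);
// scan for the opening characters instead to report the next candidate position.
return src.indexOf('[^');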
},
tokenizer(src, tokens) {
const refMatch = fnRefRE.exec(src);
if (!refMatch) {
return;
}
const refToken: marked.Tokens.Generic = {
type: 'FootnoteRefExtension',
raw: refMatch[0],
ref: refMatch[1],
};
return refToken;
},
renderer(token) {
return `<sup><a href="#user-content-fn-${token.ref}" id="user-content-fnref-${token.ref}">${token.ref}</a></sup>`;
},
};
export const Footnotes: Partial<Omit<marked.Renderer<false>, 'options'>> = {
paragraph(text) {
const fnMatch = fnRE.exec(text);
if (!fnMatch) {
return false;
}
let returnString = '<ul><li>';
returnString += text.replace(fnMatch[0], '');
returnString += ` <a href="#user-content-fnref-${fnMatch[1]}" id="user-content-fn-${fnMatch[1]}">↩</a>`;
returnString += '</li></ul>';
return returnString;
},
};
I just confirmed that the following markdown works:
* list[^1] item
* list item
---
### Footnotes
[^1]: A footnote |
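For inline extensions, marked calls `start(src)` to learn where the next candidate token might begin. A small standalone sketch of how an anchored regex and an `indexOf` scan differ for that purpose; `anchoredStart` and `findRefStart` are hypothetical names and no marked API is called here.

```javascript
const anchoredRefRe = /^\[\^([^\]]+)\](?!:)/;

// An anchored regex can only match at index 0, and `0 || -1` evaluates to -1,
// so this variant never reports a usable position.
function anchoredStart(src) {
  return src.match(anchoredRefRe)?.index || -1;
}

// Scanning for the literal opener reports the offset of the next candidate,
// or -1 when the text contains none.
function findRefStart(src) {
  return src.indexOf('[^');
}

console.log(anchoredStart('[^1] first'));    // -1
console.log(anchoredStart('list[^1] item')); // -1
console.log(findRefStart('list[^1] item'));  // 4
console.log(findRefStart('no reference'));   // -1
```

The tokenizer can keep using the anchored regex, since it is only asked to match at the beginning of the remaining source.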
I'd like to add a footnotes reference tag ([^id]). I assume it'd work like the link reference (reflink, in the source code), but I don't really know how to extend the parser to add this custom tag.
How should I proceed to integrate a tag that is defined in two places, one building a reference body list and one replacing link tags with some HTML to point to the right reference?