First line is not parsed as markdown #2139

Nefcanto · 2021-07-22T05:01:18Z

Marked version: 2.1.3

Describe the bug
I'm using marked in my next.js app. I have this code:

const fs = require('fs');
const marked = require("marked");

var content = fs.readFileSync('/sample/markdown/file.md', 'utf8');
content = marked(content);

And this is my file.md:

# Formal sciences

***

## Tools

- [GeoGebra](https://www.geogebra.org/)

And this is my output:

But if I only add one line to the beginning of my file.md, then I get this result:

To Reproduce
Steps to reproduce the behavior:

It works on your demo playground. But it doesn't work in my next.js app. And I only call your marked method passing my file's content. Nothing more.

1. Create a next.js app
2. Add marked
3. Create an html file and serve it.
4. Now change the extension of that HTML file to .md.
5. First line is not respected as markdown.

If we create .md file in the first place, things work as expected. But when we change the extension of an existing HTML file to .md, first line breaks.

Expected behavior
The first line should be respected as markdown too. I can't add an empty line to all of my files.

calculuschild · 2021-07-22T06:05:12Z

Is this only for # headers or for any markdown in the first line?

Nefcanto · 2021-07-22T07:04:33Z

@calculuschild, that's weird: it works for all other markdown syntax, but it doesn't work for #.

UziTech · 2021-07-22T12:40:21Z

If we create .md file in the first place, things work as expected. But when we change the extension of an existing HTML file to .md, first line breaks.

It sounds like something is happening in this conversion.

Can you see what content is before sending it to marked?

var content = fs.readFileSync('/sample/markdown/file.md', 'utf8');
console.log(content);
content = marked(content);

Nefcanto · 2021-07-22T15:31:11Z

Well, I did that.

And I don't see any difference. At least in the console. And since I'm running this inside next.js on the server, it prints to the terminal of course. But content is just the simple content of the file, and it starts with #.

But I used charCodeAt(0) and realized that the original .md files have character 35 at the beginning, while files that have been renamed from .html to .md have character 65279 at the beginning.
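
An invisible leading character like this does not show up in console output, so it can help to print the first few code points instead of the string itself. The following is a minimal sketch; `describeStart` is a hypothetical helper name, not part of marked or Node.

```javascript
// Hypothetical helper: list the first `count` code points of a string as U+XXXX.
function describeStart(text, count = 3) {
  return Array.from(text)
    .slice(0, count)
    .map((ch) => 'U+' + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, '0'));
}

const plain = '# Formal sciences';
const withBom = '\uFEFF# Formal sciences'; // 0xFEFF === 65279

console.log(describeStart(plain));   // [ 'U+0023', 'U+0020', 'U+0046' ]
console.log(describeStart(withBom)); // [ 'U+FEFF', 'U+0023', 'U+0020' ]
```

Both strings look identical when printed, but the second one starts with U+FEFF rather than `#` (U+0023).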

Nefcanto · 2021-07-22T15:37:45Z

I managed to solve the problem by this line of code:

// 65279 === 0xFEFF, the byte order mark (BOM)
if (content.charCodeAt(0) === 0xFEFF) {
    content = content.slice(1);
}

But I think it would be better if marked handled this case as well. Thank you so much for this amazing tool.

calculuschild · 2021-07-22T16:02:43Z

Aha. Yep, that looks like the byte order mark (BOM, U+FEFF) that some text editors insert at the start of a document to signal that it is UTF-8 encoded. It is also known as a zero width no-break space.
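
For context, a UTF-8 BOM is stored as the bytes EF BB BF. The sketch below uses an in-memory buffer in place of a real file (an assumption made to keep it self-contained) and suggests why the character survives `fs.readFileSync(..., 'utf8')`: Node's `'utf8'` decoding of a `Buffer` keeps the BOM as U+FEFF, whereas `TextDecoder` drops it by default.

```javascript
// Bytes of a file that starts with a UTF-8 BOM (EF BB BF) followed by "# Hi".
const bytes = Buffer.from([0xef, 0xbb, 0xbf, 0x23, 0x20, 0x48, 0x69]);

// Buffer decoding keeps the BOM as the first character of the string.
const viaBuffer = bytes.toString('utf8');

// TextDecoder strips a leading BOM unless { ignoreBOM: true } is passed.
const viaTextDecoder = new TextDecoder('utf-8').decode(bytes);

console.log(viaBuffer.charCodeAt(0));      // 65279
console.log(viaTextDecoder.charCodeAt(0)); // 35
```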

We can probably filter it out in the start of the Lexer like we do with some other Unicode characters and replace it with a blank or a simple whitespace.

UziTech · 2021-07-22T18:03:20Z

We can probably filter it out in the start of the Lexer.

I would rather not automatically remove or replace anything more unless there is a reason to believe no one would ever want it (or it complies with the CommonMark spec). It is easier to document that users need to replace it before running it through marked. Otherwise we get people who want to keep it in there and no way for them to do so (example).
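
The pre-processing step suggested here could look like the following sketch. The helper name `stripLeadingZeroWidth` is hypothetical, and the character class (U+200B through U+200F plus U+FEFF) is one possible choice of zero-width characters to strip, not something marked does on its own. The call to marked is left as a comment so the snippet runs without the library installed.

```javascript
// Hypothetical helper: remove a single leading zero-width character
// (U+200B..U+200F or U+FEFF) so the parser sees the real first character.
function stripLeadingZeroWidth(text) {
  return text.replace(/^[\u200B-\u200F\uFEFF]/, '');
}

const raw = '\uFEFF# Formal sciences';
const cleaned = stripLeadingZeroWidth(raw);

console.log(cleaned.charCodeAt(0)); // 35
// const html = marked(cleaned); // as called with marked 2.x in this thread
```

Because the pattern is anchored with `^`, zero-width characters elsewhere in the document are left untouched, which keeps the option open for users who want them preserved.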

matijagaspar · 2022-10-10T14:05:17Z

I tried to make a code PR addressing this, but after going through the code and CommonMark spec, I see @UziTech is right:

Removing these characters outright might cause problems for some users of this library, and changing the regex to ignore them would not comply with the CommonMark spec.
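
To illustrate why the heading is missed, here is a rough sketch with a simplified ATX heading pattern. The regex below is an assumption written for this example and is not marked's actual rule; it only encodes the CommonMark idea that a heading line starts with up to three spaces followed by one to six `#` characters.

```javascript
// Simplified, hypothetical ATX heading check (not marked's real regex):
// up to 3 leading spaces, 1-6 '#', then a space, a tab, or end of line.
const atxHeading = /^ {0,3}#{1,6}(?:[ \t]|$)/;

console.log(atxHeading.test('# Formal sciences'));       // true
console.log(atxHeading.test('\uFEFF# Formal sciences')); // false
```

U+FEFF is not one of the space characters the pattern allows before `#`, so the line is not recognized as a heading.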

UziTech added the need more info label Jul 22, 2021

UziTech removed the need more info label Jul 22, 2021

UziTech added the category: docs Documentation changes label Jul 30, 2021

UziTech added the good first issue Something easy to get started with label Sep 10, 2022

matijagaspar added a commit to matijagaspar/marked that referenced this issue Oct 10, 2022

docs: Add note about zero width unicode characters (markedjs#2139)

bd711a9

matijagaspar mentioned this issue Oct 10, 2022

docs: Add note about zero width unicode characters (#2139) #2605

Merged

3 tasks

matijagaspar added a commit to matijagaspar/marked that referenced this issue Oct 11, 2022

docs: Add note about zero width unicode characters (markedjs#2139)

5cfb6ee

styfle closed this as completed in #2605 Oct 11, 2022

styfle pushed a commit that referenced this issue Oct 11, 2022

docs: Add note about zero width unicode characters (#2139) (#2605)

530b9ae
