First line is not parsed as markdown #2139

Closed
Nefcanto opened this issue Jul 22, 2021 · 8 comments · Fixed by #2605
Labels
category: docs Documentation changes good first issue Something easy to get started with

Comments

@Nefcanto

Nefcanto commented Jul 22, 2021

Marked version: 2.1.3

Describe the bug
I'm using marked in my next.js app. I have this code:

const fs = require('fs');
const marked = require("marked");

var content = fs.readFileSync('/sample/markdown/file.md', 'utf8');
content = marked(content);

And this is my file.md:

# Formal sciences

***

## Tools

- [GeoGebra](https://www.geogebra.org/)

And this is my output:

[Screenshot: imageedit_1_3630574482]

But if I only add one line to the beginning of my file.md, then I get this result:

[Screenshot: imageedit_3_9689168460]

To Reproduce
Steps to reproduce the behavior:

It works on your demo playground. But it doesn't work in my next.js app. And I only call your marked method passing my file's content. Nothing more.

  • Create a next.js app
  • Add marked
  • Create an HTML file and serve it.
  • Change the extension of that HTML file to .md.
  • The first line is not parsed as Markdown.

If we create .md file in the first place, things work as expected. But when we change the extension of an existing HTML file to .md, first line breaks.

Expected behavior
The first line should be respected as markdown too. I can't add an empty line to all of my files.

@calculuschild
Contributor

Is this only for # headers or for any markdown in the first line?

@Nefcanto
Author

@calculuschild, that's weird: it works for all other Markdown syntax, but not for #.

@UziTech
Member

UziTech commented Jul 22, 2021

If we create .md file in the first place, things work as expected. But when we change the extension of an existing HTML file to .md, first line breaks.

It sounds like something is happening in this conversion.

Can you see what content is before sending it to marked?

var content = fs.readFileSync('/sample/markdown/file.md', 'utf8');
console.log(content);
content = marked(content);

@Nefcanto
Author

Well, I did that.

And I don't see any difference. At least in the console. And since I'm running this inside next.js on the server, it prints to the terminal of course. But content is just the simple content of the file, and it starts with #.

But I used charCodeAt(0) and realized that the original .md files have character 35 at the beginning, while files that have been renamed from .html to .md have character 65279 at the beginning.
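
Since console.log does not show the difference, one way to make it visible is to print the code points of the first few characters. This is only a sketch: leadingCodePoints is a hypothetical helper, not part of marked or Node.js, and the sample strings stand in for the file contents.

```javascript
// Hypothetical helper: list the code points of the first few characters
// so that invisible ones (such as character 65279, U+FEFF) become visible.
function leadingCodePoints(text, count = 3) {
  return Array.from(text)
    .slice(0, count)
    .map((ch) => 'U+' + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, '0'));
}

// Stand-ins for a file created as .md and one renamed from .html.
const createdAsMd = '# Formal sciences';
const renamedFromHtml = '\uFEFF# Formal sciences';

console.log(leadingCodePoints(createdAsMd));     // [ 'U+0023', 'U+0020', 'U+0046' ]
console.log(leadingCodePoints(renamedFromHtml)); // [ 'U+FEFF', 'U+0023', 'U+0020' ]
```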

@Nefcanto
Author

I managed to solve the problem by this line of code:

if (content.charCodeAt(0) === 65279) {
    content = content.slice(1);
}

But I think it would be better if marked also handled this. Thank you so much for this amazing tool.

@calculuschild
Contributor

Aha. Yep, that looks like the byte order mark (BOM), U+FEFF (decimal 65279), which some text editors insert at the start of a document to signal that it is UTF-8 encoded. The same code point is also known as a zero width no-break space.
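
As a rough illustration of why the heading stops matching: in UTF-8 the BOM is the byte sequence EF BB BF, and decoding those bytes to a string in Node.js yields U+FEFF as the first character, so the text no longer starts with #. The buffer below is made up for the example and stands in for a file saved with a BOM.

```javascript
// Bytes of a hypothetical file saved with a UTF-8 BOM: EF BB BF, then "# Title".
const bytes = Buffer.concat([
  Buffer.from([0xef, 0xbb, 0xbf]),
  Buffer.from('# Title', 'utf8'),
]);

// Decoding to a string keeps the BOM as the first character.
const decoded = bytes.toString('utf8');
const firstCode = decoded.charCodeAt(0);
const startsWithHash = decoded.startsWith('#');

console.log(firstCode, startsWithHash); // 65279 false
```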

We can probably filter it out in the start of the Lexer like we do with some other Unicode characters and replace it with a blank or a simple whitespace.

@UziTech
Member

UziTech commented Jul 22, 2021

We can probably filter it out in the start of the Lexer.

I would rather not automatically remove or replace anything more unless there is a reason to believe no one would ever want it (or it complies with the CommonMark spec). It is easier to document that users need to replace it before running it through marked. Otherwise we get people who want to keep it in there and no way for them to do so (example).
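
A minimal sketch of what replacing it before running the text through marked could look like on the caller's side. stripBom is a hypothetical helper name, not part of marked; the marked call is left as a comment and uses the 2.x call style from the original report.

```javascript
// Hypothetical helper: drop a single leading U+FEFF and leave everything else alone.
function stripBom(text) {
  return text.charCodeAt(0) === 0xfeff ? text.slice(1) : text;
}

// Stand-in for content read from a file that was saved with a BOM.
const raw = '\uFEFF# Formal sciences\n';
const prepared = stripBom(raw);

// const html = marked(prepared); // assumed usage, as in the report above
console.log(JSON.stringify(prepared)); // "# Formal sciences\n"
```

Keeping the step outside marked leaves the choice with the caller, so anyone who wants to keep the character can simply skip it.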

@UziTech UziTech added the category: docs Documentation changes label Jul 30, 2021
@UziTech UziTech added the good first issue Something easy to get started with label Sep 10, 2022
matijagaspar added a commit to matijagaspar/marked that referenced this issue Oct 10, 2022
@matijagaspar
Contributor

matijagaspar commented Oct 10, 2022

I tried to make a code PR addressing this, but after going through the code and CommonMark spec, I see @UziTech is right:

Outright removing these characters might in some cases cause problems for someone using this library, and changing the regex to ignore them would not comply with the CommonMark spec.
