Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bug: Watch mode only works once #84

Closed
MangelMaxime opened this issue Feb 18, 2021 · 16 comments
Closed

bug: Watch mode only works once #84

MangelMaxime opened this issue Feb 18, 2021 · 16 comments
Assignees
Labels

Comments

@MangelMaxime
Copy link

Context:

  • version (md-to-pdf -v): 3.2.1
  • platform (Unix, macOS, Windows): Windows
  • node version: v10.18.0

Describe the bug:

The watch mode only works the first time after it generates an empty PDF.

A clear and concise description of what the bug is. Feel free to include screenshots.

  1. Create a documentation.md file
  2. Run md2pdf documentation.md --watch
  3. Open the PDF and see that's it is generated correctly ✔️
  4. Make a change to documentation.md
  5. Open the PDF file and see that's it is generated correctly ✔️
  6. Make a change to documentation.md
  7. Open the PDF file and see that's the file empty now ❌
  8. You can repeat step 6 and 7 several time the file will always be empty now ❌

demo_bug_watch_md_to_pdf

The problem occurs also when adding --as-html.

@simonhaenisch
Copy link
Owner

Thanks, from what I can see watch mode works fine though. If you close and re-open the PDF, it should show the correct content (at least it does for me).

I'm on MacOS so I'm using Preview to view the PDF, and I have the same problem that it doesn't refresh anymore after the first change. Then I close the preview and re-open it, and then the refresh works again but only once.

I'll look into whether I'll have to close the file after each write. Assuming it has something to do with that... probably without closing it, it won't update the created-at/updated-at timestamp of the file until the process ends.

@simonhaenisch
Copy link
Owner

I've tried a few things and I think it's really an issue of the software you use to view the PDF. I've looked into Node's fs API and the call I use (writeFile from fs/promises) opens the file, (over-)writes it then closes it, so I don't think the issue is there. When I run md-to-pdf as a separate process after each change, I still have the same issue that Preview only refreshes after the first change, even though the file is updated correctly.

So I tried a different app (Skim) that supports auto-reload on file changes to preview the app, and that does reload after every change.

Kapture 2021-02-18 at 11 27 05

Not sure what PDF viewer there is for Windows that supports this as well but I'm sure you'll find one (:

@MangelMaxime
Copy link
Author

Hello @simonhaenisch,

I'm on MacOS so I'm using Preview to view the PDF, and I have the same problem that it doesn't refresh anymore after the first change. Then I close the preview and re-open it, and then the refresh works again but only once.

In the GIF, it is hard/impossible to see but I can confirm that even when the PDF is generated/shown empty Sumatra does trigger a refresh (there is a quick flickering probably removed by the GIF converter because too quick...).

For me the file is "really" not generated/written correctly. If I reopen, it is still empty.

When I run md-to-pdf as a separate process after each change, I still have the same issue that Preview only refreshes after the first change, even though the file is updated correctly.

For me, after each change the viewer is refreshing correctly the file when doing it manually.

demo_bug_watch_md_to_pdf_manual_process_check

Not sure what PDF viewer there is for Windows that supports this as well but I'm sure you'll find one (:

I am using SumatraPDF which support hot reload. At least, it was the first result when I did a google search on a PDF reader that doesn't block the *.pdf file when reading it.

An easier way to visualise the problem, is by using the html output as we can directly read the HTML file as it is just a text file.

demo_bug_watch_md_to_pdf_html_output

Note: When I am not recording the watch mode works always once only. When recording it seems like it was more random perhaps due to my computer recording my screen (so being slower?)

After, the GIF I tried 5 more times without the output being correct. I removed thuse try from the GIF as it was already long enough and didn't bring value.

@simonhaenisch simonhaenisch reopened this Feb 18, 2021
@simonhaenisch
Copy link
Owner

Ok thanks for the example with the html output, that's very strange. I'll investigate and see whether I can reproduce it.

@simonhaenisch
Copy link
Owner

Hm I just tried this a lot of times with HTML output and I can't reproduce it. You're saying that it happens more often when your pc is faster, so I've checked the source for race conditions but I can't see anything that looks wrong 🤷🏻‍♂️

I can't really do anything at this point, sorry. If you're keen to jump into the source and add some console.logs to see where it goes wrong, that would be very helpful. To get started, you just need to clone this repo and run npm link, then npm start. Then it automatically rebuilds every time you change a source file. You'll just have to restart the md-to-pdf process so it picks up the new files.

@MangelMaxime
Copy link
Author

I can't really do anything at this point, sorry.

No problem, I understand that it is hard to fix a bug you can't reproduce.

If you're keen to jump into the source and add some console.logs to see where it goes wrong, that would be very helpful.

Ahah, I was actually trying to reproduce it on my computer at home and toying with the code here are my discovery so far.

Reproducing the problem on my computer was really reliable with my reproduction step from above.

One way, I found to reproduce it was by triggering "several" changes in a short period of time. Basically, smashing Ctrl+S ~10 times instead of once for saving the file.

I am using VSCode and it seems like even if the file content didn't change it trigger the change event of chokidar.


I added some logs to show the end of the entry file and the end of the generated content.

I also added a custom check to make the process crash when the generated content was not the one expected.

My test markdown file looks this (space on the first and last line):

# Test
## Test2

<!-- TOC -->

- [Test](#test)
    - [Test2](#test2)

<!-- /TOC -->

Some content

Diff

image

Here is a case where the error occurred:

image

As you can see, it seems like it didn't generate the expected output because the entry content was empty. However, like I said before I was just pressing Ctrl+S quickly on VSCode so the file content should always be the same.

After that discovery I wanted to check the file information to see if VSCode was recreating a new file each time or something.

So I made this changes, to get back the stats object:

image

However, with that extra code it was much harder to make the program crash.

This leads me to this question:

Is it possible that, it is reading the file content too quickly when the change event occur?


Out of curiosity, I also decided to see what happened if I was doing the same thing using Notepad (the basic text editor included on Windows).

Well... even by spamming or keeping Ctrl+S pressed I wasn't able to make the program even without the stat code... This seems to leads us to the fact that VSCode or is doing something strange when writing the file.


Tomorrow, I will try to redo my tests on my computer at work to see the behaviour. See if using Notepoad works all the time on it and/or if adding a delay before the changes and reading of the file changes something.

Sorry for the really long comment, I hope that I was more or less clear in my explanations (a lot was going on) 😅

@simonhaenisch
Copy link
Owner

Thanks for all the details.

I actually tried saving the file twice so that two "generating" actions would race with each other and it was still working as expected for me (two reloads, both had the same content).

I didn't think about how VS Code writes the file, so that's some good discoveries there. It's possible that this is platform-specific. I did some searching for "VS Code atomic saves" and found this old node-sass issue, specifically this comment: sass/node-sass#1894 (comment).

I think it's pretty much the same issue: chokidar already picks up the file change before VS Code has finished closing it, so when md-to-pdf reads it it's still empty.

It's hard to tell who is to blame: VS Code could use atomic saves, then this wouldn't happen; chokidar could report the file change only after the file has been closed; md-to-pdf could check whether the file has been closed first.

Since the latter is the only thing I can do about it, i'll look into doing that.

@MangelMaxime
Copy link
Author

Hello @simonhaenisch,

I just tested on my laptop at work and when using Notepad instead of VSCode, I am not able to make the process crash using the process.exit adaptation.

It seems to confirm that this is a problem with how VSCode saves the files as you discovered.

I tried to see if the snippet shared in the discussion you linked sass/node-sass#1894 (comment) could improve the situation.

Using a setTimeout(..., 100) and it seems to improve the situations.

			if (args['--watch']) {
				console.log(chalk.bgBlue('\n watching for changes \n'));
-				new Listr([getListrTask(file)], { exitOnError: false }).run().catch(console.error),
+				watch(files).on('change', async (file) => {
+						setTimeout(() => {
+							new Listr([getListrTask(file)], { exitOnError: false }).run().catch(console.error)
+						}, 100)
+					}
+				);
			} else {
				server.close();

It is still not perfect because I can still make the process crash if I spam Ctrl+S a lot but that's is not a standard usage ^^. Also, if people use markdown extension in their IDE it's possible that it will trigger several generation because the IDE saves the files and then the extension re-save it update the TOC for example.

If I test it in a standard workflow using setTimeout seems to make the generation work all the times. And if I have a generation failing re-saving the file seems to fix it on the next generation. That's a huge improvement because before, when the generation had failed once (on my work computer) I had to kill the process and re-start it again.

md-to-pdf could check whether the file has been closed first.

Since the latter is the only thing I can do about it, i'll look into doing that.

From what I understand, it seems like this solution should be more reliable than the setTimeout but I wanted to let you know of a potential mitigation fix. If you need me to test an implementation before releasing a new package version fell free to ping me and I will build md-to-pdf locally on my computer to test it.

For now, I will use my fork with setTimeout to test it more than 30 sec and while working on a real documentation in my work codebase.

@simonhaenisch
Copy link
Owner

Ok thanks a lot for all the collaboration, I'll look into this now!

@simonhaenisch simonhaenisch self-assigned this Feb 19, 2021
@simonhaenisch
Copy link
Owner

simonhaenisch commented Feb 19, 2021

@MangelMaxime I've created a branch wait-for-file, do you think you could test it out?

Basically what it's doing is it's trying to open the file for writing first. If the file still has an open handle from another process, then this should throw an EBUSY error. Then we just keep re-trying until opening the file succeeds, and then we close the file and keep going.

@MangelMaxime
Copy link
Author

@simonhaenisch This looks really promising but... yes there is a but 😅

If I am using VSCode without any markdown extension, your solution seems to works 95% of the time when using it in a standard workflow (not spamming Ctrl+S).

To make it a bit more consistent I would increase the setTimeout delay to 25 perhaps. It should stills feel responsive.

Even, if the file can be generated "empty" if user did 2 quick saves by mistakes, re-saving generates the file correctly.

Here comes the but:

If I am using VSCode with an extension which for example update the Table of Content or something like that it will trigger 2 really quick saves all the times.

  1. VSCode save the file as the user wrote it
  2. The extension detect the change, read the file and re-save it with the updated content.

In this scenario, sometimes the generated file is ok sometimes not; lets say it is a 50/50 chances.

I think that what is happening is:

  1. A change is detected
  2. md-to-pdf checks if the file still have an open handle
  3. It detects that it's ok
  4. md-to-pdf decide to read the file content but at the same time another process decided to use the file. bad luck...

I am using the AlanWalk.markdown-toc extensions which is using onDidSaveTextDocument API instead of onWillSaveTextDocument. And I think that onWillSaveTextDocument allow the extension to modify the file before saving it on the disk. (Needs to check)

So there is probably a way to fix this extension in particular.


So for me we are at this point right now:

We consider that the an extension (from any editor) should always intercept the save command and modify the file content to have a single change trigger. In this case, your current fix seems to be working with just my proposition of extending the delay a bit more.

We consider that the problem I have with the extension can happen for another extensions or editor because it doesn't have the API to intercept a save command and then we want to have a solution for that too.

In that case, I think adding a "Debouncer" should do the trick as it would aggregate really quick changes into a single change. For example, Webpack, Nodemon, does that and expose the delay parameter so the user can customize it depending on its needs.


As a side note, yesterday was the first time I was able to use md-to-pdf with the watch mode using my hack and well it was a really pleasant experience. Thank you for your great work 👏.

@simonhaenisch
Copy link
Owner

simonhaenisch commented Feb 20, 2021

Yeah I agree that ideally extensions would not trigger additional file writes but hook into the save like a post-processor instead. I also want to make it easier to hook extensions into md-to-pdf via the config so that toc generation could be done that way (marked already allows to do that afaik, so it's just a matter of exposing it).

Anyway, I checked the chokidar options and there's no debounce option, however I found something in the docs called awaitWriteFinish (https://github.com/paulmillr/chokidar#performance), which is pretty much exactly what we need. Can you give it a try, by changing

watch(files).on('change', async (file) =>

to

watch(files, { awaitWriteFinish: { stabilityThreshold: 500, pollInterval: 100 } }).on('change', async (file) =>

(feel free to play with the values but I think the default threshold of 2000 is a bit high).

TBH I might just expose the whole chokidar.watch options, i. e. add a new watchOptions config, because the docs say that this is OS and hardware specific. Unless this doesn't work, then I can instead wrap the watch in a configurable debounce.

@MangelMaxime
Copy link
Author

I just tested using awaitWriteFinish and it seems to be what we need.

Here is the code I used, which removed the custom waitForFile.

if (args['--watch']) {
	console.log(chalk.bgBlue('\n watching for changes \n'));

	// const waitForFile = async (path: string) => {
	// 	let handle: fs.FileHandle | void;

	// 	// eslint-disable-next-line no-await-in-loop
	// 	while (!(handle = await fs.open(path, 'r+').catch(throwIfNotBusy))) {
	// 		await new Promise((resolve) => setTimeout(resolve, 5)); // eslint-disable-line no-await-in-loop
	// 	}

	// 	await handle.close();
	// };

	// const throwIfNotBusy = (error: any) => {
	// 	if (error.code !== 'EBUSY') {
	// 		throw error;
	// 	}
	// };

	watch(files, { awaitWriteFinish: { stabilityThreshold: 500, pollInterval: 100 } }).on('change', async (file) => {
		// await waitForFile(file);

		return new Listr([getListrTask(file)], { exitOnError: false }).run().catch(console.error);
	});
} else {
	server.close();
}
})

With this new option if I spam Ctrl+S it did trigger only one change (depending on the stabilityThreshold value) because the file was not being stable long enough.

TBH I might just expose the whole chokidar.watch options, i. e. add a new watchOptions config, because the docs say that this is OS and hardware specific.

Exposing the options makes sense to me.

Indeed 2000 is too high it doesn't feel responsive.

I would probably set the value to 250 to try having it work on "most" computers. On mine 100 seems to be ok but 250 would still be enough to feel responsive from the user point of view I think.

It seems like we have a stable solution to the different problems 👍

I also want to make it easier to hook extensions into md-to-pdf via the config so that toc generation could be done that way (marked already allows to do that afaik, so it's just a matter of exposing it).

Yep it is already kind of possible to customise marked used by md-to-pdf. But you have to wire everything you want in it.

const marked = require("marked");
const hjs = require('highlight.js');

const renderer = new marked.Renderer();

renderer.code = (code, language) => {

    if (language === "mermaid") {
        return '<div class="mermaid">' + code + '</div>';
    } else {

        // if the given language is not available in highlight.js, fall back to plaintext
        const languageName = language && hjs.getLanguage(language) ? language : 'plaintext';

        return `<pre><code class="hljs ${languageName}">${hjs.highlight(languageName, code).value}</code></pre>`;
    }

};

module.exports = {
    marked_options: {
        headerIds: false,
        smartypants: true,
        renderer: renderer
    },

Also, the benefit of using an extension in my case is that the TOC is written in the markdown file so I can read the TOC from it directly and click on the link while editing my file. If I was using marked the TOC would not exist in the markdown file only in the output.

Both have pros and cons and I suppose it depends on the usage made or habits.

@simonhaenisch
Copy link
Owner

I would probably set the value to 250 to try having it work on "most" computers.

I don't think I'll provide a default value here because I can't even reproduce the issue at all on macOS so it's probably not even necessary for a good chunk of the users. But thanks for confirming that this fixes it, i'll see that I make a commit and release this soon.

@simonhaenisch
Copy link
Owner

Just released 3.3.0 which has the new --watch-options flag or watch_options config option.

@MangelMaxime
Copy link
Author

Thank you @simonhaenisch for your help on this problem.

I will do the upgrade tomorrow at work

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants