Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Triggering update from beforeUpdate #143

Closed
fserb opened this issue Nov 8, 2021 · 14 comments
Closed

Triggering update from beforeUpdate #143

fserb opened this issue Nov 8, 2021 · 14 comments

Comments

@fserb
Copy link
Contributor

fserb commented Nov 8, 2021

I have some js that I bundle with esbuild. Writing a plugin for it, I ended up with something like:

const built = new Set();
site.loadAssets([".js"], async filename => {
  built.add(path.relative(site.src(), filename));
  // call esbuild on file
  return {content: esbuildOutput.content };
});

I make sure that all the dependencies are in an ignored dir, and everything works fine.

Now I want to update the main bundle when one of the deps change. I did something like:

site.addEventListener("beforeUpdate", async ev => {
  for (const filename of ev.files) {
    if (!filename.endsWith(".js")) continue;
    if (built.has(filename)) continue;

    await site.update(built);
    return;
  }
});

This "works", except that this triggers the whole update cycle twice (including all afterUpdate events). If instead of doing this, I simply add the files to ev.files, it doesn't work because the beforeUpdate of core/source doesn't get called on the new files.

So now I'm stuck: there's no way to restart or cancel the current update, and no way to ask source to clean the cache just for this file. What would be a good idea here?

@oscarotero
Copy link
Member

Ok, some thoughts:

  • The dispatchEvent function stops calling the listeners if one of them returns false (You can see here: https://github.com/lumeland/lume/blob/master/core/site.ts#L117). This was inspired by DOM events, that can prevent the default behavior by returning false(for example click or keydown events). Perhaps Lume could do the same and stops the current build/update when a listener returns false.
  • In your example, the bundle is executed in the loader. I think it's better to divide the load and bundle in two different steps:
site.loadAssets([".js"]); //Load the js files

site.process([".js"], (file) => {
    // Bundle the code
    file.content = esbuildOutput.content;
});

@oscarotero
Copy link
Member

oscarotero commented Nov 8, 2021

BTW, is there any specific reason to use a custom bundler? Have you try the bundler plugin?

@fserb
Copy link
Contributor Author

fserb commented Nov 8, 2021

Yeah. The Deno blunder is not even close to good enough for this (to be fair, it's not designed for this). I need inline source map, minification (and keeping top names, because of WebComponents), tree-shaking, backward compatibility js version support, etc.

I'll try Asset/process version, I think I tried it before and failed, but I'll report back.

@fserb
Copy link
Contributor Author

fserb commented Nov 8, 2021

Ahn, I figured out the problem.

esbuild's Deno version is a pain to work with, because it has a "server" even when you explicit asks it not to have one. And the server MUST be stopped, otherwise it hangs.

Because process is run in parallel, there's no right place to stop it. I ended up with:

site.addEventListener("beforeSave", ev => {
  esbuild.stop();
});

and that solves the problem and it seems to correctly update the js bundles when any other js file changes.

thanks for the help.

@fserb fserb closed this as completed Nov 8, 2021
@fserb
Copy link
Contributor Author

fserb commented Nov 8, 2021

Well, technically now the bundles are re-running every time anything changes, it seems.

@oscarotero
Copy link
Member

oscarotero commented Nov 8, 2021

Yes, this is the expected behavior.
The only difference between build and update is that update only reload the changed files and save the output files if they changed the content (using a hash as you can see here) but the process in the middle (preprocessor, render and processor) are executed to all pages.
This is because you never know how a change in a file can affect to other files.

@oscarotero
Copy link
Member

Hi again, @fserb
I just found your repo (https://github.com/fserb/bkc) and created a esbuild plugin for Lume using your code. You can take a look here. https://github.com/lumeland/experimental-plugins

@fserb
Copy link
Contributor Author

fserb commented Nov 12, 2021

Oh cool. :)

Yeah, there are a few issues with the current plugin that we may want to solve:

  1. as mentioned before, it would probably be better if we rebuild only if some set of extensions changed, as opposed to any file.
  2. it may be useful to consider two use cases:
    a. when we want to esbuild every js/ts files in the tree and we have a clear way to define dependencies (like node_modules or something).
    b. when we have a well defined set of js/ts files as entry points and everything else is a dependency.

What I currently do for b is: I have a jslib directory that I site.ignore() and leave all dependencies there.
It's kinda okey, but I wonder if there could be a way inside the plugin to handle those cases (like passing lists of matching globs either for entry points or dependencies).

I'm not sure if the current way Lume handles assets processing supports that. Do you think this is a reasonable request? If so, how could we do it?

I could try to do it over the weekend, but I'm not sure even if there's a way to solve either of those things at this point.

nit: we also probably want to remove incremental: true from the user options. But I think the overall way to deal with the args (and overwritting the build specific ones) was a great way to deal with it.

@oscarotero
Copy link
Member

Ok, I understand.

The reason to rebuild the entire site after any change, instead of only some pages, is because it's very complex to know in advance the pages that will be affected by any file change. For example:

  • A change in a css file can affects to all files that imports this file.
  • The affected files can be .css files, but also other extensions that generate css files (for example, a njk file that outputs a css file).
  • And if you use the inline plugin, that inlines assets like css, js, images, etc in the html documents, the change in the css file can affect to other html pages.

The only way I can think to have this is configuring directly by the user. For example, with an option like site.updateContext([".js", ".ts"]) (or a better name), indicating that the ".js" and ".ts" extensions are independent of the other pages, meaning that:

  • If a ".js" and/or ".ts" file changes, all ".js" and ".ts" files are regenerated but the other pages aren't.
  • If any other extensions changes (like "njk" files), all pages of the site will be regenerated except ".js" and ".ts" files.

But this option should be configured manually by the user because, as said, it depends on many things like relations between pages, plugins installed, etc.


And about the second issue, there's a similar request here #53. But I'd like to avoid globs because it can affects to performance.
Perhaps, this comment #53 (comment) is a good solution: allow not only strings, but also functions that returns true/false, so you can do something like this:

site.ignore((filename) => isIgnored(filename));

This could be a good solution for complex ignored conditions. Anyway, in your case I think the best solution is just ignore the folder (or rename the folder prefixing it with _).

Btw, can I add your site to the pages of examples? (https://lumeland.github.io/getting-started/examples/)

@oscarotero
Copy link
Member

@fserb I just pushed some changes in this direction. The idea is the following:

You can configure scoped extensions that will be treated independently of the rest of the pages. For example:

site.scopedExtensions([".css"], [".js", ".ts"]);

Here we are defining two scopes (one for .css files, and other for .js and .ts files).
This means that if a .css file changes, only the .css files will be rebuild, but not the other pages. And if a .js or .ts file changes, only .js and .ts will be updated.
If any other page changes (ex: a .njk file), all the "unscoped" pages will be rebuild, but scoped extensions (.css, .js and .ts) won't.

To manage this, there's a new Changes class that you can see here: https://github.com/lumeland/lume/blob/master/core/changes.ts. This class generates a function that will be used to filter the pages that will be updated.

I not sure if this explanation is clear enough, so let me know your thoughts. I'm also open to change the function name if there's a better name.

@fserb
Copy link
Contributor Author

fserb commented Nov 13, 2021

This makes sense to me. And it does address the problem.

I wonder if it would make more sense to make those scopes more generic. The reason I say this is that the two obvious dependency connection for any file is "my directory or any subdirectory" and "same extension".

I know you've been trying to avoid globs for perf reasons on copy. But maybe in the case of changes is a more contained problem, as we only have to iterate over the changes, not all files?

Overall, I keep wondering if extensions is the right abstraction across the board on Lume, or if things would be cleaner and more powerful if everything was a regexp (and extensions and globs are just subset of Regexps).

Re: adding the file as an example, sure. I haven't made the site officially public yet, but go ahead. :)

@oscarotero
Copy link
Member

oscarotero commented Nov 14, 2021

Ok, fair enough.
I've renamed the function to scopedUpdates and now it accepts functions instead of extensions. This allows to use glob or any other method to validate a file path:

site.scopedUpdates(
    (path) => path.endsWith(".css"), // A scope with all css files
    (path) => path.startsWith("/js/"), // A scope with all files inside "js" folder
)

And about the use of extensions instead of regexp, I have thought several times in a way to allow to define functions, but it's more complicated that it seems. The truth is that extensions are a very flexible and understandable format because it define which files must be loaded and how they will be processed:

  • Lume needs to know the extension to build the url of a page. For example:
    • /about-me.md is converted to /about-me/ url, removing the .md suffix
    • /styles.css.njk is converted to /styles.css url, removing the .njk suffix
    • /styles.css.tmpl.js is converted to /styles.css url, removing the .tmpl.js suffix
  • It also needs to know the extension to know how to handle every file (md, njk, jsx, etc)

A regexp only returns true or false (well, it can return captured values too), but it's more tricky to work with it to define loaders because we need to know the suffix of the path that we have to remove and sometimes it's not too easy. For example .tmpl.js suffix is used to generate pages using javascript templates and uses the module loader, which is different to .js that is the format of client side javascript files that are loaded as text files.

In short, we need the extensions, even if we use regexp, glob or any other advanced way to load pages. And having the two things (extensions and glob/functions/etc) would make Lume more complicated and confusing (in addition to possible performance issues). I'm not entirely closed to implement that, but I have to think in the best way to do it so they can coexist harmoniously.

@fserb
Copy link
Contributor Author

fserb commented Nov 14, 2021

+1 for scopedUpdates. It feels much better.

For the generic conversation. I hope I'm not sounding pushy, I really don't mean to. I was just using the opportunity to think about this a bit.

I agree that extensions make the conversion from src to dest obvious, but the way to solve this generically would be to define a new Format type, that has a tester (that tells if a path belongs to it or not) and a transform to the next stage. So a current extension would be:

const formats = {
  ".tmpl.js": { 
      match: path => path.endsWith('.tmpl.js'), 
      transform: path => path.slice(0, -8) 
  },
};

and we could have:

site.addExtensionFormat('.png');
site.addFormat("my format", path => path.startsWith('pages/', path => path.slice(6));
site.preprocess(['my format', '.png'], page => ... );

This takes care of the ergonomics problem, as we can use the keys only as identifiers (basically we could do this change without breaking any config files). It still leaves open a priority issue (when more than one format matches). I can think of two solutions here: add an explicit priority number to each format, or have a formatPriority list that lists keys in order. I slightly like the list more, but they are both fine.

I'm not 100% sure I understood the other point.

One thing that could be complicated is recursion. Because maybe treating local files and generated files the same is not good enough and people may need to match only one of the two. I'd see two solutions for this: either create a special location for generated files (files are matched as /src/<path> for real files, and/or /dest/<path> for files that were the output of a transformation) or have a separate way to decide if you want only 'original files' or something. Either has its small drawbacks, but they are both relatively simple. Or it may not be an issue at all.

@oscarotero
Copy link
Member

oscarotero commented Nov 14, 2021

I love to hear good ideas for Lume, so don't worry :)

Your idea of the Format type sounds great, but I'm not sure if it covers all use cases. For example:

  • A format has a loader assigned, a function responsive to read the file content. You can see the different loaders here.
  • There are two types of files: pages and data. The same format can have different loaders depending on the type. For example:
    • .js files can be loaded as text files, because they aren't executed, but bundled, minified, etc.
    • But javascript data files (like _data.js or _data/site.js) are loaded as javascript modules, because they need to be executed at the build time.
  • Formats are used also to detect the template engine used to render the pages. For example:
    • The page about-me.njk uses the Nunjucks template engine.
    • A page with the variable layout: main.pug uses the Pug template engine to render the layout file in _includes.
    • A page with the variable templateEngine: njk,md will be rendered twice, first using Nunjucks and then markdown see this
  • Processors and preprocessors use also the extensions to determine if a file must be processed before/after rendering. Both extensions (src and dest) are used. Let's say we register a processor to minify .css files:
    • A sass file (.scss) that ouputs .css is processed, because the extension is present in the dest file.
    • A .css file is processed too because the extension is present in the src file.

File extensions can cover all these casuistry with a single api (just prefix the format at the end of the file name) and it works in both directions (you can register anything (loader, processor, etc) for all files of the same extension; and you can assign a extension to a individual file). In addition to that, you don't need to see the code in the _config.js file to find out how a file will be loaded and processed, because it's clear in the file extension that, for example, a .md file is a markdown file that will be loaded and render as markdown.

Your proposal could work as a low level API to handle this but I see some limitations, for example how to use my format (in your example) as a format for layouts stored in _includes? or for processors/preprocessors? The format priority to resolve conflicts sounds a bit hacky...

Anyway, I'll keep thinking about this and open for suggestions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants