Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Figure out a consistent globbing strategy #2374

Closed
Rich-Harris opened this issue Sep 5, 2021 · 5 comments · Fixed by #2430
Closed

Figure out a consistent globbing strategy #2374

Rich-Harris opened this issue Sep 5, 2021 · 5 comments · Fixed by #2430
Milestone

Comments

@Rich-Harris
Copy link
Member

Describe the problem

In #2247, we switched to using micromatch (from globrex) to determine which files are included/excluded when creating packages. But we're still using globrex via tiny-glob here:

exclusions.forEach((exclusion) => {
excluded_paths = [
...excluded_paths,
...glob(exclusion, {
cwd: assets_dir,
dot: true
})
];
});

As a result, our inclusion/exclusion logic is inconsistent, and we're bundling two libraries with overlapping functionality.

Describe the proposed solution

Personally I'd love to not use micromatch. It's a very big library for constructing regexes. If we do need to provide the extra flexibility, then should we take a leaf out of uvu's book and just accept regexes in the first place?

export default {
  kit: {
    package: {
      exports: {
        exclude: /^internal/
      }
    }
  }
};

Alternatives considered

Instead of regexes, a function?

export default {
  kit: {
    package: {
      exports: file => !file.startsWith('internal')
    }
  }
};

Importance

nice to have

Additional Information

No response

@rmunn
Copy link
Contributor

rmunn commented Sep 6, 2021

A good general rule is that simple things should be simple, while complicated things should still be possible. Any time someone wants to write a list of included or excluded paths, they're going to expect globbing to be possible: '/app/**/*.js' is going to be natural to write. But for people with more complicated scenarios, there might be times when Bash globs aren't quite enough, so I also like the idea of regexes and functions being available for the use cases that need them.

So my initial thought is: why not all three? Allow strings, regexes and functions, as follows:

  • A string is taken to be a glob and passed through micromatch to turn into a regex
  • A regex is used as-is to match the paths
  • A function is treated as a predicate, taking one argument and returning a Boolean (or rather, something truthy or falsy)

It should be possible to bake that logic into a simple function, so that consistency can be maintained by always calling the same function for calculating inclusions and exclusions.

P.S. I don't think it's going to be possible to get away from using some globbing library, because a) people will expect to be able to write exclude: [ 'test/**/*.spec.js' ], and b) there are plenty of corner cases in globbing rules, so it's better to use an existing library that's already gotten it right rather than reinvent the wheel and probably get a lot of corner cases subtly wrong.

@Rich-Harris
Copy link
Member Author

I'm very sceptical of APIs that can take input in multiple forms. They're much more confusing for users, and end up creating overhead that has a habit of leaking into other areas of the project (a combinatorial explosion of tests, types, documentation...) beyond the code that normalises the globs and regexes into functions.

When in doubt, a function is probably best, because it covers everything — if someone wants complex globbing patterns, they can choose to use micromatch without all users needing to bear the cost of it:

import mm from 'micromatch';

export default {
  kit: {
    package: {
      exports: mm.matcher('!internal/**')
    }
  }
};

@benmccann benmccann added this to the 1.0 milestone Sep 9, 2021
@jiv-e
Copy link

jiv-e commented Sep 10, 2021

I'm just jumping in to contribute for the first time. This is the first issue on the list, so why not!

I'm not sure if I really understand the practical context perfectly, but I can comment on the general issue.

I think that injecting a filter function from configuration to the list_files function is a great idea. Gives all the power to the user. It's simple and easy to understand.

Maybe something like this...
See original code here:

exclusions.forEach((exclusion) => {
excluded_paths = [
...excluded_paths,
...glob(exclusion, {
cwd: assets_dir,
dot: true
})
];
});

Untested suggestion

function get_assets_list(config) {
	const assets_dir = config.kit.files.assets;
	/**
	 * @type {import('types/internal').Asset[]}
	 */
	let assets = [];
	if (fs.existsSync(assets_dir)) {
		/**
		 * @type {function(string): boolean}
		 */
		const excludesFromConfig = config.kit.serviceWorker.excludes || filename => false;
                const excludes = filename => endsWith('/.DS_STORE') || excludesFromConfig(filename);
		assets = list_files(assets_dir, '', [], excludes);
	}
	return assets;
...
/**
 * @param {string} dir
 * @param {string} path
 * @param {import('types/internal').Asset[]} files
 * @param {function(string): boolean} excludes Function for exclusion logic. Returns true for excluded file names.
 */
function list_files(dir, path, files = [], excludes = filename => false ) {
	fs.readdirSync(dir).forEach((file) => {
		const full = `${dir}/${file}`;

		const stats = fs.statSync(full);
		const joined = path ? `${path}/${file}` : file;

		if (stats.isDirectory()) {
			list_files(full, joined, files, excludes);
		} else {
			if (excludes(joined)) {
				return;
			}
			files.push({
				file: joined,
				size: stats.size,
				type: mime.getType(joined)
			});
		}
	});

	return files;
}

@Karlinator
Copy link
Contributor

I can get behind using a function for exclusions, though it does make some simple things slightly more complicated. It also makes the method of matching very explicit for the developer, which is good. Globbing, for instance, is not always consistent between flavours/options.

@jiv-e GitHub pro tip: if you write the language name (e.g javascript) just after the opening triple backtick you get syntax highlighting!

This:
```javascript
const something = 'a string';
```
becomes:

const something = 'a string';

makes it a little easier to read 😄

I haven't tested your suggestion either, but it looks reasonable I think?

@jiv-e
Copy link

jiv-e commented Sep 13, 2021

Online docs could show some examples how to do basic exclusions and globbing with some battle tested library.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants