uuid: a4e1d457-7479-401d-bb13-b6ec84cbe34a
layout: post
title: Lean NPM packages
slug: lean-npm-packages
date: 2019-09-08 16:40:58 UTC
updated: 2019-09-08 16:40:58 UTC
author: Luciano Mammino
author_slug: luciano-mammino
header_img: ./lean-npm-packages.jpg
fb_img: ./lean-npm-packages-fb.png
tw_img: ./lean-npm-packages-tw.png
status: published
language: en_US
tags: javascript, node-js, npm

Every developer on the planet knows how modular Node.js and the JavaScript ecosystem have become. This is probably thanks to the great job that package managers and registries like bower (discontinued) and npm have done over the last few years. I personally believe that this is also a consequence of the "many small modules" philosophy that has been popularised within the JavaScript ecosystem.

This is great, but all that glitters is not gold... Look, for instance, at this picture for a second:

[Image: node_modules, heaviest objects in the universe]

Yeah, you have probably seen this picture before and it's probably not funny anymore... Anyway, it sums up nicely how this "many small modules" idea got a little bit out of hand within the JavaScript ecosystem.

Every time you run npm install you get so many files that you might feel like you are downloading the entire World Wide Web onto your hard drive! 😰

There are even tools that scout for node_modules folders in your system and get rid of them (e.g. wipe-modules). Some developers have also shown how all the node_modules folders in their system are making their backups too slow (see tweet)!

Some like to make fun of this issue, others just complain about it. In this article, I don't want to do either. I'd rather be a little more constructive and share some simple techniques to keep your NPM modules as lean as possible, so that other developers will save bandwidth and time when pulling your modules from NPM!

Repository vs Registry

In some languages like Go or PHP, what you have in a module repository is exactly what you get through the package manager when you install the module. This is because the code you download through the package manager comes straight from the repository (or from a proxy that keeps a copy of the repository). In these cases, the structure of your repository is fundamentally tied to the file structure of your module: what you get by installing a module is pretty much what you would get by cloning the repository.

NPM doesn't work this way. In fact, NPM allows you to selectively push files into the registry, so you might end up with a very different file structure compared to what you have in your git repository.

While this interesting property of the system has caused some security issues in the past (see the event-stream module incident if you are curious), it also offers us an opportunity to be very selective with what we publish and keep the module lean.

This is especially important if you "build" your JavaScript code (e.g. using TypeScript, Babel or a module bundler), so that the "distribution" (dist) version of your module is the result of a compilation/transpilation/bundling process. In such cases, you don't need to publish the entire codebase on NPM, as your users will only be using the dist version of your code. The same goes for tests, documentation, images and other files that the users of your module won't need in their codebase: keep them only in your repository and avoid publishing them to the registry.

Conversely, you probably don't want to keep dist code in your repository. This code can easily be regenerated by the build toolchain when necessary and there's no point in tracking changes on the dist files when what you are really changing over time is the source code. In git you can use .gitignore to make sure dist files are kept out of the repository.

In short, registries are for production-ready code (dist) while repositories are for development code (src).

In the rest of this article we will see some ways to configure an NPM package so that all the unnecessary files will be excluded from the registry.

Publishing on NPM

With the NPM command line, npm publish is the de facto way of publishing new modules (or new versions of a module) into the NPM registry.

An NPM module is nothing more than a folder with a valid package.json file in it. It doesn't have to be a git repository. (In reality, the definition of what an NPM module can be is a little more complicated; to get the full spiel, check out the official NPM documentation.)

By default npm publish will publish all the files in the package directory (including subfolders recursively).

So the first thing to do is to be careful and make sure that you don't have sensitive files containing passwords, tokens or other sensitive information in your project folder. It's generally a good idea to keep those away from the module folder, just in case...

You should also avoid keeping unrelated files in the same folder. Yeah, I admit that many times I did a quick and dirty wget to get something I needed while working on a module and ended up with a lot of unrelated stuff published in my module. Please be smarter than me, don't do that! 😜

Default rules

Before diving deep into the different ways you can specify the files to be included or excluded when you publish your package, let's first see what the default rules are.

No matter what you do, there are some files that are always excluded:

  • .*.swp
  • ._*
  • .DS_Store
  • .git
  • .hg
  • .npmrc
  • .lock-wscript
  • .svn
  • .wafpickle-*
  • config.gypi
  • CVS
  • npm-debug.log

Similarly, there are files that are always included:

  • package.json
  • README.md (and its variants, like README.markdown or README.rst)
  • CHANGELOG.md (and its variants, like CHANGELOG.markdown or CHANGELOG.rst)
  • LICENSE and LICENCE

Note that package-lock.json is NOT automatically included.
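
To make the two lists above a bit more concrete, here is a minimal sketch that classifies the name of a file or folder in the package root against them. The function name classifyByDefaultRules and the regular expressions are my own illustration: they only cover the entries listed in this article, and npm's actual logic may differ between versions.

```javascript
// Patterns mirroring the "always excluded" list above (illustrative only).
const ALWAYS_EXCLUDED = [
  /^\..*\.swp$/,      // .*.swp
  /^\._/,             // ._*
  /^\.DS_Store$/,
  /^\.git$/,
  /^\.hg$/,
  /^\.npmrc$/,
  /^\.lock-wscript$/,
  /^\.svn$/,
  /^\.wafpickle-/,    // .wafpickle-*
  /^config\.gypi$/,
  /^CVS$/,
  /^npm-debug\.log$/,
];

// Patterns mirroring the "always included" list above (illustrative only).
const ALWAYS_INCLUDED = [
  /^package\.json$/,
  /^README(\..*)?$/,      // README, README.md, README.rst, ...
  /^CHANGELOG(\..*)?$/,   // CHANGELOG, CHANGELOG.md, ...
  /^LICEN[SC]E(\..*)?$/,  // LICENSE and LICENCE, with or without extension
];

// Hypothetical helper: takes the name of an entry in the package root and
// tells which of the default rules (if any) applies to it.
function classifyByDefaultRules(entryName) {
  if (ALWAYS_EXCLUDED.some((re) => re.test(entryName))) return 'always-excluded';
  if (ALWAYS_INCLUDED.some((re) => re.test(entryName))) return 'always-included';
  return 'depends-on-config';
}
```

Anything that falls in the depends-on-config bucket (for instance src, dist or package-lock.json) is where the configuration options discussed in the next sections come into play.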

.gitignore & .npmignore

The first interesting property of npm publish is that, if your folder is also a git repository and you are using a .gitignore file, all the patterns listed in it will be used to exclude files.

So, for instance, if you have a *.cache pattern in your .gitignore, all the files matching the pattern won't be published in the registry.

We have already discussed that you might want different rules for what you track in your repository and what you publish to the registry, so relying on one configuration to ignore files for both targets might not always be a good idea.

In those cases you can create a more specific file called .npmignore (which supports exactly the same syntax as .gitignore). If this file exists, npm publish will use that to exclude files, rather than using .gitignore.

This means that there's no inheritance: the two files are totally independent. If you want a pattern to exclude files from both your repository and your registry, you will have to put the pattern in both configuration files.

One interesting, lesser-known (and rarely used) tip is that you can also put .npmignore files in subdirectories. The patterns specified in these files apply only to the subtree of directories where the .npmignore is found.
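
The "no inheritance" rule can be sketched in a few lines of Node.js. This is only an illustration of the precedence described above, not npm's actual implementation: the function names effectiveIgnoreFile and effectiveIgnorePatterns are hypothetical, and the sketch only looks at a single folder (it ignores nested .npmignore files).

```javascript
const fs = require('fs');
const path = require('path');

// Returns the name of the ignore file that drives exclusions for a folder:
// .npmignore wins over .gitignore, and the two are never merged.
function effectiveIgnoreFile(dir) {
  if (fs.existsSync(path.join(dir, '.npmignore'))) return '.npmignore';
  if (fs.existsSync(path.join(dir, '.gitignore'))) return '.gitignore';
  return null;
}

// Reads the patterns of the effective ignore file, skipping blank lines
// and comment lines (those starting with "#").
function effectiveIgnorePatterns(dir) {
  const file = effectiveIgnoreFile(dir);
  if (file === null) return [];
  return fs
    .readFileSync(path.join(dir, file), 'utf8')
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line !== '' && !line.startsWith('#'));
}
```

For a folder whose .gitignore contains dist/ and *.cache, effectiveIgnorePatterns returns those two patterns. As soon as you add an .npmignore containing only src/, the result becomes just src/, and the .gitignore patterns no longer play any role.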

The files field

If you don't like the idea of blacklisting some files (in fairness, you might forget to exclude a file with some sensitive information in it...) you can also follow a whitelisting approach.

In fact, NPM allows you to use a field called files in your package.json to specify an array of file patterns to include in the package.

From the official documentation:

The optional files field is an array of file patterns that describes the entries to be included when your package is installed as a dependency. File patterns follow a similar syntax to .gitignore, but reversed: including a file, directory, or glob pattern (*, **/*, and such) will make it so that file is included in the tarball when it’s packed. Omitting the field will make it default to ["*"], which means it will include all files.

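One important rule is that files included with the files field cannot be excluded through an .npmignore in the package root. In other words, the files field has higher priority than the root .npmignore. (According to the NPM documentation, an .npmignore placed in a subdirectory can still exclude files within that subdirectory.)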

.npmignore vs the files field

As we said, .npmignore is effectively a blacklist of files, while the files field acts as a whitelist.

This means that, if the files field is populated, everything is excluded by default and only those files explicitly listed will be included in the packaged tarball.

You are probably wondering now, should I use the files field or the .npmignore file?

To be honest, I don't think there's a silver bullet here. Just pick the mental model (whitelist vs blacklist) that comes easier to you.
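
If the difference between the two mental models still feels abstract, the following sketch might help. It is a deliberately simplified model, not what npm does internally: patterns are either a folder prefix ending in "/" or an exact file path (no globs), the always-included and always-excluded defaults are ignored, and all function and file names are made up for the example.

```javascript
// Simplified matcher: a pattern ending in "/" matches everything under that
// folder; any other pattern must match the path exactly. Real npm patterns
// also support globs, which this sketch deliberately leaves out.
function matches(pattern, filePath) {
  return pattern.endsWith('/') ? filePath.startsWith(pattern) : filePath === pattern;
}

// Blacklist (.npmignore style): everything is in unless a pattern matches.
function selectWithBlacklist(allFiles, ignorePatterns) {
  return allFiles.filter((f) => !ignorePatterns.some((p) => matches(p, f)));
}

// Whitelist ("files" field style): everything is out unless a pattern matches.
function selectWithWhitelist(allFiles, filesField) {
  return allFiles.filter((f) => filesField.some((p) => matches(p, f)));
}

// A hypothetical project folder, including a stray file we forgot about.
const allFiles = ['src/index.js', 'src/index.test.js', 'dist/index.js', 'notes.txt'];

console.log(selectWithBlacklist(allFiles, ['src/']));  // dist/index.js and notes.txt
console.log(selectWithWhitelist(allFiles, ['dist/'])); // dist/index.js only
```

Notice how the forgotten notes.txt slips through the blacklist but not through the whitelist. That's the main trade-off: a blacklist needs updating every time a new unwanted file appears, while a whitelist needs updating every time a new wanted file appears.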

An example

I generally prefer to keep my folder structure simple and explicit by having folders for source (src) and distribution files (dist).

With this approach you can simply say that src is what you want to keep in your repo (excluding dist) and, vice versa, dist is what you want to publish on NPM (excluding src).

Just to make a very simple example, let's say we are building a new library and our code base contains the following files:

  • src/index.js: source code for our module logic (using ES2019 syntax, because we like to be cool! 😎)
  • src/index.test.js: unit test file
  • dist/index.js: distributable version of our module (transpiled to ES5 with babel)

Now we want to keep src/index.js and src/index.test.js in our repository (but not in our final package) and dist/index.js in our package (but not in our repository).

One way to achieve this result is by adding dist/ to our .gitignore. This will make sure we never commit files from the dist folder to the repository. Then we can use either the .npmignore file or the files field to specify what goes in our package.

I personally prefer to use the files field, which in this case will be super simple.

{
  "name": "some-test-package",
  "version": "1.0.0",
  "main": "dist/index.js",
  "files": [
    "dist/"
  ]
}

Notice that I am also pointing the entrypoint (main) to our index.js file in dist. This is what will be used when our module is imported.

With this approach I can add all sorts of other files to my repo (e.g. integration tests, functional tests, images, documentation, etc.) and I won't have to worry about polluting my final package and making the end user download a lot of stuff that they won't need!

Testing the package files

But how do we know if our setup is correct? We don't want to publish the package just to find out.

Thankfully there are at least 2 ways to preview what's gonna end up in the registry with npm publish without having to actually publish anything.

The first way is npm pack. This command creates a tarball that contains all the files that would be published in the registry.

The output is actually pretty nice and it will list all the included files.

If we run npm pack on the package folder from the example above we should see something like this:

npm notice
npm notice 📦  some-test-package@1.0.0
npm notice === Tarball Contents ===
npm notice 74B  dist/index.js
npm notice 266B package.json
npm notice 13B  LICENSE.md
npm notice 39B  README.md
npm notice === Tarball Details ===
npm notice name:          some-test-package
npm notice version:       1.0.0
npm notice filename:      some-test-package-1.0.0.tgz
npm notice package size:  428 B
npm notice unpacked size: 392 B
npm notice shasum:        738776acad3cb41c549a884c6f9e946e7f367657
npm notice integrity:     sha512-QQS68QqFtfTGE[...]XmPGJpSYqmpKw==
npm notice total files:   4
npm notice
some-test-package-1.0.0.tgz

Note that only 4 files have been included:

  • dist/index.js
  • package.json
  • LICENSE.md
  • README.md

An alternative approach is to run npm publish in dry run mode with the flag --dry-run. With this approach no tarball is created but you will see the output of all the files that would be published with a normal npm publish run.
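
If you want to automate this check (for instance in a CI step), here is a rough sketch. It assumes a recent version of npm where npm pack accepts both --dry-run and --json, and where the JSON report is an array with one entry per package, each carrying a files array of objects with a path property; please verify this against your npm version. The function names are hypothetical, and lifecycle scripts that print to standard output could break the JSON parsing.

```javascript
const { execFileSync } = require('child_process');

// Runs `npm pack --dry-run --json` in the given folder and returns the list
// of file paths that would end up in the tarball. Nothing gets published.
// Note: on Windows the executable may need to be referenced as npm.cmd.
function listPackedFiles(packageDir) {
  const stdout = execFileSync('npm', ['pack', '--dry-run', '--json'], {
    cwd: packageDir,
    encoding: 'utf8',
  });
  return parsePackReport(stdout);
}

// Extracts file paths from the JSON report. Assumed shape: an array with one
// entry per package, each with a `files` array of `{ path }` objects.
function parsePackReport(json) {
  const [report] = JSON.parse(json);
  return report.files.map((file) => file.path);
}

// Returns the packed files that are not in the list of expected files.
function findUnexpectedFiles(packedFiles, expectedFiles) {
  const expected = new Set(expectedFiles);
  return packedFiles.filter((file) => !expected.has(file));
}
```

For the example package above, calling findUnexpectedFiles(listPackedFiles('.'), ['dist/index.js', 'package.json', 'LICENSE.md', 'README.md']) should give you an empty array. Anything else in the result is a file you probably did not mean to publish.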

Conclusion

In summary, these are the main points I wanted to get across with this article:

  • What you have in your repository can (and probably should) be different from what you publish in the NPM registry.
  • You can exclude files by specifying patterns in .npmignore (similarly to .gitignore)
  • Alternatively, you can whitelist files by specifying patterns of files to be included in the files field in your package.json
  • There's a list of files that are always included and, similarly, a list of files that are always excluded (see list above).
  • Be smart and only publish the bare minimum needed for people to use your library: keep your NPM package lean!

With this advice we are probably not going to solve the node_modules drama, but at least we can do our part to make it a little bit more bearable.

Please let me know what you think about this advice here in the comments. Did you know about these configuration options? Do you use other strategies to keep your NPM packages lean?

I'll see you in the next article. Until then, keep your NPM modules lean! 🤗📦

CIAO 👋
