This issue was moved to a discussion.

Why does bundle include source files and other non-runtime dependent files in the project? #95

Closed
fourpastmidnight opened this issue Aug 20, 2021 · 1 comment
Labels
bug Something isn't working

Comments

fourpastmidnight commented Aug 20, 2021

Describe the bug

I'm a bit new to the whole JavaScript ecosystem and packaging and bundling in general. I know enough to be dangerous 😛

At first, in my project, I was trying to put my "production build" outputs into their own folder hierarchy. Long story short, it somewhat worked, though with Yarn PnP I needed to mimic some of the actual project hierarchy so that PnP path resolution worked. There's no (good) way to "re-run" yarn install on a production output (unless you copy the package.json files???), and if your output hierarchy is different from the development hierarchy, this still doesn't work anyway. Anyway, my original "production build" looked like:

  1. Run a build and put the resulting files in a separate location, say ./releases, which mimicked my project folder hierarchy to the point where the production built files could be resolved via .pnp.js.
  2. Copy yarn's .pnp.js file to this folder
  3. Copy the .yarn/cache folder
  4. Create a .env file at the root (currently done by a PowerShell script for now.)
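
The four steps above can be sketched as a small Node script. This is only an illustration of the manual approach, not the script actually used in the project: the function names, the `buildOutputs` paths, and the `.env` contents are all hypothetical, and the recursive copy is hand-rolled because `fs.cpSync` is not available on Node 14.

```javascript
const fs = require("fs");
const path = require("path");

// Recursively copy a file or directory, creating parent folders as needed.
function copyRecursive(from, to) {
  if (fs.statSync(from).isDirectory()) {
    fs.mkdirSync(to, { recursive: true });
    for (const entry of fs.readdirSync(from)) {
      copyRecursive(path.join(from, entry), path.join(to, entry));
    }
  } else {
    fs.mkdirSync(path.dirname(to), { recursive: true });
    fs.copyFileSync(from, to);
  }
}

// Stage a release that mirrors the repo layout so PnP paths still resolve.
// `buildOutputs` holds repo-relative paths (hypothetical, e.g. "packages/api/dist").
function stageRelease(repoRoot, releaseDir, buildOutputs, envText) {
  // Step 1: copy each build output to the same relative location.
  for (const rel of buildOutputs) {
    copyRecursive(path.join(repoRoot, rel), path.join(releaseDir, rel));
  }
  // Step 2: copy yarn's .pnp.js file.
  copyRecursive(path.join(repoRoot, ".pnp.js"), path.join(releaseDir, ".pnp.js"));
  // Step 3: copy the whole .yarn/cache folder (dev dependencies included).
  copyRecursive(
    path.join(repoRoot, ".yarn", "cache"),
    path.join(releaseDir, ".yarn", "cache")
  );
  // Step 4: write the .env file at the release root.
  fs.writeFileSync(path.join(releaseDir, ".env"), envText);
}
```

Because step 3 copies the cache wholesale, this approach carries every development dependency along, which is the drawback described next.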

And this worked. The biggest drawback is that I'm bringing over all the development dependencies (because I wasn't about to try and figure out what was needed and what is not).

And now I come to the point: I find yarn.build. This sounds promising. I didn't necessarily want to resort to using webpack or something.

So I've been doing a LOT of build testing (prior to using yarn.build), and have several files "polluting" my repo, e.g. previous release builds, and self-extracting zips of those builds, etc. When I ran yarn bundle, it did exactly what I expected it NOT to do: grabbed everything in the repo except .git (though, it did slim down the .yarn/cache to just those things needed by my app—I think, I haven't actually tested that the bundle works yet).

I thought yarn bundle was supposed to package up only those things which are needed to run the project in production? So why are the various ./src directories for the monorepo packages being included? Why are non-project files being included (e.g. previous release builds in the root monorepo directory)? The bundle came out to a WHOPPING 89MB - 920MB!!! When I manually went into the generated bundle.zip and deleted everything NOT related to running the project, the bundle.zip file size was 5MB. That's more what I expected (actually, I expected 20MB - 30MB, but that just shows you how much weight development dependencies add to a project--in my case, 75MB).

To Reproduce

Steps to reproduce the behavior:

  1. Run command yarn bundle in a distributable package in a dirty (😉) repo

Expected behavior

I expected that only the output folders from the build (i.e. the folders listed in package.json#directories, package.json#files, or inferred files/directories based on package.json#main) would be included in the bundled output.

Screenshots

I'm in the middle of trying to get my build working the way I want it to--so there's lots of cruft in my project workspace at the moment. Once I have good, stable production builds being packaged correctly, I intend on cleaning all of this up. But for now, this is what I've got. And this is being packaged in the bundle.zip, inflating its size from 5MB to 890MB!!!!!
image

image

And here's what's contained in the bundle.zip:

image

Even yarn PnP specific stuff, like yarn sdks and plugins are included in the bundle!! But these aren't necessary at runtime, are they?? I never included them in my hodge-podge builds/packages I was creating before and this app is running in production just fine.

After deleting all the unnecessary cruft, here's what the bundle looks like, along with its size:

image

image

Quite a difference!!!!

Desktop:

  • OS: Windows 10, no WSL
  • Yarn: v2.4.2
  • Node: v14.17.1

Additional Context

I did not use yarn build because the build scripts in my various package.json files are not simply named build and are not the same for all packages—e.g. for a react website, the command is simply build, but for other packages, the command could be either build:dev or build:prod. The simplistic option of specifying a command via -c is not sophisticated enough for my project. In addition, for the react web site, I need to pass different environment variables to Node depending on the environment I'm building for so that the right .env.* files are used. (Perhaps it's possible to do this with yarn.build, e.g. NODE_ENV=MyEnv yarn build?? But I didn't see any documentation suggesting this would work.)

Owner

ojkelly commented Aug 21, 2021

Hey @fourpastmidnight 👋

I think you've found a spot where the documentation needs more work. bundle is essentially for nodejs apps, not websites. It's not a replacement for webpack, parcel, esbuild or even tsc, though it solves a similar problem specifically for nodejs applications.

Yarn's workspace feature (having multiple local packages), is a really powerful tool for code sharing. It allows you to break up your application, while sharing common code, in the same way as if that code was pushed to npm. Except it can stay local to the monorepo, and can be versioned with everything else.

When you run a nodejs app, it needs to load all the modules you have imported/required inside your app (and those of your dependencies). Additionally, because of pnp, you need the .yarn/cache folder as well as the .pnp.cjs file, and the folder structure.

The bundle command is run from the target workspace/package that you want to run. Knowing this, it looks at the dependency graph of that workspace, and gets rid of everything that isn't needed, but keeps everything that is. Whether it's in .yarn/cache or another local package.
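
To make the idea concrete, here is a minimal sketch of pruning by dependency graph. It is not yarn.build's actual implementation; the workspace names and the shape of `workspaceGraph` are assumptions made up for illustration.

```javascript
// Map of workspace name -> local workspaces it depends on (hypothetical names).
const workspaceGraph = {
  api: ["shared", "db"],
  web: ["shared", "ui"],
  db: ["shared"],
  shared: [],
  ui: [],
};

// Return every workspace reachable from `target`, including `target` itself.
function requiredWorkspaces(graph, target) {
  const seen = new Set();
  const stack = [target];
  while (stack.length > 0) {
    const name = stack.pop();
    if (seen.has(name)) continue;
    seen.add(name);
    for (const dep of graph[name] || []) stack.push(dep);
  }
  return seen;
}

// Bundling "api" keeps its transitive dependencies and drops the rest.
const keep = requiredWorkspaces(workspaceGraph, "api");
const drop = Object.keys(workspaceGraph).filter((name) => !keep.has(name));
console.log([...keep].sort()); // [ 'api', 'db', 'shared' ]
console.log(drop.sort()); // [ 'ui', 'web' ]
```

The same traversal idea applies to third-party packages in .yarn/cache: anything not reachable from the target workspace can be left out.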

In the past with node_modules, this was quite a challenge, especially with yarn 1, and module hoisting.

Imagine you have a monorepo with 10 lambdas/nodejs apps/microservices in it, and they all share a bunch of code. Each package will have a different dependency graph, and won't need everything in the repo. So we can create a separate bundle for each.

In contrast, when building and distributing to the web we often collapse everything into a single file. Or more recently, a collection of files.


> I thought yarn bundle was supposed to package up only those things which are needed to run the project in production? So why are the various ./src directories for the monorepo packages being included? Why are non-project files being included (e.g. previous release builds in the root monorepo directory)? The bundle came out to a WHOPPING 89MB - 920MB!!!

bundle makes sure to modify your repo as little as possible; it doesn't know anything more than what yarn does.

The initial reason for this was that I typically build and deploy lambdas from CI, and so other release artifacts were not caught up in the final bundles. However, it's something that should be configurable, and at the very least should read the existing .gitignore files, so I've made #97 to track that feature.

The other ./src directories are there because your package imports them. When you import a local package with "packageName": "workspace:*", it's referring to that package on disk. So if it wasn't there, the application would break.
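
As a rough illustration, the local packages a workspace pulls in can be read straight off its manifest by looking for the `workspace:` protocol. The manifest below is hypothetical, and checking only `dependencies` (not `devDependencies`) is an assumption of this sketch, not a statement about what bundle does.

```javascript
// Hypothetical manifest for an "api" workspace.
const manifest = {
  name: "api",
  main: "lib/index.js",
  dependencies: { shared: "workspace:*", express: "^4.17.1" },
  devDependencies: { "build-tools": "workspace:*", typescript: "^4.3.5" },
};

// Names of runtime dependencies that point at a local workspace on disk.
function localRuntimeDeps(pkg) {
  return Object.entries(pkg.dependencies || {})
    .filter(([, range]) => range.startsWith("workspace:"))
    .map(([name]) => name);
}

console.log(localRuntimeDeps(manifest)); // [ 'shared' ]
```

Here `shared` must exist on disk next to `api` for the import to resolve, which is why its folder ends up in the bundle.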

The intent of bundle is to give you a zip that you can ship to AWS Lambda, Docker, or any other compute environment that can run nodejs (or ts-node etc.). All you need to do is run node entrypoint.js, which preloads pnp for you and re-exports the script you defined in package.json#main.
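
For a sense of what such an entrypoint might contain, the sketch below renders one as text. It is a guess at the general shape, not the file yarn.build actually generates: `renderEntrypoint` and the example paths are hypothetical, while `setup()` is the function Yarn's generated .pnp.cjs exposes for installing the PnP resolver.

```javascript
// Render the source of a hypothetical entrypoint.js placed at the bundle root.
// `pnpFile` and `mainFile` are paths relative to that root.
function renderEntrypoint(pnpFile, mainFile) {
  return [
    "// Install the PnP resolver before any dependency is required.",
    `require(${JSON.stringify(pnpFile)}).setup();`,
    "// Re-export whatever the workspace's package.json#main exports.",
    `module.exports = require(${JSON.stringify(mainFile)});`,
    "",
  ].join("\n");
}

console.log(renderEntrypoint("./.pnp.cjs", "./packages/api/lib/index.js"));
```

The order matters: the resolver has to be set up first, otherwise the `require` of the main script cannot find packages stored in .yarn/cache.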

> At first, in my project, I was trying to put my "production build" outputs into their own folder hierarchy. Long story short, it somewhat worked, though with Yarn PnP I needed to mimic some of the actual project hierarchy so that PnP path resolution worked

Yep that's precisely the problem bundle is meant to solve. Maintain the folder hierarchy, so pnp resolution works.

> there's no (good) way to "re-run" yarn install on a production output (unless you copy the package.json files???)

You'd need to copy the package.json files, the folder hierarchy for the whole repo, and much of what's in .yarn; otherwise, you're basically starting from a fresh yarn install. Note that .yarn/releases holds the specific version of yarn that you're using, and you need that too.

>   1. Run a build and put the resulting files in a separate location, say ./releases, which mimicked my project folder hierarchy to the point where the production built files could be resolved via .pnp.js.
>   2. Copy yarn's .pnp.js file to this folder
>   3. Copy the .yarn/cache folder
>   4. Create a .env file at the root (currently done by a PowerShell script for now.)

Yep, sounds like you might not have copied everything you needed. You need all of .yarn. The .pnp.cjs file is rebuilt each time you run yarn, so you don't need to copy it; you can recreate it. If anything changes regarding which packages are on disk, you'll need to run yarn again.

> So I've been doing a LOT of build testing (prior to using yarn.build), and have several files "polluting" my repo, e.g. previous release builds, and self-extracting zips of those builds, etc.

Just on this, it's helpful to control these. Either with .gitignore files, or adding a clean script that will clear out the build artefacts for a package. Often I'll have a build script that runs clean then build.
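
A clean step can be as small as the sketch below. The folder names are hypothetical defaults, so list whatever your build actually emits; `fs.rmSync` requires Node 14.14 or newer.

```javascript
const fs = require("fs");
const path = require("path");

// Remove build output folders from one package and report what was deleted.
// The default folder names are placeholders, not a convention of yarn.build.
function clean(packageDir, outputDirs = ["dist", "build", "releases"]) {
  const removed = [];
  for (const dir of outputDirs) {
    const target = path.join(packageDir, dir);
    if (fs.existsSync(target)) {
      fs.rmSync(target, { recursive: true, force: true });
      removed.push(dir);
    }
  }
  return removed;
}
```

A package could call `clean(__dirname)` from a small script wired to its `clean` entry in package.json, and have its `build` script run `clean` first so stale artefacts never reach the bundle.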

> I expected that only the output folders from the build (i.e. the folders listed in package.json#directories, package.json#files, or inferred files/directories based on package.json#main) would be included in the bundled output.

This sounds like a feature to add. I often write a lot of TypeScript code, and for packages that are shared I don't always compile them, instead just pointing package.json#main to src/index.ts. However, for sure, there's a case to be made for removing the other folders.

I think we would need to add a per package config for folders to remove for the bundle, because it's not easy to know if a script in your output folder wants to reach into ./src or ./static or similar.

> Even yarn PnP specific stuff, like yarn sdks and plugins are included in the bundle!! But these aren't necessary at runtime, are they

Some of the plugins could be, though perhaps an option to clear them out might be useful.

As an example, there's a plugin in this repo to allow you to use package.yaml instead of package.json, and if your run script in say Docker was yarn start you'd need the plugin to be included.

But mainly, yarn bundle tries to do the least modification needed. We can't guess which files to delete, because not everyone's case is the same. Also, yarn.build is quite useful in polyglot repositories. So it only removes what it's sure it can (all the unused dependencies and workspaces).

> I did not use yarn build because the build scripts in my various package.json files are not simply named build and are not the same for all packages, e.g. for a react website, the command is simply build, but for other packages, the command could be either build:dev or build:prod. The simplistic option of specifying a command via -c is not sophisticated enough for my project. In addition, for the react web site, I need to pass different environment variables to Node depending on the environment I'm building for so that the right .env.* files are used. (Perhaps it's possible to do this with yarn.build, e.g. NODE_ENV=MyEnv yarn build?? But I didn't see any documentation suggesting this would work.)

The yarn build -c option is really a way for you to leverage the dependency graph to run a script in parallel, such as yarn build -c clean. There's a philosophical difference in how a monorepo should be set up: yarn.build is designed such that every package must be buildable from the same command, in the same way they should all run tests with yarn test (which calls yarn run test).

That's not to say how you're doing it is wrong, just it's a different approach.

I'd argue that having a different command for each package to build makes moving between packages hard for developers new to the project, and that anything that needs to be in the environment for the build should likely be passed in at runtime instead. The only exception to that would be NODE_ENV=production.

Though also, yarn build is mostly relevant to either front end packages, or intermediate packages with outputs like say a GraphQL schema package, that outputs typings.

You can add a build command with your specific ENV vars required, but there is no differentiation between say dev and build.

The last part of the philosophy here is that you should build one artifact, test it, deploy it to dev, test it there, maybe run integration tests, then deploy it to production. The actual artifact never changes, giving you a higher confidence it will succeed in production.

Then, the only things that change per environment are ENV vars, secrets (which should always be passed at runtime), and the state of feature flags.
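
One way to follow that split is to read everything environment-specific when the process starts rather than at build time. The sketch below assumes made-up variable names (`API_URL`, `FEATURE_NEW_CHECKOUT`); nothing here is part of yarn.build.

```javascript
// Build a config object from environment variables at startup.
// API_URL and FEATURE_NEW_CHECKOUT are hypothetical names for illustration.
function loadConfig(env = process.env) {
  if (!env.API_URL) {
    throw new Error("API_URL must be set at runtime");
  }
  return {
    nodeEnv: env.NODE_ENV || "development",
    apiUrl: env.API_URL,
    newCheckout: env.FEATURE_NEW_CHECKOUT === "true",
  };
}

// The same built artifact, configured differently per environment:
const devConfig = loadConfig({
  NODE_ENV: "production",
  API_URL: "https://dev.example.invalid",
});
const prodConfig = loadConfig({
  NODE_ENV: "production",
  API_URL: "https://prod.example.invalid",
  FEATURE_NEW_CHECKOUT: "true",
});
console.log(devConfig.apiUrl, devConfig.newCheckout);
console.log(prodConfig.apiUrl, prodConfig.newCheckout);
```

Failing fast on a missing variable keeps a misconfigured deployment from starting with silently wrong defaults.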

However, that may not be suitable for how your repository is set up. That doesn't mean how you've set it up is wrong, just that it's different from what yarn.build is made for.


I think yarn bundle will still work for you, at least for any nodejs apps you have. You'll definitely want to add a script to clean out your artefacts. I'd also recommend treating each package/workspace as having hard boundaries. They shouldn't know anything above their package.json (except for exceptional cases). Their outputs should all be contained within their own folders - this makes both knowing where it came from, and cleaning straightforward.

Repository owner locked and limited conversation to collaborators Aug 21, 2021
@ojkelly ojkelly closed this as completed Aug 21, 2021
