
File hash does not change after its content updates #188

Closed
h2jorm opened this issue Dec 10, 2017 · 40 comments
Labels: 💬 RFC Request For Comments

@h2jorm

h2jorm commented Dec 10, 2017

This is a 🙋 feature request.

🤔 Expected Behavior

An output file hash should change when its content updates.

😯 Current Behavior

File hash does not change.

🔦 Context

<!-- index.html -->
<html>
<body>
  <script src="./main.js"></script>
</body>
</html>

// main.js
console.log(1);

Building generates 'b695675d84099f097ec37d68c8c83fce.js':

parcel build --no-cache --no-minify index.html

Then change main.js:

// main.js
console.log(2);

Build again.

parcel build --no-cache --no-minify index.html

The JavaScript file name is still 'b695675d84099f097ec37d68c8c83fce.js'.
I am not sure whether this is the expected behavior. However, when I use webpack, the output file hash changes every time the content updates.

🌍 Your Environment

Parcel: v1.1.0
Node: v8.9.1
npm/Yarn: yarn v1.3.2
Operating System: macOS High Sierra 10.13.1
@DeMoorJasper
Member

DeMoorJasper commented Dec 10, 2017

As far as I know this is expected behaviour.
What does change when a file changes is the hash contained in the fragment that is used for bundling:

{
  "dependencies":[],
  "generated": {
    "js":"\"use strict\";\n\nalert('helo');"
  },
  "hash": "d99518b2c556df9c6c4d8a2e9bd72423" // <-- This changes on change
}

It is generated here in Asset.js:

generateHash() {
  return objectHash(this.generated);
}

The filename hash, however, is based on the following parameters (so builds, minified builds, and served files should have different cache names):

const OPTION_KEYS = ['publicURL', 'minify', 'hmr'];
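
A rough sketch of how a name hash could be derived from just those option values (an illustration under that assumption, not Parcel's actual implementation):

const crypto = require('crypto');

// hash only the picked option values; identical options always yield the
// same name, regardless of the asset's contents
function optionsHash(options) {
  const picked = ['publicURL', 'minify', 'hmr'].map(key => String(options[key]));
  return crypto.createHash('md5').update(picked.join('|')).digest('hex');
}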

@sudhakar

@davidnagli Could you reopen this issue please? IMHO, it's okay not to change the hash during a development build. But for a production build, if the hash doesn't change even when the content changes, then Cache-Control and ETag headers cannot be used effectively.

In my case, I put react.js, react-dom.js, etc. into a separate bundle, vendor.js, which rarely changes, so I set it to cache for 1 year. If I happen to add one or two more libs, I wouldn't be able to bust the cache, since the hash never changes and the browser thinks "I already have this file, no need to ask the server again" :(

@devongovett
Member

The filenames are not currently generated based on a hash of the contents. You could do something like versioning, e.g. http://mycdn.com/v1.0.0/somelib.js; when you publish a new version, the URL will change.
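
For example, assuming Parcel's --public-url option is used to point the generated asset URLs at a versioned CDN path, the build command might look like:

parcel build index.html --no-cache --public-url https://mycdn.com/v1.0.0/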

@sudhakar

Yes @devongovett, that's a good idea, but it would be extra configuration to achieve cache busting. I am happy with the current setup for now, but IMO it's nice to have it in the core, so users get cache busting for free!

@jouni-kantola

jouni-kantola commented Dec 11, 2017

Why close an issue that would clearly add value if it were fixed, @davidnagli?

From my point of view, hashes should always be based on content. Long-term caching should be content-based, not release-based.

@devongovett
Member

One other reason this will be hard to change is that the filenames need to be generated before the entire contents of a file are available, since assets are processed in parallel.

@devongovett devongovett reopened this Dec 11, 2017
@davidnagli
Contributor

Sorry for closing the issue incorrectly. It was my understanding that this was Parcel's expected behavior.

@devongovett So are we going to overhaul the caching system?

@davidnagli davidnagli added the 💬 RFC Request For Comments label Dec 12, 2017
@davidnagli davidnagli added this to Discussion in RFC via automation Dec 12, 2017
@davidnagli davidnagli moved this from Discussion to Design in RFC Dec 12, 2017
@eXon
Contributor

eXon commented Dec 12, 2017

Maybe it would be easier to add the hash to the URL as a query parameter instead of in the filename. That way you can easily cache my-app.js?hash-here without having to rename the real files. It's the best of both worlds.
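
For illustration, if the content hash were appended as a query string, the generated HTML might look like this (hypothetical output, not what Parcel currently emits):

<script src="/my-app.js?d99518b2c556df9c6c4d8a2e9bd72423"></script>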

@chee

chee commented Dec 12, 2017

What about creating a random string (build time, perhaps) and using it in the filename?

like

filename = `${hash}.${buildstamp}.${ext}`

With a buildstamp of Date.now().toString(36) you'd get a filename like:

d710beaad39d4ee3906c24983931b45b.jb47tk6c.js

You would get a cache-busted file with every build, and the contents of the file would not need to be known before the filename is generated.

(the buildstamp would be the same for every file in that build)
(the entry would not get stamped)
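
A minimal runnable sketch of that naming (hypothetical helper, not Parcel code):

// one buildstamp per build, shared by every output file
const buildstamp = Date.now().toString(36);

const bundleName = (hash, ext) => `${hash}.${buildstamp}.${ext}`;
// bundleName('d710beaad39d4ee3906c24983931b45b', 'js')
//   -> e.g. 'd710beaad39d4ee3906c24983931b45b.jb47tk6c.js'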

@DeMoorJasper
Member

DeMoorJasper commented Dec 17, 2017

If this is about releases causing cache issues for users, why not just append the version number found in package.json to the hash, or before the extension?
This would need to be implemented in Parcel, not in an after-build script or whatever.
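
A minimal sketch of that idea (hypothetical helper, not Parcel code):

// append the package.json version before the extension
const { version } = require('./package.json');
const bundleName = (hash, ext) => `${hash}.${version}.${ext}`;
// e.g. 'b695675d84099f097ec37d68c8c83fce.1.1.0.js'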

@jouni-kantola

This is a typical scenario I think should be supported. I'm addressing this from a web performance/UX perspective rather than DX.

  1. I have 1/n vendor bundles with filenames that include a hash
  2. I have n code-split bundles with application code, styles, etc.
  3. I fix a bug in an application module and release
  4. The client only needs to download the specific code-split bundle where I fixed the bug

A scenario like this could save the client from re-downloading hundreds of KiB. If the whole release were versioned, everything would be cache busted.

@chee

chee commented Dec 17, 2017

@DeMoorJasper I'm sure there are many people using npm to manage their codebase who aren't bumping their version number every time they make a change, because they aren't publishing it as a module.

Think of a continuous deployment setup where several people are merging pull requests into a main branch that's being built on the server and sent down the tubes.

They'd, as I would, want the cache to be busted per build rather than by the version number (which may not have changed).

When code-splitting, it'd be great for a built file's name to be the same as last time unless a dependency changed, so the same thing never needs to be redownloaded if it isn't changing.

@DeMoorJasper
Member

@chee it was just an idea; I totally forgot about browser caches and web performance, and I'm wondering how this would be implemented.
Now I'm leaning more towards your timestamp approach.

@vforvalerio87

This really should be fixed IMHO. I just adopted Parcel in a project, and every time I make any change to JS or CSS I have to manually add an incrementing number and change the reference in the HTML; otherwise, when I deploy to production (which has browser caching and a CDN), the server won't give me the updated version of those files.
In my opinion the best approach would be the content checksum.

@shawwn
Contributor

shawwn commented Dec 20, 2017 via email

@shawwn
Contributor

shawwn commented Dec 20, 2017

When code-splitting, it'd be great for a built file's name to be the same as last time unless a dependency changed, so the same thing never needs to be redownloaded if it isn't changing.

Good point. The query parameter should be the UTC mtime of the source asset file, not random. That will preserve caching.

One way to do this without breaking asset-level parallelism is to modify HTMLPackager and CSSPackager to scan for bundled URLs (/dist/${hash}.${ext}) and substitute in the query parameter (/dist/${hash}.${ext}?${mtime}).

That can happen as a post-processing step, after bundling. It should be possible to do this efficiently.

That will preserve all the previous advantages, like the fact that assets can generate bundled filenames without needing the contents of the other assets or querying the parent process.
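
A rough post-processing sketch of that idea (hypothetical helper, not Parcel's actual packager code), assuming a map from each bundled URL to its source file:

const fs = require('fs');

// append each source asset's mtime (in seconds) to its bundled URL
function stampBundleUrls(packagedFile, bundleUrlToSource) {
  let contents = fs.readFileSync(packagedFile, 'utf8');
  for (const [bundleUrl, sourcePath] of Object.entries(bundleUrlToSource)) {
    const mtime = Math.floor(fs.statSync(sourcePath).mtimeMs / 1000);
    contents = contents.split(bundleUrl).join(`${bundleUrl}?${mtime}`);
  }
  fs.writeFileSync(packagedFile, contents);
}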

@vforvalerio87

vforvalerio87 commented Dec 20, 2017

The mtime approach is undesirable, though, because it busts the cache for every asset every time. I'd rather stick to manually renaming assets in that case, because I don't want every user to re-download everything every time I deploy a website (possibly numerous times per day).

@shawwn
Contributor

shawwn commented Dec 20, 2017 via email

@jouni-kantola

To be honest, I'd much rather have a slow build where hashing is content-based than have users re-downloading assets when they shouldn't need to. What about a CLI flag?

@Munter

Munter commented Dec 25, 2017

Here are some learnings from Assetgraph, where we solved the same issue.

You absolutely want to do content hashing so you can achieve deterministic, content-addressable file names that lend themselves well to far-future cache expiry. A random build-specific hash busts the cache too often. Query parameters aren't always treated correctly by proxies between the server and client.

You do, however, not need to do content hashing many times. You can get away with doing it once, at the point where you know you are done making source code modifications and are ready to write out to disk.

The hash renaming must be done in a depth-first post-order graph traversal to ensure content hashes update all the way up to the entry points when deeply nested dependencies change. Any other traversal order will result in caching errors.

@chee

chee commented Dec 26, 2017

Query parameters aren't always treated correctly by proxies in between the server and client.

Some proxy software classifies anything with a query string as dynamic content, and so does not cache it at all. This is, for instance, Squid's default behaviour.

@benhutton

@devongovett what are you thinking about this one? Is asset fingerprinting something that you agree should be baked into Parcel's core? Is it something we should try to figure out how to add the correct hooks for, so it can be written as a plugin? Or is it something I should try to accomplish some other way, by writing a post-processor?

@shanebo

shanebo commented Jan 9, 2018

@devongovett, @benhutton and I are willing to invest some time and energy into this fingerprinting issue, but we don't want to head in an implementation direction you aren't a fan of. Would you be open to putting some thought into this with us so we can work on a PR or a plugin?

@shawwn
Contributor

shawwn commented Jan 9, 2018 via email

@devongovett
Member

@shanebo @benhutton I think this will be very difficult to achieve in the current parallel architecture. We can't know the content hash until all assets have been processed, but we need to know the final bundled URL during asset processing so URLs can be placed in the right places (e.g. CSS files linking to images, HTML files linking to CSS, etc.).

If you have suggestions for ways around this, or alternative hashing/versioning strategies, let me know!

@chee

chee commented Jan 10, 2018

@devongovett what if the files were generated as template files, with placeholders for the paths

<script src="{{main.js}}"></script>

then those are compiled afterwards in a separate operation?

(so the hash of the file would be taken over the content of the file with the template markers still in it)

The difficulty might be that if a code-split dependency changes but the parent does not, the parent would still need a cache bust even though nothing in it is different.
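
A rough sketch of filling those placeholders in a separate pass (hypothetical helper names, not Parcel's API), given a map from source names to final hashed filenames:

const fs = require('fs');

function fillPlaceholders(templatePath, outputPath, finalNames) {
  let html = fs.readFileSync(templatePath, 'utf8');
  // replace every {{name}} marker with the final hashed filename
  for (const [name, hashed] of Object.entries(finalNames)) {
    html = html.split(`{{${name}}}`).join(hashed);
  }
  fs.writeFileSync(outputPath, html);
}

// e.g. fillPlaceholders('dist/index.tpl.html', 'dist/index.html', { 'main.js': 'a1b2c3.js' })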

@Munter

Munter commented Jan 10, 2018

@chee oh no, please don't invent conventions that take the source code away from working in a browser :'(

@Munter

Munter commented Jan 10, 2018

@devongovett

We can't know the content hash until all assets have been processed, but we need to know the final bundled URL during asset processing so URLs can be placed in the right places (e.g. CSS files linking to images, HTML files linking to CSS, etc.).

Why do you need to know the final URL before all assets are processed? What if you just used the current names as temporary names? You could actually keep it like this in the development environment, since content-addressable URLs are more of a production feature.

When you do a production build you could just rename the files once more and update the references to them. Or does the build pipeline somehow lose the references to an asset in the middle of the pipeline?

@benhutton

Likely what needs to happen, either as a plugin or as part of the Parcel core, is some sort of post-processor that does the depth-first post-order traversal mentioned at the bottom of #188 (comment).

We assemble the full and final graph and do a quick walk through it, renaming files and then re-referencing them further up.

It should be relatively fast, but I agree that it only needs to happen in production. It could easily be hidden behind some sort of flag on the executable.

As to why we care about this particular strategy so much: it's the only thing that seems to work reliably with CDNs. (That we know of! If someone else has a better solution, I'm all ears.)

  • Query strings are non-ideal, as mentioned above in #188 (comment). They also don't play nicely with image transformation SaaS like imgix. But even if we did go with a query string, it would have the same post-processing concerns that we have here.
  • Versioning the whole directory is both error-prone and leads to overly aggressive invalidation.
  • ETags would be the remaining option, but that takes the CDN approach off the table, removing our assets from the edge locations and potentially bogging down our servers with a ton of unnecessary asset queries.

@shawwn, @shanebo and I will try hitting you up on slack later today to talk through this more.

@chee

chee commented Jan 10, 2018

@Munter I'm talking about doing this as part of the compile step, not as something the developer would have to do; the placeholders would go away.

@Munter

Munter commented Jan 10, 2018

@benhutton I just jumped on your Slack as well (@Munter). Feel free to ping me if you need any feedback on how we implemented this in assetgraph. I don't know if the models are close enough to each other to do the same, but it feels close from inspecting the sources here.

@DeMoorJasper
Member

DeMoorJasper commented Jan 10, 2018

@benhutton what about just keeping the current naming system (for initial and development naming) but renaming all files and references to the content hash at the end of a production build? It's sort of the same as what @chee and you suggested, but I'm pretty sure it'll be way easier to implement.

@benhutton

benhutton commented Jan 11, 2018

@DeMoorJasper I think that maybe we're talking about the same thing? Only change things for production, and do it at the end.

I don't think there is any way around doing a tree traversal, though. That is, I think that this algorithm will NOT work:

  1. Find the md5 hash of every file.
  2. Rename those files to include the hash.
  3. Go back and edit the files to include references to the new file names with the hashes.

Instead, we need to do the tree traversal that @Munter described.

  1. Find your graph.
  2. Find a leaf node.
  3. Hash, rename.
  4. Update references to that file.
  5. Walk the tree back up, repeating for each file.

The idea is that when any given node changes, all the nodes above it will end up changing too as the references trickle up. And any nodes that are NOT affected will NOT change. So you are busting exactly the right caches at the right time.

Here's the big principle: A file doesn't get edited after it gets hashed. The hash is of the FINAL content of that file.
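
A rough sketch of that post-order pass (hypothetical graph shape and helper, not Parcel's internals):

const crypto = require('crypto');
const md5 = text => crypto.createHash('md5').update(text).digest('hex');

// node: { name, ext, contents, children: [childNode, ...] }
function hashAndRename(node, seen = new Map()) {
  if (seen.has(node)) return seen.get(node);       // shared deps are hashed once
  for (const child of node.children) {
    const childName = hashAndRename(child, seen);  // children first (post-order)
    // rewrite this node's references to the child's final, hashed name
    node.contents = node.contents.split(child.name).join(childName);
  }
  // only now hash this node: the hash covers its FINAL contents
  const finalName = `${md5(node.contents)}.${node.ext}`;
  seen.set(node, finalName);
  return finalName;
}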

@DeMoorJasper does that make sense?
@Munter am I describing the algorithm you had in mind accurately?

@Munter

Munter commented Jan 11, 2018

@benhutton That is exactly the right algorithm, and you describe the correct reason for it.

This image always helps me visualise it best:

Traversal order: A C E D B H I G F

It's still important to start at your entry point(s) and just remember to put the hashing logic after child traversal. This is what we do in AssetGraph: https://github.com/assetgraph/assetgraph/blob/master/lib/AssetGraph.js#L445-L462

When you extend Parcel with multiple entry points, you probably want to keep track of seen assets to avoid double work as well.

@fritx

fritx commented Feb 9, 2018

Is there any workaround for now?

@augnustin

augnustin commented Mar 8, 2018

I completely 👍 the MD5 hash naming strategy and I'm glad this was the final pick. Parcel is beautiful in large part because it is plug and play, and it needs to remain that way!

Looking forward to seeing this available in production. Any idea whether this would be within a few months or much longer than that?

Cheers

@augnustin

augnustin commented Mar 15, 2018

I'd like to mention that IMHO this issue is top priority:

For now, my deploys are completely unpredictable. I try many things before I can serve the latest version of my assets. Among those:

  • assets rebuild
  • rm -rf public/* && assets rebuild
  • service nginx restart

Still I get unpredictable results...

This makes it unusable in a real production context.

@augnustin

OK, I fixed my issue by running rm -rf .cache. This might be a separate issue, but I'm reporting it here in case someone faces the same situation. I'll create the other issue when I have more reproducible results to share.

@devongovett
Member

devongovett commented Mar 21, 2018

Should be solved by #1025 which generates content-hashed filenames for static assets. Please help test using the master branch - a release will hopefully come next week!

RFC automation moved this from Design to Done Mar 21, 2018
@augnustin

Wonderful! Great responsiveness.

Definitely willing to test it as soon as it is released. If you can announce it here or in #1025, that would be perfect.

Cheers
