I changed the way Middleman keeps track of the resources it has yet to built. Instead of initially creating the list of files to built from the sitemap, I implemented a queue that updates itself during the build process. This allows resources to inject other resources into the sitemap dynamically.
The idea originated from a helper I wrote that dynamically generates thumbnail images using RMagick and adds these images to the sitemap. This works fine in development. The build process, however, ignores the generated thumbnails since they have been added to the sitemap later on when the list of resources to render has already been determined.
This pull request does not change any of the existing build behavior. It merely adds the possibility of having resources that generate other resources on-demand.
Adds a build queue that keeps track of the resources that have yet to…
… be built including those resources that have been added dynamically during runtime
Adds a test to ensure that a page that was added to the sitemap after…
… the build process had already started gets rendered eventually
Good idea, this may help with Compass' sprites too.
Won't be able to add such a change to the stable release (3.0.x), but we'll pull this in for 3.1.x
I'm a little confused - could you share your plugin? If your plugin adds resources to the sitemap via a manupulate_resource_list plugin, you shouldn't need this.
@t6d do you have the source of your plugin up anywhere?
@bhollis Sorry, the source is not available anywhere. I doubt that manipulate_resource_list would make a difference. The sitemap is not the problem here. Updating the sitemap works just fine. This even works during compilation. The problem is that even though the sitemap gets updated it doesn't make a difference. This is due to the fact that the list of resources to build is generated in advance and not updated during runtime.
I'll again try to explain what my code does: My code injects thumbnails to the sitemap on demand. Meaning whenever there is a call to thumbnail_image('/path/to/image') in one of my pages, a new resource is created with the appropriate image data and injected into the sitemap. This works just fine and as expected in development. The problem is the following code snippet:
# taken from middleman-core/lib/middleman-core/cli/build.rb
resources = @app.sitemap.resources.sort_by do |r|
sort_order.index(r.ext) || 100
resources.each do |resource|
# building the resource (code omitted)
Here, you generate the list of resources and then build each of these resources. If one of these resources now generates another resource, e.g. a thumbnail image, then the sitemap gets updated, but the list of resources does obviously not get updated because it is only generated once in advance.
Ah, now I understand. That makes perfect sense. I'd support merging this, then.
So if the sitemap doesn't rebuild as these new items are added, that means they can't be effected by other plugins/code, which seems weird. Adding an html page wouldn't run directory_indexes on it.
Seems like we actually need some kind of "preflight" mode. We already try to do this for Compass, which generates sprite assets during build. This gets nasty and recursive fast.
@tdreyno Again, the sitemap itself is not the problem here. The problem is that the resource I'm creating is added to the sitemap when evaluating the template. This apparently happens after the list of sites to build has already been determined. As I understand it, the process works the following way:
(1) Generate sitemap --> (2) Generate list of sites to build --> (3) Evaluate templates and write results to disk
Now, as long as we are in Step 1 modifications to the sitemap made by plugins will have effect. If content gets added to the sitemap in Step 3, however, this will have no effect as the list of resources to build has already been determined. All my implementation does is checking for changes made to the sitemap in Step 3. If such a change has occurred the list of resources to build is updated. The BuildQueue simply keeps track of resources that have been built and those that have yet to be processed by their URL – which should be suffices as unique identifier.
The problem as described here, never occurs in development. Only in production. I would assume that this is due to the fact that in development the content is directly served from the sitemap. Meaning the moment I update the sitemap the content is available to the client. Considering how a web pages is served this makes perfectly sense. For instance, if I have a page that contains a thumbnail which is dynamically generated, the resulting HTML sent back to the client will contain a reference to this thumbnail. Before sending back any HTML though the sitemap will get updated. Meaning, all subsequent requests will have access to this updated sitemap. Now, when the client evaluates the HTML, it will stumble across the reference to the thumbnail and retrieve the data from the server which of course will work since the data is actually contained in the sitemap. Meaning, all is good. This however does not work if you generate the list of resources to build in advance.
I don't think you need a "preflight" mode. All you need is to keep track of what resources need to be built and what resources already have been built. This is what my implementation actually does.
To quickly sum it up: The sitemap is not the problem. The way you keep track of sites yet to build is.
I really like the build queue idea, it'll give us a nice place to optimize copying binary files and to control the build process outside of the CLI. I'm just concerned about the API exposed and the expectations that it implies.
I'd like to merge this as a nice improvement to internal structure without deciding on whether templates should be dynamically generating assets to build yet.
I was recently stung by the same issue this PR attempts to solve. While this does solve the problem, it comes at a pretty heavy-- O(n^3) I think -- performance hit since it has to iterate over multiple arrays for each iteration of the resource list to discover additions.
One simple alternative would be to delay this resource discovery until the end of each sweep through the resources array. Ideally, it seems like the better solution would be some refinement to the resource list manipulators infrastructure to specifically handle the case of adding resources (whether at the beginning of the build process, or at render time). Something like a Middleman::Sitemap::Store#add which could both add the resource to the sitemap & also push it into the build queue if one exists.
I'd like to propose this simple solution to this problem instead of moving to the full-on build queue (at least for the time being): btucker@5e9ff87
Would this be accepted into 3.0-stable?
@btucker, I like your patch, though I'd switch to a Set instead of just an Array. Could you set this up as a new PR with tests?
I'd still like to pull @btucker's code in with some slight improvements (using Set).
@bhollis I'd like to see that happen too!
Closing this, hopefully @btucker will have some time to submit the PR. We'll get it in first thing to 4.0