
[Question] Generating static sites on-demand with Nuxt.js #2370

Closed
jameswragg opened this issue Dec 11, 2017 · 14 comments
@jameswragg

jameswragg commented Dec 11, 2017

Hi, I'm trying to build an on-demand static-site API, using hapi.js to power the API and Nuxt.js as the static-site generator, but I'm hitting some blockers.

Here's my setup:

  1. The API takes a POST payload
  2. Based on the payload the API dynamically creates the generate.routes array (passing payloads in for each route)
  3. Calls generate() to create the static site in ./dist
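Step 2 above is essentially a pure mapping from the inbound POST payload to Nuxt's `generate.routes` format. A minimal sketch (the payload shape `{ pages: [{ slug, content }] }` is a hypothetical, not taken from the actual API):

```javascript
// Hypothetical inbound shape: { pages: [{ slug, content }, ...] }.
// Maps the API payload to Nuxt generate.routes entries, passing each
// page's content through as that route's payload.
function buildGenerateRoutes (sitePayload) {
  return sitePayload.pages.map(page => ({
    route: '/' + page.slug,
    payload: page.content
  }))
}
```

The resulting array can then be assigned to `nuxt.options.generate.routes` before calling `generate()`.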

Issue:

This API is potentially going to be hammered, building a lot of individual content pages and pushing them to S3 buckets. Having every generated site go to the same ./dist directory is a problem: I currently use an isGenerating flag to prevent concurrent generate calls, forcing the client to retry the API.

  1. Is using Nuxt as a static-site generator on-demand a ridiculous idea?
  2. If (hopefully) not, are there better ways of managing the output, potentially streaming straight to an S3 bucket?
  3. Should the generator be able to handle parallel builds?

Thanks in advance,
James

This question is available on Nuxt.js community (#c2062)
@pimlie

pimlie commented Dec 11, 2017

Could you explain your use-case a bit more? How many pages / routes are you talking about? What do you use the isGenerating flag for? Why is everything in ./dist a problem?

In my experience it's not really useful to use payloads when generating thousands of pages. The memory overhead is too high compared to the overhead of an extra axios request in asyncData, and generating thousands of payloads at once simply takes too long. It's better to spread the load on the server by just making requests from asyncData. This probably depends directly on the complexity and speed of your payload contents, API, and/or API server.

My use case is as follows: I have a website with, say, ~12,000 product pages. The backend runs on two servers, one for the API and one for Nuxt. Both servers have identical specs; in my case generating a page with Nuxt takes roughly the same time as generating the product data, so I can use this 1:1 ratio. The ratio you need will probably differ, depending on the complexity of your Nuxt pages and API.
The prices of these products can update every hour, so I pass the last-finished timestamp parameter from nuxt-generate-cluster to my routes API endpoint, which lists the routes for products that have been updated since the last time nuxt-generate-cluster ran. Sometimes that is just a couple of products, sometimes thousands.
nuxt-generate-cluster is started by a cronjob every hour. Generating all ~12,000 pages takes about 20 minutes, so I don't have to worry about two nuxt-generate-cluster instances running at the same time (which could give unexpected results).

My website is not hosted on S3; I just have a small lightweight VPS, so I'm not sure whether my use case is helpful for you. After nuxt-generate-cluster has finished, I run a script that checks whether the chunk hash has changed. If it hasn't, I rsync directly to the remote pub folder on the VPS (as rsync is atomic). If the chunk hash has changed, I rsync to a temporary pub folder on the VPS and use a mv command to replace the old pub folder with the new one.
In my case the rsync finishes within 5 minutes or so. Since 20min + 5min is well below the 1-hour mark (the interval the cronjob runs at), this is OK without further checks to prevent multiple instances running at the same time. You'll need to check what works well for your situation.

If you want to 'stream directly to S3' after a page has been generated, you would need to copy the bin/nuxt-generate script and add a hook on generate:page or generate:routeCreated. You can then upload the generated page to S3 immediately within that hook.
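A rough sketch of that idea, wired through Nuxt's config-level hooks. Treat this as an assumption-laden outline: `uploadToS3` is a hypothetical helper, and the exact hook name and argument shape vary by Nuxt version, so check the generator hooks for the release you are on:

```javascript
// nuxt.config.js (sketch only, not verified against a specific version)
module.exports = {
  hooks: {
    generate: {
      // Assumed signature: called once per generated page with its
      // route path and rendered HTML.
      page (page) {
        // uploadToS3 is a hypothetical helper you would provide,
        // e.g. a thin wrapper around the AWS SDK's putObject.
        return uploadToS3(page.path, page.html)
      }
    }
  }
}
```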

Hope this helps.

@jameswragg

jameswragg commented Dec 11, 2017

This is incredibly helpful, thanks.

Could you explain your use-case a bit more?

It's quite a different setup to yours in that I'm building a site builder. The output is a single-page or multi-page site constructed on the fly from the inbound API payload. The output in ./dist is the stand-alone site 'package' that is then uploaded anywhere (e.g. S3 / gh-pages).

The setup I have is a single Nuxt page that I'd like to act as a catch-all for all inbound routes. The routes, and the data needed to render each page, are provided to .generate() from the API's payload, so the asyncData in the Vue app simply returns the payload the route received.
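That asyncData pass-through might look like the sketch below. The file name is an assumption (catch-all page support differs between Nuxt versions; older releases may need a dynamic route such as `pages/_slug.vue` instead):

```javascript
// pages/_.vue <script> section (file name is an assumption)
export default {
  asyncData ({ payload }) {
    // During generate(), the payload attached to each entry in
    // generate.routes arrives here; return it as the page data.
    return { page: payload }
  }
}
```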

Can I have a single Nuxt page as a catch-all to render all incoming routes?

How many pages / routes are you talking about?

I would imagine anywhere from 1 - 10 pages per site.

What do you use the isGenerating flag for? Why is everything in ./dist a problem?

It may be a little clearer now why I have the isGenerating flag: I can't have multiple nuxt-generate runs sharing the same output dir, as I need to package up the output and upload it to S3 automatically.

Hopefully that's a little clearer as to what I'm trying to do @pimlie ?

@pimlie

pimlie commented Dec 11, 2017

You can use the generate.dir property to change the default dist output folder. If you change that prop during run-time nuxt-generate should work for multiple jobs at the same time (assuming you dont build nuxt each time).

@jameswragg

Interesting. If I update it in nuxt.config.js then the output changes; however, setting nuxt.options.generate.dir dynamically just before calling Generator.generate({ build: false }) doesn't seem to take effect.

@pimlie

pimlie commented Dec 11, 2017

Hmm yeah, internally the generator uses this.distPath. See https://github.com/nuxt/nuxt.js/blob/dev/lib/builder/generator.js#L16
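Since `distPath` is captured once in the Generator constructor, one workaround is to set `generate.dir` before constructing a fresh `Builder`/`Generator` per job. A sketch against the Nuxt 1.x-era programmatic API, using only names mentioned in this thread; untested, and details may differ between versions:

```javascript
const { Nuxt, Builder, Generator } = require('nuxt')

// Sketch: per-job output directory. generate.dir must be set before
// the Generator is constructed, because distPath is only read in its
// constructor.
async function generateSite (config, outDir, routes) {
  config.generate = Object.assign({}, config.generate, {
    dir: outDir,
    routes
  })
  const nuxt = new Nuxt(config)
  const generator = new Generator(nuxt, new Builder(nuxt))
  await generator.generate({ build: false })
}
```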

@jameswragg

Ah, thanks. Now that I'm creating a new Builder & Generator instance with each site request, they end up in different directories (as specified by distPath), but now I have context issues: they all contain the same content.

If I try to generate 3 different pages in parallel (I'm using a simple bash script with 3 curls posting different data), then the three output sets in my ./dist directory contain the same generated content, even though different content was passed in the route payloads.

Is this a Nuxt generate() concurrency issue, or my code? I'm guessing a test repo is needed; I'll see if I can pull one together tomorrow.

@jameswragg

I've created a reduced test-case that demonstrates the issue here:
https://github.com/jameswragg/nuxt-generate-parallel

Any help would be gratefully received.

@jameswragg

A colleague just pointed out that require-ing nuxt.config.js returns a single cached reference, so every generate call was sharing the same config object. I changed nuxt.config.js to return a function and all is good! I've updated the repo for reference.
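The underlying issue is plain Node.js semantics: `require` caches the module, so every job mutates one shared config object, whereas a factory function returns a fresh object per call. A minimal illustration of the difference:

```javascript
// Shared object: every "job" holds the same reference, the way a
// cached `module.exports = { ... }` from nuxt.config.js would.
const sharedConfig = { generate: { dir: 'dist' } }

// Factory: each call produces an independent config object, which is
// why exporting a function from nuxt.config.js fixes the clash
// between parallel generate jobs.
const makeConfig = () => ({ generate: { dir: 'dist' } })

const jobA = sharedConfig
const jobB = sharedConfig
jobA.generate.dir = 'dist-a' // clobbers jobB's dir too

const jobC = makeConfig()
const jobD = makeConfig()
jobC.generate.dir = 'dist-c' // jobD is unaffected
```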

Thanks for the pointers @pimlie

@peckomingo

I did some research on the subject as well and stumbled upon your posts @pimlie, this one and this.

We have to solve a similar challenge at the moment: I am trying to generate up to 50k product pages from a product catalog as static pages. We want to fetch the data from a headless CMS and pass it to Nuxt, which should generate the pages. It is not a shop, so no complexity in that regard.

You already described your approach quite well in the post above. I can't follow it through completely though, but before I put more effort into this I wanted to ask you: what is your conclusion, and would you do it again in a similar way?

  • As I understand it, you tackled the performance issue with nuxt-generate-cluster, right?
  • How did you set up your dynamic routes for the 12k pages? We also have to solve i18n; implementing it would be possible by passing an array of the routes to generate. But does it swallow an array of 50k routes without hiccups?
  • What will happen if two content editors edit the product catalog within a short period? If the generation process is already ongoing and another editor publishes an update, there would be a mismatch in the topicality of the datasets.

I'd really appreciate a few words from you about your experience with this, now some time after you implemented it.

Cheers and thank you! 🙏

@qm3ster

qm3ster commented Mar 5, 2018

@jameswragg payload may still be useful if, instead of loading all of the data up front, you make the payload a function.

@pimlie

pimlie commented Mar 8, 2018

@jameswragg

  • As long as we're talking about the process of generating tens of thousands of pages: nuxt-generate-cluster allows you to utilise all the CPU cores in your system, so generating those pages scales almost linearly (as long as your API can keep up). To be clear, nuxt-generate-cluster does not improve performance when generating a single page.
  • I just load all the routes into memory; currently that's about 20k or so. It's only a couple of megabytes at most, so that shouldn't be hard for even a modest server. Pre-loading the payload for all those pages into memory, however, was a no-go: Node.js simply hung, it couldn't cope with the amount of data required.
  • If that is a problem, it's up to you to implement some sort of locking, I guess, when requesting the routes through the routes method (and release the lock through the done method). Or implement some kind of versioning; nuxt-generate-cluster should provide you the tools to do that. But as the implementation has to be done in your API, it can't help you further.

Or you could implement an iterative approach. E.g. make sure that your routes API endpoint returns not only the routes of changed products but also all the related pages which were (possibly) influenced by them. Then after the first iteration you request all the routes that were (possibly) changed since nuxt-generate-cluster last started, and keep doing that until routes returns 0 routes.

In my setup this is not really an issue; my product pages don't refer to each other. It's mostly category pages referring to product pages, and if one product changes I just re-generate that page and all the category pages (e.g. cat A page 1, cat A page 2, etc.). In other words, I don't actually bother with generating the pages as a single transaction. My data changes all the time, so I just generate the pages every 1 or 2 hours and accept that my data can be stale for at most 2 hours.

Thinking about it, you could pre-load all the payload data, save it to disk, and then implement a payload method like qm3ster suggested which returns the payload data from disk. This would solve your single-transaction issue, but I found that (at least in my situation) this wouldn't give any performance benefit, as my API is quite heavy and it was better to just spread the load evenly over time.

I have been looking into a project to implement a push strategy for generating pages (nuxt-generate-cluster pulls from the API; this project would accept pushes from the API). That way you could implement real continuous deployment with static files generated on demand: every time your product data changes, a new page for that product is automatically generated at the same time. But this project will probably take a while to implement and mature...

@qm3ster But is there really a performance benefit to that approach (from the Nuxt point of view)? What's the difference between calling a payload method and asyncData?

@pimlie

pimlie commented Mar 8, 2018

Sorry, that should be @peckomingo :)

@qm3ster

qm3ster commented Aug 25, 2018

@pimlie the purpose of the payload API assumes you can get a bunch of data all at the same time.
Either you are getting it through a back door (directly via JS, without making a loopback call to the server),
or you are including in the payload some data that is common to many pages, so they don't each fetch it individually.

```javascript
// getSeeds, getSharedData and backdoorFunction are placeholders for
// whatever fetches your data.
module.exports = {
  generate: {
    async routes() {
      // Fetch the route seeds and the shared data in parallel
      const [
        seeds,
        sharedData
      ] = await Promise.all([
        getSeeds(),
        getSharedData()
      ])
      // One route per seed; the shared data goes into each payload so
      // the pages don't fetch it individually
      return seeds.map(seed => ({
        route: '/bepis/' + seed,
        payload: {
          data: sharedData[seed],
          backdoor: () => backdoorFunction(seed)
        }
      }))
    }
  }
}
```

@lock

lock bot commented Nov 1, 2018

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked as resolved and limited conversation to collaborators Nov 1, 2018
@danielroe danielroe added the 2.x label Jan 18, 2023