Generating a site with about 60K posts takes forever #560
Tried to migrate a site from Drupal to Jekyll. Unfortunately there is no migration script for Drupal 7, so I generated all of the individual posts (57,882 of them) in the _posts folder and ran Jekyll. It took forever and never succeeded, even on a server with 12GB of RAM and 16 CPU cores.
Is Jekyll suitable for a site of this volume? What should I do to speed things up?
I am not a programmer, so let me paraphrase what you said and see if I get it. I should write a script that instructs Jekyll to generate, e.g., 1000 articles per run. Translated into actions: move 1000 articles into _posts and run Jekyll, then move another 1000 articles in and re-run Jekyll (see the sketch after this comment). I think this works for individual posts, but not for aggregated pages: archive.html, categories.html, etc.
The tricky part is categories and tags, since Jekyll has to collect all posts to know that a category has X posts.
How do I go about that?
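A rough sketch of that batching workflow (the _staging folder name and the batch size are assumptions, not anything Jekyll provides):

```ruby
# Sketch only: move posts into _posts in chunks and rebuild after each chunk.
# "_staging" is a hypothetical folder holding all 57,882 generated posts.
require 'fileutils'

STAGING = '_staging'
BATCH   = 1000

Dir.glob(File.join(STAGING, '*')).sort.each_slice(BATCH) do |chunk|
  chunk.each { |f| FileUtils.mv(f, '_posts') }
  # Each run rebuilds everything moved so far, so aggregate pages are only
  # complete (and correct) after the final run.
  system('jekyll') or abort 'jekyll build failed'
end
```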
According to the docs, Jekyll can't incrementally generate a site:
1. Jekyll collects data.
2. Jekyll computes data.
@404pnf, you are right: if you batch-process your posts, you lose the aggregate data for the entire post collection. Unfortunately, decreased performance when handling large quantities of posts is a known issue for Jekyll. However, it seems you have plenty of processing power on your server. Can you successfully compile, say, 100 posts? 1000? etc.?
Yeah, check to see what the limit is. It may be 10,000. Because each post and page is held as a separate object, building 60,000 posts surely requires a huge amount of memory. You may have the capacity, though.
It'd be cool to add a
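One way to probe that limit, as a minimal sketch; it assumes the --no-auto and --limit_posts flags quoted later in this thread work on your Jekyll version:

```ruby
# Sketch: rebuild with progressively larger post limits and stop at the
# first count that fails, to find roughly where the build breaks down.
[100, 1_000, 10_000, 30_000, 57_882].each do |n|
  ok = system("jekyll --no-auto --limit_posts=#{n}")
  puts "#{n} posts: #{ok ? 'built' : 'FAILED'}"
  break unless ok
end
```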
I will follow the advice, test the limits, and report back.
My observation from previous tries is that the RAM Jekyll requires is roughly two to three times the total size of the posts.
In my case, if I process all the posts (247MB) in one run, the RAM usage shown by the top command is 450-600MB RES and stays there. If the posts total 41MB, it takes 112MB RES RAM.
Benchmark testing
All aggregated pages are disabled; the YAML front matter contains only layout and title.

| number of posts | elapsed time | RES RAM | VIRT RAM | CPU | load average |
| --- | --- | --- | --- | --- | --- |
| 2000 | < 10 mins | 317MB | 354MB | 100% | 1.6-3 |
I think eventually the 30K run will finish successfully.
Huge RAM doesn't help, since memory usage is predictably less than three times the size of all posts.
A multi-core CPU doesn't help, since Ruby only uses one core.
For this volume, if I don't need any aggregate pages, I would like a generator that reads in the paths of all files to be converted and processes them one by one. But I am not a programmer. Would anyone give a skeleton code example of how to do that in Jekyll, something like the sketch below? I will try to figure out the rest.
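A generator plugin would still run inside Jekyll's one-big-site object, so the simplest way to get true one-at-a-time processing is probably a standalone script that bypasses Jekyll entirely. A minimal sketch, assuming kramdown-formatted posts with a YAML front matter block (the paths are placeholders):

```ruby
# Sketch: convert each post by itself and release it before the next one,
# so memory use stays flat no matter how many posts there are.
require 'kramdown'
require 'fileutils'

SRC  = '_posts'       # placeholder: folder with the Markdown posts
DEST = '_site/posts'  # placeholder: output folder

FileUtils.mkdir_p(DEST)

Dir.glob(File.join(SRC, '*.md')).each do |path|
  text = File.read(path)
  body = text.sub(/\A---\s*\n.*?\n---\s*\n/m, '')  # drop the YAML front matter
  html = Kramdown::Document.new(body).to_html
  out  = File.join(DEST, File.basename(path, '.md') + '.html')
  File.write(out, html)  # written and released; no site-wide object is kept
end
```

This skips layouts and permalinks entirely; it only shows the one-file-at-a-time shape. Wrapping each page in a layout would be a string substitution on html before writing.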
In my case, for the aggregated pages I can use another generator just for that, since my aggregate pages only need the title and URL of each post (categories, tags, archive, and sitemap); the exception is atom.xml, which I don't use. So the code only needs to collect the YAML front matter and generate four pages. I don't know how to do that either. :)
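A sketch of that front-matter-only pass, assuming Jekyll's default /:year/:month/:day/:title.html permalink so the URL can be derived from the filename; only archive.html is shown, but the categories, tags, and sitemap pages would follow the same pattern:

```ruby
# Sketch: build an archive page from titles and URLs alone. Each post's body
# is read and immediately discarded; only the YAML header is parsed.
require 'yaml'

entries = Dir.glob('_posts/*.md').map do |path|
  header = File.read(path)[/\A---\s*\n(.*?)\n---/m, 1]
  data   = (header && YAML.load(header)) || {}
  # Assumes Jekyll's filename convention: YYYY-MM-DD-title.md
  name = File.basename(path, File.extname(path))
  date, slug = name[0, 10], name[11..-1]
  { title: data['title'] || slug,
    url:   "/#{date.tr('-', '/')}/#{slug}.html" }
end

File.open('archive.html', 'w') do |f|
  f.puts '<ul>'
  entries.each { |e| f.puts %(<li><a href="#{e[:url]}">#{e[:title]}</a></li>) }
  f.puts '</ul>'
end
```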
@404pnf you don't need to spawn a server if you are just trying to compile the site. I don't know whether it matters performance-wise, though:
```
$ jekyll --kramdown --no-auto --limit_posts=num
```
Disabling the pages that display aggregate data does not stop Jekyll from computing that data. It's a consequence of the one-huge-site-object design.
I think it might be possible to hack Jekyll to process one page at a time, with the obvious consequence that there will be no aggregate data.
I'm currently working on my own static blog generator, which might address these bottlenecks. I'd be willing to test your website against my engine: http://ruhoh.com. I can't promise anything, but ruhoh takes a more functional approach, so I can very likely get it to process one file at a time. Additionally, I might even be able to do a kind of two-stage processing.
I'm much more committed to my project than to Jekyll at this point, so let me know if you want some help working on your site.
@plusjade I tried ruhoh; very promising! Especially when working with a large volume of posts, ruhoh tells you what it is currently doing. That information is comforting, because staring at a blank screen makes you think the program isn't working.
I do have some questions; I will post them to the ruhoh issue tracker.
Here's a case where I ran into this wall.
I've slowly been making moves to collect everything I post online under one (jekyll-managed) roof. As part of this, I exported my entire Twitter history and wrote a quick script to shred tweets into post files. HFS+ and Git handle these ~13,500 new files gracefully; Jekyll doesn't.
I'm not surprised I ran into this wall; I certainly didn't expect Jekyll to perform nicely here. Then again, perhaps an argument can be made that somewhere in the distant future, Jekyll really should handle this scale gracefully?
@davepeck We are working toward this goal; however, Jekyll will still have to process all 13,000 posts. Jekyll 3 does have an
If you're willing to work on the items above or have further input, we can start a tracking issue for speeding up Jekyll if there isn't one already. Thanks!