Fetch Stack Exchange posts for publishing on a Jekyll-powered blog.
Usage: ./se2jekyll.rb -s SITE post_id ...
    -s, --site SITE        Site name
    -t, --tags TAG(S)      Space-delimited lowercase tags
    -h, --help             Display this screen
se2jekyll.rb Meta.Puzzling 3020 > site_evaluations.md
All posts to Stack Exchange acquire a Creative Commons license that allows republication with attribution. This script uses the Stack Exchange API to get a copy of a post for publication via Jekyll, with the primary benefit being added front matter. It's intended to be mostly automatic, but you might want to put it in your drafts folder for revision before publishing.
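As a rough illustration of the API call involved, here is a minimal sketch that builds a request URL for one or more posts. The helper name `posts_url`, the API version (2.3), and the built-in `withbody` filter are assumptions made for this example; the script itself may request different fields.

```ruby
require "uri"

# Hypothetical helper: build a Stack Exchange API URL for one or more posts.
# The API version and the "withbody" filter are assumptions, not necessarily
# what se2jekyll.rb sends.
def posts_url(site, post_ids, filter: "withbody")
  ids = Array(post_ids).join(";") # the API takes semicolon-delimited ids
  query = URI.encode_www_form(site: site, filter: filter)
  URI("https://api.stackexchange.com/2.3/posts/#{ids}?#{query}")
end

puts posts_url("Meta.Puzzling", 3020)
```

The response is JSON with the posts under an `items` key, so a fetch step would parse that and read the fields it needs.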
The API site parameter can be extracted from the API itself, or you can make a guess based on the URL of the site. For instance, Movies & TV's URL is http://movies.stackexchange.com/, so its site parameter is just movies. The meta site is Meta.Movies. Note that capitalization does not matter to the API, but the string will be used in the attribution text as-is.
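The guessing step can be sketched in a few lines of Ruby. The helper `guess_site_param` is hypothetical and not part of the script. It assumes network sites live under stackexchange.com and that standalone domains (such as stackoverflow.com) use the name before the top-level domain. The result is lowercase, so adjust the capitalization by hand if you care how it reads in the attribution text.

```ruby
require "uri"

# Hypothetical helper: guess the API site parameter from a site URL.
def guess_site_param(url)
  host = URI(url).host.to_s.downcase.sub(/\Awww\./, "")
  if host.end_with?(".stackexchange.com")
    parts = host.delete_suffix(".stackexchange.com").split(".")
    # Treat "movies.meta" and "meta.movies" alike: put "meta" first.
    parts.rotate!(-1) if parts.length > 1 && parts.last == "meta"
    parts.join(".")
  else
    # Standalone domain: drop the top-level domain (an assumption).
    host.sub(/\.[a-z]+\z/, "")
  end
end

guess_site_param("http://movies.stackexchange.com/")      # "movies"
guess_site_param("http://meta.movies.stackexchange.com/") # "meta.movies"
guess_site_param("https://stackoverflow.com/q/55885729")  # "stackoverflow"
```

This is only a guess; the API's own list of sites remains the authoritative source.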
Every question and answer has a unique (to the site) ID. The second parameter* is that number, which may be found in the URL or by examining the share link at the bottom of a post. You can find your post_ids by your display name or via the API.
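If you have a link rather than a number, the ID can be pulled out mechanically. The sketch below is a hypothetical helper, not part of the script. It assumes the common link shapes: `/questions/ID/slug`, the short share forms `/q/ID` and `/a/ID`, and long answer links that end in `#ID`. The example URLs are made up.

```ruby
require "uri"

# Hypothetical helper: extract a post id from a Stack Exchange link.
def post_id_from_url(url)
  uri = URI(url)
  # Long answer links carry the answer id as a numeric fragment.
  return uri.fragment.to_i if uri.fragment.to_s.match?(/\A\d+\z/)

  match = uri.path.match(%r{\A/(?:q|a|questions)/(\d+)})
  match && match[1].to_i
end

post_id_from_url("https://example.stackexchange.com/q/3020")                  # 3020
post_id_from_url("https://example.stackexchange.com/questions/10/slug/11#11") # 11
```

Anything that doesn't match these shapes returns `nil`, which is a cue to look the ID up by hand.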
If you want to customize tags, use the -t option. Pass it any number of space-delimited strings like this:
se2jekyll.rb -s stackoverflow 55885729 -t "libcurl curl locked"
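For readers curious how such a command line might be handled, here is a minimal sketch using Ruby's standard OptionParser. The method name `parse_args` and the shape of the options hash are assumptions for illustration; the script's real option handling may differ.

```ruby
require "optparse"

# Hypothetical re-creation of the option handling described above.
def parse_args(argv)
  options = { site: nil, tags: [] }
  parser = OptionParser.new do |opts|
    opts.banner = "Usage: ./se2jekyll.rb -s SITE post_id ..."
    opts.on("-s", "--site SITE", "Site name") { |site| options[:site] = site }
    opts.on("-t", "--tags TAG(S)", "Space-delimited lowercase tags") do |tags|
      # One quoted string becomes a list of lowercase tags.
      options[:tags] = tags.downcase.split
    end
  end
  # permute accepts options before or after the post ids and
  # returns whatever is left over (the ids) without modifying argv.
  post_ids = parser.permute(argv)
  [options, post_ids]
end

options, post_ids = parse_args(["-s", "stackoverflow", "55885729", "-t", "libcurl curl locked"])
# options  => {:site=>"stackoverflow", :tags=>["libcurl", "curl", "locked"]}
# post_ids => ["55885729"]
```

Quoting the tag list matters: without the quotes, the shell would pass `curl` and `locked` as extra positional arguments, and they would be treated as post IDs.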
If all goes well, a converted version of the post will be sent to STDOUT. It will include some Jekyll front matter (tuned to my blog's configuration), a short attribution notice, and the body of the post. It's your responsibility to redirect the output to an appropriate file.
Depending on which Markdown renderer you use, you might find some strangenesses in the HTML output. For instance, Stack Exchange parses two block quotes separated by a blank line as a single block, while GitHub Flavored Markdown parses them as two blocks. Many other quirks won't matter too much, but that one is pretty visible to me. The moral of the story is to leave room for edits.
I've tried to fill in sensible values for the front matter. A few quirks to note:
I use the question title as the post title, which is often a reasonable choice. Not everyone has the titling skill, however.
If a title includes a #, I encode it as an HTML entity, since # usually begins a comment in the YAML front matter block.
I also encode the colons in titles with multiple colons as &colon; for a similar reason. But I interpret titles with one colon as being a title and a subtitle. This might not work for your blog, but it works for mine. I suspect this should be optional behavior if anyone besides me uses this script.
Currently, the default tag is
meta-posttag, which works for my blog's tagging system but might not for yours.
The date is set to the creation date of the post on Stack Exchange, not the current date.
I include two custom variables:
These should not cause problems unless your blog layout has conflicting definitions of these variables.
Feel free to fork my code if these choices don't make sense for your purposes.
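If you do fork it, the title and date rules above are the part most likely to need changing. The following is a minimal sketch of those rules, not the script's actual code. The helper names, the `subtitle` key, the `&#35;` entity for #, and the date format are all assumptions. The one external fact it relies on is that the API reports a post's creation date as Unix epoch seconds.

```ruby
# Hypothetical helper mirroring the title quirks described above.
def title_fields(raw_title)
  title = raw_title.gsub("#", "&#35;") # assumed entity for "#"
  case title.count(":")
  when 0
    { "title" => title }
  when 1
    # Exactly one colon: treat it as "title: subtitle".
    main, sub = title.split(":", 2).map(&:strip)
    { "title" => main, "subtitle" => sub }
  else
    # Several colons: encode them all so YAML doesn't misread the line.
    { "title" => title.gsub(":", "&colon;") }
  end
end

# Hypothetical helper: assemble a front matter block.
def front_matter(raw_title, creation_epoch, tags)
  fields = title_fields(raw_title)
  # Use the post's creation date, not the current date.
  fields["date"] = Time.at(creation_epoch).utc.strftime("%Y-%m-%d %H:%M:%S +0000")
  fields["tags"] = tags.join(" ")
  lines = fields.map { |key, value| "#{key}: #{value}" }
  (["---"] + lines + ["---"]).join("\n")
end

puts front_matter("C#: a question", 0, ["example"])
```

Making the one-colon subtitle rule optional would come down to a flag that skips the `when 1` branch.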
- Not really a bug of this script but rather of, um, the current paradigm of font usage on the web: some pages might not render properly if your site doesn't use a font family with the needed glyphs. See The Tony the Pony problem.
There are several things we could look up with an extra API call or two. In particular:
- Question tags
- Actual site name
- List of editors to credit and not just the owner
Obtain multiple posts based on some criteria such as author.
Maybe make use of OAuth identification somehow.
There's a Ruby library for the Stack Exchange API. Should I use it?
* Technically, you can pass multiple post_ids. Currently, only one is really supported since the output is sent to STDOUT rather than individual files. It's not terribly hard to break
the posts out based on front matter, however, so I left this as