Import to jekyll from unzipped posterous archive directory #12
Feb 21, 2013
This is very nice, thank you for writing it. I've been making some changes to extract more information (tags, etc.). I'm thinking it might be more reliable to work off the XML files that are also included in the Posterous backup, instead of the HTML files, since they contain more information and also the source of each post in HTML. Did you play with this as well? I'll submit a pull request when I'm done with my version.
Hey @zzamboni, thanks for the feedback! I didn't use the XML files because, as far as I could tell, the image links in them point to the Posterous website, whereas the HTML files link to the locally stored images (i.e. those downloaded as part of the archive). Maybe a combination of both is needed?
I don't know if this will be useful to your efforts, but I wrote my own script to import my posts into Octopress, complete with fixed-up links, images, and videos (encoded videos are downloaded from Posterous instead of using the high bitrate originals): https://gist.github.com/nitrogenlogic/5200766
A bit of manual work still has to be done before and after with my script, such as gathering the XML files and minifying images. To generate the single XML file my script would need (since wordpress_export_1.xml has replaced all UTF-8 characters with question marks), I would do something like this:
    cd /path/to/space-[numbers, name, etc.]
    cat head.xml posts/*.xml > fixed_export.xml
    echo '</channel></rss>' >> fixed_export.xml
    cd /path/to/new/blog
    ./posterous_import.rb /path/to/space-[numbers, name, etc.]/fixed_export.xml
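If it helps, the concatenation step above could also be done from Ruby, so it can sit next to the import script. This is only a rough sketch: `build_fixed_export` is a hypothetical helper name (it is not part of the gist), and it assumes the same layout as the commands above, i.e. a `head.xml` plus a `posts/` directory of per-post XML files, with `head.xml` leaving `<rss>` and `<channel>` open.

```ruby
# Hypothetical helper (not part of the gist): builds the single export file
# from the pieces found in an unzipped Posterous space directory.
def build_fixed_export(space_dir, out_name = "fixed_export.xml")
  head = File.join(space_dir, "head.xml")
  posts_dir = File.join(space_dir, "posts")
  raise ArgumentError, "no head.xml in #{space_dir}" unless File.file?(head)
  raise ArgumentError, "no posts directory in #{space_dir}" unless File.directory?(posts_dir)

  # List the directory instead of globbing, so unusual characters in the
  # space directory name are never treated as glob patterns. Hidden files
  # are skipped, like the shell's posts/*.xml. Sorting is bytewise, which
  # may differ slightly from the shell's locale-aware order.
  posts = Dir.entries(posts_dir)
             .select { |name| name.end_with?(".xml") && !name.start_with?(".") }
             .sort
             .map { |name| File.join(posts_dir, name) }
  raise ArgumentError, "no XML files in #{posts_dir}" if posts.empty?

  out_path = File.join(space_dir, out_name)
  # Binary mode copies the bytes untouched, so UTF-8 text is not re-encoded.
  File.open(out_path, "wb") do |out|
    ([head] + posts).each { |path| out.write(File.binread(path)) }
    # Close the elements that head.xml is assumed to leave open.
    out.write("</channel></rss>\n")
  end
  out_path
end
```

Calling `build_fixed_export` with the space directory returns the path of the generated file, which can then be passed to `./posterous_import.rb` as in the last command above.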