Import to jekyll from unzipped posterous archive directory #12
Nice!
This would be awesome to get merged soon. Posterous shuts its doors at the end of April 2013.
@jroes Does this code work for you?
I just fixed an issue which caused the script to crash upon attempting to parse a non-post html file. Hopefully you hadn't got around to testing the old version yet!
@pauldmccarthy Awesome! Would you please add some Test::Unit specs for this? We're trying to ensure that these are tested :) Feel free to write as many methods as you wish; anything which will give us a good grade on Code Climate!
@parkr No problem - I'm going to be travelling for the next week or so though; I'll try to squeeze in some time somewhere.
…existing posterous importer. Modularised posterous archive importer to allow for easier testing. Added a couple of unit tests for posterous archive importer.
    post["title"] = page.css("div.post_header h3").text
    post["date"] = page.css("div.post_info span.post_time").text
    post["body"] = page.css("div.post_body").text
Needs #inner_html here to preserve existing HTML formatting in the post.
Worked for me other than my comment!
👍 thanks for the confirmation @jroes!
This is very nice, thank you for writing it. I've been making some changes to extract more information (tags, etc.). I'm thinking it might be more reliable to work off the XML files that are also included in the Posterous backup, instead of the HTML files, since they contain more information and also the source of each post in HTML. Did you play with this as well? I'll submit a pull request when I'm done with my version.
Hey @zzamboni, thanks for the feedback! The reason I didn't use the XML files was because, as far as I could tell, the image links contained within them linked to the posterous website, whereas the HTML files contain links to the locally stored images (i.e. those downloaded as part of the archive). Maybe a combination of both is needed?
True - I've been working on and off the last couple of days on an updated |
I don't know if this will be useful to your efforts, but I wrote my own script to import my posts into Octopress, complete with fixed-up links, images, and videos (encoded videos are downloaded from Posterous instead of using the high bitrate originals): https://gist.github.com/nitrogenlogic/5200766

A bit of manual work still has to be done before and after with my script, such as gathering the XML files and minifying images. To generate the single XML file my script would need (since wordpress_export_1.xml has replaced all UTF-8 characters with question marks), I would do something like this:

    cd /path/to/space-[numbers, name, etc.]
    cat head.xml posts/*.xml > fixed_export.xml
    echo '</channel></rss>' >> fixed_export.xml
    cd /path/to/new/blog
    ./posterous_import.rb /path/to/space-[numbers, name, etc.]/fixed_export.xml
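For anyone who would rather do the concatenation step from Ruby, the following is a rough sketch of the same idea using only the standard library. The `build_fixed_export` name is hypothetical and is not part of the linked gist. It assumes the archive layout implied by the commands above (a `head.xml` file next to a `posts/` directory of per-post XML files).

```ruby
# Hypothetical helper mirroring the shell steps above: concatenate head.xml
# and every posts/*.xml file, then append the closing tags.
def build_fixed_export(space_dir, out_name = "fixed_export.xml")
  posts_dir = File.join(space_dir, "posts")
  # `base:` keeps special characters in space_dir from being treated as glob
  # syntax; sorting approximates the order a shell glob would produce.
  post_files = Dir.glob("*.xml", base: posts_dir).sort
                  .map { |name| File.join(posts_dir, name) }
  parts = [File.join(space_dir, "head.xml")] + post_files

  out_path = File.join(space_dir, out_name)
  File.open(out_path, "wb") do |out|
    # Copy bytes verbatim so UTF-8 characters are never re-encoded.
    parts.each { |path| out.write(File.binread(path)) }
    out.write("</channel></rss>\n")
  end
  out_path
end

# Only runs when invoked directly with the archive directory as an argument.
puts build_fixed_export(ARGV[0]) if __FILE__ == $PROGRAM_NAME && ARGV[0]
```

Reading and writing in binary mode is a deliberate choice here: the reason for rebuilding the export at all is to keep the UTF-8 characters intact, so the sketch avoids any transcoding.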
nitrogenlogic: thank you so much for this! It works perfectly. I've been making a few small changes which I will contribute back when I'm done.
I'm glad the script was useful to you, zzamboni.
parkr: please consider integrating some of the code from nitrogenlogic's script to which he linked above. I found it very feature-complete and easy to use.
@zzamboni I'd be happy to! Mind throwing together a PR with the bits you'd like to see come over into this project?
Would either of you (@zzamboni or @nitrogenlogic) mind throwing together a PR for the enhancements you made, @nitrogenlogic? :)
A very quick hack to import posts from an archived posterous space.