Import to jekyll from unzipped posterous archive directory #12

Merged
parkr merged 5 commits into jekyll:initial-migrator-import on Mar 24, 2013

Conversation

pauldmccarthy

A very quick hack to import posts from an archived posterous space.

@parkr
Member

parkr commented Feb 22, 2013

Nice!

@jroes

jroes commented Mar 4, 2013

This would be awesome to get merged soon. Posterous shuts its doors at the end of April 2013.

@parkr
Member

parkr commented Mar 5, 2013

@jroes Does this code work for you?

@pauldmccarthy
Author

I just fixed an issue which caused the script to crash upon attempting to parse a non-post html file. Hopefully you hadn't got around to testing the old version yet!

@parkr
Member

parkr commented Mar 5, 2013

@pauldmccarthy Awesome! Would you please add some Test::Unit specs for this? We're trying to ensure that these are tested :) Feel free to write as many methods as you wish; anything which will give us a good grade on Code Climate!

@pauldmccarthy
Author

@parkr No problem - I'm going to be travelling for the next week or so though; I'll try to squeeze in some time somewhere.

…existing posterous importer. Modularised posterous archive importer to allow for easier testing. Added a couple of unit tests for posterous archive importer.

post["title"] = page.css("div.post_header h3").text
post["date"] = page.css("div.post_info span.post_time").text
post["body"] = page.css("div.post_body").text

Needs #inner_html here to preserve existing HTML formatting in the post.

@jroes

jroes commented Mar 7, 2013

Worked for me other than my comment!

@parkr
Member

parkr commented Mar 7, 2013

👍 thanks for the confirmation @jroes!

@zzamboni

This is very nice, thank you for writing it. I've been making some changes to extract more information (tags, etc.). I'm thinking it might be more reliable to work off the XML files that are also included in the Posterous backup, instead of the HTML files, since they contain more information and also the source of each post in HTML. Did you play with this as well? I'll submit a pull request when I'm done with my version.

@pauldmccarthy
Author

Hey @zzamboni, thanks for the feedback! I didn't use the XML files because, as far as I could tell, the image links in them point to the posterous website, whereas the HTML files link to the locally stored images (i.e. those downloaded as part of the archive). Maybe a combination of both is needed?

@zzamboni

True - I've been working on and off for the last couple of days on an updated version (based on your script) that uses the XML source and gets some additional info from it, like the original link. Once I get it working I'll post a pull request.

@nitrogenlogic

I don't know if this will be useful to your efforts, but I wrote my own script to import my posts into Octopress, complete with fixed-up links, images, and videos (encoded videos are downloaded from Posterous instead of using the high bitrate originals): https://gist.github.com/nitrogenlogic/5200766

A bit of manual work still has to be done before and after with my script, such as gathering the XML files and minifying images. To generate the single XML file my script would need (since wordpress_export_1.xml has replaced all UTF-8 characters with question marks), I would do something like this:

cd /path/to/space-[numbers, name, etc.]
cat head.xml posts/*.xml > fixed_export.xml
echo '</channel></rss>' >> fixed_export.xml
cd /path/to/new/blog
./posterous_import.rb /path/to/space-[numbers, name, etc.]/fixed_export.xml
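
The `cat`/`echo` step above could also be done in Ruby, roughly as sketched below. `build_fixed_export` is a hypothetical helper name, not part of the linked script, and the sketch assumes that `head.xml` opens the `<rss>` and `<channel>` elements (inferred from the closing tags that the `echo` appends).

```ruby
require "fileutils"
require "tmpdir"

# Concatenate head.xml and every posts/*.xml into a single file, then close
# the <channel> and <rss> elements, mirroring the cat/echo commands above.
# build_fixed_export is a hypothetical name used only for this sketch.
def build_fixed_export(space_dir, out_name = "fixed_export.xml")
  out_path = File.join(space_dir, out_name)

  # Sort explicitly so the order matches the shell's alphabetical glob
  # expansion; base: keeps special characters in space_dir out of the pattern.
  posts = Dir.glob("posts/*.xml", base: space_dir).sort
  parts = ["head.xml", *posts].map { |rel| File.join(space_dir, rel) }

  File.open(out_path, "w:UTF-8") do |out|
    parts.each { |path| out.write(File.read(path, encoding: "UTF-8")) }
    out.write("</channel></rss>\n")
  end
  out_path
end

# Usage on a throwaway directory with made-up content.
Dir.mktmpdir do |dir|
  FileUtils.mkdir_p(File.join(dir, "posts"))
  File.write(File.join(dir, "head.xml"), "<rss><channel>\n")
  File.write(File.join(dir, "posts", "a.xml"), "<item>a</item>\n")
  File.write(File.join(dir, "posts", "b.xml"), "<item>b</item>\n")
  puts File.read(build_fixed_export(dir))
end
```

Reading and writing as UTF-8 throughout is deliberate, since the reason for rebuilding the export is that `wordpress_export_1.xml` has lost its UTF-8 characters.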

@zzamboni

nitrogenlogic: thank you so much for this! It works perfectly. I've been making a few small changes which I will contribute back when I'm done.

parkr added a commit that referenced this pull request Mar 24, 2013
Import to jekyll from unzipped posterous archive directory
@parkr parkr merged commit 76ac0f5 into jekyll:initial-migrator-import Mar 24, 2013
parkr added a commit that referenced this pull request Mar 24, 2013
@nitrogenlogic

I'm glad the script was useful to you, zzamboni.

@zzamboni

parkr: please consider integrating some of the code from nitrogenlogic's script to which he linked above. I found it very feature-complete and easy to use.

@parkr
Member

parkr commented Mar 26, 2013

@zzamboni I'd be happy to! Mind throwing together a PR with the bits you'd like to see come over into this project?

@parkr
Member

parkr commented Apr 26, 2013

Would either of you (@zzamboni or @nitrogenlogic) mind throwing together a PR for the enhancements from @nitrogenlogic's script? :)

@jekyll jekyll locked and limited conversation to collaborators Feb 27, 2017
6 participants