Import to jekyll from unzipped posterous archive directory #12

pauldmccarthy · 2013-02-21T22:31:10Z

A very quick hack to import posts from an archived posterous space.

parkr · 2013-02-22T00:31:48Z

Nice!

jroes · 2013-03-04T21:26:57Z

This would be awesome to get merged soon. Posterous shuts its doors at the end of April 2013.

parkr · 2013-03-05T13:22:45Z

@jroes Does this code work for you?

pauldmccarthy · 2013-03-05T19:30:15Z

I just fixed an issue which caused the script to crash when it tried to parse a non-post HTML file. Hopefully you hadn't got around to testing the old version yet!

parkr · 2013-03-05T20:51:58Z

@pauldmccarthy Awesome! Would you please add some Test::Unit specs for this? We're trying to ensure that these are tested :) Feel free to write as many methods as you wish; anything which will give us a good grade on Code Climate!

pauldmccarthy · 2013-03-05T20:56:17Z

@parkr No problem - I'm going to be travelling for the next week or so though; I'll try to squeeze in some time somewhere.

jroes · 2013-03-07T05:05:10Z

lib/jekyll/importers/posterous-archive.rb

+
+        post["title"]  = page.css("div.post_header h3").text
+        post["date"]   = page.css("div.post_info span.post_time").text
+        post["body"]   = page.css("div.post_body").text


Needs #inner_html here to preserve existing HTML formatting in the post.

jroes · 2013-03-07T05:05:26Z

Worked for me other than my comment!

parkr · 2013-03-07T12:07:43Z

👍 thanks for the confirmation @jroes!

zzamboni · 2013-03-14T06:44:51Z

This is very nice, thank you for writing it. I've been making some changes to extract more information (tags, etc.). I'm thinking it might be more reliable to work off the XML files that are also included in the Posterous backup, instead of the HTML files, since they contain more information and also the source of each post in HTML. Did you play with this as well? I'll submit a pull request when I'm done with my version.

pauldmccarthy · 2013-03-17T08:26:33Z

Hey @zzamboni, thanks for the feedback! I didn't use the XML files because, as far as I could tell, the image links in them point to the Posterous website, whereas the HTML files contain links to the locally stored images (i.e. those downloaded as part of the archive). Maybe a combination of both is needed?

zzamboni · 2013-03-17T21:22:06Z

True - I've been working on and off for the last couple of days on an updated version (based on your script) that uses the XML source and gets some additional info from it, like the original link. Once I get it working I'll post a pull request.

nitrogenlogic · 2013-03-19T22:46:54Z

I don't know if this will be useful to your efforts, but I wrote my own script to import my posts into Octopress, complete with fixed-up links, images, and videos (encoded videos are downloaded from Posterous instead of using the high bitrate originals): https://gist.github.com/nitrogenlogic/5200766

A bit of manual work still has to be done before and after with my script, such as gathering the XML files and minifying images. To generate the single XML file my script would need (since wordpress_export_1.xml has replaced all UTF-8 characters with question marks), I would do something like this:

cd /path/to/space-[numbers, name, etc.]
cat head.xml posts/*.xml > fixed_export.xml
echo '</channel></rss>' >> fixed_export.xml
cd /path/to/new/blog
./posterous_import.rb /path/to/space-[numbers, name, etc.]/fixed_export.xml
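
For anyone who would rather do the concatenation step from Ruby, the `cat` and `echo` commands above could be sketched roughly as follows. The helper name `build_fixed_export` is hypothetical; the file names `head.xml`, `posts/*.xml` and `fixed_export.xml` are the ones used in the commands above, and the sketch assumes the archive is laid out that way.

```ruby
# Hypothetical helper: rebuilds a single export file from head.xml and the
# per-post XML files, mirroring the cat/echo commands above.
def build_fixed_export(space_dir, output_name = "fixed_export.xml")
  posts_dir = File.join(space_dir, "posts")

  # `base:` keeps glob metacharacters in space_dir from being interpreted.
  # Names are sorted bytewise, which may differ from the shell's locale order.
  posts = Dir.glob("*.xml", base: posts_dir).sort.map { |name| File.join(posts_dir, name) }
  out   = File.join(space_dir, output_name)

  File.open(out, "wb") do |f|
    # Binary reads and writes copy the bytes unchanged, so UTF-8 text survives.
    ([File.join(space_dir, "head.xml")] + posts).each { |path| f.write(File.binread(path)) }
    f.write("</channel></rss>\n")
  end

  out
end
```

The returned path can then be passed to the import script in place of the hand-built file.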

zzamboni · 2013-03-24T05:25:42Z

nitrogenlogic: thank you so much for this! It works perfectly. I've been making a few small changes which I will contribute back when I'm done.

nitrogenlogic · 2013-03-24T20:34:55Z

I'm glad the script was useful to you, zzamboni.

zzamboni · 2013-03-25T21:55:45Z

parkr: please consider integrating some of the code from nitrogenlogic's script to which he linked above. I found it very feature-complete and easy to use.

parkr · 2013-03-26T11:19:55Z

@zzamboni I'd be happy to! Mind throwing together a PR with the bits you'd like to see come over into this project?

parkr · 2013-04-26T18:58:53Z

Would either of you (@zzamboni or @nitrogenlogic) mind throwing together a PR for the enhancements you made, @nitrogenlogic? :)

import to jekyll from unzipped posterous archive directory

ed6199d

pauldmccarthy mentioned this pull request Feb 21, 2013

Add the optional ability to include images in a posterous migration. #5

Merged

small fix - restrict post file search to the posterous archive posts/…

0f67069

… subdirectory

Added a comment block to posterous-archive.rb, containing usage instr…

faf37e5

…uctions.

Changed posterous archive importer class name to avoid conflict with …

f6bf049

…existing posterous importer. Modularised posterous archive importer to allow for easier testing. Added a couple of unit tests for posterous archive importer.

jroes reviewed Mar 7, 2013

HTML formatting is preserved when post content is extracted from post…

35219fb

…erous html files

parkr added a commit that referenced this pull request Mar 24, 2013

Merge pull request #12 from pauldmccarthy/initial-migrator-import

76ac0f5

Import to jekyll from unzipped posterous archive directory

parkr merged commit 76ac0f5 into jekyll:initial-migrator-import Mar 24, 2013

parkr added a commit that referenced this pull request Mar 24, 2013

Update history to reflect merge of #12.

edb77b3

jekyll locked and limited conversation to collaborators Feb 27, 2017

jekyllbot added the frozen-due-to-age label Feb 27, 2017
