Import to jekyll from unzipped posterous archive directory #12

Merged: parkr merged 5 commits into jekyll:initial-migrator-import on Mar 24, 2013

pauldmccarthy commented Feb 21, 2013

A very quick hack to import posts from an archived posterous space.

Member

parkr commented Feb 22, 2013

Nice!

jroes commented Mar 4, 2013

This would be awesome to get merged soon. Posterous shuts its doors at the end of April 2013.

Member

parkr commented Mar 5, 2013

@jroes Does this code work for you?

pauldmccarthy commented Mar 5, 2013

I just fixed an issue which caused the script to crash when it tried to parse a non-post HTML file. Hopefully you hadn't got around to testing the old version yet!
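
For readers wondering what this kind of guard can look like, here is a minimal sketch. It is not the importer's actual code: the method name `collect_posts` and the convention that the parsing block returns `nil` for a page that is not a post are assumptions made for illustration only.

```ruby
# Hypothetical sketch, not the importer's actual code.
# The caller supplies the parsing step as a block. The block is assumed to
# return a post (any object) or nil when the page turns out not to be a post.
def collect_posts(html_paths)
  posts = []
  skipped = []
  html_paths.each do |path|
    post = yield(File.read(path, encoding: "UTF-8"))
    if post.nil?
      skipped << path # not a post: record it and move on instead of crashing
    else
      posts << post
    end
  end
  [posts, skipped]
end
```

Called as `posts, skipped = collect_posts(paths) { |html| ... }`, this returns the parsed posts together with the list of files that were passed over, which makes it easy to check afterwards that nothing important was skipped.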

Member

parkr commented Mar 5, 2013

@pauldmccarthy Awesome! Would you please add some Test::Unit specs for this? We're trying to ensure that these are tested :) Feel free to write as many methods as you wish; anything which will give us a good grade on Code Climate!

pauldmccarthy commented Mar 5, 2013

@parkr No problem - I'm going to be travelling for the next week or so though; I'll try to squeeze in some time somewhere.

Changed posterous archive importer class name to avoid conflict with existing posterous importer. Modularised posterous archive importer to allow for easier testing. Added a couple of unit tests for posterous archive importer.

jroes left a review comment on lib/jekyll/importers/posterous-archive.rb (now outdated)

jroes commented Mar 7, 2013

Worked for me other than my comment!

Member

parkr commented Mar 7, 2013

👍 thanks for the confirmation @jroes!

zzamboni commented Mar 14, 2013

This is very nice, thank you for writing it. I've been making some changes to extract more information (tags, etc.). I'm thinking it might be more reliable to work off the XML files that are also included in the Posterous backup, instead of the HTML files, since they contain more information and also the source of each post in HTML. Did you play with this as well? I'll submit a pull request when I'm done with my version.

pauldmccarthy commented Mar 17, 2013

Hey @zzamboni, thanks for the feedback! I didn't use the XML files because, as far as I could tell, the image links in them point to the Posterous website, whereas the HTML files link to the locally stored images (i.e. those downloaded as part of the archive). Maybe a combination of both is needed?

zzamboni commented Mar 17, 2013

True - I've been working on and off for the last couple of days on an updated version (based on your script) that uses the XML source and gets some additional info from it, like the original link. Once I get it working I'll post a pull request.

ghost commented Mar 19, 2013

I don't know if this will be useful to your efforts, but I wrote my own script to import my posts into Octopress, complete with fixed-up links, images, and videos (encoded videos are downloaded from Posterous instead of using the high bitrate originals): https://gist.github.com/nitrogenlogic/5200766

A bit of manual work still has to be done before and after with my script, such as gathering the XML files and minifying images. To generate the single XML file my script would need (since wordpress_export_1.xml has replaced all UTF-8 characters with question marks), I would do something like this:

    cd /path/to/space-[numbers, name, etc.]
    cat head.xml posts/*.xml > fixed_export.xml
    echo '</channel></rss>' >> fixed_export.xml
    cd /path/to/new/blog
    ./posterous_import.rb /path/to/space-[numbers, name, etc.]/fixed_export.xml
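
For anyone who would rather do the concatenation step from Ruby, the `cat` and `echo` commands above can be sketched roughly as follows. The helper name `build_fixed_export` is an assumption for illustration and is not part of the linked gist. The sketch reads and writes everything as UTF-8 and appends the same closing tags as the `echo` line.

```ruby
# Hypothetical helper (not part of the linked gist): rebuilds a single export
# file from head.xml plus every posts/*.xml in an unzipped archive directory.
def build_fixed_export(space_dir, output_name = "fixed_export.xml")
  posts_dir = File.join(space_dir, "posts")
  # Sort the file names so the result is deterministic, like a shell glob.
  post_names = Dir.children(posts_dir)
                  .select { |name| name.end_with?(".xml") && !name.start_with?(".") }
                  .sort
  out_path = File.join(space_dir, output_name)
  File.open(out_path, "w:UTF-8") do |out|
    out.write(File.read(File.join(space_dir, "head.xml"), encoding: "UTF-8"))
    post_names.each do |name|
      out.write(File.read(File.join(posts_dir, name), encoding: "UTF-8"))
    end
    out.write("</channel></rss>\n") # same closing tags the echo line appends
  end
  out_path
end
```

Calling `build_fixed_export("/path/to/space-...")` returns the path of the combined file, which can then be passed to the import script as in the last command above. Listing the directory with `Dir.children` rather than a glob pattern is a deliberate choice here: it avoids surprises if the archive directory name happens to contain characters that a glob would interpret.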

zzamboni commented Mar 24, 2013

nitrogenlogic: thank you so much for this! It works perfectly. I've been making a few small changes which I will contribute back when I'm done.

parkr added a commit that referenced this pull request Mar 24, 2013

Merge pull request #12 from pauldmccarthy/initial-migrator-import
Import to jekyll from unzipped posterous archive directory

parkr merged commit 76ac0f5 into jekyll:initial-migrator-import Mar 24, 2013

ghost commented Mar 24, 2013

I'm glad the script was useful to you, zzamboni.

zzamboni commented Mar 25, 2013

parkr: please consider integrating some of the code from nitrogenlogic's script to which he linked above. I found it very feature-complete and easy to use.

Member

parkr commented Mar 26, 2013

@zzamboni I'd be happy to! Mind throwing together a PR with the bits you'd like to see come over into this project?

Member

parkr commented Apr 26, 2013

Would either of you (@zzamboni or @nitrogenlogic) mind throwing together a PR for the enhancements you made, @nitrogenlogic? :)

jekyll locked and limited conversation to collaborators Feb 27, 2017