Comparing changes

Choose two branches to see what's changed or to start a new pull request. If you need to, you can also compare across forks.

base: 4af6f349a1
...
compare: 79e4c9b961
  • 6 commits
  • 1 file changed
  • 0 commit comments
  • 1 contributor
Commits on Sep 05, 2012
@rbrito Convert docstring to comment.
When a string literal sits in the middle of the code (instead of at the very
top of it), it does not become the __doc__ attribute of the object (function,
class, etc.) and is just a useless statement (i.e., a no-op).

Just convert that to a real comment.
ec1a351
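
A minimal sketch of the behaviour this commit message describes. The function names `documented` and `misplaced` are hypothetical and not part of the import script. Only a string literal that is the first statement of a body ends up in `__doc__`; one placed later is evaluated and thrown away.

```python
def documented():
    """I am a real docstring."""
    return 1


def misplaced():
    value = 1
    """I am only an expression statement; Python evaluates and discards me."""
    return value


print(documented.__doc__)  # I am a real docstring.
print(misplaced.__doc__)   # None
```

Turning the misplaced string into `#` comments, as the commit does, keeps the explanation without leaving a dead statement in the function body.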
@rbrito Add encoding declaration for safeness. 8cef08a
@rbrito Follow PEP 8 conventions for imports.
Standard-library modules should be imported first, one per line and in sorted order.
d84d785
@rbrito Guard case of empty element raising exception.
For some reason, with the current BeautifulSoup module from Debian unstable
(3.2.1-1), trying to access the .string of an element that has none (in this
case, `<guid isPermaLink="false"></guid>`) raises an exception instead of
returning an empty string.

This commit adds a test to guard against that kind of problem.
2942bf2
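
The guard can be sketched in isolation, without BeautifulSoup, roughly as follows. `make_stub` is a hypothetical helper, and `STUB_PATTERN` is an assumed regular expression, since the script's own `stub_pattern` definition is not part of this diff. The `None` branch mirrors the commit by yielding an empty stub.

```python
import re

# Assumed pattern: capture the last path component of a permalink-style GUID.
STUB_PATTERN = re.compile(r'.*/(.+)/$')


def make_stub(guid_text, title):
    """Derive a page stub from a GUID string that may be None."""
    if guid_text is None:
        # Empty <guid> element: mirror the commit and use an empty stub.
        return ""
    match = STUB_PATTERN.match(guid_text)
    if match:
        return match.group(1)
    # Fall back to a stub built from the post title.
    return re.sub(r'[^a-zA-Z0-9_]', '-', title).lower()


print(make_stub(None, "Hello World"))
print(make_stub("http://example.com/2012/09/hello-world/", "Hello World"))
```

Checking for `None` before calling `match()` avoids the exception that `re` raises when it is handed a non-string argument.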
@rbrito cosmetics: Split line in two, for readability. ed0ca86
@rbrito Remove raw ^M characters included by wordpress.
When you compose posts in wordpress's web interface and you press Enter, it
inserts a raw `'\r'` character when you intended to have a newline instead.

This commit replaces such occurrences with real newlines.
79e4c9b
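
The replacement chain used in this commit can be tried on its own. `normalize_newlines` is a hypothetical wrapper name introduced here for illustration; the two `replace` calls are the ones that appear in the diff.

```python
def normalize_newlines(text):
    """Convert CRLF pairs and bare CR characters to LF."""
    # Handle '\r\n' first so each pair becomes one newline, not two.
    return text.replace('\r\n', '\n').replace('\r', '\n')


print(repr(normalize_newlines("one\r\ntwo\rthree")))  # 'one\ntwo\nthree'
```

The order matters: replacing the bare `'\r'` first would turn every `'\r\n'` into two newlines.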
Showing with 28 additions and 23 deletions.
  1. +28 −23 ikiwiki-wordpress-import.py
51 ikiwiki-wordpress-import.py
@@ -1,4 +1,5 @@
#!/usr/bin/env python
+# -*- coding: utf-8 -*-
"""
Purpose:
@@ -29,14 +30,16 @@
"""
-import os, sys
-import time
+import codecs
+import htmlentitydefs
+import os
import re
+import sys
+import time
from datetime import datetime
from BeautifulSoup import BeautifulSoup
-import codecs, htmlentitydefs
codecs.register_error('html_replace', lambda x: (''.join([u'&%s;' \
% htmlentitydefs.codepoint2name[ord(c)] for c in x.object[x.start:x.end]]), x.end))
@@ -49,32 +52,34 @@ def main(name, email, subdir, branch='master'):
     for x in soup.findAll('item'):
         # Ignore draft posts
-        if x.find('wp:status').string != 'publish': continue
-
-        match = stub_pattern.match(x.guid.string)
-        if match:
-            stub = match.groups()[0]
+        if x.find('wp:status').string != 'publish':
+            continue
+
+        if x.guid.string is not None:
+            match = stub_pattern.match(x.guid.string)
+            if match:
+                stub = match.groups()[0]
+            else:
+                # Fall back to our own stubs
+                stub = re.sub(r'[^a-zA-Z0-9_]', '-', x.title.string).lower()
         else:
-            # Fall back to our own stubs
-            stub = re.sub(r'[^a-zA-Z0-9_]', '-', x.title.string).lower()
+            stub = ""
         commit_msg = """Importing WordPress post "%s" [%s]""" % (x.title.string, x.guid.string)
         timestamp = time.mktime(time.strptime(x.find('wp:post_date_gmt').string, "%Y-%m-%d %H:%M:%S"))
         content = '[[!meta title="%s"]]\n' % (x.title.string.replace('"', r"'"))
         content += "[[!meta date=\"%s\"]]\n" % datetime.fromtimestamp(timestamp)
-        content += x.find('content:encoded').string.replace('\r\n', '\n')
-
-        """
-        We do it differently here because we have duplicates otherwise.
-        Take a look:
-        <category><![CDATA[Health]]></category>
-        <category domain="category" nicename="health"><![CDATA[Health]]></category>
-
-        If we do the what original did, we end up with all tags and cats doubled.
-        Therefore we only pick out nicename="foo". Our 'True' below is our 'foo'.
-        I'd much rather have the value of 'nicename', and tried, but my
-        python skillz are extremely limited....
-        """
+        content += x.find('content:encoded').string.replace('\r\n', '\n').replace('\r', '\n')
+
+        # We do it differently here because we have duplicates otherwise.
+        # Take a look:
+        # <category><![CDATA[Health]]></category>
+        # <category domain="category" nicename="health"><![CDATA[Health]]></category>
+        #
+        # If we do the what original did, we end up with all tags and cats doubled.
+        # Therefore we only pick out nicename="foo". Our 'True' below is our 'foo'.
+        # I'd much rather have the value of 'nicename', and tried, but my
+        # python skillz are extremely limited....
         categories = x.findAll('category', nicename=True)
         if categories:
             content += "\n"
