
Merge pull request #4 from davidedc/master

Adding one more method: export2HTMLFiles
2 parents 4869f91 + d2932e1 commit 208f163241579fa95a5f911e93d3a16b475f2066 @kerchen committed Mar 22, 2013
Showing with 97 additions and 3 deletions.
  1. +2 −0 .gitignore
  2. +14 −3 README.md
  3. +81 −0 export2HTMLFiles.py
.gitignore
@@ -0,0 +1,2 @@
+
+*.json
README.md
@@ -3,7 +3,8 @@ export2enex and export_gr2evernote
Exports previously-starred articles from Google Reader to Evernote
-Executive Summary: Unless you like crappy imported notes, you should use export2enex.py.
+Executive Summary: Three methods are described below. Unless you like crappy imported notes,
+you should use either export2enex.py or export2HTMLFiles.py.
export_gr2evernote.py uses Evernote's e-mail submission feature, which mangles
any HTML in the note, rendering it a pile of markup gibberish.
@@ -17,7 +18,17 @@ locally on your devices, I've found it's a good service for keeping my
data in the cloud without running the risk of losing it if/when Evernote
goes away.
-export2enex takes the exported JSON file produced by Google's Takeout
+**export2HTMLFiles** takes the exported JSON file produced by Google's Takeout
+(namely, 'starred.json') and writes each starred entry to its own numbered HTML file,
+named after the entry's title. The HTML files can then simply be dropped into the
+Evernote desktop client.
+This approach also does a good job of preserving the formatting of the note, and
+it tends to be more reliable (more than 3000 entries from more than 200 blogs
+have been imported using this method under OS X, where the other methods failed).
+Since it's all local, there is no limit on how many notes you can import in one go.
+ Usage: export2HTMLFiles.py
+
+**export2enex** takes the exported JSON file produced by Google's Takeout
(namely, 'starred.json') and dumps it into Evernote, using Evernote's
export file format (.enex). Unlike export_gr2evernote.py, this approach
does a pretty good job of preserving the formatting of the note. Also,
@@ -29,7 +40,7 @@ JSON into Evernote enex format. Once you have it in enex format, you
can import it into Evernote using the desktop client.
Usage: export2enex.py [options] > filename.enex
-export_gr2evernote.py takes the exported JSON file produced by Google's
+**export_gr2evernote.py** takes the exported JSON file produced by Google's
Takeout (namely, 'starred.json') and dumps it into Evernote, using Evernote's
e-mail note submission feature. It doesn't do any formatting of what it
sends to Evernote, so it will most likely look pretty ugly in Evernote.
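All three scripts read the same Takeout export. As a rough illustration, the sketch below shows only the fields that export2HTMLFiles.py looks up in 'starred.json'. The sample values and the helper name `describe_item` are hypothetical, real exports carry more fields than shown here, and the snippet is written for Python 3.

```python
import json

# Hypothetical, heavily trimmed stand-in for Takeout's starred.json.
# Only the keys that export2HTMLFiles.py reads are included.
SAMPLE = """
{
  "items": [
    {
      "title": "An example post",
      "alternate": [{"href": "http://example.com/post"}],
      "summary": {"content": "<p>Short summary</p>"}
    },
    {
      "canonical": [{"href": "http://example.com/untitled"}],
      "content": {"content": "<p>Full text</p>"}
    }
  ]
}
"""


def describe_item(item):
    """Return the optional keys present on one starred item, in a fixed order."""
    optional_keys = ("title", "alternate", "canonical", "summary", "content")
    return [key for key in optional_keys if key in item]


items = json.loads(SAMPLE)["items"]
for number, item in enumerate(items, start=1):
    print(number, describe_item(item))
```

Every one of these keys is optional in the script's eyes, which is why it checks for each before reading it.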
export2HTMLFiles.py
@@ -0,0 +1,81 @@
+# A script for exporting all starred items from Google Reader to HTML files,
+# using exported JSON data from Google's Takeout
+#
+# Copyright 2013 Paul Kerchen, Davide Della Casa
+#
+# This program is distributed under the terms of the GNU General Public License v3.
+#
+# This program is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program. If not, see <http://www.gnu.org/licenses/>.
+#
+import smtplib
+import json
+import io
+import getopt, sys
+import os.path
+import string
+
+
+# Provides a decent filename. A variation of: http://stackoverflow.com/a/295146/1318347
+def cleanFileName(value):
+    valid_chars = "-_() %s%s" % (string.ascii_letters, string.digits)
+    untrimmedFileName = ''.join(c for c in value if c in valid_chars)
+    maximumFileNameLength = 200
+    if len(untrimmedFileName) > maximumFileNameLength:
+        trimmedFileName = (untrimmedFileName[:maximumFileNameLength] + '..')
+    else:
+        trimmedFileName = untrimmedFileName
+    return trimmedFileName.strip()
+
+json_file = open("starred.json")
+json_dict = json.loads( unicode(json_file.read(), encoding="utf-8") )
+
+item_list = json_dict[ "items" ]
+
+articleCounter = 0
+for s in item_list:
+    articleCounter += 1
+    title = str(articleCounter)
+    if 'title' in s.keys():
+        title = title + " " + s["title"]
+    fileName = cleanFileName(title) + '.html'
+    file = open(fileName, 'w+')
+
+    html_body = ""
+
+    if 'alternate' in s.keys():
+        d = s["alternate"][0]
+        alternateURL = d["href"]
+        html_body = html_body + '<p>URL: <a href="'+alternateURL+'">'+alternateURL+'</a></p>'
+    if 'canonical' in s.keys():
+        d = s["canonical"][0]
+        canonicalURL = d["href"]
+        hintAboutSecondURL = (' 2') if 'alternate' in s.keys() else ''
+        html_body = html_body + '<p>URL'+hintAboutSecondURL+': <a href="'+canonicalURL+'">'+canonicalURL+'</a></p>'
+
+    html_body = html_body + '<hr>'
+
+    if 'summary' in s.keys():
+        d = s["summary"]
+        html_body = html_body + d["content"]
+    if 'content' in s.keys():
+        d = s["content"]
+        html_body = html_body + d["content"]
+
+    file.write(html_body.encode("UTF-8"))
+    file.close()
+
+    print('extracted: ' + fileName)
+
+print('...done')
+
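The committed script targets Python 2 (it relies on the `unicode` builtin and writes encoded bytes to a text-mode file). For readers on Python 3, the following is a minimal sketch of the same logic, not the authors' code: the function names `clean_file_name`, `link_paragraph`, `build_html_body`, `export_items` and `main` are hypothetical, and the `out_dir` parameter is an addition for illustration. Like the original, it does not escape URLs or titles before placing them in HTML.

```python
import json
import string
from pathlib import Path

# Same whitelist and length cap as cleanFileName in the committed script.
VALID_CHARS = "-_() " + string.ascii_letters + string.digits
MAX_NAME_LENGTH = 200


def clean_file_name(value):
    """Drop characters outside the whitelist and cap the length."""
    kept = "".join(c for c in value if c in VALID_CHARS)
    if len(kept) > MAX_NAME_LENGTH:
        kept = kept[:MAX_NAME_LENGTH] + ".."
    return kept.strip()


def link_paragraph(label, url):
    """Render one labelled link line, matching the original markup."""
    return '<p>%s: <a href="%s">%s</a></p>' % (label, url, url)


def build_html_body(item):
    """Assemble links, a rule, then summary and/or content for one item."""
    parts = []
    if "alternate" in item:
        parts.append(link_paragraph("URL", item["alternate"][0]["href"]))
    if "canonical" in item:
        label = "URL 2" if "alternate" in item else "URL"
        parts.append(link_paragraph(label, item["canonical"][0]["href"]))
    parts.append("<hr>")
    for key in ("summary", "content"):
        if key in item:
            parts.append(item[key]["content"])
    return "".join(parts)


def export_items(items, out_dir="."):
    """Write one numbered HTML file per item; return the file names written."""
    written = []
    for number, item in enumerate(items, start=1):
        title = str(number)
        if "title" in item:
            title += " " + item["title"]
        path = Path(out_dir) / (clean_file_name(title) + ".html")
        path.write_text(build_html_body(item), encoding="utf-8")
        written.append(path.name)
    return written


def main(json_path="starred.json"):
    """Read the Takeout export and report each file as it is written."""
    with open(json_path, encoding="utf-8") as json_file:
        starred = json.load(json_file)
    for name in export_items(starred["items"]):
        print("extracted: " + name)
    print("...done")
```

Calling `main()` from the directory that holds 'starred.json' would mirror the original script's behaviour. Because the counter prefixes every name, two entries with the same title still get distinct files.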
