
adding export2HTMLFiles method...

export2HTMLFiles takes the exported JSON file produced by Google's
Takeout (namely, 'starred.json') and dumps each starred entry into a
numbered HTML file, named as per the title of the entry. The HTML files
can then simply be dropped into the Evernote desktop client.
This approach also does a good job of preserving the formatting of the
note, and it tends to be more consistent (more than 3000 entries from
more than 200 blogs have been imported using this method under OS X,
where the other methods failed).
Since it's all local, there is no limit to how many notes you can
import in one go.
    Usage: export2HTMLFiles.py
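For orientation, the fields the script reads from each entry in 'starred.json' can be sketched as follows. The structure is inferred from the script itself; the values here are made up for illustration, not real Takeout output:

```python
import json

# A minimal stand-in for Takeout's 'starred.json', showing only the fields
# export2HTMLFiles.py actually reads. All values are illustrative.
sample = {
    "items": [
        {
            "title": "An example post",
            "alternate": [{"href": "http://example.com/post"}],
            "canonical": [{"href": "http://example.com/canonical/post"}],
            "summary": {"content": "<p>Short summary HTML</p>"},
            "content": {"content": "<p>Full article HTML</p>"},
        }
    ]
}

# The script iterates over json_dict["items"] and pulls these keys out:
item = sample["items"][0]
print(item["alternate"][0]["href"])  # URL used for the note's <a> link
print(item["summary"]["content"])    # HTML appended to the note body
```

Every field except "items" is optional; the script guards each access with a key check, so entries missing a title, URL, or body are still exported.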
1 parent 7de305c commit c803be2bc5c437a4aa5c7a941a70d970286b686a @davidedc davidedc committed Mar 22, 2013
Showing with 91 additions and 0 deletions.
  1. +10 −0 README.md
  2. +81 −0 export2HTMLFiles.py
10 README.md
@@ -17,6 +17,16 @@ locally on your devices, I've found it's a good service for keeping my
data in the cloud without running the risk of losing it if/when Evernote
goes away.
+export2HTMLFiles takes the exported JSON file produced by Google's Takeout
+(namely, 'starred.json') and dumps each starred entry into a numbered HTML file,
+named as per the title of the entry. The HTML files can then simply be dropped into
+the Evernote desktop client.
+This approach also does a good job of preserving the formatting of the note, and
+it tends to be more consistent (more than 3000 entries from more than 200 blogs
+have been imported using this method under OS X, where the other methods failed).
+Since it's all local, there is no limit to how many notes you can import in one go.
+    Usage: export2HTMLFiles.py
+
export2enex takes the exported JSON file produced by Google's Takeout
(namely, 'starred.json') and dumps it into Evernote, using Evernote's
export file format (.enex). Unlike export_gr2evernote.py, this approach
81 export2HTMLFiles.py
@@ -0,0 +1,81 @@
+# A script for exporting all starred items from Google Reader to HTML files,
+# using exported JSON data from Google's Takeout
+#
+# Copyright 2013 Paul Kerchen, Davide Della Casa
+#
+# This program is distributed under the terms of the GNU General Public License v3.
+#
+# This program is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program. If not, see <http://www.gnu.org/licenses/>.
+#
+# Only json and string are actually used below.
+import json
+import string
+
+
+# Provides a decent filename. A variation of: http://stackoverflow.com/a/295146/1318347
+def cleanFileName(value):
+ valid_chars = "-_() %s%s" % (string.ascii_letters, string.digits)
+ untrimmedFileName = ''.join(c for c in value if c in valid_chars)
+ maximumFileNameLength = 200
+ if len(untrimmedFileName) > maximumFileNameLength:
+ trimmedFileName = (untrimmedFileName[:maximumFileNameLength] + '..')
+ else:
+ trimmedFileName = untrimmedFileName
+ return trimmedFileName.strip()
+
+json_file = open("starred.json")
+json_dict = json.loads( unicode(json_file.read(), encoding="utf-8") )
+
+item_list = json_dict[ "items" ]
+
+articleCounter = 0
+for s in item_list:
+ articleCounter += 1
+ title = str(articleCounter)
+ if 'title' in s.keys():
+ title = title + " " + s["title"]
+ fileName = cleanFileName(title) + '.html'
+ file = open(fileName, 'w+')
+
+ html_body = ""
+
+ if 'alternate' in s.keys():
+ d = s["alternate"][0]
+ alternateURL = d["href"]
+ html_body = html_body + '<p>URL: <a href="'+alternateURL+'">'+alternateURL+'</a></p>'
+ if 'canonical' in s.keys():
+ d = s["canonical"][0]
+ canonicalURL = d["href"]
+ hintAboutSecondURL = (' 2') if 'alternate' in s.keys() else ''
+ html_body = html_body + '<p>URL'+hintAboutSecondURL+': <a href="'+canonicalURL+'">'+canonicalURL+'</a></p>'
+
+ html_body = html_body + '<hr>'
+
+ if 'summary' in s.keys():
+ d = s["summary"]
+ html_body = html_body + d["content"]
+ if 'content' in s.keys():
+ d = s["content"]
+ html_body = html_body + d["content"]
+
+ file.write(html_body.encode("UTF-8"))
+ file.close()
+
+ print('extracted: ' + fileName)
+
+print('...done')
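The loop above can be sanity-checked end to end with a tiny fabricated 'starred.json'. This is a Python 3 re-sketch of the same logic (the script itself is Python 2; note the `unicode()` call), with names kept close to the original:

```python
import json
import string

# Same filtering as the script's cleanFileName, restated in Python 3.
def clean_file_name(value, max_len=200):
    valid_chars = "-_() %s%s" % (string.ascii_letters, string.digits)
    name = ''.join(c for c in value if c in valid_chars)
    return (name[:max_len] + '..' if len(name) > max_len else name).strip()

# Fabricate a one-item starred.json (illustrative values, not real Takeout data).
with open("starred.json", "w", encoding="utf-8") as f:
    json.dump({"items": [{
        "title": "Hello, world?",
        "alternate": [{"href": "http://example.com/hello"}],
        "content": {"content": "<p>Body</p>"},
    }]}, f)

with open("starred.json", encoding="utf-8") as f:
    items = json.load(f)["items"]

# Core of the script's loop: number each entry, build the HTML, write it out.
for counter, s in enumerate(items, start=1):
    title = str(counter) + (" " + s["title"] if "title" in s else "")
    file_name = clean_file_name(title) + ".html"
    body = ""
    if "alternate" in s:
        url = s["alternate"][0]["href"]
        body += '<p>URL: <a href="%s">%s</a></p>' % (url, url)
    body += "<hr>"
    if "content" in s:
        body += s["content"]["content"]
    with open(file_name, "w", encoding="utf-8") as out:
        out.write(body)
    print("extracted:", file_name)
```

With the sample above, the entry lands in `1 Hello world.html` (comma and question mark are filtered out by the valid-characters whitelist), ready to be dragged into the Evernote client.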
+

1 comment on commit c803be2

@bbattjer

Thanks! Best solution I've found so far, especially with the Mac client. Still confused as to why the .enex version doesn't work, as I exported many notes created in Evernote itself, did side-by-sides against ones created from the script, and couldn't find any differences.

Anyway - thanks for this! Importing over 11,000 articles into the google reader mac client as we speak via HTML!
