Pelican Comment System: add Blogger comments export script #835

Merged 2 commits on Jan 12, 2017
4 changes: 4 additions & 0 deletions pelican_comment_system/CHANGELOG.md
@@ -2,6 +2,10 @@
All notable changes to this project will be documented in this file.
This project adheres to [Semantic Versioning](http://semver.org/).

## 1.3.0 - 2017-01-10
### Added
- Add [blogger_comment_export.py](import/blogger_comment_export.py) script to export comments from a Blogger XML export, with [associated documentation](doc/import.md) [PR #835](https://github.com/getpelican/pelican-plugins/pull/835)

## 1.2.2 – 2016-12-19
### Fixed
- Correct jQuery expression in cancelReply method [PR #820](https://github.com/getpelican/pelican-plugins/pull/820)
1 change: 1 addition & 0 deletions pelican_comment_system/Readme.md
@@ -24,6 +24,7 @@ Bernhard Scheirle | <http://bernhard.scheirle.de> | <https://github.com/Scheirl

- [Quickstart Guide](doc/quickstart.md)
- [Installation and basic usage](doc/installation.md)
- [Import existing comments](doc/import.md)
- [Avatars and identicons](doc/avatars.md)
- [Comment Atom feed](doc/feed.md)

47 changes: 47 additions & 0 deletions pelican_comment_system/doc/import.md
@@ -0,0 +1,47 @@
# Importing Comments

**Note**: Contributions to this section are welcomed!

When moving to Pelican and the Pelican Comment System, it may be desirable to move over your comments as well.

The scripts to support this are found in the `import` directory.

## Blogger

Blogger will give you an export of everything, but the bad news is that it arrives as one giant XML file. XML is great if you're a computer, but a bit of a pain if you're a human.

The code I used to export my comments from Blogger is found at [blogger_comment_export.py](../import/blogger_comment_export.py).

To use it yourself, first adjust the constants at the beginning of the
script (lines 26-33) to point to your Blogger XML export and to the directory
where you want the comments written. You will also need to install `untangle`
(available through pip: `pip install untangle`).
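
Before running the export, it can help to check what the XML file actually contains. The following is a minimal sketch using only the standard library, not part of the script itself. It assumes the export is an Atom feed whose entries carry a category term ending in `#post`, `#comment`, and so on (which is what the script looks for); the sample XML and the helper name `count_entry_kinds` are made up for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Atom namespace, assumed to be the default namespace of the export
ATOM = '{http://www.w3.org/2005/Atom}'

# Tiny stand-in for a Blogger export; a real file is far larger.
SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <category term="http://schemas.google.com/blogger/2008/kind#post"/>
  </entry>
  <entry>
    <category term="http://schemas.google.com/blogger/2008/kind#comment"/>
  </entry>
  <entry>
    <category term="http://schemas.google.com/blogger/2008/kind#comment"/>
  </entry>
</feed>
"""


def count_entry_kinds(root):
    """Count feed entries by the text after '#' in their category term."""
    kinds = Counter()
    for entry in root.iter(ATOM + 'entry'):
        kind = 'other'
        for category in entry.findall(ATOM + 'category'):
            term = category.get('term', '')
            if '#' in term:
                # first category with a '#kind' suffix decides the type
                kind = term.rsplit('#', 1)[1]
                break
        kinds[kind] += 1
    return kinds


kinds = count_entry_kinds(ET.fromstring(SAMPLE))
print(kinds)
```

For a real export, `ET.fromstring(SAMPLE)` would be swapped for `ET.parse(path).getroot()`, where `path` points to your own XML file.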

Comments will be exported into folders matching the Blogger slug of the
post. The email for all authors will be `noreply@blogger.com`. The script
also creates `authors.txt`, which lists each comment author together with a
link to the profile picture used on Blogger. These pictures need to be
downloaded manually and then configured using the
`PELICAN_COMMENT_SYSTEM_AUTHORS` setting. In my case, that looked like this:

```python
# in pelicanconf.py
PELICAN_COMMENT_SYSTEM_AUTHORS = {
    ('PROTIK KHAN', 'noreply@blogger.com'): "images/authors/rabiul_karim.webp",
    ('Matthew Hartzell', 'noreply@blogger.com'): "images/authors/matthew_hartzell.webp",
    ('Jens-Peter Labus', 'noreply@blogger.com'): "images/authors/jens-peter_labus.png",
    ('Bridget', 'noreply@blogger.com'): "images/authors/bridget.jpg",
    ('melissaclee', 'noreply@blogger.com'): "images/authors/melissa_lee.jpg",
    ('Melissa', 'noreply@blogger.com'): "images/authors/melissa_lee.jpg"
}
```
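
If there are many authors, a starting point for this setting can be generated from `authors.txt`. The sketch below is an illustration rather than part of the export script: it assumes each line holds a name and a picture URL separated by two tabs (which is how the script writes the file), and the helper name `authors_setting`, the `images/authors` directory, and the file-naming scheme are all hypothetical. You would still need to download the pictures yourself and adjust the paths to match.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

BLOGGER_EMAIL = 'noreply@blogger.com'


def authors_setting(authors_txt, image_dir='images/authors'):
    """Map (author, email) to a guessed local image path per authors.txt line."""
    setting = {}
    for line in authors_txt.splitlines():
        if not line.strip():
            continue
        # the export script separates name and picture URL with two tabs
        name, _, url = line.partition('\t\t')
        # hypothetical naming scheme: lower-cased name plus the URL's extension
        suffix = PurePosixPath(urlparse(url).path).suffix or '.jpg'
        filename = name.lower().replace(' ', '_') + suffix
        setting[(name, BLOGGER_EMAIL)] = '{}/{}'.format(image_dir, filename)
    return setting


# made-up sample content in the same layout as authors.txt
sample = ('Bridget\t\thttp://example.com/pics/abc.jpg\n'
          'Matthew Hartzell\t\thttp://example.com/pics/def.png\n')
setting = authors_setting(sample)
print(setting)
```

The resulting dictionary can be pasted into `pelicanconf.py` and edited by hand, for example to point two Blogger names at the same picture.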

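The script was developed for Python 3.6. It relies on `pathlib` features
added in Python 3.5 (`Path.write_text()` and the `exist_ok` argument of
`Path.mkdir()`), so it will not run on Python 3.4 without modification.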

For more information on this script, you can read my
[blog post](http://blog.minchin.ca/2016/12/blogger-comments-exported.html)
where I introduced it.

-- Wm. Minchin (@MinchinWeb), January 10, 2017
165 changes: 165 additions & 0 deletions pelican_comment_system/import/blogger_comment_export.py
@@ -0,0 +1,165 @@
#! python3.6
"""
Export Comments from Blogger XML

Takes in a Blogger export XML file and spits out each comment in a separate
file, so that they can be used with the [Pelican Comment System]
(https://bernhard.scheirle.de/posts/2014/March/29/static-comments-via-email/).

May be simple to extend to export posts as well.

For a more detailed description, read my blog post at
http://blog.minchin.ca/2016/12/blogger-comments-exported.html

Author: Wm. Minchin -- minchinweb@gmail.com
License: MIT
Changes:

- 2016.12.29 -- initial release
- 2017.01.10 -- clean-up for addition in Pelican Comment System repo
"""

from pathlib import Path

import untangle

###############################################################################
# Constants #
###############################################################################

BLOGGER_EXPORT = r'c:\tmp\blog.xml'
COMMENTS_DIR = 'comments'
COMMENT_EXT = '.md'
AUTHORS_FILENAME = 'authors.txt'

###############################################################################
# Main Code Body #
###############################################################################

authors_and_pics = []


def main():
    obj = untangle.parse(BLOGGER_EXPORT)

    templates = 0
    posts = 0
    comments = 0
    settings = 0
    others = 0

    for entry in obj.feed.entry:
        try:
            full_type = entry.category['term']
        except TypeError:
            # if a post is under multiple categories, `entry.category` is a
            # list; use the first term that carries a '#' suffix
            for my_category in entry.category:
                full_type = my_category['term']
                # str.find() uses a return of `-1` to denote failure
                if full_type.find('#') != -1:
                    break
            else:
                # no category names an entry type; count it once and move on
                others += 1
                continue

        simple_type = full_type[full_type.find('#')+1:]

        if 'settings' == simple_type:
            settings += 1
        elif 'post' == simple_type:
            posts += 1
            # process posts here
        elif 'comment' == simple_type:
            comments += 1
            process_comment(entry, obj)
        elif 'template' == simple_type:
            templates += 1
        else:
            others += 1

    export_authors()

    print('''
{} templates
{} posts (including drafts)
{} comments
{} settings
{} other entries'''.format(templates,
                           posts,
                           comments,
                           settings,
                           others))


def process_comment(entry, obj):
    # e.g. "tag:blogger.com,1999:blog-26967745.post-4115122471434984978"
    comment_id = entry.id.cdata
    # in ISO 8601 format, usable as is
    comment_published = entry.published.cdata
    comment_body = entry.content.cdata
    comment_post_id = entry.thr_in_reply_to['ref']
    comment_author = entry.author.name.cdata
    comment_author_pic = entry.author.gd_image['src']
    comment_author_email = entry.author.email.cdata

    # add author and pic to the module-level list
    authors_and_pics.append((comment_author, comment_author_pic))

    # use this for a filename for the comment
    # e.g. "4115122471434984978"
    comment_short_id = comment_id[comment_id.find('post-')+5:]

    comment_text = "date: {}\nauthor: {}\nemail: {}\n\n{}\n"\
        .format(comment_published,
                comment_author,
                comment_author_email,
                comment_body)

    # find the article this comment replies to
    for candidate in obj.feed.entry:
        if candidate.id.cdata == comment_post_id:
            article_entry = candidate
            break
    else:
        print("No matching article for comment", comment_id, comment_post_id)
        # don't process comment further
        return

    # article slug
    for link in article_entry.link:
        if link['rel'] == 'alternate':
            article_link = link['href']
            break
    else:
        article_title = article_entry.title.cdata
        print('Could not find slug for', article_title)
        article_link = article_title.lower().replace(' ', '-')

    slug_end = article_link.find('.html')
    if slug_end == -1:
        # the fallback slug built from the title has no '.html' to strip
        slug_end = len(article_link)
    article_slug = article_link[article_link.rfind('/')+1:slug_end]

    comment_filename = Path(COMMENTS_DIR).resolve()
    # folder; if it doesn't exist, create it
    comment_filename = comment_filename / article_slug
    comment_filename.mkdir(parents=True, exist_ok=True)
    # write the comment file; explicit UTF-8 avoids platform-default encodings
    comment_filename = comment_filename / (comment_short_id + COMMENT_EXT)
    comment_filename.write_text(comment_text, encoding='utf-8')


def export_authors():
    # de-duplicate, then sort for a stable listing
    to_export = sorted(set(authors_and_pics))

    str_export = ''
    for author, pic in to_export:
        str_export += author + '\t\t' + pic + '\n'

    authors_filename = Path(COMMENTS_DIR).resolve() / AUTHORS_FILENAME
    authors_filename.write_text(str_export, encoding='utf-8')


if __name__ == "__main__":
    main()
3 changes: 3 additions & 0 deletions pelican_comment_system/pelican_comment_system.py
@@ -23,6 +23,9 @@
from . import avatars


__version__ = "1.3.0"


_all_comments = []
_pelican_writer = None
_pelican_obj = None