Planning issue for import of Google Groups to db Nodes #3305

Open
jywarren opened this issue Sep 7, 2018 · 2 comments

Comments

@jywarren jywarren commented Sep 7, 2018

This is a doozy, so watch out! Long term project, no immediate action needed.

We have (and can regenerate at any time) a full export of Google Groups content in .mbox format using Google Takeout. It's all public information except people's email addresses.

Someday, we may want to import all of these as nodes, back-dated using the timestamp data, to make them searchable in PublicLab.org. This might involve several challenges:

  1. matching email addresses to usernames where possible
  2. displaying an alert that these were auto-imported from Google Groups, with a link to original URL
  3. ability to display "users" for each email address that does NOT have a matching user account
  4. whether to forward comment responses to these legacy nodes to everyone in that discussion using the old emails
  5. how to display a thread -- initial post as a node, then all responses as comments?
  6. how to ensure "reply back quoted text" is not displayed since it'll be disruptive (similar to reply by email filtering)
  7. how to actually run the import script using mbox data - maybe via https://github.com/darthbatman/mbox-json plus a Ruby script?
  8. do a test run of just one to see how it looks
  9. what tags to use automatically per-list?
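To make items 5 through 7 a bit more concrete, here is a minimal sketch of reading an .mbox export and grouping messages into threads. It is written in Python with only the standard library, purely to illustrate the logic; the real import would more likely be a Ruby script as suggested above. The function names (`strip_quoted_reply`, `plain_text`, `load_threads`) are hypothetical, threading relies on the `In-Reply-To` header only, and the quoted-text filter is a rough heuristic rather than the existing reply-by-email filtering.

```python
import mailbox
from email.utils import parseaddr, parsedate_to_datetime


def strip_quoted_reply(body):
    """Drop '>'-quoted lines and the 'On ... wrote:' line that introduces them."""
    kept = []
    for line in body.splitlines():
        stripped = line.strip()
        if stripped.startswith(">"):
            continue
        if stripped.startswith("On ") and stripped.endswith("wrote:"):
            continue
        kept.append(line)
    return "\n".join(kept).strip()


def plain_text(message):
    """Return the first text/plain part of a message as a string."""
    for part in message.walk():
        if part.get_content_type() == "text/plain":
            payload = part.get_payload(decode=True)
            charset = part.get_content_charset() or "utf-8"
            return payload.decode(charset, errors="replace")
    return ""


def load_threads(mbox_path):
    """Group messages into threads keyed by the Message-ID of the first post.

    Assumes every message has a parseable Date header.
    """
    box = mailbox.mbox(mbox_path)
    try:
        # Oldest first, so a parent is always seen before its replies.
        messages = sorted(
            box, key=lambda m: parsedate_to_datetime(m["Date"]).timestamp()
        )
    finally:
        box.close()

    root_of = {}   # Message-ID -> Message-ID of the thread's first post
    threads = {}   # root Message-ID -> list of simplified messages
    for message in messages:
        msg_id = (message["Message-ID"] or "").strip()
        parent = (message["In-Reply-To"] or "").strip()
        # A reply whose parent is missing from the export starts its own thread.
        root = root_of.get(parent, msg_id) if parent else msg_id
        root_of[msg_id] = root
        threads.setdefault(root, []).append({
            "email": parseaddr(message["From"] or "")[1].lower(),
            "subject": message["Subject"],
            "date": parsedate_to_datetime(message["Date"]),
            "body": strip_quoted_reply(plain_text(message)),
        })
    return threads
```

Each thread comes back as a list ordered by date, so the first entry would become the node and the rest would become comments. A test run on a single thread (item 8) could simply print one entry of the returned dictionary.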

I'm sure there's more. This is a starting list.

@ebarry ebarry commented Sep 10, 2018

For instance, might this look like the following?

An email thread on plots-waterquality is logged as a back-dated research note, authored by the original poster and titled with the original email subject line. All responses appear as comments on the research note, and the note is tagged with water-quality?
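As a rough sketch of that mapping, assuming each message has already been reduced to a dictionary with `email`, `subject`, `date`, and `body` keys: the Python below turns one thread into a back-dated note with comments. Everything named here is hypothetical, including `LIST_TAGS`, `KNOWN_USERS`, `author_for`, `thread_to_note`, and the shape of the note dictionary; none of it reflects the site's actual data model. Unmatched addresses get a hashed placeholder name so the email itself is never displayed.

```python
import hashlib
from datetime import datetime, timezone

# Hypothetical lookup tables; real values would come from the site database.
LIST_TAGS = {"plots-waterquality": "water-quality"}
KNOWN_USERS = {"alice@example.org": "alice"}

IMPORT_NOTICE = "This post was auto-imported from Google Groups."


def author_for(email):
    """Return a matching username, or a placeholder that hides the address."""
    email = email.lower()
    username = KNOWN_USERS.get(email)
    if username:
        return username
    digest = hashlib.sha256(email.encode("utf-8")).hexdigest()[:8]
    return "imported-" + digest


def thread_to_note(list_name, messages):
    """Map one email thread to a back-dated note with replies as comments."""
    first, *replies = sorted(messages, key=lambda m: m["date"])
    return {
        "title": first["subject"],
        "author": author_for(first["email"]),
        "created_at": first["date"],          # back-dated to the original post
        "body": first["body"],
        "tags": [LIST_TAGS[list_name]] if list_name in LIST_TAGS else [],
        "notice": IMPORT_NOTICE,
        "comments": [
            {
                "author": author_for(reply["email"]),
                "created_at": reply["date"],
                "body": reply["body"],
            }
            for reply in replies
        ],
    }


if __name__ == "__main__":
    example = [
        {"email": "alice@example.org", "subject": "Turbidity sensor",
         "date": datetime(2018, 9, 7, 10, 0, tzinfo=timezone.utc),
         "body": "Has anyone tried this?"},
        {"email": "bob@example.org", "subject": "Re: Turbidity sensor",
         "date": datetime(2018, 9, 7, 11, 0, tzinfo=timezone.utc),
         "body": "Yes, it works."},
    ]
    print(thread_to_note("plots-waterquality", example))
```

Keeping the list-to-tag mapping in an explicit table seems safer than deriving tags from list names, since a name like plots-waterquality does not mechanically yield water-quality.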

@jywarren jywarren commented Sep 10, 2018
