export/import group data via json file #4632
Changes from 23 commits
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,11 @@ | ||
class GroupExportJob < ActiveJob::Base | ||
queue_as :low_priority | ||
|
||
def perform(group, actor) | ||
groups = actor.groups.where(id: group.all_groups) | ||
filename = GroupExportService.export_filename_for(group) | ||
GroupExportService.export(groups, filename) | ||
I would have expected
and then call group.all_groups from within the service
The plan was that anyone in a group can export the data. Meaning they should only be able to export data for groups they belong to. So I want to pass in only groups the user belongs to.
Ah, then better to pass in the actor here and do that line within the service. I think the filename method is out of place here, there's no reason to have two static methods on GroupExportService like this, since they're so intertwined.
I also wonder about creating an event for this so that we can easily track it and send it to other places in the future if we want (like, text me when it's done, or send me a push notification with a link to the thing, etc.)
This also maintains our current distance from our ideal of 'events are the only way to send emails within the app' |
||
document = Document.create(author: actor, file: File.open(filename, 'r'), title: filename) | ||
UserMailer.group_export_ready(actor, group, document).deliver | ||
end | ||
end |
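The suggestion in the thread above (pass the group and the actor, and let the service work out both the exportable groups and the filename) could look roughly like the following. This is a plain-Ruby sketch, not the app's code: `SketchGroup`, `SketchActor`, `GroupExportSketch`, `plan`, and `related_ids` are hypothetical stand-ins for the ActiveRecord models and `all_groups`, and the id fallback in the filename assumes a guest group's name can be nil.

```ruby
# Plain-Ruby stand-ins; in the app these would be ActiveRecord models.
SketchGroup = Struct.new(:id, :name, :related_ids)
SketchActor = Struct.new(:group_ids)

module GroupExportSketch
  # Hypothetical single entry point: the job hands over the group and the
  # actor, and the service decides which groups and which filename to use.
  def self.plan(group, actor, all_groups, now: Time.now)
    wanted = [group.id] + group.related_ids
    # Only groups that are related to the exported group AND that the
    # actor belongs to are eligible for export.
    allowed = all_groups.select do |g|
      wanted.include?(g.id) && actor.group_ids.include?(g.id)
    end
    { groups: allowed, filename: filename_for(group, now) }
  end

  def self.filename_for(group, now)
    # Assumption: a group without a name falls back to its id.
    base = group.name || "group-#{group.id}"
    slug = base.downcase.gsub(/[^a-z0-9]+/, '-').gsub(/\A-|-\z/, '')
    "tmp/#{now.strftime('%Y-%m-%d_%H-%M-%S')}_#{slug}.json"
  end
end
```

With this shape the job body would shrink to one service call plus the document and mailer steps, and the membership check lives in one place.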
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,115 @@ | ||
module GroupExportRelations | ||
extend ActiveSupport::Concern | ||
|
||
included do | ||
# polls | ||
has_many :discussion_polls, through: :discussions | ||
has_many :poll_options, through: :polls | ||
has_many :poll_unsubscriptions, through: :polls | ||
has_many :poll_did_not_votes, through: :polls | ||
has_many :outcomes, through: :polls | ||
has_many :stances, through: :polls | ||
has_many :stance_choices, through: :stances | ||
|
||
# documents | ||
has_many :discussion_documents, through: :discussions, source: :documents | ||
has_many :poll_documents, through: :polls, source: :documents | ||
has_many :comment_documents, through: :comments, source: :documents | ||
has_many :public_discussion_documents, through: :public_discussions, source: :documents | ||
has_many :public_poll_documents, through: :public_polls, source: :documents | ||
has_many :public_comment_documents, through: :public_comments, source: :documents | ||
|
||
# reactions | ||
has_many :discussion_reactions, -> { joins(:user) }, through: :discussions, source: :reactions | ||
has_many :poll_reactions, -> { joins(:user) }, through: :polls, source: :reactions | ||
has_many :stance_reactions, -> { joins(:user) }, through: :stances, source: :reactions | ||
has_many :comment_reactions, -> { joins(:user) }, through: :comments, source: :reactions | ||
has_many :outcome_reactions, -> { joins(:user) }, through: :outcomes, source: :reactions | ||
|
||
# readers | ||
has_many :discussion_readers, through: :discussions | ||
|
||
# guest groups | ||
has_many :discussion_guest_groups, through: :discussions, source: :guest_group | ||
has_many :poll_guest_groups, through: :polls, source: :guest_group | ||
|
||
# users | ||
has_many :discussion_authors, through: :discussions, source: :author | ||
# has_many :discussion_reader_users, through: :discussion_readers, source: :user | ||
has_many :comment_authors, through: :comments, source: :user | ||
has_many :poll_authors, through: :polls, source: :author | ||
has_many :outcome_authors, through: :outcomes, source: :author | ||
has_many :stance_authors, through: :stances, source: :participant | ||
has_many :reader_users, through: :discussion_readers, source: :user | ||
has_many :non_voters, through: :poll_did_not_votes, source: :user | ||
|
||
# events | ||
has_many :membership_events, through: :memberships, source: :events | ||
has_many :discussion_events, through: :discussions, source: :events | ||
has_many :comment_events, through: :comments, source: :events | ||
has_many :poll_events, through: :polls, source: :events | ||
has_many :outcome_events, through: :outcomes, source: :events | ||
has_many :stance_events, through: :stances, source: :events | ||
end | ||
|
||
def all_groups | ||
Queries::UnionQuery.for(:groups, [ | ||
Group.where(id: self.id), | ||
self.subgroups, | ||
self.discussion_guest_groups, | ||
self.poll_guest_groups | ||
]) | ||
end | ||
|
||
def all_users | ||
Queries::UnionQuery.for(:users, [ | ||
self.members, | ||
self.discussion_authors, | ||
self.comment_authors, | ||
self.poll_authors, | ||
self.outcome_authors, | ||
self.stance_authors, | ||
self.reaction_users, | ||
self.reader_users, | ||
self.non_voters | ||
]) | ||
end | ||
|
||
def all_events | ||
Queries::UnionQuery.for(:events, [ | ||
self.membership_events, | ||
self.discussion_events, | ||
self.comment_events, | ||
self.poll_events, | ||
self.outcome_events, | ||
self.stance_events | ||
]) | ||
end | ||
|
||
def all_notifications | ||
Notification.where(event_id: all_events.pluck(:id)) | ||
end | ||
|
||
def all_documents | ||
Queries::UnionQuery.for(:documents, [ | ||
self.documents, | ||
self.discussion_documents, | ||
self.poll_documents, | ||
self.comment_documents | ||
]) | ||
end | ||
|
||
def all_reactions | ||
Queries::UnionQuery.for(:reactions, [ | ||
self.discussion_reactions, | ||
self.poll_reactions, | ||
self.stance_reactions, | ||
self.comment_reactions, | ||
self.outcome_reactions | ||
]) | ||
end | ||
|
||
def reaction_users | ||
User.where(id: all_reactions.pluck(:user_id)) | ||
end | ||
end |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -48,7 +48,7 @@ class FormalGroup < Group | |
belongs_to :default_group_cover | ||
|
||
has_many :subgroups, | ||
-> { where(archived_at: nil).order(:name) }, | ||
I agree with this, but wonder if we need to check the |
||
-> { where(archived_at: nil) }, | ||
class_name: 'Group', | ||
foreign_key: 'parent_id' | ||
has_many :all_subgroups, class_name: 'Group', foreign_key: :parent_id | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -5,6 +5,14 @@ def id_and_subgroup_ids | |
Array(id) | ||
end | ||
|
||
def subgroups | ||
Group.none | ||
end | ||
|
||
def documents | ||
Document.none | ||
Hmmmm I wonder if we want to move the
done |
||
end | ||
|
||
def group_privacy=(term) | ||
raise 'guest groups cant be open' if term == 'open' | ||
super | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -89,6 +89,6 @@ def discussion_readers | |
private | ||
|
||
def set_volume | ||
self.volume = user.default_membership_volume if group.is_formal_group? | ||
self.volume = user.default_membership_volume if id.nil? && group.is_formal_group? | ||
Oh hmm what is this change needed for? I think more idiomatic is
I've been working to remove all unnecessary N+1 queries. Reading volume triggers a query in a couple of places (checking default membership volume and group volume), and it also pollutes the export because it returns a value that isn't what the column actually contains. That's why I've moved to a method that clearly says it returns a computed value for volume, rather than overloading the simple accessor. I like it more, and it hasn't really been a problem to change.
Oh, I was referring to the id.nil? addition. |
||
end | ||
end |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -33,10 +33,6 @@ def discussion_reader_id | |
object.id | ||
end | ||
|
||
def discussion_reader_volume | ||
This gets used in
Also, wonder if volume changes are not strictly necessary for group export
A flag that this needs to be addressed in some way before merge; we can't be referencing |
||
object.volume | ||
end | ||
|
||
def seen_by_count | ||
object.discussion.seen_by_count | ||
end | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,68 @@ | ||
class GroupExportService | ||
RELATIONS = %w[ | ||
all_users | ||
all_events | ||
all_notifications | ||
all_documents | ||
all_reactions | ||
memberships | ||
membership_requests | ||
discussions | ||
polls | ||
poll_options | ||
poll_did_not_votes | ||
poll_unsubscriptions | ||
outcomes | ||
stances | ||
stance_choices | ||
discussion_readers | ||
comments | ||
] | ||
|
||
JSON_PARAMS = { groups: {methods: [:type]}, | ||
users: {except: [:encrypted_password, | ||
:reset_password_token, | ||
:email_api_key, | ||
:unsubscribe_token] }}.with_indifferent_access.freeze | ||
|
||
def self.export(groups, filename) | ||
ids = Hash.new { |hash, key| hash[key] = [] } | ||
File.open(filename, 'w') do |file| | ||
groups.each do |group| | ||
puts_record(group, file, ids) | ||
RELATIONS.each do |relation| | ||
puts "Exporting: #{relation}" | ||
group.send(relation).find_each(batch_size: 20000) do |record| | ||
puts_record(record, file, ids) | ||
end | ||
end | ||
end | ||
end | ||
end | ||
|
||
def self.export_filename_for(group) | ||
"tmp/#{DateTime.now.strftime("%Y-%m-%d_%H-%M-%S")}_#{group.name.parameterize}.json" | ||
A note that the group.name.parameterize will fail for guest groups, which we should be able to export just fine. |
||
end | ||
|
||
def self.puts_record(record, file, ids) | ||
table = record.class.table_name | ||
return if ids[table].include?(record.id) | ||
ids[table] << record.id | ||
file.puts({table: table, record: record.as_json(JSON_PARAMS[table])}.to_json) | ||
end | ||
|
||
def self.import(filename) | ||
tables = File.open(filename, 'r').map { |line| JSON.parse(line)['table'] }.uniq | ||
There's some stuff that smells a bit here, but I haven't got a better solution off the top of my head given that (I'm assuming) we don't want to port all of the file into memory. I think JSON-parsing each line of the file N+1 times is going to hurt us, and wonder if there's a way to avoid it. I wouldn't be opposed to, say, iterating through each line of the file, parsing it, and then,
Then we only iterate through the file once, run JSON.parse once per line, and still maintain support for weirdo input like a stray group in the middle of a big run of discussions or whatnot.
PS I believe import will silently fail imports with ids that exist already, which would be the same behaviour as this would exhibit
Import totally fails if some of the ids already exist, so the check is necessary. Thanks for helping to try and improve this, but I think I'd like to call it good enough for now. It's the best solution (fastest export for large groups by miles) after quite a few attempts and I'd like to move on. |
||
tables.each do |table| | ||
klass = table.classify.constantize | ||
existing_ids = klass.pluck(:id) | ||
new_records = File.open(filename, 'r').map do |line| | ||
data = JSON.parse(line) | ||
next unless (data['table'] == table && !existing_ids.include?(data['record']['id'])) | ||
klass.new(data['record']) | ||
end.compact | ||
klass.import(new_records, validate: false) | ||
end | ||
end | ||
end |
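One possible reading of the single-pass idea raised in the review of `import` is sketched below: each line is parsed once, new records are buffered per table, and a table's buffer is flushed as soon as it reaches a batch size, so the whole file never has to sit in memory. `each_import_batch` and the `existing_ids` callable are hypothetical names; the callable stands in for `klass.pluck(:id)`, and the block stands in for `klass.import(records, validate: false)`. It assumes rows can be inserted in any table order.

```ruby
require 'json'
require 'set'

# Reads newline-delimited JSON once and yields [table, records] batches.
# `existing_ids` is a callable returning the ids already present for a
# table; it is invoked at most once per table.
def each_import_batch(io, existing_ids, batch_size: 1000)
  seen = Hash.new { |hash, table| hash[table] = Set.new(existing_ids.call(table)) }
  pending = Hash.new { |hash, table| hash[table] = [] }

  io.each_line do |line|
    next if line.strip.empty?
    data = JSON.parse(line)
    table = data['table']
    record = data['record']
    # Set#add? returns nil when the id is already known, so both
    # pre-existing rows and duplicates within the file are skipped.
    next unless seen[table].add?(record['id'])

    pending[table] << record
    # Flush a full buffer; a stray record from another table just lands
    # in its own buffer and does not interrupt the current run.
    yield table, pending.delete(table) if pending[table].size >= batch_size
  end

  # Flush whatever is left once the file has been read.
  pending.each { |table, records| yield table, records }
end
```

A caller would pass `File.open(filename, 'r')` as `io` and turn each yielded batch into model instances before handing them to the bulk insert.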
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
%p= t :'user_mailer.group_export_ready.body_html', url: @document.file.url |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -11,6 +11,18 @@ angular.module('loomioApp').directive 'groupActionsDropdown', -> | |
replace: true | ||
controller: ['$scope', ($scope) -> | ||
|
||
$scope.canExportData = -> | ||
Session.user().isMemberOf($scope.group) | ||
|
||
$scope.openGroupExportModal = -> | ||
ModalService.open 'ConfirmModal', confirm: -> | ||
submit: -> Records.groups.export($scope.group.id) | ||
We tend towards actions on the model, so
done, thanks |
||
text: | ||
title: 'group_export_modal.title' | ||
helptext: 'group_export_modal.body' | ||
submit: 'group_export_modal.submit' | ||
flash: 'group_export_modal.flash' | ||
|
||
$scope.canAdministerGroup = -> | ||
AbilityService.canAdministerGroup($scope.group) | ||
|
||
|
Pedantic, but you should be able to do
because it should be the service's job to authorize whether you can export or not.
Kind of related. One thing that might be relevant is anonymous polls.
A data export would allow people to see who voted what.
If it were not for anonymous polls then I'd say that anyone who is a member should be able to export the group data.
If there are anonymous polls, I don't know if anyone should be able to download the data. It's a weird situation. What are your thoughts?
If a poll's anonymous, we should anonymize the export data.
Here's where I kinda wish we had serializers for these things, because it would mean we could make tweaks like this a bit more easily, rather than throwing scopes on the exportable_relations.
Instead of that though (I reckon it's a PITA), I think I'd prefer filtering out anonymous polls over including them
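To make the two options in this thread concrete, here is a small sketch over plain hashes shaped like exported rows. The field names `anonymous`, `poll_id`, and `participant_id` are assumptions for illustration, as are the helper names; it also only touches stances, while reactions, events, and notifications could leak the same information and would need similar treatment.

```ruby
require 'set'

# Ids of polls flagged as anonymous (the `anonymous` key is an assumption).
def anonymous_poll_ids(polls)
  polls.select { |poll| poll['anonymous'] }.map { |poll| poll['id'] }.to_set
end

# Option 1: keep the stances but blank out who cast them.
def anonymize_stances(polls, stances)
  hidden = anonymous_poll_ids(polls)
  stances.map do |stance|
    next stance unless hidden.include?(stance['poll_id'])
    # merge returns a copy, so the input rows are left untouched.
    stance.merge('participant_id' => nil)
  end
end

# Option 2: leave anonymous polls and their stances out of the export.
def reject_anonymous(polls, stances)
  hidden = anonymous_poll_ids(polls)
  [polls.reject { |poll| hidden.include?(poll['id']) },
   stances.reject { |stance| hidden.include?(stance['poll_id']) }]
end
```

Option 2 is the simpler of the two because nothing downstream has to cope with a stance that has no participant.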
(NB that we don't use the 'discussion_polls' relation at the moment, because we're ensuring that a poll has the group_id set correctly if the discussion_id is set. A bit stateful-y, but it's worked so far.)