export/import group data via json file #4632
Conversation
@robguthrie Does this need additional work to be merged? It looks like the existing code doesn't really do anything other than memberships...
Please leave it for now. I modified it for memberships to repair the damage last week. I'll pick it up again soon. It's not super fast but it works well enough to get jobs done.
Would be nice to get this merged in, @robguthrie; anything I can do to support?
Hi @gdpelican. This version does not gobble memory when exporting large groups: we stream data straight into the file, one JSON record per line. The Ruby process stays around 100 MB rather than growing to 8 GB. We may want to think about how to use the .import method, maybe by grouping records by table and then importing after dropping/ignoring existing records. Anyway, I'm looking to complete this over the next day or so.
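The line-delimited streaming idea described above can be sketched roughly like this (an illustrative sketch only, not the actual service code; `stream_export` is a hypothetical name, and `records` stands in for whatever each relation yields, e.g. via find_each in the real service):

```ruby
require 'json'

# Rough sketch of line-delimited JSON export: one self-contained JSON
# record per line, written straight to the IO, so memory stays flat
# regardless of how large the group is.
def stream_export(io, table_name, records)
  records.each do |record|
    io.puts({ 'table' => table_name, 'data' => record }.to_json)
  end
end
```

Because each line is a complete JSON document, the importer can read the file back one line at a time without ever holding the whole export in memory.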
@gdpelican the export and import of our biggest groups can each happen in 5 minutes or so now, without crazy memory usage. I'm thinking burndown lists now.
OK, so we're now excluding confidential user fields, which means we can let users export JSON for any group they belong to. I'm thinking we create an attachment owned by the user with no attachable, add the exported JSON to it, then schedule a delayed job to delete it a week later.
@gdpelican this is in final stages now. Just going to finish the email styling. CR appreciated.
@@ -37,6 +37,11 @@ def upload_photo
    respond_with_resource
  end

  def export
    service.export(group: load_and_authorize(:group), actor: current_user)
Pedantic, but you should be able to do
service.export(group: load_resource, actor: current_user)
because it should be the service's job to authorize whether you can export or not.
Kind of related. One thing that might be relevant is anonymous polls.
A data export would allow people to see who voted what.
If it were not for anonymous polls then I'd say that anyone who is a member should be able to export the group data.
If there are anonymous polls, I don't know if anyone should be able to download the data. It's a weird situation. What are your thoughts?
If a poll's anonymous, we should anonymize the export data.
Here's where I kinda wish we had serializers for these things, because it would mean we could make tweaks like this a bit more easily, rather than throwing scopes on the exportable_relations.
Instead of that though (I reckon it's a PITA), I think I'd prefer filtering out anonymous polls to including them:
# exportable_relations.rb
has_many :exportable_polls, -> { where(anonymous: false) }, source: :polls
# group_export_service
RELATIONS = [
...
'exportable_polls' # (and not 'polls')
]
(NB that we don't use the 'discussion_polls' relation at the moment, because we're ensuring that a poll has the group_id set correctly if the discussion_id is set. A bit stateful-y, but it's worked so far.)
@@ -48,7 +48,7 @@ class FormalGroup < Group
    belongs_to :default_group_cover

    has_many :subgroups,
             -> { where(archived_at: nil).order(:name) },
I agree with this, but wonder if we need to check the /g/:key/subgroups query to make sure it's still returning the same thing?
app/models/guest_group.rb
Outdated
  end

  def documents
    Document.none
Hmmmm, I wonder if we want to move the has_many :documents line from FormalGroup to Group instead of doing this.
done
@@ -89,6 +89,6 @@ def discussion_readers
  private

  def set_volume
-   self.volume = user.default_membership_volume if group.is_formal_group?
+   self.volume = user.default_membership_volume if id.nil? && group.is_formal_group?
Oh hmm, what is this change needed for? I think self.new_record? is more idiomatic.
I've been working to remove all unnecessary N+1 queries. Reading volume triggers a query in a couple of places (checking default membership volume and group volume), and it also pollutes the export because it returns a value that isn't what the column actually contains.
So that's why I've moved to using a method that clearly says it's giving a computed value for volume, rather than overloading the simple accessor. I like it more and it hasn't really been a problem to change.
Oh, I was referring to the id.nil? addition.
@@ -33,10 +33,6 @@ def discussion_reader_id
    object.id
  end

- def discussion_reader_volume
This gets used in discussion_model.coffee; I wonder if we need to account for that?
Also, I wonder whether the volume changes are strictly necessary for group export.
A flag that this needs to be addressed in some way before merge; we can't be referencing discussionReaderVolume anywhere on the client if this method is taken out.
def perform(group, actor)
  groups = actor.groups.where(id: group.all_groups)
  filename = GroupExportService.export_filename_for(group)
  GroupExportService.export(groups, filename)
I would have expected
filename = GroupExportService.export(group)
and then call group.all_groups from within the service
The plan was that anyone in a group can export the data. Meaning they should only be able to export data for groups they belong to. So I want to pass in only groups the user belongs to.
Ah, then better to pass in the actor here and do that line within the service. I think the filename method is out of place here; there's no reason to have two static methods on GroupExportService like this, since they're so intertwined.
filename = GroupExportService.export(group, actor)
document = Document.create author: actor, file: File.open(filename, 'r'), title: filename
UserMailer.group_export_ready(actor, group, document).deliver
I also wonder about creating an event for this so that we can easily track it and send it to other places in the future if we want (like, text me when it's done, or send me a push notification with a link to the thing, etc.)
GroupExported.publish!(group, actor, document)
This also maintains our current distance from our ideal of 'events are the only way to send emails within the app'
$scope.openGroupExportModal = ->
  ModalService.open 'ConfirmModal', confirm: ->
    submit: -> Records.groups.export($scope.group.id)
We tend towards actions on the model, so
# group_model.coffee
export: =>
@remote.post(@id, 'export')
submit: $scope.group.export
done, thanks
@@ -12,6 +12,11 @@
    sign_in user
  end

  describe 'export' do
    it 'kicks off a group export job'
Would be good to write these, even if it's just a check for 200 and 403. Maybe check to ensure Document.count is incremented too.
  end

  def self.import(filename)
    tables = File.open(filename, 'r').map { |line| JSON.parse(line)['table'] }.uniq
There's some stuff that smells a bit here, but I haven't got a better solution off the top of my head, given that (I'm assuming) we don't want to load the whole file into memory. I think JSON-parsing each line of the file N+1 times is going to hurt us, and I wonder if there's a way to avoid it.
I wouldn't be opposed to, say, iterating through each line of the file, parsing it, and then:
- if the table matches the current table, klass.new the record and append it to an array of records
- if the table doesn't match, klass.import the array, flush it, and start a new one for the new table
Then we only iterate through the file once, run JSON.parse once per line, and still maintain support for weirdo input like a stray group in the middle of a big run of discussions or whatnot.
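The single-pass batching suggested above might look something like this (a hypothetical sketch: `single_pass_import` is a made-up name, the block stands in for the klass.import call, and the real code would still need to resolve the model class from the table name):

```ruby
require 'json'

# Hypothetical single-pass import: parse each line exactly once, batch
# consecutive records belonging to the same table, and flush a batch
# whenever the table changes. A stray record mid-run simply closes one
# batch and opens another.
def single_pass_import(lines)
  current_table = nil
  batch = []
  lines.each do |line|
    record = JSON.parse(line)
    if record['table'] != current_table && batch.any?
      yield current_table, batch   # stands in for klass.import(batch)
      batch = []
    end
    current_table = record['table']
    batch << record['data']
  end
  yield current_table, batch if batch.any?
end
```

This keeps memory bounded by the largest run of same-table records rather than the whole file, while still tolerating out-of-order input.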
PS I believe import will silently fail imports with ids that exist already, which would be the same behaviour as this would exhibit
import totally fails if some of the ids already exist, so the check is necessary.
Thanks for helping to try and improve, but I think I'd like to call it good enough for now. It's the best solution (fastest export for large groups by miles) after quite a few attempts, and I'd like to move on.
app/services/group_export_service.rb
Outdated
  end

  def self.export_filename_for(group)
    "tmp/#{DateTime.now.strftime("%Y-%m-%d_%H-%M-%S")}_#{group.name.parameterize}.json"
A note that the group.name.parameterize will fail for guest groups, which we should be able to export just fine.
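One hedged way to handle the nil-name case would be falling back to the group key, along these lines (a sketch only: `export_filename_for` here takes plain values rather than a group, and the gsub chain is a plain-Ruby stand-in for ActiveSupport's String#parameterize):

```ruby
require 'date'

# Hypothetical guard for groups without a name (e.g. guest groups):
# fall back to the group key, so the slug step never runs on nil.
def export_filename_for(name, key)
  slug = (name || key).to_s.downcase.gsub(/[^a-z0-9]+/, '-').gsub(/\A-|-\z/, '')
  "tmp/#{DateTime.now.strftime('%Y-%m-%d_%H-%M-%S')}_#{slug}.json"
end
```

In the real service this would presumably stay a one-liner, something like parameterizing `group.name || group.key` instead of `group.name`.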
@gdpelican my show-stopper issue with this is anonymous polls. What do you think we should do? Some options that come to mind:
Any ideas?
…t belong to group whose data is being requested
…after file.url has been updated with full path
No description provided.