Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bulk adding of children #274

Open
patrickdavey opened this issue Jun 21, 2017 · 3 comments
Open

Bulk adding of children #274

patrickdavey opened this issue Jun 21, 2017 · 3 comments
Labels

Comments

@patrickdavey
Copy link

Hi there,

Thanks for the awesome gem - it's exactly what I was looking for.

I'm having an issue where I'm trying to add 200 child nodes and it's just a slow process. I have had a look through the documentation, but I can't see an easy way to do bulk inserts, at least, not a supported way.

I can do a bulk insert of all the basic Nodes, setting the parent_id properly, but, creating the hierarchy entry correctly is a little trickier (though I am willing to look into that, but the locking made me a little wary).

My situation is perhaps complicated by the fact that I have a position column, and, further that I am currently on Rails 3.2 so I'm using closure tree 5.2 Ideally I'd like to simply set the position when I am doing a bulk insert so that no UPDATE needs to be performed, again, quite unnecessary in my case.

Can you let me know if there's a strategy for bulk insertion? If not, is there an approach which could work which I might be able to offer as a PR?

Thanks for any & all help & pointers - and thanks again for such a great gem!

@patrickdavey
Copy link
Author

I went ahead and tried to see if I could do a basic bulk update.. (this is very rough code, the rough thing would be wrapped in a transaction (or use advisory_lock?!?!) Anyway, like I said, very rough, I'm using in one place activerecord-import but for the SectionHierarchy I wrote in the straight SQL as that table isn't (easily?!) exposed.

My model is called Section but it is using STI. In my case I'm inserting 200 DocumentProblemNodes. (don't ask about naming ;) this is just a spike, and naming is hard ;)

      columns = [:type, :document_problem_id, :document_id, :position, :parent_id]
      values = document_problem_ids.each_with_index.map { |dp_id, idx| ["DocumentProblemNode", dp_id, document_id, idx, problem_container.id] }
      DocumentProblemNode.import(columns, values)

      # at this point, all of the basic parent info is created, now the hierarchy needs to be modified.

      node_ids_we_have_just_inserted = problem_container.children.where(document_problem_id: document_problem_ids).pluck(:id)

      # now we insert a hierarchy entry (generation zero) for each of these entries.
      value_string = node_ids_we_have_just_inserted.map { |node_id| "(#{node_id}, #{node_id}, 0)" }.join(",")
      sql = "INSERT INTO section_hierarchies (ancestor_id, descendant_id, generations) VALUES #{value_string};"
      ActiveRecord::Base.connection.execute(sql)

      node_ids_we_have_just_inserted.each do |node_id|
        sql = <<-SQL.strip_heredoc
          INSERT INTO section_hierarchies
            (ancestor_id, descendant_id, generations)
          SELECT x.ancestor_id, #{node_id}, x.generations + 1
          FROM section_hierarchies x
          WHERE x.descendant_id = #{problem_container.id }
        SQL
        ActiveRecord::Base.connection.execute(sql)
      end

If I insert my records using the above, I take the time to insert from around 3.660026 to 0.411989 seconds, which is a pretty significant difference for me.

Obviously this is just a hack, and I'm sure there's lots I'm missing, but, if I wrap this is a transaction is it likely to be OK? (mysql user here unfortunately :( What was the issue that led you to use advisory lock?

In my simple test above, after creating the structure, I then called my_model.hash_tree, saved it off into a variable, rebuilt it, and checked ... the two hash's were identical, so it seems like this approach might work.

I'm loathe to do it though, as there's obviously heaps I don't know about.

@mceachen
Copy link
Collaborator

mceachen commented Jun 27, 2017

If you disable the hierarchical maintenance callbacks, you can update all your models, then call .rebuild! on the parent node to all the new nodes.

BTW: please read, comment, and vote on #277 !

@patrickdavey
Copy link
Author

Thanks @mceachen , I'll test and see whether disabling the callbacks helps much. If it's still doing a single insert per row then that's not great for me! Is there a nice way to do that just for a single instance? I'm assuming it's the _ct_after_save I want to skip?

Thanks for such a great gem ! I hope the community can take it over.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants