Make full name updates on tree changes more efficient #3175
Going forward, please use … Also, use a more descriptive pull request name than …
Looks good, but as you mentioned, it needs testing and performance testing.
Also, does it make sense to add some tests for tree node numbering as part of this pull request?
I did some simple overall testing of tree functions and could not break anything.
I'm not sure node numbering tests should be added here, because I'm working on a different method to renumber trees, so I might have to refactor the tests. I've identified a few more changes that could be made to the merge functionality. Should this branch be merged into production before the node numbering fix, @maxpatiiuk?
My recommendation: since this is not the final solution to the tree problems, just a step on the way there, finish everything we planned to do for trees first. Which things to implement depends on the performance gain of a given improvement versus its complexity and implementation time. Once you have identified what needs to be implemented and implemented the changes in a production-ready way, mark the pull request as ready for review. This way, the testers don't have to retest your code after every incremental change to tree performance and can instead do very comprehensive testing once you are done with all changes. As for automated tests, yes, we need to have tests at the end of this process. Whether you write them as you go or all at the end is up to you. The drawback of writing them as you go, as you mentioned, is that you may end up having to rewrite tests; the upside is that you can use the tests to check your code as you go instead of testing it manually (read about test-driven development and its pros and cons). Also, as a golden rule, try to write tests before testers start testing: if tests have to be written anyway, writing them first avoids wasting the testers' time in case the tests would catch an issue before they do.
Works well according to Edinburgh.
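One way to keep node-numbering tests from breaking when the renumbering method changes is to assert invariants rather than exact numbers. The sketch below assumes nested-set style numbering, where each node stores a `nodenumber` and a `highestchildnodenumber` (the second column name is an assumption here), and checks three properties on a small fixture. The function name `numbering_errors` and the fixture format are hypothetical.

```python
def numbering_errors(nodes):
    """Check nested-set invariants for a small tree fixture.

    `nodes` maps node id -> (parent_id, nodenumber, highestchildnodenumber).
    Returns a list of human-readable problems; an empty list means the
    numbering is consistent.
    """
    errors = []

    # 1. Node numbers form the contiguous sequence 1..n.
    numbers = sorted(nn for _, nn, _ in nodes.values())
    if numbers != list(range(1, len(nodes) + 1)):
        errors.append("node numbers are not a contiguous 1..n sequence")

    # Build child lists so subtree sizes can be computed.
    children = {node_id: [] for node_id in nodes}
    for node_id, (parent_id, _, _) in nodes.items():
        if parent_id is not None:
            children[parent_id].append(node_id)

    def subtree_size(node_id):
        return 1 + sum(subtree_size(child) for child in children[node_id])

    for node_id, (parent_id, nn, hcn) in nodes.items():
        # 2. A child's interval sits strictly inside its parent's interval.
        if parent_id is not None:
            _, p_nn, p_hcn = nodes[parent_id]
            if not (p_nn < nn <= hcn <= p_hcn):
                errors.append(
                    f"node {node_id}: [{nn}, {hcn}] not nested in parent [{p_nn}, {p_hcn}]"
                )
        # 3. The interval is exactly as wide as the subtree it covers.
        if hcn - nn + 1 != subtree_size(node_id):
            errors.append(f"node {node_id}: [{nn}, {hcn}] does not match its subtree size")
    return errors


fixture = {1: (None, 1, 5), 2: (1, 2, 3), 3: (2, 3, 3), 4: (1, 4, 5), 5: (4, 5, 5)}
print(numbering_errors(fixture))  # [] for a consistent tree
```

Because the checks only describe what a valid numbering looks like, the same test could be run after any tree operation regardless of how renumbering is implemented.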
Triggered by ea0ab95 on branch refs/heads/issue_388
Overall, the code looks good!
I do have some questions strictly regarding 'semantics' and code quality.
There is nothing here to outright prevent the changes from getting merged, but I would like some clarification on a few things before it is merged into production.
@maxpatiiuk Can we merge this into production for our next release?
This pull request has been mentioned on Specify Community Forum. There might be relevant details there:
Not yet. I need to think about an edge case that comes with merging, and I haven't had time yet.
Looks good. As soon as Vinny fixes the things that Jason found, we can merge this.
@grantfitzsimmons good for testing now.
Important note: since there is no index on node numbers, MySQL has to look at every row to find matching nodes. We should probably add an index on it. Even so, this is faster than the previous code, because scanning all rows is cheaper than updating all rows. There are other places in the backend where tree nodes are found by node number; they would also get faster with an index.
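To illustrate the scan-versus-index difference, the sketch below uses Python's built-in `sqlite3` as a stand-in for MySQL and asks SQLite for its plan for a node-number range lookup, before and after adding an index. The table layout and the index name `idx_taxon_nodenumber` are hypothetical. In MySQL, `EXPLAIN` reports a comparable difference (an `ALL` access type versus a `range` access type), but plans should be checked against the real schema and data.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the detail text of each step SQLite reports for `sql`."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE taxon (taxonid INTEGER PRIMARY KEY, "
        "nodenumber INTEGER, highestchildnodenumber INTEGER, fullname TEXT)"
    )
    conn.executemany(
        "INSERT INTO taxon (nodenumber, highestchildnodenumber) VALUES (?, ?)",
        [(n, n) for n in range(1, 1001)],
    )
    # The kind of range lookup used to find a subtree by node number.
    sql = "SELECT taxonid FROM taxon WHERE nodenumber BETWEEN ? AND ?"
    before = query_plan(conn, sql, (10, 20))
    conn.execute("CREATE INDEX idx_taxon_nodenumber ON taxon (nodenumber)")
    after = query_plan(conn, sql, (10, 20))
    return before, after


before, after = demo()
print("without index:", before)  # a full-table SCAN
print("with index:   ", after)   # a SEARCH using idx_taxon_nodenumber
```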
If we index node numbers, the node numbers update will become slower, correct? |
Yes, they will become slower. Actions other than desynonymize and synonymize modify node numbers in some way. But also note that these exact actions also reset the full name for the entire table, which we no longer do, so whether it's a net gain depends on the total time saved.
Yes, in the end, renumbering will take more time. Each update statement in that function will take longer.
Well, the stats and queries don't use node numbers right now. The only reason to add an index on node numbers here is to help the "where nodenumber between number1 and number2" clause (and its variations), which the backend uses when opening, closing, and moving intervals, and now also when selecting nodes whose full names need updating. Without an index on node numbers, MySQL currently locks the entire table (not a good thing; that's one of the reasons we occasionally see lock timeout errors when using trees), but these locks are usually short-lived because these actions usually run quickly. I'll set up some benchmark tests for these to get a comprehensive view.
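The interval operations mentioned above can be sketched as follows, assuming nested-set numbering in which a node's subtree is exactly the rows whose `nodenumber` falls between its own `nodenumber` and `highestchildnodenumber`. This is an illustration in `sqlite3`, not the Specify backend code; the table, the `highestchildnodenumber` column, and the helper functions are assumptions. Every `UPDATE` filters on a node-number range (where an index would help) and also rewrites node numbers (where an index would add maintenance cost).

```python
import sqlite3


def make_tree():
    """Build a small hypothetical taxon table with nested-set numbering."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE taxon (taxonid INTEGER PRIMARY KEY, parentid INTEGER, "
        "name TEXT, nodenumber INTEGER, highestchildnodenumber INTEGER)"
    )
    conn.executemany(
        "INSERT INTO taxon VALUES (?, ?, ?, ?, ?)",
        [
            (1, None, "Felidae", 1, 5),
            (2, 1, "Panthera", 2, 3),
            (3, 2, "leo", 3, 3),
            (4, 1, "Felis", 4, 5),
            (5, 4, "catus", 5, 5),
        ],
    )
    return conn


def numbering(conn):
    return conn.execute(
        "SELECT name, nodenumber, highestchildnodenumber FROM taxon ORDER BY nodenumber"
    ).fetchall()


def insert_leaf(conn, parentid, name):
    """Open a one-wide interval after the parent's subtree and place the new node there."""
    p_nn, p_hcn = conn.execute(
        "SELECT nodenumber, highestchildnodenumber FROM taxon WHERE taxonid = ?",
        (parentid,),
    ).fetchone()
    new_nn = p_hcn + 1
    # Everything numbered after the parent's subtree moves one slot right.
    conn.execute(
        "UPDATE taxon SET nodenumber = nodenumber + 1, "
        "highestchildnodenumber = highestchildnodenumber + 1 WHERE nodenumber >= ?",
        (new_nn,),
    )
    # The parent and its ancestors now enclose one more node.
    conn.execute(
        "UPDATE taxon SET highestchildnodenumber = highestchildnodenumber + 1 "
        "WHERE nodenumber <= ? AND highestchildnodenumber >= ?",
        (p_nn, p_hcn),
    )
    cur = conn.execute(
        "INSERT INTO taxon (parentid, name, nodenumber, highestchildnodenumber) "
        "VALUES (?, ?, ?, ?)",
        (parentid, name, new_nn, new_nn),
    )
    return cur.lastrowid


def delete_leaf(conn, taxonid):
    """Remove a node with no children and close the one-wide interval it occupied."""
    (nn,) = conn.execute(
        "SELECT nodenumber FROM taxon WHERE taxonid = ?", (taxonid,)
    ).fetchone()
    conn.execute("DELETE FROM taxon WHERE taxonid = ?", (taxonid,))
    # Ancestors shrink by one...
    conn.execute(
        "UPDATE taxon SET highestchildnodenumber = highestchildnodenumber - 1 "
        "WHERE nodenumber < ? AND highestchildnodenumber >= ?",
        (nn, nn),
    )
    # ...and everything numbered after the deleted node moves one slot left.
    conn.execute(
        "UPDATE taxon SET nodenumber = nodenumber - 1, "
        "highestchildnodenumber = highestchildnodenumber - 1 WHERE nodenumber > ?",
        (nn,),
    )
```

Inserting a leaf opens a one-wide interval and deleting it closes that interval again; a move of a whole subtree can be built from the same two steps with a width equal to the subtree size.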
Don't forget to address https://github.com/specify/specify7/pull/3175/files#r1180411187 before this is merged.
This pull request has been mentioned on Specify Community Forum. There might be relevant details there: https://discourse.specifysoftware.org/t/taxon-tree-manipulations-lock-database/1309/2
Conclusion: Vinny will make one change today. Will merge after.
@bronwyncombs @specify/ux-testing
Fixes #388
Currently, the backend updates the full name for all the nodes in a tree table. However, we can restrict the full name update to just the nodes which are moved/inserted. I have added code that restricts the full name update to nodes that might be affected.
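A minimal sketch of the restricted update, assuming nested-set numbering so that a moved or inserted node's subtree can be selected with a single `nodenumber BETWEEN ...` range. It uses `sqlite3` as a stand-in database; the table layout, `FULLNAME_MIN_RANK`, and the helper names are hypothetical, and the full-name rule is a simplified placeholder for the real one.

```python
import sqlite3

# Hypothetical rule standing in for the real per-rank full-name settings:
# ancestors at genus rank or below (rankid >= 180) contribute to the full name.
FULLNAME_MIN_RANK = 180


def make_demo_db():
    """Hypothetical taxon table; catus was just moved from Panthera to Felis."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE taxon (taxonid INTEGER PRIMARY KEY, parentid INTEGER, "
        "name TEXT, rankid INTEGER, nodenumber INTEGER, "
        "highestchildnodenumber INTEGER, fullname TEXT)"
    )
    conn.executemany(
        "INSERT INTO taxon VALUES (?, ?, ?, ?, ?, ?, ?)",
        [
            (1, None, "Felidae", 140, 1, 5, "Felidae"),
            (2, 1, "Panthera", 180, 2, 3, "Panthera"),
            (3, 2, "leo", 220, 3, 3, "Panthera leo"),
            (4, 1, "Felis", 180, 4, 5, "Felis"),
            (5, 4, "catus", 220, 5, 5, "Panthera catus"),  # stale after the move
        ],
    )
    return conn


def compute_fullname(conn, taxonid):
    """Walk up the parent chain, collecting names from qualifying ranks."""
    name, parentid = conn.execute(
        "SELECT name, parentid FROM taxon WHERE taxonid = ?", (taxonid,)
    ).fetchone()
    parts = [name]
    while parentid is not None:
        ancestor_name, rankid, parentid = conn.execute(
            "SELECT name, rankid, parentid FROM taxon WHERE taxonid = ?", (parentid,)
        ).fetchone()
        if rankid >= FULLNAME_MIN_RANK:
            parts.append(ancestor_name)
    return " ".join(reversed(parts))


def update_fullnames(conn, root_id=None):
    """Recompute full names for root_id's subtree, or the whole table if None.

    Returns how many rows were recomputed."""
    if root_id is None:
        rows = conn.execute("SELECT taxonid FROM taxon").fetchall()
    else:
        lo, hi = conn.execute(
            "SELECT nodenumber, highestchildnodenumber FROM taxon WHERE taxonid = ?",
            (root_id,),
        ).fetchone()
        # Nested-set numbering: the subtree is exactly the rows numbered lo..hi.
        rows = conn.execute(
            "SELECT taxonid FROM taxon WHERE nodenumber BETWEEN ? AND ?", (lo, hi)
        ).fetchall()
    for (taxonid,) in rows:
        conn.execute(
            "UPDATE taxon SET fullname = ? WHERE taxonid = ?",
            (compute_fullname(conn, taxonid), taxonid),
        )
    return len(rows)


conn = make_demo_db()
print(update_fullnames(conn, 5))  # 1: only the moved node's subtree is touched
```

Passing no `root_id` reproduces the old whole-table behavior, which can be handy for comparing timings.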
I have also modified the merge node function to update full names once, after the nodes have been merged. Even though this makes stray full name updates (to nodes that were already in the target), it does so only once, avoiding the additional overhead of updating the full name separately for every moved child. This will require additional testing and performance measurements. The change is trivial to remove if it causes a problem.
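The merge-then-refresh-once approach can be sketched with an in-memory tree instead of database rows. Everything here (`Node`, `add_child`, `merge`, and the `FULLNAME_MIN_RANK` rule) is hypothetical and only meant to show the trade-off: nodes that were already under the target are recomputed too, but in a single pass.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical rule standing in for the real per-rank full-name settings:
# ancestors at genus rank or below (rankid >= 180) contribute to the full name.
FULLNAME_MIN_RANK = 180


@dataclass(eq=False)  # identity comparison; parent/children links form cycles
class Node:
    name: str
    rankid: int
    parent: Optional["Node"] = field(default=None, repr=False)
    children: List["Node"] = field(default_factory=list)
    fullname: str = ""


def compute_fullname(node):
    parts = [node.name]
    ancestor = node.parent
    while ancestor is not None:
        if ancestor.rankid >= FULLNAME_MIN_RANK:
            parts.append(ancestor.name)
        ancestor = ancestor.parent
    return " ".join(reversed(parts))


def add_child(parent, name, rankid):
    child = Node(name, rankid, parent)
    parent.children.append(child)
    child.fullname = compute_fullname(child)
    return child


def subtree(node):
    yield node
    for child in node.children:
        yield from subtree(child)


def merge(source, target):
    """Merge `source` into `target`, then refresh full names in one pass.

    Returns how many full names were recomputed, including the "stray"
    recomputations of nodes that were already under `target`."""
    for child in list(source.children):
        child.parent = target
        target.children.append(child)
    source.children.clear()
    if source.parent is not None:
        source.parent.children.remove(source)  # the merged-away node is dropped
        source.parent = None
    recomputed = 0
    for node in subtree(target):
        node.fullname = compute_fullname(node)
        recomputed += 1
    return recomputed
```

The per-child alternative would run a separate refresh for each moved child's subtree; that touches fewer nodes but issues one refresh per child instead of one in total.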