Make full name updates on tree changes more efficient #3175

realVinayak · 2023-03-14T21:36:19Z

Fixes #388

Currently, the backend updates the full name for all the nodes in a tree table. However, we can restrict the full name update to just the nodes which are moved/inserted. I have added code that restricts the full name update to nodes that might be affected.
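
Here is a minimal sketch of the idea. It is not the code in this PR. It assumes a simplified, hypothetical `taxon` table that uses nested-set columns `nodenumber` and `highestchildnodenumber`, so every descendant of a node has a node number inside that node's range. It also assumes that only genus (rank 180) and species (rank 220) names make up a full name. After a move, only the rows numbered inside the moved node's range are recomputed, not the whole table. Python's `sqlite3` stands in for MySQL.

```python
import sqlite3

# Hypothetical, simplified schema (not Specify's real column names): each node's
# subtree occupies the nested-set range [nodenumber, highestchildnodenumber].
SCHEMA = """
CREATE TABLE taxon (
    id INTEGER PRIMARY KEY,
    parent_id INTEGER,
    name TEXT NOT NULL,
    rank_id INTEGER NOT NULL,
    nodenumber INTEGER NOT NULL,
    highestchildnodenumber INTEGER NOT NULL,
    fullname TEXT
);
"""

# Assumption: only genus (180) and species (220) names go into a full name.
FULLNAME_RANKS = {180, 220}


def compute_fullname(conn, node_id):
    """Walk up the parent chain, joining the names of full-name ranks."""
    parts = []
    current = node_id
    while current is not None:
        parent_id, name, rank_id = conn.execute(
            "SELECT parent_id, name, rank_id FROM taxon WHERE id = ?", (current,)
        ).fetchone()
        if rank_id in FULLNAME_RANKS:
            parts.append(name)
        current = parent_id
    return " ".join(reversed(parts))


def update_subtree_fullnames(conn, moved_id):
    """Recompute full names for the moved node and its descendants only."""
    lo, hi = conn.execute(
        "SELECT nodenumber, highestchildnodenumber FROM taxon WHERE id = ?",
        (moved_id,),
    ).fetchone()
    # Collect the affected ids first, then update them one by one.
    ids = [
        row[0]
        for row in conn.execute(
            "SELECT id FROM taxon WHERE nodenumber BETWEEN ? AND ? ORDER BY nodenumber",
            (lo, hi),
        )
    ]
    for node_id in ids:
        conn.execute(
            "UPDATE taxon SET fullname = ? WHERE id = ?",
            (compute_fullname(conn, node_id), node_id),
        )
    return ids


def demo_tree_after_move():
    """'lupus' was moved from Canis to Vulpes. The numbering is already
    updated, but its stored full name is stale."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO taxon VALUES (?, ?, ?, ?, ?, ?, ?)",
        [
            (1, None, "Life", 0, 1, 6, None),
            (2, 1, "Canis", 180, 2, 2, "Canis"),
            (4, 1, "Vulpes", 180, 3, 6, "Vulpes"),
            (3, 4, "lupus", 220, 4, 4, "Canis lupus"),
            (5, 4, "vulpes", 220, 5, 5, "Vulpes vulpes"),
            (6, 4, "zerda", 220, 6, 6, "Vulpes zerda"),
        ],
    )
    return conn
```

Calling `update_subtree_fullnames(conn, 3)` touches only row 3, which becomes "Vulpes lupus". Moving the genus (id 4) would touch its four rows. The rest of the table is never rewritten.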

I have also modified the merge node function so that it updates full names once, after the nodes have been merged. This causes some stray full name updates (to nodes that were already under the target), but it runs only once. The alternative is to update the full name for every child as it is moved, which adds overhead. This will require additional testing and performance measurements. The change is trivial to revert if it causes a problem.
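
Here is a minimal in-memory sketch of that ordering. It moves every child first, then recomputes full names in a single pass over the target's subtree. The `Node` class, the helper names, and the genus/species full-name rule are hypothetical simplifications. The sketch also leaves out everything a real merge does beyond reparenting, such as database writes and renumbering.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Assumption: only genus (180) and species (220) names go into a full name.
FULLNAME_RANKS = {180, 220}


@dataclass(eq=False)  # compare nodes by identity, not by field values
class Node:
    name: str
    rank_id: int
    parent: Optional["Node"] = field(default=None, repr=False)
    children: List["Node"] = field(default_factory=list, repr=False)
    fullname: str = ""


def add_child(parent, child):
    child.parent = parent
    parent.children.append(child)
    return child


def compute_fullname(node):
    parts = []
    while node is not None:
        if node.rank_id in FULLNAME_RANKS:
            parts.append(node.name)
        node = node.parent
    return " ".join(reversed(parts))


def walk(node):
    """Yield node and all of its descendants, depth first."""
    yield node
    for child in node.children:
        yield from walk(child)


def merge(source, target):
    """Move source's children under target, detach source, then refresh
    full names in one pass over target's subtree. Returns nodes updated."""
    for child in list(source.children):
        child.parent = target
        target.children.append(child)
    source.children.clear()
    if source.parent is not None:
        source.parent.children.remove(source)
        source.parent = None
    # One pass after all moves, rather than one update per moved child.
    # Nodes that were already under target are rewritten too (harmless).
    updated = 0
    for node in walk(target):
        node.fullname = compute_fullname(node)
        updated += 1
    return updated
```

For example, suppose a misspelled genus "Canus" holding "familiaris" is merged into "Canis", which already holds "lupus". The merge returns 3 (Canis, lupus, familiaris), and `familiaris.fullname` becomes "Canis familiaris". Recomputing `lupus` is the one-time stray update described in the paragraph above.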

maxpatiiuk · 2023-03-17T03:38:45Z

Going forward, please use issue-388 instead of issue_388 for branch naming (so that all issue branches are sorted consistently in the test panel).

Also, use a more descriptive pull request title than "Issue 388".

maxpatiiuk

Looks good, but as you mentioned, it needs functional and performance testing.

Also, does it make sense to add some tests for tree node numbering as part of this pull request?

melton-jason · 2023-03-17T13:47:28Z

I did some simple overall testing of tree functions and could not break anything.
Of course, more in-depth and comprehensive testing should be done, but move, merge, and synonymization all work, and full names are built correctly.

realVinayak · 2023-03-17T14:21:34Z

Not sure if node numbering tests should be added here. I'm working on a different method to renumber trees, so I might have to refactor the tests. I've identified a few more changes that could be made to the merge functionality. Should this branch be merged into production before the node numbering fix, @maxpatiiuk?

maxpatiiuk · 2023-03-17T15:21:01Z

My recommendation:

Since this is not the final solution to the tree problems, just a step along the way, finish all the things we planned to do for trees. Which things are worth doing depends on the performance gain of each improvement versus its complexity and implementation time.

Once you have identified what needs to be implemented and have implemented the changes in a production-ready way, mark the pull request as ready for review. This way, the testers don't have to test your code several times as you make incremental changes to tree performance. Instead, they can do very comprehensive testing once you are done with all the changes.

As for automated tests, yes, we need to have tests at the end of this process. Whether you write tests as you go along or write them all at the end depends on your preference. As you mentioned, the drawback of writing them as you go is that you may end up having to rewrite them. The upside is that you can use the tests to check your code as you go along instead of testing it manually (e.g., read about test-driven development and its pros and cons).

Also, as a golden rule, try to write tests before testers start testing. The tests have to be written anyway, so writing them first avoids wasting testers' time in cases where the tests would have caught the issue before the testers did.
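
For tests of tree node numbering, one possible starting point is a helper that checks basic nested-set invariants after each tree operation. This is a hypothetical sketch. The `check_nested_set` helper and its input shape (id mapped to parent id, `nodenumber`, `highestchildnodenumber`) are assumptions, not part of the Specify codebase. It checks only a few necessary conditions, not full correctness.

```python
import unittest


def check_nested_set(nodes):
    """Check a few basic nested-set invariants.

    `nodes` maps id -> (parent_id, nodenumber, highestchildnodenumber).
    Returns a list of human-readable problems (empty if none were found).
    """
    problems = []
    for node_id, (parent_id, lo, hi) in nodes.items():
        if hi < lo:
            problems.append(f"{node_id}: highestchildnodenumber < nodenumber")
        if parent_id is not None:
            _, parent_lo, parent_hi = nodes[parent_id]
            # A child's whole range must sit strictly after the parent's own
            # number and no later than the parent's highest child number.
            if not (parent_lo < lo and hi <= parent_hi):
                problems.append(f"{node_id}: interval not inside parent {parent_id}")
    # Assumption: numbers are unique and contiguous, starting at 1 for the root.
    numbers = sorted(number for _, number, _ in nodes.values())
    if numbers != list(range(1, len(numbers) + 1)):
        problems.append("node numbers are not unique and contiguous from 1")
    return problems


class NodeNumberingTest(unittest.TestCase):
    def test_valid_tree_has_no_problems(self):
        # Life [1, 3] > Canis [2, 3] > lupus [3, 3]
        tree = {1: (None, 1, 3), 2: (1, 2, 3), 3: (2, 3, 3)}
        self.assertEqual(check_nested_set(tree), [])

    def test_child_outside_parent_is_reported(self):
        # Canis claims [2, 2], but its child lupus is numbered 3.
        tree = {1: (None, 1, 3), 2: (1, 2, 2), 3: (2, 3, 3)}
        self.assertIn("3: interval not inside parent 2", check_nested_set(tree))
```

Run it with `python -m unittest`. In a real suite, the dictionary would be loaded from the tree table after a move, merge, or insert.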

grantfitzsimmons · 2023-04-27T15:00:16Z

Works well according to Edinburgh

Triggered by ea0ab95 on branch refs/heads/issue_388

melton-jason

Overall, the code looks good!
I do have some questions strictly regarding 'semantics' and code quality.

There is nothing here to outright prevent the changes from being merged, but I would like some clarification on things before it is merged into production.

specifyweb/specify/tree_extras.py

grantfitzsimmons · 2023-05-10T13:15:02Z

@maxpatiiuk Can we merge this into production for our next release?

specifysoftware · 2023-05-10T13:17:46Z

This pull request has been mentioned on Specify Community Forum. There might be relevant details there:

https://discourse.specifysoftware.org/t/geo-tree-limitations-in-specify-7-and-locality-name-problems/1008/5

realVinayak · 2023-05-10T13:21:52Z

Not yet. I need to think through an edge case that comes with merging; I haven't had time yet.

maxpatiiuk · 2023-05-10T13:22:31Z

@maxpatiiuk Can we merge this into production for our next release?

Looks good. As soon as Vinny fixes the things that Jason found, we can merge this.

realVinayak · 2023-05-24T05:34:35Z

@grantfitzsimmons good for testing now.

realVinayak · 2023-05-24T05:53:35Z

Important note: Since there is no index on node numbers, the database will have to look at every row to find matching nodes. We should probably add an index on that column. Even so, this is faster than the previous code, because scanning all rows is cheaper than updating all rows.
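
The effect of the missing index shows up in a query plan. This sketch uses Python's built-in `sqlite3` as a stand-in for MySQL, whose `EXPLAIN` output looks different. The table and index names are hypothetical. Without the index, the range predicate forces a full scan. With the index, the planner should be able to search the index instead.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the 'detail' text of SQLite's EXPLAIN QUERY PLAN rows."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE taxon (id INTEGER PRIMARY KEY, name TEXT, "
    "nodenumber INTEGER, highestchildnodenumber INTEGER)"
)
conn.executemany(
    "INSERT INTO taxon (name, nodenumber, highestchildnodenumber) VALUES (?, ?, ?)",
    [(f"node{i}", i, i) for i in range(1, 1001)],
)

subtree_query = "SELECT id, name FROM taxon WHERE nodenumber BETWEEN ? AND ?"

# Without an index, every row has to be examined.
plan_without_index = query_plan(conn, subtree_query, (10, 20))

# With an index, the range predicate can be answered by an index search.
conn.execute("CREATE INDEX taxon_nodenumber_idx ON taxon (nodenumber)")
plan_with_index = query_plan(conn, subtree_query, (10, 20))

print(plan_without_index)
print(plan_with_index)
```

The first plan reports a scan of `taxon`. The second names `taxon_nodenumber_idx`. The exact wording depends on the SQLite version.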

There are other places on the backend where tree nodes are found by node number; those would also get faster with an index.
@maxpatiiuk going back to our discussion about indexing common relationships, can we also discuss adding an index here? Another thing to consider: updates might become slower, because indexes make updates more expensive.

maxpatiiuk · 2023-05-24T12:31:32Z

If we index node numbers, the node numbers update will become slower, correct?
But taxon updates that don't change node numbers won't be affected?
And workbench taxon uploads will be affected because they will cause new indexes and node renumber?
Seems like a tradeoff. You could benchmark the impact on large tree move and large workbench upload VS time saved in query builder and stats

realVinayak · 2023-05-24T13:11:13Z

If we index node numbers, the node numbers update will become slower, correct? But taxon updates that don't change node numbers won't be affected?

Yes, they will become slower. Actions other than synonymize and desynonymize modify node numbers in some way. But note that in production these same actions also reset the full name for the entire table, which this branch no longer does, so the net effect depends on the total time saved.

And workbench taxon uploads will be affected because they will cause new indexes and node renumber?

Yes, at the end, renumbering will take more time. Each update statement in that function will take longer.

Seems like a tradeoff. You could benchmark the impact on large tree move and large workbench upload VS time saved in query builder and stats

Well, the stats and queries don't use node numbers right now. The only reason to index node numbers here is to speed up the "where nodenumber between number1 and number2" clause (and its variations). The backend uses that clause when opening, closing, and moving intervals, and now also when selecting the nodes whose full names need updating. Without an index on node numbers, MySQL currently locks the entire table. That's not a good thing, and it's one of the reasons we occasionally see lock timeout errors when using trees. These locks are usually brief, though, because these actions usually run quickly.
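
For readers unfamiliar with nested-set numbering, this sketch shows the kind of range-based updates involved when inserting a node. It is a generic illustration of opening a one-number interval, not a transcription of Specify's tree code. The simplified `taxon` schema is an assumption, and `sqlite3` stands in for MySQL. Each UPDATE filters on a node number range, which is the kind of predicate an index on `nodenumber` would serve.

```python
import sqlite3


def insert_last_child(conn, parent_id, new_id, name):
    """Insert a new node as the last child of parent_id, shifting the
    nested-set numbers of the rest of the tree to make room."""
    n, h = conn.execute(
        "SELECT nodenumber, highestchildnodenumber FROM taxon WHERE id = ?",
        (parent_id,),
    ).fetchone()
    # Open a one-number gap right after the parent's subtree: every node
    # numbered after h shifts right by one.
    conn.execute(
        "UPDATE taxon SET nodenumber = nodenumber + 1, "
        "highestchildnodenumber = highestchildnodenumber + 1 "
        "WHERE nodenumber > ?",
        (h,),
    )
    # The parent and its ancestors (the intervals enclosing the parent) grow by one.
    conn.execute(
        "UPDATE taxon SET highestchildnodenumber = highestchildnodenumber + 1 "
        "WHERE nodenumber <= ? AND highestchildnodenumber >= ?",
        (n, h),
    )
    # The new leaf takes the freed number.
    conn.execute(
        "INSERT INTO taxon (id, parent_id, name, nodenumber, highestchildnodenumber) "
        "VALUES (?, ?, ?, ?, ?)",
        (new_id, parent_id, name, h + 1, h + 1),
    )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE taxon (id INTEGER PRIMARY KEY, parent_id INTEGER, name TEXT, "
    "nodenumber INTEGER, highestchildnodenumber INTEGER)"
)
conn.executemany(
    "INSERT INTO taxon VALUES (?, ?, ?, ?, ?)",
    [
        (1, None, "Life", 1, 5),
        (2, 1, "Canis", 2, 3),
        (3, 2, "lupus", 3, 3),
        (4, 1, "Vulpes", 4, 5),
        (5, 4, "vulpes", 5, 5),
    ],
)
insert_last_child(conn, parent_id=2, new_id=6, name="familiaris")
for row in conn.execute(
    "SELECT name, nodenumber, highestchildnodenumber FROM taxon ORDER BY nodenumber"
):
    print(row)
```

After the insert, Canis spans [2, 4], the new "familiaris" takes 4, and the Vulpes subtree shifts to [5, 6]. Closing an interval (for a delete or move) is the mirror image: subtract instead of add.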

I'll set up some benchmark tests for these to get a comprehensive view.
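
One possible starting point for such a benchmark is to time a range lookup and a renumbering update, with and without an index. This is a hedged sketch, not a finished benchmark. It uses in-memory SQLite rather than MySQL, a hypothetical single-table schema, and a single run per measurement. Its numbers would only hint at the shape of the tradeoff, not predict how a production database behaves.

```python
import sqlite3
import time


def build_tree_table(rows, indexed):
    """In-memory stand-in for a tree table with `rows` single-node intervals."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE taxon (id INTEGER PRIMARY KEY, "
        "nodenumber INTEGER, highestchildnodenumber INTEGER)"
    )
    conn.executemany(
        "INSERT INTO taxon (nodenumber, highestchildnodenumber) VALUES (?, ?)",
        [(i, i) for i in range(1, rows + 1)],
    )
    if indexed:
        conn.execute("CREATE INDEX taxon_nodenumber_idx ON taxon (nodenumber)")
    conn.commit()
    return conn


def timed(fn):
    """Wall-clock seconds taken by fn()."""
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start


def benchmark(rows=50_000, indexed=False):
    conn = build_tree_table(rows, indexed)
    mid = rows // 2
    # Range lookup, as used to select the nodes whose full names need updating.
    select_s = timed(
        lambda: conn.execute(
            "SELECT id FROM taxon WHERE nodenumber BETWEEN ? AND ?", (mid, mid + 10)
        ).fetchall()
    )
    # Renumbering, as when opening an interval. With an index, every changed
    # row also has to be updated in the index.
    update_s = timed(
        lambda: conn.execute(
            "UPDATE taxon SET nodenumber = nodenumber + 1 WHERE nodenumber > ?", (mid,)
        )
    )
    conn.close()
    return {"select": select_s, "update": update_s}


if __name__ == "__main__":
    for indexed in (False, True):
        label = "with index" if indexed else "no index"
        print(label, benchmark(indexed=indexed))
```

A meaningful comparison would repeat each measurement, use tree-shaped data, and run against a MySQL copy of a real database.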

melton-jason

Don't forget to address https://github.com/specify/specify7/pull/3175/files#r1180411187 before this is merged.

specifysoftware · 2023-09-14T23:31:17Z

This pull request has been mentioned on Specify Community Forum. There might be relevant details there:

https://discourse.specifysoftware.org/t/taxon-tree-manipulations-lock-database/1309/2

grantfitzsimmons · 2023-10-02T18:06:07Z

Conclusion: Vinny will make one change today. Will merge after.

realVinayak · 2023-10-02T19:43:22Z

@bronwyncombs @specify/ux-testing
Testing:

Try merging any two nodes together. Production should take more time.
Try moving nodes from one parent to another. Production should be slower.
Make sure full names are constructed correctly. To do this, perform the above actions on the same nodes in two different copies of the same database (an older version would work), one on production and one on this branch. Ideally, do this on species nodes. After doing the above actions, the form should show the same full name in both copies.
Do a workbench upload down to at least the species level. Production and this branch should take the same time. View the uploaded nodes in the form; the nodes should have the same full names.
Create new species in the forms. Add two nodes with the same name; production and this branch should produce the same full names.
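
The full-name comparison in the third step could also be scripted. This sketch assumes both database copies can be opened with a Python DB-API driver. It uses `sqlite3` so the example runs on its own. The `taxon(id, fullname)` columns are simplified, hypothetical names, and the real schema's column names may differ. The script reports every node whose full name differs between the two copies.

```python
import sqlite3


def load_fullnames(conn):
    """Map node id -> full name for every row in the (simplified) taxon table."""
    return dict(conn.execute("SELECT id, fullname FROM taxon"))


def diff_fullnames(conn_a, conn_b):
    """Return {id: (fullname_a, fullname_b)} for nodes whose full names differ,
    including nodes that exist in only one copy (reported with None)."""
    a, b = load_fullnames(conn_a), load_fullnames(conn_b)
    return {
        node_id: (a.get(node_id), b.get(node_id))
        for node_id in sorted(a.keys() | b.keys())
        if a.get(node_id) != b.get(node_id)
    }


def make_copy(rows):
    """Build a throwaway in-memory 'database copy' for the example."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE taxon (id INTEGER PRIMARY KEY, fullname TEXT)")
    conn.executemany("INSERT INTO taxon VALUES (?, ?)", rows)
    return conn


production = make_copy([(1, "Canis"), (2, "Canis lupus"), (3, "Vulpes vulpes")])
branch = make_copy([(1, "Canis"), (2, "Canis lupus"), (3, "Vulpes zerda")])
print(diff_fullnames(production, branch))
```

An empty result means both copies agree on every full name.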

realVinayak added 3 commits March 14, 2023 16:07

Restrict fullname update to moved nodes

6e1f2fd

Set fullname for merge after completion

c565e30

Reflect full name set changes on workbench

64063dd

realVinayak requested review from acwhite211, melton-jason, a team and maxpatiiuk March 14, 2023 21:36

maxpatiiuk approved these changes Mar 17, 2023

Prevent skipping full name reset

5948c73

realVinayak force-pushed the issue_388 branch from b6ebb2b to 5948c73 March 17, 2023 18:19

maxpatiiuk changed the title from "Issue 388" to "Make full name updates on tree changes more efficient" Mar 18, 2023

This was linked to issues Mar 18, 2023

Disable node-numbering for institutions who only use Specify 7 #2393

Open

Better tree handling #2206

Open

This was referenced Mar 18, 2023

Disable node-numbering for institutions who only use Specify 7 #2393

Open

Better tree handling #2206

Open

Repair Tree function doesn't work in 7.8+ #2776

Closed

grantfitzsimmons added 2 commits April 27, 2023 10:00

Merge branch 'production' into issue_388

ea0ab95

Lint code with ESLint and Prettier

f5025f4

melton-jason requested changes Apr 28, 2023

specifyweb/specify/tree_extras.py (outdated, resolved)

specifyweb/specify/tree_extras.py (outdated, resolved)

remove redunant full name set

0aae94f

grantfitzsimmons added this to the 7.9.1 milestone Aug 28, 2023

realVinayak requested a review from melton-jason August 28, 2023 17:37

CarolineDenis requested a review from a team August 28, 2023 18:32

melton-jason approved these changes Aug 29, 2023

melton-jason requested a review from a team August 29, 2023 20:11

realVinayak added 2 commits October 2, 2023 14:36

Merge remote-tracking branch 'origin/production' into issue_388

cdb96d2

Revert to using debug instead of warning

c711487

realVinayak merged commit b63632a into production Oct 3, 2023
9 checks passed

realVinayak deleted the issue_388 branch October 3, 2023 16:29
