
Batch Edit, Workbench, Query Builder backend improvements #4929

Draft
wants to merge 77 commits into base: production

Conversation

realVinayak
Contributor

@realVinayak realVinayak commented May 18, 2024

Some backend improvements

Workbench

Query Builder

  • Generic relationship and fields in trees. At least the backend support for now.

Checklist

  • Self-review the PR after opening it to make sure the changes look good
    and self-explanatory (or properly documented)
  • Add automated tests
  • Add relevant issue to release milestone

Testing instructions

Batch-editing

Implementation and design

  1. The current implementation builds on the workbench and the query builder.
  2. Workbench and batch-edit datasets are differentiated at the user level by a new "isupdate" field in the spdataset table. DEV note: there is no difference at the code level -- everything is kept as general as possible. In fact, the isupdate field is only used in code to follow a special rollback procedure.
  3. Batch-edit datasets can be seen in the batch-edit overlay, accessible via the batch-edit menu item. Currently, the only way to create a new dataset is via the query builder interface.
  4. To make a batch-edit dataset, go to the query builder and add the relevant fields to the query. Some fields and relationships are not supported. Nested to-many relationships, for instance, are supported in the workbench but not in batch-edit. Special tree fields like nodenumber, highestchildnodenumber, and fullname are also not supported.
  5. Other than the relationships mentioned above, every field is supported. If an unsupported field like nodenumber is added, it is rendered as readonly. However, you cannot make nested to-manys visible (this is different from being able to map them): you can map nested to-manys, and even filter on them; as long as they are hidden, they won't block the dataset. You can also arbitrarily add formatted and aggregated fields (they are left unmapped and ignored).
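The readonly rule in point 5 can be sketched as follows; the function and set names here are illustrative, not the actual Specify 7 implementation:

```python
# Hypothetical sketch of the readonly rule described above; these names
# are illustrative, not the real Specify 7 backend code.
READONLY_TREE_FIELDS = {"nodenumber", "highestchildnodenumber", "fullname"}

def is_readonly(field_name: str, is_tree: bool) -> bool:
    """A mapped field renders as readonly when it is one of the special
    tree bookkeeping fields; everything else stays editable."""
    return is_tree and field_name.lower() in READONLY_TREE_FIELDS
```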

Testing Instructions

  • Make a query with columns from the base table, and select relationships to edit. In the examples below, I am using CO (CollectionObject) as the base table. There are 4 different types of relationships, in general.

    • To-one dependent (for ex. collectionobjectattribute),
    • To-one independent (for ex. Cataloger, CollectingEvent [when not embedded])
    • To-many dependent (for ex. determinations, preparations),
    • To-many independent (for ex. None for CO as the base table)
  • Fields

    Modifying any field is possible, other than nodenumber, fullname, and highestchildnodenumber. If some fields get updated, only those fields are highlighted.

  • To-one dependent (for ex. collectionobjectattribute)

    These relationships get directly updated, and are not matched. If the to-one record is not in the db, it'll be created.
    This also includes collectingevent when embedded.

    Test cases to consider:

    • When mapped, the record is directly updated.
    • When mapped, if the record is not present, it'll be created, provided non-null values are present.
    • If the record previously had values, and the values are removed (making the cells completely empty), the to-one dependent record will be deleted. Since the record may have other fields with values in the database (but not in the query), we could accidentally delete it. Example: the user selected collectionobject -> collectionObjectAttribute -> remarks, and set remarks to empty in the spreadsheet; it is possible that the integer1 field in collectionObjectAttribute still has a value. To prevent accidental deletion, by default we look at all the fields in the database for that record (other than system fields) to determine whether we can delete it. This behaviour is controlled by a remote preference (described in a later section).
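The cautious deletion check described above might look roughly like this (a sketch under assumed names; the real code differs, and the polarity of the deferForNull preference is simplified here into an explicit flag):

```python
# Illustrative sketch (not the real Specify 7 code) of the deletion rule:
# a dependent record is deleted only when it is "empty", and which fields
# count toward "empty" depends on a remote preference.
def should_delete(record: dict, visible_fields: list,
                  system_fields: set, include_db_fields: bool) -> bool:
    if include_db_fields:
        # Cautious default: consult every non-system field stored in the
        # database, not just the fields visible in the query.
        fields = [f for f in record if f not in system_fields]
    else:
        fields = list(visible_fields)
    return all(record.get(f) in (None, "") for f in fields)
```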
  • To-many dependent (for ex. determinations)

    Same as to-one dependent. These relationships get directly updated. If the corresponding record is not present, a new one gets created.

    Test cases to consider:

    • When mapped, the record is directly updated.
    • When mapped, if the record is not present, it'll be created, provided non-null values are present.
    • If the cell data is removed, and if every other field is empty in the database (can be disabled via a preference), the record will be deleted.
  • To-one independent (for ex. cataloger)

    These relationships get matched, and uploaded if no match is found. During upload, it performs a clone of the current record (cloning all non-unique fields and dependents). The clone takes into account relationships that are also mapped. That is, if an agent needs to be cloned and you have mapped agentspecialty, it'll use the mapped agentspecialty (rather than cloning the previous record's agentspecialty).

    Test cases to consider:

    • Start from a collectionobject with a cataloger, and map some fields. Change some of the values (say, lastname and firstname) to those of agents that are present in the db. Verify that the agent gets matched. Note that the match can be performed with just the visible fields, or can also include fields in the database that are not in the query. This is controlled via a preference. By default, to be cautious in matching, it uses just the fields visible in the query.
    • If it is unable to match, it'll clone the existing agent with data from the sheet. Make an agent with addresses/specialties/variants. Make sure the workbench is able to clone the agent correctly, and that if you've provided some dependents in the mapping, it uses them.
    • Similar to workbench, you could customize the match behaviour by changing the matching options (like "never ignore", "ignore when blank", and "always ignore")
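The match-then-clone flow for to-one independents could be sketched like this (illustrative names, not the actual backend code):

```python
# Hedged sketch of the match-then-upload flow for to-one independents.
# match_fields maps a field name to the value from the spreadsheet row;
# base_record is the record the row was previously linked to.
def resolve_independent(existing: list, match_fields: dict,
                        base_record: dict) -> dict:
    # 1. Try to match an existing record on the sheet's values.
    for record in existing:
        if all(record.get(f) == v for f, v in match_fields.items()):
            return record
    # 2. No match: clone the previously linked record, overriding the
    # cloned copy with the values mapped from the sheet.
    clone = dict(base_record)
    clone.update(match_fields)
    clone.pop("id", None)  # a clone is a new record
    return clone
```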
  • To-many independent

    Same as to-many dependent. The only difference is that we always perform an update (we never delete these). If a mapped record is not present, it'll create one, without any matching.

    Test cases to consider:

    • Make collection objects and assign them a collectingevent. Run a query using collectingevent as the base table, and add fields from the CO table. Verify that resetting all fields does not delete the collection object (you'll also need to disable the preference that says to look at all fields for null checks).
    • If a record is not present, it'll create one, if there is a non-empty field.
  • Trees

    There are two different routes to perform tree updates.

    • Workbench method:

      If you want to modify a specific rank, or, say, reassign the species for a determination, you'd add that specific rank to the query. In this case, it always matches and uploads (and possibly clones), so there are no updates.
      In the query builder, it'll enforce that you select a complete branch of the tree. That is, if your query contains the ranks "species" and "genus", it'll demand that you add every rank from "genus" down to "species". If used as part of a relationship, it'll demand going all the way down from "genus" to the lowest rank in the tree.

    • Update method:

      If the query contains no visible tree rank field, direct modifications (and thus updates) to the tree table are allowed. This is useful if you want to, say, update remarks for records that match the name "ploia".

    In both of the above methods, fullname, nodenumber, and highestchildnodenumber are completely readonly.
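The choice between the two tree routes can be summarized in a small sketch (the field-descriptor names here are hypothetical):

```python
# Illustrative sketch: any visible tree rank field forces the workbench
# match-and-upload path; otherwise direct updates to the tree table are
# allowed. Field descriptors are hypothetical dicts, not the real model.
def tree_route(query_fields: list) -> str:
    has_visible_rank = any(f.get("is_tree_rank") and not f.get("hidden")
                           for f in query_fields)
    return "match_and_upload" if has_visible_rank else "direct_update"
```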

Results

There are 4 new types of results:

  • NoChange

Reported when the record was meant to be updated, but no change occurred; that is, all the values in the db were already the same. This is not visible to the user.

  • Updated

Reported when the record's fields were changed. This does not consider relationships (they are reported with a different result).

  • Deleted

Reported when a record is deleted. Happens when a dependent's cells are all empty.

  • MatchedAndChanged

Reported when a to-one independent was matched to a record different from the current one.

  • The side panel also shows the results per table, for different categories.
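For illustration, the four result types could be modeled as a small enum (the names come from this description; the actual backend representation may differ):

```python
from enum import Enum

# Sketch of the four new result types described above; illustrative only.
class BatchEditResult(Enum):
    NO_CHANGE = "NoChange"          # values identical; hidden from the user
    UPDATED = "Updated"             # the record's own fields changed
    DELETED = "Deleted"             # dependent whose cells were all empty
    MATCHED_AND_CHANGED = "MatchedAndChanged"  # matched to a different record

def visible_to_user(result: BatchEditResult) -> bool:
    # NoChange is the only result not surfaced in the UI.
    return result is not BatchEditResult.NO_CHANGE
```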

Preferences

There are three different preference options.

  • Remote Preferences (2)

  • Defer For Match
    Set by sp7.batchEdit.deferForMatch. This preference controls whether database fields are included for matching. Defaults to false.

  • Defer For Null
    Set by sp7.batchEdit.deferForNull. This preference controls whether database fields are included when determining if a record is null. For dependents, null records are deleted, so this preference controls how cautious batch-edit is.
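Assuming the standard key=value syntax of Specify remote preferences, setting both options would look like:

```properties
sp7.batchEdit.deferForMatch=true
sp7.batchEdit.deferForNull=true
```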

  • User Preferences (1)

  • Number of query rows

Determines how many query results are used for batch-edit. Defaults to 5000.

Rollbacks

Rollbacks are complicated to perform. In the current design, whenever a user creates a batch-edit dataset via the query builder, two datasets are created. The user can only see one of them. The second is a "backer" of the first, and contains a FK to the first (so we can find the backer of a dataset later). When a rollback is requested, for every row in the main dataset we find the original row in the backer and perform the regular batch-edit update on it. Essentially, it re-applies the original snapshot.
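The rollback idea reduces to re-applying the backer's snapshot row by row; a minimal sketch with illustrative names (the real implementation works on dataset tables, not dicts):

```python
# Sketch of the rollback described above: each visible dataset has a
# hidden "backer" dataset holding a snapshot of the original rows.
# Rolling back re-applies that snapshot through the normal batch-edit
# update path (apply_update stands in for that machinery).
def rollback(main_rows: dict, backer_rows: dict, apply_update) -> None:
    for row_id in main_rows:
        original = backer_rows.get(row_id)
        if original is not None:
            apply_update(row_id, original)  # re-apply the original snapshot
```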

This is highly experimental, so it is recommended to always take a backup of the db; that said, it should work in a good number of cases.

Misc

  • Queries from record sets are supported.

@realVinayak realVinayak changed the title Wb improvements Workbench, Query Builder backend improvements May 21, 2024
@realVinayak realVinayak linked an issue May 21, 2024 that may be closed by this pull request
@realVinayak realVinayak linked an issue Jun 19, 2024 that may be closed by this pull request
@realVinayak
Contributor Author

@emenslin @specify/ux-testing

Implemented the UI for batch-edit prefs, stored as part of the upload plan. Adds a default one. The previous upload plans are still valid (backwards compatible). Thoughts?
[image attachment]

Triggered by 5efecac on branch refs/heads/wb_improvements
@emenslin
Collaborator

@emenslin @specify/ux-testing

Implemented the UI for batch-edit prefs, stored as part of the upload plan. Adds a default one. The previous upload plans are still valid (backwards compatible). Thoughts? [image attachment]

I think that the wording can probably be improved but I like this implementation a lot more than having to edit remote prefs. I think the first check is fairly clear but the second one is confusing and I don't know if I would've understood what it meant if you hadn't explained it to me before.

@grantfitzsimmons (Member) left a comment

From @acbentley:

Have been playing around some with batch editing and it looks and works great. However one thing that bothers me is that after you run a query and click batch edit, it redirects you to another page and the original query disappears. So, if you have an issue or want to roll back your changes, the only way to view the results of the query is to reconstruct the query again which seems like a time waste. Is there any way that the original query tab could remain open so that you could go back to it if needed? Maybe there is some reason why this is not possible or inadvisable that I am not thinking of?

@emenslin (Collaborator) left a comment

I didn't get a chance to fully test it, but here are some things I noticed:

  • Some localities show up as 'New Cells' even when they have not been edited
    [video attachment: chrome_o0avzM4QEw.mp4]
    [screenshot attachment: Screenshot 2024-10-10 115421]
  • The upload plan does not need to be visible, as there is no use for it in batch edit; however, it might not be worth removing.
  • In the data mapper, the fields should look like they're read only. In batch edit the data mapper is only really used for matching purposes, so I think everything else should appear read only since you can't do anything anyway.

@pashiav (Contributor) left a comment

Other than that, I found the same issues mentioned by previous reviews.

From #4929 (comment):

I think that the wording can probably be improved but I like this implementation a lot more than having to edit remote prefs.

I agree, it was confusing when I first saw it. @specify/ux-testing we should discuss the best wording options to explain it clearly and intuitively for users.

@combs-a (Collaborator) left a comment

Still going through this, but wanted to leave a comment: if there's a true/false field like Current in Determinations set, you pretty much need the "ignore invisible fields for determining whether a record is empty" preference on to delete the record. If you add Current and try to delete the record by emptying the field, it's flagged as an error since the Schema Config requires it. Not a big thing, just wondering whether it's ideal to need that specific preference enabled for that function?

I'm testing in sdnhm_herps so if this isn't a thing for other DBs let me know.

Also for the wording, I'd probably do something like this on the second one:

  • Use only visible fields for determining if a record is 'empty'

```
@@ -177,33 +181,33 @@ def from_stringid(cls, stringid, is_relation):
        extracted_fieldname, date_part = extract_date_part(field_name)
        field = node.get_field(extracted_fieldname, strict=False)
```

Contributor (PR author) left a comment

Yikes, you're quite brave for merging production into this branch. Most of the code looks not too bad, except this one, where I thought some insider info would be necessary.

I know getting rid of TreeRankQuery seems like an easy way out, but trust me, TreeRankQuery makes batch-edit very simple. Don't take my word for it: go to the sibling file batch_edit.py, and you'll see just how minimal the changes needed to support dataset construction are when trees are selected. (IIRC, there are just 3 places, quite isolated from an abstraction perspective.)

Here's the idea behind TreeRankQuery: each rank is considered a relationship from the tree to itself. So, Kingdom is a to-one relationship from Taxon to Taxon. When the user selects Collectionobject -> Determination -> Taxon (species, fullname), the join path becomes

determination, taxon, species, fullname

Pros

  1. No need for tree_rank and tree_field fields.
  2. It's quite easy to test whether a join path ends with a relationship (just check whether the last field is a relationship). Previously you'd also have to think about tree_rank and tree_field.
  3. The code, before this merge, already constructs correct queries for something like determination, taxon, species, createdby, firstname, effectively allowing relationships from tree ranks. Try doing that with tree_rank and tree_field!
  4. Most importantly (/s), it makes batch-edit quite simple. For every row, batch-edit dataset construction needs to look at what the IDs are. When you have TreeRankQuery, tree ranks are effectively just like any other relationship.

Cons:

  1. Hard to merge from production, which is a valid reason
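The join-path idea above can be made concrete with a tiny sketch (names illustrative): treating each rank as a relationship means the path needs no tree-specific handling.

```python
# Sketch of the TreeRankQuery idea: each rank behaves like a to-one
# relationship from the tree table to itself, so
# CollectionObject -> Determination -> Taxon (species, fullname)
# flattens to an ordinary join path.
join_path = ["determination", "taxon", "species", "fullname"]

def ends_with_relationship(path: list, relationships: set) -> bool:
    # Pro #2 above: "does the path end in a relationship?" becomes a
    # plain membership check, with no tree_rank/tree_field special case.
    return path[-1] in relationships
```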

Contributor left a comment

Thanks! I was reluctant to merge prod here initially but the alternative seemed to be a bigger headache. I'll look into incorporating TreeRankQuery in the PR that follows #5417
