-
Notifications
You must be signed in to change notification settings - Fork 36
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Batch Edit, Workbench, Query Builder backend improvements #4929
base: production
Are you sure you want to change the base?
Conversation
Bumps [sqlalchemy](https://github.com/sqlalchemy/sqlalchemy) from 1.2.11 to 1.3.0. - [Release notes](https://github.com/sqlalchemy/sqlalchemy/releases) - [Changelog](https://github.com/sqlalchemy/sqlalchemy/blob/master/CHANGES) - [Commits](https://github.com/sqlalchemy/sqlalchemy/commits) Signed-off-by: dependabot[bot] <support@github.com>
…nship in tree ranks
-- will continue adding
It was a bad idea originally, the first time.
@emenslin @specify/ux-testing Implemented the UI for batch-edit prefs, stored as part of the upload plan. Adds a default one. The previous upload plans are still valid (backwards compatible). Thoughts? |
Triggered by 5efecac on branch refs/heads/wb_improvements
I think that the wording can probably be improved but I like this implementation a lot more than having to edit remote prefs. I think the first check is fairly clear but the second one is confusing and I don't know if I would've understood what it meant if you hadn't explained it to me before. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From @acbentley:
Have been playing around some with batch editing and it looks and works great. However one thing that bothers me is that after you run a query and click batch edit, it redirects you to another page and the original query disappears. So, if you have an issue or want to roll back your changes, the only way to view the results of the query is to reconstruct the query again which seems like a time waste. Is there any way that the original query tab could remain open so that you could go back to it if needed? Maybe there is some reason why this is not possible or inadvisable that I am not thinking of?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't get a chance to fully test it but here's some things I noticed
- Some localities show up as 'New Cells' even when they have not been edited
chrome_o0avzM4QEw.mp4
- Also I'm not 100% sure what happened here as I forgot to save the query but this text doesn't really make sense, if there's a good reason for it I'm not opposed to it being there, I'm just confused. Here's a link to the data set if it's any help https://kufish212-wbimprovements.test.specifysystems.org/specify/workbench/51/
- Upload plan does not need be visible as there is no use for it in batch edit, however, it might not be worth removing
- In the data mapper the fields should look like they're read only. In batch edit the data mapper is only really used for matching purposes so I think everything else should appear like it is read only since you can't do anything anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- @emenslin: DB
cuic22024
has an error when clicking on Queries because collectionobjectgrouptype is table 1018 (db tested on COGs Update Schema Config to include new geo fields and COG picklists #5257 ):
Other than that, I found the same issues mentioned by previous reviews.
From #4929 (comment):
I think that the wording can probably be improved but I like this implementation a lot more than having to edit remote prefs.
I agree, it was confusing when I first saw it. @specify/ux-testing we should discuss the best wording options to explain it clearly and intuitively for users.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still going through this but wanted to leave a comment; if there's a true/false field like Current in Determinations set, it's a bit required to have the "ignore invisible fields for determining whether a record is empty" on to delete the record. If you add in Current to try and delete it by emptying the field it's flagged as an error since the Schema Config requires it. Not a big thing, just wondering if that's ideal to need that specific preference enabled for that function?
I'm testing in sdnhm_herps
so if this isn't a thing for other DBs let me know.
Also for the wording, I'd probably do something like this on the second one:
- Use only visible fields for determining if a record is 'empty'
@@ -177,33 +181,33 @@ def from_stringid(cls, stringid, is_relation): | |||
extracted_fieldname, date_part = extract_date_part(field_name) | |||
field = node.get_field(extracted_fieldname, strict=False) | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yikes, you're quite brave for merging production in this branch. Most of the code looks not too bad. Except this one, where I thought some insider info will be necessary.
I know getting rid of TreeRankQuery seems like an easy way, but trust me, TreeRankQuery makes batch-edit very simple. Don't take my word for it. Go to the sibling file batch_edit.py, and you'll see just how minimal changes were necessary to support dataset construction when trees are selected. (IIRC, there are like just 3 places, quite isolated from an abstraction perspective).
Here's the idea behind TreeRankQuery. Basically, each rank is considered as a relationship from tree to itself. So, Kingdom is a to-one relationship from Taxon to Taxon. So, when user selects Collectionobject -> Determination -> Taxon (species, fullname), join path becomes
determination, taxon, species, fullname
Pros
- No need for tree_rank and tree_field fields.
- It's quite easy to test if join path ends with relationship (just check the last field is relationship or not). You'd previously also have to think about tree_rank and tree_field.
- The code, before this merge, already constructs correct queries for something like
determination, taxon, species, createdby, firstname
, effectively allowing relationships from tree ranks. Try doing that with tree_rank and tree_field! - Most importantly (/s), it makes batch-edit quite simple. For every row, batch-edit dataset construction needs to look at what the IDs are. When you have TreeRankQuery, tree ranks are effectively just like any other relationship.
Cons:
- Hard to merge from production, which is a valid reason
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! I was reluctant to merge prod here initially but the alternative seemed to be a bigger headache. I'll look into incorporating TreeRankQuery in the PR that follows #5417
Some backend improvements
Workbench
Query Builder
Checklist
and self-explanatory (or properly documented)
Testing instructions
Batch-editing
Implementation and design
Testing Instructions
Make a query with columns in the base table, and select relationships to edit. In the current, I am using CO with the base table. There are 4 different types of relationships, in general.
Fields
Modifying any field is possible, other than nodenumber, fullname, and highestchildnodenumber. If some fields get updated, only those fields are highlighted
To-one dependent (for ex. collectionobjectattribute)
These relationships get directly updated, and are not matched. If the to-one is not in the db, it'll create one.
This also includes collectingevent when embedded.
Test cases to consider:
To-many dependent (for ex. determinations)
Same as to-one dependent. These relationships get directly updated. If the corresponding record is not present, a new one gets created.
Test cases to consider:
To-one independent (for ex. cataloger)
These relationships get matched, and uploaded (if match is not found). During upload, it performs a clone of the record (cloning all the non-unique fields, and dependents). The clone takes into account relationships also mapped. That is, if agent needs to be cloned, and you have mapped agentspecialty, it'll take the agentspecialty mapped (rather than cloning previous's agentspecialty).
Test cases to consider:
To-many independent
Same as to-many dependent. The only difference is that we always perform an update (we don't delete these). If a mapped record is not present, it'll create one, without any matching.
Test cases to consider:
Trees
There are two different routes to perform tree updates.
Workbench method:
If you want to modify a specific rank, or say reassign species for determination, you'd want to add a specific rank in the query. In this case, it always matches and uploads (and possibly clone), so we don't have updates.
In the query builder, it'll enforce that you select complete branch of the tree. That is, if your query contains rank "species", and "genus", it'll demand you to add ranks all the way down from "genus" to "species". If used part of a relationship, it'll demand going the way down from "genus" to the lowest rank in the tree.
Update method:
If in the query builder, there is no visible tree rank field, it allows direct modifications (and, thus, updates) to the tree table. This will be useful if you want to, say, update remarks for ones that match name "ploia"
In both of the above methods, fullname, nodenumber, highestchildnodenumber is completely readonly.
Results
There are 4 new different type of results;
NoChange
Reported when the record was meant to updated, but no change occurred. That is, all the values from the db were the same. This is not visible to the user.
Updated
Reported when the record's fields were changed. This does not consider relationships (they are reported with different result)
Deleted
Reported when a record is deleted. Happens when a dependent's cells are all empty.
MatchedAndChanged
Reported when a to-one independent was matched to another record, different than the current one.
Preferences
There are three different preference options.
Remote Preferenences (2)
Defer For Match
Set by
sp7.batchEdit.deferForMatch
.This preference controls whether database fields are included for matching or not. Defaults to false.Defer For Null
Set by
sp7.batchEdit.deferForNull
.This preference controls whether database fields are included for determining if the record is null or not. For dependents, null records are deleted, so this preference is used to control the caution batch-edit followsUser Preferenences (1)
Number of query rows
Determines how many number of query results are used for batch-edit. Defaults to 5000.
Rollbacks
Rollbacks are complicated to perform. In the current design, whenever user creates a batch-edit dataset, via the query builder, it makes two datasets. User can only see one of them. The second is a "backer" of the first, and contains a FK to the first (so we can find backer of a dataset later). When rollback is requested, for every row in the main one, we find the original row in the backer, and perform the regular batch-edit update on it. Essentially, it applies original snapshot.
This is highly experimental, so it is recommended to always take a backup of the db, but this should work in a good amount of cases.
Misc