
Batch Edit, Workbench, Query Builder backend improvements #4929

Draft
wants to merge 77 commits into base: production

Conversation

realVinayak
Contributor

@realVinayak realVinayak commented May 18, 2024

Some backend improvements

Workbench

Query Builder

  • Generic relationship and fields in trees. At least the backend support for now.

Checklist

  • Self-review the PR after opening it to make sure the changes look good
    and self-explanatory (or properly documented)
  • Add automated tests
  • Add relevant issue to release milestone

Testing instructions

Batch-editing

Implementation and design

  1. The current implementation builds on the workbench and the query builder.
  2. Workbench and batch-edit datasets are differentiated at the user level by a new "isupdate" field in the spdataset table. DEV note: there is no difference at the code level -- everything is kept as general as possible. In fact, the isupdate field is only used in code to follow a special rollback procedure.
  3. Batch-edit datasets can be seen in the batch-edit overlay, accessible via the batch-edit menu item. Currently, the only way to create a new dataset is via the query builder interface.
  4. To make a batch-edit dataset, go to the query builder and add the relevant fields to the query. Some fields and relationships are not supported. Nested to-many relationships, for instance, are supported in the workbench but not in batch-edit. Special tree fields like nodenumber, highestchildnodenumber, and fullname are also not supported.
  5. Other than the relationships mentioned above, every field is supported. If an unsupported field like nodenumber is added, it is rendered as readonly. However, you cannot make nested to-manys visible (this is different from being able to map them): you can map nested to-manys, and even filter on them; as long as they are hidden, they won't block the dataset. You can also arbitrarily add formatted and aggregated fields (they are left unmapped and ignored).
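The readonly rule in point 5 can be sketched as follows; the function and set names here are illustrative, not the actual Specify 7 implementation:

```python
# Hypothetical sketch of the readonly rule described above; these names
# are illustrative, not the real Specify 7 backend code.
READONLY_TREE_FIELDS = {"nodenumber", "highestchildnodenumber", "fullname"}

def is_readonly(field_name: str, is_tree: bool) -> bool:
    """A mapped field renders as readonly when it is one of the special
    tree bookkeeping fields; everything else stays editable."""
    return is_tree and field_name.lower() in READONLY_TREE_FIELDS
```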

Testing Instructions

  • Make a query with columns from the base table, and select relationships to edit. In the examples below, I am using CO (CollectionObject) as the base table. There are 4 different types of relationships, in general.

    • To-one dependent (for ex. collectionobjectattribute),
    • To-one independent (for ex. Cataloger, CollectingEvent [when not embedded])
    • To-many dependent (for ex. determinations, preparations),
    • To-many independent (for ex. None for CO as the base table)
  • Fields

    Modifying any field is possible, other than nodenumber, fullname, and highestchildnodenumber. If some fields get updated, only those fields are highlighted.

  • To-one dependent (for ex. collectionobjectattribute)

    These relationships get directly updated, and are not matched. If the to-one record is not in the db, it'll be created.
    This also includes collectingevent when embedded.

    Test cases to consider:

    • When mapped, the record is directly updated.
    • When mapped, if the record is not present, it'll be created, provided non-null values are present.
    • If the record previously had values, and the values are removed (making the cells completely empty), the to-one dependent record will be deleted. Since the record may have other fields with values in the database (but not in the query), we could accidentally delete it. Example: the user selected collectionobject -> collectionObjectAttribute -> remarks, and set remarks to empty in the spreadsheet; it is possible that the integer1 field in collectionObjectAttribute still has a value. To prevent accidental deletion, by default we look at all the fields in the database for that record (other than system fields) to determine whether we can delete it. This behaviour is controlled by a remote preference (described in a later section).
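The cautious deletion check described above might look roughly like this (a sketch under assumed names; the real code differs, and the polarity of the deferForNull preference is simplified here into an explicit flag):

```python
# Illustrative sketch (not the real Specify 7 code) of the deletion rule:
# a dependent record is deleted only when it is "empty", and which fields
# count toward "empty" depends on a remote preference.
def should_delete(record: dict, visible_fields: list,
                  system_fields: set, include_db_fields: bool) -> bool:
    if include_db_fields:
        # Cautious default: consult every non-system field stored in the
        # database, not just the fields visible in the query.
        fields = [f for f in record if f not in system_fields]
    else:
        fields = list(visible_fields)
    return all(record.get(f) in (None, "") for f in fields)
```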
  • To-many dependent (for ex. determinations)

    Same as to-one dependent. These relationships get directly updated. If the corresponding record is not present, a new one gets created.

    Test cases to consider:

    • When mapped, the record is directly updated.
    • When mapped, if the record is not present, it'll be created, provided non-null values are present.
    • If the cell data is removed, and if every other field is empty in the database (can be disabled via a preference), the record will be deleted.
  • To-one independent (for ex. cataloger)

    These relationships get matched, and uploaded if no match is found. During upload, it performs a clone of the current record (cloning all non-unique fields and dependents). The clone takes into account relationships that are also mapped. That is, if an agent needs to be cloned and you have mapped agentspecialty, it'll use the mapped agentspecialty (rather than cloning the previous record's agentspecialty).

    Test cases to consider:

    • Start from a collectionobject with a cataloger, and map some fields. Change some of the values (say, lastname and firstname) to those of agents that are present in the db. Verify that the agent gets matched. Note that the match can be performed with just the visible fields, or can also include fields in the database that are not in the query. This is controlled via a preference. By default, to be cautious in matching, it uses just the fields visible in the query.
    • If it is unable to match, it'll clone the existing agent with data from the sheet. Make an agent with addresses/specialties/variants. Make sure the workbench is able to clone the agent correctly, and that if you've provided some dependents in the mapping, it uses them.
    • Similar to workbench, you could customize the match behaviour by changing the matching options (like "never ignore", "ignore when blank", and "always ignore")
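The match-then-clone flow for to-one independents could be sketched like this (illustrative names, not the actual backend code):

```python
# Hedged sketch of the match-then-upload flow for to-one independents.
# match_fields maps a field name to the value from the spreadsheet row;
# base_record is the record the row was previously linked to.
def resolve_independent(existing: list, match_fields: dict,
                        base_record: dict) -> dict:
    # 1. Try to match an existing record on the sheet's values.
    for record in existing:
        if all(record.get(f) == v for f, v in match_fields.items()):
            return record
    # 2. No match: clone the previously linked record, overriding the
    # cloned copy with the values mapped from the sheet.
    clone = dict(base_record)
    clone.update(match_fields)
    clone.pop("id", None)  # a clone is a new record
    return clone
```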
  • To-many independent

    Same as to-many dependent. The only difference is that we always perform an update (we never delete these). If a mapped record is not present, it'll create one, without any matching.

    Test cases to consider:

    • Make collection objects and assign them a collectingevent. Run a query using collectingevent as the base table, and add fields from the CO table. Verify that resetting all fields does not delete the collection object (you'll also need to disable the preference that says to look at all fields for null checks).
    • If a record is not present, it'll create one, if there is a non-empty field.
  • Trees

    There are two different routes to perform tree updates.

    • Workbench method:

      If you want to modify a specific rank, or, say, reassign the species for a determination, you'd add that specific rank to the query. In this case, it always matches and uploads (and possibly clones), so there are no updates.
      In the query builder, it'll enforce that you select a complete branch of the tree. That is, if your query contains the ranks "species" and "genus", it'll demand that you add every rank from "genus" down to "species". If used as part of a relationship, it'll demand going all the way down from "genus" to the lowest rank in the tree.

    • Update method:

      If the query contains no visible tree rank field, direct modifications (and thus updates) to the tree table are allowed. This is useful if you want to, say, update remarks for records that match the name "ploia".

    In both of the above methods, fullname, nodenumber, and highestchildnodenumber are completely readonly.
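The choice between the two tree routes can be summarized in a small sketch (the field-descriptor names here are hypothetical):

```python
# Illustrative sketch: any visible tree rank field forces the workbench
# match-and-upload path; otherwise direct updates to the tree table are
# allowed. Field descriptors are hypothetical dicts, not the real model.
def tree_route(query_fields: list) -> str:
    has_visible_rank = any(f.get("is_tree_rank") and not f.get("hidden")
                           for f in query_fields)
    return "match_and_upload" if has_visible_rank else "direct_update"
```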

Results

There are 4 new types of results:

  • NoChange

Reported when the record was meant to be updated, but no change occurred; that is, all the values in the db were already the same. This is not visible to the user.

  • Updated

Reported when the record's fields were changed. This does not consider relationships (they are reported with a different result).

  • Deleted

Reported when a record is deleted. Happens when a dependent's cells are all empty.

  • MatchedAndChanged

Reported when a to-one independent was matched to a record different from the current one.

  • The side panel also shows the results per table, for different categories.
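For illustration, the four result types could be modeled as a small enum (the names come from this description; the actual backend representation may differ):

```python
from enum import Enum

# Sketch of the four new result types described above; illustrative only.
class BatchEditResult(Enum):
    NO_CHANGE = "NoChange"          # values identical; hidden from the user
    UPDATED = "Updated"             # the record's own fields changed
    DELETED = "Deleted"             # dependent whose cells were all empty
    MATCHED_AND_CHANGED = "MatchedAndChanged"  # matched to a different record

def visible_to_user(result: BatchEditResult) -> bool:
    # NoChange is the only result not surfaced in the UI.
    return result is not BatchEditResult.NO_CHANGE
```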

Preferences

There are three different preference options.

  • Remote Preferences (2)

  • Defer For Match
    Set by sp7.batchEdit.deferForMatch. This preference controls whether database fields are included for matching. Defaults to false.

  • Defer For Null
    Set by sp7.batchEdit.deferForNull. This preference controls whether database fields are included when determining if a record is null. For dependents, null records are deleted, so this preference controls how cautious batch-edit is.
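Assuming the standard key=value syntax of Specify remote preferences, setting both options would look like:

```properties
sp7.batchEdit.deferForMatch=true
sp7.batchEdit.deferForNull=true
```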

  • User Preferences (1)

  • Number of query rows

Determines how many query results are used for batch-edit. Defaults to 5000.

Rollbacks

Rollbacks are complicated to perform. In the current design, whenever a user creates a batch-edit dataset via the query builder, two datasets are created. The user can only see one of them. The second is a "backer" of the first, and contains a FK to the first (so we can find the backer of a dataset later). When a rollback is requested, for every row in the main dataset we find the original row in the backer and perform the regular batch-edit update on it. Essentially, it re-applies the original snapshot.
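The rollback idea reduces to re-applying the backer's snapshot row by row; a minimal sketch with illustrative names (the real implementation works on dataset tables, not dicts):

```python
# Sketch of the rollback described above: each visible dataset has a
# hidden "backer" dataset holding a snapshot of the original rows.
# Rolling back re-applies that snapshot through the normal batch-edit
# update path (apply_update stands in for that machinery).
def rollback(main_rows: dict, backer_rows: dict, apply_update) -> None:
    for row_id in main_rows:
        original = backer_rows.get(row_id)
        if original is not None:
            apply_update(row_id, original)  # re-apply the original snapshot
```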

This is highly experimental, so it is recommended to always take a backup of the db; that said, it should work in a good number of cases.

Misc

  • Queries from record sets are supported.

@realVinayak realVinayak changed the title Wb improvements Workbench, Query Builder backend improvements May 21, 2024
@realVinayak realVinayak linked an issue May 21, 2024 that may be closed by this pull request
@realVinayak realVinayak linked an issue Jun 19, 2024 that may be closed by this pull request
@realVinayak
Contributor Author

@emenslin @specify/ux-testing

Implemented the UI for batch-edit prefs, stored as part of the upload plan. Adds a default one. The previous upload plans are still valid (backwards compatible). Thoughts?
[image attachment]

Triggered by 5efecac on branch refs/heads/wb_improvements
@emenslin
Collaborator

@emenslin @specify/ux-testing

Implemented the UI for batch-edit prefs, stored as part of the upload plan. Adds a default one. The previous upload plans are still valid (backwards compatible). Thoughts? [image attachment]

I think that the wording can probably be improved but I like this implementation a lot more than having to edit remote prefs. I think the first check is fairly clear but the second one is confusing and I don't know if I would've understood what it meant if you hadn't explained it to me before.

@grantfitzsimmons (Member) left a comment

From @acbentley:

Have been playing around some with batch editing and it looks and works great. However one thing that bothers me is that after you run a query and click batch edit, it redirects you to another page and the original query disappears. So, if you have an issue or want to roll back your changes, the only way to view the results of the query is to reconstruct the query again which seems like a time waste. Is there any way that the original query tab could remain open so that you could go back to it if needed? Maybe there is some reason why this is not possible or inadvisable that I am not thinking of?

@emenslin (Collaborator) left a comment

I didn't get a chance to fully test it, but here are some things I noticed:

  • Some localities show up as 'New Cells' even when they have not been edited
    [video attachment: chrome_o0avzM4QEw.mp4]
    [screenshot attachment: Screenshot 2024-10-10 115421]
  • The upload plan does not need to be visible, as there is no use for it in batch edit; however, it might not be worth removing.
  • In the data mapper, the fields should look like they're read only. In batch edit the data mapper is only really used for matching purposes, so I think everything else should appear read only since you can't do anything anyway.

@pashiav (Contributor) left a comment

Other than that, I found the same issues mentioned by previous reviews.

From #4929 (comment):

I think that the wording can probably be improved but I like this implementation a lot more than having to edit remote prefs.

I agree, it was confusing when I first saw it. @specify/ux-testing we should discuss the best wording options to explain it clearly and intuitively for users.

@combs-a (Collaborator) left a comment

Still going through this, but wanted to leave a comment: if there's a true/false field like Current in Determinations set, you pretty much need the "ignore invisible fields for determining whether a record is empty" preference on to delete the record. If you add Current and try to delete the record by emptying the field, it's flagged as an error since the Schema Config requires it. Not a big thing, just wondering whether it's ideal to need that specific preference enabled for that function?

I'm testing in sdnhm_herps so if this isn't a thing for other DBs let me know.

Also for the wording, I'd probably do something like this on the second one:

  • Use only visible fields for determining if a record is 'empty'

```
@@ -177,33 +181,33 @@ def from_stringid(cls, stringid, is_relation):
        extracted_fieldname, date_part = extract_date_part(field_name)
        field = node.get_field(extracted_fieldname, strict=False)
```

Contributor (PR author) left a comment

Yikes, you're quite brave for merging production into this branch. Most of the code looks not too bad, except this one, where I thought some insider info would be necessary.

I know getting rid of TreeRankQuery seems like an easy way out, but trust me, TreeRankQuery makes batch-edit very simple. Don't take my word for it: go to the sibling file batch_edit.py, and you'll see just how minimal the changes needed to support dataset construction are when trees are selected. (IIRC, there are just 3 places, quite isolated from an abstraction perspective.)

Here's the idea behind TreeRankQuery: each rank is considered a relationship from the tree to itself. So, Kingdom is a to-one relationship from Taxon to Taxon. When the user selects Collectionobject -> Determination -> Taxon (species, fullname), the join path becomes

determination, taxon, species, fullname

Pros

  1. No need for tree_rank and tree_field fields.
  2. It's quite easy to test whether a join path ends with a relationship (just check whether the last field is a relationship). Previously you'd also have to think about tree_rank and tree_field.
  3. The code, before this merge, already constructs correct queries for something like determination, taxon, species, createdby, firstname, effectively allowing relationships from tree ranks. Try doing that with tree_rank and tree_field!
  4. Most importantly (/s), it makes batch-edit quite simple. For every row, batch-edit dataset construction needs to look at what the IDs are. When you have TreeRankQuery, tree ranks are effectively just like any other relationship.

Cons:

  1. Hard to merge from production, which is a valid reason
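The join-path idea above can be made concrete with a tiny sketch (names illustrative): treating each rank as a relationship means the path needs no tree-specific handling.

```python
# Sketch of the TreeRankQuery idea: each rank behaves like a to-one
# relationship from the tree table to itself, so
# CollectionObject -> Determination -> Taxon (species, fullname)
# flattens to an ordinary join path.
join_path = ["determination", "taxon", "species", "fullname"]

def ends_with_relationship(path: list, relationships: set) -> bool:
    # Pro #2 above: "does the path end in a relationship?" becomes a
    # plain membership check, with no tree_rank/tree_field special case.
    return path[-1] in relationships
```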

Contributor left a comment

Thanks! I was reluctant to merge prod here initially but the alternative seemed to be a bigger headache. I'll look into incorporating TreeRankQuery in the PR that follows #5417
