Fix total count #96

Merged
merged 14 commits into from Jan 11, 2017

Projects

None yet

3 participants

@JasonHoffmann
Contributor

This mostly addresses #92. There are basically two issues.

The first is that if data is changed by the batch process (like deletion or data mutation) then the total_num_results number will constantly shift. To address that the initial total number is passed back and forth between the client and the server so there is always a single source of truth on that. In each step, that number is compared against the query number, and the higher of the two is set as the total. That difference is also recorded in results_changed and then used to calculate a proper offest in the calculate_offset function.

The second is if data is added while the process is running (like a new post getting created right in the middle). I added a final check in the run function to definitely make sure that all results have been processed. If there has been anything added, it will deincrement the step and run again, over and over until everything has been processed. That bit of logic is on line 317 of abstract-batch.php.

Tested this in a number of ways and it works out. But wouldn't mind some double-checking on the logic to make sure it's the most effective way to go about everything.

@JasonHoffmann JasonHoffmann added this to the 0.9 MVP Launch milestone Jan 3, 2017
@zachwills

Nice work on this! Most review comments are relating to code clarity.

I saw that there were some changes to the tests but I couldn't tell if it was testing the new functionality. If not, can we get an acceptance test that exercises a destructive action and ensures everything reacts appropriately?

includes/abstracts/abstract-batch.php
+ *
+ * @var int
+ */
+ public $results_changed = 0;
@zachwills
zachwills Jan 3, 2017 Collaborator

What do you think of $total_results_difference or something? I noticed as I was reading below the var name $results_changed seemed confusing without more context (specifically within calculate_offset).

@zachwills
zachwills Jan 3, 2017 Collaborator

Looking further down at https://github.com/reaktivstudios/locomotive/pull/96/files#diff-5eab949521aa84753b39a4ffceebf371R317 it looks like this isn't actually the difference it's just the number of changed results? If so I think the comment for this var might be misleading.

includes/abstracts/abstract-batch.php
+ public function set_total_num_results( $total_from_query ) {
+ // If this is past step 1, the client is passing back the total number of results.
+ // This accounts for deletion / descructive actions to the data.
+ $total_from_get = isset( $_POST['total_num_results'] ) ? absint( $_POST['total_num_results'] ) : 0; // Input var okay.
@zachwills
zachwills Jan 3, 2017 Collaborator

I was confused a bit at this line because the var says total_from_get but it is grabbing from the $_POST. Maybe something like total_from_request or total_from_input?

includes/abstracts/abstract-batch.php
+
+ // We need to check to see if there is any new data that has been added.
+ // Record the difference if one is found.
+ if ( $total_from_query >= $total_from_get ) {
@zachwills
zachwills Jan 3, 2017 Collaborator

I think we can just leave this as $total_from_query > $total_from_get since the else block will catch when it is equal.

includes/abstracts/abstract-batch.php
+ $this->total_num_results = (int) $total_from_get;
+ }
+
+ if ( $total_from_query !== $total_from_get && $total_from_get > 0 ) {
@zachwills
zachwills Jan 3, 2017 Collaborator

It took me a few seconds to figure out what this chunk was doing- I think this could be clearer if it were moved into a function like $this->recordChangeIfTotalsDiffer( $total_from_get, $total_from_query ).

@@ -126,7 +130,7 @@ var App = React.createClass( {
// Determine if we have to run another step in the batch. Checks if there are more steps
// that need to run and makes sure the 'status' from the server is still 'running'.
if ( response.currentStep !== response.total_steps && 'running' === response.status.toLowerCase() ) {
- self.runBatch( currentStep + 1 );
+ self.runBatch( response.current_step + 1 );
@zachwills
zachwills Jan 3, 2017 Collaborator

Is the currentStep out of date? Should we drop it entirely and rely on response.current_step?

@JasonHoffmann
JasonHoffmann Jan 4, 2017 Contributor

I’m all for that.

@JasonHoffmann
JasonHoffmann Jan 4, 2017 Contributor

Actually, we need that currentStep variable to be able to pass it back in the AJAX request. The issue is that the server may deincrement the step as a way of running a process again if more have been added. Need to keep both unless there’s a different solution.

@zachwills

Adding another review to switch my status on it to 'comment' instead of 'request changes' since the latter prevents merge even when others approve.

@zachwills
Collaborator

Well the comment didn't work, so I just marked it as approved so it wouldn't lock down the merge button.

@jjeaton

From your PR description:

To address that the initial total number is passed back and forth between the client and the server so there is always a single source of truth on that. In each step, that number is compared against the query number, and the higher of the two is set as the total. That difference is also recorded in results_changed and then used to calculate a proper offset in the calculate_offset function.

Is that ok if the higher total is always taken, even if we are reducing the number of items? Help me walk through that logic to understand the approach.

includes/abstracts/abstract-batch.php
+ *
+ * Uses a number passed from the client to the server and compares it to the total objects
+ * pulled by the latest query. If the dataset is larger, we increase the total_num_results number.
+ * Otherwise, keep it at the original (to acount for deletion / changes).
@jjeaton
jjeaton Jan 10, 2017 Contributor

spelling (account)

includes/abstracts/abstract-batch.php
+ */
+ public function set_total_num_results( $total_from_query ) {
+ // If this is past step 1, the client is passing back the total number of results.
+ // This accounts for deletion / descructive actions to the data.
@jjeaton
jjeaton Jan 10, 2017 Contributor

spelling (destructive)

includes/abstracts/abstract-batch.php
+ public function set_total_num_results( $total_from_query ) {
+ // If this is past step 1, the client is passing back the total number of results.
+ // This accounts for deletion / descructive actions to the data.
+ $total_from_request = isset( $_POST['total_num_results'] ) ? absint( $_POST['total_num_results'] ) : 0; // Input var okay.
@jjeaton
jjeaton Jan 10, 2017 Contributor

I think your tests may be able to be improved by passing this in as a parameter and moving the $_POST access up the chain.

@JasonHoffmann
JasonHoffmann Jan 10, 2017 Contributor

Is that so we can test set_total_num_results directly?

@jjeaton
jjeaton Jan 10, 2017 Contributor

Partially yes, this is more me identifying a potential refactoring to decouple this method from the $_POST global a bit. While reviewing the test I noticed how you have to set the $_POST value and it got me thinking of what a better way would be. This is not merge blocking.

@JasonHoffmann
JasonHoffmann Jan 10, 2017 Contributor

Yea, that makes sense. I just don't know where to move it. We could put the global in it's own function, but that will need to be called somewhere either way, then passed down through to the function. Open to suggestions, I'm pretty deep in this code so an outside eye is always welcome 😃

includes/abstracts/abstract-batch.php
+ // In the case of destructive actions (i.e. deletion) there will be a gap equal to the per_page param.
+ // In all other cases, the difference in totals should equal total number of results.
+ // If neither of these are true, we need to run the last step over again.
+ $difference = $this->total_num_results - $this->difference_in_result_totals;
@jjeaton
jjeaton Jan 10, 2017 Contributor

Your comment mentions "If neither of these are true, we need to run the last step over again.". How do we ensure that we aren't processing records again by rerunning the same step in this case?

@JasonHoffmann
JasonHoffmann Jan 10, 2017 Contributor

That difference accounts for the gap between the original total and the most recent query. In the case of a destructive action, the final step will have a number that’s exactly equal to the posts_per_page number. So if there’s 20 total in 4 steps, the last step would have 5 results left.

In all other cases, that difference would just be the same as the total_num_results so we know we are done.

Now why the greater than or equal to? Well, let’s say we have a destructive action, but we also added a post. In the example above then that would increase 20 to 21 and the final step would actually only have 1 example. But it will never have above 5.

@jjeaton
jjeaton Jan 10, 2017 Contributor

Thanks for the thorough explanation

tests/types/test-batch-posts.php
+ /**
+ * Test that batch gets run.
+ */
+ public function test_run_with_destructive_callback() {
@jjeaton
jjeaton Jan 10, 2017 Contributor

Can we add additional tests to ensure that these work for other batches? Terms and users and sites?

@JasonHoffmann
Contributor

The higher number thing probably makes most sense with an example. It’s basically a way to ensure that the number of total_steps is always correct, and to calculate an offset.

Let’s say you have 20 posts, and posts_per_page is 5. The callback will delete the post. The total number of steps will be 4. But if we let everything calculate over again, on step 2 it would calculate 15 total posts and say that the total number of steps is 3, then 2, and that will break the process. Passing back the number between client and server prevents this, and will ensure that 4 steps are completed no matter what.

We go with the higher number because that will hold the total number of steps if data is deleted, and will add a step if data is added. In either case, this is the desired behavior.

Quick note too. the difference_in_result_totals variable holds the difference between the number being passed back and forth and the most recent query. This will then be subtracted from the calculated offset. In the example above the offset for step 2 would be 5 (since 5 have already been processed). But the offset actually needs to be 0 because the first 5 posts have been reset.

Hope this makes a bit more sense. Lot of back of the napkin math going on.

@jjeaton
Contributor
jjeaton commented Jan 10, 2017

@JasonHoffmann it does make sense, thanks! Can you distill some of this explanation down and into comments in the code itself? You have some in there already, just would be useful for future devs to see this and understand as well. 👍 great job figuring this out.

@jjeaton

🚢

@JasonHoffmann JasonHoffmann merged commit d129ede into master Jan 11, 2017

1 check passed

Scrutinizer Analysis: 5 new issues, 14 updated code elements – Tests: passed
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment