Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix total count #96

Merged
merged 14 commits into from Jan 11, 2017
Merged

Fix total count #96

merged 14 commits into from Jan 11, 2017

Conversation

JasonHoffmann
Copy link
Contributor

This mostly addresses #92. There are basically two issues.

The first is that if data is changed by the batch process (like deletion or data mutation) then the total_num_results number will constantly shift. To address that the initial total number is passed back and forth between the client and the server so there is always a single source of truth on that. In each step, that number is compared against the query number, and the higher of the two is set as the total. That difference is also recorded in results_changed and then used to calculate a proper offest in the calculate_offset function.

The second is if data is added while the process is running (like a new post getting created right in the middle). I added a final check in the run function to definitely make sure that all results have been processed. If there has been anything added, it will deincrement the step and run again, over and over until everything has been processed. That bit of logic is on line 317 of abstract-batch.php.

Tested this in a number of ways and it works out. But wouldn't mind some double-checking on the logic to make sure it's the most effective way to go about everything.

Added a variable to the POST request to send total number of results from the client to the server on each request. This is the single source of truth for the original total, though if more objects are added to the query, the number of results is increased in a the new function set_total_num_results
Copy link
Contributor

@zachwills zachwills left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work on this! Most review comments are relating to code clarity.

I saw that there were some changes to the tests but I couldn't tell if it was testing the new functionality. If not, can we get an acceptance test that exercises a destructive action and ensures everything reacts appropriately?

public function set_total_num_results( $total_from_query ) {
// If this is past step 1, the client is passing back the total number of results.
// This accounts for deletion / descructive actions to the data.
$total_from_get = isset( $_POST['total_num_results'] ) ? absint( $_POST['total_num_results'] ) : 0; // Input var okay.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was confused a bit at this line because the var says total_from_get but it is grabbing from the $_POST. Maybe something like total_from_request or total_from_input?


// We need to check to see if there is any new data that has been added.
// Record the difference if one is found.
if ( $total_from_query >= $total_from_get ) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can just leave this as $total_from_query > $total_from_get since the else block will catch when it is equal.

$this->total_num_results = (int) $total_from_get;
}

if ( $total_from_query !== $total_from_get && $total_from_get > 0 ) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It took me a few seconds to figure out what this chunk was doing- I think this could be clearer if it were moved into a function like $this->recordChangeIfTotalsDiffer( $total_from_get, $total_from_query ).

*
* @var int
*/
public $results_changed = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think of $total_results_difference or something? I noticed as I was reading below the var name $results_changed seemed confusing without more context (specifically within calculate_offset).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking further down at https://github.com/reaktivstudios/locomotive/pull/96/files#diff-5eab949521aa84753b39a4ffceebf371R317 it looks like this isn't actually the difference it's just the number of changed results? If so I think the comment for this var might be misleading.

@@ -126,7 +130,7 @@ var App = React.createClass( {
// Determine if we have to run another step in the batch. Checks if there are more steps
// that need to run and makes sure the 'status' from the server is still 'running'.
if ( response.currentStep !== response.total_steps && 'running' === response.status.toLowerCase() ) {
self.runBatch( currentStep + 1 );
self.runBatch( response.current_step + 1 );
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the currentStep out of date? Should we drop it entirely and rely on response.current_step?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’m all for that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, we need that currentStep variable to be able to pass it back in the AJAX request. The issue is that the server may deincrement the step as a way of running a process again if more have been added. Need to keep both unless there’s a different solution.

Copy link
Contributor

@zachwills zachwills left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding another review to switch my status on it to 'comment' instead of 'request changes' since the latter prevents merge even when others approve.

@zachwills
Copy link
Contributor

Well the comment didn't work, so I just marked it as approved so it wouldn't lock down the merge button.

Copy link
Contributor

@jjeaton jjeaton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From your PR description:

To address that the initial total number is passed back and forth between the client and the server so there is always a single source of truth on that. In each step, that number is compared against the query number, and the higher of the two is set as the total. That difference is also recorded in results_changed and then used to calculate a proper offset in the calculate_offset function.

Is that ok if the higher total is always taken, even if we are reducing the number of items? Help me walk through that logic to understand the approach.

*
* Uses a number passed from the client to the server and compares it to the total objects
* pulled by the latest query. If the dataset is larger, we increase the total_num_results number.
* Otherwise, keep it at the original (to acount for deletion / changes).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

spelling (account)

// In the case of destructive actions (i.e. deletion) there will be a gap equal to the per_page param.
// In all other cases, the difference in totals should equal total number of results.
// If neither of these are true, we need to run the last step over again.
$difference = $this->total_num_results - $this->difference_in_result_totals;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your comment mentions "If neither of these are true, we need to run the last step over again.". How do we ensure that we aren't processing records again by rerunning the same step in this case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That difference accounts for the gap between the original total and the most recent query. In the case of a destructive action, the final step will have a number that’s exactly equal to the posts_per_page number. So if there’s 20 total in 4 steps, the last step would have 5 results left.

In all other cases, that difference would just be the same as the total_num_results so we know we are done.

Now why the greater than or equal to? Well, let’s say we have a destructive action, but we also added a post. In the example above then that would increase 20 to 21 and the final step would actually only have 1 example. But it will never have above 5.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the thorough explanation

/**
* Test that batch gets run.
*/
public function test_run_with_destructive_callback() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add additional tests to ensure that these work for other batches? Terms and users and sites?

*/
public function set_total_num_results( $total_from_query ) {
// If this is past step 1, the client is passing back the total number of results.
// This accounts for deletion / descructive actions to the data.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

spelling (destructive)

public function set_total_num_results( $total_from_query ) {
// If this is past step 1, the client is passing back the total number of results.
// This accounts for deletion / descructive actions to the data.
$total_from_request = isset( $_POST['total_num_results'] ) ? absint( $_POST['total_num_results'] ) : 0; // Input var okay.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think your tests may be able to be improved by passing this in as a parameter and moving the $_POST access up the chain.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that so we can test set_total_num_results directly?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Partially yes, this is more me identifying a potential refactoring to decouple this method from the $_POST global a bit. While reviewing the test I noticed how you have to set the $_POST value and it got me thinking of what a better way would be. This is not merge blocking.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, that makes sense. I just don't know where to move it. We could put the global in it's own function, but that will need to be called somewhere either way, then passed down through to the function. Open to suggestions, I'm pretty deep in this code so an outside eye is always welcome 😃

@JasonHoffmann
Copy link
Contributor Author

The higher number thing probably makes most sense with an example. It’s basically a way to ensure that the number of total_steps is always correct, and to calculate an offset.

Let’s say you have 20 posts, and posts_per_page is 5. The callback will delete the post. The total number of steps will be 4. But if we let everything calculate over again, on step 2 it would calculate 15 total posts and say that the total number of steps is 3, then 2, and that will break the process. Passing back the number between client and server prevents this, and will ensure that 4 steps are completed no matter what.

We go with the higher number because that will hold the total number of steps if data is deleted, and will add a step if data is added. In either case, this is the desired behavior.

Quick note too. the difference_in_result_totals variable holds the difference between the number being passed back and forth and the most recent query. This will then be subtracted from the calculated offset. In the example above the offset for step 2 would be 5 (since 5 have already been processed). But the offset actually needs to be 0 because the first 5 posts have been reset.

Hope this makes a bit more sense. Lot of back of the napkin math going on.

@jjeaton
Copy link
Contributor

jjeaton commented Jan 10, 2017

@JasonHoffmann it does make sense, thanks! Can you distill some of this explanation down and into comments in the code itself? You have some in there already, just would be useful for future devs to see this and understand as well. 👍 great job figuring this out.

Copy link
Contributor

@jjeaton jjeaton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚢

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants