New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix total count #96
Fix total count #96
Conversation
Added a variable to the POST request to send total number of results from the client to the server on each request. This is the single source of truth for the original total, though if more objects are added to the query, the number of results is increased in a the new function set_total_num_results
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work on this! Most review comments are relating to code clarity.
I saw that there were some changes to the tests but I couldn't tell if it was testing the new functionality. If not, can we get an acceptance test that exercises a destructive action and ensures everything reacts appropriately?
public function set_total_num_results( $total_from_query ) { | ||
// If this is past step 1, the client is passing back the total number of results. | ||
// This accounts for deletion / descructive actions to the data. | ||
$total_from_get = isset( $_POST['total_num_results'] ) ? absint( $_POST['total_num_results'] ) : 0; // Input var okay. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was confused a bit at this line because the var says total_from_get
but it is grabbing from the $_POST
. Maybe something like total_from_request
or total_from_input
?
|
||
// We need to check to see if there is any new data that has been added. | ||
// Record the difference if one is found. | ||
if ( $total_from_query >= $total_from_get ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can just leave this as $total_from_query > $total_from_get
since the else
block will catch when it is equal.
$this->total_num_results = (int) $total_from_get; | ||
} | ||
|
||
if ( $total_from_query !== $total_from_get && $total_from_get > 0 ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It took me a few seconds to figure out what this chunk was doing- I think this could be clearer if it were moved into a function like $this->recordChangeIfTotalsDiffer( $total_from_get, $total_from_query )
.
* | ||
* @var int | ||
*/ | ||
public $results_changed = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think of $total_results_difference
or something? I noticed as I was reading below the var name $results_changed
seemed confusing without more context (specifically within calculate_offset
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking further down at https://github.com/reaktivstudios/locomotive/pull/96/files#diff-5eab949521aa84753b39a4ffceebf371R317 it looks like this isn't actually the difference it's just the number of changed results? If so I think the comment for this var might be misleading.
@@ -126,7 +130,7 @@ var App = React.createClass( { | |||
// Determine if we have to run another step in the batch. Checks if there are more steps | |||
// that need to run and makes sure the 'status' from the server is still 'running'. | |||
if ( response.currentStep !== response.total_steps && 'running' === response.status.toLowerCase() ) { | |||
self.runBatch( currentStep + 1 ); | |||
self.runBatch( response.current_step + 1 ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the currentStep
out of date? Should we drop it entirely and rely on response.current_step
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’m all for that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, we need that currentStep variable to be able to pass it back in the AJAX request. The issue is that the server may deincrement the step as a way of running a process again if more have been added. Need to keep both unless there’s a different solution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding another review to switch my status on it to 'comment' instead of 'request changes' since the latter prevents merge even when others approve.
Well the comment didn't work, so I just marked it as approved so it wouldn't lock down the merge button. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From your PR description:
To address that the initial total number is passed back and forth between the client and the server so there is always a single source of truth on that. In each step, that number is compared against the query number, and the higher of the two is set as the total. That difference is also recorded in results_changed and then used to calculate a proper offset in the calculate_offset function.
Is that ok if the higher total is always taken, even if we are reducing the number of items? Help me walk through that logic to understand the approach.
* | ||
* Uses a number passed from the client to the server and compares it to the total objects | ||
* pulled by the latest query. If the dataset is larger, we increase the total_num_results number. | ||
* Otherwise, keep it at the original (to acount for deletion / changes). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
spelling (account)
// In the case of destructive actions (i.e. deletion) there will be a gap equal to the per_page param. | ||
// In all other cases, the difference in totals should equal total number of results. | ||
// If neither of these are true, we need to run the last step over again. | ||
$difference = $this->total_num_results - $this->difference_in_result_totals; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your comment mentions "If neither of these are true, we need to run the last step over again.". How do we ensure that we aren't processing records again by rerunning the same step in this case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That difference accounts for the gap between the original total and the most recent query. In the case of a destructive action, the final step will have a number that’s exactly equal to the posts_per_page number. So if there’s 20 total in 4 steps, the last step would have 5 results left.
In all other cases, that difference would just be the same as the total_num_results so we know we are done.
Now why the greater than or equal to? Well, let’s say we have a destructive action, but we also added a post. In the example above then that would increase 20 to 21 and the final step would actually only have 1 example. But it will never have above 5.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the thorough explanation
/** | ||
* Test that batch gets run. | ||
*/ | ||
public function test_run_with_destructive_callback() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add additional tests to ensure that these work for other batches? Terms and users and sites?
*/ | ||
public function set_total_num_results( $total_from_query ) { | ||
// If this is past step 1, the client is passing back the total number of results. | ||
// This accounts for deletion / descructive actions to the data. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
spelling (destructive)
public function set_total_num_results( $total_from_query ) { | ||
// If this is past step 1, the client is passing back the total number of results. | ||
// This accounts for deletion / descructive actions to the data. | ||
$total_from_request = isset( $_POST['total_num_results'] ) ? absint( $_POST['total_num_results'] ) : 0; // Input var okay. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think your tests may be able to be improved by passing this in as a parameter and moving the $_POST access up the chain.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is that so we can test set_total_num_results
directly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Partially yes, this is more me identifying a potential refactoring to decouple this method from the $_POST global a bit. While reviewing the test I noticed how you have to set the $_POST value and it got me thinking of what a better way would be. This is not merge blocking.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, that makes sense. I just don't know where to move it. We could put the global in it's own function, but that will need to be called somewhere either way, then passed down through to the function. Open to suggestions, I'm pretty deep in this code so an outside eye is always welcome 😃
The higher number thing probably makes most sense with an example. It’s basically a way to ensure that the number of total_steps is always correct, and to calculate an offset. Let’s say you have 20 posts, and posts_per_page is 5. The callback will delete the post. The total number of steps will be 4. But if we let everything calculate over again, on step 2 it would calculate 15 total posts and say that the total number of steps is 3, then 2, and that will break the process. Passing back the number between client and server prevents this, and will ensure that 4 steps are completed no matter what. We go with the higher number because that will hold the total number of steps if data is deleted, and will add a step if data is added. In either case, this is the desired behavior. Quick note too. the Hope this makes a bit more sense. Lot of back of the napkin math going on. |
@JasonHoffmann it does make sense, thanks! Can you distill some of this explanation down and into comments in the code itself? You have some in there already, just would be useful for future devs to see this and understand as well. 👍 great job figuring this out. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🚢
This mostly addresses #92. There are basically two issues.
The first is that if data is changed by the batch process (like deletion or data mutation) then the
total_num_results
number will constantly shift. To address that the initial total number is passed back and forth between the client and the server so there is always a single source of truth on that. In each step, that number is compared against the query number, and the higher of the two is set as the total. That difference is also recorded inresults_changed
and then used to calculate a proper offest in thecalculate_offset
function.The second is if data is added while the process is running (like a new post getting created right in the middle). I added a final check in the
run
function to definitely make sure that all results have been processed. If there has been anything added, it will deincrement the step and run again, over and over until everything has been processed. That bit of logic is on line 317 ofabstract-batch.php
.Tested this in a number of ways and it works out. But wouldn't mind some double-checking on the logic to make sure it's the most effective way to go about everything.