New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add query update summary to yielded data from apoc.periodic.iterate() #1815
Comments
I've got something pretty close to implemented, but found out adding schema updates (indexes, contraints) might be moot as there's no easy way to do schema operations in the producer Cypher statement. Even if you use |
jexp
pushed a commit
that referenced
this issue
Mar 10, 2021
) * fix apoc.periodic.iterate not obeying concurrency The procedure uses the default pooled executor when running with parallel=true, but the pool is #cpus * 2...so the concurrency ends up being 2x the cpu cores and not what the user sets. This changes the behavior to use a set cap via an AtomicInteger and keeps the same Future creation/submission logic but adds the decrement of the counter to the future's callback. If parallel == false, the gating is shortcircuited at the expense of some small overhead for manipulating the AtomicInteger counter (for now?). * Fix #1805: prevent execution with invalid params To prevent executing with batch sizes of 0 or concurrency settings that result in no consuming threads, validate that batchSize and concurrency are set to at least 1. * Use native line separators so tests work on Win10 * Centralize multithreaded batch processing logic Both APOC Core's apoc.periodic.Periodic and Full's apoc.periodic.PeriodicExtended had implementations of logic for batch processing with thread pools. Pull it out and centralize it in PeriodicUtils for now. The method signature ends up growing quite a bit, but it's now in 1 place and apoc.periodic.rock_n_roll now had the same logic as apoc.periodic.iterate. * cleanup test class, fix error comparison for win10 * Summarize graph updates for apoc.periodic.iterate Initial fix for #1815 adding in support for providing aggregate update statistics for any changes to the database from the apoc.periodic.iterate consuming statement. For now, the following data is provided: - nodesCreated - nodesDeleted - relationshipsCreated - relationshipsDeleted - propertiesSet - labelsAdded - labelsRemoved Consequently, this also adds the capability to the deprecated apoc.periodic.rock_n_roll as it now uses shared logic. Given the challenge of doing schema operations in the consuming statement, index and constraint updates are not provided. Support might be required for using apoc.schema.assert, but it will in turn need to be updated to properly report update statistics. * update docs, adding updateStatistics yield value
conker84
pushed a commit
to conker84/neo4j-apoc-procedures
that referenced
this issue
Mar 29, 2021
…rib#1815) (neo4j-contrib#1818) * fix apoc.periodic.iterate not obeying concurrency The procedure uses the default pooled executor when running with parallel=true, but the pool is #cpus * 2...so the concurrency ends up being 2x the cpu cores and not what the user sets. This changes the behavior to use a set cap via an AtomicInteger and keeps the same Future creation/submission logic but adds the decrement of the counter to the future's callback. If parallel == false, the gating is shortcircuited at the expense of some small overhead for manipulating the AtomicInteger counter (for now?). * Fix neo4j-contrib#1805: prevent execution with invalid params To prevent executing with batch sizes of 0 or concurrency settings that result in no consuming threads, validate that batchSize and concurrency are set to at least 1. * Use native line separators so tests work on Win10 * Centralize multithreaded batch processing logic Both APOC Core's apoc.periodic.Periodic and Full's apoc.periodic.PeriodicExtended had implementations of logic for batch processing with thread pools. Pull it out and centralize it in PeriodicUtils for now. The method signature ends up growing quite a bit, but it's now in 1 place and apoc.periodic.rock_n_roll now had the same logic as apoc.periodic.iterate. * cleanup test class, fix error comparison for win10 * Summarize graph updates for apoc.periodic.iterate Initial fix for neo4j-contrib#1815 adding in support for providing aggregate update statistics for any changes to the database from the apoc.periodic.iterate consuming statement. For now, the following data is provided: - nodesCreated - nodesDeleted - relationshipsCreated - relationshipsDeleted - propertiesSet - labelsAdded - labelsRemoved Consequently, this also adds the capability to the deprecated apoc.periodic.rock_n_roll as it now uses shared logic. Given the challenge of doing schema operations in the consuming statement, index and constraint updates are not provided. Support might be required for using apoc.schema.assert, but it will in turn need to be updated to properly report update statistics. * update docs, adding updateStatistics yield value
conker84
pushed a commit
to conker84/neo4j-apoc-procedures
that referenced
this issue
Mar 29, 2021
…rib#1815) (neo4j-contrib#1818) * fix apoc.periodic.iterate not obeying concurrency The procedure uses the default pooled executor when running with parallel=true, but the pool is #cpus * 2...so the concurrency ends up being 2x the cpu cores and not what the user sets. This changes the behavior to use a set cap via an AtomicInteger and keeps the same Future creation/submission logic but adds the decrement of the counter to the future's callback. If parallel == false, the gating is shortcircuited at the expense of some small overhead for manipulating the AtomicInteger counter (for now?). * Fix neo4j-contrib#1805: prevent execution with invalid params To prevent executing with batch sizes of 0 or concurrency settings that result in no consuming threads, validate that batchSize and concurrency are set to at least 1. * Use native line separators so tests work on Win10 * Centralize multithreaded batch processing logic Both APOC Core's apoc.periodic.Periodic and Full's apoc.periodic.PeriodicExtended had implementations of logic for batch processing with thread pools. Pull it out and centralize it in PeriodicUtils for now. The method signature ends up growing quite a bit, but it's now in 1 place and apoc.periodic.rock_n_roll now had the same logic as apoc.periodic.iterate. * cleanup test class, fix error comparison for win10 * Summarize graph updates for apoc.periodic.iterate Initial fix for neo4j-contrib#1815 adding in support for providing aggregate update statistics for any changes to the database from the apoc.periodic.iterate consuming statement. For now, the following data is provided: - nodesCreated - nodesDeleted - relationshipsCreated - relationshipsDeleted - propertiesSet - labelsAdded - labelsRemoved Consequently, this also adds the capability to the deprecated apoc.periodic.rock_n_roll as it now uses shared logic. Given the challenge of doing schema operations in the consuming statement, index and constraint updates are not provided. Support might be required for using apoc.schema.assert, but it will in turn need to be updated to properly report update statistics. * update docs, adding updateStatistics yield value
conker84
pushed a commit
to conker84/neo4j-apoc-procedures
that referenced
this issue
Mar 29, 2021
…rib#1815) (neo4j-contrib#1818) * fix apoc.periodic.iterate not obeying concurrency The procedure uses the default pooled executor when running with parallel=true, but the pool is #cpus * 2...so the concurrency ends up being 2x the cpu cores and not what the user sets. This changes the behavior to use a set cap via an AtomicInteger and keeps the same Future creation/submission logic but adds the decrement of the counter to the future's callback. If parallel == false, the gating is shortcircuited at the expense of some small overhead for manipulating the AtomicInteger counter (for now?). * Fix neo4j-contrib#1805: prevent execution with invalid params To prevent executing with batch sizes of 0 or concurrency settings that result in no consuming threads, validate that batchSize and concurrency are set to at least 1. * Use native line separators so tests work on Win10 * Centralize multithreaded batch processing logic Both APOC Core's apoc.periodic.Periodic and Full's apoc.periodic.PeriodicExtended had implementations of logic for batch processing with thread pools. Pull it out and centralize it in PeriodicUtils for now. The method signature ends up growing quite a bit, but it's now in 1 place and apoc.periodic.rock_n_roll now had the same logic as apoc.periodic.iterate. * cleanup test class, fix error comparison for win10 * Summarize graph updates for apoc.periodic.iterate Initial fix for neo4j-contrib#1815 adding in support for providing aggregate update statistics for any changes to the database from the apoc.periodic.iterate consuming statement. For now, the following data is provided: - nodesCreated - nodesDeleted - relationshipsCreated - relationshipsDeleted - propertiesSet - labelsAdded - labelsRemoved Consequently, this also adds the capability to the deprecated apoc.periodic.rock_n_roll as it now uses shared logic. Given the challenge of doing schema operations in the consuming statement, index and constraint updates are not provided. Support might be required for using apoc.schema.assert, but it will in turn need to be updated to properly report update statistics. * update docs, adding updateStatistics yield value
fbiville
pushed a commit
that referenced
this issue
Apr 2, 2021
) * fix apoc.periodic.iterate not obeying concurrency The procedure uses the default pooled executor when running with parallel=true, but the pool is #cpus * 2...so the concurrency ends up being 2x the cpu cores and not what the user sets. This changes the behavior to use a set cap via an AtomicInteger and keeps the same Future creation/submission logic but adds the decrement of the counter to the future's callback. If parallel == false, the gating is shortcircuited at the expense of some small overhead for manipulating the AtomicInteger counter (for now?). * Fix #1805: prevent execution with invalid params To prevent executing with batch sizes of 0 or concurrency settings that result in no consuming threads, validate that batchSize and concurrency are set to at least 1. * Use native line separators so tests work on Win10 * Centralize multithreaded batch processing logic Both APOC Core's apoc.periodic.Periodic and Full's apoc.periodic.PeriodicExtended had implementations of logic for batch processing with thread pools. Pull it out and centralize it in PeriodicUtils for now. The method signature ends up growing quite a bit, but it's now in 1 place and apoc.periodic.rock_n_roll now had the same logic as apoc.periodic.iterate. * cleanup test class, fix error comparison for win10 * Summarize graph updates for apoc.periodic.iterate Initial fix for #1815 adding in support for providing aggregate update statistics for any changes to the database from the apoc.periodic.iterate consuming statement. For now, the following data is provided: - nodesCreated - nodesDeleted - relationshipsCreated - relationshipsDeleted - propertiesSet - labelsAdded - labelsRemoved Consequently, this also adds the capability to the deprecated apoc.periodic.rock_n_roll as it now uses shared logic. Given the challenge of doing schema operations in the consuming statement, index and constraint updates are not provided. Support might be required for using apoc.schema.assert, but it will in turn need to be updated to properly report update statistics. * update docs, adding updateStatistics yield value
conker84
pushed a commit
that referenced
this issue
Apr 12, 2021
) * fix apoc.periodic.iterate not obeying concurrency The procedure uses the default pooled executor when running with parallel=true, but the pool is #cpus * 2...so the concurrency ends up being 2x the cpu cores and not what the user sets. This changes the behavior to use a set cap via an AtomicInteger and keeps the same Future creation/submission logic but adds the decrement of the counter to the future's callback. If parallel == false, the gating is shortcircuited at the expense of some small overhead for manipulating the AtomicInteger counter (for now?). * Fix #1805: prevent execution with invalid params To prevent executing with batch sizes of 0 or concurrency settings that result in no consuming threads, validate that batchSize and concurrency are set to at least 1. * Use native line separators so tests work on Win10 * Centralize multithreaded batch processing logic Both APOC Core's apoc.periodic.Periodic and Full's apoc.periodic.PeriodicExtended had implementations of logic for batch processing with thread pools. Pull it out and centralize it in PeriodicUtils for now. The method signature ends up growing quite a bit, but it's now in 1 place and apoc.periodic.rock_n_roll now had the same logic as apoc.periodic.iterate. * cleanup test class, fix error comparison for win10 * Summarize graph updates for apoc.periodic.iterate Initial fix for #1815 adding in support for providing aggregate update statistics for any changes to the database from the apoc.periodic.iterate consuming statement. For now, the following data is provided: - nodesCreated - nodesDeleted - relationshipsCreated - relationshipsDeleted - propertiesSet - labelsAdded - labelsRemoved Consequently, this also adds the capability to the deprecated apoc.periodic.rock_n_roll as it now uses shared logic. Given the challenge of doing schema operations in the consuming statement, index and constraint updates are not provided. Support might be required for using apoc.schema.assert, but it will in turn need to be updated to properly report update statistics. * update docs, adding updateStatistics yield value
conker84
pushed a commit
to conker84/neo4j-apoc-procedures
that referenced
this issue
Apr 14, 2021
…rib#1815) (neo4j-contrib#1818) * fix apoc.periodic.iterate not obeying concurrency The procedure uses the default pooled executor when running with parallel=true, but the pool is #cpus * 2...so the concurrency ends up being 2x the cpu cores and not what the user sets. This changes the behavior to use a set cap via an AtomicInteger and keeps the same Future creation/submission logic but adds the decrement of the counter to the future's callback. If parallel == false, the gating is shortcircuited at the expense of some small overhead for manipulating the AtomicInteger counter (for now?). * Fix neo4j-contrib#1805: prevent execution with invalid params To prevent executing with batch sizes of 0 or concurrency settings that result in no consuming threads, validate that batchSize and concurrency are set to at least 1. * Use native line separators so tests work on Win10 * Centralize multithreaded batch processing logic Both APOC Core's apoc.periodic.Periodic and Full's apoc.periodic.PeriodicExtended had implementations of logic for batch processing with thread pools. Pull it out and centralize it in PeriodicUtils for now. The method signature ends up growing quite a bit, but it's now in 1 place and apoc.periodic.rock_n_roll now had the same logic as apoc.periodic.iterate. * cleanup test class, fix error comparison for win10 * Summarize graph updates for apoc.periodic.iterate Initial fix for neo4j-contrib#1815 adding in support for providing aggregate update statistics for any changes to the database from the apoc.periodic.iterate consuming statement. For now, the following data is provided: - nodesCreated - nodesDeleted - relationshipsCreated - relationshipsDeleted - propertiesSet - labelsAdded - labelsRemoved Consequently, this also adds the capability to the deprecated apoc.periodic.rock_n_roll as it now uses shared logic. Given the challenge of doing schema operations in the consuming statement, index and constraint updates are not provided. Support might be required for using apoc.schema.assert, but it will in turn need to be updated to properly report update statistics. * update docs, adding updateStatistics yield value
conker84
added a commit
that referenced
this issue
Apr 20, 2021
closed via e1939eb |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Feature description (Mandatory)
apoc.periodic.iterate() provides only minimal information about the result of the call. Specifically, aside from error info, it provides the total number of rows processed (from the outer driving query) and the total number of batches executed, but it doesn't provide any information of what was updated within those batches.
It would be very useful if update statistics (nodes/rels created, updated, deleted) were obtained during batch execution and summed up so we could YIELD such information.
In regular queries, we get metadata in the summary such as this:
Theoretically we should be able to get these counts from each batch executed, and sum up the counts (while properly omitting on error). This wouldn't be a bad way to YIELD the sums (though probably without the extra nesting), though I can't recall if we can perform schema operations using iterate() or not. If not, then we won't need those last 4.
Considered alternatives
None at this time.
How this feature can improve the project?
Batch and total counts currently available from iterate() do not adequately track what was changed in the db by the procedure. This addition would provide much more accurate information, and would provide a very quick way for a user to determine whether any changes were actually made to the db (which could alert them to a flaw in their query quickly), and what those were.
Requesting this for at least APOC 4.1.x.x up
The text was updated successfully, but these errors were encountered: