Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add query update summary to yielded data from apoc.periodic.iterate() #1815

Closed
InverseFalcon opened this issue Feb 22, 2021 · 2 comments
Closed

Comments

@InverseFalcon
Copy link
Collaborator

InverseFalcon commented Feb 22, 2021

Feature description (Mandatory)

apoc.periodic.iterate() provides only minimal information about the result of the call. Specifically, aside from error info, it provides the total number of rows processed (from the outer driving query) and the total number of batches executed, but it doesn't provide any information of what was updated within those batches.

It would be very useful if update statistics (nodes/rels created, updated, deleted) were obtained during batch execution and summed up so we could YIELD such information.

In regular queries, we get metadata in the summary such as this:

"updateStatistics": {
    "_stats": {
      "nodesCreated": 4,
      "nodesDeleted": 0,
      "relationshipsCreated": 0,
      "relationshipsDeleted": 0,
      "propertiesSet": 4,
      "labelsAdded": 4,
      "labelsRemoved": 0,
      "indexesAdded": 0,
      "indexesRemoved": 0,
      "constraintsAdded": 0,
      "constraintsRemoved": 0
    }
}

Theoretically we should be able to get these counts from each batch executed, and sum up the counts (while properly omitting on error). This wouldn't be a bad way to YIELD the sums (though probably without the extra nesting), though I can't recall if we can perform schema operations using iterate() or not. If not, then we won't need those last 4.

Considered alternatives

None at this time.

How this feature can improve the project?

Batch and total counts currently available from iterate() do not adequately track what was changed in the db by the procedure. This addition would provide much more accurate information, and would provide a very quick way for a user to determine whether any changes were actually made to the db (which could alert them to a flaw in their query quickly), and what those were.

Requesting this for at least APOC 4.1.x.x up

@voutilad
Copy link
Contributor

I've got something pretty close to implemented, but found out adding schema updates (indexes, contraints) might be moot as there's no easy way to do schema operations in the producer Cypher statement. Even if you use apoc.schema.assert() that proc doesn't return a QueryStatistics map with counts, either!

jexp pushed a commit that referenced this issue Mar 10, 2021
)

* fix apoc.periodic.iterate not obeying concurrency

The procedure uses the default pooled executor when running with
parallel=true, but the pool is #cpus * 2...so the concurrency ends up
being 2x the cpu cores and not what the user sets.

This changes the behavior to use a set cap via an AtomicInteger and
keeps the same Future creation/submission logic but adds the decrement
of the counter to the future's callback. If parallel == false, the
gating is shortcircuited at the expense of some small overhead for
manipulating the AtomicInteger counter (for now?).

* Fix #1805: prevent execution with invalid params

To prevent executing with batch sizes of 0 or concurrency settings that
result in no consuming threads, validate that batchSize and concurrency
are set to at least 1.

* Use native line separators so tests work on Win10

* Centralize multithreaded batch processing logic

Both APOC Core's apoc.periodic.Periodic and Full's
apoc.periodic.PeriodicExtended had implementations of logic for batch
processing with thread pools. Pull it out and centralize it in
PeriodicUtils for now.

The method signature ends up growing quite a bit, but it's now in 1
place and apoc.periodic.rock_n_roll now had the same logic as
apoc.periodic.iterate.

* cleanup test class, fix error comparison for win10

* Summarize graph updates for apoc.periodic.iterate

Initial fix for #1815 adding in support for providing aggregate update
statistics for any changes to the database from the
apoc.periodic.iterate consuming statement.

For now, the following data is provided:

- nodesCreated
- nodesDeleted
- relationshipsCreated
- relationshipsDeleted
- propertiesSet
- labelsAdded
- labelsRemoved

Consequently, this also adds the capability to the deprecated
apoc.periodic.rock_n_roll as it now uses shared logic.

Given the challenge of doing schema operations in the consuming
statement, index and constraint updates are not provided. Support might
be required for using apoc.schema.assert, but it will in turn need to be
updated to properly report update statistics.

* update docs, adding updateStatistics yield value
conker84 pushed a commit to conker84/neo4j-apoc-procedures that referenced this issue Mar 29, 2021
…rib#1815) (neo4j-contrib#1818)

* fix apoc.periodic.iterate not obeying concurrency

The procedure uses the default pooled executor when running with
parallel=true, but the pool is #cpus * 2...so the concurrency ends up
being 2x the cpu cores and not what the user sets.

This changes the behavior to use a set cap via an AtomicInteger and
keeps the same Future creation/submission logic but adds the decrement
of the counter to the future's callback. If parallel == false, the
gating is shortcircuited at the expense of some small overhead for
manipulating the AtomicInteger counter (for now?).

* Fix neo4j-contrib#1805: prevent execution with invalid params

To prevent executing with batch sizes of 0 or concurrency settings that
result in no consuming threads, validate that batchSize and concurrency
are set to at least 1.

* Use native line separators so tests work on Win10

* Centralize multithreaded batch processing logic

Both APOC Core's apoc.periodic.Periodic and Full's
apoc.periodic.PeriodicExtended had implementations of logic for batch
processing with thread pools. Pull it out and centralize it in
PeriodicUtils for now.

The method signature ends up growing quite a bit, but it's now in 1
place and apoc.periodic.rock_n_roll now had the same logic as
apoc.periodic.iterate.

* cleanup test class, fix error comparison for win10

* Summarize graph updates for apoc.periodic.iterate

Initial fix for neo4j-contrib#1815 adding in support for providing aggregate update
statistics for any changes to the database from the
apoc.periodic.iterate consuming statement.

For now, the following data is provided:

- nodesCreated
- nodesDeleted
- relationshipsCreated
- relationshipsDeleted
- propertiesSet
- labelsAdded
- labelsRemoved

Consequently, this also adds the capability to the deprecated
apoc.periodic.rock_n_roll as it now uses shared logic.

Given the challenge of doing schema operations in the consuming
statement, index and constraint updates are not provided. Support might
be required for using apoc.schema.assert, but it will in turn need to be
updated to properly report update statistics.

* update docs, adding updateStatistics yield value
conker84 pushed a commit to conker84/neo4j-apoc-procedures that referenced this issue Mar 29, 2021
…rib#1815) (neo4j-contrib#1818)

* fix apoc.periodic.iterate not obeying concurrency

The procedure uses the default pooled executor when running with
parallel=true, but the pool is #cpus * 2...so the concurrency ends up
being 2x the cpu cores and not what the user sets.

This changes the behavior to use a set cap via an AtomicInteger and
keeps the same Future creation/submission logic but adds the decrement
of the counter to the future's callback. If parallel == false, the
gating is shortcircuited at the expense of some small overhead for
manipulating the AtomicInteger counter (for now?).

* Fix neo4j-contrib#1805: prevent execution with invalid params

To prevent executing with batch sizes of 0 or concurrency settings that
result in no consuming threads, validate that batchSize and concurrency
are set to at least 1.

* Use native line separators so tests work on Win10

* Centralize multithreaded batch processing logic

Both APOC Core's apoc.periodic.Periodic and Full's
apoc.periodic.PeriodicExtended had implementations of logic for batch
processing with thread pools. Pull it out and centralize it in
PeriodicUtils for now.

The method signature ends up growing quite a bit, but it's now in 1
place and apoc.periodic.rock_n_roll now had the same logic as
apoc.periodic.iterate.

* cleanup test class, fix error comparison for win10

* Summarize graph updates for apoc.periodic.iterate

Initial fix for neo4j-contrib#1815 adding in support for providing aggregate update
statistics for any changes to the database from the
apoc.periodic.iterate consuming statement.

For now, the following data is provided:

- nodesCreated
- nodesDeleted
- relationshipsCreated
- relationshipsDeleted
- propertiesSet
- labelsAdded
- labelsRemoved

Consequently, this also adds the capability to the deprecated
apoc.periodic.rock_n_roll as it now uses shared logic.

Given the challenge of doing schema operations in the consuming
statement, index and constraint updates are not provided. Support might
be required for using apoc.schema.assert, but it will in turn need to be
updated to properly report update statistics.

* update docs, adding updateStatistics yield value
conker84 pushed a commit to conker84/neo4j-apoc-procedures that referenced this issue Mar 29, 2021
…rib#1815) (neo4j-contrib#1818)

* fix apoc.periodic.iterate not obeying concurrency

The procedure uses the default pooled executor when running with
parallel=true, but the pool is #cpus * 2...so the concurrency ends up
being 2x the cpu cores and not what the user sets.

This changes the behavior to use a set cap via an AtomicInteger and
keeps the same Future creation/submission logic but adds the decrement
of the counter to the future's callback. If parallel == false, the
gating is shortcircuited at the expense of some small overhead for
manipulating the AtomicInteger counter (for now?).

* Fix neo4j-contrib#1805: prevent execution with invalid params

To prevent executing with batch sizes of 0 or concurrency settings that
result in no consuming threads, validate that batchSize and concurrency
are set to at least 1.

* Use native line separators so tests work on Win10

* Centralize multithreaded batch processing logic

Both APOC Core's apoc.periodic.Periodic and Full's
apoc.periodic.PeriodicExtended had implementations of logic for batch
processing with thread pools. Pull it out and centralize it in
PeriodicUtils for now.

The method signature ends up growing quite a bit, but it's now in 1
place and apoc.periodic.rock_n_roll now had the same logic as
apoc.periodic.iterate.

* cleanup test class, fix error comparison for win10

* Summarize graph updates for apoc.periodic.iterate

Initial fix for neo4j-contrib#1815 adding in support for providing aggregate update
statistics for any changes to the database from the
apoc.periodic.iterate consuming statement.

For now, the following data is provided:

- nodesCreated
- nodesDeleted
- relationshipsCreated
- relationshipsDeleted
- propertiesSet
- labelsAdded
- labelsRemoved

Consequently, this also adds the capability to the deprecated
apoc.periodic.rock_n_roll as it now uses shared logic.

Given the challenge of doing schema operations in the consuming
statement, index and constraint updates are not provided. Support might
be required for using apoc.schema.assert, but it will in turn need to be
updated to properly report update statistics.

* update docs, adding updateStatistics yield value
fbiville pushed a commit that referenced this issue Apr 2, 2021
)

* fix apoc.periodic.iterate not obeying concurrency

The procedure uses the default pooled executor when running with
parallel=true, but the pool is #cpus * 2...so the concurrency ends up
being 2x the cpu cores and not what the user sets.

This changes the behavior to use a set cap via an AtomicInteger and
keeps the same Future creation/submission logic but adds the decrement
of the counter to the future's callback. If parallel == false, the
gating is shortcircuited at the expense of some small overhead for
manipulating the AtomicInteger counter (for now?).

* Fix #1805: prevent execution with invalid params

To prevent executing with batch sizes of 0 or concurrency settings that
result in no consuming threads, validate that batchSize and concurrency
are set to at least 1.

* Use native line separators so tests work on Win10

* Centralize multithreaded batch processing logic

Both APOC Core's apoc.periodic.Periodic and Full's
apoc.periodic.PeriodicExtended had implementations of logic for batch
processing with thread pools. Pull it out and centralize it in
PeriodicUtils for now.

The method signature ends up growing quite a bit, but it's now in 1
place and apoc.periodic.rock_n_roll now had the same logic as
apoc.periodic.iterate.

* cleanup test class, fix error comparison for win10

* Summarize graph updates for apoc.periodic.iterate

Initial fix for #1815 adding in support for providing aggregate update
statistics for any changes to the database from the
apoc.periodic.iterate consuming statement.

For now, the following data is provided:

- nodesCreated
- nodesDeleted
- relationshipsCreated
- relationshipsDeleted
- propertiesSet
- labelsAdded
- labelsRemoved

Consequently, this also adds the capability to the deprecated
apoc.periodic.rock_n_roll as it now uses shared logic.

Given the challenge of doing schema operations in the consuming
statement, index and constraint updates are not provided. Support might
be required for using apoc.schema.assert, but it will in turn need to be
updated to properly report update statistics.

* update docs, adding updateStatistics yield value
conker84 pushed a commit that referenced this issue Apr 12, 2021
)

* fix apoc.periodic.iterate not obeying concurrency

The procedure uses the default pooled executor when running with
parallel=true, but the pool is #cpus * 2...so the concurrency ends up
being 2x the cpu cores and not what the user sets.

This changes the behavior to use a set cap via an AtomicInteger and
keeps the same Future creation/submission logic but adds the decrement
of the counter to the future's callback. If parallel == false, the
gating is shortcircuited at the expense of some small overhead for
manipulating the AtomicInteger counter (for now?).

* Fix #1805: prevent execution with invalid params

To prevent executing with batch sizes of 0 or concurrency settings that
result in no consuming threads, validate that batchSize and concurrency
are set to at least 1.

* Use native line separators so tests work on Win10

* Centralize multithreaded batch processing logic

Both APOC Core's apoc.periodic.Periodic and Full's
apoc.periodic.PeriodicExtended had implementations of logic for batch
processing with thread pools. Pull it out and centralize it in
PeriodicUtils for now.

The method signature ends up growing quite a bit, but it's now in 1
place and apoc.periodic.rock_n_roll now had the same logic as
apoc.periodic.iterate.

* cleanup test class, fix error comparison for win10

* Summarize graph updates for apoc.periodic.iterate

Initial fix for #1815 adding in support for providing aggregate update
statistics for any changes to the database from the
apoc.periodic.iterate consuming statement.

For now, the following data is provided:

- nodesCreated
- nodesDeleted
- relationshipsCreated
- relationshipsDeleted
- propertiesSet
- labelsAdded
- labelsRemoved

Consequently, this also adds the capability to the deprecated
apoc.periodic.rock_n_roll as it now uses shared logic.

Given the challenge of doing schema operations in the consuming
statement, index and constraint updates are not provided. Support might
be required for using apoc.schema.assert, but it will in turn need to be
updated to properly report update statistics.

* update docs, adding updateStatistics yield value
conker84 pushed a commit to conker84/neo4j-apoc-procedures that referenced this issue Apr 14, 2021
…rib#1815) (neo4j-contrib#1818)

* fix apoc.periodic.iterate not obeying concurrency

The procedure uses the default pooled executor when running with
parallel=true, but the pool is #cpus * 2...so the concurrency ends up
being 2x the cpu cores and not what the user sets.

This changes the behavior to use a set cap via an AtomicInteger and
keeps the same Future creation/submission logic but adds the decrement
of the counter to the future's callback. If parallel == false, the
gating is shortcircuited at the expense of some small overhead for
manipulating the AtomicInteger counter (for now?).

* Fix neo4j-contrib#1805: prevent execution with invalid params

To prevent executing with batch sizes of 0 or concurrency settings that
result in no consuming threads, validate that batchSize and concurrency
are set to at least 1.

* Use native line separators so tests work on Win10

* Centralize multithreaded batch processing logic

Both APOC Core's apoc.periodic.Periodic and Full's
apoc.periodic.PeriodicExtended had implementations of logic for batch
processing with thread pools. Pull it out and centralize it in
PeriodicUtils for now.

The method signature ends up growing quite a bit, but it's now in 1
place and apoc.periodic.rock_n_roll now had the same logic as
apoc.periodic.iterate.

* cleanup test class, fix error comparison for win10

* Summarize graph updates for apoc.periodic.iterate

Initial fix for neo4j-contrib#1815 adding in support for providing aggregate update
statistics for any changes to the database from the
apoc.periodic.iterate consuming statement.

For now, the following data is provided:

- nodesCreated
- nodesDeleted
- relationshipsCreated
- relationshipsDeleted
- propertiesSet
- labelsAdded
- labelsRemoved

Consequently, this also adds the capability to the deprecated
apoc.periodic.rock_n_roll as it now uses shared logic.

Given the challenge of doing schema operations in the consuming
statement, index and constraint updates are not provided. Support might
be required for using apoc.schema.assert, but it will in turn need to be
updated to properly report update statistics.

* update docs, adding updateStatistics yield value
@conker84
Copy link
Contributor

closed via e1939eb

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants