Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[master < ] Add python & cpp batching #964

Merged
merged 25 commits into from
Jun 26, 2023

Conversation

antoniofilipovic
Copy link
Collaborator

@antoniofilipovic antoniofilipovic commented Jun 6, 2023

[master < Task] PR

  • Check, and update documentation if necessary
  • Provide the full content or a guide for the final git message

To keep docs changelog up to date, one more thing to do:

  • Write a release note here
  • Tag someone from docs team in the comments

@antoniofilipovic antoniofilipovic self-assigned this Jun 6, 2023
@antoniofilipovic
Copy link
Collaborator Author

antoniofilipovic commented Jun 19, 2023

Some decisions made along the way:

  • initializer will receive the same parameters as the main func in batched procedure; thus, we do type checks in mgp.py. The same parameters are sometimes required to do proper initialization, i.e. reading from a file, etc...
  • in py_module.cpp there is an option not to invalidate the PyGraph object. This option would give users the option to cache vertices, edges, and other objects from storage and batch return them from the procedure. There are two perspectives in this case:
    1. Tech perspective: we would need two different memory allocators since objects which plan to be cached need to be held in memory until the batched function is called again. Currently, everything is allocated with the same memory resource which after one batch is executed is restarted to a clean state (memory deallocated) to reduce memory usage. In code there are lot of checks if same memory resource is used for PyGraph and PyVertex and these would need decoupling.
    2. User story: Batching works best when it is used for streams and returning of large objects to reduce memory usage. Vertices and edges are not large objects, thus seems unnecessary to batch return of vertices. Other point is that vertices in order to be cached, user would need get all vertices from storage, which would cause upsurge of memory which seems like not necessary.
  • [DISCUSS] during the execution of batching procedure there is an option to copy the shared pointer to read lock on all modules if we want for batched procedure not to be restarted (restart requires global lock). It seems like this is also more of a bug than a feature in regular procedures, as the user could write MATCH (n) CALL proc.do_smth() and during execution from another session called mg.load(proc) which would result in issue as only what is checked in code is size of results. Return types are checked so that they suit what was registered.

@antoniofilipovic antoniofilipovic marked this pull request as ready for review June 21, 2023 09:02
@gitbuda gitbuda added this to the mg-v2.9.0 milestone Jun 22, 2023
Copy link
Contributor

@Josipmrden Josipmrden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments but mostly good work

include/mgp.hpp Show resolved Hide resolved
include/mgp.hpp Show resolved Hide resolved
include/mgp.py Show resolved Hide resolved
include/mgp.py Outdated Show resolved Hide resolved
tests/e2e/batched_procedures/simple_read.py Outdated Show resolved Hide resolved
tests/e2e/batched_procedures/simple_read.py Show resolved Hide resolved
tests/e2e/batched_procedures/simple_read.py Outdated Show resolved Hide resolved
src/query/plan/operator.cpp Outdated Show resolved Hide resolved
src/query/plan/operator.cpp Outdated Show resolved Hide resolved
@antoniofilipovic antoniofilipovic added the Ready for review PR is ready for review label Jun 26, 2023
Copy link
Contributor

@Josipmrden Josipmrden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes fixed. Minor comment, don't need to re-ask for review again

include/mgp.hpp Show resolved Hide resolved
include/mgp.hpp Outdated Show resolved Hide resolved
include/mgp.hpp Show resolved Hide resolved
Copy link
Contributor

@andrejtonev andrejtonev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few comments.

src/query/plan/operator.cpp Show resolved Hide resolved
src/query/plan/operator.cpp Show resolved Hide resolved
src/query/plan/operator.cpp Show resolved Hide resolved
src/query/plan/operator.cpp Show resolved Hide resolved
src/query/plan/operator.cpp Outdated Show resolved Hide resolved
src/query/plan/operator.cpp Show resolved Hide resolved
src/query/plan/operator.hpp Show resolved Hide resolved
src/query/plan/operator.hpp Show resolved Hide resolved
src/query/procedure/mg_procedure_impl.cpp Outdated Show resolved Hide resolved
src/query/procedure/py_module.cpp Outdated Show resolved Hide resolved
@antoniofilipovic antoniofilipovic merged commit d573eda into master Jun 26, 2023
6 checks passed
@antoniofilipovic antoniofilipovic deleted the MG-add-python-cpp-batching-procedure branch June 26, 2023 13:46
@antoniofilipovic
Copy link
Collaborator Author

@vpavicic for release note:
Add an option to batch results from procedures from both Python and CPP procedures

I will open PR shortly for procedures on docs

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Ready for review PR is ready for review
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants