query.map() and query.map_async() are no longer implemented #210

Closed
chmoder opened this issue Sep 24, 2019 · 9 comments · Fixed by #218
Labels
priority: p1 Important issue which blocks shipping the next release. Will be fixed prior to next release. type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design.

Comments

@chmoder
Contributor

chmoder commented Sep 24, 2019

Are the map functions something we want to add in the future? I have a use case that I can either refactor, or I can implement query.map* myself. I am guessing there was some low-level reason for the choice not to implement them; if so, a comment explaining why would help prevent others from trying.

@yoshi-automation yoshi-automation added the triage me I really want to be triaged. label Sep 25, 2019
@cguardia cguardia added the type: question Request for information or clarification. Not an issue. label Sep 26, 2019
@yoshi-automation yoshi-automation removed the triage me I really want to be triaged. label Sep 26, 2019
@cguardia
Contributor

Hi, sorry we took some time to answer. There's really no technical reason for not implementing this. We looked around and it seemed that it was not in use. We are open to considering re-adding this at some point in the beta process, or if you would like to try to implement it, we would be happy to review a PR.

@chmoder
Contributor Author

chmoder commented Sep 27, 2019

We will think about it. For now, we went with this approach:

pool = Pool(50)
it = query.iter(keys_only=True)
items = pool.map(SomeClass.apply_some_values, it)
pool.close()

@chrisrossi
Contributor

@chmoder Where is the Pool class from?

Are/were you using any of the arguments to Query.map besides the callback?

@chmoder
Contributor Author

chmoder commented Sep 27, 2019

from multiprocessing.pool import Pool
No, we were not using any other arguments in Query.map and Query.map_async.

I would be happy to hear any suggestions you have if there is a better choice than Pool. The reason we used map here is to fetch the entities referenced by keys stored on the current entity (a relationship kind of thing).

For example:
attaching the Car entity to the Person entity, but only at query time, because the Car entities change over time.

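from google.cloud import ndb

class Person(ndb.Model):
    car = ndb.KeyProperty(kind="Car")  # key of this person's Car entity

class Car(ndb.Model):
    model = ndb.StringProperty()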

@chrisrossi
Contributor

Can I ask what you're doing in apply_some_values?

@chmoder
Contributor Author

chmoder commented Oct 1, 2019

The idea of apply_some_values is basically to set other entities on this one, like a foreign-key relationship.

The example above is contrived, but the point is that when getting a Person, we also get that person's Car and set it on the Person instance.
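
A minimal sketch of what a callback like apply_some_values could look like, written with plain Python stand-ins rather than NDB models so that it runs on its own. The dataclasses, the car_key field, and FAKE_DATASTORE are hypothetical names used only for illustration; a real version would fetch the Car from Datastore through the key stored on the Person.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Car:
    model: str


@dataclass
class Person:
    name: str
    car_key: str               # stand-in for the value of an ndb.KeyProperty
    car: Optional[Car] = None  # attached only at query time


# Hypothetical in-memory stand-in for Datastore, keyed by car key.
FAKE_DATASTORE: Dict[str, Car] = {
    "car-1": Car(model="Civic"),
    "car-2": Car(model="Model 3"),
}


def apply_some_values(person: Person) -> Person:
    """Look up the Car referenced by the person and attach it to the instance."""
    person.car = FAKE_DATASTORE.get(person.car_key)
    return person


people = [Person("Ann", "car-1"), Person("Bob", "car-2"), Person("Cy", "missing")]
resolved = [apply_some_values(p) for p in people]
```

A person whose key has no matching Car simply ends up with car set to None in this sketch.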

@chrisrossi
Contributor

chrisrossi commented Oct 4, 2019

Hi @chmoder ,

As Carlos says above, the main reason we didn't implement this is that we didn't think anyone was using it. (EDIT: And also because there was a large amount of supporting infrastructure just for this one feature.) Having learned otherwise, I don't see any reason why we shouldn't just go ahead and implement it.

In the meantime, I think your workaround is fine, but it achieves parallelism in a fundamentally different way than NDB does and is relatively expensive in terms of computing resources. Since these operations are going to be I/O-bound rather than CPU-bound, multiprocessing may not be as good a fit as a plain old-fashioned thread pool. In either case, I think NDB's coroutine-based, single-threaded parallelism is still going to give the best performance.

I will, provisionally, add this to our queue.
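
To illustrate the thread-pool alternative mentioned above, here is a rough sketch using only the standard library. The fetch_related function is hypothetical, and time.sleep stands in for the network round trip that a real Datastore lookup would incur; nothing here is NDB's own API.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_related(key: int) -> dict:
    """Hypothetical I/O-bound lookup; time.sleep stands in for a network call."""
    time.sleep(0.05)
    return {"key": key, "model": f"model-{key}"}


keys = list(range(20))

# Threads overlap the waiting time without the cost of spawning processes.
# executor.map returns results in the same order as the input keys.
with ThreadPoolExecutor(max_workers=10) as executor:
    items = list(executor.map(fetch_related, keys))
```

Because the work is mostly waiting, threads can overlap the lookups without the process start-up and pickling overhead that a multiprocessing Pool brings.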

@chrisrossi chrisrossi self-assigned this Oct 4, 2019
@chrisrossi chrisrossi added type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design. priority: p1 Important issue which blocks shipping the next release. Will be fixed prior to next release. and removed type: question Request for information or clarification. Not an issue. labels Oct 4, 2019
@chmoder
Contributor Author

chmoder commented Oct 4, 2019

We will certainly switch to python-ndb native methods and test them when the PR for this issue comes through. I understand and appreciate your explanation; however, I am not sure I am familiar enough with NDB's "single threaded parallelism" to write the implementation myself. (EDIT: please feel free to ask if there is something we can do to help.)

In the meantime, the pool preserves the existing behavior while we refactor our projects. Thank you for the excellent work!
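
For readers unfamiliar with the "single threaded parallelism" mentioned above, the general idea can be sketched with the standard library's asyncio. This is only an analogy: NDB uses its own tasklets and futures rather than asyncio, and fetch_related and map_async below are hypothetical names, not NDB functions.

```python
import asyncio


async def fetch_related(key: int) -> dict:
    """Hypothetical I/O-bound lookup; asyncio.sleep stands in for a network call."""
    await asyncio.sleep(0.05)
    return {"key": key, "model": f"model-{key}"}


async def map_async(callback, keys):
    """Start every callback up front, then wait for all results in input order."""
    return await asyncio.gather(*(callback(k) for k in keys))


# All 20 lookups wait concurrently on one thread.
items = asyncio.run(map_async(fetch_related, range(20)))
```

A single thread starts every lookup and then switches between them while each one waits on I/O, so no extra processes or threads are needed.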

@chrisrossi
Contributor

I'm already on the case. Thanks!


4 participants