3.11 is not supported #67

Closed · ekreate opened this issue Jan 1, 2023 · 6 comments · Fixed by #84
Labels: enhancement (New feature or request)

ekreate commented Jan 1, 2023

Any chance you will add support for 3.11? :(

sybrenjansen (Owner) commented

Yes, I will have a look.

sybrenjansen self-assigned this on Jan 10, 2023
sybrenjansen added the enhancement label on Jan 10, 2023
sybrenjansen (Owner) commented

I can't support Python 3.11 yet because one of the dependencies (namely, multiprocess) doesn't fully support 3.11. They did support a pre-release version of 3.11, but I will need to wait for support for the actual release.

In the meantime, if you want to use mpire on 3.11, you can still install the latest mpire and use it without issues. Only when you rely on the 'spawn' start method combined with use_dill=True will it throw an error; all other start methods should work fine. A minimal sketch of a safe configuration follows below.
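For illustration, a minimal sketch of a configuration that works on 3.11 (the doubling task is made up; 'fork' is the default start method on Linux):

```python
from mpire import WorkerPool

def double(x):
    return x * 2

if __name__ == '__main__':
    # Any start method other than spawn combined with use_dill=True works on 3.11
    with WorkerPool(n_jobs=4, start_method='fork') as pool:
        print(pool.map(double, range(10)))
```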

ekreate (Author) commented Jan 11, 2023

Thanks for the response! I've been looking at your project and the ray module, and because I don't need all of their advanced features I'd rather use this.

Maybe I misunderstood this module, but I just need a more efficient pipe()/queue() mechanism that doesn't serialize and deserialize the data.

Can I do that using your project while continuing to use the mp.Process functionality, rather than the map functionality?

sybrenjansen (Owner) commented

Not serializing data can be done with mpire, but only if you use fork as the start method (which is the default on Linux). Have a look at https://slimmer-ai.github.io/mpire/usage/workerpool/shared_objects.html for more information on how that works. A short sketch follows below.
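For illustration, a short sketch of the shared objects mechanism with the fork start method, based on the linked docs (the array and the sum_row task are made up):

```python
import numpy as np
from mpire import WorkerPool

def sum_row(shared_array, row_idx):
    # mpire passes shared_objects as the first argument to the task function
    return shared_array[row_idx].sum()

if __name__ == '__main__':
    big_array = np.random.rand(1000, 1000)
    # With 'fork', workers inherit big_array via copy-on-write;
    # it is never pickled or sent over a pipe
    with WorkerPool(n_jobs=4, shared_objects=big_array, start_method='fork') as pool:
        results = pool.map(sum_row, range(1000))
```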

However, this does require map for now. In another ticket someone is asking for apply/apply_async, which I might add in the near future. That will, I think, come closer to what you're looking for.

If you also don't need that, but just want a single process which can use data without copying, then I'd suggest not using mpire and implementing it yourself. It's as simple as creating a new process class which inherits from mp.Process and receives an additional parameter, which can be the shared object, plus a task queue and a results queue. Store that object and the queues as instance attributes in __init__ and implement the run method of the process. This can be as simple as a while loop which takes the next item from the task queue, executes a function (which uses the shared object), and stores the result in the results queue. Once a special task has been given (i.e., a poison pill), you break from the while loop. A sketch follows below.
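For illustration, a rough sketch of that pattern, assuming the fork start method so the shared object is inherited rather than serialized (the Worker class and POISON_PILL sentinel are made-up names):

```python
import multiprocessing as mp

POISON_PILL = None  # special task telling the worker to stop

class Worker(mp.Process):
    def __init__(self, shared_obj, task_queue, results_queue):
        super().__init__()
        # With 'fork', the shared object is inherited, not serialized
        self.shared_obj = shared_obj
        self.task_queue = task_queue
        self.results_queue = results_queue

    def run(self):
        while True:
            task = self.task_queue.get()  # next item from the task queue
            if task is POISON_PILL:
                break
            # Execute some function that uses the shared object
            self.results_queue.put(self.shared_obj[task])

if __name__ == '__main__':
    data = list(range(100))  # the object to share
    tasks, results = mp.Queue(), mp.Queue()
    worker = Worker(data, tasks, results)
    worker.start()
    for i in (3, 7):
        tasks.put(i)
    print([results.get() for _ in range(2)])
    tasks.put(POISON_PILL)  # tell the worker to shut down
    worker.join()
```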

ekreate (Author) commented Jan 17, 2023

Yeah, I implemented that in my current project.
I guess what I'm looking for is the exact same API as a pipe, with FIFO ordering and without needing to worry about locks and race conditions, but without the extra time it takes to serialize and deserialize.

I saw that ray provides something similar to what I want, but it's just so annoying using their interface, it's overkill, and I can't seem to make VS Code stop on breakpoints if I use ray. :/

Anyway, I might just implement it using Python's multiprocessing.shared_memory, but it seems like such a hassle :D

Thanks for the responses, it's nice to just talk to an expert in the field :)

sybrenjansen (Owner) commented

Can you point me to the functionality in Ray that you're referring to? I'm curious what you mean exactly.

But if you're looking for an API similar to a pipe/queue, you could inherit from the Queue class and override the get and put methods so that they work together with shared_memory. If you only need to read from the shared data, then you can also pass the data on through inheritance in the Process class, as I explained before. But if you also need to be able to write without serialization, then I guess your only options are shared_memory, or perhaps multiprocessing.Value or multiprocessing.Array.

The task queue can then just be a standard queue, and the results queue is the new queue class. You pass the shared_memory object to the constructor of that results queue and store it. You also keep track of which slots in the memory are already taken, e.g., using a multiprocessing.Array of booleans (with locking, but the overhead of this is unnoticeable) or a list of Event objects. The put function then writes the result of your function to a free slot in the shared memory, puts the index in the actual queue (so we're only serializing a single integer), and sets the corresponding boolean to True to indicate the slot is used. The get function receives the index, retrieves the corresponding result from shared memory, returns it to the user, and marks the slot as free again by resetting the boolean, so put knows it can reuse that slot. Whenever there are no free slots available, put has to wait for a slot to become available. You can play with the size of the shared memory to optimize for speed or memory usage. A rough sketch follows below.
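For illustration, a rough sketch of that design, assuming fixed-size NumPy results and the fork start method so the queue object can be inherited by workers (the name SharedMemoryQueue and its parameters are made up):

```python
import time
import multiprocessing as mp
from multiprocessing import shared_memory

import numpy as np

class SharedMemoryQueue:
    """Results queue where fixed-size numpy items live in shared memory;
    only a small slot index travels through the real queue."""

    def __init__(self, n_slots, item_shape, dtype=np.float64):
        self.n_slots = n_slots
        self.item_shape = item_shape
        self.dtype = np.dtype(dtype)
        nbytes = n_slots * int(np.prod(item_shape)) * self.dtype.itemsize
        self.shm = shared_memory.SharedMemory(create=True, size=nbytes)
        # Slot bookkeeping: which slots are in use (lock overhead is tiny)
        self.in_use = mp.Array('b', n_slots)
        self.index_queue = mp.Queue()  # only ever serializes a single integer

    def _slot(self, i):
        # View of slot i inside the shared memory buffer
        arr = np.ndarray((self.n_slots,) + self.item_shape,
                         dtype=self.dtype, buffer=self.shm.buf)
        return arr[i]

    def put(self, item):
        while True:  # wait until a slot becomes available
            free = None
            with self.in_use.get_lock():
                for i in range(self.n_slots):
                    if not self.in_use[i]:
                        self.in_use[i] = True  # mark slot as taken
                        free = i
                        break
            if free is not None:
                self._slot(free)[...] = item   # write result into shared memory
                self.index_queue.put(free)     # pass just the slot index
                return
            time.sleep(0.001)

    def get(self):
        i = self.index_queue.get()
        result = np.copy(self._slot(i))  # copy out before freeing the slot
        with self.in_use.get_lock():
            self.in_use[i] = False       # slot can be reused by put
        return result
```

A real version would also need to close and unlink the shared memory segment when done, and could replace the sleep-based polling with a semaphore counting free slots.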

The downside of shared_memory is that it only supports a limited set of data types, though. If that's all you need, then it could be a good option. Otherwise, I think there's no way of avoiding serialization, whether through a socket (a standard pipe/queue) or through a file on disk.
