Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BF-MT: Fixes and enhancements #445

Merged
merged 11 commits into from
Oct 14, 2022
Merged

BF-MT: Fixes and enhancements #445

merged 11 commits into from
Oct 14, 2022

Conversation

detlefarend
Copy link
Member

@detlefarend detlefarend commented Oct 8, 2022

Description

Background

Checklists:

  • Read the CONTRIBUTION guide (required).
  • Update the history on the source code (required).

@detlefarend detlefarend added bug Something isn't working BF Basic Functions/Infrastructure next release labels Oct 8, 2022
@detlefarend detlefarend added this to the Sprint 10/2022 milestone Oct 8, 2022
@detlefarend detlefarend self-assigned this Oct 8, 2022
@detlefarend
Copy link
Member Author

Hi Steve, could you please try again?

@detlefarend detlefarend linked an issue Oct 8, 2022 that may be closed by this pull request
3 tasks
@steveyuwono
Copy link
Collaborator

Hi @detlefarend , I still got the same errors and also while running the howto 01 and 02. Previously I ran them in Ubuntu using a virtual machine and they were fine. I will continue checking this on Monday.
image

Regarding #446, PZoo just updated their version to 1.22.0 (less than a day). I also found this and have solved this issue in the MPC branch. I have rebased this branch and it is looking fine. They also introduce their own space, called gymnasium, which means that they can not recognize gym spaces anymore.

@detlefarend detlefarend linked an issue Oct 9, 2022 that may be closed by this pull request
3 tasks
@detlefarend detlefarend changed the title BF-MT: Fixed the Windows freeze problem BF-MT: Fixes and enhancements Oct 9, 2022
@detlefarend detlefarend added the enhancement New feature or request label Oct 9, 2022
@detlefarend
Copy link
Member Author

Hi @steveyuwono, I added issue #449 to this PR and I moved the call of freeze_support() to the main sections of both howtos and removed it from bf.mt. In the best case both howtos are running even under Windows and nothing else is left to do...

@steveyuwono
Copy link
Collaborator

Hi @detlefarend, thank you for the updates. I have tried this again but unfortunately, the same issue occurs. I tested on my laptop, and the error message does not appear, but the run is always stuck in the same process as where the error message appears on my PC. Have you tested this file under Windows? Does it work in yours?

@detlefarend
Copy link
Member Author

Hi @steveyuwono, I didn't test it under Windows. It runs properly on my Linux PC... Just to exclude some error sources:

  • You tried it in branch bf_mt_windows_freeze
  • Under Linux it runs properly
  • The problem occurs under Windows only

Which Python version are you running?

@steveyuwono
Copy link
Collaborator

steveyuwono commented Oct 10, 2022

Hi @steveyuwono, I didn't test it under Windows. It runs properly on my Linux PC... Just to exclude some error sources:

  • You tried it in branch bf_mt_windows_freeze
  • Under Linux it runs properly
  • The problem occurs under Windows only

Which Python version are you running?

3.9 for my PC and 3.7 for my laptop. Shall I try on Marlon's or someone else's PC to check whether it is a device-specific issue?

@detlefarend
Copy link
Member Author

Hi @steveyuwono, I didn't test it under Windows. It runs properly on my Linux PC... Just to exclude some error sources:

  • You tried it in branch bf_mt_windows_freeze
  • Under Linux it runs properly
  • The problem occurs under Windows only

Which Python version are you running?

3.9 for my PC and 3.7 for my laptop. Shall I try on Marlon's or someone else's PC to check whether it is a device-specific issue?

Yes please

@steveyuwono
Copy link
Collaborator

Hi @steveyuwono, I didn't test it under Windows. It runs properly on my Linux PC... Just to exclude some error sources:

  • You tried it in branch bf_mt_windows_freeze
  • Under Linux it runs properly
  • The problem occurs under Windows only

Which Python version are you running?

3.9 for my PC and 3.7 for my laptop. Shall I try on Marlon's or someone else's PC to check whether it is a device-specific issue?

Yes please

Also not working on Marlon's PC

@rizkydiprasetya
Copy link
Contributor

rizkydiprasetya commented Oct 11, 2022

@detlefarend Tried the following on branch bf_mt_windows_freeze

  • Mac, doesnt work
  • Linux, works
  • Windows, doesnt work

@detlefarend
Copy link
Member Author

Thanks for checking. Calling freeze_support() is obviously not enough. I'll take a look...

@detlefarend
Copy link
Member Author

Hi guys, please temporarily add the following code directly under if/main before the call of freeze_support():

start_method = mp.get_start_method()
print(start_method)
if start_method != 'fork': mp.set_start_method(method='fork', force=True)

@steveyuwono
Copy link
Collaborator

Hi guys, please temporarily add the following code directly under if/main before the call of freeze_support():

start_method = mp.get_start_method()
print(start_method)
if start_method != 'fork': mp.set_start_method(method='fork', force=True)

The method of 'fork' can not be processed.
image

@rizkydiprasetya
Copy link
Contributor

works on Mac, without calling the freeze_support(). And I would suggest to put the mp.set_start_method("fork") on mt.py instead on Howto.

@rizkydiprasetya
Copy link
Contributor

maybe some reference, windows doesnt support fork.

https://superfastpython.com/multiprocessing-start-method/

@rizkydiprasetya
Copy link
Contributor

Another reference, that fork is not safe in MacOS: python/cpython#77906
As I check again, I get unstable result by setting up the mp to 'fork' on mac.

Maybe we need to find another way instead of manually set to fork.

@detlefarend
Copy link
Member Author

Hi guys, I'm one step further now. I installed vscode and mlpro on my business pc and I can reproduce the problem now. The problem occurs when creating the shared object (not when creating a new process). I'll keep on track and let you know...

@detlefarend
Copy link
Member Author

Hi guys, both howtos are running now under Windows as well. I solved two problems: 1) Multiprocessing is to be started in "main" context only. 2) Python standard package multiprocessing uses pickle for serialization. I replaced it by a fork multiprocess which uses dill. Must be installed explicitely.

Under Linux the performance is as expected. Under Windows (my underpowered business laptop) the performance of the multiprocessing parts is not satisfying.

Could you please try both howtos under Windows/Mac and let me know the speed factors of howto mt.001? Thx!

Here are some related links:
https://pypi.org/project/multiprocess/
https://stackoverflow.com/questions/40234771/replace-pickle-in-python-multiprocessing-lib

@rizkydiprasetya
Copy link
Contributor

rizkydiprasetya commented Oct 12, 2022

I ran 3 times on mac, showed the following.

Screenshot 2022-10-12 at 19 47 03

@rizkydiprasetya
Copy link
Contributor

rizkydiprasetya commented Oct 12, 2022

Additional info:

  • If I decrease the number of task, then it runs fine. Above 30 tasks, it becomes unstable on native Mac. The chance of getting this error is like 3/10. If I put 50 tasks, then the chance of getting this error is 10/10.
  • I tried with more than 30 tasks in dockerized Ubuntu on Mac, works fine.

@rizkydiprasetya
Copy link
Contributor

rizkydiprasetya commented Oct 12, 2022

I might found where causing the problem. It is on the BaseManager socket listener. By default, it can hold up to 16 backlog of connection. I changed this value manually on the multiprocess package. It might be that my Mac is spamming too fast. After the changes, I have no error. As a result for speed factor:

  • Windows on Ryzen 7, 8 Cores 16 Threads
    • Multithread : 2.86
    • Multiprocessing : 2.55
  • Dockerized Ubuntu, 8 Cores 8 Threads
    • Multithread : 5.18
    • Multiprocessing : 16.42
  • Mac M1 Pro, 10 Cores
    • Multithread : 5.57
    • Multiprocessing : 25.84

@rizkydiprasetya
Copy link
Contributor

Further checks, I think on my Mac, the BaseManager is bottlenecking. If I increase the task and the backlog number. The speed factor stays the same for Multiprocessing. Too many backlog connection to the shared data.

Probably this will happen also if the system has more cores. I will check this on our server.

@detlefarend
Copy link
Member Author

Hi Rizky, I reworked both howtos:

Howto 001 (async algo)

  • Entire demo code is now running in main context to avoid side effects in multiprocessing
  • Number of tasks: 20
  • Duration/task: 1 sec
  • Call of freeze_support() added to stabilize the demo under Windows (but I didn't notice any effect)

Howto 002 (workflow with task-hierarchy)

  • Multiprocessing part removed

Could you please run Howto 001 under Linux, Mac, Windows once again?

@rizkydiprasetya
Copy link
Contributor

rizkydiprasetya commented Oct 13, 2022

Here is the result with no error:

  • Ubuntu (Dockerized):
    • Multithread : 6.6
    • Multiprocessing : 15.31
  • Windows (Lab PC):
    • Multithread : 4.05
    • Multiprocessing : 7.35
  • Mac:
    • Multithread : 7.01
    • Multiprocessing : 18.63

I have a question, how did you calculate this speed factor? Because as I increase the task, I gain more speed factor.

@detlefarend
Copy link
Member Author

Here is the result with no error:

* Ubuntu (Dockerized):
  
  * Multithread : 6.6
  * Multiprocessing : 15.31

* Windows (Lab PC):
  
  * Multithread : 4.05
  * Multiprocessing : 7.35

* Mac:
  
  * Multithread : 7.01
  * Multiprocessing : 18.63

That looks good!

I have a question, how did you calculate this speed factor? Because as I increase the task, I gain more speed factor.
It's calculated in line 102:

grafik

self._num_tasks -> number of tasks (20)
self._duration_sec -> fixed runtime of a single task (1 sec)
self._duration_real_sec -> measured runtime for the entire run of 20 tasks (sync/async)

@detlefarend
Copy link
Member Author

Shall we merge the stuffs?

@steveyuwono
Copy link
Collaborator

Shall we merge the stuffs?

Running properly. We can merge this

@detlefarend detlefarend merged commit bd3b5ec into main Oct 14, 2022
@detlefarend detlefarend deleted the bf_mt_windows_freeze branch October 14, 2022 07:23
@rizkydiprasetya
Copy link
Contributor

Yes this can be merge. I have only backlog problem from BaseManager on Mac. But, I think for now is enough, we are currently only using Windows and Linux. I will do more checks later on for Macs. I am still no sure whether this is a problem woth macs or due to fast computing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
BF Basic Functions/Infrastructure bug Something isn't working enhancement New feature or request
Projects
None yet
3 participants