Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create async monitoring service #2034

Merged
merged 144 commits into from Jan 10, 2024
Merged

Create async monitoring service #2034

merged 144 commits into from Jan 10, 2024

Conversation

joj0s
Copy link
Contributor

@joj0s joj0s commented Jun 24, 2022

This PR contains a very basic initial implementation of the new asyncio based monitoring service. It currently only works for the local backend.

Also every file that I change is formatted using autopep8 so that we can hopefully format most of the files as we go on.

@joj0s
Copy link
Contributor Author

joj0s commented Jun 24, 2022

You can run ganga as PYTHONASYNCIODEBUG=1 ganga. This enables asyncio's debug mode which logs issues such as non thread safe calls and slow coroutines (>100ms) on the event loop.

@joj0s
Copy link
Contributor Author

joj0s commented Jun 29, 2022

@mesmith75 @alexanderrichards I pushed some changes to handle graceful shutdown. Basically I added a method that cancels all remaining tasks except for the one currently running. I do that in the thread's stop method which is called here by GangaThreadPool since I am subclassing GangaThread.

My question is, do you think that is enough? In your experience are there any termination signals that are not handled by GangaTheadPool in which case I should add some signal handlers?

@alexanderrichards
Copy link
Contributor

My question is, do you think that is enough? In your experience are there any termination signals that are not handled by GangaTheadPool in which case I should add some signal handlers?

From memory GangThreadPool should handle most cases. Although I can't remember where we register the signal handler for things like SIGINT/SIGTERM and it doesn't appear to be in the GangaThreadPool itself

@joj0s
Copy link
Contributor Author

joj0s commented Sep 13, 2023

@mesmith75 You are right, right now we only ensure that the environment is set up for DIRACOS, however the Python interpreter is inherited from the main process. Multiprocessing can't quite spawn new independent executables the way subprocess does, that is why we set the process environment manually, but I will try to use the library's set_executable method to set the Python interpreter to whatever DIRACOS uses.

@joj0s
Copy link
Contributor Author

joj0s commented Sep 15, 2023

@mesmith75 @egede here is the situation:

I will try to use the library's set_executable method to set the Python interpreter to whatever DIRACOS uses.

This solution will not work due to the multiprocessing library's limitations when working with an interactive Python shell such as our case. In particular, when we need to spawn a new process with a different Python interpreter instead of forking our current process which we have been doing so far.
References: https://stackoverflow.com/a/23641560, ipython/ipython#10894 (comment)

The current DIRAC concurrency setup yields by far the best performance results due to only instantiating the DIRAC process once and running all DIRAC tasks in an internal thread pool. However, it requires complex synchronization primitives such as events and queues which can only be used with the multiprocessing library and not the subprocess one. This means that if we are to stick with this implementation, we have to make sure that the internal DIRAC subprocess uses a DIRACOS version which is compatible with the system Python version, either tweaking the DIRACOS version selection process to match the system's Python interpreter version,or by enforcing a Python version limitation to the user which is compatible with the current automatic DIRACOS selection.

Alternatively, I could make use of our current asyncio based monitoring service and use asyncio's internal subprocess API for all communication between Ganga and the DIRAC subprocesses. This basically means that the DIRAC concurrency scheme will inherently remain the same, but the communication bottleneck with the main Ganga process will be lessened thanks to the existing asyncio monitoring infrastructure. This, along with the optimizations we previously made to Ganga's DIRAC monitoring methods should lead to significant performance improvements compared to the current Ganga system. However, this method won't be near as performant as the above implementation, but it will avoid the Python interpreter version limitation.

@egede
Copy link
Member

egede commented Oct 2, 2023

I am picking up on this again.

While we in the past have been forced to have different python versions for the DIRAC API and for Ganga, I do not see that as a strong constraint any longer. So we could for installations involving DIRAC, we could enforce the use of a specific python version. We could also make the Ganga version installed in /cvmfs use the python version distributed with DIRAC explicit which would solve the issue for 95% of our users.

@mesmith75
Copy link
Contributor

Although I am hesitant to start tying us to a particular python version or installation, that is not a terrible solution - the current Dirac installation uses an up-to-date python version so we wouldn't be losing out in features.

The only drawback is that we lose the access to all the packages in LCG (pandas etc) that we have currently been making available on the CVMFS installation but perhaps that is a small price to pay for significantly better performance.

@joj0s
Copy link
Contributor Author

joj0s commented Oct 10, 2023

@mesmith75 @egede So based on your input I will develop and submit a solution to match the Dirac version with the system version so it can be tested out.

setup.py Outdated Show resolved Hide resolved
egede
egede previously approved these changes Jan 9, 2024
@mesmith75 mesmith75 merged commit 5f2f98a into develop Jan 10, 2024
8 of 10 checks passed
@mesmith75 mesmith75 deleted the new_monitoring_service branch January 10, 2024 11:54
mesmith75 added a commit that referenced this pull request Jan 10, 2024
…ing_service

PEP8 fixes for PR #2034 (new_monitoring_service) by autopep8
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants