This repository has been archived by the owner on Aug 2, 2023. It is now read-only.

Simple ProcessPoolExecutor example code fails in VSCode-Python #1228

Closed
DonJayamanne opened this issue Mar 11, 2019 · 6 comments

Comments

@DonJayamanne
Contributor

@ericdrobinson commented on Fri Mar 08 2019

The following simple script was adapted from the "Executing code in thread or process pools" documentation for Python 3.7. The adaptations allow the code to be run in Python 3.6.

import asyncio
import concurrent.futures
import os

def cpu_bound():
    print(os.getpid())
    return sum(i * i for i in range(10 ** 7))

async def main():
    print(os.getpid())
    loop = asyncio.get_event_loop()

    # Run in a custom process pool:
    with concurrent.futures.ProcessPoolExecutor() as pool:
        result = await loop.run_in_executor(
            pool, cpu_bound)
        print('custom process pool', result)

if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    loop.run_until_complete(main())
    loop.close()
    print('done')

Running the code above with the Python: Current File (Integrated Terminal) debugger profile appears to result in a deadlock (no pid is printed from the cpu_bound function). Running that very same code directly with Python 3.6 via the command line works as expected. If the ProcessPoolExecutor is replaced with ThreadPoolExecutor the code appears to run fine using the debugger.
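For reference, the ThreadPoolExecutor variant mentioned above can be sketched as follows. This is a minimal illustration, not the exact code that was run: the range is reduced to 10 ** 6 to keep it quick, and the worker returns its thread name instead of printing a pid, since a thread pool stays inside the parent process and never forks.

```python
import asyncio
import concurrent.futures
import threading


def cpu_bound():
    # Runs in a worker thread of the same process, so no child
    # process is created and multiprocessing is not involved.
    name = threading.current_thread().name
    return name, sum(i * i for i in range(10 ** 6))


async def main():
    loop = asyncio.get_event_loop()

    # Same structure as the original script, with only the executor swapped.
    with concurrent.futures.ThreadPoolExecutor() as pool:
        return await loop.run_in_executor(pool, cpu_bound)


if __name__ == '__main__':
    loop = asyncio.new_event_loop()
    try:
        thread_name, result = loop.run_until_complete(main())
    finally:
        loop.close()
    print('custom thread pool', thread_name, result)
```

Because the work stays in one process, this variant does not exercise the fork path at all, which is consistent with it running fine under the debugger.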

Environment data

  • VS Code version: 1.32.1
  • Extension version: 2019.2.5558
  • OS and version: macOS 10.14.3
  • Python version (& distribution if applicable, e.g. Anaconda): Anaconda 3.6.7
  • Type of virtual environment used (N/A | venv | virtualenv | conda | ...): conda
  • Relevant/affected Python packages and their versions: NA

Expected behaviour

The script runs to completion and prints two pid numbers and the result of the computation:

10385
10386
custom process pool 333333283333335000000
done

Actual behaviour

The script prints the first pid and then pauses until the user manually stops the debugger. Output:

10412

Steps to reproduce:

  1. Set up a basic Python 3.6 environment in VSCode.
  2. Create an __init__.py file and add the above script to it.
  3. Use the vscode-python extension to attempt to debug the __init__.py file with the "Python: Current File (Integrated Terminal)" launch profile.

Logs

Output for Python in the Output panel: None.

Output from Console under the Developer Tools panel: None. (Not strictly true. There is an "[Extension Host] undefined session received in acceptDebugSessionStarted" error but it seems unrelated as it appears regardless of the contents of the file being debugged.)

@karthiknadig
Member

Note that, due to #943, you have to set multiprocessing to use spawn. With the following code and the master version of ptvsd, I cannot repro this issue:

import multiprocessing
multiprocessing.set_start_method('spawn', True)

import os
import concurrent.futures
import asyncio


def cpu_bound():
    print(os.getpid())
    return sum(i * i for i in range(10 ** 7))


async def main():
    print(os.getpid())
    loop = asyncio.get_event_loop()

    # Run in a custom process pool:
    with concurrent.futures.ProcessPoolExecutor() as pool:
        result = await loop.run_in_executor(
            pool, cpu_bound)
        print('custom process pool', result)

if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    loop.run_until_complete(main())
    loop.close()
    print('done')

Launch json:

        {
            "name": "Terminal",
            "type": "python",
            "request": "launch",
            "program": "${file}",
            "console": "integratedTerminal",
            "pathMappings": [
                {
                    "localRoot": "${workspaceFolder}",
                    "remoteRoot": "."
                }
            ],
            "subProcess": true,
        },

This is what I see in the terminal:

6281
6297
custom process pool 333333283333335000000
done

@karthiknadig
Member

@ericdrobinson Let me know if the above change to code works for you.

@ericdrobinson

ericdrobinson commented Mar 27, 2019

@karthiknadig That did indeed work for the specific example I outlined in the initial ticket.

Unfortunately, the code I posted in the initial ticket is a very minimal repro-case. The actual environment I'm working with is a bit more complicated.

Specifically, I first ran into this issue while working with azure-functions-python-worker. I created the minimal repro case above to help focus the investigation. The experience I had in the azure-functions-python-worker context was the same: the code would start to run fine but would never appear to do anything after the run_in_executor call.

However, when I tried your fix in the actual project it broke things. Specifically, I encountered the following exception:

Exception has occurred: concurrent.futures.process.BrokenProcessPool
A process in the process pool was terminated abruptly while the future was running or pending.
  File ".../HttpTrigger/__init__.py", line 76, in main
    results = await loop.run_in_executor(singleton_executor, do_work, req_data)
  File "/usr/local/Cellar/azure-functions-core-tools/2.4.419/workers/python/worker.py", line 39, in main
    args.grpc_max_msg_len))
  File "/usr/local/Cellar/azure-functions-core-tools/2.4.419/workers/python/worker.py", line 46, in <module>
    main()

As I understand it, the azure-functions-python-worker layer of the Functions stack runs the code in my module in its own ThreadPool. Is it possible that it may be too late to call set_start_method as suggested with such an architecture?
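One way to probe that question, sketched under assumptions: `multiprocessing.get_start_method(allow_none=True)` reports whether a start method has already been fixed in the current process, and `multiprocessing.get_context('spawn')` yields a spawn context without touching the global setting. The helper names `current_start_method` and `make_spawn_pool` are hypothetical. Note that the `mp_context` argument of ProcessPoolExecutor only exists in Python 3.7 and later, so the second helper would not run on the 3.6 environment described in this ticket.

```python
import concurrent.futures
import multiprocessing


def current_start_method():
    # With allow_none=True this returns None when no start method has
    # been fixed yet, instead of fixing the platform default as a side effect.
    return multiprocessing.get_start_method(allow_none=True)


def make_spawn_pool(max_workers=1):
    # A context object carries its own start method, so the pool uses
    # 'spawn' regardless of what the hosting framework configured globally.
    # Assumption: Python 3.7+ (mp_context is not available in 3.6).
    spawn_ctx = multiprocessing.get_context('spawn')
    return concurrent.futures.ProcessPoolExecutor(
        max_workers=max_workers, mp_context=spawn_ctx)
```

Logging `current_start_method()` at the top of the function module would show whether the host had already picked a start method before the module was imported. This sketch does not address whether the child process can import the module that defines the submitted function.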

@Anapo14
Contributor

Anapo14 commented Apr 3, 2019

Hi @ericdrobinson! Is there any way you can provide us with more information? For example, your launch.json and settings.json files as well as some of the code you are working with? It's difficult for us to repro this issue without that information. Also, have you tried using the Azure Functions extension for VS Code? It allows you to debug Azure Functions directly from VS Code. Currently, Python support is in preview mode, but it might help if you take a look at the link here.

@ericdrobinson

ericdrobinson commented Apr 4, 2019

@Anapo14 Please find my responses broken out below:

Is there anyway you can provide us with more information? For example, your launch.json

{
  "version": "0.2.0",
  "configurations": [
    {
      "name": "Python: Terminal (integrated)",
      "type": "python",
      "request": "launch",
      "program": "${file}",
      "console": "integratedTerminal",
    },
    {
      "name": "Attach to Python Functions",
      "type": "python",
      "request": "attach",
      "port": 9091,
      "host": "localhost",
      "preLaunchTask": "runFunctionsHost"
    }
  ]
}

and settings.json files

{
  "azureFunctions.projectRuntime": "~2",
  "azureFunctions.projectLanguage": "Python",
  "azureFunctions.templateFilter": "Verified",
  "azureFunctions.deploySubpath": "[REDACTED].zip",
  "azureFunctions.preDeployTask": "funcPack",
  "files.exclude": {
    "obj": true,
    "bin": true
  },
  "azureFunctions.pythonVenv": ".env",
  "debug.internalConsoleOptions": "neverOpen",
  "editor.rulers": [79],
  "python.pythonPath": ".env/bin/python"
}

as well as some of the code you are working with?

No, because that's proprietary and not something I'm at liberty to post in an open forum such as this.

That said, I will point out that running the exact code I initially posted in a simple HTTP Trigger will produce exactly the updated experience I described.

To that end, please see the following:

Environment Data

  • Visual Studio Code version: 1.32.3
  • Extension version: 2019.2.6352
  • Azure Functions Extension version: 0.16.0
  • Azure Functions Core Tools version: 2.4.419
  • Function Runtime version: 2.0.12332.0

Steps to Reproduce

  1. Create a Python Function following the instructions outlined here.
  2. After the "Create an HTTP triggered function" step, replace the default code with the following:
    import asyncio
    import concurrent.futures
    import multiprocessing
    import os
    
    multiprocessing.set_start_method('spawn', force=True)
    
    def cpu_bound():
    
        print(os.getpid())
    
        # CPU-bound operations will block the event loop:
        # in general it is preferable to run them in a
        # process pool.
        return sum(i * i for i in range(10 ** 7))
    
    async def main(req, context):
    
        print(os.getpid())
        loop = asyncio.get_event_loop()
    
        # Run in a custom process pool:
        with concurrent.futures.ProcessPoolExecutor() as pool:
            result = await loop.run_in_executor(
                pool, cpu_bound)
            print('custom process pool', result)
  3. Run the function locally via the "Attach to Python Functions" launch configuration.

Behaviour

As configured above, the debugger pauses on the following exception:

Exception has occurred: concurrent.futures.process.BrokenProcessPool
A process in the process pool was terminated abruptly while the future was running or pending.
  File ".../HttpTrigger/__init__.py", line 25, in main
    pool, cpu_bound)
  File "/usr/local/Cellar/azure-functions-core-tools/2.4.419/workers/python/worker.py", line 39, in main
    args.grpc_max_msg_len))
  File "/usr/local/Cellar/azure-functions-core-tools/2.4.419/workers/python/worker.py", line 46, in <module>
    main()

At this point, the Terminal output (for the terminal named "Task - runFunctionsHost") is as follows:

Executing 'Functions.HttpTrigger' (Reason='This function was programmatically called via the host APIs.', Id=fa8783e1-1243-4928-b5b8-6c013ac527c6)
13968
Process Process-7:
Traceback (most recent call last):
  File "~/anaconda3/envs/azure-func/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "~/anaconda3/envs/azure-func/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "~/anaconda3/envs/azure-func/lib/python3.6/concurrent/futures/process.py", line 169, in _process_worker
    call_item = call_queue.get(block=True)
  File "~/anaconda3/envs/azure-func/lib/python3.6/multiprocessing/queues.py", line 113, in get
    return _ForkingPickler.loads(res)
ModuleNotFoundError: No module named '__azure__'
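A plausible reading of this traceback, not confirmed anywhere in this thread: the failure happens in `_ForkingPickler.loads`, and pickle serializes a function as a reference to its module and name rather than as code. A spawned child therefore has to import the defining module by name, which would fail if that module lives in a namespace that only the parent set up. The sketch below illustrates only the by-reference behaviour, using `json.dumps` as a stand-in function.

```python
import json
import pickle


def pickled_reference(func):
    # Functions are pickled by reference: the payload records the
    # module name and the qualified function name, not the bytecode.
    return pickle.dumps(func)


payload = pickled_reference(json.dumps)

# Unpickling re-imports the module and looks the name up again, so it
# hands back the very same function object in this process.
restored = pickle.loads(payload)
print(restored is json.dumps)
```

In a freshly spawned interpreter the same lookup only succeeds if the module name in the payload is importable there.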

If you comment out the multiprocessing.set_start_method('spawn', force=True) line and try again, you end up in the same exact "hung" scenario outlined in the original bug report. See:

Executing 'Functions.HttpTrigger' (Reason='This function was programmatically called via the host APIs.', Id=aa5b9a5a-a435-4aa7-a1ff-d024b7228090)
14071

I should mention that I also "chronicled" my attempts to figure out how to use the ProcessPoolExecutor in Python Functions in this azure-functions-python-worker issue comment. In that comment I referred to the issue that I opened which spawned the creation of this one. I mention this as it may provide some extra context as to what has led me to this bug.


As well, have you tried using the Azure Functions extension for VS Code?

Yes. This is where I first ran into the issue described in the initial report. Please see the above sections of this comment for a deeper dive and a working example that shows how the workflow you suggest here breaks with the suggested workaround.

@fabioz
Contributor

fabioz commented Apr 15, 2019

It seems that the issue is that multiprocessing is using fork in this scenario (this is being tracked in #943, so, I'm closing this one as a duplicate).

Unfortunately, it seems that in the example given fork is required for the program to work. So, to debug in this scenario, you need to use programmatic breakpoints in the child process that has been spawned by fork and use an attach launch configuration (see: https://code.visualstudio.com/docs/python/debugging). I know it's more work, but unfortunately right now there's no other way to use ptvsd with fork.
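A rough sketch of what such a programmatic breakpoint could look like in the worker function, assuming ptvsd 4.x is installed in the environment. The environment variable name `CHILD_DEBUG_PORT` and the helper `maybe_wait_for_debugger` are hypothetical, chosen only so the attach step can be switched on and off. Whether this behaves well when the parent process is itself already running under ptvsd is not something this sketch verifies.

```python
import os


def maybe_wait_for_debugger():
    # Hypothetical opt-in switch: do nothing unless a port is provided.
    port = os.environ.get('CHILD_DEBUG_PORT')
    if not port:
        return False

    # Imported lazily so the module still loads where ptvsd is absent.
    import ptvsd

    # Listen for a debugger, block until the attach configuration
    # connects, then stop as if a breakpoint had been hit here.
    ptvsd.enable_attach(address=('localhost', int(port)))
    ptvsd.wait_for_attach()
    ptvsd.break_into_debugger()
    return True


def cpu_bound():
    # Called inside the forked child, before the real work starts.
    maybe_wait_for_debugger()
    return sum(i * i for i in range(10 ** 5))
```

On the VS Code side this would be paired with an attach configuration pointing at the same host and port, similar to the "Attach to Python Functions" entry shown earlier in the thread. With more than one worker in the pool, each child would try to listen on the same port, so a single worker or distinct ports per child would be needed.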
