Description
Bug
Output of python -c "import pydantic.utils; print(pydantic.utils.version_info())":
pydantic version: 1.5.1
pydantic compiled: False
install path: /home/badger/.cache/pypoetry/virtualenvs/antsibull-F3umzYPl-py3.8/lib/python3.8/site-packages/pydantic
python version: 3.8.3 (default, Feb 26 2020, 00:00:00) [GCC 9.3.1 20200408 (Red Hat 9.3.1-2)]
platform: Linux-5.6.15-200.fc31.x86_64-x86_64-with-glibc2.2.5
optional deps. installed: ['typing-extensions']
...
Use case
I'm going to give you two code snippets because it might not be obvious from the simplest case why I would want to do it.
A simple approximation of my use case is here: https://gist.github.com/abadger/bfd55741c281ccb534f7bbc8fe9b6202
I am trying to use pydantic to validate and normalize data from a large number of data sources. I need to run each validation separately so that I can know which data sources are providing invalid data. I decided to split the work across multiple CPUs by using asyncio's run_in_executor with a ProcessPoolExecutor. However, when the pydantic.constr validation failed, I would get a BrokenProcessPool error on everything that had been queued but not yet run, rather than a pydantic ValidationError on the specific task which failed.
Root cause
I was able to work around the problem by catching the pydantic exception and raising a ValueError with all of the information I needed. This led me to the root cause: pydantic errors can be pickled but cannot be unpickled. The exception raised in the worker process is pickled there and sent back to the parent process. The parent process attempts to unpickle it, encounters the error, and then gives the generic, unhelpful BrokenProcessPool error and cancels the other pending tasks.
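A minimal sketch of that workaround, using only the standard library so it runs without pydantic installed. `FakeValidationError`, `validate`, and `validate_in_worker` are hypothetical names invented for illustration; in real code the `except` clause would catch the pydantic exception instead. The pickle round trip at the end stands in for the trip from the worker process back to the parent.

```python
import pickle


class FakeValidationError(Exception):
    """Hypothetical stand-in for a pydantic error: keyword-only constructor."""

    def __init__(self, *, pattern):
        super().__init__()
        self.pattern = pattern

    def __str__(self):
        return f"string does not match regex {self.pattern!r}"


def validate(source_name, value):
    """Hypothetical validator; always fails so the error path is exercised."""
    raise FakeValidationError(pattern="^[a-z]+$")


def validate_in_worker(source_name, value):
    """Function to submit to the pool instead of calling validate() directly."""
    try:
        return validate(source_name, value)
    except FakeValidationError as exc:
        # A ValueError built from a single string round-trips through pickle,
        # so the parent can see a per-task failure for this data source.
        raise ValueError(f"{source_name}: {exc}") from exc


# Simulate the worker-to-parent transfer with a pickle round trip.
try:
    validate_in_worker("source-a", "123")
except ValueError as exc:
    restored = pickle.loads(pickle.dumps(exc))

print(type(restored).__name__, restored)
```

The message string has to carry everything the parent needs (here, the data source name and the validation message), because the original exception object does not make the trip.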
Here's a reproducer for the root cause:
import pickle
from pydantic.errors import StrRegexError
p = pickle.dumps(StrRegexError(pattern='test'))
print(pickle.loads(p))
# Traceback (most recent call last):
# File "<stdin>", line 1, in <module>
# TypeError: __init__() missing 1 required positional argument: 'pattern'
Looking at the Python stdlib bug tracker, there are many open bugs involving interactions between pickle and exceptions. I didn't see this one, so I added it: https://bugs.python.org/issue40917 Some others that might cause different bugs with pydantic's exceptions:
Given so many potential bugs, I'm not sure if this is solvable in pydantic code or has to wait for pickle fixes. However, if it's not solvable, adding my workaround and an explanation of what's happening to the docs would be nice. That way, searching for pydantic, ProcessPoolExecutor, pickle, or multiprocessing might save the next person some time wondering why only a portion of their data was being converted.
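For what it's worth, here is a stdlib-only sketch of why the unpickle fails and of one possible way an exception class could fix it on its own side. This is an illustration under assumptions, not pydantic's actual code: `KeywordOnlyError`, `PicklableError`, and `_rebuild` are hypothetical names. By default, an exception is pickled as its class plus `self.args`; when the constructor takes only keyword arguments, `self.args` is empty and the rebuild call has nothing to pass. Defining `__reduce__` lets the class tell pickle how to rebuild it from keyword arguments.

```python
import pickle


class KeywordOnlyError(Exception):
    """Hypothetical error with a keyword-only constructor, like the reproducer."""

    def __init__(self, *, pattern):
        super().__init__()  # nothing is stored in self.args
        self.pattern = pattern


def _rebuild(cls, kwargs):
    """Module-level helper so pickle can look it up by name when loading."""
    return cls(**kwargs)


class PicklableError(KeywordOnlyError):
    """Same error, but it tells pickle to rebuild it from keyword arguments."""

    def __reduce__(self):
        return _rebuild, (self.__class__, {"pattern": self.pattern})


# Default behaviour: unpickling calls KeywordOnlyError() with no arguments,
# because self.args was empty when the exception was pickled.
try:
    pickle.loads(pickle.dumps(KeywordOnlyError(pattern="test")))
    default_result = "ok"
except TypeError as exc:
    default_result = f"TypeError: {exc}"

# With __reduce__ defined, the round trip succeeds.
restored = pickle.loads(pickle.dumps(PicklableError(pattern="test")))

print(default_result)
print(type(restored).__name__, restored.pattern)
```

Whether something along these lines fits pydantic's error classes is for the maintainers to judge; the sketch only shows that the failure can be avoided from the exception class itself, without waiting on changes to pickle.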