Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] bpo-36142: Rework Python init to add _PyPreConfig #12087

Closed
wants to merge 6 commits into from

Conversation

@vstinner
Copy link
Member

vstinner commented Feb 28, 2019

I added a _PyCoreConfig structure to Python 3.7 which contains almost
all parameters used to configure Python. Problems: _PyCoreConfig uses
bytes and Unicode strings (char* and wchar_t*) whereas it is also
used to setup the memory allocator and (filesystem, locale and stdio)
encodings.

I propose to add a new _PyPreConfig which is the "strict minimum"
configuration to setup encodings and the memory allocator. In
practice, it also contains parameters which directly or indirectly
impacts the allocator and encodings. For example, isolated impacts
use_environment which impacts the allocator (PYTHONMALLOC environment
variable). Another example: dev_mode=1 sets the allocator to "debug".

The command line arguments are now parsed twice. _PyPreConfig only
parses a few parameters like -E, -I and -X. A temporary _PyPreCmdline
is used to store command line arguments like -X options.

I moved structures closer to where they are used. "Global" _PyMain
structure has been removed. _PyCmdline now lives way shorter than
previously and is moved from main.c to coreconfig.c. The idea is to
better control when and how memory is allocated.

In term of API, we get something like:

_PyCoreConfig config = _PyCoreConfig_INIT;
config.preconfig.stdio_encoding = "iso8859-1";
config.preconfig.stdio_errors = "replace";
config.user_site_directory = 0;
...

_PyInitError err = _Py_InitializeFromConfig(&config);
if (_Py_INIT_FAILED(err)) {
    _Py_ExitInitError(err);
}
...
Py_Finalize();
return 0;

"config.preconfig.stdio_errors" syntax isn't great, but it's simpler
to implement than duplicating all _PyPreConfig fields into
_PyCoreConfig.

Changes:

  • Add Python/preconfig.c file

  • Add _PyPreConfig structure: pre-initialization configuration for
    allocator and encodings.

  • _PyCoreConfig now has a "_PyPreConfig preconfig;" field

  • Add _PyArgv structure: helper to pass command line argument in
    bytes (char*) or Unicode (wchar_t*)

  • Add exitcode to _PyInitError

    • Add 'exitcode' field to _PyInitError structure
    • Add _Py_INIT_EXIT() macro
    • Rename _Py_FatalInitError() to _Py_ExitInitError()
  • Add new fields to _PyCoreConfig:

    • run_command
    • run_filename
    • run_module
    • skip_source_first_line
  • Add _PyMain_Usage()

  • Hardcode short and long options in _PyOS_GetOpt().

  • Make _PyCoreConfig_xxx() functions internal (Py_BUILD_CORE).

  • Move _PyCmdline from main.c to coreconfig.c

  • main.c: Remove _PyMain structure

https://bugs.python.org/issue36142

@vstinner vstinner requested a review from python/windows-team as a code owner Feb 28, 2019
@vstinner vstinner changed the title bpo-36142: Rework Python init to add _PyPreConfig [WIP] bpo-36142: Rework Python init to add _PyPreConfig Feb 28, 2019
@vstinner

This comment has been minimized.

Copy link
Member Author

vstinner commented Feb 28, 2019

@ncoghlan @ericsnowcurrently @methane @zooba: Even if this PR doesn't fix all problems, it should make the Python initialization a little bit better :-) What do you think of the overall design?

About the current PR, I'm not sure that it's a good idea to push such giant commit at all. I can split it into several changes, but it may be even more verbose since each change has to modify multiple files.

I added a _PyCoreConfig structure to Python 3.7 which contains almost
all parameters used to configure Python. Problems: _PyCoreConfig uses
bytes and Unicode strings (char* and wchar_t*) whereas it is also
used to setup the memory allocator and (filesystem, locale and stdio)
encodings.

I propose to add a new _PyPreConfig which is the "strict minimum"
configuration to setup encodings and the memory allocator. In
practice, it also contains parameters which directly or indirectly
impacts the allocator and encodings. For example, isolated impacts
use_environment which impacts the allocator (PYTHONMALLOC environment
variable). Another example: dev_mode=1 sets the allocator to "debug".

The command line arguments are now parsed twice. _PyPreConfig only
parses a few parameters like -E, -I and -X. A temporary _PyPreCmdline
is used to store command line arguments like -X options.

I moved structures closer to where they are used. "Global" _PyMain
structure has been removed. _PyCmdline now lives way shorter than
previously and is moved from main.c to coreconfig.c. The idea is to
better control when and how memory is allocated.

In term of API, we get something like:

    _PyCoreConfig config = _PyCoreConfig_INIT;
    config.preconfig.stdio_encoding = "iso8859-1";
    config.preconfig.stdio_errors = "replace";
    config.user_site_directory = 0;
    ...

    _PyInitError err = _Py_InitializeFromConfig(&config);
    if (_Py_INIT_FAILED(err)) {
        _Py_ExitInitError(err);
    }
    ...
    Py_Finalize();
    return 0;

"config.preconfig.stdio_errors" syntax isn't great, but it's simpler
to implement than duplicating all _PyPreConfig fields into
_PyCoreConfig.

Changes:

* Add Python/preconfig.c file
* Rename Include/coreconfig.h to Include/cpython/coreconfig.h.
* Add Include/internal/pycore_coreconfig.h
* Add _PyPreConfig structure: pre-initialization configuration for
  allocator and encodings.
* _PyCoreConfig now has a "_PyPreConfig preconfig;" field
* Add _PyArgv structure: helper to pass command line argument in
  bytes (char*) or Unicode (wchar_t*)
* Add exitcode to _PyInitError

  * Add 'exitcode' field to _PyInitError structure
  * Add _Py_INIT_EXIT() macro
  * Rename _Py_FatalInitError() to _Py_ExitInitError()

* Add new fields to _PyCoreConfig:

  * run_command
  * run_filename
  * run_module
  * skip_source_first_line

* Add _PyMain_Usage()
* _Py_InitializeCore() no longer changes the memory allocator.
* Hardcode short and long options in _PyOS_GetOpt().
* Make _PyCoreConfig_xxx() functions internal (Py_BUILD_CORE).
* Move _PyCmdline from main.c to coreconfig.c
* main.c: Remove _PyMain structure
@vstinner vstinner force-pushed the vstinner:preparse branch from 94b2bd2 to 111371d Mar 1, 2019
@vstinner

This comment has been minimized.

Copy link
Member Author

vstinner commented Mar 1, 2019

I continued to work in the PR after I posted it to finish to cleanup the code. I succeeded to fix how and when the memory allocator is set.

The drawback is that when Py_Initialize() is called before Py_Main(), PYTHONMALLOC is now ignored:

  • Py_Initialize() is no longer able to change the memory allocator: this API doesn't git with _PyPreConfig design (pre-initialize Python and only later really initialize Python).
  • Py_Main() now tries to change the memory allocator. Problem: Py_Initialize() already allocated memory which remains alive after Py_Initialize(). There is an interpreter which contains _PyCoreConfig for example.

Technically, everything would be possible, like supporting this corner case. But it would make the code way more complex. It requires to change temporarily the memory allocator each time we allocate memory, to later be able to use the same allocator to release memory. Use such pattern:

PyMemAllocatorEx old_alloc;
_PyMem_SetDefaultAllocator(PYMEM_DOMAIN_RAW, &old_alloc);
...
PyMem_SetAllocator(PYMEM_DOMAIN_RAW, &old_alloc);

That's what I did in Python 3.7, but it's really ugly and risky. If any function isn't protected properly with this pattern, we use the wrong allocator and everything goes wrong (Python will likely crash).

vstinner added 2 commits Mar 1, 2019
@vstinner vstinner requested a review from Mar 1, 2019
@vstinner

This comment has been minimized.

Copy link
Member Author

vstinner commented Mar 1, 2019

I'm now confident that a new "pre-initialization" step is required before _PyCoreConfig initialization (Py_Initialize). It makes the code more reliable and safer.

@vstinner

This comment has been minimized.

Copy link
Member Author

vstinner commented Mar 1, 2019

This PR is just too big to be merged at once. Since I now found what I would like to do, I will split this PR into multiple smaller changes.

vstinner added 2 commits Mar 1, 2019
Rename and move it to preconfig.c.
@vstinner

This comment has been minimized.

Copy link
Member Author

vstinner commented Mar 1, 2019

I started to rewrite this giant PR from scratch as simpler changes which are way easier to follow and understand. I close this PR.

@vstinner vstinner closed this Mar 1, 2019
@vstinner vstinner deleted the vstinner:preparse branch Mar 6, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants
You can’t perform that action at this time.