Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Isolate the _datetime extension module #117398

Open
Tracked by #103092
neonene opened this issue Mar 31, 2024 · 24 comments
Open
Tracked by #103092

Isolate the _datetime extension module #117398

neonene opened this issue Mar 31, 2024 · 24 comments
Labels
extension-modules C modules in the Modules dir topic-subinterpreters type-feature A feature request or enhancement

Comments

@neonene
Copy link
Contributor

neonene commented Mar 31, 2024

Feature or enhancement

Proposal:

I hope this issue will complete _datetime isolation.

My concerns (and answers)
  • Py_MOD_PER_INTERPRETER_GIL_SUPPORTED should be applied in sync with _zoninfo?

    • YES: Possible.
  • Can a module state have a C-API structure, keeping a capsule just for comatibility?

    • NO: Possible, but a user should not touch the module state.
  • C-API supports only the main interpreter? Otherwise, PyInterpreterState is acceptable to point each structure?

    • NO: PyDateTimeAPI cannot emit an error. Also, no PyInterpreterState member is accessible from datetime.h. UPDATE: Seems to be possible by using a global function pointer instead of a function.

Specific issue:

Links to previous discussion of this feature:

Linked PRs (closed)

Linked PRs

@neonene neonene added the type-feature A feature request or enhancement label Mar 31, 2024
@neonene
Copy link
Contributor Author

neonene commented Mar 31, 2024

PR #117399 fails except Windows. Is there an easy way to access a PyInterpreterState member? I'll try later allowing C-API only for the main-interpreter with a global variable.

@erlend-aasland
Copy link
Contributor

I briefly discussed the C API capsule with @pganssle on the 2023 (or was it 2022?) language summit. I also asked the Steering Council to comment about backwards compatibility concerns regarding datetime.h (which is not included by Python.h):

Putting it in the PyInterpreterState struct is an option, but I'm not sure how much we want to bloat that struct. We probably want to deprecate the current capsulated API and introduce a new capsulated API.

vstinner added a commit to vstinner/cpython that referenced this issue May 5, 2024
Move types to the datetime_state structure of the _datetime
extension.
vstinner added a commit to vstinner/cpython that referenced this issue May 5, 2024
Move types to the datetime_state structure of the _datetime
extension.
@vstinner
Copy link
Member

vstinner commented May 5, 2024

Converting the _datetime extension to multiphase initialization and/or converting static types to heap types is complicated because:

  • <Include/datetime.h> C API
  • datetime capsule (C API)
  • If the _datetime extension is reloaded, the C API expects to meet the same types. Otherwise, PyDateTime_Check() fails and everything else fails.

It reminds me the very complicated case of the PyAST C API. We managed to convert the _ast extension to heap types and multiphase init by moving state to the interpreter state. Other "trade offs" attemps for the _ast extension ended by introducing crashes which were hard to trigger and hard to fix.

So I propose this plan to isolate the datetime module:

  • Move references to types to datetime_state structure: I wrote a minimum PR gh-117398: Move types to datetime state #118606 for that
  • Convert static types to heap types, but create them only once, and don't delete them (on purpose, for now).
  • Move datetime_state to PyInterpreterState.
  • At the point, discuss how to deal with remaining issues: multiphase init, C API, capsule, etc.

@neonene
Copy link
Contributor Author

neonene commented May 5, 2024

Where can we see your rationale for the inflating PyInterpreterState with PyAST C-API? IIUC, the practice is supposed to be bad: #103092 (comment).

How do you associate the PyAST issue with PyDateTime in terms of the mechanizm and impact?

@encukou, @ericsnowcurrently

@vstinner
Copy link
Member

vstinner commented May 6, 2024

How do you associate the PyAST issue with PyDateTime in terms of the mechanizm and impact?

Both provide a C API and we wanted to isolate their C extension.

Where can we see your rationale for the inflating PyInterpreterState with PyAST C-API?

I will try to dig into the bug tracker later. In short, there were 2-3 crashes related to the isolation of the _ast extension. Crashes related to the C API.

vstinner added a commit to vstinner/cpython that referenced this issue May 8, 2024
Move types to the datetime_state structure of the _datetime
extension.
@neonene
Copy link
Contributor Author

neonene commented May 10, 2024

Python-ast.c (3.10.0 alpha 0: b1cc6ba)

int PyAST_Check(PyObject* obj)
{
    astmodulestate *state = get_global_ast_state();
    if (state == NULL) {
        return -1;
    }
    return PyObject_IsInstance(obj, state->AST_type);
}

Previously, PyAST_Check() imported the _ast module through get_global_ast_state(). The implicit import could get an unsafe pseudo module by replacing __import__() with a lazy import (strongly discouraged since Python 3.3).

datetime.h

#define PyDate_Check(op) PyObject_TypeCheck((op), PyDateTimeAPI->DateType)
#define PyDate_CheckExact(op) Py_IS_TYPE((op), PyDateTimeAPI->DateType)

PyDate_Check() requires the real module to be imported in advance by calling PyCapsule_Import() explicitly.

At least, the regression case in PyAST C-API above cannot be applied to PyDateTime C-API, I think.

vstinner added a commit to vstinner/cpython that referenced this issue May 10, 2024
Move types to the datetime_state structure of the _datetime
extension.
@vstinner
Copy link
Member

The purpose of this issue is to convert the _datetime extension to the multiphase C API. The problem is to make sure that we are dealing with the same types if multiple instances of the extension are created.

Example:

import sys

import _datetime
now = _datetime.datetime.now()
del _datetime
del sys.modules['_datetime']

import _datetime
print(isinstance(now, _datetime.datetime))

In general, we don't require this. For example, it's false with the _random extension:

import sys

import _random
rng = _random.Random()
del _random
del sys.modules['_random']

import _random
print(isinstance(rng, _random.Random))

The problem for _datetime is the C API and the capsule. Currently, the C API is based on PyDateTimeAPI variable (one variable is created every time the datetime.h header is included.) I would prefer PyDateTime_Check() to return true even if there are two instances of the datetime module.

I would care less if the C API would be stateful: pass explicitly a module/state/something. But changing the C API is out of the scope of this issue.

@erlend-aasland
Copy link
Contributor

But changing the C API is out of the scope of this issue.

The SC said that the datetime C API is protected by PEP-387, even though datetime.h is not explicitly included in Python.h. IMO, we should deprecate the current C API and introduce a new one. That is the easiest solution technically, and the easiest to relate to for users. I agree with Victor that it deserves its own issue (and perhaps also a Discourse topic).

vstinner added a commit that referenced this issue May 10, 2024
Move types to the datetime_state structure of the _datetime
extension.
@vstinner
Copy link
Member

I merged my first change "Move types to datetime state".

@neonene: Do you want to propose changes to convert static types to heap types? Maybe start with a PR to convert only 4 types to make the PR short enough so it's easier to review.

Types:

    PyTypeObject *date_type;
    PyTypeObject *datetime_type;
    PyTypeObject *delta_type;
    PyTypeObject *isocalendar_date_type;
    PyTypeObject *time_type;
    PyTypeObject *tzinfo_type;
    PyTypeObject *timezone_type;

@vstinner
Copy link
Member

IMO, we should deprecate the current C API and introduce a new one.

If we manage to isolate the _datetime extension without breaking the C API, I'm not sure that it's needed.

@neonene
Copy link
Contributor Author

neonene commented May 10, 2024

I'll wait python/steering-council#243, which is a necessary procedure.

@ncoghlan
Copy link
Contributor

ncoghlan commented May 12, 2024

A key functional test case in relation to python/steering-council#243 is ensuring that previous references to the capsule API still work after a module reload. Specifically, this sequence:

  1. capsule reference is acquired
  2. module is reloaded
  3. capsule reference is used

This will have historically worked due to the implicit module caching in the single-phase init support for extension modules, but I'm concerned that a full migration to true reload support will break it (and likely crash the interpreter in the process).

I'd also be genuinely surprised if such a test case already exists, hence the green CI on #118337 may not be enough to show that the migration to multi-phase init hasn't broken anything.

@neonene
Copy link
Contributor Author

neonene commented May 12, 2024

Any reasonable producer is welcome. I think the concern about the capsule is equivalent to that about the module state, because the capsule reference is in sync with the new module. If the behavior were problematic, not only _datetime but also all modules should rely on PyInterpreterState.

@ericsnowcurrently
Copy link
Member

FYI, I'm hoping to wrap this up (or get really close) during the PyCon US sprints. I'll be here through Thursday. @pganssle and @erlend-aasland are here too, so I'll get as much help as I can.

CC @1st1

@ncoghlan
Copy link
Contributor

@neonene the missing test coverage I was referring to on the SC issue is described in this comment: #117398 (comment)

@neonene
Copy link
Contributor Author

neonene commented May 23, 2024

the missing test coverage I was referring to on the SC issue is described in this comment: #117398 (comment)

Can you explain what needs to be tested, using _datetime API?

Concerning the new generic C-API design, I doubt at this point the need to keep pointers unchanged across module-reloads. The new API can be offered exclusively for third-party modules with new usage, and the migration will not be required if the module-authors care about backwards compatibility.

@ericsnowcurrently
Copy link
Member

FYI, I chatted with @pganssle and @erlend-aasland (and @encukou) at the PyCon US sprints and we resolved on the following plan for beta 2:

  1. basic multi-phase init
  2. convert to heap types
  3. convert to module state (mostly)
  4. disable the C-API for subinterpreters (but preserve status quo for main interp)

Then we can separately tackle getting the C-API working for subinterpreters.

@erlend-aasland is planning on tackling the first three, since he's mostly done it all gh-102995. I'm making a plan for step 4. The timeline is short (beta 2 is in ~10 days, per Thomas), but given all the prior work I'm confident we can get it done in time.

FWIW, various previously merged PRs will help make some of these steps easier, so thanks!

@ericsnowcurrently
Copy link
Member

Here's my tentative plan for the datetime C-API:

  • all 5 types will stay static and we'll use the trick we use for builtin types to address cross-interpreter issues
  • the "TimeZone_UTC" will become a static global singleton, instead of allocating it from the heap
  • the zero-offset delta object (needed for UTC) will also become a static global singleton
  • we'll make sure the functions in the datetime C-API operate relative to the current interpreter when called

I'll need to double-check that the difference between per-interpreter and per-module is benign.

@ericsnowcurrently
Copy link
Member

@neonene does that plan make sense based on the work you have done?

@neonene
Copy link
Contributor Author

neonene commented May 23, 2024

I think the plans do not conflict with _datetimemodule.c PoC, which is not in sync with main after c2627d6, though. Feel free to separate or take over.

  1. basic multi-phase init

Now that #119472 is applied, maybe the sequence could be reconsidered?

@neonene
Copy link
Contributor Author

neonene commented May 24, 2024

@ericsnowcurrently

  • all 5 types will stay static and we'll use the trick we use for builtin types to address cross-interpreter issues

Does the tp_dict of them keep valid after the interned strings cleanup? (Issue: #118608)

@ericsnowcurrently
Copy link
Member

tp_dict, tp_sublasses, and the weakrefs are all a problem. I'd like to see if we can use the same solution as for the builtin static types.

@neonene neonene closed this as completed May 24, 2024
@neonene neonene reopened this May 24, 2024
@neonene
Copy link
Contributor Author

neonene commented May 24, 2024

  1. disable the C-API for subinterpreters (but preserve status quo for main interp)

Then we can separately tackle getting the C-API working for subinterpreters.

At least the zoneinfo module will not work on subinterpreters (e.g. #112100) with this separation, right?

(Sorry for the close by mistake)

@ericsnowcurrently
Copy link
Member

That step 4 is mostly obsolete. With the solution I merged for the datetime C-API, the problems of isolation for subinterpreters there are mostly resolved. We only have to make sure the exposed types are isolated (e.g. by applying the same solution as for builtin types).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
extension-modules C modules in the Modules dir topic-subinterpreters type-feature A feature request or enhancement
Projects
Status: Done
Development

No branches or pull requests

6 participants