SubIntepreters don't work with DontWriteBytecodeFlag in Python 3.10 #358

Closed
ndjensen opened this issue Dec 3, 2021 · 12 comments

@ndjensen
Member

ndjensen commented Dec 3, 2021

I built Python 3.10 in a Docker container on Linux and tested Jep against it, and all the tests passed. However, when I turned on Python 3.10 in AppVeyor, the Windows build failed with an error in the PyConfig pre-inits test.

======================================================================
FAIL: test_inits (test_preinits.TestPreInits)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\projects\jep\src\test\python\test_preinits.py", line 10, in test_inits
    jep_pipe(build_java_process_cmd('jep.test.TestPreInitVariables'))
  File "C:\Python310-x64\lib\contextlib.py", line 281, in helper
    return _GeneratorContextManager(func, args, kwds)
  File "C:\Python310-x64\lib\contextlib.py", line 103, in __init__
    self.gen = func(*args, **kwds)
  File "C:\projects\jep\src\test\python\jep_pipe.py", line 36, in jep_pipe
    assert False, stderr
AssertionError: b'Traceback (most recent call last):\n  File "C:\\Python310-x64\\lib\\io.py", line 52, in <module>\n  File "C:\\Python310-x64\\lib\\abc.py", line 184, in <module>\n  File "C:\\Python310-x64\\lib\\abc.py", line 106, in __new__\nRuntimeError: super(): __class__ cell not found\nFatal Python error: _PyThreadState_Delete: tstate 00000133DB27B8A0 is still current\nPython runtime state: initialized\n\nCurrent thread 0x000016bc (most recent call first):\n  <no Python frame>\n'
@ndjensen ndjensen added the defect label Dec 3, 2021
@bsteffensmeier
Member

I am seeing the same error when building jep in the python:3.10 docker container using the Dockerfile below. I commented out various parts of TestPreInitVariables.java and found the problem goes away if we don't test Py_OptimizeFlag.

FROM python:3.10
RUN apt-get update && apt-get install --no-install-recommends -y openjdk-17-jdk-headless && apt-get clean
ENV JAVA_HOME=/usr/lib/jvm/java-17-openjdk-amd64
WORKDIR /jep
COPY commands commands
COPY src src
COPY setup.py .
COPY README.rst .
RUN python3.10 setup.py test install

ENTRYPOINT /usr/local/bin/jep

@bsteffensmeier
Member

The problem is not specific to Py_OptimizeFlag as I suspected. Instead, it seems to be related to using subinterpreters when the bytecode is not loaded from *.pyc files in __pycache__. Since different optimization levels cause different pyc files to be used, the failure occurs when a new optimization level is used for the first time in a subinterpreter.
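
As a rough illustration of why each optimization level looks for its own cache file, the sketch below uses the standard library's `importlib.util.cache_from_source`. The helper name `pyc_names` is only illustrative, the path is the stdlib location from the Docker image used in this thread, and the interpreter tag in the output depends on the Python that runs it.

```python
import importlib.util
import os


def pyc_names(source_path):
    """Map each optimization level to the cache file name Python would look for."""
    names = {}
    for level in ("", 1, 2):
        # "" means no -O flag; 1 and 2 correspond to -O and -OO.
        cached = importlib.util.cache_from_source(source_path, optimization=level)
        names[level] = os.path.basename(cached)
    return names


if __name__ == "__main__":
    # Assumed path, taken from the python:3.10 image discussed here.
    for level, name in pyc_names("/usr/local/lib/python3.10/abc.py").items():
        print(repr(level), name)
```

Because the three names differ, a stdlib that was only precompiled without -O has no cached file for a subinterpreter started at a higher optimization level.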

I was able to eliminate the problem by precompiling the Python libraries with the optimization level set, either by running setup.py as python3 -O setup.py test or by running python -m compileall -o 1 /usr/local/lib/python3.10/*.py before running setup.py.
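
The same precompilation can be sketched from Python with the standard `compileall` module. The helper name `precompile` is hypothetical, and the choice of `maxlevels=0` is an assumption meant to mirror the `*.py` glob in the command above by skipping subpackages.

```python
import compileall


def precompile(directory, optimize=1):
    """Write .pyc files for the top-level modules in `directory`.

    Returns True if every file compiled successfully.
    """
    return bool(
        compileall.compile_dir(
            directory,
            maxlevels=0,        # do not recurse into subdirectories
            optimize=optimize,  # 1 matches `python -O`
            force=True,         # recompile even if timestamps look current
            quiet=1,            # only print errors
        )
    )
```

A call such as `precompile(sysconfig.get_path("stdlib"))` would target the running interpreter's standard library. It needs write access to that directory, which is usually the case in a Docker build but may not be on a system install.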

I was also able to trigger the problem with other tests by setting the env var PYTHONDONTWRITEBYTECODE. I also had to propagate that var into the tests so that no Python process was writing pyc files. After doing that, many tests started to fail, and I checked several and they are using SubInterpreters.
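
Propagating the variable into child processes might look like the sketch below. This is not how the Jep test harness does it; `run_without_bytecode` is a hypothetical helper that only shows the idea of passing the variable through `env`.

```python
import os
import subprocess
import sys


def run_without_bytecode(args):
    """Run a Python child process with PYTHONDONTWRITEBYTECODE set."""
    env = dict(os.environ)
    env["PYTHONDONTWRITEBYTECODE"] = "1"
    return subprocess.run(
        [sys.executable, *args], env=env, capture_output=True, text=True
    )


if __name__ == "__main__":
    # The child reports whether it will skip writing .pyc files.
    result = run_without_bytecode(
        ["-c", "import sys; print(sys.dont_write_bytecode)"]
    )
    print(result.stdout.strip())
```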

It looks like SubInterpreters currently only work if there are cached pyc files for the system Python libraries. We still need to do more digging to see whether there is anything we can do about it or whether this is a bug in Python.

@ndjensen
Member Author

ndjensen commented Dec 4, 2021

> After doing that, many tests started to fail, and I checked several and they are using SubInterpreters.

Are you saying SharedInterpreters are ok?

I found on this ticket they changed the format of .pyc files, but they said it went into Python 3.10.1 and they don't seem concerned with breakage so it may be irrelevant.

@bsteffensmeier
Member

bsteffensmeier commented Dec 5, 2021

> Are you saying SharedInterpreters are ok?

Yes, I have only seen this problem in SubInterpreters.

> I found on this ticket they changed the format of .pyc files, but they said it went into Python 3.10.1 and they don't seem concerned with breakage so it may be irrelevant.

I've been trying to pinpoint when this may have started in Python without much luck. I've gone all the way back to April with ad442a6 and I see the same problem. In Oct 2020, with 22220ae, the problem isn't happening. I'm not sure how to narrow it down further without building and testing every Python commit, which is very time consuming.

If you would like to replicate it with the Python you built yourself, you just need to delete __pycache__ in your Python lib directory (wherever abc.py is).

@bsteffensmeier
Member

bsteffensmeier commented Dec 7, 2021

I have narrowed it down to a specific commit in cpython where the problem first shows up: python/cpython@ea25180

This commit changes the way unicode objects are interned in sub-interpreters. I have not pieced together why this causes the error we see, but my top suspect is that the string comparison for __class__ in super_init_without_args is failing, which causes the cell to not be found. That comparison occurs here: https://github.com/python/cpython/blob/v3.10.0/Objects/typeobject.c#L8872

@vstinner Do you see any way ea25180 would cause problems for subinterpreters only when the bytecode is not loaded from pyc files?

@ndjensen ndjensen changed the title Jep does not work with some Python 3.10 installs SubIntepreters don't work with Python 3.10 Dec 7, 2021
@vstinner

vstinner commented Dec 7, 2021

@bsteffensmeier: Hi, how can I reproduce the issue? Can you write a reproducer which doesn't use jep?

In Python 3.10, I modified _PyUnicode_FromId() to make _Py_IDENTIFIER() per interpreter. Each interpreter has its own Unicode object. Previously, the object was shared between all interpreters.

@bsteffensmeier
Member

bsteffensmeier commented Dec 7, 2021

> @bsteffensmeier: Hi, how can I reproduce the issue? Can you write a reproducer which doesn't use jep?

I will work on writing a simple reproducer, but if you have any test program that uses subinterpreters, you should be able to reproduce it by deleting the __pycache__ from your Python libs (whichever directory has abc.py), setting PYTHONDONTWRITEBYTECODE=1, and running your program. It should fail as soon as Py_NewInterpreter() is called to create the subinterpreter.

Edit: my initial attempts at a reproducer are not working, this may be more complicated than I suspected.

@bsteffensmeier
Member

> Edit: my initial attempts at a reproducer are not working, this may be more complicated than I suspected.

In my initial attempts I was not setting PYTHONDONTWRITEBYTECODE correctly. Any application with sub-interpreters should fail if that is set. To get a repeatable test I am testing in docker using the python:3.10 image. In this example I am setting Py_DontWriteBytecodeFlag in code but it also breaks if you set the env var.

FROM python:3.10

RUN rm -r /usr/local/lib/python3.10/__pycache__/
COPY subinterp_test.c .
RUN gcc $(python3.10-config --embed --cflags) subinterp_test.c $(python3.10-config --embed --ldflags) -o subinterp_test
RUN ./subinterp_test

And here is subinterp_test.c. It is just the minimal C code to set Py_DontWriteBytecodeFlag and start a subinterpreter.

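#include "Python.h"
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <pthread.h>

/* Thread state of the main interpreter, shared with the worker thread. */
static PyThreadState* main_state;

/* Runs on a separate thread: create a sub-interpreter, run code, destroy it. */
void *on_thread(void *vargp){
    /* Take the GIL using the main interpreter's thread state. */
    PyEval_AcquireThread(main_state);
    /* The error is reported while the sub-interpreter is being created. */
    PyThreadState* state = Py_NewInterpreter();
    PyRun_SimpleString("print('In a sub interpreter')\n");
    Py_EndInterpreter(state);
    /* Switch back to the main thread state before releasing the GIL. */
    PyThreadState_Swap(main_state);
    PyEval_ReleaseThread(main_state);
    return NULL;
}

int main() {
    /* Must be set before Py_Initialize() so no .pyc files are written. */
    Py_DontWriteBytecodeFlag = 1;
    Py_Initialize();
    main_state = PyThreadState_Get();
    PyRun_SimpleString("print('Before creating sub interpreter')\n");
    /* Release the GIL so the worker thread can acquire it. */
    Py_BEGIN_ALLOW_THREADS

    pthread_t thread_id;
    pthread_create(&thread_id, NULL, on_thread, NULL);
    pthread_join(thread_id, NULL);

    Py_END_ALLOW_THREADS
    PyRun_SimpleString("print('After running sub interpreter')\n");
    Py_Finalize();

    exit(0);
}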

And finally here is the error when I build it:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/io.py", line 52, in <module>
  File "/usr/local/lib/python3.10/abc.py", line 184, in <module>
  File "/usr/local/lib/python3.10/abc.py", line 106, in __new__
RuntimeError: super(): __class__ cell not found
Fatal Python error: _PyThreadState_Delete: tstate 0x7f51e001c710 is still current
Python runtime state: initialized

Current thread 0x00007f51e6625700 (most recent call first):
  <no Python frame>
Aborted (core dumped)

@bsteffensmeier bsteffensmeier changed the title SubIntepreters don't work with Python 3.10 SubIntepreters don't work with DontWriteBytecodeFlag in Python 3.10 Dec 7, 2021
@vstinner

vstinner commented Dec 7, 2021

Oh great! You did a great job writing a short reproducer! I reported the issue upstream at https://bugs.python.org/issue46006 and posted my analysis there.

@vstinner

vstinner commented Dec 7, 2021

The bug triggers if the abc module is not cached (if there is no PYC file for the abc module). The workaround (until the bug is fixed) is to make sure that the Python stdlib has cached PYC files. I'm not sure why the Windows CI doesn't have these PYC files. You can try to work around the bug by compiling the stdlib Python files. See for example the compileall module:
https://docs.python.org/dev/library/compileall.html
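
To find out whether an install is exposed, one option is to list the stdlib modules that have no cached PYC file. The sketch below is only a diagnostic idea: `missing_bytecode` is a hypothetical helper, and it checks top-level modules for one optimization level.

```python
import importlib.util
import os
import sysconfig


def missing_bytecode(directory, optimization=""):
    """Return top-level .py files in `directory` that have no cached .pyc."""
    missing = []
    for name in sorted(os.listdir(directory)):
        if not name.endswith(".py"):
            continue
        source = os.path.join(directory, name)
        # Path where the import system would look for this module's bytecode.
        cached = importlib.util.cache_from_source(source, optimization=optimization)
        if not os.path.exists(cached):
            missing.append(source)
    return missing


if __name__ == "__main__":
    stdlib = sysconfig.get_path("stdlib")
    missing = missing_bytecode(stdlib)
    print(f"{len(missing)} top-level stdlib modules without a cached .pyc in {stdlib}")
```

If abc.py shows up in that list, the condition described above applies and the stdlib needs compiling.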

@bsteffensmeier
Member

> The bug triggers if the abc module is not cached (if there is no PYC file for the abc module). The workaround (until the bug is fixed) is to make sure that the Python stdlib has cached PYC files. I'm not sure why the Windows CI doesn't have these PYC files. You can try to work around the bug by compiling the stdlib Python files. See for example the compileall module: https://docs.python.org/dev/library/compileall.html

@vstinner Thank you for following up on this! I agree the problem does not exist when the cached PYC files are present. I don't know if there are any real use cases that will run into this; we just happened to have a unit test that would trigger it. I am considering just skipping this particular test any time jep is built against 3.10.0 or 3.10.1. If anyone is actually running into this in their environment, they would have to implement the workarounds documented here.
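
A version-based skip could be sketched roughly as follows. This is not the change that went into Jep: the set of affected versions is taken from this comment and should be treated as an assumption, `is_affected` is a hypothetical helper, and the test body is a placeholder.

```python
import sys
import unittest

# Versions named in this thread as affected (an assumption, not verified here).
AFFECTED_VERSIONS = {(3, 10, 0), (3, 10, 1)}


def is_affected(version_info=sys.version_info):
    """Return True if the given Python version is one of the affected releases."""
    return tuple(version_info[:3]) in AFFECTED_VERSIONS


class TestPreInits(unittest.TestCase):
    @unittest.skipIf(
        is_affected(),
        "subinterpreters fail without cached .pyc files (bpo-46006)",
    )
    def test_inits(self):
        # Placeholder: the real test launches jep.test.TestPreInitVariables.
        pass
```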

@bsteffensmeier
Member

Closing because this has been fixed in upstream python.
