Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

move importlib functions to within C #408

Merged
merged 13 commits into from May 27, 2022
Merged

move importlib functions to within C #408

merged 13 commits into from May 27, 2022

Conversation

ahgamut
Copy link
Collaborator

@ahgamut ahgamut commented May 14, 2022

@jart this moves some of the frequently-used functions from Python code in _bootstrap_external.py to C in import.c.

All tests pass in MODE=, MODE=tiny.

The idea is that almost every line of Python code can cause a memory allocation, and so functions that are frequently called end up wasting memory repeatedly. By moving the Python code to C, it becomes more clear how the function calls occur, and we can attempt more optimizations at the lower C level.

For example: during APE startup, _validate_bytecode_header and code_to_bytecode are called back-to-back for every .pyc import. This leads to a repeated sequence like below

    def get_code(self, fullname):
        # small UnicodeObject allocation for path
        path = self.get_filename(fullname)
        # allocate a large BytesObject from the `.pyc` file
        data = self.get_data(path)
        # allocate another large BytesObject because slicing data[:12] after validating bytecode header
        bytes_data = _validate_bytecode_header(data, name=fullname, path=path)
        # allocate a CodeObject after evaluating the BytesObject
        return _compile_bytecode(bytes_data, name=fullname, bytecode_path=path)

In C we can mostly avoid the BytesObjects allocations and many function calls by just calling PyMarshal_ReadObjectFromFile with a valid FILE *.

It's a small speedup, but it adds up if all tests run slightly faster than before.

when _bootstrap_external is undergoing setup(), I modify the functions
and classes in its namespace (ie its __dict__) with those from _imp.
@ahgamut
Copy link
Collaborator Author

ahgamut commented May 14, 2022

Testing on MODE=dbg is showing some ubsan errors.

@ahgamut
Copy link
Collaborator Author

ahgamut commented May 14, 2022

diff --git a/libc/zipos/read.c b/libc/zipos/read.c
index 2767dd10c..4fc80fa4d 100644
--- a/libc/zipos/read.c
+++ b/libc/zipos/read.c
@@ -43,7 +43,7 @@ ssize_t __zipos_read(struct ZiposHandle *h, const struct iovec *iov,
   x = y = opt_offset != -1 ? opt_offset : h->pos;
   for (i = 0; i < iovlen && y < h->size; ++i, y += b) {
     b = min(iov[i].iov_len, h->size - y);
-    memcpy(iov[i].iov_base, h->mem + y, b);
+    if(iov[i].iov_base) memcpy(iov[i].iov_base, h->mem + y, b);
   }
   if (opt_offset == -1) h->pos = y;
   return y - x;

silences the warnings.

test_modulefinder triggers a ubsan error.

.
�[J�[1;31masan error�[0m: heap use after free 1-byte load at 0x100081d7261c shadow 0x200903a64c3
o/dbg/third_party/python/pythontester.com.dbg
                                        x                                       
�[31mFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF�[39m
    |-3     |-3     |-3     |-3     |-3     |-3     |-3     |-3     |-3         
████████████████████████████████████████████████████████████████████████████████
000000400000-000001a7d000 .text
000001a7d000-0000038f9000 .data
00007fff0000-00008000ffff
000080070000-0000808affff
02008fff0000-0200903dffff ←shadow
020090690000-020090dbffff
020091b80000-020091b9ffff
0e007fff0000-0e008004ffff
100040330000-10004038ffff
100080000000-100081e8ffff ←address
1000834f0000-100086e3ffff
6ffffffe0000-7000001dffff

the crash was caused by
0x000000000044a283: __die at libc/log/die.c:42
0x0000000001761113: __asan_report_load at libc/intrin/asan.c:1207
0x0000000001460de6: r_byte at third_party/python/Python/marshal.c:762
0x0000000001460f81: r_object at third_party/python/Python/marshal.c:963
0x0000000001471fa3: read_object at third_party/python/Python/marshal.c:1517
0x000000000147226b: marshal_loads at third_party/python/Python/marshal.c:1818
0x0000000000ef05f4: _PyMethodDef_RawFastCallKeywords at third_party/python/Objects/call.c:673
0x0000000000ef0ef2: _PyCFunction_FastCallKeywords at third_party/python/Objects/call.c:752
0x0000000001385386: call_function at third_party/python/Python/ceval.c:4829
 (inlined by) _PyEval_EvalFrameDefault at third_party/python/Python/ceval.c:3375
0x0000000000eeb892: function_code_fastcall at third_party/python/Objects/call.c:309
0x0000000000eeed3d: _PyFunction_FastCallKeywords at third_party/python/Objects/call.c:433
0x00000000013810a9: call_function at third_party/python/Python/ceval.c:4874
 (inlined by) _PyEval_EvalFrameDefault at third_party/python/Python/ceval.c:3392
0x0000000000eeb892: function_code_fastcall at third_party/python/Objects/call.c:309
0x0000000000eeed3d: _PyFunction_FastCallKeywords at third_party/python/Objects/call.c:433
0x00000000013810a9: call_function at third_party/python/Python/ceval.c:4874
 (inlined by) _PyEval_EvalFrameDefault at third_party/python/Python/ceval.c:3392
0x0000000000eeb892: function_code_fastcall at third_party/python/Objects/call.c:309
0x0000000000eeed3d: _PyFunction_FastCallKeywords at third_party/python/Objects/call.c:433
0x00000000013810a9: call_function at third_party/python/Python/ceval.c:4874
 (inlined by) _PyEval_EvalFrameDefault at third_party/python/Python/ceval.c:3392
0x000000000134d11b: _PyEval_EvalCodeWithName at third_party/python/Python/ceval.c:4205
0x0000000000eee7ba: _PyFunction_FastCallKeywords at third_party/python/Objects/call.c:457
0x00000000013810a9: call_function at third_party/python/Python/ceval.c:4874
 (inlined by) _PyEval_EvalFrameDefault at third_party/python/Python/ceval.c:3392
0x000000000134d11b: _PyEval_EvalCodeWithName at third_party/python/Python/ceval.c:4205
0x0000000000eee7ba: _PyFunction_FastCallKeywords at third_party/python/Objects/call.c:457
0x00000000013810a9: call_function at third_party/python/Python/ceval.c:4874
 (inlined by) _PyEval_EvalFrameDefault at third_party/python/Python/ceval.c:3392
0x0000000000eeb892: function_code_fastcall at third_party/python/Objects/call.c:309
0x0000000000eeed3d: _PyFunction_FastCallKeywords at third_party/python/Objects/call.c:433
0x000000000137d531: call_function at third_party/python/Python/ceval.c:4874
 (inlined by) _PyEval_EvalFrameDefault at third_party/python/Python/ceval.c:3407
0x000000000134d11b: _PyEval_EvalCodeWithName at third_party/python/Python/ceval.c:4205
0x0000000000eed37f: _PyFunction_FastCallDict at third_party/python/Objects/call.c:401
0x0000000000ef48c8: _PyObject_FastCallDict at third_party/python/Objects/call.c:124
0x0000000000ef538c: _PyObject_Call_Prepend at third_party/python/Objects/call.c:925
0x0000000000efa7a9: method_call at third_party/python/Objects/classobject.c:331
0x0000000000ef83bc: PyObject_Call at third_party/python/Objects/call.c:271
0x00000000013464e2: do_call_core at third_party/python/Python/ceval.c:4917
0x0000000001362c29: _PyEval_EvalFrameDefault at third_party/python/Python/ceval.c:3476
0x000000000134d11b: _PyEval_EvalCodeWithName at third_party/python/Python/ceval.c:4205
0x0000000000eed37f: _PyFunction_FastCallDict at third_party/python/Objects/call.c:401
0x0000000000ef48c8: _PyObject_FastCallDict at third_party/python/Objects/call.c:124
0x0000000000ef538c: _PyObject_Call_Prepend at third_party/python/Objects/call.c:925
0x00000000010fccf8: slot_tp_call at third_party/python/Objects/typeobject.c:6267
0x0000000000ef131c: _PyObject_FastCallKeywords at third_party/python/Objects/call.c:225
0x000000000135fb72: call_function at third_party/python/Python/ceval.c:4877
 (inlined by) _PyEval_EvalFrameDefault at third_party/python/Python/ceval.c:3407
0x000000000134d11b: _PyEval_EvalCodeWithName at third_party/python/Python/ceval.c:4205
0x0000000000eed37f: _PyFunction_FastCallDict at third_party/python/Objects/call.c:401
0x0000000000ef48c8: _PyObject_FastCallDict at third_party/python/Objects/call.c:124
0x0000000000ef538c: _PyObject_Call_Prepend at third_party/python/Objects/call.c:925
0x0000000000efa7a9: method_call at third_party/python/Objects/classobject.c:331
0x0000000000ef83bc: PyObject_Call at third_party/python/Objects/call.c:271
0x00000000013464e2: do_call_core at third_party/python/Python/ceval.c:4917
0x0000000001362c29: _PyEval_EvalFrameDefault at third_party/python/Python/ceval.c:3476
0x000000000134d11b: _PyEval_EvalCodeWithName at third_party/python/Python/ceval.c:4205
0x0000000000eed37f: _PyFunction_FastCallDict at third_party/python/Objects/call.c:401
0x0000000000ef48c8: _PyObject_FastCallDict at third_party/python/Objects/call.c:124
0x0000000000ef538c: _PyObject_Call_Prepend at third_party/python/Objects/call.c:925
0x00000000010fccf8: slot_tp_call at third_party/python/Objects/typeobject.c:6267
0x0000000000ef131c: _PyObject_FastCallKeywords at third_party/python/Objects/call.c:225
0x000000000135fb72: call_function at third_party/python/Python/ceval.c:4877
 (inlined by) _PyEval_EvalFrameDefault at third_party/python/Python/ceval.c:3407
0x000000000134d11b: _PyEval_EvalCodeWithName at third_party/python/Python/ceval.c:4205
0x0000000000eed37f: _PyFunction_FastCallDict at third_party/python/Objects/call.c:401
0x0000000000ef48c8: _PyObject_FastCallDict at third_party/python/Objects/call.c:124
0x0000000000ef538c: _PyObject_Call_Prepend at third_party/python/Objects/call.c:925
0x0000000000efa7a9: method_call at third_party/python/Objects/classobject.c:331
0x0000000000ef83bc: PyObject_Call at third_party/python/Objects/call.c:271
0x00000000013464e2: do_call_core at third_party/python/Python/ceval.c:4917
0x0000000001362c29: _PyEval_EvalFrameDefault at third_party/python/Python/ceval.c:3476
0x000000000134d11b: _PyEval_EvalCodeWithName at third_party/python/Python/ceval.c:4205
0x0000000000eed37f: _PyFunction_FastCallDict at third_party/python/Objects/call.c:401
0x0000000000ef48c8: _PyObject_FastCallDict at third_party/python/Objects/call.c:124
0x0000000000ef538c: _PyObject_Call_Prepend at third_party/python/Objects/call.c:925
0x00000000010fccf8: slot_tp_call at third_party/python/Objects/typeobject.c:6267
0x0000000000ef131c: _PyObject_FastCallKeywords at third_party/python/Objects/call.c:225
0x000000000135fb72: call_function at third_party/python/Python/ceval.c:4877
 (inlined by) _PyEval_EvalFrameDefault at third_party/python/Python/ceval.c:3407
0x0000000000eeb892: function_code_fastcall at third_party/python/Objects/call.c:309
0x0000000000eeed3d: _PyFunction_FastCallKeywords at third_party/python/Objects/call.c:433
0x00000000013810a9: call_function at third_party/python/Python/ceval.c:4874
 (inlined by) _PyEval_EvalFrameDefault at third_party/python/Python/ceval.c:3392
0x0000000000eeb892: function_code_fastcall at third_party/python/Objects/call.c:309
0x0000000000eeed3d: _PyFunction_FastCallKeywords at third_party/python/Objects/call.c:433
0x00000000013810a9: call_function at third_party/python/Python/ceval.c:4874
 (inlined by) _PyEval_EvalFrameDefault at third_party/python/Python/ceval.c:3392
0x000000000134d11b: _PyEval_EvalCodeWithName at third_party/python/Python/ceval.c:4205
0x0000000000eed37f: _PyFunction_FastCallDict at third_party/python/Objects/call.c:401
0x0000000000ef48c8: _PyObject_FastCallDict at third_party/python/Objects/call.c:124
0x0000000000ef538c: _PyObject_Call_Prepend at third_party/python/Objects/call.c:925
0x00000000010fbb9d: slot_tp_init at third_party/python/Objects/typeobject.c:6502
0x00000000010e74b0: type_call at third_party/python/Objects/typeobject.c:958
0x0000000000ef131c: _PyObject_FastCallKeywords at third_party/python/Objects/call.c:225
0x000000000137a42d: call_function at third_party/python/Python/ceval.c:4877
 (inlined by) _PyEval_EvalFrameDefault at third_party/python/Python/ceval.c:3375
0x000000000134d11b: _PyEval_EvalCodeWithName at third_party/python/Python/ceval.c:4205
0x000000000134e933: PyEval_EvalCodeEx at third_party/python/Python/ceval.c:4230
0x000000000134e993: PyEval_EvalCode at third_party/python/Python/ceval.c:709
0x0000000001338a25: builtin_exec_impl at third_party/python/Python/bltinmodule.c:1164
0x00000000013393ad: builtin_exec at ./third_party/python/Python/clinic/bltinmodule.inc:284
0x0000000000ef05f4: _PyMethodDef_RawFastCallKeywords at third_party/python/Objects/call.c:673
0x0000000000ef0ef2: _PyCFunction_FastCallKeywords at third_party/python/Objects/call.c:752
0x0000000001381af3: call_function at third_party/python/Python/ceval.c:4829
 (inlined by) _PyEval_EvalFrameDefault at third_party/python/Python/ceval.c:3407
0x000000000134d11b: _PyEval_EvalCodeWithName at third_party/python/Python/ceval.c:4205
0x0000000000eee7ba: _PyFunction_FastCallKeywords at third_party/python/Objects/call.c:457
0x000000000137d531: call_function at third_party/python/Python/ceval.c:4874
 (inlined by) _PyEval_EvalFrameDefault at third_party/python/Python/ceval.c:3407
0x000000000134d11b: _PyEval_EvalCodeWithName at third_party/python/Python/ceval.c:4205
0x0000000000eed37f: _PyFunction_FastCallDict at third_party/python/Objects/call.c:401
0x0000000000ef84ef: PyObject_Call at third_party/python/Objects/call.c:252
0x000000000071146d: RunModule at third_party/python/Modules/main.c:250
0x0000000000714dca: Py_Main at third_party/python/Modules/main.c:783
0x000000000048b199: RunPythonModule at third_party/python/repl.c:333
0x0000000000402a33: main at third_party/python/repl.c:353
0x000000000040a1b5: cosmo at libc/runtime/cosmo.S:69
0x00000000004021a4: _start at libc/crt/crt.S:85

There was a comment there warning of memory faults. asan/ubsan proved it
right.
instead of reading the FILE* in little parts via
PyMarshal_ReadObjectFromFile, now I just load the whole file into memory
and call PyMarshal_ReadObjectFromString instead. This prevents ubsan
warnings.
@ahgamut
Copy link
Collaborator Author

ahgamut commented May 14, 2022

The following change enabled me to fix my code that was causing the ubsan warning:

diff --git a/libc/intrin/ubsan.c b/libc/intrin/ubsan.c
index ec357de0f..c65eb4a88 100644
--- a/libc/intrin/ubsan.c
+++ b/libc/intrin/ubsan.c
@@ -216,6 +216,7 @@ static void __ubsan_warning(const struct UbsanSourceLocation *loc,
                             const char *description) {
   kprintf("%s:%d: %subsan warning: %s is undefined behavior%s\n", loc->file,
           loc->line, SUBTLE, description, RESET);
+  __die();
 }
 
 dontdiscard __ubsan_die_f *__ubsan_abort(const struct UbsanSourceLocation *loc,

@ahgamut
Copy link
Collaborator Author

ahgamut commented May 14, 2022

Now all tests pass in MODE=, MODE=tiny, and MODE=dbg.

now we know what happens inside, so we just call the actual function
PyEval_EvalCode with the right setup.
@ahgamut
Copy link
Collaborator Author

ahgamut commented May 14, 2022

@jart e043046 and fccb0f1 is all the code that needs to be changed in _bootstrap.py for locking during imports

spec._initializing is only there for a single check + optimization
within import.c that ends up calling _lock_unlock_module. Since we are
ignoring locks for the time being, this check can also be ignored.
@jart
Copy link
Owner

jart commented May 26, 2022

-    memcpy(iov[i].iov_base, h->mem + y, b);
+    if(iov[i].iov_base) memcpy(iov[i].iov_base, h->mem + y, b);

To avoid undefined behavior there, you'd like want:

-    memcpy(iov[i].iov_base, h->mem + y, b);
+    if(b) memcpy(iov[i].iov_base, h->mem + y, b);

@jart
Copy link
Owner

jart commented May 26, 2022

It's a small speedup, but it adds up if all tests run slightly faster than before.

I'd be interested in seeing the numbers. Startup latency is something I value. It's difficult to do with a change like this, because the system calls themselves usually have a lot of overhead. For instance, a stat() system call can be on the order of a microsecond. Whereas a crossover between Python and C my best guess (not having measured it) would probably be around 100ns. There's also some latency when assets are pulled out and inflated from the zip.

@ahgamut
Copy link
Collaborator Author

ahgamut commented May 27, 2022

So I implemented this because I read a blog post about Python startup times, and the question I had was "How can we use the benefits of APE ZIP store + Cosmopolitan Libc to speedup imports in APE Python?"

I tried to do this earlier when submitting #248, but at that time I wasn't as good at writing CPython with Cosmopolitan Libc's support.

Performance is measured in that blog post by just checking time for importing all available stdlib packages.
Simple gist from that post modified for APE Python: import_stdlib.py

time with current HEAD commit and MODE=optlinux:

real	0m0.390s
user	0m0.378s
sys	0m0.013s

time with current HEAD commit + this PR merged and MODE=optlinux:

real	0m0.366s
user	0m0.350s
sys	0m0.017s

If you like I can run the pyperformance benchmark for startup times. We can also look at python.com -m cProfile import_stdlib.py to see how the calls are distributed.

For instance, a stat() system call can be on the order of a microsecond. Whereas a crossover between Python and C my best guess (not having measured it) would probably be around 100ns. There's also some latency when assets are pulled out and inflated from the zip.

By moving things like this into C, ftrace/strace logs become clearer when trying imports, and so this part of import process (most used during startup) will be easier to optimize in C than in Python. (Debugging the C is easier too thanks to the wonderful ubsan runtime ❤️)

For example, we can add a bool into the SourcelessFileLoader struct to avoid an unnecessary stat call, which is a change that would be difficult to observe+do/not be as effective if done in Python. I imagine other ZIP store/Cosmo-specific tricks also become viable.

@jart jart merged commit 7e9fb0a into jart:master May 27, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants