-
-
Notifications
You must be signed in to change notification settings - Fork 3k
[mypyc] Add primitive for bytes.decode #10951
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
mypyc/lib-rt/str_ops.c
Outdated
} | ||
|
||
#define PyUnicode_UTF8(op) \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's better not use a macro here, as it buys us little and makes the code harder to understand and maintain. If you need to share code, use an inline function.
mypyc/lib-rt/str_ops.c
Outdated
#define PyUnicode_UTF8(op) \ | ||
(assert(PyUnicode_Check(op)), \ | ||
assert(PyUnicode_IS_READY(op)), \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to crash the process in case the string object is not ready, which is not the right thing to do? Instead, you should probably use PyUnicode_READY
and return NULL if it fails.
mypyc/primitives/str_ops.py
Outdated
decode_types: List[RType] = [bytes_rprimitive, str_rprimitive, str_rprimitive] | ||
decode_constants: List[Tuple[int, RType]] = [(0, pointer_rprimitive), | ||
(0, pointer_rprimitive)] | ||
for i in range(len(decode_types)): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The for loop doesn't save a lot of code and makes this harder to understand. I'd prefer just having three normal primitive definitions.
|
||
str_ssize_t_size_op = custom_op( | ||
arg_types=[str_rprimitive], | ||
return_type=c_pyssize_t_rprimitive, | ||
c_function_name='CPyStr_Size_size_t', | ||
error_kind=ERR_NEG_INT) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We prefer three separate helper functions.
mypyc/lib-rt/str_ops.c
Outdated
} | ||
|
||
PyObject* CPy_DecodeWithErrors(PyObject *obj, PyObject *encoding, PyObject *errors) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed offline, I'd prefer to have three primitive definitions that all call a single C function, i.e. we'd only duplicate the method_op
declarations, not C implementations. Some primitives can provide fixed extra arguments, but without using the for loop for clarity. Sorry for the extra back and forth!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the updates!
@@ -598,6 +600,7 @@ def test_decode() -> None: | |||
assert b.decode('gbk') == '浣犲ソ' | |||
assert b.decode('latin1') == 'ä½\xa0好' | |||
|
|||
[case testEncode] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here it would be reasonable to have them in a single test case, since these are related (all string operations) and having fewer run tests will speed up tests. But it's not a big deal. The original name (testChrOrdEncodeDecode
) wasn't the clearest though. Also we already have testStringOps
, which would cover these as well. Not sure what's the best way to organize our tests.
Description
Implements part of mypyc/mypyc#880