-
Notifications
You must be signed in to change notification settings - Fork 70
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Completed the Forth based AsStrings reader. #616
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a few things.
src/uproot/interpretation/strings.py
Outdated
if not self._code_complete: | ||
self._forth_code = """ | ||
input stream | ||
|
||
output out-main uint8 | ||
output out-offsets int64 | ||
|
||
0 out-offsets <- stack | ||
|
||
begin | ||
stream !B-> stack dup 255 = if drop stream !I-> stack then dup out-offsets +<- stack stream #!B-> out-main | ||
again""" | ||
|
||
self._code_complete = True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't have to be careful about not regenerating the "1-5" Forth code string and the "4" Forth code string, since they're constants (.pyc
-compile-time constants for Python: they're not made, they're just looked up).
src/uproot/interpretation/strings.py
Outdated
if not self._vm_made and self._code_complete: | ||
self._forth_vm.vm = awkward.forth.ForthMachine64(self._forth_code) | ||
self._vm_made = True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But this self._vm_made
guard is a good idea (along with making it thread-local).
src/uproot/interpretation/strings.py
Outdated
if not self._vm_made and self._code_complete: | ||
self._forth_vm.vm = awkward.forth.ForthMachine64( | ||
self._forth_code, raise_read_beyond=False | ||
) | ||
self._vm_made = True | ||
self._forth_vm.vm.begin({"stream": numpy.array(data)}) | ||
offsets = self._forth_vm.vm.output_Index64("out-offsets") | ||
data = self._forth_vm.vm.output_NumpyArray("out-main") | ||
return awkward.layout.ListOffsetArray64( | ||
awkward.layout.Index64(offsets), awkward.layout.NumpyArray(data) | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this code is the same as above and the two possible Forth source codes are constants, you could do it without an if statement that selects between "1-5"
and "4"
. They can be keys in a constant, global dict that maps those two keys to the corresponding Forth codes.
src/uproot/interpretation/strings.py
Outdated
if ( | ||
isinstance(library, uproot.interpretation.library.Awkward) | ||
and self._special_case | ||
): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if any(not isinstance(x, StringArray) for x in basket_arrays.values()):
or the opposite of that for the StringArray case. Avoid extra attributes like self._special_case
, especially mutable ones, especially if they don't have descriptive names. :)
src/uproot/interpretation/strings.py
Outdated
) | ||
output = library.finalize(output, branch, self, entry_start, entry_stop) | ||
|
||
return output |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A short-circuit return
at the beginning of a function can be good if the if-block it's in is short and represents an exceptional case. It lets readers ignore that exception and focus on the "normal" cases below.
Here, the if-statement is a decision between two normal cases; bracketing them with if
-else
(and maybe elif
) clarifies this. This one needs an else
.
No description provided.