Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, thanks for contributing, I've made some comments below.
|
||
|
||
int output_size = 0; | ||
for (int i = 0, j = 0; i < this.value.length; i++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand what this for loop is accomplishing, could you clarify?
It looks like it's meant to compute the output_size, but the code below doesn't seem to be using it in a meaningful way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, I'm sorry about that I've forgotten it from an older version of the code, the C implementation first calculates the size of the output then makes a buffer accordingly, we don't need that now.
} | ||
} | ||
|
||
StringBuilder sb = new StringBuilder(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should probably not use StringBuilder here -- a Bytes object contains only bytes (8 bits) and is backed by a byte[]
(Java byte array), while a String contain Unicode codepoints (by default using UTF-16 charset, which means double the usage of memory).
You should probably just use an ArrayList, initializing it with capacity of this.value.length
, let it expand to the right size and at the end convert it back to an array of byte
primitives.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the information, I updated it +1
public org.python.Object expandtabs(java.util.List<org.python.Object> args, java.util.Map<java.lang.String, org.python.Object> kwargs, java.util.List<org.python.Object> default_args, java.util.Map<java.lang.String, org.python.Object> default_kwargs) { | ||
throw new org.python.exceptions.NotImplementedError("bytes.expandtabs has not been implemented."); | ||
public org.python.Object expandtabs(org.python.types.Object tabsize) { | ||
// CPython implementation https://github.com/python/cpython/blob/master/Objects/stringlib/transmogrify.h#L21 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a note that this link's L21
line number will become outdated as the master
branch changes – it will be better to link it to a tag (v3.6.4
) or a specific commit, either of which won't change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a very neat suggestion -never thought about it this way before-, Thanks! I'll fix it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hello!
Thanks for contributing, I think this is overall in the right direction -- I've made some comments on some things I noticed below.
int tabsizeInt = 8; | ||
|
||
if (tabsize != null) { | ||
tabsizeInt = (int) ((org.python.types.Int) tabsize).value; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should add a test for a wrong argument (for example, b''.expandtabs(" ")
and b''.expandtabs(None)
), and throw the correct exception and error message here in case tabsize is invalid.
tabsizeInt = (int) ((org.python.types.Int) tabsize).value; | ||
} | ||
|
||
ArrayList<Character> buffer = new ArrayList<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should probably be of type Byte
instead of Character
(one Character uses two bytes in UTF-16 -- the default Java charset).
You could also pass an initial capacity here to ArrayList<>(this.values.length)
to reduce the number of reallocations needed.
byte[] bytes = new byte[buffer.size()]; | ||
for (int i = 0; i < buffer.size(); i++) { | ||
bytes[i] = (byte) ((char) buffer.get(i)); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This conversion from ArrayList<Byte>
to byte[]
would probably be useful as a listToArray
method, to be reused by other code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where should I place that new method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can start by just creating a public static method in this same class.
Another option is to create a BytesUtils.java
file for adding static methods like these, since they are probably going to be used in ByteArray.java
as well.
throw new org.python.exceptions.NotImplementedError("bytes.expandtabs has not been implemented."); | ||
public org.python.Object expandtabs(org.python.types.Object tabsize) { | ||
// CPython implementation https://github.com/python/cpython/blob/3.6/Objects/stringlib/transmogrify.h#L21 | ||
// TODO: check overflows |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What exactly is needed checking here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in the C implementation, it checks for integer overflows. I'm not sure if we should check it here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, tbh I'm having trouble to follow the logic in this implementation. Maybe you got too much inspiration from the C implementation? =)
It's a bit strange that this is so different from the one for Str, have you checked that one? Why is this so much more involved, is this handling some cases which the other doesn't?
Hi there! It looks like this PR might be dead, so we're closing it for now. Feel free to re-open it if you'd like to continue, or think about directing your efforts to https://github.com/beeware/briefcase or https://github.com/beeware/toga. Both of these have more active development right now. 😄 |
Implementation of
bytearray.expandtabs()
Any missing test cases?
Do I need to check for overflows anywhere?
One more question, I should've used the already implemented
expandtabs()
inStr
? like it's the same concept, but just for different types, I'm thinking about creating some utility classes for stuff likeiterable
or some base abstract type to have the implementation of those common methods between iterables instead of implementing them multiple times in the codebase ?