Skip to content
This repository has been archived by the owner on May 31, 2020. It is now read-only.

Bytearray expandtabs #729

Closed
wants to merge 5 commits into from
Closed

Bytearray expandtabs #729

wants to merge 5 commits into from

Conversation

shakram02
Copy link

@shakram02 shakram02 commented Feb 18, 2018

Implementation of bytearray.expandtabs()

Any missing test cases?
Do I need to check for overflows anywhere?

One more question, I should've used the already implemented expandtabs() in Str? like it's the same concept, but just for different types, I'm thinking about creating some utility classes for stuff like iterable or some base abstract type to have the implementation of those common methods between iterables instead of implementing them multiple times in the codebase ?

Copy link
Collaborator

@eliasdorneles eliasdorneles left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, thanks for contributing, I've made some comments below.



int output_size = 0;
for (int i = 0, j = 0; i < this.value.length; i++) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand what this for loop is accomplishing, could you clarify?

It looks like it's meant to compute the output_size, but the code below doesn't seem to be using it in a meaningful way.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, I'm sorry about that I've forgotten it from an older version of the code, the C implementation first calculates the size of the output then makes a buffer accordingly, we don't need that now.

}
}

StringBuilder sb = new StringBuilder();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably not use StringBuilder here -- a Bytes object contains only bytes (8 bits) and is backed by a byte[] (Java byte array), while a String contain Unicode codepoints (by default using UTF-16 charset, which means double the usage of memory).

You should probably just use an ArrayList, initializing it with capacity of this.value.length, let it expand to the right size and at the end convert it back to an array of byte primitives.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the information, I updated it +1

public org.python.Object expandtabs(java.util.List<org.python.Object> args, java.util.Map<java.lang.String, org.python.Object> kwargs, java.util.List<org.python.Object> default_args, java.util.Map<java.lang.String, org.python.Object> default_kwargs) {
throw new org.python.exceptions.NotImplementedError("bytes.expandtabs has not been implemented.");
public org.python.Object expandtabs(org.python.types.Object tabsize) {
// CPython implementation https://github.com/python/cpython/blob/master/Objects/stringlib/transmogrify.h#L21
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a note that this link's L21 line number will become outdated as the master branch changes – it will be better to link it to a tag (v3.6.4) or a specific commit, either of which won't change.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a very neat suggestion -never thought about it this way before-, Thanks! I'll fix it

Copy link
Collaborator

@eliasdorneles eliasdorneles left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello!
Thanks for contributing, I think this is overall in the right direction -- I've made some comments on some things I noticed below.

int tabsizeInt = 8;

if (tabsize != null) {
tabsizeInt = (int) ((org.python.types.Int) tabsize).value;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should add a test for a wrong argument (for example, b''.expandtabs(" ") and b''.expandtabs(None)), and throw the correct exception and error message here in case tabsize is invalid.

tabsizeInt = (int) ((org.python.types.Int) tabsize).value;
}

ArrayList<Character> buffer = new ArrayList<>();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably be of type Byte instead of Character (one Character uses two bytes in UTF-16 -- the default Java charset).
You could also pass an initial capacity here to ArrayList<>(this.values.length) to reduce the number of reallocations needed.

byte[] bytes = new byte[buffer.size()];
for (int i = 0; i < buffer.size(); i++) {
bytes[i] = (byte) ((char) buffer.get(i));
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This conversion from ArrayList<Byte> to byte[] would probably be useful as a listToArray method, to be reused by other code.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where should I place that new method?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can start by just creating a public static method in this same class.
Another option is to create a BytesUtils.java file for adding static methods like these, since they are probably going to be used in ByteArray.java as well.

throw new org.python.exceptions.NotImplementedError("bytes.expandtabs has not been implemented.");
public org.python.Object expandtabs(org.python.types.Object tabsize) {
// CPython implementation https://github.com/python/cpython/blob/3.6/Objects/stringlib/transmogrify.h#L21
// TODO: check overflows
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What exactly is needed checking here?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in the C implementation, it checks for integer overflows. I'm not sure if we should check it here

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, tbh I'm having trouble to follow the logic in this implementation. Maybe you got too much inspiration from the C implementation? =)
It's a bit strange that this is so different from the one for Str, have you checked that one? Why is this so much more involved, is this handling some cases which the other doesn't?

@phildini
Copy link
Member

Hi there! It looks like this PR might be dead, so we're closing it for now. Feel free to re-open it if you'd like to continue, or think about directing your efforts to https://github.com/beeware/briefcase or https://github.com/beeware/toga. Both of these have more active development right now. 😄

@phildini phildini closed this Apr 25, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants