Skip to content

fix(tod.concatenate): Convert index map to unicode.#201

Merged
tristpinsm merged 1 commit intomasterfrom
tpm/unicode_im
Jun 30, 2022
Merged

fix(tod.concatenate): Convert index map to unicode.#201
tristpinsm merged 1 commit intomasterfrom
tpm/unicode_im

Conversation

@tristpinsm
Copy link
Copy Markdown
Contributor

No description provided.

@jrs65
Copy link
Copy Markdown
Contributor

jrs65 commented Jun 29, 2022

@tristpinsm do we really need to deal with the case where the concatenation axis has a string type? I struggle to see when a time like axis that is fine for concatenation would have a string type.

@jrs65
Copy link
Copy Markdown
Contributor

jrs65 commented Jun 29, 2022

That would obviously simplify this PR as you could remove two of the chunks.

@tristpinsm
Copy link
Copy Markdown
Contributor Author

@tristpinsm do we really need to deal with the case where the concatenation axis has a string type? I struggle to see when a time like axis that is fine for concatenation would have a string type.

Yeah, I dunno. If you did pass it a dataset that fits that description with convert_dataset_strings=True wouldn't you expect it to do the conversion? I also can't think of an example where that happens but I don't see any reason why it's impossible.

In any case I'm fine with removing that part, just let me know.

Copy link
Copy Markdown
Contributor

@jrs65 jrs65 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to go for me. I think in the end it makes sense to keep the general behaviour for concatenating along string-types axes, there isn't anything which restricts it in the current implementation and it could be nice in the future.

Comment thread caput/tod.py Outdated
if axis in concatenation_axes:
# Initialize the dataset.
dtype = index_map.dtype
if convert_dataset_strings and memh5.has_bytestring(index_map.dtype):
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In theory you could remove the memh5.has_bytestring clause as the conversion should be a no-op on non-bytestring types. Not sure if it's really worth it though.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was going to say that at least leaving the clause in would save it some work on non- string types, but given that's not true as .has_bytestring does a very similar recursive crawl through the dtype structure, so you don't actually save anything. So probably remove it to simplify the implementation.

@tristpinsm tristpinsm merged commit 33316b2 into master Jun 30, 2022
@tristpinsm tristpinsm deleted the tpm/unicode_im branch June 30, 2022 18:12
@jrs65
Copy link
Copy Markdown
Contributor

jrs65 commented Oct 11, 2022 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants