Skip to content

Conversation

ngoldbaum
Copy link
Member

This makes the dtype store the string data in a struct:

typedef struct ss {
    size_t len;
    char buf[];
} ss;

So far this is just a refactoring but the hope is that storing the length will avoid O(N) strlen calls and make it easier to write code that doesn't accidentally introduce buffer overflows.

Bikeshedding on the naming and the API are very welcome.

@seberg
Copy link
Member

seberg commented Jan 25, 2023

Storing the length explicitly also means you can represent strings like "a\0" correctly (NumPy strings cannot).

I don't feel like bike-shedding storage. There are probably many options. If there is a chance of wanting to play with schemes that store length in the array itself then using accessor static inline functions or macros might make sense for getting length and chars/buf.

In theory int is enough for length storage for the forseeable future, but I guess it doesn't matter.

Comment on lines 37 to 42
if (len1 > len2) {
maxlen = len1;
}
else {
maxlen = len2;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you short-circuit the strncmp if the ss lengths are different?

Copy link
Contributor

@peytondmurray peytondmurray left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optional small change 🚀 thanks for this!

@ngoldbaum ngoldbaum merged commit e9449e2 into numpy:main Jan 25, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants