@lithomas1
Contributor

  • closes #xxxx (Replace xxxx with the GitHub issue number)
  • Tests added and passed if fixing a bug or adding a new feature
  • All code checks passed.
  • Added type annotations to new arguments/methods/functions.
  • Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

Just submitting this for CI purposes, nothing to review yet.

@lithomas1 added the Enhancement and Strings (String extension data type and string data) labels on Apr 23, 2024
ndarray[uint8_int64_object_t] out,
ndarray[object] arr,
ndarray out,
ndarray arr,
Contributor Author

This will make iteration slower.

Look into numpy C API iteration functions.

(e.g. look at Validator._validate)

If this gives a perf improvement, maybe it can be split out.


@classmethod
def _from_sequence(cls, scalars, *, dtype: Dtype | None = None, copy: bool = False):
    na_mask, any_na = libmissing.isnaobj(np.array(scalars, dtype=object), check_for_any_na=True)
Contributor Author

Maybe we can use _lib.convert_nans_to_NA instead?

That would get rid of the diff in missing.pyx.
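
A minimal sketch of the two ideas in this thread, written in plain Python rather than Cython: computing the NA mask together with an any-NA flag, versus normalizing every missing marker to a single sentinel up front. The names `NA`, `isna_with_flag`, and `normalize_missing` are hypothetical and are not the pandas helpers mentioned above.

```python
import math

NA = object()  # hypothetical stand-in for a missing-value sentinel


def _is_missing(value):
    # Treat None, the sentinel, and float NaN as missing.
    return value is None or value is NA or (
        isinstance(value, float) and math.isnan(value)
    )


def isna_with_flag(values):
    # Return the boolean mask plus a flag saying whether anything is missing.
    mask = [_is_missing(v) for v in values]
    return mask, any(mask)


def normalize_missing(values):
    # Replace every missing marker with the single sentinel.
    return [NA if _is_missing(v) else v for v in values]


scalars = ["a", None, float("nan"), "b"]
na_mask, any_na = isna_with_flag(scalars)
normalized = normalize_missing(scalars)
```

If the values are normalized first, later code only has to compare against one sentinel, which is presumably why the second approach could make the extra mask-returning path unnecessary.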

return super()._str_endswith(pat, na)
pat = np.asarray(pat, dtype=get_numpy_string_dtype_instance())
result = np.strings.endswith(self._ndarray, pat)
return BooleanArray(result, isna(self._ndarray))
Contributor Author

cache na mask?
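
One way the caching idea could look, sketched in plain Python with `functools.cached_property`. The class `MaskedStrings` and its attributes are hypothetical illustrations, not pandas code, and the sketch assumes the underlying values are immutable; a mutable array would need to invalidate the cached mask on every write.

```python
from functools import cached_property


class MaskedStrings:
    """Hypothetical container: strings with None marking missing values."""

    def __init__(self, values):
        self._values = tuple(values)  # immutable, so the mask never goes stale
        self.mask_computations = 0    # counter used only to show the caching

    @cached_property
    def _na_mask(self):
        # Computed on first access, then stored on the instance.
        self.mask_computations += 1
        return tuple(v is None for v in self._values)

    def endswith(self, pat):
        return [
            None if missing else v.endswith(pat)
            for v, missing in zip(self._values, self._na_mask)
        ]

    def startswith(self, pat):
        return [
            None if missing else v.startswith(pat)
            for v, missing in zip(self._values, self._na_mask)
        ]


arr = MaskedStrings(["apple", None, "maple"])
ends = arr.endswith("ple")
starts = arr.startswith("a")
```

Both string methods reuse the same mask, so it is computed once per array rather than once per call.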


if ret.dtype == object and ret_dtype is not None:
    # cast from object back to StringDType
    return ret.astype(ret_dtype, copy=False)
Contributor Author

In map_infer_mask there is a convert=True argument that calls maybe_convert_objects on the result.

Alternatively, we could look at setting the dtype argument to an instance of StringDType, and hope that PyArray_SETITEM will work on it?

return dtype == object or dtype.kind in "SUT"

def get_numpy_string_dtype_instance(
na_object=libmissing.NA,
Contributor Author

Is the na_object parameter used?

)
raise TypeError(msg)
dtype = self._inferred_dtype
if dtype not in allowed_types:
Contributor Author

Are numpy dtypes hashable?

Might just want to add to allowed_types
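
A quick check of the hashability question, assuming only that NumPy is installed. `np.dtype` instances are hashable and equal dtypes hash equally, so they can be members of a set; membership in a list only needs equality. The container names below are illustrative and are not the `allowed_types` from the diff.

```python
import numpy as np

int64_dtype = np.dtype("int64")

# Equal dtypes produce equal hashes, so they behave well as set members.
same_hash = hash(int64_dtype) == hash(np.dtype(np.int64))

allowed_set = {np.dtype("int64"), np.dtype("float64")}   # illustrative only
allowed_list = [np.dtype("int64"), np.dtype("float64")]  # illustrative only

in_set = int64_dtype in allowed_set
in_list = np.dtype("float64") in allowed_list
object_in_set = np.dtype("object") in allowed_set
```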

return np.dtype("object")

return np_find_common_type(*types)
return np.result_type(*types)
Contributor Author

Note to self: This (and the other changes in this file) is causing test failures.
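
For context, a small sketch of how plain `np.result_type` can differ from a wrapper that falls back to object for some dtype kinds. The wrapper `result_type_or_object` is a hypothetical illustration; whether the replaced helper behaves this way is an assumption, not something this thread confirms.

```python
import numpy as np


def result_type_or_object(*dtypes):
    """Hypothetical wrapper: use object for string- and datetime-like kinds."""
    try:
        common = np.result_type(*dtypes)
    except TypeError:
        # No common dtype exists, so fall back to object.
        return np.dtype("object")
    if common.kind in "mMSU":
        return np.dtype("object")
    return common


# Plain NumPy promotion of two numeric dtypes.
numeric_common = np.result_type(np.dtype("int8"), np.dtype("uint8"))

# Plain NumPy promotion of two fixed-width unicode dtypes stays a string dtype...
string_common = np.result_type(np.dtype("U3"), np.dtype("U5"))

# ...while the hypothetical wrapper returns object for the same inputs.
wrapped_common = result_type_or_object(np.dtype("U3"), np.dtype("U5"))
```

Under that assumption, callers that expected object for string inputs would start receiving a fixed-width string dtype, which is one plausible source of the failures noted above.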

if self.values.dtype.kind == 'T':
    return NumpyStringArray(self.values)
else:
    return NumpyExtensionArray(self.values)
Contributor

@ngoldbaum commented on Apr 29, 2024

This is because NumpyExtensionArray inherits from ObjectStringArrayMixin, so unless I make this change there's no way to call the string method overrides in NumpyStringArray.
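
The dispatch problem described here can be sketched with stand-in classes. All names below (`ObjectStringMixin`, `GenericArray`, `StringArray`, `wrap`) are hypothetical and only mirror the shape of the change: the overrides live on a subclass, so they are reachable only if the wrapper constructs that subclass.

```python
class ObjectStringMixin:
    """Hypothetical generic fallback for string methods."""

    def _str_endswith(self, pat):
        return ("object-fallback",
                [None if v is None else v.endswith(pat) for v in self._values])


class GenericArray(ObjectStringMixin):
    """Hypothetical stand-in for the general-purpose wrapper."""

    def __init__(self, values):
        self._values = list(values)


class StringArray(GenericArray):
    """Hypothetical stand-in for the string-specific subclass."""

    def _str_endswith(self, pat):
        # Specialized override; only reachable on StringArray instances.
        return ("string-override",
                [None if v is None else v.endswith(pat) for v in self._values])


def wrap(values, kind):
    # Choose the subclass based on the dtype kind, as in the diff above.
    if kind == "T":
        return StringArray(values)
    return GenericArray(values)


wrapped_string = wrap(["ab", None], "T")
wrapped_other = wrap(["ab", None], "O")
string_path, string_result = wrapped_string._str_endswith("b")
other_path, other_result = wrapped_other._str_endswith("b")
```

Without the branch in `wrap`, every input would be a `GenericArray` and the override would never run, even though both paths return the same values.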

@github-actions
Contributor

This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this.

@github-actions (bot) added the Stale label on May 30, 2024
@ngoldbaum
Contributor

Closing in favor of #58394.

@ngoldbaum
Contributor

Oh wait I can't, I guess the bot will get to it if Thomas doesn't :)

@lithomas1 closed this on May 30, 2024
Labels

Enhancement, Stale, Strings (String extension data type and string data)

2 participants