Convert DataFrame columns with type RangeIndex to strings #932
hvplot/converter.py
Outdated
@@ -671,6 +671,9 @@ def _process_data(self, kind, data, x, y, by, groupby, row, col,
            data = data.to_frame()
        if is_intake(data):
            data = process_intake(data, use_dask or persist)
        if isinstance(getattr(data, "columns", None), pd.RangeIndex):
            data = data.rename(columns=str)
@hoxbro the thing I wasn't sure about with this change: doesn't this line actually create a copy of the data? self.source_data is supposed to be a reference to the original data.
The approach I've implemented locally is to temporarily replace the columns of the source data with columns that contain string objects, in a try/finally block.
The way I was going may just be overkill!
Yes, it makes a copy. Looking at the previous if-statements, I would not say self.source_data refers to the original data; for example, a series is converted to a dataframe. I see self.data_source as a reference to the original data, though I have not taken a thorough look at the rest of the codebase to see if this is the case.
Lines 668 to 677 in eed0e87
        self.data_source = data
        self.is_series = is_series(data)
        if self.is_series:
            data = data.to_frame()
        if is_intake(data):
            data = process_intake(data, use_dask or persist)
        if isinstance(getattr(data, "columns", None), pd.RangeIndex):
            data = data.rename(columns=str)
        self.source_data = data
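To make the copy question concrete, here is a minimal sketch using only pandas (the frame `df` is a made-up example, not data from hvPlot). It only shows that `rename(columns=str)` returns a new DataFrame object and leaves the caller's column labels untouched; whether the underlying values are physically duplicated depends on the pandas version and its copy-on-write settings, so that is not asserted here.

```python
import pandas as pd

# A DataFrame built without column names gets a RangeIndex as columns.
df = pd.DataFrame([[1, 2], [3, 4]])

# rename() with a callable applies it to every column label.
renamed = df.rename(columns=str)

print(renamed is df)                            # False: a new DataFrame object
print(isinstance(df.columns, pd.RangeIndex))    # True: the original is unchanged
print(list(renamed.columns))                    # ['0', '1']
```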
self.data_source is unused. We don't get any warning for that as flake is set up to basically ignore everything. Something else to fix (I'll ask Philipp if I can run black once).
Alright @philippjfr, we'll need your input here to see how best to handle this :) Since the latest release of HoloViews, users get a warning when they make a plot from a DataFrame that has non-string columns. This is what we're trying to deal with here.

The converter makes a reference […] (Lines 664 to 679 in 4e55da4). This reference is used only once later in […] (Lines 1215 to 1234 in 4e55da4). The warning is raised when the […]

Note: the converter processes the input data in many ways, and interestingly that includes converting its non-string columns to string with the method […]

Simon's suggesting to make a change early on, before setting […] (Line 875 in 4e55da4). But not this one that seems to be needed for streaming dataframes (Line 1567 in 4e55da4).
(There are no tests for streaming data, so we need to be extra careful here. Maybe keep that for another PR, but if so we should definitely open an issue.)

An alternative to Simon's suggestion would be to temporarily modify the columns of the source data in place. The goal is to avoid making a copy, which only matters if making a copy is a bad idea. That would look something like the following:

    source_cols = None
    # Only swap the labels if at least one of them is not already a string.
    if hasattr(self.source_data, 'columns') and any(not isinstance(col, str) for col in self.source_data.columns):
        source_cols = self.source_data.columns
        self.source_data.columns = [str(col) for col in source_cols]  # This doesn't make a copy?
    try:
        ...
        dataset = hv.Dataset(self.source_data)
        ...
        obj = method(x, y)
        obj._dataset = dataset
    finally:
        # An Index has no unambiguous truth value, so compare against None.
        if source_cols is not None:
            self.source_data.columns = source_cols

Let us know how you would approach that 🙏
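The temporary in-place swap can also be sketched as a small context manager, which keeps the restore step next to the swap. This is only an illustration with pandas: `string_columns` is a hypothetical helper name, not something that exists in hvPlot or HoloViews, and the `hv.Dataset` call from the snippet above is left out so the example runs on its own.

```python
from contextlib import contextmanager

import pandas as pd


@contextmanager
def string_columns(df):
    """Temporarily replace df's column labels with their str() form (hypothetical helper)."""
    original = df.columns
    df.columns = [str(col) for col in original]  # mutates df in place
    try:
        yield df
    finally:
        # Always restore the caller's labels, even if the body raised.
        df.columns = original


df = pd.DataFrame([[1, 2], [3, 4]])
with string_columns(df) as view:
    same_object = view is df          # no new DataFrame is created
    inside = list(view.columns)       # labels seen inside the block

print(same_object)                    # True
print(inside)                         # ['0', '1']
print(list(df.columns))               # [0, 1]
```

One trade-off worth keeping in mind: because the labels are changed on the caller's object, any other code that holds a reference to the same DataFrame sees string columns for the duration of the block.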
This should be seen as a small (and definitely not exhaustive) step toward working with non-string columns and hv.Dimension.
Related to holoviz/holoviews#5353. The upcoming change from DeprecationWarning to FutureWarning in holoviz/holoviews#5472 will now give users warnings in their existing notebooks.
Most of these warnings will probably come from a pd.DataFrame or pd.Series initialized without defining the column names. This PR checks whether the columns attribute is a pd.RangeIndex and converts it to strings. An example is given below, with the upcoming change to FutureWarning and with lru_cache removed to see all warnings; for more information, see holoviz/holoviews#5472.
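As a rough illustration of which inputs the check described above matches, here is a sketch using only pandas. The function name `stringify_range_columns` is hypothetical; its body mirrors the two lines added in the diff. It also shows why the step is not exhaustive: integer labels passed explicitly are stored as a regular integer index, not a RangeIndex, so they are left alone.

```python
import pandas as pd


def stringify_range_columns(data):
    """Hypothetical helper mirroring the check added in this PR."""
    # A Series has no `columns` attribute, so getattr returns None and the check is False.
    if isinstance(getattr(data, "columns", None), pd.RangeIndex):
        data = data.rename(columns=str)
    return data


unnamed = stringify_range_columns(pd.DataFrame([[1, 2], [3, 4]]))
named = stringify_range_columns(pd.DataFrame({"a": [1], "b": [2]}))
explicit_ints = stringify_range_columns(pd.DataFrame([[1, 2]], columns=[5, 7]))
series = stringify_range_columns(pd.Series([1, 2, 3]))

print(list(unnamed.columns))         # ['0', '1']
print(list(named.columns))           # ['a', 'b']
print(list(explicit_ints.columns))   # [5, 7]: still integers, not covered by the check
print(type(series).__name__)         # Series: returned unchanged
```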