Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for concat #40

Merged
merged 4 commits into from
Aug 2, 2017
Merged

Add support for concat #40

merged 4 commits into from
Aug 2, 2017

Conversation

jcrist
Copy link
Contributor

@jcrist jcrist commented Aug 2, 2017

  • Add a top-level concat function for concatenating dataframes,
    series, or indices together
  • Remove old concat method, as it didn't match pandas api
  • Fix bugs in to_pandas implementation to properly handle indices

- Add a top-level `concat` function for concatenating dataframes,
  series, or indices together
- Remove old `concat` method, as it didn't match pandas api
- Fix bugs in `to_pandas` implementation to properly handle indices
self._ordered == other._ordered and
self._codes_impl == other._codes_impl)
except Exception:
return False
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why hide the Exception?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because you can't compare pandas custom dtypes and numpy dtypes. We hide the exception so that you can compare categorical and numeric implementations. Although in this case we can skip the try block because we check if other is CategoricalSeriesImpl. The try/except for NumericalSeriesImpl is still needed though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The except will just hide accidental code error. If other is a NumericalSeriesImpl, it will just return None, which will be interpreted as False. I think the try-except should be removed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, except for the "return None" bit - I prefer comparison methods to always return booleans directly. Fixed.

pygdf/series.py Outdated
if o._impl != head._impl:
raise ValueError("All series must be of same type")

data = head._impl.concat(objs)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need to delegate to _impl to handle concat. The GPU dataframe will always require plain-old-data types. (We can only handle data that can be copied b/w cpu and gpu memory.) It is always safe to just copy the data for concat just like the one in numerical.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For categorical data we need to allocate a buffer of series.cat.codes.dtype, while for numeric we need to allocate a buffer of series.dtype. I figured it made more sense to special case inside the implementations than check if the series was categorical and special case in the method.

Copy link
Contributor

@sklam sklam Aug 2, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The self._data.dtype (Buffer.dtype) will always have the physical dtype. The SeriesImpl has ~~~a~~~ the logical meaning of things. Here, we will only need physical info.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair point. I didn't realize _data also had a dtype attribute, I was relying on self.dtype (which could be category) when I implemented it before. Fixed.

@jcrist jcrist mentioned this pull request Aug 2, 2017
@sklam sklam merged commit ec36487 into rapidsai:master Aug 2, 2017
@sklam
Copy link
Contributor

sklam commented Aug 2, 2017

Thanks!

shwina pushed a commit that referenced this pull request Oct 16, 2019
raydouglass pushed a commit that referenced this pull request Nov 7, 2023
Make xdf a subpackage of cudf instead of standalone
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants