-
Notifications
You must be signed in to change notification settings - Fork 647
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FEAT-#4245: Define base interface for dataframe exchange protocol #4246
Changes from 1 commit
b061acf
9ef2fa6
625059d
741ef0c
265515d
37cd8a2
83b474f
7f966a0
826b0a0
2296535
c83c529
c1900e3
83fa0b4
2bb3a6e
4fc997b
44827f8
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -556,35 +556,6 @@ def __constructor__(self, *args, **kwargs): | |
""" | ||
return type(self)(*args, **kwargs) | ||
|
||
def __dataframe__(self, nan_as_null: bool = False, allow_copy: bool = True) -> dict: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why moved? We should be able to export both kinds of objects (Series and DataFrame), so the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The spec is only about a DataFrame object, but not a Series. |
||
""" | ||
Get a DataFrame exchange protocol object representing data of the Modin DataFrame. | ||
|
||
See more about the protocol in https://data-apis.org/dataframe-protocol/latest/index.html. | ||
|
||
Parameters | ||
---------- | ||
nan_as_null : bool, default: False | ||
A keyword intended for the consumer to tell the producer | ||
to overwrite null values in the data with ``NaN`` (or ``NaT``). | ||
This currently has no effect; once support for nullable extension | ||
dtypes is added, this value should be propagated to columns. | ||
allow_copy : bool, default: True | ||
A keyword that defines whether or not the library is allowed | ||
to make a copy of the data. For example, copying data would be necessary | ||
if a library supports strided buffers, given that this protocol | ||
specifies contiguous buffers. Currently, if the flag is set to ``False`` | ||
and a copy is needed, a ``RuntimeError`` will be raised. | ||
|
||
Returns | ||
------- | ||
dict | ||
A dictionary object following the dataframe protocol specification. | ||
""" | ||
return self._query_compiler.to_dataframe( | ||
nan_as_null=nan_as_null, allow_copy=allow_copy | ||
) | ||
|
||
def abs(self): | ||
self._validate_dtypes(numeric_only=True) | ||
return self.__constructor__(query_compiler=self._query_compiler.abs()) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not use
@abc.abstractmethod
decorator? How it's done forto_pandas
method. Same forfrom_dataframe
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In order to be able to instantiate a DataFrame with the underlying QC at this line - https://github.com/modin-project/modin/pull/4246/files#diff-028a94e3db5c73c7ddedc6f7b6efb79e07e1360bb1401531b0a9f5d0ca1f1928R55. Otherwise, you will get the error: Can't instantiate abstract class ... with abstract methods from_dataframe, to_dataframe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, but isn't that the main way Modin defines an interface?
I propose to define the method as abstract in
BaseQueryCompiler
class, and let the child classes:PandasQueryCompiler
andDFAlgQueryCompiler
throwNotImplementedError
exception.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, BaseQueryCompiler should define the interface to be overridden by child classes. We could either make the methods abstract now or as part of efforts discussed here. I don't have strong preferences on this. However, doing that what you proposed
You will still be getting the error mentioned above. We should override the methods in TestQC.
Do you think we should make the methods abstract in the Base QC as part of this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If there is no hurry, we can do it here. Otherwise, we need to flag this question and implement it elsewhere. (Yes, it also need to de done in
TestQC
)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There shouldn't be strong efforts to do that so let's put it here. However, this PR should be merged by end of today.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is not quite correct. We should override the methods in TestQC only, PandasQC and OmnisciQC will override the methods as part of new issues.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed