FIX-#4044: Update module layout with the ModinDataframe (#4045)
Co-authored-by: Yaroslav Igoshev <Poolliver868@mail.ru>
Signed-off-by: Rehan Durrani <rehan@ponder.io>
RehanSD and YarShev committed Jan 25, 2022
1 parent 5d84042 commit 7fcafa7
Showing 4 changed files with 32 additions and 14 deletions.
21 changes: 21 additions & 0 deletions docs/flow/modin/core/dataframe/base/dataframe.rst
@@ -0,0 +1,21 @@
ModinDataframe
""""""""""""""

The :py:class:`~modin.core.dataframe.base.dataframe.dataframe.ModinDataframe` is the parent class for all dataframes - regardless of what storage format they are backed by. Its purpose is to define the algebra operators that must be exposed by a dataframe.

This class exposes the dataframe algebra and is meant to be subclassed by all dataframe implementations.
Descendants of this class implement the algebra, and act as the intermediate level
between the query compiler and the underlying execution details (e.g. the conforming partition manager). The class provides
a significantly reduced set of operations that can be composed to form any pandas query.

The :py:class:`~modin.core.dataframe.pandas.dataframe.dataframe.PandasDataframe` is an example of a descendant of this class. It implements some of the operators
exposed in this class, and is being refactored to include implementations for all of the algebra operators. Please
refer to the :doc:`PandasDataframe documentation </flow/modin/core/dataframe/pandas/dataframe>` for more information.

The :py:class:`~modin.core.dataframe.base.dataframe.dataframe.ModinDataframe` is independent of implementation specific details such as partitioning, storage format, or execution engine.
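
As a rough illustration of the idea described above, the sketch below shows how a small set of abstract operators can be composed into a query without knowing how the data is stored. The class names ``ToyAlgebraFrame`` and ``ListBackedFrame``, and the operator signatures, are hypothetical and chosen only for this example; they are not the actual ``ModinDataframe`` API.

```python
from abc import ABC, abstractmethod
from typing import Callable, List


class ToyAlgebraFrame(ABC):
    """Hypothetical stand-in for an abstract dataframe; not Modin's actual API."""

    @abstractmethod
    def map(self, func: Callable[[dict], dict]) -> "ToyAlgebraFrame":
        """Apply ``func`` to every row and return a new frame."""

    @abstractmethod
    def filter(self, predicate: Callable[[dict], bool]) -> "ToyAlgebraFrame":
        """Keep only the rows for which ``predicate`` returns True."""

    @abstractmethod
    def to_rows(self) -> List[dict]:
        """Return the rows as plain dictionaries."""


class ListBackedFrame(ToyAlgebraFrame):
    """One possible descendant: rows are stored in a plain Python list."""

    def __init__(self, rows):
        # Copy each row so the frame does not share state with the caller.
        self._rows = [dict(row) for row in rows]

    def map(self, func):
        return ListBackedFrame(func(row) for row in self._rows)

    def filter(self, predicate):
        return ListBackedFrame(row for row in self._rows if predicate(row))

    def to_rows(self):
        return [dict(row) for row in self._rows]


def double_positive_values(frame: ToyAlgebraFrame) -> ToyAlgebraFrame:
    """A query written only against the abstract operators."""
    positive = frame.filter(lambda row: row["x"] > 0)
    return positive.map(lambda row: {**row, "x": row["x"] * 2})


frame = ListBackedFrame([{"x": -1}, {"x": 2}, {"x": 3}])
result = double_positive_values(frame).to_rows()
print(result)
```

Because ``double_positive_values`` touches only the abstract operators, any other descendant (for example, one backed by partitions) could be passed in without changing the query.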

Public API
----------

.. autoclass:: modin.core.dataframe.base.dataframe.dataframe.ModinDataframe
    :members:
21 changes: 9 additions & 12 deletions docs/flow/modin/core/dataframe/base/index.rst
@@ -1,28 +1,25 @@
Purpose
=======

``BaseDataframe`` serves the purpose of describing and defining the :doc:`Core Dataframe Algebra </flow/modin/core/dataframe/algebra>`.
The :py:class:`~modin.core.dataframe.base.dataframe.dataframe.ModinDataframe` serves the purpose of describing and defining the :doc:`Core Dataframe Algebra </flow/modin/core/dataframe/algebra>`.

It is the core construction element which serves as the client for :doc:`Modin Query Compiler</flow/modin/core/storage_formats/base/query_compiler>` and which implementations are actually executing the queries from the compiler by invoking functions over partition(s).
It is the core construction element and serves as the client for the :doc:`Modin Query Compiler</flow/modin/core/storage_formats/base/query_compiler>`. Descendants that offer implementations execute the queries from the compiler by invoking functions over partitions via a partition manager.

To execute the queries, a typical implementation also itroduces partitions and
partition manager, interfaces for which we might consider standardising in the future.
For now they're totally implementation-specific.
The partitions and partition manager interfaces are currently implementation-specific, but may
be standardized in the future.

Base dataframe and axis partitions are the interfaces that must be implemented by any :doc:`execution backend</flow/modin/core/execution/dispatching>` that wants to be plugged in Modin.
The :py:class:`~modin.core.dataframe.base.dataframe.dataframe.ModinDataframe` and axis partitions are the interfaces that must be implemented by any :doc:`execution backend</flow/modin/core/execution/dispatching>` in order for it to be plugged in to Modin.
These classes are mostly abstract; however, simple and sufficiently generic methods such as
:py:meth:`~modin.core.dataframe.base.partitioning.BaseDataframeAxisPartition.force_materialization` can be implemented at the base level, because we do not currently expect them to differ between implementations.
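
The pattern described above, a mostly abstract class that still provides a few generic methods, might look roughly like the following sketch. ``ToyAxisPartition`` and ``InMemoryAxisPartition`` are hypothetical names, and the body of ``force_materialization`` here is an assumption made for illustration, not Modin's implementation.

```python
from abc import ABC, abstractmethod


class ToyAxisPartition(ABC):
    """Hypothetical axis partition; names and signatures are illustrative only."""

    @abstractmethod
    def apply(self, func):
        """Run ``func`` over the joined blocks; each backend decides how."""

    def force_materialization(self):
        """Concrete at the base level: applying the identity joins the blocks."""
        return self.apply(lambda data: data)


class InMemoryAxisPartition(ToyAxisPartition):
    """Toy backend that keeps its blocks as in-memory Python lists."""

    def __init__(self, blocks):
        self.blocks = list(blocks)

    def apply(self, func):
        # Join all blocks along the axis, apply the function once,
        # and wrap the result as a single-block partition.
        joined = [value for block in self.blocks for value in block]
        return InMemoryAxisPartition([func(joined)])


part = InMemoryAxisPartition([[1, 2], [3]])
materialized = part.force_materialization()
print(materialized.blocks)
```

Only ``apply`` has to be written per backend in this sketch; the base-level method is inherited unchanged.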

Modin BaseDataframe Interface
=============================
ModinDataframe Interface
========================

* :doc:`ModinDataframe <dataframe>` is an abstract class which represents the algebra operators a dataframe must expose.
* :doc:`BaseDataframeAxisPartition <partitioning/axis_partition>` is an abstract class, representing a joined group of partitions along some axis (either rows or labels).

.. note::
    Common interfaces for most of the Modin Dataframe objects are not defined yet. Currently, all of the implementations
    inherit :doc:`Dataframe implementation for pandas storage format</flow/modin/core/dataframe/pandas/index>`.

.. toctree::
    :hidden:

    dataframe
    partitioning/axis_partition
2 changes: 1 addition & 1 deletion docs/flow/modin/core/dataframe/pandas/dataframe.rst
@@ -1,7 +1,7 @@
PandasDataframe
"""""""""""""""

:py:class:`~modin.core.dataframe.pandas.dataframe.dataframe.PandasDataframe` should be a direct descendant of ``BaseDataframe`` which is being factored right now. Its purpose is to implement the abstract interfaces for usage with all ``pandas``-based :doc:`storage formats<flow/modin/core/storage_formats/index.html>`.
:py:class:`~modin.core.dataframe.pandas.dataframe.dataframe.PandasDataframe` is a direct descendant of :py:class:`~modin.core.dataframe.base.dataframe.dataframe.ModinDataframe`. Its purpose is to implement the abstract interfaces for usage with all ``pandas``-based :doc:`storage formats<flow/modin/core/storage_formats/index.html>`.
:py:class:`~modin.core.dataframe.pandas.dataframe.dataframe.PandasDataframe` can be inherited and augmented further by any specific implementation that needs to handle some behavior specially or to improve performance for a particular execution engine.

The class serves as the intermediate level
2 changes: 1 addition & 1 deletion docs/flow/modin/core/dataframe/pandas/index.rst
@@ -18,4 +18,4 @@ as it implements everything an execution backend needs to be fully conformant to
dataframe
partitioning/partition
partitioning/axis_partition
partitioning/partition_manager
partitioning/partition_manager
