DOCS-#3850: Elaborate on modin.core.dataframe.base and .pandas (#3851)
Co-authored-by: Yaroslav Igoshev <Poolliver868@mail.ru>
Signed-off-by: Vasilij Litvinov <vasilij.n.litvinov@intel.com>
vnlitvinov and YarShev committed Dec 16, 2021
1 parent 20abddd commit cf1e541
Showing 5 changed files with 32 additions and 1 deletion.
15 changes: 15 additions & 0 deletions docs/flow/modin/core/dataframe/base/index.rst
@@ -1,3 +1,18 @@
Purpose
=======

``BaseDataframe`` describes and defines the :doc:`Core Dataframe Algebra </flow/modin/core/dataframe/algebra>`.

It is the core building block that serves as the client for the :doc:`Modin Query Compiler </flow/modin/core/storage_formats/base/query_compiler>`. Its implementations are what actually execute the queries from the compiler by invoking functions over partition(s).

To execute the queries, a typical implementation also introduces partitions and
a partition manager, whose interfaces we might consider standardising in the future.
For now they are entirely implementation-specific.

Base dataframe and axis partitions are the interfaces that must be implemented by any :doc:`execution backend </flow/modin/core/execution/dispatching>` that wants to be plugged into Modin.
These classes are mostly abstract; however, simple and sufficiently generic methods like
:py:meth:`~modin.core.dataframe.base.partitioning.BaseDataframeAxisPartition.force_materialization` can be implemented at the base level because, for now, we do not expect them to differ between implementations.
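
The split between abstract and generic methods described above could look roughly like the following sketch. The classes ``ToyAxisPartition`` and ``EagerAxisPartition`` and the ``apply`` signature are hypothetical stand-ins and do not reproduce the actual Modin interfaces. Only the method name ``force_materialization`` comes from the text, and its body here is an assumption made for illustration.

```python
from abc import ABC, abstractmethod


class ToyAxisPartition(ABC):
    """Hypothetical stand-in for a mostly abstract base interface."""

    @abstractmethod
    def apply(self, func):
        """Backend-specific: run ``func`` over the data of this partition."""

    def force_materialization(self):
        """Generic enough to live at the base level.

        Assumption for this sketch: materializing is expressed as applying
        an identity function, so no backend needs to override it.
        """
        return self.apply(lambda data: data)


class EagerAxisPartition(ToyAxisPartition):
    """Hypothetical backend that keeps its data in a plain Python list."""

    def __init__(self, data):
        self.data = data

    def apply(self, func):
        # Only the abstract part has to be provided by the backend.
        return EagerAxisPartition(func(self.data))


part = EagerAxisPartition([1, 2, 3])
materialized = part.force_materialization()
doubled = part.apply(lambda data: [2 * x for x in data])
```

The backend only supplies ``apply``; ``force_materialization`` is inherited unchanged, which is the kind of sharing the paragraph above refers to.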

Modin BaseDataframe Interface
=============================

@@ -4,6 +4,10 @@ BaseDataframeAxisPartition
The class is the base for any axis partition class and serves as the last level at which
operations conveyed from the partition manager are performed on an entire column or row.

**Note**: ``modin.core.dataframe.base`` intentionally does not describe any particular partition interface,
as that is the partition manager's responsibility (if said partition manager is implemented), i.e. it is
too low-level to be present at the base, abstract level.

The class provides an API that has to be overridden by the child classes in order to manipulate
the list of block partitions (making up a column or row partition) that they store.
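
As a rough illustration of a child class working on the list of block partitions it stores, consider the sketch below. ``ToyColumnPartition``, its ``list_of_blocks`` attribute, and the ``apply(func, num_splits)`` signature are hypothetical and use plain Python lists instead of real partitions; they are not the Modin API.

```python
class ToyColumnPartition:
    """Hypothetical axis partition: one full column stored as row blocks."""

    def __init__(self, list_of_blocks):
        self.list_of_blocks = list_of_blocks

    def apply(self, func, num_splits):
        # Join the blocks so that ``func`` sees the entire column at once.
        column = [value for block in self.list_of_blocks for value in block]
        result = func(column)
        # Split the result back into at most ``num_splits`` blocks,
        # using ceiling division for the block size (at least 1).
        size = max(1, -(-len(result) // num_splits))
        return [result[i:i + size] for i in range(0, len(result), size)]


column_partition = ToyColumnPartition([[3, 1], [2]])
# Sorting needs the whole column, so it cannot be done block by block.
sorted_blocks = column_partition.apply(sorted, num_splits=2)
```

Sorting is used here because it is an operation that genuinely needs the entire column, which is the case an axis partition is meant to cover.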

5 changes: 4 additions & 1 deletion docs/flow/modin/core/dataframe/pandas/dataframe.rst
@@ -1,7 +1,10 @@
PandasDataframe
"""""""""""""""

The class is base for any frame class of ``pandas`` storage format and serves as the intermediate level
:py:class:`~modin.core.dataframe.pandas.dataframe.dataframe.PandasDataframe` should be a direct descendant of ``BaseDataframe``, which is currently being factored out. Its purpose is to implement the abstract interfaces for use with all ``pandas``-based :doc:`storage formats </flow/modin/core/storage_formats/index>`.
:py:class:`~modin.core.dataframe.pandas.dataframe.dataframe.PandasDataframe` can be inherited and augmented further by any specific implementation that needs to take special care of some behavior or to improve performance for a certain execution engine.

The class serves as the intermediate level
between ``pandas`` query compiler and conforming partition manager. All queries formed
at the query compiler layer are ingested by this class and then conveyed jointly with the stored partitions
into the partition manager for processing. Direct partition manipulation by this class is prohibited except
6 changes: 6 additions & 0 deletions docs/flow/modin/core/dataframe/pandas/index.rst
@@ -1,6 +1,12 @@
Modin PandasDataframe Objects
=============================

``modin.core.dataframe.pandas`` is the package that houses common implementations
of different Modin internal classes used by most ``pandas``-based :doc:`storage formats </flow/modin/core/storage_formats>`.

It also serves as a full example of how to implement the pieces of a Modin execution backend (except for the :doc:`execution part </flow/modin/core/execution/dispatching>`, which is absent here),
as it implements everything an execution backend needs to fully conform to Modin expectations.

* :doc:`PandasDataframe <dataframe>` is the class conforming to Dataframe Algebra.
* :doc:`PandasDataframePartition <partitioning/partition>` implements ``Partition`` interface holding ``pandas.DataFrame``.
* :doc:`PandasDataframeAxisPartition <partitioning/axis_partition>` is a joined group of ``PandasDataframePartition``-s along some axis (either rows or columns)
@@ -1,6 +1,9 @@
PandasDataframeAxisPartition
""""""""""""""""""""""""""""

The class implements abstract interface methods from :py:class:`~modin.core.dataframe.base.partitioning.axis_partition.BaseDataframeAxisPartition`
giving the means for a sibling :doc:`partition manager<partition_manager>` to actually work with the axis-wide partitions.

The class is the base for any axis partition class of the ``pandas`` storage format.

Subclasses must implement ``list_of_blocks`` which represents data wrapped by the :py:class:`~modin.core.dataframe.pandas.partitioning.partition.PandasDataframePartition`