PandasDataframe

modin.core.dataframe.pandas.dataframe.dataframe.PandasDataframe is a direct descendant of modin.core.dataframe.base.dataframe.dataframe.ModinDataframe. Its purpose is to implement the abstract interfaces for use with all pandas-based storage formats (/flow/modin/core/storage_formats/index). PandasDataframe can be inherited and extended by any specific implementation that needs special handling of some behavior or better performance on a particular execution engine.

The class serves as the intermediate layer between the pandas query compiler and the conforming partition manager. All queries formed at the query compiler layer are ingested by this class and then passed, together with the stored partitions, to the partition manager for processing. The class must not manipulate partitions directly, except in operations that are strictly private or protected and called only from inside the class. The class provides a significantly reduced set of operations that is sufficient to express many pandas operations.

The main tasks of PandasDataframe are storing partitions, manipulating axis labels, and providing a set of methods to perform operations on the internal data.

As mentioned above, PandasDataframe should not work with stored partitions directly; the responsibility for modifying the partitions array lies with the partition manager (partitioning/partition_manager). For example, the method PandasDataframe.map_full_axis delegates applying a function to the PandasDataframePartitionManager.map_partitions_full_axis method.
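The delegation described here can be illustrated with a small, self-contained toy model. This is only a sketch of the pattern, not Modin's implementation: the classes ToyPartitionManager and ToyDataframe, the partition_manager_cls attribute, and the representation of partitions as nested Python lists are all hypothetical and chosen for illustration. Only the method names map_full_axis and map_partitions_full_axis are borrowed from the text above.

```python
from typing import Callable, List

Block = List[List[float]]   # one partition: a list of rows
Grid = List[List[Block]]    # 2D grid of partitions: grid[row_part][col_part]


class ToyPartitionManager:
    """Hypothetical partition manager: the only place that touches partitions."""

    @classmethod
    def map_partitions_full_axis(
        cls, grid: Grid, func: Callable[[Block], Block], axis: int
    ) -> Grid:
        if axis == 0:
            # Stack the blocks of each grid column vertically, then apply func
            # to every full-height column partition.
            out_row = []
            for j in range(len(grid[0])):
                full = [row for i in range(len(grid)) for row in grid[i][j]]
                out_row.append(func(full))
            return [out_row]
        # axis == 1: join the blocks of each grid row horizontally, then apply
        # func to every full-width row partition.
        out = []
        for part_row in grid:
            n_rows = len(part_row[0])
            full = [sum((blk[r] for blk in part_row), []) for r in range(n_rows)]
            out.append([func(full)])
        return out


class ToyDataframe:
    """Hypothetical dataframe layer: stores partitions and delegates the work."""

    partition_manager_cls = ToyPartitionManager  # illustrative attribute name

    def __init__(self, partitions: Grid):
        self._partitions = partitions

    def map_full_axis(self, func: Callable[[Block], Block], axis: int = 0) -> "ToyDataframe":
        # No partition is inspected or modified here; everything is handed over.
        new_partitions = self.partition_manager_cls.map_partitions_full_axis(
            self._partitions, func, axis
        )
        return type(self)(new_partitions)


# A 3x3 table split into a 2x2 grid of partitions:
#   1 2 | 5
#   3 4 | 6
#   ----+--
#   7 8 | 9
frame = ToyDataframe([[[[1, 2], [3, 4]], [[5], [6]]], [[[7, 8]], [[9]]]])

col_sums = frame.map_full_axis(lambda b: [[sum(c) for c in zip(*b)]], axis=0)
row_sums = frame.map_full_axis(lambda b: [[sum(r)] for r in b], axis=1)
```

In this toy, the dataframe class never indexes into a block itself; swapping in a different manager class changes how partitions are processed without changing the dataframe layer.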

A Modin PandasDataframe can be created from a pandas.DataFrame or a pyarrow.Table (using the methods PandasDataframe.from_pandas and PandasDataframe.from_arrow, respectively). A PandasDataframe can also be converted to a NumPy array or a pandas.DataFrame (using the methods PandasDataframe.to_numpy and PandasDataframe.to_pandas, respectively).
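Conceptually, creating a partitioned frame means splitting the input into partitions, and converting it back means reassembling them. The following toy sketch shows only that round trip. It uses plain Python lists instead of pandas or pyarrow objects, and the class ToyFrame with its methods from_rows and to_rows and the parameter rows_per_partition are hypothetical names, not part of Modin.

```python
from typing import List, Sequence

Row = List[object]


class ToyFrame:
    """Hypothetical miniature frame: row partitions plus separately stored labels."""

    def __init__(self, partitions: List[List[Row]], columns: List[str]):
        self._partitions = partitions
        self.columns = columns

    @classmethod
    def from_rows(
        cls, rows: Sequence[Row], columns: List[str], rows_per_partition: int = 2
    ) -> "ToyFrame":
        # Split the input into consecutive blocks of at most rows_per_partition rows.
        partitions = [
            [list(r) for r in rows[i : i + rows_per_partition]]
            for i in range(0, len(rows), rows_per_partition)
        ]
        return cls(partitions, list(columns))

    def to_rows(self) -> List[Row]:
        # Concatenate the partitions in order to rebuild the full table.
        return [row for part in self._partitions for row in part]


rows = [[1, "a"], [2, "b"], [3, "c"]]
frame = ToyFrame.from_rows(rows, ["num", "char"])
restored = frame.to_rows()
```

With three input rows and two rows per partition, the toy frame holds two partitions, and to_rows returns the rows in their original order.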

Axis labels are manipulated through internal methods for replacing labels with new ones, adding prefixes or suffixes, and so on.

Public API

modin.core.dataframe.pandas.dataframe.dataframe.PandasDataframe