Skip to content

Latest commit

 

History

History
33 lines (26 loc) · 1.65 KB

defaulting_to_pandas.rst

File metadata and controls

33 lines (26 loc) · 1.65 KB

Defaulting to pandas

Currently Modin does not support distributed execution for all methods from pandas API. The remaining unimplemented methods are being executed in a mode called "default to pandas". This allows users to continue using Modin even though their workloads contain functions not yet implemented in Modin. Here is a diagram of how we convert to pandas and perform the operation:

image

We first convert to a pandas DataFrame, then perform the operation. There is a performance penalty for going from a partitioned Modin DataFrame to pandas because of the communication cost and single-threaded nature of pandas. Once the pandas operation has completed, we convert the DataFrame back into a partitioned Modin DataFrame. This way, operations performed after something defaults to pandas will be optimized with Modin.

The exact methods we have implemented are listed in the respective subsections:

  • DataFrame </supported_apis/dataframe_supported>
  • Series </supported_apis/series_supported>
  • utilities </supported_apis/utilities_supported>
  • I/O </supported_apis/io_supported>

We have taken a community-driven approach to implementing new methods. We did a study on pandas usage to learn what the most-used APIs are. Modin currently supports 93% of the pandas API based on our study of pandas usage, and we are actively expanding the API. To request implementation, file an issue at https://github.com/modin-project/modin/issues or send an email to feature_requests@modin.org.