Third Party Library Integrations

Modin is a drop-in replacement for Pandas, so it should interoperate with third-party libraries just as Pandas does. To see where Modin performs well and where it needs to improve, we selected a number of important machine learning, visualization, and statistics libraries and looked at examples (from their documentation, where possible) of how they work with Pandas. We then ran the same workflows with Modin and tracked what worked and what failed.

In the table below you'll see, for each third-party library we tested, the share of test calls that succeeded (successful calls / total calls) and a qualitative description of how Pandas and Modin each integrate with that library.
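The tallying described above can be sketched roughly as follows. This is a minimal illustration, not the notebooks' actual code: `run_checks` and `format_score` are hypothetical helper names, and the example checks are stand-in functions so that the block runs without Modin or any plotting library installed. In a real run, each check would wrap one library call made with a Modin DataFrame.

```python
from typing import Callable, Dict, List, Tuple


def run_checks(checks: Dict[str, Callable[[], object]]) -> Tuple[int, int, List[str]]:
    """Run each zero-argument check and record the ones that raise."""
    failures: List[str] = []
    for name, check in checks.items():
        try:
            check()
        except Exception as exc:  # any exception counts as a failed call
            failures.append(f"{name}: {type(exc).__name__}")
    total = len(checks)
    return total - len(failures), total, failures


def format_score(successes: int, total: int) -> str:
    """Format a tally in the style used by the table, e.g. '73% (11 / 15)'."""
    percent = round(100 * successes / total) if total else 0
    return f"{percent}% ({successes} / {total})"


# Stand-in checks (hypothetical). A real check would call a third-party
# library with a Modin DataFrame instead of these placeholder functions.
def _works() -> None:
    return None


def _fails() -> None:
    raise TypeError("unsupported DataFrame type")


example_checks = {"scatterplot": _works, "pairplot": _fails, "histplot": _works}
successes, total, failures = run_checks(example_checks)
summary = format_score(successes, total)
print(summary)   # 67% (2 / 3)
print(failures)  # ['pairplot: TypeError']
```

Treating any raised exception as a failure is a simplifying assumption; a call that returns without error but produces a wrong plot or result would still need to be checked by hand.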

In the deeper dive, you can view the Jupyter notebooks we used to test API calls and the corresponding GitHub issues we filed. If you come across other issues or examples in your own workflows, we encourage you to file an issue or contribute a PR!

Note

These interoperability metrics are preliminary and not all APIs for each library have been tested. Feel free to add more!

Modin Interoperability by Library

Library | API successes / calls | Interoperability
seaborn | 73% (11 / 15) | Pandas: Accepts Pandas DataFrames as inputs for producing plots
Modin: Mostly accepts Modin DataFrames as inputs for producing plots, but fails completely in some cases (pairplot, lmplot) and in others (catplot, objects.Plot) works only for some parameter combinations
plotly | 78% (7 / 9) | Pandas: Accepts Pandas DataFrames as inputs for producing plots, including specifying X and Y parameters as df columns
Modin: Mostly accepts Modin DataFrames as inputs for producing plots (the exception is choropleth), but fails when specifying X and Y parameters as df columns
matplotlib | 100% (5 / 5) | Pandas: Accepts Pandas DataFrames as inputs for producing plots like scatter, barh, etc.
Modin: Accepts Modin DataFrames as inputs for producing plots like scatter, barh, etc.
altair | 0% (0 / 1) | Pandas: Accepts Pandas DataFrames as inputs for producing charts through Chart
Modin: Does not accept Modin DataFrames as inputs for producing charts through Chart
bokeh | 0% (0 / 1) | Pandas: Loads Pandas DataFrames through ColumnDataSource
Modin: Does not load Modin DataFrames through ColumnDataSource
sklearn | 100% (6 / 6) | Pandas: Many functions take Pandas DataFrames as inputs
Modin: Many functions take Modin DataFrames as inputs
Hugging Face (Transformers, Datasets) | 100% (2 / 2) | Pandas: Loads Pandas DataFrames into Datasets, and processes Pandas DataFrame rows as inputs using Transformers.InputExample (deprecated)
Modin: Loads Modin DataFrames into Datasets (though slowly), and processes Modin DataFrame rows as inputs through Transformers.InputExample (deprecated)
TensorFlow | 75% (3 / 4) | Pandas: Converts Pandas DataFrames to tensors
Modin: Converts Modin DataFrames to tensors, but specialized APIs like Keras might not work yet
NLTK | 100% (1 / 1) | Pandas: Performs transformations like tokenization on Pandas DataFrames
Modin: Performs transformations like tokenization on Modin DataFrames
XGBoost | 100% (1 / 1) | Pandas: Loads Pandas DataFrames through the DMatrix function
Modin: Loads Modin DataFrames through the DMatrix function
statsmodels | 50% (1 / 2) | Pandas: Can accept Pandas DataFrames when fitting models
Modin: Accepts Modin DataFrames when fitting models in some cases (e.g., formula.api.ols), but not in others (e.g., api.OLS)
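The percentages in the table follow directly from the success and call counts. As a quick sanity check, the sketch below recomputes them from the counts copied out of the table. The `results` dictionary and the `percent` helper are illustrative names, and the pooled figure at the end is not part of the original results: it is simple arithmetic over the table and weights each library by the number of calls tested.

```python
# (successes, calls) per library, copied from the table above.
results = {
    "seaborn": (11, 15),
    "plotly": (7, 9),
    "matplotlib": (5, 5),
    "altair": (0, 1),
    "bokeh": (0, 1),
    "sklearn": (6, 6),
    "Hugging Face": (2, 2),
    "TensorFlow": (3, 4),
    "NLTK": (1, 1),
    "XGBoost": (1, 1),
    "statsmodels": (1, 2),
}


def percent(successes: int, calls: int) -> int:
    """Success rate rounded to the nearest whole percent."""
    return round(100 * successes / calls)


per_library = {name: percent(s, c) for name, (s, c) in results.items()}

# Pooled tally across all tested calls (derived here, not reported in the table).
total_successes = sum(s for s, _ in results.values())
total_calls = sum(c for _, c in results.values())
overall = percent(total_successes, total_calls)

for name, pct in per_library.items():
    s, c = results[name]
    print(f"{name}: {pct}% ({s} / {c})")
print(f"all tested calls: {overall}% ({total_successes} / {total_calls})")
```

Because several libraries were tested with only one or two calls, both the per-library and the pooled percentages should be read as preliminary.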

A Deeper Dive

seaborn

Jupyter Notebook

GitHub Issues

plotly

Jupyter Notebook

GitHub Issues

matplotlib

Jupyter Notebook

altair

Jupyter Notebook

GitHub Issues

bokeh

Jupyter Notebook

GitHub Issues

sklearn

Jupyter Notebook

Hugging Face

Jupyter Notebook

TensorFlow

Jupyter Notebook

GitHub Issues

NLTK

Jupyter Notebook

XGBoost

Jupyter Notebook

statsmodels

Jupyter Notebook

GitHub Issues

Appendix: System Information

The example scripts here were run on the following system:

  • OS platform and distribution: macOS Big Sur 11.5.2
  • Modin version: 0.18.0+3.g4114183f
  • Ray version: 2.0.1
  • Python version: 3.9.7.final.0
  • Machine: MacBook Pro (16-inch, 2019)
  • Processor: 2.3 GHz 8-core Intel Core i9
  • Memory: 16 GB 2667 MHz DDR4