Modin Supported Methods

For your convenience, we have compiled a list of the APIs and methods currently implemented in Modin. This documentation is updated as new methods and APIs are merged into the master branch, so it is not necessarily accurate for the most recent release. To install the latest version of Modin, follow the directions on the installation page.

Questions on implementation details

If you have a question about the implementation details or would like more information about an API or method in Modin, please contact the Modin developer mailing list.

API Completeness

We have taken a community-driven approach to implementing new methods. We studied pandas usage to identify the most-used APIs. We currently support 93% of the pandas API as measured by usage, and we are actively expanding coverage.

DataFrame

The following table lists both implemented and not-yet-implemented methods. If you need an operation that is listed as not implemented, feel free to open an issue on the GitHub repository. Contributions are also welcome!

DataFrame method	Implemented?	Limitations/notes for current implementation
`T`	Y
`__abs__`	Y
`__add__`	Y
`__and__`	Y
`__array__`	Y	Will not result in a distributed object
`__array_wrap__`	Y	Will not result in a distributed object
`__bool__`	Y
`__contains__`	Y
`__copy__`	Y	Copy will always make a shallow copy
`__deepcopy__`	Y	Copy will always make a shallow copy
`__delitem__`	Y
`__div__`	Y	Requires shuffle when operating on two DataFrames
`__eq__`	Y	Requires shuffle when operating on two DataFrames
`__finalize__`	N	N/A, Not Yet Implemented
`__floordiv__`	Y	Requires shuffle when operating on two DataFrames
`__ge__`	Y	Requires shuffle when operating on two DataFrames
`__getitem__`	Y	Returns a pandas Series (see Series section below); `key` parameter of type DataFrame not yet supported; `MultiIndex` columns not yet supported
`__getstate__`	N	N/A, Not Yet Implemented
`__gt__`	Y	Requires shuffle when operating on two DataFrames
`__hash__`	N	N/A, Not Yet Implemented
`__iadd__`	Y	See `__add__`
`__ifloordiv__`	Y	See `__floordiv__`
`__imod__`	Y	See `__mod__`
`__imul__`	Y	See `__mul__`
`__invert__`	N	N/A, Not Yet Implemented
`__ipow__`	Y	See `__pow__`
`__isub__`	Y	See `__sub__`
`__iter__`	Y
`__itruediv__`	Y	See `__truediv__`
`__le__`	Y	Requires shuffle when operating on two DataFrames
`__len__`	Y
`__lt__`	Y	Requires shuffle when operating on two DataFrames
`__mod__`	Y	Requires shuffle when operating on two DataFrames
`__mul__`	Y	Requires shuffle when operating on two DataFrames
`__ne__`	Y	Requires shuffle when operating on two DataFrames
`__neg__`	Y
`__nonzero__`	Y
`__or__`	Y
`__pow__`	Y	Requires shuffle when operating on two DataFrames
`__radd__`	Y	See `__add__`; `level` parameter not yet supported
`__rdiv__`	Y	See `__div__`; `level` parameter not yet supported
`__repr__`	Y	Blocking call: Must retrieve data from remote
`__rfloordiv__`	Y	See `__floordiv__`; `level` parameter not yet supported
`__rmod__`	Y	See `__mod__`; `level` parameter not yet supported
`__rmul__`	Y	See `__mul__`; `level` parameter not yet supported
`__round__`	N	N/A, Not Yet Implemented
`__rpow__`	Y	See `__pow__`; `level` parameter not yet supported
`__rsub__`	Y	See `__sub__`; `level` parameter not yet supported
`__rtruediv__`	Y	See `__truediv__`; `level` parameter not yet supported
`__setitem__`	Y	Can only set if `key` parameter is type `str`
`__setstate__`	N	N/A, Not Yet Implemented
`__sizeof__`	N	N/A, Not Yet Implemented
`__str__`	Y	Blocking call: Must retrieve data from remote
`__sub__`	Y	Requires shuffle when operating on two DataFrames
`__truediv__`	Y	Requires shuffle when operating on two DataFrames
`__unicode__`	N	N/A, Not Yet Implemented
`__xor__`	Y
`abs`	Y
`add`	Y	See `__add__`; `level` parameter not yet supported
`add_prefix`	Y
`add_suffix`	Y
`agg`	Y	Not yet optimized: can return DataFrame or Series; passing a dictionary for the `func` parameter not yet supported; passing the string name of a numpy operation for the `func` parameter not yet supported
`aggregate`	Y	See `agg`
`align`	N	N/A, Not Yet Implemented
`all`	Y	`level` parameter not yet supported
`any`	Y	`level` parameter not yet supported
`append`	Y	Can be further optimized to be non-blocking
`apply`	Y	See `agg`
`applymap`	Y
`as_blocks`	N	N/A, Not Yet Implemented
`as_matrix`	Y	Will not result in a distributed object
`asfreq`	N	N/A, Not Yet Implemented
`asof`	N	N/A, Not Yet Implemented
`assign`	N	N/A, Not Yet Implemented
`astype`	Y
`at`	N	N/A, Not Yet Implemented
`at_time`	N	N/A, Not Yet Implemented
`axes`	Y
`between_time`	N	N/A, Not Yet Implemented
`bfill`	Y
`blocks`	N	N/A, Not Yet Implemented
`bool`	Y
`boxplot`	Y
`clip`	Y
`clip_lower`	Y
`clip_upper`	Y
`columns`	Y
`combine`	N	N/A, Not Yet Implemented
`combine_first`	N	N/A, Not Yet Implemented
`compound`	N	N/A, Not Yet Implemented
`consolidate`	N	N/A, Not Yet Implemented
`convert_objects`	N	N/A, Not Yet Implemented
`copy`	Y	Copy will always make a shallow copy
`corr`	N	N/A, Not Yet Implemented
`corrwith`	N	N/A, Not Yet Implemented
`count`	Y	`level` parameter not yet supported
`cov`	N	N/A, Not Yet Implemented
`cummax`	Y
`cummin`	Y
`cumprod`	Y
`cumsum`	Y
`describe`	Y
`diff`	Y
`div`	Y	See `__div__`; `level` parameter not yet supported
`divide`	Y	See `__div__`; `level` parameter not yet supported
`dot`	N	N/A, Not Yet Implemented
`drop`	Y	`level` parameter not yet supported
`drop_duplicates`	N	N/A, Not Yet Implemented
`dropna`	Y
`dtypes`	Y
`duplicated`	N	N/A, Not Yet Implemented
`empty`	Y
`eq`	Y	See `__eq__`; `level` parameter not yet supported
`equals`	Y	Requires shuffle, can be further optimized
`eval`	Y
`ewm`	N	N/A, Not Yet Implemented
`expanding`	N	N/A, Not Yet Implemented
`ffill`	Y
`fillna`	Y	`value` parameter of type DataFrame not yet supported
`filter`	Y
`first`	N	N/A, Not Yet Implemented
`first_valid_index`	Y
`floordiv`	Y	See `__floordiv__`; `level` parameter not yet supported
`from_csv`	Y
`from_dict`	Y
`from_items`	Y
`from_records`	Y
`ftypes`	Y
`ge`	Y	See `__ge__`; `level` parameter not yet supported
`get`	Y
`get_dtype_counts`	Y
`get_ftype_counts`	Y
`get_value`	N	N/A, Not Yet Implemented
`get_values`	N	N/A, Not Yet Implemented
`groupby`	Y	Not yet optimized, will require a distributed Series; `level` parameter not yet supported; `by` with a list of columns not yet supported
`gt`	Y	See `__gt__`; `level` parameter not yet supported
`head`	Y
`hist`	N	N/A, Not Yet Implemented
`iat`	N	N/A, Not Yet Implemented
`idxmax`	Y
`idxmin`	Y
`iloc`	Y
`index`	Y
`infer_objects`	N	N/A, Not Yet Implemented
`info`	Y
`insert`	Y
`interpolate`	N	N/A, Not Yet Implemented
`is_copy`	N	N/A, Not Yet Implemented
`isin`	Y
`isna`	Y
`isnull`	Y
`items`	Y
`iteritems`	Y
`iterrows`	Y
`itertuples`	Y
`ix`	N	N/A, Not Yet Implemented
`join`	Y	Specifying `on` parameter not yet supported
`keys`	Y
`kurt`	N	N/A, Not Yet Implemented
`kurtosis`	N	N/A, Not Yet Implemented
`last`	N	N/A, Not Yet Implemented
`last_valid_index`	Y
`le`	Y	See `__le__`; `level` parameter not yet supported
`loc`	Y
`lookup`	N	N/A, Not Yet Implemented
`lt`	Y	See `__lt__`; `level` parameter not yet supported
`mad`	N	N/A, Not Yet Implemented
`mask`	N	N/A, Not Yet Implemented
`max`	Y	`level` parameter not yet supported
`mean`	Y	`level` parameter not yet supported
`median`	Y	`level` parameter not yet supported
`melt`	N	N/A, Not Yet Implemented
`memory_usage`	Y
`merge`	Y	Only implemented for `left_index=True` and `right_index=True`
`min`	Y	`level` parameter not yet supported
`mod`	Y	`level` parameter not yet supported
`mode`	Y
`mul`	Y	See `__mul__`; `level` parameter not yet supported
`multiply`	Y	See `__mul__`; `level` parameter not yet supported
`ndim`	Y
`ne`	Y	See `__ne__`; `level` parameter not yet supported
`nlargest`	N	N/A, Not Yet Implemented
`notna`	Y
`notnull`	Y
`nsmallest`	N	N/A, Not Yet Implemented
`nunique`	Y
`pct_change`	N	N/A, Not Yet Implemented
`pipe`	Y
`pivot`	N	N/A, Not Yet Implemented
`pivot_table`	N	N/A, Not Yet Implemented
`plot`	Y
`pop`	Y
`pow`	Y	See `__pow__`; `level` parameter not yet supported
`prod`	Y	`level` parameter not yet supported
`product`	Y	`level` parameter not yet supported
`quantile`	Y
`query`	Y	Local variables not yet supported
`radd`	Y	See `__add__`; `level` parameter not yet supported
`rank`	Y
`rdiv`	Y	See `__div__`; `level` parameter not yet supported
`reindex`	Y	`level` parameter not yet supported
`reindex_axis`	N	N/A, Not Yet Implemented
`reindex_like`	N	N/A, Not Yet Implemented
`rename`	Y	`level` parameter not yet supported
`rename_axis`	Y
`reorder_levels`	N	N/A, Not Yet Implemented
`replace`	N	N/A, Not Yet Implemented
`resample`	N	N/A, Not Yet Implemented
`reset_index`	Y	`level` parameter not yet supported
`rfloordiv`	Y	See `__floordiv__`; `level` parameter not yet supported
`rmod`	Y	See `__mod__`; `level` parameter not yet supported
`rmul`	Y	See `__mul__`; `level` parameter not yet supported
`rolling`	N	N/A, Not Yet Implemented
`round`	Y
`rpow`	Y	See `__pow__`; `level` parameter not yet supported
`rsub`	Y	See `__sub__`; `level` parameter not yet supported
`rtruediv`	Y	See `__truediv__`; `level` parameter not yet supported
`sample`	Y
`select`	N	N/A, Not Yet Implemented
`select_dtypes`	Y
`sem`	N	N/A, Not Yet Implemented
`set_axis`	Y
`set_index`	Y
`set_value`	N	N/A, Not Yet Implemented
`shape`	Y
`shift`	N	N/A, Not Yet Implemented
`size`	Y
`skew`	Y	`level` parameter not yet supported
`slice_shift`	N	N/A, Not Yet Implemented
`sort_index`	Y	`level` parameter not yet supported
`sort_values`	Y	Not optimized, will require a distributed Series
`sortlevel`	N	N/A, Not Yet Implemented
`squeeze`	N	N/A, Not Yet Implemented
`stack`	N	N/A, Not Yet Implemented
`std`	Y	`level` parameter not yet supported
`style`	N	N/A, Not Yet Implemented
`sub`	Y	See `__sub__`; `level` parameter not yet supported
`subtract`	Y	See `__sub__`; `level` parameter not yet supported
`sum`	Y	`level` parameter not yet supported
`swapaxes`	N	N/A, Not Yet Implemented
`swaplevel`	N	N/A, Not Yet Implemented
`tail`	Y
`take`	N	N/A, Not Yet Implemented
`to_clipboard`	Y
`to_csv`	Y
`to_dense`	N	N/A, Not Yet Implemented
`to_dict`	Y
`to_excel`	Y
`to_feather`	Y
`to_gbq`	Y
`to_hdf`	Y
`to_html`	Y
`to_json`	Y
`to_latex`	Y
`to_msgpack`	Y
`to_panel`	N	N/A, Not Yet Implemented
`to_parquet`	Y
`to_period`	N	N/A, Not Yet Implemented
`to_pickle`	Y
`to_records`	Y
`to_sparse`	N	N/A, Not Yet Implemented
`to_sql`	Y
`to_stata`	Y
`to_string`	Y
`to_timestamp`	N	N/A, Not Yet Implemented
`to_xarray`	N	N/A, Not Yet Implemented
`transform`	Y
`transpose`	Y
`truediv`	Y	See `__truediv__`; `level` parameter not yet supported
`truncate`	N	N/A, Not Yet Implemented
`tshift`	N	N/A, Not Yet Implemented
`tz_convert`	N	N/A, Not Yet Implemented
`tz_localize`	N	N/A, Not Yet Implemented
`unstack`	N	N/A, Not Yet Implemented
`update`	Y	`raise_conflict=True` not yet supported
`values`	Y
`var`	Y	`level` parameter not yet supported
`where`	Y	`level` parameter not yet supported
`xs`	N	N/A, Not Yet Implemented
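The sketch below shows a few common operations written to stay within the limitations listed in the table. It sets a column with a `str` key and merges on the index. It formats a value into a `query` string instead of using a local variable. It groups by a single column, and it aggregates column by column instead of passing a dictionary to `agg`. The sketch assumes `modin.pandas` is importable and falls back to plain pandas otherwise. That fallback is an assumption made only so the sketch runs anywhere. The column names and data are hypothetical.

```python
try:
    import modin.pandas as pd
except ImportError:
    # assumption: fall back to plain pandas when Modin is not installed
    import pandas as pd

# hypothetical data sharing the same string index
left = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]}, index=["x", "y", "z"])
right = pd.DataFrame({"c": [100, 200, 300]}, index=["x", "y", "z"])

# __setitem__: assign a new column using a single str key
left["total"] = left["a"] + left["b"]

# merge: join on the index via left_index=True and right_index=True
merged = left.merge(right, left_index=True, right_index=True)

# query: local variables (e.g. @threshold) are listed as unsupported,
# so format the value into the expression string instead
threshold = 15
filtered = merged.query(f"b > {threshold}")

# groupby: group by a single column rather than a list of columns
merged["group"] = ["g1", "g1", "g2"]
group_sums = merged.groupby("group")["total"].sum()

# agg: a dictionary for `func` is listed as unsupported, so aggregate
# each column's Series separately and collect the results in a dict
spec = {"a": "sum", "c": "max"}
per_column = {col: merged[col].agg(func) for col, func in spec.items()}
```

Formatting the value into the `query` string works for simple numeric literals. String values would need their own quoting inside the expression.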

Series

Currently, whenever a Series is used or returned, we use a pandas Series. We plan to implement a distributed Series in the future; until then, Series operations may be a performance bottleneck. The pandas Series is fully compatible with all Modin operations that accept or return a Series.
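Here is a minimal sketch of the Series compatibility described above. A column selected from a DataFrame is a Series, and it can be passed straight back into DataFrame operations such as `__setitem__` and boolean indexing. The data is hypothetical. The plain-pandas fallback is an assumption for environments without Modin. Later Modin versions may return their own Series type rather than a pandas Series, but the calling code is the same.

```python
try:
    import modin.pandas as pd
except ImportError:
    # assumption: fall back to plain pandas when Modin is not installed
    import pandas as pd

# hypothetical data
df = pd.DataFrame({"price": [2.0, 3.5, 5.0], "qty": [4, 2, 1]})

# selecting one column returns a Series
price = df["price"]

# the Series can be used directly in arithmetic and assigned back
df["revenue"] = price * df["qty"]

# it can also serve as a boolean mask for row selection
expensive = df[price > 3.0]
```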

IO

A number of IO methods default to pandas. We have parallelized `read_csv` and `read_parquet`, and many of the remaining methods could be parallelized relatively easily. Methods that default to the pandas implementation read the data serially into a single, non-distributed DataFrame and then distribute it, which affects performance.

IO method	Implemented?	Limitations/notes for current implementation
`read_csv`	Y
`read_parquet`	Y
`read_json`	Y	Defaults to pandas implementation
`read_html`	Y	Defaults to pandas implementation
`read_clipboard`	Y	Defaults to pandas implementation
`read_excel`	Y	Defaults to pandas implementation
`read_hdf`	Y	Defaults to pandas implementation
`read_feather`	Y	Defaults to pandas implementation
`read_msgpack`	Y	Defaults to pandas implementation
`read_stata`	Y	Defaults to pandas implementation
`read_sas`	Y	Defaults to pandas implementation
`read_pickle`	Y	Defaults to pandas implementation
`read_sql`	Y	Defaults to pandas implementation
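The following sketch contrasts a parallelized reader (`read_csv`) with one that defaults to pandas (`read_json`). The calling code is identical, and only the execution strategy behind it differs. The sketch first writes a small hypothetical dataset to a temporary directory. It falls back to plain pandas if Modin is not installed, which is an assumption made for portability.

```python
import os
import tempfile

try:
    import modin.pandas as pd
except ImportError:
    # assumption: fall back to plain pandas when Modin is not installed
    import pandas as pd

# hypothetical sample data
source = pd.DataFrame({"id": [1, 2, 3], "name": ["a", "b", "c"]})

# the directory is left in place for brevity; the files are tiny
tmp_dir = tempfile.mkdtemp()
csv_path = os.path.join(tmp_dir, "data.csv")
json_path = os.path.join(tmp_dir, "data.json")
source.to_csv(csv_path, index=False)
source.to_json(json_path, orient="records")

# read_csv is listed as parallelized
from_csv = pd.read_csv(csv_path)

# read_json is listed as defaulting to pandas: under Modin the file is read
# serially into one pandas DataFrame, which is then distributed
from_json = pd.read_json(json_path, orient="records")
```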

List of Other Supported Operations Available on Import

If you import `modin.pandas` as `pd`, the following operations are available as `pd.<op>`, e.g. `pd.concat`. If you do not see an operation that pandas provides and would like to request it, feel free to open an issue. Make sure you tell us your primary use case so we can make it happen faster!

concat
eval
unique
value_counts
cut
to_numeric
factorize
test
qcut
match
to_datetime
get_dummies
Panel
date_range
Index
MultiIndex
Series
bdate_range
DatetimeIndex
to_timedelta
set_eng_float_format
set_option
CategoricalIndex
Timedelta
Timestamp
NaT
PeriodIndex
Categorical
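This short sketch calls three of the operations above (`pd.concat`, `pd.to_datetime` and `pd.get_dummies`) through the `modin.pandas` namespace. The data is hypothetical. The plain-pandas fallback is an assumption so the sketch runs without Modin installed.

```python
try:
    import modin.pandas as pd
except ImportError:
    # assumption: fall back to plain pandas when Modin is not installed
    import pandas as pd

# hypothetical data
first = pd.DataFrame({"city": ["Oslo", "Lima"], "when": ["2020-01-01", "2020-06-15"]})
second = pd.DataFrame({"city": ["Pune"], "when": ["2021-03-09"]})

# pd.concat stacks the frames; ignore_index renumbers the rows 0..n-1
combined = pd.concat([first, second], ignore_index=True)

# pd.to_datetime parses the string column into timestamps
combined["when"] = pd.to_datetime(combined["when"])

# pd.get_dummies expands the "city" column into one indicator column per value
indicators = pd.get_dummies(combined, columns=["city"])
```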
