Modin Supported Methods

For your convenience, we have compiled a list of currently implemented APIs and methods available in Modin. This documentation is updated as new methods and APIs are merged into the master branch, so it may not match the most recent release. To install the latest version of Modin, follow the directions on the installation page.

Questions on implementation details

If you have a question about the implementation details or would like more information about an API or method in Modin, please contact the Modin developer mailing list.

API Completeness

Currently, we support ~71% of the pandas API. The exact methods we have implemented are listed below.

We have taken a community-driven approach to implementing new methods: we studied pandas usage to learn which APIs are used most. By that usage-based measure, Modin currently supports 93% of the pandas API, and we are actively expanding coverage.
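The two percentages can differ because they may be measuring different things. The sketch below shows one plausible reading, in which the second figure weights each method by how often it is called; the exact calculation behind the study is not spelled out here, and the usage counts are made up purely for illustration.

```python
# Hypothetical usage counts (invented for illustration, not study data).
usage_counts = {"read_csv": 500, "head": 300, "groupby": 150, "stack": 30, "xs": 20}

# Methods from the hypothetical corpus that are marked "Y" in the tables.
implemented = {"read_csv", "head", "groupby"}

# Unweighted coverage: fraction of distinct methods that are implemented.
api_coverage = sum(1 for name in usage_counts if name in implemented) / len(usage_counts)

# Usage-weighted coverage: fraction of observed calls that hit an implemented method.
total_calls = sum(usage_counts.values())
covered_calls = sum(count for name, count in usage_counts.items() if name in implemented)
usage_coverage = covered_calls / total_calls

print(f"API coverage:   {api_coverage:.0%}")
print(f"Usage coverage: {usage_coverage:.0%}")
```

In this toy corpus, a few heavily used methods account for most calls, so the usage-weighted figure is higher than the plain method count.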

Defaulting to pandas

The remaining unimplemented methods default to pandas. This allows users to continue using Modin even though their workloads contain functions not yet implemented in Modin. Here is a diagram of how we convert to pandas and perform the operation:

We first convert to a pandas DataFrame, then perform the operation. Going from a partitioned Modin DataFrame to pandas carries a performance penalty because of the communication cost and the single-threaded nature of pandas. Once the pandas operation has completed, we convert the DataFrame back into a partitioned Modin DataFrame. This way, operations performed after a default to pandas are once again optimized by Modin.
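The round trip described above can be sketched with plain pandas. This is a conceptual illustration only, not Modin's internal code: the `run_via_pandas` helper and the list-of-DataFrames stand-in for a partitioned frame are assumptions made for the example.

```python
import pandas as pd


def run_via_pandas(partitions, operation, num_partitions):
    """Hypothetical helper mimicking the gather -> operate -> repartition steps."""
    # Step 1: gather every row partition into one ordinary pandas DataFrame.
    full = pd.concat(partitions)
    # Step 2: run the operation in single-threaded pandas.
    result = operation(full)
    # Step 3: split the result back into row partitions (ceiling division).
    chunk = max(1, -(-len(result) // num_partitions))
    return [result.iloc[i:i + chunk] for i in range(0, len(result), chunk)]


df = pd.DataFrame({"a": [1, 2, 3, 4, 5, 6], "b": [6, 5, 4, 3, 2, 1]})

# Stand-in for a partitioned frame: two row blocks of the same data.
partitions = [df.iloc[:3], df.iloc[3:]]

# `nlargest` is listed as defaulting to pandas in the DataFrame table.
new_partitions = run_via_pandas(partitions, lambda frame: frame.nlargest(2, "a"), 2)
```

The gather in step 1 and the single-threaded call in step 2 are where the performance penalty comes from; after step 3 the data is partitioned again, so later operations can run in parallel.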

DataFrame

The following table lists both implemented and not-yet-implemented methods. If you need an operation that is listed as not implemented, feel free to open an issue on the GitHub repository. Contributions are also welcome!

DataFrame method	Implemented?	Limitations/Notes for Current implementation
`T`	Y
`__abs__`	Y
`__add__`	Y
`__and__`	Y
`__array__`	Y	Will not result in a distributed object
`__array_wrap__`	Y	Will not result in a distributed object
`__bool__`	Y
`__contains__`	Y
`__copy__`	Y	Copy will always make a shallow copy
`__deepcopy__`	Y	Copy will always make a shallow copy
`__delitem__`	Y
`__div__`	Y	Requires shuffle when operating on two DataFrames
`__eq__`	Y	Requires shuffle when operating on two DataFrames
`__finalize__`	N	Defaults to pandas
`__floordiv__`	Y	Requires shuffle when operating on two DataFrames
`__ge__`	Y	Requires shuffle when operating on two DataFrames
`__getitem__`	Y	Returns a pandas Series (see Series section below); `key` parameter of type DataFrame not yet supported; `MultiIndex` columns default to pandas
`__getstate__`	N	Defaults to pandas
`__gt__`	Y	Requires shuffle when operating on two DataFrames
`__hash__`	N	Defaults to pandas
`__iadd__`	Y	See `__add__`
`__ifloordiv__`	Y	See `__floordiv__`
`__imod__`	Y	See `__mod__`
`__imul__`	Y	See `__mul__`
`__invert__`	N	Defaults to pandas
`__ipow__`	Y	See `__pow__`
`__isub__`	Y	See `__sub__`
`__iter__`	Y
`__itruediv__`	Y	See `__truediv__`
`__le__`	Y	Requires shuffle when operating on two DataFrames
`__len__`	Y
`__lt__`	Y	Requires shuffle when operating on two DataFrames
`__mod__`	Y	Requires shuffle when operating on two DataFrames
`__mul__`	Y	Requires shuffle when operating on two DataFrames
`__ne__`	Y	Requires shuffle when operating on two DataFrames
`__neg__`	Y
`__nonzero__`	Y
`__or__`	Y
`__pow__`	Y	Requires shuffle when operating on two DataFrames
`__radd__`	Y	See `__add__`
`__rdiv__`	Y	See `__div__`
`__repr__`	Y	Blocking call: Must retrieve data from remote
`__rfloordiv__`	Y	See `__floordiv__`
`__rmod__`	Y	See `__mod__`
`__rmul__`	Y	See `__mul__`
`__round__`	N	Defaults to pandas
`__rpow__`	Y	See `__pow__`
`__rsub__`	Y	See `__sub__`
`__rtruediv__`	Y	See `__truediv__`
`__setitem__`	Y	Can only set if `key` parameter is type `str`
`__setstate__`	N	Defaults to pandas
`__sizeof__`	N	Defaults to pandas
`__str__`	Y	Blocking call: Must retrieve data from remote
`__sub__`	Y	Requires shuffle when operating on two DataFrames
`__truediv__`	Y	Requires shuffle when operating on two DataFrames
`__unicode__`	N	Defaults to pandas
`__xor__`	Y
`abs`	Y
`add`	Y	See `__add__`
`add_prefix`	Y
`add_suffix`	Y
`agg`	Y	Not yet optimized: can return DataFrame or Series; passing a dictionary for the `func` parameter not yet supported; passing the string name of a numpy operation for the `func` parameter defaults to pandas
`aggregate`	Y	See `agg`
`align`	N	Defaults to pandas
`all`	Y
`any`	Y
`append`	Y	Can be further optimized to be non-blocking
`apply`	Y	See `agg`
`applymap`	Y
`as_blocks`	N	Defaults to pandas
`as_matrix`	Y	Will not result in a distributed object
`asfreq`	N	Defaults to pandas
`asof`	N	Defaults to pandas
`assign`	N	Defaults to pandas
`astype`	Y
`at`	N	Defaults to pandas
`at_time`	N	Defaults to pandas
`axes`	Y
`between_time`	N	Defaults to pandas
`bfill`	Y
`blocks`	N	Defaults to pandas
`bool`	Y
`boxplot`	Y
`clip`	Y
`clip_lower`	Y
`clip_upper`	Y
`columns`	Y
`combine`	N	Defaults to pandas
`combine_first`	N	Defaults to pandas
`compound`	N	Defaults to pandas
`consolidate`	N	Defaults to pandas
`convert_objects`	N	Defaults to pandas
`copy`	Y	Copy will always make a shallow copy
`corr`	N	Defaults to pandas
`corrwith`	N	Defaults to pandas
`count`	Y
`cov`	N	Defaults to pandas
`cummax`	Y
`cummin`	Y
`cumprod`	Y
`cumsum`	Y
`describe`	Y
`diff`	Y
`div`	Y	See `__div__`
`divide`	Y	See `__div__`
`dot`	N	Defaults to pandas
`drop`	Y
`drop_duplicates`	N	Defaults to pandas
`dropna`	Y
`dtypes`	Y
`duplicated`	N	Defaults to pandas
`empty`	Y
`eq`	Y	See `__eq__`
`equals`	Y	Requires shuffle, can be further optimized
`eval`	Y
`ewm`	N	Defaults to pandas
`expanding`	N	Defaults to pandas
`ffill`	Y
`fillna`	Y	`value` parameter of type DataFrame defaults to pandas
`filter`	Y
`first`	N	Defaults to pandas
`first_valid_index`	Y
`floordiv`	Y	See `__floordiv__`
`from_csv`	Y
`from_dict`	Y
`from_items`	Y
`from_records`	Y
`ftypes`	Y
`ge`	Y	See `__ge__`
`get`	Y
`get_dtype_counts`	Y
`get_ftype_counts`	Y
`get_value`	N	Defaults to pandas
`get_values`	N	Defaults to pandas
`groupby`	Y	Not yet optimized, will require a distributed Series; `by` with a list of columns defaults to pandas
`gt`	Y	See `__gt__`
`head`	Y
`hist`	N	Defaults to pandas
`iat`	N	Defaults to pandas
`idxmax`	Y
`idxmin`	Y
`iloc`	Y
`index`	Y
`infer_objects`	N	Defaults to pandas
`info`	Y
`insert`	Y
`interpolate`	N	Defaults to pandas
`is_copy`	N	Defaults to pandas
`isin`	Y
`isna`	Y
`isnull`	Y
`items`	Y
`iteritems`	Y
`iterrows`	Y
`itertuples`	Y
`ix`	N	Defaults to pandas
`join`	Y
`keys`	Y
`kurt`	N	Defaults to pandas
`kurtosis`	N	Defaults to pandas
`last`	N	Defaults to pandas
`last_valid_index`	Y
`le`	Y	See `__le__`
`loc`	Y
`lookup`	N	Defaults to pandas
`lt`	Y	See `__lt__`
`mad`	N	Defaults to pandas
`mask`	N	Defaults to pandas
`max`	Y
`mean`	Y
`median`	Y
`melt`	N	Defaults to pandas
`memory_usage`	Y
`merge`	Y	Only implemented for `left_index=True` and `right_index=True`, defaults to pandas otherwise
`min`	Y
`mod`	Y
`mode`	Y
`mul`	Y	See `__mul__`
`multiply`	Y	See `__mul__`
`ndim`	Y
`ne`	Y	See `__ne__`
`nlargest`	N	Defaults to pandas
`notna`	Y
`notnull`	Y
`nsmallest`	N	Defaults to pandas
`nunique`	Y
`pct_change`	N	Defaults to pandas
`pipe`	Y
`pivot`	N	Defaults to pandas
`pivot_table`	N	Defaults to pandas
`plot`	Y
`pop`	Y
`pow`	Y	See `__pow__`
`prod`	Y
`product`	Y
`quantile`	Y
`query`	Y	Local variables not yet supported
`radd`	Y	See `__add__`
`rank`	Y
`rdiv`	Y	See `__div__`
`reindex`	Y
`reindex_axis`	N	Defaults to pandas
`reindex_like`	N	Defaults to pandas
`rename`	Y
`rename_axis`	Y
`reorder_levels`	N	Defaults to pandas
`replace`	N	Defaults to pandas
`resample`	N	Defaults to pandas
`reset_index`	Y
`rfloordiv`	Y	See `__floordiv__`
`rmod`	Y	See `__mod__`
`rmul`	Y	See `__mul__`
`rolling`	N	Defaults to pandas
`round`	Y
`rpow`	Y	See `__pow__`
`rsub`	Y	See `__sub__`
`rtruediv`	Y	See `__truediv__`
`sample`	Y
`select`	N	Defaults to pandas
`select_dtypes`	Y
`sem`	N	Defaults to pandas
`set_axis`	Y
`set_index`	Y
`set_value`	N	Defaults to pandas
`shape`	Y
`shift`	N	Defaults to pandas
`size`	Y
`skew`	Y
`slice_shift`	N	Defaults to pandas
`sort_index`	Y
`sort_values`	Y	Not optimized, will require a distributed Series
`sortlevel`	N	Defaults to pandas
`squeeze`	N	Defaults to pandas
`stack`	N	Defaults to pandas
`std`	Y
`style`	N	Defaults to pandas
`sub`	Y	See `__sub__`
`subtract`	Y	See `__sub__`
`sum`	Y
`swapaxes`	N	Defaults to pandas
`swaplevel`	N	Defaults to pandas
`tail`	Y
`take`	N	Defaults to pandas
`to_clipboard`	Y
`to_csv`	Y
`to_dense`	N	Defaults to pandas
`to_dict`	Y
`to_excel`	Y
`to_feather`	Y
`to_gbq`	Y
`to_hdf`	Y
`to_html`	Y
`to_json`	Y
`to_latex`	Y
`to_msgpack`	Y
`to_panel`	N	Defaults to pandas
`to_parquet`	Y
`to_period`	N	Defaults to pandas
`to_pickle`	Y
`to_records`	Y
`to_sparse`	N	Defaults to pandas
`to_sql`	Y
`to_stata`	Y
`to_string`	Y
`to_timestamp`	N	Defaults to pandas
`to_xarray`	N	Defaults to pandas
`transform`	Y
`transpose`	Y
`truediv`	Y	See `__truediv__`
`truncate`	N	Defaults to pandas
`tshift`	N	Defaults to pandas
`tz_convert`	N	Defaults to pandas
`tz_localize`	N	Defaults to pandas
`unstack`	N	Defaults to pandas
`update`	Y	`raise_conflict=True` not yet supported
`values`	Y
`var`	Y
`where`	Y
`xs`	N	Defaults to pandas

Series

Currently, whenever a Series is used or returned, we use a pandas Series. In the future, we're going to implement a distributed Series, but until then there will be some performance bottlenecks. The pandas Series is completely compatible with all operations that both require and return one in Modin.

IO

A number of IO methods default to the pandas implementation. We have parallelized read_csv and read_parquet, and many of the remaining methods could be parallelized relatively easily. A method that defaults to pandas reads the data serially into a single, non-distributed DataFrame and then distributes it, which costs performance.

IO method	Implemented?	Limitations/Notes for Current implementation
`read_csv`	Y
`read_table`	Y
`read_parquet`	Y
`read_json`	Y	Defaults to pandas implementation
`read_html`	Y	Defaults to pandas implementation
`read_clipboard`	Y	Defaults to pandas implementation
`read_excel`	Y	Defaults to pandas implementation
`read_hdf`	Y
`read_feather`	Y	Defaults to pandas implementation
`read_msgpack`	Y	Defaults to pandas implementation
`read_stata`	Y	Defaults to pandas implementation
`read_sas`	Y	Defaults to pandas implementation
`read_pickle`	Y	Defaults to pandas implementation
`read_sql`	Y	Defaults to pandas implementation
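As a minimal sketch of the IO path, the snippet below reads a small CSV file through `read_csv`. The file contents and variable names are made up for the example, and the fallback to plain pandas is an assumption added only so the snippet still runs where Modin is not installed.

```python
import os
import tempfile

try:
    import modin.pandas as pd  # read_csv is listed as parallelized above
except ImportError:
    import pandas as pd  # assumption: fall back so the sketch still runs

with tempfile.TemporaryDirectory() as tmp_dir:
    # Write a tiny CSV file with made-up data.
    csv_path = os.path.join(tmp_dir, "example.csv")
    with open(csv_path, "w") as handle:
        handle.write("a,b\n1,4\n2,5\n3,6\n")

    # Read it back and pull out a few values while the file still exists.
    df = pd.read_csv(csv_path)
    n_rows, n_cols = df.shape
    total_a = int(df["a"].sum())
```

The methods marked "Defaults to pandas implementation" are called the same way; the difference is that the data is read serially before being distributed.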

List of Other Supported Operations Available on Import

If you import modin.pandas as pd, the following operations are available as pd.<op>, e.g. pd.concat. If pandas provides an operation that you do not see here and you would like to request it, feel free to open an issue. Make sure you tell us your primary use-case so we can make it happen faster!

pd.concat
pd.eval
pd.unique
pd.value_counts
pd.cut
pd.to_numeric
pd.factorize
pd.test
pd.qcut
pd.match
pd.to_datetime
pd.get_dummies
pd.Panel
pd.date_range
pd.Index
pd.MultiIndex
pd.Series
pd.bdate_range
pd.DatetimeIndex
pd.to_timedelta
pd.set_eng_float_format
pd.set_option
pd.CategoricalIndex
pd.Timedelta
pd.Timestamp
pd.NaT
pd.PeriodIndex
pd.Categorical
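A short sketch of using two of the operations listed above through the `pd` namespace. The data is invented for illustration, and the fallback to plain pandas is an assumption added only so the snippet runs where Modin is not installed.

```python
try:
    import modin.pandas as pd
except ImportError:
    import pandas as pd  # assumption: fall back so the sketch still runs

# Two small frames with made-up values.
first = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
second = pd.DataFrame({"a": [5, 6], "b": [7, 8]})

# pd.concat stacks the frames row-wise; ignore_index renumbers the rows.
combined = pd.concat([first, second], ignore_index=True)

# pd.to_datetime parses a date string into a timestamp.
stamp = pd.to_datetime("2018-01-01")
```

Because the names match pandas, existing code that calls `pd.concat` or `pd.to_datetime` should need only the import line changed.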
