Modin Supported Methods

For your convenience, we have compiled a list of the APIs and methods currently implemented in Modin. This documentation is updated as new methods and APIs are merged into the master branch, so it is not necessarily accurate for the most recent release. To install the latest version of Modin, follow the directions on the installation page.

Questions on implementation details

If you have a question about the implementation details or would like more information about an API or method in Modin, please contact the Modin developer mailing list.

API Completeness

We have taken a community-driven approach to implementing new methods. We studied pandas usage to learn which APIs are used most. We currently support 93% of the pandas API based on usage, and are actively expanding the API.

DataFrame

The following table lists both implemented and not implemented methods. If you need an operation that is listed as not implemented, feel free to open an issue on the GitHub repository. Contributions are also welcome!

DataFrame method Implemented? Limitations/Notes for Current implementation
T Y
__abs__ Y
__add__ Y
__and__ Y
__array__ Y Will not result in a distributed object
__array_wrap__ Y Will not result in a distributed object
__bool__ Y
__contains__ Y
__copy__ Y Copy will always make a shallow copy
__deepcopy__ Y Copy will always make a shallow copy
__delitem__ Y
__div__ Y Requires shuffle when operating on two DataFrames
__eq__ Y Requires shuffle when operating on two DataFrames
__finalize__ N N/A, Not Yet Implemented
__floordiv__ Y Requires shuffle when operating on two DataFrames
__ge__ Y Requires shuffle when operating on two DataFrames
__getitem__ Y Returns a pandas Series (see Series section below); key parameter of type DataFrame not yet supported; MultiIndex columns not yet supported
__getstate__ N N/A, Not Yet Implemented
__gt__ Y Requires shuffle when operating on two DataFrames
__hash__ N N/A, Not Yet Implemented
__iadd__ Y See __add__
__ifloordiv__ Y See __floordiv__
__imod__ Y See __mod__
__imul__ Y See __mul__
__invert__ N N/A, Not Yet Implemented
__ipow__ Y See __pow__
__isub__ Y See __sub__
__iter__ Y
__itruediv__ Y See __truediv__
__le__ Y Requires shuffle when operating on two DataFrames
__len__ Y
__lt__ Y Requires shuffle when operating on two DataFrames
__mod__ Y Requires shuffle when operating on two DataFrames
__mul__ Y Requires shuffle when operating on two DataFrames
__ne__ Y Requires shuffle when operating on two DataFrames
__neg__ Y
__nonzero__ Y
__or__ Y
__pow__ Y Requires shuffle when operating on two DataFrames
__radd__ Y See __add__; level parameter not yet supported
__rdiv__ Y See __div__; level parameter not yet supported
__repr__ Y Blocking call: Must retrieve data from remote
__rfloordiv__ Y See __floordiv__; level parameter not yet supported
__rmod__ Y See __mod__; level parameter not yet supported
__rmul__ Y See __mul__; level parameter not yet supported
__round__ N N/A, Not Yet Implemented
__rpow__ Y See __pow__; level parameter not yet supported
__rsub__ Y See __sub__; level parameter not yet supported
__rtruediv__ Y See __truediv__; level parameter not yet supported
__setitem__ Y Can only set if key parameter is of type str
__setstate__ N N/A, Not Yet Implemented
__sizeof__ N N/A, Not Yet Implemented
__str__ Y Blocking call: Must retrieve data from remote
__sub__ Y Requires shuffle when operating on two DataFrames
__truediv__ Y Requires shuffle when operating on two DataFrames
__unicode__ N N/A, Not Yet Implemented
__xor__ Y
abs Y
add Y See __add__; level parameter not yet supported
add_prefix Y
add_suffix Y
agg Y Not yet optimized: can return DataFrame or Series; passing a dictionary for the func parameter not yet supported; passing the string name of a numpy operation for the func parameter not yet supported
aggregate Y See agg
align N N/A, Not Yet Implemented
all Y level parameter not yet supported
any Y level parameter not yet supported
append Y Can be further optimized to be non-blocking
apply Y See agg
applymap Y
as_blocks N N/A, Not Yet Implemented
as_matrix Y Will not result in a distributed object
asfreq N N/A, Not Yet Implemented
asof N N/A, Not Yet Implemented
assign N N/A, Not Yet Implemented
astype Y
at N N/A, Not Yet Implemented
at_time N N/A, Not Yet Implemented
axes Y
between_time N N/A, Not Yet Implemented
bfill Y
blocks N N/A, Not Yet Implemented
bool Y
boxplot Y
clip Y
clip_lower Y
clip_upper Y
columns Y
combine N N/A, Not Yet Implemented
combine_first N N/A, Not Yet Implemented
compound N N/A, Not Yet Implemented
consolidate N N/A, Not Yet Implemented
convert_objects N N/A, Not Yet Implemented
copy Y Copy will always make a shallow copy
corr N N/A, Not Yet Implemented
corrwith N N/A, Not Yet Implemented
count Y level parameter not yet supported
cov N N/A, Not Yet Implemented
cummax Y
cummin Y
cumprod Y
cumsum Y
describe Y
diff Y
div Y See __div__; level parameter not yet supported
divide Y See __div__; level parameter not yet supported
dot N N/A, Not Yet Implemented
drop Y level parameter not yet supported
drop_duplicates N N/A, Not Yet Implemented
dropna Y
dtypes Y
duplicated N N/A, Not Yet Implemented
empty Y
eq Y See __eq__; level parameter not yet supported
equals Y Requires shuffle, can be further optimized
eval Y
ewm N N/A, Not Yet Implemented
expanding N N/A, Not Yet Implemented
ffill Y
fillna Y value parameter of type DataFrame not yet supported
filter Y
first N N/A, Not Yet Implemented
first_valid_index Y
floordiv Y See __floordiv__; level parameter not yet supported
from_csv Y
from_dict Y
from_items Y
from_records Y
ftypes Y
ge Y See __ge__; level parameter not yet supported
get Y
get_dtype_counts Y
get_ftype_counts Y
get_value N N/A, Not Yet Implemented
get_values N N/A, Not Yet Implemented
groupby Y Not yet optimized, will require a distributed Series; level parameter not yet supported; by parameter with a list of columns not yet supported
gt Y See __gt__; level parameter not yet supported
head Y
hist N N/A, Not Yet Implemented
iat N N/A, Not Yet Implemented
idxmax Y
idxmin Y
iloc Y
index Y
infer_objects N N/A, Not Yet Implemented
info Y
insert Y
interpolate N N/A, Not Yet Implemented
is_copy N N/A, Not Yet Implemented
isin Y
isna Y
isnull Y
items Y
iteritems Y
iterrows Y
itertuples Y
ix N N/A, Not Yet Implemented
join Y Specifying on parameter not yet supported
keys Y
kurt N N/A, Not Yet Implemented
kurtosis N N/A, Not Yet Implemented
last N N/A, Not Yet Implemented
last_valid_index Y
le Y See __le__; level parameter not yet supported
loc Y
lookup N N/A, Not Yet Implemented
lt Y See __lt__; level parameter not yet supported
mad N N/A, Not Yet Implemented
mask N N/A, Not Yet Implemented
max Y level parameter not yet supported
mean Y level parameter not yet supported
median Y level parameter not yet supported
melt N N/A, Not Yet Implemented
memory_usage Y
merge Y Only implemented for left_index=True and right_index=True
min Y level parameter not yet supported
mod Y level parameter not yet supported
mode Y
mul Y See __mul__; level parameter not yet supported
multiply Y See __mul__; level parameter not yet supported
ndim Y
ne Y See __ne__; level parameter not yet supported
nlargest N N/A, Not Yet Implemented
notna Y
notnull Y
nsmallest N N/A, Not Yet Implemented
nunique Y
pct_change N N/A, Not Yet Implemented
pipe Y
pivot N N/A, Not Yet Implemented
pivot_table N N/A, Not Yet Implemented
plot Y
pop Y
pow Y See __pow__; level parameter not yet supported
prod Y level parameter not yet supported
product Y level parameter not yet supported
quantile Y
query Y Local variables not yet supported
radd Y See __add__; level parameter not yet supported
rank Y
rdiv Y See __div__; level parameter not yet supported
reindex Y level parameter not yet supported
reindex_axis N N/A, Not Yet Implemented
reindex_like N N/A, Not Yet Implemented
rename Y level parameter not yet supported
rename_axis Y
reorder_levels N N/A, Not Yet Implemented
replace N N/A, Not Yet Implemented
resample N N/A, Not Yet Implemented
reset_index Y level parameter not yet supported
rfloordiv Y See __floordiv__; level parameter not yet supported
rmod Y See __mod__; level parameter not yet supported
rmul Y See __mul__; level parameter not yet supported
rolling N N/A, Not Yet Implemented
round Y
rpow Y See __pow__; level parameter not yet supported
rsub Y See __sub__; level parameter not yet supported
rtruediv Y See __truediv__; level parameter not yet supported
sample Y
select N N/A, Not Yet Implemented
select_dtypes Y
sem N N/A, Not Yet Implemented
set_axis Y
set_index Y
set_value N N/A, Not Yet Implemented
shape Y
shift N N/A, Not Yet Implemented
size Y
skew Y level parameter not yet supported
slice_shift N N/A, Not Yet Implemented
sort_index Y level parameter not yet supported
sort_values Y Not optimized, will require a distributed Series
sortlevel N N/A, Not Yet Implemented
squeeze N N/A, Not Yet Implemented
stack N N/A, Not Yet Implemented
std Y level parameter not yet supported
style N N/A, Not Yet Implemented
sub Y See __sub__; level parameter not yet supported
subtract Y See __sub__; level parameter not yet supported
sum Y level parameter not yet supported
swapaxes N N/A, Not Yet Implemented
swaplevel N N/A, Not Yet Implemented
tail Y
take N N/A, Not Yet Implemented
to_clipboard Y
to_csv Y
to_dense N N/A, Not Yet Implemented
to_dict Y
to_excel Y
to_feather Y
to_gbq Y
to_hdf Y
to_html Y
to_json Y
to_latex Y
to_msgpack Y
to_panel N N/A, Not Yet Implemented
to_parquet Y
to_period N N/A, Not Yet Implemented
to_pickle Y
to_records Y
to_sparse N N/A, Not Yet Implemented
to_sql Y
to_stata Y
to_string Y
to_timestamp N N/A, Not Yet Implemented
to_xarray N N/A, Not Yet Implemented
transform Y
transpose Y
truediv Y See __truediv__; level parameter not yet supported
truncate N N/A, Not Yet Implemented
tshift N N/A, Not Yet Implemented
tz_convert N N/A, Not Yet Implemented
tz_localize N N/A, Not Yet Implemented
unstack N N/A, Not Yet Implemented
update Y raise_conflict=True not yet supported
values Y
var Y level parameter not yet supported
where Y level parameter not yet supported
xs N N/A, Not Yet Implemented
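
Many arithmetic and comparison rows above note "Requires shuffle when operating on two DataFrames". A likely reason is label alignment: a binary operation pairs rows by index label rather than by position, so matching rows may sit in different partitions and have to be brought together first. The sketch below uses plain pandas only (no Modin) to show that alignment; the frame contents are made up for illustration.

```python
import pandas as pd

# Row labels overlap only partly and appear in a different order.
left = pd.DataFrame({"v": [1, 2, 3]}, index=["a", "b", "c"])
right = pd.DataFrame({"v": [10, 20, 30]}, index=["c", "a", "d"])

# The + operator matches rows by label: "a" pairs with "a", "c" with "c".
# Labels present in only one frame ("b" and "d") produce NaN.
result = left + right
print(result)
```
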

Series

Currently, whenever a Series is used or returned, we use a pandas Series. In the future, we plan to implement a distributed Series; until then, there will be some performance bottlenecks. The pandas Series is fully compatible with every Modin operation that accepts or returns a Series.
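A minimal sketch of Series-producing operations, assuming Modin may or may not be installed: the import falls back to plain pandas so the snippet runs either way. Which concrete Series class comes back depends on the Modin version installed; for the version this page describes, it would be a pandas Series. The column names and values are illustrative only.

```python
# Prefer Modin when it is installed; otherwise use plain pandas.
try:
    import modin.pandas as pd
except ImportError:
    import pandas as pd

df = pd.DataFrame({"price": [2.5, 4.0, 1.5], "qty": [4, 1, 2]})

# Selecting a single column returns a Series.
price = df["price"]

# Reducing over the rows returns a Series indexed by column name.
totals = df.sum()

# Series arithmetic aligns on the shared row index, as in pandas.
revenue = price * df["qty"]
```
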

IO

A number of IO methods default to pandas. We have parallelized read_csv and read_parquet, and many of the remaining methods could be parallelized relatively easily. Methods that default to the pandas implementation read the data serially into a single, non-distributed DataFrame and then distribute it, which affects performance.
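The "defaults to pandas" path described above (a serial read, then distribution) can be sketched in plain pandas. `read_then_partition` is a hypothetical helper written for illustration, not Modin's internal code. It does a single serial read and then splits the result into contiguous row blocks, which shows where the serial bottleneck sits.

```python
import io

import pandas


def read_then_partition(reader, source, num_partitions=2, **kwargs):
    """Hypothetical illustration: serial pandas read, then split into row blocks."""
    # Step 1: one single-process read produces one in-memory DataFrame.
    frame = reader(source, **kwargs)
    # Step 2: ceiling division sizes the blocks so every row lands in exactly one.
    n_rows = len(frame)
    block = max(1, -(-n_rows // num_partitions))
    return [frame.iloc[start:start + block] for start in range(0, n_rows, block)]


records = io.StringIO('[{"a": 1, "b": 2}, {"a": 3, "b": 4}, {"a": 5, "b": 6}]')
parts = read_then_partition(pandas.read_json, records, num_partitions=2, orient="records")
```

The whole file still passes through one reader before any splitting happens, so the read time does not shrink as more partitions are added.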

IO method Implemented? Limitations/Notes for Current implementation
read_csv Y
read_parquet Y
read_json Y Defaults to pandas implementation
read_html Y Defaults to pandas implementation
read_clipboard Y Defaults to pandas implementation
read_excel Y Defaults to pandas implementation
read_hdf Y Defaults to pandas implementation
read_feather Y Defaults to pandas implementation
read_msgpack Y Defaults to pandas implementation
read_stata Y Defaults to pandas implementation
read_sas Y Defaults to pandas implementation
read_pickle Y Defaults to pandas implementation
read_sql Y Defaults to pandas implementation

List of Other Supported Operations Available on Import

If you import modin.pandas as pd, the following operations are available as pd.<op>, e.g. pd.concat. If you do not see an operation that pandas provides and would like to request it, feel free to open an issue. Make sure you tell us your primary use case so we can make it happen faster!

  • concat
  • eval
  • unique
  • value_counts
  • cut
  • to_numeric
  • factorize
  • test
  • qcut
  • match
  • to_datetime
  • get_dummies
  • Panel
  • date_range
  • Index
  • MultiIndex
  • Series
  • bdate_range
  • DatetimeIndex
  • to_timedelta
  • set_eng_float_format
  • set_option
  • CategoricalIndex
  • Timedelta
  • Timestamp
  • NaT
  • PeriodIndex
  • Categorical
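A short sketch of a few of the listed top-level operations (concat, to_datetime, get_dummies, Series), assuming Modin is installed. The import falls back to plain pandas so the snippet also runs without Modin. The data values are made up for illustration.

```python
# Prefer Modin when it is installed; otherwise use plain pandas.
try:
    import modin.pandas as pd
except ImportError:
    import pandas as pd

first = pd.DataFrame({"x": [1, 2]})
second = pd.DataFrame({"x": [3]})

# pd.concat stacks the frames; ignore_index renumbers the rows 0..n-1.
combined = pd.concat([first, second], ignore_index=True)

# pd.to_datetime parses date strings into timestamps.
dates = pd.to_datetime(pd.Series(["2018-01-01", "2018-06-30"]))

# pd.get_dummies one-hot encodes the distinct values into indicator columns.
dummies = pd.get_dummies(pd.Series(["red", "blue", "red"]))
```
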