The following table lists both implemented and not implemented methods. If you need an operation that is listed as not implemented, feel free to open an issue on the GitHub repository or give a thumbs up to an existing issue. Contributions are also welcome!
The table is structured as follows. The first column contains the method name. The second column contains a link to the description of the corresponding pandas method. The third column is a flag indicating whether Modin implements the method in the first column: Y stands for yes, N stands for no, P stands for partial (meaning some parameters may not be supported yet), and D stands for default to pandas.
Note
Currently, the third column reflects the implementation status for the Ray and Dask engines. By default, support for a method in the HDK engine can be treated as D unless the Notes column contains additional information. Similarly, the Notes column describes the Ray and Dask engines by default, unless Hdk is explicitly mentioned.
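The reading rule in the note can be sketched in a few lines of Python. This is only an illustration: `FLAG_MEANINGS`, `effective_flag`, and the regular expression are hypothetical names introduced here and are not part of Modin. The sketch assumes that an HDK-specific status always appears in the Notes cell in the form `Hdk: <flag>`, and it does not follow cross-references such as "See add".

```python
import re

# Legend for the status column, as described in the text above.
FLAG_MEANINGS = {
    "Y": "yes (implemented)",
    "N": "no (not implemented)",
    "P": "partial (some parameters may not be supported yet)",
    "D": "default to pandas",
}

# Matches an explicit HDK flag inside a Notes cell, e.g. "Hdk: P, ...".
# Assumption: HDK-specific notes always use this "Hdk: <flag>" form.
_HDK_FLAG = re.compile(r"Hdk:\s*([YNPD])\b")


def effective_flag(column_flag, notes, engine):
    """Return the status flag that applies to `engine` for one table row.

    Hypothetical helper: it mirrors the reading rule from the note only and
    does not resolve cross-references such as "See add".
    """
    engine = engine.lower()
    if engine in ("ray", "dask"):
        # The third column describes Ray and Dask directly.
        return column_flag
    if engine == "hdk":
        # HDK is treated as "D" unless the notes state a flag explicitly.
        match = _HDK_FLAG.search(notes)
        return match.group(1) if match else "D"
    raise ValueError(f"unknown engine: {engine!r}")


# The abs row has an empty Notes cell, so HDK falls back to "D".
print(effective_flag("Y", "", "hdk"))        # D
# The dtypes row states "Hdk: Y" explicitly.
print(effective_flag("Y", "Hdk: Y", "hdk"))  # Y
```

For rows whose note is only a cross-reference, the referenced row's note has to be read by hand; the sketch deliberately leaves that out.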
DataFrame method | pandas doc link | Implemented? (Y/N/P/D) | Notes for current implementation |
T |
T | Y | |
abs |
abs | Y | |
add |
add | Y | Ray and Dask: Shuffles data in operations between DataFrames. Hdk: P, supports binary operations on scalars and projections of the same frame, otherwise D |
add_prefix |
add_prefix | Y | |
add_suffix |
add_suffix | Y | |
agg / aggregate |
agg / aggregate | P | |
align |
align | D | |
all |
all | Y | |
any |
any | Y | |
apply |
apply | Y | See agg |
applymap |
applymap | Y | |
asfreq |
asfreq | D | |
asof |
asof | Y | |
assign |
assign | Y | |
astype |
astype | Y | Hdk: P, int <-> float supported |
at |
at | Y | |
at_time |
at_time | Y | |
axes |
axes | Y | |
between_time |
between_time | Y | |
bfill |
bfill | Y | |
bool |
bool | Y | |
boxplot |
boxplot | D | |
clip |
clip | Y | |
combine |
combine | Y | |
combine_first |
combine_first | Y | |
compare |
compare | Y | |
copy |
copy | Y | |
corr |
corr | P | Correlation floating-point precision may differ slightly from pandas. Currently only the pearson method is available; other methods and the numeric_only parameter default to pandas. |
corrwith |
corrwith | D | |
count |
count | Y | Hdk: P, only default params supported, otherwise D |
cov |
cov | P | Covariance floating-point precision may differ slightly from pandas. The numeric_only parameter defaults to pandas. |
cummax |
cummax | Y | |
cummin |
cummin | Y | |
cumprod |
cumprod | Y | |
cumsum |
cumsum | Y | |
describe |
describe | Y | |
diff |
diff | Y | |
div |
div | Y | See add |
divide |
divide | Y | See add |
dot |
dot | Y | |
drop |
drop | Y | Hdk: P, since row drop is unsupported |
droplevel |
droplevel | Y | |
drop_duplicates |
drop_duplicates | D | |
dropna |
dropna | Y | Hdk: P, since the thresh and axis params are unsupported |
dtypes |
dtypes | Y | Hdk: Y |
duplicated |
duplicated | Y | |
empty |
empty | Y | |
eq |
eq | Y | See add |
equals |
equals | Y | Requires shuffle, can be further optimized |
eval |
eval | Y | |
ewm |
ewm | D | |
expanding |
expanding | D | |
explode |
explode | Y | |
ffill |
ffill | Y | |
fillna |
fillna | P | A value parameter of type DataFrame defaults to pandas. Hdk: P, the limit, downcast and method params are unsupported, and only axis=0 is supported for now |
filter |
filter | Y | |
first |
first | Y | |
first_valid_index |
first_valid_index | Y | |
floordiv |
floordiv | Y | See add |
from_dict |
from_dict | D | |
from_records |
from_records | D | |
ge |
ge | Y | See add |
get |
get | Y | |
groupby |
groupby | Y | Not yet optimized for all operations. Hdk: P. count, sum, size, mean, nunique, std, skew supported, otherwise D |
gt |
gt | Y | See add |
head |
head | Y | |
hist |
hist | D | |
iat |
iat | Y | |
idxmax |
idxmax | Y | |
idxmin |
idxmin | Y | |
iloc |
iloc | Y | Hdk: P, read access fully supported; write access does not support row and 2D assignments |
infer_objects |
infer_objects | Y | Hdk: D |
info |
info | Y | |
insert |
insert | Y | |
interpolate |
interpolate | D | |
isetitem |
isetitem | D | |
isin |
isin | Y | |
isna |
isna | Y | |
isnull |
isnull | Y | |
items |
items | Y | |
iterrows |
iterrows | P | Modin does not parallelize iteration in Python |
itertuples |
itertuples | P | Modin does not parallelize iteration in Python |
join |
join | P | Defaults to pandas when how is set to right or outer, or when validate is given |
keys |
keys | Y | |
kurt |
kurt | Y | |
kurtosis |
kurtosis | Y | |
last |
last | Y | |
last_valid_index |
last_valid_index | Y | |
le |
le | Y | See add |
loc |
loc | P | Boolean array and callable indexers are not supported. Hdk: P, read access fully supported; write access does not support row and 2D assignments |
lt |
lt | Y | See add |
mask |
mask | D | |
max |
max | Y | Hdk: P, only default params supported, otherwise D |
mean |
mean | P | Modin defaults to pandas if given the level param. Hdk: P. D for level, axis, skipna and numeric_only params |
median |
median | P | Modin defaults to pandas if given the level param. |
melt |
melt | Y | |
memory_usage |
memory_usage | Y | |
merge |
merge | P | Implemented for the following cases: left_index=True and right_index=True; how=left and how=inner for all parameter values except (left_index=True and right_index=False) or (left_index=False and right_index=True). Defaults to pandas otherwise. Hdk: P, only non-index joins for how=left and how=inner with explicit on are supported |
min |
min | Y | Hdk: P, only default params supported, otherwise D |
mod |
mod | Y | See add |
mode |
mode | Y | |
mul |
mul | Y | See add |
multiply |
multiply | Y | See add |
ndim |
ndim | Y | |
ne |
ne | Y | See add |
nlargest |
nlargest | Y | |
notna |
notna | Y | |
notnull |
notnull | Y | |
nsmallest |
nsmallest | Y | |
nunique |
nunique | Y | Hdk: P, no support for axis!=0 and dropna=False |
pct_change |
pct_change | D | |
pipe |
pipe | Y | |
pivot |
pivot | Y | |
pivot_table |
pivot_table | Y | |
plot |
plot | D | |
pop |
pop | Y | |
pow |
pow | Y | See add; Hdk: D |
prod |
prod | Y | |
product |
product | Y | |
quantile |
quantile | Y | |
query |
query | P | Local variables not yet supported |
radd |
radd | Y | See add |
rank |
rank | Y | |
rdiv |
rdiv | Y | See add; Hdk: D |
reindex |
reindex | Y | Shuffles data |
reindex_like |
reindex_like | D | |
rename |
rename | Y | |
rename_axis |
rename_axis | Y | |
reorder_levels |
reorder_levels | Y | |
replace |
replace | Y | |
resample |
resample | Y | |
reset_index |
reset_index | P | Hdk: P. D for level parameter. Ray and Dask: D when names or allow_duplicates is non-default |
rfloordiv |
rfloordiv | Y | See add; Hdk: D |
rmod |
rmod | Y | See add; Hdk: D |
rmul |
rmul | Y | See add |
rolling |
rolling | Y | |
round |
round | Y | |
rpow |
rpow | Y | See add; Hdk: D |
rsub |
rsub | Y | See add; Hdk: D |
rtruediv |
rtruediv | Y | See add; Hdk: D |
sample |
sample | Y | |
select_dtypes |
select_dtypes | Y | |
sem |
sem | P | Modin defaults to pandas if given the level param. |
set_axis |
set_axis | Y | |
set_index |
set_index | Y | |
shape |
shape | Y | Hdk: Y |
shift |
shift | Y | |
size |
size | Y | |
skew |
skew | P | Modin defaults to pandas if given the level param. |
sort_index |
sort_index | Y | |
sort_values |
sort_values | Y | Shuffles data. The order of rows that share the same sort key is not guaranteed to be the same across sorts; Hdk: Y |
sparse |
sparse | N | |
squeeze |
squeeze | Y | |
stack |
stack | Y | |
std |
std | P | Modin defaults to pandas if given the level param. |
style |
style | D | |
sub |
sub | Y | See add |
subtract |
subtract | Y | See add; Hdk: D |
sum |
sum | Y | Hdk: P, only default params supported, otherwise D |
swapaxes |
swapaxes | Y | |
swaplevel |
swaplevel | Y | |
tail |
tail | Y | |
take |
take | Y | |
to_clipboard |
to_clipboard | D | |
to_csv |
to_csv | Y | |
to_dict |
to_dict | D | |
to_excel |
to_excel | D | |
to_feather |
to_feather | D | |
to_gbq |
to_gbq | D | |
to_hdf |
to_hdf | D | |
to_html |
to_html | D | |
to_json |
to_json | D | |
to_latex |
to_latex | D | |
to_orc |
to_orc | D | |
to_parquet |
to_parquet | P | Ray/Dask/Unidist: Parallel implementation only if the path parameter is a string. In that case, the path parameter specifies a directory where one file is written per row partition of the Modin dataframe. |
to_period |
to_period | D | |
to_pickle |
to_pickle | D | Experimental implementation: to_pickle_distributed |
to_records |
to_records | D | |
to_sql |
to_sql | Y | |
to_stata |
to_stata | D | |
to_string |
to_string | D | |
to_timestamp |
to_timestamp | D | |
to_xarray |
to_xarray | D | |
transform |
transform | Y | |
transpose |
transpose | Y | |
truediv |
truediv | Y | See add |
truncate |
truncate | Y | |
tz_convert |
tz_convert | Y | |
tz_localize |
tz_localize | Y | |
unstack |
unstack | Y | |
update |
update | Y | |
values |
values | Y | |
value_counts |
value_counts | D | |
var |
var | P | Modin defaults to pandas if given the level param. |
where |
where | Y | |
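Because several rows are marked D or P, it can be useful to check at runtime whether a particular call fell back to pandas. The sketch below is a minimal, hedged example. It assumes that Modin signals a fallback by emitting a Python warning whose text contains "defaulting to pandas"; that message text is an assumption here and may differ between Modin versions. `call_and_report_fallback` and `demo_with_modin` are hypothetical names introduced for illustration.

```python
import warnings


def call_and_report_fallback(func, *args, **kwargs):
    """Call `func` and report whether a fallback warning was observed.

    Returns a `(result, fell_back)` tuple. Assumption: a fallback is
    announced through a warning containing "defaulting to pandas".
    """
    with warnings.catch_warnings(record=True) as caught:
        # Record every warning, even ones that were already shown once.
        warnings.simplefilter("always")
        result = func(*args, **kwargs)
    fell_back = any(
        "defaulting to pandas" in str(w.message).lower() for w in caught
    )
    return result, fell_back


def demo_with_modin():
    """Optional demo; requires Modin and is not called automatically."""
    import modin.pandas as pd

    df = pd.DataFrame({"a": [1.0, None, 3.0]})
    # interpolate is listed as D in the table above, so a fallback
    # would be expected here if the assumption about the warning holds.
    _, fell_back = call_and_report_fallback(df.interpolate)
    print("interpolate fell back to pandas:", fell_back)
```

The helper takes any callable, so it can wrap a bound DataFrame method together with the exact arguments of interest. This matters for rows marked P, where only some parameter combinations default to pandas.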