dataframe_supported.rst

pd.DataFrame supported APIs

The following table lists both implemented and not implemented methods. If you need an operation that is listed as not implemented, feel free to open an issue on the GitHub repository or give a thumbs up to an existing issue. Contributions are also welcome!

The table is structured as follows: the first column contains the method name. The second column contains a link to the description of the corresponding pandas method. The third column is a flag indicating whether Modin implements the method in the first column: Y stands for yes, N stands for no, P stands for partial (meaning some parameters may not be supported yet), and D stands for default to pandas.
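The flag legend above can be sketched in code as a small enumeration. This is only an illustration: the `Status` enum, the `SAMPLE_ROWS` mapping and both helper functions are hypothetical names that are not part of Modin, and the four rows are copied from the table in this document.

```python
from enum import Enum


class Status(Enum):
    """Implementation flags used in the third column of the table."""
    YES = "Y"      # implemented in Modin
    NO = "N"       # not implemented
    PARTIAL = "P"  # implemented, but some parameters may not be supported yet
    DEFAULT = "D"  # defaults to pandas


# A handful of rows copied from the table: method name -> flag.
SAMPLE_ROWS = {
    "abs": "Y",
    "align": "D",
    "corr": "P",
    "sparse": "N",
}


def status_of(method: str) -> Status:
    """Return the status recorded for a method (hypothetical helper)."""
    return Status(SAMPLE_ROWS[method])


def has_modin_implementation(method: str) -> bool:
    """True when the method is fully or partially implemented in Modin."""
    return status_of(method) in (Status.YES, Status.PARTIAL)


print(status_of("corr").name)             # PARTIAL
print(has_modin_implementation("align"))  # False
```

Treating P as "has an implementation" is a simplification: the Notes column still decides which parameters fall back to pandas.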

Note

Currently, the third column reflects the implementation status for the Ray and Dask engines. By default, support for a method in the HDK engine can be treated as D unless the Notes column contains additional information. Similarly, Notes describes the Ray and Dask engines unless Hdk is explicitly mentioned.
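The HDK rule in the note can be expressed as a small lookup over a Notes cell. The sketch below is an assumption about how one might read the table programmatically: `hdk_status` is a hypothetical helper, and it relies on HDK entries following the `Hdk: <flag>` pattern used in the Notes column.

```python
import re

# Matches an explicit HDK flag in a Notes cell,
# e.g. "Hdk: P, ..." or "See add; Hdk: D".
_HDK_FLAG = re.compile(r"\bHdk:\s*([YNPD])\b")


def hdk_status(notes: str) -> str:
    """Return the HDK flag found in a Notes cell, or "D" when none is given."""
    match = _HDK_FLAG.search(notes)
    return match.group(1) if match else "D"


print(hdk_status("Hdk: P, only default params supported, otherwise D"))  # P
print(hdk_status("See add; Hdk: D"))                                     # D
print(hdk_status("Shuffles data"))                                       # D
print(hdk_status(""))                                                    # D
```

An empty or HDK-free Notes cell resolves to D, which mirrors the default stated in the note.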

DataFrame method pandas Doc link Implemented? (Y/N/P/D) Notes for Current implementation
T T Y
abs abs Y
add add Y Ray and Dask: Shuffles data in operations between DataFrames. Hdk: P, supports binary operations on scalars and projections of the same frame, otherwise D
add_prefix add_prefix Y
add_suffix add_suffix Y
agg / aggregate agg / aggregate P
  • Dictionary func parameter defaults to pandas
  • Numpy operations default to pandas
align align D
all all Y
any any Y
apply apply Y See agg
applymap applymap Y
asfreq asfreq D
asof asof Y
assign assign Y
astype astype Y Hdk: P, int <-> float conversion supported
at at Y
at_time at_time Y
axes axes Y
between_time between_time Y
bfill bfill Y
bool bool Y
boxplot boxplot D
clip clip Y
combine combine Y
combine_first combine_first Y
compare compare Y
copy copy Y
corr corr P Correlation floating point precision may differ slightly from pandas. Only the pearson method is currently available; other methods and numeric_only default to pandas.
corrwith corrwith D
count count Y Hdk: P, only default params supported, otherwise D
cov cov P Covariance floating point precision may differ slightly from pandas. Defaults to pandas when numeric_only is given.
cummax cummax Y
cummin cummin Y
cumprod cumprod Y
cumsum cumsum Y
describe describe Y
diff diff Y
div div Y See add
divide divide Y See add
dot dot Y
drop drop Y Hdk: P, since dropping rows is unsupported
droplevel droplevel Y
drop_duplicates drop_duplicates D
dropna dropna Y Hdk: P, since the thresh and axis params are unsupported
dtypes dtypes Y Hdk: Y
duplicated duplicated Y
empty empty Y
eq eq Y See add
equals equals Y Requires shuffle, can be further optimized
eval eval Y
ewm ewm D
expanding expanding D
explode explode Y
ffill ffill Y
fillna fillna P A value parameter of type DataFrame defaults to pandas. Hdk: P, the limit, downcast and method params are unsupported, and only axis=0 is supported for now
filter filter Y
first first Y
first_valid_index first_valid_index Y
floordiv floordiv Y See add
from_dict from_dict D
from_records from_records D
ge ge Y See add
get get Y
groupby groupby Y Not yet optimized for all operations. Hdk: P. count, sum, size, mean, nunique, std, skew supported, otherwise D
gt gt Y See add
head head Y
hist hist D
iat iat Y
idxmax idxmax Y
idxmin idxmin Y
iloc iloc Y Hdk: P, read access fully supported; write access does not support row or 2D assignments
infer_objects infer_objects Y Hdk: D
info info Y
insert insert Y
interpolate interpolate D
isetitem isetitem D
isin isin Y
isna isna Y
isnull isnull Y
items items Y
iterrows iterrows P Modin does not parallelize iteration in Python
itertuples itertuples P Modin does not parallelize iteration in Python
join join P Defaults to pandas when how is set to right or outer, or when validate is given
keys keys Y
kurt kurt Y
kurtosis kurtosis Y
last last Y
last_valid_index last_valid_index Y
le le Y See add
loc loc P Boolean arrays and callables are not supported. Hdk: P, read access fully supported; write access does not support row or 2D assignments
lt lt Y See add
mask mask D
max max Y Hdk: P, only default params supported, otherwise D
mean mean P Modin defaults to pandas if given the level param. Hdk: P. D for level, axis, skipna and numeric_only params
median median P Modin defaults to pandas if given the level param.
melt melt Y
memory_usage memory_usage Y
merge merge P Implemented cases: left_index=True and right_index=True; how=left and how=inner for all parameter values except left_index=True with right_index=False, or left_index=False with right_index=True. Defaults to pandas otherwise. Hdk: P, only non-index joins for how=left and how=inner with explicit on are supported
min min Y Hdk: P, only default params supported, otherwise D
mod mod Y See add
mode mode Y
mul mul Y See add
multiply multiply Y See add
ndim ndim Y
ne ne Y See add
nlargest nlargest Y
notna notna Y
notnull notnull Y
nsmallest nsmallest Y
nunique nunique Y Hdk: P, no support for axis != 0 or dropna=False
pct_change pct_change D
pipe pipe Y
pivot pivot Y
pivot_table pivot_table Y
plot plot D
pop pop Y
pow pow Y See add; Hdk: D
prod prod Y
product product Y
quantile quantile Y
query query P Local variables not yet supported
radd radd Y See add
rank rank Y
rdiv rdiv Y See add; Hdk: D
reindex reindex Y Shuffles data
reindex_like reindex_like D
rename rename Y
rename_axis rename_axis Y
reorder_levels reorder_levels Y
replace replace Y
resample resample Y
reset_index reset_index P Hdk: P, D for the level parameter. Ray and Dask: D when names or allow_duplicates is non-default
rfloordiv rfloordiv Y See add; Hdk: D
rmod rmod Y See add; Hdk: D
rmul rmul Y See add
rolling rolling Y
round round Y
rpow rpow Y See add; Hdk: D
rsub rsub Y See add; Hdk: D
rtruediv rtruediv Y See add; Hdk: D
sample sample Y
select_dtypes select_dtypes Y
sem sem P Modin defaults to pandas if given the level param.
set_axis set_axis Y
set_index set_index Y
shape shape Y Hdk: Y
shift shift Y
size size Y
skew skew P Modin defaults to pandas if given the level param
sort_index sort_index Y
sort_values sort_values Y Shuffles data. Order of indexes that have the same sort key is not guaranteed to be the same across sorts; Hdk: Y
sparse sparse N
squeeze squeeze Y
stack stack Y
std std P Modin defaults to pandas if given the level param.
style style D
sub sub Y See add
subtract subtract Y See add; Hdk: D
sum sum Y Hdk: P, only default params supported, otherwise D
swapaxes swapaxes Y
swaplevel swaplevel Y
tail tail Y
take take Y
to_clipboard to_clipboard D
to_csv to_csv Y
to_dict to_dict D
to_excel to_excel D
to_feather to_feather D
to_gbq to_gbq D
to_hdf to_hdf D
to_html to_html D
to_json to_json D
to_latex to_latex D
to_orc to_orc D
to_parquet to_parquet P Ray/Dask/Unidist: Parallel implementation only if path parameter is a string. In that case, the path parameter specifies a directory where one file is written per row partition of the Modin dataframe.
to_period to_period D
to_pickle to_pickle D Experimental implementation: DataFrame.modin.to_pickle_distributed
to_records to_records D
to_sql to_sql Y
to_stata to_stata D
to_string to_string D
to_timestamp to_timestamp D
to_xarray to_xarray D
transform transform Y
transpose transpose Y
truediv truediv Y See add
truncate truncate Y
tz_convert tz_convert Y
tz_localize tz_localize Y
unstack unstack Y
update update Y
values values Y
value_counts value_counts D
var var P Modin defaults to pandas if given the level param.
where where Y
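To check at run time whether a particular call falls into the D category, one option is to record warnings around the call. The sketch below assumes that Modin reports a fallback through a `UserWarning` whose text contains "defaulting to pandas"; the exact wording may vary between Modin versions. `collect_fallbacks` is a hypothetical helper, and `fake_asfreq` is a stand-in that only simulates such a warning so the example runs without Modin installed.

```python
import warnings


def collect_fallbacks(func, *args, **kwargs):
    """Call func and return (result, list of fallback warning messages)."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record every warning, even repeats
        result = func(*args, **kwargs)
    messages = [
        str(w.message)
        for w in caught
        if issubclass(w.category, UserWarning)
        and "defaulting to pandas" in str(w.message).lower()
    ]
    return result, messages


def fake_asfreq():
    """Stand-in for a D-flagged call; the warning text is illustrative only."""
    warnings.warn(
        "`DataFrame.asfreq` defaulting to pandas implementation.", UserWarning
    )
    return "resampled"


result, fallbacks = collect_fallbacks(fake_asfreq)
print(result)          # resampled
print(len(fallbacks))  # 1
```

With Modin installed, the same helper could wrap a real call, for example `collect_fallbacks(lambda: df.asfreq("D"))` for a Modin DataFrame `df` with a datetime index. Whether a warning is recorded depends on the Modin version and engine in use.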