![LOGO](../../../img/MODIN_ver2_hrz.png)

<center><h2>Scale your pandas workflows by changing one line of code</h2></center>


# Exercise 3: Not Implemented

**GOAL**: Learn what happens when a function is not yet supported in Modin.

When functionality has not yet been implemented for `HdkOnNative` execution, we default to pandas as follows:

![](../../../img/hdk_convert_to_pandas.png)

We convert the Modin dataframe to a `pyarrow.Table`, perform a lazy tree execution in HDK, render the result as a `pyarrow.Table`, convert it to pandas to perform the operation, and then convert it back to Modin when complete. These conversions carry a large overhead due to the communication involved, so the operation will take longer than it would in pandas.

When this happens, a warning informs the user that the operation will take longer than usual. For example, `DataFrame.mask` is not supported, so a user who calls it will see this warning:

```
UserWarning: `DataFrame.mask` defaulting to pandas implementation.
```
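To make these warnings easier to spot in a longer workflow, they can be collected with the standard `warnings` module. The following is a minimal sketch, not part of Modin: the helper `collect_default_to_pandas` and the stand-in function `fake_mask` are hypothetical names, and `fake_mask` only imitates the warning text shown above so that the example runs without Modin installed.

```python
import warnings


def collect_default_to_pandas(func, *args, **kwargs):
    """Run func and return (result, list of default-to-pandas warning messages)."""
    with warnings.catch_warnings(record=True) as caught:
        # Record every warning, even ones that were already shown once.
        warnings.simplefilter("always")
        result = func(*args, **kwargs)
    messages = [
        str(w.message)
        for w in caught
        if issubclass(w.category, UserWarning)
        and "defaulting to pandas" in str(w.message)
    ]
    return result, messages


def fake_mask():
    """Hypothetical stand-in that emits the same warning text as the example above."""
    warnings.warn("`DataFrame.mask` defaulting to pandas implementation.", UserWarning)
    return "masked"


result, messages = collect_default_to_pandas(fake_mask)
print(messages)
```

With a real Modin dataframe `df`, the same helper could be called as `collect_default_to_pandas(lambda: df.mask(df < 50))`; a non-empty list of messages would indicate that the call fell back to pandas.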

#### Relational engine limitations
Because `HdkOnNative` execution is backed by a relational-algebra-based database engine, only a limited set of operations can run natively in Modin with this execution. For example, arbitrary functions in `DataFrame.apply` are not supported because the HDK engine can't execute Python callables against its tables. This means that `DataFrame.apply(python_callable)` will **always** default to pandas.

For more information about `HdkOnNative` limitations, see the corresponding section of the documentation: [relational algebra limitations](https://modin.readthedocs.io/en/stable/flow/modin/experimental/core/execution/native/implementations/hdk_on_native/index.html#relational-engine-limitations).

If your workflow consists mainly of operations outside relational algebra, you are better off choosing a non-HDK execution (for example, `PandasOnRay`).
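As a rough sketch of what choosing `PandasOnRay` could look like, the helper below sets the storage format and engine through `modin.config`. It assumes Modin and Ray are installed; the helper name `use_pandas_on_ray` is hypothetical and only wraps the two configuration calls.

```python
def use_pandas_on_ray():
    """Select the PandasOnRay execution (hypothetical helper, not part of Modin)."""
    # Imported lazily so that defining the helper does not require Modin.
    import modin.config as cfg

    cfg.StorageFormat.put("pandas")  # pandas storage format instead of "hdk"
    cfg.Engine.put("ray")  # Ray as the execution engine


# Call the helper before creating any Modin dataframes, for example:
# use_pandas_on_ray()
# import modin.pandas as pd
```

Setting the configuration before any dataframe is created keeps all dataframes in the session on the same execution.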

## Concept for exercise: Default to pandas

In this section of the exercise, we will see first-hand how the runtime is affected by operations that are not implemented.

In [None]:
import modin.pandas as pd
import pandas
import numpy as np
import time
import modin.config as cfg
cfg.StorageFormat.put("hdk")

frame_data = np.random.randint(0, 100, size=(2**18, 2**8))
df = pd.DataFrame(frame_data)

In [None]:
pandas_df = pandas.DataFrame(frame_data)

In [None]:
modin_start = time.time()

print(df.mask(df < 50))

modin_end = time.time()
print("Modin mask took {} seconds.".format(round(modin_end - modin_start, 4)))

In [None]:
pandas_start = time.time()

print(pandas_df.mask(pandas_df < 50))

pandas_end = time.time()
print("pandas mask took {} seconds.".format(round(pandas_end - pandas_start, 4)))
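The start/end timing pattern used in the cells above can also be wrapped in a small reusable helper. This is a minimal sketch using only the standard library; the context manager `timed` and the `timings` dictionary are hypothetical names, and `time.sleep` stands in for the dataframe operation being measured.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results=None):
    """Print how long the enclosed block took and optionally store it in results."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        if results is not None:
            results[label] = elapsed
        print("{} took {} seconds.".format(label, round(elapsed, 4)))


timings = {}
with timed("sleep", timings):
    time.sleep(0.01)  # stand-in for an operation such as df.mask(df < 50)
```

Storing the elapsed times in one dictionary makes it straightforward to compare the Modin and pandas runs side by side after both cells have executed.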

## To request a feature please open an issue: https://github.com/modin-project/modin/issues

For a complete list of what is implemented, see the [Supported APIs](https://modin.readthedocs.io/en/latest/supported_apis/index.html) section.