![LOGO](../../../img/MODIN_ver2_hrz.png)

<center><h2>Scale your pandas workflows by changing one line of code</h2></center>



# Exercise 4: Experimental Features

**GOAL**: Explore some of the experimental features being added to Modin.

### Concept for exercise: Progress Bar


Sometimes when running long functions on DataFrames, it can be hard to tell how much progress has been made, as well as how much longer the function will run. A progress bar allows users to see the estimated progress and completion time of each line they run, in environments such as a shell or Jupyter notebook.
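A progress bar's time estimate is usually an extrapolation from the work finished so far. The plain-Python sketch below illustrates that idea. It assumes the work is split into equally sized units, and the helper names `estimate_remaining` and `format_status` are hypothetical. It is not Modin's internal implementation.

```python
from typing import Optional


def estimate_remaining(elapsed_s: float, done: int, total: int) -> Optional[float]:
    """Estimate seconds left, assuming every unit of work takes about as long."""
    if done == 0:
        return None  # no finished work yet, so there is no rate to extrapolate from
    seconds_per_unit = elapsed_s / done
    return seconds_per_unit * (total - done)


def format_status(elapsed_s: float, done: int, total: int) -> str:
    """Build a one-line status similar in spirit to a progress-bar readout."""
    remaining = estimate_remaining(elapsed_s, done, total)
    percent = 100 * done // total
    eta = "?" if remaining is None else f"{remaining:.0f}s"
    return f"{percent}% elapsed: {elapsed_s:.0f}s, estimated remaining: {eta}"


# Hypothetical run: 16 equally sized tasks.
print(format_status(0.0, 0, 16))   # nothing finished yet, so the estimate is "?"
print(format_status(10.0, 4, 16))  # 10 s / 4 tasks = 2.5 s each; 12 left -> 30 s
```

Under this scheme, a bar can only show `?` for the remaining time until the first unit of work completes. The estimate also drifts when units take unequal amounts of time.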

To enable Modin's Progress Bar, add the following lines of code after importing `modin.pandas`:
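```python
import modin.pandas as pd

from tqdm import tqdm  # the progress bar is rendered with tqdm, so it must be installed
from modin.config import ProgressBar

ProgressBar.enable()  # show a progress bar for subsequent Modin operations
```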

In this exercise, we'll see how the progress bar can improve our experience running dataframe queries!

In [1]:
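import modin.pandas as pd
import numpy as np
from tqdm import tqdm
from modin.config import ProgressBar, Engine
ProgressBar.enable()
Engine.put("dask")  # run Modin on the Dask engine for this exercise

# 2**18 rows x 2**8 columns of random integers in [0, 100)
frame_data = np.random.randint(0, 100, size=(2**18, 2**8))
df = pd.DataFrame(frame_data).add_prefix("col")  # columns named col0 ... col255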





Distributing Dataframe:   0%           Elapsed time: 00:00, estimated remaining time: ?
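For a sense of why this cell is worth tracking, the arithmetic below works out how much data `frame_data` holds. It assumes 8-byte (`int64`) integers, which is NumPy's default integer size on most 64-bit Linux and macOS builds. The actual dtype can differ by platform.

```python
rows, cols = 2**18, 2**8
cells = rows * cols

# Assumption: 8 bytes per value (int64); the real dtype may differ by platform.
bytes_per_cell = 8
total_bytes = cells * bytes_per_cell

print(f"{rows:,} rows x {cols} columns = {cells:,} values")  # 262,144 rows x 256 columns = 67,108,864 values
print(f"~{total_bytes / 2**20:.0f} MiB as int64")            # ~512 MiB as int64
```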

On longer-running operations, it's nice to see an estimate of how much longer they will take!

In [2]:
# Apply bitwise NOT (~) to every element of the DataFrame
df = df.applymap(lambda x: ~x)
df



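|  | col0 | col1 | col2 | col3 | col4 | col5 | col6 | col7 | col8 | col9 | ... | col246 | col247 | col248 | col249 | col250 | col251 | col252 | col253 | col254 | col255 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -40 | -25 | -95 | -68 | -80 | -65 | -94 | -76 | -36 | -52 | ... | -68 | -97 | -15 | -13 | -98 | -62 | -63 | -87 | -30 | -92 |
| 1 | -45 | -61 | -86 | -18 | -36 | -64 | -34 | -67 | -89 | -62 | ... | -15 | -63 | -56 | -10 | -42 | -47 | -88 | -57 | -71 | -22 |
| 2 | -53 | -99 | -34 | -4 | -39 | -25 | -68 | -17 | -82 | -73 | ... | -77 | -91 | -94 | -28 | -18 | -25 | -73 | -18 | -86 | -58 |
| 3 | -21 | -98 | -15 | -53 | -12 | -81 | -68 | -75 | -43 | -30 | ... | -50 | -70 | -55 | -95 | -91 | -27 | -27 | -43 | -14 | -44 |
| 4 | -99 | -5 | -48 | -99 | -18 | -68 | -49 | -18 | -51 | -62 | ... | -53 | -4 | -23 | -39 | -51 | -3 | -55 | -67 | -69 | -9 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 262139 | -20 | -77 | -56 | -94 | -14 | -56 | -26 | -3 | -56 | -86 | ... | -76 | -8 | -16 | -71 | -49 | -14 | -48 | -6 | -33 | -3 |
| 262140 | -33 | -15 | -94 | -77 | -5 | -3 | -40 | -73 | -20 | -17 | ... | -83 | -12 | -80 | -28 | -15 | -6 | -86 | -24 | -76 | -5 |
| 262141 | -57 | -30 | -21 | -75 | -83 | -32 | -55 | -66 | -22 | -10 | ... | -8 | -41 | -88 | -49 | -65 | -63 | -35 | -5 | -94 | -66 |
| 262142 | -39 | -18 | -64 | -79 | -86 | -45 | -87 | -5 | -17 | -63 | ... | -29 | -79 | -96 | -88 | -12 | -19 | -63 | -61 | -4 | -19 |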
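Every value in the output above is negative because `~` is bitwise NOT. For integers stored in two's complement it equals `-(x + 1)`. The small check below uses plain Python and NumPy, independently of Modin, to confirm the mapping. An original `39`, for example, becomes `-40`.

```python
import numpy as np

# For integers, bitwise NOT flips every bit; in two's complement that equals -(x + 1).
for x in (0, 39, 99):
    assert ~x == -(x + 1)

# np.random.randint(0, 100, ...) draws values in [0, 99], so after ~ every
# value falls in [-100, -1].
values = np.array([0, 39, 99])
print(np.invert(values).tolist())  # [-1, -40, -100]
```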


### Concept for exercise: Spreadsheet

For those who have worked with Excel, the Spreadsheet API will definitely feel familiar! The Spreadsheet API is a Jupyter notebook widget that allows us to interact with Modin DataFrames in a spreadsheet-like fashion while taking advantage of the underlying capabilities of Modin. The widget makes it quick and easy to explore, sort, filter, and edit data as well as export the changes as reproducible code.

Let's look back at a subset of the 2015 NYC Taxi Data from Exercise 2, and see how the Spreadsheet API can make it easy to play with the data!

In [3]:
# Enable the modin_spreadsheet notebook extension, then turn the progress bar off
!jupyter nbextension enable --py --sys-prefix modin_spreadsheet
ProgressBar.disable()

Enabling notebook extension modin_spreadsheet/extension...
      - Validating: ok


In [None]:
import csv
import modin.experimental.spreadsheet as mss

s3_path = "s3://dask-data/nyc-taxi/2015/yellow_tripdata_2015-01.csv"
# Read the first 1,000 rows; csv.QUOTE_NONE (== 3) treats quote characters as plain text
modin_df = pd.read_csv(s3_path, parse_dates=["tpep_pickup_datetime", "tpep_dropoff_datetime"],
                       quoting=csv.QUOTE_NONE, nrows=1000)
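Reading from S3 requires network access (and, with pandas, typically the optional `s3fs` package). The sketch below shows what `parse_dates`, `quoting`, and `nrows` do without either. It passes the same keyword arguments to plain pandas on a small in-memory CSV. It assumes Modin's `read_csv` accepts these arguments as pandas does. The rows are made up and only borrow a few of the taxi file's column names.

```python
import csv
import io

import pandas as pd  # modin.pandas aims to mirror this read_csv signature

# Made-up rows using a few of the taxi file's column names; not real trip records.
sample = io.StringIO(
    "VendorID,tpep_pickup_datetime,tpep_dropoff_datetime,passenger_count\n"
    "1,2015-01-01 00:00:00,2015-01-01 00:10:00,1\n"
    "2,2015-01-01 00:05:00,2015-01-01 00:20:00,2\n"
    "1,2015-01-01 00:30:00,2015-01-01 00:45:00,1\n"
)

trips = pd.read_csv(
    sample,
    parse_dates=["tpep_pickup_datetime", "tpep_dropoff_datetime"],  # -> datetime64 columns
    quoting=csv.QUOTE_NONE,  # equivalent to quoting=3
    nrows=2,  # stop after the first 2 data rows (the header is not counted)
)

# Parsed datetimes support arithmetic, e.g. trip durations in seconds.
durations = trips["tpep_dropoff_datetime"] - trips["tpep_pickup_datetime"]
print(len(trips))                             # 2
print(durations.dt.total_seconds().tolist())  # [600.0, 900.0]
```

Without `parse_dates`, the two timestamp columns would load as strings and the subtraction would fail. That is why the exercise asks for them to be parsed up front.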

In [None]:
# Render modin_df as an interactive spreadsheet widget
spreadsheet = mss.from_dataframe(modin_df)
spreadsheet

### Thank you for participating!