- What is lazy data?
- Using dask with HoloViews
- Difference between .persist and .compute
  - `persist`: Cache intermediate results while still allowing further lazy evaluation
  - `compute`: Materialize the data into memory, e.g. ibis->pandas or dask->pandas
- When should you consider using persist?
  - Call persist at the end after aggregating or reducing your data in some way
  - Why?
    - Prevents plotting code from evaluating the graph multiple times
- When to consider using compute?
  - Certain interfaces don’t support some methods (e.g. sorting or `.iloc`); you might want to perform some operations first and then materialize the data into a format that does support them

### Lazy Data

Lazy data, or lazy loading of data, is a technique in which data is not actually loaded until it is needed. Computational expressions are stored as an object but are not evaluated until explicitly requested. Used effectively, this makes better use of system resources and improves performance by letting a user separate a **list of computations** from actually **performing them**. Here are a few reasons why storing evaluations lazily can be important:

* You can select different compute engines to run the computations stored in the lazy pipeline (e.g. parallelize the computation, run it on GPUs, or run it on a remote server).
* You can describe a series of operations where some of them drastically reduce the size of the problem before computation begins, e.g. by sampling, selecting, or slicing a large dataset when only a small portion is actually used.
* You can run the operation "out of core", i.e. processing only one chunk at a time rather than loading all of the data at once, when the same operation must be applied to many data points that together would not fit into memory.


### HoloViews and Lazy Data

HoloViews is compatible with several libraries that use lazy evaluation.
As explained in the [HoloViz Large Data tutorial](https://holoviz.org/tutorial/Large_Data.html), HoloViews can accept [Dask](https://dask.org/) dataframes just as well as pandas dataframes. It has the computational infrastructure to accept lazy data objects and will evaluate them as needed to display the relevant information. More recently, HoloViews has also gained the ability to plot [Ibis](https://ibis-project.org/) objects, making it possible to visualize data stored in a remote database.

### Persist vs Compute

When using lazy evaluation of data with Dask, there are two different calls you can make on lazy data objects: `.compute()` and `.persist()`. The call to `.compute()` does just what it sounds like: it runs the computation stored in your lazy object. Instead of a task graph, you get the actual result returned to you. A computed Dask dataframe (or an executed Ibis expression) results in a single pandas dataframe on your local computer, so this should only be done if the result fits into memory.

In some cases, it may be useful to obtain results midway through a computation but leave them split up among the parallel processes that make up the entire computation. If you have a Dask cluster running a computation on several partitions of a large Dask dataframe, you can use `.persist()` to compute the task graph up to that point on each partition in the cluster. The result for each partition is stored in memory on its respective node, and the object returned to you now points to those in-memory results. The object you see is still a lazy object, but part of the computation is now persisted in memory, split among many nodes. The benefit is that you do not have to rerun the same computation every time you need a different computation or plot further down the task chain.

### When would you want to use `.persist()` with HoloViews?

If you are going to perform any computations before plotting (aggregating or reducing your data, for example), calling `.persist()` on the result avoids evaluating that portion of the task graph repeatedly, e.g. every time a user zooms or pans on the plot. The results are stored in memory on the cluster nodes, ready to be used by your various plots.

### When do you use `.compute()`?

Some methods you may use do not support data stored across multiple nodes. You may need to sort the entire dataset or look up rows by position with `.iloc[]` (Dask dataframes support `.iloc` only for selecting columns), and these are easiest, or only possible, when all the data is in one place. Ideally, you would specify a series of operations that reduce your data (by aggregating, selecting, slicing, etc.), then call `.compute()` to obtain a result that fits in a single machine's memory, even though it originated from a large distributed dataset.
