### Lazy Data

Lazy data, or lazy loading of data, is a technique in which data is not actually loaded until it is needed. Used effectively, this makes more efficient use of system resources such as memory and can improve performance. It also lets a user build up a list of tasks, or a task graph, that uses the data without running anything until told to do so: the *expression* is stored instead of the result.
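A minimal sketch with `dask.delayed` shows the idea: decorating a function makes calls to it return a placeholder instead of running it. The functions `load` and `total` and the `calls` list are hypothetical, used only to record when work actually happens.

```python
import dask

calls = []  # records which functions have actually run


@dask.delayed
def load(n):
    # Hypothetical stand-in for an expensive load step
    calls.append("load")
    return list(range(n))


@dask.delayed
def total(values):
    calls.append("total")
    return sum(values)


# Building the expression runs nothing; lazy_total is a Delayed object
lazy_total = total(load(10))
calls_before = list(calls)  # still empty

# .compute() evaluates the task graph: load runs first, then total
result = lazy_total.compute()
print(calls_before, calls, result)  # [] ['load', 'total'] 45
```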

Libraries such as Dask use lazy evaluation to their benefit. Dask builds a task graph of lazy operations and then runs independent tasks in parallel where it can, depending on the available compute resources. This can reduce run time, and because Dask works through the data in chunks (partitions), it also makes it possible to compute on datasets that are too large to fit in your local machine's memory all at once.

With a library called Ibis, you manipulate your data through Ibis expressions, which are also lazy, so they are not carried out right away. When the expressions are executed, the work is done by the backend Ibis is connected to, such as a remote database, using that database's resources instead of your local computer's.

Lazy expressions are powerful because they describe *what* to compute, while packages like Dask or Ibis decide *how* (and where) to compute it.
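One way to see this split between *what* and *how* in Dask: the same lazy expression can be handed to different schedulers, here Dask's single-threaded `"synchronous"` scheduler and its thread pool (`"threads"`), and both produce the same answer. The array size and chunking are arbitrary.

```python
import dask.array as da

# The plan: sum of squares over a chunked array (4 chunks of 250,000)
x = da.arange(1_000_000, chunks=250_000, dtype="int64")
lazy_sum = (x ** 2).sum()

# Same plan, two different execution strategies
sync_result = lazy_sum.compute(scheduler="synchronous")
threaded_result = lazy_sum.compute(scheduler="threads")
print(sync_result == threaded_result)  # True
```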

### Using Dask with HoloViews

As explained here: https://holoviz.org/tutorial/Large_Data.html, HoloViews can accept Dask dataframes just as well as pandas dataframes. It has the infrastructure to accept lazy data objects and evaluates them only as needed to display the relevant information.
HoloViews can also accept Ibis objects, in which case the database computes the expressions stored in the lazy Ibis object when the plot needs them. The syntax for creating HoloViews plots is the same whether you use a pandas, Dask, or Ibis object.

### Persist vs Compute

When using lazy evaluation, there are two different calls you can make on Dask's lazy objects: `.compute()` and `.persist()`. `.compute()` does what it sounds like: it actually runs the computation stored in your lazy object and returns the concrete result instead of a task graph. Computing a Dask dataframe returns a single pandas dataframe in your local memory (the Ibis equivalent is `.execute()`, which likewise returns a pandas dataframe). Only do this when the result fits into memory.

In some cases, it is useful to materialize results midway through a computation but leave them split across the parallel processes that make up the whole computation. If a Dask cluster is working on a large Dask dataframe divided into several partitions, `.persist()` computes the result for each partition and keeps it in memory on the node that holds that partition. The object returned to you is still a lazy Dask collection with the same API, but its task graph now points to those in-memory results instead of to the steps that produced them. (Without a distributed cluster, `.persist()` keeps the partitions in your local memory instead.) The benefit is that you do not have to rerun the same computation every time you need a different computation or plot further down the task chain.

### When should you use `.persist()` with HoloViews?

If you do any computations before plotting (aggregating or reducing your data, for example), calling `.persist()` on the result stops the plotting code from evaluating that portion of the task graph again each time it needs the data. The reduced data is stored in memory on the cluster nodes, ready to be used by each of your plots.

### When do you use `.compute()`?

Some operations are not supported, or are expensive, when the data is split across multiple nodes. For example, you may need to sort the entire dataset or select rows by position with `.iloc[]`; Dask dataframes only support `.iloc` for selecting columns, and a global sort has to move data between partitions. In this case, it is best to reduce or aggregate your data as much as possible on each node first, and then use `.compute()` to bring the much smaller result to a single machine as a pandas dataframe. You can then use the methods that do not work across nodes to obtain your desired result.