This example showcases best practices for loading data into a Syd Viewer. 

It gives three options for different situations, depending on how much data you load and how long it takes to load.

In [1]:
try:
    import google.colab
    # We're in Colab
    !pip install git+https://github.com/landoskape/syd.git

except ImportError:
    pass

In [2]:
%matplotlib inline

import numpy as np
import matplotlib.pyplot as plt
from syd import make_viewer, Viewer

In [3]:
# Let's create a toy dataset to create syd viewers from.
# It's called dataset_on_disk because we're pretending it's a large file on disk.
# Here, we create 10 datasets, each with 100 samples and 10000 timepoints (your data will probably be more complex!)
num_datasets = 10
num_samples = 100
num_timepoints = 10000
dataset_on_disk = [np.random.randn(num_samples, num_timepoints) for _ in range(num_datasets)]

# Let's create a dataset loading function that simulates loading data from a large file on disk.
# I recognize that this is a contrived example, but it's just for demonstration:
# you can imagine that load_dataset does some heavy data loading (e.g. with np.load) and even
# handles preprocessing of the data.
def load_dataset(dataset_index):
    return dataset_on_disk[dataset_index]
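To make the cost of loading visible while you try the options below, you could give the toy loader an artificial delay. This is only a sketch: `load_dataset_slow`, its `delay` parameter and `fake_disk` are hypothetical, and `time.sleep` merely stands in for real disk reads and preprocessing.

```python
import time

# Hypothetical stand-in for data on disk (plain lists, so this sketch needs only the standard library)
fake_disk = [[float(i)] * 5 for i in range(3)]

def load_dataset_slow(dataset_index, delay=0.2):
    """Hypothetical loader that sleeps to mimic a slow load from disk."""
    time.sleep(delay)  # stands in for e.g. np.load plus preprocessing
    return fake_disk[dataset_index]

# Time a single load to see what every uncached access would cost
start = time.perf_counter()
data = load_dataset_slow(1)
elapsed = time.perf_counter() - start
print(f"loaded {len(data)} values in {elapsed:.2f} s")
```

With a loader like this, Option 1 pays the delay once per dataset up front, while Option 1a pays it only the first time each dataset is viewed.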

## Option 1: Preload the data into memory and access it as a global variable

When to use this option?
Option 1 is best when you have a small dataset and want to make a quick viewer.

For example, if you're exploring your data in a Jupyter notebook and the result you want to look at is already processed and held in the notebook's working memory, you can make a viewer like this.

Option 1 is useful for simple viewers, especially ones for just taking a look at the data without complex processing.
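One consequence of using a global variable: the plot function looks up the name `dataset` each time it runs, not when it is defined, so reassigning `dataset` in a later cell changes what the viewer shows on its next update. A minimal standard-library sketch of that behavior, using hypothetical names (`toy_dataset`, `get_toy_sample`) so it doesn't clobber the notebook's real `dataset`:

```python
# Toy global data: a list of datasets, each a list of samples
toy_dataset = [[[0, 1, 2]], [[3, 4, 5]]]

def get_toy_sample(state):
    # `toy_dataset` is resolved in the global namespace at call time
    return toy_dataset[state["dataset_index"]][state["sample_index"]]

state = {"dataset_index": 1, "sample_index": 0}
first = get_toy_sample(state)

# Rebinding the global (e.g. after reprocessing in another cell)...
toy_dataset = [[[0, 1, 2]], [[30, 40, 50]]]
# ...is picked up on the next call, without redefining the function
second = get_toy_sample(state)
print(first, second)  # [3, 4, 5] [30, 40, 50]
```

The flip side is that accidentally reusing the name `dataset` for something else later in the notebook will also change what the viewer plots.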

In [None]:
# Option 1: Preload the data into memory and access it as a global variable
dataset = [load_dataset(i) for i in range(num_datasets)]

# Make the plot function after preloading the data into memory -- 
# It'll know to use the "dataset" variable to get the data.
def plot(state):
    c_data = dataset[state["dataset_index"]]
    c_sample = c_data[state["sample_index"]]
    fig = plt.figure(figsize=(4, 4))
    ax = plt.gca()
    ax.plot(c_sample)
    return fig

# Make the viewer -- notice that we don't need to pass in the dataset variable anywhere,
# it's defined globally and referenced in the plot function.
viewer = make_viewer(plot)
viewer.add_integer("dataset_index", value=0, min=0, max=num_datasets - 1)
viewer.add_integer("sample_index", value=0, min=0, max=num_samples - 1)
viewer.show()

## Option 1a: Load the data on demand into a cache and access it as a global variable

When to use this option?
Option 1a is good for simple viewers where data loading takes a long time _and_ you might not want to look at all of the data.

So, load each dataset on demand and keep it in a cache!

Note:
There are many ways to implement a cache in Python; I'll just show a simple "manual" example here. For more complex cases, consider
options like joblib, dask, etc.
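One such option from the standard library is `functools.lru_cache`, which memoizes a function by its arguments. The sketch below uses a hypothetical `load_dataset_toy` that records each real load; in the notebook you would have the cached function call `load_dataset` instead. `maxsize=4` is an arbitrary choice that bounds memory by evicting the least recently used dataset.

```python
from functools import lru_cache

load_calls = []  # records which datasets were actually "loaded"

def load_dataset_toy(dataset_index):
    # Hypothetical stand-in for an expensive load_dataset(...)
    load_calls.append(dataset_index)
    return [dataset_index] * 3

@lru_cache(maxsize=4)  # keep at most 4 datasets in memory; maxsize=None means no limit
def load_dataset_lru(dataset_index):
    return load_dataset_toy(dataset_index)

load_dataset_lru(0)
load_dataset_lru(0)  # served from the cache, no second load
load_dataset_lru(1)
print(load_calls)                    # [0, 1]
print(load_dataset_lru.cache_info())  # CacheInfo(hits=1, misses=2, maxsize=4, currsize=2)
```

`lru_cache` returns the same object on every hit, so modifying the returned array in place also modifies the cached copy. Call `load_dataset_lru.cache_clear()` to force a reload.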

In [None]:
# Option 1a: Load the data on demand into a cache and access it as a global variable
dataset_cache = [None] * num_datasets
def load_dataset_cached(dataset_index):
    # Only loads the dataset if it's not already in the cache
    if dataset_cache[dataset_index] is None:
        dataset_cache[dataset_index] = load_dataset(dataset_index)
    return dataset_cache[dataset_index]

# Make the plot function after defining your cached loader.
# It'll know to use the cached loader whenever it's called.
def plot(state):
    c_data = load_dataset_cached(state["dataset_index"])
    c_sample = c_data[state["sample_index"]]
    fig = plt.figure(figsize=(4, 4))
    ax = plt.gca()
    ax.plot(c_sample)
    return fig

# Make the viewer -- notice that we don't need to pass in the "load_dataset_cached" function anywhere,
# it's defined globally and referenced in the plot function.
viewer = make_viewer(plot)
viewer.add_integer("dataset_index", value=0, min=0, max=num_datasets - 1)
viewer.add_integer("sample_index", value=0, min=0, max=num_samples - 1)
viewer.show()

## Option 2: Load the data as an attribute to a viewer class

When to use this option?
Creating a class for your viewer with data as attributes is great for more complex viewers that need to do some
post-processing of the data, or when the data takes a long time to load and you want to keep it in memory while
you make other updates to the viewer.

In [None]:
# Option 2: Load the data as an attribute to a viewer class
class ViewerWithData(Viewer):
    def __init__(self):
        # In the object constructor, we load the data as an attribute. 
        # This way the data is "ready" once you've created the viewer object.
        self.dataset = [load_dataset(i) for i in range(num_datasets)]

        self.add_integer("dataset_index", value=0, min=0, max=num_datasets - 1)
        self.add_integer("sample_index", value=0, min=0, max=num_samples - 1)

    def filter_data(self, c_sample):
        # You might want to do some processing of the data here. Filtering one sample is fast,
        # but filtering the whole dataset up front can be slow if you have a lot of data.
        # Filtering on demand is a great way to get good viewer performance without having
        # to load and filter all the data up front! (The same idea applies to all kinds of data processing.)
        return c_sample[c_sample > 0]
        
    def plot(self, state):
        # Now that we're using the class version -- self.dataset is available from the plot function.
        c_data = self.dataset[state["dataset_index"]]
        c_sample = c_data[state["sample_index"]]
        filtered_sample = self.filter_data(c_sample)
        fig = plt.figure(figsize=(4, 4))
        ax = plt.gca()
        ax.plot(filtered_sample)
        return fig

# Now make the viewer object --- this will load the data as an attribute.
viewer = ViewerWithData()

# And now show the viewer!
viewer.show()
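Filtering on demand means that revisiting a sample reruns the filter. If your processing is expensive, one option is to store each result the first time it's computed, keyed by `(dataset_index, sample_index)`. The class below is a hypothetical standard-library sketch of that idea (`FilteredSampleCache`, `get_filtered` and `filter_calls` are illustrative names, not part of syd). In a syd viewer you would create the dictionary in `__init__` and call the cached getter from `plot`.

```python
class FilteredSampleCache:
    """Hypothetical helper: filters a sample on first request and remembers the result."""

    def __init__(self, dataset):
        self.dataset = dataset   # list of datasets, each a list of samples
        self._filtered = {}      # (dataset_index, sample_index) -> filtered sample
        self.filter_calls = 0    # counts how often the filter actually runs

    def filter_data(self, c_sample):
        # Same idea as ViewerWithData.filter_data: keep only positive values
        self.filter_calls += 1
        return [x for x in c_sample if x > 0]

    def get_filtered(self, dataset_index, sample_index):
        key = (dataset_index, sample_index)
        if key not in self._filtered:
            c_sample = self.dataset[dataset_index][sample_index]
            self._filtered[key] = self.filter_data(c_sample)
        return self._filtered[key]

sample_cache = FilteredSampleCache([[[-1.0, 2.0, -3.0, 4.0]]])
print(sample_cache.get_filtered(0, 0))  # [2.0, 4.0]
sample_cache.get_filtered(0, 0)         # second request reuses the stored result
print(sample_cache.filter_calls)        # 1
```

If you change the filter while iterating (for example with the autoreload workflow), the stored results are stale; clear the dictionary so they get recomputed.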

## Using the Jupyter autoreload extension
One of the great things about Jupyter notebooks is IPython's autoreload extension, which reloads modules you've edited without restarting the kernel. I use it all the time to improve my viewers as I work on them. The problem: if the data takes a long time to load but you want to iterate on the plot function, you don't want to reload the data every time you change it. So, here's a workflow:

1. Create your viewer class in a module that you can import from. Then, create an instance of the viewer class in one cell. 
2. Deploy it in a second cell.
3. Make changes to the viewer class in the module.
    - Try it yourself: change the filter function, the color of the plot, etc.
    - The changes don't have to be in the plot function, but they have to be in code that runs after the object is constructed (changes to `__init__` won't take effect until you recreate the viewer).
4. Re-deploy the second cell (without recreating the viewer object in the first cell!).
5. See your changes show up without reloading the data!!!
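Step 4 works because Python looks up methods on the class at call time, and `%autoreload 2` updates the methods of classes in a reloaded module in place, so an instance created before the reload runs the new code while keeping its attributes, including the loaded data. The standard-library sketch below mimics that by assigning a new method to the class by hand. `SlowViewer`, `new_filter_data` and `demo` are hypothetical names, and the assignment only approximates what autoreload does for you.

```python
class SlowViewer:
    load_count = 0  # counts how many times the "expensive" load has run

    def __init__(self):
        SlowViewer.load_count += 1  # stands in for a slow data load in the constructor
        self.dataset = [[-1.0, 2.0, 3.0]]

    def filter_data(self, c_sample):
        return [x for x in c_sample if x > 0]

demo = SlowViewer()
before = demo.filter_data(demo.dataset[0])

# Simulate editing the module: replace the method on the class itself
def new_filter_data(self, c_sample):
    return [x for x in c_sample if x > 2.5]

SlowViewer.filter_data = new_filter_data
after = demo.filter_data(demo.dataset[0])  # existing instance uses the new method
print(before, after, SlowViewer.load_count)  # [2.0, 3.0] [3.0] 1
```

This is also why edits to `__init__` don't affect an existing instance: its constructor has already run, and nothing reruns it.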


**Note: this one won't work in colab, clone the repo and run it locally to test it out!**

In [7]:
# Set up the autoreload extension
%reload_ext autoreload
%autoreload 2

# Import the viewer class
from example_viewer import ViewerWithData

# Create an instance of the viewer class -- this will load the data
viewer = ViewerWithData()

In [None]:
# In a second cell, deploy the viewer
viewer.show()

# You can now make changes to the viewer class in the "example_viewer.py" module and see them show up here
# without reloading the data by rerunning this cell!

## Some extra notes:
- You can combine the class version with the cached version! Just implement the cached loading as a method of the
  class. See below for an example:

In [11]:
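# Combining Option 2 with Option 1a: datasets are loaded on demand and cached as a viewer attribute
class ViewerWithData(Viewer):
    def __init__(self, num_datasets):
        # Start with an empty cache; each dataset is loaded the first time it's requested
        self.dataset = [None] * num_datasets

        self.add_integer("dataset_index", value=0, min=0, max=num_datasets - 1)
        self.add_integer("sample_index", value=0, min=0, max=num_samples - 1)

    def get_dataset(self, dataset_index):
        # Only loads the dataset if it's not already in the cache
        if self.dataset[dataset_index] is None:
            self.dataset[dataset_index] = load_dataset(dataset_index)
        return self.dataset[dataset_index]

    def plot(self, state):
        c_data = self.get_dataset(state["dataset_index"])
        c_sample = c_data[state["sample_index"]]
        fig = plt.figure(figsize=(4, 4))
        ax = plt.gca()
        ax.plot(c_sample)
        return fig

viewer = ViewerWithData(num_datasets)
viewer.show()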

- It's also great to use a dictionary as the cache, keyed by dataset name, so you can pick datasets by name instead of by index:

In [12]:
# Sketch only (not runnable as-is): assumes a load_dataset that accepts a dataset name,
# and leaves out the syd parameters and plotting code.
class ViewerWithData(Viewer):
    def __init__(self, names_of_datasets_you_might_want_to_load):
        self.dataset_names = names_of_datasets_you_might_want_to_load
        self.dataset = {name: None for name in self.dataset_names}

        # .. syd parameters ..

    def get_dataset(self, dataset_name):
        if self.dataset[dataset_name] is None:
            self.dataset[dataset_name] = load_dataset(dataset_name)
        return self.dataset[dataset_name]

    def plot(self, state):
        c_data = self.get_dataset(state["dataset_name"])
        # .. plotting code ..
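A variation on the dictionary cache is to start with an empty dictionary and test membership (`name not in self.dataset`) instead of pre-filling every key with `None`. You don't need the full list of names up front, a loader that legitimately returns `None` isn't mistaken for "not loaded yet", and `dict.pop` makes it easy to free memory. A hypothetical standard-library sketch (`NamedDatasetCache`, `fake_loader` and the `"session_a"` name are illustrative only):

```python
class NamedDatasetCache:
    """Hypothetical sketch: lazily load datasets by name into a dict."""

    def __init__(self, loader):
        self.loader = loader  # callable: dataset_name -> data
        self.dataset = {}     # starts empty; filled on first request

    def get_dataset(self, dataset_name):
        # Membership test instead of a None placeholder, so keys needn't be known up front
        if dataset_name not in self.dataset:
            self.dataset[dataset_name] = self.loader(dataset_name)
        return self.dataset[dataset_name]

    def evict(self, dataset_name):
        # Drop one dataset to free memory; it is reloaded if requested again
        self.dataset.pop(dataset_name, None)

loads = []

def fake_loader(name):
    # Stand-in for a slow, name-based load_dataset
    loads.append(name)
    return f"data for {name}"

named_cache = NamedDatasetCache(fake_loader)
named_cache.get_dataset("session_a")
named_cache.get_dataset("session_a")  # cached, not reloaded
named_cache.evict("session_a")
named_cache.get_dataset("session_a")  # reloaded after eviction
print(loads)  # ['session_a', 'session_a']
```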