# Tutorial for Interactive Data Aggregation Using the RootInteractive Template (for PyHEP 2024)

In this tutorial, we use a RootInteractive template function to generate a standalone client application: a dashboard for interactive data aggregation. Unbinned data are transferred to the client, and all subsequent aggregation is performed in the browser. In typical use cases within the ALICE experiment, up to approximately 0.5 GB of compressed data is used, depending on the scope. Such HTML dashboards are then attached to meeting agendas for interactive browsing without the need to install additional software.

Data compression is applied not only in the dashboard's HTML file but also in memory: only the columns currently needed are expanded to full size and cached. In-memory compression therefore allows a large amount of data to be stored efficiently.

The primary limitation is the number of columns and rows expanded at any given moment. Compressed representations, such as 8-bit or 10-bit precision, enable extensive machine-learning and multi-dimensional parameterization optimization studies with many alternative models (columns). It is feasible to load and interactively inspect approximately \(10^7 \times 20\) expanded attributes (rows × columns) in memory; the limiting factor is typically the available memory.
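A back-of-the-envelope estimate shows why the number of expanded columns, rather than the compressed file size, sets the limit. This sketch assumes that each expanded value occupies an 8-byte double in browser memory (4-byte storage is shown for comparison); the actual in-browser representation used by RootInteractive may differ.

```python
# Rough memory estimate for expanded (decompressed) columns.
# Assumption: each expanded value is an 8-byte double (float64);
# a 4-byte (float32) representation is shown for comparison.
n_rows = 10**7
n_columns = 20

bytes_float64 = n_rows * n_columns * 8
bytes_float32 = n_rows * n_columns * 4

print(f"float64: {bytes_float64 / 1e9:.1f} GB")  # float64: 1.6 GB
print(f"float32: {bytes_float32 / 1e9:.1f} GB")  # float32: 0.8 GB
```

Expanding only the roughly ten columns in use at a time, instead of all of them, keeps the working set correspondingly smaller.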

To reduce the data volume, lazy evaluation (caching) is recommended: derived quantities should preferably be defined as JavaScript aliases, which are evaluated lazily on the client instead of being stored in the file. Integrated memory and CPU monitoring is planned for a future release.
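The idea of lazily evaluated, cached aliases can be illustrated with a small Python sketch. The dashboard itself evaluates aliases in JavaScript in the browser; the `LazyColumns` class below is hypothetical and only mimics the principle: a derived column is computed on first request, cached, and can be dropped again to free memory.

```python
from typing import Callable, Dict

import numpy as np


class LazyColumns:
    """Toy column store (hypothetical): base columns are stored eagerly,
    aliases are computed on first access and then cached."""

    def __init__(self, base: Dict[str, np.ndarray]):
        self.base = base
        self.aliases: Dict[str, Callable[["LazyColumns"], np.ndarray]] = {}
        self.cache: Dict[str, np.ndarray] = {}
        self.evaluations = 0  # number of alias evaluations performed so far

    def add_alias(self, name: str, func: Callable[["LazyColumns"], np.ndarray]) -> None:
        self.aliases[name] = func

    def __getitem__(self, name: str) -> np.ndarray:
        if name in self.base:
            return self.base[name]
        if name not in self.cache:
            # Evaluate only when the column is requested for the first time.
            self.cache[name] = self.aliases[name](self)
            self.evaluations += 1
        return self.cache[name]

    def drop_cache(self, name: str) -> None:
        # Free the memory of an alias; it is recomputed on the next access.
        self.cache.pop(name, None)


rng = np.random.default_rng(0)
cols = LazyColumns({"A": rng.random(5), "B": rng.random(5)})
cols.add_alias("A+B", lambda c: c["A"] + c["B"])
cols.add_alias("A*A+B", lambda c: c["A"] * c["A"] + c["B"])

print(cols.evaluations)  # 0: nothing has been computed yet
_ = cols["A+B"]
_ = cols["A+B"]          # second access is served from the cache
print(cols.evaluations)  # 1: "A*A+B" was never requested, so it costs nothing
```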



The tutorial consists of the following steps:

- **Import Libraries:** Import all libraries required for the tutorial.
- **Generate Random Columnar Data:** Generate a data source named 'ABCD' with a uniform distribution, add noise to it, and derive further variables from it.
- **Create Figure Layout and Histogram Array:** Define the visual layout for the analysis, including figure positioning and histogram configuration.
- **Create Aliases:** Define functions that are evaluated on the client side to enhance interactivity.
- **Define User-Defined Compression:** Combine lossy and lossless compression to minimize the size of the dashboard, both in the generated HTML file and in memory.
- **Utilize the Standard Template:** Use a standard template to streamline the creation of the interactive dashboard, with examples of further extensions in user code.

Additional Resources:
- **Bokeh Interactive Templates:** Several dashboard templates are available in the `bokehInteractiveTemplate` module; they simplify dashboard creation by minimizing repetitive code. Explore them to find the one best suited to your needs.
- **Detailed Template Documentation:** For a comprehensive understanding of how to configure interactive data aggregation at a lower level, refer to our detailed documentation [here](https://github.com/miranov25/RootInteractive/blob/master/RootInteractive/InteractiveDrawing/bokeh/doc/READMEtemplate.md).
- **Further Documentation:** For more in-depth details about the configuration of interactive data aggregation, visit our full documentation [here](https://github.com/miranov25/RootInteractive/tree/master/RootInteractive/InteractiveDrawing/bokeh/doc).


```python
# RootInteractive: standalone dashboard drawing, metadata helpers, templates, compression
from RootInteractive.InteractiveDrawing.bokeh.bokehDrawSA import bokehDrawSA
from RootInteractive.Tools.pandaTools import initMetadata
from RootInteractive.InteractiveDrawing.bokeh.bokehInteractiveTemplate import getDefaultVarsRefWeights
from RootInteractive.Tools.compressArray import arrayCompressionRelative16
from RootInteractive.InteractiveDrawing.bokeh.bokehTools import mergeFigureArrays
# Widen the Jupyter container so that the dashboard uses the full page width
from IPython.display import display, HTML
display(HTML("<style>.container { width:100% !important; }</style>"))
```

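```python
from bokeh.io import output_notebook, output_file
output_notebook()   # render Bokeh output inline in the notebook
outputPrefix = ""   # optional prefix (e.g. a directory) for the generated HTML files
```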

```python
import numpy as np
import pandas as pd
```

## Generate Random Columnar Data
Create a data source 'ABCD' with a uniform distribution and add noise.

- **Generate Uniform Random Values:** Create a vector of uniform random values termed "MC true."
- **Emulate Measurement Noise:** Simulate real-world data by adding Gaussian noise to these values, which will later be used to define compression strategies.
- **Add Categorical Data:** Include examples of categorical data, used by the multiSelect widgets.
- **Annotate Data with Metadata:** Attach metadata to the data to specify axis titles and variable descriptions.

```python
# 40000 rows of uniformly distributed "MC true" values in columns A..H
df = pd.DataFrame(np.random.random_sample(size=(40000, 8)), columns=list('ABCDEFGH'))
initMetadata(df)  # enables the df.meta.metaData annotations used below

# Boolean and categorical columns (used by the multiSelect widgets)
mapDDC={0:"A0",1:"A1",2:"A2",3:"A3",4:"A4"}
df.eval("Bool=A>0.5", inplace=True)
df.eval("BoolB=B>0.5", inplace=True)
df["AA"]=((df.A*10).round(0)).astype(pd.CategoricalDtype(ordered=True))

df["CC"]=((df.C*5).round(0)).astype(int)
df["DD"]=((df.D*4).round(0)).astype(int)
df["DDC"]=((df.D*4).round(0)).astype(int).map(mapDDC)
df['errY']=df.A*0.02+0.02
# "Measured" columns AM..HM: true values smeared by Gaussian noise (sigma = 0.05)
for col in ["A","B","C","D","E","F","G","H"]:
    df[f"{col}M"] = df[col] + np.random.normal(loc=0, scale=0.05, size=df.shape[0])
# Axis titles and descriptions displayed in the dashboard
df.meta.metaData = {'A.AxisTitle': "A (cm)", 'B.AxisTitle': "B (cm)", 'C.AxisTitle': "C (s)",
                    'D.AxisTitle': "D (cm)", 'D.Description': "variable",
                    'Bool.AxisTitle': "A>half",'Bool.Description': "A bigger than 0.5",
                   }
```

## Create Figure Layout and Histogram Array Using the Template Function `getDefaultVarsRefWeights`
Set up the visual representation for effective data analysis.

- **Utilize Templates for Histograms and Aggregated Data:** Employ templates that support histograms and aggregated data with weights to streamline visualizations.
- **Define Input and Derived Variables:** Specify input variables and create derived variables. These aliases are lazily evaluated on the client side using JavaScript for enhanced performance.
- **Incorporate Client Function Aliases:** Append client-side function aliases that are also lazily evaluated to optimize interactions and processing.
- **Expand Visualization Controls:** Enhance the user interface by extending widgets that control visualization aspects, allowing for more interactive and dynamic data exploration.
- **Enhance Widget Layout:** Improve the layout of widgets to better accommodate user interactions and data display.
- **Update Parameter Array:** Extend the parameter array with the variable selectors (`varX`, `varY`, `varZ`, `varYNorm`, `varZNorm`) and their default values, so that the displayed quantities can be chosen interactively.

```python
# Input variables plus derived expressions (evaluated lazily on the client)
variables=df.columns.to_list()
variables+=[ "A*A", "A*A+B", "B/(1+C)", "A+B", "A-B"]
variables.sort()

# The template returns the description arrays for aliases, widgets, histograms and figures
aliasArray, jsFunctionArray, variables, parameterArray, widgetParams, widgetLayoutDesc, histoArray, figureArray, figureLayoutDesc = getDefaultVarsRefWeights(variables=variables)
# Client-side alias encoding four conditions as bits: 1: A+B>0.5, 2: A+B<1, 4: A-B>-0.5, 8: A-B<0.5
aliasArray.append(("multiSelectBitmask", "(A+B>0.5) + (A+B<1) * 2 + (A-B > -0.5) * 4 + (A-B < 0.5) * 8"))
widgetsSelect = [
    ['range', ['A'], {"name":"A"}],
    ['range', ['B'], {"name":"B"}],
    ['range', ['C'], {"name":"C"}],
    ['range', ['D'], {"name":"D"}],
    # categorical data
    ["multiSelect",["AA"],{"name":"AA"}],
    ["multiSelect",["CC"],{"name":"CC"}],
    ["multiSelect",["DD"],{"name":"DD"}],
    # bitmask selections combining the chosen conditions with "any" or "all"
    ['multiSelectBitmask', ['multiSelectBitmask'], {"name":"multiSelectAny", "mapping":{"A+B>0.5":1,"A+B<1":2,"A-B>-0.5":4,"A-B<0.5":8}, "title":"bitmask(any)", "how":"any"}],
    ['multiSelectBitmask', ['multiSelectBitmask'], {"name":"multiSelectAll", "mapping":{"A+B>0.5":1,"A+B<1":2,"A-B>-0.5":4,"A-B<0.5":8}, "title":"bitmask(all)", "how":"all"}]
    ]
widgetParams = mergeFigureArrays(widgetParams, widgetsSelect)
widgetLayoutDesc["Select"] = [["A","B","C","D"],["AA","CC","multiSelectAny","multiSelectAll"]]
# Variable selectors for the displayed quantities
parameterArray+=[
        {"name": "varX", "value":"A+B", "options":variables},
        {"name": "varY", "value":"A-B", "options":variables},
        {"name": "varZ", "value":"AA", "options":variables},
        {"name": "varYNorm", "value":"A+B", "options":variables},
        {"name": "varZNorm", "value":"A", "options":variables},
]
```
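To make the bitmask alias and the two `multiSelectBitmask` modes concrete, the following pandas sketch recomputes the alias for three rows and applies a selection. The `any`/`all` semantics coded here are an assumption inferred from the widget titles, not taken from the RootInteractive source; `bitmask` and `select` are hypothetical helpers.

```python
import pandas as pd

# Same bit assignment as the "multiSelectBitmask" alias above
MAPPING = {"A+B>0.5": 1, "A+B<1": 2, "A-B>-0.5": 4, "A-B<0.5": 8}


def bitmask(df: pd.DataFrame) -> pd.Series:
    s, d = df.A + df.B, df.A - df.B
    return ((s > 0.5) * 1 + (s < 1) * 2 + (d > -0.5) * 4 + (d < 0.5) * 8).astype(int)


def select(mask: pd.Series, chosen: list, how: str) -> pd.Series:
    # Assumed semantics: "any" keeps rows with at least one chosen bit set,
    # "all" keeps rows in which every chosen bit is set.
    bits = 0
    for label in chosen:
        bits |= MAPPING[label]
    if how == "any":
        return (mask & bits) != 0
    if how == "all":
        return (mask & bits) == bits
    raise ValueError(f"unknown mode: {how}")


demo = pd.DataFrame({"A": [0.1, 0.6, 0.9], "B": [0.1, 0.2, 0.8]})
mask = bitmask(demo)
print(mask.tolist())                                              # [14, 15, 13]
print(select(mask, ["A+B>0.5", "A+B<1"], "any").tolist())         # [True, True, True]
print(select(mask, ["A+B>0.5", "A+B<1"], "all").tolist())         # [False, True, False]
```

Packing several conditions into one integer column keeps a single compact alias instead of four separate Boolean columns.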

## Utilize and Extend the Standard Template
Employ a standard template to streamline the creation of the interactive dashboard, and extend it further in user code.
- **Use Case Applications:** In realistic scenarios, a template typically serves as a first approximation; the description arrays are subsequently extended for further customization.
- **Examples:**
  - **Inclusion of Additional Default Plots:** Add further plots and histograms as needed.
  - **User-Defined Parametric Functions:** Define parametric functions tailored to specific requirements.


```python
# Inspect the description arrays produced by the template
from pprint import pprint
print("aliasArray:")
pprint(aliasArray)
print("widgetParams:")
pprint(widgetParams)
```

## Draw Histograms from the Template
- **Output Storage:** The output is stored as a standalone dashboard in an HTML file.
- **Data Volume Comparison:** The following examples compare the data volume of the dashboard with and without compression.
  - **Observation:** Without compression, the output HTML file and the browser memory usage are approximately five times larger than with compression.

```python
# Dashboard without user-defined compression (reference for the size comparison)
output_file(f"{outputPrefix}test_histogramWeights.html")
bokehDrawSA.fromArray(df, None, figureArray, widgetParams, layout=figureLayoutDesc, parameterArray=parameterArray,
                      widgetLayout=widgetLayoutDesc, sizing_mode="scale_width", histogramArray=histoArray, aliasArray=aliasArray,
                      jsFunctionArray=jsFunctionArray)
```


## Define User-Defined Compression
Implement a mix of lossy and lossless compression techniques to reduce the size of the dashboard in both the generated HTML file and in memory.
- **Detailed Syntax and Description:** For a comprehensive understanding of the compression syntax, consult our [documentation on GitHub](https://github.com/miranov25/RootInteractive/blob/master/RootInteractive/InteractiveDrawing/bokeh/doc/READMEcompression.md).
- **Example of Measured Data Compression (see cell below):** Columns whose names match the regular expression `.*M` (the noisy "measured" columns) use 8-bit relative compression.
- **Precision for Other Data:** All other floating-point columns are stored with 10 bits of relative precision.
- **Data Volume Reduction:** In this example, the HTML file is reduced to about 20% of its uncompressed size, consistent with the factor of about five mentioned above.
- **Use Case in the ALICE Experiment:** In scenarios with 20 to 200 columns, compression reduces the data volume to 10-20% of the original, depending on the precision and data types.
- **Client-Side Compression:** The data also remain compressed in browser memory; thanks to lazy evaluation, only the columns in use (typically around 10) are expanded and cached.
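The principle of relative (lossy) compression can be illustrated with NumPy: each value keeps only a fixed number of mantissa bits, so the relative error is bounded by about 2^-bits, and the many zeroed low-order bits make a subsequent lossless step (here `zlib`) considerably more effective. With 8 bits the relative error stays below about 0.4%, small compared with the Gaussian noise (sigma = 0.05) added to the `*M` columns. This is a sketch under the assumption that "relative" means mantissa rounding; `round_relative` is a hypothetical helper, and the actual RootInteractive encoding (including its `code` and `zip` steps) may differ.

```python
import zlib

import numpy as np


def round_relative(x: np.ndarray, bits: int) -> np.ndarray:
    """Round each value to `bits` mantissa bits (lossy, relative precision).
    Hypothetical sketch, not the RootInteractive implementation."""
    mantissa, exponent = np.frexp(x)  # x = mantissa * 2**exponent, 0.5 <= |mantissa| < 1
    scale = 2.0 ** bits
    return np.ldexp(np.round(mantissa * scale) / scale, exponent)


rng = np.random.default_rng(1)
true = rng.random(40000)
measured = true + rng.normal(0, 0.05, true.size)  # similar to the *M columns above

rounded = round_relative(measured, 8)
nonzero = measured != 0  # avoid division by zero in the relative error
rel_err = np.abs(rounded[nonzero] - measured[nonzero]) / np.abs(measured[nonzero])
print(f"max relative error: {rel_err.max():.5f} (bound 2**-8 = {2**-8:.5f})")

# Lossless zlib step applied to the raw and to the rounded values
raw_size = len(zlib.compress(measured.tobytes()))
lossy_size = len(zlib.compress(rounded.tobytes()))
print(f"zlib size raw: {raw_size} B, after 8-bit rounding: {lossy_size} B")
```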

**Recommendation:** For effective data-volume management, make extensive use of lazily evaluated aliases on the client side.

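```python
# Compression rules: (column-name regex, list of compression actions)
# "relative": lossy rounding to the given number of bits of relative precision
# "code", "zip": subsequent lossless encoding steps (see READMEcompression.md)
arrayCompression=[
        (".*M",[("relative",8), ("code",0), ("zip",0)]),   # measured (noisy) columns: 8 bits
        (".*",[("relative",10), ("code",0), ("zip",0)]),   # all other columns: 10 bits
]
```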
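Rule order matters when several patterns match the same column. The sketch below assumes, as the ordering of `arrayCompression` (catch-all rule last) suggests, that rules are tried from top to bottom and the first pattern matching the whole column name is applied; the helper `rule_for` is hypothetical, so check the compression documentation for the exact matching behavior.

```python
import re

ARRAY_COMPRESSION = [
    (".*M", [("relative", 8), ("code", 0), ("zip", 0)]),
    (".*", [("relative", 10), ("code", 0), ("zip", 0)]),
]


def rule_for(column: str, rules: list):
    """Return the action list of the first rule whose regex matches the full
    column name (assumed first-match-wins, full-string matching)."""
    for pattern, actions in rules:
        if re.fullmatch(pattern, column):
            return actions
    return None


for col in ["A", "AM", "errY", "DDC"]:
    print(col, rule_for(col, ARRAY_COMPRESSION)[0])
# A ('relative', 10)
# AM ('relative', 8)
# errY ('relative', 10)
# DDC ('relative', 10)
```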

```python
# Same dashboard with user-defined compression, for the size comparison
output_file(f"{outputPrefix}test_histogramWeights_compressed.html")
bokehDrawSA.fromArray(df, None, figureArray, widgetParams, layout=figureLayoutDesc, parameterArray=parameterArray, arrayCompression=arrayCompression,
                      widgetLayout=widgetLayoutDesc, sizing_mode="scale_width", histogramArray=histoArray, aliasArray=aliasArray,
                      jsFunctionArray=jsFunctionArray)
```