In [2]:
%load_ext autoreload
%autoreload 2

In [3]:
from nb_utils import showAttribs

# Module: eliana.datasets

The `eliana.datasets` paradigm involves selecting a high-level task (`task_instance`), collecting a list of task executions in a DataFrame called `task_instance.meta`, and providing access to its traces, which correspond to event logs related to each task execution. This is achieved using `task_instance.load_trace(i)`, where `i` is one of the indices in `task_instance.meta.index`.

A key feature of this framework is its built-in caching mechanism. When collecting datasets, queries to external sources (e.g., Elasticsearch) can be time-consuming, especially for large datasets spanning months. Eliana first checks for cached metadata or traces; if not found, it performs the query and generates the cache. This process is transparent to the user, allowing code to be written as if all data is in memory. The following code demonstrates this functionality:

```python
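# ElianaDatasetMyTask is a placeholder for a task-specific dataset class;
# start and stop are timestamps assumed to be defined beforehand.
logs = ElianaDatasetMyTask(start_timestamp=start, stop_timestamp=stop)
print(f"There are {len(logs.meta)} executions of MyTask between {start} and {stop}")
print("And the events in the first trace are:")
# load_trace expects a label from logs.meta.index, so take the first one
df_first_trace = logs.load_trace(logs.meta.index[0])
display(df_first_trace)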
```
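
The check-cache-then-query behavior described above can be pictured with a small, self-contained sketch. It is not Eliana's implementation: the helper `load_or_query`, the JSON file format, and the file name `MyTask_0.json` are assumptions made only for illustration.

```python
import json
import tempfile
from pathlib import Path


def load_or_query(cache_file, query_fn, use_cache=True):
    """Return cached data if present; otherwise run query_fn and cache the result.

    Hypothetical helper: it only illustrates the caching idea, not Eliana's code.
    """
    cache_file = Path(cache_file)
    if use_cache and cache_file.exists():
        # Cache hit: the expensive query is skipped entirely.
        return json.loads(cache_file.read_text())
    data = query_fn()
    if use_cache:
        # Cache miss: store the result so the next call is served from disk.
        cache_file.parent.mkdir(parents=True, exist_ok=True)
        cache_file.write_text(json.dumps(data))
    return data


calls = []  # records how many times the slow query actually runs


def slow_query():
    """Stands in for a long-running query to an external source."""
    calls.append(1)
    return [{"trace_id": 0, "event": "LOG started"}]


cache_path = Path(tempfile.mkdtemp()) / "raw" / "MyTask_0.json"
first = load_or_query(cache_path, slow_query)   # runs the query, writes the cache
second = load_or_query(cache_path, slow_query)  # served from the cache file
print(len(calls), first == second)
```

The caller never needs to know which of the two paths was taken, which is what makes the cache transparent.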

## ElianaDataset

All Eliana datasets share the properties of the base class `ElianaDataset`.

In [4]:
from eliana.datasets import ElianaDataset

showAttribs(ElianaDataset)


ElianaDataset       : High level class for Eliana Dataset Manipulation, oriented to pandas DataFrames.

Constructor
__init__            : Common methods for Eliana Dataset Manipulation.

Properties
data_dir            : Path for the raw data directory, including dataset_name().
index               : Alias for meta for backward compatibility.
meta                : Contains the metadata of the dataset.
meta_filename       : Full path for the index filename.
processed_dir       : Path for the processed data directory, including dataset_name().

Methods
do_query_meta       : Code to generate the metadata of the dataset. Must be overloaded!
do_query_trace      : Code to generate the log trace between two timestamps. Must be overloaded!
load_trace          : Load a raw trace based on its ID in the metadata.
preload_traces      : Preloads traces for faster access.
set_dict_for_trace_filenames: Sets unique identifiers for trace filenames. This must be overriden.
signature           : When inhe
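
The two methods marked "Must be overloaded!" suggest a template-method design, where the base class drives the workflow and subclasses supply the queries. The sketch below uses a stand-in base class, `SketchDataset`, rather than the real `ElianaDataset`; the method signatures, the list-based metadata, and the subclass `MyTaskDataset` are all assumptions.

```python
class SketchDataset:
    """Stand-in base class; NOT the real ElianaDataset."""

    def __init__(self):
        self._meta = None

    @property
    def meta(self):
        # Query lazily, the first time the metadata is requested.
        if self._meta is None:
            self._meta = self.do_query_meta()
        return self._meta

    def load_trace(self, trace_id):
        # Look up the metadata row, then delegate to the subclass hook.
        return self.do_query_trace(self.meta[trace_id])

    def do_query_meta(self):
        raise NotImplementedError("Must be overloaded!")

    def do_query_trace(self, meta_row):
        raise NotImplementedError("Must be overloaded!")


class MyTaskDataset(SketchDataset):
    """Hypothetical subclass that supplies both hooks with synthetic data."""

    def do_query_meta(self):
        # One dict per task execution (assumed layout).
        return [{"START": 0, "END": 10}, {"START": 20, "END": 25}]

    def do_query_trace(self, meta_row):
        # Synthetic events every 5 time units inside the execution window.
        return [f"event at t={t}" for t in range(meta_row["START"], meta_row["END"], 5)]


logs = MyTaskDataset()
trace = logs.load_trace(1)
print(len(logs.meta), trace)
```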

### Configuration
All `ElianaDataset` classes share this configuration, which can be specified in the constructor either by keyword or by unpacking a dictionary, as in `ElianaDataset(**config)`.

The configuration options common to all `ElianaDataset` classes are:

In [7]:
ElianaDataset().config.to_dict()

{'cols': None,
 'dataset_dir': 'data/raw',
 'dataset_name': 'ElianaDataset',
 'event_cols': None,
 'meta_filename': '_index',
 'trace_prefix': 'ElianaDataset',
 'trace_suffix': '',
 'use_cache': True}
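
Passing options by keyword and unpacking a dictionary are equivalent in Python. The sketch below shows this with a stand-in class, `SketchConfigured`, that merges user options over a few of the defaults listed above. How the real constructor stores or validates its options is not shown here, so the merging logic is an assumption.

```python
# A subset of the defaults printed above, used only for illustration.
DEFAULTS = {"dataset_dir": "data/raw", "meta_filename": "_index", "use_cache": True}


class SketchConfigured:
    """Stand-in class; NOT the real ElianaDataset."""

    def __init__(self, **config):
        # User-supplied options override the defaults.
        self.config = {**DEFAULTS, **config}


# Option 1: pass options by keyword.
by_name = SketchConfigured(use_cache=False)

# Option 2: build a dictionary first and unpack it.
config = {"use_cache": False}
by_dict = SketchConfigured(**config)

print(by_name.config == by_dict.config)
```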

## Parlogs Observations

This is a public dataset hosted on Hugging Face at https://huggingface.co/datasets/Paranal/parlogs-observations. In short, when an instrument is commanded to start an observation, the system loads a script called a *template*. Each template execution has a *start* and an *end* timestamp, and an associated trace of the events logged during that execution.
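
One way to picture the relation between a template execution and its trace is as a time window over an event log. The sketch below uses only the standard library; the helper `events_between`, the inclusive bounds, and the sample log texts are assumptions, and the dataset itself may build traces differently.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"


def events_between(events, start, end):
    """Return the events whose timestamp lies within [start, end] (inclusive, assumed)."""
    return [e for e in events if start <= e["@timestamp"] <= end]


def ts(text):
    """Parse a timestamp written like '2019-04-02 07:41:21.488'."""
    return datetime.strptime(text, FMT)


# Hypothetical event log: two events inside the window, one before, one after.
events = [
    {"@timestamp": ts("2019-04-02 07:41:21.459"), "logtext": "previous template ends"},
    {"@timestamp": ts("2019-04-02 07:41:21.488"), "logtext": "template starts"},
    {"@timestamp": ts("2019-04-02 07:42:52.292"), "logtext": "template finishes"},
    {"@timestamp": ts("2019-04-02 07:42:52.314"), "logtext": "next template starts"},
]

# START and END of one template execution.
start, end = ts("2019-04-02 07:41:21.488"), ts("2019-04-02 07:42:52.294")
trace = events_between(events, start, end)
print([e["logtext"] for e in trace])
```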

In [17]:
from eliana.datasets import ParlogsObservations

showAttribs(ParlogsObservations)

ParlogsObservations : Public Dataset with VLTI logs from 2019.

Constructor
__init__            : Documentation pending

Properties
available_periods   : 
available_sources   : 
available_systems   : 
data_dir            : Path for the raw data directory, including dataset_name().
index               : Alias for meta for backward compatibility.
meta                : Contains the metadata of the dataset.
meta_filename       : Full path for the index filename.
period              : Documentation pending
processed_dir       : Path for the processed data directory, including dataset_name().
source              : Documentation pending
system              : Documentation pending

Methods
do_query_meta       : Generates the metadata for the dataset.
do_query_trace      : Generates the trace for the dataset based on the row of metadata.
load_trace          : Load a raw trace based on its ID in the metadata.
preload_public_file : Documentation pending
preload_traces      : Preloads traces from 

In [18]:
logs = ParlogsObservations()
logs.config.to_dict()

{'cols': None,
 'dataset_dir': 'data/raw',
 'dataset_name': 'ParlogsObservations',
 'event_cols': ['logtype', 'procname', 'keywname', 'keywvalue', 'logtext'],
 'meta_filename': '_index',
 'trace_prefix': 'ParlogsObservations',
 'trace_suffix': '',
 'use_cache': False,
 'system': 'PIONIER',
 'period': '1d',
 'source': 'Instrument'}

In [19]:
print("Available periods: ", logs.available_periods)
print("Available sources: ", logs.available_sources)
print("Available systems: ", logs.available_systems)

Available periods:  ['1d', '1w', '1m', '6m']
Available sources:  ['Instrument', 'Subsystems', 'Telescopes', 'All']
Available systems:  ['PIONIER', 'GRAVITY', 'MATISSE']


### Simple example

In [30]:
logs = ParlogsObservations(period="1w", source="Instrument", system="PIONIER")
logs.meta

Unnamed: 0,START,END,TIMEOUT,system,procname,TPL_ID,ERROR,USER_ABORT,SECONDS,TEL,TPL_EXEC
0,2019-04-01 22:29:07.746,2019-04-01 22:29:10.519,False,PIONIER,bob_ins,PIONIER_gen_tec_setup,False,False,2.0,AT,STOP
1,2019-04-02 07:40:48.591,2019-04-02 07:41:21.459,False,PIONIER,bob_25299,PIONIER_gen_tec_setup,False,False,32.0,AT,STOP
2,2019-04-02 07:41:21.488,2019-04-02 07:42:52.294,False,PIONIER,bob_25299,PIONIER_gen_tec_niobate,False,False,90.0,AT,STOP
3,2019-04-02 07:42:52.314,2019-04-02 07:43:24.984,False,PIONIER,bob_25299,PIONIER_gen_tec_setup,False,False,32.0,AT,STOP
4,2019-04-02 09:21:52.714,2019-04-02 09:22:25.281,False,PIONIER,bob_ins,PIONIER_gen_tec_setup,False,False,32.0,AT,STOP
...,...,...,...,...,...,...,...,...,...,...,...
389,2019-04-07 20:37:40.154,2019-04-07 20:38:04.954,False,PIONIER,bob_ins,PIONIER_gen_tec_setup,False,False,24.0,AT,STOP
390,2019-04-07 20:38:04.980,2019-04-07 20:39:04.254,False,PIONIER,bob_ins,PIONIER_gen_tec_niobate,False,False,59.0,AT,STOP
391,2019-04-07 20:39:04.272,2019-04-07 20:39:29.141,False,PIONIER,bob_ins,PIONIER_gen_tec_setup,False,False,24.0,AT,STOP
392,2019-04-07 20:39:29.165,2019-04-07 20:40:03.846,False,PIONIER,bob_ins,PIONIER_gen_tec_standby,False,False,34.0,AT,STOP


In [31]:
trace = logs.load_trace(2)
trace

Unnamed: 0,@timestamp,system,hostname,loghost,logtype,envname,procname,procid,module,keywname,keywvalue,keywmask,logtext,trace_id,event
0,2019-04-02 07:41:21.488,PIONIER,wpnr,wpnr,LOG,wpnr,bob_25299,29,seq,,,,PIONIER_gen_tec_niobate -- Niobates alignment ...,2,LOG bob_25299 PIONIER_gen_tec_niobate -- Nio...
1,2019-04-02 07:41:21.488,PIONIER,wpnr,wpnr,LOG,wpnr,bob_25299,29,seq,,,,Started at 2019-04-02T07:41:21 (underlined),2,LOG bob_25299 Started at 2019-04-02T07:41:21...
2,2019-04-02 07:41:21.590,PIONIER,wpnr,wpnr,LOG,wpnr,bob_25299,29,seq,,,,DET SCAN ST = 'T',2,LOG bob_25299 DET SCAN ST = 'T'
3,2019-04-02 07:41:21.590,PIONIER,wpnr,wpnr,LOG,wpnr,bob_25299,29,seq,,,,DET SCAN NREADS = '512',2,LOG bob_25299 DET SCAN NREADS = '512'
4,2019-04-02 07:41:21.590,PIONIER,wpnr,wpnr,LOG,wpnr,bob_25299,29,seq,,,,DET SCAN STROKE = '40e-6',2,LOG bob_25299 DET SCAN STROKE = '40e-6'
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
60,2019-04-02 07:42:52.000,PIONIER,wpnr,wpnr,FEVT,wpnr,logManager,0,,INS.LAMP2.STOP,,wpnics,Lamp turned off.,2,FEVT logManager INS.LAMP2.STOP Lamp turned off.
61,2019-04-02 07:42:52.000,PIONIER,wpnr,wpnr,FEVT,wpnr,logManager,0,,INS.OPTI1.MOVE,,wpnics,Motion execution.,2,FEVT logManager INS.OPTI1.MOVE Motion execution.
62,2019-04-02 07:42:52.289,PIONIER,wpnr,wpnr,LOG,wpnr,bob_25299,29,seq,,,,Template PIONIER_gen_tec_niobate finished.,2,LOG bob_25299 Template PIONIER_gen_tec_nioba...
63,2019-04-02 07:42:52.292,PIONIER,wpnr,wpnr,LOG,wpnr,bob_25299,29,seq,,,,Finished in 91 seconds at 2019-04-02T07:42:52 ...,2,LOG bob_25299 Finished in 91 seconds at 2019...


In [32]:
print(f"Number of events: {len(trace)}")
print(f"Systems involved: {trace['system'].unique()}")

Number of events: 65
Systems involved: ['PIONIER']


### Use multiple sources

In [None]:
logs = ParlogsObservations(period="1w", source="All", system="PIONIER")
trace = logs.load_trace(2)
print(f"Number of events: {len(trace)}")
print(f"Systems involved: {trace['system'].unique()}")

Number of events: 1165
Systems involved: ['PIONIER' 'DL' 'AT1' 'ARAL' 'ISS' 'AT3' 'AT2' 'AT4']


In [35]:
logs = ParlogsObservations(period="1w", source="Telescopes", system="PIONIER")
trace = logs.load_trace(2)
print(f"Number of events: {len(trace)}")
print(f"Systems involved: {trace['system'].unique()}")

Number of events: 47
Systems involved: ['AT1' 'AT3' 'AT2' 'AT4']


## ElasticsearchQueryDataset

A class specialized in handling Elasticsearch or OpenSearch queries.

In [6]:
from eliana.datasets import ElasticsearchQueryDataset

showAttribs(ElasticsearchQueryDataset)

ElasticsearchQueryDataset: Parlogan dataset specialized in Elasticsearch queries.

Constructor
__init__            : Main parameter: **config

Properties
data_dir            : Path for the raw data directory, including dataset_name().
index               : Alias for meta for backward compatibility.
meta                : Contains the metadata of the dataset.
meta_filename       : Full path for the index filename.
processed_dir       : Path for the processed data directory, including dataset_name().
start_timestamp     : Start timestamp for this dataset instance.
stop_timestamp      : Stop timestamp for this dataset instance.

Methods
add_trace_filter    : Add filters to trace query: (trace_query) -(filter1 OR filter2 ...)
do_query_meta       : Code to generate the metadata of the dataset. Must be overloaded!
do_query_trace      : Code to generate the log trace between two timestamps. Must be overloaded!
filtered_trace_query: Returns (self.trace_query) -(filter1 OR filter2 ...)
load_trac
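
The pattern `(trace_query) -(filter1 OR filter2 ...)` from the listing above can be reproduced with plain string handling. This is a sketch of the pattern only, not the class's code: the function name mirrors the method for readability, the field names in the example are taken from the trace columns, and the behavior with no filters is an assumption.

```python
def filtered_trace_query(trace_query, filters):
    """Build '(trace_query) -(filter1 OR filter2 ...)' as a query string.

    Illustrative only; returning just the wrapped query when there are
    no filters is an assumption.
    """
    if not filters:
        return f"({trace_query})"
    excluded = " OR ".join(filters)
    return f"({trace_query}) -({excluded})"


query = filtered_trace_query(
    "system:PIONIER",
    ["logtype:FEVT", "procname:logManager"],
)
print(query)
# (system:PIONIER) -(logtype:FEVT OR procname:logManager)
```

Grouping the filters under a single negated clause keeps the base query untouched, so filters can be added one at a time without rewriting it.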

## Log2Table
A utility that reads a sequence of events and fills a table whose rows are built by complex rules. It is based on finite state machines.

In [7]:
from eliana.datasets import Log2Table
showAttribs(Log2Table)

Log2Table           : Parse a list of events following a criteria based in state machines to fill a table with complex behavior.

Constructor
__init__            : Constructor

Properties

Methods
parse               : Iterate DataFrame or list and create rows previously defined
start_when          : Declare initialization of a row
when                : Template to fill a column in a row based on its content
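
The idea of filling table rows with a finite state machine can be sketched without the library. The code below does not use the `Log2Table` API, whose method signatures are not documented here. It is a hand-written two-state machine, and the regular expressions and column names are assumptions based on log texts like those shown earlier.

```python
import re

START = re.compile(r"^Started at (\S+)")
NREADS = re.compile(r"^DET SCAN NREADS = '(\d+)'")
FINISH = re.compile(r"^Template (\S+) finished\.")


def events_to_rows(events):
    """Two-state machine: idle (row is None) or inside a row (row is a dict)."""
    rows, row = [], None
    for text in events:
        if (m := START.match(text)):
            # A start event opens a new row (an unfinished row is discarded).
            row = {"started": m.group(1), "nreads": None, "template": None}
        elif row is None:
            continue  # idle: ignore everything until a start event
        elif (m := NREADS.match(text)):
            row["nreads"] = int(m.group(1))  # fill a column from the event content
        elif (m := FINISH.match(text)):
            row["template"] = m.group(1)
            rows.append(row)  # the finish event closes the row
            row = None
    return rows


events = [
    "Lamp turned off.",                             # ignored: no row is open
    "Started at 2019-04-02T07:41:21 (underlined)",  # opens a row
    "DET SCAN ST = 'T'",                            # inside a row, no rule matches
    "DET SCAN NREADS = '512'",                      # fills the nreads column
    "Template PIONIER_gen_tec_niobate finished.",   # closes the row
    "DET SCAN NREADS = '256'",                      # ignored: no row is open
]
rows = events_to_rows(events)
print(rows)
```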


## Examples

(TBD)
* [Very simple example from a synthetic log](to_be_done)
* [Longer example](to_be_done)
* [Advanced behaviors](to_be_done)
* [Mix with eliana.datasets](to_be_done)