# Analysis history

To improve reproducibility and data provenance for data analysed using `peaks`, we attempt to keep track of data operations performed using in-built `peaks` functions (including the name of the calling function) and store them in a JSON-type record within the data attributes of the :class:`xarray.DataArray`, under the entry `analysis_history`. While a well-organised Jupyter notebook is a good start, the user has to be careful to execute cells only in order, and a typical analysis workflow may involve loading and processing some data and saving an intermediate step before further post-processing in another notebook, necessitating a richer record of the data analysis history. Saving and loading using in-built `peaks` methods will attempt to keep a persistent data analysis record intact.
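
As a rough illustration of the idea (not the actual `peaks` implementation), the sketch below appends a timestamped entry to a list held under an `analysis_history` key, using a plain dictionary as a stand-in for the attributes of a :class:`xarray.DataArray`. The helper name `add_record` and its `version` argument are hypothetical; the entry keys mirror those visible in the history tables later in this section.

```python
import inspect
import json
from datetime import datetime


def add_record(attrs, record, version="0.1.0-dev", fn_name=None):
    """Append a history entry to attrs['analysis_history'] (hypothetical helper)."""
    if fn_name is None:
        # Name of the function that called add_record
        fn_name = inspect.stack()[1].function
    entry = {
        "time": datetime.now().isoformat(),
        "peaks version": version,
        "record": record,
        "fn_name": fn_name,
    }
    attrs.setdefault("analysis_history", []).append(entry)
    return entry


def normalise(attrs):
    """Example processing step that records itself in the history."""
    add_record(attrs, "Data normalised to its maximum")


attrs = {}  # stand-in for the attrs dictionary of a DataArray
normalise(attrs)

# The history is plain JSON-serialisable data
serialised = json.dumps(attrs["analysis_history"], indent=2)
print(serialised)
```

Because the record is ordinary JSON-compatible data, it can travel with the data through saving and re-loading, which is what makes a persistent history possible.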

In [1]:
# Import packages
import matplotlib.pyplot as plt
import xarray as xr
import numpy as np
import peaks as pks
import os

# Set default options
xr.set_options(cmap_sequential='Purples', keep_attrs=True)
%matplotlib inline
%config InlineBackend.figure_format='retina'

In [2]:
# Set the default file path
pks.File.path = os.getenv('PEAKS_EXAMPLE_DATA_PATH')

To explore the analysis history, we will load and process some data. This involves several directly called analysis steps and some hidden ones (e.g. estimating the Fermi level).

In [3]:
# Load multiple scans into a list
disps = [pks.load(f"210326_GM2-667_GK_{i}.xy") for i in range(1, 4)]

# Merge the data, applying offsets in the default angle direction and trimming the data along this dimension
merged_disp = pks.merge_data(disps, offsets=12, sel=slice(-10, 10))

# Convert the merged data to k-space
merged_disp_k = merged_disp.k_convert()

<div class="alert alert-block alert-warning"><b>Analysis warning: </b> Some manipulator data and/or normal emission data was missing or could not be passed. Assuming default values of: polar: 0, tilt: 0, azi: 0, norm_tilt: 0, norm_azi: 0. </div>

Converting data to k-space - initialising:   0%|          | 0/3 [00:00<?, ?it/s]

<div class="alert alert-block alert-warning"><b>Analysis warning: </b> EF_correction set from automatic estimation of Fermi level to: 16.82 eV.. NB may not be accurate. </div>

The analysis history for the various datasets can be displayed by calling the `.history` accessor. This attempts to display the history in a formatted table:

In [4]:
disps[0].history()

Unnamed: 0,time,peaks version,record,fn_name
0,2024-08-19T16:00:15.774714,0.1.0-dev,"{'record': 'Data loaded', 'loc': 'StA-Phoibos', 'loader': '_load_StA_Phoibos_data', 'file_name': '/Users/pdk6/Library/CloudStorage/OneDrive-UniversityofStAndrews/Lab/Example data/210326_GM2-667_GK_1.xy'}",_load_single_data


However, some processing steps (e.g. merging) combine multiple history records, complicating the display:

In [5]:
merged_disp_k.history()

Unnamed: 0,time,peaks version,record,fn_name
0,2024-08-19T16:00:16.093411,0.1.0-dev,"{'record': 'Merged 210326_GM2-667_GK_3 & 210326_GM2-667_GK_2 & 210326_GM2-667_GK_1 along theta_par with offsets [0, 12, 24] with data cropped to theta_par=slice(-10, 10, None).', 'original scan 0 analysis history': [{'time': '2024-08-19T16:00:15.774714', 'peaks version': '0.1.0-dev', 'record': {'record': 'Data loaded', 'loc': 'StA-Phoibos', 'loader': '_load_StA_Phoibos_data', 'file_name': '/Users/pdk6/Library/CloudStorage/OneDrive-UniversityofStAndrews/Lab/Example data/210326_GM2-667_GK_1.xy'}, 'fn_name': '_load_single_data'}], 'original scan 1 analysis history': [{'time': '2024-08-19T16:00:15.901739', 'peaks version': '0.1.0-dev', 'record': {'record': 'Data loaded', 'loc': 'StA-Phoibos', 'loader': '_load_StA_Phoibos_data', 'file_name': '/Users/pdk6/Library/CloudStorage/OneDrive-UniversityofStAndrews/Lab/Example data/210326_GM2-667_GK_2.xy'}, 'fn_name': '_load_single_data'}], 'original scan 2 analysis history': [{'time': '2024-08-19T16:00:16.026832', 'peaks version': '0.1.0-dev', 'record': {'record': 'Data loaded', 'loc': 'StA-Phoibos', 'loader': '_load_StA_Phoibos_data', 'file_name': '/Users/pdk6/Library/CloudStorage/OneDrive-UniversityofStAndrews/Lab/Example data/210326_GM2-667_GK_3.xy'}, 'fn_name': '_load_single_data'}]}",merge_data
1,2024-08-19T16:00:16.137172,0.1.0-dev,EF_correction set from automatic estimation of Fermi level to: 16.82 eV.. NB may not be accurate.,_add_estimated_EF
2,2024-08-19T16:00:17.341779,0.1.0-dev,"Converted to k-space using the following parameters: Angles: {'polar': 0, 'tilt': 0, 'azi': 0, 'ana_polar': 0, 'defl_par': 0, 'defl_perp': 0, 'norm_polar': 0, 'norm_tilt': 0, 'norm_azi': 0}, Time taken: 1.23s.",k_convert


The `.history.get(index)` method can be used to attempt a clearer display of, or optionally return (using the `return_history=True` flag), the history entry for a specific index of the table shown above (defaulting to the last index if no arguments are specified). For example, index 0 of the above table contains multiple concatenated records of the scans before they were merged:

In [6]:
merged_disp_k.history.get(0)

{'fn_name': 'merge_data',
 'peaks version': '0.1.0-dev',
 'record': {'original scan 0 analysis history': [{'fn_name': '_load_single_data',
                                                  'peaks version': '0.1.0-dev',
                                                  'record': {'file_name': '/Users/pdk6/Library/CloudStorage/OneDrive-UniversityofStAndrews/Lab/Example '
                                                                          'data/210326_GM2-667_GK_1.xy',
                                                             'loader': '_load_StA_Phoibos_data',
                                                             'loc': 'StA-Phoibos',
                                                             'record': 'Data '
                                                                       'loaded'},
                                                  'time': '2024-08-19T16:00:15.774714'}],
            'original scan 1 analysis history': [{'fn_name': '_load_single_data',
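
Nested records like the one above can be walked recursively. The following is a minimal sketch, assuming an entry has the structure shown in the output above: a dictionary with `fn_name`, `time` and `record` keys, where a dictionary-valued `record` may hold earlier histories under keys ending in `analysis history`. The function name `iter_history` and the abbreviated sample data are illustrative only.

```python
def iter_history(entries, depth=0):
    """Yield (depth, fn_name, time) for each entry, descending into nested histories."""
    for entry in entries:
        yield depth, entry["fn_name"], entry["time"]
        record = entry["record"]
        if isinstance(record, dict):
            for key, value in record.items():
                # Earlier histories are stored as lists of entries under keys
                # such as 'original scan 0 analysis history'
                if key.endswith("analysis history") and isinstance(value, list):
                    yield from iter_history(value, depth + 1)


# Abbreviated sample mirroring the structure of the merged history above
sample = [
    {
        "time": "2024-08-19T16:00:16.093411",
        "fn_name": "merge_data",
        "record": {
            "record": "Merged scans along theta_par",
            "original scan 0 analysis history": [
                {
                    "time": "2024-08-19T16:00:15.774714",
                    "fn_name": "_load_single_data",
                    "record": {"record": "Data loaded"},
                }
            ],
            "original scan 1 analysis history": [
                {
                    "time": "2024-08-19T16:00:15.901739",
                    "fn_name": "_load_single_data",
                    "record": {"record": "Data loaded"},
                }
            ],
        },
    },
    {
        "time": "2024-08-19T16:00:17.341779",
        "fn_name": "k_convert",
        "record": "Converted to k-space",
    },
]

rows = list(iter_history(sample))
for depth, fn_name, time in rows:
    # Indent nested entries to show which step they belong to
    print("  " * depth + f"{fn_name} ({time})")
```

Indenting by depth keeps the pre-merge history of each scan visually attached to the merge step it fed into.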
          

## Saving analysis history
The `.history.save` method can be used to save the complete analysis history as a JSON record.

In [7]:
merged_disp_k.history.save('saving_analysis_metadata.json')

In [8]:
# This can be loaded in the standard way, or e.g. viewed in a web browser or other software

# Example method for Jupyter
import json
import pprint

# Load JSON data from a file
file_path = 'saving_analysis_metadata.json'
with open(file_path, 'r') as file:
    json_data = json.load(file)

# Pretty print the JSON data
pprint.pprint(json_data)

[{'fn_name': 'merge_data',
  'peaks version': '0.1.0-dev',
  'record': {'original scan 0 analysis history': [{'fn_name': '_load_single_data',
                                                   'peaks version': '0.1.0-dev',
                                                   'record': {'file_name': '/Users/pdk6/Library/CloudStorage/OneDrive-UniversityofStAndrews/Lab/Example '
                                                                           'data/210326_GM2-667_GK_1.xy',
                                                              'loader': '_load_StA_Phoibos_data',
                                                              'loc': 'StA-Phoibos',
                                                              'record': 'Data '
                                                                        'loaded'},
                                                   'time': '2024-08-19T16:00:15.774714'}],
             'original scan 1 analysis history': [{'fn_name': '_load_single_data'
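
Once loaded, the record is an ordinary list of dictionaries, so it can be condensed with standard Python. The sketch below is one possible way to print a single short line per processing step. It assumes each entry has `fn_name` and `record` keys as in the output above; the function name `summarise`, the file name `history_example.json` and the abbreviated sample entries are made up for illustration.

```python
import json
import tempfile
from pathlib import Path


def summarise(entries, width=50):
    """Return one short line per history entry."""
    lines = []
    for index, entry in enumerate(entries):
        record = entry["record"]
        # Dictionary records carry their headline text under their own 'record' key
        text = record.get("record", "") if isinstance(record, dict) else str(record)
        if len(text) > width:
            text = text[: width - 3] + "..."
        lines.append(f"{index} | {entry['fn_name']} | {text}")
    return lines


# Abbreviated stand-in for a saved history file (illustrative values only)
example = [
    {"fn_name": "merge_data", "record": {"record": "Merged three scans along theta_par"}},
    {"fn_name": "k_convert", "record": "Converted to k-space"},
]

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "history_example.json"
    path.write_text(json.dumps(example))
    with open(path, "r") as file:
        loaded = json.load(file)

summary = summarise(loaded)
print("\n".join(summary))
```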

The analysis history is also stored in the metadata record when a :class:`xarray.DataArray` is saved and re-loaded using the `peaks` `save` and `load` functions:

In [9]:
merged_disp_k.save('test_save.nc')

In [10]:
re_loaded_disp = pks.load('test_save.nc')

In [11]:
re_loaded_disp.history()

Unnamed: 0,time,peaks version,record,fn_name
0,2024-08-19T16:00:16.093411,0.1.0-dev,"{'record': 'Merged 210326_GM2-667_GK_3 & 210326_GM2-667_GK_2 & 210326_GM2-667_GK_1 along theta_par with offsets [0, 12, 24] with data cropped to theta_par=slice(-10, 10, None).', 'original scan 0 analysis history': [{'time': '2024-08-19T16:00:15.774714', 'peaks version': '0.1.0-dev', 'record': {'record': 'Data loaded', 'loc': 'StA-Phoibos', 'loader': '_load_StA_Phoibos_data', 'file_name': '/Users/pdk6/Library/CloudStorage/OneDrive-UniversityofStAndrews/Lab/Example data/210326_GM2-667_GK_1.xy'}, 'fn_name': '_load_single_data'}], 'original scan 1 analysis history': [{'time': '2024-08-19T16:00:15.901739', 'peaks version': '0.1.0-dev', 'record': {'record': 'Data loaded', 'loc': 'StA-Phoibos', 'loader': '_load_StA_Phoibos_data', 'file_name': '/Users/pdk6/Library/CloudStorage/OneDrive-UniversityofStAndrews/Lab/Example data/210326_GM2-667_GK_2.xy'}, 'fn_name': '_load_single_data'}], 'original scan 2 analysis history': [{'time': '2024-08-19T16:00:16.026832', 'peaks version': '0.1.0-dev', 'record': {'record': 'Data loaded', 'loc': 'StA-Phoibos', 'loader': '_load_StA_Phoibos_data', 'file_name': '/Users/pdk6/Library/CloudStorage/OneDrive-UniversityofStAndrews/Lab/Example data/210326_GM2-667_GK_3.xy'}, 'fn_name': '_load_single_data'}]}",merge_data
1,2024-08-19T16:00:16.137172,0.1.0-dev,EF_correction set from automatic estimation of Fermi level to: 16.82 eV.. NB may not be accurate.,_add_estimated_EF
2,2024-08-19T16:00:17.341779,0.1.0-dev,"Converted to k-space using the following parameters: Angles: {'polar': 0, 'tilt': 0, 'azi': 0, 'ana_polar': 0, 'defl_par': 0, 'defl_perp': 0, 'norm_polar': 0, 'norm_tilt': 0, 'norm_azi': 0}, Time taken: 1.23s.",k_convert
3,2024-08-19T16:00:17.401047,0.1.0-dev,Data saved as a NetCDF file to test_save.nc.,save
4,2024-08-19T16:00:17.509455,0.1.0-dev,"{'record': 'Data loaded', 'loc': 'NetCDF', 'loader': '_load_NetCDF_data', 'file_name': 'test_save.nc'}",_load_single_data


## Manually adding analysis history
Most `peaks`-specific functions aim to add a related history record, which is one reason to use them even where they are only thin wrappers around e.g. existing :class:`xarray` functions. However, operating on a :class:`xarray.DataArray` with e.g. a built-in :class:`xarray` method or another non-`peaks` function will not update the analysis record. In such cases, you should manually update the analysis record using `.history.add()`:

In [12]:
# Manually change one of the dispersions
disp = disps[0] * 10

# Add the history record
disp.history.add('Data intensity multiplied by 10', fn_name='Manual record')

In [13]:
disp.history()

Unnamed: 0,time,peaks version,record,fn_name
0,2024-08-19T16:00:15.774714,0.1.0-dev,"{'record': 'Data loaded', 'loc': 'StA-Phoibos', 'loader': '_load_StA_Phoibos_data', 'file_name': '/Users/pdk6/Library/CloudStorage/OneDrive-UniversityofStAndrews/Lab/Example data/210326_GM2-667_GK_1.xy'}",_load_single_data
1,2024-08-19T16:00:17.521422,0.1.0-dev,Data intensity multiplied by 10,Manual record


This approach can be used within a custom function, where `peaks` will also attempt to capture the name of the calling function. The data can either be updated in place (default), or a copy of the data with modified metadata can be returned by passing `update_in_place=False`, which helps when chaining methods together.

:::{warning}
It is easy to accidentally update the analysis history of the underlying :class:`xarray.DataArray` that was passed to the function. Check carefully that the operation behaves as intended.
:::
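
The pitfall in the warning above is a general Python one: metadata containers are mutable, so two objects can end up sharing the same history list. The sketch below shows the effect with plain dictionaries standing in for data attributes. It makes no claim about how `peaks` copies metadata internally, and the variable names are illustrative.

```python
import copy

# Case 1: a shallow copy shares the same underlying history list
shared_original = {"analysis_history": ["Data loaded"]}
shallow = dict(shared_original)
shallow["analysis_history"].append("Added one")
print(len(shared_original["analysis_history"]))  # the original was modified too

# Case 2: a deep copy gets an independent history list
safe_original = {"analysis_history": ["Data loaded"]}
deep = copy.deepcopy(safe_original)
deep["analysis_history"].append("Added one")
print(len(safe_original["analysis_history"]))  # the original is unchanged
```

Displaying the history of the input data after calling a custom function is a quick way to check which of the two cases applies.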

In [14]:
def add_one(data):
    data = data.history.add('Added one to the original data', update_in_place=False)
    return data + 1

In [15]:
new_disp = add_one(disp)

In [16]:
new_disp.history()

Unnamed: 0,time,peaks version,record,fn_name
0,2024-08-19T16:00:15.774714,0.1.0-dev,"{'record': 'Data loaded', 'loc': 'StA-Phoibos', 'loader': '_load_StA_Phoibos_data', 'file_name': '/Users/pdk6/Library/CloudStorage/OneDrive-UniversityofStAndrews/Lab/Example data/210326_GM2-667_GK_1.xy'}",_load_single_data
1,2024-08-19T16:00:17.521422,0.1.0-dev,Data intensity multiplied by 10,Manual record
2,2024-08-19T16:00:17.553372,0.1.0-dev,Added one to the original data,add_one


For simple cases like the one above, where e.g. no parameters that need to be passed to the history string are determined during function execution, a decorator can be used:

In [17]:
from peaks.utils.metadata import update_history_decorator

@update_history_decorator('Added two to the data')
def add_two(data):
    return data + 2

In [18]:
new_disp = add_two(disp)

In [19]:
new_disp.history()

Unnamed: 0,time,peaks version,record,fn_name
0,2024-08-19T16:00:15.774714,0.1.0-dev,"{'record': 'Data loaded', 'loc': 'StA-Phoibos', 'loader': '_load_StA_Phoibos_data', 'file_name': '/Users/pdk6/Library/CloudStorage/OneDrive-UniversityofStAndrews/Lab/Example data/210326_GM2-667_GK_1.xy'}",_load_single_data
1,2024-08-19T16:00:17.521422,0.1.0-dev,Data intensity multiplied by 10,Manual record
2,2024-08-19T16:00:17.568132,0.1.0-dev,Added two to the data,add_two
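
To give a feel for how a history-recording decorator of this kind can work in general, here is a self-contained sketch. It is not the `peaks` implementation of `update_history_decorator`: the `Data` class is a minimal stand-in for a :class:`xarray.DataArray`, and the decorator name `record_history` is hypothetical.

```python
import copy
import functools
from datetime import datetime


class Data:
    """Minimal stand-in for a DataArray: a value plus an attrs dictionary."""

    def __init__(self, value, attrs=None):
        self.value = value
        self.attrs = attrs if attrs is not None else {"analysis_history": []}


def record_history(message):
    """Hypothetical decorator: append `message` to the history of the returned data."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(data, *args, **kwargs):
            result = func(data, *args, **kwargs)
            # Copy the input's metadata so its history is left untouched
            result.attrs = copy.deepcopy(data.attrs)
            result.attrs["analysis_history"].append(
                {
                    "time": datetime.now().isoformat(),
                    "record": message,
                    "fn_name": func.__name__,  # name of the decorated function
                }
            )
            return result

        return wrapper

    return decorator


@record_history("Added two to the data")
def add_two(data):
    return Data(data.value + 2)


start = Data(1)
result = add_two(start)
print(result.value, [entry["fn_name"] for entry in result.attrs["analysis_history"]])
```

Because the message is fixed when the function is defined, this pattern suits only cases where the history text does not depend on values computed inside the function.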
