# Exploring the Rainbow API for use in Data class structures

There are a few things I want to investigate:

- [x] Passing the .UV data directly to other Python objects without exporting to `.csv` first.
- [x] What run metadata is available.

Once these are clarified, we can explore how best to integrate rainbow-api objects into my Data class.

In [1]:
import rainbow as rb

from pathlib import Path

p = "/Users/jonathan/0_jono_data/2023-02-07_18-30-07_Z3-ID-NM-ABS-MAX.D"

data = rb.read(str(p))

data

2023-02-07_18-30-07_Z3-ID-NM-ABS-MAX.D: DAD1D.ch - DAD1E.ch - DAD1A.ch - DAD1F.ch - DAD1B.ch - DAD1C.ch - DAD1.UV

So the DataDirectory objects contain:

- `DataDirectory.name` - name of the `.D` data directory.
- `DataDirectory.datafiles` - a list of all the data files.
- `DataDirectory.metadata` - a dict of metadata including run date/time and vial position.
- `DataDirectory.get_info()` - outputs a text string with ALL the information and data. The method name can be read from there.

Regarding the method data: based on the source code for `parse_uv()`, I should be able to access the method name. However, the following:

In [2]:
data.metadata

{'vendor': 'Agilent', 'date': '07-Feb-23, 18:31:29', 'vialpos': 'Vial 41'}

does not contain the method name. Maybe try accessing the metadata specific to a .ch or .uv file?

In [3]:
p = "/Users/jonathan/0_jono_data/2023-02-07_18-30-07_Z3-ID-NM-ABS-MAX.D"

data = rb.read(str(p))

data

2023-02-07_18-30-07_Z3-ID-NM-ABS-MAX.D: DAD1D.ch - DAD1E.ch - DAD1A.ch - DAD1F.ch - DAD1B.ch - DAD1C.ch - DAD1.UV

In [4]:
data.get_file('DAD1A.ch').metadata

{'notebook': 'z3',
 'date': '07-Feb-23, 18:31:29',
 'method': 'AVANTOR100X4_6C18-H2O-MEOH-2_1-1.M',
 'instrument': 'Asterix ChemStation',
 'unit': 'mAU',
 'signal': 'DAD1A, Sig=240.0,4.0  Ref=off'}

So the method names are contained in the metadata of the individual signals. That's fine. Rainbow essentially provides the class hierarchy we want.
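Since the method lives on each signal rather than on the directory, a small helper can collect it per file. The following is a minimal sketch, not part of rainbow: `methods_by_file` is a hypothetical name, and the `SimpleNamespace` objects are stand-ins for a `DataDirectory` and its `DataFile`s so the example runs without a `.D` folder. It assumes only the `datafiles`, `name` and `metadata` attributes.

```python
from types import SimpleNamespace


def methods_by_file(data_directory):
    """Map each data file's name to the method recorded in its metadata.

    Files whose metadata has no 'method' key map to None.
    """
    return {
        datafile.name: datafile.metadata.get("method")
        for datafile in data_directory.datafiles
    }


# Stand-in for a rainbow DataDirectory (assumption: only the attributes
# used above are needed). The first entry copies the DAD1A.ch metadata
# shown earlier; EXAMPLE.ch is a made-up file with no method key.
fake_directory = SimpleNamespace(
    datafiles=[
        SimpleNamespace(
            name="DAD1A.ch",
            metadata={"method": "AVANTOR100X4_6C18-H2O-MEOH-2_1-1.M", "unit": "mAU"},
        ),
        SimpleNamespace(name="EXAMPLE.ch", metadata={}),
    ]
)

result = methods_by_file(fake_directory)
print(result)
```

With a real directory the call would be `methods_by_file(data)`. Using `.get("method")` rather than `["method"]` means a file without that key does not raise.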

It would be useful to produce a table of all the data within a given top-level directory, then access those files with rainbow to run the desired queries and return the results as tables, i.e.

```
with 0_jono_data as dir:

data_table = table(dir)

print(data_table(sample name, acq time, method, signals contained, run time..))
```

So let's try to action that.

In [5]:
top_dir = Path('/Users/jonathan/0_jono_data')

for obj in top_dir.iterdir():
    if obj.name.endswith(".D"):
        print(obj.name)

2023-02-15_COFFEE_COLUMN_CHECK.D
2023-02-09_14-59-17_NC1.D
2023-01-23_WINE_TEST_GRAD_2.D
2023-01-23_COFFEE-TEST_2.D
2023-02-23_2021-DEBORTOLI-CABERNET-MERLOT_AVANTOR.D
2023-02-09_14-30-37_Z3.D
2023-02-22_KOERNER-NELLUCIO-02-21.D
2023-02-22_2021-DEBORTOLI-CABERNET-MERLOT_HALO.D
2022-08-02_CAFFEINE_STANDARD_50PPM.D
2023-02-08_16-05-13_Z3.D
2023-02-01_Z3_SECOND_RUN_OF_THE_DAY.D
2023-01-23_PHENOL_TEST.D
2023-01-23_WINE_TEST_GRAD_1.D
2023-01-23_COFFEE-TEST-1.D
2023-02-14_0052_TESTING_COLUMN_FOR_SAMPLE_DEG.D
2023-01-23_WINE_TEST_GRAD_5.D
2023-02-01_Z3_FOURTH_RUN_OF_THE_DAY.D
2023-02-01_Z3_THIRD_RUN_OF_THE_DAY.D
2023-02-07_14-10-47_Z3-NO-FORMIC-ACID-1.D
2022-08-01_CAFFEINE_MOCCONA CLASSIC INSTANT.D
2023-02-07_11-22-37_BLANK-4.D
2023-02-22_STONEY-RISE-PN_02-21.D
2023-02-22_2021-DEBORTOLI-CABERNET-MERLOT_AVANTOR.D
2023-02-01_Z3_FIRST_RUN_OF_DAY.D
2023-02-22_HEY-MALBEC_02-21.D
2023-02-22_LOR-RISTRETTO.D
2022-08-01_CAFFEINE_ISOCRATIC_LOR ESPRESSO RISTRETTO.D
2023-01-23_WINE_TEST_GRAD_4.D
2023-02-

Build it as a DF.


In [6]:
top_dir_d = {}

for obj in top_dir.iterdir():
    if obj.name.endswith(".D"):
        try:
            top_dir_d[obj.name] = rb.read(str(obj))
        except Exception as e:
            print(e)
            continue

In [7]:
def acq_method(data_directory):
    return data_directory.datafiles[0].metadata['method']

In [8]:
from datetime import datetime

# Agilent date strings look like '07-Feb-23, 18:31:29'.
date_format = "%d-%b-%y, %H:%M:%S"

# One list per future DataFrame column.
top_dir_d = {
    "name": [],
    "data": [],
    "num_detect_files": [],
    "method": [],
    "acquisition_date": [],
}

for obj in top_dir.iterdir():
    if obj.name.endswith(".D"):
        try:
            data = rb.read(str(obj))

            # Compute every value before appending, so a failure part-way
            # through cannot leave the lists with different lengths.
            name = "_".join(obj.name.split("_")[1:])  # drop the leading date prefix
            method = acq_method(data)
            acq_date = datetime.strptime(data.metadata['date'], date_format)
        except Exception as e:
            print(obj.name, e)
            continue

        top_dir_d["name"].append(name)
        top_dir_d["data"].append(data)
        top_dir_d["num_detect_files"].append(len(data.datafiles))
        top_dir_d["method"].append(method)
        top_dir_d["acquisition_date"].append(acq_date)
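
The format string used for `acquisition_date` can be checked in isolation against the date string seen in the metadata earlier. This is a small sketch: `parse_acquisition_date` and `AGILENT_DATE_FORMAT` are hypothetical names, and the format is assumed from that single example. Note that `%b` matches abbreviated month names in the current locale, so it expects English names such as `Feb` under the default locale.

```python
from datetime import datetime

# Format assumed from the metadata shown earlier: '07-Feb-23, 18:31:29'.
AGILENT_DATE_FORMAT = "%d-%b-%y, %H:%M:%S"


def parse_acquisition_date(date_string):
    """Parse an Agilent-style date string into a datetime."""
    return datetime.strptime(date_string, AGILENT_DATE_FORMAT)


parsed = parse_acquisition_date("07-Feb-23, 18:31:29")
print(parsed.isoformat())  # 2023-02-07T18:31:29
```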

In [9]:
import pandas as pd

df = pd.DataFrame(top_dir_d)

df = df.set_index('name')

df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 39 entries, COFFEE_COLUMN_CHECK.D to COFFEE_COLUMN_CHECK_2.D
Data columns (total 4 columns):
 #   Column            Non-Null Count  Dtype         
---  ------            --------------  -----         
 0   data              39 non-null     object        
 1   num_detect_files  39 non-null     int64         
 2   method            39 non-null     object        
 3   acquisition_date  39 non-null     datetime64[ns]
dtypes: datetime64[ns](1), int64(1), object(2)
memory usage: 1.5+ KB


In [99]:
# Put the columns in a more readable order.
df = df[['acquisition_date', 'method', 'num_detect_files', 'data']]

print(df.shape)

df = df.sort_values(by='acquisition_date', ascending=False)

df.head()

(39, 4)


| name | acquisition_date | method | num_detect_files | data |
| --- | --- | --- | --- | --- |
| 2021-DEBORTOLI-CABERNET-MERLOT_AVANTOR.D | 2023-02-23 12:22:35 | AVANTOR100X4_6C18-H2O-MEOH-2_1.M | 7 | 2023-02-23_2021-DEBORTOLI-CABERNET-MERLOT_AVAN... |
| LOR-RISTRETTO.D | 2023-02-23 11:26:27 | AVANTOR100X4_6C18-H2O-MEOH-2_1.M | 7 | 2023-02-23_LOR-RISTRETTO.D: DAD1D.ch - DAD1E.c... |
| 2021-DEBORTOLI-CABERNET-MERLOT_HALO.D | 2023-02-22 16:10:39 | HALO150X4_6C18-H2O-MEOH-2_1.M | 7 | 2023-02-22_2021-DEBORTOLI-CABERNET-MERLOT_HALO... |
| 2021-DEBORTOLI-CABERNET-MERLOT_AVANTOR.D | 2023-02-22 16:10:39 | AVANTOR100X4_6C18-H2O-MEOH-2_1.M | 7 | 2023-02-22_2021-DEBORTOLI-CABERNET-MERLOT_AVAN... |
| STONEY-RISE-PN_02-21.D | 2023-02-22 13:54:34 | AVANTOR100X4_6C18-H2O-MEOH-2_1.M | 7 | 2023-02-22_STONEY-RISE-PN_02-21.D: DAD1D.ch - ... |


DF is looking good. Now how about data access?

In [63]:
datadir = df.loc['STONEY-RISE-PN_02-21.D', 'data']

data_uv = datadir.get_file("DAD1.UV")

traces = data_uv.extract_traces()

traces.shape

(106, 7800)

Where is the time axis?

In [64]:
datadir

2023-02-22_STONEY-RISE-PN_02-21.D: DAD1D.ch - DAD1E.ch - DAD1A.ch - DAD1F.ch - DAD1B.ch - DAD1C.ch - DAD1.UV

In [98]:
#help(data_uv)

In [76]:
xlabeldf = pd.DataFrame(data_uv.xlabels)
xlabeldf.max()

0    51.998117
dtype: float64

So it looks like the time axis is stored in the `xlabels` attribute of the DataFile class.
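To see both axes at a glance, a helper along these lines could report their lengths and ranges. It is only a sketch: `axis_summary` is a hypothetical name, the `SimpleNamespace` stands in for a DataFile, and the values are made up. It assumes only that `xlabels` and `ylabels` are 1D sequences, as they appear to be here.

```python
from types import SimpleNamespace


def axis_summary(datafile):
    """Summarise the xlabels (time) and ylabels (wavelength) axes."""
    times = datafile.xlabels
    wavelengths = datafile.ylabels
    return {
        "n_times": len(times),
        "time_range": (float(min(times)), float(max(times))),
        "n_wavelengths": len(wavelengths),
        "wavelength_range": (float(min(wavelengths)), float(max(wavelengths))),
    }


# Toy stand-in with 5 time points and 3 wavelengths (made-up values).
fake_uv = SimpleNamespace(
    xlabels=[0.0, 0.5, 1.0, 1.5, 2.0],
    ylabels=[190, 192, 194],
)

summary = axis_summary(fake_uv)
print(summary)
```

For the run above, `n_wavelengths` and `n_times` would be expected to match the two dimensions of `traces.shape`, i.e. 106 and 7800.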

So we currently have a 2D array of detector readings plus two 1D vectors, time and wavelength, corresponding to its axes. First off, is it possible to load a 2D numpy array into pandas?

In [96]:
try:
    data = data_uv.extract_traces().transpose()
    
    print(data.shape)
    
    test_df = pd.DataFrame(data = data, index = data_uv.xlabels, columns = data_uv.ylabels)
    
except Exception as e:
    
    print(e)

(7800, 106)


In [97]:
test_df

|  | 190 | 192 | 194 | 196 | 198 | 200 | 202 | 204 | 206 | 208 | ... | 382 | 384 | 386 | 388 | 390 | 392 | 394 | 396 | 398 | 400 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0.004783 | 0.394896 | 0.223272 | 0.090152 | 0.006504 | -0.044793 | -0.090063 | -0.118077 | -0.141047 | -0.153080 | -0.139996 | ... | 0.000067 | -0.005752 | -0.007331 | -0.003703 | -0.000328 | -0.002198 | -0.003271 | -0.008263 | -0.009201 | -0.009894 |
| 0.011450 | 0.457026 | 0.265785 | 0.114553 | 0.018723 | -0.039019 | -0.087425 | -0.117160 | -0.140846 | -0.153266 | -0.139564 | ... | -0.001274 | -0.006631 | -0.007488 | -0.003837 | -0.001788 | -0.003688 | -0.004351 | -0.008978 | -0.009961 | -0.011019 |
| 0.018117 | 0.490852 | 0.289537 | 0.128336 | 0.026114 | -0.034921 | -0.084884 | -0.115931 | -0.140145 | -0.152536 | -0.138029 | ... | -0.001833 | -0.006482 | -0.006661 | -0.003450 | -0.003137 | -0.005223 | -0.004932 | -0.008598 | -0.009656 | -0.011221 |
| 0.024783 | 0.488669 | 0.288285 | 0.127658 | 0.026733 | -0.033543 | -0.082783 | -0.114381 | -0.138678 | -0.150457 | -0.135437 | ... | -0.001647 | -0.005439 | -0.005037 | -0.002272 | -0.003524 | -0.006020 | -0.004701 | -0.007376 | -0.008859 | -0.011079 |
| 0.031450 | 0.456013 | 0.265568 | 0.114225 | 0.021137 | -0.034891 | -0.081643 | -0.113003 | -0.136927 | -0.147834 | -0.132523 | ... | -0.000753 | -0.003785 | -0.003047 | -0.000432 | -0.002824 | -0.006080 | -0.003971 | -0.005849 | -0.008114 | -0.010729 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 51.971450 | -14.826342 | -10.891229 | -7.433586 | -4.637338 | -2.868906 | -2.000816 | -1.595862 | -1.351558 | -1.187801 | -1.037024 | ... | -0.151180 | -0.117801 | -0.109866 | -0.103481 | -0.098765 | -0.103727 | -0.109009 | -0.111237 | -0.107683 | -0.113040 |
| 51.978117 | -14.845476 | -10.904722 | -7.441744 | -4.641503 | -2.871282 | -2.002306 | -1.596056 | -1.351014 | -1.187369 | -1.037113 | ... | -0.151530 | -0.118129 | -0.109442 | -0.102550 | -0.099070 | -0.105642 | -0.111006 | -0.112548 | -0.108063 | -0.112727 |
| 51.984783 | -14.869943 | -10.922022 | -7.451832 | -4.646540 | -2.874672 | -2.004273 | -1.596279 | -1.350418 | -1.187079 | -1.037128 | ... | -0.151724 | -0.118345 | -0.108898 | -0.101291 | -0.099532 | -0.107653 | -0.112221 | -0.112802 | -0.107586 | -0.111960 |
| 51.991450 | -14.898919 | -10.942496 | -7.463597 | -4.652344 | -2.878189 | -2.005897 | -1.596346 | -1.349807 | -1.186974 | -1.037091 | ... | -0.151902 | -0.118434 | -0.108473 | -0.100605 | -0.100724 | -0.109516 | -0.112675 | -0.112318 | -0.106387 | -0.111103 |


Yes, done. Now that we've got some basic functionality, we should rebuild these steps as modules.
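As a first step towards that, the transpose-and-wrap logic could live in a single function. This is a minimal sketch under stated assumptions: `uv_to_dataframe` is a hypothetical name, the axis names `time` and `wavelength` are my own labels, and the function relies only on `extract_traces()`, `xlabels` and `ylabels` as used in this notebook. A `SimpleNamespace` with toy values stands in for the DataFile so the example runs without rainbow or a `.D` folder.

```python
from types import SimpleNamespace

import numpy as np
import pandas as pd


def uv_to_dataframe(datafile):
    """Return a DataFrame with time as the index and wavelengths as columns."""
    # extract_traces() gave shape (n_wavelengths, n_times) above, so transpose.
    traces = np.asarray(datafile.extract_traces())
    frame = pd.DataFrame(traces.T, index=datafile.xlabels, columns=datafile.ylabels)
    frame.index.name = "time"          # assumed label, not set by rainbow
    frame.columns.name = "wavelength"  # assumed label, not set by rainbow
    return frame


# Toy stand-in: 2 wavelengths x 3 time points (made-up values).
toy_traces = np.arange(6, dtype=float).reshape(2, 3)
fake_uv = SimpleNamespace(
    xlabels=np.array([0.1, 0.2, 0.3]),
    ylabels=np.array([190, 192]),
    extract_traces=lambda: toy_traces,
)

toy_df = uv_to_dataframe(fake_uv)
print(toy_df.shape)
print(toy_df[192].tolist())
```

With the real file this would be called as `uv_to_dataframe(data_uv)`, and a single-wavelength chromatogram could then be selected by column label.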