# Exploring Rainbow API for use in Data class structures.

There are a few things I want to investigate:

- [x] Passing the `.UV` data directly to other Python objects without exporting to `.csv` first.
- [x] What run metadata is available.

Once these are clarified we can explore how best to integrate rainbow-api objects into my Data class.

In [None]:
import rainbow as rb

from pathlib import Path

p = "/Users/jonathan/0_jono_data/2023-02-07_18-30-07_Z3-ID-NM-ABS-MAX.D"

data = rb.read(str(p))

data

So a `DataDirectory` object contains:

- `DataDirectory.name` - name of the `.D` data directory.
- `DataDirectory.datafiles` - a list of all the data files.
- `DataDirectory.metadata` - a dict of metadata, including run date and time and vial position.
- `DataDirectory.get_info()` - outputs a text string with all the information and data. The method name can be read from there.
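
The attributes listed above can be gathered into a plain dict for a quick look. This is a minimal sketch: `summarise_directory` is a hypothetical helper, not part of rainbow, and it assumes each entry in `datafiles` exposes a `name` attribute. The stand-in objects built with `SimpleNamespace` only mimic the shape of a `DataDirectory` so the snippet runs without a `.D` directory on disk, and the metadata keys and values are illustrative.

```python
from types import SimpleNamespace


def summarise_directory(datadir):
    """Collect the headline attributes of a DataDirectory-like object.

    Hypothetical helper: relies only on the `name`, `datafiles` and
    `metadata` attributes, and assumes each data file has a `name`.
    """
    return {
        "name": datadir.name,
        "files": [f.name for f in datadir.datafiles],
        "metadata_keys": sorted(datadir.metadata),
    }


# Stand-in object mimicking the attribute layout (values are illustrative).
fake_dir = SimpleNamespace(
    name="example.D",
    datafiles=[SimpleNamespace(name="DAD1A.ch"), SimpleNamespace(name="DAD1.UV")],
    metadata={"date": "07-Feb-23, 18:30:07", "vialpos": "P1-A1"},
)

summary = summarise_directory(fake_dir)
print(summary)
```

In the notebook the same function could be called on the object returned by `rb.read(...)`, provided the attribute assumption holds.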

Regarding the method: based on the source code for `parse_uv()`, I expected to be able to access the method name. However, the following:

In [None]:
data.metadata

does not contain the method name. Maybe try accessing the metadata specific to a .ch or .uv file?

In [None]:
p = "/Users/jonathan/0_jono_data/2023-02-07_18-30-07_Z3-ID-NM-ABS-MAX.D"

data = rb.read(str(p))

data

In [None]:
data.get_file("DAD1A.ch").metadata

So the method names are contained in the metadata of the individual signals. That's fine. Rainbow essentially provides the class hierarchy we wanted, then.

It would be useful to produce a table of all data within a given top-level directory, then access those files with rainbow to run the desired queries and return the results as tables, i.e.

```
with 0_jono_data as dir:

data_table = table(dir)

print(data_table(sample name, acq time, method, signals contained, run time..))
```

So let's try to action that.

In [None]:
top_dir = Path("/Users/jonathan/0_jono_data")

for obj in top_dir.iterdir():
    if obj.name.endswith(".D"):
        print(obj.name)
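
The same filter can be written with `Path.glob`, which also makes it easy to check that each match really is a directory. A small sketch: `find_d_dirs` is a hypothetical helper name, and the temporary directory exists only so the example runs anywhere.

```python
import tempfile
from pathlib import Path


def find_d_dirs(top_dir):
    """Return the sorted names of the .D directories directly under top_dir."""
    return sorted(p.name for p in Path(top_dir).glob("*.D") if p.is_dir())


# Build a throwaway layout: two run directories and one unrelated file.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "run_a.D").mkdir()
    (root / "run_b.D").mkdir()
    (root / "notes.txt").write_text("not a run directory")
    found = find_d_dirs(root)

print(found)
```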

Build it as a DF.


In [None]:
top_dir_d = {}

for obj in top_dir.iterdir():
    if obj.name.endswith(".D"):
        try:
            top_dir_d[obj.name] = rb.read(str(obj))
        except Exception as e:
            print(e)
            continue

In [None]:
def acq_method(data_directory):
    return data_directory.datafiles[0].metadata["method"]
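
`acq_method` reads the first data file only and raises if that file has no `"method"` key. A more defensive variant, sketched here under the assumption that any file in the directory may carry the key, returns the first method name it finds. `first_method` and the stand-in objects are hypothetical.

```python
from types import SimpleNamespace


def first_method(data_directory, default=None):
    """Return the first "method" value found in any data file's metadata."""
    for datafile in data_directory.datafiles:
        method = datafile.metadata.get("method")
        if method is not None:
            return method
    return default


# Stand-ins: the first file lacks the key, the second one has it.
fake_dir = SimpleNamespace(
    datafiles=[
        SimpleNamespace(metadata={}),
        SimpleNamespace(metadata={"method": "EXAMPLE.M"}),
    ]
)
empty_dir = SimpleNamespace(datafiles=[])

print(first_method(fake_dir))
print(first_method(empty_dir, default="unknown"))
```

Returning a default rather than raising keeps one odd directory from dropping out of the table.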

In [None]:
from datetime import datetime

# Column-oriented dict: one list per future DataFrame column.
top_dir_d = {
    "name": [],
    "data": [],
    "num_detect_files": [],
    "method": [],
    "acquisition_date": [],
}

for obj in top_dir.iterdir():
    if obj.name.endswith(".D"):
        try:
            data = rb.read(str(obj))

            top_dir_d["name"].append("_".join(obj.name.split("_")[1:]))
            top_dir_d["data"].append(data)
            top_dir_d["num_detect_files"].append(len(data.datafiles))
            top_dir_d["method"].append(acq_method(data))
            top_dir_d["acquisition_date"].append(
                datetime.strptime(data.metadata["date"], "%d-%b-%y, %H:%M:%S")
            )

        except Exception as e:
            print(obj.name, e)

            continue
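
The two string transformations in the loop are easier to check in isolation. This sketch applies them to the directory name used earlier and to a date string in the format the `strptime` pattern expects. The date value itself is an assumed example, not read from a file.

```python
from datetime import datetime

DATE_FORMAT = "%d-%b-%y, %H:%M:%S"  # same pattern as in the loop above

dir_name = "2023-02-07_18-30-07_Z3-ID-NM-ABS-MAX.D"
# Dropping the first underscore-separated field removes only the date prefix.
short_name = "_".join(dir_name.split("_")[1:])

raw_date = "07-Feb-23, 18:30:07"  # assumed example value
# %b is locale-dependent; this assumes English month abbreviations.
acquired = datetime.strptime(raw_date, DATE_FORMAT)

print(short_name)
print(acquired.isoformat())
```

For names of this shape the time-of-day field survives the split, so a further split would be needed if only the sample name is wanted.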

In [None]:
import pandas as pd

# Build the frame from the column dict and index it by run name.
df = pd.DataFrame(top_dir_d).set_index("name")

df.info()

In [None]:
# Reorder the columns, then sort with the newest run first.
df = df[["acquisition_date", "method", "num_detect_files", "data"]]

df = df.sort_values(by="acquisition_date", ascending=False)

df.head()

DF is looking good. Now how about data access?

In [None]:
datadir = df.loc["STONEY-RISE-PN_02-21.D", "data"]

data_uv = datadir.get_file("DAD1.UV")

traces = data_uv.extract_traces()


traces.shape

Where is the time axis?

In [None]:
datadir

In [None]:
# help(data_uv)

In [None]:
xlabeldf = pd.DataFrame(data_uv.xlabels)
xlabeldf.max()

So it looks like the time axis is stored in the `xlabels` attribute of the `DataFile` class.

So we currently have a 2D array of detector readings and two 1D vectors, time and wavelength, corresponding to its axes. First off, is it possible to load a 2D NumPy array into pandas?

In [None]:
try:
    data = data_uv.extract_traces().transpose()

    print(data.shape)

    test_df = pd.DataFrame(data=data, index=data_uv.xlabels, columns=data_uv.ylabels)

except Exception as e:
    print(e)

In [None]:
test_df

Yes, done. Now that we've got some basic functionality, we should rebuild these as modules.
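
As a possible starting point for such a module, here is a minimal sketch of the array-to-DataFrame conversion as a function. It assumes the orientation seen in the cell above, where the extracted traces are transposed so that time lands on the index. `traces_to_frame` is a hypothetical name, and the arrays are synthetic stand-ins for `xlabels`, `ylabels` and the trace data.

```python
import numpy as np
import pandas as pd


def traces_to_frame(traces, times, wavelengths):
    """Build a time-indexed DataFrame with one column per wavelength.

    `traces` is expected with shape (n_wavelengths, n_times).
    """
    traces = np.asarray(traces)
    if traces.shape != (len(wavelengths), len(times)):
        raise ValueError(f"unexpected trace shape {traces.shape}")
    return pd.DataFrame(
        traces.T,
        index=pd.Index(times, name="time"),
        columns=pd.Index(wavelengths, name="wavelength"),
    )


# Synthetic stand-in data: 3 wavelengths sampled at 4 time points.
times = np.array([0.0, 0.5, 1.0, 1.5])
wavelengths = np.array([210, 254, 280])
traces = np.arange(12).reshape(3, 4)

frame = traces_to_frame(traces, times, wavelengths)
print(frame[254])  # single-wavelength chromatogram
```

The explicit shape check is a design choice: it fails loudly if the traces arrive in the other orientation instead of silently mislabelling the axes.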