Continuing from [Identifying Optimal Wavelength](2023-03-14_identifying_optimal_wavelength.ipynb), I will use the methods developed there to aggregate the results for all 2.5% avantor runs so far. Presumably the same approach should work for runs acquired with any method.

## Set up Environment

In [7]:
%load_ext autoreload
%autoreload 2

import os
import sys

import numpy as np
import pandas as pd
import plotly.graph_objs as go
from plotly.subplots import make_subplots
from pybaselines import Baseline
from scipy.signal import find_peaks
from sklearn.preprocessing import MinMaxScaler

pd.options.plotting.backend = 'plotly'

# adds root dir 'wine_analyis_hplc_uv' to path.
sys.path.insert(0, os.path.abspath(os.path.join(os.getcwd(), '../')))

from agilette import agilette_core as ag

lib = ag.Agilette('/Users/jonathan/0_jono_data').library

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload
/Users/jonathan/0_jono_data/2023-02-22_2021-DEBORTOLI-CABERNET-MERLOT_HALO.D/acq.macaml does not exist, cannot load signal metadata from acq.macaml
/Users/jonathan/0_jono_data/2023-01-23_WINE_TEST_GRAD_4.D/acq.macaml does not exist, cannot load signal metadata from acq.macaml
/Users/jonathan/0_jono_data/2022-08-01_CAFFEINE_STANDARD_100PPM.D/acq.macaml does not exist, cannot load signal metadata from acq.macaml


In [8]:
lib_df = lib.data_table()

As in the preceding notebook, I will use the latest De Bortoli Cab Merlot sample `2023-03-07_DEBERTOLI_CS_001.D`.

## Planning the Experiment

The plan is to keep everything inside a DataFrame.

1. Form a DataFrame with the columns: run name | uv_data object.
2. For each run: scale, baseline-correct, calculate the average baseline gradient and the peak heights, then take the ratio of the two.
3. Plot the maxima of the above values for each run. Probably drop wavelengths above 380 nm.
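To make step 2 concrete, here is a minimal sketch on synthetic data. Several things in it are assumptions rather than the notebook's actual implementation: the function name `summarise_run`, the layout of the input (time as the index, one column per wavelength), a straight line between the first and last points as a stand-in for a proper baseline fit from `pybaselines`, and the maximum of the corrected signal as a proxy for peak height.

```python
import numpy as np
import pandas as pd


def summarise_run(uv: pd.DataFrame) -> pd.DataFrame:
    """Per-wavelength peak height, baseline gradient, and their ratio.

    `uv` is assumed to have time as its index and one column per wavelength.
    """
    # Min-max scale each wavelength column to the range [0, 1].
    span = (uv.max() - uv.min()).replace(0, np.nan)
    scaled = (uv - uv.min()) / span

    # Stand-in baseline: a straight line joining the first and last points.
    t = uv.index.to_numpy(dtype=float)
    frac = (t - t[0]) / (t[-1] - t[0])
    first = scaled.iloc[0].to_numpy()
    last = scaled.iloc[-1].to_numpy()
    baseline = first + np.outer(frac, last - first)
    corrected = scaled - baseline

    # Mean absolute gradient of the baseline with respect to time.
    grad = np.abs(np.gradient(baseline, t, axis=0)).mean(axis=0)

    summary = pd.DataFrame(
        {
            "peak_height": corrected.max(),
            "baseline_gradient": pd.Series(grad, index=uv.columns),
        }
    )
    summary["ratio"] = summary["peak_height"] / summary["baseline_gradient"]
    return summary


# Synthetic run: one Gaussian peak on a drifting baseline at two wavelengths.
t = np.linspace(0.0, 10.0, 201)
peak = np.exp(-((t - 5.0) ** 2) / 0.1)
uv = pd.DataFrame({210: 0.02 * t + peak, 254: 0.10 * t + 0.5 * peak}, index=t)

summary = summarise_run(uv)
best_wavelength = summary["ratio"].idxmax()
print(summary)
print("best wavelength:", best_wavelength)
```

With real data, the same function could be applied to each run's UV table and the per-run maxima of `ratio` collected for plotting in step 3.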

## Filtering Runs

In [31]:
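# Match the substrings literally; as regexes, "2_1*" and "uracil*" would match more than intended.
runs = lib_df[lib_df['method'].str.contains("2_1", regex=False) & ~lib_df['sample_name'].str.contains("uracil", regex=False)]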
runs.head()

Unnamed: 0,acq_date,sample_name,run_name,path,sequence,ch_files,uv_files,method,desc
3,2023-03-07 13:08:39,debertoli_cs,2023-03-07_DEBERTOLI_CS_001.D,/Users/jonathan/0_jono_data/2023-03-07_DEBERTO...,single run,"[DAD1D.ch, DAD1E.ch, DAD1A.ch, DAD1F.ch, DAD1B...",[DAD1.UV],AVANTOR100X4_6C18-H2O-MEOH-2_1.M,avantor-150-x-4.6-C18-H2O-MEOH-2.1%--gradient
10,2023-02-23 12:21:12,2021-debortoli-cabernet-merlot_avantor,2023-02-23_2021-DEBORTOLI-CABERNET-MERLOT_AVAN...,/Users/jonathan/0_jono_data/2023-02-23_2021-DE...,single run,"[DAD1D.ch, DAD1E.ch, DAD1A.ch, DAD1F.ch, DAD1B...",[DAD1.UV],AVANTOR100X4_6C18-H2O-MEOH-2_1.M,avantor-150-x-4.6-C18-H2O-MEOH-2.1%--gradient
11,2023-02-23 11:25:03,lor-ristretto,2023-02-23_LOR-RISTRETTO.D,/Users/jonathan/0_jono_data/2023-02-23_LOR-RIS...,single run,"[DAD1D.ch, DAD1E.ch, DAD1A.ch, DAD1F.ch, DAD1B...",[DAD1.UV],AVANTOR100X4_6C18-H2O-MEOH-2_1.M,avantor-150-x-4.6-C18-H2O-MEOH-2.1%--gradientp...
12,2023-02-22 17:39:06,2021-debortoli-cabernet-merlot_avantor,2023-02-22_2021-DEBORTOLI-CABERNET-MERLOT_AVAN...,/Users/jonathan/0_jono_data/2023-02-22_2021-DE...,single run,"[DAD1D.ch, DAD1E.ch, DAD1A.ch, DAD1F.ch, DAD1B...",[DAD1.UV],AVANTOR100X4_6C18-H2O-MEOH-2_1.M,halo-150-x-4.6-C18-H2O-MEOH-2.1%--gradient
13,2023-02-22 16:09:15,2021-debortoli-cabernet-merlot_halo,2023-02-22_2021-DEBORTOLI-CABERNET-MERLOT_HALO.D,/Users/jonathan/0_jono_data/2023-02-22_2021-DE...,single run,"[DAD1D.ch, DAD1E.ch, DAD1A.ch, DAD1F.ch, DAD1B...",[DAD1.UV],HALO150X4_6C18-H2O-MEOH-2_1.M,halo-150-x-4.6-C18-H2O-MEOH-2.1%--gradient
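A note on the filter: `Series.str.contains` treats its pattern as a regular expression by default, so a trailing `*` is a quantifier on the preceding character, not a shell-style wildcard. The sketch below uses made-up method names (not ones from the library) to show the difference between a regex match and a literal substring match.

```python
import pandas as pd

# Hypothetical method names, for illustration only.
methods = pd.Series(["COLUMN-A-2_1.M", "COLUMN-B-2_5.M", "COLUMN-C-5_0.M"])

# As a regex, "2_1*" means "2_" followed by zero or more "1" characters,
# so it also matches "2_5".
as_regex = methods.str.contains("2_1*")

# With regex=False the pattern is matched as a literal substring.
as_literal = methods.str.contains("2_1", regex=False)

print(as_regex.tolist())    # [True, True, False]
print(as_literal.tolist())  # [True, False, False]
```

If the sample names vary in capitalisation, `case=False` can be passed as well.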


## Prepare the Data

### Assemble the runs_uv_data DF.

In [32]:
all_data = lib.all_data()

for name in runs['run_name']:
    print(name)
    data_dir = all_data[name]
    spectrum = data_dir.load_spectrum()
    data = spectrum.data
    print(data.head())


/Users/jonathan/0_jono_data/2023-02-22_2021-DEBORTOLI-CABERNET-MERLOT_HALO.D/acq.macaml does not exist, cannot load signal metadata from acq.macaml
/Users/jonathan/0_jono_data/2023-01-23_WINE_TEST_GRAD_4.D/acq.macaml does not exist, cannot load signal metadata from acq.macaml
/Users/jonathan/0_jono_data/2022-08-01_CAFFEINE_STANDARD_100PPM.D/acq.macaml does not exist, cannot load signal metadata from acq.macaml
2023-03-07_DEBERTOLI_CS_001.D
2023-03-07_DEBERTOLI_CS_001.D
/Users/jonathan/0_jono_data/2023-03-07_DEBERTOLI_CS_001.D/sample.acaml
/Users/jonathan/0_jono_data/2023-03-07_DEBERTOLI_CS_001.D/ACQRES.REG
/Users/jonathan/0_jono_data/2023-03-07_DEBERTOLI_CS_001.D/acq.txt
/Users/jonathan/0_jono_data/2023-03-07_DEBERTOLI_CS_001.D/LCDIAG.REG
/Users/jonathan/0_jono_data/2023-03-07_DEBERTOLI_CS_001.D/RUN.LOG
/Users/jonathan/0_jono_data/2023-03-07_DEBERTOLI_CS_001.D/DAD1D.ch
/Users/jonathan/0_jono_data/2023-03-07_DEBERTOLI_CS_001.D/SINGLE.B
/Users/jonathan/0_jono_data/2023-03-07_DEBERTOLI_CS

AttributeError: 'NoneType' object has no attribute 'head'
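The traceback shows that `spectrum.data` was `None` when `.head()` was called, so the loop stopped at the first run. A minimal sketch of a more defensive loop follows. The objects here are stand-ins built with `SimpleNamespace`; the only thing assumed about the real spectrum objects is that they expose a `.data` attribute that may be `None`.

```python
from types import SimpleNamespace

import pandas as pd

# Stand-ins for loaded spectrum objects; only a `.data` attribute is assumed.
spectra = {
    "run_a.D": SimpleNamespace(
        data=pd.DataFrame({"mins": [0.0, 0.1], "254": [1.0, 2.0]})
    ),
    "run_b.D": SimpleNamespace(data=None),  # mimics the failing case
}

loaded, skipped = {}, []
for name, spectrum in spectra.items():
    data = spectrum.data
    if data is None:
        # Record the run and move on instead of raising AttributeError.
        skipped.append(name)
        continue
    loaded[name] = data

print("loaded:", list(loaded))
print("skipped:", skipped)
```

Keeping a list of skipped runs makes it easy to check afterwards which data directories need a closer look.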

In [21]:
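def uv_data_extractor(run_name):
    # Look up the data directory object for a single run name in all_data.
    data_dir = all_data[run_name]
    # Spectrum loading is disabled for now, so only the directory object is returned.
    #spectrum = data_dir.load_spectrum()

    return data_dir

uv_data_series = runs['run_name'].apply(uv_data_extractor)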

2023-03-07_DEBORTOLI-URACIL_001.D
2023-03-07_DEBERTOLI_CS_001.D
2023-02-23_2021-DEBORTOLI-CABERNET-MERLOT_AVANTOR.D
2023-02-23_LOR-RISTRETTO.D
2023-02-22_2021-DEBORTOLI-CABERNET-MERLOT_AVANTOR.D
2023-02-22_2021-DEBORTOLI-CABERNET-MERLOT_HALO.D
2023-02-22_STONEY-RISE-PN_02-21.D
2023-02-22_CRAWFORD-CAB_02-21.D
2023-02-22_HEY-MALBEC_02-21.D
2023-02-22_KOERNER-NELLUCIO-02-21.D
2023-02-22_LOR-RISTRETTO.D
006-0601.D
005-0501.D
004-0401.D
003-0301.D
002-0201.D
001-0101.D
2023-02-16_0232.D
2023-02-16_0291.D
017-1601.D
016-1501.D
015-1401.D
014-1301.D
013-1201.D
012-1101.D
011-1001.D
009-0901.D
008-0801.D
007-0701.D
006-0601.D
005-0501.D
004-0401.D
003-0301.D
002-0201.D
001-0101.D
2023-02-15_COFFEE_COLUMN_CHECK_2.D
2023-02-15_COFFEE_COLUMN_CHECK.D
2023-02-14_0052_TESTING_COLUMN_FOR_SAMPLE_DEG.D
2023-02-08_16-05-13_Z3.D
030-1701.D
029-1601.D
028-1501.D
027-1401.D
026-1301.D
020-1201.D
019-1101.D
018-1001.D
017-0901.D
016-0801.D
015-0701.D
010-0601.D
009-0501.D
008-0401.D
007-0301.D
006-0201.D
00

In [20]:
uv_data_series.__dict__

{'_is_copy': None,
 '_mgr': SingleBlockManager
 Items: Int64Index([ 0,  3, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24,
             25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41,
             42, 43, 44, 45, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61,
             62, 63, 64, 65, 66, 67, 68, 69, 70],
            dtype='int64')
 ObjectBlock: 60 dtype: object,
 '_item_cache': {},
 '_attrs': {},
 '_flags': <Flags(allows_duplicate_labels=True)>,
 '_name': 'run_name',
 'str': <pandas.core.strings.accessor.StringMethods at 0x2848442d0>}

In [None]:
runs_uv = runs

runs_uv['uv_data'] = uv_data_series

runs_uv.drop(['uv_files', 'sequence', 'ch_files', 'sample_name', 'desc'], inplace = True, axis = 1)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  runs_uv['uv_data'] = uv_data_series
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  runs_uv.drop(['uv_files', 'sequence', 'ch_files', 'sample_name', 'desc'], inplace = True, axis = 1)
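The warnings above appear because `runs` is a filtered slice of `lib_df`, and `runs_uv = runs` only creates a second name for that same slice rather than a new DataFrame. One common way to avoid them is to take an explicit `.copy()` and to use a non-inplace `drop`. The sketch below uses a small made-up table and placeholder objects in place of the real UV data.

```python
import pandas as pd

# Hypothetical library table with a subset of the columns seen above.
lib_df = pd.DataFrame(
    {
        "run_name": ["a.D", "b.D", "c.D"],
        "method": ["X-2_1.M", "X-2_1.M", "Y-5_0.M"],
        "desc": ["d1", "d2", "d3"],
    }
)

# .copy() makes the filtered frame independent of lib_df.
runs_uv = lib_df[lib_df["method"].str.contains("2_1", regex=False)].copy()

# Placeholder objects standing in for the per-run UV data.
runs_uv["uv_data"] = runs_uv["run_name"].map(lambda name: {"run": name})

# A non-inplace drop returns a new frame and leaves lib_df untouched.
runs_uv = runs_uv.drop(columns=["desc"])

print(runs_uv.columns.tolist())
```

Because `runs_uv` owns its data, the column assignment and the drop no longer touch `lib_df`.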


In [None]:
# Positional access: the filtered index does not necessarily contain the label 0.
runs_uv['uv_data'].iloc[0]

NameError: name 'runs_uv' is not defined

### Scale

### Baseline Correct