---
title: Samples Per Acquisition Method
cdt: 2024-09-06T12:23:02
description: "An observation of the number of samples, detection types, etc. across the acquisition methods"
project: "dataset_EDA"
---

## Prepare All Metadata

In [None]:
import duckdb as db
import polars as pl

db_path = "/Users/jonathan/mres_thesis/wine_analysis_hplc_uv/wines.db"

con = db.connect(db_path)


In [None]:
# List the columns of the three tables used in this notebook.
columns = con.sql(
    """
    SELECT
        table_name,
        column_name
    FROM
        information_schema.columns
    WHERE
        table_name IN (
            'c_chemstation_metadata',
            'chromatogram_spectra_long',
            'c_sample_tracker'
        )
    """
).pl()
columns.head()


In [None]:

# Print the sample tracker column names, ready to paste into a SELECT list.
tracker_cols = columns.filter(pl.col("table_name") == "c_sample_tracker")["column_name"]
for col in tracker_cols:
    print(f"{col},")


In [None]:
mta_st = con.sql(
    """
    SELECT
        mta.path,
        mta.ch_samplecode,
        mta.acq_date,
        mta.acq_method,
        mta.unit,
        mta.signal,
        mta.vendor,
        mta.inj_vol,
        mta.seq_name,
        mta.seq_desc,
        mta.id,
        mta.desc,
        mta.join_samplecode,
        st.detection,
        st.sampler,
        st.samplecode,
        st.vintage,
        st.name,
        st.open_date,
        st.sampled_date,
        st.added_to_cellartracker,
        st.notes,
        st.size,
        st.ct_wine_name
    FROM
        c_chemstation_metadata mta
    JOIN
        c_sample_tracker st
    ON
        mta.join_samplecode = st.samplecode
    ORDER BY
        mta.acq_date
    """
)
mta_st.pl().head()
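
Because the query above uses an inner join, any run whose `join_samplecode` has no match in the sample tracker is silently dropped from `mta_st`, and so are tracked samples that were never run. A minimal sketch of that check in plain Python follows. The sample codes are hypothetical placeholders, not values from `wines.db`.

```python
# Hypothetical sample codes for illustration only; not values from wines.db.
metadata_codes = ["a01", "a02", "a03", "z99"]  # stands in for mta.join_samplecode
tracker_codes = ["a01", "a02", "a03", "b07"]   # stands in for st.samplecode

# Runs with no sample tracker entry: an inner join drops these silently.
unmatched_runs = sorted(set(metadata_codes) - set(tracker_codes))

# Tracked samples with no acquisition: also absent from the joined table.
unacquired_samples = sorted(set(tracker_codes) - set(metadata_codes))

print("runs without a tracker entry:", unmatched_runs)
print("tracked samples without a run:", unacquired_samples)
```

Against the real database, the same comparison could be made with a `LEFT JOIN` filtered on `st.samplecode IS NULL`. If either list is non-empty, the counts in the following sections undercount the corresponding table.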


Now we can do some descriptive stats!


## Samples and Detection Type Per Method


In [None]:
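# min/max are used instead of first/last, whose results depend on row order.
# The final ORDER BY sits on the outer query so the output order is guaranteed.
con.sql(
    """
    SELECT
        dense_rank() OVER (ORDER BY first_acq_date ASC) AS rank,
        *
    FROM (
        SELECT
            min(acq_date) AS first_acq_date,
            max(acq_date) AS last_acq_date,
            acq_method,
            detection,
            count(id) AS n_samples
        FROM
            mta_st
        GROUP BY ALL
    )
    ORDER BY
        last_acq_date DESC
    """
).pl()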


As we can see, there are 5 distinct methods. All "avantor" methods were performed on a 100 x 4.6 mm C18 column with an H2O:MeOH mobile phase. Ordered by last date of use:

1. "avantor100x4_6c18-h2o-meoh-2_1.m": 30 samples.
2. "avantor100x4_6c18-h2o-meoh-2_5.m": 67 samples.
3. "avantor100x4_6c18-h2o-meoh-2_5_44-mins.m": 6 samples.
4. "halo150x4_6c18-h2o-meoh-2_1.m": 1 sample.
5. "0_cuprac_3_16_40-mins-4min100%hold.m": 71 samples.

Fortunately, all of the CUPRAC samples appear to have been run on a single method, so that dataset should be consistent. The raw dataset is more concerning, since the remaining samples are spread across the other four methods.
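
The consistency check can be sketched in plain Python from the counts in the list above. The `"raw"` label on the four non-CUPRAC methods is an assumption inferred from the method names; the `detection` column in the query output is the authority.

```python
from collections import defaultdict

# (acq_method, detection, n_samples), transcribed from the list above.
# ASSUMPTION: the four non-CUPRAC methods carry the detection label "raw";
# confirm against the `detection` column of the query output.
method_counts = [
    ("avantor100x4_6c18-h2o-meoh-2_1.m", "raw", 30),
    ("avantor100x4_6c18-h2o-meoh-2_5.m", "raw", 67),
    ("avantor100x4_6c18-h2o-meoh-2_5_44-mins.m", "raw", 6),
    ("halo150x4_6c18-h2o-meoh-2_1.m", "raw", 1),
    ("0_cuprac_3_16_40-mins-4min100%hold.m", "cuprac", 71),
]

methods_per_detection = defaultdict(set)
samples_per_detection = defaultdict(int)
for method, detection, n_samples in method_counts:
    methods_per_detection[detection].add(method)
    samples_per_detection[detection] += n_samples

# A detection type spread over more than one method needs a closer look.
for detection in sorted(methods_per_detection):
    n_methods = len(methods_per_detection[detection])
    print(f"{detection}: {samples_per_detection[detection]} samples, {n_methods} method(s)")
```

Under that assumption, any detection type reporting more than one method is a candidate for per-method comparison before its samples are pooled.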