---
title: Correcting Offset
cdt: 2024-09-06T15:55:14
description: "Correct the time offset and express it in seconds for the 254 nm signal."
project: dataset_EDA
execution_order: '001'
---

# Summary

- Correcting offset
  - The vast majority of samples have a nonzero time offset, equal to the zeroth value of the time column ('mins'). It can be corrected by subtracting that zeroth element from 'mins' element-wise, per sample.


# Conclusion
 
'mins_corrected' and 'secs_corrected' were written to table 'dataset_eda.nm_254'.

The time offset is the zeroth value. We can correct for it by subtracting it element-wise from the 'mins' column.
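
As a minimal sketch of that correction, independent of the database: the toy values and the names `mins_by_id` and `correct_offset` below are hypothetical and only illustrate the per-sample subtraction and the conversion to seconds.

```python
# Toy data (hypothetical): sample id -> 'mins' values in acquisition order.
mins_by_id = {
    "a": [0.5, 0.75, 1.0],
    "b": [0.25, 0.5, 0.75],
}


def correct_offset(mins):
    """Subtract the zeroth element from every element of one sample."""
    zeroth = mins[0]
    return [m - zeroth for m in mins]


# Apply the correction sample by sample, then convert minutes to seconds.
mins_corrected = {sid: correct_offset(m) for sid, m in mins_by_id.items()}
secs_corrected = {sid: [m * 60 for m in ms] for sid, ms in mins_corrected.items()}

print(mins_corrected)
print(secs_corrected)
```

After the correction every sample starts at zero, whatever its original offset was.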

In [None]:
import duckdb as db
import polars as pl

db_path = "/Users/jonathan/mres_thesis/wine_analysis_hplc_uv/wines.db"

con = db.connect(db_path)


In [None]:
# con.sql("corcondia_2024-08-29.ipynb")  # not valid SQL: the string is a notebook filename, not a query


In [None]:
con.sql(
    """
    select
        id,
        min(mins) as zeroth_mins
    from
        dataset_eda.nm_254
    group by
        id
    order by
        id, zeroth_mins
    """
).pl().plot.scatter(x='id',y='zeroth_mins', title="zeroth 'mins' against 'id'")


Without going into it too deeply, the zeroth 'mins' value varies from sample to sample with no apparent pattern, so it is corrected per sample:


In [None]:
nm_254 = con.sql(
    """--sql
    SELECT
        
        mins - first(mins) OVER (PARTITION BY id ORDER BY id, idx) as mins_corrected,
        *
    FROM
        pbl.chromatogram_spectra_long
    WHERE
        wavelength = 254
    ORDER BY
        id,
        idx
    """
)

nm_254.pl()


In [None]:
nm_254 = con.sql(
    """--sql
    SELECT
        *,
        mins_corrected * 60 as secs_corrected
    FROM
        nm_254
    ORDER BY
        id,
        mins_corrected
    """

)

nm_254.pl()


To confirm that 'mins' has been corrected properly, check that the first value of each sample is zero and is also the sample minimum. Any rows returned are failures.

In [None]:
con.sql(
"""--sql
with min_mins AS (
    SELECT
        id,
        -- minimum over the whole sample (no ORDER BY, so the frame is the full partition)
        min(mins_corrected) OVER (PARTITION BY id) AS min_mins_corrected,
        -- first value in acquisition order
        first(mins_corrected) OVER (PARTITION BY id ORDER BY idx) AS min_first_corrected
    FROM
        nm_254
),
test_min_mins AS (
    SELECT
        *,
        CASE WHEN min_mins_corrected = min_first_corrected AND min_first_corrected = 0 THEN 'pass' ELSE 'fail' END AS test
    FROM
        min_mins

    )
SELECT DISTINCT
    *
FROM
    test_min_mins
WHERE
    test = 'fail'
"""
).pl()


The query returns no failing rows: in every sample, the first 'mins_corrected' value is the minimum, which in this case is zero.
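
The property being tested can be restated in plain Python. This is only a sketch on made-up values; `failing_samples` is a hypothetical helper, not part of the notebook's pipeline.

```python
def failing_samples(corrected_by_id):
    """Return ids whose first corrected value is not both zero and the minimum."""
    failures = []
    for sid, values in corrected_by_id.items():
        first = values[0]
        if first != 0 or first != min(values):
            failures.append(sid)
    return failures


# Hypothetical inputs: 'a' is correctly zeroed, 'b' still carries an offset,
# and 'c' starts at zero but dips below it later.
good = {"a": [0.0, 0.25, 0.5]}
bad = {"b": [0.25, 0.5, 0.75], "c": [0.0, -0.25, 0.5]}

print(failing_samples(good))
print(failing_samples({**good, **bad}))
```

An empty list plays the same role as an empty query result: no sample violates the condition.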

# Add corrected columns to nm_254

As the corrected 'mins' column passes the test above, we can safely add the 'mins_corrected' and 'secs_corrected' columns to 'nm_254'.
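
When the columns are written back, each row has to look up the zeroth 'mins' of its own sample, and minutes become seconds by multiplying by 60. The sketch below illustrates that pattern on a toy table. It assumes Python's built-in `sqlite3` as a stand-in for DuckDB, so the SQL dialect differs from the notebook's, and the table `nm_demo` and its values are hypothetical.

```python
import sqlite3

# In-memory stand-in database (assumption: SQLite instead of DuckDB).
con_demo = sqlite3.connect(":memory:")
con_demo.executescript(
    """
    CREATE TABLE nm_demo (id TEXT, idx INTEGER, mins REAL);
    INSERT INTO nm_demo VALUES
        ('a', 0, 0.5), ('a', 1, 0.75), ('a', 2, 1.0),
        ('b', 0, 0.25), ('b', 1, 0.5), ('b', 2, 0.75);
    ALTER TABLE nm_demo ADD COLUMN mins_corrected REAL;
    ALTER TABLE nm_demo ADD COLUMN secs_corrected REAL;

    -- Correlated subquery: each row looks up the zeroth 'mins' of its own id.
    UPDATE nm_demo
    SET mins_corrected = mins - (
        SELECT s.mins
        FROM nm_demo AS s
        WHERE s.id = nm_demo.id
        ORDER BY s.idx
        LIMIT 1
    );

    -- Minutes to seconds is a multiplication by 60.
    UPDATE nm_demo SET secs_corrected = mins_corrected * 60;
    """
)

rows = con_demo.execute(
    "SELECT id, idx, mins_corrected, secs_corrected FROM nm_demo ORDER BY id, idx"
).fetchall()
print(rows)
```

Without the `WHERE s.id = nm_demo.id` correlation, the subquery would not be tied to the row being updated, and every sample would be shifted by the same value.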


In [None]:
con.sql("""--sql
DESCRIBE dataset_eda.nm_254
""").pl()


In [None]:
display(con.sql("""--sql
--BEGIN TRANSACTION;
-- mins corrected
ALTER TABLE
    dataset_eda.nm_254
ADD COLUMN IF NOT EXISTS
    mins_corrected DOUBLE;
-- subtract each sample's zeroth 'mins' (the value at the lowest idx) from its rows
UPDATE dataset_eda.nm_254 AS t
SET mins_corrected = t.mins - z.zeroth_mins
FROM (
        SELECT
            id,
            arg_min(mins, idx) AS zeroth_mins
        FROM
            dataset_eda.nm_254
        GROUP BY
            id
    ) AS z
WHERE
    t.id = z.id;

-- secs corrected
ALTER TABLE
    dataset_eda.nm_254
ADD COLUMN IF NOT EXISTS
    secs_corrected DOUBLE;
-- minutes to seconds: multiply by 60
UPDATE
    dataset_eda.nm_254
SET secs_corrected = mins_corrected * 60;

-- observe results
SELECT
    *
FROM
    dataset_eda.nm_254
""").pl().head())



# con.sql("""--sql
#         ROLLBACK
# """)


Then verify within the database:

In [None]:
con.sql(
"""--sql
with min_mins AS (
    SELECT
        id,
        -- minimum over the whole sample (no ORDER BY, so the frame is the full partition)
        min(mins_corrected) OVER (PARTITION BY id) AS min_mins_corrected,
        -- first value in acquisition order
        first(mins_corrected) OVER (PARTITION BY id ORDER BY idx) AS min_first_corrected
    FROM
        dataset_eda.nm_254
),
test_min_mins AS (
    SELECT
        *,
        (CASE
            WHEN
                min_mins_corrected = min_first_corrected
                AND min_first_corrected = 0
            THEN
                'pass'
            ELSE
                'fail'
            END)
            AS test
    FROM
        min_mins

    )
SELECT DISTINCT
    *
FROM
    test_min_mins
WHERE
    test = 'fail'
"""
).pl()


The check still passes. Note that this notebook has to be executed after [dataset_EDA](/Users/jonathan/mres_thesis/pca_analysis/pca_analysis/experiments/notebooks/experiments/dataset_description_wavelength_time.ipynb).
