---
cdt: 2024-09-10T16:44:02
title: "Creating Join Tables"
description: "Contains a discussion of, and the code required to create, join tables for 'clean.chm' to 'clean.st' and 'clean.st' to 'clean.ct'."
project: dataset_EDA
---


In [3]:
%reload_ext autoreload
%autoreload 2
import duckdb as db
import polars as pl
from pca_analysis.experiments.constants import db_path
from great_tables import GT
pl.Config.set_tbl_rows(999).set_tbl_width_chars(2000).set_fmt_str_lengths(99999)

con = db.connect(db_path)


We have a lot of potentially useful metadata, so it is efficient to store it in context-specific tables. A centralised join table containing the primary keys of each individual sample will tie those tables together. It will hold the 'id', the Chemstation metadata key, the sample tracker key, and the cellar tracker key.

This will probably require creating another set of keys. Prefixing the primary key columns with 'pk_' will make it clear which table each key belongs to.

The 'id' column identifies every individual sample, so that is where we start.

The flow is complicated because we are essentially extracting a number of columns from the tables created by the 'build_library' pipeline into their own tables.
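
A minimal sketch of what such a central join table could look like. It uses the standard library's `sqlite3` with toy in-memory tables as a stand-in for the project's DuckDB database, so it runs on its own; the SQL is plain and should carry over to `con.sql(...)`. The table name `join_all`, the `pk_ct` and `wine_name` columns, and every row are assumptions for illustration, not the project's schema.

```python
import sqlite3

# Toy in-memory database standing in for the project's DuckDB file.
con = sqlite3.connect(":memory:")
con.executescript(
    """
    CREATE TABLE chm (pk INTEGER PRIMARY KEY, id TEXT, st_samplecode TEXT);
    CREATE TABLE st (pk INTEGER PRIMARY KEY, samplecode TEXT, ct_wine_name TEXT);
    CREATE TABLE ct (pk INTEGER PRIMARY KEY, wine_name TEXT);  -- hypothetical columns

    INSERT INTO chm VALUES (1, 'id-a', '05'), (2, 'id-b', '06'), (3, 'id-c', '99');
    INSERT INTO st VALUES (10, '05', 'wine a'), (11, '06', 'wine b');
    INSERT INTO ct VALUES (20, 'wine a');

    -- One row per sample 'id'; LEFT JOINs keep samples that lack a match
    -- in the sample tracker or cellar tracker (their keys become NULL).
    CREATE TABLE join_all AS
    SELECT
        chm.id AS id,
        chm.pk AS pk_mta,
        st.pk AS pk_st,
        ct.pk AS pk_ct
    FROM chm
    LEFT JOIN st ON chm.st_samplecode = st.samplecode
    LEFT JOIN ct ON st.ct_wine_name = ct.wine_name;
    """
)

rows = con.execute("SELECT * FROM join_all ORDER BY pk_mta").fetchall()
for row in rows:
    print(row)
```

Starting from the table that holds 'id' and using LEFT JOINs means a sample never disappears from the join table just because one of the trackers has no entry for it.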

## Creation of Join Table CHM to ST

'join_samplecode' connects 'c_chemstation_metadata' and 'c_sample_tracker'. Note: 'join_samplecode' was added manually, so that connection is fragile without the code that added it. That code lives somewhere in the 'wine_analysis_hplc_uv' project and will be added here at a later date. In the meantime, you are warned.

Update: it is stored in `wine_analysis_hplc_uv.etl.build_library.chemstation.ch_m_cleaner` and simply consists of some value replacement and formatting, so it shouldn't be difficult to implement here. 'ch_samplecode' is the original samplecode as entered in the Chemstation data files; 'join_samplecode' is the cleaned version of that samplecode.

In [4]:
con.sql(
"""--sql
from clean.chm LIMIT 1
"""
).pl().columns


['pk',
 'id',
 'path',
 'acq_date',
 'acq_method',
 'unit',
 'signal',
 'vendor',
 'inj_vol',
 'seq_name',
 'seq_desc',
 'vialnum',
 'originalfilepath',
 'st_samplecode',
 'desc']

In [8]:
join_tbl = con.sql(
"""--sql
CREATE schema IF NOT EXISTS joins;
CREATE OR REPLACE TABLE joins.chm_st AS (
SELECT
    chm.pk as pk_mta,
    st.pk as pk_st,
FROM
    clean.chm as chm
JOIN
    clean.st as st
ON
    chm.st_samplecode = st.samplecode
);
SELECT * FROM joins.chm_st
"""
).pl()

join_tbl


pk_mta,pk_st
i32,i32
164,6
65,7
110,8
39,9
126,10
32,11
24,12
40,16
131,17
67,18
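
Because the samplecode link is fragile, it may be worth checking which rows the inner join silently drops or duplicates. The sketch below shows one way to do that, again using `sqlite3` and toy tables as a stand-in for the DuckDB tables; the rows are made up, and only the column names `pk`, `st_samplecode` and `samplecode` mirror the ones used above.

```python
import sqlite3

# Toy in-memory tables standing in for clean.chm and clean.st.
con = sqlite3.connect(":memory:")
con.executescript(
    """
    CREATE TABLE chm (pk INTEGER PRIMARY KEY, st_samplecode TEXT);
    CREATE TABLE st (pk INTEGER PRIMARY KEY, samplecode TEXT);
    INSERT INTO chm VALUES (1, '05'), (2, '06'), (3, 'coffee');
    INSERT INTO st VALUES (10, '05'), (11, '06'), (12, '06');
    """
)

# Anti-join: chm rows whose samplecode has no partner in st.
unmatched = con.execute(
    """
    SELECT chm.pk, chm.st_samplecode
    FROM chm
    LEFT JOIN st ON chm.st_samplecode = st.samplecode
    WHERE st.pk IS NULL
    ORDER BY chm.pk
    """
).fetchall()

# chm rows that match more than one st row, which would be
# duplicated in the join table.
ambiguous = con.execute(
    """
    SELECT chm.pk, COUNT(*) AS n_matches
    FROM chm
    JOIN st ON chm.st_samplecode = st.samplecode
    GROUP BY chm.pk
    HAVING COUNT(*) > 1
    ORDER BY chm.pk
    """
).fetchall()

print("unmatched:", unmatched)
print("ambiguous:", ambiguous)
```

If both lists come back empty on the real tables, every Chemstation run maps to exactly one sample tracker row.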


With the join table in place, joining the Chemstation metadata (mta) to the sample tracker (st) becomes:

In [6]:
con.sql(
"""--sql
SELECT
    * EXCLUDE pk_st
FROM
    clean.st AS st
JOIN
    joins.chm_st AS jtbl
ON
    jtbl.pk_st = st.pk
JOIN
    c_chemstation_metadata as mta
ON
    jtbl.pk_mta = mta.pk
"""
).pl().head()


detection,sampler,samplecode,vintage,name,open_date,sampled_date,added_to_cellartracker,notes,size,ct_wine_name,pk,pk_mta,path,ch_samplecode,acq_date,acq_method,unit,signal,vendor,inj_vol,seq_name,seq_desc,vialnum,originalfilepath,id,desc,join_samplecode,pk_1
str,str,str,str,str,str,str,str,str,str,str,i32,i32,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,i32
"""raw""","""jonathan""","""05""","""2022""","""william downie 'cathedral' pinot noir""","""2023-02-04""",,"""y""",,"""750""","""2022 william downie cathedral""",5,55,"""/users/jonathan/uni/0_jono_data/mres_data_library/raw_uv/005.d""","""0051""","""2023-02-15 15:21:28""","""avantor100x4_6c18-h2o-meoh-2_1.m""","""mau""","""dad1i, dad: spectrum""","""agilent""","""10.00""","""2023-02-15_wines_2023-02-15_15-19-53""",,"""vial 1""","""c:\chem32\1\data\0_jono_data\2023-02-15_wines_2023-02-15_15-19-53""","""4fe49506-74e4-473b-b7a8-23500c472189""",,"""05""",55
"""raw""","""jonathan""","""06""","""2021""","""babo chianti""","""2023-02-04""",,"""y""",,"""750""","""2021 babo chianti""",6,164,"""/users/jonathan/uni/0_jono_data/mres_data_library/raw_uv/006.d""","""0061""","""2023-02-15 16:15:09""","""avantor100x4_6c18-h2o-meoh-2_1.m""","""mau""","""dad1i, dad: spectrum""","""agilent""","""10.00""","""2023-02-15_wines_2023-02-15_15-19-53""",,"""vial 2""","""c:\chem32\1\data\0_jono_data\2023-02-15_wines_2023-02-15_15-19-53""","""e56c4dcd-2847-4d34-b457-743be10b0608""",,"""06""",164
"""raw""","""jonathan""","""07""","""2020""","""uva non grata gamay""","""2023-02-04""",,"""y""",,"""750""","""2020 boutinot uva non grata""",7,65,"""/users/jonathan/uni/0_jono_data/mres_data_library/raw_uv/007.d""","""0071""","""2023-02-15 17:08:54""","""avantor100x4_6c18-h2o-meoh-2_1.m""","""mau""","""dad1i, dad: spectrum""","""agilent""","""10.00""","""2023-02-15_wines_2023-02-15_15-19-53""",,"""vial 3""","""c:\chem32\1\data\0_jono_data\2023-02-15_wines_2023-02-15_15-19-53""","""5eb3135c-33a2-404b-8042-e23cae33bccf""",,"""07""",65
"""raw""","""jonathan""","""08""","""2021""","""hey malbec""","""2023-02-04""",,"""y""",,"""750""","""2021 matias riccitelli malbec hey malbec!""",8,110,"""/users/jonathan/uni/0_jono_data/mres_data_library/raw_uv/008.d""","""0081""","""2023-02-15 18:02:36""","""avantor100x4_6c18-h2o-meoh-2_1.m""","""mau""","""dad1i, dad: spectrum""","""agilent""","""10.00""","""2023-02-15_wines_2023-02-15_15-19-53""",,"""vial 4""","""c:\chem32\1\data\0_jono_data\2023-02-15_wines_2023-02-15_15-19-53""","""8cfa23c8-ffa6-4c27-be70-0251d4de1681""",,"""08""",110
"""raw""","""jonathan""","""09""","""2018""","""crawford river cabernets""","""2023-02-01""",,"""y""",,"""750""","""2018 crawford river cabernets""",9,39,"""/users/jonathan/uni/0_jono_data/mres_data_library/raw_uv/009.d""","""0091""","""2023-02-15 18:56:22""","""avantor100x4_6c18-h2o-meoh-2_1.m""","""mau""","""dad1i, dad: spectrum""","""agilent""","""10.00""","""2023-02-15_wines_2023-02-15_15-19-53""",,"""vial 5""","""c:\chem32\1\data\0_jono_data\2023-02-15_wines_2023-02-15_15-19-53""","""38601b0b-5338-4154-9f04-cf85b8c48921""",,"""09""",39


## Join Table ST to CT

The connection between ST and CT is based on the wine name. That link is tenuous, however, and a perfect example of why a foreign key would be useful. For now, we'll first proceed with the creation of a join table.
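
To illustrate why a name-based link is tenuous, here is a small sketch comparing an exact string match with a lightly normalised one. It uses `sqlite3` and toy tables as a stand-in for DuckDB; the `wine_name` column on the CT side and all rows are assumptions, and the normalisation (trimming and lower-casing) is only one possible choice.

```python
import sqlite3

# Toy in-memory tables standing in for clean.st and clean.ct.
con = sqlite3.connect(":memory:")
con.executescript(
    """
    CREATE TABLE st (pk INTEGER PRIMARY KEY, ct_wine_name TEXT);
    CREATE TABLE ct (pk INTEGER PRIMARY KEY, wine_name TEXT);  -- hypothetical column
    INSERT INTO st VALUES (5, 'wine a'), (6, 'wine b '), (7, 'wine c');
    INSERT INTO ct VALUES (100, 'wine a'), (101, 'Wine B');
    """
)

# Exact string equality: trailing whitespace or a capital letter breaks the match.
exact = con.execute(
    """
    SELECT st.pk AS pk_st, ct.pk AS pk_ct
    FROM st
    JOIN ct ON st.ct_wine_name = ct.wine_name
    ORDER BY pk_st
    """
).fetchall()

# Same join after trimming whitespace and lower-casing both sides.
normalised = con.execute(
    """
    SELECT st.pk AS pk_st, ct.pk AS pk_ct
    FROM st
    JOIN ct ON lower(trim(st.ct_wine_name)) = lower(trim(ct.wine_name))
    ORDER BY pk_st
    """
).fetchall()

print("exact:", exact)
print("normalised:", normalised)
```

Even the normalised join still depends on the names being entered consistently in both places, which is the case for storing an explicit key instead.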