---
cdt: 2024-09-10T11:08:04
project: database_architecture
title: Database Architecture
description: establish an efficient and clean database architecture
conclusion:
status: open
---

# Database Architecture

date: 2024-09-10

The database in its current state was judged insufficient for efficient EDA and downstream processing, so a 'clean' schema was created. It contains the tables 'clean.chm', 'clean.st', and 'clean.ct', which correspond to 'c_chemstation_metadata', 'c_sample_tracker', and 'c_cellar_tracker', respectively, each with its own primary key. Join tables were then constructed for 'clean.chm' $\rightarrow$ 'clean.st' and 'clean.st' $\rightarrow$ 'clean.ct'. Through these join tables and the 'clean.st' primary key (pk_st), every metadata table can be related to the others. See the notebooks linked below for more detail and for the code that creates this state.
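As a rough illustration of the layout described above, the sketch below builds a toy version of the three tables and two join tables, then walks from 'clean.chm' to 'clean.ct' through pk_st. It is not the code from the notebooks. It uses the standard-library `sqlite3` module in place of DuckDB so that it runs without extra dependencies, with an attached in-memory database named `clean` standing in for the schema. The join table names (`chm_st`, `st_ct`), the keys `pk_chm` and `pk_ct`, and all non-key columns and rows are hypothetical; only `pk_st` and the three table names come from the description.

```python
import sqlite3

# sqlite3 stands in for DuckDB here. Attaching a second in-memory database
# under the name "clean" mimics a schema, so tables read as clean.<name>.
con = sqlite3.connect(":memory:")
con.execute("ATTACH DATABASE ':memory:' AS clean")

con.executescript("""
-- Metadata tables, each with its own primary key (columns are hypothetical).
CREATE TABLE clean.chm (pk_chm INTEGER PRIMARY KEY, run_name TEXT);
CREATE TABLE clean.st  (pk_st  INTEGER PRIMARY KEY, sample_name TEXT);
CREATE TABLE clean.ct  (pk_ct  INTEGER PRIMARY KEY, wine_name TEXT);

-- Join tables (names are hypothetical): chm -> st and st -> ct.
CREATE TABLE clean.chm_st (pk_chm INTEGER, pk_st INTEGER);
CREATE TABLE clean.st_ct  (pk_st  INTEGER, pk_ct INTEGER);

-- Made-up rows: two runs of one sample, which maps to one cellar entry.
INSERT INTO clean.chm VALUES (1, 'run_a'), (2, 'run_b');
INSERT INTO clean.st  VALUES (10, 'sample_x');
INSERT INTO clean.ct  VALUES (100, 'wine_z');
INSERT INTO clean.chm_st VALUES (1, 10), (2, 10);
INSERT INTO clean.st_ct  VALUES (10, 100);
""")

# Relate all three metadata tables by passing through pk_st.
rows = con.execute("""
SELECT chm.run_name, st.sample_name, ct.wine_name
FROM clean.chm AS chm
JOIN clean.chm_st AS chm_st ON chm.pk_chm = chm_st.pk_chm
JOIN clean.st     AS st     ON chm_st.pk_st = st.pk_st
JOIN clean.st_ct  AS st_ct  ON st.pk_st = st_ct.pk_st
JOIN clean.ct     AS ct     ON st_ct.pk_ct = ct.pk_ct
ORDER BY chm.pk_chm
""").fetchall()

for row in rows:
    print(row)
```

The `SELECT` is plain SQL, so the same join pattern should carry over to DuckDB once the schema is created with `CREATE SCHEMA clean` instead of `ATTACH`.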

## Notebooks

- [Creating the 'clean' Schema and Primary Keys](../experiments/creating_clean_schema.ipynb)
- [Creating Join Tables](../experiments/creating_join_tables.ipynb)

In [None]:
%reload_ext autoreload
%autoreload 2
import duckdb as db
import polars as pl
from great_tables import GT

from pca_analysis.constants import ROOT
from pca_analysis.experiments.toc import build_toc

# Show full string values instead of truncating them in polars output.
pl.Config.set_fmt_str_lengths(9999)

# Directory holding the experiment notebooks.
path = ROOT / "experiments" / "notebooks" / "experiments"

notebooks = list(path.glob("*.ipynb"))

# Build the table of contents; the next cell queries `toc` by name.
toc = build_toc(notebooks)


In [None]:
(
db.sql("""--sql
SELECT
    project,
    cdt,
    title,
    description,
    conclusion,
    link,
    notes,
FROM
    toc
WHERE
    project = 'database_architecture'
ORDER BY
    cdt DESC
""")
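.pl()  # materialize the query result as a polars DataFrame
.pipe(GT)  # wrap it in a great_tables table
.fmt_markdown("link")  # render the link column as Markdown
.opt_stylize(style=5, color="gray")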
)
