---
cdt: 2024-09-13T08:36:53
title: Extracting Binary Pump Tables
description: I needed the solvent composition and timetable information for each sample in order to calculate gradients, which required writing a parsing module. This notebook runs that module to extract the tables from the 'raw' dataset.
conclusion: "A module 'bin_pump_to_db' was created, with associated tests. Binary pump tables can now be found in the 'bin_pump' schema."
project: bin_pump_extraction
---


In [None]:
# environment

%reload_ext autoreload
%autoreload 2
from database_etl.etl.sql.raw_chm.bin_pumps_to_db import bin_pump_to_db

import duckdb as db
from database_etl.definitions import DB_PATH, DATA_DIR
from IPython.display import Markdown
import polars as pl

pl.Config.set_tbl_rows(99)
con = db.connect(DB_PATH)


# Bin Pump Tables to DB

I have created a module, `bin_pump_to_db`, that extracts the binary pump tables (solvent compositions and timetables) from each sample and writes them to the database.
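
The module's internals are not shown in this notebook. As a rough sketch of the load step only, assuming each sample has already been parsed into rows keyed by a sample `pk`, an `overwrite=True` run can drop and recreate the target table before inserting, so re-running the extraction does not duplicate rows. The helper `load_rows` and the column names are hypothetical, and Python's built-in `sqlite3` stands in for the DuckDB connection so the sketch runs on its own.

```python
import sqlite3


def load_rows(con, table, columns, rows, overwrite=False):
    """Hypothetical loader: write already-parsed rows into `table`.

    With overwrite=True the table is dropped and recreated first, so a
    re-run replaces the previous contents instead of appending to them.
    """
    col_defs = ", ".join(columns)
    if overwrite:
        con.execute(f"DROP TABLE IF EXISTS {table}")
    con.execute(f"CREATE TABLE IF NOT EXISTS {table} ({col_defs})")
    placeholders = ", ".join("?" for _ in columns)
    con.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)
    con.commit()


# sqlite3 in-memory database as a stand-in for the project's DuckDB file.
con = sqlite3.connect(":memory:")
# Toy timetable rows; the column layout is assumed, not the module's schema.
cols = ["pk", "time_min", "pct_a", "pct_b"]
rows = [("s1", 0.0, 95.0, 5.0), ("s1", 10.0, 5.0, 95.0)]

load_rows(con, "timetables", cols, rows, overwrite=True)
load_rows(con, "timetables", cols, rows, overwrite=True)  # re-run replaces, not appends
n_rows = con.execute("SELECT count(*) FROM timetables").fetchone()[0]
```

Dropping and recreating keeps the load idempotent at the cost of re-parsing every sample; an incremental design would instead delete only the rows for the `pk` values being reloaded.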


In [None]:
# wine_deg_samples = Path(
#     "/Users/jonathan/uni/0_jono_data/wine-deg-study/raw_uv/ambient"
# ).glob("2023-04-21_A*.D")

# collect the sample `.D` paths in a stable order
paths = sorted((DATA_DIR / "raw_uv").glob("*.D"))


bin_pump_to_db(paths=paths, con=con, overwrite=True)

display(Markdown("# Reports"))

# show tables in db
con.sql(
    """--sql
show tables
"""
).pl().pipe(display)

# count the distinct samples (pk) present in each table
con.sql(
    """--sql
select 'timetables' as tbl, count(distinct pk) as file_count from timetables
union
select 'solvcomps' as tbl, count(distinct pk) as file_count from solvcomps;
"""
).pl().pipe(display)


# show solvcomps table
con.sql(
    """--sql
SELECT
    *
FROM
    solvcomps
order by
    pk
LIMIT 3
"""
).pl().pipe(display)

# show timetables table
con.sql(
    """--sql
SELECT
    *
FROM
    timetables
order by
    pk
LIMIT 5
"""
).pl().pipe(display)

con.close()
del con
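
The report above only compares the number of distinct `pk` values per table. A sketch of a stricter check lists the samples present in one table but missing from the other, assuming both tables share the `pk` column used in the queries here. The helper `missing_from` and the toy rows are hypothetical, and `sqlite3` stands in for DuckDB (both support `EXCEPT`).

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Toy stand-ins for `timetables` and `solvcomps`; only the `pk` column matters here.
con.execute("CREATE TABLE timetables (pk TEXT)")
con.execute("CREATE TABLE solvcomps (pk TEXT)")
con.executemany("INSERT INTO timetables VALUES (?)", [("s1",), ("s2",), ("s3",)])
con.executemany("INSERT INTO solvcomps VALUES (?)", [("s1",), ("s2",)])


def missing_from(con, present_in, absent_from):
    """Hypothetical helper: sample pks found in `present_in` but not in `absent_from`.

    Table names are interpolated directly, so they must be trusted identifiers.
    """
    query = (
        f"SELECT pk FROM {present_in} EXCEPT SELECT pk FROM {absent_from} ORDER BY pk"
    )
    return [row[0] for row in con.execute(query)]


no_solvcomps = missing_from(con, "timetables", "solvcomps")  # -> ["s3"]
no_timetables = missing_from(con, "solvcomps", "timetables")  # -> []
```

Equal counts can hide two different sets of samples; an empty result in both directions confirms the tables cover the same files.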


# Conclusion

The binary pump data for the samples has been extracted. To use the tables, join sample ids to the primary 'id' table, then use the 'tbl_num' key to retrieve rows from the 'solvcomps' or 'timetables' tables.
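
A minimal sketch of that lookup follows. It assumes the `id` table maps each `pk` to a sample name and that `tbl_num` numbers the binary pump tables within a sample; these column layouts, the helper `timetable_for`, and the toy rows are hypothetical rather than the module's confirmed schema, and `sqlite3` stands in for the DuckDB connection.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical schemas; the real column names may differ.
con.execute("CREATE TABLE id (pk TEXT PRIMARY KEY, sample_name TEXT)")
con.execute(
    "CREATE TABLE timetables (pk TEXT, tbl_num INTEGER, time_min REAL, pct_b REAL)"
)
con.executemany(
    "INSERT INTO id VALUES (?, ?)", [("s1", "sample_a"), ("s2", "sample_b")]
)
con.executemany(
    "INSERT INTO timetables VALUES (?, ?, ?, ?)",
    [("s1", 1, 0.0, 5.0), ("s1", 1, 10.0, 95.0), ("s2", 1, 0.0, 10.0)],
)


def timetable_for(con, sample_name, tbl_num=1):
    """Join `id` to `timetables` on pk and return one table's rows, ordered by time."""
    query = """
        SELECT t.time_min, t.pct_b
        FROM id AS i
        JOIN timetables AS t ON t.pk = i.pk
        WHERE i.sample_name = ? AND t.tbl_num = ?
        ORDER BY t.time_min
    """
    return con.execute(query, (sample_name, tbl_num)).fetchall()


rows = timetable_for(con, "sample_a")  # -> [(0.0, 5.0), (10.0, 95.0)]
```

The same join pattern applies to `solvcomps` by swapping the table and selected columns.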