---
title: "'cuprac' Dataset EDA"
project: cuprac_dataset_EDA
cdt: 2024-09-11T13:00:15
description: "EDA of the 'cuprac' dataset"
conclusion: ""
status: "open"
---

In [None]:
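# Assumption: `etree` is the standard-library ElementTree; lxml's `etree` exposes the same calls used here
import xml.etree.ElementTree as etree

target_names = ["Injection Volume"]

rdict = {}
# Parse the XML file; `macaml` (the path to the ACAML file) must already be defined
tree = etree.parse(macaml)

# Find all "Section" elements in the ACAML namespace
ns = {"acaml": "urn:schemas-agilent-com:acaml14"}
sections = tree.findall(".//acaml:Section", namespaces=ns)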

# Iterate through the sections
for section in sections:
    section_name = section.find("acaml:Name", namespaces=ns)
    if section_name is None:
        continue  # skip sections without a name
    parameters = section.findall(".//acaml:Parameter", namespaces=ns)

    # Iterate through the "Parameter" elements in this section
    for parameter in parameters:
        parameter_name = parameter.find("acaml:Name", namespaces=ns)
        parameter_value = parameter.find("acaml:Value", namespaces=ns)
        if parameter_name is None or parameter_value is None:
            continue

        # Record the value of each targeted parameter (the last match wins)
        if parameter_name.text in target_names:
            rdict[parameter_name.text] = parameter_value.text

rdict
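
The lookup above can be checked without an instrument file. The sketch below wraps the same section and parameter traversal in a function and runs it on an inline fragment. The function name `find_parameters`, the fragment, and its values are hypothetical: the fragment reproduces only the `Section`/`Name`/`Parameter`/`Value` nesting that the cell relies on, and it uses the standard-library `xml.etree.ElementTree`, which may not be the `etree` used in this notebook.

```python
import xml.etree.ElementTree as ET

NS = {"acaml": "urn:schemas-agilent-com:acaml14"}

# Hypothetical fragment: real ACAML files contain far more structure,
# and the parameter values here are made up for illustration.
SAMPLE = """\
<ACAML xmlns="urn:schemas-agilent-com:acaml14">
  <Section>
    <Name>Injection</Name>
    <Parameter>
      <Name>Injection Volume</Name>
      <Value>5.00</Value>
    </Parameter>
    <Parameter>
      <Name>Draw Speed</Name>
      <Value>100</Value>
    </Parameter>
  </Section>
</ACAML>
"""


def find_parameters(root, target_names):
    """Map each targeted parameter name to the text of its Value element."""
    found = {}
    for section in root.findall(".//acaml:Section", NS):
        if section.find("acaml:Name", NS) is None:
            continue  # unnamed sections are skipped, as in the cell above
        for parameter in section.findall(".//acaml:Parameter", NS):
            name = parameter.find("acaml:Name", NS)
            value = parameter.find("acaml:Value", NS)
            if name is None or value is None:
                continue
            if name.text in target_names:
                found[name.text] = value.text  # the last match wins
    return found


root = ET.fromstring(SAMPLE)
result = find_parameters(root, ["Injection Volume"])
print(result)  # {'Injection Volume': '5.00'}
```

The values come back as strings, so a numeric comparison against an injection-volume limit would need an explicit `float(...)` conversion first.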


In [None]:
%reload_ext autoreload
%autoreload 2
from pathlib import Path

import duckdb as db
import polars as pl
from great_tables import GT, style
from pca_analysis.constants import ROOT
from pca_analysis.experiments.toc import build_toc

# Show full string values instead of truncating them in the output
pl.Config.set_fmt_str_lengths(9999)

# Collect every notebook in the experiments directory and build the table of contents
path = ROOT / "experiments" / "notebooks" / "experiments"
notebooks = list(Path(path).glob("*.ipynb"))
toc = build_toc(notebooks)


In [None]:
(
db.sql("""--sql
SELECT
    cdt,
    title,
    description,
    conclusion,
    link,
    notes,
FROM
    toc
WHERE
    project = 'cuprac_dataset_EDA'
ORDER BY
    cdt DESC
""")
.pl()
.pipe(GT)
.fmt_markdown("link")
.opt_stylize(style=3, color="gray")
.tab_options(
    table_background_color="#363a4f",
    table_font_color="#cad3f5",
    table_font_size=1,
)
)


# From Logs

The following CUPRAC samples are flagged in the logs as bad because their injection volumes were too high:


| \#  | id     | wine                                                    |
| --- | ------ | ------------------------------------------------------- |
| 0   | 128    | 2019 mount pleasant wines mount henry shiraz pinot noir |
| 1   | 161    | 2021 le juice fleurie fleurie gamay                     |
| 2   | 163    | 2015 yangarra estate shiraz mclaren vale                |
| 3   | 164    | 2015 yangarra estate old vine grenache                  |
| 4   | 165    | 2020 izway shiraz bruce                                 |
| 5   | ca0101 | 2021 yering station pinot noir                          |
| 6   | ca0301 | 2021 chris ringland shiraz                              |

See log entry '2023-08-01_20230801_114918while_develop' for more information.
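
A minimal sketch of how the flagged samples could be excluded before analysis. It assumes sample records carry an `id` field and that ids are compared as strings, since the table above mixes numeric and alphanumeric ids. The function name `drop_bad_samples` and the example records are hypothetical.

```python
# Sample ids taken from the table above, kept as strings because the
# ids mix numeric and alphanumeric forms (an assumption about storage).
BAD_INJECTION_IDS = frozenset(
    {"128", "161", "163", "164", "165", "ca0101", "ca0301"}
)


def drop_bad_samples(samples, bad_ids=BAD_INJECTION_IDS):
    """Return the records whose 'id' is not among the flagged ids.

    `samples` is a hypothetical list of dicts with an 'id' key.
    """
    return [s for s in samples if str(s["id"]) not in bad_ids]


# Hypothetical records: "128" and "ca0301" are flagged above, 200 is made up.
samples = [{"id": "128"}, {"id": 200}, {"id": "ca0301"}]
kept = drop_bad_samples(samples)
print(kept)  # [{'id': 200}]
```

Casting each id with `str(...)` keeps the comparison robust if numeric ids are stored as integers in one place and as text in another.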
