---
title: Project Main
description: Project Overview page. A TOC.
cdt: 2024-09-11T11:11:25
---

This is the main landing page of the project. All projects are listed in the table below.

In [1]:
%reload_ext autoreload
%autoreload 2

from project_toc.toc import build_toc
from pathlib import Path
from great_tables import GT
import duckdb as db

projects_path = "."
nbooks = list(Path(projects_path).glob("*"))
nbooks.remove(Path("main.ipynb"))
nbooks


[PosixPath('project_parafac2.ipynb')]

In [2]:
project_toc = build_toc(paths=nbooks)


In [3]:
# Table colours are taken from the Catppuccin palette: https://catppuccin.com/palette

(
    db.sql(
        """--sql
    SELECT
        status,
        cdt,
        title,
        project,
        link,
        description,
        conclusion,
        filename
    FROM
        project_toc
    """
    )
    .pl()
    .pipe(GT)
    .fmt_markdown("link")
    .opt_stylize(style=3, color="gray")
    .tab_options(
        table_background_color="#363a4f", table_font_color="#cad3f5", table_font_size=1
    )
)


| status | cdt | title | project | link | description | conclusion | filename |
| --- | --- | --- | --- | --- | --- | --- | --- |
| open | 1970-01-01 00:00:00.000000000 | PARAFAC2 | parafac2 | link | efforts to apply PARAFAC2 to decompose multiway data | | project_parafac2 |


## TODO:
- [ ] admin
  - [x] extract offset calculator
  - [x] check that 'dataset_EDA' project is clean and orderly.
  - [ ] create logs under the new system.
  - [x] github repo.
  - [ ] move previous projects into this newly created hierarchy.
  - [x] move this todo to TOC root.
- [x] dataset division
  - [x] divide datasets into raw and cuprac, in wide form - string number column names?
- raw dataset
  - [x] cleaning
    - [x] outlier removal
      - [x] for each of the following, make note of outliers, storing in a table
        - [x] time outliers
        - [x] absorbance outliers
    - [x] dimension unification
      - [x] wavelength unification
        - [x] describe wavelength distribution in terms of samples and any interesting features within the 400 - 600 nm range
        - [x] amend nm_254 to the common wavelength range - smaller than 400 would be great.
      - [x] time unification
        - [x] save offset correction to nm_254
        - [x] identify common time cutoff, i.e. a time point common to all samples beyond which no relevant features exist
        - [x] replace all sample times with a common 2.5 Hz time axis running from 0 to the time identified above
  - [x] xarray migration
    - [x] once 'cleaning' and 'dataset division' are done, complete migration of data to xarray
- [ ] PARAFAC2:
  - [ ] test PARAFAC2 on a sample set.
- Fix missing CT metadata. See [Samples Missing CT Metadata](../experiments/samples_missing_ct_metadata.ipynb)
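
The time unification item above can be sketched roughly as follows. This is a minimal illustration, not the project's implementation: the function names `common_time_axis` and `unify_time`, the use of seconds as the time unit, and the `(time, wavelength)` array layout are all assumptions, and it further assumes every sample is already sampled at 2.5 Hz after offset correction, so that unification amounts to truncating at the common cutoff and relabelling the time axis.

```python
import numpy as np


def common_time_axis(cutoff_s: float, rate_hz: float = 2.5) -> np.ndarray:
    """Evenly spaced times (assumed to be seconds) from 0 up to the cutoff."""
    # Number of whole sampling steps that fit inside the cutoff, plus the point at 0.
    n_points = int(np.floor(cutoff_s * rate_hz + 1e-9)) + 1
    return np.arange(n_points) / rate_hz


def unify_time(signal: np.ndarray, axis: np.ndarray) -> np.ndarray:
    """Truncate a (time, wavelength) signal to the length of the common axis."""
    if signal.shape[0] < axis.size:
        raise ValueError("signal is shorter than the common time axis")
    return signal[: axis.size]


# Hypothetical example: a 2-second cutoff and a dummy 10 x 3 signal.
axis = common_time_axis(2.0)
unified = unify_time(np.ones((10, 3)), axis)
```

If the samples were not all recorded at the same rate, interpolating each one onto the common axis (for example with `np.interp`) would be needed instead of simple truncation.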


# Notes on Dataset

2024-09-12T14:52:28

The dataset is somewhat unorganised, and the notes on it are even more scattered. I will start collecting useful notes on the topic here.

2024-09-12T14:53:07 - Three of the raw samples from the wine deg study are stored at "/Users/jonathan/uni/0_jono_data/wine-deg-study/raw_uv/ambient" but have been included in the database, bringing the total to 104 'raw' samples.

2024-09-13T15:58:37 - All detections are in mAU.

2024-09-16T15:07:14 - Need to reconstruct the database so it's all in main, with a primary key based on a composite of 'st.pk' and 'chm.pk' representing every `chm.pk`th **sampling** of each `st.pk` sample.

# Notes without a Project

A method of finding lost notes: list every notebook in `../experiments/` that has no project assigned.

In [4]:
notes_toc = build_toc(paths=Path("../experiments/").glob("*.ipynb"))


In [5]:
(
    db.sql(
        """--sql
select
    title,
    link,
    filename,
from
    notes_toc
where
    project IS NULL
"""
    )
    .pl()
    .pipe(GT)
    .fmt_markdown("link")
    .opt_stylize(style=3, color="gray")
    .tab_options(
        table_background_color="#363a4f", table_font_color="#cad3f5", table_font_size=1
    )
)


| title | link | filename |
| --- | --- | --- |
| | link | zhang_gcms_experiment_2024-08-30 |
| | link | tlviz_parafac_tutorial_2024-08-20 |
| | link | outlier_detection_2024-08-15 |
| | link | zhang_gcms_parafac2_reproduction_2024-08-30 |
| | link | tensorly_parafac2_tutorial_2024-08-24 |
| Meaning of Factors, Scaling and Reconstruction | link | scaling_and_reconstruction |
| | link | parafac2_tensorly_demonstration_2024-08-30 |
| Zhang GC-MS PARAFAC2 with Scaling and Centering | link | zhang_gcms_experiment_standardscaled_2024-09-02 |
| | link | parafac2_experiment_2024-08-24 |
