In [None]:
from wine_analysis_hplc_uv import definitions
from wine_analysis_hplc_uv.modeling import pca
import matplotlib.pyplot as plt
import duckdb as db
import pandas as pd
import numpy as np

con = db.connect(definitions.DB_PATH)
pwine_data = pca.get_data(con)
plt1, (ax1, ax2) = plt.subplots(nrows=1, ncols=2, figsize=(12, 6))
a = pca.plot_wine_data(pwine_data, ax=ax1)
b = pca.build_model(pwine_data, ax=ax2)
fig = pca.build_figure(pwine_data)
#fig.tight_layout()

## Investigation of Data

### Low Maxima Wines

The first question is what is going on with those wines with a very high amplitude and very low maxima. Let's pull those up.

In [None]:
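# Take the per-wine signal maximum, keep the wines whose maximum is below 50,
# then split `id_wine` into separate `id` and `wine` columns.
low_max_wines = (pwine_data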
 .max().reset_index(name='max').set_index('id_wine')
 .query('max<50')
 .reset_index()
 .pipe(lambda df: pd.concat([df['id_wine'].str.split("_-", expand=True).set_axis(['id', 'wine'], axis=1), df], axis=1))
 .drop('id_wine', axis=1)
 )
low_max_wines

For these wines, it's not just a question of whether they are failed runs, because they may be somewhat salvageable with scaling. It depends more on the profile than the scale.
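To illustrate why the profile matters more than the scale, here is a minimal sketch on synthetic data (not the wine dataset; the column names `strong_run`, `weak_run` and `flat_run` are hypothetical). Dividing each trace by its own maximum removes the amplitude difference, so a weak run with the same shape becomes indistinguishable from a strong one, while a featureless run stays clearly different.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for pwine_data: one column per run, rows are time points.
t = np.linspace(0, 10, 201)
peak = np.exp(-((t - 4.0) ** 2) / 0.5)  # a single Gaussian-shaped peak
signals = pd.DataFrame(
    {
        "strong_run": 800.0 * peak,          # normal amplitude
        "weak_run": 20.0 * peak,             # same profile, low maximum
        "flat_run": np.full_like(t, 20.0),   # low maximum, no peak at all
    },
    index=pd.Index(t, name="mins"),
)

# Divide each column by its own maximum so every trace tops out at 1.
scaled = signals / signals.max()

# Largest pointwise difference from the reference profile after scaling.
profile_gap = scaled.sub(scaled["strong_run"], axis=0).abs().max()
print(profile_gap)
```

A gap near zero suggests the run could be recovered by scaling; a gap near one suggests the profile itself is wrong.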

So we need to investigate the profiles, but also the injection volumes, which can be found in `chemstation_metadata`.

First get the sample codes back from `id_wine`, then join them to `chemstation_metadata` to get the injection volume.

At the same time, plot each individually.
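The join described above could look roughly like the following sketch. It uses small made-up DataFrames rather than the real tables, and the metadata column names `samplecode` and `inj_vol` are assumptions, not the confirmed schema of `chemstation_metadata`.

```python
import pandas as pd

# Hypothetical stand-in for low_max_wines (ids and maxima are invented).
low_max = pd.DataFrame({"id": ["s01", "s03"], "max": [12.0, 30.0]})

# Hypothetical stand-in for chemstation_metadata; column names are assumed.
metadata = pd.DataFrame(
    {"samplecode": ["s01", "s02", "s03"], "inj_vol": [10.0, 10.0, 5.0]}
)

# Left join keeps every low-maximum wine, even if its metadata is missing.
# validate="m:1" raises if a sample code appears more than once in metadata.
joined = low_max.merge(
    metadata, left_on="id", right_on="samplecode", how="left", validate="m:1"
)
print(joined[["id", "max", "inj_vol"]])
```

A left join is used so that a wine without a metadata match shows up with a missing injection volume instead of silently dropping out.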

In [None]:
import seaborn as sns

fig, axs = plt.subplots(4,2, figsize = (12,8))

axs = axs.flatten()

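# `low_max_wines` has a fresh integer index after `reset_index`, so rebuild the
# `id_wine` column label (the inverse of the "_-" split above) for each wine.
labels = low_max_wines['id'] + "_-" + low_max_wines['wine']

for i, label in enumerate(labels):
    sns.lineplot(pwine_data[label], ax=axs[i])
    axs[i].set_ylabel('abs')
    axs[i].set_title(label)
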
#fig.legend(bbox_to_anchor=(0.5, -0.15), loc="upper center")
fig.tight_layout()

So, the following wines are invalid:


| #   | id     | wine                                                    |
|---|---|---|
|  0 | 128    | 2019 mount pleasant wines mount henry shiraz pinot noir |
|  1 | 161    | 2021 le juice fleurie fleurie gamay                     |
|  2 | 163    | 2015 yangarra estate shiraz mclaren vale                |
|  3 | 164    | 2015 yangarra estate old vine grenache                  |
|  4 | 165    | 2020 izway shiraz bruce                                 |
|  5 | ca0101 | 2021 yering station pinot noir                          |
|  6 | ca0301 | 2021 chris ringland shiraz                              |


As a list:

In [None]:
low_max_wines['id'].to_list()

In [None]:
low_max_wines['wine'].to_list()