# Product correlations
An obvious question to ask in exploratory data analysis: is there a correlation between the different products?
For example, do we get product C in the cases where we do not get A?
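A correlation coefficient summarizes this in one number, but the conditional question itself can also be read directly off a contingency table. The sketch below uses `pd.crosstab` with `normalize="index"` on a small made-up frame (the values are invented for illustration and are not from the dataset; only the column names `binary_A` and `binary_C` are taken from the notebook) to get the fraction of A-negative reactions that still gave C.

```python
import pandas as pd

# Made-up toy data for illustration only; in the notebook the real
# binary_A / binary_C columns come from the curated dataset.
toy = pd.DataFrame({
    "binary_A": [1, 1, 0, 0, 1, 0, 0, 1],
    "binary_C": [0, 1, 1, 1, 0, 0, 1, 0],
})

# Rows: whether A formed (0/1); columns: whether C formed (0/1).
# normalize="index" turns each row into conditional frequencies P(C | A).
table = pd.crosstab(toy["binary_A"], toy["binary_C"], normalize="index")
print(table)

# Fraction of reactions without A that still gave C
p_c_given_not_a = table.loc[0, 1]
print(p_c_given_not_a)  # 0.75 for this toy data
```

On the real data, comparing `table.loc[0, 1]` with `table.loc[1, 1]` shows whether C is more or less likely when A is absent.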

In [None]:
import pathlib
import sys

sys.path.append(str(pathlib.Path().resolve().parents[1]))

import pandas as pd
import scipy.stats

from src.definitions import DATA_DIR

In [None]:
# get the dataset
df = pd.read_csv(DATA_DIR / "curated_data" / "synferm_dataset_2023-12-20_39486records.csv")
df.head()

## Correlation between products

In [None]:
scipy.stats.pearsonr(df["binary_A"], df["binary_C"])

In [None]:
scipy.stats.pearsonr(df["binary_A"], df["binary_B"])

In [None]:
scipy.stats.pearsonr(df["binary_B"], df["binary_C"])
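For two 0/1 columns such as `binary_A` and `binary_B`, Pearson's r is identical to the phi coefficient of the 2x2 contingency table, so the three binary calls above measure how strongly the occurrence of one product goes together with the occurrence of another. The sketch below checks this equivalence on made-up arrays (not taken from the dataset).

```python
import math

import numpy as np
import scipy.stats

# Made-up binary outcomes for two products (illustration only)
x = np.array([1, 1, 0, 0, 1, 0, 1, 1, 0, 0])
y = np.array([1, 0, 0, 0, 1, 0, 1, 1, 1, 0])

# 2x2 contingency counts: n11 = both formed, n00 = neither formed, etc.
n11 = int(np.sum((x == 1) & (y == 1)))
n10 = int(np.sum((x == 1) & (y == 0)))
n01 = int(np.sum((x == 0) & (y == 1)))
n00 = int(np.sum((x == 0) & (y == 0)))

# Phi coefficient computed from the table counts
phi = (n11 * n00 - n10 * n01) / math.sqrt(
    (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
)

# Pearson's r on the raw 0/1 arrays
r, p_value = scipy.stats.pearsonr(x, y)
print(phi, r)  # both 0.6 for this toy data
```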

In [None]:
scipy.stats.pearsonr(df["scaled_A"], df["scaled_C"])

In [None]:
scipy.stats.pearsonr(df["scaled_A"], df["scaled_B"])

In [None]:
scipy.stats.pearsonr(df["scaled_B"], df["scaled_C"])
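The pairwise calls above can also be collapsed into one correlation matrix with `DataFrame.corr`. A minimal sketch on made-up scaled yields (invented values; only the column names come from the notebook) is shown below. Note that `DataFrame.corr` returns coefficients only, without the p-values that `pearsonr` reports, and it excludes missing values pairwise, whereas `pearsonr` does not skip NaNs.

```python
import numpy as np
import pandas as pd
import scipy.stats

# Made-up scaled yields for three products (illustration only)
toy = pd.DataFrame({
    "scaled_A": [0.1, 0.5, 0.9, 0.3, 0.7],
    "scaled_B": [0.2, 0.4, 0.8, 0.1, 0.6],
    "scaled_C": [0.9, 0.3, 0.2, 0.5, 0.4],
})

# Full Pearson correlation matrix in one call (coefficients only)
corr = toy[["scaled_A", "scaled_B", "scaled_C"]].corr(method="pearson")
print(corr.round(3))

# Each off-diagonal entry matches the corresponding pairwise pearsonr call
r_ab, _ = scipy.stats.pearsonr(toy["scaled_A"], toy["scaled_B"])
print(np.isclose(corr.loc["scaled_A", "scaled_B"], r_ab))  # True
```

In the notebook, the same call on `df` would give all three scaled-yield correlations at once; `pearsonr` is still needed if the p-values matter.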

## Conclusion
The formation of A and B appears to be moderately correlated, which makes sense because B is an intermediate en route to A.
A and C are only weakly correlated. This is probably the result of two opposing tendencies: formation of A depletes the shared intermediate B, which would produce a negative correlation, whereas formation of both A and C is confounded by formation of B, which would produce a positive correlation.
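One way to probe the confounding part of this explanation, which is not done in this notebook, is a partial correlation: correlate A and C after linearly regressing each of them on B. The helper `partial_corr` below is a hypothetical sketch under that linear assumption (a rough approximation for bounded or binary yields), demonstrated on simulated data in which B drives both A and C with no direct A-C link.

```python
import numpy as np


def partial_corr(x, y, z):
    """Pearson correlation of x and y after linearly regressing each on z."""
    x, y, z = (np.asarray(v, dtype=float) for v in (x, y, z))
    # Design matrix: intercept column plus the control variable z
    design = np.column_stack([np.ones_like(z), z])
    # Residuals = the part of x (resp. y) not explained by a linear fit on z
    res_x = x - design @ np.linalg.lstsq(design, x, rcond=None)[0]
    res_y = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
    return np.corrcoef(res_x, res_y)[0, 1]


# Simulated data (not from the dataset): B drives both A and C,
# and there is no direct link between A and C.
rng = np.random.default_rng(0)
b = rng.random(2000)
a = b + 0.3 * rng.standard_normal(2000)
c = b + 0.3 * rng.standard_normal(2000)

raw = np.corrcoef(a, c)[0, 1]    # clearly positive, induced by B
partial = partial_corr(a, c, b)  # close to zero once B is controlled for
print(round(raw, 3), round(partial, 3))
```

Applied to the notebook data, this would be `partial_corr(df["scaled_A"], df["scaled_C"], df["scaled_B"])`. A shift in the A-C correlation after controlling for B would be consistent with B acting as a confounder, but on its own it does not separate the confounding effect from the depletion effect.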