# Optimising 3D surface plots for .UV data

## Todos

The plots need to contain:

- [ ] a method of controlling the size of the plot.
- [ ] title.
- [x] x axis label.
- [x] y axis label.
- [x] z axis label.
- [x] legend.
- [ ] a method of controlling the range of each axis.

## Set up software environment

In [None]:
import rainbow as rb

from pathlib import Path

import pandas as pd

%matplotlib inline  
from plotly.offline import init_notebook_mode
import cufflinks as cf

init_notebook_mode(connected=True)
cf.go_offline()

## Import Data

In [None]:
p = Path("/Users/jonathan/0_jono_data/2023-02-09_14-30-37_Z3.D/DAD1.UV")
p

In [None]:
uv_data = rb.agilent.chemstation.parse_uv(str(p))

csv_path = "/Users/jonathan/002_wine_analysis_hplc_uv/2023-02-09_14-30-37_Z3_uv-data.csv"

uv_data.export_csv(csv_path)

In [None]:
uv_df = pd.read_csv(
    "/Users/jonathan/002_wine_analysis_hplc_uv/2023-02-09_14-30-37_Z3_uv-data.csv"
)

uv_df.info()

In [None]:
uv_df.head()

## Try with Cufflinks

In [None]:
wavelengths = ["RT (min)", "240", "260", "280", "300", "320"]

uv_df[wavelengths].iplot(kind="surface", title="test")

How does the API behave if we set RT to be the index?

Need to melt the df.

In [None]:
melt_uv_df = uv_df.melt(id_vars="RT (min)")

melt_uv_df.columns = ["RT (min)", "wavelength", "mAU"]

display(melt_uv_df.head())

melt_uv_df.info()

Which looks promising. To use plotting APIs, wavelength needs to be in a numerical format.

In [None]:
melt_uv_df["wavelength"] = pd.to_numeric(melt_uv_df["wavelength"])

In [None]:
melt_uv_df["wavelength"].values

In [None]:
melt_uv_df.iplot(kind="surface", x="RT (min)", y="wavelength", z="mAU")

It is not able to plot the data in this format.

Try with Plotly directly.

## Try with Plotly

## Generating a Simulated Dataset

It may be easier to diagnose problems if we use a smaller, simulated dataset.

To produce a simulated long dataset, we need a dict with 3 keys: "min", "nm", "mAU", each containing values of corresponding length. ~~10 data points each~~ 100 data points are needed for sufficient smoothness, with 1 peak whose absorbance follows a ~~sine wave~~ normal distribution.

"min" ranges from 0 - 10.

"nm" ranges from 220 - 260.

absorbance ranges from 0 - 1500.

Been doing this wrong. Need to generate a separate dataset for each wavelength. The easiest way is to build it that way and then melt, as opposed to attempting to build a melted dataset from scratch.

In [None]:
# code lifted from https://stackoverflow.com/questions/10138085/how-to-plot-normal-distribution

import numpy as np
import scipy.stats as stats
import math


def norm_dist_curve_genner(total_mins: int, peak_max_factor: int, peak_loc: int):
    mu = peak_loc

    variance = 1

    sigma = math.sqrt(variance)

    # linspace args: start, stop, number of data points

    mins = np.linspace(0, total_mins, 100)

    absorb = stats.norm.pdf(mins, mu, sigma) * peak_max_factor

    # plt.plot(mins, absorb)

    # plt.show()

    return absorb


norm_dist_curve_genner(10, 4000, 2)[0:50]
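
One detail about the scaling: with `sigma = 1`, `stats.norm.pdf` peaks at 1/√(2π) ≈ 0.399, so multiplying by `peak_max_factor` gives an apex of roughly 0.4 × `peak_max_factor`, not `peak_max_factor` itself. If the apex is meant to reach a chosen absorbance, something like the sketch below might be closer to the intent. `gaussian_peak` and its parameters are hypothetical names, not part of the code above.

```python
import numpy as np


def gaussian_peak(mins, peak_loc, peak_height, sigma=1.0):
    """Gaussian-shaped peak whose apex equals peak_height (hypothetical helper)."""
    return peak_height * np.exp(-0.5 * ((mins - peak_loc) / sigma) ** 2)


# 101 points so that whole minutes fall on the grid
mins = np.linspace(0, 10, 101)

trace = gaussian_peak(mins, peak_loc=2, peak_height=1500)

# Apex of the unscaled normal PDF with sigma = 1, for comparison
pdf_apex = 1 / np.sqrt(2 * np.pi)

print(round(float(trace.max()), 1), round(float(pdf_apex), 3))
```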

In [None]:
import random

In [None]:
sim_data_dict = {}

for nm in wavelengths:
    peak_loc = random.randint(0, 10)

    peak_max_factor = random.randint(500, 1500)

    sim_data_dict[nm] = norm_dist_curve_genner(10, peak_max_factor, peak_loc)

sim_data_dict["mins"] = np.linspace(0, 10, 100)

sim_data_dict.keys()

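'RT (min)' is turning up in the dict because it is the first entry of the `wavelengths` list defined in the Cufflinks section, which the loop above iterates over. The simplest course of action is to pop it: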

In [None]:
sim_data_dict.pop("RT (min)")

sim_data_dict.keys()
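
An alternative to popping the key afterwards is to leave the retention-time label out of the loop in the first place. The sketch below assumes the same `wavelengths` list as earlier. It uses a plain Gaussian as a stand-in for `norm_dist_curve_genner`, and `wavelength_cols`, `filtered_sim_dict` and the seed are hypothetical.

```python
import random

import numpy as np

# Same list as defined earlier in the notebook
wavelengths = ["RT (min)", "240", "260", "280", "300", "320"]

# Keep only the entries that are actual wavelengths
wavelength_cols = [w for w in wavelengths if w != "RT (min)"]

mins = np.linspace(0, 10, 100)

random.seed(0)  # hypothetical seed, only to make the sketch repeatable

filtered_sim_dict = {"mins": mins}

for nm in wavelength_cols:
    peak_loc = random.randint(0, 10)
    peak_height = random.randint(500, 1500)
    # Plain Gaussian stand-in for norm_dist_curve_genner
    filtered_sim_dict[nm] = peak_height * np.exp(-0.5 * (mins - peak_loc) ** 2)

print(list(filtered_sim_dict.keys()))
# ['mins', '240', '260', '280', '300', '320']
```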

Now to form the data into a long format:

In [None]:
sim_data_df = pd.DataFrame(sim_data_dict)

sim_data_df_melt = sim_data_df.melt(id_vars="mins", var_name="nm", value_name="mAU")

sim_data_df_melt["nm"] = pd.to_numeric(sim_data_df_melt["nm"])

sim_data_df_melt.set_index("mins")

I would much rather have line plots than surfaces, at least at the moment. So let's try to get that working first:

In [None]:
from plotly import express as px

fig = px.line_3d(sim_data_df_melt, x="nm", y="mins", z="mAU", color="nm")

fig.update_layout(width=800, height=800)

fig

Now how about a bigger dataset?

In [None]:
melt_uv_df.columns = ["mins", "nm", "mAU"]

melt_uv_df

In [None]:
fig = px.line_3d(melt_uv_df, x="nm", y="mins", z="mAU", color="nm")

fig.update_layout(width=800, height=800)

fig

Which is a fantastic result. Let's run with that for now. The code is collated below:

In [None]:
# Set up the environment

import rainbow as rb

from pathlib import Path

import pandas as pd


from plotly.offline import init_notebook_mode

from plotly import express as px

# open a path to the data

p = Path("/Users/jonathan/0_jono_data/2023-02-09_14-30-37_Z3.D/DAD1.UV")

print(p)

# Convert the data to csv

uv_data = rb.agilent.chemstation.parse_uv(str(p))

csv_path = "/Users/jonathan/002_wine_analysis_hplc_uv/2023-02-09_14-30-37_Z3_uv-data.csv"

uv_data.export_csv(csv_path)

# Read the csv (this and the preceding step could be excluded by either loading the data directly
# or moving them to a different module).

uv_df = pd.read_csv(csv_path)

# info() prints its summary itself and returns None, so no display() is needed
uv_df.info()

display(uv_df.head())

# format the data for plotting

melt_uv_df = uv_df.melt(id_vars="RT (min)")

melt_uv_df.columns = ["mins", "nm", "mAU"]

melt_uv_df["nm"] = pd.to_numeric(melt_uv_df["nm"])

# info() prints its summary itself and returns None, so no display() is needed
melt_uv_df.info()

display(melt_uv_df.head())

fig = px.line_3d(melt_uv_df, x="nm", y="mins", z="mAU", color="nm")

fig.update_layout(width=800, height=800)

display(fig)