# Random Data Simulation and Isotopomer Distribution Fitting Using a Neural Network

Import necessary packages:

In [7]:
import numpy as np
import pandas as pd
from metabolabpytools import isotopomerAnalysis

Create an isotopomerAnalysis object:

In [9]:
ia = isotopomerAnalysis.IsotopomerAnalysis()

Define metabolite parameters:

In [8]:
# All 2**3 = 8 labelling patterns of a three-carbon metabolite (one entry per carbon)
isotopomers = [
    [0, 0, 0],  # Unlabelled
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
    [1, 1, 0],
    [1, 0, 1],
    [0, 1, 1],
    [1, 1, 1]
]

num_samples = 1000
hsqc = [0, 1, 1]
metabolite = 'L-LacticAcid'
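
The isotopomer list above enumerates every labelling pattern of a three-carbon metabolite. As a hedged alternative to writing it out by hand, the same list can be generated with the standard library. The helper name `all_isotopomers` is an illustrative assumption and is not part of `metabolabpytools`.

```python
from itertools import product


def all_isotopomers(n_carbons):
    """Return all 2**n_carbons labelling patterns (0 = unlabelled, 1 = labelled).

    Patterns are ordered by number of labelled carbons, then with labels
    on earlier carbons first, matching the hand-written list above.
    """
    patterns = product([0, 1], repeat=n_carbons)
    ordered = sorted(patterns, key=lambda p: (sum(p), [-x for x in p]))
    return [list(p) for p in ordered]


isotopomers_generated = all_isotopomers(3)
print(isotopomers_generated)
```

Generating the list this way avoids transcription errors when moving to metabolites with more carbons, where the number of patterns doubles with each additional carbon.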


In [10]:
ia.init_metabolite_multiple_samples(metabolite, hsqc, num_samples=num_samples)

Initialise and set the isotopomer, HSQC and GC-MS data for multiple samples:

In [11]:
generated_percentages = []
for exp_index in range(num_samples):
    random_percentages = ia.generate_isotopomer_percentages()  # Generate new random percentages for each sample
    generated_percentages.append(random_percentages)  # Store generated percentages for comparison
    
    ia.set_fit_isotopomers_simple(metabolite=metabolite, isotopomers=isotopomers, percentages=random_percentages, exp_index=exp_index)
    ia.sim_hsqc_data(metabolite=metabolite, exp_index=exp_index, isotopomers=isotopomers, percentages=random_percentages)
    ia.sim_gcms_data(metabolite, exp_index)
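
The loop relies on `ia.generate_isotopomer_percentages()` to draw a fresh distribution for each sample. Its implementation is not shown here. As a rough illustration only, one way to draw random percentages that sum to 100 is to normalise uniform random numbers. The function `random_percentages_sketch` below is hypothetical and may differ from what the library actually does.

```python
import random


def random_percentages_sketch(n_isotopomers, rng):
    """Draw n_isotopomers non-negative values that sum to 100 (hypothetical helper)."""
    raw = [rng.random() for _ in range(n_isotopomers)]
    total = sum(raw)
    return [100.0 * value / total for value in raw]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
sample = random_percentages_sketch(8, rng)
print(sample, sum(sample))
```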

Add noise to HSQC and GC-MS data:

In [12]:
ia.add_noise_to_hsqc_gcms(metabolite, num_samples, hsqc_noise_level=0.03, gcms_noise_level=0.075)

Select which data types are used for the fit (HSQC multiplets and GC-MS, but not 1D NMR):

In [13]:
ia.use_hsqc_multiplet_data = True
ia.use_gcms_data = True
ia.use_nmr1d_data = False

Fitting the neural network:

In [14]:
ia.fit_data_nn(metabolite=metabolite, fit_isotopomers=isotopomers, percentages=generated_percentages, num_samples=num_samples)

Reloading Tuner from my_dir\isotopomer_analysis\tuner0.json
Epoch 1/100
  super().__init__(activity_regularizer=activity_regularizer, **kwargs)
29/29 ━━━━━━━━━━━━━━━━━━━━ 1s 7ms/step - loss: 553.5529 - val_loss: 407.7870
Epoch 2/100
29/29 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - loss: 179.3965 - val_loss: 59.4321
Epoch 3/100
29/29 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - loss: 45.5851 - val_loss: 32.4372
Epoch 4/100
29/29 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - loss: 24.8297 - val_loss: 25.6773
Epoch 5/100
29/29 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - loss: 22.5848 - val_loss: 22.8123
Epoch 6/100
29/29 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - loss: 19.2962 - val_loss: 19.0725
Epoch 7/100
29/29 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - loss: 16.2192 - val_loss: 15.8382
Epoch 8/100
29/29 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - loss: 14.9940 - val_loss: 16.8301
Epoch 9/100
29/29 ━━━━━━━━━━

## Addressing Overfitting

Several strategies are used to prevent overfitting in the neural network model that predicts isotopomer distributions:

- First, a validation set is used to monitor the model's performance during training and to check that it still generalizes to unseen data. The data are split into training and validation sets, and early stopping halts training once the validation loss stops improving, which keeps the model from fitting noise in the training data.
 
- Additionally, dropout layers have been employed within the neural network architecture. Dropout randomly deactivates a fraction of neurons during each training step, which forces the network to learn more robust features and reduces reliance on any specific neurons. 

- Regularization techniques, such as L2 regularization, have been used to penalize large weights, discouraging the model from becoming too complex. 

- Finally, the model is trained on 1000 simulated samples, which gives it enough data to generalize rather than memorize individual examples.
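
The first three strategies can be sketched in plain Python to show what each one does. The exact configuration inside `fit_data_nn` is not shown in this notebook, so the function names and parameter values below (`patience`, `strength`, `rate`) are illustrative assumptions. In Keras, the corresponding building blocks would typically be the `EarlyStopping` callback, the `Dropout` layer and an L2 kernel regularizer.

```python
import random


def early_stopping_epoch(val_losses, patience):
    """Return the 1-based epoch at which training would stop, or None.

    Training stops once the validation loss has failed to improve on its
    best value for `patience` consecutive epochs.
    """
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return None


def l2_penalty(weights, strength):
    """L2 regularization term added to the loss: strength * sum(w**2)."""
    return strength * sum(w * w for w in weights)


def dropout(activations, rate, rng):
    """Inverted dropout: zero each activation with probability `rate`
    and rescale the survivors so the expected value is unchanged."""
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


# Validation losses of epochs 1-8 from the training log above.
val_losses = [407.7870, 59.4321, 32.4372, 25.6773, 22.8123, 19.0725, 15.8382, 16.8301]
print(early_stopping_epoch(val_losses, patience=1))  # 8
print(early_stopping_epoch(val_losses, patience=3))  # None
print(l2_penalty([0.5, -2.0, 1.0], strength=0.01))
print(dropout([1.0, 1.0, 1.0, 1.0], rate=0.5, rng=random.Random(0)))
```

With `patience=1` the sketch stops at epoch 8, the first epoch in the log where the validation loss rises. A larger patience tolerates such short-lived increases and lets training continue.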