# Random Data Simulation and Isotopomer Distribution Fitting Using a Neural Network

This notebook adds a neural network for fitting isotopomer distributions. The workflow is:
- Create a simple neural network to replace the basic fit within the isotopomer class.
- Extend it to handle multiple samples, adding functions that generate isotopomer distributions and simulate data for each sample.
- Train and tune the network, adding measures to prevent overfitting.
- Generalise the network to handle different metabolites.
- Use the networks to fit real HSQC and GC-MS data.

Import necessary packages:

In [None]:
import numpy as np
import pandas as pd
from metabolabpytools import isotopomerAnalysis

Create an isotopomerAnalysis object:

In [None]:
ia = isotopomerAnalysis.IsotopomerAnalysis()

Define metabolite parameters:

In [None]:
# All 2**3 = 8 isotopomers of a three-carbon metabolite (1 = labelled carbon, 0 = unlabelled)
isotopomers = [
    [0, 0, 0],  # Unlabelled
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
    [1, 1, 0],
    [1, 0, 1],
    [0, 1, 1],
    [1, 1, 1]
]

num_samples = 1000
hsqc = [0, 1, 1]
metabolite = 'L-LacticAcid'
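
The isotopomer list above enumerates every 0/1 labelling pattern for three carbons. As a minimal sketch, the same list, in the same order, can be generated for any carbon count with `itertools.combinations`. The helper name `all_isotopomers` is hypothetical and is not part of `metabolabpytools`.

```python
from itertools import combinations


def all_isotopomers(n_carbons):
    """Return all 2**n_carbons labelling patterns, ordered by number of labelled carbons."""
    patterns = []
    for n_labelled in range(n_carbons + 1):
        # Each combination is the set of carbon positions that carry a label
        for positions in combinations(range(n_carbons), n_labelled):
            patterns.append([1 if i in positions else 0 for i in range(n_carbons)])
    return patterns


isotopomers_generated = all_isotopomers(3)
print(isotopomers_generated)
```

Generating the patterns this way avoids typing errors in the longer lists, which double in length with each added carbon.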


In [None]:
ia.init_metabolite_multiple_samples(metabolite, hsqc, num_samples=num_samples)

Initialise and set the isotopomer, HSQC and GC-MS data for each sample:

In [None]:
generated_percentages = []
for exp_index in range(num_samples):
    random_percentages = ia.generate_isotopomer_percentages()  # Generate new random percentages for each sample
    generated_percentages.append(random_percentages)  # Store generated percentages for comparison
    
    ia.set_fit_isotopomers_simple(metabolite=metabolite, isotopomers=isotopomers, percentages=random_percentages, exp_index=exp_index)
    ia.sim_hsqc_data(metabolite=metabolite, exp_index=exp_index, isotopomers=isotopomers, percentages=random_percentages)
    ia.sim_gcms_data(metabolite, exp_index)
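
The internals of `generate_isotopomer_percentages` are not shown in this notebook. One plausible way to draw a random distribution that is non-negative and sums to 100 % is to sample from a flat Dirichlet distribution. The sketch below is an assumption for illustration only, not the library's implementation, and `random_percentages_sketch` is a hypothetical name.

```python
import numpy as np


def random_percentages_sketch(n_isotopomers, rng):
    """Draw non-negative percentages that sum to 100 (assumed flat Dirichlet prior)."""
    fractions = rng.dirichlet(np.ones(n_isotopomers))
    return 100.0 * fractions


rng = np.random.default_rng(seed=0)
percentages = random_percentages_sketch(8, rng)  # 8 isotopomers for three carbons
print(percentages.round(2), percentages.sum())
```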

Add noise to HSQC and GC-MS data:

In [None]:
ia.add_noise_to_hsqc_gcms(metabolite, num_samples, hsqc_noise_level=0.03, gcms_noise_level=0.075)
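
How `add_noise_to_hsqc_gcms` applies the two noise levels is not shown here. The sketch below assumes one simple model: zero-mean Gaussian noise whose standard deviation is `noise_level` times each value, followed by clipping at zero and rescaling to the original total. Both the model and the name `add_relative_noise` are assumptions for illustration.

```python
import numpy as np


def add_relative_noise(values, noise_level, rng):
    """Perturb each value by relative Gaussian noise, keeping the total unchanged."""
    values = np.asarray(values, dtype=float)
    noisy = values * (1.0 + noise_level * rng.standard_normal(values.shape))
    noisy = np.clip(noisy, 0.0, None)  # intensities cannot be negative
    total = noisy.sum()
    if total == 0:
        return values.copy()
    return noisy * values.sum() / total  # rescale to the original sum


rng = np.random.default_rng(seed=1)
gcms_clean = [60.0, 25.0, 10.0, 5.0]  # illustrative distribution, not real data
gcms_noisy = add_relative_noise(gcms_clean, noise_level=0.075, rng=rng)
print(gcms_noisy.round(2))
```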

Select which data types are used in the fit:

In [None]:
ia.use_hsqc_multiplet_data = True
ia.use_gcms_data = True
ia.use_nmr1d_data = False

Fitting the neural network:

In [None]:
ia.fit_data_nn(metabolite=metabolite, fit_isotopomers=isotopomers, percentages=generated_percentages, num_samples=num_samples)

## Addressing Overfitting: 

Several strategies are used to limit overfitting in the neural network that predicts isotopomer distributions:

- A validation set monitors the model's performance during training. The data are split into training and validation sets, and early stopping halts training once the validation loss stops improving, so the model does not go on to fit noise in the training data.

- Dropout layers are included in the network architecture. Dropout randomly deactivates a fraction of the neurons at each training step, which forces the network to learn more robust features and reduces its reliance on any single neuron.

- L2 regularization penalizes large weights, discouraging the model from becoming too complex.

- The model is trained on 1000 simulated samples, which gives it more varied examples to generalize from.
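
The first three strategies can be illustrated without a deep-learning framework. The NumPy sketch below shows the early-stopping rule, inverted dropout, and an L2 penalty in isolation. The function names, the patience of 3, and the penalty strength are assumptions for illustration; the validation losses are taken from one of the training logs in this notebook.

```python
import numpy as np


def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return (stop_epoch, best_epoch), 1-based; stop_epoch is None if never triggered."""
    best, best_epoch, wait = float("inf"), None, 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:
            best, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1  # another epoch without improvement
            if wait >= patience:
                return epoch, best_epoch
    return None, best_epoch


def dropout(activations, rate, rng):
    """Inverted dropout: zero a fraction `rate` of units and rescale the rest."""
    keep = rng.random(activations.shape) >= rate
    return activations * keep / (1.0 - rate)


def l2_penalty(weights, lam):
    """L2 term added to the loss: lam times the sum of squared weights."""
    return lam * sum(float(np.sum(w ** 2)) for w in weights)


val_losses = [53.2492, 36.2842, 13.8606, 4.5671, 9.6470, 18.8306, 23.1715]
stop_epoch, best_epoch = early_stopping_epoch(val_losses, patience=3)
print(stop_epoch, best_epoch)

rng = np.random.default_rng(seed=0)
dropped = dropout(np.ones(1000), rate=0.5, rng=rng)
penalty = l2_penalty([np.array([1.0, 2.0])], lam=0.1)
```

In Keras these ideas correspond to the `EarlyStopping` callback, the `Dropout` layer and `regularizers.l2`. The log format in this notebook suggests Keras, but whether `fit_data_nn` uses exactly these components is an assumption.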

## Generalising for other metabolites:

In [2]:
import numpy as np
import pandas as pd
from metabolabpytools import isotopomerAnalysis

ia = isotopomerAnalysis.IsotopomerAnalysis()

# Define isotopomers for different metabolites
isotopomers_three_carbon = [
    [0, 0, 0],  # Unlabelled
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
    [1, 1, 0],
    [1, 0, 1],
    [0, 1, 1],
    [1, 1, 1]
]

isotopomers_aspartate = [
    [0, 0, 0, 0],  # Unlabelled
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 0, 0, 1],
    [0, 1, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [1, 0, 1, 1],
    [0, 1, 1, 1],
    [1, 1, 1, 1]
]

num_samples = 1000
hsqc_three_carbon = [[1, 1, 1], [0, 1, 1]]
hsqc_aspartate = [
    [0, 1, 1, 0], 
    [1, 1, 1, 0], 
    [0, 1, 1, 1], 
    [1, 1, 1, 1]
]

# Initialize the metabolites with multiple samples
ia.init_metabolite_multiple_samples(metabolites=['three-carbon', 'aspartate'], hsqc=hsqc_three_carbon, num_samples=num_samples)

# Generate and set isotopomers for three-carbon metabolite
for hsqc in hsqc_three_carbon:
    generated_percentages = []
    for exp_index in range(num_samples):
        random_percentages = ia.generate_isotopomer_percentages('three-carbon')
        generated_percentages.append(random_percentages)
        
        ia.set_fit_isotopomers_simple(metabolite='three-carbon', isotopomers=isotopomers_three_carbon, percentages=random_percentages, exp_index=exp_index)
        ia.sim_hsqc_data(metabolite='three-carbon', exp_index=exp_index, isotopomers=isotopomers_three_carbon, percentages=random_percentages)
        ia.sim_gcms_data('three-carbon', exp_index)

    ia.add_noise_to_hsqc_gcms('three-carbon', num_samples, hsqc_noise_level=0.03, gcms_noise_level=0.075)
    ia.fit_data_nn(metabolite='three-carbon', fit_isotopomers=isotopomers_three_carbon, percentages=generated_percentages, num_samples=num_samples, hsqc=hsqc, tuner_project_name=f'three_carbon_{hsqc}')

# Generate and set isotopomers for aspartate
for hsqc in hsqc_aspartate:
    generated_percentages = []
    for exp_index in range(num_samples):
        random_percentages = ia.generate_isotopomer_percentages('aspartate')
        generated_percentages.append(random_percentages)
        
        ia.set_fit_isotopomers_simple(metabolite='aspartate', isotopomers=isotopomers_aspartate, percentages=random_percentages, exp_index=exp_index)
        ia.sim_hsqc_data(metabolite='aspartate', exp_index=exp_index, isotopomers=isotopomers_aspartate, percentages=random_percentages)
        ia.sim_gcms_data('aspartate', exp_index)

    ia.add_noise_to_hsqc_gcms('aspartate', num_samples, hsqc_noise_level=0.03, gcms_noise_level=0.075)
    ia.fit_data_nn(metabolite='aspartate', fit_isotopomers=isotopomers_aspartate, percentages=generated_percentages, num_samples=num_samples, hsqc=hsqc, tuner_project_name=f'aspartate_{hsqc}')

# Save results to an Excel file
ia.save_results('results.xlsx')


Trial 200 Complete [00h 01m 47s]
val_loss: 20.53839683532715

Best val_loss So Far: 17.318084716796875
Total elapsed time: 02h 34m 27s
Epoch 1/100
29/29 ━━━━━━━━━━━━━━━━━━━━ 8s 31ms/step - loss: 276.7698 - val_loss: 270.1278
Epoch 2/100
29/29 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - loss: 259.3806 - val_loss: 261.0565
Epoch 3/100
29/29 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - loss: 249.4652 - val_loss: 249.9429
Epoch 4/100
29/29 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - loss: 241.4877 - val_loss: 245.6904
Epoch 5/100
29/29 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - loss: 238.3989 - val_loss: 243.3895
Epoch 6/100
29/29 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - loss: 236.5983 - val_loss: 241.0748
Epoch 7/100
29/29 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - loss: 242.3429 - val_loss: 239.4780
...

Next, generalise by HSQC vector length and add five- and six-carbon metabolites:

In [4]:
import numpy as np
import pandas as pd
from metabolabpytools import isotopomerAnalysis

ia = isotopomerAnalysis.IsotopomerAnalysis()

# Define isotopomers for different metabolites
isotopomers_three_carbon = [
    [0, 0, 0],  # Unlabelled
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
    [1, 1, 0],
    [1, 0, 1],
    [0, 1, 1],
    [1, 1, 1]
]

isotopomers_four_carbon = [
    [0, 0, 0, 0],  # Unlabelled
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 0, 0, 1],
    [0, 1, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [1, 0, 1, 1],
    [0, 1, 1, 1],
    [1, 1, 1, 1]
]

isotopomers_five_carbon = [
    [0, 0, 0, 0, 0],  # Unlabelled
    [1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 0, 1],
    [1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 0, 0, 1, 0],
    [1, 0, 0, 0, 1],
    [0, 1, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 0, 0, 1],
    [0, 0, 1, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1],
    [1, 1, 1, 0, 0],
    [1, 1, 0, 1, 0],
    [1, 1, 0, 0, 1],
    [1, 0, 1, 1, 0],
    [1, 0, 1, 0, 1],
    [1, 0, 0, 1, 1],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 0, 1],
    [0, 1, 0, 1, 1],
    [0, 0, 1, 1, 1],
    [1, 1, 1, 1, 0],
    [1, 1, 1, 0, 1],
    [1, 1, 0, 1, 1],
    [1, 0, 1, 1, 1],
    [0, 1, 1, 1, 1],
    [1, 1, 1, 1, 1]
]

isotopomers_six_carbon = [
    [0, 0, 0, 0, 0, 0],  # Unlabelled
    [1, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 1],
    [1, 1, 0, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 0, 0, 1, 0, 0],
    [1, 0, 0, 0, 1, 0],
    [1, 0, 0, 0, 0, 1],
    [0, 1, 1, 0, 0, 0],
    [0, 1, 0, 1, 0, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 1],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 1, 0, 1, 0],
    [0, 0, 1, 0, 0, 1],
    [0, 0, 0, 1, 1, 0],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 0, 1, 1],
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [1, 1, 0, 0, 1, 0],
    [1, 1, 0, 0, 0, 1],
    [1, 0, 1, 1, 0, 0],
    [1, 0, 1, 0, 1, 0],
    [1, 0, 1, 0, 0, 1],
    [1, 0, 0, 1, 1, 0],
    [1, 0, 0, 1, 0, 1],
    [1, 0, 0, 0, 1, 1],
    [0, 1, 1, 1, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [0, 1, 1, 0, 0, 1],
    [0, 1, 0, 1, 1, 0],
    [0, 1, 0, 1, 0, 1],
    [0, 1, 0, 0, 1, 1],
    [0, 0, 1, 1, 1, 0],
    [0, 0, 1, 1, 0, 1],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 1, 0, 0],
    [1, 1, 1, 0, 1, 0],
    [1, 1, 1, 0, 0, 1],
    [1, 1, 0, 1, 1, 0],
    [1, 1, 0, 1, 0, 1],
    [1, 1, 0, 0, 1, 1],
    [1, 0, 1, 1, 1, 0],
    [1, 0, 1, 1, 0, 1],
    [1, 0, 1, 0, 1, 1],
    [1, 0, 0, 1, 1, 1],
    [0, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 0, 1],
    [0, 1, 1, 0, 1, 1],
    [0, 1, 0, 1, 1, 1],
    [0, 0, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 0],
    [1, 1, 1, 1, 0, 1],
    [1, 1, 1, 0, 1, 1],
    [1, 1, 0, 1, 1, 1],
    [1, 0, 1, 1, 1, 1],
    [0, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 1]
]


num_samples = 1000
hsqc_three_carbon = [[1, 1, 1], [0, 1, 1]]
hsqc_four_carbon = [
    [0, 1, 1, 0], 
    [1, 1, 1, 0], 
    [0, 1, 1, 1], 
    [1, 1, 1, 1]
]
hsqc_five_carbon = [
    [0, 1, 1, 1, 0],
    [1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1],
    [1, 1, 1, 1, 1]
]
hsqc_six_carbon = [
    [1, 1, 1, 1, 1, 1]
]

# Initialize the metabolites with multiple samples
ia.init_metabolite_multiple_samples(metabolites=['three-carbon', 'four_carbon', 'five_carbon', 'six_carbon'], hsqc=hsqc_three_carbon, num_samples=num_samples)

# Generate and set isotopomers for three-carbon metabolite
for hsqc in hsqc_three_carbon:
    generated_percentages = []
    for exp_index in range(num_samples):
        random_percentages = ia.generate_isotopomer_percentages('three-carbon')
        generated_percentages.append(random_percentages)
        
        ia.set_fit_isotopomers_simple(metabolite='three-carbon', isotopomers=isotopomers_three_carbon, percentages=random_percentages, exp_index=exp_index)
        ia.sim_hsqc_data(metabolite='three-carbon', exp_index=exp_index, isotopomers=isotopomers_three_carbon, percentages=random_percentages)
        ia.sim_gcms_data('three-carbon', exp_index)

    ia.add_noise_to_hsqc_gcms('three-carbon', num_samples, hsqc_noise_level=0.03, gcms_noise_level=0.075)
    ia.fit_data_nn(metabolite='three-carbon', fit_isotopomers=isotopomers_three_carbon, percentages=generated_percentages, num_samples=num_samples, hsqc=hsqc, tuner_project_name=f'three_carbon_{hsqc}')

# Generate and set isotopomers for the four-carbon metabolite
for hsqc in hsqc_four_carbon:
    generated_percentages = []
    for exp_index in range(num_samples):
        random_percentages = ia.generate_isotopomer_percentages('four_carbon')
        generated_percentages.append(random_percentages)
        
        ia.set_fit_isotopomers_simple(metabolite='four_carbon', isotopomers=isotopomers_four_carbon, percentages=random_percentages, exp_index=exp_index)
        ia.sim_hsqc_data(metabolite='four_carbon', exp_index=exp_index, isotopomers=isotopomers_four_carbon, percentages=random_percentages)
        ia.sim_gcms_data('four_carbon', exp_index)

    ia.add_noise_to_hsqc_gcms('four_carbon', num_samples, hsqc_noise_level=0.03, gcms_noise_level=0.075)
    ia.fit_data_nn(metabolite='four_carbon', fit_isotopomers=isotopomers_four_carbon, percentages=generated_percentages, num_samples=num_samples, hsqc=hsqc, tuner_project_name=f'four_carbon_{hsqc}')

# Generate and set isotopomers for the five-carbon metabolite
for hsqc in hsqc_five_carbon:
    generated_percentages = []
    for exp_index in range(num_samples):
        random_percentages = ia.generate_isotopomer_percentages('five_carbon')
        generated_percentages.append(random_percentages)
        
        ia.set_fit_isotopomers_simple(metabolite='five_carbon', isotopomers=isotopomers_five_carbon, percentages=random_percentages, exp_index=exp_index)
        ia.sim_hsqc_data(metabolite='five_carbon', exp_index=exp_index, isotopomers=isotopomers_five_carbon, percentages=random_percentages)
        ia.sim_gcms_data('five_carbon', exp_index)

    ia.add_noise_to_hsqc_gcms('five_carbon', num_samples, hsqc_noise_level=0.03, gcms_noise_level=0.075)
    ia.fit_data_nn(metabolite='five_carbon', fit_isotopomers=isotopomers_five_carbon, percentages=generated_percentages, num_samples=num_samples, hsqc=hsqc, tuner_project_name=f'five_carbon_{hsqc}')

# Generate and set isotopomers for the six-carbon metabolite
for hsqc in hsqc_six_carbon:
    generated_percentages = []
    for exp_index in range(num_samples):
        random_percentages = ia.generate_isotopomer_percentages('six_carbon')
        generated_percentages.append(random_percentages)
        
        ia.set_fit_isotopomers_simple(metabolite='six_carbon', isotopomers=isotopomers_six_carbon, percentages=random_percentages, exp_index=exp_index)
        ia.sim_hsqc_data(metabolite='six_carbon', exp_index=exp_index, isotopomers=isotopomers_six_carbon, percentages=random_percentages)
        ia.sim_gcms_data('six_carbon', exp_index)

    ia.add_noise_to_hsqc_gcms('six_carbon', num_samples, hsqc_noise_level=0.03, gcms_noise_level=0.075)
    ia.fit_data_nn(metabolite='six_carbon', fit_isotopomers=isotopomers_six_carbon, percentages=generated_percentages, num_samples=num_samples, hsqc=hsqc, tuner_project_name=f'six_carbon_{hsqc}')

# Save results to an Excel file
ia.save_results('results.xlsx')


Trial 10 Complete [00h 00m 07s]
val_loss: 8.747845649719238

Best val_loss So Far: 8.747845649719238
Total elapsed time: 00h 01m 08s
Epoch 1/100
1/1 ━━━━━━━━━━━━━━━━━━━━ 1s 1s/step - loss: 61.5929 - val_loss: 34.1323
Epoch 2/100
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 63ms/step - loss: 58.0724 - val_loss: 32.2273
Epoch 3/100
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 62ms/step - loss: 55.8620 - val_loss: 31.0036
Epoch 4/100
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 47ms/step - loss: 54.5674 - val_loss: 30.3175
Epoch 5/100
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 47ms/step - loss: 53.8733 - val_loss: 29.8765
Epoch 6/100
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 51ms/step - loss: 53.0891 - val_loss: 29.5195
Epoch 7/100
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 69ms/step - loss: 52.7448 - val_loss: 29.2071
Epoch 8/100
...

Trial 10 Complete [00h 00m 10s]
val_loss: 17.108579635620117

Best val_loss So Far: 3.6974751949310303
Total elapsed time: 00h 01m 07s
Epoch 1/100
1/1 ━━━━━━━━━━━━━━━━━━━━ 1s 1s/step - loss: 23.9742 - val_loss: 53.2492
Epoch 2/100
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 61ms/step - loss: 14.7924 - val_loss: 36.2842
Epoch 3/100
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 58ms/step - loss: 9.4886 - val_loss: 13.8606
Epoch 4/100
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 65ms/step - loss: 7.6612 - val_loss: 4.5671
Epoch 5/100
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 66ms/step - loss: 7.3039 - val_loss: 9.6470
Epoch 6/100
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 58ms/step - loss: 6.2418 - val_loss: 18.8306
Epoch 7/100
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 63ms/step - loss: 6.2736 - val_loss: 23.1715
Epoch 8/100
...

KeyboardInterrupt: 
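
The per-metabolite loops above repeat the same sequence of calls and differ only in the metabolite name, the isotopomer list and the HSQC vectors. A minimal sketch of a helper that factors this out is shown below. It uses only the `IsotopomerAnalysis` methods already called in this notebook, with the same arguments. The helper name `simulate_and_fit` is hypothetical, and so is `_RecordingStub`, a stand-in for `ia` that only records method names so the example runs without `metabolabpytools`.

```python
def simulate_and_fit(ia, metabolite, isotopomers, hsqc_vectors, num_samples,
                     hsqc_noise_level=0.03, gcms_noise_level=0.075):
    """Simulate noisy data and fit one network per HSQC vector for a metabolite."""
    project_prefix = metabolite.replace('-', '_')  # matches the tuner names used above
    for hsqc in hsqc_vectors:
        generated_percentages = []
        for exp_index in range(num_samples):
            random_percentages = ia.generate_isotopomer_percentages(metabolite)
            generated_percentages.append(random_percentages)
            ia.set_fit_isotopomers_simple(metabolite=metabolite, isotopomers=isotopomers,
                                          percentages=random_percentages, exp_index=exp_index)
            ia.sim_hsqc_data(metabolite=metabolite, exp_index=exp_index,
                             isotopomers=isotopomers, percentages=random_percentages)
            ia.sim_gcms_data(metabolite, exp_index)
        ia.add_noise_to_hsqc_gcms(metabolite, num_samples,
                                  hsqc_noise_level=hsqc_noise_level,
                                  gcms_noise_level=gcms_noise_level)
        ia.fit_data_nn(metabolite=metabolite, fit_isotopomers=isotopomers,
                       percentages=generated_percentages, num_samples=num_samples,
                       hsqc=hsqc, tuner_project_name=f'{project_prefix}_{hsqc}')


class _RecordingStub:
    """Hypothetical stand-in for `ia`: records which methods were called."""

    def __init__(self):
        self.calls = []

    def __getattr__(self, name):
        def record(*args, **kwargs):
            self.calls.append(name)
            return [100.0]  # placeholder return value
        return record


stub = _RecordingStub()
simulate_and_fit(stub, 'three-carbon', [[0, 0, 0], [1, 1, 1]],
                 [[1, 1, 1], [0, 1, 1]], num_samples=2)
print(stub.calls.count('fit_data_nn'))
```

With the real object, a call such as `simulate_and_fit(ia, 'five_carbon', isotopomers_five_carbon, hsqc_five_carbon, num_samples)` would replace the corresponding loop.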