This notebook mass-generates the training data for ANN-1D-PIB. It uses the ./solve_poly executable, a Fortran program that numerically calculates the energies of a 1D particle in a box.

In [18]:
import subprocess
import random
import os
import pandas as pd

The generated data is controlled by two parameters:
- L_max - the maximum size of the box. Each sample's box length L is drawn from 0 to L_max, and in the code the box extends from -L/2 to L/2.
- A - the largest absolute value of the polynomial coefficients of the potential, so the data will contain polynomials with coefficients from -A to +A.

In [2]:
L_max = 1
A = 10
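
To make the roles of L and the coefficients concrete, here is a minimal sketch of the potential they describe. It assumes the coefficients are stored in ascending powers, so that coefficient i multiplies x^i; the helper name `potential` and the example values are hypothetical and are not taken from solve_poly.

```python
def potential(x, coeffs):
    """Evaluate V(x) = sum_i coeffs[i] * x**i (ascending powers, an assumption)."""
    return sum(c * x**i for i, c in enumerate(coeffs))

L = 0.8                              # example box length, somewhere in (0, L_max]
coeffs = [1.0, -2.0, 0.5, 0.0, 3.0]  # example coefficients, each within [-A, A]

# Sample the potential at five evenly spaced points across the box [-L/2, L/2].
xs = [-L / 2 + k * L / 4 for k in range(5)]
for x in xs:
    print(f"x = {x:+.2f}  V(x) = {potential(x, coeffs):.4f}")
```

The box is centered on zero, so odd-power coefficients make the potential asymmetric while even-power coefficients keep it symmetric.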

We can use another parameter N to choose how many random samples in this range we want to collect for the training set.

In [3]:
N = 100

Running the code block below will randomly generate N training samples based on the previously defined parameters. For each sample:
- L is a random number from 0 to L_max
- each of the 5 coefficients is a random number from -A to +A.

In [32]:
# start from empty output files; opening in 'w' mode creates or truncates the file
for filename in ('training_data.csv', 'training_data_nodim.csv'):
    open(filename, 'w').close()

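for i in range(N):
    # draw a random box length and five random polynomial coefficients
    L = random.uniform(0, L_max)
    coeffs = [random.uniform(-A, A) for _ in range(5)]

    # write the input file read by solve_poly: L first, then one coefficient per line
    with open('coefficients.txt', 'w') as f:
        f.write(f"{L}\n")
        for coeff in coeffs:
            f.write(f"{coeff}\n")

    # check=True raises an error if solve_poly exits with a nonzero status
    subprocess.run(['./solve_poly'], check=True)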

print(f"Done. Generated {N} training samples!")

Done. Generated 100 training samples!


The raw data from the Fortran code must be converted into a dimensionless format. Because the energy scales as 1/L^2, we multiply coefficient i by L^(i+2): a factor of L^i from rescaling the coordinate to a box of unit length, and a factor of L^2 from the energy scaling.
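
The reasoning behind the exponent can be checked numerically. Substituting x = L*u maps the box onto -1/2 <= u <= 1/2 and turns the kinetic term into 1/L^2 times its unit-box form, so L^2 * E is the eigenvalue of a unit-box problem whose potential is L^2 * V(L*u). The sketch below verifies that this rescaled potential is the polynomial with coefficients c_i * L^(i+2). It assumes coefficient i multiplies x^i; the helper `potential` and the example values are hypothetical.

```python
def potential(x, coeffs):
    """Evaluate sum_i coeffs[i] * x**i (ascending powers, an assumption)."""
    return sum(c * x**i for i, c in enumerate(coeffs))

L = 0.5                               # example box length
coeffs = [1.0, -2.0, 0.5, 4.0, -3.0]  # example dimensional coefficients

# Dimensionless coefficients: coefficient i is multiplied by L**(i + 2).
scaled = [c * L**(i + 2) for i, c in enumerate(coeffs)]

u = 0.3                               # a point in the unit box [-1/2, 1/2]
lhs = L**2 * potential(L * u, coeffs) # rescaled dimensional potential
rhs = potential(u, scaled)            # polynomial with scaled coefficients
print(abs(lhs - rhs) < 1e-12)
```

The identity holds term by term, since L^2 * c_i * (L*u)^i = (c_i * L^(i+2)) * u^i.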

In [33]:
# solve_poly is assumed to write data rows without a header line, so the labels
# are supplied here; header=None keeps the first sample from being read as a header
columns = (["Box Length"]
           + [f"Coefficient {i}" for i in range(5)]
           + [f"Energy {i}" for i in range(10)])
df = pd.read_csv('training_data.csv', header=None, names=columns)

for i in range(5):
    df[f'Coefficient {i}'] = df[f'Coefficient {i}'] * df['Box Length']**(i+2)

# save the modified dataset
df.to_csv('training_data_nodim.csv', index=False)

print("Coefficients transformed to dimensionless format")

df = pd.read_csv('training_data_nodim.csv')

Coefficients transformed to dimensionless format
