<a href="https://colab.research.google.com/github/pranavkantgaur/gamd_sr/blob/main/sr_for_lj_potential.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# <img src="https://github.com/JuliaLang/julia-logo-graphics/raw/master/images/julia-logo-color.png" height="100" /> _Colab Notebook Template_

## Instructions
1. Work on a copy of this notebook: _File_ > _Save a copy in Drive_ (you will need a Google account). Alternatively, you can download the notebook using _File_ > _Download .ipynb_, then upload it to [Colab](https://colab.research.google.com/).
2. If you need a GPU: _Runtime_ > _Change runtime type_ > _Hardware accelerator_ = _GPU_.
3. Execute the following cell (click on it and press Ctrl+Enter) to install Julia, IJulia and other packages (if needed, update `JULIA_VERSION` and the other parameters). This takes a couple of minutes.
4. Reload this page (press Ctrl+R, or ⌘+R, or the F5 key) and continue to the next section.

_Notes_:
* If your Colab Runtime gets reset (e.g., due to inactivity), repeat steps 2, 3 and 4.
* After installation, if you want to change the Julia version or activate/deactivate the GPU, you will need to reset the Runtime: _Runtime_ > _Factory reset runtime_ and repeat steps 3 and 4.

In [None]:
%%shell
set -e

#---------------------------------------------------#
JULIA_VERSION="1.10.0" # any version ≥ 0.7.0
JULIA_PACKAGES="IJulia BenchmarkTools"
JULIA_PACKAGES_IF_GPU="CUDA" # or CuArrays for older Julia versions
JULIA_NUM_THREADS=2
#---------------------------------------------------#

if [ -z `which julia` ]; then
  # Install Julia
  JULIA_VER=`cut -d '.' -f -2 <<< "$JULIA_VERSION"`
  echo "Installing Julia $JULIA_VERSION on the current Colab Runtime..."
  BASE_URL="https://julialang-s3.julialang.org/bin/linux/x64"
  URL="$BASE_URL/$JULIA_VER/julia-$JULIA_VERSION-linux-x86_64.tar.gz"
  wget -nv $URL -O /tmp/julia.tar.gz # -nv means "not verbose"
  tar -x -f /tmp/julia.tar.gz -C /usr/local --strip-components 1
  rm /tmp/julia.tar.gz

  # Install Packages
  nvidia-smi -L &> /dev/null && export GPU=1 || export GPU=0
  if [ $GPU -eq 1 ]; then
    JULIA_PACKAGES="$JULIA_PACKAGES $JULIA_PACKAGES_IF_GPU"
  fi
  for PKG in `echo $JULIA_PACKAGES`; do
    echo "Installing Julia package $PKG..."
    julia -e 'using Pkg; pkg"add '$PKG'; precompile;"' &> /dev/null
  done

  # Install kernel and rename it to "julia"
  echo "Installing IJulia kernel..."
  julia -e 'using IJulia; IJulia.installkernel("julia", env=Dict(
      "JULIA_NUM_THREADS"=>"'"$JULIA_NUM_THREADS"'"))'
  KERNEL_DIR=`julia -e "using IJulia; print(IJulia.kerneldir())"`
  KERNEL_NAME=`ls -d "$KERNEL_DIR"/julia*`
  mv -f $KERNEL_NAME "$KERNEL_DIR"/julia

  echo ''
  echo "Successfully installed `julia -v`!"
  echo "Please reload this page (press Ctrl+R, ⌘+R, or the F5 key) then"
  echo "jump to the 'Checking the Installation' section."
fi

Installing Julia 1.10.0 on the current Colab Runtime...
2024-11-26 08:12:19 URL:https://storage.googleapis.com/julialang2/bin/linux/x64/1.10/julia-1.10.0-linux-x86_64.tar.gz [168592090/168592090] -> "/tmp/julia.tar.gz" [1]
Installing Julia package IJulia...
Installing Julia package BenchmarkTools...
Installing IJulia kernel...
[ Info: Installing julia kernelspec in /root/.local/share/jupyter/kernels/julia-1.10

Successfully installed julia version 1.10.0!
Please reload this page (press Ctrl+R, ⌘+R, or the F5 key) then
jump to the 'Checking the Installation' section.




# Checking the Installation
The `versioninfo()` function should print your Julia version and some other info about the system:

In [1]:
versioninfo()

Julia Version 1.10.0
Commit 3120989f39b (2023-12-25 18:01 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 2 × Intel(R) Xeon(R) CPU @ 2.20GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, broadwell)
  Threads: 3 on 2 virtual cores
Environment:
  LD_LIBRARY_PATH = /usr/local/nvidia/lib:/usr/local/nvidia/lib64
  JULIA_NUM_THREADS = 2


In [2]:
using BenchmarkTools

M = rand(2^11, 2^11)

@btime $M * $M;

  547.783 ms (2 allocations: 32.00 MiB)


In [9]:
using Pkg
Pkg.add("SymbolicRegression")
Pkg.add("MLJ")

   Resolving package versions...
  No Changes to `~/.julia/environments/v1.10/Project.toml`
  No Changes to `~/.julia/environments/v1.10/Manifest.toml`
   Resolving package versions...
   Installed LoggingExtras ─────────────── v1.1.0
   Installed HypergeometricFunctions ───── v0.3.25
   Installed NNlib ─────────────────────── v0.9.26
   Installed ShowCases ─────────────────── v0.1.0
   Installed Accessors ─────────────────── v0.1.38
   Installed RelocatableFolders ────────── v1.0.1
   Installed StatsFuns ─────────────────── v1.3.2
   Installed ContextVariablesX ─────────── v0.1.3
   Installed CategoricalDistributions ──── v0.1.15
   Installed StaticArrays ──────────────── v1.9.8
   Installed CEnum ───────

In [10]:
import SymbolicRegression: SRRegressor
import MLJ: machine, fit!, predict, report

# Dataset with two named features:
X = (a = rand(500), b = rand(500))

# and one target:
y = @. 2 * cos(X.a * 23.5) - X.b ^ 2

# with some noise:
y = y .+ randn(500) .* 1e-3

model = SRRegressor(
    niterations=50,
    binary_operators=[+, -, *],
    unary_operators=[cos],
)

SRRegressor(
  defaults = nothing, 
  binary_operators = Function[+, -, *], 
  unary_operators = [cos], 
  maxsize = nothing, 
  maxdepth = nothing, 
  expression_type = DynamicExpressions.ExpressionModule.Expression, 
  expression_options = NamedTuple(), 
  node_type = DynamicExpressions.NodeModule.Node, 
  populations = nothing, 
  population_size = nothing, 
  ncycles_per_iteration = nothing, 
  elementwise_loss = nothing, 
  loss_function = nothing, 
  dimensional_constraint_penalty = nothing, 
  parsimony = nothing, 
  constraints = nothing, 
  nested_constraints = nothing, 
  complexity_of_operators = nothing, 
  complexity_of_constants = nothing, 
  complexity_of_variables = nothing, 
  warmup_maxsize_by = nothing, 
  adaptive_parsimony_scaling = nothing, 
  mutation_weights = nothing, 
  crossover_probability = nothing, 
  annealing = nothing, 
  alpha = nothing, 
  probability_negate_constant = nothing, 
  tournament_selection_n = nothing, 
  tournament_selection_p = nothing, 

In [11]:
mach = machine(model, X, y)

fit!(mach)

│  - To prevent this behaviour, do `ProgressMeter.ijulia_behavior(:append)`. 
└ @ ProgressMeter ~/.julia/packages/ProgressMeter/kVZZH/src/ProgressMeter.jl:594
Evolving for 50 iterations... 100%|██████████████████████████████████████████| Time: 0:00:53
[ Info: Final population:


───────────────────────────────────────────────────────────────────────────────────────────────────
Complexity  Loss       Score      Equation
1           1.881e+00  3.604e+01  y = -0.33638
3           1.848e+00  8.821e-03  y = a * -0.67349
4           6.255e-01  1.083e+00  y = cos(a * 23.539)
6           1.920e-01  5.904e-01  y = cos(a * 23.537) * 1.9624
8           3.309e-02  8.791e-01  y = (cos(a * 23.492) * 1.9991) - b
10          1.112e-06  5.151e+00  y = (cos(a * 23.5) * 2) - (b * b)
12          1.105e-06  3.229e-03  y = (cos(a * 23.5) * 2) - ((b * 0.99981) * b)
14          1.097e-06  3.567e-03  y = (((a * -0.00063216) + 2.0003) * cos(a * 23.5)) - (b * ...
                                      b)
16          1.093e-06  1.902e-03  y = (cos(a * 23.5) * ((a * -0.00058765) + 2.0003)) - (b * ...
                                      (b + -0.0001142))
17          1.079e-06  1.253e-02  y = (cos(a * 23.5) * 2.0001) - ((cos(a * -27.729) * -0.000...
                                      25

[ Info: Results saved to:


trained Machine; caches model-specific representations of data
  model: SRRegressor(defaults = nothing, …)
  args: 
    1:	Source @237 ⏎ ScientificTypesBase.Table{AbstractVector{ScientificTypesBase.Continuous}}
    2:	Source @943 ⏎ AbstractVector{ScientificTypesBase.Continuous}


In [12]:
report(mach)

(best_idx = 6,
 equations = DynamicExpressions.ExpressionModule.Expression{Float64, DynamicExpressions.NodeModule.Node{Float64}, @NamedTuple{operators::DynamicExpressions.OperatorEnumModule.OperatorEnum{Tuple{typeof(+), typeof(-), typeof(*)}, Tuple{typeof(cos)}}, variable_names::Vector{String}}}[-0.3363823676071395, a * -0.6734919897202386, cos(a * 23.53929943316426), cos(a * 23.537023552572748) * 1.9624233342565114, (cos(a * 23.491550737225463) * 1.999081505616723) - b, (cos(a * 23.500053330149107) * 2.000006588379132) - (b * b), (cos(a * 23.50006060188921) * 1.9999992451614406) - ((b * 0.9998069580246424) * b), (((a * -0.0006321619708218952) + 2.000319691169679) * cos(a * 23.500048419093645)) - (b * b), (cos(a * 23.500054040015467) * ((a * -0.00058764521719565) + 2.00029341988214)) - (b * (b + -0.00011419555097368143)), (cos(a * 23.500028101738828) * 2.0000516388570566) - ((cos(a * -27.72889935855771) * -0.00025930912010819286) + (b * b)), (cos(a * 23.500028101738828) * 2.00005163885

In [13]:
predict(mach, X)

500-element Vector{Float64}:
  1.4529191130144081
  0.37351879992802367
 -1.0397001428255115
 -1.9710944003066304
 -1.9850746482051314
  0.1400200941383002
  0.6379148199719589
 -2.1941247426624964
  0.6966018959886291
  0.07996730272821265
 -0.6065558270503534
 -0.5501203714807741
  1.5690209673116229
  ⋮
 -2.448166399741377
  0.9333720527746993
 -0.22425232689975205
 -1.2113200643083721
 -2.67909771546825
 -2.0071929943537286
  1.809579698230519
 -1.5807374412560309
 -0.10169226630363723
 -0.45455122286051786
  0.7620157945377333
 -1.9924306870471111

In [16]:
import Pkg; Pkg.add("MLJBase")

   Resolving package versions...
    Updating `~/.julia/environments/v1.10/Project.toml`
  [a7f614a8] + MLJBase v1.7.0
  No Changes to `~/.julia/environments/v1.10/Manifest.toml`


In [32]:
using SymbolicRegression
using MLJBase: machine, fit!, report

function my_structure((; f, g1, g2), (x1, x2, x3))
    _f = f(x1, x2)
    _g1 = g1(x3)
    _g2 = g2(x3)

    # We use `.x` to get the underlying vector
    out = map((fi, g1i, g2i) -> (fi + g1i, fi + g2i), _f.x, _g1.x, _g2.x)
    # And `.valid` to see whether the evaluations succeeded
    return ValidVector(out, _f.valid && _g1.valid && _g2.valid)
end
structure = TemplateStructure{(:f, :g1, :g2)}(my_structure)

(::TemplateStructure{(:f, :g1, :g2), typeof(my_structure), @NamedTuple{f::Int64, g1::Int64, g2::Int64}}) (generic function with 0 methods)
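
Written out, the template above restricts every candidate to the form below, where $f$, $g_1$ and $g_2$ are the subexpressions being searched. The hatted symbol and the row index $i$ are notation introduced here for illustration, not names from the library.

```latex
\[
\hat{y}_i = \bigl(\, f(x_{1,i},\, x_{2,i}) + g_1(x_{3,i}),\;\;
                   f(x_{1,i},\, x_{2,i}) + g_2(x_{3,i}) \,\bigr)
\]
```

Both output components share the same $f$, so only the parts that depend on $x_3$ can differ between them.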

In [41]:
function lj_potential_structure((; attr_func, rep_func), (rad, ))
    # The repulsive term decays as r^-12 and the attractive term as r^-6
    _rep_func = rep_func(rad)^-12
    _attr_func = attr_func(rad)^-6

    # Combine the two terms elementwise: repulsion minus attraction
    out = map((rep_func_i, attr_func_i) -> (rep_func_i - attr_func_i), _rep_func.x, _attr_func.x)
    return ValidVector(out, _rep_func.valid && _attr_func.valid)
end
lj_structure = TemplateStructure{(:attr_func, :rep_func)}(lj_potential_structure)

(::TemplateStructure{(:attr_func, :rep_func), typeof(lj_potential_structure), @NamedTuple{attr_func::Int64, rep_func::Int64}}) (generic function with 0 methods)

In [35]:
X = rand(100, 3) .* 10

y = [
    (sin(X[i, 1]) + X[i, 3]^2, sin(X[i, 1]) + X[i, 3])
    for i in eachindex(axes(X, 1))
]

100-element Vector{Tuple{Float64, Float64}}:
 (21.840554630738517, 4.96709415668862)
 (15.28681304180029, 3.778531419128792)
 (10.310276027666779, 3.847953473729792)
 (24.720177387496992, 4.8149370612435)
 (80.4894212485866, 9.316202124658657)
 (61.82403950232539, 6.929141990905532)
 (91.04637721239007, 9.37255668370128)
 (45.53575341678187, 7.384218759084607)
 (9.936702084275998, 3.409888117712459)
 (55.81291923682466, 6.57459726463829)
 (63.54501702336823, 8.86531426613666)
 (12.46042008452883, 3.704335261835227)
 (27.087191147589106, 5.773148397610701)
 ⋮
 (56.06021646521159, 8.402384610536757)
 (63.01287070644966, 7.000858337963683)
 (-0.4952404123354181, -0.25082417554624015)
 (7.202198118271907, 3.0093204076087408)
 (1.6734419084681114, 1.8173048648790548)
 (1.057437334408116, 1.2396764509750922)
 (24.747215288393036, 4.154713559132587)
 (0.1611249743359573, 0.32066234224930545)
 (15.043789495567218, 3.5864499283965006)
 (16.073027512386755, 4.293684995785675)
 (7.789880645938832

In [49]:
using LinearAlgebra

# Set parameters for Lennard-Jones potential
epsilon = 1.0  # Depth of the potential well
sigma = 1.0    # Finite distance at which the potential is zero

# Generate random radial distances
X = rand(100, 1) .* 10  # 100 radial distances drawn uniformly from [0, 10), one per row

# Calculate Lennard-Jones potential for each radial distance
Y = [
    4 * epsilon * ((sigma / norm(X[i, :]))^12 - (sigma / norm(X[i, :]))^6)
    for i in eachindex(axes(X, 1))
]

# Y now contains the Lennard-Jones potentials corresponding to each radial distance


100-element Vector{Float64}:
 -0.002207115310458205
 -0.00035928157637425147
 -0.011931915341852352
 -0.001503708040672303
 -0.001426637263613886
 -0.010000339425530986
 -0.7737111646266084
 18.570299010860708
 -0.0025406434611354232
 -8.381298924398304e-6
 -0.018758175621607684
 -0.2819784358412525
 -0.01663329167729555
  ⋮
 13.074120431078924
 -4.047191959489278e-6
 -8.381394872340109e-6
 -0.1169401875397392
 -1.2644571202517626e-5
 -0.0014261940161023519
 -0.004807709374753719
 -8.52298747852645e-6
 -0.06954011670134806
 -0.0015821033225524351
 -0.0004035971815177588
 -6.883507563207863e-6
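
For reference, the quantity computed in the cell above is the standard Lennard-Jones form. The zero crossing and the location of the minimum follow from setting $V = 0$ and $dV/dr = 0$ respectively.

```latex
\[
V(r) = 4\varepsilon\left[\left(\frac{\sigma}{r}\right)^{12}
                       - \left(\frac{\sigma}{r}\right)^{6}\right]
\]
\[
V(\sigma) = 0, \qquad
\frac{dV}{dr} = 0 \ \text{at}\ r_{\min} = 2^{1/6}\,\sigma, \qquad
V(r_{\min}) = -\varepsilon
\]
```

With $\varepsilon = \sigma = 1$ and distances sampled from $[0, 10)$, most samples lie beyond $r_{\min} \approx 1.12$. That is why most of the printed values are small and negative, while the few samples with $r < \sigma$ give large positive values such as 18.57.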

In [50]:
elementwise_loss = ((x1), (y1)) -> (y1 - x1)^2

#39 (generic function with 1 method)
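
Before fitting, it can help to check that the template form is able to represent the target at all. The LJ template defined earlier combines two searched functions as one raised to the power $-12$ minus the other raised to the power $-6$, with no explicit prefactor, so the factor $4\varepsilon$ has to be absorbed into the inner functions. The sketch below uses plain Julia functions as stand-ins. `lj`, `template_form`, `inner12` and `inner6` are hypothetical names introduced here, not part of SymbolicRegression.jl, and the scaling constants assume $\varepsilon = \sigma = 1$.

```julia
# Lennard-Jones potential with the same parameters as this notebook.
lj(r; epsilon=1.0, sigma=1.0) = 4 * epsilon * ((sigma / r)^12 - (sigma / r)^6)

# Plain-function stand-in for the template's combination rule (hypothetical helper).
template_form(inner12, inner6, r) = inner12(r)^-12 - inner6(r)^-6

# Inner functions that absorb the prefactor 4 (valid for epsilon = sigma = 1):
# (4^(-1/12) * r)^-12 == 4 * r^-12 and (4^(-1/6) * r)^-6 == 4 * r^-6.
inner12(r) = 4.0^(-1 / 12) * r
inner6(r) = 4.0^(-1 / 6) * r

# Compare the two forms on a grid of distances away from the singularity at r = 0.
rs = range(0.9, 3.0; length=50)
max_abs_diff = maximum(abs(template_form(inner12, inner6, r) - lj(r)) for r in rs)
println("max |template - LJ| on [0.9, 3.0]: ", max_abs_diff)
```

Both inner functions are just a constant times the distance, which is reachable with the `*` operator alone, so a small `maxsize` should in principle be enough for this target.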

In [51]:
model = SRRegressor(;
    binary_operators=(+, *, /),
    #unary_operators=(sin,),
    maxsize=15,
    elementwise_loss=elementwise_loss,
    expression_type=TemplateExpression,
    # Note - this is where we pass custom options to the expression type.
    # The field must be named `structure`: the shorthand `(; lj_structure)`
    # creates a field named `lj_structure` instead and fails with
    # `type NamedTuple has no field structure`.
    #expression_options=(; structure),
    expression_options=(; structure=lj_structure),
)

mach = machine(model, X, Y)
fit!(mach)
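
The message `type NamedTuple has no field structure` comes down to how Julia's named-tuple shorthand chooses field names: `(; name)` creates a field called `name`, whatever the variable holds. Below is a minimal sketch in plain Julia. `my_template` is a placeholder variable introduced here so the example does not overwrite anything defined in the notebook.

```julia
# Placeholder value; in the notebook this would be a TemplateStructure.
my_template = :placeholder

# Shorthand: the field name is taken from the variable name.
shorthand = (; my_template)

# Explicit: the field name is spelled out, independent of the variable name.
explicit = (; structure=my_template)

println(keys(shorthand))                      # (:my_template,)
println(keys(explicit))                       # (:structure,)
println(hasproperty(shorthand, :structure))   # false
```

The shorthand therefore only works when the variable itself is named `structure`; otherwise the key has to be written explicitly.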

In [21]:
report(mach)

(best_idx = 3,
 equations = TemplateExpression{Float64, TemplateStructure{(:f, :g1, :g2), typeof(my_structure), @NamedTuple{f::Int64, g1::Int64, g2::Int64}}, Node{Float64}, ComposableExpression{Float64, Node{Float64}, @NamedTuple{operators::DynamicExpressions.OperatorEnumModule.OperatorEnum{Tuple{typeof(+), typeof(*)}, Tuple{typeof(sin)}}, variable_names::Nothing}}, @NamedTuple{f::ComposableExpression{Float64, Node{Float64}, @NamedTuple{operators::DynamicExpressions.OperatorEnumModule.OperatorEnum{Tuple{typeof(+), typeof(*)}, Tuple{typeof(sin)}}, variable_names::Nothing}}, g1::ComposableExpression{Float64, Node{Float64}, @NamedTuple{operators::DynamicExpressions.OperatorEnumModule.OperatorEnum{Tuple{typeof(+), typeof(*)}, Tuple{typeof(sin)}}, variable_names::Nothing}}, g2::ComposableExpression{Float64, Node{Float64}, @NamedTuple{operators::DynamicExpressions.OperatorEnumModule.OperatorEnum{Tuple{typeof(+), typeof(*)}, Tuple{typeof(sin)}}, variable_names::Nothing}}}, @NamedTuple{structu

In [22]:
r = report(mach)
idx = r.best_idx
best_expr = r.equations[idx]
best_f = get_contents(best_expr).f
best_g1 = get_contents(best_expr).g1
best_g2 = get_contents(best_expr).g2

x1

Add new code cells by clicking the `+ Code` button (or _Insert_ > _Code cell_).

Have fun!

<img src="https://raw.githubusercontent.com/JuliaLang/julia-logo-graphics/master/images/julia-logo-mask.png" height="100" />