
standalone hackable model+training, architecture tuning #460

Merged
jpata merged 48 commits into main from jp_20260317_localtraining on Mar 24, 2026
Conversation

@jpata (Owner) commented Mar 18, 2026

Adds a small, hackable, standalone version of mlpf that can be tuned: it supports standard attention, HEPT, global attention, and fastformer.
Architecture tuning is automatic, via genetic-algorithm evolution of a simple string-based model specification (DSL).

Interestingly, the architecture optimization prefers Fastformer or global attention. Perhaps the effect of the CPU runtime on the fitness is too strong right now.
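The evolution loop itself isn't shown in this description. As a minimal sketch of how genetic search over the string keys could work (the function names, the mutation operator, and the `(mu+lambda)`-style selection here are all hypothetical illustrations, not the PR's actual code):

```python
import random

def mutate(key, vocab=("f", "g", "h", "a")):
    """Hypothetical mutation: swap the block-type letter of one middle
    stage (e.g. f -> g), keeping the i(...) input and o(...) output
    stages fixed. The real operator set lives in the PR code."""
    stages = key.split("|")
    i = random.randrange(1, len(stages) - 1)  # pick a middle stage
    kind = stages[i][0]
    stages[i] = random.choice([v for v in vocab if v != kind]) + stages[i][1:]
    return "|".join(stages)

def evolve(population, fitness, n_gen=10, keep=4):
    """Minimal elitist loop: score keys, keep the best `keep` parents,
    refill the population with mutated copies of random parents."""
    for _ in range(n_gen):
        population.sort(key=fitness, reverse=True)
        parents = population[:keep]
        population = parents + [
            mutate(random.choice(parents))
            for _ in range(len(population) - keep)
        ]
    return max(population, key=fitness)
```

In practice the fitness would combine validation loss, match fraction, and a resource penalty (VRAM, runtime), which would explain the reported pull toward cheaper attention variants.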

 1. Gen  8 - Fitness: 0.1376 - Val Loss: 0.9553 - Match Frac: 0.7339 - VRAM: 3345.2 MB - Key: i(55,256,512,default)|f(16,256,1024,pos=T,dropout=0.0)+f(16,256,2048,pos=T,dropout=0.0)|o(8,512,default)
 2. Gen  9 - Fitness: 0.1376 - Val Loss: 1.0921 - Match Frac: 0.6857 - VRAM: 1307.1 MB - Key: i(55,256,512,projection_only,dropout=0.0)|g(16,256,1024,pos=T,dropout=0.2)|o(8,256,default,rg={pt:linear,eta:additive},dropout=0.0)
 3. Gen  8 - Fitness: 0.1352 - Val Loss: 0.9332 - Match Frac: 0.7477 - VRAM: 2547.0 MB - Key: i(55,256,512,default,dropout=0.0)|f(16,256,1024,pos=T,dropout=0.0)|o(8,512,default,rg={pt:direct,eta:linear,sin_phi:direct,cos_phi:direct,energy:direct})
 4. Gen  4 - Fitness: 0.1232 - Val Loss: 0.9591 - Match Frac: 0.7248 - VRAM: 3274.1 MB - Key: i(55,256,512,default)|f(16,256,512,pos=T,dropout=0.05)+f(16,256,2048,pos=T,dropout=0.0)|o(8,512,default)
 5. Gen  5 - Fitness: 0.1228 - Val Loss: 1.1205 - Match Frac: 0.6843 - VRAM: 1306.6 MB - Key: i(55,256,512,projection_only,dropout=0.0)|g(16,256,1024,dropout=0.2)|o(8,256,default,rg={pt:linear,eta:multiplicative},dropout=0.0)
 6. Gen  4 - Fitness: 0.1203 - Val Loss: 1.1500 - Match Frac: 0.6884 - VRAM: 3081.4 MB - Key: i(55,256,512,default)|g(16,256,1024,dropout=0.2)*2|o(8,512,default,rg={pt:linear,eta:additive},dropout=0.2)
 7. Gen  9 - Fitness: 0.1196 - Val Loss: 0.9732 - Match Frac: 0.7765 - VRAM: 2568.0 MB - Key: i(55,256,512,default,dropout=0.15)|f(16,256,1024,pos=T,dropout=0.0)|o(8,512,default,rg={pt:direct,eta:multiplicative,sin_phi:direct,cos_phi:direct,energy:direct},dropout=0.05)
 8. Gen  7 - Fitness: 0.1185 - Val Loss: 0.9704 - Match Frac: 0.7260 - VRAM: 2568.9 MB - Key: i(55,256,512,default)|f(16,256,1024,pos=T,dropout=0.0)|o(8,512,default)
 9. Gen  6 - Fitness: 0.1144 - Val Loss: 0.9697 - Match Frac: 0.7719 - VRAM: 2569.0 MB - Key: i(55,256,512,default)|f(16,256,1024,pos=T,dropout=0.0)|o(8,512,default,rg={pt:direct,eta:linear,sin_phi:direct,cos_phi:direct,energy:direct})
10. Gen  6 - Fitness: 0.1143 - Val Loss: 1.2024 - Match Frac: 0.6972 - VRAM: 2780.8 MB - Key: i(55,256,512,projection_only,dropout=0.0)|f(16,256,1024,dropout=0.0)*2|o(8,512,default,rg={pt:linear,eta:additive},dropout=0.2)
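Reading the keys above: stages are separated by `|`, parallel blocks within a stage by `+`, `*N` repeats a block, and options like `pos=T` or `rg={pt:linear,eta:additive}` are keyword arguments. A small parser sketch for this grammar (inferred from the printed keys only; names and structure are illustrative, not the PR's actual parser):

```python
import re

def split_top(s, sep=","):
    """Split on sep, ignoring separators nested inside {...} (needed
    for rg={pt:linear,eta:additive}-style values)."""
    parts, depth, cur = [], 0, []
    for ch in s:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        if ch == sep and depth == 0:
            parts.append("".join(cur))
            cur = []
        else:
            cur.append(ch)
    parts.append("".join(cur))
    return parts

def parse_block(tok):
    """Parse e.g. 'f(16,256,1024,pos=T,dropout=0.0)*2' into a dict."""
    m = re.fullmatch(r"([a-z])\((.*)\)(?:\*(\d+))?", tok)
    kind, body, rep = m.group(1), m.group(2), int(m.group(3) or 1)
    args, kwargs = [], {}
    for p in split_top(body):
        if "=" in p:
            k, v = p.split("=", 1)
            kwargs[k] = v
        else:
            args.append(p)
    return {"kind": kind, "args": args, "kwargs": kwargs, "repeat": rep}

def parse_key(key):
    """Stages separated by '|'; parallel blocks within a stage by '+'."""
    return [[parse_block(b) for b in stage.split("+")]
            for stage in key.split("|")]
```

For example, the Gen 8 leader parses into three stages, the middle one containing two parallel fastformer blocks.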

[figure: fitness_evolution]

TODO:

  • check attention implementations and losses (standard, HEPT, global, fastformer)
  • add results from current architecture scan (make plots first)

@jpata jpata changed the title from "Autoresearch training" to "standalone hackable model+training, architecture tuning" on Mar 19, 2026
@jpata jpata merged commit 62e0a0c into main Mar 24, 2026
3 checks passed
@jpata jpata deleted the jp_20260317_localtraining branch March 25, 2026 15:05