
standalone hackable model+training, architecture tuning #460

Merged
jpata merged 48 commits into main from jp_20260317_localtraining on Mar 24, 2026
Conversation

@jpata (Owner) commented Mar 18, 2026

Adds a small, hackable, standalone version of mlpf that can be tuned: it supports standard attention, HEPT, global attention, and fastformer.
Architecture tuning is automatic, via genetic-algorithm evolution of a simple string-based model specification (DSL).

Interestingly, the architecture optimization prefers Fastformer or global attention. Perhaps the effect of the CPU runtime on the fitness is too strong right now.
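The evolution loop itself isn't shown in this description. As a minimal sketch of how genetic search over the string keys could work (the function names, the mutation operator, and the `(mu+lambda)`-style selection here are all hypothetical illustrations, not the PR's actual code):

```python
import random

def mutate(key, vocab=("f", "g", "h", "a")):
    """Hypothetical mutation: swap the block-type letter of one middle
    stage (e.g. f -> g), keeping the i(...) input and o(...) output
    stages fixed. The real operator set lives in the PR code."""
    stages = key.split("|")
    i = random.randrange(1, len(stages) - 1)  # pick a middle stage
    kind = stages[i][0]
    stages[i] = random.choice([v for v in vocab if v != kind]) + stages[i][1:]
    return "|".join(stages)

def evolve(population, fitness, n_gen=10, keep=4):
    """Minimal elitist loop: score keys, keep the best `keep` parents,
    refill the population with mutated copies of random parents."""
    for _ in range(n_gen):
        population.sort(key=fitness, reverse=True)
        parents = population[:keep]
        population = parents + [
            mutate(random.choice(parents))
            for _ in range(len(population) - keep)
        ]
    return max(population, key=fitness)
```

In practice the fitness would combine validation loss, match fraction, and a resource penalty (VRAM, runtime), which would explain the reported pull toward cheaper attention variants.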

 1. Gen  8 - Fitness: 0.1376 - Val Loss: 0.9553 - Match Frac: 0.7339 - VRAM: 3345.2 MB - Key: i(55,256,512,default)|f(16,256,1024,pos=T,dropout=0.0)+f(16,256,2048,pos=T,dropout=0.0)|o(8,512,default)
 2. Gen  9 - Fitness: 0.1376 - Val Loss: 1.0921 - Match Frac: 0.6857 - VRAM: 1307.1 MB - Key: i(55,256,512,projection_only,dropout=0.0)|g(16,256,1024,pos=T,dropout=0.2)|o(8,256,default,rg={pt:linear,eta:additive},dropout=0.0)
 3. Gen  8 - Fitness: 0.1352 - Val Loss: 0.9332 - Match Frac: 0.7477 - VRAM: 2547.0 MB - Key: i(55,256,512,default,dropout=0.0)|f(16,256,1024,pos=T,dropout=0.0)|o(8,512,default,rg={pt:direct,eta:linear,sin_phi:direct,cos_phi:direct,energy:direct})
 4. Gen  4 - Fitness: 0.1232 - Val Loss: 0.9591 - Match Frac: 0.7248 - VRAM: 3274.1 MB - Key: i(55,256,512,default)|f(16,256,512,pos=T,dropout=0.05)+f(16,256,2048,pos=T,dropout=0.0)|o(8,512,default)
 5. Gen  5 - Fitness: 0.1228 - Val Loss: 1.1205 - Match Frac: 0.6843 - VRAM: 1306.6 MB - Key: i(55,256,512,projection_only,dropout=0.0)|g(16,256,1024,dropout=0.2)|o(8,256,default,rg={pt:linear,eta:multiplicative},dropout=0.0)
 6. Gen  4 - Fitness: 0.1203 - Val Loss: 1.1500 - Match Frac: 0.6884 - VRAM: 3081.4 MB - Key: i(55,256,512,default)|g(16,256,1024,dropout=0.2)*2|o(8,512,default,rg={pt:linear,eta:additive},dropout=0.2)
 7. Gen  9 - Fitness: 0.1196 - Val Loss: 0.9732 - Match Frac: 0.7765 - VRAM: 2568.0 MB - Key: i(55,256,512,default,dropout=0.15)|f(16,256,1024,pos=T,dropout=0.0)|o(8,512,default,rg={pt:direct,eta:multiplicative,sin_phi:direct,cos_phi:direct,energy:direct},dropout=0.05)
 8. Gen  7 - Fitness: 0.1185 - Val Loss: 0.9704 - Match Frac: 0.7260 - VRAM: 2568.9 MB - Key: i(55,256,512,default)|f(16,256,1024,pos=T,dropout=0.0)|o(8,512,default)
 9. Gen  6 - Fitness: 0.1144 - Val Loss: 0.9697 - Match Frac: 0.7719 - VRAM: 2569.0 MB - Key: i(55,256,512,default)|f(16,256,1024,pos=T,dropout=0.0)|o(8,512,default,rg={pt:direct,eta:linear,sin_phi:direct,cos_phi:direct,energy:direct})
10. Gen  6 - Fitness: 0.1143 - Val Loss: 1.2024 - Match Frac: 0.6972 - VRAM: 2780.8 MB - Key: i(55,256,512,projection_only,dropout=0.0)|f(16,256,1024,dropout=0.0)*2|o(8,512,default,rg={pt:linear,eta:additive},dropout=0.2)
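Reading the keys above: stages are separated by `|`, parallel blocks within a stage by `+`, `*N` repeats a block, and options like `pos=T` or `rg={pt:linear,eta:additive}` are keyword arguments. A small parser sketch for this grammar (inferred from the printed keys only; names and structure are illustrative, not the PR's actual parser):

```python
import re

def split_top(s, sep=","):
    """Split on sep, ignoring separators nested inside {...} (needed
    for rg={pt:linear,eta:additive}-style values)."""
    parts, depth, cur = [], 0, []
    for ch in s:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        if ch == sep and depth == 0:
            parts.append("".join(cur))
            cur = []
        else:
            cur.append(ch)
    parts.append("".join(cur))
    return parts

def parse_block(tok):
    """Parse e.g. 'f(16,256,1024,pos=T,dropout=0.0)*2' into a dict."""
    m = re.fullmatch(r"([a-z])\((.*)\)(?:\*(\d+))?", tok)
    kind, body, rep = m.group(1), m.group(2), int(m.group(3) or 1)
    args, kwargs = [], {}
    for p in split_top(body):
        if "=" in p:
            k, v = p.split("=", 1)
            kwargs[k] = v
        else:
            args.append(p)
    return {"kind": kind, "args": args, "kwargs": kwargs, "repeat": rep}

def parse_key(key):
    """Stages separated by '|'; parallel blocks within a stage by '+'."""
    return [[parse_block(b) for b in stage.split("+")]
            for stage in key.split("|")]
```

For example, the Gen 8 leader parses into three stages, the middle one containing two parallel fastformer blocks.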

[figure: fitness_evolution]

TODO:

  • check attention implementations and losses (standard, HEPT, global, fastformer)
  • add results from current architecture scan (make plots first)

@jpata jpata changed the title from "Autoresearch training" to "standalone hackable model+training, architecture tuning" on Mar 19, 2026
@jpata jpata merged commit 62e0a0c into main Mar 24, 2026
3 checks passed
@jpata jpata deleted the jp_20260317_localtraining branch March 25, 2026 15:05