This repository is mostly meant to be a simple, hackable educational tool to demonstrate and iterate on a basic implementation of plastic memory, including its core features: hierarchical learning, autonomous abstraction, and synaptic potentiation.
Here is a demo of it in action, learning the first 40 names in the makemore dataset.
- Install the requirements in `requirements.txt`. The basic implementation only requires `numpy` and `networkx`, but you can optionally install `graphviz` to visualize graphs by following the instructions here.
- Follow the instructions in `01_introduction.ipynb`.
Plastic memory is an experimental seq2seq learning technique that arranges information as a hierarchical abstraction tree. Learning is a dual mechanism: non-local refactoring (finding similarities and abstracting them out in the tree) and local synaptic firings (strengthening or weakening pathways through the tree).
We call it plastic because, unlike traditional neural networks, which are rigid and hard to update, plastic memory is explicitly mouldable and flexible over the long term, similar in behaviour to neuroplasticity in biological systems.
The process is analogous to refactoring a software source tree, which starts off with low-level standard libraries and gradually builds higher-level concepts or abstractions by combining lower-level ones. It exhibits the following properties (a toy sketch of the mechanism follows this list):
- Continuous learning : No gradient descent - weights are continuously updated via local "synaptic firings" as the model is exposed to samples.
- Sample efficiency : Dimensionality reduction via finding similarities/isomorphisms and refactoring them into a hierarchical tree
- Growing concept hierarchies : The refactoring process creates abstract hierarchical concept libraries (the demo here shows this for the 32k first names in the makemore dataset and takes ~2 mins to train; note how the library grows from 1 r_level to 9 r_levels)
- Curriculum learning : The order in which information is presented plays a big role; simpler concepts shown first accelerate the creation of higher-level concepts later. As far as I can tell (subject to more testing), the ideal ordering is one where the cumulative information entropy increases at a constant rate with every additional sample
- Emergent complexity : Learning is localized, so each neuron knows only about the ones surrounding it. Only the process of finding isomorphisms and refactoring them is non-local
- Sparse networks : During inference and learning, only certain neurons fire along the decision tree; not all are active at all times
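To make the two mechanisms concrete, here is a minimal toy sketch in Python with `networkx`: local synaptic potentiation of edge weights along observed sequences, plus a non-local refactoring step that abstracts a frequently repeated pair into a new higher-level concept node. The names (`ConceptMemory`, `observe`, `refactor_once`) and the pair-based refactoring heuristic are illustrative assumptions, not this repository's actual API; see `01_introduction.ipynb` for the real implementation.

```python
from collections import Counter

import networkx as nx


class ConceptMemory:
    """Toy concept graph for illustration only, not the repository's classes."""

    def __init__(self):
        # Nodes are concepts; directed edge weights act as "synaptic" strengths.
        self.g = nx.DiGraph()
        self.pair_counts = Counter()
        self.next_id = 0

    def observe(self, sequence):
        """Local learning: strengthen edges along an observed sequence (potentiation)."""
        for a, b in zip(sequence, sequence[1:]):
            if self.g.has_edge(a, b):
                self.g[a][b]["weight"] += 1.0
            else:
                self.g.add_edge(a, b, weight=1.0)
            self.pair_counts[(a, b)] += 1

    def refactor_once(self, min_count=3):
        """Non-local learning: abstract the most frequent pair into a higher-level concept."""
        if not self.pair_counts:
            return None
        (a, b), count = self.pair_counts.most_common(1)[0]
        if count < min_count:
            return None
        # The new concept sits one abstraction level above its parts.
        level = 1 + max(self.g.nodes[a].get("r_level", 0),
                        self.g.nodes[b].get("r_level", 0))
        concept = f"C{self.next_id}({a}{b})"
        self.next_id += 1
        self.g.add_node(concept, r_level=level)
        self.g.add_edge(concept, a, weight=float(count))
        self.g.add_edge(concept, b, weight=float(count))
        del self.pair_counts[(a, b)]
        return concept


if __name__ == "__main__":
    mem = ConceptMemory()
    for name in ["emma", "emily", "ella", "amelia", "mia"]:
        mem.observe(list(name))
    print("new abstraction:", mem.refactor_once(min_count=2))
    strongest = sorted(mem.g.edges(data="weight"), key=lambda e: -e[2])[:5]
    print("strongest pathways:", strongest)
```

Alternating `observe` and `refactor_once` over a stream of samples is roughly the loop that would grow such a library upward through r_levels, which is the behaviour the demo above illustrates at scale.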
The fundamental learning algorithm can be compared out-of-the-box against traditional seq2seq prediction methods (like Transformers and RNNs) on the same first-names dataset by running tests against their handy implementations in Andrej Karpathy's makemore repository.
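As a hedged illustration of how such a comparison could be scored, the sketch below computes bits per character on held-out names. The `predict_proba(prefix)` interface is an assumption made for this example and would need adapting to whichever model (this repository's learner or a makemore baseline) you plug in.

```python
import math


def bits_per_char(model, names):
    """Average negative log2-likelihood per character (lower is better)."""
    total_bits, total_chars = 0.0, 0
    for name in names:
        seq = list(name) + ["<eos>"]
        prefix = []
        for ch in seq:
            probs = model.predict_proba(prefix)   # assumed to return {symbol: probability}
            p = max(probs.get(ch, 0.0), 1e-12)    # guard against log(0)
            total_bits += -math.log2(p)
            total_chars += 1
            prefix.append(ch)
    return total_bits / total_chars


class UniformBaseline:
    """Trivial reference model: a uniform distribution over the alphabet."""

    def __init__(self, alphabet):
        self.symbols = list(alphabet) + ["<eos>"]

    def predict_proba(self, prefix):
        p = 1.0 / len(self.symbols)
        return {s: p for s in self.symbols}


if __name__ == "__main__":
    held_out = ["emma", "olivia", "ava", "isabella"]
    alphabet = sorted({c for n in held_out for c in n})
    print("uniform baseline:", bits_per_char(UniformBaseline(alphabet), held_out))
```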
- Better/local ways to find (subgraph) isomorphisms (see the sketch after this list)
- Express algorithms using lambda calculus / a purely functional spec
- Better, more localized refactoring algorithms
- Experimenting with sample order w.r.t. learning efficiency/time
- Experimenting with different priors to create base hierarchy
- De-potentiation - potentiating a negative or fractional signal based on environmental feedback
- Larger/more complex datasets (ARC, etc)
- SDRs : Brain-like sparse distributed representations of incoming data
- Training on list-like functional DSLs
- Finding similarities/differences with biological learners
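As a starting point for the first item above, `networkx` already ships a VF2-based subgraph isomorphism matcher; the sketch below shows it on two tiny concept graphs (both made up for illustration). Whether this kind of global search can be replaced with cheaper, more local similarity detection is exactly the open question.

```python
import networkx as nx
from networkx.algorithms import isomorphism

# A small "library" graph and a pattern we suspect is repeated inside it.
library = nx.DiGraph([("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")])
pattern = nx.DiGraph([("x", "y"), ("y", "z"), ("x", "z")])

matcher = isomorphism.DiGraphMatcher(library, pattern)
for mapping in matcher.subgraph_isomorphisms_iter():
    # Each mapping sends library nodes onto pattern nodes for one match.
    print(mapping)
```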