Synapse Update (#780)
* Remove duplicate info print in validator epoch

* Add duration info print to validator forward

* Add duration info print to validator forward

* Change print details in validator forward

* Bit 458 combine advanced with template server (#792)

* init

* added local training

* .

* working axon backward

* model saving

* shape fix

* clean

* averaging grad and loss

* fix

* fixes for comments

* removed advanced server and template server

* Synapse fix dend test (#798)

* prelim fixes

* 2nd fix

* final fix

* constant fix

* validator fix

* core server self.set_fine_tuning_params() fix

* fix

* UI updates (#796)

* UI updates

* core server fixes

* turning off blacklisting for now

* generation fix

* UI and check updates

* small bug fixes

* constant import

* bug fixes

* Bit 490 synapse one fail drag all down (#810)

* init

* bug fixed

* generate bug fixes (#812)

* circle ci test fix (#813)

fixes

* validator fixes

* fixes for test_wallet

* Adds flag --wallet.reregister <bool> (#819)

* Adds flag --neuron.reregister <bool>
Default True

* created tests for new flag

* Revert "Adds flag --neuron.reregister <bool>"

This reverts commit 0736dfb.

* fix tests for new flag

* add new flag and implementation

* [WIP]Bit 490 synapse fix (#814)

* generate bug fixes

* dendrite backward disabled by default and padded sequence

* detach fix

* vocab size from 50257 to 50258

Co-authored-by: joeylegere <joeylegere@gmail.com>

* Synapse gpu fix (#822)

* temp fixes

* small fixes + blacklist

* Bit 490 synapse fix (#824)

* generate bug fixes

* dendrite backward disabled by default and padded sequence

* detach fix

* vocab size from 50257 to 50258

* fix generate sizing

* arg fix

* generate fix

* put new inputs on correct device

* tensors don't do .to inplace

* Null synapse (#827)

* default synapse last hidden state

* Synapse

* move synapse code within try catch

* args

* kwargs

* Unknown synapse

* unknown synapse

* codes update

* string update

* revert fix for comment

* proto updates

* seq2seq parameters update

* Synapse defaults fix (#832)

* default synapse last hidden state

* Synapse

* move synapse code within try catch

* args

* kwargs

* Unknown synapse

* unknown synapse

* codes update

* string update

* revert fix for comment

* proto updates

* seq2seq parameters update

* defaults update

* Remove template miner (#833)

* default synapse last hidden state

* Synapse

* move synapse code within try catch

* args

* kwargs

* Unknown synapse

* unknown synapse

* codes update

* string update

* revert fix for comment

* proto updates

* seq2seq parameters update

* defaults update

* deprecate template miner

* core_server update (blacklist changes + priority)

* removed tests for template miner

* remove update script

* seq2seq defaults updates (#836)

* defaults updates

* 1 minute timeout

* Bit 495 synapse more unit test (#828)

* added dend receptor test

* missing synapse test

* fix

* added axon text

* fix

* .

* small defaults update for seq2seq (#837)

* Fix uninitialized device variable in core_server

* Check for any available cuda device in core_server

* Allow for higher pytorch versions for CUDA 11.6 (sm_86) support

* Remove forced training when autocast and cuda

* Cast back to float32 in case autocast in logit translation

* Truncate logits down to vocab length

* put tensors on the device

* Validator hotfix (#844)

* .

* .

* Change tokenizer pad_token to eos_token

Define PAD Token = EOS Token = 50256, according to https://github.com/huggingface/transformers/blob/49c8c67fb815a277405f84dea4a66353e19fb347/tests/models/gpt2/test_modeling_gpt2.py#L532

Set padding_side = "left", since generative default expects most recent token on right-hand side with padding on left, according to huggingface/transformers#10552
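
A minimal sketch of the convention above, assuming the Hugging Face `transformers` GPT-2 tokenizer (variable names are illustrative, not the repository's code):

```python
from transformers import GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token   # PAD = EOS = <|endoftext|> (id 50256)
tokenizer.padding_side = "left"             # keep the most recent token right-most; pad on the left

batch = tokenizer(["hello world", "a longer example sentence"],
                  padding=True, return_tensors="pt")
print(batch["input_ids"].shape)
print(batch["attention_mask"][0])           # leading zeros mark the left padding
```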

* Simplify token padding in remapping_token_causallm

* Use pad_token in remapping_token_causallm in server

Note that tokenizer(padding=True, ...) is not used because unpadded offset_mapping is required for logit translation operations.

* Add tokenizer flags to remapping_token_causallm

To allow function to be used in various scenarios, including for causallm and generate.

* Combine remapping_token functions in server

Now a single remapping_token function serves all server forward functions.

* Update forward_generate with new token_remap

* Remove old remapping_token

* Adjust token_remap parameters in core_server

* Undo pad handling in validator forward

Now tokenizer.padding_side='left' ensures that last position fulfils validation. Additionally, validation is usually performed on unpadded [batch_size, sequence_len] tokens.

* Tensorize token input_ids and attention_mask in core_server

* Fix legacy constructor device parameter issue

* Add GPT2 generate convention pad_token_id=eos_token_id

https://github.com/huggingface/transformers/blob/49c8c67fb815a277405f84dea4a66353e19fb347/tests/models/gpt2/test_modeling_gpt2.py#L532

* Maint reposition remapping_token function

* use fast fix

* Revert "use fast fix"

This reverts commit e34c032.

* Add TextCausalLMNext Synapse to proto

Specifies messaging of topk server token phrases with probabilities. Server last position token predictions are retokenized to token phrases with the bittensor tokenizer. Allows for zero translation loss CausalLM next generation between different tokenizers.

Also adds comment specifying proto compile command, which is useful to see for manual compilation instruction.

* Add code_to_synapse for text_causal_lm_next

* Add TextCausalLMNext Synapse

Specifies messaging of topk server token phrases with probabilities. Server last position token predictions are retokenized to token phrases with the bittensor tokenizer. Allows for zero translation loss CausalLM next generation between different tokenizers.

* Add text_causal_lm_next to Dendrite class

Specifies messaging of topk server token phrases with probabilities. Server last position token predictions are retokenized to token phrases with the bittensor tokenizer. Allows for zero translation loss CausalLM next generation between different tokenizers.

* Add synapse_causal_lm_next to Axon

Specifies messaging of topk server token phrases with probabilities. Server last position token predictions are retokenized to token phrases with the bittensor tokenizer. Allows for zero translation loss CausalLM next generation between different tokenizers.

* Add topk token phrases utilities

Tokenizer utilities adds functions to compact and unravel topk server token phrases (standard tokenized), to be used with TextCausalLMNext synapse.

* Add unit test for topk token phrases utilities

Unit test new tokenizer utility functions that compact and unravel topk server token phrases (standard tokenized), to be used with TextCausalLMNext synapse.

* Add phrase entropy for topk token phrases

Calculates the cross entropy of a phrase prediction against a target phrase, so that this is a multi-token extension of typical cross entropy calculated for next token prediction, to be used with TextCausalLMNext synapse.
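
A rough sketch of the phrase-level loss idea: score the multi-token target against the top-k predicted phrases and fall back to the floor probability when nothing matches. The matching rule and names here are assumptions, not the repository's exact implementation:

```python
import torch

def phrase_cross_entropy_sketch(target_phrase, topk_phrases, topk_probs, floor_prob, eps=1e-9):
    """Score a multi-token target phrase against top-k predicted phrases; if no predicted
    phrase matches the start of the target, fall back to the floor probability."""
    matched_prob = 0.0
    for phrase, prob in zip(topk_phrases, topk_probs):
        if target_phrase[:len(phrase)] == phrase:   # prediction is a prefix of the target continuation
            matched_prob += prob
    prob = matched_prob if matched_prob > 0 else floor_prob
    return -torch.log(torch.tensor(prob) + eps)

# target continuation " is a", predictions " is" (p=0.4) and " it was" (p=0.1), floor prob 0.01
print(phrase_cross_entropy_sketch([318, 257], [[318], [340, 373]], [0.4, 0.1], 0.01))
```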

* Add unit test for phrase entropy for topk token phrases

Adds unit test for calculating the cross entropy of a phrase prediction against a target phrase, so that this is a multi-token extension of typical cross entropy calculated for next token prediction, to be used with TextCausalLMNext synapse.

* Add axon tests for TextCausalLMNext

* Add dendrite tests for TextCausalLMNext

* Add forward-backward tests for TextCausalLMNext

* Add receptor tests for TextCausalLMNext

* Add receptor_pool tests for TextCausalLMNext

* Update receptor_pool tests for TextCausalLMNext

* Update receptor_pool tests for TextCausalLMNext

* Add encode_forward_causallmnext() to server

To be used for TextCausalLMNext synapse, already compacts/encodes response for transfer to remote dendrite with backward support.

Forward pass through the pretrained model and select topk tokenizer logits and retokenize with std_tokenizer, then compact new token phrases and probabilities into 1-D tensor [ > batch_size * 2 * topk + 1] prob + at least 1 token per phrase + floor_prob. The floor probability is the mean probability of token phrases not captured in topk, required since the server tokenizer vocab_size may not be known to the receiver/validator.

* Add forward_casual_lm_next() to server

To be used for TextCausalLMNext synapse, already compacts/encodes response for transfer to remote dendrite with backward support.

Forward pass through the pretrained model and select topk tokenizer logits and retokenize with std_tokenizer, then compact new token phrases and probabilities into 1-D tensor [ > batch_size * 2 * topk + 1] prob + at least 1 token per phrase + floor_prob. The floor probability is the mean probability of token phrases not captured in topk, required since the server tokenizer vocab_size may not be known to the receiver/validator.
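
As a rough illustration of the forward step described above, ignoring the retokenization into std_tokenizer phrases (topk value and names are illustrative only):

```python
import torch
import torch.nn.functional as F

def topk_with_floor_sketch(last_logits, topk=4096):
    """last_logits: [batch, vocab] logits at the last sequence position."""
    probs = F.softmax(last_logits.float(), dim=-1)                          # float32 for precision
    topk_probs, topk_ids = probs.topk(topk, dim=-1)                         # [batch, topk]
    floor_prob = (1.0 - topk_probs.sum(dim=-1)) / (probs.shape[-1] - topk)  # mean prob outside topk
    return topk_probs, topk_ids, floor_prob

logits = torch.randn(2, 50257)
p, ids, floor = topk_with_floor_sketch(logits, topk=8)
print(p.shape, ids.shape, floor)
```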

* Add causallmnext to core_server neuron config

* Add calc_loss_fct to neuron_utilities

* Add PositionalEncoding to neuron_utilities

* Import neuron_utilities and use its PositionalEncoding

* Add validation_len to core_validator neuron params

Number of tokens to hold out for phrase validation beyond the sequence context.

* Maint core_validator forward() formatting and docs

* Maint core_validator forward() num_servers -> num_endpoints

* Detach synapse responses and move to validator device

* Add synergy_table display to core_validator

Prints the synergy loss diff matrix with pairwise loss reduction due to synergy (original loss on diagonal)

* Add stats_table display to core_validator

Gathers data and constructs neuron statistics table and prints it.

* Add synapse_table display to core_validator

Prints the evaluation of the neuron responses to the validator request.

* Add unsuccess display to core_validator

Prints the return codes and response times of unsuccessful responses.

* Add shapley_base to core_validator

Calculate Shapley base values and neuron response validation measure statistics, given responses from a synapse.

* Add scaling_law_loss_to_params to core_validator

(OpenAI scaling laws) Kaplan, Jared, et al. "Scaling laws for neural language models." arXiv:2001.08361 (2020)
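
A sketch of the loss-to-parameters conversion this refers to, inverting the commonly cited Kaplan et al. fit L(N) = (N_c / N)^alpha_N; the exact constants and any modifications used by the validator are not shown here:

```python
import torch

def scaling_law_loss_to_params_sketch(loss, n_c=8.8e13, alpha_n=0.076):
    """Invert L = (N_c / N)^alpha_N to estimate an effective parameter count from loss."""
    loss = torch.clamp(loss, min=1e-6)      # guard against degenerate losses
    return n_c / loss.pow(1.0 / alpha_n)    # lower loss maps to a larger effective model

print(scaling_law_loss_to_params_sketch(torch.tensor([3.0, 2.5])))
```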

* Add shapley_synergy to core_validator

Calculates Shapley synergy for coalition size 2, measured performance above expected performance. Measured in effective number of model parameters, just like base Shapley values.
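
Roughly, the pairwise synergy compares the loss of the joint (combined) response against an expected loss derived from the two base losses; a sketch with an assumed expectation rule (the actual combination used may differ):

```python
import torch

def shapley_synergy_sketch(base_loss_a, base_loss_b, joint_loss):
    """Credit each member of a size-2 coalition half of the measured improvement
    over the expected loss of the pair."""
    expected = torch.min(base_loss_a, base_loss_b)        # assumed expectation rule
    improvement = torch.clamp(expected - joint_loss, min=0.0)
    return improvement / 2, improvement / 2

print(shapley_synergy_sketch(torch.tensor(3.1), torch.tensor(3.4), torch.tensor(2.9)))
```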

* Add textcausallm to core_validator

Calculate Shapley values and neuron response validation measure statistics, given TextCausalLM synapse responses.

* Add textcausallmnext to core_validator

Calculate Shapley values and neuron response validation measure statistics, given TextCausalLMNext synapse responses.

* Use synapse validation functions in core_validator

Replace monolith with individual synapse validation function use in forward().

* Add neuron statistics variables to core_validator

* Add neuron_stats_update to core_validator

Updates self.neuron_stats with new individual dictionaries per uid.

* Add calculate_weights to core_validator

Calculates neuron set-weights from weight_key mapped values. Defines weight_key as the neuron stats key used to obtain the mapped stat value (typically a Shapley value) that the final set-weights are calculated from.
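
A minimal sketch of deriving set-weights from per-UID stats via a weight_key (the key name and the normalization are assumptions):

```python
import torch

def calculate_weights_sketch(neuron_stats, weight_key='shapley_values_nxt'):
    uids = sorted(uid for uid, stats in neuron_stats.items() if weight_key in stats)
    values = torch.tensor([max(0.0, float(neuron_stats[u][weight_key])) for u in uids])
    weights = values / values.sum() if values.sum() > 0 else values
    return torch.tensor(uids), weights

stats = {3: {'shapley_values_nxt': 1.2}, 7: {'shapley_values_nxt': 0.4}, 9: {}}
print(calculate_weights_sketch(stats))
```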

* Add __str__ and __repr__ for core_validator

Display UID, IP, wallet address summary for validator.

* Add weights_table to core_validator

Prints weights table given topk_uids and topk_weights.

* Use neuron_stats_update in core_validator

* Use stats_table to print stats update in core_validator

* Use calculate_weights and weights_table in core_validator

* Replace server_stats with neuron_stats in core_validator

* Fix rich print formatting in core_validator

* Update neuron_stats_columns with TextCausalLMNext synapse fields

* Move tensors to core_validator device

* Simplify shapley_synergy loss_diff_share

* Move tensors to same device in phrase_cross_entropy

* Use non-zero weight_key in core_validator

* Fix topk not implemented on cpu for float16

* Unforce remote_train when cuda and float16

* Ensure probability computations done in float32 for improved precision

* Set tokenizer.padding_side = "left" in core_server

Generative default expects most recent token on right-hand side with padding on left. huggingface/transformers#10552

* Run model for TextCausalLMNext always

Do not reuse model outputs from TextCausalLM, since the padding side differs.

* Gradually increase weight fields from zero via EMA

New axons will gradually increase in weighting as the number of successfully responded queries grows. This ensures that sufficient observations are averaged before weighting to address potentially noisy validation measures.
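
The warm-up can be pictured as an EMA starting from zero, so a new axon's stat only approaches its measured value after several successful responses; a sketch with an assumed smoothing factor:

```python
def ema_update_sketch(prev_value, new_observation, alpha=0.1):
    """Fixed-alpha EMA starting from zero: early noisy measurements cannot dominate."""
    return (1 - alpha) * prev_value + alpha * new_observation

value = 0.0  # new axon starts at zero weight
for step in range(1, 6):
    value = ema_update_sketch(value, 0.8)   # repeated measured value of 0.8
    print(step, round(value, 3))            # 0.08, 0.152, 0.217, ...
```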

* Add vocab to tokenizer if it does not have attr

* Add vocab to tokenizer if it does not have attr

* Define vocab_len as real tokenizer vocabulary length

* Define vocab_len as real tokenizer vocabulary length

* Try use_fast=False at tokenizer load if fast fails

* Add epsilon for log in loss computation

* Sort validation table according to TextCausalLMNext

More models, like OPT, are supported by TextCausalLMNext than by TextCausalLM, which requires fast tokenizers. Sorting the validation table by the more populated synapse gives a better view.

* Avoid EMA of nan values in core_validator neuron_stats

* Replace nan/inf losses with large loss in core_validator

* Swap validator table columns TextCausalLM <-> TextCausalLMNext

* Ensure softmax computation done in float32 for improved precision

* Simplify remapping_token to include untensorized offset_mappings

* Simplify encode_forward_causallm in core_server

* Remove left padding from batch probabilities for logit translation

* Remove left padding from batch tokens for logit translation

* Return internal model_output in encode_forward_causallm

* Reuse model_output in encode_forward_causallmnext in core_server

* Move tokens to core_server device

* Set std_sequence_len according to tokens_std in logit translation

* Check that model_output is not None before overwrite in axon

Otherwise some synapses like TextSeq2Seq with model_output=None will overwrite previous (potentially) non-None model_output.

* Revert model_output check in axon

* Log server loss and translated loss for TextCausalLM in core_server

* Update target_phrases type

* Add validation loss for TextCausalLMNext

To complement the existing phrase cross entropy loss, and to allow for more direct comparison to TextCausalLM.

* Move target_phrase to cpu in phrase_cross_entropy

* Remove losses_val_nxt in textcausallmnext

* Use floor probability when no validation match in phrase_cross_entropy

* Update logger used for server translated loss in core_server

* Use loguru info in core_validator instead of print

* Update logger style in core_validator

* Update logger style in core_validator

* Set vocab_len for tokenizers to store real vocabulary size

* Update test_tokenizer_utils.py

* Omit attention_mask in TextCausalLM (update unittests)

Transformer models like gerpt2 typically perform worse with left-side attention mask, so turning it off.

* Add message to synapse_callback return outputs

Allows direct loss evaluations and statistics to be passed through from the server model synapse execution to the axon logging output, useful to display server-side loss values for verification at validation.

* Add message to synapse_callback return outputs

Allows direct loss evaluations and statistics to be passed through from the server model synapse execution to the axon logging output, useful to display server-side loss values for verification at validation.

* [BIT 487] btcli regen coldkey with pubkey only (#831)

* fix no_prompt help dialog

* add function to regen coldkeypub from the pub part

* added new regen_coldkeypub to CLI

* add no_prompt option

* catch if the addr/pub key is valid first

* add None default

* fix utils import

* explicitly set ss58_format

* add tests for new regen_coldkeypub

* add integration test for wallet coldkeypub create

* add integration test for cli regen_coldkeypub

* fix type

* add wallet args

* fix config access

* don't specify bad function param

* fix test to only check coldkeypub

* remove printout inside validation check and rename

* fix uncaught name change

* Synapse basic fixes (#851)

* basic bug fixes

* sample config fixes

Co-authored-by: Ala Shaabana <shaabana@gmail.com>
Co-authored-by: joeylegere <joeylegere@gmail.com>

* Add support for non whitespace-preserving tokenizers

Prepends strings with a space, in the case of non whitespace-preserving tokenizers like BERT, which should mimic whitespace-preserving token strings more often than not.

Adds error catching on per-UID basis in shapley_base for synapse validation to be more robust.
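
A sketch of the space-prepending idea: token text from a non whitespace-preserving tokenizer gets a leading space before re-tokenization with the standard (whitespace-preserving) tokenizer, so phrase boundaries line up more often. The GPT-2 tokenizer stands in for the standard tokenizer here; names are illustrative:

```python
from transformers import AutoTokenizer

std_tokenizer = AutoTokenizer.from_pretrained("gpt2")

def to_std_phrase_sketch(token_text: str, whitespace_preserving: bool = False):
    # BERT-style WordPiece drops the leading space that GPT-2 BPE encodes explicitly,
    # so prepend one before mapping the token text into the standard vocabulary.
    if not whitespace_preserving and not token_text.startswith(" "):
        token_text = " " + token_text
    return std_tokenizer(token_text)["input_ids"]

print(to_std_phrase_sketch("hello"))   # ids for " hello" under GPT-2 BPE, not "hello"
```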

* Move prep_tokenizer to tokenizer_utils.py

* Update warning message in core_validator synapse validation

* Disable coveralls (#855)

* Update README.md

* disable publishing coverage

* dont upload any coverage

Co-authored-by: Unconst <32490803+unconst@users.noreply.github.com>
Co-authored-by: Joey Legere <joey@opentensor.ai>

* Add set_std_token_phrases for caching std_tokenizer equivalent of tokenizer token strings

Sets std_token_phrases which are the tokenizer token strings tokenized with std_tokenizer, so the std_tokenizer equivalent of the tokenizer token strings. Used for converting model predictions/logits into std_tokenizer representations, for example in TextCausalLMNext.

* Update topk_token_phrases to output 2D tensor with gradients instead of compact tensor

Standardizes the TextCausalLMNext server model output to 2D tensor with gradients, thereafter the axon encoding will compact the 2D tensor into 1D by removing ignore_index padding. Standardization should allow for multithreaded decoding at dendrite receptor_pool into 2D tensor for validator to form matching gradient shape.

Select topk tokenizer logits/phrases and include std_token_phrases counterparts (std_tokenization of token text) in topk_tensor output of shape [batch_size * (topk + 1), max_len], where max len of all phrase lists (with prob in front) is max_{b,k}(len([prob_k, tok_0_k, tok_1_k, ...])).

The output topk_tensor also includes a floor_prob for each batch item. The floor probability is the mean probability of token phrases not captured in topk, required since the tokenizer vocab_size may not be known to the receiver.

* Add compact_topk_token_phrases to tokenizer_utils.py

Compact 2D topk_tensor [batch_size * (topk + 1), max_len] by removing ignore_index padding, and also offset tokens by 2 to preserve [0, 1] for probabilities to allow for proper unraveling demarcated by probability boundaries.
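
A rough sketch of that compacting step (padding value, shapes, and names are assumptions):

```python
import torch

IGNORE_INDEX = -100   # assumed padding value

def compact_topk_sketch(topk_tensor):
    """topk_tensor: [batch_size * (topk + 1), max_len] with a probability in column 0 and
    ignore_index-padded token ids after it. Returns a 1-D compact tensor where values in
    [0, 1] (probabilities) mark phrase boundaries for later unraveling."""
    parts = []
    for row in topk_tensor:
        prob = row[:1]                         # probability stays in [0, 1]
        toks = row[1:]
        toks = toks[toks != IGNORE_INDEX] + 2  # drop padding, offset token ids by +2
        parts.append(torch.cat([prob, toks]))
    return torch.cat(parts)

rows = torch.tensor([[0.60, 464.0, IGNORE_INDEX],            # topk phrase: prob 0.6, token 464
                     [0.05, IGNORE_INDEX, IGNORE_INDEX]])     # floor probability row
print(compact_topk_sketch(rows))
```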

* Update topk_token_phrases usages

* Update unravel_topk_token_phrases to form 2D topk_tensor

Unravel topk token phrases input_tensor from 1-D to [batch_size * (topk + 1), max_len] topk_tensor, which includes topk token probabilities (prob_k) + floor_prob in first column with gradients attached, with std_tokens in remaining columns with ignore_index padding.

* Update phrase_cross_entropy to use topk_tensor as input

* Update phrase_cross_entropy usages

* Change topk_tensor shape to [batch_size, (topk + 1), max_len]

* Update TextCausalLMNext synapse to use forward_response encoding/decoding

* Add backward_request_gradient encoding/decoding for TextCausalLMNext with new dimensions

* Update TextCausalLMNext unit tests with new backward dimensions

* Update server to perform encoding in synapse

* Update check_len and topk_tensor.device in phrase_cross_entropy

* Update unit tests for TextCausalLMNext with new shapes

* Update check_backward_request_gradient for TextCausalLMNext with new shapes

* Update nill responses of TextCausalLMNext with new shapes

* Debug axon tests

* Debug axon tests

* Debug axon tests

* Update decode_backward_request_gradient shape

* Update nill responses for TextCausalLMNext synapses

* Update nill responses for TextCausalLMNext synapses

* Update test_receptor_neuron_mock_server shape for causallmnext

* Debug receptor_impl response deserialization exception

* Update causallmnext shape in receptor tests

* Update causallmnext shape in receptor tests

* Update receptor stub shapes for causallmnext

* Debug receptor_impl response deserialization exception

* Update assert message in unravel_topk_token_phrases

* Add template_forward_response_tensor for TextCausalLMNext

* Prepare tokenizer with std_token_phrases in unit tests

* Assert ResponseDeserializationException for failed causallmnext unravel

* Add and update causallmnext unit tests

* Require logging debug or trace to print validator tables

* Use cell values correctly to sort table rows in validator

* Add parameter count estimates to neuron_stats in validator

* Parameterize scaling_law_power for scaling_law_loss_to_params

* Move scaling_law_power into validation_func

* Add synapse name to synergy_table

* Limit step weight table to neurons validated in that step

* Exclude include_uids with no stats in weights_table

* Remove synapse name from synergy_table

* Convert set to list in weights_table

* Update weights_table status info

* Remove unused import from tokenizer_utils

* Remove unused name parameter in synergy_table

* Add argument --nucleus.scaling_law_power to core_validator

Power for modified scaling law, powered down to improve dynamic range, e.g. 3 → 6 nats for 0.5.

* Mark updated uids in weights_table

* Adjust uid marking in weights_table

* Remove unused imports in core_validator

* Mark uid row in stats_table with uid key

* Mark uid row in stats_table with uid key

* [BIT-532] Adding TextCausalLM user arg (#859)

* added TextCausalLM user arg

* Set synapse_keys based on neuron.validation_synapse

Co-authored-by: opentaco <93473497+opentaco@users.noreply.github.com>
Co-authored-by: opentaco <opentaco@protonmail.com>

* Update weights_table conditions for include_uids

* Bump Transformer Requirements to 4.20.1 (#863)

Bumps the Hugging Face transformers version.

* Add validator progress status console output when debug/trace off

* Add validator progress status console output when debug/trace off

* Add validator progress status console output when debug/trace off

* Change status message style

* Add validator identifier message

* Change status message style

* Change status message style

* Change status message style

* Change status message style

* Modify message ordering

* Add responsive/queried stat to validator status

* Change status message style

* Synapse timeout fix (#869)

* detach logits, and detach graph

* loss timing

* turn off priority threadpool

* timing + loss.item

* syntax fix

* better timings

* additional timing within _forward function

* enable threadpool

* remove loss cal

* thread pool limitation

* remove loss cal

* thread pool update

* Null entry

* import bittensor

* cancel future + remove finetune

* remove timings, add mutex lock for synapse calls

* cleanup + comments

* removing detach from forward

* more cleanup + increase priority size

* test fix

* revert loss calculation

* revert off arguments

* Bittensor V3.0.0 (#870)

version 3.0.0

* Additional Checks and Protections (#861)

* additional checks

* doc string update

* blacklist time update

* Wandb Maintenance (#862)

(1) added weight metric to wandb (2) accumulate wandb commit

* Bit 537 remove thread queue (#871)

* removed thread queue

* data_corpus fix

* fix

* added timeout

Co-authored-by: opentaco <opentaco@protonmail.com>
Co-authored-by: isabella618033 <49876827+isabella618033@users.noreply.github.com>
Co-authored-by: Cameron Fairchild <cameron.fairchild@mail.utoronto.ca>
Co-authored-by: joeylegere <joeylegere@gmail.com>
Co-authored-by: Cameron Fairchild <cameron@opentensor.ai>
Co-authored-by: opentaco <93473497+opentaco@users.noreply.github.com>
Co-authored-by: Ala Shaabana <shaabana@gmail.com>
Co-authored-by: Unconst <32490803+unconst@users.noreply.github.com>
Co-authored-by: Joey Legere <joey@opentensor.ai>
10 people committed Aug 8, 2022
1 parent 3265214 commit 4bc9e69
Showing 82 changed files with 9,623 additions and 6,093 deletions.
38 changes: 19 additions & 19 deletions .circleci/config.yml
@@ -2,7 +2,7 @@ version: 2.1

orbs:
python: circleci/python@2.0.3
coveralls: coveralls/coveralls@1.0.6
# coveralls: coveralls/coveralls@1.0.6

jobs:
build-and-test:
@@ -76,21 +76,21 @@ jobs:
- store_artifacts:
path: test-results

- when:
condition:
equal: ["3.10.5", << parameters.python-version >> ]
steps:
- run:
name: Upload Coverage
command: |
. env/bin/activate && coveralls
env:
CI_NAME: circleci
CI_BUILD_NUMBER: $CIRCLE_BUILD_NUM
CI_BUILD_URL: $CIRCLE_BUILD_URL
CI_BRANCH: $CIRCLE_BRANCH
CI_JOB_ID: $CIRCLE_NODE_INDEX
COVERALLS_PARALLEL: true
#- when:
#condition:
#equal: ["3.10.5", << parameters.python-version >> ]
#steps:
#- run:
#name: Upload Coverage
#command: |
#. env/bin/activate && coveralls
#env:
#CI_NAME: circleci
#CI_BUILD_NUMBER: $CIRCLE_BUILD_NUM
#CI_BUILD_URL: $CIRCLE_BUILD_URL
#CI_BRANCH: $CIRCLE_BRANCH
#CI_JOB_ID: $CIRCLE_NODE_INDEX
#COVERALLS_PARALLEL: true

unit-tests-all-python-versions:
docker:
@@ -120,6 +120,6 @@ workflows:
- unit-tests-all-python-versions:
requires:
- build-and-test
- coveralls:
requires:
- build-and-test
#- coveralls:
#requires:
#- build-and-test
6 changes: 3 additions & 3 deletions README.md
@@ -230,19 +230,19 @@ The template server follows a similar structure as the template miner.

```bash
$ cd bittensor
$ python3 ./bittensor/_neuron/text/template_server/main.py --wallet.name <WALLET NAME> --wallet.hotkey <HOTKEY NAME>
$ python3 ./bittensor/_neuron/text/core_server/main.py --wallet.name <WALLET NAME> --wallet.hotkey <HOTKEY NAME>
```
or
```python3
>> import bittensor
>> bittensor.neurons.text.template_server.neuron().run()
>> bittensor.neurons.text.core_server.neuron().run()
```

For the full list of settings, please run

```bash
$ cd bittensor
$ python3 ./bittensor/_neuron/text/template_server/main.py --help
$ python3 ./bittensor/_neuron/text/core_server/main.py --help
```


65 changes: 0 additions & 65 deletions benchmarks/advanced_server.py

This file was deleted.

8 changes: 4 additions & 4 deletions benchmarks/template_server.py → benchmarks/core_server.py
@@ -18,7 +18,7 @@
""" Benchmarking pytest fixture.
Example:
$ python3 benchmarks/template_server.py --neuron.model_name albert-base-v1
$ python3 benchmarks/core_server.py --neuron.model_name albert-base-v1
"""
from benchmarks import QueryBenchmark
@@ -33,7 +33,7 @@ class Benchmark ( QueryBenchmark ):
def miner_name() -> str:
r""" Return miner name
"""
return 'template_server'
return 'core_server'

@staticmethod
def run_neuron( config , subtensor, metagraph, wallet ):
@@ -42,7 +42,7 @@ def run_neuron( config , subtensor, metagraph, wallet ):
config (bittensor.Config)
Run config
"""
bittensor.neurons.text.template_server.neuron( config,subtensor=subtensor, metagraph=metagraph,wallet=wallet).run()
bittensor.neurons.text.core_server.neuron( config,subtensor=subtensor, metagraph=metagraph,wallet=wallet).run()

@staticmethod
def config() -> 'bittensor.Config':
@@ -51,7 +51,7 @@ def config() -> 'bittensor.Config':
config (bittensor.Config)
Run config.
"""
config = bittensor.neurons.text.template_server.neuron.config()
config = bittensor.neurons.text.core_server.neuron.config()
return config


15 changes: 12 additions & 3 deletions bittensor/__init__.py
@@ -18,7 +18,7 @@
from rich.console import Console

# Bittensor code and protocol version.
__version__ = '2.0.4'
__version__ = '3.0.0'
version_split = __version__.split(".")
__version_as_int__ = (100 * int(version_split[0])) + (10 * int(version_split[1])) + (1 * int(version_split[2]))

@@ -34,7 +34,7 @@ def turn_console_off():

# Vocabulary dimension.
#__vocab_size__ = len( tokenizer ) + len( tokenizer.additional_special_tokens) + 100 # Plus 100 for eventual token size increase.
__vocab_size__ = 50378
__vocab_size__ = 50258

# Tensor dimension.
# NOTE (const): if/when this increases peers must be responsible for trimming or expanding output to this size.
@@ -49,6 +49,9 @@ def turn_console_off():
# Substrate ss58_format
__ss58_format__ = 42

# Wallet ss58 address length
__ss58_address_length__ = 48

__networks__ = [ 'local', 'nobunaga', 'nakamoto']

__datasets__ = ['ArXiv', 'BookCorpus2', 'Books3', 'DMMathematics', 'EnronEmails', 'EuroParl', 'Gutenberg_PG', 'HackerNews', 'NIHExPorter', 'OpenSubtitles', 'PhilPapers', 'UbuntuIRC', 'YoutubeSubtitles']
@@ -102,6 +105,7 @@ def turn_console_off():
from bittensor._subtensor import subtensor as subtensor
from bittensor._tokenizer import tokenizer as tokenizer
from bittensor._serializer import serializer as serializer
from bittensor._synapse import synapse as synapse
from bittensor._dataset import dataset as dataset
from bittensor._receptor import receptor_pool as receptor_pool
from bittensor._wandb import wandb as wandb
@@ -122,7 +126,12 @@ def turn_console_off():
from bittensor._dataset.dataset_impl import Dataset as Dataset
from bittensor._receptor.receptor_pool_impl import ReceptorPool as ReceptorPool
from bittensor._threadpool.priority_thread_pool_impl import PriorityThreadPoolExecutor as PriorityThreadPoolExecutor
from bittensor._ipfs.ipfs_impl import Ipfs
from bittensor._ipfs.ipfs_impl import Ipfs as Ipfs
from bittensor._synapse.synapse_impl import Synapse as Synapse
from bittensor._synapse.text_causallm_impl import TextCausalLM as TextCausalLM
from bittensor._synapse.text_causallmnext_impl import TextCausalLMNext as TextCausalLMNext
from bittensor._synapse.text_lasthiddenstate_impl import TextLastHiddenState as TextLastHiddenState
from bittensor._synapse.text_seq2seq_impl import TextSeq2Seq as TextSeq2Seq

# DEFAULTS
defaults = Config()
96 changes: 46 additions & 50 deletions bittensor/_axon/__init__.py
@@ -51,10 +51,11 @@ def __new__(
wallet: 'bittensor.Wallet' = None,
forward_text: 'Callable' = None,
backward_text: 'Callable' = None,
forward_image: 'Callable' = None,
backward_image: 'Callable' = None,
forward_tensor: 'Callable' = None,
backward_tensor: 'Callable' = None,
synapse_last_hidden: 'Callable' = None,
synapse_causal_lm: 'Callable' = None,
synapse_causal_lm_next: 'Callable' = None,
synapse_seq_2_seq: 'Callable' = None,
synapse_checks: 'Callable' = None,
thread_pool: 'futures.ThreadPoolExecutor' = None,
server: 'grpc._Server' = None,
port: int = None,
@@ -77,14 +78,16 @@ def __new__(
function which is called on forward text requests.
backward_text (:obj:`callable`, `optional`):
function which is called on backward text requests.
forward_image (:obj:`callable`, `optional`):
function which is called on forward image requests.
backward_image (:obj:`callable`, `optional`):
function which is called on backward image requests.
forward_tensor (:obj:`callable`, `optional`):
function which is called on forward tensor requests.
backward_tensor (:obj:`callable`, `optional`):
function which is called on backward tensor requests.
synapse_last_hidden (:obj:`callable`, `optional`):
function which is called by the last hidden synapse
synapse_causal_lm (:obj:`callable`, `optional`):
function which is called by the causal lm synapse
synapse_causal_lm_next (:obj:`callable`, `optional`):
function which is called by the TextCausalLMNext synapse
synapse_seq_2_seq (:obj:`callable`, `optional`):
function which is called by the seq2seq synapse
synapse_checks (:obj:`callable`, 'optional'):
function which is called before each synapse to check for stake
thread_pool (:obj:`ThreadPoolExecutor`, `optional`):
Threadpool used for processing server queries.
server (:obj:`grpc._Server`, `required`):
@@ -139,8 +142,13 @@ def __new__(
('grpc.keepalive_timeout_ms', 500000)]
)

forwards = [forward_text, forward_image, forward_tensor]
backwards = [backward_text, backward_image, backward_tensor]
synapses = {}
synapses[bittensor.proto.Synapse.SynapseType.TEXT_LAST_HIDDEN_STATE] = synapse_last_hidden
synapses[bittensor.proto.Synapse.SynapseType.TEXT_CAUSAL_LM] = synapse_causal_lm
synapses[bittensor.proto.Synapse.SynapseType.TEXT_CAUSAL_LM_NEXT] = synapse_causal_lm_next
synapses[bittensor.proto.Synapse.SynapseType.TEXT_SEQ_2_SEQ] = synapse_seq_2_seq

synapse_check_function = synapse_checks if synapse_checks != None else axon.default_synapse_check

if priority != None:
priority_threadpool = bittensor.prioritythreadpool(config=config)
@@ -152,8 +160,10 @@ def __new__(
server = server,
ip = config.axon.ip,
port = config.axon.port,
forwards = forwards,
backwards = backwards,
forward = forward_text,
backward = backward_text,
synapses = synapses,
synapse_checks = synapse_check_function,
priority = priority,
priority_threadpool = priority_threadpool,
forward_timeout = config.axon.forward_timeout,
@@ -200,7 +210,7 @@ def add_args( cls, parser: argparse.ArgumentParser, prefix: str = None ):
parser.add_argument('--' + prefix_str + 'axon.backward_timeout', type=int,
help='Number of seconds to wait for backward axon request', default=2*bittensor.__blocktime__)
parser.add_argument('--' + prefix_str + 'axon.forward_timeout', type=int,
help='Number of seconds to wait for forward axon request', default=bittensor.__blocktime__)
help='Number of seconds to wait for forward axon request', default=5*bittensor.__blocktime__)
parser.add_argument('--' + prefix_str + 'axon.priority.max_workers', type = int,
help='''maximum number of threads in thread pool''', default = bittensor.defaults.axon.priority.max_workers)
parser.add_argument('--' + prefix_str + 'axon.priority.maxsize', type=int,
@@ -217,13 +227,13 @@ def add_args( cls, parser: argparse.ArgumentParser, prefix: str = None ):
def add_defaults(cls, defaults):
""" Adds parser defaults to object from enviroment variables.
"""
defaults.axon = bittensor.Config()
defaults.axon = bittensor.config()
defaults.axon.port = os.getenv('BT_AXON_PORT') if os.getenv('BT_AXON_PORT') != None else 8091
defaults.axon.ip = os.getenv('BT_AXON_IP') if os.getenv('BT_AXON_IP') != None else '[::]'
defaults.axon.max_workers = os.getenv('BT_AXON_MAX_WORERS') if os.getenv('BT_AXON_MAX_WORERS') != None else 10
defaults.axon.maximum_concurrent_rpcs = os.getenv('BT_AXON_MAXIMUM_CONCURRENT_RPCS') if os.getenv('BT_AXON_MAXIMUM_CONCURRENT_RPCS') != None else 400

defaults.axon.priority = bittensor.Config()
defaults.axon.priority = bittensor.config()
defaults.axon.priority.max_workers = os.getenv('BT_AXON_PRIORITY_MAX_WORKERS') if os.getenv('BT_AXON_PRIORITY_MAX_WORKERS') != None else 10
defaults.axon.priority.maxsize = os.getenv('BT_AXON_PRIORITY_MAXSIZE') if os.getenv('BT_AXON_PRIORITY_MAXSIZE') != None else -1

@@ -236,56 +246,42 @@ def check_config(cls, config: 'bittensor.Config' ):
assert config.axon.port > 1024 and config.axon.port < 65535, 'port must be in range [1024, 65535]'
bittensor.wallet.check_config( config )

@classmethod
def default_synapse_check(cls, synapse, hotkey ):
""" default synapse check function
"""
if len(hotkey) == bittensor.__ss58_address_length__:
return True

return False

@staticmethod
def check_backward_callback( backward_callback:Callable, modality:int, pubkey:str = '_' ):
def check_backward_callback( backward_callback:Callable, pubkey:str = '_' ):
""" Check and test axon backward callback function
"""
if not inspect.ismethod(backward_callback) and not inspect.isfunction(backward_callback):
raise ValueError('The axon backward callback must be a function with signature Callable[inputs_x:torch.FloatTensor, grads_dy:torch.FloatTensor ) -> torch.FloatTensor:, got {}'.format(backward_callback))
if len( inspect.signature(backward_callback).parameters) != 2:
raise ValueError('The axon backward callback must have signature Callable[ inputs_x:torch.FloatTensor, grads_dy:torch.FloatTensor ) -> torch.FloatTensor:, got {}'.format(inspect.signature(backward_callback)))
if len( inspect.signature(backward_callback).parameters) != 3:
raise ValueError('The axon backward callback must have signature Callable[ inputs_x:torch.FloatTensor, grads_dy:torch.FloatTensor, synapses ) -> torch.FloatTensor:, got {}'.format(inspect.signature(backward_callback)))
if 'inputs_x' not in inspect.signature(backward_callback).parameters:
raise ValueError('The axon backward callback must have signature Callable[inputs_x:torch.FloatTensor, grads_dy:torch.FloatTensor ) -> torch.FloatTensor:, got {}'.format(inspect.signature(backward_callback)))
if 'grads_dy' not in inspect.signature(backward_callback).parameters:
raise ValueError('The axon backward callback must have signature Callable[inputs_x:torch.FloatTensor, grads_dy:torch.FloatTensor ) -> torch.FloatTensor:, got {}'.format(inspect.signature(backward_callback)))

if modality == bittensor.proto.Modality.TEXT:
sample_input = torch.randint(0,1,(3, 3))
grads_raw = torch.rand(3, 3, bittensor.__network_dim__)
backward_callback(sample_input,grads_raw)

if modality == bittensor.proto.Modality.IMAGE:
sample_input = torch.rand(1,1,3,512,512)
grads_raw = torch.rand(512, 512, bittensor.__network_dim__)
backward_callback(sample_input,grads_raw)

if modality == bittensor.proto.Modality.TENSOR:
sample_input = torch.rand(1,1,1)
grads_raw = torch.rand(1, 1, bittensor.__network_dim__)
backward_callback(sample_input,grads_raw)

@staticmethod
def check_forward_callback( forward_callback:Callable, modality:int, pubkey:str = '_'):
def check_forward_callback( forward_callback:Callable, synapses:list = []):
""" Check and test axon forward callback function
"""
if not inspect.ismethod(forward_callback) and not inspect.isfunction(forward_callback):
raise ValueError('The axon forward callback must be a function with signature Callable[inputs_x: torch.Tensor] -> torch.FloatTensor:, got {}'.format(forward_callback))
if len( inspect.signature(forward_callback).parameters) != 1:
raise ValueError('The axon forward callback must have signature Callable[ inputs_x: torch.Tensor] -> torch.FloatTensor:, got {}'.format(inspect.signature(forward_callback)))
if len( inspect.signature(forward_callback).parameters) != 3:
raise ValueError('The axon forward callback must have signature Callable[ inputs_x: torch.Tensor, synapses, hotkey] -> torch.FloatTensor:, got {}'.format(inspect.signature(forward_callback)))
if 'inputs_x' not in inspect.signature(forward_callback).parameters:
raise ValueError('The axon forward callback must have signature Callable[ inputs_x: torch.Tensor] -> torch.FloatTensor:, got {}'.format(inspect.signature(forward_callback)))

if modality == bittensor.proto.Modality.TEXT:
sample_input = torch.randint(0,1,(3, 3))
forward_callback(sample_input)

if modality == bittensor.proto.Modality.IMAGE:
sample_input = torch.rand(1,1,3,512,512)
forward_callback(sample_input)

if modality == bittensor.proto.Modality.TENSOR:
sample_input = torch.rand(1,1,1)
forward_callback(sample_input)
sample_input = torch.randint(0,1,(3, 3))
forward_callback([sample_input], synapses, hotkey='')

class AuthInterceptor(grpc.ServerInterceptor):
""" Creates a new server interceptor that authenticates incoming messages from passed arguments.