Validator exit #980

Merged
merged 6 commits into from
Nov 7, 2022

@unconst (Contributor) commented Nov 7, 2022

Minor fix to the validator exit sequence.

@unconst changed the base branch from master to nobunaga November 7, 2022 20:57

@isabella618033 (Contributor) left a comment

LGTM

@unconst merged commit c11ff11 into nobunaga Nov 7, 2022
unconst added a commit that referenced this pull request Nov 7, 2022
* Promo suffix (#977)

* initial commit

* promo change to axon and dendrite

Co-authored-by: Thebes <jake@bittensor.com>

* Validator exit (#980)

* remove test_receptor test

* fix tests

* fix validator exit

Co-authored-by: unconst <jake@bittensor.com>

Co-authored-by: Thebes <jake@bittensor.com>
Eugene-hu added a commit that referenced this pull request Nov 9, 2022
* [feature] external axon flags (#887)

* add external axon changes

* add defaults for new axon flags

* fix args to axon

* default to internal ip and port if not specified

* add new args and to defaults

* add axon unit tests

* add description for subtensor integration test

* move test to unit test

* create new test file
add/update copyright notices

* don't default to internal ip

* add tests for setting the full_address

* add tests for subtensor.serve w/external axon info

* allow external port config to be None

* switch to mock instead of patch

* fix test mocks

* change mock config create

* fix/add default config

* change asserts, add message

* fix check call args

* fix mock config set

* only call once

* fix help wording

* should be True

* [fix] fixes unstake with max-stake flag (#905)

* add equality to None to the balance class

* add tests for the None case

* local train bug fix (#906)

* [feature] [CUDA solver] Add multi-GPU and ask for CUDA during btcli run (#893)

* added cuda solver

* boost versions to fix pip error

* allow choosing device id

* fix solution check to use keccak

* adds params for cuda and dev_id to register

* list devices by name during selection

* add block number logging

* fix calculation of hashrate

* fix update interval default

* add --TPB arg to register

* add update_interval flag

* switch back to old looping/work structure

* change typing

* device count is a function

* stop early if wallet registered

* add update interval and num proc flag

* add better number output

* optimize multiproc cpu reg
keeping proc until solution

* fix test

* change import to cubit

* fix import and default

* up default
should have default in CLI call

* add comments about params

* fix config var access

* add cubit as extra

* handle stale pow differently
check registration after failure

* restrict number of processes for integration test

* fix stale check

* use wallet.is_registered instead

* attempt to fix test issue

* fix my test

* oops typo

* typo again ugh

* remove print out

* fix partly reg test

* fix if solution None

* fix test?

* fix patch

* add args for cuda to subtensor

* add cuda args to reregister call

* add to wallet register the cuda args

* fix refs and tests

* add for val test also

* fix tests with rereg

* fix patch for tests

* add mock_register to subtensor passed instead

* move register under the check for isregistered

* use patch obj instead

* fix patch object

* fix prompt

* remove unneeded if

* modify POW submit to use rolling submit again

* add backoff to block get from network

* add test for backoff get block

* suppress the dev id flag if not set

* remove dest so it uses first arg

* fix pow submit loop

* move registration status with

* fix max attempts check

* remove status in subtensor.register

* add submit status

* change to neuron get instead

* fix count

* try to patch live display

* fix patch

* .

* separate test cases

* add POWNotStale and tests

* add more test cases for block get with retry

* fix return to None

* fix arg order

* fix indent

* add test to verify solution is submitted

* fix mock call

* patch hex bytes instead

* typo :/

* fix print out for unstake

* fix indexing into mock call

* call indexing

* access dict not with dot

* fix other indent

* add CUDAException for cubit

* up cubit version

* [Feature] ask cuda during btcli run (#890)

* add ask for cuda reg config in btcli run

* suppress unset arg

* [Feature] [cuda solver] multi gpu (#891)

* change diff display out

* remove logging

* check cubit support in the check config

* allow 1 or more devices in flag

* cuda flag should be suppress

* modify how cpu count is found

* make a solver base class

* add a solverbase for CUDA

* use multi-process kernel launching, one per GPU

* move check under dot get accessor

* Feature/cuda solver multi gpu (#892)

* change diff display out

* remove logging

* check cubit support in the check config

* allow 1 or more devices in flag

* cuda flag should be suppress

* modify how cpu count is found

* make a solver base class

* add a solverbase for CUDA

* use multi-process kernel launching, one per GPU

* move check under dot get accessor

* add All gpus specification

* continue trying reg after Stale

* catch for OSX

* don't use qsize

* add test for continue after being stale

* patch get_nowait instead of qsize

* [Docs] Update old docs link to new link. Change discord invite to custom link (#915)

* Update old docs link to new one

This change deletes the old gitbooks documentation link and replaces it with the new one.

* fix discord links

Co-authored-by: Mac Thrasher <95183714+quac88@users.noreply.github.com>

* Fix for test_neuron.py (#917)

prevents downloading from huggingface

* [feature] add --seed option to regen_hotkey (#916)

* add seed option to regen hotkey

* make seed optional and fix docstring

* add tests for both coldkey and hotkey regen w/seed

* oops, make seed optional

* fix old test, add config.seed

* circle ci version update and fix (#920)

* Add test_phrases_split unit test

Asserts that randomly instantiated compact_topk encodings can be correctly decoded to recover the original topk_tensor.

* Update unravel_topk_token_phrases with faster implementation

Replaces .tensor_split() with block indexing to avoid extra copy operations.

* Rename test_phrases_split to test_random_topk_token_phrases

* Unit tests cleanup (#922)

* circle ci version update and fix

* Test clean up

* uncomment test and remove specific test

* remove loguru and fix flaky tests

* fix syncing

* removing tokenizer equivalence + some bug fixes

* moving old dataset test

* Deactivate test_random_topk_token_phrases unit test

* Create topk_tensor on origin device

* Normalization Update (#909)

* local train bug fix

* normalization update

* fix tests

* remove test

* updated normalization

* Naming changes, bug fixes

* subtensor update for max clip

* max weight to a million

* Fixes for ordering and comments

* additional tests

* string fix

* numerical stability and testing updates

* minor update for division by zero

* Naming and spacing fixes

* epsilon update

* small fix

* Adding development workflow documentation and script for bumping the version (#918)

BIT-582 Adding development workflow documentation and script for bumping the version

* Revert "Normalization Update (#909)"

This reverts commit 3990a28.

* Parachain registration (#912)

* removed ws assumption

* removing check

* never registered

* Fixed sched_getaffinity for mac osx

* Started adding parachain support

* [hot-fix] fix indent again. add test (#907)

fix indent again. add test

* Fixed registration check and first time registration

* Removed old entrypoint list structure

* Fixed unit tests

Co-authored-by: Eugene <etesting007@gmail.com>
Co-authored-by: Ala Shaabana <ala@bittensor.com>
Co-authored-by: Cameron Fairchild <cameron.fairchild@mail.utoronto.ca>

* Bit 583 memory optimization v4 (#929)

* set allowed receptor to be 0 in validator to not store any receptor

* max_active receptor to 0

* fix

* feature/BIT-579/Adding Prometheus (#928)

* BIT-582 Adding development workflow documentation and script for bumping the version

* BIT-579 Adding prometheus_client==0.14.1 to requirements

* BIT-579 Removing wandb defaults from sample_configs

* Revert "BIT-579 Removing wandb defaults from sample_configs"

This reverts commit 2940cc7.

* BIT-579 Starting prometheus code. Adding metric_exporter concept/element and its MetricsExporterFactory

* BIT-579 Adding prometheus_client==0.14.1 to requirements

* BIT-579 Removing wandb defaults from sample_configs

* Revert "BIT-579 Removing wandb defaults from sample_configs"

This reverts commit 2940cc7.

* BIT-579 Starting prometheus code. Adding metric_exporter concept/element and its MetricsExporterFactory

* Revert "BIT-579 Starting prometheus code. Adding metric_exporter concept/element and its MetricsExporterFactory"

This reverts commit 8742d7f.

* BIT-579 Adding _prometheus to bittensor

* BIT-579 Adding prometheus code to bittensor/_neuron/text/core_*

* BIT-579 Adding prometheus code to bittensor/_config/config_impl.py. Sends the config to the inprocess prometheus server if it exists.

* BIT-579 Adding prometheus code to bittensor/_axon/*

* BIT-579 Adding prometheus code to bittensor/_dendrite/*

* BIT-579 Fixing syntax error

* BIT-579 Fixing missing import: time

* BIT-579 fixing typo

* BIT-579 fixing test: unit_tests/bittensor_tests/test_neuron.py

Co-authored-by: Unconst <32490803+unconst@users.noreply.github.com>

* Dendrite Text Generate (#941)

* adds generate to dendrite

* vune fixes

* extend readme

Co-authored-by: unconst <jake@bittensor.com>

* Subtensor and Normalization updates (#936)

* local train bug fix

* normalization update

* fix tests

* remove test

* updated normalization

* Naming changes, bug fixes

* subtensor update for max clip

* max weight to a million

* Fixes for ordering and comments

* additional tests

* string fix

* numerical stability and testing updates

* minor update for division by zero

* Naming and spacing fixes

* epsilon update

* small fix

* additional subtensor parameters

* remove print

* help string fixes

* Prometheus bug fix (#942)

* local train bug fix

* normalization update

* fix tests

* remove test

* updated normalization

* Naming changes, bug fixes

* subtensor update for max clip

* max weight to a million

* Fixes for ordering and comments

* additional tests

* string fix

* numerical stability and testing updates

* minor update for division by zero

* Naming and spacing fixes

* epsilon update

* small fix

* additional subtensor parameters

* remove print

* help string fixes

* small bug fix

* [Fix] only reregister if flag is set (#937)

* add test for expected reregister behaviour

* add fix

* pass passed args into earlier parse

* fix test by using args

* exit before actual register

* use strtobool

Co-authored-by: Unconst <32490803+unconst@users.noreply.github.com>

* [BIT 584] [feature] btcli register output stats not in place (#923)

* add flags for output_in_place during registration

* stop tracking best

* refactor registration logging output

* fix reregister from type bool

* change in_place and use_cuda to strtobool

* add param and defaults

* fix reference before assignment

* add new logger to cuda reg

* pass param to btcli register call

* oops

* fix init

* try slight timeout

* try fix

* oop

* ?

* fix use_cuda flag

* add test for new use_cuda flag setup

* use create pow to patch

* all no prompt dev id

* fix console.error

* use lower for str comparison

* call self register instead

* add test for wallet register call

* tests are for wallet reregister

* fix typo

* no self on top-level test

* fix tests?

* use reregister

* typo in test

* fix assert

* fix assert

* should be False

* fix time output to use timedelta

* add log verbose as option to reg output

* should be action

* fix typo

* add missing function arg

* fix spacing

* fix flags

* fix flags

* fix test

* should pass in args to config pre-parse

* use None instead of NA

Co-authored-by: isabella618033 <49876827+isabella618033@users.noreply.github.com>
Co-authored-by: Unconst <32490803+unconst@users.noreply.github.com>

* [Fix] multi cuda fix (#940)

* adjust none end calculation

* attempt to fix stop issue

* modify stop

* update nonce_start by correct amount

* fix nonce init to only random and update

* fix update amount

* add start values

* add test

* try different hashrate calc

* try EWMA for hash_rate

* oops bad import

* change name to worker

* extract helper and modify comment

* fix time now

* catch Full

* use a finished queue instead of times

* move constants to function params

* fix name of n

* fix verbose log

* allow --output_in_place

* fix n

* change to --no_ouput_in_place

* fix test

* Fix/pin wandb (#945)

pin below 0.13.4

* [Fix] change bellagene entrypoint string (#938)

don't add special case for network endpoint

Co-authored-by: Ala Shaabana <shaabana@gmail.com>

* Update dockerfile to current on dockerhub (#934)

* update dockerfile to current on dockerhub

* add netcat

* move nvm install up to take advantage of caching

* use pip

* add nvm install checksum

Co-authored-by: Ala Shaabana <shaabana@gmail.com>

* Minor fixes (#955)

minor fixes

Co-authored-by: unconst <jake@bittensor.com>

* Remove locals from cli and bittensor common (#947)

remove locals from cli and bittensor common

Co-authored-by: unconst <jake@bittensor.com>
Co-authored-by: Ala Shaabana <shaabana@gmail.com>

* [feature] Improve dataloader performance (#950)

* use threadpool and futures for dataloader

* add cli arg for max directories

Co-authored-by: Joey Legere <joey@opentensor.ai>
Co-authored-by: Ala Shaabana <shaabana@gmail.com>

* No set weights (#959)

* add no set weights

* add no_set_weights

* fix logging

* comments fix;

Co-authored-by: unconst <jake@bittensor.com>

* Bit 590 backward fix (#957)

* init

* no local forward and remote forward overlap

* clean up

* saving remote

* fix local size mismatch

* clean up

* fix

* hidden state and causalLM deterministicness

* rm backward

* default to have dendrite backward

* [Fix] add perpet hash rate and adjust alpha (#960)

* perpet hash rate and adjust alpha

* move reg code to registrationpy

* try different calc

* fix div by 0

* fix for cpu too

* fix race

* modify reg metrics output

* fix test mock

* oops

* [Fix] stake conversion issue (#958)

* modify balance arithm to cast to float first

* fix tests to model this behavior

* fix prompt spacing

* should be value error

* add test for eq balance other

* add comment to explain change

* fix tests

* .

* fix class

* balance fix

* try fix to staking

* fix comments

* add test for fix

* fix test

* fix impl

* add tests with bad types

* catch TypeError too and NotImplementedError

* catch typeerror

* .

* catch valueerror also

* initial commit

* fix manager server no return

* Dasyncio (#967)

* initial commit

* fix manager server no return

Co-authored-by: unconst <jake@bittensor.com>

* Update __init__.py

* Moving to release

* Release 3.4.2 (#969)

* initial commit

* fix manager server no return

* Moving to release

Co-authored-by: unconst <jake@bittensor.com>

* fix failing test_forward_priority_2nd_request_timeout

* remove test_receptor test

* fix tests

* Decrease validator moving average window

Decrease validator moving average window from 20 (alpha=0.05) to 10 (alpha=0.1) steps. This parameter could probably eventually be set to alpha=0.2.

The current 20-step window means that a server model change will take 20 steps * ~250 blocks/epoch * 12 sec = approx. 17 hours to reach full score in the validator neuron stats, because of the moving average slowly weighing in new model performance. 17 hours is probably too long, and it is also likely affecting registration immunity.

* Release 3.4.2 (#972)

* remove test_receptor test

* fix tests

Co-authored-by: unconst <jake@bittensor.com>

* No version checking (#974)

* no version checking

* fix integration tests

* remove print

Co-authored-by: Thebes <jake@bittensor.com>

* Promo suffix (#977)

* initial commit

* promo change to axon and dendrite

Co-authored-by: Thebes <jake@bittensor.com>

* Update bittensor/VERSION

* Validator exit (#980)

* remove test_receptor test

* fix tests

* fix validator exit

Co-authored-by: unconst <jake@bittensor.com>

* Promo suffix (#977) (#981)

* Promo suffix (#977)

* initial commit

* promo change to axon and dendrite

Co-authored-by: Thebes <jake@bittensor.com>

* Validator exit (#980)

* remove test_receptor test

* fix tests

* fix validator exit

Co-authored-by: unconst <jake@bittensor.com>

Co-authored-by: Thebes <jake@bittensor.com>

Co-authored-by: Cameron Fairchild <cameron.fairchild@mail.utoronto.ca>
Co-authored-by: Eugene <etesting007@gmail.com>
Co-authored-by: Eugene-hu <85906264+Eugene-hu@users.noreply.github.com>
Co-authored-by: Mac Thrasher <95183714+quac88@users.noreply.github.com>
Co-authored-by: opentaco <opentaco@protonmail.com>
Co-authored-by: opentaco <93473497+opentaco@users.noreply.github.com>
Co-authored-by: Eduardo García <garciaruiz.edu+github@gmail.com>
Co-authored-by: Ala Shaabana <shaabana@gmail.com>
Co-authored-by: Ala Shaabana <ala@bittensor.com>
Co-authored-by: isabella618033 <49876827+isabella618033@users.noreply.github.com>
Co-authored-by: unconst <jake@bittensor.com>
Co-authored-by: Cameron Fairchild <cameron@opentensor.ai>
Co-authored-by: joeylegere <joeylegere@gmail.com>
Co-authored-by: Joey Legere <joey@opentensor.ai>
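The timing estimate in the "Decrease validator moving average window" commit message can be reproduced with a short calculation. This is a minimal sketch, not bittensor code: it assumes the window length is 1/alpha steps, that one validator step corresponds to one epoch of ~250 blocks at 12 s per block (as the commit message states), and that the moving average is a standard exponential moving average updated as `avg = (1 - alpha) * avg + alpha * x`. The function names `catch_up_hours` and `new_weight_after` are hypothetical.

```python
def catch_up_hours(steps: int, blocks_per_epoch: int = 250, block_time_s: float = 12.0) -> float:
    """Wall-clock hours for `steps` validator steps, assuming one epoch per step."""
    return steps * blocks_per_epoch * block_time_s / 3600.0


def new_weight_after(steps: int, alpha: float) -> float:
    """Share of an EMA (avg = (1 - alpha) * avg + alpha * x) contributed by
    samples taken after a change, once `steps` updates have been applied."""
    return 1.0 - (1.0 - alpha) ** steps


if __name__ == "__main__":
    for alpha in (0.05, 0.1, 0.2):
        window = round(1 / alpha)  # nominal window length in steps
        print(
            f"alpha={alpha}: window={window} steps, "
            f"~{catch_up_hours(window):.1f} h, "
            f"post-change weight after one window={new_weight_after(window, alpha):.2f}"
        )
```

With these assumptions, 20 steps comes to 60,000 s, or roughly 17 hours, and halving the window halves that figure. Under the EMA assumption, post-change samples make up only about two-thirds of the average after one nominal window, so the window length reads best as a time-scale for how quickly a new model's performance is weighed in.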
Eugene-hu added a commit that referenced this pull request Nov 15, 2022
* [feature] external axon flags (#887)

* add external axon changes

* add defaults for new axon flags

* fix args to axon

* default to internal ip and port if not specified

* add new args and to defaults

* add axon unit tests

* add description for subtensor integration test

* move test to unit test

* create new test file
add/update copyright notices

* don't default to internal ip

* add tests for setting the full_address

* add tests for subtensor.serve w/external axon info

* allow external port config to be None

* switch to mock instead of patch

* fix test mocks

* change mock config create

* fix/add default config

* change asserts, add message

* fix check call args

* fix mock config set

* only call once

* fix help wording

* should be True

* [fix] fixes unstake with max-stake flag (#905)

* add equality to None to the balance class

* add tests for the None case

* local train bug fix (#906)

* [feature] [CUDA solver] Add multi-GPU and ask for CUDA during btcli run (#893)

* added cuda solver

* boost versions to fix pip error

* allow choosing device id

* fix solution check to use keccak

* adds params for cuda and dev_id to register

* list devices by name during selection

* add block number logging

* fix calculation of hashrate

* fix update interval default

* add --TPB arg to register

* add update_interval flag

* switch back to old looping/work structure

* change typing

* device count is a function

* stop early if wallet registered

* add update interval and num proc flag

* add better number output

* optimize multiproc cpu reg
keeping proc until solution

* fix test

* change import to cubit

* fix import and default

* up default
should have default in CLI call

* add comments about params

* fix config var access

* add cubit as extra

* handle stale pow differently
check registration after failure

* restrict number of processes for integration test

* fix stale check

* use wallet.is_registered instead

* attempt to fix test issue

* fix my test

* oops typo

* typo again ugh

* remove print out

* fix partly reg test

* fix if solution None

* fix test?

* fix patch

* add args for cuda to subtensor

* add cuda args to reregister call

* add to wallet register the cuda args

* fix refs and tests

* add for val test also

* fix tests with rereg

* fix patch for tests

* add mock_register to subtensor passed instead

* move register under the check for isregistered

* use patch obj instead

* fix patch object

* fix prompt

* remove unneeded if

* modify POW submit to use rolling submit again

* add backoff to block get from network

* add test for backoff get block

* suppress the dev id flag if not set

* remove dest so it uses first arg

* fix pow submit loop

* move registration status with

* fix max attempts check

* remove status in subtensor.register

* add submit status

* change to neuron get instead

* fix count

* try to patch live display

* fix patch

* .

* separate test cases

* add POWNotStale and tests

* add more test cases for block get with retry

* fix return to None

* fix arg order

* fix indent

* add test to verify solution is submitted

* fix mock call

* patch hex bytes instead

* typo :/

* fix print out for unstake

* fix indexing into mock call

* call indexing

* access dict not with dot

* fix other indent

* add CUDAException for cubit

* up cubit version

* [Feature] ask cuda during btcli run (#890)

* add ask for cuda reg config in btcli run

* suppress unset arg

* [Feature] [cuda solver] multi gpu (#891)

* change diff display out

* remove logging

* check cubit support in the check config

* allow 1 or more devices in flag

* cuda flag should be suppress

* modify how cpu count is found

* make a solver base class

* add a solverbase for CUDA

* use multi-process kernel launching, one per GPU

* move check under dot get accessor

* Feature/cuda solver multi gpu (#892)

* change diff display out

* remove logging

* check cubit support in the check config

* allow 1 or more devices in flag

* cuda flag should be suppress

* modify how cpu count is found

* make a solver base class

* add a solverbase for CUDA

* use multi-process kernel launching, one per GPU

* move check under dot get accessor

* add All gpus specification

* continue trying reg after Stale

* catch for OSX

* don't use qsize

* add test for continue after being stale

* patch get_nowait instead of qsize

* [Docs] Update old docs link to new link. Change discord invite to custom link (#915)

* Update old docs link to new one

This change deletes the old gitbooks documentation link and replaces it with the new one.

* fix discord links

Co-authored-by: Mac Thrasher <95183714+quac88@users.noreply.github.com>

* Fix for test_neuron.py (#917)

prevents downloading from huggingface

* [feature] add --seed option to regen_hotkey (#916)

* add seed option to regen hotkey

* make seed optional and fix docstring

* add tests for both coldkey and hotkey regen w/seed

* oops, make seed optional

* fix old test, add config.seed

* circle ci version update and fix (#920)

* Add test_phrases_split unit test

Asserts that randomly instantiated compact_topk encodings can be correctly decoded to recover the original topk_tensor.

* Update unravel_topk_token_phrases with faster implementation

Replaces .tensor_split() with block indexing to avoid extra copy operations.

* Rename test_phrases_split to test_random_topk_token_phrases

* Unit tests cleanup (#922)

* circle ci version update and fix

* Test clean up

* uncomment test and remove specific test

* remove loguru and fix flaky tests

* fix syncing

* removing tokenizer equivalence + some bug fixes

* moving old dataset test

* Deactivate test_random_topk_token_phrases unit test

* Create topk_tensor on origin device

* Normalization Update (#909)

* local train bug fix

* normalization update

* fix tests

* remove test

* updated normalization

* Naming changes, bug fixes

* subtensor update for max clip

* max weight to a million

* Fixes for ordering and comments

* additional tests

* string fix

* numerical stability and testing updates

* minor update for division by zero

* Naming and spacing fixes

* epsilon update

* small fix

* Adding development workflow documentation and script for bumping the version (#918)

BIT-582 Adding development workflow documentation and script for bumping the version

* Revert "Normalization Update (#909)"

This reverts commit 3990a28.

* Parachain registration (#912)

* removed ws assumption

* removing check

* never registered

* Fixed sched_getaffinity for mac osx

* Started adding parachain support

* [hot-fix] fix indent again. add test (#907)

fix indent again. add test

* Fixed registration check and first time registration

* Removed old entrypoint list structure

* Fixed unit tests

Co-authored-by: Eugene <etesting007@gmail.com>
Co-authored-by: Ala Shaabana <ala@bittensor.com>
Co-authored-by: Cameron Fairchild <cameron.fairchild@mail.utoronto.ca>

* Bit 583 memory optimization v4 (#929)

* set allowed receptor to be 0 in validator to not store any receptor

* max_active receptor to 0

* fix

* feature/BIT-579/Adding Prometheus (#928)

* BIT-582 Adding development workflow documentation and script for bumping the version

* BIT-579 Adding prometheus_client==0.14.1 to requirements

* BIT-579 Removing wandb defaults from sample_configs

* Revert "BIT-579 Removing wandb defaults from sample_configs"

This reverts commit 2940cc7.

* BIT-579 Starting prometheus code. Adding metric_exporter concept/element and its MetricsExporterFactory

* BIT-579 Adding prometheus_client==0.14.1 to requirements

* BIT-579 Removing wandb defaults from sample_configs

* Revert "BIT-579 Removing wandb defaults from sample_configs"

This reverts commit 2940cc7.

* BIT-579 Starting prometheus code. Adding metric_exporter concept/element and its MetricsExporterFactory

* Revert "BIT-579 Starting prometheus code. Adding metric_exporter concept/element and its MetricsExporterFactory"

This reverts commit 8742d7f.

* BIT-579 Adding _prometheus to bittensor

* BIT-579 Adding prometheus code to bittensor/_neuron/text/core_*

* BIT-579 Adding prometheus code to bittensor/_config/config_impl.py. Sends the config to the inprocess prometheus server if it exists.

* BIT-579 Adding prometheus code to bittensor/_axon/*

* BIT-579 Adding prometheus code to bittensor/_dendrite/*

* BIT-579 Fixing syntax error

* BIT-579 Fixing missing import: time

* BIT-579 fixing typo

* BIT-579 fixing test: unit_tests/bittensor_tests/test_neuron.py

Co-authored-by: Unconst <32490803+unconst@users.noreply.github.com>

* Dendrite Text Generate (#941)

* adds generate to dendrite

* vune fixes

* extend readme

Co-authored-by: unconst <jake@bittensor.com>

* Subtensor and Normalization updates (#936)

* local train bug fix

* normalization update

* fix tests

* remove test

* updated normalization

* Naming changes, bug fixes

* subtensor update for max clip

* max weight to a million

* Fixes for ordering and comments

* additional tests

* string fix

* numerical stability and testing updates

* minor update for division by zero

* Naming and spacing fixes

* epsilon update

* small fix

* additional subtensor parameters

* remove print

* help string fixes

* Prometheus bug fix (#942)

* local train bug fix

* normalization update

* fix tests

* remove test

* updated normalization

* Naming changes, bug fixes

* subtensor update for max clip

* max weight to a million

* Fixes for ordering and comments

* additional tests

* string fix

* numerical stability and testing updates

* minor update for division by zero

* Naming and spacing fixes

* epsilon update

* small fix

* additional subtensor parameters

* remove print

* help string fixes

* small bug fix

* [Fix] only reregister if flag is set (#937)

* add test for expected reregister behaviour

* add fix

* pass passed args into earlier parse

* fix test by using args

* exit before actual register

* use strtobool

Co-authored-by: Unconst <32490803+unconst@users.noreply.github.com>

* [BIT 584] [feature] btcli register output stats not in place (#923)

* add flags for output_in_place during registration

* stop tracking best

* refactor registration logging output

* fix reregister from type bool

* change in_place and use_cuda to strtobool

* add param and defaults

* fix reference before assignment

* add new logger to cuda reg

* pass param to btcli register call

* oops

* fix init

* try slight timeout

* try fix

* oop

* ?

* fix use_cuda flag

* add test for new use_cuda flag setup

* use create pow to patch

* all no prompt dev id

* fix console.error

* use lower for str comparison

* call self register instead

* add test for wallet register call

* tests are for wallet reregister

* fix typo

* no self on top-level test

* fix tests?

* use reregister

* typo in test

* fix assert

* fix assert

* should be False

* fix time output to use timedelta

* add log verbose as option to reg output

* should be action

* fix typo

* add missing function arg

* fix spacing

* fix flags

* fix flags

* fix test

* should pass in args to config pre-parse

* use None instead of NA

Co-authored-by: isabella618033 <49876827+isabella618033@users.noreply.github.com>
Co-authored-by: Unconst <32490803+unconst@users.noreply.github.com>

* [Fix] multi cuda fix (#940)

* adjust none end calculation

* attempt to fix stop issue

* modify stop

* update nonce_start by correct amount

* fix nonce init to only random and update

* fix update amount

* add start values

* add test

* try different hashrate calc

* try EWMA for hash_rate

* oops bad import

* change name to worker

* extract helper and modify comment

* fix time now

* catch Full

* use a finished queue instead of times

* move constants to function params

* fix name of n

* fix verbose log

* allow --output_in_place

* fix n

* change to --no_ouput_in_place

* fix test

* Fix/pin wandb (#945)

pin below 0.13.4

* [Fix] change bellagene entrypoint string (#938)

don't add special case for network endpoint

Co-authored-by: Ala Shaabana <shaabana@gmail.com>

* Update dockerfile to current on dockerhub (#934)

* update dockerfile to current on dockerhub

* add netcat

* move nvm install up to take advantage of caching

* use pip

* add nvm install checksum

Co-authored-by: Ala Shaabana <shaabana@gmail.com>

* Minor fixes (#955)

minor fixes

Co-authored-by: unconst <jake@bittensor.com>

* Remove locals from cli and bittensor common (#947)

remove locals from cli and bittensor common

Co-authored-by: unconst <jake@bittensor.com>
Co-authored-by: Ala Shaabana <shaabana@gmail.com>

* [feature] Improve dataloader performance (#950)

* use threadpool and futures for dataloader

* add cli arg for max directories

Co-authored-by: Joey Legere <joey@opentensor.ai>
Co-authored-by: Ala Shaabana <shaabana@gmail.com>

* No set weights (#959)

* add no set weights

* add no_set_weights

* fix logging

* comments fix;

Co-authored-by: unconst <jake@bittensor.com>

* Bit 590 backward fix (#957)

* init

* no local forward and remote forward overlap

* clean up

* saving remote

* fix local size mismatch

* clean up

* fix

* hidden state and causalLM deterministicness

* rm backward

* default to have dendrite backward

* [Fix] add perpet hash rate and adjust alpha (#960)

* perpet hash rate and adjust alpha

* move reg code to registrationpy

* try different calc

* fix div by 0

* fix for cpu too

* fix race

* modify reg metrics output

* fix test mock

* oops

* [Fix] stake conversion issue (#958)

* modify balance arithm to cast to float first

* fix tests to model this behavior

* fix prompt spacing

* should be value error

* add test for eq balance other

* add comment to explain change

* fix tests

* .

* fix class

* balance fix

* try fix to staking

* fix comments

* add test for fix

* fix test

* fix impl

* add tests with bad types

* catch TypeError too and NotImplementedError

* catch typeerror

* .

* catch valueerror also
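The balance commits above cast the other operand to float before doing arithmetic and catch `TypeError`/`ValueError` for bad inputs. A minimal sketch of that pattern follows; the class and helper names are hypothetical, and treating a plain number as a rao amount is an assumption, not necessarily what bittensor's `Balance` does.

```python
from typing import Optional

RAO_PER_TAO = 1_000_000_000  # 1 TAO = 10**9 rao


class Balance:
    """Hypothetical minimal balance type; not the bittensor implementation."""

    def __init__(self, rao: int):
        self.rao = int(rao)

    @property
    def tao(self) -> float:
        return self.rao / RAO_PER_TAO

    @staticmethod
    def _to_rao(other) -> Optional[int]:
        # Balances contribute their rao directly. Anything else is cast to float
        # first (assumed to be a rao amount) so ints, floats and numeric strings
        # behave the same; values that cannot be converted yield None.
        if isinstance(other, Balance):
            return other.rao
        try:
            return int(round(float(other)))
        except (TypeError, ValueError, OverflowError):
            return None

    def __add__(self, other):
        other_rao = self._to_rao(other)
        if other_rao is None:
            return NotImplemented  # Python then raises TypeError
        return Balance(self.rao + other_rao)

    __radd__ = __add__

    def __sub__(self, other):
        other_rao = self._to_rao(other)
        if other_rao is None:
            return NotImplemented
        return Balance(self.rao - other_rao)
```

Returning `NotImplemented` rather than raising lets Python try the reflected operation first, so `3 + Balance(5)` still works while `Balance(5) + "abc"` ends in a `TypeError`.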

* initial commit

* fix manager server no return

* Dasyncio (#967)

* initial commit

* fix manager server no return

Co-authored-by: unconst <jake@bittensor.com>

* Update __init__.py

* Moving to release

* Release 3.4.2 (#969)

* initial commit

* fix manager server no return

* Moving to release

Co-authored-by: unconst <jake@bittensor.com>

* fix failing test_forward_priority_2nd_request_timeout

* Decrease validator moving average window

Decrease validator moving average window from 20 (alpha=0.05) to 10 (alpha=0.1) steps. This parameter could probably eventually be set to alpha=0.2.

The current 20-step window means that a server model change takes 20 steps * ~250 blocks/epoch * 12 sec = approx. 17 hours to reach full score in the validator neuron stats, because the moving average only slowly weighs in new model performance. 17 hours is probably too long, and it is also likely affecting registration immunity.
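The duration estimate above can be reproduced with a short calculation. This sketch assumes, as the message does, one validator step per epoch, ~250 blocks per epoch, 12-second blocks, and a window of roughly 1/alpha steps; the function name is illustrative only.

```python
SECONDS_PER_BLOCK = 12
BLOCKS_PER_EPOCH = 250  # approximate figure quoted in the commit message


def window_hours(alpha: float, blocks_per_epoch: int = BLOCKS_PER_EPOCH,
                 seconds_per_block: int = SECONDS_PER_BLOCK) -> float:
    """Hours spanned by an EMA window of ~1/alpha steps, one step per epoch."""
    steps = round(1 / alpha)
    return steps * blocks_per_epoch * seconds_per_block / 3600


for alpha in (0.05, 0.1, 0.2):
    print(f"alpha={alpha}: {window_hours(alpha):.1f} h")
# alpha=0.05: 16.7 h
# alpha=0.1: 8.3 h
# alpha=0.2: 4.2 h
```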

* Release 3.4.2 (#972)

* remove test_receptor test

* fix tests

Co-authored-by: unconst <jake@bittensor.com>

* No version checking (#974)

* no version checking

* fix integration tests

* remove print

Co-authored-by: Thebes <jake@bittensor.com>

* Promo suffix (#977)

* initial commit

* promo change to axon and dendrite

Co-authored-by: Thebes <jake@bittensor.com>

* Validator exit (#980)

* remove test_receptor test

* fix tests

* fix validator exit

Co-authored-by: unconst <jake@bittensor.com>

* Support arbitrary gRPC request metadata order (#976)

* Format AuthInterceptor using black

* Parse request metadata as key value pairs

* Use request method to black list calls

* Fix request type provided on backward

* Add type hints

* Refactor signature parsing

* [Fix] Dockerfile: clone the repo to install instead (#984)

* clone the repo to install instead

* no cd

Co-authored-by: Ala Shaabana <shaabana@gmail.com>

* Update bittensor version to 3.4.3

(cherry picked from commit 43110cf)

* Catch precision errors in synapse forward responses

Response serialization/deserialization introduces precision errors that may cause probability sums to exceed permissible boundaries. The forward response check now accepts precision errors that fall within an established absolute tolerance (currently atol = 1e-6).
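A minimal sketch of such a tolerance check is shown below. The function name and the exact conditions (each probability in [0, 1], total at most 1) are assumptions for illustration; only the atol value comes from the message.

```python
import math
from typing import Sequence

ATOL = 1e-6  # absolute tolerance quoted in the commit message


def probabilities_within_bounds(probs: Sequence[float], atol: float = ATOL) -> bool:
    """Hypothetical check: every probability lies in [0, 1] and the total is at
    most 1, each bound relaxed by `atol` to absorb serialization round-off."""
    if any(p < -atol or p > 1 + atol for p in probs):
        return False
    # fsum avoids adding our own summation error on top of the transport error.
    return math.fsum(probs) <= 1 + atol


print(probabilities_within_bounds([0.5, 0.5000001]))  # True: overshoot of 1e-7
print(probabilities_within_bounds([0.6, 0.5]))        # False: sums to 1.1
```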

(cherry picked from commit d96b625)

* Comment update for tensor size

(cherry picked from commit 6dd06f9)

* fix for changelog and version

* 3.4.2 changelog

* Revert "Merge branch 'nobunaga' into version_changelog_fix"

This reverts commit 42f1a3c, reversing
changes made to a32a035.

Co-authored-by: Cameron Fairchild <cameron.fairchild@mail.utoronto.ca>
Co-authored-by: Mac Thrasher <95183714+quac88@users.noreply.github.com>
Co-authored-by: opentaco <opentaco@protonmail.com>
Co-authored-by: opentaco <93473497+opentaco@users.noreply.github.com>
Co-authored-by: Eduardo García <garciaruiz.edu+github@gmail.com>
Co-authored-by: Ala Shaabana <shaabana@gmail.com>
Co-authored-by: Ala Shaabana <ala@bittensor.com>
Co-authored-by: isabella618033 <49876827+isabella618033@users.noreply.github.com>
Co-authored-by: Unconst <32490803+unconst@users.noreply.github.com>
Co-authored-by: unconst <jake@bittensor.com>
Co-authored-by: Cameron Fairchild <cameron@opentensor.ai>
Co-authored-by: joeylegere <joeylegere@gmail.com>
Co-authored-by: Joey Legere <joey@opentensor.ai>
Co-authored-by: Adrian-Stefan Mares <36161392+adriansmares@users.noreply.github.com>
eduardogr added a commit that referenced this pull request Nov 24, 2022
* [feature] external axon flags (#887)

* add external axon changes

* add defaults for new axon flags

* fix args to axon

* default to internal ip and port if not specified

* add new args and to defaults

* add axon unit tests

* add description for subtensor integration test

* move test to unit test

* create new test file
add/update copyright notices

* don't default to internal ip

* add tests for setting the full_address

* add tests for subtensor.serve w/external axon info

* allow external port config to be None

* switch to mock instead of patch

* fix test mocks

* change mock config create

* fix/add default config

* change asserts, add message

* fix check call args

* fix mock config set

* only call once

* fix help wording

* should be True

* [fix] fixes unstake with max-stake flag (#905)

* add equality to None to the balance class

* add tests for the None case
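The #905 fix adds equality against `None` to the balance class. A minimal sketch of that idea follows, assuming the original failure came from reading an attribute of `None` during comparison; the class is hypothetical and not bittensor's `Balance`.

```python
class Balance:
    """Hypothetical minimal balance type used only to illustrate comparison with None."""

    def __init__(self, rao: int):
        self.rao = int(rao)

    def __eq__(self, other):
        # An unset amount (None) is never equal to a real balance, so comparing
        # against None returns False instead of failing on `other.rao`.
        if other is None:
            return False
        if isinstance(other, Balance):
            return self.rao == other.rao
        if isinstance(other, (int, float)):
            return self.rao == other
        return NotImplemented

    def __hash__(self):
        # Defining __eq__ removes the default hash; restore one based on rao.
        return hash(self.rao)


print(Balance(1) == None)  # False
print(Balance(1) != None)  # True: __ne__ inverts __eq__ by default
```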

* local train bug fix (#906)

* [feature] [CUDA solver] Add multi-GPU and ask for CUDA during btcli run (#893)

* added cuda solver

* boost versions to fix pip error

* allow choosing device id

* fix solution check to use keccak

* adds params for cuda and dev_id to register

* list devices by name during selection

* add block number logging

* fix calculation of hashrate

* fix update interval default

* add --TPB arg to register

* add update_interval flag

* switch back to old looping/work structure

* change typing

* device count is a function

* stop early if wallet registered

* add update interval and num proc flag

* add better number output

* optimize multiproc cpu reg
keeping proc until solution

* fix test

* change import to cubit

* fix import and default

* up default
should have default in CLI call

* add comments about params

* fix config var access

* add cubit as extra

* handle stale pow differently
check registration after failure

* restrict number of processes for integration test

* fix stale check

* use wallet.is_registered instead

* attempt to fix test issue

* fix my test

* oops typo

* typo again ugh

* remove print out

* fix partly reg test

* fix if solution None

* fix test?

* fix patch

* add args for cuda to subtensor

* add cuda args to reregister call

* add to wallet register the cuda args

* fix refs and tests

* add for val test also

* fix tests with rereg

* fix patch for tests

* add mock_register to subtensor passed instead

* move register under the check for isregistered

* use patch obj instead

* fix patch object

* fix prompt

* remove unneeded if

* modify POW submit to use rolling submit again

* add backoff to block get from network

* add test for backoff get block
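The backoff commits above retry fetching the current block when the network call fails. The following is a hand-rolled sketch of exponential backoff under stated assumptions: the delays, attempt count and function names are hypothetical, and the actual change may rely on a third-party library instead.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_backoff(fn: Callable[[], T], max_attempts: int = 5,
                      base_delay: float = 0.5,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn(); on an exception wait base_delay * 2**attempt seconds and retry.
    After max_attempts failures the last exception is re-raised."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")


# Usage: a stand-in for a block query that fails twice before succeeding.
delays = []
outcomes = iter([ConnectionError("node busy"), ConnectionError("node busy"), 1234])


def flaky_get_block() -> int:
    result = next(outcomes)
    if isinstance(result, Exception):
        raise result
    return result


block = call_with_backoff(flaky_get_block, sleep=delays.append)
print(block, delays)  # 1234 [0.5, 1.0]
```

Injecting `sleep` keeps the retry logic testable without real waiting, which is also how the usage above records the delays.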

* suppress the dev id flag if not set

* remove dest so it uses first arg

* fix pow submit loop

* move registration status with

* fix max attempts check

* remove status in subtensor.register

* add submit status

* change to neuron get instead

* fix count

* try to patch live display

* fix patch

* .

* separate test cases

* add POWNotStale and tests

* add more test cases for block get with retry

* fix return to None

* fix arg order

* fix indent

* add test to verify solution is submitted

* fix mock call

* patch hex bytes instead

* typo :/

* fix print out for unstake

* fix indexing into mock call

* call indexing

* access dict not with dot

* fix other indent

* add CUDAException for cubit

* up cubit version

* [Feature] ask cuda during btcli run (#890)

* add ask for cuda reg config in btcli run

* suppress unset arg

* [Feature] [cuda solver] multi gpu (#891)

* change diff display out

* remove logging

* check cubit support in the check config

* allow 1 or more devices in flag

* cuda flag should be suppress

* modify how cpu count is found

* make a solver base class

* add a solverbase for CUDA

* use multi-process kernel launching, one per GPU

* move check under dot get accessor

* Feature/cuda solver multi gpu (#892)

* change diff display out

* remove logging

* check cubit support in the check config

* allow 1 or more devices in flag

* cuda flag should be suppress

* modify how cpu count is found

* make a solver base class

* add a solverbase for CUDA

* use multi-process kernel launching, one per GPU

* move check under dot get accessor

* add All gpus specification

* continue trying reg after Stale

* catch for OSX

* don't use qsize

* add test for continue after being stale

* patch get_nowait instead of qsize

* [Docs] Update old docs link to new link. Change discord invite to custom link (#915)

* Update old docs link to new one

This change deletes the old gitbooks documentation link and replaces it with the new one.

* fix discord links

Co-authored-by: Mac Thrasher <95183714+quac88@users.noreply.github.com>

* Fix for test_neuron.py (#917)

prevents downloading from huggingface

* [feature] add --seed option to regen_hotkey (#916)

* add seed option to regen hotkey

* make seed optional and fix docstring

* add tests for both coldkey and hotkey regen w/seed

* oops, make seed optional

* fix old test, add config.seed

* circle ci version update and fix (#920)

* Add test_phrases_split unit test

Asserts that randomly instantiated compact_topk encodings can be correctly decoded to recover the original topk_tensor.

* Update unravel_topk_token_phrases with faster implementation

Replaces .tensor_split() with block indexing to avoid extra copy operations.

* Rename test_phrases_split to test_random_topk_token_phrases

* Unit tests cleanup (#922)

* circle ci version update and fix

* Test clean up

* uncomment test and remove specific test

* remove loguru and fix flaky tests

* fix syncing

* removing tokenizer equivalence + some bug fixes

* moving old dataset test

* Deactivate test_random_topk_token_phrases unit test

* Create topk_tensor on origin device

* Normalization Update (#909)

* local train bug fix

* normalization update

* fix tests

* remove test

* updated normalization

* Naming changes, bug fixes

* subtensor update for max clip

* max weight to a million

* Fixes for ordering and comments

* additional tests

* string fix

* numerical stability and testing updates

* minor update for division by zero

* Naming and spacing fixes

* epsilon update

* small fix

* Adding development workflow documentation and script for bumping the version (#918)

BIT-582 Adding development workflow documentation and script for bumping the version

* Revert "Normalization Update (#909)"

This reverts commit 3990a28.

* Parachain registration (#912)

* removed ws assumption

* removing check

* never registered

* Fixed sched_getaffinity for mac osx

* Started adding parachain support

* [hot-fix] fix indent again. add test (#907)

fix indent again. add test

* Fixed registration check and first time registration

* Removed old entrypoint list structure

* Fixed unit tests

Co-authored-by: Eugene <etesting007@gmail.com>
Co-authored-by: Ala Shaabana <ala@bittensor.com>
Co-authored-by: Cameron Fairchild <cameron.fairchild@mail.utoronto.ca>

* Bit 583 memory optimization v4 (#929)

* set allowed receptor to be 0 in validator to not store any receptor

* max_active receptor to 0

* fix

* feature/BIT-579/Adding Prometheus (#928)

* BIT-582 Adding development workflow documentation and script for bumping the version

* BIT-579 Adding prometheus_client==0.14.1 to requirements

* BIT-579 Removing wandb defaults from sample_configs

* Revert "BIT-579 Removing wandb defaults from sample_configs"

This reverts commit 2940cc7.

* BIT-579 Starting prometheus code. Adding metric_exporter concept/element and its MetricsExporterFactory

* BIT-579 Adding prometheus_client==0.14.1 to requirements

* BIT-579 Removing wandb defaults from sample_configs

* Revert "BIT-579 Removing wandb defaults from sample_configs"

This reverts commit 2940cc7.

* BIT-579 Starting prometheus code. Adding metric_exporter concept/element and its MetricsExporterFactory

* Revert "BIT-579 Starting prometheus code. Adding metric_exporter concept/element and its MetricsExporterFactory"

This reverts commit 8742d7f.

* BIT-579 Adding _prometheus to bittensor

* BIT-579 Adding prometheus code to bittensor/_neuron/text/core_*

* BIT-579 Adding prometheus code to bittensor/_config/config_impl.py. Sends the config to the inprocess prometheus server if it exists.

* BIT-579 Adding prometheus code to bittensor/_axon/*

* BIT-579 Adding prometheus code to bittensor/_dendrite/*

* BIT-579 Fixing syntax error

* BIT-579 Fixing missing import: time

* BIT-579 fixing typo

* BIT-579 fixing test: unit_tests/bittensor_tests/test_neuron.py

Co-authored-by: Unconst <32490803+unconst@users.noreply.github.com>

* Dendrite Text Generate (#941)

* adds generate to dendrite

* vune fixes

* extend readme

Co-authored-by: unconst <jake@bittensor.com>

* Subtensor and Normalization updates (#936)

* local train bug fix

* normalization update

* fix tests

* remove test

* updated normalization

* Naming changes, bug fixes

* subtensor update for max clip

* max weight to a million

* Fixes for ordering and comments

* additional tests

* string fix

* numerical stability and testing updates

* minor update for division by zero

* Naming and spacing fixes

* epsilon update

* small fix

* additional subtensor parameters

* remove print

* help string fixes

* Prometheus bug fix (#942)

* local train bug fix

* normalization update

* fix tests

* remove test

* updated normalization

* Naming changes, bug fixes

* subtensor update for max clip

* max weight to a million

* Fixes for ordering and comments

* additional tests

* string fix

* numerical stability and testing updates

* minor update for division by zero

* Naming and spacing fixes

* epsilon update

* small fix

* additional subtensor parameters

* remove print

* help string fixes

* small bug fix

* [Fix] only reregister if flag is set (#937)

* add test for expected reregister behaviour

* add fix

* pass passed args into earlier parse

* fix test by using args

* exit before actual register

* use strtobool

Co-authored-by: Unconst <32490803+unconst@users.noreply.github.com>

* [BIT 584] [feature] btcli register output stats not in place (#923)

* add flags for output_in_place during registration

* stop tracking best

* refactor registration logging output

* fix reregister from type bool

* change in_place and use_cuda to strtobool

* add param and defaults

* fix reference before assignment

* add new logger to cuda reg

* pass param to btcli register call

* oops

* fix init

* try slight timeout

* try fix

* oop

* ?

* fix use_cuda flag

* add test for new use_cuda flag setup

* use create pow to patch

* all no prompt dev id

* fix console.error

* use lower for str comparison

* call self register instead

* add test for wallet register call

* tests are for wallet reregister

* fix typo

* no self on top-level test

* fix tests?

* use reregister

* typo in test

* fix assert

* fix assert

* should be False

* fix time output to use timedelta

* add log verbose as option to reg output

* should be action

* fix typo

* add missing function arg

* fix spacing

* fix flags

* fix flags

* fix test

* should pass in args to config pre-parse

* use None instead of NA

Co-authored-by: isabella618033 <49876827+isabella618033@users.noreply.github.com>
Co-authored-by: Unconst <32490803+unconst@users.noreply.github.com>

* [Fix] multi cuda fix (#940)

* adjust nonce end calculation

* attempt to fix stop issue

* modify stop

* update nonce_start by correct amount

* fix nonce init to only random and update

* fix update amount

* add start values

* add test

* try different hashrate calc

* try EWMA for hash_rate
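An exponentially weighted moving average (EWMA) smooths noisy per-interval hash-rate readings. The sketch below is an illustration only: the class name and default alpha are hypothetical, and the zero-interval guard mirrors the separate "fix div by 0" commit rather than known bittensor code.

```python
from typing import Optional


class HashRateEWMA:
    """Hypothetical smoother: each new sample (hashes / seconds) is blended into
    the running estimate with weight `alpha`."""

    def __init__(self, alpha: float = 0.3):
        if not 0 < alpha <= 1:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha
        self.value: Optional[float] = None

    def update(self, hashes: int, seconds: float) -> float:
        if seconds <= 0:
            # Skip zero-length intervals instead of dividing by zero.
            return self.value if self.value is not None else 0.0
        sample = hashes / seconds
        if self.value is None:
            self.value = sample  # the first sample initializes the estimate
        else:
            self.value = self.alpha * sample + (1 - self.alpha) * self.value
        return self.value


rate = HashRateEWMA(alpha=0.5)
print(rate.update(100, 1.0))  # 100.0
print(rate.update(200, 1.0))  # 150.0
```

A larger alpha tracks changes in hash rate faster, while a smaller one gives a steadier display.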

* oops bad import

* change name to worker

* extract helper and modify comment

* fix time now

* catch Full

* use a finished queue instead of times

* move constants to function params

* fix name of n

* fix verbose log

* allow --output_in_place

* fix n

* change to --no_output_in_place

* fix test

* Fix/pin wandb (#945)

pin below 0.13.4

* [Fix] change bellagene entrypoint string (#938)

don't add a special case for the network endpoint

Co-authored-by: Ala Shaabana <shaabana@gmail.com>

* Update dockerfile to current on dockerhub (#934)

* update dockerfile to current on dockerhub

* add netcat

* move nvm install up to take advantage of caching

* use pip

* add nvm install checksum

Co-authored-by: Ala Shaabana <shaabana@gmail.com>

* Minor fixes (#955)

minor fixes

Co-authored-by: unconst <jake@bittensor.com>

* Remove locals from cli and bittensor common (#947)

remove locals from cli and bittensor common

Co-authored-by: unconst <jake@bittensor.com>
Co-authored-by: Ala Shaabana <shaabana@gmail.com>

* [feature] Improve dataloader performance (#950)

* use threadpool and futures for dataloader

* add cli arg for max directories

Co-authored-by: Joey Legere <joey@opentensor.ai>
Co-authored-by: Ala Shaabana <shaabana@gmail.com>

* No set weights (#959)

* add no set weights

* add no_set_weights

* fix logging

* comments fix;

Co-authored-by: unconst <jake@bittensor.com>

* Bit 590 backward fix (#957)

* init

* no local forward and remote forward overlap

* clean up

* saving remote

* fix local size mismatch

* clean up

* fix

* hidden state and causalLM determinism

* rm backward

* default to have dendrite backward

* [Fix] add perpet hash rate and adjust alpha (#960)

* perpet hash rate and adjust alpha

* move reg code to registration.py

* try different calc

* fix div by 0

* fix for cpu too

* fix race

* modify reg metrics output

* fix test mock

* oops

* [Fix] stake conversion issue (#958)

* modify balance arithmetic to cast to float first

* fix tests to model this behavior

* fix prompt spacing

* should be value error

* add test for eq balance other

* add comment to explain change

* fix tests

* .

* fix class

* balance fix

* try fix to staking

* fix comments

* add test for fix

* fix test

* fix impl

* add tests with bad types

* catch TypeError too and NotImplementedError

* catch typeerror

* .

* catch valueerror also

* initial commit

* fix manager server no return

* Dasyncio (#967)

* initial commit

* fix manager server no return

Co-authored-by: unconst <jake@bittensor.com>

* Update __init__.py

* Moving to release

* Release 3.4.2 (#969)

* initial commit

* fix manager server no return

* Moving to release

Co-authored-by: unconst <jake@bittensor.com>

* fix failing test_forward_priority_2nd_request_timeout

* Decrease validator moving average window

Decrease validator moving average window from 20 (alpha=0.05) to 10 (alpha=0.1) steps. This parameter could probably eventually be set to alpha=0.2.

The current 20-step window means that a server model change takes 20 steps * ~250 blocks/epoch * 12 sec = approx. 17 hours to reach full score in the validator neuron stats, because the moving average only slowly weighs in new model performance. 17 hours is probably too long, and it is also likely affecting registration immunity.
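The "window" of an EMA is only a rough guide: after n updates, a change in the underlying score is reflected to the extent 1 - (1 - alpha)^n. The sketch below, with illustrative function names, shows how alpha controls that convergence.

```python
def ema_update(old: float, new: float, alpha: float) -> float:
    """One exponential-moving-average step."""
    return (1 - alpha) * old + alpha * new


def fraction_absorbed(alpha: float, steps: int) -> float:
    """Share of a step change (score jumps from 0 to 1) visible after `steps`
    updates; equals 1 - (1 - alpha) ** steps."""
    value = 0.0
    for _ in range(steps):
        value = ema_update(value, 1.0, alpha)
    return value


for alpha in (0.05, 0.1, 0.2):
    print(alpha, [round(fraction_absorbed(alpha, n), 2) for n in (5, 10, 20)])
```

With alpha=0.05 about 64% of a change shows up after 20 steps; with alpha=0.1 roughly the same share (about 65%) shows up after 10 steps, so halving the window roughly halves the time a new server model needs to be recognized.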

* Release 3.4.2 (#972)

* remove test_receptor test

* fix tests

Co-authored-by: unconst <jake@bittensor.com>

* No version checking (#974)

* no version checking

* fix integration tests

* remove print

Co-authored-by: Thebes <jake@bittensor.com>

* Promo suffix (#977)

* initial commit

* promo change to axon and dendrite

Co-authored-by: Thebes <jake@bittensor.com>

* Validator exit (#980)

* remove test_receptor test

* fix tests

* fix validator exit

Co-authored-by: unconst <jake@bittensor.com>

* Support arbitrary gRPC request metadata order (#976)

* Format AuthInterceptor using black

* Parse request metadata as key value pairs

* Use request method to black list calls

* Fix request type provided on backward

* Add type hints

* Refactor signature parsing

* [Fix] Dockerfile: clone the repo to install instead (#984)

* clone the repo to install instead

* no cd

Co-authored-by: Ala Shaabana <shaabana@gmail.com>

* Update bittensor version to 3.4.3

(cherry picked from commit 43110cf)

* Catch precision errors in synapse forward responses

Response serialization/deserialization introduces precision errors that may cause probability sums to exceed permissible boundaries. The forward response check now accepts precision errors that fall within an established absolute tolerance (currently atol = 1e-6).
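To see how serialization alone can push a probability sum past 1, the sketch below round-trips values through 32-bit floats with the standard `struct` module. Assuming responses travel as float32 is an illustration choice, not a statement about bittensor's wire format.

```python
import struct


def float32_roundtrip(x: float) -> float:
    """Pack a Python float into 4 bytes (IEEE-754 single precision) and read it back."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


probs = [0.1] * 10                       # sums to exactly 1 in exact arithmetic
received = [float32_roundtrip(p) for p in probs]
total = sum(received)

print(total > 1.0)          # True: float32(0.1) is slightly larger than 0.1
print(total - 1.0 < 1e-6)   # True: the overshoot (~1.5e-8) is within atol
```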

(cherry picked from commit d96b625)

* Comment update for tensor size

(cherry picked from commit 6dd06f9)

* Fix/allow synapse all (#988)

* allow set synapse All using flag

* add test

* use dot get

* Mark registration threads as daemons (#998)

Make solver processes daemons

* Add response table to validator debugging

* Add return_ops to parameters

* Decode context and answer

* Add validation length parameter

* Itemize probabilities

* Add debug prints

* Change table formatting

* Add extra tasks to response table

* Debug add print

* Remove batch_size parameter

* Switch modulo order

* Modify table format

* Modify table format

* Modify table format

* Modify table format

* Try table print to catch rich errors

* Modify table title and caption

* Add shapley_values_nxt column

* Refactor response table functions

* Correct responsive count

* Format table caption

* [BIT-599] Validator weight setting improvements (#1000)

* Remove responsive prioritization from validator weight calculation

Weight setting limitation based on responsive prioritization is no longer needed, so it has been removed. Part of the original intent of the limitation was a chain storage concern about setting too many weights, but since the network now has a very high (> 90%) response rate, the concern is moot.

The downside of applying the limitation is that validators with longer step size artificially set fewer weights simply because they could not query the network in a single epoch. This introduced a counterproductive weight setting variability across validators.

Responsiveness is scored in any case via a Shapley EMA zero-push penalty, so setting weights on non-responsive nodes still relays an accurate score.
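The zero-push idea can be sketched as an EMA in which a missing response counts as a score of 0, so unresponsive UIDs decay toward zero without any separate filtering. This is a simplified illustration under assumptions: it does not compute Shapley values, and the function name and alpha are hypothetical.

```python
from typing import Dict, Optional


def update_scores(ema: Dict[int, float], responses: Dict[int, Optional[float]],
                  alpha: float = 0.1) -> Dict[int, float]:
    """Hypothetical EMA update over queried UIDs. A UID whose response is None
    (no response) has 0 pushed into its average, so its score decays toward 0."""
    updated = dict(ema)
    for uid, score in responses.items():
        sample = 0.0 if score is None else score
        updated[uid] = (1 - alpha) * updated.get(uid, 0.0) + alpha * sample
    return updated


scores = update_scores({1: 1.0, 2: 1.0}, {1: 1.0, 2: None}, alpha=0.5)
print(scores)  # {1: 1.0, 2: 0.5}
```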

* Move metagraph_sync just before weight setting

The metagraph is one epoch (~250 blocks) outdated by the time weight setting is done. This means that newly registered keys will have weights set based on the stats of the key just previously registered on the same UID.

Weights will now be set more accurately when the metagraph is updated just before weight setting, which will reset stats of a UID that has changed ownership.

* Update neuron stats table caption

* Add metagraph register to validator

Tracks new hotkey registrations on UIDs, and records statistics of previous hotkey.
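Tracking ownership changes can be sketched as a comparison between the stored UID-to-hotkey mapping and the freshly synced hotkey list. The helper and data layout below are hypothetical; they only illustrate resetting a UID's stats while keeping a record of the previous hotkey's stats.

```python
from typing import Dict, List, Tuple


def detect_hotkey_changes(known: Dict[int, str], current_hotkeys: List[str],
                          stats: Dict[int, dict]) -> List[Tuple[int, str, str, dict]]:
    """Hypothetical helper. For each UID whose hotkey changed, remove (reset) its
    stats, return them alongside the old and new hotkey, and update `known`."""
    changes = []
    for uid, hotkey in enumerate(current_hotkeys):
        previous = known.get(uid)
        if previous is not None and previous != hotkey:
            changes.append((uid, previous, hotkey, stats.pop(uid, {})))
        known[uid] = hotkey
    return changes


known = {0: "hk_a", 1: "hk_b"}
stats = {0: {"loss": 1.2}, 1: {"loss": 2.5}}
print(detect_hotkey_changes(known, ["hk_a", "hk_c"], stats))
# [(1, 'hk_b', 'hk_c', {'loss': 2.5})]
```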

* Update validator epoch conditions

Epoch overrun beyond blocks_per_epoch is no longer needed, since weight setting is no longer restricted to epoch responsives.

Normal epoch duration is blocks_per_epoch if all UIDs have been queried. The validator tries to query each UID at least once (this assumes the nucleus samples without replacement), but keeps a minimum epoch duration of blocks_per_epoch * 12 sec block_period, so that a subtensor outage causing invalid block readings cannot trigger fast repeated weight setting.

Also logs the while-loop conditions at each step.
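The epoch loop condition described above can be sketched as a predicate that keeps stepping until both the block budget and a wall-clock floor are used up. The function name and signature are hypothetical; the 12-second block period and blocks_per_epoch come from the message.

```python
def epoch_in_progress(current_block: int, start_block: int, blocks_per_epoch: int,
                      elapsed_seconds: float, block_time: float = 12.0) -> bool:
    """Continue while fewer than blocks_per_epoch blocks have passed, OR while less
    than blocks_per_epoch * block_time seconds have elapsed. The time floor stops
    invalid block readings (e.g. during a subtensor outage) from ending epochs
    early and causing rapid repeated weight setting."""
    blocks_remaining = current_block < start_block + blocks_per_epoch
    time_remaining = elapsed_seconds < blocks_per_epoch * block_time
    return blocks_remaining or time_remaining


print(epoch_in_progress(100, 0, 250, elapsed_seconds=1200))   # True: blocks left
print(epoch_in_progress(260, 0, 250, elapsed_seconds=3100))   # False: epoch done
print(epoch_in_progress(9999, 0, 250, elapsed_seconds=60))    # True: bogus block, time floor holds
```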

* Cast phrase_cross_entropy tokens to int

* Assert self.metagraph.n == len(self.neuron_hotkeys)

* Round before casting to int in phrase_cross_entropy
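Rounding before casting matters because a plain cast truncates toward zero, so a token id carrying a tiny round-off error can turn into the wrong id. Plain Python floats stand in for tensor values in this illustration.

```python
value = 2.9999999          # e.g. a token id that picked up float round-off in transit
print(int(value))          # 2: truncation gives the wrong id
print(int(round(value)))   # 3: rounding first recovers the intended id
```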

* Log epoch while condition details

* Update validator weights console message

* Consume validator nucleus UID queue fully

Since epochs now conclude at blocks_per_epoch, many validators will have unqueried UIDs still in the queue. It is better to retain these and consume them in the next epoch, since this ensures all UIDs are queried over a cycle that may span successive epochs.

The inefficiency introduced is that if the queue has to be refilled in the same epoch, a UID may be queried twice. This is preferred over the risk of never querying a UID because the remaining queue was not fully consumed, even if consuming it requires another epoch.
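A queue that persists across epochs and is refilled with a fresh random permutation only when empty gives this behaviour. The class below is a hypothetical sketch, not the validator's nucleus code; note that a refill in the middle of a batch can repeat a UID, which is the inefficiency described above.

```python
import random
from collections import deque
from typing import Deque, List


class UIDQueue:
    """Hypothetical sketch: UIDs are drawn without replacement from a queue that
    survives across epochs and is refilled with a new permutation only when empty."""

    def __init__(self, n: int, seed: int = 0):
        self.n = n
        self.rng = random.Random(seed)
        self.queue: Deque[int] = deque()

    def next_batch(self, k: int) -> List[int]:
        batch = []
        for _ in range(k):
            if not self.queue:
                uids = list(range(self.n))
                self.rng.shuffle(uids)
                self.queue.extend(uids)
            batch.append(self.queue.popleft())
        return batch


q = UIDQueue(n=10)
epoch_1 = q.next_batch(4) + q.next_batch(4)  # epoch ends with 2 UIDs still queued
epoch_2 = q.next_batch(2)                    # leftovers are consumed first
print(sorted(epoch_1 + epoch_2))             # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```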

* No cache: metagraph.sync(cached=False)

Turns off the metagraph cache. Updating the metagraph before weight setting now takes ~10 minutes (on a local subtensor), but typically provides more recent values that can catch UID ownership changes.

* Add validator --neuron.metagraph_cached

When flag --neuron.metagraph_cached is set the validator uses metagraph.sync(cached=True).

* Record block before validator neuron register entry

* Force validators without local subtensor to use cached metagraph
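The choice between cached and uncached sync can be sketched as a small decision function fed by the `--neuron.metagraph_cached` flag. How the validator detects a local subtensor is not described here, so `has_local_subtensor` is a hypothetical input; the argparse usage is standard library behaviour (a dotted long option keeps the dot in its `dest`).

```python
import argparse


def use_cached_metagraph(metagraph_cached: bool, has_local_subtensor: bool) -> bool:
    """Cached sync when requested via the flag, or forced when there is no local
    subtensor (where an uncached sync would presumably be too slow)."""
    return metagraph_cached or not has_local_subtensor


parser = argparse.ArgumentParser()
parser.add_argument("--neuron.metagraph_cached", action="store_true",
                    help="Use metagraph.sync(cached=True) before weight setting.")
args = parser.parse_args([])                      # flag not passed
flag = getattr(args, "neuron.metagraph_cached")   # dest keeps the dot

print(use_cached_metagraph(flag, has_local_subtensor=True))   # False: uncached sync
print(use_cached_metagraph(flag, has_local_subtensor=False))  # True: forced cached
```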

* Incorporate __blocktime__ and remove asserts

* Refactor neuron_register to neuron_changes and add flag

* Use cached metagraph.sync()

* Remove unused print_neuron_stats flag

* Strip lead 0. from table float displays
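Stripping the leading zero saves a column per value in the stats table. A minimal formatting helper is sketched below; its name and signature are hypothetical.

```python
def format_fraction(x: float, digits: int = 3) -> str:
    """Format x with `digits` decimals and drop a leading zero if the formatted
    text has one: 0.123 -> '.123', -0.25 -> '-.25', 1.5 -> '1.500'."""
    text = f"{x:.{digits}f}"
    if text.startswith("0."):
        return text[1:]
    if text.startswith("-0."):
        return "-" + text[2:]
    return text


print(format_fraction(0.1234))    # .123
print(format_fraction(-0.25, 2))  # -.25
```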

* Increase synergy table display precision

* Increase synergy table display precision

* Format validator query response table

* Improve release scripts and adding some labels to dockerfile #1004  (#1005)

* Pinning requirements versions in requirements/prod.txt

* Improve release scripts. Generate a RELEASING.md document with release guidelines. Add more labels to the dockerfile

Removing dockerfile label bittensor.packages

* Removing dockerfile label bittensor.dependencies.versions.cudnn

* Removing file that came from a wrong rebase

* Renaming RELEASING.md to RELEASE_GUIDELINES.md

* Modifying release scripts so versioning is independent of the release process. Modifying RELEASE_GUIDELINES, adding more information

* Modifying RELEASE_GUIDELINES adding more information

* Fixing the versioning script. Modifying RELEASE_GUIDELINES.md

* Version: 3.5.0. Applying ./scripts/release/versioning.sh --update minor -A

Co-authored-by: Cameron Fairchild <cameron.fairchild@mail.utoronto.ca>
Co-authored-by: Eugene <etesting007@gmail.com>
Co-authored-by: Eugene-hu <85906264+Eugene-hu@users.noreply.github.com>
Co-authored-by: Mac Thrasher <95183714+quac88@users.noreply.github.com>
Co-authored-by: opentaco <opentaco@protonmail.com>
Co-authored-by: opentaco <93473497+opentaco@users.noreply.github.com>
Co-authored-by: Ala Shaabana <shaabana@gmail.com>
Co-authored-by: Ala Shaabana <ala@bittensor.com>
Co-authored-by: isabella618033 <49876827+isabella618033@users.noreply.github.com>
Co-authored-by: Unconst <32490803+unconst@users.noreply.github.com>
Co-authored-by: unconst <jake@bittensor.com>
Co-authored-by: Cameron Fairchild <cameron@opentensor.ai>
Co-authored-by: joeylegere <joeylegere@gmail.com>
Co-authored-by: Joey Legere <joey@opentensor.ai>
Co-authored-by: Adrian-Stefan Mares <36161392+adriansmares@users.noreply.github.com>
Eugene-hu added a commit that referenced this pull request Dec 13, 2022
* release/3.5.0 (#1006)

* [feature] external axon flags (#887)

* add external axon changes

* add defaults for new axon flags

* fix args to axon

* default to internal ip and port if not specified

* add new args and to defaults

* add axon unit tests

* add description for subtensor integration test

* move test to unit test

* create new test file
add/update copyright notices

* don't default to internal ip

* add tests for setting the full_address

* add tests for subtensor.serve w/external axon info

* allow external port config to be None

* switch to mock instead of patch

* fix test mocks

* change mock config create

* fix/add default config

* change asserts, add message

* fix check call args

* fix mock config set

* only call once

* fix help wording

* should be True

* [fix] fixes unstake with max-stake flag (#905)

* add equality to None to the balance class

* add tests for the None case

* local train bug fix (#906)

* [feature] [CUDA solver] Add multi-GPU and ask for CUDA during btcli run (#893)

* added cuda solver

* boost versions to fix pip error

* allow choosing device id

* fix solution check to use keccak

* adds params for cuda and dev_id to register

* list devices by name during selection

* add block number logging

* fix calculation of hashrate

* fix update interval default

* add --TPB arg to register

* add update_interval flag

* switch back to old looping/work structure

* change typing

* device count is a function

* stop early if wallet registered

* add update interval and num proc flag

* add better number output

* optimize multiproc cpu reg
keeping proc until solution

* fix test

* change import to cubit

* fix import and default

* up default
should have default in CLI call

* add comments about params

* fix config var access

* add cubit as extra

* handle stale pow differently
check registration after failure

* restrict number of processes for integration test

* fix stale check

* use wallet.is_registered instead

* attempt to fix test issue

* fix my test

* oops typo

* typo again ugh

* remove print out

* fix partly reg test

* fix if solution None

* fix test?

* fix patch

* add args for cuda to subtensor

* add cuda args to reregister call

* add to wallet register the cuda args

* fix refs and tests

* add for val test also

* fix tests with rereg

* fix patch for tests

* add mock_register to subtensor passed instead

* move register under the check for isregistered

* use patch obj instead

* fix patch object

* fix prompt

* remove unneeded if

* modify POW submit to use rolling submit again

* add backoff to block get from network

* add test for backoff get block

* suppress the dev id flag if not set

* remove dest so it uses first arg

* fix pow submit loop

* move registration status with

* fix max attempts check

* remove status in subtensor.register

* add submit status

* change to neuron get instead

* fix count

* try to patch live display

* fix patch

* .

* separate test cases

* add POWNotStale and tests

* add more test cases for block get with retry

* fix return to None

* fix arg order

* fix indent

* add test to verify solution is submitted

* fix mock call

* patch hex bytes instead

* typo :/

* fix print out for unstake

* fix indexing into mock call

* call indexing

* access dict not with dot

* fix other indent

* add CUDAException for cubit

* up cubit version

* [Feature] ask cuda during btcli run (#890)

* add ask for cuda reg config in btcli run

* suppress unset arg

* [Feature] [cuda solver] multi gpu (#891)

* change diff display out

* remove logging

* check cubit support in the check config

* allow 1 or more devices in flag

* cuda flag should be suppress

* modify how cpu count is found

* make a solver base class

* add a solverbase for CUDA

* use multi-process kernel launching, one per GPU

* move check under dot get accessor

* Feature/cuda solver multi gpu (#892)

* change diff display out

* remove logging

* check cubit support in the check config

* allow 1 or more devices in flag

* cuda flag should be suppress

* modify how cpu count is found

* make a solver base class

* add a solverbase for CUDA

* use multi-process kernel launching, one per GPU

* move check under dot get accessor

* add All gpus specification

* continue trying reg after Stale

* catch for OSX

* don't use qsize

* add test for continue after being stale

* patch get_nowait instead of qsize

* [Docs] Update old docs link to new link. Change discord invite to custom link (#915)

* Update old docs link to new one

This change deletes the old gitbooks documentation link and replaces it with the new one.

* fix discord links

Co-authored-by: Mac Thrasher <95183714+quac88@users.noreply.github.com>

* Fix for test_neuron.py (#917)

prevents downloading from huggingface

* [feature] add --seed option to regen_hotkey (#916)

* add seed option to regen hotkey

* make seed optional and fix docstring

* add tests for both coldkey and hotkey regen w/seed

* oops, make seed optional

* fix old test, add config.seed

* circle ci version update and fix (#920)

* Add test_phrases_split unit test

Asserts that randomly instantiated compact_topk encodings can be correctly decoded to recover the original topk_tensor.

* Update unravel_topk_token_phrases with faster implementation

Replaces .tensor_split() with block indexing to avoid extra copy operations.

* Rename test_phrases_split to test_random_topk_token_phrases

* Unit tests cleanup (#922)

* circle ci version update and fix

* Test clean up

* uncomment test and remove specific test

* remove loguru and fix flaky tests

* fix syncing

* removing tokenizer equivalence + some bug fixes

* moving old dataset test

* Deactivate test_random_topk_token_phrases unit test

* Create topk_tensor on origin device

* Normalization Update (#909)

* local train bug fix

* normalization update

* fix tests

* remove test

* updated normalization

* Naming changes, bug fixes

* subtensor update for max clip

* max weight to a million

* Fixes for ordering and comments

* additional tests

* string fix

* numerical stability and testing updates

* minor update for division by zero

* Naming and spacing fixes

* epsilon update

* small fix

* Adding development workflow documentation and script for bumping the version (#918)

BIT-582 Adding development workflow documentation and script for bumping the version

* Revert "Normalization Update (#909)"

This reverts commit 3990a28.

* Parachain registration (#912)

* removed ws assumption

* removing check

* never registered

* Fixed sched_getaffinity for mac osx

* Started adding parachain support

* [hot-fix] fix indent again. add test (#907)

fix indent again. add test

* Fixed registration check and first time registration

* Removed old entrypoint list structure

* Fixed unit tests

Co-authored-by: Eugene <etesting007@gmail.com>
Co-authored-by: Ala Shaabana <ala@bittensor.com>
Co-authored-by: Cameron Fairchild <cameron.fairchild@mail.utoronto.ca>

* Bit 583 memory optimization v4 (#929)

* set allowed receptor to be 0 in validator to not store any receptor

* max_active receptor to 0

* fix

* feature/BIT-579/Adding Prometheus (#928)

* BIT-582 Adding development workflow documentation and script for bumping the version

* BIT-579 Adding prometheus_client==0.14.1 to requirements

* BIT-579 Removing wandb defaults from sample_configs

* Revert "BIT-579 Removing wandb defaults from sample_configs"

This reverts commit 2940cc7.

* BIT-579 Starting prometheus code. Adding metric_exporter concept/element and its MetricsExporterFactory

* BIT-579 Adding prometheus_client==0.14.1 to requirements

* BIT-579 Removing wandb defaults from sample_configs

* Revert "BIT-579 Removing wandb defaults from sample_configs"

This reverts commit 2940cc7.

* BIT-579 Starting prometheus code. Adding metric_exporter concept/element and its MetricsExporterFactory

* Revert "BIT-579 Starting prometheus code. Adding metric_exporter concept/element and its MetricsExporterFactory"

This reverts commit 8742d7f.

* BIT-579 Adding _prometheus to bittensor

* BIT-579 Adding prometheus code to bittensor/_neuron/text/core_*

* BIT-579 Adding prometheus code to bittensor/_config/config_impl.py. Sends the config to the in-process prometheus server if it exists.

* BIT-579 Adding prometheus code to bittensor/_axon/*

* BIT-579 Adding prometheus code to bittensor/_dendrite/*

* BIT-579 Fixing syntax error

* BIT-579 Fixing missing import: time

* BIT-579 fixing typo

* BIT-579 fixing test: unit_tests/bittensor_tests/test_neuron.py

Co-authored-by: Unconst <32490803+unconst@users.noreply.github.com>

* Dendrite Text Generate (#941)

* adds generate to dendrite

* vune fixes

* extend readme

Co-authored-by: unconst <jake@bittensor.com>

* Subtensor and Normalization updates (#936)

* local train bug fix

* normalization update

* fix tests

* remove test

* updated normalization

* Naming changes, bug fixes

* subtensor update for max clip

* max weight to a million

* Fixes for ordering and comments

* additional tests

* string fix

* numerical stability and testing updates

* minor update for division by zero

* Naming and spacing fixes

* epsilon update

* small fix

* additional subtensor parameters

* remove print

* help string fixes

* Prometheus bug fix (#942)

* local train bug fix

* normalization update

* fix tests

* remove test

* updated normalization

* Naming changes, bug fixes

* subtensor update for max clip

* max weight to a million

* Fixes for ordering and comments

* additional tests

* string fix

* numerical stability and testing updates

* minor update for division by zero

* Naming and spacing fixes

* epsilon update

* small fix

* additional subtensor parameters

* remove print

* help string fixes

* small bug fix

* [Fix] only reregister if flag is set (#937)

* add test for expected reregister behaviour

* add fix

* pass passed args into earlier parse

* fix test by using args

* exit before actual register

* use strtobool

Co-authored-by: Unconst <32490803+unconst@users.noreply.github.com>

* [BIT 584] [feature] btcli register output stats not in place (#923)

* add flags for output_in_place during registration

* stop tracking best

* refactor registration logging output

* fix reregister from type bool

* change in_place and use_cuda to strtobool

* add param and defaults

* fix reference before assignment

* add new logger to cuda rege

* pass param to btcli register call

* oops

* fix init

* try slight timeout

* try fix

* oop

* ?

* fix use_cuda flag

* add test for new use_cuda flag setup

* use create pow to patch

* all no prompt dev id

* fix console.error

* use lower for str comparison

* call self register instead

* add test for wallet register call

* tests are for wallet reregister

* fix typo

* no self on top-level test

* fix tests?

* use reregister

* typo in test

* fix assert

* fix assert

* should be False

* fix time output to use timedelta

* add log verbose as option to reg output

* should be action

* fix typo

* add missing function arg

* fix spacing

* fix flags

* fix flags

* fix test

* should pass in args to config pre-parse

* use None instead of NA

Co-authored-by: isabella618033 <49876827+isabella618033@users.noreply.github.com>
Co-authored-by: Unconst <32490803+unconst@users.noreply.github.com>
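The "fix time output to use timedelta" change in #923 can be illustrated with a minimal sketch. It assumes elapsed registration time is tracked as a number of seconds; format_elapsed is a hypothetical helper, not bittensor's actual code.

```python
from datetime import timedelta


def format_elapsed(seconds: float) -> str:
    """Render an elapsed duration as H:MM:SS using timedelta's str() form."""
    # Round to whole seconds so the output carries no microsecond suffix.
    return str(timedelta(seconds=round(seconds)))


print(format_elapsed(3725.4))  # 1:02:05
```

Using timedelta avoids hand-rolled hour/minute arithmetic and keeps the display consistent across verbose and in-place output modes.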

* [Fix] multi cuda fix (#940)

* adjust nonce end calculation

* attempt to fix stop issue

* modify stop

* update nonce_start by correct amount

* fix nonce init to only random and update

* fix update amount

* add start values

* add test

* try different hashrate calc

* try EWMA for hash_rate

* oops bad import

* change name to worker

* extract helper and modify comment

* fix time now

* catch Full

* use a finished queue instead of times

* move constants to function params

* fix name of n

* fix verbose log

* allow --output_in_place

* fix n

* change to --no_output_in_place

* fix test
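The "try EWMA for hash_rate" commit in #940 smooths the noisy per-interval hash rate. The sketch below shows an exponentially weighted moving average under stated assumptions: ewma_hash_rate is a hypothetical name, and alpha=0.3 is an illustrative value, not the one the solver uses (#960 later adjusts alpha).

```python
from typing import Optional


def ewma_hash_rate(prev_rate: Optional[float], hashes: int, elapsed_s: float,
                   alpha: float = 0.3) -> Optional[float]:
    """Blend the latest hash-rate sample into an exponentially weighted moving average."""
    if elapsed_s <= 0:
        # Guard against division by zero when no time has passed.
        return prev_rate
    sample = hashes / elapsed_s
    if prev_rate is None:
        # First sample: nothing to smooth against yet.
        return sample
    return alpha * sample + (1 - alpha) * prev_rate


rate = None
for hashes, seconds in [(1000, 1.0), (2000, 1.0)]:
    rate = ewma_hash_rate(rate, hashes, seconds)
```

A larger alpha reacts faster to real changes in throughput; a smaller one hides jitter between update intervals.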

* Fix/pin wandb (#945)

pin below 0.13.4

* [Fix] change bellagene entrypoint string (#938)

don't add special case for network endpoint

Co-authored-by: Ala Shaabana <shaabana@gmail.com>

* Update dockerfile to current on dockerhub (#934)

* update dockerfile to current on dockerhub

* add netcat

* move nvm install up to take advantage of caching

* use pip

* add nvm install checksum

Co-authored-by: Ala Shaabana <shaabana@gmail.com>

* Minor fixes (#955)

minor fixes

Co-authored-by: unconst <jake@bittensor.com>

* Remove locals from cli and bittensor common (#947)

remove locals from cli and bittensor common

Co-authored-by: unconst <jake@bittensor.com>
Co-authored-by: Ala Shaabana <shaabana@gmail.com>

* [feature] Improve dataloader performance (#950)

* use threadpool and futures for dataloader

* add cli arg for max directories

Co-authored-by: Joey Legere <joey@opentensor.ai>
Co-authored-by: Ala Shaabana <shaabana@gmail.com>
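The "use threadpool and futures for dataloader" change in #950 fetches several directories concurrently instead of one at a time. A minimal sketch, assuming a blocking fetch callable per directory; fetch_directories and its parameters are hypothetical, with max_directories standing in for the new CLI cap.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, List


def fetch_directories(names: List[str], fetch: Callable[[str], bytes],
                      max_directories: int, max_workers: int = 8) -> Dict[str, bytes]:
    """Fetch up to `max_directories` directories concurrently with a thread pool."""
    selected = names[:max_directories]
    results: Dict[str, bytes] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to the directory it is fetching.
        futures = {pool.submit(fetch, name): name for name in selected}
        for future in as_completed(futures):
            # .result() re-raises any exception raised inside fetch().
            results[futures[future]] = future.result()
    return results


data = fetch_directories(["a", "b", "c"], lambda n: n.encode(), max_directories=2)
```

Threads suit this workload because the fetches are I/O bound, so the GIL is released while waiting on the network.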

* No set weights (#959)

* add no set weights

* add no_set_weights

* fix logging

* comments fix

Co-authored-by: unconst <jake@bittensor.com>

* Bit 590 backward fix (#957)

* init

* no local forward and remote forward overlap

* clean up

* saving remote

* fix local size mismatch

* clean up

* fix

* hidden state and causalLM determinism

* rm backward

* default to have dendrite backward

* [Fix] add perpet hash rate and adjust alpha (#960)

* perpet hash rate and adjust alpha

* move reg code to registration.py

* try different calc

* fix div by 0

* fix for cpu too

* fix race

* modify reg metrics output

* fix test mock

* oops

* [Fix] stake conversion issue (#958)

* modify balance arithmetic to cast to float first

* fix tests to model this behavior

* fix prompt spacing

* should be value error

* add test for eq balance other

* add comment to explain change

* fix tests

* .

* fix class

* balance fix

* try fix to staking

* fix comments

* add test for fix

* fix test

* fix impl

* add tests with bad types

* catch TypeError too and NotImplementedError

* catch TypeError

* .

* catch ValueError also
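The #958 stake conversion fix casts operands to float before balance arithmetic and rejects bad types. The sketch below is a hypothetical Balance class, not bittensor's implementation: it assumes amounts are stored as integer rao (1 TAO = 10**9 rao) and that plain numeric operands are TAO amounts.

```python
import numbers


class Balance:
    """Hypothetical balance storing an integer number of rao (1 TAO = 10**9 rao)."""

    RAO_PER_TAO = 10**9

    def __init__(self, rao: int):
        self.rao = int(rao)

    @property
    def tao(self) -> float:
        return self.rao / self.RAO_PER_TAO

    @classmethod
    def from_tao(cls, amount: float) -> "Balance":
        # Cast to float first so ints and floats share one conversion path.
        return cls(round(float(amount) * cls.RAO_PER_TAO))

    def _to_rao(self, other):
        """Return `other` in rao, or None if it cannot be read as an amount."""
        if isinstance(other, Balance):
            return other.rao
        if isinstance(other, bool) or not isinstance(other, numbers.Real):
            return None  # reject strings, None and other unsupported types
        try:
            # Assumption: plain numbers are TAO amounts.
            return round(float(other) * self.RAO_PER_TAO)
        except (ValueError, OverflowError):
            return None  # NaN or infinity

    def __add__(self, other):
        rao = self._to_rao(other)
        if rao is None:
            return NotImplemented  # Python then raises TypeError
        return Balance(self.rao + rao)

    __radd__ = __add__

    def __sub__(self, other):
        rao = self._to_rao(other)
        if rao is None:
            return NotImplemented
        return Balance(self.rao - rao)
```

Returning NotImplemented rather than raising lets Python produce the standard TypeError for unsupported operands.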

* initial commit

* fix manager server no return

* Dasyncio (#967)

* initial commit

* fix manager server no return

Co-authored-by: unconst <jake@bittensor.com>

* Update __init__.py

* Moving to release

* Release 3.4.2 (#969)

* initial commit

* fix manager server no return

* Moving to release

Co-authored-by: unconst <jake@bittensor.com>

* fix failing test_forward_priority_2nd_request_timeout

* Decrease validator moving average window

Decrease validator moving average window from 20 (alpha=0.05) to 10 (alpha=0.1) steps. This parameter could probably eventually be set to alpha=0.2.

The current 20-step window means that a server model change will take 20 steps * ~250 blocks/epoch * 12 sec = approx. 17 hours to reach full score in the validator neuron stats, because of the moving average slowly weighing in new model performance. 17 hours is probably too long, and it is also likely affecting registration immunity.
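The arithmetic behind the window change can be checked with a short sketch, assuming the usual approximation alpha = 1 / window for an exponential moving average and the ~250 blocks/epoch and 12 s block time quoted above. The function names are illustrative only.

```python
def ema_alpha(window_steps: int) -> float:
    """Smoothing factor for an EMA whose effective window is `window_steps` updates."""
    return 1.0 / window_steps


def hours_to_converge(window_steps: int, blocks_per_epoch: int = 250,
                      block_time_s: float = 12.0) -> float:
    """Rough time for the EMA to fully weigh in a change, as estimated above."""
    return window_steps * blocks_per_epoch * block_time_s / 3600.0


def ema_update(prev: float, sample: float, alpha: float) -> float:
    # The new value moves toward the latest sample by a fraction alpha.
    return (1 - alpha) * prev + alpha * sample


old_hours = hours_to_converge(20)  # 20 * 250 * 12 s = 60000 s, about 16.7 h
new_hours = hours_to_converge(10)  # about 8.3 h
```

Halving the window halves the time a server model change needs to show up fully in the validator stats.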

* Release 3.4.2 (#972)

* remove test_receptor test

* fix tests

Co-authored-by: unconst <jake@bittensor.com>

* No version checking (#974)

* no version checking

* fix integration tests

* remove print

Co-authored-by: Thebes <jake@bittensor.com>

* Promo suffix (#977)

* initial commit

* promo change to axon and dendrite

Co-authored-by: Thebes <jake@bittensor.com>

* Validator exit (#980)

* remove test_receptor test

* fix tests

* fix validator exit

Co-authored-by: unconst <jake@bittensor.com>

* Support arbitrary gRPC request metadata order (#976)

* Format AuthInterceptor using black

* Parse request metadata as key value pairs

* Use request method to black list calls

* Fix request type provided on backward

* Add type hints

* Refactor signature parsing
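#976 parses request metadata as key value pairs so the interceptor no longer depends on field order. In gRPC Python, a server interceptor sees metadata via handler_call_details.invocation_metadata as a sequence of (key, value) pairs. The sketch below assumes that shape; the key names in REQUIRED_KEYS and the function name are hypothetical.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical header names, for illustration only.
REQUIRED_KEYS = ("bittensor-signature", "bittensor-version")


def parse_metadata(metadata: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Collect request metadata into a dict so the order of entries does not matter."""
    parsed = dict(metadata)
    missing = [key for key in REQUIRED_KEYS if key not in parsed]
    if missing:
        raise ValueError(f"missing metadata keys: {missing}")
    return parsed


meta = parse_metadata([("bittensor-version", "1"), ("bittensor-signature", "sig")])
```

Looking fields up by key instead of by position means a client that sends headers in a different order is still authenticated correctly.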

* [Fix] Dockerfile: clone the repo to install instead (#984)

* clone the repo to install instead

* no cd

Co-authored-by: Ala Shaabana <shaabana@gmail.com>

* Update bittensor version to 3.4.3

(cherry picked from commit 43110cf)

* Catch precision errors in synapse forward responses

Response serialization/deserialization introduces precision errors that may cause probability sums to exceed permissible boundaries. Now checks to see if precision errors are within established absolute tolerance (atol = 1e-6 currently).

(cherry picked from commit d96b625)
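The tolerance check described above can be sketched in a few lines. This assumes the probabilities arrive as plain floats; check_probability_sum is a hypothetical name, and only the atol value comes from the commit message.

```python
import math
from typing import Sequence

ATOL = 1e-6  # absolute tolerance quoted in the commit message


def check_probability_sum(probs: Sequence[float], atol: float = ATOL) -> None:
    """Raise if probabilities sum above 1 by more than serialization noise."""
    # fsum avoids adding its own rounding error on top of the serialization error.
    total = math.fsum(probs)
    if total > 1.0 + atol:
        raise ValueError(f"probabilities sum to {total}, exceeding 1 + {atol}")
```

Sums a hair above 1 caused by float round-trips pass, while genuinely invalid responses are still rejected.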

* Comment update for tensor size

(cherry picked from commit 6dd06f9)

* Fix/allow synapse all (#988)

* allow set synapse All using flag

* add test

* use dot get

* Mark registration threads as daemons (#998)

Make solver processes daemons

* Add response table to validator debugging

* Add return_ops to parameters

* Decode context and answer

* Add validation length parameter

* Itemize probabilities

* Add debug prints

* Change table formatting

* Add extra tasks to response table

* Debug add print

* Remove batch_size parameter

* Switch modulo order

* Modify table format

* Modify table format

* Modify table format

* Modify table format

* Try table print to catch rich errors

* Modify table title and caption

* Add shapley_values_nxt column

* Refactor response table functions

* Correct responsive count

* Format table caption

* [BIT-599] Validator weight setting improvements (#1000)

* Remove responsive prioritization from validator weight calculation

Weight setting limitation based on responsive prioritization is no longer needed, so it has been removed. Part of the original intent of the limitation was a chain storage concern about setting too many weights, but since the network now has a very high (> 90%) response rate, the concern is moot.

The downside of applying the limitation is that validators with longer step size artificially set fewer weights simply because they could not query the network in a single epoch. This introduced a counterproductive weight setting variability across validators.

Responsiveness is scored in any case via a Shapley EMA zero-push penalty, so setting weights on non-responsive nodes still relays an accurate score.
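A zero-push penalty can be sketched as an EMA update in which a missing response counts as a score of 0. This is an illustration under assumptions: per-UID scores are plain floats standing in for Shapley values, and update_scores and alpha=0.1 are hypothetical.

```python
from typing import Dict, Iterable, Optional


def update_scores(ema: Dict[int, float], queried: Iterable[int],
                  scores: Dict[int, Optional[float]], alpha: float = 0.1) -> None:
    """Update per-UID EMA scores in place; a non-responsive UID is scored as 0."""
    for uid in queried:
        sample = scores.get(uid)
        if sample is None:
            sample = 0.0  # zero-push penalty for a missing response
        ema[uid] = (1 - alpha) * ema.get(uid, 0.0) + alpha * sample


ema = {1: 1.0, 2: 1.0}
update_scores(ema, [1, 2], {1: 1.0})  # UID 2 did not respond
```

Because non-responders decay toward zero, weights set on them already reflect their unreliability, which is why the separate responsiveness filter is redundant.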

* Move metagraph_sync just before weight setting

The metagraph is one epoch (~250 blocks) outdated by the time weight setting is done. This means that newly registered keys will have weights set based on the stats of the key just previously registered on the same UID.

Weights will now be set more accurately when the metagraph is updated just before weight setting, which will reset stats of a UID that has changed ownership.
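Resetting stats when a UID changes ownership can be sketched by comparing the hotkeys seen at the last sync with the current ones. This is a minimal illustration; sync_neuron_stats and the dict-of-dicts stats layout are assumptions, not the validator's actual data structures.

```python
from typing import Dict, List


def sync_neuron_stats(stats: Dict[int, dict], known_hotkeys: List[str],
                      current_hotkeys: List[str]) -> List[int]:
    """Reset stats for UIDs whose hotkey changed since the last sync; return those UIDs."""
    changed = []
    for uid, hotkey in enumerate(current_hotkeys):
        previous = known_hotkeys[uid] if uid < len(known_hotkeys) else None
        if hotkey != previous:
            stats.pop(uid, None)  # drop stats earned by the previous owner
            changed.append(uid)
    known_hotkeys[:] = current_hotkeys  # remember the new ownership
    return changed


stats = {0: {"shapley": 1.0}, 1: {"shapley": 2.0}}
known = ["a", "b"]
changed = sync_neuron_stats(stats, known, ["a", "c", "d"])
```

Running this right before weight setting means a newly registered hotkey starts from a clean slate instead of inheriting its predecessor's scores.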

* Update neuron stats table caption

* Add metagraph register to validator

Tracks new hotkey registrations on UIDs, and records statistics of previous hotkey.

* Update validator epoch conditions

Epoch overrun beyond blocks_per_epoch no longer needed, since weight setting no longer restricted to epoch responsives.

Normal epoch duration is blocks_per_epoch if all UIDs have been queried. The validator tries to query each UID at least once (this assumes the nucleus samples without replacement), but keeps a minimum epoch duration of blocks_per_epoch * 12 sec block_period, so that invalid block readings during a subtensor outage cannot cause fast repeated weight setting.

Also logs the while-loop conditions at each step.
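One reading of the epoch conditions above is that the epoch stays open until both blocks_per_epoch blocks and blocks_per_epoch * block_period seconds of wall-clock time have passed. The sketch below encodes only that minimum-duration rule under this assumption; the UID-coverage condition is omitted, and the function name and arguments are hypothetical.

```python
BLOCK_PERIOD_S = 12.0  # block period quoted in the commit message


def epoch_should_continue(start_block: int, current_block: int,
                          start_time: float, now: float,
                          blocks_per_epoch: int,
                          block_period_s: float = BLOCK_PERIOD_S) -> bool:
    """Keep querying until both the block count and the wall-clock minimum are met."""
    blocks_pending = current_block < start_block + blocks_per_epoch
    time_pending = now - start_time < blocks_per_epoch * block_period_s
    # Either condition alone keeps the epoch open, so an invalid block reading
    # cannot end the epoch before the minimum duration has elapsed.
    return blocks_pending or time_pending
```

The wall-clock guard is what prevents rapid repeated weight setting if the block number jumps unexpectedly during a subtensor outage.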

* Cast phrase_cross_entropy tokens to int

* Assert self.metagraph.n == len(self.neuron_hotkeys)

* Round before casting to int in phrase_cross_entropy
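Rounding before casting matters because int() truncates toward zero, so a token id deserialized as 41.9999999 would silently become 41. The sketch below shows the idea on plain Python floats; tokens_to_int is a hypothetical helper, and the tensor equivalent would be along the lines of t.round().long().

```python
from typing import Iterable, List


def tokens_to_int(values: Iterable[float]) -> List[int]:
    """Convert float-encoded token ids back to ints, rounding first."""
    # round() with no ndigits returns an int, nearest rather than truncated.
    return [round(v) for v in values]


truncated = int(41.9999999)  # 41: truncation picks the wrong token
fixed = tokens_to_int([41.9999999, 7.0000001])
```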

* Log epoch while condition details

* Update validator weights console message

* Consume validator nucleus UID queue fully

Since epochs now conclude at blocks_per_epoch, many validators will have unqueried UIDs still in the queue. It is better that these are retained and consumed in the next epoch, since this ensures all UIDs are queried over a cycle that may span successive epochs.

The inefficiency introduced is that if the queue has to be refilled in the same epoch, then a UID may be queried twice, although this is preferred over the risk of not querying a UID at all if the remaining queue is not fully consumed even if another epoch is required.
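A queue that survives across epochs and is refilled only when empty can be sketched with collections.deque. The sketch assumes the nucleus draws a shuffled permutation of all UIDs; next_uids and its parameters are hypothetical.

```python
import random
from collections import deque
from typing import Deque, List


def next_uids(queue: Deque[int], n_uids: int, batch: int,
              rng: random.Random) -> List[int]:
    """Pop `batch` UIDs, refilling with a fresh shuffled permutation only when empty."""
    picked: List[int] = []
    while len(picked) < batch:
        if not queue:
            # Refill only after the previous permutation is fully consumed.
            perm = list(range(n_uids))
            rng.shuffle(perm)
            queue.extend(perm)
        picked.append(queue.popleft())
    return picked


rng = random.Random(0)
queue: Deque[int] = deque()       # kept across epochs, never reset
first = next_uids(queue, 5, 3, rng)   # epoch 1 ends with 2 UIDs still queued
second = next_uids(queue, 5, 2, rng)  # epoch 2 consumes the leftovers
```

If a refill happens mid-batch, a UID can be drawn twice, which is the inefficiency the commit accepts in exchange for never skipping a UID.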

* No cache: metagraph.sync(cached=False)

Turns off the metagraph cache. Updating the metagraph before weight setting now takes ~10 minutes (with a local subtensor), but typically provides more recent values that can catch UID ownership changes.

* Add validator --neuron.metagraph_cached

When flag --neuron.metagraph_cached is set the validator uses metagraph.sync(cached=True).

* Record block before validator neuron register entry

* Force validators without local subtensor to use cached metagraph
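The interplay between --neuron.metagraph_cached and the "no local subtensor" rule can be sketched as follows. This is an illustration, not the validator's code: use_cached_metagraph is hypothetical, and it assumes a local subtensor is identified by the network name "local". Note that argparse keeps the dot in the destination name, so the value is read with getattr().

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    parser.add_argument("--neuron.metagraph_cached", action="store_true",
                        help="use metagraph.sync(cached=True) before setting weights")
    return parser


def use_cached_metagraph(metagraph_cached: bool, network: str) -> bool:
    """Use the cache if asked to, or whenever no local subtensor is available."""
    # Assumption: a local subtensor is identified by the network name "local".
    has_local_subtensor = network == "local"
    return metagraph_cached or not has_local_subtensor


args = build_parser().parse_args(["--neuron.metagraph_cached"])
cached = use_cached_metagraph(getattr(args, "neuron.metagraph_cached"), network="local")
```

Forcing the cached path for remote endpoints avoids a slow uncached sync over the network right before weight setting.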

* Incorporate __blocktime__ and remove asserts

* Refactor neuron_register to neuron_changes and add flag

* Use cached metagraph.sync()

* Remove unused print_neuron_stats flag

* Strip leading 0. from table float displays
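Stripping the leading 0 saves a column per cell in wide stats tables. A minimal formatting sketch, with compact_float as a hypothetical helper:

```python
def compact_float(x: float, precision: int = 4) -> str:
    """Format a float and drop the leading '0' before the decimal point."""
    text = f"{x:.{precision}f}"
    if text.startswith("0."):
        return text[1:]
    if text.startswith("-0."):
        return "-" + text[2:]
    return text


cells = [compact_float(0.1234), compact_float(-0.5, 2), compact_float(12.5, 1)]
```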

* Increase synergy table display precision

* Increase synergy table display precision

* Format validator query response table

* Improve release scripts and adding some labels to dockerfile #1004  (#1005)

* Pinning requirements versions in requirements/prod.txt

* Improve release scripts. Generating RELEASING.md document for release guidelines. Adding more labels to dockerfile

Removing dockerfile label bittensor.packages

* Removing dockerfile label bittensor.dependencies.versions.cudnn

* Removing file that came from a wrong rebase

* Renaming RELEASING.md to RELEASE_GUIDELINES.md

* Modifying release scripts so versioning is independent of the release process. Modifying RELEASE_GUIDELINES adding more information

* Modifying RELEASE_GUIDELINES adding more information

* Fixing the versioning script. Modifying RELEASE_GUIDELINES.md

* Version: 3.5.0. Applying ./scripts/release/versioning.sh --update minor -A

Co-authored-by: Cameron Fairchild <cameron.fairchild@mail.utoronto.ca>
Co-authored-by: Eugene <etesting007@gmail.com>
Co-authored-by: Eugene-hu <85906264+Eugene-hu@users.noreply.github.com>
Co-authored-by: Mac Thrasher <95183714+quac88@users.noreply.github.com>
Co-authored-by: opentaco <opentaco@protonmail.com>
Co-authored-by: opentaco <93473497+opentaco@users.noreply.github.com>
Co-authored-by: Ala Shaabana <shaabana@gmail.com>
Co-authored-by: Ala Shaabana <ala@bittensor.com>
Co-authored-by: isabella618033 <49876827+isabella618033@users.noreply.github.com>
Co-authored-by: Unconst <32490803+unconst@users.noreply.github.com>
Co-authored-by: unconst <jake@bittensor.com>
Co-authored-by: Cameron Fairchild <cameron@opentensor.ai>
Co-authored-by: joeylegere <joeylegere@gmail.com>
Co-authored-by: Joey Legere <joey@opentensor.ai>
Co-authored-by: Adrian-Stefan Mares <36161392+adriansmares@users.noreply.github.com>

* [hotfix] pin scalecodec lower (#1013)

* Modifying Dockerfile to build bittensor from repository version and not from github (#1016)

* Modifying Dockerfile to build bittensor from repository version and not git clone the github repo. Updating version to 3.5.1

* CircleCI check to check that version was updated

* CircleCI check to check that version was updated

* CircleCI check to check that version was updated

* CircleCI check to check that version was updated

* CircleCI check to check that version was updated

* Updating version to 3.6.0 running './scripts/release/versioning.sh --update minor -A'

* [BIT-602] Update scaling power from subtensor (#1027)

Update power from subtensor always

Since self.config.nucleus.scaling_law_power is updated from default -1 at nucleus init, the condition here at epoch start needs to be removed and has to update from subtensor always.

* Fix extras for wheel package

Co-authored-by: Eduardo García <garciaruiz.edu+github@gmail.com>
Co-authored-by: Cameron Fairchild <cameron.fairchild@mail.utoronto.ca>
Co-authored-by: Mac Thrasher <95183714+quac88@users.noreply.github.com>
Co-authored-by: opentaco <opentaco@protonmail.com>
Co-authored-by: opentaco <93473497+opentaco@users.noreply.github.com>
Co-authored-by: Ala Shaabana <shaabana@gmail.com>
Co-authored-by: Ala Shaabana <ala@bittensor.com>
Co-authored-by: isabella618033 <49876827+isabella618033@users.noreply.github.com>
Co-authored-by: Unconst <32490803+unconst@users.noreply.github.com>
Co-authored-by: unconst <jake@bittensor.com>
Co-authored-by: Cameron Fairchild <cameron@opentensor.ai>
Co-authored-by: joeylegere <joeylegere@gmail.com>
Co-authored-by: Joey Legere <joey@opentensor.ai>
Co-authored-by: Adrian-Stefan Mares <36161392+adriansmares@users.noreply.github.com>
Co-authored-by: Eduardo <garciaruiz.edu@gmail.com>
Eugene-hu added a commit that referenced this pull request Jan 24, 2023
* release/3.5.0 (#1006)

* [feature] external axon flags (#887)

* add external axon changes

* add defaults for new axon flags

* fix args to axon

* default to internal ip and port if not specified

* add new args and to defaults

* add axon unit tests

* add description for subtensor integration test

* move test to unit test

* create new test file
add/update copyright notices

* don't default to internal ip

* add tests for setting the full_address

* add tests for subtensor.serve w/external axon info

* allow external port config to be None

* switch to mock instead of patch

* fix test mocks

* change mock config create

* fix/add default config

* change asserts add message

* fix check call args

* fix mock config set

* only call once

* fix help wording

* should be True

* [fix] fixes unstake with max-stake flag (#905)

* add equality to None to the balance class

* add tests for the None case
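Comparing a balance with None (as an unset max-stake amount might be) should yield False rather than raise. A pared-down sketch of that behaviour, using a hypothetical minimal Balance class rather than bittensor's own:

```python
class Balance:
    """Hypothetical minimal balance type, only to illustrate None comparisons."""

    def __init__(self, rao: int):
        self.rao = rao

    def __eq__(self, other):
        if other is None:
            # A missing amount never equals a real balance; no exception is raised.
            return False
        if isinstance(other, Balance):
            return self.rao == other.rao
        return NotImplemented


unset = Balance(5) == None  # noqa: E711, deliberately exercising __eq__
```

Python derives != from __eq__, so Balance(5) != None is True without a separate __ne__.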

* local train bug fix (#906)

* [feature] [CUDA solver] Add multi-GPU and ask for CUDA during btcli run (#893)

* added cuda solver

* boost versions to fix pip error

* allow choosing device id

* fix solution check to use keccak

* adds params for cuda and dev_id to register

* list devices by name during selection

* add block number logging

* fix calculation of hashrate

* fix update interval default

* add --TPB arg to register

* add update_interval flag

* switch back to old looping/work structure

* change typing

* device count is a function

* stop early if wallet registered

* add update interval and num proc flag

* add better number output

* optimize multiproc cpu reg
keeping proc until solution

* fix test

* change import to cubit

* fix import and default

* up default
should have default in CLI call

* add comments about params

* fix config var access

* add cubit as extra

* handle stale pow differently
check registration after failure

* restrict number of processes for integration test

* fix stale check

* use wallet.is_registered instead

* attempt to fix test issue

* fix my test

* oops typo

* typo again ugh

* remove print out

* fix partly reg test

* fix if solution None

* fix test?

* fix patch

* add args for cuda to subtensor

* add cuda args to reregister call

* add to wallet register the cuda args

* fix refs and tests

* add for val test also

* fix tests with rereg

* fix patch for tests

* add mock_register to subtensor passed instead

* move register under the check for isregistered

* use patch obj instead

* fit patch object

* fix prompt

* remove unneeded if

* modify POW submit to use rolling submit again

* add backoff to block get from network

* add test for backoff get block

* suppress the dev id flag if not set

* remove dest so it uses first arg

* fix pow submit loop

* move registration status with

* fix max attempts check

* remove status in subtensor.register

* add submit status

* change to neuron get instead

* fix count

* try to patch live display

* fix patch

* .

* separate test cases

* add POWNotStale and tests

* add more test cases for block get with retry

* fix return to None

* fix arg order

* fix indent

* add test to verify solution is submitted

* fix mock call

* patch hex bytes instead

* typo :/

* fix print out for unstake

* fix indexing into mock call

* call indexing

* access dict not with dot

* fix other indent

* add CUDAException for cubit

* up cubit version

* [Feature] ask cuda during btcli run (#890)

* add ask for cuda reg config in btcli run

* suppress unset arg

* [Feature] [cuda solver] multi gpu (#891)

* change diff display out

* remove logging

* check cubit support in the check config

* allow 1 or more devices in flag

* cuda flag should be suppressed

* modify how cpu count is found

* make a solver base class

* add a solverbase for CUDA

* use multi-process kernel launching, one per GPU

* move check under dot get accessor

* Feature/cuda solver multi gpu (#892)

* change diff display out

* remove logging

* check cubit support in the check config

* allow 1 or more devices in flag

* cuda flag should be suppressed

* modify how cpu count is found

* make a solver base class

* add a solverbase for CUDA

* use multi-process kernel launching, one per GPU

* move check under dot get accessor

* add All gpus specification

* continue trying reg after Stale

* catch for OSX

* don't use qsize

* add test for continue after being stale

* patch get_nowait instead of qsize

* [Docs] Update old docs link to new link. Change discord invite to custom link (#915)

* Update old docs link to new one

This change deletes the old gitbooks documentation link and replaces it with the new one.

* fix discord links

Co-authored-by: Mac Thrasher <95183714+quac88@users.noreply.github.com>

* Fix for test_neuron.py (#917)

prevents downloading from huggingface

* [feature] add --seed option to regen_hotkey (#916)

* add seed option to regen hotkey

* make seed optional and fix docstring

* add tests for both coldkey and hotkey regen w/seed

* oops, make seed optional

* fix old test, add config.seed

* circle ci version update and fix (#920)

* Add test_phrases_split unit test

Asserts that randomly instantiated compact_topk encodings can be correctly decoded to recover the original topk_tensor.

* Update unravel_topk_token_phrases with faster implementation

Replaces .tensor_split() with block indexing to avoid extra copy operations.

* Rename test_phrases_split to test_random_topk_token_phrases

* Unit tests cleanup (#922)

* circle ci version update and fix

* Test clean up

* uncomment test and remove specific test

* remove loguru and fix flaky tests

* fix syncing

* removing tokenizer equivalence + some bug fixes

* moving old dataset test

* Deactivate test_random_topk_token_phrases unit test

* Create topk_tensor on origin device

* Normalization Update (#909)

* local train bug fix

* normalization update

* fix tests

* remove test

* updated normalization

* Naming changes, bug fixes

* subtensor update for max clip

* max weight to a million

* Fixes for ordering and comments

* additional tests

* string fix

* numerical stability and testing updates

* minor update for division by zero

* Naming and spacing fixes

* epsilon update

* small fix

* Adding development workflow documentation and script for bumping the version (#918)

BIT-582 Adding development workflow documentation and script for bumping the version

* Revert "Normalization Update (#909)"

This reverts commit 3990a28.

* Parachain registration (#912)

* removed ws assumption

* removing check

* never registered

* Fixed sched_getaffinity for mac osx

* Started adding parachain support

* [hot-fix] fix indent again. add test (#907)

fix indent again. add test

* Fixed registration check and first time registration

* Removed old entrypoint list structure

* Fixed unit tests

Co-authored-by: Eugene <etesting007@gmail.com>
Co-authored-by: Ala Shaabana <ala@bittensor.com>
Co-authored-by: Cameron Fairchild <cameron.fairchild@mail.utoronto.ca>

* Bit 583 memory optimization v4 (#929)

* set allowed receptor to be 0 in validator to not store any receptor

* max_active receptor to 0

* fix

* feature/BIT-579/Adding Prometheus (#928)

* BIT-582 Adding development workflow documentation and script for bumping the version

* BIT-579 Adding prometheus_client==0.14.1 to requirements

* BIT-579 Removing wandb defaults from sample_configs

* Revert "BIT-579 Removing wandb defaults from sample_configs"

This reverts commit 2940cc7.

* BIT-579 Starting prometheus code. Adding metric_exporter concept/element and its MetricsExporterFactory

* BIT-579 Adding prometheus_client==0.14.1 to requirements

* BIT-579 Removing wandb defaults from sample_configs

* Revert "BIT-579 Removing wandb defaults from sample_configs"

This reverts commit 2940cc7.

* BIT-579 Starting prometheus code. Adding metric_exporter concept/element and its MetricsExporterFactory

* Revert "BIT-579 Starting prometheus code. Adding metric_exporter concept/element and its MetricsExporterFactory"

This reverts commit 8742d7f.

* BIT-579 Adding _prometheus to bittensor

* BIT-579 Adding prometheus code to bittensor/_neuron/text/core_*

* BIT-579 Adding prometheus code to bittensor/_config/config_impl.py. Sends the config to the in-process prometheus server if it exists.

* BIT-579 Adding prometheus code to bittensor/_axon/*

* BIT-579 Adding prometheus code to bittensor/_dendrite/*

* BIT-579 Fixing syntax error

* BIT-579 Fixing missing import: time

* BIT-579 fixing typo

* BIT-579 fixing test: unit_tests/bittensor_tests/test_neuron.py

Co-authored-by: Unconst <32490803+unconst@users.noreply.github.com>

* Dendrite Text Generate (#941)

* adds generate to dendrite

* vune fixes

* extend readme

Co-authored-by: unconst <jake@bittensor.com>

* Subtensor and Normalization updates (#936)

* local train bug fix

* normalization update

* fix tests

* remove test

* updated normalization

* Naming changes, bug fixes

* subtensor update for max clip

* max weight to a million

* Fixes for ordering and comments

* additional tests

* string fix

* numerical stability and testing updates

* minor update for division by zero

* Naming and spacing fixes

* epsilon update

* small fix

* additional subtensor parameters

* remove print

* help string fixes

* Prometheus bug fix (#942)

* local train bug fix

* normalization update

* fix tests

* remove test

* updated normalization

* Naming changes, bug fixes

* subtensor update for max clip

* max weight to a million

* Fixes for ordering and comments

* additional tests

* string fix

* numerical stability and testing updates

* minor update for division by zero

* Naming and spacing fixes

* epsilon update

* small fix

* additional subtensor parameters

* remove print

* help string fixes

* small bug fix

* [Fix] only reregister if flag is set (#937)

* add test for expected reregister behaviour

* add fix

* pass passed args into earlier parse

* fix test by using args

* exit before actual register

* use strtobool

Co-authored-by: Unconst <32490803+unconst@users.noreply.github.com>

* [BIT 584] [feature] btcli register output stats not in place (#923)

* add flags for output_in_place during registration

* stop tracking best

* refactor registration logging output

* fix reregister from type bool

* change in_place and use_cuda to strtobool

* add param and defaults

* fix reference before assignment

* add new logger to cuda rege

* pass param to btcli register call

* oops

* fix init

* try slight timeout

* try fix

* oop

* ?

* fix use_cuda flag

* add test for new use_cuda flag setup

* use create pow to patch

* all no prompt dev id

* fix console.error

* use lower for str comparison

* call self register instead

* add test for wallet register call

* tests are for wallet reregister

* fix typo

* no self on top-level test

* fix tests?

* use reregister

* typo in test

* fix assert

* fix assert

* should be False

* fix time output to use timedelta

* add log verbose as option to reg output

* should be action

* fix typo

* add missing function arg

* fix spacing

* fix flags

* fix flags

* fix test

* should pass in args to config pre-parse

* use None instead of NA

Co-authored-by: isabella618033 <49876827+isabella618033@users.noreply.github.com>
Co-authored-by: Unconst <32490803+unconst@users.noreply.github.com>

* [Fix] multi cuda fix (#940)

* adjust nonce end calculation

* attempt to fix stop issue

* modify stop

* update nonce_start by correct amount

* fix nonce init to only random and update

* fix update amount

* add start values

* add test

* try different hashrate calc

* try EWMA for hash_rate

* oops bad import

* change name to worker

* extract helper and modify comment

* fix time now

* catch Full

* use a finished queue instead of times

* move constants to function params

* fix name of n

* fix verbose log

* allow --output_in_place

* fix n

* change to --no_output_in_place

* fix test

* Fix/pin wandb (#945)

pin below 0.13.4

* [Fix] change bellagene entrypoint string (#938)

don't add special case for network endpoint

Co-authored-by: Ala Shaabana <shaabana@gmail.com>

* Update dockerfile to current on dockerhub (#934)

* update dockerfile to current on dockerhub

* add netcat

* move nvm install up to take advantage of caching

* use pip

* add nvm install checksum

Co-authored-by: Ala Shaabana <shaabana@gmail.com>

* Minor fixes (#955)

minor fixes

Co-authored-by: unconst <jake@bittensor.com>

* Remove locals from cli and bittensor common (#947)

remove locals from cli and bittensor common

Co-authored-by: unconst <jake@bittensor.com>
Co-authored-by: Ala Shaabana <shaabana@gmail.com>

* [feature] Improve dataloader performance (#950)

* use threadpool and futures for dataloader

* add cli arg for max directories

Co-authored-by: Joey Legere <joey@opentensor.ai>
Co-authored-by: Ala Shaabana <shaabana@gmail.com>

* No set weights (#959)

* add no set weights

* add no_set_weights

* fix logging

* comments fix

Co-authored-by: unconst <jake@bittensor.com>

* Bit 590 backward fix (#957)

* init

* no local forward and remote forward overlap

* clean up

* saving remote

* fix local size mismatch

* clean up

* fix

* hidden state and causalLM determinism

* rm backward

* default to have dendrite backward

* [Fix] add perpet hash rate and adjust alpha (#960)

* perpet hash rate and adjust alpha

* move reg code to registration.py

* try different calc

* fix div by 0

* fix for cpu too

* fix race

* modify reg metrics output

* fix test mock

* oops

* [Fix] stake conversion issue (#958)

* modify balance arithmetic to cast to float first

* fix tests to model this behavior

* fix prompt spacing

* should be value error

* add test for eq balance other

* add comment to explain change

* fix tests

* .

* fix class

* balance fix

* try fix to staking

* fix comments

* add test for fix

* fix test

* fix impl

* add tests with bad types

* catch TypeError and NotImplementedError too

* catch TypeError

* .

* catch ValueError also

* initial commit

* fix manager server no return

* Dasyncio (#967)

* initial commit

* fix manager server no return

Co-authored-by: unconst <jake@bittensor.com>

* Update __init__.py

* Moving to release

* Release 3.4.2 (#969)

* initial commit

* fix manager server no return

* Moving to release

Co-authored-by: unconst <jake@bittensor.com>

* fix failing test_forward_priority_2nd_request_timeout

* Decrease validator moving average window

Decrease validator moving average window from 20 (alpha=0.05) to 10 (alpha=0.1) steps. This parameter could probably eventually be set to alpha=0.2.

The current 20-step window means that a server model change will take 20 steps * ~250 blocks/epoch * 12 sec = approx. 17 hours to reach full score in the validator neuron stats, because of the moving average slowly weighing in new model performance. 17 hours is probably too long, and it is also likely affecting registration immunity.
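The arithmetic above can be checked with a short sketch. It assumes the validator's moving average has the standard form `ema = (1 - alpha) * ema + alpha * sample` and reads the window as 1/alpha steps. The helper names are illustrative only. Under this form, a step change in score is only about two thirds reflected after 1/alpha steps, so "full score" is an approximation:

```python
def ema_update(prev: float, sample: float, alpha: float) -> float:
    """One exponential-moving-average step with smoothing factor alpha."""
    return (1 - alpha) * prev + alpha * sample


def window_hours(steps: int, blocks_per_epoch: int = 250, block_seconds: int = 12) -> float:
    """Wall-clock hours spanned by `steps` validator epochs."""
    return steps * blocks_per_epoch * block_seconds / 3600


def share_after(steps: int, alpha: float) -> float:
    """Fraction of a new constant score (1.0) reflected after `steps` updates starting from 0.0."""
    value = 0.0
    for _ in range(steps):
        value = ema_update(value, 1.0, alpha)
    return value


for alpha in (0.05, 0.1, 0.2):
    steps = round(1 / alpha)
    print(f"alpha={alpha}: {steps} steps ~ {window_hours(steps):.1f} h, "
          f"{share_after(steps, alpha):.0%} of new score reflected")
```

For alpha=0.05 this gives 20 * 250 * 12 s, roughly 16.7 h. Moving to alpha=0.1 halves that to roughly 8.3 h.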

* Release 3.4.2 (#972)

* remove test_receptor test

* fix tests

Co-authored-by: unconst <jake@bittensor.com>

* No version checking (#974)

* no version checking

* fix integration tests

* remove print

Co-authored-by: Thebes <jake@bittensor.com>

* Promo suffix (#977)

* initial commit

* promo change to axon and dendrite

Co-authored-by: Thebes <jake@bittensor.com>

* Validator exit (#980)

* remove test_receptor test

* fix tests

* fix validator exit

Co-authored-by: unconst <jake@bittensor.com>

* Support arbitrary gRPC request metadata order (#976)

* Format AuthInterceptor using black

* Parse request metadata as key value pairs

* Use request method to blacklist calls

* Fix request type provided on backward

* Add type hints

* Refactor signature parsing

* [Fix] Dockerfile: clone the repo to install instead (#984)

* clone the repo to install instead

* no cd

Co-authored-by: Ala Shaabana <shaabana@gmail.com>

* Update bittensor version to 3.4.3

(cherry picked from commit 43110cf)

* Catch precision errors in synapse forward responses

Response serialization/deserialization introduces precision errors that may push probability sums beyond permissible bounds. The check now accepts precision errors that fall within the established absolute tolerance (currently atol = 1e-6).

(cherry picked from commit d96b625)
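A minimal sketch of such a tolerance check follows. It is not the bittensor synapse code. The float32 round trip via `struct` stands in for the real serialization, and `probability_sum_ok` is a hypothetical helper. After the round trip, ten probabilities of 0.1 sum to slightly more than 1. A strict check would reject that sum, but the atol = 1e-6 check accepts it:

```python
import math
import struct
from typing import List, Sequence

ATOL = 1e-6  # absolute tolerance quoted in the commit message


def float32_roundtrip(values: Sequence[float]) -> List[float]:
    """Simulate serialization precision loss by packing each value as a 32-bit float."""
    return [struct.unpack("<f", struct.pack("<f", v))[0] for v in values]


def probability_sum_ok(probs: Sequence[float], atol: float = ATOL) -> bool:
    """Accept a probability sum in [0, 1], allowing an absolute slack of `atol`."""
    total = math.fsum(probs)
    return -atol <= total <= 1.0 + atol


probs = float32_roundtrip([0.1] * 10)
print(math.fsum(probs) > 1.0)     # True: float32 rounding pushes the sum past 1
print(probability_sum_ok(probs))  # True: the excess is far below atol
```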

* Comment update for tensor size

(cherry picked from commit 6dd06f9)

* Fix/allow synapse all (#988)

* allow set synapse All using flag

* add test

* use dot get

* Mark registration threads as daemons (#998)

Make solver processes daemons

* Add response table to validator debugging

* Add return_ops to parameters

* Decode context and answer

* Add validation length parameter

* Itemize probabilities

* Add debug prints

* Change table formatting

* Add extra tasks to response table

* Debug add print

* Remove batch_size parameter

* Switch modulo order

* Modify table format

* Modify table format

* Modify table format

* Modify table format

* Try table print to catch rich errors

* Modify table title and caption

* Add shapley_values_nxt column

* Refactor response table functions

* Correct responsive count

* Format table caption

* [BIT-599] Validator weight setting improvements (#1000)

* Remove responsive prioritization from validator weight calculation

The weight-setting limitation based on responsive prioritization is no longer needed, so it has been removed. Part of its original intent was a chain storage concern about setting too many weights. The network now has a very high (> 90%) response rate, so that concern is moot.

The downside of the limitation was that validators with a longer step size artificially set fewer weights, simply because they could not query the whole network in a single epoch. This introduced counterproductive variability in weight setting across validators.

Responsiveness is scored in any case via a Shapley EMA zero-push penalty, so weights set on non-responsive nodes still reflect an accurate score.
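The zero-push idea can be sketched as follows. The sketch assumes a plain per-UID EMA stands in for the Shapley EMA, and the function and argument names are hypothetical. A UID that did not respond has a score of 0.0 pushed into its average, so its weight decays instead of being skipped:

```python
from typing import Dict, Optional


def push_scores(ema: Dict[int, float], responses: Dict[int, Optional[float]],
                alpha: float = 0.1) -> Dict[int, float]:
    """Return updated per-UID EMA scores; None marks a non-responsive UID."""
    updated = dict(ema)
    for uid, score in responses.items():
        # Zero-push penalty: a missing response counts as a score of 0.0.
        sample = 0.0 if score is None else score
        updated[uid] = (1 - alpha) * updated.get(uid, 0.0) + alpha * sample
    return updated


scores = push_scores({0: 1.0, 1: 1.0}, {0: 1.0, 1: None})
```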

* Move metagraph_sync just before weight setting

By the time weight setting is done, the metagraph is outdated by one epoch (~250 blocks). This means that newly registered keys will have weights set based on the stats of the key previously registered on the same UID.

Weights will now be set more accurately when the metagraph is updated just before weight setting, which will reset stats of a UID that has changed ownership.

* Update neuron stats table caption

* Add metagraph register to validator

Tracks new hotkey registrations on UIDs and records the statistics of the previous hotkey.
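Here is a minimal sketch of this bookkeeping. It assumes hotkeys are compared per UID between two metagraph syncs. `reset_changed_uids` and the dict-of-stats layout are hypothetical. The stats of a UID whose hotkey changed are moved aside, so the new key starts fresh and does not inherit its predecessor's scores:

```python
from typing import Dict, List


def reset_changed_uids(old_hotkeys: List[str], new_hotkeys: List[str],
                       stats: Dict[int, dict]) -> Dict[int, dict]:
    """Remove and return stats for UIDs whose hotkey changed between syncs."""
    previous: Dict[int, dict] = {}
    for uid, (old, new) in enumerate(zip(old_hotkeys, new_hotkeys)):
        if old != new and uid in stats:
            # Keep the previous hotkey's record for logging; the new hotkey starts empty.
            previous[uid] = stats.pop(uid)
    return previous


stats = {0: {"score": 1.0}, 1: {"score": 0.5}}
replaced = reset_changed_uids(["hk_a", "hk_b"], ["hk_a", "hk_c"], stats)
```

UIDs added by a growing metagraph are not covered here, because `zip` stops at the shorter list.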

* Update validator epoch conditions

Epoch overrun beyond blocks_per_epoch is no longer needed, since weight setting is no longer restricted to UIDs that responded within the epoch.

The normal epoch duration is blocks_per_epoch if all UIDs have been queried. The validator tries to query each UID at least once, which assumes the nucleus samples without replacement. It also keeps the minimum epoch duration at blocks_per_epoch * 12 sec block_period, so that a subtensor outage causing invalid block readings cannot trigger fast, repeated weight setting.

Also logs the while-loop conditions at each step.
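The block-count and wall-clock parts of the loop condition can be sketched as a single predicate. This is an illustration under assumptions, not the validator's code. `epoch_running` is hypothetical, and the 12-second block period is taken from the text:

```python
BLOCK_SECONDS = 12.0  # block period assumed in the commit message


def epoch_running(start_block: int, current_block: int, start_time: float, now: float,
                  blocks_per_epoch: int, block_seconds: float = BLOCK_SECONDS) -> bool:
    """Keep stepping until both blocks_per_epoch blocks AND the matching wall time have elapsed."""
    blocks_elapsed = current_block - start_block
    seconds_elapsed = now - start_time
    return blocks_elapsed < blocks_per_epoch or seconds_elapsed < blocks_per_epoch * block_seconds
```

Take blocks_per_epoch = 250 as an example. Suppose a bogus block reading jumps 10,000 blocks after one minute. The epoch still keeps running, because fewer than 3,000 seconds have passed.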

* Cast phrase_cross_entropy tokens to int

* Assert self.metagraph.n == len(self.neuron_hotkeys)

* Round before casting to int in phrase_cross_entropy
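Rounding matters because `int()` truncates toward zero. A token id that arrives as a float slightly below its true value would therefore be off by one. That the ids arrive this way after a lossy conversion is an assumption about the cause. A tiny illustration:

```python
def to_token_id(value: float) -> int:
    """Convert a float-encoded token id to int, rounding to the nearest integer first."""
    return int(round(value))


print(int(2.9999999))          # 2: truncation drops the fraction
print(to_token_id(2.9999999))  # 3: rounding recovers the intended id
```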

* Log epoch while condition details

* Update validator weights console message

* Consume validator nucleus UID queue fully

Since epochs now conclude at blocks_per_epoch, many validators will still have unqueried UIDs in the queue. It is better to retain these and consume them in the next epoch, since this ensures all UIDs are queried over a cycle that may span successive epochs.

The inefficiency introduced is that if the queue has to be refilled within the same epoch, a UID may be queried twice. This is preferred over the risk of never querying some UIDs, which arises if the remaining queue is not consumed fully, even when consuming it takes another epoch.
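Here is a sketch of a queue that persists across epochs and is refilled only when empty. The class is hypothetical, and a fresh random permutation stands in for the nucleus sampling without replacement. Because leftovers carry over, every UID is visited once per cycle, even when a cycle spans two epochs. The cost is the one described above: a refill within the same epoch can query a UID twice.

```python
import random
from collections import deque
from typing import Deque, List


class PersistentUidQueue:
    """UID queue that is kept across epochs and refilled only once it is empty."""

    def __init__(self, n_uids: int, seed: int = 0) -> None:
        self.n_uids = n_uids
        self.rng = random.Random(seed)
        self.queue: Deque[int] = deque()

    def _refill(self) -> None:
        # A fresh permutation samples every UID once, without replacement.
        uids = list(range(self.n_uids))
        self.rng.shuffle(uids)
        self.queue.extend(uids)

    def next_batch(self, size: int) -> List[int]:
        """Pop `size` UIDs, refilling from a new permutation only when the queue runs dry."""
        batch: List[int] = []
        while len(batch) < size:
            if not self.queue:
                self._refill()
            batch.append(self.queue.popleft())
        return batch


q = PersistentUidQueue(n_uids=5)
epoch_1 = q.next_batch(3)  # the epoch ends with two UIDs left in the queue
epoch_2 = q.next_batch(2)  # the leftovers are consumed first in the next epoch
```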

* No cache: metagraph.sync(cached=False)

Turns off the metagraph cache. Updating the metagraph before weight setting now takes ~10 minutes (when run locally), but it typically provides more recent values that can catch UID ownership changes.

* Add validator --neuron.metagraph_cached

When flag --neuron.metagraph_cached is set the validator uses metagraph.sync(cached=True).

* Record block before validator neuron register entry

* Force validators without local subtensor to use cached metagraph
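The cache decision described in the last few commits can be sketched as below. Only the flag name `--neuron.metagraph_cached` comes from the text. The stand-alone `argparse` parser and the `has_local_subtensor` input are assumptions. bittensor wires its flags through its own config object, and how a local subtensor is detected is not described here.

```python
import argparse
from typing import List


def parse_flags(argv: List[str]) -> argparse.Namespace:
    parser = argparse.ArgumentParser()
    # argparse keeps the dot in the destination name, so read it back with getattr.
    parser.add_argument("--neuron.metagraph_cached", action="store_true",
                        help="use metagraph.sync(cached=True) before weight setting")
    return parser.parse_args(argv)


def use_cached_metagraph(flag_set: bool, has_local_subtensor: bool) -> bool:
    """Use the cache when the flag is set, and always when there is no local subtensor."""
    return flag_set or not has_local_subtensor


args = parse_flags(["--neuron.metagraph_cached"])
cached = use_cached_metagraph(getattr(args, "neuron.metagraph_cached"), has_local_subtensor=True)
```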

* Incorporate __blocktime__ and remove asserts

* Refactor neuron_register to neuron_changes and add flag

* Use cached metagraph.sync()

* Remove unused print_neuron_stats flag

* Strip leading 0. from table float displays

* Increase synergy table display precision

* Increase synergy table display precision

* Format validator query response table

* Improve release scripts and add some labels to dockerfile #1004 (#1005)

* Pinning requirements versions in requirements/prod.txt

* Improve release scripts. Generating RELEASING.md document with release guidelines. Adding more labels to dockerfile

Removing dockerfile label bittensor.packages

* Removing dockerfile label bittensor.dependencies.versions.cudnn

* Removing file that came from a wrong rebase

* Renaming RELEASING.md to RELEASE_GUIDELINES.md

* Modifying release scripts so versioning is independent of the release process. Modifying RELEASE_GUIDELINES, adding more information

* Modifying RELEASE_GUIDELINES adding more information

* Fixing the versioning script. Modifying RELEASE_GUIDELINES.md

* Version: 3.5.0. Applying ./scripts/release/versioning.sh --update minor -A

Co-authored-by: Cameron Fairchild <cameron.fairchild@mail.utoronto.ca>
Co-authored-by: Eugene <etesting007@gmail.com>
Co-authored-by: Eugene-hu <85906264+Eugene-hu@users.noreply.github.com>
Co-authored-by: Mac Thrasher <95183714+quac88@users.noreply.github.com>
Co-authored-by: opentaco <opentaco@protonmail.com>
Co-authored-by: opentaco <93473497+opentaco@users.noreply.github.com>
Co-authored-by: Ala Shaabana <shaabana@gmail.com>
Co-authored-by: Ala Shaabana <ala@bittensor.com>
Co-authored-by: isabella618033 <49876827+isabella618033@users.noreply.github.com>
Co-authored-by: Unconst <32490803+unconst@users.noreply.github.com>
Co-authored-by: unconst <jake@bittensor.com>
Co-authored-by: Cameron Fairchild <cameron@opentensor.ai>
Co-authored-by: joeylegere <joeylegere@gmail.com>
Co-authored-by: Joey Legere <joey@opentensor.ai>
Co-authored-by: Adrian-Stefan Mares <36161392+adriansmares@users.noreply.github.com>

* [hotfix] pin scalecodec lower (#1013)

* Modifying Dockerfile to build bittensor from repository version and not from github (#1016)

* Modifying Dockerfile to build bittensor from repository version and not git clone the github repo. Updating version to 3.5.1

* CircleCI check to check that version was updated

* CircleCI check to check that version was updated

* CircleCI check to check that version was updated

* CircleCI check to check that version was updated

* CircleCI check to check that version was updated

* Updating version to 3.6.0 running './scripts/release/versioning.sh --update minor -A'

* [BIT-602] Update scaling power from subtensor (#1027)

Update power from subtensor always

Since self.config.nucleus.scaling_law_power is updated from its default of -1 at nucleus init, the condition here at epoch start must be removed, so that the power is always updated from subtensor.
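The bug can be reduced to a small sketch. The function names are hypothetical, and the real validator reads the value from subtensor. Once nucleus init has replaced the -1 default, a guard that only refreshes while the value is still -1 never fires again, so later on-chain changes are missed:

```python
DEFAULT_POWER = -1.0  # sentinel default named in the commit message


def refresh_power_guarded(current: float, chain_value: float) -> float:
    """Old behaviour (sketch): only refresh while the value is still the -1 default."""
    return chain_value if current == DEFAULT_POWER else current


def refresh_power_always(current: float, chain_value: float) -> float:
    """New behaviour (sketch): always adopt the latest value from subtensor."""
    del current  # the previous value is intentionally ignored
    return chain_value


power = refresh_power_guarded(DEFAULT_POWER, 0.5)  # init: -1 is replaced by the chain value
stale = refresh_power_guarded(power, 0.7)          # stays 0.5: the guard never fires again
fresh = refresh_power_always(power, 0.7)           # picks up 0.7
```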

* Fix extras for wheel package

* Update version to 3.6.1

* Update README.md

* Hotfix/3.6.2/validator logit parameters (#1057)

* additional parameters

* fixed naming to logit divergence

* versioning and fixes

* typo fixes

* bug fixes

* Tests cli fixes (#1058)

* fix btcli list with wallet.path (#1036)

fix path join

* remove mock subtensor and replace with mock calls

* additional fixes

* mock wallet

Co-authored-by: Cameron Fairchild <cameron@opentensor.ai>

* Log prune_len and logits_divergence

* Always get latest prune_len

Co-authored-by: Cameron Fairchild <cameron@opentensor.ai>
Co-authored-by: opentaco <opentaco@protonmail.com>

* fixing no_version_checking error

* updating version to 3.6.3

Co-authored-by: Cameron Fairchild <cameron.fairchild@mail.utoronto.ca>
Co-authored-by: Eugene <etesting007@gmail.com>
Co-authored-by: Eugene-hu <85906264+Eugene-hu@users.noreply.github.com>
Co-authored-by: Mac Thrasher <95183714+quac88@users.noreply.github.com>
Co-authored-by: opentaco <opentaco@protonmail.com>
Co-authored-by: opentaco <93473497+opentaco@users.noreply.github.com>
Co-authored-by: Ala Shaabana <shaabana@gmail.com>
Co-authored-by: Ala Shaabana <ala@bittensor.com>
Co-authored-by: isabella618033 <49876827+isabella618033@users.noreply.github.com>
Co-authored-by: Unconst <32490803+unconst@users.noreply.github.com>
Co-authored-by: unconst <jake@bittensor.com>
Co-authored-by: Cameron Fairchild <cameron@opentensor.ai>
Co-authored-by: joeylegere <joeylegere@gmail.com>
Co-authored-by: Joey Legere <joey@opentensor.ai>
Co-authored-by: Adrian-Stefan Mares <36161392+adriansmares@users.noreply.github.com>
@ifrit98 ifrit98 deleted the validator_exit branch May 24, 2023 14:46