@dbobrenko dbobrenko commented Sep 3, 2024

Patch v2.7.2 changes

  • Add signature to W&B logs to ensure authenticity of logs.
  • Increase Multiple Choice task rate from 0.05 to 0.2.

bkb2135 and others added 30 commits July 12, 2024 10:28
* Point to macrocosmos entity

* Adjust project name

* Increase spec version to 2.5.2

Co-authored-by: bkb2135 <98138173+bkb2135@users.noreply.github.com>
* v2.5.2 (#287)

* Point to macrocosmos entity

* Adjust project name

* Increase spec version to 2.5.2

* correct image switching based on system color scheme

refs #302

---------

Co-authored-by: bkb2135 <98138173+bkb2135@users.noreply.github.com>
…ery (#304)

* Retry creation of challenge query multiple times
Organic Scoring Implementation

Changes:
- This implementation is based on the Generic Organic Scoring framework introduced [here](macrocosm-os/organic-scoring#1).
- Organic scoring runs in a separate `asyncio` task alongside current benchmarking tasks.
- Organic queries are received via an open validator axon and stored in the organic queue.
- For each organic or synthetic query, a reference answer is generated by the LLM.
- Rewards and penalties for both organic and synthetic queries are calculated from the
`relevance` metric, defined as the cosine similarity between the sentence embeddings of the
reference and of each completion.
- Augmented LMSys-Chat is used for synthetic queries.
- Logging includes elapsed time between steps inside the organic loop, organic queue length,
and other default logs used by benchmarking tasks, except prompts and completions, which are
excluded from logging into W&B.
- The validator queries 5 random miners from the network to stream back completions for organic
queries (defined in config as `neuron.organic_sample_size`).
- The reward step for the organic or synthetic queue is triggered every 15 seconds; the interval is
scaled down to 2 seconds if the organic queue is growing (defined in config as `neuron.organic_trigger`, `neuron.organic_trigger_frequency`, and `neuron.organic_trigger_frequency_min`).
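
To illustrate the `relevance` metric described above, here is a minimal sketch of cosine similarity between a reference embedding and several completion embeddings. It assumes the sentence embeddings have already been computed by some embedding model; the function names `cosine_similarity` and `relevance_scores` are hypothetical and not taken from the repository.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def relevance_scores(reference_embedding, completion_embeddings):
    """Score every completion embedding against the single reference embedding."""
    return [cosine_similarity(reference_embedding, e) for e in completion_embeddings]


if __name__ == "__main__":
    reference = [1.0, 0.0]
    completions = [[2.0, 0.0], [0.0, 3.0]]
    print(relevance_scores(reference, completions))
```

How the resulting scores are turned into rewards and penalties is not shown here, since the description above does not specify it.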

Process Workflow:
1. **Trigger Check**: When the reward step is triggered, the system checks whether the organic
queue is empty. If it is, synthetic datasets (defined in `organic_scoring/synth_dataset_base.py`)
are used to bootstrap the organic scoring mechanism. Otherwise, samples from the organic queue
are used.
2. **Data Processing**: The sampled data is concurrently passed to the `_query_miners` and
`_generate_reference` methods.
3. **Reward Generation**: After receiving responses from miners and any reference data,
the information is processed by the `_generate_rewards` method.
4. **Weight Setting**: The generated rewards are then applied through the `_set_weights` method.
5. **Logging**: Finally, the results are logged via the `_log_results` method, together with
any relevant data passed as arguments and, by default, the time elapsed on each step of the reward process.
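
The workflow above can be sketched roughly as follows. This is an illustration under stated assumptions, not the framework's code: the free functions below are stand-ins named after the `_query_miners`, `_generate_reference`, `_generate_rewards` and `_set_weights` methods, their bodies are placeholders, and the exact-match reward is a simplification that replaces the embedding-based relevance metric.

```python
import asyncio
from collections import deque


async def query_miners(sample):
    """Stand-in for `_query_miners`: two fake miners return completions."""
    await asyncio.sleep(0)
    return {1: f"answer to {sample}", 2: "unrelated text"}


async def generate_reference(sample):
    """Stand-in for `_generate_reference`: produce a reference answer."""
    await asyncio.sleep(0)
    return f"answer to {sample}"


def generate_rewards(responses, reference):
    """Stand-in for `_generate_rewards`: exact match instead of relevance."""
    return {uid: 1.0 if text == reference else 0.0 for uid, text in responses.items()}


def choose_sample(organic_queue, synth_dataset):
    """Step 1: use the organic queue if it has items, else fall back to synthetic data."""
    if organic_queue:
        return organic_queue.popleft(), "organic"
    return next(synth_dataset), "synthetic"


async def reward_step(organic_queue, synth_dataset, weights):
    sample, source = choose_sample(organic_queue, synth_dataset)
    # Step 2: query miners and generate the reference concurrently.
    responses, reference = await asyncio.gather(
        query_miners(sample), generate_reference(sample)
    )
    # Step 3: turn responses and reference into rewards.
    rewards = generate_rewards(responses, reference)
    # Step 4: stand-in for `_set_weights`.
    weights.update(rewards)
    # Step 5: the kind of data that could be handed to `_log_results`.
    return {"source": source, "rewards": rewards}


if __name__ == "__main__":
    weights = {}
    print(asyncio.run(reward_step(deque(), iter(["synthetic prompt"]), weights)))
```

`asyncio.gather` is used here because the description says the sampled data is passed to both methods concurrently; whether the framework uses `gather` specifically is an assumption.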
* Hotfix undefined HumanAgent.challenge_time

* Set begin_conversation to True for organics
Fix Wikipedia broken sections
Restart when an error is encountered in the get_block function.
Errors when making substrate calls usually result in the validator failing quietly, 
often requiring a manual restart. 
This PR is intended to catch errors originating from calls to the Bittensor package,
raise them as BittensorError, and then restart.
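
A minimal sketch of the catch, re-raise and restart pattern described above. Only the `BittensorError` name comes from the description; the `fetch_block` callable, the `run_with_restart` helper and the `max_restarts` limit are hypothetical, and the restart is modelled as an in-process retry loop rather than whatever mechanism the validator actually uses.

```python
class BittensorError(Exception):
    """Raised when a call into the Bittensor package fails."""


def get_block(fetch_block):
    """Call `fetch_block` and convert any failure into a BittensorError."""
    try:
        return fetch_block()
    except Exception as err:
        raise BittensorError(f"get_block failed: {err}") from err


def run_with_restart(step, max_restarts=3):
    """Run `step`, restarting it when a BittensorError is raised.

    Re-raises the error once `max_restarts` restarts have been used up,
    so a persistent failure is not silently swallowed.
    """
    restarts = 0
    while True:
        try:
            return step()
        except BittensorError:
            if restarts >= max_restarts:
                raise
            restarts += 1


if __name__ == "__main__":
    attempts = {"n": 0}

    def flaky_fetch():
        # Fails twice, then succeeds, to imitate a transient substrate error.
        attempts["n"] += 1
        if attempts["n"] < 3:
            raise ConnectionError("substrate unavailable")
        return 123

    print(run_with_restart(lambda: get_block(flaky_fetch)))
```

Wrapping the underlying exception with `raise ... from err` keeps the original traceback available, which helps when the failure would otherwise be quiet.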
- Enable organic scoring weight setting.
- Fix bittensor WASM errors by switching to another bittensor branch.
Fix 'MockPipeline' object has no attribute 'generate' errors when using --neuron.model_id mock.
Fix AttributeError (no attribute ‘isdigit’) for Wikipedia Summary and Date.
Changes:
- New multi-choice benchmarking task;
- Refactor changes (.env config-based, decoupled parts of the code);
- Poetry setup;
- Only 5 tasks are included: QA, DateQA, Summary, MultiChoice, Organic.
dbobrenko and others added 4 commits August 28, 2024 01:24
Add hotkey signature to the wandb run for multi-choice verification
Changes:
- Bump v2.7.2.
- Raise multi-choice probability from 0.05 to 0.2.
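
As an illustration of what raising a task's sampling probability can look like, here is a small sketch. Everything except the 0.05 and 0.2 values is an assumption: the task names, the other probabilities, the proportional rescaling of the remaining tasks and both helper functions are hypothetical and not taken from the repository.

```python
import random


def set_task_probability(probs, task, new_p):
    """Return a copy of `probs` where `task` has probability `new_p`.

    The remaining tasks are rescaled proportionally so the total stays 1.
    """
    rest = {name: p for name, p in probs.items() if name != task}
    scale = (1.0 - new_p) / sum(rest.values())
    updated = {name: p * scale for name, p in rest.items()}
    updated[task] = new_p
    return updated


def sample_task(probs, rng):
    """Draw one task name according to its probability."""
    names = list(probs)
    return rng.choices(names, weights=[probs[n] for n in names], k=1)[0]


if __name__ == "__main__":
    # Hypothetical starting distribution; only multi_choice=0.05 is from the PR.
    before = {"qa": 0.5, "summary": 0.45, "multi_choice": 0.05}
    after = set_task_probability(before, "multi_choice", 0.2)
    print(after)
    print(sample_task(after, random.Random(0)))
```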
@dbobrenko dbobrenko self-assigned this Sep 3, 2024

@Hollyqui Hollyqui left a comment

LGTM

@cassova cassova left a comment

lgtm

@dbobrenko dbobrenko merged commit 91f1fc0 into main Sep 3, 2024

7 participants