@bkb2135 bkb2135 commented Aug 7, 2024

Hotfix to check if a summarization task will have a reference before generating the challenge.

bkb2135 and others added 29 commits July 12, 2024 10:28
* Point to macrocosmos entity

* Adjust project name

* Increase spec version to 2.5.2

Co-authored-by: bkb2135 <98138173+bkb2135@users.noreply.github.com>
* Point to macrocosmos entity

* Adjust project name

* Increase spec version to 2.5.2
* v2.5.2 (#287)

* Point to macrocosmos entity

* Adjust project name

* Increase spec version to 2.5.2

* correct image switching based on system color scheme

refs #302

---------

Co-authored-by: bkb2135 <98138173+bkb2135@users.noreply.github.com>
…ery (#304)

* Retry creation of challenge query multiple times
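
A minimal sketch of what retrying challenge-query creation could look like. The names `create_challenge_with_retry`, `make_query`, and `max_attempts` are hypothetical and do not come from this PR; the actual retry count and failure handling in the repository may differ.

```python
def create_challenge_with_retry(make_query, max_attempts=3):
    """Call make_query until it returns a non-empty query or attempts run out.

    make_query is a hypothetical zero-argument callable that builds the
    challenge query; it may raise or return an empty value on failure.
    """
    last_error = None
    for _ in range(max_attempts):
        try:
            query = make_query()
            if query:
                return query
        except Exception as err:  # remember the failure and try again
            last_error = err
    raise RuntimeError(
        f"no challenge query after {max_attempts} attempts"
    ) from last_error
```
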
Organic Scoring Implementation

Changes:
- This implementation is based on the Generic Organic Scoring framework introduced [here](macrocosm-os/organic-scoring#1).
- Organic scoring runs in a separate `asyncio` task alongside current benchmarking tasks.
- Organic queries are received via an open validator axon and stored in the organic queue.
- For each organic or synthetic query, a reference answer is generated by the LLM.
- Rewards and penalties for both organic and synthetic queries are calculated from the
`relevance` metric, defined as the cosine similarity between the sentence embeddings
of the reference and of each completion.
- Augmented LMSys-Chat is used for synthetic queries.
- Logging includes the elapsed time between steps inside the organic loop, the organic queue
length, and the other default logs used by benchmarking tasks. Prompts and completions are
excluded from W&B logging.
- The validator queries 5 random miners from the network, which stream back completions for
organic queries (the sample size is defined in config as `neuron.organic_sample_size`).
- The reward step for the organic or synthetic queue is triggered every 15 seconds; the interval
is scaled down to 2 seconds if the organic queue is growing (defined in config as
`neuron.organic_trigger`, `neuron.organic_trigger_frequency`, and `neuron.organic_trigger_frequency_min`).
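
To illustrate the `relevance` metric described above, here is a minimal sketch of cosine similarity between a reference embedding and several completion embeddings. It assumes the embeddings are already available as plain lists of floats; the sentence-embedding model itself is not shown, and the function names are hypothetical rather than taken from the repository.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumption: treat a zero vector as having no relevance
    return dot / (norm_a * norm_b)


def relevance_scores(
    reference_embedding: Sequence[float],
    completion_embeddings: Sequence[Sequence[float]],
) -> List[float]:
    """Score each completion embedding against the reference embedding."""
    return [cosine_similarity(reference_embedding, c) for c in completion_embeddings]
```

A completion whose embedding points in the same direction as the reference scores close to 1, an unrelated one scores close to 0, and an opposite one scores close to -1.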

Process Workflow:
1. **Trigger Check**: When the rewarding process is triggered, the system checks whether the
organic queue is empty. If it is, synthetic datasets (defined in
`organic_scoring/synth_dataset_base.py`) are used to bootstrap the organic scoring mechanism.
Otherwise, samples from the organic queue are used.
2. **Data Processing**: The sampled data is concurrently passed to the `_query_miners` and
`_generate_reference` methods.
3. **Reward Generation**: After receiving responses from miners and any reference data,
the information is processed by the `_generate_rewards` method.
4. **Weight Setting**: The generated rewards are then applied through the `_set_weights` method.
5. **Logging**: Finally, the results can be logged using the `_log_results` method, along
with all relevant data provided as arguments and, by default, the time elapsed on each step of the rewarding process.
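
The workflow above can be sketched roughly as follows. This is an illustrative outline only: the method names `_query_miners`, `_generate_reference`, and `_generate_rewards` come from the description, but their bodies here are placeholders, and `reward_step`, its arguments, and the returned dictionary are assumptions made for the example. Weight setting and logging are left as comments.

```python
import asyncio
import random

ORGANIC_SAMPLE_SIZE = 5  # mirrors `neuron.organic_sample_size` in the description


async def _query_miners(sample, miner_uids):
    # Placeholder: a real validator would stream completions from each miner.
    return {uid: f"completion from miner {uid}" for uid in miner_uids}


async def _generate_reference(sample):
    # Placeholder: a real validator would generate the reference with the LLM.
    return f"reference for {sample}"


def _generate_rewards(responses, reference):
    # Placeholder scoring: every responding miner gets the same reward.
    return {uid: 1.0 for uid in responses}


async def reward_step(organic_queue, synthetic_samples, all_uids):
    # 1. Trigger check: fall back to synthetic data if the organic queue is empty.
    if organic_queue:
        sample, source = organic_queue.pop(0), "organic"
    else:
        sample, source = random.choice(synthetic_samples), "synthetic"

    # 2. Data processing: query miners and generate the reference concurrently.
    uids = random.sample(all_uids, min(ORGANIC_SAMPLE_SIZE, len(all_uids)))
    responses, reference = await asyncio.gather(
        _query_miners(sample, uids),
        _generate_reference(sample),
    )

    # 3. Reward generation.
    rewards = _generate_rewards(responses, reference)

    # 4./5. Weight setting and logging would follow here.
    return {"source": source, "sample": sample, "rewards": rewards}
```
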
* Hotfix undefined HumanAgent.challenge_time

* Set begin_conversation to True for organics
Fix Wikipedia broken sections
Restart when an error is encountered in the get_block function.
Errors when making substrate calls usually result in the validator failing quietly, 
often requiring a manual restart. 
This PR is intended to catch errors originating from calls to the Bittensor package,
raise them as BittensorError, and then restart.
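
A minimal sketch of the catch, re-raise, and restart pattern described here. The `BittensorError` name comes from the description, but its definition below, the `wrap_bittensor_call` decorator, and the `run_with_restart` loop are hypothetical stand-ins; no real Bittensor API is called, and the actual restart mechanism in the repository may work differently.

```python
import functools


class BittensorError(Exception):
    """Raised when a call into the Bittensor package fails (stand-in definition)."""


def wrap_bittensor_call(func):
    """Re-raise any exception from func as BittensorError."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception as err:
            raise BittensorError(f"{func.__name__} failed: {err}") from err
    return wrapper


def run_with_restart(step, max_restarts=3):
    """Run step, restarting it when a BittensorError is raised."""
    restarts = 0
    while True:
        try:
            return step()
        except BittensorError:
            restarts += 1
            if restarts > max_restarts:
                raise  # give up loudly instead of failing quietly
```
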
- Enable organic scoring weight setting.
- Fix bittensor WASM errors by switching to another bittensor branch.
Fix 'MockPipeline' object has no attribute 'generate' errors when using --neuron.model_id mock.
Fix AttributeError (no attribute ‘isdigit’) for Wikipedia Summary and Date.

@dbobrenko dbobrenko left a comment

LGTM

@dbobrenko dbobrenko merged commit d2e90b2 into main Aug 7, 2024

6 participants