Merged
35 commits
fc30b45 remove summarization (richwardle, Jan 2, 2025)
d470f44 remove date qa task (richwardle, Jan 2, 2025)
966b70f alter task registry ratios (richwardle, Jan 2, 2025)
bfaa509 Adjust Task weighting (bkb2135, Jan 2, 2025)
a7f8f39 Initial Upload (bkb2135, Jan 2, 2025)
b988759 Fix Imports (richwardle, Jan 2, 2025)
8f563be Merge branch 'staging' into SN1-360-move-qa-task-to-web-dataset (richwardle, Jan 3, 2025)
59ab6ff precommit changes (richwardle, Jan 4, 2025)
e259a00 update readme to include multistep reasoning descriptions (richwardle, Jan 4, 2025)
d082fc3 fix retrieval (richwardle, Jan 7, 2025)
432446e precommit fixes (richwardle, Jan 7, 2025)
222377f Initial upload (bkb2135, Jan 7, 2025)
fcc0683 precommit fixes (richwardle, Jan 8, 2025)
cb5f6f1 Hotfix: Using multiple messages for inference (#531) (Hollyqui, Jan 8, 2025)
6f8c080 Merge branch 'staging' into SN1-360-move-qa-task-to-web-dataset (bkb2135, Jan 8, 2025)
4fe84ad Merge pull request #524 from macrocosm-os/SN1-360-move-qa-task-to-web… (bkb2135, Jan 9, 2025)
dd52b9f Stash Changes (Hollyqui, Jan 9, 2025)
dadb0e5 Merge branch 'staging' into SN1-372-scaling-timeout-with-new-tokens (richwardle, Jan 10, 2025)
060f621 Code clean up and optimizations (#526) (dbobrenko, Jan 10, 2025)
5ff1e24 Don't unpack dictionary (richwardle, Jan 10, 2025)
8bce312 Fix DDGS and PatchedDDGS; Fix staging web retrieval (#535) (dbobrenko, Jan 10, 2025)
6e04436 Merge pull request #537 from macrocosm-os/main (bkb2135, Jan 12, 2025)
fa76a61 linting fixes (richwardle, Jan 12, 2025)
5e96b7b Fix mixture endpoint (#536) (dbobrenko, Jan 13, 2025)
a1257ba Fix availability (Hollyqui, Jan 13, 2025)
602ced1 Lint (Hollyqui, Jan 13, 2025)
7eba530 Merge pull request #530 from macrocosm-os/SN1-372-scaling-timeout-wit… (bkb2135, Jan 13, 2025)
855cf67 Log Context source as source in wandb (bkb2135, Jan 13, 2025)
a573d42 Add source to git context (Hollyqui, Jan 13, 2025)
f248614 Linting (Hollyqui, Jan 13, 2025)
d1445c6 Remove default source (Hollyqui, Jan 13, 2025)
0c3b912 Linting (Hollyqui, Jan 14, 2025)
72634fe Merge pull request #539 from macrocosm-os/observability/log-context-s… (bkb2135, Jan 14, 2025)
a327462 Update pyproject.toml (bkb2135, Jan 14, 2025)
b453b93 Merge pull request #540 from macrocosm-os/bump-version-number (bkb2135, Jan 14, 2025)
4 changes: 3 additions & 1 deletion .env.validator.example
@@ -24,7 +24,9 @@ SN19_API_URL = "e.g. http://24.199.112.174:4051/"
 OPENAI_API_KEY = "your_openai_api_key_here"
 HF_TOKEN = "your_huggingface_token_here"

-# Scoring API.
+# Scoring API (optional).
 DEPLOY_SCORING_API = true
 SCORING_ADMIN_KEY = "123456"
 SCORING_API_PORT = 8094
+# Scoring key must match the scoring key in the .env.api.
+# SCORING_KEY="..."
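The scoring variables in the example file above are plain strings once they reach the process environment, so the consumer has to parse the boolean and the port itself. The following is a minimal sketch of that parsing, not the project's actual settings loader: `ScoringApiSettings` and `load_scoring_settings` are hypothetical names, and the accepted truthy spellings are an assumption.

```python
from __future__ import annotations

import os
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass
class ScoringApiSettings:
    """Hypothetical container for the scoring variables in the example file."""

    deploy: bool
    admin_key: str
    port: int
    scoring_key: str | None


def load_scoring_settings(env: Mapping[str, str] | None = None) -> ScoringApiSettings:
    """Parse the scoring variables from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    # Assumption: "1", "true" and "yes" (any case) enable the scoring API.
    deploy = env.get("DEPLOY_SCORING_API", "false").strip().lower() in {"1", "true", "yes"}
    return ScoringApiSettings(
        deploy=deploy,
        admin_key=env.get("SCORING_ADMIN_KEY", ""),
        port=int(env.get("SCORING_API_PORT", "8094")),
        # SCORING_KEY is commented out in the example, so treat it as optional.
        scoring_key=env.get("SCORING_KEY") or None,
    )
```

Passing a plain dictionary instead of relying on `os.environ` keeps the function easy to exercise in isolation.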
17 changes: 7 additions & 10 deletions README.md
@@ -49,24 +49,21 @@ Subnet one utilizes the concept of "Tasks" to control the behavior of miners. Va
 ### 1. **QA (Question Answering)**
 The miner receives a question about a specific section from a Wikipedia page. The miner must then find the original context in the specified section and use it to return an accurate answer. References are generated using the validator's privileged knowledge of the context, and miner completions are scored based on similarity metrics.

-### 2. **Summarization**
-Similar to QA, but the miner uses the entire Wikipedia page instead of a specific section. The miner reads the whole page, summarizes it, and provides a concise answer.
-
-### 3. **DateQA**
-The miner receives a question about an event from Wikipedia. The miner must search through Wikipedia for the relevant event and return the correct answer based on the findings. References are again generated with validator's knowledge of the context, and similarity metrics are used to score miner completions.
-
-### 4. **Inference**
+### 2. **Inference**
 A question is given with some pre-seeded information and a random seed. The miner must perform an inference based on this information to provide the correct answer. Completions are scored based on similarity metrics.

-### 5. **MultiChoice**
+### 3. **MultiChoice**
 The miner is presented with a question from Wikipedia along with four possible answers (A, B, C, or D). The miner must search Wikipedia and return the correct answer by selecting one of the given options. Miner completions are scored by Regex matching.

-### 6. **Programming**
+### 5. **Programming**
 The miner receives a code snippet that is incomplete. The task is to complete the code snippet to perform its intended function. The validator generates a reference using its internal LLM, and the miner is scored based on its similarity to this reference.

-### 7. **Web Retrieval**
+### 6. **Web Retrieval**
 The miner is given a question based on a random web page and must return a scraped website that contains the answer. This requires searching the web to locate the most accurate and reliable source to provide the answer. The miner is scored based on the embedding similarity between the answer it returns and the original website that the validator generated the reference from.

+### 7. **Multistep Reasoning**
+The miner is given a complex problem that requires multiple steps to solve. Each step builds upon the previous one, and the miner must provide intermediate results before arriving at the final answer. The validator generates a reference solution using its internal LLM, and the miner is scored based on the accuracy and coherence of the intermediate and final results.
+
 # API Documentation

 For detailed information on the available API endpoints, request/response formats, and usage examples, please refer to the [API Documentation](./validator_api/API_docs.md).
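The MultiChoice description in the README hunk above says completions are scored by regex matching but does not show the pattern. As an illustration only, the sketch below extracts the last standalone capital letter A to D from a completion and compares it with the reference. The pattern, the "last match wins" rule, and the function names are assumptions, not the validator's implementation.

```python
from __future__ import annotations

import re

# Assumption: an answer is a standalone capital letter A-D, e.g. "B", "(C)", "Answer: D."
_CHOICE_PATTERN = re.compile(r"\b([A-D])\b")


def extract_choice(completion: str) -> str | None:
    """Return the last standalone A-D letter in the completion, or None."""
    matches = _CHOICE_PATTERN.findall(completion)
    return matches[-1] if matches else None


def score_multichoice(completion: str, reference: str) -> float:
    """Score 1.0 when the extracted choice equals the reference letter, else 0.0."""
    choice = extract_choice(completion)
    return 1.0 if choice is not None and choice == reference.strip().upper() else 0.0
```

Taking the last match tolerates completions that discuss several options before committing to one, at the cost of misreading answers that end with an aside about another option.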
1 change: 0 additions & 1 deletion docs/SN1_validation.md
@@ -31,7 +31,6 @@ More tooling will be included in future releases.
 # Tasks
 The validation process supports an ever-growing number of tasks. Tasks drive agent behaviour based on specific goals, such as:
 - Question answering
-- Summarization
 - Code debugging
 - Mathematics
 and more.
2 changes: 1 addition & 1 deletion neurons/validator.py
@@ -22,7 +22,7 @@

 torch.multiprocessing.set_start_method("spawn", force=True)

-NEURON_SAMPLE_SIZE = 100
+NEURON_SAMPLE_SIZE = 100  # TODO: Should add this to constants.py


 def create_loop_process(task_queue, scoring_queue, reward_events):
1,010 changes: 600 additions & 410 deletions poetry.lock

Large diffs are not rendered by default.

39 changes: 22 additions & 17 deletions prompting/api/scoring/api.py
@@ -27,29 +27,34 @@ async def score_response(request: Request, api_key_data: dict = Depends(validate
     model = None
     payload: dict[str, Any] = await request.json()
     body = payload.get("body")
-
-    try:
-        if body.get("model") is not None:
-            model = ModelZoo.get_model_by_id(body.get("model"))
-    except Exception:
-        logger.warning(
-            f"Organic request with model {body.get('model')} made but the model cannot be found in model zoo. Skipping scoring."
-        )
-        return
     uid = int(payload.get("uid"))
     chunks = payload.get("chunks")
-    llm_model = ModelZoo.get_model_by_id(model) if (model := body.get("model")) else None
+    model = body.get("model")
+    if model:
+        try:
+            llm_model = ModelZoo.get_model_by_id(model)
+        except Exception:
+            logger.warning(
+                f"Organic request with model {body.get('model')} made but the model cannot be found in model zoo. Skipping scoring."
+            )
+            return
+    else:
+        llm_model = None
     task = body.get("task")
     if task == "InferenceTask":
         logger.info(f"Received Organic InferenceTask with body: {body}")
+        logger.info(f"With model of type {type(body.get('model'))}")
+        organic_task = InferenceTask(
+            messages=body.get("messages"),
+            llm_model=llm_model,
+            llm_model_id=body.get("model"),
+            seed=int(body.get("seed", 0)),
+            sampling_params=body.get("sampling_parameters", shared_settings.SAMPLING_PARAMS),
+            query=body.get("messages")[0]["content"],
+        )
+        logger.info(f"Task created: {organic_task}")
         task_scorer.add_to_queue(
-            task=InferenceTask(
-                messages=[msg["content"] for msg in body.get("messages")],
-                llm_model=llm_model,
-                llm_model_id=body.get("model"),
-                seed=int(body.get("seed", 0)),
-                sampling_params=body.get("sampling_params", {}),
-            ),
+            task=organic_task,
             response=DendriteResponseEvent(
                 uids=[uid],
                 stream_results=[
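The hunk above moves the model lookup so that a missing `model` field means "score without a specific model", while an unknown model id means "skip scoring". The sketch below isolates that three-way decision under stated assumptions: `KNOWN_MODELS`, `get_model_by_id`, and `resolve_model` are hypothetical stand-ins, and a plain dictionary replaces the real `ModelZoo`.

```python
from __future__ import annotations

import logging
from typing import Any

logger = logging.getLogger(__name__)

# Hypothetical registry standing in for the model zoo used in the diff.
KNOWN_MODELS: dict[str, Any] = {"model-a": object()}


def get_model_by_id(model_id: str) -> Any:
    """Look up a model config; raises KeyError for unknown ids (assumed behaviour)."""
    return KNOWN_MODELS[model_id]


def resolve_model(body: dict[str, Any]) -> tuple[bool, Any]:
    """Return (should_score, llm_model) for an organic request body."""
    model_id = body.get("model")
    if not model_id:
        # No model requested: scoring proceeds without a specific model.
        return True, None
    try:
        return True, get_model_by_id(model_id)
    except KeyError:
        logger.warning("Model %s cannot be found in the registry. Skipping scoring.", model_id)
        return False, None
```

Returning an explicit flag keeps the "no model requested" and "model not found" cases distinct, which a bare `None` return value would conflate.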
2 changes: 2 additions & 0 deletions prompting/base/duckduckgo_patch.py
@@ -1,3 +1,4 @@
+from threading import Event
 from typing import cast

 import httpx
@@ -13,6 +14,7 @@ def __init__(self, *args, **kwargs):
             timeout=kwargs.get("timeout", 10),
             verify=kwargs.get("verify", True),
         )
+        self._exception_event = Event()

     def _get_url(
         self: DDGS,
4 changes: 3 additions & 1 deletion prompting/datasets/huggingface_github.py
@@ -20,6 +20,7 @@ class HuggingFaceGithubDatasetEntry(DatasetEntry):
     github_url: str
     file_path: str
     file_content: str
+    source: str | None = None


 class HuggingFaceGithubDataset(BaseDataset):
@@ -46,8 +47,9 @@ def _filter_function(self, example):

     def _process_entry(self, entry: dict) -> HuggingFaceGithubDatasetEntry:
         file_content = "\n".join(entry["content"].split("\n")[:MAX_LINES])
+        url = f"https://github.com/{entry['repo_name']}"
         return HuggingFaceGithubDatasetEntry(
-            github_url=f"https://github.com/{entry['repo_name']}", file_path=entry["path"], file_content=file_content
+            github_url=url, file_path=entry["path"], file_content=file_content, source=url
         )

     def get(self) -> HuggingFaceGithubDatasetEntry:
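The `_process_entry` change above truncates the file to `MAX_LINES` lines and reuses the repository URL as the new `source` field. A self-contained sketch of that behaviour follows. `GithubEntry` is a hypothetical dataclass standing in for the real entry class, and the value of `MAX_LINES` is an assumption because the diff does not show it.

```python
from __future__ import annotations

from dataclasses import dataclass

MAX_LINES = 5  # Assumption: the real constant is defined elsewhere in the module.


@dataclass
class GithubEntry:
    """Hypothetical stand-in for HuggingFaceGithubDatasetEntry."""

    github_url: str
    file_path: str
    file_content: str
    source: str | None = None


def process_entry(entry: dict) -> GithubEntry:
    """Truncate the file content and record the repository URL as the source."""
    file_content = "\n".join(entry["content"].split("\n")[:MAX_LINES])
    url = f"https://github.com/{entry['repo_name']}"
    return GithubEntry(github_url=url, file_path=entry["path"], file_content=file_content, source=url)
```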
25 changes: 15 additions & 10 deletions prompting/datasets/random_website.py
@@ -17,21 +17,24 @@ class DDGDatasetEntry(DatasetEntry):
     search_term: str
     website_url: str = None
     website_content: str = None
+    query: str | None = None
+    source: str | None = None


 class DDGDataset(BaseDataset):
     english_words: list[str] = None

     def search_random_term(self, retries: int = 3) -> tuple[Optional[str], Optional[list[dict[str, str]]]]:
-        try:
-            ddg = PatchedDDGS(proxy=shared_settings.PROXY_URL, verify=False)
-            for _ in range(retries):
-                random_words = " ".join(random.sample(ENGLISH_WORDS, 5))
+        ddg = PatchedDDGS(proxy=shared_settings.PROXY_URL, verify=False)
+        for _ in range(retries):
+            random_words = " ".join(random.sample(ENGLISH_WORDS, 3))
+            try:
                 results = list(ddg.text(random_words))
                 if results:
                     return random_words, results
-        except Exception as ex:
-            logger.error(f"Failed to get search results from DuckDuckGo: {ex}")
+            except Exception as ex:
+                logger.debug(f"Failed to get search results from DuckDuckGo: {ex}")
+        logger.warning(f"Failed to get search results from DuckDuckGo after {retries} tries")
         return None, None

     @staticmethod
@@ -41,19 +44,21 @@ def extract_website_content(url: str) -> Optional[str]:
             extracted = trafilatura.extract(website)
             return extracted[:MAX_CHARS] if extracted else None
         except Exception as ex:
-            logger.error(f"Failed to extract content from website {url}: {ex}")
+            logger.debug(f"Failed to extract content from website {url}: {ex}")

     def next(self) -> Optional[DDGDatasetEntry]:
-        search_term, results = self.search_random_term(retries=3)
+        search_term, results = self.search_random_term(retries=5)
         if not results:
             return None
         website_url = results[0]["href"]
         website_content = self.extract_website_content(website_url)
         if not website_content or len(website_content) == 0:
-            logger.error(f"Failed to extract content from website {website_url}")
+            logger.debug(f"Failed to extract content from website {website_url}")
             return None

-        return DDGDatasetEntry(search_term=search_term, website_url=website_url, website_content=website_content)
+        return DDGDatasetEntry(
+            search_term=search_term, website_url=website_url, website_content=website_content, source=website_url
+        )

     def get(self) -> Optional[DDGDatasetEntry]:
         return self.next()
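The key behavioural change in `search_random_term` is that the `try` block now sits inside the retry loop, so one failed request no longer aborts the remaining attempts. The sketch below reproduces that pattern with an injected search callable instead of `PatchedDDGS`. The function name, the `rng` parameter, and the word list passed by the caller are illustrative assumptions.

```python
from __future__ import annotations

import logging
import random
from typing import Callable, Iterable, Optional

logger = logging.getLogger(__name__)


def search_with_retries(
    search: Callable[[str], Iterable[dict]],
    words: list[str],
    retries: int = 3,
    rng: Optional[random.Random] = None,
) -> tuple[Optional[str], Optional[list]]:
    """Try up to `retries` random three-word queries; return the first non-empty result."""
    rng = rng or random.Random()
    for _ in range(retries):
        term = " ".join(rng.sample(words, 3))
        try:
            results = list(search(term))
            if results:
                return term, results
        except Exception as ex:
            # A single failure is logged quietly and the loop moves on to the next attempt.
            logger.debug("Search failed for %r: %s", term, ex)
    logger.warning("No search results after %d tries", retries)
    return None, None
```

Logging individual failures at debug level and only the final give-up at warning level mirrors the log-level changes in the hunk and keeps transient errors out of the main log.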
57 changes: 6 additions & 51 deletions prompting/datasets/sn13.py
@@ -2,14 +2,10 @@
 from typing import ClassVar

 import datasets
-import nltk
-from nltk.corpus import wordnet
 from pydantic import model_validator

 from shared.base import BaseDataset, ChatEntry

-nltk.download("wordnet")
-

 class SN13Dataset(BaseDataset):
     _url: ClassVar[str] = "arrmlet/x_dataset_218"
@@ -41,51 +37,10 @@ def sample(self) -> ChatEntry:
         if self.exception is not None:
             raise self.exception
         # Randomly select a sample from the dataset.
-        sample_idx = random.randint(0, len(self.dataset) - 1)
-        message = self.dataset[sample_idx]["text"]
-        role = ["user"]
-
-        # Augment the messages by modifying words and introducing errors.
-        messages = [self._augment_message(role, message)]
-
-        return ChatEntry(roles=role, messages=messages, organic=False, source=self._url)
-
-    def _augment_message(self, role: str, message: str) -> str:
-        if role == "assistant":
-            return message
-
-        words = message.split()
-        num_words_to_modify = random.randint(1, max(1, int(len(words) * self._chance_word_synonym)))
-        words_to_modify = random.sample(range(len(words)), num_words_to_modify)
-
-        for idx in words_to_modify:
-            synonym = self._get_synonym(words[idx])
-            if synonym:
-                words[idx] = synonym
-
-        message = " ".join(words)
-        message = self._introduce_typos(message)
-        return message
-
-    def _get_synonym(self, word: str) -> str:
-        synonyms = wordnet.synsets(word)
-        if synonyms:
-            # Choose a synonym that is not the word itself.
-            synonym_words = [lemma.name() for lemma in synonyms[0].lemmas() if lemma.name() != word]
-            if synonym_words:
-                return random.choice(synonym_words)
-        return word
-
-    def _introduce_typos(self, message: str) -> str:
-        message = list(message)
-        num_errors = random.randint(0, max(1, int(len(message) * self._chance_char_typo)))
-        for _ in range(num_errors):
-            error_type = random.choice(["remove", "add_space"])
-            error_position = random.randint(0, len(message) - 1)
-
-            if error_type == "remove":
-                message.pop(error_position)
-            elif error_type == "add_space":
-                message.insert(error_position, " ")
+        messages = []
+        for _ in range(4):
+            sample_idx = random.randint(0, len(self.dataset) - 1)
+            if message := self.dataset[sample_idx]["text"]:
+                messages.append({"role": random.choice(["user", "assistant"]), "content": message})

-        return "".join(message)
+        return ChatEntry(messages=messages, organic=False, source=self._url)
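The rewritten `sample` method replaces synonym and typo augmentation with up to four randomly drawn texts, each assigned a random role. The sketch below mirrors that loop over a plain list of strings. `sample_messages` and its parameters are hypothetical, and the list stands in for the Hugging Face dataset rows.

```python
from __future__ import annotations

import random
from typing import Optional


def sample_messages(
    texts: list[str],
    n_turns: int = 4,
    rng: Optional[random.Random] = None,
) -> list[dict[str, str]]:
    """Draw up to `n_turns` non-empty texts and tag each with a random chat role."""
    rng = rng or random.Random()
    messages: list[dict[str, str]] = []
    for _ in range(n_turns):
        text = texts[rng.randint(0, len(texts) - 1)]
        if text:  # Empty texts are skipped, so fewer than n_turns messages may be returned.
            messages.append({"role": rng.choice(["user", "assistant"]), "content": text})
    return messages
```

Two properties follow directly from the loop: the result can be shorter than four messages when empty texts are drawn, and the roles are chosen independently, so they are not guaranteed to alternate.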