# Task Overview

## Summarization
This consists of a query in which the agent (Validator) requests a summary of a topic from the miners. It always uses an API.

The context comes from Wikipedia. It is used to create the reference and may or may not also be sent to the miners.

1. Select a Wikipedia article at random and use it to define TOPIC (e.g. Pearl Harbour) and CONTEXT (article content)
2. Extract TAGS (history, WW2, Japan, USA) associated with the article
3. Generate SYSTEM PROMPT such as 'You are a student who wants a summary of the main events of TOPIC (TAGS) in a XYZ tone'.
4. Generate QUERY using MODEL and SYSTEM PROMPT
5. Generate K REFERENCES using MODEL with & without CONTEXT (helps us understand the efficacy of tool use in miners)
6. Repeat step 5 using GPT and other models (e.g. mixtral, solar)
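
Step 3 above can be sketched as a small prompt builder. This is only an illustration: the function name `build_system_prompt` and the fixed template wording are assumptions made for the example, not the project's actual implementation.

```python
def build_system_prompt(topic, tags, tone):
    """Fill the summarization template with TOPIC, TAGS and a tone (hypothetical helper)."""
    tag_text = ", ".join(tags)
    return (
        f"You are a student who wants a summary of the main events of {topic} "
        f"({tag_text}) in a {tone} tone."
    )


prompt = build_system_prompt("Pearl Harbour", ["history", "WW2", "Japan", "USA"], "curious")
print(prompt)
```

The resulting string is what would be handed to the agent in step 4 to generate the QUERY.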

----
system_prompt = 'You are a student who wants a summary of Pradeep Kumar Dubey (politics) in an interested tone.'

The system prompt is given to our agent (an LLM), which generates a query such as one of the following:

query = 'Give me an overview of the politician Pradeep Kumar Dubey'
query = 'Provide me with a summary of Pradeep Kumar Dubey'
query = 'I want to know about Pradeep Kumar Dubey, can you give me a summary?'

Query is then sent to the miners.



## Question Answering
This consists of a query whereby the agent (Validator) requests an answer to a question from the miners. Always uses API.

## Debugging
This can consist of either:
- Non-API: The agent provides a reference answer (a code snippet), which is then corrupted to create the challenge. Only a single reference answer exists.
- API: Stack Overflow is used to find a random thread containing a question and one or more accepted/upvoted answers. In this case the reference answers are weighted by upvotes and the challenge is the user's question. Multiple reference answers exist.
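
As a rough sketch of what "weighted by upvotes" could mean in the API case, the weights below are simply each answer's share of the total upvotes. The function name `weight_by_upvotes`, the dictionary keys, and the sample answers are all hypothetical; the actual weighting scheme is not specified here.

```python
def weight_by_upvotes(answers):
    """Return (text, weight) pairs whose weights sum to 1 (hypothetical helper)."""
    # Treat negative scores as zero so a downvoted answer cannot get a negative weight.
    total = sum(max(a["upvotes"], 0) for a in answers)
    if total == 0:
        # No positive votes at all: fall back to uniform weights.
        return [(a["text"], 1 / len(answers)) for a in answers]
    return [(a["text"], max(a["upvotes"], 0) / total) for a in answers]


# Made-up example answers, for illustration only.
answers = [
    {"text": "Use a context manager.", "upvotes": 30},
    {"text": "Call close() manually.", "upvotes": 10},
]
weights = weight_by_upvotes(answers)
print(weights)
```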


In [14]:
!pip install beautifulsoup4



In [2]:
import os
import openai
openai.api_key = api_key = os.environ['OPENAI_API_KEY']  # read the key from the environment; never hard-code secrets


import bittensor as bt

import pandas as pd
       
from utils import load_llm
from prompting.agent import Agent
from prompting.tasks import DebuggingTask, QuestionAnsweringTask, SummarizationTask


gpt_judge_prompt = """I'm using a roleplaying AI assistant to imitate human queries. Your task is to assess whether the following query follows the instruction correctly.  If the assistant-generated query contains system messages such as 'sure i can help' or similar, this is a bad result because humans would not talk to an AI assistant in that way.

system_prompt = {system_prompt}

query = {query}

Does the above query follow the system prompt and strongly resemble a human message? 

Simply answer 0 or 1, and your result must be enclosed in { } tags"""
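
A minimal sketch of how a judge template of this shape might be filled and its verdict parsed. Both helper names are hypothetical, and the short `template` below is a stand-in rather than the prompt above. Placeholders are substituted with `str.replace` because a template containing a literal `{ }` would likely trip `str.format`.

```python
import re


def fill_judge_prompt(template, system_prompt, query):
    """Substitute the two placeholders without touching other braces (hypothetical helper)."""
    return template.replace("{system_prompt}", system_prompt).replace("{query}", query)


def parse_judge_verdict(reply):
    """Return 0 or 1 if the reply contains {0} or {1}, else None (hypothetical helper)."""
    match = re.search(r"\{\s*([01])\s*\}", reply)
    return int(match.group(1)) if match else None


# Stand-in template for illustration only.
template = "system_prompt = {system_prompt}\nquery = {query}\nAnswer inside { } tags"
filled = fill_judge_prompt(template, "You are a student.", "Summarize X for me")
print(filled)
print(parse_judge_verdict("{1}"))
```

Returning `None` for an unparseable reply lets the caller decide whether to retry or discard the sample.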

In [3]:
model = 'gpt-4'
llm = load_llm(model, api_key=api_key)


In [4]:
llm

ChatOpenAI(client=<openai.resources.chat.completions.Completions object at 0x7f14964aca60>, async_client=<openai.resources.chat.completions.AsyncCompletions object at 0x7f1495645d30>, model_name='gpt-4', temperature=0.0, openai_api_key='<redacted>', openai_proxy='')

In [173]:
import requests


def get_random_wikipedia_article(min_length=1000, min_backlinks=1):
    # Wikipedia API endpoint for a random article
    url = "https://en.wikipedia.org/w/api.php"

    # Parameters for the API request
    params = {
        'action': 'query',
        'format': 'json',
        'prop': 'info|linkshere|categories|categoryinfo|extracts',
        'generator': 'random',
        'grnnamespace': 0,  # Namespace 0 indicates articles
        'grnlimit': 10,     # Number of random articles to fetch
        'inprop': 'url|displaytitle|length',  # Requesting URL, title, and length of the page
        'lhprop': 'pageid',  # Properties for links here (backlinks)
        'lhlimit': 'max',    # Maximum number of backlinks to retrieve
        'exlimit': 'max',    # Get extracts for each page
        'cllimit': 'max'     # Get all categories for each page
    }

    
    max_tries = 10
    tries = 0
    while tries < max_tries:

        response = requests.get(url, params=params)
        tries += 1
        
        data = response.json()
        if not data.get('query'):
            continue

        for page_id, page_info in data['query']['pages'].items():

            length = page_info.get('length', 0)
            backlinks = len(page_info.get('linkshere', []))
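            # str.strip('Category:') would remove any of those characters from both ends
            # of the title, so drop the prefix explicitly instead.
            categories = [cat.get('title', '').replace('Category:', '', 1) for cat in page_info.get('categories', [])]
            extract = page_info.get('extract')

            if length >= min_length and backlinks >= min_backlinks and extract:  # and views >= min_views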
                return {
                    'title': page_info['title'],
                    'url': page_info['fullurl'],
                    'length': length,
                    'extract': extract,
                    'backlinks': backlinks,
                    'categories': categories
                }
    raise Exception(f"Could not find an article with length >= {min_length} and backlinks >= {min_backlinks} after {max_tries} tries.")

# Example usage
filtered_data = get_random_wikipedia_article()
filtered_data


{'title': 'Invariant measure',
 'url': 'https://en.wikipedia.org/wiki/Invariant_measure',
 'length': 5212,
 'extract': '<p>In mathematics, an <b>invariant measure</b> is a measure that is preserved by some function. The function may be a geometric transformation. For examples, circular angle is invariant under rotation, hyperbolic angle is  invariant under squeeze mapping, and a difference of slopes is invariant under shear mapping.</p><p>Ergodic theory is the study of invariant measures in dynamical systems. The Krylov–Bogolyubov theorem proves the existence of invariant measures under certain conditions on the function and space under consideration.\n</p>\n\n\n<h2><span id="Definition">Definition</span></h2>\n<p>Let <span><span><math xmlns="http://www.w3.org/1998/Math/MathML" alttext="{\\displaystyle (X,\\Sigma )}">\n  <semantics>\n    <mrow class="MJX-TeXAtom-ORD">\n      <mstyle displaystyle="true" scriptlevel="0">\n        <mo stretchy="false">(</mo>\n        <mi>X</mi>\n        <
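
The extract above is HTML. Other cells in this notebook pass `explaintext` to get plain text straight from the API; as an alternative, tags can be stripped locally. The sketch below uses only the standard library, and `html_to_text` is a hypothetical helper. It keeps all text nodes, so MathML fragments like the ones in this extract would still leak stray symbols.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect the text nodes of an HTML fragment and ignore the tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(html):
    """Return the visible text of an HTML fragment with whitespace collapsed."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.parts).split())


sample = "<p>In mathematics, an <b>invariant measure</b> is a measure that is preserved by some function.</p>"
print(html_to_text(sample))
```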

In [99]:
import requests

def get_random_wikipedia_article():
    # Wikipedia API endpoint for a random article
    url = "https://en.wikipedia.org/w/api.php"

    # Parameters for the API request
    params = {
        'action': 'query',
        'format': 'json',
        'prop': 'info|linkshere|categories|categoryinfo|pageviews',#|extracts
        'generator': 'random',
        'grnnamespace': 0,  # Namespace 0 indicates articles
        'grnlimit': 20       # Number of random articles to fetch
    }

    # Making the API request
    response = requests.get(url, params=params)
    data = response.json()
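    # NOTE: early return used to inspect the raw response (see the output below);
    # everything after this line is unreachable.
    return data

    # Extracting the title of the random article
    # (with 'generator': 'random' the results live under data['query']['pages'];
    # data['query']['random'] only exists when 'list': 'random' is requested)
    title = data['query']['random'][0]['title']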

    # URL of the random article
    article_url = f"https://en.wikipedia.org/wiki/{title.replace(' ', '_')}"

    return title, article_url

get_random_wikipedia_article()

{'continue': {'lhcontinue': 'Amir_Taaki|21670782',
  'clcontinue': '1132380|Infobox_mapframe_without_OSM_relation_ID_on_Wikidata',
  'pvipcontinue': 'Danny_Ben-Moshe',
  'grncontinue': '0.075810513275|0.075810513275|0|0',
  'continue': 'grncontinue||info|categoryinfo'},
 'query': {'pages': {'25822387': {'pageid': 25822387,
    'ns': 0,
    'title': "Consort Yu (Xiang Yu's wife)",
    'contentmodel': 'wikitext',
    'pagelanguage': 'en',
    'pagelanguagehtmlcode': 'en',
    'pagelanguagedir': 'ltr',
    'touched': '2024-01-02T04:57:00Z',
    'lastrevid': 1175701797,
    'length': 5407,
    'pageviews': {'2023-11-05': 95,
     '2023-11-06': 78,
     '2023-11-07': 74,
     '2023-11-08': 51,
     '2023-11-09': 82,
     '2023-11-10': 64,
     '2023-11-11': 74,
     '2023-11-12': 71,
     '2023-11-13': 72,
     '2023-11-14': 60,
     '2023-11-15': 51,
     '2023-11-16': 89,
     '2023-11-17': 66,
     '2023-11-18': 79,
     '2023-11-19': 70,
     '2023-11-20': 71,
     '2023-11-21': 63,
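
The per-day `pageviews` mapping could feed a popularity filter such as the commented-out `views >= min_views` check. A minimal sketch follows, assuming the mapping has the shape shown above; the helper name `mean_daily_views` is hypothetical, and the `None` entry is an assumed representation of a day without data.

```python
def mean_daily_views(page_info):
    """Average the daily view counts of one page, skipping days without data."""
    views = [v for v in page_info.get("pageviews", {}).values() if v is not None]
    return sum(views) / len(views) if views else 0.0


# First three days copied from the output above; the None entry is hypothetical.
page_info = {
    "pageviews": {
        "2023-11-05": 95,
        "2023-11-06": 78,
        "2023-11-07": 74,
        "2023-11-08": None,
    }
}
mean_views = mean_daily_views(page_info)
print(round(mean_views, 2))
```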
   

In [None]:

def get_wikipedia_article_content(title, remove_headers=False):
    # Wikipedia API endpoint
    url = "https://en.wikipedia.org/w/api.php"

    # Parameters for the API request to get article content
    params = {
        'action': 'query',
        'format': 'json',
        'titles': title,
        'prop': 'extracts',
        'explaintext': True,  # Get plain text content
    }

    # Making the API request
    response = requests.get(url, params=params)
    data = response.json()

    # Extracting the page content
    page = next(iter(data['query']['pages'].values()))
    content = page.get('extract', 'Content not found.')
    
    text = ''
    for line in content.split('\n'):
        if remove_headers and line.startswith('==') and line.endswith('=='):
            continue
        text += line + '\n'

    return text

from bs4 import BeautifulSoup
# TODO: maybe this?
def extract_categories(url):
    # Fetch the webpage
    response = requests.get(url)
    soup = BeautifulSoup(response.content, 'html.parser')

    # Find the category links
    categories = []
    for link in soup.find_all("a", href=lambda href: href and "Category:" in href):
        category = link.get_text()
        categories.append(category)

    return categories

# Assuming you have a title from the previous function
title, url = get_random_wikipedia_article()
content = get_wikipedia_article_content(title, remove_headers=True)
categories = extract_categories(url)
print(f"Title: {title}\nContent:\n{content}\nCategories: {categories}")


In [65]:
content['query']['pages']

{'18702414': {'pageid': 18702414,
  'ns': 0,
  'title': 'Fuller House (Barnstable, Massachusetts)',
  'extract': 'The Fuller House is a historic house on Parker Road in Barnstable, Massachusetts. Built c. 1800, the house is a well-preserved local example of a Federal period farmhouse with barn. The house was listed on the National Register of Historic Places in 1987.\n\n\n== Description and history ==\nThe Fuller House is set on the northwest side of Parker Road in West Barnstable, just northeast of its junction with Church Street, and is set near the road, behind a low fieldstone wall. Its main block is a 1+1⁄2-story wood-frame structure, three bays wide, with a side-gable roof and wood shingle siding. In a somewhat unusual arrangement for the period, the chimney is located centered behind the northernmost bay, while the main entrance, which is more typically in front of the chimney, is located in the southern bay. The entry is flanked by pilasters and topped by a transom window, typi

In [55]:
import requests
import random

def get_pages_in_category(category):
    url = "https://en.wikipedia.org/w/api.php"
    params = {
        'action': 'query',
        'format': 'json',
        'list': 'categorymembers',
        'cmtitle': f"Category:{category}",
        'cmnamespace': 0,  # Specify namespace 0 for articles
        'cmlimit': 'max'
    }

    response = requests.get(url, params=params)
    data = response.json()

    page_titles = [page['title'] for page in data['query']['categorymembers']]
    return page_titles


def select_random_page(pages):
    if pages:
        return random.choice(pages)
    else:
        return None

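# Example: start from the "Science" category, pick a random member page, and reuse its
# title as the next category. Article titles are rarely category names, so the walk
# usually dead-ends after one step (see the output below).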
category = "Science"
for i in range(3):
    pages = get_pages_in_category(category)
    print(f'Category: {category}. Content: {pages}')
    random_page_title = category = select_random_page(pages)
    print(f"Random page from {category} category: {random_page_title}. Total pages: {len(pages)}")

print(f"Random page from {category} category: {random_page_title}. Total pages: {len(pages)}")
# content = get_wikipedia_article_content(random_page_title, remove_headers=False)
# print(content)

Category: Science. Content: ['Science', 'Outline of science', 'Fanzor', 'IdeaSquare', 'Methoxyacetic acid', 'Potassic-magnesio-fluoro-arfvedsonite', 'Mortimer Rogoff']
Random page from Mortimer Rogoff category: Mortimer Rogoff. Total pages: 7
Category: Mortimer Rogoff. Content: []
Random page from None category: None. Total pages: 0
Category: None. Content: []
Random page from None category: None. Total pages: 0
Random page from None category: None. Total pages: 0
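
A single `categorymembers` request returns only one batch of members. For larger categories the MediaWiki API includes a `continue` object whose fields are sent back with the next request. The sketch below shows that loop with an injectable `fetch` callable so it runs without network access; `collect_category_members` and `fake_fetch` are hypothetical names, and the fake responses are made up.

```python
def collect_category_members(fetch, params):
    """Follow 'continue' tokens until every batch of category members is collected."""
    titles = []
    request = dict(params)
    while True:
        data = fetch(request)
        members = data.get("query", {}).get("categorymembers", [])
        titles.extend(member["title"] for member in members)
        if "continue" not in data:
            return titles
        # Merge the continuation fields into the original parameters for the next call.
        request = {**params, **data["continue"]}


def fake_fetch(request):
    """Stand-in for requests.get(url, params=request).json(), returning two batches."""
    if "cmcontinue" not in request:
        return {
            "continue": {"cmcontinue": "page|2", "continue": "-||"},
            "query": {"categorymembers": [{"title": "A"}, {"title": "B"}]},
        }
    return {"query": {"categorymembers": [{"title": "C"}]}}


titles = collect_category_members(fake_fetch, {"action": "query", "list": "categorymembers"})
print(titles)
```

In real use, `fetch` could be something like `lambda p: requests.get(url, params=p).json()`.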


In [32]:
import requests

def get_top_level_categories(category="Contents"):
    # Wikipedia API endpoint
    url = "https://en.wikipedia.org/w/api.php"

    # Parameters for the API request to get subcategories
    params = {
        'action': 'query',
        'format': 'json',
        'list': 'categorymembers',
        'cmtitle': f"Category:{category}",
        'cmtype': 'subcat',  # Fetch subcategories
        'cmlimit': 'max'     # Maximum number of subcategories
    }

    response = requests.get(url, params=params)
    data = response.json()

    subcategories = [subcat['title'].replace("Category:", "") for subcat in data['query']['categorymembers']]
    return subcategories

# Fetch top-level categories
top_level_categories = get_top_level_categories(category='History')
print(f"Top-level Categories:\n{top_level_categories}")


Top-level Categories:
['History by ethnic group', 'History by location', 'History by mountain range', 'History by period', 'Fields of history', 'Historiography', 'People in history occupations', 'Chronology', 'Origins', 'Outlines of history and events', 'History-related lists', 'History awards', 'Historical controversies', 'History in culture', 'History education', 'Historical geography', 'Historicity', 'Legacies', 'Historical objects', 'History organizations', 'Philosophy of history', 'Historic preservation', 'Pseudohistory', 'Historical works', 'History images', 'History stubs']


In [81]:
import requests
import random

def get_pages_in_category(category):
    # Encode spaces for URL
    category = category.replace(' ', '_')
    url = "https://en.wikipedia.org/w/api.php"
    params = {
        'action': 'query',
        'format': 'json',
        'list': 'categorymembers',
        'cmtitle': f"Category:{category}",
        'cmnamespace': 0,
        'cmlimit': 'max'
    }
    
    response = requests.get(url, params=params)
    data = response.json()

    page_titles = [page['title'] for page in data['query']['categorymembers']]
    return page_titles

def get_random_wikipedia_article(categories):
    # Select a random category
    selected_category = random.choice(categories)
    print(f"Selected Category: {selected_category}")

    # Get pages in the selected category
    pages = get_pages_in_category(selected_category)
    if not pages:
        raise ValueError("No articles found in the category.")

    # Select a random page
    random_page_title = random.choice(pages)
    print(f"Selected Article: {random_page_title}")

    # Get the content of the random page
    url = "https://en.wikipedia.org/w/api.php"
    params = {
        'action': 'query',
        'format': 'json',
        'titles': random_page_title,
        'prop': 'extracts',
        'explaintext': True,
    }

    response = requests.get(url, params=params)
    data = response.json()
    page = next(iter(data['query']['pages'].values()))
    content = page.get('extract', 'Content not found.')

    # Construct the article URL
    article_url = f"https://en.wikipedia.org/wiki/{random_page_title.replace(' ', '_')}"

    return {'title': random_page_title, 'url': article_url, 'content': content, 'category': selected_category}

# List of categories
categories = ['Artificial intelligence', 'World history', 'Astrophysics', 'Classical music', 
              'Environmental science', 'Food', 'Mythology', 'Contemporary art', 'Linguistics']

categories = ['Machine learning algorithms']
# Get a random article
results = []
import tqdm
for i in tqdm.tqdm(range(100)):
    try:
        data = get_random_wikipedia_article(categories)
        results.append(data)
    except ValueError as e:
        print(e)
    
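# NOTE: title, url and content are not assigned in this cell; they are leftovers from an
# earlier cell, which is why the output below ends with an unrelated article.
print(f"Title: {title}\nURL: {url}\nContent: {content[:500]}...")  # Print the first 500 characters of content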


  0%|          | 0/100 [00:00<?, ?it/s]

Selected Category: Machine learning algorithms
Selected Article: Kernel methods for vector output


  1%|          | 1/100 [00:00<00:47,  2.07it/s]

Selected Category: Machine learning algorithms
Selected Article: Distributional Soft Actor Critic


  2%|▏         | 2/100 [00:00<00:36,  2.67it/s]

Selected Category: Machine learning algorithms
Selected Article: Error-driven learning


  3%|▎         | 3/100 [00:01<00:33,  2.92it/s]

Selected Category: Machine learning algorithms
Selected Article: Federated Learning of Cohorts


  4%|▍         | 4/100 [00:01<00:31,  3.09it/s]

Selected Category: Machine learning algorithms
Selected Article: Diffusion map


  5%|▌         | 5/100 [00:01<00:32,  2.94it/s]

Selected Category: Machine learning algorithms
Selected Article: Radial basis function network


  6%|▌         | 6/100 [00:02<00:36,  2.59it/s]

Selected Category: Machine learning algorithms
Selected Article: Sparse PCA


  7%|▋         | 7/100 [00:02<00:34,  2.66it/s]

Selected Category: Machine learning algorithms
Selected Article: Repeated incremental pruning to produce error reduction (RIPPER)


  8%|▊         | 8/100 [00:02<00:32,  2.84it/s]

Selected Category: Machine learning algorithms
Selected Article: Prefrontal cortex basal ganglia working memory


  9%|▉         | 9/100 [00:03<00:30,  3.00it/s]

Selected Category: Machine learning algorithms
Selected Article: Skill chaining


 10%|█         | 10/100 [00:03<00:29,  3.10it/s]

Selected Category: Machine learning algorithms
Selected Article: Randomized weighted majority algorithm


 11%|█         | 11/100 [00:03<00:30,  2.94it/s]

Selected Category: Machine learning algorithms
Selected Article: GeneRec


 12%|█▏        | 12/100 [00:04<00:28,  3.04it/s]

Selected Category: Machine learning algorithms
Selected Article: Query-level feature


 13%|█▎        | 13/100 [00:04<00:28,  3.09it/s]

Selected Category: Machine learning algorithms
Selected Article: Almeida–Pineda recurrent backpropagation


 14%|█▍        | 14/100 [00:04<00:27,  3.16it/s]

Selected Category: Machine learning algorithms
Selected Article: Kernel principal component analysis


 15%|█▌        | 15/100 [00:05<00:28,  3.00it/s]

Selected Category: Machine learning algorithms
Selected Article: Local outlier factor


 16%|█▌        | 16/100 [00:05<00:27,  3.11it/s]

Selected Category: Machine learning algorithms
Selected Article: Diffusion model


 17%|█▋        | 17/100 [00:05<00:31,  2.62it/s]

Selected Category: Machine learning algorithms
Selected Article: Accumulated local effects


 18%|█▊        | 18/100 [00:06<00:29,  2.80it/s]

Selected Category: Machine learning algorithms
Selected Article: Radial basis function network


 19%|█▉        | 19/100 [00:06<00:29,  2.71it/s]

Selected Category: Machine learning algorithms
Selected Article: Open Syllabus Project


 20%|██        | 20/100 [00:06<00:27,  2.89it/s]

Selected Category: Machine learning algorithms
Selected Article: Out-of-bag error


 21%|██        | 21/100 [00:07<00:25,  3.04it/s]

Selected Category: Machine learning algorithms
Selected Article: Neural radiance field


 22%|██▏       | 22/100 [00:07<00:25,  3.04it/s]

Selected Category: Machine learning algorithms
Selected Article: Prefrontal cortex basal ganglia working memory


 23%|██▎       | 23/100 [00:07<00:24,  3.12it/s]

Selected Category: Machine learning algorithms
Selected Article: Distributional Soft Actor Critic


 24%|██▍       | 24/100 [00:08<00:24,  3.15it/s]

Selected Category: Machine learning algorithms
Selected Article: Repeated incremental pruning to produce error reduction (RIPPER)


 25%|██▌       | 25/100 [00:08<00:23,  3.20it/s]

Selected Category: Machine learning algorithms
Selected Article: Quickprop


 26%|██▌       | 26/100 [00:08<00:22,  3.25it/s]

Selected Category: Machine learning algorithms
Selected Article: Dynamic time warping


 27%|██▋       | 27/100 [00:09<00:23,  3.10it/s]

Selected Category: Machine learning algorithms
Selected Article: Rule-based machine learning


 28%|██▊       | 28/100 [00:09<00:22,  3.22it/s]

Selected Category: Machine learning algorithms
Selected Article: Elastic net regularization


 29%|██▉       | 29/100 [00:09<00:21,  3.26it/s]

Selected Category: Machine learning algorithms
Selected Article: Loss functions for classification


 30%|███       | 30/100 [00:10<00:22,  3.05it/s]

Selected Category: Machine learning algorithms
Selected Article: Structured kNN


 31%|███       | 31/100 [00:10<00:22,  3.12it/s]

Selected Category: Machine learning algorithms
Selected Article: Constructing skill trees


 32%|███▏      | 32/100 [00:10<00:22,  3.02it/s]

Selected Category: Machine learning algorithms
Selected Article: Bioz


 33%|███▎      | 33/100 [00:11<00:21,  3.07it/s]

Selected Category: Machine learning algorithms
Selected Article: Radial basis function network


 34%|███▍      | 34/100 [00:11<00:22,  2.93it/s]

Selected Category: Machine learning algorithms
Selected Article: Hyper basis function network


 35%|███▌      | 35/100 [00:11<00:21,  3.01it/s]

Selected Category: Machine learning algorithms
Selected Article: Gaussian splatting


 36%|███▌      | 36/100 [00:12<00:20,  3.13it/s]

Selected Category: Machine learning algorithms
Selected Article: Incremental learning


 37%|███▋      | 37/100 [00:12<00:19,  3.24it/s]

Selected Category: Machine learning algorithms
Selected Article: Prototype methods


 38%|███▊      | 38/100 [00:12<00:18,  3.28it/s]

Selected Category: Machine learning algorithms
Selected Article: Kernel principal component analysis


 39%|███▉      | 39/100 [00:12<00:18,  3.33it/s]

Selected Category: Machine learning algorithms
Selected Article: Growing self-organizing map


 40%|████      | 40/100 [00:13<00:18,  3.33it/s]

Selected Category: Machine learning algorithms
Selected Article: Diffusion map


 41%|████      | 41/100 [00:13<00:18,  3.19it/s]

Selected Category: Machine learning algorithms
Selected Article: Prefrontal cortex basal ganglia working memory


 42%|████▏     | 42/100 [00:13<00:18,  3.21it/s]

Selected Category: Machine learning algorithms
Selected Article: Online machine learning


 43%|████▎     | 43/100 [00:14<00:20,  2.85it/s]

Selected Category: Machine learning algorithms
Selected Article: Growing self-organizing map


 44%|████▍     | 44/100 [00:14<00:18,  2.98it/s]

Selected Category: Machine learning algorithms
Selected Article: GeneRec


 45%|████▌     | 45/100 [00:14<00:17,  3.12it/s]

Selected Category: Machine learning algorithms
Selected Article: Quadratic unconstrained binary optimization


 46%|████▌     | 46/100 [00:15<00:18,  2.85it/s]

Selected Category: Machine learning algorithms
Selected Article: Incremental learning


 47%|████▋     | 47/100 [00:15<00:17,  2.98it/s]

Selected Category: Machine learning algorithms
Selected Article: Elastic net regularization


 48%|████▊     | 48/100 [00:15<00:16,  3.09it/s]

Selected Category: Machine learning algorithms
Selected Article: Zero-shot learning


 49%|████▉     | 49/100 [00:16<00:15,  3.20it/s]

Selected Category: Machine learning algorithms
Selected Article: Triplet loss


 50%|█████     | 50/100 [00:16<00:15,  3.25it/s]

Selected Category: Machine learning algorithms
Selected Article: K-nearest neighbors algorithm


 51%|█████     | 51/100 [00:16<00:15,  3.17it/s]

Selected Category: Machine learning algorithms
Selected Article: Structured kNN


 52%|█████▏    | 52/100 [00:17<00:14,  3.27it/s]

Selected Category: Machine learning algorithms
Selected Article: Deep reinforcement learning


 53%|█████▎    | 53/100 [00:17<00:14,  3.14it/s]

Selected Category: Machine learning algorithms
Selected Article: Distributional Soft Actor Critic


 54%|█████▍    | 54/100 [00:17<00:14,  3.25it/s]

Selected Category: Machine learning algorithms
Selected Article: FastICA


 55%|█████▌    | 55/100 [00:18<00:14,  3.07it/s]

Selected Category: Machine learning algorithms
Selected Article: K-nearest neighbors algorithm


 56%|█████▌    | 56/100 [00:18<00:14,  3.02it/s]

Selected Category: Machine learning algorithms
Selected Article: Deep reinforcement learning


 57%|█████▋    | 57/100 [00:18<00:14,  2.91it/s]

Selected Category: Machine learning algorithms
Selected Article: Non-negative matrix factorization


 58%|█████▊    | 58/100 [00:19<00:14,  2.94it/s]

Selected Category: Machine learning algorithms
Selected Article: FastICA


 59%|█████▉    | 59/100 [00:19<00:14,  2.83it/s]

Selected Category: Machine learning algorithms
Selected Article: Bootstrap aggregating


 60%|██████    | 60/100 [00:19<00:14,  2.85it/s]

Selected Category: Machine learning algorithms
Selected Article: Linde–Buzo–Gray algorithm


 61%|██████    | 61/100 [00:20<00:13,  2.96it/s]

Selected Category: Machine learning algorithms
Selected Article: Skill chaining


 62%|██████▏   | 62/100 [00:20<00:12,  3.09it/s]

Selected Category: Machine learning algorithms
Selected Article: Sparse PCA


 63%|██████▎   | 63/100 [00:20<00:12,  3.06it/s]

Selected Category: Machine learning algorithms
Selected Article: Lasso (statistics)


 64%|██████▍   | 64/100 [00:21<00:14,  2.47it/s]

Selected Category: Machine learning algorithms
Selected Article: Sparse PCA


 65%|██████▌   | 65/100 [00:21<00:13,  2.60it/s]

Selected Category: Machine learning algorithms
Selected Article: Self-play


 66%|██████▌   | 66/100 [00:22<00:12,  2.82it/s]

Selected Category: Machine learning algorithms
Selected Article: Quickprop


 67%|██████▋   | 67/100 [00:22<00:11,  2.99it/s]

Selected Category: Machine learning algorithms
Selected Article: Prescription monitoring program


 68%|██████▊   | 68/100 [00:22<00:10,  3.07it/s]

Selected Category: Machine learning algorithms
Selected Article: Federated Learning of Cohorts


 69%|██████▉   | 69/100 [00:22<00:09,  3.20it/s]

Selected Category: Machine learning algorithms
Selected Article: Sparse PCA


 70%|███████   | 70/100 [00:23<00:09,  3.09it/s]

Selected Category: Machine learning algorithms
Selected Article: Loss functions for classification


 71%|███████   | 71/100 [00:23<00:10,  2.89it/s]

Selected Category: Machine learning algorithms
Selected Article: Forward–backward algorithm


 72%|███████▏  | 72/100 [00:24<00:10,  2.62it/s]

Selected Category: Machine learning algorithms
Selected Article: Constructing skill trees


 73%|███████▎  | 73/100 [00:24<00:09,  2.72it/s]

Selected Category: Machine learning algorithms
Selected Article: Accumulated local effects


 74%|███████▍  | 74/100 [00:24<00:08,  2.90it/s]

Selected Category: Machine learning algorithms
Selected Article: Open Syllabus Project


 75%|███████▌  | 75/100 [00:25<00:08,  2.98it/s]

Selected Category: Machine learning algorithms
Selected Article: CN2 algorithm


 76%|███████▌  | 76/100 [00:25<00:07,  3.11it/s]

Selected Category: Machine learning algorithms
Selected Article: Minimum redundancy feature selection


 77%|███████▋  | 77/100 [00:25<00:07,  3.11it/s]

Selected Category: Machine learning algorithms
Selected Article: Wake-sleep algorithm


 78%|███████▊  | 78/100 [00:25<00:07,  3.14it/s]

Selected Category: Machine learning algorithms
Selected Article: Self-play


 79%|███████▉  | 79/100 [00:26<00:06,  3.23it/s]

Selected Category: Machine learning algorithms
Selected Article: Linde–Buzo–Gray algorithm


 80%|████████  | 80/100 [00:26<00:06,  3.28it/s]

Selected Category: Machine learning algorithms
Selected Article: Bioz


 81%|████████  | 81/100 [00:26<00:05,  3.30it/s]

Selected Category: Machine learning algorithms
Selected Article: Distributional Soft Actor Critic


 82%|████████▏ | 82/100 [00:27<00:05,  3.36it/s]

Selected Category: Machine learning algorithms
Selected Article: Quadratic unconstrained binary optimization


 83%|████████▎ | 83/100 [00:27<00:05,  3.14it/s]

Selected Category: Machine learning algorithms
Selected Article: Graphical time warping


 84%|████████▍ | 84/100 [00:27<00:05,  3.00it/s]

Selected Category: Machine learning algorithms
Selected Article: Wake-sleep algorithm


 85%|████████▌ | 85/100 [00:28<00:04,  3.12it/s]

Selected Category: Machine learning algorithms
Selected Article: Open Syllabus Project


 86%|████████▌ | 86/100 [00:28<00:04,  3.20it/s]

Selected Category: Machine learning algorithms
Selected Article: Repeated incremental pruning to produce error reduction (RIPPER)


 87%|████████▋ | 87/100 [00:28<00:03,  3.28it/s]

Selected Category: Machine learning algorithms
Selected Article: Gaussian splatting


 88%|████████▊ | 88/100 [00:29<00:03,  3.32it/s]

Selected Category: Machine learning algorithms
Selected Article: K-nearest neighbors algorithm


 89%|████████▉ | 89/100 [00:29<00:03,  3.22it/s]

Selected Category: Machine learning algorithms
Selected Article: Randomized weighted majority algorithm


 90%|█████████ | 90/100 [00:29<00:03,  3.15it/s]

Selected Category: Machine learning algorithms
Selected Article: Expectation–maximization algorithm


 91%|█████████ | 91/100 [00:30<00:03,  2.28it/s]

Selected Category: Machine learning algorithms
Selected Article: Stochastic variance reduction


 92%|█████████▏| 92/100 [00:30<00:03,  2.39it/s]

Selected Category: Machine learning algorithms
Selected Article: Linde–Buzo–Gray algorithm


 93%|█████████▎| 93/100 [00:31<00:02,  2.62it/s]

Selected Category: Machine learning algorithms
Selected Article: Mixture of experts


 94%|█████████▍| 94/100 [00:31<00:02,  2.63it/s]

Selected Category: Machine learning algorithms
Selected Article: Minimum redundancy feature selection


 95%|█████████▌| 95/100 [00:31<00:01,  2.82it/s]

Selected Category: Machine learning algorithms
Selected Article: Extremal Ensemble Learning


 96%|█████████▌| 96/100 [00:32<00:01,  2.97it/s]

Selected Category: Machine learning algorithms
Selected Article: Minimum redundancy feature selection


 97%|█████████▋| 97/100 [00:32<00:00,  3.12it/s]

Selected Category: Machine learning algorithms
Selected Article: Logic learning machine


 98%|█████████▊| 98/100 [00:32<00:00,  3.17it/s]

Selected Category: Machine learning algorithms
Selected Article: Growing self-organizing map


 99%|█████████▉| 99/100 [00:32<00:00,  3.25it/s]

Selected Category: Machine learning algorithms
Selected Article: LogitBoost


100%|██████████| 100/100 [00:33<00:00,  3.01it/s]

Title: Cecil J. Doty
URL: https://en.wikipedia.org/wiki/Cecil_J._Doty
Content: Cecil John Doty (1907–1990) was an American architect, notable for planning a consistent architectural framework for the U.S. National Park Service's ambitious Mission 66 program in the 1950s and 1960s. Doty spent his childhood in May, Oklahoma, then attended Oklahoma A&M (now Oklahoma State University), and received a degree in architectural engineering in 1928. During the Great Depression that immediately followed Doty's graduation, Doty found intermittent work, but was unable to establish a b...





In [82]:
pd.DataFrame(results).title.value_counts().value_counts()

count
1    28
2    20
3     8
4     2
Name: count, dtype: int64
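
The counts above show 58 distinct titles among the 100 draws (28 + 20 + 8 + 2), so 30 titles were drawn more than once. If repeats are unwanted, one option is to sample without replacement from the category's page list. A minimal sketch, with a hypothetical helper name and a short made-up page list:

```python
import random


def sample_unique_titles(pages, k, seed=None):
    """Pick up to k distinct titles from pages without replacement."""
    rng = random.Random(seed)
    unique_pages = list(dict.fromkeys(pages))  # drop repeated titles, keep order
    return rng.sample(unique_pages, min(k, len(unique_pages)))


pages = ["Sparse PCA", "Diffusion map", "Quickprop", "FastICA", "Sparse PCA"]
picked = sample_unique_titles(pages, k=3, seed=0)
print(picked)
```

Capping `k` at the number of distinct titles avoids the `ValueError` that `random.sample` raises when asked for more items than exist.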

In [17]:

from collections.abc import Iterator

class Dataset(Iterator):
    def __init__(self):
        super().__init__()

    def __next__(self):
        max_tries = 20
        while True:
            
            bt.logging.debug("Retrieving data from prompting.dataset...")
            for _ in range(max_tries):
                title, url = get_random_wikipedia_article()
                content = get_wikipedia_article_content(title)
                
                if f'{title} may refer to:' in content:
                    continue

                if len(content.split()) < 250:
                    continue
                
                # TBD
                tags = []

                # TODO return useful additional fields
                if content.strip():
                    return {"text": content,'title': title, 'url': url, 'tags': tags}
            
            bt.logging.debug(f"Failed to retrieve data from prompting.dataset after {max_tries} tries")
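
The two filters in `__next__` can be pulled into a small predicate so they are testable without network access. This is a sketch, not part of the notebook: `is_usable_article` and `first_usable` are hypothetical names, and the `fetch` callable stands in for `get_random_wikipedia_article` plus `get_wikipedia_article_content`. Unlike the class, this version gives up after `max_tries` instead of looping until it succeeds.

```python
def is_usable_article(title, content, min_words=250):
    """Return True if the article passes the same checks as Dataset.__next__."""
    if f"{title} may refer to:" in content:   # disambiguation page
        return False
    if len(content.split()) < min_words:      # too short to summarize
        return False
    return bool(content.strip())

def first_usable(fetch, max_tries=20, min_words=250):
    """Call fetch() up to max_tries times; return the first usable article or None."""
    for _ in range(max_tries):
        title, url, content = fetch()
        if is_usable_article(title, content, min_words):
            return {"text": content, "title": title, "url": url, "tags": []}
    return None

# Offline stand-in for the Wikipedia helpers (hypothetical data)
pages = iter([
    ("Mercury", "u1", "Mercury may refer to: a planet, an element"),
    ("Stub", "u2", "too short"),
    ("Long", "u3", "word " * 300),
])
article = first_usable(lambda: next(pages), max_tries=3)
print(article["title"])
```

Returning `None` on failure leaves the retry policy to the caller, which may be preferable to an unbounded loop when Wikipedia is unreachable.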
            

In [18]:
dataset = Dataset()
next(dataset)

{'text': '"I Gotta Dance to Keep from Crying" is a 1963 hit by the Miracles on Motown\'s Tamla label.  It was written and produced by Motown\'s main songwriting team, Brian Holland, Lamont Dozier, and Eddie Holland. \n\n\n== Background ==\n"I Gotta Dance to Keep from Crying" was the follow-up to the group\'s Top 10 pop hit, "Mickey\'s Monkey", also written by Holland, Dozier and Holland.  The smash success of that song, according to Motown policy, automatically gave Holland-Dozier-Holland the green light to write and produce the Miracles\' next release, which resulted in this song. Like "Mickey\'s Monkey", "I Gotta Dance to Keep from Crying" features a "live party" feel. The song\'s title is a play on the old expression, "I Gotta Laugh to Keep from Crying", highlighting the all-too-human tendency to escape from heartbreak or personal pain by dancing, laughing and having a good time. Miracles lead singer Smokey Robinson, as the song\'s narrator, portrays a young man trying to get over t

In [20]:

n_trials = 1
n_references = 3

tasks = [SummarizationTask(llm=llm, dataset=dataset)]#, DebuggingTask, QuestionAnsweringTask]

df = pd.DataFrame()
for i in range(n_trials):

    # loop over all task types
    # for now, just summarization
    for task in tasks:
        
        # loop over all the different formulations of a given task
        # for task_params in [{}]:
            
        # If this is a debugging task, a reference has already been created; otherwise there is no reference yet.
        # For now we ignore debugging tasks for simplicity.
        # task = task_class(llm, **task_params)
        
        bt.logging.info("🤖 Creating agent...")
        agent = Agent(llm=llm, task=task)
        
        query = agent.query
        # Create reference answers
        bt.logging.info("🤖 Creating reference answers...")
        references = agent.generate_reference_answers(n=n_references)
        
        query_eval = GPT(gpt_judge_prompt).parse()
            
            
            
    


2024-01-03 22:55:35.978 |       INFO       | 🤖 Creating agent...
2024-01-03 22:55:35.979 |       INFO       | 🤖 Creating reference answers...


In [21]:
agent

<agent.Agent at 0x7f149528f730>

In [23]:
task

SummarizationTask(topics='', desc='get help with summarization', goal='a summary of the following text', topic='', subtopic='', challenge='Pradeep Kumar Dubey  (Hindi: प्रदीप कुमार दुबे) is a Judicial officer and the Principal Secretary (HJS) of Uttar Pradesh Legislative Assembly. Dubey joined  Uttar Pradesh Judicial Services - PCS(J) in Aug 1987. Before taking charge as the principal secretary, he served as the legal adviser to the Governor of Uttar Pradesh.\n\n\n== Early life and education ==\nPradeep Kumar Dubey was born in Etawah, Uttar Pradesh in 1957. He attained Bachelor of Laws -LL.B(Delhi University), and Master of Arts degree -M.A.(Allahabad University).\n\n\n== Career ==\nIn Aug 1987, Pradeep Kumar Dubey joined the Uttar Pradesh Judicial Services and served as a judicial magistrate from 1987 to 1994. He subsequently was appointed as the legal adviser to the legal adviser to the Governor of Uttar Pradesh. In Jun 2011, Dubey was appointed as the principal secretary of Uttar Pr

In [24]:
print(task.challenge)

Pradeep Kumar Dubey  (Hindi: प्रदीप कुमार दुबे) is a Judicial officer and the Principal Secretary (HJS) of Uttar Pradesh Legislative Assembly. Dubey joined  Uttar Pradesh Judicial Services - PCS(J) in Aug 1987. Before taking charge as the principal secretary, he served as the legal adviser to the Governor of Uttar Pradesh.


== Early life and education ==
Pradeep Kumar Dubey was born in Etawah, Uttar Pradesh in 1957. He attained Bachelor of Laws -LL.B(Delhi University), and Master of Arts degree -M.A.(Allahabad University).


== Career ==
In Aug 1987, Pradeep Kumar Dubey joined the Uttar Pradesh Judicial Services and served as a judicial magistrate from 1987 to 1994. He subsequently was appointed as the legal adviser to the legal adviser to the Governor of Uttar Pradesh. In Jun 2011, Dubey was appointed as the principal secretary of Uttar Pradesh Legislative Assembly in the Sixteenth Legislative Assembly of Uttar Pradesh.POSITIONS HELDJoined U.P. Judicial Service in the month of August

In [26]:
print( task.create_summary_prompt(task.challenge) )

        Summarize the following context: Pradeep Kumar Dubey  (Hindi: प्रदीप कुमार दुबे) is a Judicial officer and the Principal Secretary (HJS) of Uttar Pradesh Legislative Assembly. Dubey joined  Uttar Pradesh Judicial Services - PCS(J) in Aug 1987. Before taking charge as the principal secretary, he served as the legal adviser to the Governor of Uttar Pradesh.


== Early life and education ==
Pradeep Kumar Dubey was born in Etawah, Uttar Pradesh in 1957. He attained Bachelor of Laws -LL.B(Delhi University), and Master of Arts degree -M.A.(Allahabad University).


== Career ==
In Aug 1987, Pradeep Kumar Dubey joined the Uttar Pradesh Judicial Services and served as a judicial magistrate from 1987 to 1994. He subsequently was appointed as the legal adviser to the legal adviser to the Governor of Uttar Pradesh. In Jun 2011, Dubey was appointed as the principal secretary of Uttar Pradesh Legislative Assembly in the Sixteenth Legislative Assembly of Uttar Pradesh.POSITIONS HELDJoined U.P
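
The printed prompt starts with eight spaces of indentation, which usually means the template is an indented multi-line string inside a method. The real `create_summary_prompt` is not shown here, so the following is only a guess at the cause; `build_summary_prompt` is a hypothetical helper that strips the indentation with `textwrap.dedent`.

```python
import textwrap

def build_summary_prompt(context):
    """Build a summarization prompt without the template's source indentation."""
    template = """
        Summarize the following context:
        {context}
    """
    # Dedent the template before substituting, so multi-line context is left untouched.
    return textwrap.dedent(template).strip().format(context=context)

prompt = build_summary_prompt("Line one.\n\n== Career ==\nLine two.")
print(prompt)
```

Dedenting before formatting matters: if the context were substituted first, its unindented lines would leave `dedent` with no common margin to remove.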

In [22]:
references

[AIMessage(content='Pradeep Kumar Dubey is a Judicial officer and the Principal Secretary of Uttar Pradesh Legislative Assembly. He joined the Uttar Pradesh Judicial Services in August 1987 and has served in various roles including as a judicial magistrate and legal adviser to the Governor of Uttar Pradesh. Born in Etawah, Uttar Pradesh in 1957, Dubey holds a Bachelor of Laws degree from Delhi University and a Master of Arts degree from Allahabad University.'),
 AIMessage(content='Pradeep Kumar Dubey is a Judicial officer and the Principal Secretary of Uttar Pradesh Legislative Assembly. He joined the Uttar Pradesh Judicial Services in August 1987 and has served in various roles including as a judicial magistrate and legal adviser to the Governor of Uttar Pradesh. Born in Etawah, Uttar Pradesh in 1957, Dubey holds a Bachelor of Laws degree from Delhi University and a Master of Arts degree from Allahabad University.'),
 AIMessage(content='Pradeep Kumar Dubey is a Judicial officer and th
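
At least the first two references in this output are verbatim identical, so asking for `n_references = 3` did not yield three distinct answers in this run. A quick check for that is sketched here with plain strings in place of the `AIMessage` objects (for those, one would compare `.content`); `distinct_references` is a hypothetical helper.

```python
def distinct_references(references):
    """Return unique reference texts in first-seen order, ignoring whitespace differences."""
    seen = set()
    unique = []
    for text in references:
        key = " ".join(text.split())  # normalize whitespace before comparing
        if key not in seen:
            seen.add(key)
            unique.append(text)
    return unique

# Hypothetical stand-ins for the generated references
refs = [
    "Dubey is a judicial officer.",
    "Dubey is a judicial officer.",
    "Dubey is  a judicial officer.",
]
unique = distinct_references(refs)
print(f"{len(unique)} distinct out of {len(refs)}")
```

If the count is routinely 1, generating K references adds cost without adding diversity; whether the `llm` used here exposes sampling controls to change that is not shown in this notebook.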