
# **TweetGen**: Generating Tweets to Answer Questions

## [Hands-On, Part 1] Retrieving Content to Generate Tweets and Quantitatively Evaluating Their Quality

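We have covered a lot of fundamentals and some of the optimizers, so let's move on to a hands-on tweet generation project.

We first retrieve content to use as source material, then judge tweet quality with several metrics, and finally optimize the program.

First, let's import the necessary libraries: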

In [1]:
%load_ext autoreload
%autoreload 2

import sys
import os
import regex as re

import dspy
from dspy.predict import Retry
from dspy.datasets import HotPotQA
from dspy.teleprompt import BootstrapFewShotWithRandomSearch
from dsp.utils import deduplicate
from dspy.evaluate.evaluate import Evaluate
from dspy.primitives.assertions import assert_transform_module, backtrack_handler



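Like the R ("Retrieve") in RAG, `rm` in DSPy and similar frameworks stands for **Retrieval Model**. It is the core component for information retrieval, responsible for finding the documents most relevant to a given query within a large collection.

**Put simply, `rm` does the following:**

1. **Receives a query as input.**
2. **Searches a document collection for documents relevant to the query.**
3. **Returns a ranked list of documents, most relevant first.**

**Different retrieval models work in different ways:**

* **Traditional retrieval models (e.g. BM25):** typically compute relevance from term-frequency statistics over an inverted index.
* **Neural retrieval models (e.g. ColBERTv2):** use deep learning models to capture the semantics of queries and documents, and so can judge relevance more accurately.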
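The three steps above (take a query, search a collection, return a ranked list) can be sketched with a toy term-overlap retriever. This is only an illustration of the interface, not how BM25 or ColBERTv2 score documents; `tokenize`, `toy_retrieve`, and the three sample documents are hypothetical names and data made up for this sketch.

```python
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace; a deliberately naive tokenizer."""
    return text.lower().split()


def toy_retrieve(query, documents, k=2):
    """Rank documents by how often the query terms occur in them."""
    query_terms = set(tokenize(query))
    scored = []
    for doc in documents:
        counts = Counter(tokenize(doc))
        score = sum(counts[term] for term in query_terms)
        scored.append((score, doc))
    # Highest score first; the sort is stable, so ties keep their input order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:k]]


# Hypothetical mini-collection, for illustration only.
docs = [
    "Mount Vesuvius erupted in 79 AD",
    "Monte Nuovo is a cinder cone volcano near Naples",
    "The Naples to Salerno railway opened in 2008",
]
top = toy_retrieve("when did vesuvius erupt", docs, k=2)
print(top)
```

A real retrieval model replaces the scoring line with something far stronger, but the contract stays the same: query in, ranked passages out.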


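DSPy is a framework that lets you build complex NLP pipelines declaratively instead of writing large amounts of imperative code. Configuring `rm` is part of DSPy's declarative programming model.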

In [9]:
# The /wiki17_abstracts endpoint serves retrieval over abstracts from the 2017 Wikipedia dump.
colbertv2_wiki17_abstracts = dspy.ColBERTv2(url='http://20.102.90.50:2017/wiki17_abstracts')
dspy.settings.configure(rm=colbertv2_wiki17_abstracts)
turbo = dspy.LM('ollama_chat/llama3.2:3b', api_base='http://192.168.110.131:11434', api_key='')
dspy.settings.configure(lm=turbo, trace=[], temperature=0.7)

In [10]:
turbo("hi~")

['How can I assist you today?']

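- This code configures the retrieval model (RM) and the language model (LM) that DSPy uses. `dspy.ColBERTv2` retrieves information and `turbo` generates text. `temperature=0.7` controls how creative the generated text is: higher values give more diverse but possibly less relevant output, while lower values give more conservative but possibly more relevant output.
The ColBERTv2-based retriever is set as DSPy's default retrieval model. Because `rm` is configured once, globally, modules such as `dspy.Retrieve` pick it up automatically, which keeps the code simpler and easier to reuse and maintain.

Next, we load the dataset: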

In [11]:
dataset = HotPotQA(train_seed=1, train_size=50, eval_seed=2023, dev_size=50, test_size=30, keep_details=True)
trainset = [x.with_inputs('question', 'answer') for x in dataset.train]
devset = [x.with_inputs('question', 'answer') for x in dataset.dev]




In [12]:
valset = [x.with_inputs('question', 'answer') for x in dataset.test]

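- We use the `HotPotQA` dataset, a multi-hop question answering dataset. `train_seed` and `eval_seed` control the random sampling, and `train_size`, `dev_size`, and `test_size` set the sizes of the training, development, and test splits. `with_inputs('question', 'answer')` marks `question` and `answer` as the input fields. This dataset is used to train and evaluate our DSPy program.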

### 3] TweetGen

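Let's introduce a new task: TweetGen. We extend the `Multi-Hop QA` program, but the goal is now to present the answer in the form of a tweet.

The `Tweeter` module captures the iterative multi-hop process from `Multi-Hop QA`: query generation, passage retrieval, and context assembly. The `GenerateTweet` layer then uses the context and the question to generate a tweet that effectively answers the question.

With this program, we aim to generate tweets that follow these guidelines:

1. The tweet has no hashtags.
2. The tweet contains the correct answer (but because the dataset is in English and we want Chinese tweets, we cannot compare this directly, so we merge this point into point 5).
3. The tweet is within the character limit.
4. The tweet is engaging.
5. The tweet is faithful.

Let's look at the data format: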

In [9]:
trainset[:1]

[Example({'id': '5ac0e2525542997d64295a79', 'question': 'At My Window was released by which American singer-songwriter?', 'answer': 'John Townes Van Zandt', 'type': 'bridge', 'context': {'title': ['Through the Window', 'Yes I Am (Melissa Etheridge album)', 'Chris Arena', 'Melissa Etheridge', 'List of Bridgit Mendler concert tours', 'Townes Van Zandt', 'Native Window (album)', 'At My Window (album)', 'Love Letter for Fire', 'Little Window'], 'sentences': [['Through the Window is an album by the American music project Prurient, the performing name of the artist Dominick Fernow.', ' The three-song album was released on March 19, 2013 through the English label Blackest Ever Black.', ' Though released in 2013, the tracks for "Through the Window" were recorded in October 2011 at the same time as Prurient\'s two Hydra Head Records releases — the studio album "Bermuda Drain" (2011) and the EP "Time\'s Arrow" (2011) — and were noted for musically showing more techno influences, akin to one of F

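The question is "At My Window was released by which American singer-songwriter?", the answer is "John Townes Van Zandt", and the question type is "bridge".

The data also contains a "context" field that provides background related to the question. It holds 10 titles (`title`) and a matching list of sentences (`sentences`) for each. Each title's sentence list gives some information about that title. For example, the sentences under "Townes Van Zandt" describe the singer's life and achievements, while those under "At My Window (album)" describe the album's release background and significance.

Finally, the "gold_titles" field lists the two key titles relevant to the answer: "Townes Van Zandt" and "At My Window (album)".

In short, this record is a question-answer example with rich supporting context that helps connect the question to its answer. Its focus is the American singer-songwriter John Townes Van Zandt and his album At My Window.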
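To make the record layout concrete, here is a small sketch that pairs each title with its sentences and keeps only the gold titles. The dictionary below mirrors the field names described above, but it is a trimmed, hypothetical stand-in: the sentence strings are placeholders rather than the dataset's real text, and `gold_passages` is a helper invented for this illustration.

```python
# Hypothetical, trimmed-down record with the same field layout as the example above.
record = {
    "question": "At My Window was released by which American singer-songwriter?",
    "answer": "John Townes Van Zandt",
    "type": "bridge",
    "context": {
        "title": ["Townes Van Zandt", "At My Window (album)", "Little Window"],
        "sentences": [
            ["(placeholder: sentences about the singer)"],
            ["(placeholder: sentences about the album)"],
            ["(placeholder: sentences about an unrelated album)"],
        ],
    },
    "gold_titles": {"Townes Van Zandt", "At My Window (album)"},
}


def gold_passages(rec):
    """Map each gold title to its sentences joined into one passage."""
    ctx = rec["context"]
    return {
        title: " ".join(s.strip() for s in sentences)
        for title, sentences in zip(ctx["title"], ctx["sentences"])
        if title in rec["gold_titles"]
    }


for title, passage in gold_passages(record).items():
    print(title, "->", passage)
```

The `title` and `sentences` lists are parallel, so `zip` is enough to line them up.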

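- Three classes are defined here: `GenerateSearchQuery`, `GenerateTweet`, and `Tweeter`. `GenerateSearchQuery` is the signature for generating search queries, `GenerateTweet` is the signature for generating tweets, and `Tweeter` is the main DSPy module that combines the two signatures. `Tweeter`'s `forward` method performs multi-hop retrieval: on each hop it generates a query and retrieves relevant passages, and after the last hop it generates the tweet. `dspy.Retrieve` is a DSPy module that retrieves relevant passages from a vector database or another information source; its `k` parameter sets how many passages to retrieve. `max_hops=2` and `passages_per_hop=3` control the depth and breadth of retrieval.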

In [3]:
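class GenerateSearchQuery(dspy.Signature):
    """写一个简单的搜索查询，这将有助于回答一个复杂的问题。"""
    # Prompt (kept in Chinese): "Write a simple search query that will help answer a complex question."
    context = dspy.InputField(desc="可能包含相关事实")  # "may contain relevant facts"
    question = dspy.InputField()
    query = dspy.OutputField()

class GenerateTweet(dspy.Signature):
    """生成一条引人入胜的推文，有效地回答问题，忠实于上下文，少于 280 个字符，并且没有主题标签。"""
    # Prompt (kept in Chinese): "Generate an engaging tweet that effectively answers the question,
    # is faithful to the context, is under 280 characters, and has no hashtags."
    question = dspy.InputField()
    context = dspy.InputField(desc="推特内容")  # "tweet content"
    tweet_in_Chinese = dspy.OutputField(desc="推特内容")  # "tweet content"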

class Tweeter(dspy.Module):
    def __init__(self):
        super().__init__()
        self.generate_tweet = dspy.ChainOfThought(GenerateTweet)

    def forward(self, question, answer):
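        # Start with an empty list that accumulates the context (retrieved passages).
        context = []
        # Perform at most 2 retrieval hops.
        max_hops = 2
        # Retrieve 3 passages per hop.
        passages_per_hop = 3
        # Create one ChainOfThought module per hop to generate the search query.
        # GenerateSearchQuery is the signature that defines the query-generation task.
        generate_query = [dspy.ChainOfThought(GenerateSearchQuery) for _ in range(max_hops)]
        # Create the retrieval module that fetches passages relevant to a query.
        retrieve = dspy.Retrieve(k=passages_per_hop)
        # Run several retrieval hops, building up the context iteratively.
        for hop in range(max_hops):
            # Generate a search query from the current context and the question.
            query = generate_query[hop](context=context, question=question).query
            # Retrieve passages relevant to the query.
            passages = retrieve(query).passages
            # Add the newly retrieved passages to the context and drop duplicates.
            context = deduplicate(context + passages)
        # Generate the tweet with the generate_tweet module.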
        generated_tweet = self.generate_tweet(question=question, context=context).tweet_in_Chinese
        return dspy.Prediction(generated_tweet=generated_tweet, context=context)
    
tweeter = Tweeter()
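
The multi-hop loop in `forward` can be exercised without an LM or a retrieval server. The sketch below is a hypothetical stand-in, not DSPy code: `fake_generate_query`, `fake_retrieve`, and `FAKE_INDEX` are invented for illustration, and `deduplicate` is a local re-implementation of an order-preserving de-duplication. It shows only how the context grows across hops.

```python
def deduplicate(items):
    """Drop repeats while keeping first-seen order."""
    seen = set()
    return [x for x in items if not (x in seen or seen.add(x))]


# Hypothetical index: what each hop's query would retrieve.
FAKE_INDEX = {
    "hop-0": ["passage A", "passage B", "passage C"],
    "hop-1": ["passage B", "passage D", "passage E"],
}


def fake_generate_query(context, question):
    # A real module would ask the LM; here an empty context simply means "first hop".
    return "hop-0" if not context else "hop-1"


def fake_retrieve(query, k):
    return FAKE_INDEX[query][:k]


def multi_hop_context(question, max_hops=2, passages_per_hop=3):
    context = []
    for _ in range(max_hops):
        query = fake_generate_query(context, question)
        passages = fake_retrieve(query, passages_per_hop)
        # "passage B" comes back on both hops but is kept only once.
        context = deduplicate(context + passages)
    return context


context = multi_hop_context("example question")
print(context)
```

Because the second query is generated with the first hop's passages already in the context, later hops can go after facts that the first hop only hinted at.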



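### 4] Evaluation - Intrinsic and Extrinsic

#### Intrinsic metrics: the goal is to satisfy internally computed constraints

**No hashtags** - A user-personalized constraint that tests whether the model can follow a specific, simple guideline: the generated tweet must not contain any hashtag (`#`).

**Within length** - This check follows the Twitter platform's limit of 280 characters per tweet.

**Engagement** - To check how engaging a tweet is, we define and call another **DSPy** program: a `Predict` over `AssessTweet`, relying on the same LM to answer the question: "Does the assessed text make for a self-contained, engaging tweet?"

**Faithfulness** - To check that a tweet is faithful to the context it draws on, we use `AssessTweet` in the same way, but prompt it with the question: "Is the assessed text grounded in the context?"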

In [4]:
def has_no_hashtags(text):
    return len(re.findall(r"#\w+", text)) == 0

def is_within_length_limit(text, length_limit=280):
    return len(text) <= length_limit

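def is_assessment_yes(assessment_answer):
    """Check whether the assessment answer contains 'yes' (case-insensitive)."""
    return 'yes' in assessment_answer.lower()
# Our tweets are in Chinese, so a direct `in` comparison against the English dataset answer
# is not possible; correctness is folded into the faithful_metric assessment instead.
# def has_correct_answer(text, answer):
#     return answer in text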

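class AssessTweet(dspy.Signature):
    """评估推文在指定维度上的质量。"""
    # Prompt (kept in Chinese): "Assess the quality of a tweet along the specified dimension."

    context = dspy.InputField(desc='如果不适用，请忽略')  # "ignore if not applicable"
    assessed_text = dspy.InputField()
    assessment_question = dspy.InputField()
    assessment_answer = dspy.OutputField(desc="yes or no")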

def no_hashtags_metric(gold, pred, trace=None):
    tweet = pred.generated_tweet
    no_hashtags = has_no_hashtags(tweet)
    score = no_hashtags
    return score
# Our tweets are in Chinese, so a direct `in` comparison with the dataset is not possible; this check is folded into faithful_metric.
# def is_correct_metric(gold, pred, trace=None):
#     answer, tweet = gold.answer, pred.generated_tweet
#     correct = has_correct_answer(tweet, answer)
#     score = correct
#     return score

def within_length_metric(gold, pred, trace=None):
    tweet = pred.generated_tweet
    within_length_limit = is_within_length_limit(tweet, 280)
    score = within_length_limit
    return score

def engaging_metric(gold, pred, trace=None):
    tweet = pred.generated_tweet
    # Question (kept in Chinese): "Does the assessed text make for a self-contained, engaging tweet?"
    engaging = "评估的文本是否构成一个独立的、引人入胜的推文？only give me 'yes' or 'no'"
    engaging = dspy.Predict(AssessTweet)(context='N/A', assessed_text=tweet, assessment_question=engaging)
    engaging = 'yes' in engaging.assessment_answer.lower()
    score = engaging
    return score

def faithful_metric(gold, pred, trace=None):
    context, tweet = pred.context, pred.generated_tweet
    # Question (kept in Chinese): "Is the assessed text grounded in the context, with no false facts?"
    faithful = "评估的文本是否基于上下文,并没有虚假的事实？only give me 'yes' or 'no'"
    faithful = dspy.Predict(AssessTweet)(context=context, assessed_text=tweet, assessment_question=faithful)
    faithful = 'yes' in faithful.assessment_answer.lower()
    score = faithful
    return score

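- Four intrinsic metrics are active here: `no_hashtags_metric`, `within_length_metric`, `engaging_metric`, and `faithful_metric`. They check whether the tweet has no hashtags, stays within the length limit, is engaging, and is faithful to the context. A fifth metric, `is_correct_metric` (whether the tweet contains the correct answer), is commented out because the tweets are in Chinese while the dataset answers are in English; that check is folded into the faithfulness assessment. `AssessTweet` is a DSPy signature used to assess tweet quality. These metrics evaluate whether a generated tweet satisfies the predefined constraints.

- Let's look more closely:
*   What does `re.findall(r"#\w+", text)` mean?
    *   It is a regular expression search that finds every substring starting with `#` followed by one or more letters, digits, or underscores, i.e. hashtags.
*   How does `'yes' in assessment_answer.lower()` work?
    *   It lowercases the assessment answer and checks whether the substring `'yes'` appears anywhere in it. This is more lenient than comparing only the first word, which can help when a small model wraps its verdict in extra text.
*   Why do `engaging_metric` and `faithful_metric` use `dspy.Predict`?
    *   These two metrics call the language model through `dspy.Predict`, because deciding whether a tweet is engaging or faithful requires understanding the meaning of the text.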
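
The two string checks discussed above can be tried in isolation. The sketch below uses the standard library `re` rather than the third-party `regex` package imported earlier (for this pattern the two are assumed to behave the same), and `starts_with_yes` is a hypothetical stricter variant added only for comparison.

```python
import re


def has_no_hashtags(text):
    # True when no "#" is immediately followed by at least one word character.
    return len(re.findall(r"#\w+", text)) == 0


def contains_yes(answer):
    # The lenient check used in the metrics: "yes" anywhere in the answer.
    return "yes" in answer.lower()


def starts_with_yes(answer):
    # Hypothetical stricter variant: only the first word counts.
    words = answer.strip().lower().split()
    return bool(words) and words[0].strip(".,!:;'\"") == "yes"


print(has_no_hashtags("A tweet without tags"))   # no "#" at all
print(has_no_hashtags("DSPy is fun #dspy"))      # contains a hashtag
print(has_no_hashtags("话题 #推文"))              # \w also matches CJK characters
print(contains_yes("No, but the eyes have it"))  # "eyes" contains "yes"
print(starts_with_yes("No, but the eyes have it"))
```

The last two lines show the trade-off: the substring check tolerates verbose answers but can be fooled by words that merely contain "yes", while the first-word check is stricter but rejects answers that lead with other text.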

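#### Extrinsic metric: assessing the overall quality and effectiveness of the generated output on the downstream task

The extrinsic metric is defined as the overall quality of the generated tweet in adhering to the constraints above, assessed through a composite metric.

The intrinsic constraint most essential to a valid tweet, the length limit, is kept as a hard requirement, and the composite metric returns the average of the 4 active intrinsic metrics.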

In [5]:
def overall_metric(gold, pred, trace=None):
    answer, context, tweet = gold.answer, pred.context, pred.generated_tweet
    no_hashtags = has_no_hashtags(tweet)
    within_length_limit = is_within_length_limit(tweet, 280)
    # correct = has_correct_answer(tweet, answer)
    # Questions (kept in Chinese):
    #   engaging: "Does the assessed text make for a self-contained, engaging media tweet?"
    #   faithful: "Is the assessed text grounded in the context, free of false facts, and does it answer the question?"
    engaging = "评估的文本是否构成一个独立的、引人入胜的媒体推文？only give me 'yes' or 'no'"
    faithful = "评估的文本是否基于上下文,并没有虚假的事实并回答了问题？only give me 'yes' or 'no'"
    faithful = dspy.Predict(AssessTweet)(context=context, assessed_text=tweet, assessment_question=faithful)
    engaging = dspy.Predict(AssessTweet)(context='N/A', assessed_text=tweet, assessment_question=engaging)
    engaging, faithful = ['yes' in m.assessment_answer.lower() for m in [engaging, faithful]]
    # The length limit is a hard gate: a tweet that is too long scores 0.
    score = (engaging + faithful + no_hashtags + within_length_limit) if within_length_limit else 0
    return score / 4.0

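- `overall_metric` is a composite metric that combines the four active intrinsic metrics. It first checks whether the tweet is within the length limit; if so, it sums the four checks and divides by 4.0 to get the average. If the tweet exceeds the length limit, the score is 0. The correctness check is commented out, so correctness is covered only indirectly, through the faithfulness question, which in this metric also asks whether the tweet answers the question.

*   Why are the other metrics counted only when `within_length_limit` is true?
    *   Because the length limit is the most basic requirement: a tweet that exceeds it is not a valid tweet, however well it does on the other checks.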
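
The scoring arithmetic can be checked without calling an LM. The sketch below is a minimal restatement of the gate-then-average rule; `overall_score` is a hypothetical helper that takes the four boolean checks as plain arguments instead of computing them.

```python
def overall_score(engaging, faithful, no_hashtags, within_length_limit):
    """Average of four boolean checks, forced to 0 when the length gate fails."""
    checks = [engaging, faithful, no_hashtags, within_length_limit]
    total = sum(checks) if within_length_limit else 0
    return total / 4.0


print(overall_score(True, True, True, True))    # all checks pass
print(overall_score(False, True, True, True))   # only "engaging" fails
print(overall_score(True, True, True, False))   # too long: gated to zero
```

A tweet that passes everything except the engagement check scores 3 / 4 = 0.75, while a tweet that is too long scores 0 regardless of the other checks.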

We therefore define the evaluation as follows:

In [8]:
metrics = [no_hashtags_metric,  within_length_metric, engaging_metric, faithful_metric, overall_metric]

for metric in metrics:
    evaluate = Evaluate(metric=metric, devset=devset[:5], num_threads=1, display_progress=True, display_table=5)
    evaluate(tweeter)

  0%|          | 0/5 [00:00<?, ?it/s]

2024/12/26 17:26:47 ERROR dspy.utils.parallelizer: Error processing item Example({'id': '5ae550225542990ba0bbb275', 'question': 'Are both Cangzhou and Qionghai in the Hebei province of China?', 'answer': 'no', 'type': 'comparison', 'context': {'title': ['Langfang', 'Qing County', 'Yanshan County, Hebei', 'Qionghai', 'Wang Zhengyi', 'Hejian', 'Tianjin Maritime Court', 'Iron Lion of Cangzhou', 'Cangzhou', 'Port of Huanghua'], 'sentences': [['Langfang (), is a prefecture-level city of Hebei Province, which was known as Tianjin Prefecture until 1973.', ' Hebei province was renamed Langfang Prefecture after Tianjin became a municipality and finally upgraded into a prefecture-level city in 1988.', ' Langfang is located approximately midway between Beijing and Tianjin.', ' At the 2010 census, the population of Langfang was 4,358,839, of whom 868,066 lived in the built-up ("or metro") area made of Guangyang and Anci districts; its total area is around 6417.28 km² .', ' Langfang borders Baoding

Average Metric: 0.00 / 5 (0.0%):   0%|          | 0/5 [00:00<?, ?it/s]

2024/12/26 17:26:47 ERROR dspy.utils.parallelizer: Error processing item Example({'id': '5a8e69ae5542990e94052afd', 'question': 'Who conducts the draft in which Marc-Andre Fleury was drafted to the Vegas Golden Knights for the 2017-18 season?', 'answer': 'National Hockey League', 'type': 'bridge', 'context': {'title': ['List of Vegas Golden Knights head coaches', 'List of Vegas Golden Knights draft picks', 'Cody Glass', '2017–18 Pittsburgh Penguins season', 'Erik Brännström', '2017 NHL Expansion Draft', 'Marc-André Fleury', '2017–18 Vegas Golden Knights season', 'Potential National Hockey League expansion', '2011–12 Pittsburgh Penguins season'], 'sentences': [['The Vegas Golden Knights are an American professional ice hockey team based in the Las Vegas metropolitan area.', ' They play in the Pacific Division of the Western Conference in the National Hockey League (NHL).', ' They have played at T-Mobile Arena since their inaugural season in 2017–18.', ' The Golden Knights joined the NHL

Average Metric: 0.00 / 5 (0.0%):  20%|██        | 1/5 [00:00<00:00, 27.78it/s]

2024/12/26 17:26:47 ERROR dspy.utils.parallelizer: Error processing item Example({'id': '5ab85a495542992aa3b8c8bc', 'question': 'The Wings entered a new era, following the retirement of which Canadian retired professional ice hockey player and current general manager of the Tampa Bay Lightning of the National Hockey League (NHL)?', 'answer': 'Steve Yzerman', 'type': 'bridge', 'context': {'title': ['Steve Yzerman', 'Jay Feaster', 'Pat Elynuik', 'Brent Gretzky', 'Steven Stamkos', 'Rick Paterson', 'Jason Lafreniere', '2006–07 Detroit Red Wings season', 'Darren Rumble (ice hockey)', 'Brad Richards'], 'sentences': [['Stephen Gregory "Steve" Yzerman ( ; born May 9, 1965) is a Canadian retired professional ice hockey player and current general manager of the Tampa Bay Lightning of the National Hockey League (NHL).', ' He is widely considered to be one of the greatest players of all time.', ' Yzerman spent his entire NHL playing career with the Detroit Red Wings and is a member of the Hockey H

Average Metric: 0.00 / 5 (0.0%):  40%|████      | 2/5 [00:00<00:00, 50.00it/s]

2024/12/26 17:26:47 ERROR dspy.utils.parallelizer: Error processing item Example({'id': '5a90b2815542990a984936af', 'question': 'What river is near the Crichton Collegiate Church?', 'answer': 'the River Tyne', 'type': 'bridge', 'context': {'title': ['West End Collegiate Church', 'Crichton Castle', "St. Martin's Collegiate Church, Opatów", "St. Mary's Collegiate Church Gowran", 'Dean of Wolverhampton', 'Goslar Cathedral', 'Notre Dame de Dinant', 'Crichton Collegiate Church', 'Collegiate church', 'Simeonstift of Trier'], 'sentences': [["The West End Collegiate Church is a church on West End Avenue at 77th Street on Manhattan's Upper West Side.", ' It is part of The Collegiate Reformed Protestant Dutch Church in the City of New York, the oldest Protestant church with a continuing organization in America.', ' The West End Collegiate Church and Collegiate School, which includes the adjacent Collegiate School, is listed on the U.S. National Register of Historic Places.'], ['Crichton Castle i

Average Metric: 0.00 / 5 (0.0%):  60%|██████    | 3/5 [00:00<00:00, 68.18it/s]

AssertionError: No LM is loaded.

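- This code uses the metrics defined above to evaluate the `tweeter` program on the development set. The `Evaluate` class runs the evaluation: `num_threads=1` uses a single thread, `display_progress=True` shows a progress bar, and `display_table=5` displays the first 5 results. The `AssertionError: No LM is loaded.` in the output above most likely means this cell ran in a session where no LM had yet been configured with `dspy.settings.configure(lm=...)`; the later runs, made after configuration, complete normally.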
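
Conceptually, an evaluation run of this kind applies the program to each example, scores the prediction with the metric, and reports the average as a percentage. The sketch below is a simplified stand-in under that assumption, not DSPy's `Evaluate` implementation: it omits threading, error handling, and the results table, and `toy_evaluate`, `toy_program`, and `toy_metric` are hypothetical names.

```python
def toy_evaluate(program, devset, metric):
    """Run the program on every example and return the mean metric as a percentage."""
    scores = []
    for example in devset:
        prediction = program(example)
        scores.append(float(metric(example, prediction)))
    return round(100.0 * sum(scores) / len(devset), 2)


# Hypothetical program and metric, for illustration only.
def toy_program(example):
    return {"generated_tweet": example["question"].upper()}


def toy_metric(example, prediction):
    return "#" not in prediction["generated_tweet"]


devset_sketch = [
    {"question": "plain question"},
    {"question": "question with a #tag"},
]
print(toy_evaluate(toy_program, devset_sketch, toy_metric))
```

Boolean metrics count as 0 or 1, and fractional metrics such as `overall_metric` contribute their fractional value to the same average.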

Let's look at an example of a generated tweet:

In [38]:
example = devset[10]
tweet = tweeter(question=example.question, answer = example.answer)
print('生成的推文: ', tweet.generated_tweet)
tweet.context

生成的推文:  79 年，意大利的“蒙特·韦苏维奥”火山爆发了，是欧洲史上最具灾难性的火山爆发之一。


['Eruption of Mount Vesuvius in 79 | The eruption of Mount Vesuvius in 79 was one of the most catastrophic volcanic eruptions in European history. Historians have learned about the eruption from the eyewitness account of Pliny the Younger, a Roman administrator and poet. It is the namesake for Vesuvian eruptions.',
 'Monte Nuovo | Monte Nuovo ("New Mountain") is a cinder cone volcano within the Campi Flegrei caldera, near Naples, southern Italy. A series of damaging earthquakes and changes in land elevation preceded its only eruption, during the most recent part of the Holocene, which lasted from September 29 to October 6, 1538, when it was formed.',
 'Naples–Salerno high-speed railway | The Naples–Salerno high-speed railway line (also known in Italian as the Linea a Monte del Vesuvio, meaning the "line up Mount Vesuvius") is a link in the Italian high-speed rail network opened in June 2008. The 29 kilometre-long line is one of the new high-speed lines being built to strengthen rail tr

In [39]:
dspy.inspect_history(n=1)

[2024-12-26T14:44:01.922530]

System message:

Your input fields are:
1. `question` (str)
2. `context` (str): 推特内容

Your output fields are:
1. `reasoning` (str)
2. `tweet_in_Chinese` (str): 推特内容

All interactions will be structured in the following way, with the appropriate values filled in.

[[ ## question ## ]]
{question}

[[ ## context ## ]]
{context}

[[ ## reasoning ## ]]
{reasoning}

[[ ## tweet_in_Chinese ## ]]
{tweet_in_Chinese}

[[ ## completed ## ]]

In adhering to this structure, your objective is: 
        生成一条引人入胜的推文，有效地回答问题，忠实于上下文，少于 280 个字符，并且没有主题标签。


User message:

[[ ## question ## ]]
What year did the mountain known in Italian as "Monte Vesuvio", erupt?

[[ ## context ## ]]
[1] «Eruption of Mount Vesuvius in 79 | The eruption of Mount Vesuvius in 79 was one of the most catastrophic volcanic eruptions in European history. Historians have learned about the eruption from the eyewitness account of Pliny the Younger, a Roman administrator an

You can start by evaluating on a small amount of data:

In [40]:
for metric in metrics:
    evaluate = Evaluate(metric=metric, devset=devset[8:9], num_threads=1, display_progress=True, display_table=5)
    evaluate(tweeter)

Average Metric: 1.00 / 1 (100.0%): 100%|██████████| 1/1 [00:00<00:00, 56.11it/s]

2024/12/26 14:44:02 INFO dspy.evaluate.evaluate: Average Metric: 1 / 1 (100.0%)





Unnamed: 0,id,question,answer,type,example_context,gold_titles,generated_tweet,pred_context,no_hashtags_metric
0,5adfc84c5542992d7e9f93cf,In which Maine county is Fort Pownall located?,"Waldo County, Maine",bridge,"{'title': ['Fort Frederick (Saint John, New Brunswick)', 'Fort Pow...","{Stockton Springs, Maine, Fort Pownall}",在梅因州，Fort Pownall位于哪个县呢？要找答案，我们需要结合上下文来确定。根据相关信息，Fort Pownall是由Gov...,"[""Fort Point State Park | Fort Point State Park is a public recrea...",✔️ [True]


Average Metric: 1.00 / 1 (100.0%): 100%|██████████| 1/1 [00:00<00:00, 207.80it/s]

2024/12/26 14:44:02 INFO dspy.evaluate.evaluate: Average Metric: 1 / 1 (100.0%)





Unnamed: 0,id,question,answer,type,example_context,gold_titles,generated_tweet,pred_context,within_length_metric
0,5adfc84c5542992d7e9f93cf,In which Maine county is Fort Pownall located?,"Waldo County, Maine",bridge,"{'title': ['Fort Frederick (Saint John, New Brunswick)', 'Fort Pow...","{Stockton Springs, Maine, Fort Pownall}",在梅因州，Fort Pownall位于哪个县呢？要找答案，我们需要结合上下文来确定。根据相关信息，Fort Pownall是由Gov...,"[""Fort Point State Park | Fort Point State Park is a public recrea...",✔️ [True]


Average Metric: 0.00 / 1 (0.0%): 100%|██████████| 1/1 [00:00<00:00, 126.54it/s]

2024/12/26 14:44:02 INFO dspy.evaluate.evaluate: Average Metric: 0 / 1 (0.0%)





Unnamed: 0,id,question,answer,type,example_context,gold_titles,generated_tweet,pred_context,engaging_metric
0,5adfc84c5542992d7e9f93cf,In which Maine county is Fort Pownall located?,"Waldo County, Maine",bridge,"{'title': ['Fort Frederick (Saint John, New Brunswick)', 'Fort Pow...","{Stockton Springs, Maine, Fort Pownall}",在梅因州，Fort Pownall位于哪个县呢？要找答案，我们需要结合上下文来确定。根据相关信息，Fort Pownall是由Gov...,"[""Fort Point State Park | Fort Point State Park is a public recrea...",


Average Metric: 1.00 / 1 (100.0%): 100%|██████████| 1/1 [00:00<00:00, 111.11it/s]

2024/12/26 14:44:02 INFO dspy.evaluate.evaluate: Average Metric: 1 / 1 (100.0%)





Unnamed: 0,id,question,answer,type,example_context,gold_titles,generated_tweet,pred_context,faithful_metric
0,5adfc84c5542992d7e9f93cf,In which Maine county is Fort Pownall located?,"Waldo County, Maine",bridge,"{'title': ['Fort Frederick (Saint John, New Brunswick)', 'Fort Pow...","{Stockton Springs, Maine, Fort Pownall}",在梅因州，Fort Pownall位于哪个县呢？要找答案，我们需要结合上下文来确定。根据相关信息，Fort Pownall是由Gov...,"[""Fort Point State Park | Fort Point State Park is a public recrea...",✔️ [True]


Average Metric: 0.75 / 1 (75.0%): 100%|██████████| 1/1 [00:00<00:00, 96.95it/s]

2024/12/26 14:44:02 INFO dspy.evaluate.evaluate: Average Metric: 0.75 / 1 (75.0%)





Unnamed: 0,id,question,answer,type,example_context,gold_titles,generated_tweet,pred_context,overall_metric
0,5adfc84c5542992d7e9f93cf,In which Maine county is Fort Pownall located?,"Waldo County, Maine",bridge,"{'title': ['Fort Frederick (Saint John, New Brunswick)', 'Fort Pow...","{Stockton Springs, Maine, Fort Pownall}",在梅因州，Fort Pownall位于哪个县呢？要找答案，我们需要结合上下文来确定。根据相关信息，Fort Pownall是由Gov...,"[""Fort Point State Park | Fort Point State Park is a public recrea...",✔️ [0.750]


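## [Hands-On, Part 2] Retrieving Content to Generate Media Tweets, with Evaluation and Optimization

Previously we defined our LLM program pipeline and its evaluation criteria. Next we move on to optimizing the program:

BootstrapFewShotWithRandomSearch optimization

We skip BootstrapFewShot and use BootstrapFewShotWithRandomSearch directly.

This optimizer was covered earlier. Here we focus on one question: where exactly does the random search come in?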


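1. **The random seed controls how the dataset is shuffled and sampled:**
   - In the loop `for seed in range(-3, self.num_candidate_sets):`, each `seed` value selects a different sampling and optimization strategy.

2. **Random exploration of candidates:**
   - When `seed >= 0`, a `BootstrapFewShot` optimizer is built from a randomly shuffled training set and a randomly sampled demo count, and that optimizer produces a corresponding `program`. In this way the system generates multiple candidates (that is, different programs or settings).

3. **Step-by-step screening and selection of the best candidate:**
   - Each program is evaluated with the `Evaluate` class to compute a `score`.
   - A list of scores (`scores`) is maintained, and the current best candidate (`best_program`) is updated as the search goes on.
   - The random exploration lies in how the candidates differ from one another; the final choice is made by evaluation score.

4. **Limiting and controlling the search range:**
   - The search range is controlled by parameters: `max_bootstrapped_demos` caps the number of randomly sampled demos, and `num_candidate_sets` (set through the `num_candidate_programs` argument) caps the number of candidates. Because the loop starts at `seed = -3`, three additional candidates are evaluated for seeds -3 through -1, which is why the log below reports 9 candidate programs for `num_candidate_programs=6`.


### Summary of the random search logic:
- **Input randomness**: the random seed controls dataset shuffling and sample size.
- **Output evaluation**: every candidate program is evaluated, its score is recorded, and the best one is selected.
- **Bounded search**: parameters such as the candidate count and sampling range limit the search space and avoid aimless random exploration.

Through these mechanisms, BootstrapFewShotWithRandomSearch builds on BootstrapFewShot to generate candidate programs with controlled randomness and then selects among them by evaluation score.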
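
The seed-driven search loop can be sketched in plain Python. This is a simplified illustration, not the DSPy source: a "candidate" is reduced to a list of demos, `toy_score` stands in for running `Evaluate`, and the handling of negative seeds (no demos for -3, an unshuffled prefix for -2 and -1) is an assumption made for this sketch. All function names are hypothetical.

```python
import random


def candidate_demo_sets(trainset, num_candidate_sets, min_num_samples=1, max_num_samples=4):
    """Yield (seed, demos) pairs, one per seed in range(-3, num_candidate_sets)."""
    for seed in range(-3, num_candidate_sets):
        pool = list(trainset)
        if seed == -3:
            demos = []  # assumed baseline: no demos at all
        elif seed < 0:
            demos = pool[:max_num_samples]  # assumed baseline: unshuffled prefix
        else:
            rng = random.Random(seed)  # the seed makes each candidate reproducible
            rng.shuffle(pool)
            size = rng.randint(min_num_samples, max_num_samples)
            demos = pool[:size]
        yield seed, demos


def random_search(trainset, score_fn, num_candidate_sets=6):
    """Score every candidate and keep the best one, tracking all scores."""
    best_seed, best_demos, best_score, scores = None, None, float("-inf"), []
    for seed, demos in candidate_demo_sets(trainset, num_candidate_sets):
        score = score_fn(demos)
        scores.append(score)
        if score > best_score:
            best_seed, best_demos, best_score = seed, demos, score
    return best_seed, best_demos, best_score, scores


def toy_score(demos):
    # Hypothetical stand-in for a validation-set evaluation.
    return sum(demos) / len(demos) if demos else 0.0


best_seed, best_demos, best_score, scores = random_search(list(range(20)), toy_score)
print(len(scores), best_seed, best_score)
```

With `num_candidate_sets=6` the loop visits seeds -3 through 5, so nine candidates are scored, matching the candidate count in the optimizer log.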

In [13]:
teleprompter = BootstrapFewShotWithRandomSearch(metric = overall_metric, max_bootstrapped_demos=4, num_candidate_programs=6,metric_threshold=1.00)
compiled_tweeter = teleprompter.compile(student = tweeter, teacher = tweeter, trainset=trainset[:20], valset=devset[20:30])

for metric in metrics:
    evaluate = Evaluate(metric=metric, devset=devset[30:40], num_threads=4, display_progress=True, display_table=10)
    evaluate(compiled_tweeter)



Going to sample between 1 and 4 traces per predictor.
Will attempt to bootstrap 6 candidate sets.
Average Metric: 8.50 / 10 (85.0%): 100%|██████████| 10/10 [00:18<00:00,  1.82s/it]

2024/12/26 17:31:31 INFO dspy.evaluate.evaluate: Average Metric: 8.5 / 10 (85.0%)



New best score: 85.0 for seed -3
Scores so far: [85.0]
Best score so far: 85.0
Average Metric: 8.50 / 10 (85.0%): 100%|██████████| 10/10 [00:00<00:00, 1999.38it/s]

2024/12/26 17:31:32 INFO dspy.evaluate.evaluate: Average Metric: 8.5 / 10 (85.0%)



Scores so far: [85.0, 85.0]
Best score so far: 85.0


 45%|████▌     | 9/20 [00:18<00:22,  2.02s/it]


Bootstrapped 4 full traces after 9 examples for up to 1 rounds, amounting to 9 attempts.
Average Metric: 8.25 / 10 (82.5%): 100%|██████████| 10/10 [00:46<00:00,  4.63s/it]

2024/12/26 17:32:37 INFO dspy.evaluate.evaluate: Average Metric: 8.25 / 10 (82.5%)



Scores so far: [85.0, 85.0, 82.5]
Best score so far: 85.0


 75%|███████▌  | 15/20 [00:17<00:05,  1.19s/it]


Bootstrapped 4 full traces after 15 examples for up to 1 rounds, amounting to 15 attempts.
Average Metric: 8.25 / 10 (82.5%): 100%|██████████| 10/10 [00:44<00:00,  4.48s/it]

2024/12/26 17:33:39 INFO dspy.evaluate.evaluate: Average Metric: 8.25 / 10 (82.5%)



Scores so far: [85.0, 85.0, 82.5, 82.5]
Best score so far: 85.0


 50%|█████     | 10/20 [00:02<00:02,  3.41it/s]


Bootstrapped 2 full traces after 10 examples for up to 1 rounds, amounting to 10 attempts.
Average Metric: 8.25 / 10 (82.5%): 100%|██████████| 10/10 [00:31<00:00,  3.11s/it]

2024/12/26 17:34:13 INFO dspy.evaluate.evaluate: Average Metric: 8.25 / 10 (82.5%)



Scores so far: [85.0, 85.0, 82.5, 82.5, 82.5]
Best score so far: 85.0


  5%|▌         | 1/20 [00:00<00:00, 124.73it/s]


Bootstrapped 1 full traces after 1 examples for up to 1 rounds, amounting to 1 attempts.
Average Metric: 8.75 / 10 (87.5%): 100%|██████████| 10/10 [00:15<00:00,  1.54s/it]

2024/12/26 17:34:29 INFO dspy.evaluate.evaluate: Average Metric: 8.75 / 10 (87.5%)



New best score: 87.5 for seed 2
Scores so far: [85.0, 85.0, 82.5, 82.5, 82.5, 87.5]
Best score so far: 87.5


 15%|█▌        | 3/20 [00:00<00:00, 187.37it/s]


Bootstrapped 2 full traces after 3 examples for up to 1 rounds, amounting to 3 attempts.
Average Metric: 7.75 / 10 (77.5%): 100%|██████████| 10/10 [00:15<00:00,  1.51s/it]

2024/12/26 17:34:44 INFO dspy.evaluate.evaluate: Average Metric: 7.75 / 10 (77.5%)



Scores so far: [85.0, 85.0, 82.5, 82.5, 82.5, 87.5, 77.5]
Best score so far: 87.5


 20%|██        | 4/20 [00:00<00:00, 250.00it/s]


Bootstrapped 2 full traces after 4 examples for up to 1 rounds, amounting to 4 attempts.
Average Metric: 7.50 / 10 (75.0%): 100%|██████████| 10/10 [00:29<00:00,  2.90s/it]

2024/12/26 17:35:13 INFO dspy.evaluate.evaluate: Average Metric: 7.5 / 10 (75.0%)



Scores so far: [85.0, 85.0, 82.5, 82.5, 82.5, 87.5, 77.5, 75.0]
Best score so far: 87.5


 40%|████      | 8/20 [00:00<00:00, 222.15it/s]


Bootstrapped 3 full traces after 8 examples for up to 1 rounds, amounting to 8 attempts.
Average Metric: 8.00 / 10 (80.0%): 100%|██████████| 10/10 [00:46<00:00,  4.66s/it]

2024/12/26 17:36:00 INFO dspy.evaluate.evaluate: Average Metric: 8.0 / 10 (80.0%)



Scores so far: [85.0, 85.0, 82.5, 82.5, 82.5, 87.5, 77.5, 75.0, 80.0]
Best score so far: 87.5
9 candidate programs found.
Average Metric: 10.00 / 10 (100.0%): 100%|██████████| 10/10 [00:09<00:00,  1.01it/s]

2024/12/26 17:36:10 INFO dspy.evaluate.evaluate: Average Metric: 10 / 10 (100.0%)





Unnamed: 0,id,question,answer,type,example_context,gold_titles,generated_tweet,pred_context,no_hashtags_metric
0,5a824c155542990a1d231f92,The original work by Anton Chekhov involving a disillusioned schoo...,Maria Yermolova,bridge,"{'title': ['On the Harmful Effects of Tobacco', 'Melikhovo', 'Plat...","{Wild Honey (play), Platonov (play)}",我们需要找出与原作“Platonov”相关的受众。 Anton Chekhov 的作品“Platonov”是为 Maria Yerm...,['Wild Honey (play) | Wild Honey is a 1984 adaptation by British p...,✔️ [True]
1,5adc1218554299438c868d32,Are Roswell International Air Center and Pago Pago International A...,no,comparison,"{'title': ['American Samoa Community College', 'Tutuila', 'Inter I...","{Roswell International Air Center, Pago Pago International Airport}",根据提供的信息，我们可以确定 Roswell International Air Center 和 Pago Pago Intern...,['Roswell International Air Center | Roswell International Air Cen...,✔️ [True]
2,5ab65f2455429954757d3265,Untold: The Greatest Sports Stories Never Told was hosted by a spo...,the voice of basketball,bridge,"{'title': ['Gregg Murphy', 'Marv Albert', 'The Greatest Story Neve...","{Marv Albert, Untold: The Greatest Sports Stories Never Told}",Untold: The Greatest Sports Stories Never Told 的主持人是常见的体育播音员。马尔夫·阿...,"[""Untold: The Greatest Sports Stories Never Told | Untold: The Gre...",✔️ [True]
3,5a8803da55429938390d3eca,Are Walt Disney and Sacro GRA both documentry films?,yes,comparison,"{'title': ['List of Disney animated shorts', 'Walt Disney Pictures...","{Sacro GRA, Walt Disney (film)}",根据提供的信息，Sacro GRA 是一部纪录片电影，而 Walt Disney 则是一部关于他的生活和遗产的纪录片电影。因此，Wa...,"['Sacro GRA | Sacro GRA (] , Italian for ""Holy GRA"") is a 2013 Ita...",✔️ [True]
4,5a8f71c75542992414482acf,What is the Palestinian Islamic organization that governs th small...,Hamas,bridge,"{'title': ['Bar Lev Line', 'Zionist Freedom Alliance', 'Status of ...","{Gaza Strip, Status of territories occupied by Israel in 1967}",根据提供的信息， Hamas 是一家巴勒斯坦伊斯兰组织，它于 2007 年接管了加沙地带，是该地区的主要政治和军事权力。,"['Gaza Strip | The Gaza Strip ( ; Arabic: قطاع غزة\u200e \u200e ""Q...",✔️ [True]
5,5a81b2f655429926c1cdad62,What album did the song of which Taylor Swift premiered the music ...,1989,bridge,"{'title': ['Taylor Swift videography', '2017 MTV Video Music Award...","{2015 MTV Video Music Awards, Wildest Dreams (Taylor Swift song)}",根据提供的信息，我们可以确定 Taylor Swift 的歌曲“Bad Blood”是从哪张专辑中 premiered 的。正确答案...,"['Bad Blood (Bastille album) | Bad Blood (stylised as ""BΔD BLOOD"")...",✔️ [True]
6,5ac3bd345542993915413929,"Which is considered a genus level classification, Apera or Gunnera...",Apera,comparison,"{'title': ['Gunnera manicata', 'Northwest Oregon Conference', 'Hyg...","{Apera, Gunnera manicata}",根据提供的信息，我们需要确定哪一类是 genus 级别分类。Gunera 属于属级，而 Gunnera manicata 属于物种级。,"['Gunnera manicata | Gunnera manicata, known as Brazilian giant-rh...",✔️ [True]
7,5ab7498f554299110f219af8,Do The Drums and Pussy Galore play music of similar genres?,no,comparison,"{'title': ['The Honeymoon Killers (American band)', 'Space Cowboy ...","{The Drums, Pussy Galore (band)}",The Drums 和 Pussy Galore 都属于 post-punk revival 乐队，这是一种 indie rock ...,['List of post-punk revival bands | Post-punk revival is a type of...,✔️ [True]
8,5ae6e5875542992ae0d163a0,What is the post-nominal abbreviation for the university where the...,Exon,bridge,"{'title': ['Zeresenay Alemseged', 'Research', 'University of Exete...","{Banded Brothers, University of Exeter}",Banded Mongoose Research Project 基于 University of Exeter。因此，其 post...,"['Banded Brothers | Banded Brothers (also known as ""Banded Brother...",✔️ [True]
9,5ac55af35542993e66e82333,Are both Benjamin Christensen and Len Wiseman directors?,yes,comparison,"{'title': ['Benjamin Christensen', 'Len Wiseman', 'Lady with the L...","{Len Wiseman, Benjamin Christensen}",根据提供的信息， Benjamin Christensen 是一位丹麦电影导演，但 Len Wiseman 的职业方向与 Benja...,['Benjamin Christensen | Benjamin Christensen (28 September 1879 –...,✔️ [True]


Average Metric: 10.00 / 10 (100.0%): 100%|██████████| 10/10 [00:00<00:00, 1666.39it/s]

2024/12/26 17:36:10 INFO dspy.evaluate.evaluate: Average Metric: 10 / 10 (100.0%)





Unnamed: 0,id,question,answer,type,example_context,gold_titles,generated_tweet,pred_context,within_length_metric
0,5a824c155542990a1d231f92,The original work by Anton Chekhov involving a disillusioned schoo...,Maria Yermolova,bridge,"{'title': ['On the Harmful Effects of Tobacco', 'Melikhovo', 'Plat...","{Wild Honey (play), Platonov (play)}",我们需要找出与原作“Platonov”相关的受众。 Anton Chekhov 的作品“Platonov”是为 Maria Yerm...,['Wild Honey (play) | Wild Honey is a 1984 adaptation by British p...,✔️ [True]
1,5adc1218554299438c868d32,Are Roswell International Air Center and Pago Pago International A...,no,comparison,"{'title': ['American Samoa Community College', 'Tutuila', 'Inter I...","{Roswell International Air Center, Pago Pago International Airport}",根据提供的信息，我们可以确定 Roswell International Air Center 和 Pago Pago Intern...,['Roswell International Air Center | Roswell International Air Cen...,✔️ [True]
2,5ab65f2455429954757d3265,Untold: The Greatest Sports Stories Never Told was hosted by a spo...,the voice of basketball,bridge,"{'title': ['Gregg Murphy', 'Marv Albert', 'The Greatest Story Neve...","{Marv Albert, Untold: The Greatest Sports Stories Never Told}",Untold: The Greatest Sports Stories Never Told 的主持人是常见的体育播音员。马尔夫·阿...,"[""Untold: The Greatest Sports Stories Never Told | Untold: The Gre...",✔️ [True]
3,5a8803da55429938390d3eca,Are Walt Disney and Sacro GRA both documentry films?,yes,comparison,"{'title': ['List of Disney animated shorts', 'Walt Disney Pictures...","{Sacro GRA, Walt Disney (film)}",根据提供的信息，Sacro GRA 是一部纪录片电影，而 Walt Disney 则是一部关于他的生活和遗产的纪录片电影。因此，Wa...,"['Sacro GRA | Sacro GRA (] , Italian for ""Holy GRA"") is a 2013 Ita...",✔️ [True]
4,5a8f71c75542992414482acf,What is the Palestinian Islamic organization that governs th small...,Hamas,bridge,"{'title': ['Bar Lev Line', 'Zionist Freedom Alliance', 'Status of ...","{Gaza Strip, Status of territories occupied by Israel in 1967}",根据提供的信息， Hamas 是一家巴勒斯坦伊斯兰组织，它于 2007 年接管了加沙地带，是该地区的主要政治和军事权力。,"['Gaza Strip | The Gaza Strip ( ; Arabic: قطاع غزة\u200e \u200e ""Q...",✔️ [True]
5,5a81b2f655429926c1cdad62,What album did the song of which Taylor Swift premiered the music ...,1989,bridge,"{'title': ['Taylor Swift videography', '2017 MTV Video Music Award...","{2015 MTV Video Music Awards, Wildest Dreams (Taylor Swift song)}",根据提供的信息，我们可以确定 Taylor Swift 的歌曲“Bad Blood”是从哪张专辑中 premiered 的。正确答案...,"['Bad Blood (Bastille album) | Bad Blood (stylised as ""BΔD BLOOD"")...",✔️ [True]
6,5ac3bd345542993915413929,"Which is considered a genus level classification, Apera or Gunnera...",Apera,comparison,"{'title': ['Gunnera manicata', 'Northwest Oregon Conference', 'Hyg...","{Apera, Gunnera manicata}",根据提供的信息，我们需要确定哪一类是 genus 级别分类。Gunera 属于属级，而 Gunnera manicata 属于物种级。,"['Gunnera manicata | Gunnera manicata, known as Brazilian giant-rh...",✔️ [True]
7,5ab7498f554299110f219af8,Do The Drums and Pussy Galore play music of similar genres?,no,comparison,"{'title': ['The Honeymoon Killers (American band)', 'Space Cowboy ...","{The Drums, Pussy Galore (band)}",The Drums 和 Pussy Galore 都属于 post-punk revival 乐队，这是一种 indie rock ...,['List of post-punk revival bands | Post-punk revival is a type of...,✔️ [True]
8,5ae6e5875542992ae0d163a0,What is the post-nominal abbreviation for the university where the...,Exon,bridge,"{'title': ['Zeresenay Alemseged', 'Research', 'University of Exete...","{Banded Brothers, University of Exeter}",Banded Mongoose Research Project 基于 University of Exeter。因此，其 post...,"['Banded Brothers | Banded Brothers (also known as ""Banded Brother...",✔️ [True]
9,5ac55af35542993e66e82333,Are both Benjamin Christensen and Len Wiseman directors?,yes,comparison,"{'title': ['Benjamin Christensen', 'Len Wiseman', 'Lady with the L...","{Len Wiseman, Benjamin Christensen}",根据提供的信息， Benjamin Christensen 是一位丹麦电影导演，但 Len Wiseman 的职业方向与 Benja...,['Benjamin Christensen | Benjamin Christensen (28 September 1879 –...,✔️ [True]


Average Metric: 4.00 / 10 (40.0%): 100%|██████████| 10/10 [00:01<00:00,  7.39it/s]

2024/12/26 17:36:11 INFO dspy.evaluate.evaluate: Average Metric: 4 / 10 (40.0%)





Unnamed: 0,id,question,answer,type,example_context,gold_titles,generated_tweet,pred_context,engaging_metric
0,5a824c155542990a1d231f92,The original work by Anton Chekhov involving a disillusioned schoo...,Maria Yermolova,bridge,"{'title': ['On the Harmful Effects of Tobacco', 'Melikhovo', 'Plat...","{Wild Honey (play), Platonov (play)}",我们需要找出与原作“Platonov”相关的受众。 Anton Chekhov 的作品“Platonov”是为 Maria Yerm...,['Wild Honey (play) | Wild Honey is a 1984 adaptation by British p...,
1,5adc1218554299438c868d32,Are Roswell International Air Center and Pago Pago International A...,no,comparison,"{'title': ['American Samoa Community College', 'Tutuila', 'Inter I...","{Roswell International Air Center, Pago Pago International Airport}",根据提供的信息，我们可以确定 Roswell International Air Center 和 Pago Pago Intern...,['Roswell International Air Center | Roswell International Air Cen...,✔️ [True]
2,5ab65f2455429954757d3265,Untold: The Greatest Sports Stories Never Told was hosted by a spo...,the voice of basketball,bridge,"{'title': ['Gregg Murphy', 'Marv Albert', 'The Greatest Story Neve...","{Marv Albert, Untold: The Greatest Sports Stories Never Told}",Untold: The Greatest Sports Stories Never Told 的主持人是常见的体育播音员。马尔夫·阿...,"[""Untold: The Greatest Sports Stories Never Told | Untold: The Gre...",✔️ [True]
3,5a8803da55429938390d3eca,Are Walt Disney and Sacro GRA both documentry films?,yes,comparison,"{'title': ['List of Disney animated shorts', 'Walt Disney Pictures...","{Sacro GRA, Walt Disney (film)}",根据提供的信息，Sacro GRA 是一部纪录片电影，而 Walt Disney 则是一部关于他的生活和遗产的纪录片电影。因此，Wa...,"['Sacro GRA | Sacro GRA (] , Italian for ""Holy GRA"") is a 2013 Ita...",✔️ [True]
4,5a8f71c75542992414482acf,What is the Palestinian Islamic organization that governs th small...,Hamas,bridge,"{'title': ['Bar Lev Line', 'Zionist Freedom Alliance', 'Status of ...","{Gaza Strip, Status of territories occupied by Israel in 1967}",根据提供的信息， Hamas 是一家巴勒斯坦伊斯兰组织，它于 2007 年接管了加沙地带，是该地区的主要政治和军事权力。,"['Gaza Strip | The Gaza Strip ( ; Arabic: قطاع غزة\u200e \u200e ""Q...",
5,5a81b2f655429926c1cdad62,What album did the song of which Taylor Swift premiered the music ...,1989,bridge,"{'title': ['Taylor Swift videography', '2017 MTV Video Music Award...","{2015 MTV Video Music Awards, Wildest Dreams (Taylor Swift song)}",根据提供的信息，我们可以确定 Taylor Swift 的歌曲“Bad Blood”是从哪张专辑中 premiered 的。正确答案...,"['Bad Blood (Bastille album) | Bad Blood (stylised as ""BΔD BLOOD"")...",✔️ [True]
6,5ac3bd345542993915413929,"Which is considered a genus level classification, Apera or Gunnera...",Apera,comparison,"{'title': ['Gunnera manicata', 'Northwest Oregon Conference', 'Hyg...","{Apera, Gunnera manicata}",根据提供的信息，我们需要确定哪一类是 genus 级别分类。Gunera 属于属级，而 Gunnera manicata 属于物种级。,"['Gunnera manicata | Gunnera manicata, known as Brazilian giant-rh...",
7,5ab7498f554299110f219af8,Do The Drums and Pussy Galore play music of similar genres?,no,comparison,"{'title': ['The Honeymoon Killers (American band)', 'Space Cowboy ...","{The Drums, Pussy Galore (band)}",The Drums 和 Pussy Galore 都属于 post-punk revival 乐队，这是一种 indie rock ...,['List of post-punk revival bands | Post-punk revival is a type of...,
8,5ae6e5875542992ae0d163a0,What is the post-nominal abbreviation for the university where the...,Exon,bridge,"{'title': ['Zeresenay Alemseged', 'Research', 'University of Exete...","{Banded Brothers, University of Exeter}",Banded Mongoose Research Project 基于 University of Exeter。因此，其 post...,"['Banded Brothers | Banded Brothers (also known as ""Banded Brother...",
9,5ac55af35542993e66e82333,Are both Benjamin Christensen and Len Wiseman directors?,yes,comparison,"{'title': ['Benjamin Christensen', 'Len Wiseman', 'Lady with the L...","{Len Wiseman, Benjamin Christensen}",根据提供的信息， Benjamin Christensen 是一位丹麦电影导演，但 Len Wiseman 的职业方向与 Benja...,['Benjamin Christensen | Benjamin Christensen (28 September 1879 –...,


Average Metric: 10.00 / 10 (100.0%): 100%|██████████| 10/10 [00:02<00:00,  3.83it/s]

2024/12/26 17:36:14 INFO dspy.evaluate.evaluate: Average Metric: 10 / 10 (100.0%)





Unnamed: 0,id,question,answer,type,example_context,gold_titles,generated_tweet,pred_context,faithful_metric
0,5a824c155542990a1d231f92,The original work by Anton Chekhov involving a disillusioned schoo...,Maria Yermolova,bridge,"{'title': ['On the Harmful Effects of Tobacco', 'Melikhovo', 'Plat...","{Wild Honey (play), Platonov (play)}",我们需要找出与原作“Platonov”相关的受众。 Anton Chekhov 的作品“Platonov”是为 Maria Yerm...,['Wild Honey (play) | Wild Honey is a 1984 adaptation by British p...,✔️ [True]
1,5adc1218554299438c868d32,Are Roswell International Air Center and Pago Pago International A...,no,comparison,"{'title': ['American Samoa Community College', 'Tutuila', 'Inter I...","{Roswell International Air Center, Pago Pago International Airport}",根据提供的信息，我们可以确定 Roswell International Air Center 和 Pago Pago Intern...,['Roswell International Air Center | Roswell International Air Cen...,✔️ [True]
2,5ab65f2455429954757d3265,Untold: The Greatest Sports Stories Never Told was hosted by a spo...,the voice of basketball,bridge,"{'title': ['Gregg Murphy', 'Marv Albert', 'The Greatest Story Neve...","{Marv Albert, Untold: The Greatest Sports Stories Never Told}",Untold: The Greatest Sports Stories Never Told 的主持人是常见的体育播音员。马尔夫·阿...,"[""Untold: The Greatest Sports Stories Never Told | Untold: The Gre...",✔️ [True]
3,5a8803da55429938390d3eca,Are Walt Disney and Sacro GRA both documentry films?,yes,comparison,"{'title': ['List of Disney animated shorts', 'Walt Disney Pictures...","{Sacro GRA, Walt Disney (film)}",根据提供的信息，Sacro GRA 是一部纪录片电影，而 Walt Disney 则是一部关于他的生活和遗产的纪录片电影。因此，Wa...,"['Sacro GRA | Sacro GRA (] , Italian for ""Holy GRA"") is a 2013 Ita...",✔️ [True]
4,5a8f71c75542992414482acf,What is the Palestinian Islamic organization that governs th small...,Hamas,bridge,"{'title': ['Bar Lev Line', 'Zionist Freedom Alliance', 'Status of ...","{Gaza Strip, Status of territories occupied by Israel in 1967}",根据提供的信息， Hamas 是一家巴勒斯坦伊斯兰组织，它于 2007 年接管了加沙地带，是该地区的主要政治和军事权力。,"['Gaza Strip | The Gaza Strip ( ; Arabic: قطاع غزة\u200e \u200e ""Q...",✔️ [True]
5,5a81b2f655429926c1cdad62,What album did the song of which Taylor Swift premiered the music ...,1989,bridge,"{'title': ['Taylor Swift videography', '2017 MTV Video Music Award...","{2015 MTV Video Music Awards, Wildest Dreams (Taylor Swift song)}",根据提供的信息，我们可以确定 Taylor Swift 的歌曲“Bad Blood”是从哪张专辑中 premiered 的。正确答案...,"['Bad Blood (Bastille album) | Bad Blood (stylised as ""BΔD BLOOD"")...",✔️ [True]
6,5ac3bd345542993915413929,"Which is considered a genus level classification, Apera or Gunnera...",Apera,comparison,"{'title': ['Gunnera manicata', 'Northwest Oregon Conference', 'Hyg...","{Apera, Gunnera manicata}",根据提供的信息，我们需要确定哪一类是 genus 级别分类。Gunera 属于属级，而 Gunnera manicata 属于物种级。,"['Gunnera manicata | Gunnera manicata, known as Brazilian giant-rh...",✔️ [True]
7,5ab7498f554299110f219af8,Do The Drums and Pussy Galore play music of similar genres?,no,comparison,"{'title': ['The Honeymoon Killers (American band)', 'Space Cowboy ...","{The Drums, Pussy Galore (band)}",The Drums 和 Pussy Galore 都属于 post-punk revival 乐队，这是一种 indie rock ...,['List of post-punk revival bands | Post-punk revival is a type of...,✔️ [True]
8,5ae6e5875542992ae0d163a0,What is the post-nominal abbreviation for the university where the...,Exon,bridge,"{'title': ['Zeresenay Alemseged', 'Research', 'University of Exete...","{Banded Brothers, University of Exeter}",Banded Mongoose Research Project 基于 University of Exeter。因此，其 post...,"['Banded Brothers | Banded Brothers (also known as ""Banded Brother...",✔️ [True]
9,5ac55af35542993e66e82333,Are both Benjamin Christensen and Len Wiseman directors?,yes,comparison,"{'title': ['Benjamin Christensen', 'Len Wiseman', 'Lady with the L...","{Len Wiseman, Benjamin Christensen}",根据提供的信息， Benjamin Christensen 是一位丹麦电影导演，但 Len Wiseman 的职业方向与 Benja...,['Benjamin Christensen | Benjamin Christensen (28 September 1879 –...,✔️ [True]


Average Metric: 8.25 / 10 (82.5%): 100%|██████████| 10/10 [00:03<00:00,  2.80it/s]

2024/12/26 17:36:18 INFO dspy.evaluate.evaluate: Average Metric: 8.25 / 10 (82.5%)





Unnamed: 0,id,question,answer,type,example_context,gold_titles,generated_tweet,pred_context,overall_metric
0,5a824c155542990a1d231f92,The original work by Anton Chekhov involving a disillusioned schoo...,Maria Yermolova,bridge,"{'title': ['On the Harmful Effects of Tobacco', 'Melikhovo', 'Plat...","{Wild Honey (play), Platonov (play)}",我们需要找出与原作“Platonov”相关的受众。 Anton Chekhov 的作品“Platonov”是为 Maria Yerm...,['Wild Honey (play) | Wild Honey is a 1984 adaptation by British p...,✔️ [0.750]
1,5adc1218554299438c868d32,Are Roswell International Air Center and Pago Pago International A...,no,comparison,"{'title': ['American Samoa Community College', 'Tutuila', 'Inter I...","{Roswell International Air Center, Pago Pago International Airport}",根据提供的信息，我们可以确定 Roswell International Air Center 和 Pago Pago Intern...,['Roswell International Air Center | Roswell International Air Cen...,✔️ [0.750]
2,5ab65f2455429954757d3265,Untold: The Greatest Sports Stories Never Told was hosted by a spo...,the voice of basketball,bridge,"{'title': ['Gregg Murphy', 'Marv Albert', 'The Greatest Story Neve...","{Marv Albert, Untold: The Greatest Sports Stories Never Told}",Untold: The Greatest Sports Stories Never Told 的主持人是常见的体育播音员。马尔夫·阿...,"[""Untold: The Greatest Sports Stories Never Told | Untold: The Gre...",✔️ [1.000]
3,5a8803da55429938390d3eca,Are Walt Disney and Sacro GRA both documentry films?,yes,comparison,"{'title': ['List of Disney animated shorts', 'Walt Disney Pictures...","{Sacro GRA, Walt Disney (film)}",根据提供的信息，Sacro GRA 是一部纪录片电影，而 Walt Disney 则是一部关于他的生活和遗产的纪录片电影。因此，Wa...,"['Sacro GRA | Sacro GRA (] , Italian for ""Holy GRA"") is a 2013 Ita...",✔️ [1.000]
4,5a8f71c75542992414482acf,What is the Palestinian Islamic organization that governs th small...,Hamas,bridge,"{'title': ['Bar Lev Line', 'Zionist Freedom Alliance', 'Status of ...","{Gaza Strip, Status of territories occupied by Israel in 1967}",根据提供的信息， Hamas 是一家巴勒斯坦伊斯兰组织，它于 2007 年接管了加沙地带，是该地区的主要政治和军事权力。,"['Gaza Strip | The Gaza Strip ( ; Arabic: قطاع غزة\u200e \u200e ""Q...",✔️ [0.750]
5,5a81b2f655429926c1cdad62,What album did the song of which Taylor Swift premiered the music ...,1989,bridge,"{'title': ['Taylor Swift videography', '2017 MTV Video Music Award...","{2015 MTV Video Music Awards, Wildest Dreams (Taylor Swift song)}",根据提供的信息，我们可以确定 Taylor Swift 的歌曲“Bad Blood”是从哪张专辑中 premiered 的。正确答案...,"['Bad Blood (Bastille album) | Bad Blood (stylised as ""BΔD BLOOD"")...",✔️ [1.000]
6,5ac3bd345542993915413929,"Which is considered a genus level classification, Apera or Gunnera...",Apera,comparison,"{'title': ['Gunnera manicata', 'Northwest Oregon Conference', 'Hyg...","{Apera, Gunnera manicata}",根据提供的信息，我们需要确定哪一类是 genus 级别分类。Gunera 属于属级，而 Gunnera manicata 属于物种级。,"['Gunnera manicata | Gunnera manicata, known as Brazilian giant-rh...",✔️ [0.750]
7,5ab7498f554299110f219af8,Do The Drums and Pussy Galore play music of similar genres?,no,comparison,"{'title': ['The Honeymoon Killers (American band)', 'Space Cowboy ...","{The Drums, Pussy Galore (band)}",The Drums 和 Pussy Galore 都属于 post-punk revival 乐队，这是一种 indie rock ...,['List of post-punk revival bands | Post-punk revival is a type of...,✔️ [0.750]
8,5ae6e5875542992ae0d163a0,What is the post-nominal abbreviation for the university where the...,Exon,bridge,"{'title': ['Zeresenay Alemseged', 'Research', 'University of Exete...","{Banded Brothers, University of Exeter}",Banded Mongoose Research Project 基于 University of Exeter。因此，其 post...,"['Banded Brothers | Banded Brothers (also known as ""Banded Brother...",✔️ [0.750]
9,5ac55af35542993e66e82333,Are both Benjamin Christensen and Len Wiseman directors?,yes,comparison,"{'title': ['Benjamin Christensen', 'Len Wiseman', 'Lady with the L...","{Len Wiseman, Benjamin Christensen}",根据提供的信息， Benjamin Christensen 是一位丹麦电影导演，但 Len Wiseman 的职业方向与 Benja...,['Benjamin Christensen | Benjamin Christensen (28 September 1879 –...,✔️ [0.750]


The optimized model carries a few few-shot demos. All of them scored full marks, because we set `metric_threshold=1.00`: only bootstrapped demos (shots) with a perfect score are kept.
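A minimal sketch of that filtering idea in plain Python, not DSPy's own bootstrapping code: each candidate demo is paired with its metric score, and only those reaching the threshold survive. The function name `keep_demos` and the sample scores are hypothetical.

```python
def keep_demos(scored_demos, metric_threshold=1.0):
    """Return only the demos whose score reaches the threshold."""
    return [demo for demo, score in scored_demos if score >= metric_threshold]


# Hypothetical bootstrapped demos paired with an overall_metric-style score.
scored = [("demo_a", 0.75), ("demo_b", 1.0), ("demo_c", 0.5), ("demo_d", 1.0)]
kept = keep_demos(scored, metric_threshold=1.0)
print(kept)
```

With a threshold of 1.0, demos scoring 0.75 (as several rows in the tables above do) would be excluded from the few-shot set.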

In [15]:
# Inspect the demos stored in the optimized program
for predictor in compiled_tweeter.predictors():
    print(f"Predictor: {predictor}")
    print("Demos:")
    for demo in predictor.demos:
        print(f"{demo}")
    print("\n")


Predictor: Predict(StringSignature(question, context -> rationale, tweet_in_Chinese
    instructions='生成一条引人入胜的推文，有效地回答问题，忠实于上下文，少于 280 个字符，并且没有主题标签。'
    question = Field(annotation=str required=True json_schema_extra={'__dspy_field_type': 'input', 'prefix': 'Question:', 'desc': '${question}'})
    context = Field(annotation=str required=True json_schema_extra={'desc': '推特内容', '__dspy_field_type': 'input', 'prefix': 'Context:'})
    rationale = Field(annotation=str required=True json_schema_extra={'prefix': "Reasoning: Let's think step by step in order to", 'desc': '${produce the tweet_in_Chinese}. We ...', '__dspy_field_type': 'output'})
    tweet_in_Chinese = Field(annotation=str required=True json_schema_extra={'desc': '推特内容', '__dspy_field_type': 'output', 'prefix': 'Tweet In  Chinese:'})
))
Demos:
Example({'augmented': True, 'question': 'Which American actress who made their film debut in the 1995 teen drama "Kids" was the co-founder of Voto Latino?', 'context': ['Rosario Dawson 

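The two bootstrap optimizers covered so far focus on adding few-shot demos. Next come two more optimizers: COPRO, which focuses on optimizing the prompt, and MIPROv2, which combines prompt and few-shot optimization.

### Optimizer: COPRO

The `COPRO` prompt optimizer improves LLM output by iteratively generating new prompt instructions, evaluating them, and keeping the best one. Two signatures guide prompt generation: `BasicGenerateInstruction` produces the initial prompts, and `GenerateInstructionGivenAttempts` produces improved prompts based on earlier attempts.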

In [26]:
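"""
USAGE SUGGESTIONS:

The following code can be used to compile an optimized signature teleprompter and evaluate it on an end task:

teleprompter = COPRO(prompt_model=prompt_model, metric=metric, breadth=BREADTH, depth=DEPTH, init_temperature=INIT_TEMPERATURE)
kwargs = dict(num_threads=NUM_THREADS, display_progress=True, display_table=0)
compiled_prompt_opt = teleprompter.compile(program.deepcopy(), trainset=trainset[:DEV_NUM], eval_kwargs=kwargs)
eval_score = evaluate(compiled_prompt_opt, devset=evalset[:EVAL_NUM], **kwargs)

Note that this teleprompter takes the following parameters:

* prompt_model: The model used to generate prompts. If unspecified, defaults to the model set in settings (e.g. dspy.settings.configure(lm=task_model)).
* metric: The task metric used for optimization.
* breadth: The number of new prompts generated at each iteration. Default=10.
* depth: The number of times the prompt model is asked to generate new prompts, with the history of past prompts as input. Default=3.
* init_temperature: The temperature used to generate new prompts. Higher values give more creative prompts. Default=1.4.
* track_stats: Whether the method tracks statistics about the optimization process.
                If True, the method tracks the following statistics:
                    * results_best: The min, max, average, and standard deviation of the top 10 scores for each predictor at each depth.
                    * results_latest: The min, max, average, and standard deviation of the newest prompt scores for each predictor at each depth.
                    * total_calls: The total number of calls to the task metric.
                These statistics are returned as attributes of the best program.
"""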


In [27]:
import dsp   # legacy package, assumed to ship with this DSPy version (provides passages2text)
import dspy


class BasicGenerateInstruction(dspy.Signature):  # was `Signature`, which raised NameError
    """
    You are an instruction optimizer for large language models. I will give you a ``signature`` of fields (inputs and outputs) in English. Your task is to propose an instruction that will lead a good language model to perform the task well. Don't be afraid to be creative.
    """

    # The initial instruction before optimization
    basic_instruction = dspy.InputField(desc="The initial instructions before optimization")
    # The improved instruction for the language model
    proposed_instruction = dspy.OutputField(desc="The improved instructions for the language model")
    # The string at the end of the prompt that helps the model start solving the task
    proposed_prefix_for_output_field = dspy.OutputField(
        desc="The string at the end of the prompt, which will help the model start solving the task",
    )

class GenerateInstructionGivenAttempts(dspy.Signature):
    """
    You are an instruction optimizer for large language models. I will give some task instructions I've tried, along with their corresponding validation scores. The instructions are arranged in increasing order based on their scores, where higher scores indicate better quality.

    Your task is to propose a new instruction that will lead a good language model to perform the task even better. Don't be afraid to be creative.
    """

    # The instructions already tried, with their scores
    attempted_instructions = dspy.InputField(format=dsp.passages2text, desc="The instructions already tried, with their scores")
    # The improved instruction for the language model
    proposed_instruction = dspy.OutputField(desc="The improved instructions for the language model")
    # The string at the end of the prompt that helps the model start solving the task
    proposed_prefix_for_output_field = dspy.OutputField(
        desc="The string at the end of the prompt, which will help the model start solving the task",
    )

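COPRO uses a beam-search-like approach: at each depth it explores several candidate prompts and selects the best one according to the evaluation metric.

In essence, it repeatedly generates, tests, and filters new instructions and prefixes to arrive at a better prompt strategy.

`COPRO` inherits from `Teleprompter`, and its main job is to **optimize the instruction of each Predictor**. Here "instruction" means `signature.instructions`, and "prefix" means the `json_schema_extra["prefix"]` entry in `signature.fields[...]`.

- Each round generates several candidate instructions (via `BasicGenerateInstruction` or `GenerateInstructionGivenAttempts`).
- The given `metric` scores how each candidate performs on `trainset`.
- The highest-scoring "instruction + prefix" pair is written back to the corresponding Predictor.

Finally, it returns the best version of the `program`, including the instruction changes made to each predictor.

- **Source code walkthrough**

**Key parameters:**

* `breadth`: The number of candidate prompts generated per iteration. A larger breadth explores more possibilities but costs more compute.
* `depth`: The number of iterations. A larger depth allows finer optimization but also costs more compute.
* `init_temperature`: Controls the diversity of the generated candidate prompts. A higher temperature produces more varied prompts.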
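To make the cost trade-off concrete, here is a rough upper-bound estimate of how many metric calls a run could need. It assumes every round scores `breadth` candidates per predictor on every training example; the helper name `estimate_metric_calls` is hypothetical, and a real run can use fewer calls (for example when the model returns fewer distinct candidates or results are cached).

```python
def estimate_metric_calls(breadth, depth, num_predictors, trainset_size):
    """Rough upper bound: each candidate is scored on every training example."""
    candidates_per_round = breadth * num_predictors
    return depth * candidates_per_round * trainset_size


# Settings similar to the ones used later in this notebook.
calls = estimate_metric_calls(breadth=15, depth=6, num_predictors=1, trainset_size=25)
print(calls)  # 2250
```

Doubling either `breadth` or `depth` doubles this bound, which is why both are usually kept small for a demo.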

**Iterative optimization**:
The core loop is

```python
for d in range(self.depth):
    ...
```

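It runs `depth` times, and each pass performs the following steps:

- Evaluate the current candidates

For each Predictor, take `candidates_ = latest_candidates[id(p_old)]` and go through its candidates one by one. For each candidate `(instruction, prefix)`:

1. **Write it into the predictor**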
    
    ```python
    updated_signature = (
        self._get_signature(p_new)
        .with_instructions(instruction)
        .with_updated_fields(last_key, prefix=prefix)
    )
    self._set_signature(p_new, updated_signature)
    ```
    
2. **Call `evaluate` to get a score**
    
    ```python
    score = evaluate(module_clone, devset=trainset, **eval_kwargs)
    ```
    
3. **Update `evaluated_candidates`**  
    Record `(instruction, prefix) -> {"score":..., "program":..., ...}` so that the best one can be picked later.
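The bookkeeping in step 3 can be sketched with a plain dictionary keyed by `(instruction, prefix)`. This is a simplified stand-in, not the library's code: the helper `record_candidate` and the sample instructions are hypothetical, and the real entries also hold the evaluated program.

```python
def record_candidate(evaluated, instruction, prefix, score, depth):
    """Store the result unless the same pair already has a score at least as good."""
    key = (instruction, prefix)
    previous = evaluated.get(key)
    if previous is not None and previous["score"] >= score:
        return False  # keep the earlier, better entry
    evaluated[key] = {"instruction": instruction, "prefix": prefix,
                      "score": score, "depth": depth}
    return True


evaluated = {}
record_candidate(evaluated, "Write a tweet.", "Tweet:", 70.0, depth=0)
record_candidate(evaluated, "Write a tweet.", "Tweet:", 65.0, depth=1)  # ignored: lower score
record_candidate(evaluated, "Write a faithful tweet.", "Tweet:", 80.0, depth=1)

# Picking the best entry mirrors the max(..., key=score) call in the source.
best = max(evaluated.values(), key=lambda c: c["score"])
print(best["instruction"], best["score"])
```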

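After evaluation there are several `(instruction, prefix, score)` results. The highest-scoring one is treated as the best prompt for that Predictor and is written back to the `predictor` immediately: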

```python
best_candidate = max(evaluated_candidates[id(p_old)].values(), key=lambda candidate: candidate["score"])
...
self._set_signature(p_new, updated_signature)
```

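This is the first major change: **the Predictor's instruction and prefix are replaced by the highest-scoring candidate**.

- Generate the next batch of candidates

If this is not the last round (`d < self.depth - 1`), a new batch of prompts is generated from the current best. This uses `GenerateInstructionGivenAttempts`: the instructions already tried and their scores are assembled into a text list `attempts` and passed as input, and the model then generates new candidate instructions.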

```python
instr = dspy.Predict(
    GenerateInstructionGivenAttempts,
    n=self.breadth,
    temperature=self.init_temperature,
)(attempted_instructions=attempts)
```

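The returned `instr.completions` is a new batch of `(proposed_instruction, proposed_prefix_for_output_field)` pairs, stored in `new_candidates[id(p_base)]`. `latest_candidates` is then pointed at `new_candidates`, so the next round (`d+1`) evaluates this new batch.

**This corresponds to the second major change**: **the best instructions from earlier rounds serve as context and are concatenated into `attempts` to generate the next wave of candidates**, which gives multi-round progressive optimization.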
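A minimal sketch of how such an `attempts` list could be assembled, ordered from worst to best as the `GenerateInstructionGivenAttempts` docstring requires. The helper `build_attempts` and the exact line wording are assumptions for illustration and may differ from the strings DSPy actually emits.

```python
def build_attempts(evaluated, breadth):
    """List the top `breadth` candidates from worst to best, one field per line."""
    ranked = sorted(evaluated, key=lambda c: c["score"], reverse=True)[:breadth]
    attempts = []
    for rank, cand in enumerate(reversed(ranked), start=1):
        attempts.append(f"Instruction #{rank}: {cand['instruction']}")
        attempts.append(f"Prefix #{rank}: {cand['prefix']}")
        attempts.append(f"Resulting Score #{rank}: {cand['score']}")
    return attempts


# Hypothetical evaluation history.
history = [
    {"instruction": "A", "prefix": "Tweet:", "score": 70.0},
    {"instruction": "B", "prefix": "Tweet:", "score": 80.0},
    {"instruction": "C", "prefix": "Tweet:", "score": 60.0},
]
attempts = build_attempts(history, breadth=2)
print("\n".join(attempts))
```

Placing the best attempt last keeps it closest to the point where the model starts writing its new proposal.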

- Deduplicate

After all rounds finish, the code collects and sorts all candidates:

```python
candidates = []
for predictor in module.predictors():
    candidates.extend(list(evaluated_candidates[id(predictor)].values()))
candidates.sort(key=lambda x: x["score"], reverse=True)
candidates = self._drop_duplicates(candidates)
```

`_drop_duplicates` filters out repeated instruction-prefix combinations, so that identical prompts with identical results are not listed twice.
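The deduplication idea can be sketched as follows. This is a simplification keyed on the `(instruction, prefix)` pair; the library's `_drop_duplicates` compares candidate programs, so treat the function below as an illustration under that assumption.

```python
def drop_duplicates(candidates):
    """Keep the first occurrence of each (instruction, prefix) pair, preserving order."""
    seen = set()
    unique = []
    for cand in candidates:
        key = (cand["instruction"], cand["prefix"])
        if key not in seen:
            seen.add(key)
            unique.append(cand)
    return unique


# Hypothetical candidates, already sorted by score in descending order.
sorted_candidates = [
    {"instruction": "B", "prefix": "Tweet:", "score": 80.0},
    {"instruction": "B", "prefix": "Tweet:", "score": 80.0},
    {"instruction": "A", "prefix": "Tweet:", "score": 70.0},
]
unique_candidates = drop_duplicates(sorted_candidates)
print([c["instruction"] for c in unique_candidates])  # ['B', 'A']
```

Because the input is sorted first, the surviving copy of each pair is always the highest-ranked one.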

- Select the best

The final line:

```python
best_candidate = max(evaluated_candidates[id(p_old)].values(), key=lambda candidate: candidate["score"])

```

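takes the highest-scoring program as the final "best model" and attaches some extra attributes, such as `best_program.candidate_programs = candidates` and `best_program.total_calls = total_calls`. If `track_stats=True`, the statistics (max/avg/min/std) are also stored in fields such as `best_program.results_best`.

It takes the maximum over `candidate["score"]`, so the highest-scoring candidate prompt becomes the final version. The min / avg / std values recorded in the code are only for analysis or later visualization and do not affect the selection logic.

**The third major change**: the returned `program` object gains several auxiliary fields (`candidate_programs`, `total_calls`, `results_best`, `results_latest`), so that later analysis or visualization can see which instructions were tried and how they scored.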
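As an illustration of the kind of summary kept when `track_stats=True`, the sketch below computes min, max, average, and standard deviation over the top scores. The function name, the dictionary keys, and the use of a population standard deviation are assumptions made for this example, not a description of the library's exact output format.

```python
import statistics


def summarize_top_scores(scores, top_k=10):
    """Summarize the best `top_k` scores (population standard deviation)."""
    top = sorted(scores, reverse=True)[:top_k]
    return {
        "max": max(top),
        "average": statistics.fmean(top),
        "min": min(top),
        "std": statistics.pstdev(top),
    }


# Hypothetical scores for one predictor at one depth.
summary = summarize_top_scores([70.0, 80.0, 60.0, 90.0])
print(summary)
```

A shrinking standard deviation across depths would suggest that the candidates are converging on similar prompts.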


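**Workflow:**

1. **Initial stage**: use `BasicGenerateInstruction` to generate the initial candidates and store them in `candidates`.
2. **Multi-round loop (`depth` times)**:
    - Take the current candidates, write each one into the predictor, call `evaluate` to get a score, and store it in `evaluated_candidates`.
    - Write the highest-scoring one (the best instruction + prefix) back to the predictor so that it takes effect in later iterations.
    - Generate a new round of candidates with `GenerateInstructionGivenAttempts`, based on the past attempts and their scores.
    - Repeat the steps above.
3. **Finally**:
    - Collect all evaluation results, sort them by score, and deduplicate.
    - Take the highest-scoring one as `best_program`.
    - Return it together with the statistics.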



In [None]:
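for d in range(self.depth):                              # optimize round by round
    for p_i, (p_old, p_new) in enumerate(zip(...)):       # iterate over each predictor
        candidates_ = latest_candidates[id(p_old)]
        for c_i, c in enumerate(candidates_):
            # 1) write (instruction, prefix) into the predictor
            # 2) evaluate(...)
            # 3) update the record in evaluated_candidates
            # 4) replace the stored entry if the result is better, otherwise ignore it
            ...
        # pick the highest-scoring candidate as this round's final version of p_new
    # if this is not the last round, generate the next batch (GenerateInstructionGivenAttempts)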
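The skeleton above can be turned into a runnable toy to see the generate-evaluate-select cycle in action. This is only an analogy under strong assumptions: `propose` and `score` are hypothetical stand-ins for the prompt model and the task metric, there is a single predictor, and prefixes are ignored. It is not DSPy's implementation.

```python
SEED = "Write a tweet"


def propose(history, n):
    """Stub proposer: extend the best instruction seen so far with extra constraints."""
    base = history[-1][0] if history else SEED
    extras = ["that answers the question", "faithful to the context",
              "under 280 characters", "without hashtags"]
    return [f"{base}, {extras[i % len(extras)]}" for i in range(n)]


def score(instruction):
    """Stub metric: reward instructions that mention more constraints."""
    keywords = ["answers", "faithful", "280", "hashtags"]
    return sum(k in instruction for k in keywords) / len(keywords)


def toy_copro(seed_instruction, propose, score, breadth=3, depth=2):
    """Toy generate-evaluate-select loop over instructions."""
    evaluated = {}  # instruction -> best score seen so far
    candidates = [seed_instruction] + propose([], breadth - 1)
    for d in range(depth):
        # Evaluate every candidate of this round.
        for instruction in candidates:
            s = score(instruction)
            if s > evaluated.get(instruction, float("-inf")):
                evaluated[instruction] = s
        # Unless this is the last round, propose a new batch from the history.
        if d < depth - 1:
            history = sorted(evaluated.items(), key=lambda kv: kv[1])  # worst to best
            candidates = propose(history, breadth)
    return max(evaluated.items(), key=lambda kv: kv[1])


best_instruction, best_score = toy_copro(SEED, propose, score)
print(best_instruction, best_score)
```

Each round builds on the best instruction found so far, which is the same progressive refinement that `depth` controls in the real optimizer.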



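**Some details in the source code**

*   **What is the most central, most direct change?**
    *   The Predictor's `signature.instructions` and the prefix strings in `signature.fields[...]` may be replaced by a better version in each round. In addition, the final result stores a queue of the best candidates and some statistics. Each Predictor thus ends up with a better prompt (instruction + prefix), which completes the multi-round prompt optimization.

*   **Why must `breadth` be greater than 1?**
    *   `breadth` is the number of new prompts generated per iteration. With `breadth` equal to 1, each iteration would produce only one new prompt, which narrows the search space and may miss the best prompt.
*   **What does `depth` do?**
    *   `depth` is the number of iterations. Each iteration generates new prompts based on the results of the previous one. The larger `depth` is, the deeper the search and the better the chance of finding a good prompt, at the cost of more time.
*   **Why are `_check_candidates_equal` and `_drop_duplicates` needed?**
    *   Both methods keep duplicate prompts out of the final candidate list. `_check_candidates_equal` checks whether two candidate programs are the same, and `_drop_duplicates` removes the repeated candidates.
*   **What does `module.deepcopy()` do in `compile`?**
    *   `module.deepcopy()` creates a deep copy of the `student` program, so the original program is not modified during optimization.
*   **What is the `evaluate` object in `compile` for?**
    *   The `evaluate` object measures the program's performance on the training set. It uses the `self.metric` function to compute the program's score.
*   **What is the `total_calls` variable in `compile` for?**
    *   `total_calls` records the total number of evaluations, which can serve as a measure of the optimization's compute cost.
*   **What are the `results_best` and `results_latest` dictionaries in `compile` for?**
    *   Both dictionaries track statistics about the optimization process. `results_best` records statistics of the top 10 scores for each predictor at each depth, and `results_latest` records statistics of the newest prompt scores for each predictor at each depth.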

In [45]:
from dspy.teleprompt import COPRO
import litellm
litellm.drop_params=True
# Create the COPRO optimizer instance
COPRO_teleprompter = COPRO(
    metric=overall_metric,
    breadth=15,  # candidates generated per round
    depth=6,    # number of rounds
    track_stats=True,
    max_errors=10,
    init_temperature=1.8,
    verbose=True,              # whether to print detailed logs
)


NUM_THREADS = 4
compiled_COPRO_tweet = COPRO_teleprompter.compile(
    student = tweeter, 
    trainset=trainset[:25], # use a small slice of the training data for this demo
    eval_kwargs=dict(num_threads=NUM_THREADS, display_progress=True, display_table=20),
)


# Print the statistics
print("Best-result statistics:", compiled_COPRO_tweet.results_best)
print("Latest-result statistics:", compiled_COPRO_tweet.results_latest)
print("Total metric calls:", compiled_COPRO_tweet.total_calls)
evaluate = Evaluate(devset=devset[30:45], metric=overall_metric, num_threads=NUM_THREADS, display_progress=True, display_table=20,provide_traceback=True)

evaluate(compiled_COPRO_tweet)

2024/12/26 14:57:21 INFO dspy.teleprompt.copro_optimizer: ********* 开始第 1/6 轮指令优化 *********
2024/12/26 14:57:21 INFO dspy.teleprompt.copro_optimizer: [候选 1/2]
2024/12/26 14:57:21 INFO dspy.teleprompt.copro_optimizer:   指令: 创建一条引人入胜的推文，回答问题以清晰易懂，注意文本不超过280个字符，且未带主题标签
2024/12/26 14:57:21 INFO dspy.teleprompt.copro_optimizer:   前缀: __generate_compelling_twentyEighteen
2024/12/26 14:57:21 INFO dspy.teleprompt.copro_optimizer: At Depth 1/6, Evaluating Prompt Candidate #1/2 for Predictor 1 of 1.


Average Metric: 17.50 / 25 (70.0%): 100%|██████████| 25/25 [00:32<00:00,  1.29s/it]

2024/12/26 14:57:53 INFO dspy.evaluate.evaluate: Average Metric: 17.5 / 25 (70.0%)





Unnamed: 0,id,question,answer,type,example_context,gold_titles,generated_tweet,pred_context,overall_metric
0,5ac0e2525542997d64295a79,At My Window was released by which American singer-songwriter?,John Townes Van Zandt,bridge,"{'title': ['Through the Window', 'Yes I Am (Melissa Etheridge albu...","{Townes Van Zandt, At My Window (album)}",美国歌手 Townes Van Zandt 在 1987 年发行了专辑“At My Window”。,"[""At a Window in the Artist's Studio | At a Window in the Artist's...",✔️ [0.750]
1,5abfc1895542993fe9a41e38,which American actor was Candace Kita guest starred with,Bill Murray,bridge,"{'title': ['Charlie Babcock', 'Thom Bierdz', 'Scrubs (season 4)', ...","{Candace Kita, Bill Murray}",美国演员 Candace Kita 在多部电视剧和电影中 guest starred，但并未提及她与哪个美国演员一起合作。,['Candace Kita | Kita\'s first role was as a news anchor in the 19...,✔️ [0.500]
2,5a8f600c5542992414482a8f,"Which of these publications was most recently published, Who Put t...",Self,comparison,"{'title': ['He Put the Bomp! In the Bomp', 'Who Put the Bomp (in t...","{Self (magazine), Who Put the Bomp}",Who Put the Bomp 或者 Self 哪个出版物最_recently 出版？根据提供的信息，Self 是 2017 年发...,['Guy Self | Guy Self is a fictional character from the BBC medica...,✔️ [1.000]
3,5ac562695542993e66e823c7,The Victorians - Their Story In Pictures is a documentary series w...,1950,bridge,"{'title': ['The Victorians', ""Michael Wood's Story of England"", 'S...","{The Victorians, Jeremy Paxman}",英國記者兼節目主持人 jeremy paxman 出生於1944年，為《維多利亞人 - 他們的故事在圖片中的系列》作家。,"['Masao | Masao (written: 正雄, 正夫, 正生, 正男, 正郎, 雅雄, 雅央, 雅夫, 雅勇, 雅男, ...",✔️ [0.750]
4,5ae4c6ba5542996836b02d23,"Which magazine has published articles by Scott Shaw, Tae Kwon Do T...",Tae Kwon Do Times,comparison,"{'title': ['Rick Timmons', 'Duk Sung Son', 'Nam Suk Lee', 'Southwe...","{Tae Kwon Do Times, Southwest Art}",Scott Shaw 的文章出现在《太空道》和《西南艺术》等杂志上。,['Tae Kwon Do Times | Tae Kwon Do Times is a magazine devoted to t...,✔️ [0.750]
5,5abac4ae55429939ce03dd54,In what year was the club founded that played Manchester City in t...,1874,bridge,"{'title': ['1985 FA Charity Shield', '1995 FA Charity Shield', '19...","{1972 FA Charity Shield, Aston Villa F.C.}",1972 年的 FA-charity shield 比赛是曼城与阿斯顿维拉之间的比赛。,['1972 FA Charity Shield | The 1972 FA Charity Shield was conteste...,✔️ [0.750]
6,5a73061d55429901807daf66,"Which is taller, the Empire State Building or the Bank of America ...",The Empire State Building,comparison,"{'title': ['WOR TV Tower', 'Empire State Building', 'Jack Brod', '...","{Empire State Building, Bank of America Tower (Manhattan)}",[1] «新大陆 skyscraper | 新大陆是位于纽约市五大道之间的102层高楼大厦，高达1454 ft（含天线），以帝国州为...,['Empire State Building | The Empire State Building is a 102-story...,
7,5a760bbc554299109176e631,Which American actress who made their film debut in the 1995 teen ...,Rosario Dawson,bridge,"{'title': ['Evan Rachel Wood', 'Voto Latino', 'Shannen Doherty fil...","{Rosario Dawson, Voto Latino}",美国演员罗莎里奥·道森（Rosario Dawson）于1995年在青春期剧片《Kids》中首次亮相，并成为Voto Latino的...,"['Rosario Dawson | Rosario Isabel Dawson (born May 9, 1979) is an ...",✔️ [1.000]
8,5ab853925542990e739ec8c8,"Tombstone stared an actor born May 17, 1955 known as who?",Bill Paxton,bridge,"{'title': ['Bill Paxton', 'Kinpei Azusa', 'Dennis Hopper filmograp...","{Bill Paxton, Tombstone (film)}",Tombstone Territory 是一部美国西部电视剧，由 Pat Conway 和 Richard Eastham 主演。该...,"['Michael Biehn | Michael Connell Biehn (born July 31, 1956) is an...",✔️ [0.750]
9,5a8532485542997b5ce3ffbd,What is the code name for the German offensive that started this S...,Operation Citadel,bridge,"{'title': ['52nd Infantry Division (German Empire)', 'Cornay', '10...","{102nd Infantry Division (Wehrmacht), Battle of Kursk}",德国在东前线的第二次世界大战行动，包括102步兵师，开始于Operation Mars。,['Battle of Kursk | The Battle of Kursk was a Second World War eng...,✔️ [0.750]


2024/12/26 14:57:53 INFO dspy.teleprompt.copro_optimizer: [候选 2/2]
2024/12/26 14:57:53 INFO dspy.teleprompt.copro_optimizer:   指令: 生成一条引人入胜的推文，有效地回答问题，忠实于上下文，少于 280 个字符，并且没有主题标签。
2024/12/26 14:57:53 INFO dspy.teleprompt.copro_optimizer:   前缀: Tweet In  Chinese:
2024/12/26 14:57:53 INFO dspy.teleprompt.copro_optimizer: At Depth 1/6, Evaluating Prompt Candidate #2/2 for Predictor 1 of 1.


Average Metric: 20.00 / 25 (80.0%): 100%|██████████| 25/25 [00:00<00:00, 833.30it/s]

2024/12/26 14:57:53 INFO dspy.evaluate.evaluate: Average Metric: 20.0 / 25 (80.0%)





Unnamed: 0,id,question,answer,type,example_context,gold_titles,generated_tweet,pred_context,overall_metric
0,5ac0e2525542997d64295a79,At My Window was released by which American singer-songwriter?,John Townes Van Zandt,bridge,"{'title': ['Through the Window', 'Yes I Am (Melissa Etheridge albu...","{Townes Van Zandt, At My Window (album)}",美国歌手-songwriter 发布了“At My Window”。,"[""At a Window in the Artist's Studio | At a Window in the Artist's...",✔️ [0.750]
1,5abfc1895542993fe9a41e38,which American actor was Candace Kita guest starred with,Bill Murray,bridge,"{'title': ['Charlie Babcock', 'Thom Bierdz', 'Scrubs (season 4)', ...","{Candace Kita, Bill Murray}",据了解，Candace Kita 在剧集《头目》中 guest starred，与 Jilly Kitzinger 相同的是一位美国演员。,['Candace Kita | Kita\'s first role was as a news anchor in the 19...,✔️ [0.500]
2,5a8f600c5542992414482a8f,"Which of these publications was most recently published, Who Put t...",Self,comparison,"{'title': ['He Put the Bomp! In the Bomp', 'Who Put the Bomp (in t...","{Self (magazine), Who Put the Bomp}",Who Put the Bomp 或者 Self 哪个最近出版呢？根据提供的信息，Who Put the Bomp 是一首 1961...,['Guy Self | Guy Self is a fictional character from the BBC medica...,✔️ [1.000]
3,5ac562695542993e66e823c7,The Victorians - Their Story In Pictures is a documentary series w...,1950,bridge,"{'title': ['The Victorians', ""Michael Wood's Story of England"", 'S...","{The Victorians, Jeremy Paxman}",「维多利亚人 - 他们的故事在图片中的故事」是一部2009年的英国纪录片系列，关注维多利亚艺术和文化。四集系列由Jeremy Pax...,"['Masao | Masao (written: 正雄, 正夫, 正生, 正男, 正郎, 雅雄, 雅央, 雅夫, 雅勇, 雅男, ...",✔️ [0.750]
4,5ae4c6ba5542996836b02d23,"Which magazine has published articles by Scott Shaw, Tae Kwon Do T...",Tae Kwon Do Times,comparison,"{'title': ['Rick Timmons', 'Duk Sung Son', 'Nam Suk Lee', 'Southwe...","{Tae Kwon Do Times, Southwest Art}",Tae Kwon Do Times 或 Southwest Art 等杂志都可能发表 Scott Shaw 的文章。,['Tae Kwon Do Times | Tae Kwon Do Times is a magazine devoted to t...,✔️ [0.750]
5,5abac4ae55429939ce03dd54,In what year was the club founded that played Manchester City in t...,1874,bridge,"{'title': ['1985 FA Charity Shield', '1995 FA Charity Shield', '19...","{1972 FA Charity Shield, Aston Villa F.C.}",1972 年的 FA-charity shield 比赛是曼城与阿斯顿维拉之间的比赛。,['1972 FA Charity Shield | The 1972 FA Charity Shield was conteste...,✔️ [0.750]
6,5a73061d55429901807daf66,"Which is taller, the Empire State Building or the Bank of America ...",The Empire State Building,comparison,"{'title': ['WOR TV Tower', 'Empire State Building', 'Jack Brod', '...","{Empire State Building, Bank of America Tower (Manhattan)}",新 York市的 Empire State Building 高度为 1454 ft，而 Bank of America Tower...,['Empire State Building | The Empire State Building is a 102-story...,✔️ [0.750]
7,5a760bbc554299109176e631,Which American actress who made their film debut in the 1995 teen ...,Rosario Dawson,bridge,"{'title': ['Evan Rachel Wood', 'Voto Latino', 'Shannen Doherty fil...","{Rosario Dawson, Voto Latino}",美国演员罗莎里奥·道森（Rosario Dawson）于1995年在电影《Kids》中首次亮相，是Voto Latino的联合创始人。,"['Rosario Dawson | Rosario Isabel Dawson (born May 9, 1979) is an ...",✔️ [1.000]
8,5ab853925542990e739ec8c8,"Tombstone stared an actor born May 17, 1955 known as who?",Bill Paxton,bridge,"{'title': ['Bill Paxton', 'Kinpei Azusa', 'Dennis Hopper filmograp...","{Bill Paxton, Tombstone (film)}",2019年5月17日出生的美国演员Michael Biehn，知名于其在《终结者》系列中的角色。,"['Michael Biehn | Michael Connell Biehn (born July 31, 1956) is an...",✔️ [1.000]
9,5a8532485542997b5ce3ffbd,What is the code name for the German offensive that started this S...,Operation Citadel,bridge,"{'title': ['52nd Infantry Division (German Empire)', 'Cornay', '10...","{102nd Infantry Division (Wehrmacht), Battle of Kursk}",德国在东前线的第二次进攻， codenamed Operation Mars，是苏联forces对德国forces的进攻，发生在19...,['Battle of Kursk | The Battle of Kursk was a Second World War eng...,✔️ [0.750]


2024/12/26 14:57:54 INFO dspy.teleprompt.copro_optimizer: 本轮最佳候选得分: 80.0
2024/12/26 14:57:54 INFO dspy.teleprompt.copro_optimizer: 最佳候选指令: 生成一条引人入胜的推文，有效地回答问题，忠实于上下文，少于 280 个字符，并且没有主题标签。
2024/12/26 14:57:54 INFO dspy.teleprompt.copro_optimizer: 最佳候选前缀: Tweet In  Chinese:
2024/12/26 14:57:55 INFO dspy.teleprompt.copro_optimizer: ********* 开始第 2/6 轮指令优化 *********
2024/12/26 14:57:55 INFO dspy.teleprompt.copro_optimizer: [候选 1/1]
2024/12/26 14:57:55 INFO dspy.teleprompt.copro_optimizer:   指令: 生成一条引人入胜的推文，回答问题以清晰易懂，注意文本不超过280个字符，且未带主题标签
2024/12/26 14:57:55 INFO dspy.teleprompt.copro_optimizer:   前缀: Tweet In  Chinese:
2024/12/26 14:57:55 INFO dspy.teleprompt.copro_optimizer: At Depth 2/6, Evaluating Prompt Candidate #1/1 for Predictor 1 of 1.


Average Metric: 17.75 / 25 (71.0%): 100%|██████████| 25/25 [00:57<00:00,  2.32s/it]

2024/12/26 14:58:53 INFO dspy.evaluate.evaluate: Average Metric: 17.75 / 25 (71.0%)





Unnamed: 0,id,question,answer,type,example_context,gold_titles,generated_tweet,pred_context,overall_metric
0,5ac0e2525542997d64295a79,At My Window was released by which American singer-songwriter?,John Townes Van Zandt,bridge,"{'title': ['Through the Window', 'Yes I Am (Melissa Etheridge albu...","{Townes Van Zandt, At My Window (album)}",我的窗口（英语：At My Window）是美国歌手 Townes Van Zandt 的 1987 年发行的专辑。,"[""At a Window in the Artist's Studio | At a Window in the Artist's...",✔️ [0.750]
1,5abfc1895542993fe9a41e38,which American actor was Candace Kita guest starred with,Bill Murray,bridge,"{'title': ['Charlie Babcock', 'Thom Bierdz', 'Scrubs (season 4)', ...","{Candace Kita, Bill Murray}",美国演员 Candace Kita 在剧集《头目》中 guest starred，与 Jilly Kitzinger 相同的是由 L...,['Candace Kita | Kita\'s first role was as a news anchor in the 19...,✔️ [1.000]
2,5a8f600c5542992414482a8f,"Which of these publications was most recently published, Who Put t...",Self,comparison,"{'title': ['He Put the Bomp! In the Bomp', 'Who Put the Bomp (in t...","{Self (magazine), Who Put the Bomp}",Who Put the Bomp 或者 Self 哪个出版物最先出版呢？根据提供的信息，True to Self 是 2017 年发...,['Guy Self | Guy Self is a fictional character from the BBC medica...,✔️ [0.750]
3,5ac562695542993e66e823c7,The Victorians - Their Story In Pictures is a documentary series w...,1950,bridge,"{'title': ['The Victorians', ""Michael Wood's Story of England"", 'S...","{The Victorians, Jeremy Paxman}",维多利亚时期的纪录片系列《维多利亚人 - 他们的故事在图片中》，由英国作家和电视主持人Jeremy Paxman写成，于2009年上映。,"['Masao | Masao (written: 正雄, 正夫, 正生, 正男, 正郎, 雅雄, 雅央, 雅夫, 雅勇, 雅男, ...",✔️ [0.750]
4,5ae4c6ba5542996836b02d23,"Which magazine has published articles by Scott Shaw, Tae Kwon Do T...",Tae Kwon Do Times,comparison,"{'title': ['Rick Timmons', 'Duk Sung Son', 'Nam Suk Lee', 'Southwe...","{Tae Kwon Do Times, Southwest Art}",Scott Shaw 的文章出现在《Tae Kwon Do Times》和《Southwest Art》等杂志上。,['Tae Kwon Do Times | Tae Kwon Do Times is a magazine devoted to t...,✔️ [0.750]
5,5abac4ae55429939ce03dd54,In what year was the club founded that played Manchester City in t...,1874,bridge,"{'title': ['1985 FA Charity Shield', '1995 FA Charity Shield', '19...","{1972 FA Charity Shield, Aston Villa F.C.}",1972 年的 FA-charity shield 比赛中，曼城与阿斯顿维拉相遇。然而，关于该比赛的具体年份和对手信息不够明确。,['1972 FA Charity Shield | The 1972 FA Charity Shield was conteste...,✔️ [0.500]
6,5a73061d55429901807daf66,"Which is taller, the Empire State Building or the Bank of America ...",The Empire State Building,comparison,"{'title': ['WOR TV Tower', 'Empire State Building', 'Jack Brod', '...","{Empire State Building, Bank of America Tower (Manhattan)}",[1] «新大陆 skyscraper | 新大陆是位于纽约市五大道之间的102层高楼大厦，高达1454 ft（含天线），以帝国州为...,['Empire State Building | The Empire State Building is a 102-story...,
7,5a760bbc554299109176e631,Which American actress who made their film debut in the 1995 teen ...,Rosario Dawson,bridge,"{'title': ['Evan Rachel Wood', 'Voto Latino', 'Shannen Doherty fil...","{Rosario Dawson, Voto Latino}",美国演员罗莎里奥·道森（Rosario Dawson）于1995年在青春期剧片《Kids》中首次亮相，并成为Voto Latino的...,"['Rosario Dawson | Rosario Isabel Dawson (born May 9, 1979) is an ...",✔️ [1.000]
8,5ab853925542990e739ec8c8,"Tombstone stared an actor born May 17, 1955 known as who?",Bill Paxton,bridge,"{'title': ['Bill Paxton', 'Kinpei Azusa', 'Dennis Hopper filmograp...","{Bill Paxton, Tombstone (film)}",Tombstone Territory 是一部美国西部电视剧，由 Pat Conway 和 Richard Eastham 主演。该...,"['Michael Biehn | Michael Connell Biehn (born July 31, 1956) is an...",✔️ [0.750]
9,5a8532485542997b5ce3ffbd,What is the code name for the German offensive that started this S...,Operation Citadel,bridge,"{'title': ['52nd Infantry Division (German Empire)', 'Cornay', '10...","{102nd Infantry Division (Wehrmacht), Battle of Kursk}",德国在东前线的第二次世界大战行动的代号为Operation Mars或第二次Rzhev-Sychevka进攻行动，主要发生在1942...,['Battle of Kursk | The Battle of Kursk was a Second World War eng...,✔️ [0.750]


2024/12/26 14:58:53 INFO dspy.teleprompt.copro_optimizer: 本轮最佳候选得分: 80.0
2024/12/26 14:58:53 INFO dspy.teleprompt.copro_optimizer: 最佳候选指令: 生成一条引人入胜的推文，有效地回答问题，忠实于上下文，少于 280 个字符，并且没有主题标签。
2024/12/26 14:58:53 INFO dspy.teleprompt.copro_optimizer: 最佳候选前缀: Tweet In  Chinese:
2024/12/26 14:58:56 INFO dspy.teleprompt.copro_optimizer: ********* 开始第 3/6 轮指令优化 *********
2024/12/26 14:58:56 INFO dspy.teleprompt.copro_optimizer: [候选 1/1]
2024/12/26 14:58:56 INFO dspy.teleprompt.copro_optimizer:   指令: 生成一条引人入胜的推文，有效地回答问题，忠实于上下文，少于280个字符，并且没有主题标签，或与已存在 Instruction #3 相关，但强调了其他关键方面，如内容精度和词汇选择
2024/12/26 14:58:56 INFO dspy.teleprompt.copro_optimizer:   前缀: 
2024/12/26 14:58:56 INFO dspy.teleprompt.copro_optimizer: At Depth 3/6, Evaluating Prompt Candidate #1/1 for Predictor 1 of 1.


Average Metric: 18.75 / 25 (75.0%): 100%|██████████| 25/25 [00:30<00:00,  1.22s/it]

2024/12/26 14:59:26 INFO dspy.evaluate.evaluate: Average Metric: 18.75 / 25 (75.0%)





Unnamed: 0,id,question,answer,type,example_context,gold_titles,generated_tweet,pred_context,overall_metric
0,5ac0e2525542997d64295a79,At My Window was released by which American singer-songwriter?,John Townes Van Zandt,bridge,"{'title': ['Through the Window', 'Yes I Am (Melissa Etheridge albu...","{Townes Van Zandt, At My Window (album)}",美国歌手-songwriter Townes Van Zandt于 1987 年发行了专辑At My Window。,"[""At a Window in the Artist's Studio | At a Window in the Artist's...",✔️ [0.750]
1,5abfc1895542993fe9a41e38,which American actor was Candace Kita guest starred with,Bill Murray,bridge,"{'title': ['Charlie Babcock', 'Thom Bierdz', 'Scrubs (season 4)', ...","{Candace Kita, Bill Murray}",由于缺乏具体信息，难以确定Candace Kita guest starred 的是哪个美国演员。,['Candace Kita | Kita\'s first role was as a news anchor in the 19...,✔️ [0.500]
2,5a8f600c5542992414482a8f,"Which of these publications was most recently published, Who Put t...",Self,comparison,"{'title': ['He Put the Bomp! In the Bomp', 'Who Put the Bomp (in t...","{Self (magazine), Who Put the Bomp}",Who Put the Bomp 或者 Self，哪个最近出版呢？根据提供的信息，Self 是美国 Singer Bryson Ti...,['Guy Self | Guy Self is a fictional character from the BBC medica...,✔️ [1.000]
3,5ac562695542993e66e823c7,The Victorians - Their Story In Pictures is a documentary series w...,1950,bridge,"{'title': ['The Victorians', ""Michael Wood's Story of England"", 'S...","{The Victorians, Jeremy Paxman}",英国著名电视节目《维多利亚人：他们的故事在图片中》由杰米·帕克斯曼（Jeremy Paxman）执导，首播于2009年2月15日，探...,"['Masao | Masao (written: 正雄, 正夫, 正生, 正男, 正郎, 雅雄, 雅央, 雅夫, 雅勇, 雅男, ...",✔️ [0.750]
4,5ae4c6ba5542996836b02d23,"Which magazine has published articles by Scott Shaw, Tae Kwon Do T...",Tae Kwon Do Times,comparison,"{'title': ['Rick Timmons', 'Duk Sung Son', 'Nam Suk Lee', 'Southwe...","{Tae Kwon Do Times, Southwest Art}",Scott Shaw 的文章可能出现在Tae Kwon Do Times 或Southwest Art等杂志上。,['Tae Kwon Do Times | Tae Kwon Do Times is a magazine devoted to t...,✔️ [0.750]
5,5abac4ae55429939ce03dd54,In what year was the club founded that played Manchester City in t...,1874,bridge,"{'title': ['1985 FA Charity Shield', '1995 FA Charity Shield', '19...","{1972 FA Charity Shield, Aston Villa F.C.}",1972 年的 FA-charity shield 比赛是 Manchester City 与 Aston Villa 的比赛。,['1972 FA Charity Shield | The 1972 FA Charity Shield was conteste...,✔️ [0.750]
6,5a73061d55429901807daf66,"Which is taller, the Empire State Building or the Bank of America ...",The Empire State Building,comparison,"{'title': ['WOR TV Tower', 'Empire State Building', 'Jack Brod', '...","{Empire State Building, Bank of America Tower (Manhattan)}",新西兰 skyscraper 中，Empire State Building 高度为 1454 ft，而 Bank of Ameri...,['Empire State Building | The Empire State Building is a 102-story...,✔️ [0.750]
7,5a760bbc554299109176e631,Which American actress who made their film debut in the 1995 teen ...,Rosario Dawson,bridge,"{'title': ['Evan Rachel Wood', 'Voto Latino', 'Shannen Doherty fil...","{Rosario Dawson, Voto Latino}",美国演员罗莎里奥·道森（Rosario Dawson）于1995年在青春期剧片《Kids》中首次亮相，并成为Voto Latino的...,"['Rosario Dawson | Rosario Isabel Dawson (born May 9, 1979) is an ...",✔️ [1.000]
8,5ab853925542990e739ec8c8,"Tombstone stared an actor born May 17, 1955 known as who?",Bill Paxton,bridge,"{'title': ['Bill Paxton', 'Kinpei Azusa', 'Dennis Hopper filmograp...","{Bill Paxton, Tombstone (film)}",Tombstone 的主角并非 Tombstone Territory 的主角，而是 #QuintinSondergaard，出生于...,"['Michael Biehn | Michael Connell Biehn (born July 31, 1956) is an...",✔️ [0.500]
9,5a8532485542997b5ce3ffbd,What is the code name for the German offensive that started this S...,Operation Citadel,bridge,"{'title': ['52nd Infantry Division (German Empire)', 'Cornay', '10...","{102nd Infantry Division (Wehrmacht), Battle of Kursk}",#第二次世界大战 #东部战场 #OperationMars 苏联对德国的进攻， codename 为 Operation Mars ...,['Battle of Kursk | The Battle of Kursk was a Second World War eng...,✔️ [0.750]


2024/12/26 14:59:26 INFO dspy.teleprompt.copro_optimizer: 本轮最佳候选得分: 80.0
2024/12/26 14:59:26 INFO dspy.teleprompt.copro_optimizer: 最佳候选指令: 生成一条引人入胜的推文，有效地回答问题，忠实于上下文，少于 280 个字符，并且没有主题标签。
2024/12/26 14:59:26 INFO dspy.teleprompt.copro_optimizer: 最佳候选前缀: Tweet In  Chinese:
2024/12/26 14:59:27 INFO dspy.teleprompt.copro_optimizer: ********* 开始第 4/6 轮指令优化 *********
2024/12/26 14:59:27 INFO dspy.teleprompt.copro_optimizer: [候选 1/1]
2024/12/26 14:59:27 INFO dspy.teleprompt.copro_optimizer:   指令: <生成一条引人入胜的推文，准确有效地回答问题，并保留文本的逻辑连贯性、内容精度和语料选择性，无论其长度如何。
2024/12/26 14:59:27 INFO dspy.teleprompt.copro_optimizer:   前缀: __generate_crisp_twentyEighteen_with_contextual_precision_
2024/12/26 14:59:27 INFO dspy.teleprompt.copro_optimizer: At Depth 4/6, Evaluating Prompt Candidate #1/1 for Predictor 1 of 1.


Average Metric: 17.75 / 25 (71.0%): 100%|██████████| 25/25 [00:35<00:00,  1.43s/it]

2024/12/26 15:00:03 INFO dspy.evaluate.evaluate: Average Metric: 17.75 / 25 (71.0%)





Unnamed: 0,id,question,answer,type,example_context,gold_titles,generated_tweet,pred_context,overall_metric
0,5ac0e2525542997d64295a79,At My Window was released by which American singer-songwriter?,John Townes Van Zandt,bridge,"{'title': ['Through the Window', 'Yes I Am (Melissa Etheridge albu...","{Townes Van Zandt, At My Window (album)}",我的窗口（英语：At My Window）是美国歌手 Townes Van Zandt 的第一个录制 studio 专辑，于 198...,"[""At a Window in the Artist's Studio | At a Window in the Artist's...",✔️ [0.750]
1,5abfc1895542993fe9a41e38,which American actor was Candace Kita guest starred with,Bill Murray,bridge,"{'title': ['Charlie Babcock', 'Thom Bierdz', 'Scrubs (season 4)', ...","{Candace Kita, Bill Murray}",美国演员比尔·穆雷（Bill Murray）与 actress Candace Kita 合作过一部电影。,['Candace Kita | Kita\'s first role was as a news anchor in the 19...,✔️ [0.750]
2,5a8f600c5542992414482a8f,"Which of these publications was most recently published, Who Put t...",Self,comparison,"{'title': ['He Put the Bomp! In the Bomp', 'Who Put the Bomp (in t...","{Self (magazine), Who Put the Bomp}",Who Put the Bomp 或者 Self 哪个最近出版？根据提供的信息，Who Put the Bomp 是一首 1961 ...,['Guy Self | Guy Self is a fictional character from the BBC medica...,✔️ [0.750]
3,5ac562695542993e66e823c7,The Victorians - Their Story In Pictures is a documentary series w...,1950,bridge,"{'title': ['The Victorians', ""Michael Wood's Story of England"", 'S...","{The Victorians, Jeremy Paxman}",英國記者兼節目主持人 jeremy paxman 出生於1944年，為《維多利亞人 - their story in pictur...,"['Masao | Masao (written: 正雄, 正夫, 正生, 正男, 正郎, 雅雄, 雅央, 雅夫, 雅勇, 雅男, ...",✔️ [0.750]
4,5ae4c6ba5542996836b02d23,"Which magazine has published articles by Scott Shaw, Tae Kwon Do T...",Tae Kwon Do Times,comparison,"{'title': ['Rick Timmons', 'Duk Sung Son', 'Nam Suk Lee', 'Southwe...","{Tae Kwon Do Times, Southwest Art}",Scott Shaw 的文章可能出现在以下几家杂志中：太空道时报、西南艺术或跆拳道时报。,['Tae Kwon Do Times | Tae Kwon Do Times is a magazine devoted to t...,✔️ [0.750]
5,5abac4ae55429939ce03dd54,In what year was the club founded that played Manchester City in t...,1874,bridge,"{'title': ['1985 FA Charity Shield', '1995 FA Charity Shield', '19...","{1972 FA Charity Shield, Aston Villa F.C.}",1972 年的 FA.charity Shield 比赛是曼城与阿斯顿维拉之间的比赛。,['1972 FA Charity Shield | The 1972 FA Charity Shield was conteste...,✔️ [0.750]
6,5a73061d55429901807daf66,"Which is taller, the Empire State Building or the Bank of America ...",The Empire State Building,comparison,"{'title': ['WOR TV Tower', 'Empire State Building', 'Jack Brod', '...","{Empire State Building, Bank of America Tower (Manhattan)}",[1] «新大陆 skyscraper | 新大陆是位于纽约市中town的一座102层高楼大厦，位于第五大道与三十三街和三十四街之间...,['Empire State Building | The Empire State Building is a 102-story...,
7,5a760bbc554299109176e631,Which American actress who made their film debut in the 1995 teen ...,Rosario Dawson,bridge,"{'title': ['Evan Rachel Wood', 'Voto Latino', 'Shannen Doherty fil...","{Rosario Dawson, Voto Latino}",美国演员罗莎里奥·道森（Rosario Dawson）于1995年在青春期剧片《Kids》中首次亮相，并成为Voto Latino的...,"['Rosario Dawson | Rosario Isabel Dawson (born May 9, 1979) is an ...",✔️ [1.000]
8,5ab853925542990e739ec8c8,"Tombstone stared an actor born May 17, 1955 known as who?",Bill Paxton,bridge,"{'title': ['Bill Paxton', 'Kinpei Azusa', 'Dennis Hopper filmograp...","{Bill Paxton, Tombstone (film)}",Tombstone Territory 是一部美国西部电视剧，由Pat Conway 和 Richard Eastham 主演。该系...,"['Michael Biehn | Michael Connell Biehn (born July 31, 1956) is an...",✔️ [0.750]
9,5a8532485542997b5ce3ffbd,What is the code name for the German offensive that started this S...,Operation Citadel,bridge,"{'title': ['52nd Infantry Division (German Empire)', 'Cornay', '10...","{102nd Infantry Division (Wehrmacht), Battle of Kursk}",德国在第二次世界大战东部战区的进攻，包括102步兵师，实际上是由苏联对德国的“Operation Mars”（也称为第二次Rzhev...,['Battle of Kursk | The Battle of Kursk was a Second World War eng...,✔️ [0.750]


2024/12/26 15:00:03 INFO dspy.teleprompt.copro_optimizer: 本轮最佳候选得分: 80.0
2024/12/26 15:00:03 INFO dspy.teleprompt.copro_optimizer: 最佳候选指令: 生成一条引人入胜的推文，有效地回答问题，忠实于上下文，少于 280 个字符，并且没有主题标签。
2024/12/26 15:00:03 INFO dspy.teleprompt.copro_optimizer: 最佳候选前缀: Tweet In  Chinese:
2024/12/26 15:00:05 INFO dspy.teleprompt.copro_optimizer: ********* 开始第 5/6 轮指令优化 *********
2024/12/26 15:00:05 INFO dspy.teleprompt.copro_optimizer: [候选 1/1]
2024/12/26 15:00:05 INFO dspy.teleprompt.copro_optimizer:   指令: 创建一条引人入胜的推文，回答问题以清晰易懂，注意文本不超过280个字符，且未带主题标签，并遵循具体任务要求和背景信息
2024/12/26 15:00:05 INFO dspy.teleprompt.copro_optimizer:   前缀: __generate_twentyEighteen_with_task_context_
2024/12/26 15:00:05 INFO dspy.teleprompt.copro_optimizer: At Depth 5/6, Evaluating Prompt Candidate #1/1 for Predictor 1 of 1.


Average Metric: 19.00 / 25 (76.0%): 100%|██████████| 25/25 [00:47<00:00,  1.88s/it]

2024/12/26 15:00:52 INFO dspy.evaluate.evaluate: Average Metric: 19.0 / 25 (76.0%)





Unnamed: 0,id,question,answer,type,example_context,gold_titles,generated_tweet,pred_context,overall_metric
0,5ac0e2525542997d64295a79,At My Window was released by which American singer-songwriter?,John Townes Van Zandt,bridge,"{'title': ['Through the Window', 'Yes I Am (Melissa Etheridge albu...","{Townes Van Zandt, At My Window (album)}",我的窗口（英语：At My Window）是美国歌手 Townes Van Zandt 的第一个录制 studio 专辑，于 198...,"[""At a Window in the Artist's Studio | At a Window in the Artist's...",✔️ [0.750]
1,5abfc1895542993fe9a41e38,which American actor was Candace Kita guest starred with,Bill Murray,bridge,"{'title': ['Charlie Babcock', 'Thom Bierdz', 'Scrubs (season 4)', ...","{Candace Kita, Bill Murray}",“美国演员 Candace Kita 在剧集《头目》中 guest starred，与 Jilly Kitzinger 相同的是一位...,['Candace Kita | Kita\'s first role was as a news anchor in the 19...,✔️ [0.750]
2,5a8f600c5542992414482a8f,"Which of these publications was most recently published, Who Put t...",Self,comparison,"{'title': ['He Put the Bomp! In the Bomp', 'Who Put the Bomp (in t...","{Self (magazine), Who Put the Bomp}",Who Put the Bomp 或者 Self 哪个作品最 recently 出版？根据提供的信息，Who Put the Bom...,['Guy Self | Guy Self is a fictional character from the BBC medica...,✔️ [1.000]
3,5ac562695542993e66e823c7,The Victorians - Their Story In Pictures is a documentary series w...,1950,bridge,"{'title': ['The Victorians', ""Michael Wood's Story of England"", 'S...","{The Victorians, Jeremy Paxman}",英国著名电视节目《维多利亚人 - 他们的故事在图片中》由杰米·帕克斯曼和其他人共同创作，首播于2009年2月15日。,"['Masao | Masao (written: 正雄, 正夫, 正生, 正男, 正郎, 雅雄, 雅央, 雅夫, 雅勇, 雅男, ...",✔️ [0.750]
4,5ae4c6ba5542996836b02d23,"Which magazine has published articles by Scott Shaw, Tae Kwon Do T...",Tae Kwon Do Times,comparison,"{'title': ['Rick Timmons', 'Duk Sung Son', 'Nam Suk Lee', 'Southwe...","{Tae Kwon Do Times, Southwest Art}",Scott Shaw 的文章出现在《太空道》和《西南艺术》等杂志上。,['Tae Kwon Do Times | Tae Kwon Do Times is a magazine devoted to t...,✔️ [0.750]
5,5abac4ae55429939ce03dd54,In what year was the club founded that played Manchester City in t...,1874,bridge,"{'title': ['1985 FA Charity Shield', '1995 FA Charity Shield', '19...","{1972 FA Charity Shield, Aston Villa F.C.}",1972 年的 FA-charity shield 比赛中，曼城与阿斯顿维拉比赛。,['1972 FA Charity Shield | The 1972 FA Charity Shield was conteste...,✔️ [0.750]
6,5a73061d55429901807daf66,"Which is taller, the Empire State Building or the Bank of America ...",The Empire State Building,comparison,"{'title': ['WOR TV Tower', 'Empire State Building', 'Jack Brod', '...","{Empire State Building, Bank of America Tower (Manhattan)}",[1] «新大陆 skyscraper | 新大陆是位于纽约市五大道之间的33街和34街之间的一座102层高楼大厦，高度为1454 ...,['Empire State Building | The Empire State Building is a 102-story...,
7,5a760bbc554299109176e631,Which American actress who made their film debut in the 1995 teen ...,Rosario Dawson,bridge,"{'title': ['Evan Rachel Wood', 'Voto Latino', 'Shannen Doherty fil...","{Rosario Dawson, Voto Latino}",罗莎里奥·道森（Rosario Dawson）是美国女演员、制片人、歌手、漫画作家和政治活动家。她在1995年的青春剧“Kids”中...,"['Rosario Dawson | Rosario Isabel Dawson (born May 9, 1979) is an ...",✔️ [1.000]
8,5ab853925542990e739ec8c8,"Tombstone stared an actor born May 17, 1955 known as who?",Bill Paxton,bridge,"{'title': ['Bill Paxton', 'Kinpei Azusa', 'Dennis Hopper filmograp...","{Bill Paxton, Tombstone (film)}",Tombstone Territory 是一部美国西部电视剧，由 Pat Conway 和 Richard Eastham 主演。该...,"['Michael Biehn | Michael Connell Biehn (born July 31, 1956) is an...",✔️ [0.750]
9,5a8532485542997b5ce3ffbd,What is the code name for the German offensive that started this S...,Operation Citadel,bridge,"{'title': ['52nd Infantry Division (German Empire)', 'Cornay', '10...","{102nd Infantry Division (Wehrmacht), Battle of Kursk}",德国在东前线的第二次世界大战行动，包括102步兵师，开始于Operation Mars。,['Battle of Kursk | The Battle of Kursk was a Second World War eng...,✔️ [0.750]


2024/12/26 15:00:52 INFO dspy.teleprompt.copro_optimizer: 本轮最佳候选得分: 80.0
2024/12/26 15:00:52 INFO dspy.teleprompt.copro_optimizer: 最佳候选指令: 生成一条引人入胜的推文，有效地回答问题，忠实于上下文，少于 280 个字符，并且没有主题标签。
2024/12/26 15:00:52 INFO dspy.teleprompt.copro_optimizer: 最佳候选前缀: Tweet In  Chinese:
2024/12/26 15:00:55 INFO dspy.teleprompt.copro_optimizer: ********* 开始第 6/6 轮指令优化 *********
2024/12/26 15:00:55 INFO dspy.teleprompt.copro_optimizer: [候选 1/1]
2024/12/26 15:00:55 INFO dspy.teleprompt.copro_optimizer:   指令: 创建一条引人入胜的推文，回答问题以清晰易懂，注意文本不超过280个字符，并且深入理解题目的细节，并准确表达其含义
2024/12/26 15:00:55 INFO dspy.teleprompt.copro_optimizer:   前缀: __generate_twentyEighteen_with_in_depth_context_
2024/12/26 15:00:55 INFO dspy.teleprompt.copro_optimizer: At Depth 6/6, Evaluating Prompt Candidate #1/1 for Predictor 1 of 1.


Average Metric: 19.25 / 25 (77.0%): 100%|██████████| 25/25 [00:36<00:00,  1.48s/it]

2024/12/26 15:01:32 INFO dspy.evaluate.evaluate: Average Metric: 19.25 / 25 (77.0%)





Unnamed: 0,id,question,answer,type,example_context,gold_titles,generated_tweet,pred_context,overall_metric
0,5ac0e2525542997d64295a79,At My Window was released by which American singer-songwriter?,John Townes Van Zandt,bridge,"{'title': ['Through the Window', 'Yes I Am (Melissa Etheridge albu...","{Townes Van Zandt, At My Window (album)}",美国歌手-songwriter Townes Van Zandt 发布了专辑 At My Window。,"[""At a Window in the Artist's Studio | At a Window in the Artist's...",✔️ [0.750]
1,5abfc1895542993fe9a41e38,which American actor was Candace Kita guest starred with,Bill Murray,bridge,"{'title': ['Charlie Babcock', 'Thom Bierdz', 'Scrubs (season 4)', ...","{Candace Kita, Bill Murray}",“美国演员在剧集《头目》中与Jilly Kitzinger相同，然而没有明确指出是谁。”,['Candace Kita | Kita\'s first role was as a news anchor in the 19...,✔️ [0.500]
2,5a8f600c5542992414482a8f,"Which of these publications was most recently published, Who Put t...",Self,comparison,"{'title': ['He Put the Bomp! In the Bomp', 'Who Put the Bomp (in t...","{Self (magazine), Who Put the Bomp}",谁放了跳动？（Who Put the Bomp）是 1961 年的一首 doo-wop 风格歌曲，而 Self 则是美国 Singe...,['Guy Self | Guy Self is a fictional character from the BBC medica...,✔️ [0.750]
3,5ac562695542993e66e823c7,The Victorians - Their Story In Pictures is a documentary series w...,1950,bridge,"{'title': ['The Victorians', ""Michael Wood's Story of England"", 'S...","{The Victorians, Jeremy Paxman}",英国著名电视节目《维多利亚人：他们的故事在图像中的传说》由杰米·帕克斯曼（Jeremy Paxman）执笔，于2009年首播，探讨了...,"['Masao | Masao (written: 正雄, 正夫, 正生, 正男, 正郎, 雅雄, 雅央, 雅夫, 雅勇, 雅男, ...",✔️ [0.750]
4,5ae4c6ba5542996836b02d23,"Which magazine has published articles by Scott Shaw, Tae Kwon Do T...",Tae Kwon Do Times,comparison,"{'title': ['Rick Timmons', 'Duk Sung Son', 'Nam Suk Lee', 'Southwe...","{Tae Kwon Do Times, Southwest Art}",Scott Shaw 的文章可能出现在以下几家杂志中：太空道时报、西南艺术或跆拳道时报。,['Tae Kwon Do Times | Tae Kwon Do Times is a magazine devoted to t...,✔️ [0.750]
5,5abac4ae55429939ce03dd54,In what year was the club founded that played Manchester City in t...,1874,bridge,"{'title': ['1985 FA Charity Shield', '1995 FA Charity Shield', '19...","{1972 FA Charity Shield, Aston Villa F.C.}",1972 年的 FA Charity Shield 是 Manchester City 和 Aston Villa 之间的比赛。,['1972 FA Charity Shield | The 1972 FA Charity Shield was conteste...,✔️ [0.750]
6,5a73061d55429901807daf66,"Which is taller, the Empire State Building or the Bank of America ...",The Empire State Building,comparison,"{'title': ['WOR TV Tower', 'Empire State Building', 'Jack Brod', '...","{Empire State Building, Bank of America Tower (Manhattan)}",[1] «新大陆 skyscraper | 新大陆是位于纽约市五点道之间的33街和34街之间的一座102层高楼大厦。它有一个屋顶高度...,['Empire State Building | The Empire State Building is a 102-story...,
7,5a760bbc554299109176e631,Which American actress who made their film debut in the 1995 teen ...,Rosario Dawson,bridge,"{'title': ['Evan Rachel Wood', 'Voto Latino', 'Shannen Doherty fil...","{Rosario Dawson, Voto Latino}",美国演员罗莎里奥·道森（Rosario Dawson）于1995年在青春期剧片《Kids》中首次亮相，并成为Voto Latino的...,"['Rosario Dawson | Rosario Isabel Dawson (born May 9, 1979) is an ...",✔️ [1.000]
8,5ab853925542990e739ec8c8,"Tombstone stared an actor born May 17, 1955 known as who?",Bill Paxton,bridge,"{'title': ['Bill Paxton', 'Kinpei Azusa', 'Dennis Hopper filmograp...","{Bill Paxton, Tombstone (film)}",Tombstone Territory 是一部美国西部电视剧，由Pat Conway 和 Richard Eastham 主演。该系...,"['Michael Biehn | Michael Connell Biehn (born July 31, 1956) is an...",✔️ [0.750]
9,5a8532485542997b5ce3ffbd,What is the code name for the German offensive that started this S...,Operation Citadel,bridge,"{'title': ['52nd Infantry Division (German Empire)', 'Cornay', '10...","{102nd Infantry Division (Wehrmacht), Battle of Kursk}",德国在二战东部战区的第二次进攻，包括102步兵师， codenamed operation mars。,['Battle of Kursk | The Battle of Kursk was a Second World War eng...,✔️ [0.750]


2024/12/26 15:01:32 INFO dspy.teleprompt.copro_optimizer: 本轮最佳候选得分: 80.0
2024/12/26 15:01:32 INFO dspy.teleprompt.copro_optimizer: 最佳候选指令: 生成一条引人入胜的推文，有效地回答问题，忠实于上下文，少于 280 个字符，并且没有主题标签。
2024/12/26 15:01:32 INFO dspy.teleprompt.copro_optimizer: 最佳候选前缀: Tweet In  Chinese:


最佳结果统计信息： {2729733986240: {'depth': [0, 1, 2, 3, 4, 5], 'max': [80.0, 80.0, 80.0, 80.0, 80.0, 80.0], 'average': [75.0, 73.66666666666667, 74.0, 73.4, 73.83333333333333, 74.28571428571429], 'min': [70.0, 70.0, 70.0, 70.0, 70.0, 70.0], 'std': [5.0, 4.496912521077347, 3.9370039370059056, 3.7202150475476548, 3.531603350069515, 3.4522988495984492]}}
最新结果统计信息： {2729733986240: {'depth': [0, 1, 2, 3, 4, 5], 'max': [80.0, 71.0, 75.0, 71.0, 76.0, 77.0], 'average': [75.0, 71.0, 75.0, 71.0, 76.0, 77.0], 'min': [70.0, 71.0, 75.0, 71.0, 76.0, 77.0], 'std': [5.0, 0.0, 0.0, 0.0, 0.0, 0.0]}}
指标调用总次数： 7
Average Metric: 11.75 / 15 (78.3%): 100%|██████████| 15/15 [00:04<00:00,  3.65it/s]  

2024/12/26 15:01:36 INFO dspy.evaluate.evaluate: Average Metric: 11.75 / 15 (78.3%)





Unnamed: 0,id,question,answer,type,example_context,gold_titles,generated_tweet,pred_context,overall_metric
0,5a824c155542990a1d231f92,The original work by Anton Chekhov involving a disillusioned schoo...,Maria Yermolova,bridge,"{'title': ['On the Harmful Effects of Tobacco', 'Melikhovo', 'Plat...","{Wild Honey (play), Platonov (play)}",Anton Chekhov 的原作“Platonov”是为 Maly Theatre 的rising star Maria Yerm...,['Wild Honey (play) | Wild Honey is a 1984 adaptation by British p...,✔️ [0.750]
1,5adc1218554299438c868d32,Are Roswell International Air Center and Pago Pago International A...,no,comparison,"{'title': ['American Samoa Community College', 'Tutuila', 'Inter I...","{Roswell International Air Center, Pago Pago International Airport}",两座机场都不是位于美国的主陆。罗swell国际航空中心和韦恩戈地区机场都在美国的州内，但帕戈帕戈国际机场则位于美 Samoa的一个岛屿上。,['Roswell International Air Center | Roswell International Air Cen...,✔️ [1.000]
2,5ab65f2455429954757d3265,Untold: The Greatest Sports Stories Never Told was hosted by a spo...,the voice of basketball,bridge,"{'title': ['Gregg Murphy', 'Marv Albert', 'The Greatest Story Neve...","{Untold: The Greatest Sports Stories Never Told, Marv Albert}",《未tell：最伟大的体育故事 never tell》是一档由马夫阿尔伯特（Marv Albert）主持的纪录片系列，讲述了体育界最...,"[""Untold: The Greatest Sports Stories Never Told | Untold: The Gre...",✔️ [1.000]
3,5a8803da55429938390d3eca,Are Walt Disney and Sacro GRA both documentry films?,yes,comparison,"{'title': ['List of Disney animated shorts', 'Walt Disney Pictures...","{Sacro GRA, Walt Disney (film)}",#SacroGRA #WaltDisney 不是纪录片电影，Sacro GRA 主要关注意大利社会和环境问题，而 Walt Disn...,"['Sacro GRA | Sacro GRA (] , Italian for ""Holy GRA"") is a 2013 Ita...",✔️ [0.500]
4,5a8f71c75542992414482acf,What is the Palestinian Islamic organization that governs th small...,Hamas,bridge,"{'title': ['Bar Lev Line', 'Zionist Freedom Alliance', 'Status of ...","{Gaza Strip, Status of territories occupied by Israel in 1967}",加沙地带由巴勒斯坦伊斯兰组织哈马斯管理，位于东部海岸的小领土。,"['Gaza Strip | The Gaza Strip ( ; Arabic: قطاع غزة\u200e \u200e ""Q...",✔️ [0.500]
5,5a81b2f655429926c1cdad62,What album did the song of which Taylor Swift premiered the music ...,1989,bridge,"{'title': ['Taylor Swift videography', '2017 MTV Video Music Award...","{2015 MTV Video Music Awards, Wildest Dreams (Taylor Swift song)}",2015年MTVVideoMusic奖show中，泰勒斯威ift在预SHOW中首播了歌曲「bad blood」，该歌曲来自于英国乐队...,"['Bad Blood (Bastille album) | Bad Blood (stylised as ""BΔD BLOOD"")...",✔️ [1.000]
6,5ac3bd345542993915413929,"Which is considered a genus level classification, Apera or Gunnera...",Apera,comparison,"{'title': ['Gunnera manicata', 'Northwest Oregon Conference', 'Hyg...","{Apera, Gunnera manicata}",Gunnera 属于属级分类，而 Apera 则是其下属。所以，Apera 不是 genus 级别的分类。,"['Gunnera manicata | Gunnera manicata, known as Brazilian giant-rh...",✔️ [0.750]
7,5ab7498f554299110f219af8,Do The Drums and Pussy Galore play music of similar genres?,no,comparison,"{'title': ['The Honeymoon Killers (American band)', 'Space Cowboy ...","{The Drums, Pussy Galore (band)}",The Drums 和 Pussy Galore 都属于独立摇滚的范畴，但他们的音乐风格有所不同。Pussy Galore 的音乐更...,['List of post-punk revival bands | Post-punk revival is a type of...,✔️ [1.000]
8,5ae6e5875542992ae0d163a0,What is the post-nominal abbreviation for the university where the...,Exon,bridge,"{'title': ['Zeresenay Alemseged', 'Research', 'University of Exete...","{University of Exeter, Banded Brothers}",英国埃克塞เตอร大学的研究团队监测了非洲丛林中的带纹啮齿类，展现出其生活方式和社会结构。,"['Banded Brothers | Banded Brothers (also known as ""Banded Brother...",✔️ [0.750]
9,5ac55af35542993e66e82333,Are both Benjamin Christensen and Len Wiseman directors?,yes,comparison,"{'title': ['Benjamin Christensen', 'Len Wiseman', 'Lady with the L...","{Len Wiseman, Benjamin Christensen}",#电影导演 #电影史\nBenjamin Christensen 和 Len Wiseman 都是电影导演，但他们的工作背景和风格不...,['Benjamin Christensen | Benjamin Christensen (28 September 1879 –...,✔️ [0.750]


78.33

 {2729733986240: {'depth': [0, 1, 2, 3, 4, 5], 'max': [80.0, 71.0, 75.0, 71.0, 76.0, 77.0], 'average': [75.0, 71.0, 75.0, 71.0, 76.0, 77.0], 'min': [70.0, 71.0, 75.0, 71.0, 76.0, 77.0], 'std': [5.0, 0.0, 0.0, 0.0, 0.0, 0.0]}}
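The final "best" candidate (`best_candidate`) is chosen by the highest score, i.e. `max`.
The highest score is the very first one, 80, which belongs to our original instruction, so the prompt and prefix in the program were left unchanged by the optimization.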
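To make the two statistics dictionaries easier to read, here is a minimal sketch that rebuilds the cumulative numbers from the per-round scores. It assumes round 0 evaluated exactly two candidates scoring 70 and 80 (which is what `min`, `max` and `std` at depth 0 suggest) and that each later round evaluated one. The names `scores_per_round`, `seen`, `running` and `best_round` are illustrative and are not part of DSPy.

```python
from statistics import mean, pstdev

# Per-round candidate scores read off the "latest" statistics above.
# Assumption: round 0 had two candidates (70 and 80), later rounds one each.
scores_per_round = [[70.0, 80.0], [71.0], [75.0], [71.0], [76.0], [77.0]]

seen = []      # every candidate score evaluated so far
running = []   # cumulative (max, average, population std) after each round
for round_scores in scores_per_round:
    seen.extend(round_scores)
    running.append((max(seen), mean(seen), pstdev(seen)))

for depth, (best, avg, std) in enumerate(running):
    print(f"depth={depth} max={best:.1f} average={avg:.2f} std={std:.2f}")

# The winning round is the one that contains the single highest score.
best_round = max(range(len(scores_per_round)),
                 key=lambda d: max(scores_per_round[d]))
print("best round:", best_round)
```

Under these assumptions the cumulative averages and standard deviations agree with the "best" statistics printed earlier, which suggests that dictionary aggregates over all candidates seen so far, while the "latest" dictionary describes each round on its own. The 80 from round 0 is never beaten, so the original instruction is kept.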

In [43]:
def print_instructions_and_prefix(program):
    for idx, predictor in enumerate(program.predictors()):
        # Get the signature (prefer extended_signature when it exists)
        signature = getattr(predictor, "extended_signature", None) or getattr(predictor, "signature", None)
        # Key of the last field, i.e. the output field that carries the prefix
        *_, last_key = signature.fields.keys()
        prefix = signature.fields[last_key].json_schema_extra.get("prefix", "")

        print(f"=== Predictor #{idx} ===")
        print(f"Instructions: {signature.instructions}")
        print(f"Prefix: {prefix}\n")

# Call it on the unoptimized tweeter program
print_instructions_and_prefix(tweeter)


=== Predictor #0 ===
Instructions: 生成一条引人入胜的推文，有效地回答问题，忠实于上下文，少于 280 个字符，并且没有主题标签。
Prefix: Tweet In  Chinese:



In [44]:
def print_instructions_and_prefix(program):
    for idx, predictor in enumerate(program.predictors()):
        # Get the signature (prefer extended_signature when it exists)
        signature = getattr(predictor, "extended_signature", None) or getattr(predictor, "signature", None)
        # Key of the last field, i.e. the output field
        *_, last_key = signature.fields.keys()
        prefix = signature.fields[last_key].json_schema_extra.get("prefix", "")

        print(f"=== Predictor #{idx} ===")
        print(f"Instructions: {signature.instructions}")
        print(f"Prefix: {prefix}\n")

# Call it on the optimized compiled_COPRO_tweet program
print_instructions_and_prefix(compiled_COPRO_tweet)


=== Predictor #0 ===
Instructions: 生成一条引人入胜的推文，有效地回答问题，忠实于上下文，少于 280 个字符，并且没有主题标签。
Prefix: Tweet In  Chinese:



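Nothing changed. So we switch to a different slice of the dataset, raise the temperature, and patch the optimizer source to print logs, which lets us follow the intermediate steps and see the optimized prompts as they are produced.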
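
The patch to the optimizer source is not shown in this notebook. As a complementary sketch, the snippet below only makes sure that INFO-level records from the optimizer's logger are displayed, using the standard `logging` module. The logger name is taken from the log lines in the output further down; `enable_info_logs` is a hypothetical helper, and whether a given DSPy version already configures this logger is an assumption to check locally.

```python
import logging

# Logger name as it appears in the log lines of this notebook's output.
COPRO_LOGGER_NAME = "dspy.teleprompt.copro_optimizer"


def enable_info_logs(logger_name):
    """Make INFO-level records from `logger_name` visible on stderr."""
    logger = logging.getLogger(logger_name)
    logger.setLevel(logging.INFO)
    # Attach a handler only if none exists, to avoid duplicate lines on re-run.
    if not logger.handlers:
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
        )
        logger.addHandler(handler)
    return logger


copro_logger = enable_info_logs(COPRO_LOGGER_NAME)
```

If each message then shows up twice, the library has most likely installed its own handler on a parent logger; in that case dropping the handler added here should be enough.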

In [46]:
from dspy.teleprompt import COPRO
import litellm

# Let litellm drop request parameters that the backend does not support
litellm.drop_params = True

# Create the COPRO optimizer instance
COPRO_teleprompter = COPRO(
    metric=overall_metric,
    breadth=15,  # breadth: candidate prompts requested per round
    depth=6,     # depth: number of optimization rounds
    track_stats=True,
    max_errors=10,
    init_temperature=1.8,
    verbose=True,  # whether to print detailed logs
)


NUM_THREADS = 4
compiled_COPRO_tweet = COPRO_teleprompter.compile(
    student=tweeter,
    trainset=trainset[:15],  # use a small slice of the training data for the demo
    eval_kwargs=dict(num_threads=NUM_THREADS, display_progress=True, display_table=20),
)


# Print the statistics collected during optimization
print("最佳结果统计信息：", compiled_COPRO_tweet.results_best)  # statistics for the best candidates
print("最新结果统计信息：", compiled_COPRO_tweet.results_latest)  # statistics for the latest candidates
print("指标调用总次数：", compiled_COPRO_tweet.total_calls)  # total number of metric calls

# Evaluate the optimized program on a held-out slice of the dev set
evaluate = Evaluate(
    devset=devset[30:45],
    metric=overall_metric,
    num_threads=NUM_THREADS,
    display_progress=True,
    display_table=20,
    provide_traceback=True,
)

evaluate(compiled_COPRO_tweet)

2024/12/26 15:08:05 INFO dspy.teleprompt.copro_optimizer: ********* 开始第 1/6 轮指令优化 *********
2024/12/26 15:08:05 INFO dspy.teleprompt.copro_optimizer: [候选 1/2]
2024/12/26 15:08:05 INFO dspy.teleprompt.copro_optimizer:   指令: 创建一条引人入胜的推文，回答问题以清晰易懂，注意文本不超过280个字符，且未带主题标签
2024/12/26 15:08:05 INFO dspy.teleprompt.copro_optimizer:   前缀: __generate_compelling_twentyEighteen
2024/12/26 15:08:05 INFO dspy.teleprompt.copro_optimizer: At Depth 1/6, Evaluating Prompt Candidate #1/2 for Predictor 1 of 1.


Average Metric: 10.50 / 15 (70.0%): 100%|██████████| 15/15 [00:00<00:00, 1152.47it/s]

2024/12/26 15:08:05 INFO dspy.evaluate.evaluate: Average Metric: 10.5 / 15 (70.0%)





Unnamed: 0,id,question,answer,type,example_context,gold_titles,generated_tweet,pred_context,overall_metric
0,5ac0e2525542997d64295a79,At My Window was released by which American singer-songwriter?,John Townes Van Zandt,bridge,"{'title': ['Through the Window', 'Yes I Am (Melissa Etheridge albu...","{Townes Van Zandt, At My Window (album)}",美国歌手 Townes Van Zandt 在 1987 年发行了专辑“At My Window”。,"[""At a Window in the Artist's Studio | At a Window in the Artist's...",✔️ [0.750]
1,5abfc1895542993fe9a41e38,which American actor was Candace Kita guest starred with,Bill Murray,bridge,"{'title': ['Charlie Babcock', 'Thom Bierdz', 'Scrubs (season 4)', ...","{Candace Kita, Bill Murray}",美国演员 Candace Kita 在多部电视剧和电影中 guest starred，但并未提及她与哪个美国演员一起合作。,['Candace Kita | Kita\'s first role was as a news anchor in the 19...,✔️ [0.500]
2,5a8f600c5542992414482a8f,"Which of these publications was most recently published, Who Put t...",Self,comparison,"{'title': ['He Put the Bomp! In the Bomp', 'Who Put the Bomp (in t...","{Self (magazine), Who Put the Bomp}",Who Put the Bomp 或者 Self 哪个出版物最_recently 出版？根据提供的信息，Self 是 2017 年发...,['Guy Self | Guy Self is a fictional character from the BBC medica...,✔️ [1.000]
3,5ac562695542993e66e823c7,The Victorians - Their Story In Pictures is a documentary series w...,1950,bridge,"{'title': ['The Victorians', ""Michael Wood's Story of England"", 'S...","{The Victorians, Jeremy Paxman}",英國記者兼節目主持人 jeremy paxman 出生於1944年，為《維多利亞人 - 他們的故事在圖片中的系列》作家。,"['Masao | Masao (written: 正雄, 正夫, 正生, 正男, 正郎, 雅雄, 雅央, 雅夫, 雅勇, 雅男, ...",✔️ [0.750]
4,5ae4c6ba5542996836b02d23,"Which magazine has published articles by Scott Shaw, Tae Kwon Do T...",Tae Kwon Do Times,comparison,"{'title': ['Rick Timmons', 'Duk Sung Son', 'Nam Suk Lee', 'Southwe...","{Tae Kwon Do Times, Southwest Art}",Scott Shaw 的文章出现在《太空道》和《西南艺术》等杂志上。,['Tae Kwon Do Times | Tae Kwon Do Times is a magazine devoted to t...,✔️ [0.750]
5,5abac4ae55429939ce03dd54,In what year was the club founded that played Manchester City in t...,1874,bridge,"{'title': ['1985 FA Charity Shield', '1995 FA Charity Shield', '19...","{1972 FA Charity Shield, Aston Villa F.C.}",1972 年的 FA-charity shield 比赛是曼城与阿斯顿维拉之间的比赛。,['1972 FA Charity Shield | The 1972 FA Charity Shield was conteste...,✔️ [0.750]
6,5a73061d55429901807daf66,"Which is taller, the Empire State Building or the Bank of America ...",The Empire State Building,comparison,"{'title': ['WOR TV Tower', 'Empire State Building', 'Jack Brod', '...","{Empire State Building, Bank of America Tower (Manhattan)}",[1] «新大陆 skyscraper | 新大陆是位于纽约市五大道之间的102层高楼大厦，高达1454 ft（含天线），以帝国州为...,['Empire State Building | The Empire State Building is a 102-story...,
7,5a760bbc554299109176e631,Which American actress who made their film debut in the 1995 teen ...,Rosario Dawson,bridge,"{'title': ['Evan Rachel Wood', 'Voto Latino', 'Shannen Doherty fil...","{Rosario Dawson, Voto Latino}",美国演员罗莎里奥·道森（Rosario Dawson）于1995年在青春期剧片《Kids》中首次亮相，并成为Voto Latino的...,"['Rosario Dawson | Rosario Isabel Dawson (born May 9, 1979) is an ...",✔️ [1.000]
8,5ab853925542990e739ec8c8,"Tombstone stared an actor born May 17, 1955 known as who?",Bill Paxton,bridge,"{'title': ['Bill Paxton', 'Kinpei Azusa', 'Dennis Hopper filmograp...","{Bill Paxton, Tombstone (film)}",Tombstone Territory 是一部美国西部电视剧，由 Pat Conway 和 Richard Eastham 主演。该...,"['Michael Biehn | Michael Connell Biehn (born July 31, 1956) is an...",✔️ [0.750]
9,5a8532485542997b5ce3ffbd,What is the code name for the German offensive that started this S...,Operation Citadel,bridge,"{'title': ['52nd Infantry Division (German Empire)', 'Cornay', '10...","{102nd Infantry Division (Wehrmacht), Battle of Kursk}",德国在东前线的第二次世界大战行动，包括102步兵师，开始于Operation Mars。,['Battle of Kursk | The Battle of Kursk was a Second World War eng...,✔️ [0.750]


2024/12/26 15:08:06 INFO dspy.teleprompt.copro_optimizer: [候选 2/2]
2024/12/26 15:08:06 INFO dspy.teleprompt.copro_optimizer:   指令: 生成一条引人入胜的推文，有效地回答问题，忠实于上下文，少于 280 个字符，并且没有主题标签。
2024/12/26 15:08:06 INFO dspy.teleprompt.copro_optimizer:   前缀: Tweet In  Chinese:
2024/12/26 15:08:06 INFO dspy.teleprompt.copro_optimizer: At Depth 1/6, Evaluating Prompt Candidate #2/2 for Predictor 1 of 1.


Average Metric: 12.25 / 15 (81.7%): 100%|██████████| 15/15 [00:00<00:00, 2143.23it/s]

2024/12/26 15:08:06 INFO dspy.evaluate.evaluate: Average Metric: 12.25 / 15 (81.7%)





Unnamed: 0,id,question,answer,type,example_context,gold_titles,generated_tweet,pred_context,overall_metric
0,5ac0e2525542997d64295a79,At My Window was released by which American singer-songwriter?,John Townes Van Zandt,bridge,"{'title': ['Through the Window', 'Yes I Am (Melissa Etheridge albu...","{Townes Van Zandt, At My Window (album)}",美国歌手-songwriter 发布了“At My Window”。,"[""At a Window in the Artist's Studio | At a Window in the Artist's...",✔️ [0.750]
1,5abfc1895542993fe9a41e38,which American actor was Candace Kita guest starred with,Bill Murray,bridge,"{'title': ['Charlie Babcock', 'Thom Bierdz', 'Scrubs (season 4)', ...","{Candace Kita, Bill Murray}",据了解，Candace Kita 在剧集《头目》中 guest starred，与 Jilly Kitzinger 相同的是一位美国演员。,['Candace Kita | Kita\'s first role was as a news anchor in the 19...,✔️ [0.500]
2,5a8f600c5542992414482a8f,"Which of these publications was most recently published, Who Put t...",Self,comparison,"{'title': ['He Put the Bomp! In the Bomp', 'Who Put the Bomp (in t...","{Self (magazine), Who Put the Bomp}",Who Put the Bomp 或者 Self 哪个最近出版呢？根据提供的信息，Who Put the Bomp 是一首 1961...,['Guy Self | Guy Self is a fictional character from the BBC medica...,✔️ [1.000]
3,5ac562695542993e66e823c7,The Victorians - Their Story In Pictures is a documentary series w...,1950,bridge,"{'title': ['The Victorians', ""Michael Wood's Story of England"", 'S...","{The Victorians, Jeremy Paxman}",「维多利亚人 - 他们的故事在图片中的故事」是一部2009年的英国纪录片系列，关注维多利亚艺术和文化。四集系列由Jeremy Pax...,"['Masao | Masao (written: 正雄, 正夫, 正生, 正男, 正郎, 雅雄, 雅央, 雅夫, 雅勇, 雅男, ...",✔️ [0.750]
4,5ae4c6ba5542996836b02d23,"Which magazine has published articles by Scott Shaw, Tae Kwon Do T...",Tae Kwon Do Times,comparison,"{'title': ['Rick Timmons', 'Duk Sung Son', 'Nam Suk Lee', 'Southwe...","{Tae Kwon Do Times, Southwest Art}",Tae Kwon Do Times 或 Southwest Art 等杂志都可能发表 Scott Shaw 的文章。,['Tae Kwon Do Times | Tae Kwon Do Times is a magazine devoted to t...,✔️ [0.750]
5,5abac4ae55429939ce03dd54,In what year was the club founded that played Manchester City in t...,1874,bridge,"{'title': ['1985 FA Charity Shield', '1995 FA Charity Shield', '19...","{1972 FA Charity Shield, Aston Villa F.C.}",1972 年的 FA-charity shield 比赛是曼城与阿斯顿维拉之间的比赛。,['1972 FA Charity Shield | The 1972 FA Charity Shield was conteste...,✔️ [0.750]
6,5a73061d55429901807daf66,"Which is taller, the Empire State Building or the Bank of America ...",The Empire State Building,comparison,"{'title': ['WOR TV Tower', 'Empire State Building', 'Jack Brod', '...","{Empire State Building, Bank of America Tower (Manhattan)}",新 York市的 Empire State Building 高度为 1454 ft，而 Bank of America Tower...,['Empire State Building | The Empire State Building is a 102-story...,✔️ [0.750]
7,5a760bbc554299109176e631,Which American actress who made their film debut in the 1995 teen ...,Rosario Dawson,bridge,"{'title': ['Evan Rachel Wood', 'Voto Latino', 'Shannen Doherty fil...","{Rosario Dawson, Voto Latino}",美国演员罗莎里奥·道森（Rosario Dawson）于1995年在电影《Kids》中首次亮相，是Voto Latino的联合创始人。,"['Rosario Dawson | Rosario Isabel Dawson (born May 9, 1979) is an ...",✔️ [1.000]
8,5ab853925542990e739ec8c8,"Tombstone stared an actor born May 17, 1955 known as who?",Bill Paxton,bridge,"{'title': ['Bill Paxton', 'Kinpei Azusa', 'Dennis Hopper filmograp...","{Bill Paxton, Tombstone (film)}",2019年5月17日出生的美国演员Michael Biehn，知名于其在《终结者》系列中的角色。,"['Michael Biehn | Michael Connell Biehn (born July 31, 1956) is an...",✔️ [1.000]
9,5a8532485542997b5ce3ffbd,What is the code name for the German offensive that started this S...,Operation Citadel,bridge,"{'title': ['52nd Infantry Division (German Empire)', 'Cornay', '10...","{102nd Infantry Division (Wehrmacht), Battle of Kursk}",德国在东前线的第二次进攻， codenamed Operation Mars，是苏联forces对德国forces的进攻，发生在19...,['Battle of Kursk | The Battle of Kursk was a Second World War eng...,✔️ [0.750]


2024/12/26 15:08:07 INFO dspy.teleprompt.copro_optimizer: 本轮最佳候选得分: 81.67
2024/12/26 15:08:07 INFO dspy.teleprompt.copro_optimizer: 最佳候选指令: 生成一条引人入胜的推文，有效地回答问题，忠实于上下文，少于 280 个字符，并且没有主题标签。
2024/12/26 15:08:07 INFO dspy.teleprompt.copro_optimizer: 最佳候选前缀: Tweet In  Chinese:
2024/12/26 15:08:14 INFO dspy.teleprompt.copro_optimizer: ********* 开始第 2/6 轮指令优化 *********
2024/12/26 15:08:14 INFO dspy.teleprompt.copro_optimizer: [候选 1/1]
2024/12/26 15:08:14 INFO dspy.teleprompt.copro_optimizer:   指令: Create an engaging tweet that accurately answers the question while staying within the 280 character limit and not having any topical tags.
2024/12/26 15:08:14 INFO dspy.teleprompt.copro_optimizer:   前缀: Tweet In Chinese:
2024/12/26 15:08:14 INFO dspy.teleprompt.copro_optimizer: At Depth 2/6, Evaluating Prompt Candidate #1/1 for Predictor 1 of 1.


Average Metric: 11.00 / 15 (73.3%): 100%|██████████| 15/15 [00:23<00:00,  1.54s/it]

2024/12/26 15:08:37 INFO dspy.evaluate.evaluate: Average Metric: 11.0 / 15 (73.3%)





Unnamed: 0,id,question,answer,type,example_context,gold_titles,generated_tweet,pred_context,overall_metric
0,5ac0e2525542997d64295a79,At My Window was released by which American singer-songwriter?,John Townes Van Zandt,bridge,"{'title': ['Through the Window', 'Yes I Am (Melissa Etheridge albu...","{Townes Van Zandt, At My Window (album)}",美国歌手-songwriter Townes Van Zandt 发布了“At My Window”。,"[""At a Window in the Artist's Studio | At a Window in the Artist's...",✔️ [0.750]
1,5abfc1895542993fe9a41e38,which American actor was Candace Kita guest starred with,Bill Murray,bridge,"{'title': ['Charlie Babcock', 'Thom Bierdz', 'Scrubs (season 4)', ...","{Candace Kita, Bill Murray}",“《The Wayans Bros.》”中，卡丹斯·基塔与比尔·穆雷一起客串。,['Candace Kita | Kita\'s first role was as a news anchor in the 19...,✔️ [1.000]
2,5a8f600c5542992414482a8f,"Which of these publications was most recently published, Who Put t...",Self,comparison,"{'title': ['He Put the Bomp! In the Bomp', 'Who Put the Bomp (in t...","{Self (magazine), Who Put the Bomp}","""Who Put the Bomp 和 Self 是两个不同的出版物！Who Put the Bomp 是一本rock music ...",['Guy Self | Guy Self is a fictional character from the BBC medica...,✔️ [0.750]
3,5ac562695542993e66e823c7,The Victorians - Their Story In Pictures is a documentary series w...,1950,bridge,"{'title': ['The Victorians', ""Michael Wood's Story of England"", 'S...","{The Victorians, Jeremy Paxman}",「维多利亚人 - 他们的故事在图像中的故事」是一部由英国记者和广播人杰西米·帕克斯曼（Jeremy Paxman）撰写的纪录片系列。,"['Masao | Masao (written: 正雄, 正夫, 正生, 正男, 正郎, 雅雄, 雅央, 雅夫, 雅勇, 雅男, ...",✔️ [0.750]
4,5ae4c6ba5542996836b02d23,"Which magazine has published articles by Scott Shaw, Tae Kwon Do T...",Tae Kwon Do Times,comparison,"{'title': ['Rick Timmons', 'Duk Sung Son', 'Nam Suk Lee', 'Southwe...","{Tae Kwon Do Times, Southwest Art}",Scott Shaw 的文章出现在Tae Kwon Do Times 和 Southwest Art 这两家杂志上。,['Tae Kwon Do Times | Tae Kwon Do Times is a magazine devoted to t...,✔️ [0.750]
5,5abac4ae55429939ce03dd54,In what year was the club founded that played Manchester City in t...,1874,bridge,"{'title': ['1985 FA Charity Shield', '1995 FA Charity Shield', '19...","{1972 FA Charity Shield, Aston Villa F.C.}",1972 年的 FA 公益盃比赛中，曼城与阿斯顿维拉之间进行了比赛。,['1972 FA Charity Shield | The 1972 FA Charity Shield was conteste...,✔️ [0.750]
6,5a73061d55429901807daf66,"Which is taller, the Empire State Building or the Bank of America ...",The Empire State Building,comparison,"{'title': ['WOR TV Tower', 'Empire State Building', 'Jack Brod', '...","{Empire State Building, Bank of America Tower (Manhattan)}",美国银行大楼远高于帝国州立建筑。美国银行大楼在亚特兰大和圣路易斯都有高达312米和384米的高度，远高于帝国州立建筑的1454英尺。,['Empire State Building | The Empire State Building is a 102-story...,✔️ [1.000]
7,5a760bbc554299109176e631,Which American actress who made their film debut in the 1995 teen ...,Rosario Dawson,bridge,"{'title': ['Evan Rachel Wood', 'Voto Latino', 'Shannen Doherty fil...","{Rosario Dawson, Voto Latino}","罗莎里奥·道森（Rosario Dawson）是一位美国演员和政治活动家，她在 1995 年的青春剧 ""Kids"" 中首次亮相，并于...","['Rosario Dawson | Rosario Isabel Dawson (born May 9, 1979) is an ...",✔️ [1.000]
8,5ab853925542990e739ec8c8,"Tombstone stared an actor born May 17, 1955 known as who?",Bill Paxton,bridge,"{'title': ['Bill Paxton', 'Kinpei Azusa', 'Dennis Hopper filmograp...","{Bill Paxton, Tombstone (film)}",[1] «迈克尔·比恩（Michael Biehn）出生于1955年5月17日，美国知名军事科幻片演员。他在詹姆斯·卡梅伦执导的电影...,"['Michael Biehn | Michael Connell Biehn (born July 31, 1956) is an...",
9,5a8532485542997b5ce3ffbd,What is the code name for the German offensive that started this S...,Operation Citadel,bridge,"{'title': ['52nd Infantry Division (German Empire)', 'Cornay', '10...","{102nd Infantry Division (Wehrmacht), Battle of Kursk}",德国在莫斯科附近东部战线的第二次世界大战行动中使用的代码名称是“Operation Mars”。,['Battle of Kursk | The Battle of Kursk was a Second World War eng...,✔️ [0.750]


2024/12/26 15:08:37 INFO dspy.teleprompt.copro_optimizer: 本轮最佳候选得分: 81.67
2024/12/26 15:08:37 INFO dspy.teleprompt.copro_optimizer: 最佳候选指令: 生成一条引人入胜的推文，有效地回答问题，忠实于上下文，少于 280 个字符，并且没有主题标签。
2024/12/26 15:08:37 INFO dspy.teleprompt.copro_optimizer: 最佳候选前缀: Tweet In  Chinese:
2024/12/26 15:08:38 INFO dspy.teleprompt.copro_optimizer: ********* 开始第 3/6 轮指令优化 *********
2024/12/26 15:08:38 INFO dspy.teleprompt.copro_optimizer: [候选 1/1]
2024/12/26 15:08:38 INFO dspy.teleprompt.copro_optimizer:   指令: Create a captivating Twitter-style summary that provides accurate answers to the question while adhering to the 280 character limit, and omits any topical labels, ensuring contextual relevance and conciseness.
2024/12/26 15:08:38 INFO dspy.teleprompt.copro_optimizer:   前缀: Tweet In Chinese
2024/12/26 15:08:38 INFO dspy.teleprompt.copro_optimizer: At Depth 3/6, Evaluating Prompt Candidate #1/1 for Predictor 1 of 1.


Average Metric: 12.50 / 15 (83.3%): 100%|██████████| 15/15 [00:15<00:00,  1.03s/it]

2024/12/26 15:08:54 INFO dspy.evaluate.evaluate: Average Metric: 12.5 / 15 (83.3%)





Unnamed: 0,id,question,answer,type,example_context,gold_titles,generated_tweet,pred_context,overall_metric
0,5ac0e2525542997d64295a79,At My Window was released by which American singer-songwriter?,John Townes Van Zandt,bridge,"{'title': ['Through the Window', 'Yes I Am (Melissa Etheridge albu...","{Townes Van Zandt, At My Window (album)}",美国歌手-songwriter Townes Van Zandt 发布了“At My Window”。,"[""At a Window in the Artist's Studio | At a Window in the Artist's...",✔️ [0.750]
1,5abfc1895542993fe9a41e38,which American actor was Candace Kita guest starred with,Bill Murray,bridge,"{'title': ['Charlie Babcock', 'Thom Bierdz', 'Scrubs (season 4)', ...","{Candace Kita, Bill Murray}",“《The Wayans Bros.》”中，卡丹斯·基塔与比尔·穆雷一起客串。,['Candace Kita | Kita\'s first role was as a news anchor in the 19...,✔️ [1.000]
2,5a8f600c5542992414482a8f,"Which of these publications was most recently published, Who Put t...",Self,comparison,"{'title': ['He Put the Bomp! In the Bomp', 'Who Put the Bomp (in t...","{Self (magazine), Who Put the Bomp}","""Who Put the Bomp"" 是一首 1961 年的 doo-wop 风格歌曲，而 Self 则是美国 Singer Bry...",['Guy Self | Guy Self is a fictional character from the BBC medica...,✔️ [0.750]
3,5ac562695542993e66e823c7,The Victorians - Their Story In Pictures is a documentary series w...,1950,bridge,"{'title': ['The Victorians', ""Michael Wood's Story of England"", 'S...","{The Victorians, Jeremy Paxman}",2009年，英國作家及主持人 jeremy paxman 出演的《維多利亞人：他們的故事在圖片中》一系列紀錄片。,"['Masao | Masao (written: 正雄, 正夫, 正生, 正男, 正郎, 雅雄, 雅央, 雅夫, 雅勇, 雅男, ...",✔️ [1.000]
4,5ae4c6ba5542996836b02d23,"Which magazine has published articles by Scott Shaw, Tae Kwon Do T...",Tae Kwon Do Times,comparison,"{'title': ['Rick Timmons', 'Duk Sung Son', 'Nam Suk Lee', 'Southwe...","{Tae Kwon Do Times, Southwest Art}",Scott Shaw 在Tae Kwon Do Times杂志上发表过文章。,['Tae Kwon Do Times | Tae Kwon Do Times is a magazine devoted to t...,✔️ [0.750]
5,5abac4ae55429939ce03dd54,In what year was the club founded that played Manchester City in t...,1874,bridge,"{'title': ['1985 FA Charity Shield', '1995 FA Charity Shield', '19...","{1972 FA Charity Shield, Aston Villa F.C.}",1972年FA公益盃：曼城与阿斯顿维拉之间的比赛。,['1972 FA Charity Shield | The 1972 FA Charity Shield was conteste...,✔️ [0.750]
6,5a73061d55429901807daf66,"Which is taller, the Empire State Building or the Bank of America ...",The Empire State Building,comparison,"{'title': ['WOR TV Tower', 'Empire State Building', 'Jack Brod', '...","{Empire State Building, Bank of America Tower (Manhattan)}",新西兰的塔楼中，亚特兰大塔（Atlanta）是世界第96高、美国第14高、Georgia州第1高的 skyscraper。与帝国国民...,['Empire State Building | The Empire State Building is a 102-story...,✔️ [0.750]
7,5a760bbc554299109176e631,Which American actress who made their film debut in the 1995 teen ...,Rosario Dawson,bridge,"{'title': ['Evan Rachel Wood', 'Voto Latino', 'Shannen Doherty fil...","{Rosario Dawson, Voto Latino}","美国演员 Rosario Dawson 在 1995 年的青春剧 ""Kids"" 中首次亮相，后来又加入了多部电影和电视剧，并于 20...","['Rosario Dawson | Rosario Isabel Dawson (born May 9, 1979) is an ...",✔️ [1.000]
8,5ab853925542990e739ec8c8,"Tombstone stared an actor born May 17, 1955 known as who?",Bill Paxton,bridge,"{'title': ['Bill Paxton', 'Kinpei Azusa', 'Dennis Hopper filmograp...","{Bill Paxton, Tombstone (film)}",# Tombstone # MichaelBiehn # TombstoneTerritory,"['Michael Biehn | Michael Connell Biehn (born July 31, 1956) is an...",✔️ [0.750]
9,5a8532485542997b5ce3ffbd,What is the code name for the German offensive that started this S...,Operation Citadel,bridge,"{'title': ['52nd Infantry Division (German Empire)', 'Cornay', '10...","{102nd Infantry Division (Wehrmacht), Battle of Kursk}",德国在第二次世界大战东部战区的进攻，包括102步兵师，主要是由“Operation Mars” codename驱动。,['Battle of Kursk | The Battle of Kursk was a Second World War eng...,✔️ [0.750]


2024/12/26 15:08:54 INFO dspy.teleprompt.copro_optimizer: 本轮最佳候选得分: 83.33
2024/12/26 15:08:54 INFO dspy.teleprompt.copro_optimizer: 最佳候选指令: Create a captivating Twitter-style summary that provides accurate answers to the question while adhering to the 280 character limit, and omits any topical labels, ensuring contextual relevance and conciseness.
2024/12/26 15:08:54 INFO dspy.teleprompt.copro_optimizer: 最佳候选前缀: Tweet In Chinese
2024/12/26 15:08:55 INFO dspy.teleprompt.copro_optimizer: ********* 开始第 4/6 轮指令优化 *********
2024/12/26 15:08:55 INFO dspy.teleprompt.copro_optimizer: [候选 1/1]
2024/12/26 15:08:55 INFO dspy.teleprompt.copro_optimizer:   指令: Design an engaging Twitter summary that accurately addresses the question, incorporating contextual accuracy and staying within the 280 character limit while omitting topical labels.
2024/12/26 15:08:55 INFO dspy.teleprompt.copro_optimizer:   前缀: Tweet In Chinese:
2024/12/26 15:08:55 INFO dspy.teleprompt.copro_optimizer: At Depth 4/6, Evaluat

Average Metric: 12.25 / 15 (81.7%): 100%|██████████| 15/15 [00:38<00:00,  2.56s/it]

2024/12/26 15:09:34 INFO dspy.evaluate.evaluate: Average Metric: 12.25 / 15 (81.7%)





Unnamed: 0,id,question,answer,type,example_context,gold_titles,generated_tweet,pred_context,overall_metric
0,5ac0e2525542997d64295a79,At My Window was released by which American singer-songwriter?,John Townes Van Zandt,bridge,"{'title': ['Through the Window', 'Yes I Am (Melissa Etheridge albu...","{Townes Van Zandt, At My Window (album)}",美国歌手-songwriter Townes Van Zandt 发布了“At My Window”。,"[""At a Window in the Artist's Studio | At a Window in the Artist's...",✔️ [0.750]
1,5abfc1895542993fe9a41e38,which American actor was Candace Kita guest starred with,Bill Murray,bridge,"{'title': ['Charlie Babcock', 'Thom Bierdz', 'Scrubs (season 4)', ...","{Candace Kita, Bill Murray}",“《The Wayans Bros.》中，卡丹斯·基塔与比尔·穆雷一起出演，而卡丹斯·布朗则在剧集如《格里尔斯安atomy》和《绝望...,['Candace Kita | Kita\'s first role was as a news anchor in the 19...,✔️ [1.000]
2,5a8f600c5542992414482a8f,"Which of these publications was most recently published, Who Put t...",Self,comparison,"{'title': ['He Put the Bomp! In the Bomp', 'Who Put the Bomp (in t...","{Self (magazine), Who Put the Bomp}",2017 年，美国歌手 Bryson Tiller 发布了专辑 True to Self，这是最新的出版物。,['Guy Self | Guy Self is a fictional character from the BBC medica...,✔️ [1.000]
3,5ac562695542993e66e823c7,The Victorians - Their Story In Pictures is a documentary series w...,1950,bridge,"{'title': ['The Victorians', ""Michael Wood's Story of England"", 'S...","{The Victorians, Jeremy Paxman}",「维多利亚人 - 他们的故事在图片中的故事」是一部由英国记者和广播人杰西米·帕克斯曼（Jeremy Paxman）撰写的纪录片系列。,"['Masao | Masao (written: 正雄, 正夫, 正生, 正男, 正郎, 雅雄, 雅央, 雅夫, 雅勇, 雅男, ...",✔️ [1.000]
4,5ae4c6ba5542996836b02d23,"Which magazine has published articles by Scott Shaw, Tae Kwon Do T...",Tae Kwon Do Times,comparison,"{'title': ['Rick Timmons', 'Duk Sung Son', 'Nam Suk Lee', 'Southwe...","{Tae Kwon Do Times, Southwest Art}",「泰科武道时斯科特沙威（Scott Shaw）发表的文章主要出现在泰科武道时杂志上，虽然该杂志专注于泰科武道，但也涵盖其他韩国武术。,['Tae Kwon Do Times | Tae Kwon Do Times is a magazine devoted to t...,✔️ [0.750]
5,5abac4ae55429939ce03dd54,In what year was the club founded that played Manchester City in t...,1874,bridge,"{'title': ['1985 FA Charity Shield', '1995 FA Charity Shield', '19...","{1972 FA Charity Shield, Aston Villa F.C.}",1972 年的FA-charity shield比赛是曼城与阿斯顿维拉之间的比赛。,['1972 FA Charity Shield | The 1972 FA Charity Shield was conteste...,✔️ [0.750]
6,5a73061d55429901807daf66,"Which is taller, the Empire State Building or the Bank of America ...",The Empire State Building,comparison,"{'title': ['WOR TV Tower', 'Empire State Building', 'Jack Brod', '...","{Empire State Building, Bank of America Tower (Manhattan)}",新西兰的Empire State Building（102层）与美国纽约市的Bank of America Tower（23层）相比...,['Empire State Building | The Empire State Building is a 102-story...,✔️ [0.750]
7,5a760bbc554299109176e631,Which American actress who made their film debut in the 1995 teen ...,Rosario Dawson,bridge,"{'title': ['Evan Rachel Wood', 'Voto Latino', 'Shannen Doherty fil...","{Rosario Dawson, Voto Latino}",美国演员罗莎里奥·道森（Rosario Dawson）于1995年在青春期剧《孩子们》中首次亮相，并与Voto Latino共同创立...,"['Rosario Dawson | Rosario Isabel Dawson (born May 9, 1979) is an ...",✔️ [1.000]
8,5ab853925542990e739ec8c8,"Tombstone stared an actor born May 17, 1955 known as who?",Bill Paxton,bridge,"{'title': ['Bill Paxton', 'Kinpei Azusa', 'Dennis Hopper filmograp...","{Bill Paxton, Tombstone (film)}",[1] «迈克尔·比恩 (Michael Biehn) | 1956年7月31日出生，美国演员，主要以科幻电影中的军事角色而闻名，如...,"['Michael Biehn | Michael Connell Biehn (born July 31, 1956) is an...",
9,5a8532485542997b5ce3ffbd,What is the code name for the German offensive that started this S...,Operation Citadel,bridge,"{'title': ['52nd Infantry Division (German Empire)', 'Cornay', '10...","{102nd Infantry Division (Wehrmacht), Battle of Kursk}",德国在第二次世界大战东部战区的进攻，包括102步兵师，实际上是由苏联对德国的“Operation Mars”（俄语：OPERATIO...,['Battle of Kursk | The Battle of Kursk was a Second World War eng...,✔️ [0.750]


2024/12/26 15:09:34 INFO dspy.teleprompt.copro_optimizer: 本轮最佳候选得分: 83.33
2024/12/26 15:09:34 INFO dspy.teleprompt.copro_optimizer: 最佳候选指令: Create a captivating Twitter-style summary that provides accurate answers to the question while adhering to the 280 character limit, and omits any topical labels, ensuring contextual relevance and conciseness.
2024/12/26 15:09:34 INFO dspy.teleprompt.copro_optimizer: 最佳候选前缀: Tweet In Chinese
2024/12/26 15:09:36 INFO dspy.teleprompt.copro_optimizer: ********* 开始第 5/6 轮指令优化 *********
2024/12/26 15:09:36 INFO dspy.teleprompt.copro_optimizer: [候选 1/1]
2024/12/26 15:09:36 INFO dspy.teleprompt.copro_optimizer:   指令: Create a captivating Twitter-style summary that provides accurate and concise answers to the question while adhering to the 280 character limit, and omits any topical labels, ensuring contextual relevance.
2024/12/26 15:09:36 INFO dspy.teleprompt.copro_optimizer:   前缀: Tweet In Chinese:
2024/12/26 15:09:36 INFO dspy.teleprompt.copro_optimizer

Average Metric: 12.75 / 15 (85.0%): 100%|██████████| 15/15 [00:13<00:00,  1.15it/s]

2024/12/26 15:09:49 INFO dspy.evaluate.evaluate: Average Metric: 12.75 / 15 (85.0%)





Unnamed: 0,id,question,answer,type,example_context,gold_titles,generated_tweet,pred_context,overall_metric
0,5ac0e2525542997d64295a79,At My Window was released by which American singer-songwriter?,John Townes Van Zandt,bridge,"{'title': ['Through the Window', 'Yes I Am (Melissa Etheridge albu...","{Townes Van Zandt, At My Window (album)}",美国歌手-songwriter Townes Van Zandt 发布了“At My Window”。,"[""At a Window in the Artist's Studio | At a Window in the Artist's...",✔️ [0.750]
1,5abfc1895542993fe9a41e38,which American actor was Candace Kita guest starred with,Bill Murray,bridge,"{'title': ['Charlie Babcock', 'Thom Bierdz', 'Scrubs (season 4)', ...","{Candace Kita, Bill Murray}",“《The Wayans Bros.》”中，卡丹斯·基塔与比尔·穆雷一起客串。,['Candace Kita | Kita\'s first role was as a news anchor in the 19...,✔️ [1.000]
2,5a8f600c5542992414482a8f,"Which of these publications was most recently published, Who Put t...",Self,comparison,"{'title': ['He Put the Bomp! In the Bomp', 'Who Put the Bomp (in t...","{Self (magazine), Who Put the Bomp}","""Who Put the Bomp"" 是一首 1961 年的道德歌曲，而 ""Self"" 是 2017 年的一张音乐专辑。 Who P...",['Guy Self | Guy Self is a fictional character from the BBC medica...,✔️ [1.000]
3,5ac562695542993e66e823c7,The Victorians - Their Story In Pictures is a documentary series w...,1950,bridge,"{'title': ['The Victorians', ""Michael Wood's Story of England"", 'S...","{The Victorians, Jeremy Paxman}",英国著名主持人 Jeremy Paxman 出生于 1944 年。,"['Masao | Masao (written: 正雄, 正夫, 正生, 正男, 正郎, 雅雄, 雅央, 雅夫, 雅勇, 雅男, ...",✔️ [0.750]
4,5ae4c6ba5542996836b02d23,"Which magazine has published articles by Scott Shaw, Tae Kwon Do T...",Tae Kwon Do Times,comparison,"{'title': ['Rick Timmons', 'Duk Sung Son', 'Nam Suk Lee', 'Southwe...","{Tae Kwon Do Times, Southwest Art}",Scott Shaw 的文章在Tae Kwon Do Times杂志上发表。,['Tae Kwon Do Times | Tae Kwon Do Times is a magazine devoted to t...,✔️ [0.750]
5,5abac4ae55429939ce03dd54,In what year was the club founded that played Manchester City in t...,1874,bridge,"{'title': ['1985 FA Charity Shield', '1995 FA Charity Shield', '19...","{1972 FA Charity Shield, Aston Villa F.C.}",1972年FA公益盃：曼城与阿斯顿维拉之间的比赛。,['1972 FA Charity Shield | The 1972 FA Charity Shield was conteste...,✔️ [0.750]
6,5a73061d55429901807daf66,"Which is taller, the Empire State Building or the Bank of America ...",The Empire State Building,comparison,"{'title': ['WOR TV Tower', 'Empire State Building', 'Jack Brod', '...","{Empire State Building, Bank of America Tower (Manhattan)}",新西兰的塔楼中，亚特兰大塔（Atlanta）是世界第96高、美国第14高、Georgia州都城最高的建筑。与帝国国民大厦相比，它要高...,['Empire State Building | The Empire State Building is a 102-story...,✔️ [1.000]
7,5a760bbc554299109176e631,Which American actress who made their film debut in the 1995 teen ...,Rosario Dawson,bridge,"{'title': ['Evan Rachel Wood', 'Voto Latino', 'Shannen Doherty fil...","{Rosario Dawson, Voto Latino}",美国演员罗莎里奥·道森（Rosario Dawson）创办了非党派、非盈利组织Voto Latino，旨在鼓励年轻的西班牙裔和拉丁族...,"['Rosario Dawson | Rosario Isabel Dawson (born May 9, 1979) is an ...",✔️ [1.000]
8,5ab853925542990e739ec8c8,"Tombstone stared an actor born May 17, 1955 known as who?",Bill Paxton,bridge,"{'title': ['Bill Paxton', 'Kinpei Azusa', 'Dennis Hopper filmograp...","{Bill Paxton, Tombstone (film)}",# Tombstone # 1993年电影 # MichaelBiehn # GunfightAtTheOKCorral,"['Michael Biehn | Michael Connell Biehn (born July 31, 1956) is an...",✔️ [1.000]
9,5a8532485542997b5ce3ffbd,What is the code name for the German offensive that started this S...,Operation Citadel,bridge,"{'title': ['52nd Infantry Division (German Empire)', 'Cornay', '10...","{102nd Infantry Division (Wehrmacht), Battle of Kursk}",德国在第二次世界大战东部战区的进攻，包括102步兵师，主要是由苏联对德军的“Operation Mars”（也称为第二次Rzhev-...,['Battle of Kursk | The Battle of Kursk was a Second World War eng...,✔️ [0.750]


2024/12/26 15:09:49 INFO dspy.teleprompt.copro_optimizer: 本轮最佳候选得分: 85.0
2024/12/26 15:09:49 INFO dspy.teleprompt.copro_optimizer: 最佳候选指令: Create a captivating Twitter-style summary that provides accurate and concise answers to the question while adhering to the 280 character limit, and omits any topical labels, ensuring contextual relevance.
2024/12/26 15:09:49 INFO dspy.teleprompt.copro_optimizer: 最佳候选前缀: Tweet In Chinese:
2024/12/26 15:09:51 INFO dspy.teleprompt.copro_optimizer: ********* 开始第 6/6 轮指令优化 *********
2024/12/26 15:09:51 INFO dspy.teleprompt.copro_optimizer: [候选 1/1]
2024/12/26 15:09:51 INFO dspy.teleprompt.copro_optimizer:   指令: 设计一条引人入胜的推文，有效地回答问题，忠实于上下文，少于 280 个字符，并且没有主题标签
2024/12/26 15:09:51 INFO dspy.teleprompt.copro_optimizer:   前缀: __generate_compelling_twentyEighteen_Twitter_Summary_In_Chinese
2024/12/26 15:09:51 INFO dspy.teleprompt.copro_optimizer: At Depth 6/6, Evaluating Prompt Candidate #1/1 for Predictor 1 of 1.


Average Metric: 11.00 / 15 (73.3%): 100%|██████████| 15/15 [00:40<00:00,  2.72s/it]

2024/12/26 15:10:31 INFO dspy.evaluate.evaluate: Average Metric: 11.0 / 15 (73.3%)





Unnamed: 0,id,question,answer,type,example_context,gold_titles,generated_tweet,pred_context,overall_metric
0,5ac0e2525542997d64295a79,At My Window was released by which American singer-songwriter?,John Townes Van Zandt,bridge,"{'title': ['Through the Window', 'Yes I Am (Melissa Etheridge albu...","{Townes Van Zandt, At My Window (album)}",美国歌手-songwriter Townes Van Zandt 于 1987 年发行了专辑 At My Window。,"[""At a Window in the Artist's Studio | At a Window in the Artist's...",✔️ [0.750]
1,5abfc1895542993fe9a41e38,which American actor was Candace Kita guest starred with,Bill Murray,bridge,"{'title': ['Charlie Babcock', 'Thom Bierdz', 'Scrubs (season 4)', ...","{Candace Kita, Bill Murray}",#头部风暴 #美国演员 #CandaceKita,['Candace Kita | Kita\'s first role was as a news anchor in the 19...,✔️ [0.750]
2,5a8f600c5542992414482a8f,"Which of these publications was most recently published, Who Put t...",Self,comparison,"{'title': ['He Put the Bomp! In the Bomp', 'Who Put the Bomp (in t...","{Self (magazine), Who Put the Bomp}",Who Put the Bomp 或者 Self 中哪个音乐作品最 recently 出版？根据提供的信息，Who Put the ...,['Guy Self | Guy Self is a fictional character from the BBC medica...,✔️ [1.000]
3,5ac562695542993e66e823c7,The Victorians - Their Story In Pictures is a documentary series w...,1950,bridge,"{'title': ['The Victorians', ""Michael Wood's Story of England"", 'S...","{The Victorians, Jeremy Paxman}",维多利亚时期纪录片《维多利亚人 - 他们的故事在图片中》由英国作家和主持人Jeremy Paxman写并呈现，于2009年首播。,"['Masao | Masao (written: 正雄, 正夫, 正生, 正男, 正郎, 雅雄, 雅央, 雅夫, 雅勇, 雅男, ...",✔️ [0.750]
4,5ae4c6ba5542996836b02d23,"Which magazine has published articles by Scott Shaw, Tae Kwon Do T...",Tae Kwon Do Times,comparison,"{'title': ['Rick Timmons', 'Duk Sung Son', 'Nam Suk Lee', 'Southwe...","{Tae Kwon Do Times, Southwest Art}",Scott Shaw 的文章出现在《太空道》和《西南艺术》等杂志上。,['Tae Kwon Do Times | Tae Kwon Do Times is a magazine devoted to t...,✔️ [0.750]
5,5abac4ae55429939ce03dd54,In what year was the club founded that played Manchester City in t...,1874,bridge,"{'title': ['1985 FA Charity Shield', '1995 FA Charity Shield', '19...","{1972 FA Charity Shield, Aston Villa F.C.}",1972 年的 FA-charity shield 比赛是曼城与阿斯顿维拉之间的比赛。,['1972 FA Charity Shield | The 1972 FA Charity Shield was conteste...,✔️ [0.750]
6,5a73061d55429901807daf66,"Which is taller, the Empire State Building or the Bank of America ...",The Empire State Building,comparison,"{'title': ['WOR TV Tower', 'Empire State Building', 'Jack Brod', '...","{Empire State Building, Bank of America Tower (Manhattan)}",[1] «新大陆 skyscraper | 新大陆是位于纽约市五大道之间的33街和34街之间的一座102层高楼大厦。它有一个屋顶高度...,['Empire State Building | The Empire State Building is a 102-story...,
7,5a760bbc554299109176e631,Which American actress who made their film debut in the 1995 teen ...,Rosario Dawson,bridge,"{'title': ['Evan Rachel Wood', 'Voto Latino', 'Shannen Doherty fil...","{Rosario Dawson, Voto Latino}",美国演员罗莎里奥·道森（Rosario Dawson）于1995年在电影《Kids》中首次亮相，是Voto Latino的联合创始人。,"['Rosario Dawson | Rosario Isabel Dawson (born May 9, 1979) is an ...",✔️ [1.000]
8,5ab853925542990e739ec8c8,"Tombstone stared an actor born May 17, 1955 known as who?",Bill Paxton,bridge,"{'title': ['Bill Paxton', 'Kinpei Azusa', 'Dennis Hopper filmograp...","{Bill Paxton, Tombstone (film)}",Tombstone Territory 的主演中，没有明确提到出生于1955年的人物，但是根据提供的信息，Quintin Sonde...,"['Michael Biehn | Michael Connell Biehn (born July 31, 1956) is an...",✔️ [0.750]
9,5a8532485542997b5ce3ffbd,What is the code name for the German offensive that started this S...,Operation Citadel,bridge,"{'title': ['52nd Infantry Division (German Empire)', 'Cornay', '10...","{102nd Infantry Division (Wehrmacht), Battle of Kursk}",德国在 1943 年 7 月 5 日开始的 Operation Citadel 进攻最终失败了，导致苏联对其进行反击。,['Battle of Kursk | The Battle of Kursk was a Second World War eng...,✔️ [0.750]


2024/12/26 15:10:31 INFO dspy.teleprompt.copro_optimizer: 本轮最佳候选得分: 85.0
2024/12/26 15:10:31 INFO dspy.teleprompt.copro_optimizer: 最佳候选指令: Create a captivating Twitter-style summary that provides accurate and concise answers to the question while adhering to the 280 character limit, and omits any topical labels, ensuring contextual relevance.
2024/12/26 15:10:31 INFO dspy.teleprompt.copro_optimizer: 最佳候选前缀: Tweet In Chinese:


最佳结果统计信息： {2729756795120: {'depth': [0, 1, 2, 3, 4, 5], 'max': [81.67, 81.67, 83.33, 83.33, 85.0, 85.0], 'average': [75.83500000000001, 75.0, 77.0825, 78.0, 79.16666666666667, 78.33285714285715], 'min': [70.0, 70.0, 70.0, 70.0, 70.0, 70.0], 'std': [5.835000000000001, 4.908421334808169, 5.574932174475309, 5.31329652852163, 5.507403098456558, 5.49270871060826]}}
最新结果统计信息： {2729756795120: {'depth': [0, 1, 2, 3, 4, 5], 'max': [81.67, 73.33, 83.33, 81.67, 85.0, 73.33], 'average': [75.83500000000001, 73.33, 83.33, 81.67, 85.0, 73.33], 'min': [70.0, 73.33, 83.33, 81.67, 85.0, 73.33], 'std': [5.835000000000001, 0.0, 0.0, 0.0, 0.0, 0.0]}}
指标调用总次数： 7
Average Metric: 12.00 / 15 (80.0%): 100%|██████████| 15/15 [00:16<00:00,  1.10s/it]

2024/12/26 15:10:48 INFO dspy.evaluate.evaluate: Average Metric: 12.0 / 15 (80.0%)





Unnamed: 0,id,question,answer,type,example_context,gold_titles,generated_tweet,pred_context,overall_metric
0,5a824c155542990a1d231f92,The original work by Anton Chekhov involving a disillusioned schoo...,Maria Yermolova,bridge,"{'title': ['On the Harmful Effects of Tobacco', 'Melikhovo', 'Plat...","{Wild Honey (play), Platonov (play)}",原作《普拉托诺夫》由俄罗斯剧作家安东·切赫霍夫创作，主要人物普拉托诺夫是一名失落的省级学校教员。,['Wild Honey (play) | Wild Honey is a 1984 adaptation by British p...,✔️ [1.000]
1,5adc1218554299438c868d32,Are Roswell International Air Center and Pago Pago International A...,no,comparison,"{'title': ['American Samoa Community College', 'Tutuila', 'Inter I...","{Roswell International Air Center, Pago Pago International Airport}",neither Roswell International Air Center nor Pago Pago Internation...,['Roswell International Air Center | Roswell International Air Cen...,✔️ [0.750]
2,5ab65f2455429954757d3265,Untold: The Greatest Sports Stories Never Told was hosted by a spo...,the voice of basketball,bridge,"{'title': ['Gregg Murphy', 'Marv Albert', 'The Greatest Story Neve...","{Untold: The Greatest Sports Stories Never Told, Marv Albert}",sports播音员马尔伯特（Marv Albert）主持了《未tell：最伟大的体育故事 never tell》。,"[""Untold: The Greatest Sports Stories Never Told | Untold: The Gre...",✔️ [1.000]
3,5a8803da55429938390d3eca,Are Walt Disney and Sacro GRA both documentry films?,yes,comparison,"{'title': ['List of Disney animated shorts', 'Walt Disney Pictures...","{Sacro GRA, Walt Disney (film)}",#西方电影 #纪录片 #电影导演 #华特迪士尼 #萨克罗格拉,"['Sacro GRA | Sacro GRA (] , Italian for ""Holy GRA"") is a 2013 Ita...",✔️ [0.500]
4,5a8f71c75542992414482acf,What is the Palestinian Islamic organization that governs th small...,Hamas,bridge,"{'title': ['Bar Lev Line', 'Zionist Freedom Alliance', 'Status of ...","{Gaza Strip, Status of territories occupied by Israel in 1967}",中东小岛国加沙地带由巴勒斯坦伊斯兰组织哈马斯管理，位于地中海东岸。,"['Gaza Strip | The Gaza Strip ( ; Arabic: قطاع غزة\u200e \u200e ""Q...",✔️ [0.750]
5,5a81b2f655429926c1cdad62,What album did the song of which Taylor Swift premiered the music ...,1989,bridge,"{'title': ['Taylor Swift videography', '2017 MTV Video Music Award...","{2015 MTV Video Music Awards, Wildest Dreams (Taylor Swift song)}",2015年MTV音乐奖的前场，泰勒斯威ift在那里首播了她的歌曲《坏血》的音乐视频，这首歌是她于2014年发行的专辑《1989》中的...,"['Bad Blood (Bastille album) | Bad Blood (stylised as ""BΔD BLOOD"")...",✔️ [0.750]
6,5ac3bd345542993915413929,"Which is considered a genus level classification, Apera or Gunnera...",Apera,comparison,"{'title': ['Gunnera manicata', 'Northwest Oregon Conference', 'Hyg...","{Apera, Gunnera manicata}",Gunnera 属于属级分类，而 Apera 则是 genus 级的分类。,"['Gunnera manicata | Gunnera manicata, known as Brazilian giant-rh...",✔️ [0.750]
7,5ab7498f554299110f219af8,Do The Drums and Pussy Galore play music of similar genres?,no,comparison,"{'title': ['The Honeymoon Killers (American band)', 'Space Cowboy ...","{The Drums, Pussy Galore (band)}",#TheDrums和#PussyGalore的音乐风格相似，都是garage rock和noise rock的代表。 #TheDru...,['List of post-punk revival bands | Post-punk revival is a type of...,✔️ [0.750]
8,5ae6e5875542992ae0d163a0,What is the post-nominal abbreviation for the university where the...,Exon,bridge,"{'title': ['Zeresenay Alemseged', 'Research', 'University of Exete...","{University of Exeter, Banded Brothers}",「布兰德兄弟们」研究项目的母体机构是埃克塞特大学。,"['Banded Brothers | Banded Brothers (also known as ""Banded Brother...",✔️ [0.750]
9,5ac55af35542993e66e82333,Are both Benjamin Christensen and Len Wiseman directors?,yes,comparison,"{'title': ['Benjamin Christensen', 'Len Wiseman', 'Lady with the L...","{Len Wiseman, Benjamin Christensen}",丹麦导演本杰明·克里斯蒂安森（Benjamin Christensen）是丹麦电影的重要人物。他执导了多部著名电影，包括1922年的...,['Benjamin Christensen | Benjamin Christensen (28 September 1879 –...,✔️ [1.000]


80.0

Note that the highest score no longer comes from the first (original) prompt: a new prompt proposed during optimization scored higher.

In [47]:
def print_instructions_and_prefix(program):
    for idx, predictor in enumerate(program.predictors()):
        # Get the signature
        signature = getattr(predictor, "extended_signature", None) or getattr(predictor, "signature", None)
        # Key of the last field
        *_, last_key = signature.fields.keys()
        prefix = signature.fields[last_key].json_schema_extra.get("prefix", "")

        print(f"=== Predictor #{idx} ===")
        print(f"Instructions: {signature.instructions}")
        print(f"Prefix: {prefix}\n")

# Call it on the unoptimized tweeter
print_instructions_and_prefix(tweeter)


=== Predictor #0 ===
Instructions: 生成一条引人入胜的推文，有效地回答问题，忠实于上下文，少于 280 个字符，并且没有主题标签。
Prefix: Tweet In  Chinese:



In [48]:
# print_instructions_and_prefix is defined in the previous cell.
# Call it on the optimized compiled_COPRO_tweet
print_instructions_and_prefix(compiled_COPRO_tweet)


=== Predictor #0 ===
Instructions: Create a captivating Twitter-style summary that provides accurate and concise answers to the question while adhering to the 280 character limit, and omits any topical labels, ensuring contextual relevance.
Prefix: Tweet In Chinese:



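### Optimizer: MIPROv2

**MIPROv2** is a class that inherits from `Teleprompter` and jointly optimizes prompt instructions and few-shot examples. Its core goal: given a student program, training/validation datasets, and a metric function, systematically search for the instruction and few-shot example combination that scores best on the validation set.

#### Main features of MIPROv2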

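1. **Auto modes (auto/light/medium/heavy)**  
    MIPROv2 ships with several preset "search intensity" modes, so you can trade search cost against quality depending on task size and budget:
    
    - **light**: fewer trials, faster iteration;
    - **medium**: a moderate budget;
    - **heavy**: a wider search, suited to cases that demand higher accuracy and have the compute/budget to support it.
2. **Automatic few-shot bootstrapping**  
    By setting `max_bootstrapped_demos` and `max_labeled_demos`, MIPROv2 automatically builds several candidate few-shot sets (each containing a handful of examples). These examples guide the model inside the prompt and can improve its performance when little data is available.
    
3. **Instruction generation**  
    A component called `GroundedProposer` combines a dataset summary, a program summary, few-shot examples, and prompting tips to generate multiple candidate instructions. This adapts the prompt to the task at hand and reduces the effort of hand-writing instructions.
    
4. **Minibatch plus full evaluation**  
    When the dataset is large, candidates can first be scored on a small minibatch to discard weak combinations quickly, with full evaluations run at key steps to obtain a more accurate score on the whole set. This keeps evaluation reliable while cutting the cost of large-scale runs.
    
5. **Bayesian optimization (Optuna)**  
    Across trials, MIPROv2 uses Optuna's TPE (Tree-structured Parzen Estimator) sampler to optimize the prompt combination. As trials accumulate, the search concentrates on higher-scoring instruction and example configurations; it is not guaranteed to find the global optimum.
    
6. **Logging and visualization**  
    Each trial's score, program snapshot, and the instruction / few-shot combination it used are recorded, which makes later tracing and analysis easier. For more detail or visualizations, you can build on these logs.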
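- Prerequisites for all optimizers

1. **Write or have a Student Program**:
    
    - The program is an executable collection of predictors, for example an object that inherits from `dspy.Module` (aliased as `dspy.Program`), containing one or more predictors exposed through `predictors()`.
    - Each `predictor` supports `.demos` for storing few-shot examples and can be given an instruction, a system prompt, and similar settings.
2. **Prepare the datasets**:
    
    - At least one `trainset` (a list of input-output examples).
    - A validation set `valset` for evaluation (passed in by you, or split off automatically by the code).
    - The format of `trainset` and `valset` must match what your Student Program expects.
3. **Define the evaluation metric function**:
    
    - `metric` returns the score for a single evaluation, such as accuracy.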
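
To make the metric prerequisite concrete, here is a minimal sketch of a metric with the `(example, pred, trace=None)` shape that DSPy metrics use. The function `toy_tweet_metric`, its three checks, and the `generated_tweet` field name are assumptions for illustration; this is not the notebook's `overall_metric`.

```python
from types import SimpleNamespace


def toy_tweet_metric(example, pred, trace=None):
    """Hypothetical metric: average of three simple checks, in [0, 1]."""
    tweet = pred.generated_tweet
    checks = [
        len(tweet) <= 280,                        # within the length limit
        "#" not in tweet,                         # no hashtags
        example.answer.lower() in tweet.lower(),  # mentions the gold answer
    ]
    return sum(checks) / len(checks)


# Stand-in objects; in DSPy these would be a dspy.Example and a dspy.Prediction.
example = SimpleNamespace(question="Which is taller?", answer="Empire State Building")
good = SimpleNamespace(generated_tweet="The Empire State Building is taller.")
bad = SimpleNamespace(generated_tweet="#skyscraper #NYC")

print(toy_tweet_metric(example, good))  # all three checks pass
print(toy_tweet_metric(example, bad))   # only the length check passes
```

Returning a number per example lets the optimizer average scores over a dataset, which is how the percentages in the logs are produced.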

---
- Usage workflow

#### 1. Initialize MIPROv2

1. When calling the MIPROv2 constructor, you **must** pass `metric`; other parameters such as `prompt_model`, `task_model`, `max_bootstrapped_demos`, `auto`, and `verbose` can be configured as needed:
    
    ```python
    mipro = MIPROv2(
        metric=your_metric_function,
        prompt_model=your_prompt_model,
        task_model=your_task_model,
        max_bootstrapped_demos=4,
        max_labeled_demos=16,
        auto="medium",
        verbose=True,
        # other optional parameters...
    )
    ```
    
2. If you do **not** specify `prompt_model` and `task_model`, the global default language model `dspy.settings.lm` is used.
3. If `auto` is not `None` (for example `"light"`, `"medium"`, or `"heavy"`), MIPROv2 overrides some hyperparameters (such as `num_trials` and `val_size`) according to that mode, which reduces manual tuning.

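#### 2. Run the `compile` method

- Once everything is ready, call `mipro.compile(...)` to start the automated search. Common key parameters:
    1. `student=student_program`: the Student Program you defined.
    2. `trainset`, `valset`: the training and validation sets; if `valset` is omitted, part of `trainset` is split off as the validation set.
    3. `num_trials`: how many trials to run; each trial evaluates a different combination of instruction and few-shot examples.
    4. `minibatch`: whether to use minibatch evaluation. If `True`, each trial is scored quickly on a small part of the validation set to save compute; if `False`, each trial is scored on the full validation set for a more accurate score.
    5. `requires_permission_to_run`: if you are concerned about language model call costs, set this to `True` to see the estimated number of calls and cost before execution and confirm manually.

An example call might look like this: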

```python
best_program = mipro.compile(
    student=MyStudentProgram(),
    trainset=train_data,
    valset=valid_data,
    num_trials=20,
    minibatch=True,
    minibatch_size=50,
    requires_permission_to_run=False
)
```

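#### 3. Inspect the search results

- `compile` returns a **best_program** that carries the best instructions, few-shot examples, and related information.
- The attributes you will likely care about most:
    1. `best_program.score`: the best score on the validation set;
    2. `best_program.trial_logs`: a dict recording every trial, including the chosen instruction index, few-shot set index, and evaluation score;
    3. `best_program.candidate_programs`: the program versions that went through full evaluation, with their scores, for later review;
    4. `best_program.mb_candidate_programs`: when minibatch evaluation is used, all candidate programs and scores from the minibatch stage.

Once you have **best_program**, you can use it for downstream inference, carry its instructions and few-shot configuration over to production, or analyze the search with `trial_logs` to inform the next iteration.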
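
As one way to analyze `trial_logs`, the sketch below ranks trials by their full-evaluation score. It runs on a hand-written stand-in dict whose keys (`full_eval_score`, `0_predictor_instruction`, `0_predictor_demos`) mirror those printed by this notebook; the helper `rank_trials` is hypothetical and not part of DSPy.

```python
def rank_trials(trial_logs):
    """Return (trial_id, score, instruction_idx, demo_idx) tuples, best score first."""
    rows = []
    for trial_id, log in trial_logs.items():
        if "full_eval_score" not in log:
            continue  # e.g. a trial that was never evaluated on the full set
        rows.append((
            trial_id,
            log["full_eval_score"],
            log.get("0_predictor_instruction"),  # None for the default program
            log.get("0_predictor_demos"),
        ))
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Stand-in data shaped like best_program.trial_logs (trial -1 is the default program).
mock_logs = {
    -1: {"full_eval_score": 77.5},
    1: {"0_predictor_instruction": 1, "0_predictor_demos": 1, "full_eval_score": 65.0},
    2: {"0_predictor_instruction": 2, "0_predictor_demos": 1, "full_eval_score": 90.0},
}

ranking = rank_trials(mock_logs)
for row in ranking:
    print(row)
```

With a real run, passing `best_program.trial_logs` in place of `mock_logs` should work as long as the logs keep the same key names.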


In [43]:
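from dspy import Program
from dspy.teleprompt import MIPROv2

# 1) Define a minimal Student Program
# Tweeter()

# 2) Define an evaluation metric function
# overall_metric()

# 3) Prepare the training and validation sets
# trainset = [...]  # list of training examples
# valset = [...]    # list of validation examples

# 4) Initialize the MIPROv2 object
mipro = MIPROv2(
    metric=overall_metric,
    # The parameters below can be adjusted as needed
    max_bootstrapped_demos=4,  # max bootstrapped demos per few-shot set
    max_labeled_demos=2,       # max labeled demos per few-shot set
    auto="light",              # auto mode: "light"/"medium"/"heavy"/None
    num_candidates=10,         # candidate instructions / few-shot sets (the log below shows auto="light" overriding this to 5)
    verbose=True,              # print detailed logs
    max_errors=10
)

# 5) Build the Student Program
tweet_student = Tweeter()

# 6) Call compile to run the automated prompt optimization
best_program = mipro.compile(
    student=tweet_student,
    trainset=trainset[:15],
    valset=valset[:10],
    num_trials=5,              # total number of trials (the log below shows auto="light" overriding this to 7)
    minibatch=True,            # minibatch evaluation (the log below shows auto="light" overriding this to False)
    minibatch_size=3,          # minibatch size
    requires_permission_to_run=True  # True asks for confirmation before running; set to False to skip the prompt
)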


2024/12/26 20:51:56 INFO dspy.teleprompt.mipro_optimizer_v2: 
RUNNING WITH THE FOLLOWING LIGHT AUTO RUN SETTINGS:
num_trials: 7
minibatch: False
num_candidates: 5
valset size: 10

2024/12/26 20:52:22 INFO dspy.teleprompt.mipro_optimizer_v2: 
==> STEP 1: BOOTSTRAP FEWSHOT EXAMPLES <==
2024/12/26 20:52:22 INFO dspy.teleprompt.mipro_optimizer_v2: These will be used as few-shot example candidates for our program and for creating instructions.

2024/12/26 20:52:22 INFO dspy.teleprompt.mipro_optimizer_v2: Bootstrapping N=5 sets of demonstrations...


Bootstrapping set 1/5
Bootstrapping set 2/5
Bootstrapping set 3/5


 27%|██▋       | 4/15 [00:00<00:00, 121.21it/s]


Bootstrapped 4 full traces after 4 examples for up to 1 rounds, amounting to 4 attempts.
Bootstrapping set 4/5


  7%|▋         | 1/15 [00:00<00:00, 200.19it/s]


Bootstrapped 1 full traces after 1 examples for up to 1 rounds, amounting to 1 attempts.
Bootstrapping set 5/5


 27%|██▋       | 4/15 [00:00<00:00, 173.91it/s]
2024/12/26 20:52:22 INFO dspy.teleprompt.mipro_optimizer_v2: 
==> STEP 2: PROPOSE INSTRUCTION CANDIDATES <==
2024/12/26 20:52:22 INFO dspy.teleprompt.mipro_optimizer_v2: We will use the few-shot examples from the previous step, a generated dataset summary, a summary of the program code, and a randomly selected prompting tip to propose instructions.
2024/12/26 20:52:22 INFO dspy.teleprompt.mipro_optimizer_v2: 
Proposing instructions...

2024/12/26 20:52:23 INFO dspy.teleprompt.mipro_optimizer_v2: Proposed Instructions for Predictor 0:

2024/12/26 20:52:23 INFO dspy.teleprompt.mipro_optimizer_v2: 0: 生成一条引人入胜的推文，有效地回答问题，忠实于上下文，少于 280 个字符，并且没有主题标签。

2024/12/26 20:52:23 INFO dspy.teleprompt.mipro_optimizer_v2: 1: Please provide a question and a context to generate a tweet in Chinese that effectively answers the given question within the specified context.

2024/12/26 20:52:23 INFO dspy.teleprompt.mipro_optimizer_v2: 2: 生成一条引人入胜的推文，有效地回答问题，忠实于上

Bootstrapped 4 full traces after 4 examples for up to 1 rounds, amounting to 4 attempts.
SOURCE CODE: StringSignature(question, context -> reasoning, tweet_in_Chinese
    instructions='生成一条引人入胜的推文，有效地回答问题，忠实于上下文，少于 280 个字符，并且没有主题标签。'
    question = Field(annotation=str required=True json_schema_extra={'__dspy_field_type': 'input', 'prefix': 'Question:', 'desc': '${question}'})
    context = Field(annotation=str required=True json_schema_extra={'desc': '推特内容', '__dspy_field_type': 'input', 'prefix': 'Context:'})
    reasoning = Field(annotation=str required=True json_schema_extra={'prefix': "Reasoning: Let's think step by step in order to", 'desc': '${reasoning}', '__dspy_field_type': 'output'})
    tweet_in_Chinese = Field(annotation=str required=True json_schema_extra={'desc': '推特内容', '__dspy_field_type': 'output', 'prefix': 'Tweet In  Chinese:'})
)

class Tweeter(dspy.Module):
    def __init__(self):
        super().__init__()
        self.generate_tweet = dspy.ChainOfThought(Generat

2024/12/26 20:52:23 INFO dspy.evaluate.evaluate: Average Metric: 7.75 / 10 (77.5%)
2024/12/26 20:52:23 INFO dspy.teleprompt.mipro_optimizer_v2: Default program score: 77.5

2024/12/26 20:52:23 INFO dspy.teleprompt.mipro_optimizer_v2: ==> STEP 3: FINDING OPTIMAL PROMPT PARAMETERS <==
2024/12/26 20:52:23 INFO dspy.teleprompt.mipro_optimizer_v2: We will evaluate the program over a series of trials with different combinations of instructions and few-shot examples to find the optimal combination using Bayesian Optimization.

2024/12/26 20:52:23 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 1 / 7 =====
2024/12/26 20:52:23 INFO dspy.teleprompt.mipro_optimizer_v2: Evaluating the following candidate program...



Predictor 0
i: Please provide a question and a context to generate a tweet in Chinese that effectively answers the given question within the specified context.
p: Tweet In  Chinese:


Average Metric: 6.50 / 10 (65.0%): 100%|██████████| 10/10 [00:00<00:00, 1999.86it/s]

2024/12/26 20:52:23 INFO dspy.evaluate.evaluate: Average Metric: 6.5 / 10 (65.0%)
2024/12/26 20:52:23 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 65.0 with parameters ['Predictor 0: Instruction 1', 'Predictor 0: Few-Shot Set 1'].
2024/12/26 20:52:23 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [77.5, 65.0]
2024/12/26 20:52:23 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 77.5


2024/12/26 20:52:23 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 2 / 7 =====
2024/12/26 20:52:23 INFO dspy.teleprompt.mipro_optimizer_v2: Evaluating the following candidate program...




Predictor 0
i: 生成一条引人入胜的推文，有效地回答问题，忠实于上下文，少于 280 个字符，并且没有主题标签。
Tweet In  Chinese： Who Put the Bomp 或者 Self 哪个最近出版呢？根据提供的信息，Who Put the Bomp 是一首 1961 年的 doo-wop 风格歌曲，而 Self 则是美国 Singer Bryson Tiller 的第二张录音室专辑，于 2017 年发行。所以，Self 最近出版。
p: Tweet In  Chinese:


Average Metric: 9.00 / 10 (90.0%): 100%|██████████| 10/10 [00:00<00:00, 1665.79it/s]

2024/12/26 20:52:23 INFO dspy.evaluate.evaluate: Average Metric: 9.0 / 10 (90.0%)
2024/12/26 20:52:23 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far! Score: 90.0
2024/12/26 20:52:23 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 90.0 with parameters ['Predictor 0: Instruction 2', 'Predictor 0: Few-Shot Set 1'].
2024/12/26 20:52:23 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [77.5, 65.0, 90.0]
2024/12/26 20:52:23 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 90.0


2024/12/26 20:52:23 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 3 / 7 =====
2024/12/26 20:52:23 INFO dspy.teleprompt.mipro_optimizer_v2: Evaluating the following candidate program...




Predictor 0
i: 生成一条引人入胜的推文，有效地回答问题，忠实于上下文，少于 280 个字符，并且没有主题标签。例如：“你知道谁是 Royal Shakespeare Company 的最年轻 Ophelia 演员？”
p: Tweet In  Chinese:


Average Metric: 7.25 / 10 (72.5%): 100%|██████████| 10/10 [00:00<00:00, 1666.39it/s]

2024/12/26 20:52:23 INFO dspy.evaluate.evaluate: Average Metric: 7.25 / 10 (72.5%)
2024/12/26 20:52:23 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 72.5 with parameters ['Predictor 0: Instruction 4', 'Predictor 0: Few-Shot Set 1'].





2024/12/26 20:52:23 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [77.5, 65.0, 90.0, 72.5]
2024/12/26 20:52:23 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 90.0


2024/12/26 20:52:23 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 4 / 7 =====
2024/12/26 20:52:23 INFO dspy.teleprompt.mipro_optimizer_v2: Evaluating the following candidate program...



Predictor 0
i: 生成一条引人入胜的推文，有效地回答问题，忠实于上下文，少于 280 个字符，并且没有主题标签。
Tweet In  Chinese： Who Put the Bomp 或者 Self 哪个最近出版呢？根据提供的信息，Who Put the Bomp 是一首 1961 年的 doo-wop 风格歌曲，而 Self 则是美国 Singer Bryson Tiller 的第二张录音室专辑，于 2017 年发行。所以，Self 最近出版。
p: Tweet In  Chinese:


Average Metric: 9.00 / 10 (90.0%): 100%|██████████| 10/10 [00:00<00:00, 2500.03it/s]

2024/12/26 20:52:23 INFO dspy.evaluate.evaluate: Average Metric: 9.0 / 10 (90.0%)
2024/12/26 20:52:23 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 90.0 with parameters ['Predictor 0: Instruction 2', 'Predictor 0: Few-Shot Set 1'].





2024/12/26 20:52:23 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [77.5, 65.0, 90.0, 72.5, 90.0]
2024/12/26 20:52:23 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 90.0


2024/12/26 20:52:23 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 5 / 7 =====
2024/12/26 20:52:23 INFO dspy.teleprompt.mipro_optimizer_v2: Evaluating the following candidate program...



Predictor 0
i: 生成一条引人入胜的推文，有效地回答问题，忠实于上下文，少于 280 个字符，并且没有主题标签。例如：“你知道谁是 Royal Shakespeare Company 的最年轻 Ophelia 演员？”
p: Tweet In  Chinese:


Average Metric: 8.25 / 10 (82.5%): 100%|██████████| 10/10 [00:00<00:00, 2001.48it/s]

2024/12/26 20:52:23 INFO dspy.evaluate.evaluate: Average Metric: 8.25 / 10 (82.5%)
2024/12/26 20:52:23 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 82.5 with parameters ['Predictor 0: Instruction 4', 'Predictor 0: Few-Shot Set 3'].
2024/12/26 20:52:23 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [77.5, 65.0, 90.0, 72.5, 90.0, 82.5]
2024/12/26 20:52:23 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 90.0


2024/12/26 20:52:23 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 6 / 7 =====
2024/12/26 20:52:23 INFO dspy.teleprompt.mipro_optimizer_v2: Evaluating the following candidate program...




Predictor 0
i: 生成一条引人入胜的推文，有效地回答问题，忠实于上下文，少于 280 个字符，并且没有主题标签。
p: Tweet In  Chinese:


Average Metric: 7.75 / 10 (77.5%): 100%|██████████| 10/10 [00:00<00:00, 1665.66it/s]

2024/12/26 20:52:23 INFO dspy.evaluate.evaluate: Average Metric: 7.75 / 10 (77.5%)
2024/12/26 20:52:23 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 77.5 with parameters ['Predictor 0: Instruction 0', 'Predictor 0: Few-Shot Set 1'].





2024/12/26 20:52:23 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [77.5, 65.0, 90.0, 72.5, 90.0, 82.5, 77.5]
2024/12/26 20:52:23 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 90.0


2024/12/26 20:52:23 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 7 / 7 =====
2024/12/26 20:52:23 INFO dspy.teleprompt.mipro_optimizer_v2: Evaluating the following candidate program...



Predictor 0
i: 生成一条引人入胜的推文，有效地回答问题，忠实于上下文，少于 280 个字符，并且没有主题标签。例如：“你知道谁是 Royal Shakespeare Company 的最年轻 Ophelia 演员？”
p: Tweet In  Chinese:


Average Metric: 8.50 / 10 (85.0%): 100%|██████████| 10/10 [00:00<00:00, 1110.96it/s]

2024/12/26 20:52:23 INFO dspy.evaluate.evaluate: Average Metric: 8.5 / 10 (85.0%)
2024/12/26 20:52:23 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 85.0 with parameters ['Predictor 0: Instruction 4', 'Predictor 0: Few-Shot Set 4'].
2024/12/26 20:52:23 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [77.5, 65.0, 90.0, 72.5, 90.0, 82.5, 77.5, 85.0]
2024/12/26 20:52:23 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 90.0


2024/12/26 20:52:23 INFO dspy.teleprompt.mipro_optimizer_v2: Returning best identified program with score 90.0!





![](https://typora-photo1220.oss-cn-beijing.aliyuncs.com/DataAnalysis/LingYi/20241226191620.png)

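This part of the log shows the run configuration in **light mode**. Note that these values override the `num_trials=5`, `minibatch=True`, and `num_candidates=10` passed in the code above:
- **`num_trials: 7`**: 7 trials are run (each explores a different combination of instruction and few-shot examples).  
- **`minibatch: False`**: each trial is evaluated on the full validation set (`valset`) rather than a quick subset.  
- **`num_candidates: 5`**: 5 candidate instructions are proposed per predictor, and 5 few-shot sets are bootstrapped (`Bootstrapping N=5 sets of demonstrations`).  
- **`valset size: 10`**: the validation set has 10 examples, and every evaluation runs on these 10.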
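
The override behaviour seen in this log can be pictured with a toy model. This is only a sketch of what was observed in this run, not DSPy's actual implementation: `OBSERVED_LIGHT_RUN` and `resolve_settings` are hypothetical names, the values are copied from the log, and the real values may depend on dataset size.

```python
# Settings reported by the log for auto="light" in this particular run.
# Other modes are omitted because their values do not appear in this notebook.
OBSERVED_LIGHT_RUN = {"num_trials": 7, "minibatch": False, "num_candidates": 5}


def resolve_settings(user_settings, auto=None):
    """Toy model: when auto is set, the preset wins over user-supplied values."""
    resolved = dict(user_settings)
    if auto == "light":
        resolved.update(OBSERVED_LIGHT_RUN)
    return resolved


# The values passed in the notebook's MIPROv2(...) and compile(...) calls.
requested = {"num_trials": 5, "minibatch": True, "minibatch_size": 3, "num_candidates": 10}

print(resolve_settings(requested, auto="light"))  # preset values replace the requested ones
print(resolve_settings(requested, auto=None))     # requested values are kept
```

The practical takeaway is that hand-picked values such as `num_trials` only take effect when `auto` is `None`.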


![](https://typora-photo1220.oss-cn-beijing.aliyuncs.com/DataAnalysis/LingYi/20241226190412.png)

Here you can see MIPROv2 generating and evaluating candidate prompt instructions.

---

### 1. **"Proposed Instructions for Predictor 0"**
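This is the list of candidate instructions generated for predictor 0 (the first predictor in the student program). MIPROv2 uses `GroundedProposer`, which combines program context, a dataset summary, few-shot examples, and other information to propose several possible instructions for the `predictor`. Each instruction serves as the prompt's task description and guides the model through the task.

The aim is a higher-quality prompt, so that the language model answers the question better.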

---

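### 2. **Contents of the candidate instructions**

The 5 candidate instructions in the log are:

- **Instruction 0**: 
  > "生成一条引人入胜的推文，有效地回答问题，忠实于上下文，少于 280 个字符，并且没有主题标签。"  
  - A clear, direct task description that covers the basic requirements for generating a tweet.

- **Instruction 1**: 
  > "Please provide a question and a context to generate a tweet in Chinese that effectively answers the given question within the specified context."  
  - An English instruction asking for a Chinese tweet, emphasizing the link between question and context.

- **Instruction 2**:  
  > Same as instruction 0, with an example appended:  
    > "Tweet In Chinese：Who Put the Bomp 或者 Self 哪个最近出版呢？根据提供的信息，Who Put the Bomp 是一首 1961 年的 doo-wop 风格歌曲，而 Self 则是美国 Singer Bryson Tiller 的第二张录音室专辑，于 2017 年发行。所以，Self 最近出版。"  
  - Adding an example makes the instruction more concrete and helps the model understand the task.

- **Instruction 3**:  
  > Adds an extra, more complex requirement to the base task:  
    > "同时考虑到两个个人在他们各自领域里所达到的第一站点的意义和重要性。"  
  - A more complex instruction, which may test how the model handles multi-faceted information.

- **Instruction 4**:  
  > Provides a concrete example of the task:  
    > "例如：你知道谁是 Royal Shakespeare Company 的最年轻 Ophelia 演员？"  
  - The example makes the expected task easier for the model to understand.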

---

### 3. **"Evaluating the default program..."**

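This log line marks the start of evaluation. MIPROv2 first evaluates the "default program" (the original student program with no candidate instructions or examples inserted) to get its initial score on the validation set (77.5 in this run). This score is the baseline against which later trials are compared.

The evaluation involves the following steps:
- **Run the default program on the task** (for example, generating tweets).  
- **Check the quality of the results**: compute the score with the `metric` function (for example accuracy or F1).
- **Record the default program's performance** for comparison with programs that use candidate instructions.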
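
The steps above boil down to averaging a per-example metric over the validation set. The sketch below shows that arithmetic with toy stand-ins; `average_metric`, the canned "program", and the exact-match metric are all hypothetical and only mimic the shape of the `Average Metric: x / n (p%)` lines in the log.

```python
def average_metric(program, valset, metric):
    """Run the program on every example; return (sum of scores, percentage)."""
    total = 0.0
    for example in valset:
        pred = program(example)          # step 1: run the program
        total += metric(example, pred)   # step 2: score the result
    return total, round(100.0 * total / len(valset), 2)


# Toy stand-ins: the "program" looks up a canned answer, the metric is exact match.
toy_valset = [
    {"question": "1+1?", "answer": "2"},
    {"question": "2+2?", "answer": "4"},
    {"question": "3+3?", "answer": "6"},
    {"question": "4+4?", "answer": "8"},
]
canned = {"1+1?": "2", "2+2?": "4", "3+3?": "6", "4+4?": "9"}  # last one is wrong


def toy_program(example):
    return canned[example["question"]]


def exact_match(example, pred):
    return float(pred == example["answer"])


total, score = average_metric(toy_program, toy_valset, exact_match)
print(f"Average Metric: {total} / {len(toy_valset)} ({score}%)")  # step 3: record the baseline
```

The same number is recomputed for every candidate program, so each trial's score is directly comparable to the baseline.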

---

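### Process summary

1. **Candidate instruction generation**: MIPROv2 generates several candidate prompt instructions for the task, each possibly carrying different requirements or examples, for the optimization step to choose from.
2. **Default program evaluation**: the unoptimized program is evaluated to provide a baseline for the optimization.
3. **Basis for the following steps**: MIPROv2 then runs multiple trials over these candidate instructions and few-shot sets to find the best combination.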


In [41]:
best_trial = max(best_program.trial_logs.values(), key=lambda x: x.get("full_eval_score", 0))
print("Best parameters:", best_trial)


Best parameters: {'0_predictor_instruction': 2, '0_predictor_demos': 1, 'full_eval_program_path': None, 'full_eval_score': 90.0, 'total_eval_calls_so_far': 30, 'full_eval_program': generate_tweet = Predict(StringSignature(question, context -> reasoning, tweet_in_Chinese
    instructions='生成一条引人入胜的推文，有效地回答问题，忠实于上下文，少于 280 个字符，并且没有主题标签。'
    question = Field(annotation=str required=True json_schema_extra={'__dspy_field_type': 'input', 'prefix': 'Question:', 'desc': '${question}'})
    context = Field(annotation=str required=True json_schema_extra={'desc': '推特内容', '__dspy_field_type': 'input', 'prefix': 'Context:'})
    reasoning = Field(annotation=str required=True json_schema_extra={'prefix': "Reasoning: Let's think step by step in order to", 'desc': '${reasoning}', '__dspy_field_type': 'output'})
    tweet_in_Chinese = Field(annotation=str required=True json_schema_extra={'desc': '推特内容', '__dspy_field_type': 'output', 'prefix': 'Tweet In  Chinese:'})
))}


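The selected parameters (Instruction 2 with Few-Shot Set 1) performed best on the validation set, scoring 90.0; as the `instructions` field above shows, the instruction text is still the original prompt.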

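1. **Few-shot bootstrapping (Bootstrap)**
    
    - In the `_bootstrap_fewshot_examples` stage, the training set is sampled several times, and demos are generated by calling the Teacher (if one is provided) or the current `student` program directly. The result is several candidate few-shot sets (`demo_candidates`).
2. **Instruction generation (Instruction Proposer)**
    
    - `GroundedProposer` combines the dataset summary, the program summary, the few-shot examples from the previous step, and a randomly selected prompting tip to generate prompts, yielding a list of candidate instructions for each predictor.
3. **Bayesian optimization (Optuna)**
    
    - In `_optimize_prompt_parameters`, `instruction_candidates` and `demo_candidates` go through multiple rounds of sampling and trials:
        - Each trial picks one candidate combination, installs it in the predictor, and evaluates it on the valset (or a minibatch).
        - The TPE sampler uses the score distribution of past trials to move gradually toward high-scoring regions.
    - The final result is the highest-scoring prompt combination, `best_program`.
4. **Minibatch vs. full evaluation**
    
    - If `minibatch=True`, each trial runs inference on only `minibatch_size` examples (a quick evaluation); every `minibatch_full_eval_steps` trials, the combination with the highest average score over the recent trials is evaluated on the full set, and the true best score is updated.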
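
The trial loop and the minibatch/full-evaluation interplay described above can be sketched in plain Python. This is a simplified illustration, not MIPROv2's code: it samples combinations uniformly instead of using Optuna's TPE sampler, and `search`, `toy_score`, and all parameter defaults are hypothetical.

```python
import random
from collections import defaultdict
from itertools import product


def toy_score(combo, examples):
    """Deterministic stand-in for 'run the program and apply the metric'."""
    instruction_idx, demo_idx = combo
    period = instruction_idx + demo_idx + 2
    hits = [ex % period != 0 for ex in examples]
    return 100.0 * sum(hits) / len(hits)


def search(combos, valset, score_fn, num_trials=12, minibatch_size=3,
           full_eval_steps=4, seed=0):
    """Minibatch trials with periodic full evaluation.

    Assumes num_trials >= full_eval_steps so at least one full evaluation runs.
    """
    rng = random.Random(seed)
    mb_scores = defaultdict(list)  # combo -> minibatch scores seen so far
    full_scores = {}               # combo -> score on the full validation set
    for trial in range(1, num_trials + 1):
        combo = rng.choice(combos)                  # uniform sampling (stand-in for TPE)
        batch = rng.sample(valset, minibatch_size)  # cheap but noisy estimate
        mb_scores[combo].append(score_fn(combo, batch))
        if trial % full_eval_steps == 0:
            # Promote the best-on-average combo that has no full score yet.
            pending = [c for c in mb_scores if c not in full_scores]
            if pending:
                top = max(pending, key=lambda c: sum(mb_scores[c]) / len(mb_scores[c]))
                full_scores[top] = score_fn(top, valset)
    best = max(full_scores, key=full_scores.get)
    return best, full_scores[best], full_scores


combos = list(product(range(3), range(2)))  # 3 instructions x 2 few-shot sets
toy_valset = list(range(1, 11))
best, best_score, full_scores = search(combos, toy_valset, toy_score)
print(best, best_score, len(full_scores))
```

Only full-evaluation scores decide the winner here, which reflects why the minibatch scores are treated as a filter rather than as the final ranking.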
        

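**Q1: What if I only want to quickly test the MIPROv2 workflow on a very small dataset?**  
A: Besides using `auto="light"`, you can set `auto=None` and manually lower `num_trials`, the `valset` size, `max_bootstrapped_demos`, and so on, so that compilation finishes quickly. With `auto` set, manually passed values such as `num_trials` are overridden, as the log above shows.

**Q2: What if I don't want few-shot examples and only want zero-shot?**  
A: Set `max_bootstrapped_demos=0` and `max_labeled_demos=0`. `_bootstrap_fewshot_examples` then returns None, and the run becomes zero-shot prompt optimization.

**Q3: How do I use my own Teacher model to generate few-shot examples?**  
A: When initializing MIPROv2, pass the teacher's configuration through `teacher_settings`; then call `compile(..., teacher=my_teacher)`, and MIPROv2 uses the teacher in the `_bootstrap_fewshot_examples` stage to generate examples.

**Q4: How do I skip user confirmation entirely and run multiple experiments in batch?**  
A: Set `requires_permission_to_run=False`; no confirmation prompt appears and compilation starts immediately.

**Q5: What if I hit "Evaluation error... Exceeded max_errors" during a run?**  
A: Too many errors occurred during evaluation (model timeouts, crashes, and so on). Increase `max_errors` or handle inference-time exceptions more robustly.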


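1. **MIPROv2** provides an end-to-end automated prompt optimization workflow: few-shot generation, instruction proposal, multi-trial evaluation, and Bayesian optimization are all automated.
2. Parameters let you switch between zero-shot / few-shot, auto / manual hyperparameters, and minibatch / full evaluation to fit projects of different sizes and needs.
3. The resulting `best_program` carries detailed search logs alongside the program itself, which helps with later analysis and iteration.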

In [45]:
best_program.trial_logs

{-1: {'full_eval_program_path': None,
  'full_eval_score': 77.5,
  'total_eval_calls_so_far': 10,
  'full_eval_program': generate_tweet = Predict(StringSignature(question, context -> reasoning, tweet_in_Chinese
      instructions='生成一条引人入胜的推文，有效地回答问题，忠实于上下文，少于 280 个字符，并且没有主题标签。'
      question = Field(annotation=str required=True json_schema_extra={'__dspy_field_type': 'input', 'prefix': 'Question:', 'desc': '${question}'})
      context = Field(annotation=str required=True json_schema_extra={'desc': '推特内容', '__dspy_field_type': 'input', 'prefix': 'Context:'})
      reasoning = Field(annotation=str required=True json_schema_extra={'prefix': "Reasoning: Let's think step by step in order to", 'desc': '${reasoning}', '__dspy_field_type': 'output'})
      tweet_in_Chinese = Field(annotation=str required=True json_schema_extra={'desc': '推特内容', '__dspy_field_type': 'output', 'prefix': 'Tweet In  Chinese:'})
  ))},
 1: {'0_predictor_instruction': 1,
  '0_predictor_demos': 1,
  'full_eval_prog