In [1]:
import pandas as pd

# Display the complete contents of dataframe cells.
pd.set_option("display.max_colwidth", None)
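To show full cell contents for one block only, instead of for the whole session, pandas can scope the setting with `pd.option_context`. A minimal sketch; the one-row DataFrame is made-up sample data:

```python
import pandas as pd

# Made-up sample frame with one long text cell.
df = pd.DataFrame({"text": ["x" * 200]})

# Inside the block, cells print in full; the previous setting is restored on exit.
with pd.option_context("display.max_colwidth", None):
    print(df)
```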

In [2]:

import os
from getpass import getpass

import openai

if not (openai_api_key := os.getenv("OPENAI_API_KEY")):
    openai_api_key = getpass("🔑 Enter your OpenAI API key: ")
openai.api_key = openai_api_key
os.environ["OPENAI_API_KEY"] = openai_api_key
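The env-var-then-prompt pattern above can be wrapped in a small helper that is easy to reuse and test. A sketch only: `resolve_api_key` and its `prompt` parameter are hypothetical names introduced here, not part of the `openai` package.

```python
import os
from getpass import getpass
from typing import Callable, Optional


def resolve_api_key(
    env_var: str = "OPENAI_API_KEY",
    prompt: Callable[[str], str] = getpass,
) -> str:
    """Hypothetical helper: return the key from `env_var`, prompting only if it is unset or empty."""
    key: Optional[str] = os.getenv(env_var)
    if not key:
        # getpass is the default so the typed key is not echoed.
        key = prompt(f"🔑 Enter your {env_var}: ")
    # Export it so libraries that read the environment (e.g. LangChain) see it too.
    os.environ[env_var] = key
    return key
```

In the notebook this would be used as `openai.api_key = resolve_api_key()`.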

In [3]:

import phoenix as px
from llama_index.core import set_global_handler
from phoenix.trace.langchain import LangChainInstrumentor

session = px.launch_app()

# Set up instrumentation for both LlamaIndex and LangChain (used by Ragas)
set_global_handler("arize_phoenix")
LangChainInstrumentor().instrument()

🌍 To view the Phoenix app in your browser, visit http://localhost:6006/
📺 To view the Phoenix app in a notebook, run `px.active_session().view()`
📖 For more information on how to use Phoenix, check out https://docs.arize.com/phoenix


In [4]:
from llama_index.core import SimpleDirectoryReader

dir_path = "./data/prompt-engineering-papers"
reader = SimpleDirectoryReader(dir_path, num_files_limit=2)
documents = reader.load_data()

In [6]:
from phoenix.trace import using_project
from ragas.testset.evolutions import multi_context, reasoning, simple
from ragas.testset.generator import TestsetGenerator, RunConfig

TEST_SIZE = 10

# Generator and critic both use OpenAI models.
generator = TestsetGenerator.with_openai(
    generator_llm="gpt-3.5-turbo-0125",
    critic_llm="gpt-3.5-turbo-0125",
    embeddings="text-embedding-3-large",
)

# Question-type distribution (weights sum to 1.0).
distribution = {simple: 0.5, reasoning: 0.25, multi_context: 0.25}

# Run synchronously with a single worker and a longer back-off to stay under OpenAI rate limits.
# Spans from this run are recorded under the "ragas-testset" Phoenix project.
with using_project("ragas-testset"):
    testset = generator.generate_with_llamaindex_docs(
        documents,
        test_size=TEST_SIZE,
        distributions=distribution,
        run_config=RunConfig(
            timeout=60,
            max_retries=10,
            max_wait=180,  # default: 60
            max_workers=1,  # default: 16
        ),
        raise_exceptions=False,
        is_async=False,
    )
test_df = testset.to_pandas()
test_df.head()


Filename and doc_id are the same for all nodes.



max retries exceeded for ReasoningEvolution(generator_llm=LangchainLLMWrapper(run_config=RunConfig(timeout=60, max_retries=10, max_wait=180, max_workers=1, exception_types=<class 'openai.RateLimitError'>)), docstore=InMemoryDocumentStore(splitter=<langchain_text_splitters.base.TokenTextSplitter object at 0x7fb70fd61240>, nodes=[...]) ...)
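The `RunConfig(max_retries=10, max_wait=180)` passed above controls how API calls are retried when they fail with the configured `exception_types` (here `openai.RateLimitError`, as the log shows). To make those two knobs concrete, here is a stdlib sketch of capped exponential backoff with full jitter. It illustrates the general technique under the assumption that ragas does something similar; it is not ragas's retry code, and `RateLimited`, `backoff_delays`, and `call_with_backoff` are hypothetical names.

```python
import random
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical stand-in for openai.RateLimitError."""


def backoff_delays(max_retries: int, max_wait: float, base: float = 1.0) -> List[float]:
    """Upper bound of each wait between attempts: base * 2**i, capped at max_wait."""
    return [min(max_wait, base * 2**i) for i in range(max_retries - 1)]


def call_with_backoff(
    fn: Callable[[], T],
    max_retries: int = 10,
    max_wait: float = 180.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn up to max_retries times, sleeping a random time in [0, cap] after each RateLimited."""
    for cap in backoff_delays(max_retries, max_wait):
        try:
            return fn()
        except RateLimited:
            # Full jitter spreads retries out so parallel workers do not retry in lockstep.
            sleep(random.uniform(0, cap))
    # Final attempt: any exception now propagates to the caller.
    return fn()
```

With `max_workers=1`, only one request is in flight at a time, so a long `max_wait` mainly trades run time for fewer rejected calls.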

,question,contexts,ground_truth,evolution_type,metadata,episode_done
0,How is a scoring LM used in the two-stage retrieval method proposed by Rubin et al. for selecting demonstrations?,"[−P(y|x)for all (x, y)pairs in a\nvalidation set with a diversity regularization.\nSupervised Method Rubin et al. (2022) pro-\nposed a two-stage retrieval method to select demon-\nstrations. For a specific input, it first built an un-\nsupervised retriever (e.g., BM25) to recall simi-\nlar examples as candidates and then built a su-\npervised retriever EPR to select demonstrations\nfrom candidates. A scoring LM is used to eval-\nuate the concatenation of each candidate exam-\nple and the input. Candidates with high scores\nare labeled as positive examples, and candidates\nwith low scores are hard negative examples. Li]","A scoring LM is used to evaluate the concatenation of each candidate example and the input. Candidates with high scores are labeled as positive examples, and candidates with low scores are hard negative examples.",simple,"[{'page_label': '4', 'file_name': '2301.00234v3.A_Survey_on_In_context_Learning.pdf', 'file_path': '/home/peter-legion-wsl2/peter-projects/regen-ai/nbs/data/prompt-engineering-papers/2301.00234v3.A_Survey_on_In_context_Learning.pdf', 'file_type': 'application/pdf', 'file_size': 4898135, 'creation_date': '2024-04-13', 'last_modified_date': '2024-04-13'}]",True
1,What is discussed in the survey on Geometric Random Walks by Santosh S. Vempala?,"[20 CAPRICE STANLEY AND TOBIAS WINDISCH\n23. Alistair Sinclair, Improved Bounds for Mixing Rates of Markov Chains and Multic ommodity Flow , Com-\nbinatorics, Probability and Computing 1(1992), no. 4, 351–370.\n24. Bernd Sturmfels, Gr¨ obner bases and convex polytopes , American Mathematical Society, Providence, R.I.,\n1996.\n25. Seth Sullivant, Markov bases of binary graph models , Annals of Combinatorics 7(2003), 441–466.\n26. Santosh S. Vempala, Geometric Random Walks: A Survey , MSRI Combinatorial and Computational\nGeometry 52(2005), 573–612.\n27. Tobias Windisch, Rapid mixing and Markov bases , preprint, arXiv:1505.03018 (2015), 1–18.\nNC State University, Raleigh, NC 27695, USA\nE-mail address :crstanl2@ncsu.edu\nOtto-von-Guericke Universit ¨at, Magdeburg, Germany\nE-mail address :windisch@ovgu.de]",The survey on Geometric Random Walks by Santosh S. Vempala discusses Rapid mixing,simple,"[{'page_label': '20', 'file_name': '1605.08386v1.Heat_bath_random_walks_with_Markov_bases.pdf', 'file_path': '/home/peter-legion-wsl2/peter-projects/regen-ai/nbs/data/prompt-engineering-papers/1605.08386v1.Heat_bath_random_walks_with_Markov_bases.pdf', 'file_type': 'application/pdf', 'file_size': 289178, 'creation_date': '2024-04-13', 'last_modified_date': '2024-04-13'}]",True
2,What is the role of memory-of-thoughts in enabling ChatGPT to self-improve?,"[Shuai Li, Zhao Song, Yu Xia, Tong Yu, and Tianyi\nZhou. 2023e. The closeness of in-context learn-\ning and weight shifting for softmax regression.\nCoRR , abs/2304.13276.\nXiaonan Li, Kai Lv, Hang Yan, Tianyang Lin, Wei\nZhu, Yuan Ni, Guotong Xie, Xiaoling Wang,\nand Xipeng Qiu. 2023f. Unified demonstra-\ntion retriever for in-context learning. CoRR ,\nabs/2305.04320.\nXiaonan Li and Xipeng Qiu. 2023a. Finding sup-\nporting examples for in-context learning. arXiv\npreprint arXiv:2302.13539 .\nXiaonan Li and Xipeng Qiu. 2023b. Mot: Pre-\nthinking and recalling enable chatgpt to self-\nimprove with memory-of-thoughts. CoRR ,\nabs/2305.05181.\nYingcong Li, M. Emrullah Ildiz, Dimitris S. Papail-\niopoulos, and Samet Oymak. 2023g. Transform-\ners as algorithms: Generalization and implicit\nmodel selection in in-context learning. CoRR ,\nabs/2301.07067.\nHaotian Liu, Chunyuan Li, Qingyang Wu, and\nYong Jae Lee. 2023. Visual instruction tuning.\narXiv preprint arXiv:2304.08485 .\nJiachang Liu, Dinghan Shen, Yizhe Zhang, Bill\nDolan, Lawrence Carin, and Weizhu Chen. 2022.\nWhat makes good in-context examples for GPT-\n3? In Proceedings of Deep Learning Inside Out\n(DeeLIO 2022): The 3rd Workshop on Knowl-\nedge Extraction and Integration for Deep Learn-\ning Architectures , pages 100–114, Dublin, Ire-\nland and Online. Association for Computational\nLinguistics.\nPengfei Liu, Weizhe Yuan, Jinlan Fu, Zheng-\nbao Jiang, Hiroaki Hayashi, and Graham Neu-\nbig. 2021. Pre-train, prompt, and predict: A\nsystematic survey of prompting methods in\nnatural language processing. arXiv preprint\narXiv:2107.13586 .\nYao Lu, Max Bartolo, Alastair Moore, Sebastian\nRiedel, and Pontus Stenetorp. 2022. Fantasti-\ncally ordered prompts and where to find them:\nOvercoming few-shot prompt order sensitivity.\nInProc. of ACL , pages 8086–8098, Dublin, Ire-\nland. 
Association for Computational Linguistics.Lucie Charlotte Magister, Jonathan Mallinson,\nJakub Adamek, Eric Malmi, and Aliaksei Sev-\neryn. 2022. Teaching small language models to\nreason. ArXiv preprint , abs/2212.08410.\nNicholas Meade, Spandana Gella, Devamanyu\nHazarika, Prakhar Gupta, Di Jin, Siva Reddy,\nYang Liu, and Dilek Hakkani-Tür. 2023. Using\nin-context learning to improve dialogue safety.\narXiv preprint arXiv:2302.00871 .\nSewon Min, Mike Lewis, Hannaneh Hajishirzi, and\nLuke Zettlemoyer. 2022a. Noisy channel lan-\nguage model prompting for few-shot text clas-\nsification. In Proc. of ACL , pages 5316–5330,\nDublin, Ireland. Association for Computational\nLinguistics.\nSewon Min, Mike Lewis, Luke Zettlemoyer, and\nHannaneh Hajishirzi. 2022b. MetaICL: Learn-\ning to learn in context. In Proceedings of the\n2022 Conference of the North American Chap-\nter of the Association for Computational Lin-\nguistics: Human Language Technologies , pages\n2791–2809, Seattle, United States. Association\nfor Computational Linguistics.\nSewon Min, Xinxi Lyu, Ari Holtzman, Mikel\nArtetxe, Mike Lewis, Hannaneh Hajishirzi, and\nLuke Zettlemoyer. 2022c. Rethinking the role of\ndemonstrations: What makes in-context learning\nwork? ArXiv preprint , abs/2202.12837.\nSwaroop Mishra, Daniel Khashabi, Chitta Baral,\nand Hannaneh Hajish]",,simple,"[{'page_label': '16', 'file_name': '2301.00234v3.A_Survey_on_In_context_Learning.pdf', 'file_path': '/home/peter-legion-wsl2/peter-projects/regen-ai/nbs/data/prompt-engineering-papers/2301.00234v3.A_Survey_on_In_context_Learning.pdf', 'file_type': 'application/pdf', 'file_size': 4898135, 'creation_date': '2024-04-13', 'last_modified_date': '2024-04-13'}]",True
3,"How do models like Frozen, Flamingo, and METALM demonstrate the capability of few-shot learning in the context of vision-language tasks?","[ improving the results.\n9.2 Multi-Modal In-Context Learning\nIn the vision-language area, Tsimpoukelli et al.\n(2021) utilize a vision encoder to represent an im-\nage as a prefix embedding sequence that is aligned\nwith a frozen language model after training on the\npaired image-caption dataset. The resulting model,\nFrozen, is capable of performing multi-modal few-\nshot learning. Further, Alayrac et al. (2022) in-\ntroduce Flamingo, which combines a vision en-\ncoder with LLMs and adopts LLMs as the general\ninterface to perform in-context learning on many\nmulti-modal tasks. They show that training on\nlarge-scale multi-modal web corpora with arbitrar-\nily interleaved text and images is key to endowing\nthem with in-context few-shot learning capabili-\nties. Kosmos-1 (Huang et al., 2023b) is another\nmulti-modal LLMs and demonstrates promising\nzero-shot, few-shot, and even multimodal chain-\nof-thought prompting abilities. Hao et al. (2022a)\npresent METALM, a general-purpose interface to\nmodels across tasks and modalities. 
With a semi-\ncausal language modeling objective, METALM is\npretrained and exhibits strong ICL performance\nacross various vision-language tasks.\nIt is natural to further enhance the ICL ability]","Models like Frozen, Flamingo, and METALM demonstrate the capability of few-shot learning in vision-language tasks by utilizing vision encoders aligned with language models, combining vision encoders with large language models as a general interface for in-context learning, and presenting a general-purpose interface to models across tasks and modalities with strong ICL performance, respectively.",simple,"[{'page_label': '10', 'file_name': '2301.00234v3.A_Survey_on_In_context_Learning.pdf', 'file_path': '/home/peter-legion-wsl2/peter-projects/regen-ai/nbs/data/prompt-engineering-papers/2301.00234v3.A_Survey_on_In_context_Learning.pdf', 'file_type': 'application/pdf', 'file_size': 4898135, 'creation_date': '2024-04-13', 'last_modified_date': '2024-04-13'}]",True
4,How are heat-bath random walks related to Markov bases in the context of finite subsets of Zd?,"[HEAT-BATH RANDOM WALKS WITH MARKOV BASES 5\nDeﬁnition 3.3. LetPbe a collection of ﬁnite subsets of Zd. A ﬁnite set M ⊂Zdis\nnorm-like forPif there exists a constant C∈Nsuch that for all F ∈ Pand allu,v∈ F,\ndistF(M)(u,v)≤C·∥u−v∥. The set Mis∥·∥-norm-reducing forPif for all F ∈ Pand all\nu,v∈ Fthere exists m∈ Msuch that u+m∈ Fand∥u+m−v∥<∥u−v∥.\nThe property of being norm-like does not depend on the norm, w hereas being norm-\nreducing does. Norm-reducing sets are always norm-like, an d norm-like sets are in turn\nalways Markov bases, but the reverse of both statements is fa lse in general (Example 3.4and\nExample 3.5). For collections PAhowever, every Markov basis is norm-like (Proposition 3.7).\nExample 3.4. For anyn∈N, consider the normal set Fn:= ([2]×[n]×{0})∪{(2,n,1)}with\nthe Markov basis {(0,1,0),(0,0,1),(−1,0,−1)}. The distance between (1 ,1,0) and (2 ,1,0)\ninFn(M) is 2nand thus Mis not norm-like for {Fn:n∈N}(see also Figure 2).\nExample 3.5. Letd∈Nand consider A:= (1,...,1)∈Z1×d, then the set M:={e1−ei:\n2≤i≤d}is a Markov basis for the collection PA. However, Mis not∥·∥p-norm-reducing for\nanyd≥3 and any p∈[1,∞]. For instance, consider e2ande3inFA,1(M). The only move\nfromMthatcanbeappliedon e2ise1−e2, but∥(e2+e1−e2)−e3)∥p=∥e2−e3∥p. Ontheother\nhand, in the case we cannot ﬁnd a move that decreases the 1-nor m of two nodes u,v∈ FA,b\nby 1, we can ﬁnd instead two moves m1,m2∈ Msuch that u+m1,u+m1+m2∈ FA,band\n∥u+m1+m2−v∥=∥u−v∥−2. Thus, the graph-distance of any two elements uandvin\nFA,b(M) is at most ∥u−v∥1and hence Mis norm-like for PA.\nFigure 2. The graph from Example 3.4\nRemark 3.6. 
LetPbe a collection of ﬁnite subsets of ZdandM ⊂Zdbe norm-like for P.\nIt follows from the deﬁnition that there exists a constant C∈Q≥0such that for all F ∈ P\ndiam(F(M))≤C·max{∥u−v∥:u,v∈ F}.\nTheproofofournextresultsusesthe Graver basis GA⊂Zdforaninteger matrix A∈Zm×d\nwith ker Z(A)∩Nd={0}. We refer to [4, Chapter 3] for a precise deﬁnition.\nProposition 3.7. LetA∈Zm×dwithkerZ(A)∩Nd={0}andM ⊂kerZ(A)be a Markov\nbasis ofPA. ThenMis norm-like for PA.\nProof.LetMbe a Markov basis for PA. The Graver basis GAforAis a ﬁnite set which\nis∥ · ∥1-norm-reducing for PA. Thus, deﬁne C:= max g∈GAdiam(FA,Ag+(M)). Now, pick\nu,v∈ FA,barbitrarily and let u=v+∑r\ni=1gibe a walk from utovinFA]","Heat-bath random walks are related to Markov bases in the context of finite subsets of Zd through the property of being norm-like and norm-reducing. Norm-reducing sets are always norm-like, and norm-like sets are Markov bases. However, the reverse is not always true. Every Markov basis for a collection PA is norm-like.",simple,"[{'page_label': '5', 'file_name': '1605.08386v1.Heat_bath_random_walks_with_Markov_bases.pdf', 'file_path': '/home/peter-legion-wsl2/peter-projects/regen-ai/nbs/data/prompt-engineering-papers/1605.08386v1.Heat_bath_random_walks_with_Markov_bases.pdf', 'file_type': 'application/pdf', 'file_size': 289178, 'creation_date': '2024-04-13', 'last_modified_date': '2024-04-13'}]",True
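Generated test sets can contain unusable rows: row 2 above, for instance, has an empty `ground_truth`. A small pandas sketch that drops such rows and tallies `evolution_type` before the set is used for evaluation. The column names match the `testset.to_pandas()` output above; the rule that `question` and `ground_truth` must be non-empty is an assumption about what counts as usable, and `drop_incomplete_rows` is a hypothetical helper.

```python
from typing import Sequence

import pandas as pd


def drop_incomplete_rows(
    df: pd.DataFrame, required: Sequence[str] = ("question", "ground_truth")
) -> pd.DataFrame:
    """Keep only rows whose required text columns are non-empty (NaN/None count as empty)."""
    mask = pd.Series(True, index=df.index)
    for col in required:
        mask &= df[col].fillna("").astype(str).str.strip() != ""
    return df[mask]


# Usage with the generated set:
# clean_df = drop_incomplete_rows(test_df)
# print(clean_df["evolution_type"].value_counts())
```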


In [7]:
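# Spans were logged under "ragas-testset"; get_trace_dataset() returns None when the requested project has no spans.
trace_ds = px.Client().get_trace_dataset(project_name="ragas-testset")
my_traces = trace_ds.save(directory="./data")  # save() returns the dataset's UUID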

In [None]:
my_traces.hex