## Build an entire pipeline/package?
* Builds RAG application
    * Have dataset ready to create vector index (this is situational based on application)
    * Use a fraction of the same dataset to create a synthetic dataset using RAGAS
    * Runs the RAG application and produces an 'answer' column
    * Runs RAGAS and evaluates based on specified metric
    * Gives summary
* Update this to try out different RAG formulations and compare their evaluations

### Imports and API Keys

In [36]:
import pandas as pd
import os

from dotenv import load_dotenv, find_dotenv

import warnings
warnings.filterwarnings('ignore')

from langchain import hub
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.chains import ConversationalRetrievalChain
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser
from langchain.document_loaders import DataFrameLoader
from langchain.vectorstores import Chroma

from ragas.testset.generator import TestsetGenerator
from ragas.testset.evolutions import simple, reasoning, multi_context
from ragas import evaluate
from ragas.metrics import (
    faithfulness,
    answer_relevancy,
    context_recall,
    context_precision,
)

from datasets import Dataset

**Note** How do we use LangChain tracing (LangSmith) effectively?

In [3]:
_ = load_dotenv(find_dotenv())

os.environ['LANGCHAIN_TRACING_V2'] = 'true'
os.environ['LANGCHAIN_ENDPOINT'] = 'https://api.smith.langchain.com'

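# load_dotenv() above has already loaded these keys from the .env file.
# The self-assignments change nothing; they raise a KeyError early if a key is missing.
os.environ['LANGCHAIN_API_KEY'] = os.environ['LANGCHAIN_API_KEY']
os.environ['OPENAI_API_KEY'] = os.environ['OPENAI_API_KEY']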
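
A bare `KeyError` does not say where the key should come from. As a minimal sketch (the helper name `require_env` is hypothetical and not part of this notebook), a small function can fail with a clearer message:

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or fail with a clear message.

    Hypothetical helper, not part of the original notebook.
    """
    value = os.environ.get(name)
    if not value:
        # Treat both "not set" and "set to an empty string" as missing
        raise RuntimeError(
            f"Missing environment variable {name!r}; add it to your .env file"
        )
    return value
```

In this notebook it could be called once per key after `load_dotenv`, for example `require_env('OPENAI_API_KEY')`.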

### Read in the dataset
**Note**: A proper package would need more here, e.g. connectors for different data sources.

In [4]:
### Read in dataset (all ~66k cosmology paper abstracts and titles)

df_cosmo = pd.read_csv('astro_rag/arxiv_astro-ph_data_cosmo.csv')    
df_cosmo.head()

Unnamed: 0,id,title,abstract,categories,cat_text,prepared_text
0,705.2176,Gravitational particle production in braneworl...,Gravitational particle production in time vari...,hep-ph astro-ph.CO gr-qc,"High Energy Physics - Phenomenology, Cosmology...",Gravitational particle production in braneworl...
1,705.2299,Time evolution of T_{\mu\nu} and the cosmologi...,We study the cosmic time evolution of an effec...,hep-ph astro-ph.CO gr-qc,"High Energy Physics - Phenomenology, Cosmology...",Time evolution of T_{\mu\nu} and the cosmologi...
2,705.3289,Helium abundance in galaxy clusters and Sunyae...,It has long been suggested that helium nuclei ...,astro-ph astro-ph.CO astro-ph.HE astro-ph.IM,"Astrophysics, Cosmology and Nongalactic Astrop...",Helium abundance in galaxy clusters and Sunyae...
3,705.4139,Our Peculiar Motion Away from the Local Void,The peculiar velocity of the Local Group of ga...,astro-ph astro-ph.CO,"Astrophysics, Cosmology and Nongalactic Astrop...",Our Peculiar Motion Away from the Local Void \...
4,707.1351,Inverse approach to Einstein's equations for f...,We expand previous work on an inverse approach...,gr-qc astro-ph.CO,"General Relativity and Quantum Cosmology, Cosm...",Inverse approach to Einstein's equations for f...


In [5]:
df_cosmo.shape

(66103, 6)

To cut down the run time for this example, keep only the papers whose sole category is 'astro-ph.CO' ('Cosmology and Nongalactic Astrophysics'), which excludes all cross-listed papers.

In [6]:
df_cosmo = df_cosmo.loc[df_cosmo['categories']=='astro-ph.CO']
df_cosmo.reset_index(inplace=True, drop=True)

df_cosmo.head()

Unnamed: 0,id,title,abstract,categories,cat_text,prepared_text
0,901.0173,Non-Minimal Quintessence With Nearly Flat Pote...,We consider Brans-Dicke type nonminimally coup...,astro-ph.CO,Cosmology and Nongalactic Astrophysics,Non-Minimal Quintessence With Nearly Flat Pote...
1,901.0189,Robust determination of the major merger fract...,(Abridged) We measure the fraction of galaxies...,astro-ph.CO,Cosmology and Nongalactic Astrophysics,Robust determination of the major merger fract...
2,901.0245,"Neutrino Masses, Dark Energy and the Gravitati...",We study the constraints which the next genera...,astro-ph.CO,Cosmology and Nongalactic Astrophysics,"Neutrino Masses, Dark Energy and the Gravitati..."
3,901.0285,Impact of Instrumental Systematic Contaminatio...,"In this paper, we study the effects of instrum...",astro-ph.CO,Cosmology and Nongalactic Astrophysics,Impact of Instrumental Systematic Contaminatio...
4,901.0286,Tracing the Reionization-Epoch Intergalactic M...,IGM metal absorption lines observed in z>6 spe...,astro-ph.CO,Cosmology and Nongalactic Astrophysics,Tracing the Reionization-Epoch Intergalactic M...


In [7]:
df_cosmo.shape 

(21674, 6)

We now have ~22k rows in the dataset. Let's build both the vector index (using Chroma) and the synthetic dataset (using RAGAS).

### Create vector index

The embedding model used here is not among the highly ranked models on the MTEB leaderboard (https://huggingface.co/spaces/mteb/leaderboard); it is used only for prototyping.

In [8]:
# Get the embedding model

model_name = "sentence-transformers/all-MiniLM-l6-v2" 
model_kwargs = {"device": "cpu"} # Since we are running on local machine, we will use CPU

embeddings = HuggingFaceEmbeddings(model_name=model_name, model_kwargs=model_kwargs)


In [9]:
# Create a DataFrameLoader
loader = DataFrameLoader(df_cosmo, page_content_column='prepared_text')
arxiv_documents = loader.load()

arxiv_documents[0]

Document(page_content="Non-Minimal Quintessence With Nearly Flat Potential \n We consider Brans-Dicke type nonminimally coupled scalar field as a candidate for dark energy. In the conformally transformed Einstein's frame, our model is similar to {\\it coupled quintessence} model. In such models, we consider potentials for the scalar field which satisfy the slow-roll conditions: $[(1/V)(dV/d\\phi)]^2 << 1$ and $(1/V)(d^2V/d\\phi^2) << 1$. For such potentials, we show that the equation of state for the scalar field can be described by a universal behaviour, provided the scalar field rolls only in the flat part of the potentials where the slow-roll conditions are satisfied. Our work generalizes the previous work by Scherrer and Sen \\cite{scherrer} for minimally coupled scalar field case. We have also studied the observational constraints on the model parameters considering the Supernova and BAO observational data.", metadata={'id': '0901.0173', 'title': 'Non-Minimal Quintessence With Nea

**Note** We can implement more refined chunking strategies here. Chunking is one more knob to vary when evaluating the RAG application, i.e.
* Keep the vector indexing strategy the same, change the prompting strategy -> evaluate
* Or change the indexing strategy, keep the prompting the same -> evaluate

In [10]:
### Split the documents into smaller chunks

splitter = RecursiveCharacterTextSplitter(chunk_size=200, chunk_overlap=20) 
# Keeping this small initially, since these are just abstracts, not full paper text

chunked_docs = splitter.split_documents(arxiv_documents)
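
To make `chunk_size` and `chunk_overlap` concrete, here is a simplified fixed-window sketch. It is not the algorithm `RecursiveCharacterTextSplitter` uses (that one prefers to split on separators such as newlines and spaces); it only shows how the two parameters interact. The function name `chunk_text` is hypothetical.

```python
def chunk_text(text: str, chunk_size: int = 200, chunk_overlap: int = 20) -> list[str]:
    """Split text into fixed-size character windows that overlap.

    Simplified illustration only; hypothetical helper, not a LangChain API.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks
```

With the defaults, each window starts 180 characters after the previous one, so consecutive chunks share 20 characters.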

**Note** Parametrize this flow as much as possible: `persist_directory`, chunking parameters, etc.
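
One possible way to do this is to collect the hard-coded values into a single configuration object. This is only a sketch: the class name `RagPipelineConfig`, the helper `changed_fields`, and the variant's directory name are hypothetical, while the default values are the ones used in this notebook.

```python
from dataclasses import asdict, dataclass, replace


@dataclass(frozen=True)
class RagPipelineConfig:
    """Hypothetical container for the values hard-coded in this notebook."""
    embedding_model: str = "sentence-transformers/all-MiniLM-l6-v2"
    device: str = "cpu"
    chunk_size: int = 200
    chunk_overlap: int = 20
    persist_directory: str = "./arxiv_cosmo_chroma_db_22k"
    llm_model: str = "gpt-3.5-turbo"
    temperature: float = 0.0
    sample_size: int = 1000
    test_size: int = 25
    random_state: int = 42


def changed_fields(a: RagPipelineConfig, b: RagPipelineConfig) -> dict:
    """Return {field: (value_in_a, value_in_b)} for every field that differs."""
    da, db = asdict(a), asdict(b)
    return {key: (da[key], db[key]) for key in da if da[key] != db[key]}


baseline = RagPipelineConfig()
# A variant that changes only the chunking (directory name is made up for illustration)
variant = replace(
    baseline,
    chunk_size=500,
    chunk_overlap=50,
    persist_directory="./arxiv_cosmo_chroma_db_22k_chunk500",
)
print(changed_fields(baseline, variant))
```

Recording the changed fields alongside each evaluation run makes it easier to say which change produced which difference in the metrics.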

In [11]:
### Create the vectordb and persist it 
# Takes < 15 mins to run on a Macbook M2 Pro 2023

# vectordb = Chroma.from_documents(documents=chunked_docs, embedding=embeddings, persist_directory="arxiv_cosmo_chroma_db_22k")
# vectordb.persist()

In [12]:
# If you want to load the persisted vectordb

vectordb = Chroma(persist_directory='./arxiv_cosmo_chroma_db_22k', embedding_function=embeddings)
retriever = vectordb.as_retriever()

### Create synthetic dataset using RAGAS

In [13]:
df_sample = df_cosmo.sample(1000, random_state=42) # 1k rows at random from original 22k dataset
df_sample.reset_index(drop=True, inplace=True)
df_sample.head()

Unnamed: 0,id,title,abstract,categories,cat_text,prepared_text
0,1503.06036,Redshift-space equal-time angular-averaged con...,We present the redshift-space generalization o...,astro-ph.CO,Cosmology and Nongalactic Astrophysics,Redshift-space equal-time angular-averaged con...
1,1910.04171,Weak Lensing Minima and Peaks: Cosmological Co...,We present a novel statistic to extract cosmol...,astro-ph.CO,Cosmology and Nongalactic Astrophysics,Weak Lensing Minima and Peaks: Cosmological Co...
2,1912.06601,Lensing-like tensions in the Planck legacy rel...,We analyze the final release of the Planck sat...,astro-ph.CO,Cosmology and Nongalactic Astrophysics,Lensing-like tensions in the Planck legacy rel...
3,1403.1089,A comparison of CMB Angular Power Spectrum Est...,In the context of cosmic microwave background ...,astro-ph.CO,Cosmology and Nongalactic Astrophysics,A comparison of CMB Angular Power Spectrum Est...
4,1102.2234,"Gas inflows, star formation and metallicity ev...",It has been known since many decades that gala...,astro-ph.CO,Cosmology and Nongalactic Astrophysics,"Gas inflows, star formation and metallicity ev..."


In [14]:
loader = DataFrameLoader(df_sample, page_content_column='prepared_text')
documents_for_synthesis = loader.load()

**Note** Many things here that can/should be experimented with:
* Generator and Critic LLMs used
* Embeddings
* Distributions (for different use-cases)
* Compare with another generation method (such as Bonito)

In [15]:
# generator with openai models
generator_llm = ChatOpenAI(model="gpt-3.5-turbo-16k")
critic_llm = ChatOpenAI(model="gpt-4")
embeddings = OpenAIEmbeddings()

generator = TestsetGenerator.from_langchain(
    generator_llm,
    critic_llm,
    embeddings
)

# Generate the testset (cost about $5 to generate).
# Commented out to prevent running it accidentally.
# testset = generator.generate_with_langchain_docs(
#     documents_for_synthesis,
#     test_size=25,
#     distributions={simple: 0.5, reasoning: 0.25, multi_context: 0.25},
# )

In [16]:
# testset.test_data[0]

In [17]:
# df_test = testset.to_pandas()
# df_test

In [18]:
# testset.save('cosmo_ragas_testset_25.json')
# df_test.to_csv('cosmo_ragas_testset_25.csv', index=False) # Save for future use

In [19]:
df_test = pd.read_csv('cosmo_ragas_testset_25.csv')
df_test

Unnamed: 0,question,contexts,ground_truth,evolution_type,metadata,episode_done
0,What is the potential of future SKA-era PTAs i...,['Prospects for Constraining interacting dark ...,The future SKA-era PTAs have the potential to ...,simple,"[{'id': '2210.04000', 'title': 'Prospects for ...",True
1,What are the potential constraints on the natu...,['Strong Lensing Time Delay Constraints on Dar...,,simple,"[{'id': '1910.03566', 'title': 'Strong Lensing...",True
2,How is a temperature map used to derive an ang...,['New evidence for lack of CMB power on large ...,A temperature map is used to derive an angular...,simple,"[{'id': '0911.4063', 'title': 'New evidence fo...",True
3,What are the star formation properties in the ...,['Star Formation Properties in Barred Galaxies...,Under the effects of both a stellar bar and a ...,simple,"[{'id': '1107.0187', 'title': 'Star Formation ...",True
4,What is the predicted red excess in galaxy col...,['Groups of two galaxies in SDSS: implications...,"0.15,$\pm$,0.01 and 0.14,$\pm$,0.01",simple,"[{'id': '1301.5870', 'title': 'Groups of two g...",True
5,How can upcoming photometric large scale struc...,['Optimising cosmic shear surveys to measure m...,We consider how upcoming photometric large sca...,simple,"[{'id': '1109.4536', 'title': 'Optimising cosm...",True
6,What are the cosmological constraints obtained...,['Weak lensing from space: first cosmological ...,The cosmological constraints obtained from the...,simple,"[{'id': '1005.4941', 'title': 'Weak lensing fr...",True
7,What is the average fraction of cold gas relat...,"[""High molecular gas fractions in normal massi...",The average fraction of cold gas relative to t...,simple,"[{'id': '1002.2149', 'title': 'High molecular ...",True
8,What is the relationship between rotation curv...,['Halo Gas and Galaxy Disk Kinematics Derived ...,The majority of the absorption velocities of M...,simple,"[{'id': '0912.2746', 'title': 'Halo Gas and Ga...",True
9,What is the focus of the XMM-Newton Wide-Field...,['The XMM-Newton Wide-Field Survey in the COSM...,The focus of the XMM-Newton Wide-Field Survey ...,simple,"[{'id': '1004.2790', 'title': 'The XMM-Newton ...",True


In [20]:
df_test.head()

Unnamed: 0,question,contexts,ground_truth,evolution_type,metadata,episode_done
0,What is the potential of future SKA-era PTAs i...,['Prospects for Constraining interacting dark ...,The future SKA-era PTAs have the potential to ...,simple,"[{'id': '2210.04000', 'title': 'Prospects for ...",True
1,What are the potential constraints on the natu...,['Strong Lensing Time Delay Constraints on Dar...,,simple,"[{'id': '1910.03566', 'title': 'Strong Lensing...",True
2,How is a temperature map used to derive an ang...,['New evidence for lack of CMB power on large ...,A temperature map is used to derive an angular...,simple,"[{'id': '0911.4063', 'title': 'New evidence fo...",True
3,What are the star formation properties in the ...,['Star Formation Properties in Barred Galaxies...,Under the effects of both a stellar bar and a ...,simple,"[{'id': '1107.0187', 'title': 'Star Formation ...",True
4,What is the predicted red excess in galaxy col...,['Groups of two galaxies in SDSS: implications...,"0.15,$\pm$,0.01 and 0.14,$\pm$,0.01",simple,"[{'id': '1301.5870', 'title': 'Groups of two g...",True


**Note** Some ground truths are NaN (why?); remove these rows before proceeding.

### Create the RAG Application

In [23]:
model_name = "sentence-transformers/all-MiniLM-l6-v2"
model_kwargs = {"device": "cpu"} # Since we are running on local machine, we will use CPU

embeddings = HuggingFaceEmbeddings(model_name=model_name, model_kwargs=model_kwargs)

vectordb = Chroma(persist_directory='./arxiv_cosmo_chroma_db_22k', embedding_function=embeddings)
retriever = vectordb.as_retriever()

llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)

In [24]:
def get_rag_response(question):

    # Define the RAG template and chain
    # Adjacent string literals are concatenated, so no stray quote characters end up in the prompt
    template = (
        "You are an assistant for question-answering tasks. "
        "Use the following pieces of retrieved context to answer the question. "
        "If you don't know the answer, just say that you don't know. "
        "Use three sentences maximum and keep the answer concise."
        "\nQuestion: {question} \nContext: {context} \nAnswer:"
    )

    # The prompt above is the same as the one we get from hub.pull("rlm/rag-prompt")

    prompt = ChatPromptTemplate.from_template(template)

    chain = (
        {"context": retriever, "question": RunnablePassthrough()}
        | prompt
        | llm
        | StrOutputParser()
    )

    return chain.invoke(question)

In [25]:
get_rag_response("What is a Galaxy Cluster?")

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


'A galaxy cluster is a large group of galaxies held together by gravity. They are the largest known gravitationally bound structures in the universe. Galaxy clusters can contain hundreds to thousands of galaxies.'

### Run the RAG application to get responses and append to the synthetic dataset

Reference: https://towardsdatascience.com/evaluating-rag-applications-with-ragas-81d67b0ee31a

In [43]:
df_test.dropna(subset=['ground_truth'], inplace=True)
df_test.reset_index(drop=True, inplace=True)
df_test.shape

(21, 6)

In [44]:
df_test

Unnamed: 0,question,contexts,ground_truth,evolution_type,metadata,episode_done
0,What is the potential of future SKA-era PTAs i...,['Prospects for Constraining interacting dark ...,The future SKA-era PTAs have the potential to ...,simple,"[{'id': '2210.04000', 'title': 'Prospects for ...",True
1,How is a temperature map used to derive an ang...,['New evidence for lack of CMB power on large ...,A temperature map is used to derive an angular...,simple,"[{'id': '0911.4063', 'title': 'New evidence fo...",True
2,What are the star formation properties in the ...,['Star Formation Properties in Barred Galaxies...,Under the effects of both a stellar bar and a ...,simple,"[{'id': '1107.0187', 'title': 'Star Formation ...",True
3,What is the predicted red excess in galaxy col...,['Groups of two galaxies in SDSS: implications...,"0.15,$\pm$,0.01 and 0.14,$\pm$,0.01",simple,"[{'id': '1301.5870', 'title': 'Groups of two g...",True
4,How can upcoming photometric large scale struc...,['Optimising cosmic shear surveys to measure m...,We consider how upcoming photometric large sca...,simple,"[{'id': '1109.4536', 'title': 'Optimising cosm...",True
5,What are the cosmological constraints obtained...,['Weak lensing from space: first cosmological ...,The cosmological constraints obtained from the...,simple,"[{'id': '1005.4941', 'title': 'Weak lensing fr...",True
6,What is the average fraction of cold gas relat...,"[""High molecular gas fractions in normal massi...",The average fraction of cold gas relative to t...,simple,"[{'id': '1002.2149', 'title': 'High molecular ...",True
7,What is the relationship between rotation curv...,['Halo Gas and Galaxy Disk Kinematics Derived ...,The majority of the absorption velocities of M...,simple,"[{'id': '0912.2746', 'title': 'Halo Gas and Ga...",True
8,What is the focus of the XMM-Newton Wide-Field...,['The XMM-Newton Wide-Field Survey in the COSM...,The focus of the XMM-Newton Wide-Field Survey ...,simple,"[{'id': '1004.2790', 'title': 'The XMM-Newton ...",True
9,How does the patchiness in the spatial distrib...,['Inevitable imprints of patchy reionization o...,The patchiness in the spatial distribution of ...,simple,"[{'id': '2005.05327', 'title': 'Inevitable imp...",True


In [45]:
questions = df_test['question'].tolist()
questions

['What is the potential of future SKA-era PTAs in detecting supermassive black hole binaries?',
 'How is a temperature map used to derive an angular power spectrum in the study of the CMB power on large scales?',
 'What are the star formation properties in the barred galaxy NGC 7479?',
 'What is the predicted red excess in galaxy colour for the delayed-then-rapid star formation quenching scenario in the -19 and -20 samples?',
 'How can upcoming photometric large scale structure surveys be optimized to measure modifications to gravity on cosmic scales?',
 'What are the cosmological constraints obtained from the measurement of three-point shear statistics in weak lensing from space?',
 'What is the average fraction of cold gas relative to total galaxy baryonic mass in typical massive star forming galaxies at <z>~1.2 and 2.3?',
 'What is the relationship between rotation curves and MgII absorption selected galaxies at intermediate redshift?',
 'What is the focus of the XMM-Newton Wide-Fie

In [46]:
question = questions[0]
question

'What is the potential of future SKA-era PTAs in detecting supermassive black hole binaries?'

In [47]:
get_rag_response(question)

'The potential of future SKA-era PTAs in detecting supermassive black hole binaries is significant. These PTAs have the ability to detect Nanohertz gravitational waves generated by individual inspiraling supermassive black hole binaries. The SKA will collect gravitational wave signals from thousands of massive systems, being able to individually resolve and locate several of them.'

In [48]:
retriever.get_relevant_documents(question)[0]

Document(page_content='Electromagnetic signatures of supermassive black hole binaries resolved   by PTAs', metadata={'abstract': "Pulsar timing arrays (PTAs) may eventually be able to detect not only the stochastic gravitational-wave (GW) background of SMBH binaries, but also individual, particularly massive binaries whose signals stick out above the background. In this contribution, we discuss the possibility of identifying and studying such `resolved' binaries through their electromagnetic emission. The host galaxies of such binaries are themselves expected to be also very massive and rare, so that out to redshifts z~2 a unique massive galaxy may be identified as the host. At higher redshifts, the PTA error boxes are larger and may contain as many as several hundred massive-galaxy interlopers. In this case, the true counterpart may be identified, if it is accreting gas efficiently, as an active galactic nucleus (AGN) with a peculiar spectrum and variable emission features. Specifical

In [62]:
responses = []
contexts = []

# Inference: collect the generated answer and the retrieved contexts for each question
for query in questions:
    responses.append(get_rag_response(query))
    contexts.append([doc.page_content for doc in retriever.get_relevant_documents(query)])
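
The loop above calls the retriever twice per question: once inside `get_rag_response` (through the chain) and once more to record the contexts. A possible restructuring, sketched here with two hypothetical callables `retrieve` and `generate` standing in for the retriever and the LLM chain, retrieves once and reuses the result:

```python
from typing import Callable


def run_rag_batch(
    questions: list[str],
    ground_truths: list[str],
    retrieve: Callable[[str], list[str]],
    generate: Callable[[str, list[str]], str],
) -> dict[str, list]:
    """Build the column dict for evaluation with one retrieval per question.

    `retrieve` and `generate` are hypothetical stand-ins: the first maps a
    question to a list of context strings, the second maps a question plus
    those contexts to an answer.
    """
    if len(questions) != len(ground_truths):
        raise ValueError("questions and ground_truths must have the same length")

    data = {"question": [], "answer": [], "contexts": [], "ground_truth": []}
    for question, truth in zip(questions, ground_truths):
        retrieved = retrieve(question)  # retrieved once, reused below
        data["question"].append(question)
        data["answer"].append(generate(question, retrieved))
        data["contexts"].append(retrieved)
        data["ground_truth"].append(truth)
    return data
```

The returned dict has the same four keys that are passed to `Dataset.from_dict` later in this notebook. Wiring it up would require a chain that accepts pre-retrieved context, which is an assumption beyond what this notebook builds.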

In [50]:
responses[0]

'The potential of future SKA-era PTAs in detecting supermassive black hole binaries is significant. These PTAs have the ability to detect Nanohertz gravitational waves generated by individual inspiraling supermassive black hole binaries. The SKA will collect gravitational wave signals from thousands of massive systems, being able to individually resolve and locate several of them.'

In [63]:
contexts[0]

['Electromagnetic signatures of supermassive black hole binaries resolved   by PTAs',
 'parameters. In this paper, we analyze the ability of future SKA-era PTAs to detect the existing SMBHBs candidates assuming the root mean square of timing noise $\\sigma_t=20\\ {\\rm ns}$, and use the',
 'arrays (PTAs), are particularly appealing multimessenger carriers. According to current models for massive black hole formation and evolution, the planned Square Kilometer Array (SKA) will collect',
 'The Future of Direct Supermassive Black Hole Mass Estimates']

In [52]:
ground_truths = df_test['ground_truth'].tolist()
ground_truths[0]

'The future SKA-era PTAs have the potential to detect Nanohertz gravitational waves (GWs) generated by the individual inspiraling supermassive black hole binaries (SMBHBs) in the galactic centers.'

**Note** For now, evaluate with the contexts retrieved by the RAG application. Later, try the contexts stored in the synthetic dataset instead and compare -> What is the difference?
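
One practical detail for that later comparison: `df_test` was reloaded from CSV, so each cell of its `contexts` column is a single string that looks like a Python list, not an actual list. A minimal sketch for converting it back, assuming the cells are plain list literals (the helper name `parse_contexts` is hypothetical):

```python
import ast


def parse_contexts(cell: str) -> list[str]:
    """Turn a stringified list such as "['a', 'b']" back into a list of strings.

    Hypothetical helper; ast.literal_eval only evaluates Python literals,
    so it does not execute arbitrary code the way eval would.
    """
    parsed = ast.literal_eval(cell)
    if not isinstance(parsed, list) or not all(isinstance(item, str) for item in parsed):
        raise ValueError("expected a string containing a list of strings")
    return parsed
```

In this notebook it could be applied with `df_test['contexts'].apply(parse_contexts)` before building the evaluation dataset.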

In [61]:
df_test.iloc[0]['contexts']

"['Prospects for Constraining interacting dark energy cosmology with   gravitational-wave bright sirens detected by future SKA-era pulsar timing   arrays \\n Pulsar timing arrays (PTAs) have the potential to detect Nanohertz gravitational waves (GWs) that are usually generated by the individual inspiraling supermassive black hole binaries (SMBHBs) in the galactic centers. The GW signals as cosmological standard sirens can provide the absolute cosmic distances, thereby can be used to constrain the cosmological parameters. In this paper, we analyze the ability of future SKA-era PTAs to detect the existing SMBHBs candidates assuming the root mean square of timing noise $\\\\sigma_t=20\\\\ {\\\\rm ns}$, and use the simulated PTA data to constrain the interacting dark energy (IDE) models with energy transfer rate $Q = \\\\beta H\\\\rho_c$. We find that, the future SKA-era PTAs will play an important role in constraining the IDE cosmology. Using only the mock PTA data consisting of 100 pulsa

In [69]:
data = {'question': questions, 'answer': responses, 'contexts': contexts, 'ground_truth': ground_truths}
dataset_for_eval = Dataset.from_dict(data)

dataset_for_eval

Dataset({
    features: ['question', 'answer', 'contexts', 'ground_truth'],
    num_rows: 21
})

In [72]:
result = evaluate(
    dataset=dataset_for_eval,
    metrics=[
        context_precision,
        context_recall,
        faithfulness,
        answer_relevancy,
    ],
)

df = result.to_pandas()

Evaluating:  14%|█▍        | 12/84 [00:12<01:00,  1.20it/s]Failed to parse output. Returning None.
Evaluating:  21%|██▏       | 18/84 [00:14<00:40,  1.63it/s]No statements were generated from the answer.
Evaluating:  33%|███▎      | 28/84 [00:17<00:27,  2.04it/s]No statements were generated from the answer.
Evaluating:  94%|█████████▍| 79/84 [00:32<00:01,  3.75it/s]Failed to parse output. Returning None.
Evaluating: 100%|██████████| 84/84 [00:36<00:00,  2.28it/s]


In [73]:
df

Unnamed: 0,question,answer,contexts,ground_truth,context_precision,context_recall,faithfulness,answer_relevancy
0,What is the potential of future SKA-era PTAs i...,The potential of future SKA-era PTAs in detect...,[Electromagnetic signatures of supermassive bl...,The future SKA-era PTAs have the potential to ...,1.0,1.0,1.0,1.0
1,How is a temperature map used to derive an ang...,A temperature map is used to derive an angular...,[Level correlations of the CMB temperature ang...,A temperature map is used to derive an angular...,1.0,0.0,1.0,0.930677
2,What are the star formation properties in the ...,The star formation properties in the barred ga...,[Galaxy Evolution Explorer and infrared data f...,Under the effects of both a stellar bar and a ...,1.0,1.0,1.0,0.98376
3,What is the predicted red excess in galaxy col...,The predicted red excess in galaxy colour for ...,"[in a group of two, we find a red excess attri...","0.15,$\pm$,0.01 and 0.14,$\pm$,0.01",1.0,1.0,0.8,1.0
4,How can upcoming photometric large scale struc...,To optimize upcoming photometric large scale s...,[We consider how upcoming photometric large sc...,We consider how upcoming photometric large sca...,1.0,1.0,1.0,0.931891
5,What are the cosmological constraints obtained...,The cosmological constraints obtained from the...,[Weak lensing from space: first cosmological c...,The cosmological constraints obtained from the...,1.0,1.0,0.571429,0.999999
6,What is the average fraction of cold gas relat...,The average fraction of cold gas relative to t...,[cosmic epoch. The average fraction of cold ga...,The average fraction of cold gas relative to t...,1.0,1.0,1.0,0.994973
7,What is the relationship between rotation curv...,The relationship between rotation curves and M...,[galaxies at low to moderate redshift. An anal...,The majority of the absorption velocities of M...,1.0,1.0,0.25,0.991793
8,What is the focus of the XMM-Newton Wide-Field...,The focus of the XMM-Newton Wide-Field Survey ...,[We report the final optical identifications o...,The focus of the XMM-Newton Wide-Field Survey ...,1.0,1.0,0.5,1.0
9,How does the patchiness in the spatial distrib...,The patchiness in the spatial distribution of ...,[and spatial fluctuations in the electron dens...,The patchiness in the spatial distribution of ...,1.0,1.0,0.4,0.947822


#### We now have the final RAG evaluation table. The next step is to spend some time understanding these metrics and the assumptions behind them.

### Things to Explore
* Different LLMs for the generator, the critic, and the RAG application
* How do we summarize performance?
* Should we consider type of evolution while summarizing?
* Most efficient way to write a pipeline from this
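
As a starting point for the summarization questions above, one simple option is the per-metric mean, optionally grouped by evolution type, skipping scores that failed to compute. This is a sketch, not a recommendation of how RAGAS results should be aggregated. The function names are hypothetical, and the sample rows are only the first four rows of the results table shown earlier (all of evolution type `simple`), not the full 21-row evaluation.

```python
import math
from collections import defaultdict


def nan_mean(values):
    """Mean of the values, ignoring None and NaN; NaN if nothing is left."""
    kept = [v for v in values if v is not None and not math.isnan(v)]
    return sum(kept) / len(kept) if kept else float("nan")


def summarize(rows, metrics, group_key=None):
    """Return {group: {metric: mean}}; a single group "all" if no group_key."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key] if group_key else "all"].append(row)
    return {
        group: {metric: nan_mean([r[metric] for r in members]) for metric in metrics}
        for group, members in groups.items()
    }


# First four rows of the results table above (faithfulness and context_recall only)
sample_rows = [
    {"evolution_type": "simple", "faithfulness": 1.0, "context_recall": 1.0},
    {"evolution_type": "simple", "faithfulness": 1.0, "context_recall": 0.0},
    {"evolution_type": "simple", "faithfulness": 1.0, "context_recall": 1.0},
    {"evolution_type": "simple", "faithfulness": 0.8, "context_recall": 1.0},
]

print(summarize(sample_rows, ["faithfulness", "context_recall"]))
print(summarize(sample_rows, ["faithfulness", "context_recall"], group_key="evolution_type"))
```

Grouping by evolution type requires carrying the `evolution_type` column from `df_test` over to the results, since the evaluation dataset built above contains only the four columns RAGAS needs.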