### In this notebook, we build a simple RAG application using cosmology data downloaded from the arXiv dataset

### The notebook is divided into 2 parts:
* **Part 1**: We test a minimal RAG chatbot (no memory), augmenting the LLM with context from arXiv cosmology papers, including several of the latest ones that are not part of the model's training corpus (as of March 2024)
* **Part 2**: We try a context retrieval (semantic) search

### Tech stack:
* LangChain - Framework
* Mixtral-8x7B (Mistral AI), served through the NVIDIA API - LLM
* Chroma (chromadb) - Vector database
* all-MiniLM-L6-v2 - Embedding Model

In [1]:
import pandas as pd
import textwrap
from openai import OpenAI

from langchain_community.vectorstores import Chroma
from langchain_nvidia_ai_endpoints import ChatNVIDIA
from langchain.embeddings import HuggingFaceEmbeddings

from langchain_core.prompts import ChatPromptTemplate
from langchain.chains import create_retrieval_chain
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser

In [2]:
import os

from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv()) # read local .env file

# openai_api_key = os.environ['OPENAI_API_KEY']
# hf_api_key = os.environ['HF_API_KEY']

groq_api_key = os.environ['GROQ_API_KEY']
nvidia_api_key = os.environ['NVIDIA_API_KEY']   

### Load the vector DB that we built from the ~66k arXiv cosmology titles + abstracts

In [3]:
# Get the embedding model; we need it again to load the persisted vector DB.
# Alternatives considered: "BAAI/bge-small-en-v1.5", "sentence-transformers/all-mpnet-base-v2".
# bge-base-en-v1.5 and bge-small took too long to embed all the cosmology docs (~66k).
model_name = "sentence-transformers/all-MiniLM-l6-v2"
model_kwargs = {"device": "cpu"}  # running on a local machine, so we use the CPU

embeddings = HuggingFaceEmbeddings(model_name=model_name, model_kwargs=model_kwargs)



In [4]:
vectordb = Chroma(persist_directory='./arxiv_cosmo_chroma_db', embedding_function=embeddings)
retriever = vectordb.as_retriever()

## Part 1: Test out a minimal RAG "chatbot" (No memory) with the given context

### Load in Mixtral 8x7B from the LangChain and NVIDIA integration, and build the RAG application
https://build.nvidia.com/mistralai/mixtral-8x7b-instruct

In [5]:
llm = ChatNVIDIA(model="mixtral_8x7b")

rag_template = """Answer the question based only on the following context:
{context}
Question: {question}
"""
rag_prompt = ChatPromptTemplate.from_template(rag_template)
rag_chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | rag_prompt
    | llm
    | StrOutputParser()
)
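To make the data flow of this chain explicit, here is a minimal pure-Python sketch of a retrieve-then-prompt pipeline. `FakeDoc`, `toy_retriever`, `format_docs`, and `build_prompt` are hypothetical stand-ins, not LangChain APIs, and the three-sentence corpus is invented. As far as I can tell, the chain above hands the retriever's output to the prompt as-is; joining the `page_content` of each document, as `format_docs` does here, is one common way to keep metadata out of the prompt.

```python
from dataclasses import dataclass, field

RAG_TEMPLATE = """Answer the question based only on the following context:
{context}
Question: {question}
"""


@dataclass
class FakeDoc:
    """Hypothetical stand-in for a retrieved document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def toy_retriever(question, corpus, k=2):
    """Rank documents by naive word overlap with the question (illustration only)."""
    q_words = set(question.lower().split())
    ranked = sorted(
        corpus,
        key=lambda d: len(q_words & set(d.page_content.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def format_docs(docs):
    """Join only the text of each document, dropping metadata."""
    return "\n\n".join(d.page_content for d in docs)


def build_prompt(question, corpus, k=2):
    """Retrieve, format the context, and fill the prompt template."""
    docs = toy_retriever(question, corpus, k=k)
    return RAG_TEMPLATE.format(context=format_docs(docs), question=question)


corpus = [
    FakeDoc("Galaxy clusters are massive gravitationally bound structures."),
    FakeDoc("The cosmological constant appears in Einstein's field equations."),
    FakeDoc("Type Ia supernovae are used as standard candles."),
]
prompt = build_prompt("What is the cosmological constant?", corpus, k=1)
print(prompt)
```

The resulting string is what would be sent to the LLM; in the real chain the retriever is the Chroma-backed one rather than a word-overlap toy.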

In [6]:
# Test it out

response = rag_chain.invoke("What is a Galaxy Cluster?")
print(textwrap.fill(response, width=80))

A galaxy cluster is a large-scale structure in the universe consisting of
hundreds to thousands of galaxies, dark matter, and hot gas that fills the space
in between the galaxies. They are the most massive gravitationally bound objects
in the universe. The hot gas in the cluster can be studied through its X-ray
emission, which is produced as the gas is heated to high temperatures by the
deep gravitational potential of the cluster. The study of galaxy clusters is
important for understanding the formation and evolution of large-scale
structures in the universe, as well as the properties of dark matter and dark
energy.


In [7]:
response = rag_chain.invoke("What is the Cosmological Constant?")
print(textwrap.fill(response, width=80))

The cosmological constant is a term in Einstein's field equations of general
relativity that represents the energy density of empty space. It is often
associated with the concept of dark energy, which is thought to be responsible
for the observed accelerated expansion of the universe. However, the nature and
precise value of the cosmological constant are still active areas of research in
cosmology and theoretical physics. The documents provided discuss various
aspects of the cosmological constant, including its role in cosmological
perturbation theory, its impact on the cosmological evolution of anisotropic
universes, and the possibility of time-dependent dark energy as an alternative
to the cosmological constant.


### We will evaluate this properly later, but as a sanity check, let's test out the same queries on the Mixtral 8x7B model without any RAG context

In [8]:
# Define the method
def query_no_context(question):
    client = OpenAI(
        base_url="https://integrate.api.nvidia.com/v1",
        api_key=nvidia_api_key  # Make sure 'nvidia_api_key' is defined or passed as an argument
    )

    completion = client.chat.completions.create(
        model="mistralai/mixtral-8x7b-instruct-v0.1",
        messages=[{"role": "user", "content": question}],
        temperature=0.5,
        top_p=1,
        max_tokens=1024,
        stream=True
    )

    output = ""
    for chunk in completion:
        if chunk.choices[0].delta.content is not None:
            output += chunk.choices[0].delta.content

    # `output` contains the complete summed up output.
    return output


In [9]:
# Example usage
print(textwrap.fill(query_no_context("What is a Galaxy Cluster?"), width=80))



 A galaxy cluster is a large collection of galaxies bound together by gravity.
It is the largest known gravitationally bound structure in the universe and can
contain hundreds to thousands of galaxies, along with hot gas and dark matter.
Galaxy clusters are typically classified into several types based on their
properties, such as their X-ray emission, mass, and distribution of galaxies.
Some common types of galaxy clusters include cool-core clusters, non-cool-core
clusters, and fossil groups.  Cool-core clusters have a high density of hot gas
in their centers, which emits X-rays. Non-cool-core clusters, on the other hand,
have lower densities of hot gas in their centers and do not exhibit the same
level of X-ray emission. Fossil groups are galaxy clusters that have a large
amount of dark matter and a dominant, bright elliptical galaxy, but lack a
significant number of other galaxies.  Galaxy clusters are important for
studying the large-scale structure of the universe and the properti

In [10]:
# Example usage
print(textwrap.fill(query_no_context("What is the Cosmological Constant?"), width=80))

 The cosmological constant is a term that was originally included in Einstein's
equations of general relativity to allow for a static, unchanging universe. At
the time, it was believed that the universe was static and eternal, and
Einstein's equations predicted that the universe should either be expanding or
contracting. To reconcile this discrepancy, Einstein introduced the cosmological
constant as a repulsive force that would counteract the attractive force of
gravity and keep the universe static.  However, subsequent observations by Edwin
Hubble and others showed that the universe is actually expanding, and the
cosmological constant was abandoned by many physicists. In recent years,
however, the cosmological constant has made a comeback in the form of dark
energy, a mysterious substance that is thought to be responsible for the
accelerated expansion of the universe.  The cosmological constant is often
associated with the energy of empty space, or vacuum energy, and it is measured
to

### The problem with these 2 questions (Galaxy Cluster, Cosmological Constant) is that they concern well-established areas of research that were likely covered in whatever corpus was used to train the Mixtral 8x7B model. Let us instead check how well the context augmentation works by asking questions about more recent papers

In [11]:
df_data = pd.read_csv('arxiv_astro-ph_data_cosmo.csv')
df_data = df_data.loc[df_data['categories']=='astro-ph.CO']

df_data = df_data.reset_index(drop=True)

df_data.tail(10) # 10 most recent Cosmology papers in the dataset, not cross-disciplinary

Unnamed: 0,id,title,abstract,categories,cat_text,prepared_text
21664,2403.13068,Compatibility of JWST results with exotic halos,The James Webb Space Telescope (JWST) is unvei...,astro-ph.CO,Cosmology and Nongalactic Astrophysics,Compatibility of JWST results with exotic halo...
21665,2403.13418,Constraint on magnetized galactic outflows fro...,"Outflows from galaxies, driven by active galac...",astro-ph.CO,Cosmology and Nongalactic Astrophysics,Constraint on magnetized galactic outflows fro...
21666,2403.13709,New measurements of $E_G$: Testing General Rel...,We combine measurements of galaxy velocities f...,astro-ph.CO,Cosmology and Nongalactic Astrophysics,New measurements of $E_G$: Testing General Rel...
21667,2403.13768,Disentangling the anisotropic radio sky: Fishe...,The existence of a radio synchrotron backgroun...,astro-ph.CO,Cosmology and Nongalactic Astrophysics,Disentangling the anisotropic radio sky: Fishe...
21668,2403.13794,"Cosmic shear with small scales: DES-Y3, KiDS-1...",We present a cosmological analysis of the comb...,astro-ph.CO,Cosmology and Nongalactic Astrophysics,"Cosmic shear with small scales: DES-Y3, KiDS-1..."
21669,2403.13885,The DEHVILS in the Details: Type Ia Supernova ...,Measurements of Type Ia Supernovae (SNe Ia) in...,astro-ph.CO,Cosmology and Nongalactic Astrophysics,The DEHVILS in the Details: Type Ia Supernova ...
21670,2403.1406,Inferring astrophysical parameters using the 2...,Enlightening our understanding of the first ga...,astro-ph.CO,Cosmology and Nongalactic Astrophysics,Inferring astrophysical parameters using the 2...
21671,2403.14061,Exploring the role of the halo mass function f...,The detection of the 21-cm signal at $z\gtrsim...,astro-ph.CO,Cosmology and Nongalactic Astrophysics,Exploring the role of the halo mass function f...
21672,2403.14165,Improving SDSS Cosmological Constraints throug...,The $\beta$-skeleton approach can be convenien...,astro-ph.CO,Cosmology and Nongalactic Astrophysics,Improving SDSS Cosmological Constraints throug...
21673,2403.1458,Tomographic redshift dipole: Testing the cosmo...,The cosmological principle posits that the uni...,astro-ph.CO,Cosmology and Nongalactic Astrophysics,Tomographic redshift dipole: Testing the cosmo...
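The `id` column above shows `2403.1406` and `2403.1458`, whereas recent arXiv identifiers have five digits after the dot, so trailing zeros may have been dropped if the column was parsed as a number at some stage. Below is a minimal sketch of that effect, using a hypothetical two-row CSV rather than the real dataset file. With pandas, passing `dtype={'id': str}` to `pd.read_csv` is the usual way to keep such identifiers as text.

```python
import csv
import io

# Hypothetical two-row CSV standing in for the real dataset file.
raw = (
    "id,title\n"
    "2403.14580,Tomographic redshift dipole\n"
    "2403.13068,Compatibility of JWST results with exotic halos\n"
)

rows = list(csv.DictReader(io.StringIO(raw)))

# Kept as text, the identifier survives unchanged.
ids_as_text = [row["id"] for row in rows]

# Parsed as a float, the trailing zero of the first identifier is lost.
ids_as_float = [str(float(row["id"])) for row in rows]

print(ids_as_text)   # ['2403.14580', '2403.13068']
print(ids_as_float)  # ['2403.1458', '2403.13068']
```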


In [12]:
print(df_data.loc[df_data['id']=='2403.13068']['prepared_text'].iloc[0]) # Q1

Compatibility of JWST results with exotic halos 
 The James Webb Space Telescope (JWST) is unveiling astounding results about the first few hundred million years of life of the Universe, delivering images of galaxies at very high redshifts. Here, we develop a UV luminosity function model for high-redshift galaxies, considering parameters such as the stellar formation rate, dust extinction, and halo mass function. Calibration of this luminosity function model using UV luminosity data at redshifts z = 4-7 yields optimal parameter values. Testing the model against data at higher redshifts reveals successful accommodation of the data at z = 8-9, but challenges emerge at z~13. Our findings suggest a negligible role of dust extinction at the highest redshifts, prompting a modification of the stellar formation rate to incorporate a larger fraction of luminous objects per massive halo, consistently with similar recent studies. This effect could be attributed to mundane explanations such as unk

In [13]:
print(df_data.loc[df_data['id']=='2403.13709']['prepared_text'].iloc[0]) # Q2

New measurements of $E_G$: Testing General Relativity with the Weyl   potential and galaxy velocities 
 We combine measurements of galaxy velocities from galaxy surveys with measurements of the Weyl potential from the Dark Energy Survey to test the consistency of General Relativity at cosmological scales. Taking the ratio of two model-independent observables - the growth rate of structure and the Weyl potential - we obtain new measurements of the $E_G$ statistic with precision of $5.8-10.7\%$ at four different redshifts. These measurements provide a considerable improvement to past measurements of $E_G$. They confirm the validity of General Relativity at three redshifts, while displaying a tension of $2.5\sigma$ at $z=0.47$ as a consequence of the tension found in the measurements of the Weyl potential. Contrary to conventional methods that rely on a common galaxy sample with spectroscopic resolution to measure two types of correlations, we directly combine two observables that are ind

In [14]:
print(df_data.loc[df_data['id']=='2403.14580']['prepared_text'].iloc[0]) # Q3

Tomographic redshift dipole: Testing the cosmological principle 
 The cosmological principle posits that the universe is statistically homogeneous and isotropic on large scales, implying all matter share the same rest frame. This principle suggests that velocity estimates of our motion from various sources should agree with the cosmic microwave background (CMB) dipole's inferred velocity of 370 km/s. Yet, for over two decades, analyses of different radio galaxy and quasar catalogs have reported velocities with amplitudes in notable tension with the CMB dipole. In a blind analysis of BOSS and eBOSS spectroscopic data from galaxies and quasars across $0.2<z<2.2$, we applied a novel dipole estimator for a tomographic approach, robustly correcting biases and quantifying uncertainties with state-of-the-art mock catalogs. Our results, indicating a velocity of $v = 353^{+123}_{-111}$ km/s, closely align with the CMB dipole, demonstrating a $1.4\sigma$ agreement. This finding provides signific

### Now, let us test out these questions one by one 

#### Q1. JWST and exotic high-z objects

In [15]:
# No Context

print(textwrap.fill(query_no_context("What does the James Webb Space Telescope tell us about exotic objects at high redshift?"), width=120))

 The James Webb Space Telescope (JWST), set to launch in 2021, will be a powerful tool for observing the universe,
especially exotic objects at high redshift. Redshift is a measure of how much an object's light has been stretched out
by the expansion of the universe, with higher redshifts corresponding to objects that are farther away and observed from
earlier times in the universe's history.  Exotic objects at high redshift include the first generation of stars, known
as Population III stars, as well as black holes, galaxies, and other cosmic phenomena that formed in the early universe.
These objects are of particular interest to astronomers because they can provide insights into the conditions and
processes that governed the universe's formation and evolution.  The JWST will be able to observe exotic objects at high
redshift in several ways. First, its large mirror and advanced instruments will allow it to collect more light from
faint, distant objects than previous telescopes. This 

In [16]:
# With context

print(textwrap.fill(rag_chain.invoke("What does the James Webb Space Telescope tell us about exotic objects at high redshift?"), width=120))

Based on the provided documents, the James Webb Space Telescope (JWST) is expected to revolutionize our understanding of
the high-redshift Universe, including exotic objects such as population III stars, dark stars, and population III
galaxies. However, many of these sources are likely to be intrinsically too faint for JWST to detect directly. To
improve the chances of detecting these exotic objects, researchers propose pointing JWST through foreground lensing
clusters, which can reach significantly deeper than the currently planned JWST ultra deep field in just a fraction of
the exposure time, although at the expense of probing a much smaller volume of the high-redshift Universe.
Additionally, researchers have developed a spectral synthesis code called Yggdrasil to model the first galaxies and
derive the masses of the faintest pop I, II, and III galaxies that can be detected through broadband imaging in JWST
ultra deep fields.  The JWST may also help detect high-redshift dark stars, w

#### Q2. Testing General Relativity with the Weyl potential

In [17]:
# No context

print(textwrap.fill(query_no_context("How can we test GR at cosmological scales with the Weyl potential"), width=120))

 The Weyl potential is a scalar potential that describes the curvature of spacetime and is often used to study the
behavior of gravitational systems on large scales, such as in cosmology. In general relativity (GR), the Weyl potential
is related to the matter Lagrangian through the Einstein field equations. Therefore, one way to test GR at cosmological
scales with the Weyl potential is to compare the predictions of GR for the Weyl potential with observations of the
large-scale structure of the universe.  One approach to doing this is to use observations of the cosmic microwave
background (CMB) radiation, which is the afterglow of the Big Bang and provides a snapshot of the universe when it was
only 380,000 years old. The CMB is sensitive to the curvature of spacetime on large scales, and so it provides a way to
measure the Weyl potential at early times. By comparing the measured CMB power spectrum with the predictions of GR for
the Weyl potential, it is possible to test the validity of

In [18]:
# With context

print(textwrap.fill(rag_chain.invoke("How can we test GR at cosmological scales with the Weyl potential"), width=120))

We can test General Relativity (GR) at cosmological scales with the Weyl potential by combining measurements of galaxy
velocities from galaxy surveys with measurements of the Weyl potential from the Dark Energy Survey. By taking the ratio
of two model-independent observables - the growth rate of structure and the Weyl potential - we can obtain new
measurements of the $E_G$ statistic with improved precision. These measurements can confirm the validity of GR at
certain redshifts, while also displaying any tensions that may exist as a result of differences in the measurements of
the Weyl potential. Additionally, by directly combining two observables that are independent of the galaxy bias, we can
test the relation between the geometry of our Universe and the motion of galaxies with improved precision.  Furthermore,
the Weyl potential can be used to test the theory of gravity and the validity of the $\Lambda$CDM model, as it provides
a direct way of measuring the spatial and temporal disto

#### Q3. Testing the Cosmological Principle using the CMB Dipole

In [19]:
# No context

print(textwrap.fill(query_no_context("What can the CMB dipole tell us about the validity of the Cosmological principle?"), width=120))

 The Cosmic Microwave Background (CMB) dipole is an important observation that provides evidence for the validity of the
Cosmological Principle, which states that the universe is homogeneous and isotropic on large scales. The CMB dipole is a
small anisotropy in the CMB temperature distribution, which is interpreted as a Doppler shift caused by the motion of
the Earth relative to the rest frame of the CMB.  The amplitude and direction of the CMB dipole are consistent with the
motion of the Solar System with respect to the cosmic rest frame, as determined by observations of galaxy velocities.
This agreement provides strong evidence that the Cosmological Principle is a good approximation on large scales, since
it suggests that the observed anisotropy is not due to any intrinsic inhomogeneities in the distribution of matter and
energy in the universe, but rather to our own motion with respect to the cosmic rest frame.  However, it is important to
note that the Cosmological Principle is an 

In [20]:
# With context

print(textwrap.fill(rag_chain.invoke("What can the CMB dipole tell us about the validity of the Cosmological principle?"), width=120))

The CMB dipole, if it is found to be consistent with the number count dipole from other observations such as quasars and
radio sources, would support the validity of the Cosmological Principle. However, if there is a significant discrepancy
between the CMB dipole and the number count dipole from other observations, it could indicate that the Cosmological
Principle is not valid. The tension between the CMB dipole and the number count dipole from other observations is
currently an open issue in the standard cosmological model.  Additionally, it is also mentioned that gravitational waves
(GWs) from compact binary mergers detected by the future next-generation detectors Einstein Telescope and Cosmic
Explorer can be used to detect and estimate the cosmic dipole. A GW dipole consistent with the amplitude of the dipole
in radio galaxies would be detectable with >3σ significance with a few years of observation and estimated with a 16%
precision, while a GW dipole consistent with the CMB one wo

### So, in all 3 cases, RAG gives answers that are specific to the retrieved context, as opposed to the more generic answers from the Mixtral-8x7B model without context

## Part 2: Test out a context retrieval (semantic) search 

**Note:** This part does not need the LLM, only the embedding model, so it is a good place to compare different embedding models.

**Also note:** The similarity score returned here is the L2 distance between the query and the retrieved document, so a lower score is better.
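As a small worked example of why a lower score means a closer match: for unit-length vectors, the squared L2 distance and the cosine similarity are tied by $d^2 = 2 - 2\cos\theta$, so identical directions give 0 and orthogonal ones give 2. The vectors below are made-up 3-dimensional toys, not real embeddings. Whether the stored embeddings are normalized, and whether the reported score is squared, depends on the embedding model and Chroma configuration, which is an assumption to check rather than something shown in this notebook.

```python
import math


def normalize(u):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(a * a for a in u))
    return [a / norm for a in u]


def l2_distance(u, v):
    """Euclidean (L2) distance between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Toy "embeddings" (invented values), normalized to unit length.
query = normalize([1.0, 2.0, 2.0])
close = normalize([1.0, 2.0, 1.5])
far = normalize([-2.0, 0.5, 0.0])

for name, vec in [("close", close), ("far", far)]:
    d = l2_distance(query, vec)
    c = cosine_similarity(query, vec)
    # For unit vectors, d**2 and 2 - 2*c agree up to rounding error.
    print(name, round(d, 4), round(c, 4), round(d ** 2 - (2 - 2 * c), 12))
```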

In [32]:
def search_vectordb_and_format_output(query, k, vectordb=vectordb):
    # Perform the similarity search
    results = vectordb.similarity_search_with_score(query, k=k)
    
    # Initialize an empty list to hold formatted results
    formatted_results = []
    
    # Iterate over the results to extract and format the desired information
    for doc, score in results:
        formatted_result = {
            'paper_id': doc.metadata['id'],
            'paper_title': doc.metadata['title'],
            'similarity_score': score
        }
        formatted_results.append(formatted_result)
    
    # Return or print the formatted results
    return formatted_results # Top k results

In [34]:
query = "What is a galaxy cluster?"
k=5

formatted_results = search_vectordb_and_format_output(query, k=k)


print(query)
print()

# Printing the results
for result in formatted_results:
    print(f"Paper ID: {result['paper_id']}, Paper Title: {result['paper_title']}, Similarity Score: {result['similarity_score']}")
    print()

What is a galaxy cluster?

Paper ID: 1906.07683, Paper Title: An Accurate Fitting Function For Scale-dependent Growth Rate in   Hu-Sawicki $f(R)$ Gravity, Similarity Score: 0.2684834599494934

Paper ID: 1006.3381, Paper Title: Central gas entropy excess as a direct evidence for AGN feedback in   galaxy groups and clusters, Similarity Score: 0.3103845417499542

Paper ID: 1108.5736, Paper Title: Halo Contraction Effect in Hydrodynamic Simulations of Galaxy Formation, Similarity Score: 0.330649733543396

Paper ID: 1908.05275, Paper Title: New Limits on Charged Dark Matter from Large-Scale Coherent Magnetic   Fields, Similarity Score: 0.33562418818473816

Paper ID: 2306.11205, Paper Title: The 'spectral index-flux density relation' for extragalactic radio   sources selected at metre and decametre wavelengths, Similarity Score: 0.3447471559047699



In [35]:
query = "What is the cosmological constant?"
k=5

formatted_results = search_vectordb_and_format_output(query, k=k)

print(query)
print()

# Printing the results
for result in formatted_results:
    print(f"Paper ID: {result['paper_id']}, Paper Title: {result['paper_title']}, Similarity Score: {result['similarity_score']}")
    print()

What is the cosmological constant?

Paper ID: 1307.6270, Paper Title: Cosmological post-Newtonian equations from nonlinear perturbation theory, Similarity Score: 0.1184452623128891

Paper ID: 1505.04782, Paper Title: Relativistic perturbations in $\Lambda$CDM: Eulerian & Lagrangian   approaches, Similarity Score: 0.13687889277935028

Paper ID: 2212.03234, Paper Title: Spatially Homogeneous Universes with Late-Time Anisotropy, Similarity Score: 0.13687889277935028

Paper ID: 1709.04046, Paper Title: Crack in the cosmological paradigm, Similarity Score: 0.15486614406108856

Paper ID: 1309.5444, Paper Title: Nonstandard cosmology, Similarity Score: 0.1550758332014084



### Now let's try the 3 questions we used to test the RAG Mixtral-8x7B application; the recent papers they were drawn from should have the lowest similarity scores (i.e. they should be semantically closest to the query)

In [38]:
query_1 = "What does the James Webb Space Telescope tell us about exotic objects at high redshift?"
k=5

formatted_results = search_vectordb_and_format_output(query_1, k=k)

print(query_1)
print()

# Printing the results
for result in formatted_results:
    print(f"Paper ID: {result['paper_id']}, Paper Title: {result['paper_title']}, Similarity Score: {result['similarity_score']}")
    print()

What does the James Webb Space Telescope tell us about exotic objects at high redshift?

Paper ID: 1510.02101, Paper Title: Detectability of Local Group Dwarf Galaxy Analogues at High Redshifts, Similarity Score: 0.43622350692749023

Paper ID: 1101.4033, Paper Title: Pointing the James Webb Space Telescope through lensing clusters - can   the first stars and galaxies be detected?, Similarity Score: 0.46487942337989807

Paper ID: 2208.11456, Paper Title: Morpheus Reveals Distant Disk Galaxy Morphologies with JWST: The First   AI/ML Analysis of JWST Images, Similarity Score: 0.5435341000556946

Paper ID: 1002.3368, Paper Title: Finding high-redshift dark stars with the James Webb Space Telescope, Similarity Score: 0.5453366041183472

Paper ID: 2212.06575, Paper Title: Cosmological Model Tests with JWST, Similarity Score: 0.549606204032898



Interestingly, the recent JWST paper (2403.13068), which intuitively seems the most relevant, does not show up in the top 5 results. Is this expected, or is it a deficiency in how the RAG application is constructed (embedding model, chunking strategy, etc.)? **Something to explore.**
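One way to start exploring this is to ask how far down the ranking the expected paper sits when `k` is increased. The helper below is a sketch that works on the list of dicts returned by `search_vectordb_and_format_output`; `rank_of_paper` is a hypothetical name, and the IDs and scores in the demo are invented placeholders, not real retrieval output.

```python
def rank_of_paper(formatted_results, paper_id):
    """Return the 1-based rank of the first hit for paper_id, or None if absent."""
    for rank, result in enumerate(formatted_results, start=1):
        if result["paper_id"] == paper_id:
            return rank
    return None


# Invented results, ordered by increasing distance as the search returns them.
demo_results = [
    {"paper_id": "0000.00001", "paper_title": "Placeholder A", "similarity_score": 0.41},
    {"paper_id": "0000.00002", "paper_title": "Placeholder B", "similarity_score": 0.47},
    {"paper_id": "0000.00003", "paper_title": "Placeholder C", "similarity_score": 0.52},
]

print(rank_of_paper(demo_results, "0000.00003"))  # 3
print(rank_of_paper(demo_results, "9999.99999"))  # None
```

With the real vector DB, this could be called as `rank_of_paper(search_vectordb_and_format_output(query_1, k=50), '2403.13068')` to see whether the paper appears at all within the top 50.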

In [39]:
query_2 = "How can we test GR at cosmological scales with the Weyl potential"
k=5

formatted_results = search_vectordb_and_format_output(query_2, k=k)

print(query_2)
print()

# Printing the results
for result in formatted_results:
    print(f"Paper ID: {result['paper_id']}, Paper Title: {result['paper_title']}, Similarity Score: {result['similarity_score']}")
    print()

How can we test GR at cosmological scales with the Weyl potential

Paper ID: 1808.06923, Paper Title: Testing Weyl Gravity at Galactic and Extra-galactic Scales, Similarity Score: 0.46653926372528076

Paper ID: 2403.13709, Paper Title: New measurements of $E_G$: Testing General Relativity with the Weyl   potential and galaxy velocities, Similarity Score: 0.5224183797836304

Paper ID: 2403.13709, Paper Title: New measurements of $E_G$: Testing General Relativity with the Weyl   potential and galaxy velocities, Similarity Score: 0.5355419516563416

Paper ID: 2312.06434, Paper Title: First measurement of the Weyl potential evolution from the Year 3 Dark   Energy Survey data: Localising the $\sigma_8$ tension, Similarity Score: 0.5532782077789307

Paper ID: 2403.13709, Paper Title: New measurements of $E_G$: Testing General Relativity with the Weyl   potential and galaxy velocities, Similarity Score: 0.5532880425453186
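The same paper ID appears three times in this list, presumably because several chunks of one paper were indexed separately (an assumption, since the indexing code is not shown here). Below is a sketch of collapsing such hits to one row per paper, keeping the lowest distance. `dedupe_by_paper` is a hypothetical helper, and the demo rows reuse the IDs above with scores rounded to four decimals and titles omitted.

```python
def dedupe_by_paper(formatted_results):
    """Keep only the best (lowest-distance) hit per paper, sorted by distance."""
    best = {}
    for result in formatted_results:
        pid = result["paper_id"]
        if pid not in best or result["similarity_score"] < best[pid]["similarity_score"]:
            best[pid] = result
    return sorted(best.values(), key=lambda r: r["similarity_score"])


# IDs from the output above; scores rounded, titles omitted for brevity.
demo_results = [
    {"paper_id": "1808.06923", "similarity_score": 0.4665},
    {"paper_id": "2403.13709", "similarity_score": 0.5224},
    {"paper_id": "2403.13709", "similarity_score": 0.5355},
    {"paper_id": "2312.06434", "similarity_score": 0.5533},
    {"paper_id": "2403.13709", "similarity_score": 0.5533},
]

for row in dedupe_by_paper(demo_results):
    print(row["paper_id"], row["similarity_score"])
```

Deduplicating this way frees up slots in the top `k` for other papers, at the cost of hiding which chunk matched.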



In [40]:
query_3 = "What can the CMB dipole tell us about the validity of the Cosmological principle?"
k=5

formatted_results = search_vectordb_and_format_output(query_3, k=k)

print(query_3)
print()

# Printing the results
for result in formatted_results:
    print(f"Paper ID: {result['paper_id']}, Paper Title: {result['paper_title']}, Similarity Score: {result['similarity_score']}")
    print()

What can the CMB dipole tell us about the validity of the Cosmological principle?

Paper ID: 2305.06771, Paper Title: Testing the Cosmological Principle: On the Time Dilation of Distant   Sources, Similarity Score: 0.3361070156097412

Paper ID: 2401.17945, Paper Title: Euclid preparation. The Near-IR Background Dipole Experiment with Euclid, Similarity Score: 0.36623719334602356

Paper ID: 2106.05284, Paper Title: A new way to test the Cosmological Principle: measuring our peculiar   velocity and the large scale anisotropy independently, Similarity Score: 0.45178478956222534

Paper ID: 2209.11658, Paper Title: Detection and estimation of the cosmic dipole with the Einstein   Telescope and Cosmic Explorer, Similarity Score: 0.4780294597148895

Paper ID: 2203.03956, Paper Title: Probing the Anisotropic Universe with Gravitational Waves, Similarity Score: 0.47845783829689026



For Q3 as well, the 2403... paper does not show up in the top results. However, this does seem to be a more general question than the JWST one.
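To turn these spot checks into a number that can be compared across embedding models, one option is a hit rate at k: the fraction of queries whose expected paper appears among the top k results. The sketch below assumes a mapping from each query to its expected paper ID. `hit_rate_at_k`, `toy_search`, and the toy index are hypothetical, and `search_fn` stands in for a call such as `search_vectordb_and_format_output`.

```python
def hit_rate_at_k(expected, search_fn, k=5):
    """Fraction of queries whose expected paper ID is among the top-k results."""
    if not expected:
        return 0.0
    hits = 0
    for query, paper_id in expected.items():
        results = search_fn(query, k)
        if any(r["paper_id"] == paper_id for r in results):
            hits += 1
    return hits / len(expected)


# Toy ranked lists standing in for a vector DB (invented IDs).
toy_index = {
    "q1": ["a", "b", "c"],
    "q2": ["d", "e", "f"],
    "q3": ["g", "h", "i"],
}


def toy_search(query, k):
    """Return the first k toy results in the same dict shape as the real search."""
    return [{"paper_id": pid} for pid in toy_index[query][:k]]


expected = {"q1": "b", "q2": "x", "q3": "i"}
print(hit_rate_at_k(expected, toy_search, k=2))
print(hit_rate_at_k(expected, toy_search, k=3))
```

On the three questions above with `k=5`, only the expected paper for Q2 appears in the results, which would correspond to a hit rate of 1/3 for this embedding model.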