<a href="https://colab.research.google.com/github/g-larios/arXiv_RAG/blob/main/Author_Career_Summary_Using_ArXiv_and_Gemini.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Query based on author information

An author-based query against the arXiv API needs two parameters:

- full_name: the author's full name, in the form 'first name' + 'middle name' + 'surname', separated by spaces. The middle name may be omitted.

- cat: an arXiv category, one of (astro-ph, cond-mat, gr-qc, hep-ex, hep-lat, hep-th, hep-ph, math-ph, nlin, nucl-ex, nucl-th, physics, quant-ph, math, cs, q-bio, q-fin, stat, eess, econ). Here cs is the archive identifier of the Computing Research Repository (CoRR). See https://arxiv.org/category_taxonomy for details.
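As a minimal sketch of how these two parameters turn into a request URL, the helper below assembles the query string with the standard library. The function name `build_author_query_url` and the constant `ARXIV_API_URL` are illustrative names, not part of the arXiv API; the endpoint and the `au:`/`cat:` field prefixes are the ones this notebook uses.

```python
from urllib.parse import urlencode

ARXIV_API_URL = "http://export.arxiv.org/api/query"  # same endpoint the notebook uses


def build_author_query_url(full_name: str, category: str, max_results: int = 100) -> str:
    """Build an arXiv API URL for papers by `full_name` in `category` (hypothetical helper)."""
    # Collapse repeated whitespace so "First  Last" and "First Last" agree.
    name = " ".join(full_name.split())
    params = {
        "search_query": f"au:{name} AND cat:{category}",
        "sortBy": "submittedDate",
        "sortOrder": "descending",
        "max_results": max_results,
    }
    # safe=":" keeps the field prefixes readable; spaces are encoded as "+".
    return f"{ARXIV_API_URL}?{urlencode(params, safe=':')}"


print(build_author_query_url("Christopher N. Pope", "hep-th", max_results=1000))
```

Letting `urlencode` do the escaping avoids hand-joining name parts with `+`, and it keeps the sort options as separate parameters rather than part of the search expression.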

# Installing packages and importing modules

In [1]:
%pip install -q feedparser

%pip install -q langchain
%pip install -q langchain-community
%pip install -qU google-generativeai
%pip install -qU langchain-google-genai


In [2]:
import urllib, urllib.request
import feedparser
import os
import json
import textwrap
import math
import getpass

from langchain_google_genai import ChatGoogleGenerativeAI
from langchain.prompts import ChatPromptTemplate

# Set up the LLM and LangChain chain

Here we use a Gemini model for inference and LangChain to build a small prompt pipeline.

In [None]:
os.environ["GOOGLE_API_KEY"] = getpass.getpass("Enter your Google AI API key: ")

llm = ChatGoogleGenerativeAI(
    model="gemini-1.5-pro",
    temperature=0,
    max_tokens=50000,
    timeout=None,
    max_retries=5,
    # other params..
)

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are an analyst who specializes in understanding the topics of interest of a given author or time period. "
    "You are given a set of papers by a certain author and you are to give a report on how the interests of this author have changed over time. The information is in a .json file format. "
    "The information contains the id of the paper, the date it was published, list of authors, and a summary of the paper. "
    "Create a citation of papers with proper bibliography to support why you think the author worked on the topic you state. "
    "Put the references at the end and use numbers to cite through the body. "
    #"Cite the paper using the id to support why you think the author worked on the topic you state. "
    "Do not summarize the author's papers. " #but only give a description of the progression of the interest of the author throughout the years.
    "Give an in-depth summary of their career but keep it concise. "
    "Focus only on the author the user asked about and disregard any papers that do not contain that author's name."),
    ("user", "{Prompt}\nContext:\n{Context}"),
    ])

chain = prompt | llm

# Query arXiv for papers

We use the author's name to pull at most `max_results` papers from arXiv through its API.

In [3]:
# Query parameters
full_name = 'Christopher N. Pope'

alias = "C. N. Pope"
category = 'hep-th'

names = [full_name, alias]

base_url = 'http://export.arxiv.org/api/query?'
max_results=1000

search_query = f'au:{"+".join(full_name.split())}+AND+cat:{category}&sortBy=submittedDate&sortOrder=descending'

# Query
query = 'search_query=%s&max_results=%i' % (search_query,max_results)

data = urllib.request.urlopen(base_url+query)
feed = feedparser.parse(data.read().decode('utf-8'))

### Removing entries that don't list the name or alias among the authors

In [4]:
output = []

# Select and record relevant information for each entry, if the query author is among the authors of the entry
for paper in feed.entries:
    paper_info = {
        'id' : paper.id.split('/abs/')[-1],
        'published' : paper.published,
        'authors' : [aut['name'] for aut in paper.authors],
        'title': paper.title,
        'summary': paper.summary
        }
    if full_name in paper_info['authors'] or alias in paper_info["authors"]:
        output.append(paper_info)
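The filter above compares names exactly, so a difference in letter case or spacing would drop a paper. As a sketch of a slightly more forgiving check, assuming case-insensitive and whitespace-insensitive matching is acceptable, the hypothetical helpers `normalize_name` and `is_by_author` below accept any number of name variants, such as the `names` list defined earlier.

```python
def normalize_name(name: str) -> str:
    """Lower-case a name and collapse runs of whitespace."""
    return " ".join(name.lower().split())


def is_by_author(authors: list[str], names: list[str]) -> bool:
    """Return True if any entry in `authors` matches one of the `names` variants."""
    wanted = {normalize_name(n) for n in names}
    return any(normalize_name(a) in wanted for a in authors)


# Example with the author variants used in this notebook.
variants = ["Christopher N. Pope", "C. N. Pope"]
print(is_by_author(["H. Lu", "C. N. Pope", "X. J. Wang"], variants))
print(is_by_author(["H. Lu", "X. J. Wang"], variants))
```

This still requires the whole name to match one variant; it does not try to infer initials from a full name, so every expected spelling has to be listed.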


### Print out some of the papers

In [5]:
# The feed is sorted newest first, so this slice shows four of the oldest entries
for paper in output[-5:-1]:
    print('\n'.join(textwrap.wrap(str(paper), 150, break_long_words=False)) + "\n")

{'id': 'hep-th/9112076v1', 'published': '1991-12-31T18:22:14Z', 'authors': ['C. N. Pope'], 'title': 'Lectures on W algebras and W gravity', 'summary':
'We give a review of the extended conformal algebras, known as $W$ algebras,\nwhich contain currents of spins higher than 2 in addition to
the\nenergy-momentum tensor. These include the non-linear $W_N$ algebras; the linear\n$W_\\infty$ and $W_{1+\\infty}$ algebras; and their super-
extensions. We discuss\ntheir applications to the construction of $W$-gravity and $W$-string theories.'}

{'id': 'hep-th/9112014v1', 'published': '1991-12-06T21:02:55Z', 'authors': ['H. Lu', 'C. N. Pope', 'X. J. Wang', 'K. W. Xu'], 'title': 'N=2
Superstrings with (1,2m) Spacetime Signature', 'summary': "We show that the $N=2$ superstring in $d=2D\\ge6$ real dimensions, with\ncriticality
achieved by including background charges in the two real time\ndirections, exhibits a ``coordinate-freezing'' phenomenon, whereby the momentum\nin one
of the two time directio

### Restricting number of papers

If more than 100 papers remain after filtering, we keep only every k-th paper, with k chosen so that at most 100 papers are used. This is needed because the free tier of gemini-1.5-pro does not allow more than 32,000 input tokens per minute, so the context passed to the model has to be reduced.


In [None]:
if len(output) > 100:
    skip_size = math.ceil(len(output) / 100)
    output_used = output[::skip_size]
else:
    output_used = output

print(f"Total number of papers: {len(output)}\nNumber of papers used: {len(output_used)}")

output_str =  "".join([str(dic)+"\n\n" for dic in output_used])

Total number of papers: 296
Number of papers used: 99
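The thinning step above can be sketched as a small reusable function, together with one possible way to serialize the result as real JSON (the notebook itself passes the Python `str()` form of each dict). The names `thin_to_limit` and `to_json_context` are hypothetical, and the JSON variant is an assumption about what might suit the prompt, not what produced the output shown here.

```python
import json
import math


def thin_to_limit(items: list, limit: int = 100) -> list:
    """Keep every k-th item, with k = ceil(len / limit), so at most `limit` remain."""
    if len(items) <= limit:
        return list(items)
    stride = math.ceil(len(items) / limit)
    return items[::stride]


def to_json_context(papers: list[dict]) -> str:
    """Serialize paper records as a JSON array (an alternative to str(dict))."""
    return json.dumps(papers, indent=2, ensure_ascii=False)


# 296 placeholder records, matching the count reported above.
placeholder = [{"id": str(i)} for i in range(296)]
used = thin_to_limit(placeholder, limit=100)
print(len(placeholder), len(used))
```

Because the list is sorted by submission date, a fixed stride keeps the sample spread across the whole career instead of cutting off the oldest or newest papers.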


# Getting the Career Summary of the Author

We now invoke the LangChain chain created earlier, passing the author's name and the selected papers (id, publication date, author names, title, and summary). The model returns a summary of the author's career so far and of how their topics of interest have changed over the years.

In [None]:
llm_result = chain.invoke({"Prompt": f"Can you tell me about the interests of {full_name} ({alias}).", "Context": output_str})
print("\n".join(f"{key} = {llm_result.usage_metadata[key]}" for key in llm_result.usage_metadata.keys()))

input_tokens = 30242
output_tokens = 2042
total_tokens = 32284
input_token_details = {'cache_read': 0}


In [None]:
from IPython.display import Markdown, display
display(Markdown(llm_result.content))

Christopher N. Pope's research interests have evolved over time, starting with a focus on W-strings and W-gravity in the early 1990s, then shifting towards supergravity, string theory, M-theory, and black hole physics.

Initially, Pope's work centered on W-algebras and their applications to string theory and gravity [29, 30, 31, 32, 33, 34]. He investigated the physical spectra, interactions, and BRST operators of W-strings, exploring higher-spin generalizations of string theory.

By the mid-1990s, his focus shifted towards supergravity, string theory, and M-theory. He explored p-brane solutions, their classification, and their relation to cosmology [25, 26, 27, 28]. He also investigated discrete states in W-strings and their connection to minimal models.

From the late 1990s onwards, Pope's research predominantly explored supergravity, black holes, and related topics. He worked on embedding AdS black holes in higher dimensions [21], consistent sphere reductions of supergravity theories [19, 20, 22, 23], and the construction of new Einstein-Sasaki metrics [17, 18]. He also investigated the thermodynamics of black holes, supersymmetric limits, and topological solitons [16].

In the 2000s, Pope continued his work on black holes, supergravity, and string theory, studying topics such as AdS/CFT correspondence [15], consistent warped-space Kaluza-Klein reductions [13], and brane-world Kaluza-Klein reductions [12]. He also explored metrics with vanishing quantum corrections [9], time-dependent multi-center solutions [8], and Bohm and Einstein-Sasaki metrics [6].

More recently, Pope's research has delved into topics such as consistent truncations and dualities [3], generalized dualities and supergroups [2], and perturbations of black holes in Einstein-Maxwell-Dilaton theories [1]. He has also investigated the tower of subleading dual BMS charges [5] and the mass of dyonic black holes and entropy super-additivity [4].  His work demonstrates a consistent exploration of the interplay between gravity, string theory, and M-theory, with a particular emphasis on black hole physics and related mathematical structures.

**References**

[1] C. N. Pope, D. O. Rohrer, and B. F. Whiting. *On The Perturbations of Gibbons-Maeda Black Holes in Einstein-Maxwell-Dilaton Theories*. 2024.

[2] Daniel Butter, Falk Hassler, Christopher N. Pope, and Haoyu Zhang. *Generalized Dualities and Supergroups*. 2023.

[3] Daniel Butter, Falk Hassler, Christopher N. Pope, and Haoyu Zhang. *Consistent Truncations and Dualities*. 2022.

[4] Wei-Jian Geng, Blake Giant, H. Lu, and C. N. Pope. *Mass of Dyonic Black Holes and Entropy Super-Additivity*. 2018.

[5] Hadi Godazgar, Mahdi Godazgar, and C. N. Pope. *Tower of subleading dual BMS charges*. 2018.

[6] G. W. Gibbons, S. A. Hartnoll, and C. N. Pope. *Bohm and Einstein-Sasaki Metrics, Black Holes and Cosmological Event Horizons*. 2002.

[7] S. Cremonini, M. Cvetic, C. N. Pope, and A. Saha. *Long-Range Forces Between Non-Identical Black Holes With Non-BPS Extremal Limits*. 2022.

[8] G. W. Gibbons and C. N. Pope. *Time-Dependent Multi-Centre Solutions from New Metrics with Holonomy Sim(n-2)*. 2007.

[9] A. A. Coley, G. W. Gibbons, S. Hervik, and C. N. Pope. *Metrics With Vanishing Quantum Corrections*. 2008.

[10] M. Cvetic, Xing-Hui Feng, H. Lu, and C. N. Pope. *Rotating Solutions in Critical Lovelock Gravities*. 2016.

[11] Arash Azizi, Hadi Godazgar, Mahdi Godazgar, and C. N. Pope. *The Embedding of Gauged STU Supergravity in Eleven Dimensions*. 2016.

[12] M. Cvetic, H. Lu, C. N. Pope, and T. A Tran. *S^3 and S^4 Reductions of Type IIA Supergravity*. 2000.

[13] M. Cvetic, H. Lu, and C. N. Pope. *Consistent Warped-Space Kaluza-Klein Reductions, Half-Maximal Gauged Supergravities and CP^n Constructions*. 2000.

[14] M. Cvetic, G. W. Gibbons, H. Lu, and C. N. Pope. *Rotating Black Holes in Gauged Supergravities; Thermodynamics, Supersymmetric Limits, Topological Solitons and Time Machines*. 2005.

[15] G. W. Gibbons and C. N. Pope. *Kohn's Theorem, Larmor's Equivalence Principle and the Newton-Hooke Group*. 2010.

[16] M. Cvetic, G. W. Gibbons, H. Lu, and C. N. Pope. *New Einstein-Sasaki and Einstein Spaces from Kerr-de Sitter*. 2005.

[17] M. Cvetic, H. Lu, Don N. Page, and C. N. Pope. *New Einstein-Sasaki Spaces from Kerr-de Sitter*. 2005.

[18] H. Lu, C. N. Pope, and J. F. Vazquez-Poritz. *A New Construction of Einstein-Sasaki Metrics in D >= 7*. 2005.

[19] M. Cvetic, H. Lu, C. N. Pope, and K. S. Stelle. *Spherically Symmetric Solutions in Higher-Derivative Gravity*. 2015.

[20] H. Lu, C. N. Pope, and Zhao-Long Wang. *Pseudo-supersymmetry, Consistent Sphere Reduction and Killing Spinors for the Bosonic String*. 2011.

[21] M. Cvetic, M. J. Duff, P. Hoxha, James T. Liu, H. Lu, J. X. Lu, R. Martinez-Acosta, C. N. Pope, H. Sati, and T. A. Tran. *Embedding AdS Black Holes in Ten and Eleven Dimensions*. 1999.

[22] H. Lu and C. N. Pope. *Exact Embedding of N=1, D=7 Gauged Supergravity in D=11*. 1999.

[23] H. Lu, C. N. Pope, and T. A. Tran. *Five-dimensional N=4, SU(2) X U(1) Gauged Supergravity from Type IIB*. 1999.

[24] M. Cvetic, G. W. Gibbons, H. Lu, and C. N. Pope. *Consistent SO(6) Reduction Of Type IIB Supergravity on S^5*. 2000.

[25] H. Lu, C. N. Pope, and J. Rahmfeld. *A Construction of Killing Spinors on S^n*. 1998.

[26] H. Lu and C. N. Pope. *p-brane Taxonomy*. 1997.

[27] H. Lu, S. Mukherji, and C. N. Pope. *From p-branes to Cosmology*. 1996.

[28] H. Lu and C. N. Pope. *SL(N+1,R) Toda Solitons in Supergravities*. 1996.

[29] C. N. Pope. *W-Strings 93*. 1993.

[30] H. Lu, C. N. Pope, and X. J. Wang. *On Higher-spin Generalisations of String Theory*. 1993.

[31] H. Lu, C. N. Pope, S. Schrans, and X. J. Wang. *The Interacting $W_3$ String*. 1992.

[32] C. N. Pope, E. Sezgin, K. S. Stelle, and X. J. Wang. *Discrete States in the $W_3$ String*. 1992.

[33] C. N. Pope. *Review of W Strings*. 1992.

[34] H. Lu, C. N. Pope, S. Schrans, and X. J. Wang. *On Sibling and Exceptional W Strings*. 1992.


