# Paper Savior with LionAGI and LlamaIndex Vector Index

How to do automated exploratory research with LionAGI plus RAG, using a LlamaIndex vector index and embeddings.

- [LionAGI](https://github.com/lion-agi/lionagi)
- [LlamaIndex](https://www.llamaindex.ai)

In [1]:
# %pip install lionagi llama_index

In [2]:
query = 'Large Language Model Time Series Analysis'
dir = "data/log/researcher/"
num_papers = 20

### 1. Build a Vector Index with llama_index

In [3]:
from llama_index import download_loader, ServiceContext, VectorStoreIndex
from llama_index.llms import OpenAI
from llama_index.node_parser import SentenceSplitter

In [4]:
ArxivReader = download_loader("ArxivReader")
loader = ArxivReader()
node_parser = SentenceSplitter(chunk_size=512, chunk_overlap=20)

# download some papers from arXiv
documents, abstracts = loader.load_papers_and_abstracts(search_query=query, 
                                                        max_results=num_papers)
nodes = node_parser.get_nodes_from_documents(documents, show_progress=False)

# set up index object
llm = OpenAI(temperature=0.1, model="gpt-4-1106-preview")
service_context = ServiceContext.from_defaults(llm=llm)
index1 = VectorStoreIndex(nodes, include_embeddings=True, 
                          service_context=service_context)

# set up query engine
query_engine = index1.as_query_engine(include_text=False, 
                                      response_mode="tree_summarize")

In [5]:
len(abstracts)

20
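To make the `chunk_size` and `chunk_overlap` arguments above concrete, here is a toy sketch of overlapping windows. It is an illustration only: `chunk_with_overlap` is a hypothetical helper that works on a plain list of words, whereas `SentenceSplitter` counts tokens and tries to respect sentence boundaries.

```python
def chunk_with_overlap(tokens, chunk_size=512, chunk_overlap=20):
    """Split a list into windows of at most chunk_size items.

    Each window starts chunk_size - chunk_overlap items after the previous
    one, so consecutive windows share chunk_overlap items.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        # stop once the current window already reaches the end of the input
        if start + chunk_size >= len(tokens):
            break
    return chunks


# toy input: ten "words", windows of four with one shared word
words = [f"w{i}" for i in range(10)]
toy_chunks = chunk_with_overlap(words, chunk_size=4, chunk_overlap=1)
for chunk in toy_chunks:
    print(chunk)
```

The overlap means text near a chunk boundary appears in both neighboring chunks, which helps retrieval when an idea spans the boundary.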

### 2. Write a tool description according to OpenAI schema

In [6]:
import lionagi as li

In [7]:
tools = [
    {
        "type": "function",
        "function": {
            "name": "query_arxiv_papers",
            "description": """
                           Perform a query to a QA bot with access to an 
                           index built with papers from arxiv
                          """,
            "parameters": {
                "type": "object",
                "properties": {
                    "str_or_query_bundle": {
                        "type": "string",
                        "description": "a question to ask the QA bot",
                    }
                },
                "required": ["str_or_query_bundle"],
            },
        }
    }
]

# we need to register both the function description
# and its actual implementation
func = query_engine.query
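Why both pieces are needed can be sketched in plain Python. The schema tells the model which name and arguments to emit; the implementation is what actually runs when a tool call comes back. The sketch below is not LionAGI's internal code: `dispatch_tool_call`, `registry`, and `fake_query` are hypothetical names, and `fake_query` stands in for `query_engine.query` so the example runs offline. It assumes the OpenAI tool-call shape, where `arguments` arrives as a JSON string.

```python
import json


def dispatch_tool_call(tool_call, registry):
    """Invoke the registered function named in an OpenAI-style tool call."""
    name = tool_call["function"]["name"]
    # the model returns arguments as a JSON-encoded string
    kwargs = json.loads(tool_call["function"]["arguments"])
    if name not in registry:
        raise KeyError(f"no implementation registered for tool {name!r}")
    return registry[name](**kwargs)


# hypothetical stand-in for query_engine.query
def fake_query(str_or_query_bundle):
    return f"answer to: {str_or_query_bundle}"


# the key must match the "name" field in the tool schema
registry = {"query_arxiv_papers": fake_query}

example_call = {
    "type": "function",
    "function": {
        "name": "query_arxiv_papers",
        "arguments": json.dumps({"str_or_query_bundle": "What is TimelyGPT?"}),
    },
}

result = dispatch_tool_call(example_call, registry)
print(result)
```

Note that the schema's parameter is named `str_or_query_bundle` so that the decoded arguments can be passed to the query function as keyword arguments.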

### 3. Research: PROMPTS

#### FORMATS

In [14]:
# a rigidly structured prompt can help make the outcome more deterministic,
# though any plain string works as well
system = {
    "persona": "a helpful world-class researcher",
    "requirements": """
              think step by step before returning a clear, precise 
              worded answer with a humble yet confident tone
          """,
    "responsibilities": f"""
              you are asked to help with researching on the topic 
              of {query}
          """,
    "tools": "provided with a QA bot for grounding responses"
}

# similarly, we can pass any string or dictionary as the instruction;
# here we shape model behavior by telling the model how to format its output
deliver_format1 = {"return required": "yes", "return format": "paragraph"}

deliver_format2 = {"return required": "yes", 
    "return format": { 
        "json_mode": {
            'paper': "paper_name",
            "summary": "...", 
            "research question": "...", 
            "talking points": {
                "point 1": "...",
                "point 2": "...",
                "point 3": "..."
            }}}}
            
function_call = {
    "notice":f"""
        At each task step, identified by step number, you must use the tool 
        at least twice. Notice you are provided with a QA bot as your tool, 
        the bot has access to the {num_papers} papers via a queriable index 
        that takes natural language query and return a natural language 
        answer. You can decide whether to invoke the function call, you will 
        need to ask the bot when there are things need clarification or 
        further information. you provide the query by asking a question, 
        please use the tool as extensively as you can.
       """
    }

# here we create a two-step process imitating the steps a human would take to
# perform the research task
instruct1 = {
    "task step": "1", 
    "task name": "read paper abstracts", 
    "task objective": "get initial understanding of the papers of interest", 
    "task description": """
            provided with abstracts of paper, provide a brief summary 
            highlighting the paper core points, the purpose is to extract 
            as much information as possible
          """,
    "deliverable": deliver_format1
}


instruct2 = {
    "task step": "2",
    "task name": "propose research questions and talking points", 
    "task objective": "initial brainstorming", 
    "task description": """
          from the improved understanding of the paper, please propose 
          an interesting, unique and practical research question, 
          support your reasoning. Kept on asking questions if things are 
          not clear. 
        """,
    "deliverable": deliver_format2,
    "function calling": function_call
}
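Since the system prompt and instructions are dictionaries, they have to become text before reaching the model. One plausible rendering, assuming the dictionary is simply serialized to JSON together with any context, is sketched below. `render_instruction` is a hypothetical helper, and LionAGI's own serialization may differ in detail.

```python
import json


def render_instruction(instruction, context=None):
    """Serialize a dict (or string) instruction plus optional context to text."""
    payload = {"instruction": instruction}
    if context is not None:
        payload["context"] = context
    return json.dumps(payload, indent=2)


# a trimmed-down instruction in the same shape as instruct1
example_instruct = {
    "task step": "1",
    "task name": "read paper abstracts",
    "deliverable": {"return required": "yes", "return format": "paragraph"},
}

rendered = render_instruction(example_instruct, context="<abstract text here>")
print(rendered)
```

Keeping the keys stable across steps ("task step", "task name", "deliverable") gives the model a consistent structure to follow, which is the point of the rigid format.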

### 4. Research: Setup Workflow

In [None]:
abstracts = [x.text for x in abstracts]

In [24]:
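async def read_propose(context, num=5):
    # one session per abstract, with the QA tool registered
    researcher = li.Session(system, dir=dir)
    researcher.register_tools(tools, func)
    
    # step 1: summarize the abstract passed in as context
    await researcher.initiate(instruct1, context=context, temperature=0.7)
    # step 2: propose research questions, with tool-assisted follow-ups;
    # tool_parser extracts the answer text from the query engine's response
    await researcher.auto_followup(instruct2, tools=tools, 
                                   num=num, tool_parser=lambda x: x.response)
    
    # save messages and LLM logs, then return the final reply
    researcher.messages_to_csv()
    researcher.log_to_csv()
    return researcher.conversation.messages[-1]['content']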

### 5. Research: Run the workflow

In [25]:
out1 = await li.al_call(abstracts[8:13], read_propose)

8 logs saved to data/log/researcher/_messages_2023-12-16T11_24_55_027803.csv
3 logs saved to data/log/researcher/_llmlog_2023-12-16T11_24_55_029132.csv
11 logs saved to data/log/researcher/_messages_2023-12-16T11_24_55_030996.csv
4 logs saved to data/log/researcher/_llmlog_2023-12-16T11_24_55_031735.csv
11 logs saved to data/log/researcher/_messages_2023-12-16T11_24_55_033999.csv
4 logs saved to data/log/researcher/_llmlog_2023-12-16T11_24_55_034734.csv
11 logs saved to data/log/researcher/_messages_2023-12-16T11_25_10_222276.csv
4 logs saved to data/log/researcher/_llmlog_2023-12-16T11_25_10_222887.csv
11 logs saved to data/log/researcher/_messages_2023-12-16T11_25_12_319998.csv
4 logs saved to data/log/researcher/_llmlog_2023-12-16T11_25_12_320602.csv
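The ten log files above correspond to five sessions, one per abstract in `abstracts[8:13]`. Conceptually, the call maps an async function over a list of inputs and runs the calls concurrently. A rough equivalent using only the standard library might look like the sketch below. This is an assumption about the behavior, not LionAGI's implementation; `map_concurrently` and `fake_read_propose` are hypothetical names, with the latter standing in for `read_propose` so the example runs without API access.

```python
import asyncio


async def map_concurrently(inputs, func, **kwargs):
    """Call an async function once per input and gather results in input order."""
    tasks = [func(item, **kwargs) for item in inputs]
    return await asyncio.gather(*tasks)


# hypothetical stand-in for read_propose
async def fake_read_propose(context, num=5):
    await asyncio.sleep(0.01)  # simulate waiting on the LLM
    return f"proposal for: {context}"


results = asyncio.run(
    map_concurrently(["abstract A", "abstract B"], fake_read_propose)
)
print(results)
```

Inside a notebook, where an event loop is already running, use `results = await map_concurrently(...)` instead of `asyncio.run(...)`.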


In [26]:
from IPython.display import Markdown

Markdown(out1[0])

Based on the understanding of the paper and the additional information gathered, we can formulate the following research question and talking points:

```json
{
  "paper": "SentimentArcs: A Novel Method for Self-Supervised Sentiment Analysis of Time Series Shows SOTA Transformers Can Struggle Finding Narrative Arcs",
  "summary": "SentimentArcs is a self-supervised sentiment analysis methodology for time series data, addressing limitations of traditional sentiment analysis models such as overfitting and poor generalization. It uses an ensemble of models for synthetic ground truth generation and novel metrics for joint optimization, coupled with visualizations for domain experts to analyze narrative arcs.",
  "research question": "How can the integration of domain expert human-in-the-loop in SentimentArcs methodology be enhanced to improve the accuracy and efficiency of sentiment analysis in complex narrative texts?",
  "talking points": {
    "point 1": "Exploring the potential for automated suggestions or guidelines to assist human experts in interpreting narrative arcs and reducing subjective bias.",
    "point 2": "Investigating the scalability of SentimentArcs in processing and analyzing massive datasets while maintaining high levels of accuracy and involving human experts.",
    "point 3": "Assessing the adaptability of SentimentArcs to different genres or styles of narratives, such as non-fiction or technical documents, where sentiment may be expressed differently."
  }
}
```

These research questions and talking points are designed to delve deeper into the capabilities and potential enhancements of the SentimentArcs methodology. They consider the practicality of incorporating human expertise more efficiently and the possible expansion of the methodology to diverse narrative forms.

In [27]:
Markdown(out1[1])

Based on the understanding that the pyunicorn package can be applied in various fields for analyzing time series data and its specific mention of neuroscience as an application area, here's a research question and talking points:

```json
{
  "paper": "Unified functional network and nonlinear time series analysis for complex systems science: The pyunicorn package",
  "summary": "The paper discusses the pyunicorn package, which is designed for analyzing complex systems through the construction and analysis of functional networks from time series data. The package combines complex network theory with nonlinear time series analysis and is applicable in various fields, including climatology and neuroscience.",
  "research question": "How can the pyunicorn package enhance the understanding of neuroplasticity and the brain's adaptation mechanisms through the analysis of functional brain networks over time?",
  "talking points": {
    "point 1": "Investigating the adaptability of functional brain networks in different cognitive states or in response to various stimuli using the pyunicorn package.",
    "point 2": "Exploring the potential of pyunicorn to identify early biomarkers of neurological disorders through the nuanced analysis of brain network dynamics.",
    "point 3": "Assessing the efficacy of rehabilitation methods on brain network reorganization in patients with brain injuries or neurodegenerative diseases by longitudinal studies using pyunicorn."
  }
}
```

The proposed research question is interesting because it attempts to leverage the analytical strength of pyunicorn in understanding dynamic processes within the brain, such as neuroplasticity. The talking points support this by considering practical applications in cognitive science, clinical diagnosis, and treatment efficacy assessment, which are areas of significant interest in neuroscience research. If further clarification or expansion on these points is needed, we could consult the QA bot for more detailed applications or methods within pyunicorn that might be particularly useful for these types of studies.

In [28]:
Markdown(out1[2])

Based on the limitations highlighted by the TimelyGPT study, an intriguing research question might be:

**Research Question:** How can time-series transformers be further optimized to handle the multi-scale features of complex time-series data, particularly in scenarios involving irregular sampling and non-stationary signals?

Supporting this question, the reasoning would be that addressing this gap could lead to significant improvements in the utility of transformers in real-world applications such as financial markets, weather forecasting, and patient health monitoring, where time-series data is often non-uniform and exhibits multi-scale characteristics.

**Talking Points:**
- **Point 1:** Investigate the potential of hybrid architectures that combine the strengths of transformers with other neural network paradigms, such as convolutional or recurrent layers, to enhance the model's ability to handle multi-scale features and irregular sampling.
- **Point 2:** Explore advanced techniques for positional encoding or embedding to preserve temporal information more effectively, especially in long sequences where traditional positional encodings may fail.
- **Point 3:** Conduct rigorous testing on diverse and large datasets, including those with non-stationary and irregularly sampled data, to assess the robustness and scalability of the proposed models.

Before finalizing these points, let's use the QA bot tool to clarify two aspects:
1. The effectiveness of hybrid architectures in current time-series transformer models.
2. The current state of positional encoding techniques in time-series analysis and their limitations.

Shall we proceed with querying the QA bot for this information?

In [29]:
Markdown(out1[3])

Based on the insights gained from both the summary of the paper and the responses from the function calls, here is a proposed research question with supporting talking points:

```json
{
  "paper": "Large Models for Time Series and Spatio-Temporal Data: A Survey and Outlook",
  "summary": "This paper conducts a survey on the use of large models for time series and spatio-temporal data analysis, examining the prevalent data types, model categories, scopes, and application areas. It identifies the need for enhanced pattern recognition, reasoning, and the potential for artificial general intelligence in this domain. The paper categorizes research into two main clusters (LM4TS and LM4STD) and provides an array of resources for practitioners.",
  "research question": "How can theoretical analysis of large models specifically be improved to better understand and capture long-term dependencies in time series and spatio-temporal data sequences?",
  "talking points": {
    "point 1": "Investigate new architectural innovations or modifications to existing large models that could enhance their ability to model long-term dependencies more effectively.",
    "point 2": "Explore the integration of self-supervised learning methods that could lead to better pre-training objectives, such as time-frequency consistency or mask-then-reconstruct strategies, to improve model transparency and reliability.",
    "point 3": "Develop theoretical frameworks to improve interpretability and explainability, which would allow practitioners to better understand model predictions and facilitate more informed decision-making."
  }
}
```

This research question focuses on the theoretical underpinnings of large models, which is a current gap in the field. The talking points suggest practical steps towards addressing this gap, such as pursuing architectural changes, self-supervised learning, and theoretical frameworks for interpretability. This direction could enhance the performance of large models on time series and spatio-temporal data and make them more useful and trustworthy for practitioners.

In [30]:
Markdown(out1[4])

Based on the information we've gathered, let's formulate the following research question and talking points:

Research Question:
- How can the ESAN model be adapted to maintain predictive accuracy for stock prices well beyond the 19-month limitation identified in the paper, and what methodologies might be employed to adjust for the "hot words" phenomenon to ensure long-term relevance?

Talking Points:
1. **Model Adaptability**: Investigate the potential of incorporating dynamic word embedding updates or contextual relevance adjustments to handle the "hot words" phenomenon in financial texts. This involves exploring how the model could self-adjust and learn which terms are gaining or losing predictive power over time.
   
2. **Extended Predictive Horizon**: Examine the feasibility of using ensemble methods or additional temporal features to extend the forecasting capabilities of the ESAN model beyond the current 19-month timeframe. This would involve research into how different data sources and features could be combined to provide a more robust prediction for the long term.

3. **Combining Strategies**: Discuss the application and potential improvements of the four strategies (single model, mean combinatory, concatenating combinatory, and exponential combinatory) proposed by the authors for timeliness issues. This point includes an analysis of how these strategies can be optimized or new strategies developed to enhance the model's performance for far-future predictions.

JSON Format Deliverable:
```json
{
  "paper": "ESAN: Efficient Sentiment Analysis Network of A-Shares Research Reports for Stock Price Prediction",
  "summary": "The ESAN model integrates an NLP module with a time-series forecasting model to predict stock prices. It utilizes RoBERTa for sentiment analysis and combines industry and transaction information with sentiment analysis outputs. The model shows a significant correlation between predictions and actual return rates but has limitations in predicting far-future stock prices beyond 19 months.",
  "research question": "How can the ESAN model be adapted to maintain predictive accuracy for stock prices well beyond the 19-month limitation identified in the paper, and what methodologies might be employed to adjust for the 'hot words' phenomenon to ensure long-term relevance?",
  "talking points": {
    "point 1": "Investigate model adaptability through dynamic word embedding updates or contextual relevance adjustments to handle the 'hot words' phenomenon in financial texts.",
    "point 2": "Examine the feasibility of using ensemble methods or additional temporal features to extend the forecasting capabilities of the ESAN model beyond the 19-month timeframe.",
    "point 3": "Discuss the application and potential improvements of the four strategies proposed by the authors for addressing timeliness issues in predictions."
  }
}
```
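The replies above mix prose with a fenced JSON deliverable, and one of them (`out1[2]`) contains no JSON block at all. For downstream use, a small parser that tolerates both cases can help. The sketch below is one way to do it; `extract_json_block` is a hypothetical helper, not part of LionAGI.

```python
import json
import re

# build the fence marker programmatically so this example stays copy-paste safe
FENCE = "`" * 3
JSON_BLOCK = re.compile(FENCE + r"json\s*(.*?)" + FENCE, re.DOTALL)


def extract_json_block(reply):
    """Return the first fenced JSON object in a reply, or None if absent or invalid."""
    match = JSON_BLOCK.search(reply)
    if match is None:
        return None
    try:
        return json.loads(match.group(1))
    except json.JSONDecodeError:
        return None


sample = (
    "Intro text\n"
    + FENCE + "json\n"
    + '{"paper": "Example", "talking points": {"point 1": "..."}}\n'
    + FENCE
    + "\nClosing text"
)
parsed = extract_json_block(sample)
print(parsed)
```

With the notebook's results, `[extract_json_block(o) for o in out1]` would yield one dictionary per reply that followed the requested format and `None` for the rest.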