References:
- https://github.com/benitomartin/Milvus-Agentic-RAG-Claude/blob/main/Agentic_RAG_Using_Claude_3_5_Sonnet%2C_LlamaIndex%2C_and_Milvus.ipynb
- https://zilliz.com/blog/a-beginners-guide-to-using-llama-3-with-ollama-milvus-langchain
- https://github.com/tspannhw/AIM-NYCStreetCams/blob/main/MultipleVectorsAdvanced%20SearchDataModelDesign/streetcamsrag.ipynb
- https://github.com/dhivyeshrk/Retrieval-Augmented-Generation-for-news
- https://docs.llamaindex.ai/en/stable/examples/vector_stores/MilvusIndexDemo/

In [2]:
import openai
from llama_index.llms.openai import OpenAI
from llama_index.core import SimpleDirectoryReader
from llama_index.core import VectorStoreIndex, StorageContext
from llama_index.vector_stores.milvus import MilvusVectorStore

In [2]:

openai.api_key = "key"  # placeholder: replace with your own OpenAI API key
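
Hard-coding a key in a notebook makes it easy to leak when the file is shared. As a minimal sketch of an alternative, the helper below reads the key from the `OPENAI_API_KEY` environment variable. The name `resolve_api_key` and its `environ` parameter are hypothetical and not part of the original notebook.

```python
import os


def resolve_api_key(env_var="OPENAI_API_KEY", environ=None):
    """Return the API key stored in `env_var`, or raise a clear error.

    `environ` is a hypothetical hook for passing a dict in tests;
    it defaults to the real process environment.
    """
    environ = os.environ if environ is None else environ
    key = environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable before running this notebook."
        )
    return key


# Usage in the notebook (commented out so the sketch runs without a key):
# openai.api_key = resolve_api_key()
```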

### Read files to documents

In [3]:
# point input_dir at your local directory of WSJ articles
documents = SimpleDirectoryReader(
    input_dir="./data/wsj_weekday"
).load_data()

In [7]:
print("Document ID 0:", documents[0].doc_id)
print("Document ID 1:", documents[1].doc_id)

Document ID 0: 8ac88bf6-91de-47cb-82b6-214fadcdcc84
Document ID 1: 3c66273e-5bf2-4007-b235-0cac834802c3


### Data indexing

In [8]:
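# dim must match the embedding size (1536 for OpenAI's text-embedding-ada-002);
# overwrite=True replaces any existing collection with the same name
vector_store = MilvusVectorStore(uri="./milvus_wsj.db", dim=1536, overwrite=True)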
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)
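
The indexing cell above re-embeds every document each time it runs. Once `./milvus_wsj.db` has been populated, a later session could likely reconnect to it instead. The sketch below assumes the collection already exists and was built with the same embedding model. The helper name `load_existing_index` is hypothetical. `VectorStoreIndex.from_vector_store` is the LlamaIndex call for wrapping an existing store.

```python
def load_existing_index(uri="./milvus_wsj.db", dim=1536):
    """Reconnect to an already-populated Milvus Lite file without re-embedding."""
    # Imported lazily so that defining this helper does not require the libraries.
    from llama_index.core import VectorStoreIndex
    from llama_index.vector_stores.milvus import MilvusVectorStore

    # overwrite=False keeps the existing collection instead of replacing it.
    vector_store = MilvusVectorStore(uri=uri, dim=dim, overwrite=False)
    return VectorStoreIndex.from_vector_store(vector_store=vector_store)


# Usage (requires the database file created by the indexing cell):
# index = load_existing_index()
```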

### Query

In [None]:
# from llama_index.llms.openai import OpenAI
# llm = OpenAI(model="gpt-3.5-turbo")
# llm = OpenAI(model="gpt-4o-mini", strict=True)

In [14]:
llm = OpenAI(model="gpt-4o")

query_engine = index.as_query_engine(similarity_top_k=5, llm=llm)
res = query_engine.query("""
### role ###
You are a great and helpful finance economist. You may use all news published on or before 2019-12-27.
### task ###
Explain and analyze the market condition (0, 1, 2, 3, 4) on 2019-12-27, and give five news references.
### output format ###
date:yyyy-mm-dd
Group: <market condition on the date>
market trend analysis: <why is it this market condition? how was this conclusion reached?>
news reference: [[<news 1 title>, <news 1 url>], [<news 2 title>, <news 2 url>], ....]
### output ###
""")
print(res)

date: 2019-12-27
Group: 0
market trend analysis: The market condition on 2019-12-27 is classified as 0, indicating a neutral or stable market environment. This conclusion can be drawn from various factors such as steady economic indicators, balanced investor sentiment, and lack of significant market-moving events. The news references below highlight a mix of economic activities and central bank policies that suggest a stable market without extreme volatility or directional bias.

news reference: 
1. ["Quantitative Easing, a Decade Later", "https://www.wsj.com/articles/quantitative-easing-a-decade-later-1533032573"]
2. ["Derby’s Take: K.C. Fed Digs Into What’s Really Driving Hawks and Doves", "https://www.wsj.com/articles/derbys-take-k-c-fed-digs-into-whats-really-driving-hawks-and-doves-1533032519"]
3. ["BP Earnings Boosted by Higher Oil Prices", "https://www.wsj.com/articles/bp-posts-sharp-rise-in-profit-1533020146"]
4. ["Turkey’s Central Bank Raises 2018 Inflation Forecast", "https:/
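
Because the prompt asks for a fixed `date` / `Group` / `market trend analysis` / `news reference` layout, the reply can be turned into a dictionary for later comparison across dates. The parser below is only a sketch: it assumes the model follows the layout shown in the output above, and the function name and returned keys are hypothetical. Reference entries that are cut off, like the last one in the output above, are skipped rather than guessed.

```python
import re

# Field labels taken from the prompt's output format.
FIELD_RE = re.compile(
    r"^(date|Group|market trend analysis|news reference)\s*:\s*(.*)$",
    re.IGNORECASE,
)
# Matches one complete ["title", "url"] pair; truncated pairs do not match.
REF_RE = re.compile(r'\[\s*"(?P<title>[^"]+)"\s*,\s*"(?P<url>[^"]*)"\s*\]')


def parse_response(text):
    """Split a formatted reply into date, group, analysis, and references."""
    fields = {}
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        match = FIELD_RE.match(stripped)
        if match:
            current = match.group(1).lower()
            fields[current] = match.group(2).strip()
        elif current and stripped:
            # Continuation line: append it to the field being collected.
            fields[current] += "\n" + stripped
    group = fields.get("group", "")
    return {
        "date": fields.get("date"),
        "group": int(group) if group.isdigit() else None,
        "analysis": fields.get("market trend analysis"),
        "references": [
            m.groupdict() for m in REF_RE.finditer(fields.get("news reference", ""))
        ],
    }
```

In the notebook this would be called as `parse_response(str(res))`. A missing or non-numeric `Group` value yields `None` instead of raising.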

In [None]:
# change to model you want
llm = OpenAI(model="gpt-4o")

# adjust the retrieval parameters (similarity_top_k = number of chunks retrieved)
query_engine = index.as_query_engine(similarity_top_k=10, llm=llm)
res = query_engine.query("""
### role ###
You are a great and helpful finance economist. You can use all the news in milvus_wsj_left.db.
### task ###
Explain and analyze the market condition (0, 1, 2, 3, 4), and give five news references as examples to explain the condition.
### output ###
""")
print(res)

# Save the result to a text file named 'llm_response.txt'
with open("llm_response.txt", "w") as file:
    file.write(str(res))  # Convert res to string in case it's not already a string
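
The save step recurs in the cells below with different file names, and the replies contain non-ASCII characters such as curly apostrophes. A small helper can create missing directories and write with an explicit encoding. This is a sketch only, and the name `save_response` is hypothetical.

```python
from pathlib import Path


def save_response(response, path):
    """Write str(response) to `path`, creating parent directories as needed."""
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    # Explicit UTF-8 avoids depending on the platform's default encoding.
    target.write_text(str(response), encoding="utf-8")
    return target


# Usage in the notebook:
# save_response(res, "llm_response.txt")
```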

In [None]:
# change to model you want
llm = OpenAI(model="gpt-4o")

# adjust the retrieval and response-synthesis parameters
query_engine = index.as_query_engine(
    similarity_top_k=10,
    response_mode="tree_summarize",
    verbose=True,
    llm=llm,
)
res = query_engine.query("""
### Role ###
You are a highly knowledgeable and helpful finance economist specializing in the US equity market. You have access to all relevant news with matched regime groups. Please reference US-specific news across different time spans.

### Task ###
Please note the following:
- Important information is enclosed within `<>`.
- Fill in the details in the provided placeholders `()`.
- Market Regime is defined as: a period of time when a market's conditions remain similar and fluctuate, affecting key investing factors. These factors include risk/return relationships, correlations, and volatilities.
Your task is to analyze and explain all the market regimes, represented by a regime number (0, 1, 2, 3, 4). For each market regime, provide 5 pieces of <US Market-related News> as evidence to support your analysis. Use as many news items as necessary.

### Output Format Example ###
**Market Regime:** 0
**Summary of Market Regime:** (In less than 3 sentences)
**Evidence:** (Provide one paragraph with the selected news as evidence to support your reasoning behind the identified market regime.)
**References / Links and date to the News:** ()

**Market Regime:** 1
**Summary of Market Regime:** (In less than 3 sentences)
**Evidence:** (Provide one paragraph with the selected news as evidence to support your reasoning behind the identified market regime.)
**References / Links and date to the News:** ()

**Market Regime:** 2
**Summary of Market Regime:** (In less than 3 sentences)
**Evidence:** (Provide one paragraph with the selected news as evidence to support your reasoning behind the identified market regime.)
**References / Links and date to the News:** ()

**Market Regime:** 3
**Summary of Market Regime:** (In less than 3 sentences)
**Evidence:** (Provide one paragraph with the selected news as evidence to support your reasoning behind the identified market regime.)
**References / Links and date to the News:** ()

**Market Regime:** 4
**Summary of Market Regime:** (In less than 3 sentences)
**Evidence:** (Provide one paragraph with the selected news as evidence to support your reasoning behind the identified market regime.)
**References / Links and date to the News:** ()

### Output ###
""")
print(res)

# Save the result to 'llm_response/tree_sum_1.txt' (the llm_response directory must already exist)
with open("llm_response/tree_sum_1.txt", "w") as file:
    file.write(str(res))  # Convert res to string in case it's not already a string
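
The output-format section of the prompt above repeats the same four-line block for regimes 0 through 4. One way to keep those blocks in sync is to generate them from a single template. The sketch below reuses the wording from the prompt. The names `REGIME_BLOCK` and `build_output_format` are hypothetical.

```python
# One regime block, copied from the prompt's output-format example.
REGIME_BLOCK = (
    "**Market Regime:** {regime}\n"
    "**Summary of Market Regime:** (In less than 3 sentences)\n"
    "**Evidence:** (Provide one paragraph with the selected news as evidence "
    "to support your reasoning behind the identified market regime.)\n"
    "**References / Links and date to the News:** ()"
)


def build_output_format(regimes=range(5)):
    """Return the output-format section with one block per regime number."""
    return "\n\n".join(REGIME_BLOCK.format(regime=r) for r in regimes)


# The result can be spliced into the prompt under "### Output Format Example ###".
output_format = build_output_format()
```

Changing the wording of one field then takes a single edit, and a different number of regimes only needs a different `regimes` argument.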