Ideas we want to try

- [x] Try using CSV instead of JSON <-- this is the priority. Let's investigate this first.
- [x] Chunking more intelligently -- maybe it makes sense to split data into smaller indexable chunks
  - [x] Split along each column? This might make more sense if we want to track a particular number over time.

This sort of worked, but didn't make a huge difference. The biggest takeaway concerns how the information is chunked.
It might make more sense to split the data into separate "documents" (one for each column), which seems to have made it easier to retrieve specific information. Unfortunately, the big challenge is that the agent now needs to decide carefully how to query for the information. All of this points towards SQL being a better idea.
- [x] What about using agents?
- [ ] Does query planning work?
- [x] Does the ReAct planning framework work?

The ReAct planning framework works _extremely_ well.

Agents show promise, but they seem to struggle with open-ended questions and with calling functions (i.e. retrieval operations using the built-in llama-index tools) correctly, sometimes failing to retrieve the right context.

- [ ] Does using metadata on the docs help with retrieval?

  - [ ] Split along date? This makes sense if we want to do a comprehensive analysis across all factors over time.
  - [ ] Maybe we should put this in a SQLite database instead? This makes a lot of sense -- then the agent can decide the best way of accessing the data in order to make a decision. This might be a really good way of side-stepping the chunking issue.
- [ ] Explore better [indexing approaches](https://archive.is/Pwz4C)
- [ ] Can we query better?
  - [ ] Retrieval strategies
- [ ] What about query planning?
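
For the SQLite idea in the list above, here is a minimal sketch of what "let the agent query a table" could look like. It uses only the standard-library `sqlite3` module and a handful of rows copied from the TSLA cash-flow output further down in this notebook. The table name `cash_flow`, its column subset, and the `build_db` helper are assumptions made for illustration; in the real notebook, `DataFrame.to_sql` would probably be the more natural way to load the whole frame.

```python
import sqlite3

# Rows copied from the TSLA cash-flow output in this notebook:
# (date, net_income, free_cash_flow)
ROWS = [
    ("2018-12-31", -1062582000.0, -221714000),
    ("2019-12-31", -775000000.0, 968000000),
    ("2020-12-31", 862000000.0, 2701000000),
    ("2021-12-31", 5644000000.0, 3483000000),
    ("2022-12-31", 12587000000.0, 7552000000),
]


def build_db(rows):
    """Create an in-memory SQLite database holding one (hypothetical) cash_flow table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE cash_flow ("
        "symbol TEXT, date TEXT, net_income REAL, free_cash_flow REAL)"
    )
    conn.executemany("INSERT INTO cash_flow VALUES ('TSLA', ?, ?, ?)", rows)
    conn.commit()
    return conn


conn = build_db(ROWS)

# The kind of query an agent could write itself, instead of guessing which chunk to retrieve.
query = (
    "SELECT date, net_income FROM cash_flow "
    "WHERE symbol = ? AND net_income > 0 ORDER BY date"
)
profitable_years = conn.execute(query, ("TSLA",)).fetchall()
print(profitable_years)
```

With everything in one table, the chunking question (by column or by date) mostly goes away, since the query decides which slice to read.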

## Notes
- Sometimes agents get in their own way (they try to calculate the result instead of looking it up) -- maybe this can be fixed with prompts?
- Answering very specific data-retrieval questions works OK. It's when you're asking about analysis that things get complicated -- this is where I hope agents will do better
- I'm watching the function calling try to call "all" --> this implies SQL is just the way to go, imo.
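
One way to nudge an agent towards looking a number up rather than calculating it is to give it a very plain lookup function to call. The sketch below is an assumption about what such a helper could look like, not something from the original run: `lookup_value` is a hypothetical name, and it assumes the per-column CSVs have a header row followed by `date,value` rows. The demo values are taken from the EPS retrieval output later in this notebook.

```python
import csv
import tempfile
from pathlib import Path


def lookup_value(csv_path, date):
    """Return the raw string stored for `date` in a two-column CSV, or None if absent."""
    with open(csv_path, newline="") as fh:
        reader = csv.reader(fh)
        next(reader, None)  # skip the header row (index name, column name)
        for row in reader:
            if len(row) >= 2 and row[0] == date:
                return row[1]
    return None


# Tiny demo file in the same shape as the per-column exports.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "aapl_earnings_eps.csv"
    path.write_text("date,eps\n2023-08-03,1.26\n2023-11-02,1.46\n")
    found = lookup_value(path, "2023-11-02")
    missing = lookup_value(path, "1999-01-01")

print(found, missing)
```

Returning the raw string keeps the helper honest about what is in the file; any numeric conversion can happen in the caller.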

In [6]:
from llama_index import VectorStoreIndex, SimpleDirectoryReader

In [7]:
from openbb import obb

In [8]:
TICKER = "TSLA"

In [9]:
# Let's get some data
df = obb.equity.fundamental.cash(TICKER).to_df()
#df = obb.stocks.fa.cash(TICKER).to_df()
df.head()

date,symbol,period,cik,net_income,depreciation_and_amortization,stock_based_compensation,deferred_income_tax,other_non_cash_items,accounts_receivables,inventory,...,other_working_capital,common_stock_issued,effect_of_forex_changes_on_cash,cash_at_beginning_of_period,cash_at_end_of_period,operating_cash_flow,capital_expenditure,free_cash_flow,link,final_link
2018-12-31,TSLA,FY,1318605,-1062582000.0,1901050000.0,749024000.0,0.0,452359000.0,-496732000.0,-1023264000.0,...,-144903000,-14973000,-22700000,3964959000,4276388000,2097802000,-2319516000,-221714000,https://www.sec.gov/Archives/edgar/data/131860...,https://www.sec.gov/Archives/edgar/data/131860...
2019-12-31,TSLA,FY,1318605,-775000000.0,2154000000.0,898000000.0,0.0,477000000.0,-367000000.0,-429000000.0,...,743000000,848000000,8000000,4277000000,6783000000,2405000000,-1437000000,968000000,https://www.sec.gov/Archives/edgar/data/131860...,https://www.sec.gov/Archives/edgar/data/131860...
2020-12-31,TSLA,FY,1318605,862000000.0,2322000000.0,1734000000.0,433000000.0,408000000.0,-652000000.0,-422000000.0,...,328000000,12269000000,334000000,6783000000,19901000000,5943000000,-3242000000,2701000000,https://www.sec.gov/Archives/edgar/data/131860...,https://www.sec.gov/Archives/edgar/data/131860...
2021-12-31,TSLA,FY,1318605,5644000000.0,2911000000.0,2121000000.0,58000000.0,245000000.0,-130000000.0,-1709000000.0,...,979000000,0,-183000000,19901000000,18144000000,11497000000,-8014000000,3483000000,https://www.sec.gov/Archives/edgar/data/131860...,https://www.sec.gov/Archives/edgar/data/131860...
2022-12-31,TSLA,FY,1318605,12587000000.0,3747000000.0,1560000000.0,0.0,738000000.0,-1124000000.0,-6465000000.0,...,-2348000000,541000000,-444000000,18144000000,16924000000,14724000000,-7172000000,7552000000,https://www.sec.gov/Archives/edgar/data/131860...,https://www.sec.gov/Archives/edgar/data/131860...


In [10]:
from pathlib import Path

base_path = Path(f"financial_data/csv/{TICKER}")
# parents=True so this also works when financial_data/csv/ does not exist yet
base_path.mkdir(parents=True, exist_ok=True)

In [11]:
# TODO: Automate all of this

# Cashflow

cash_statement_base_path = base_path / "cash_statement"
cash_statement_base_path.mkdir(exist_ok=True)

for col in df.columns:
    df[col].to_frame().to_csv(cash_statement_base_path / f"{TICKER}_cash_statement_{col}.csv")

In [13]:
# Use TICKER so the data matches the TSLA-named output files written below
df = obb.equity.fundamental.earnings(TICKER).to_df()

In [14]:
# Earnings
earnings_base_path = base_path / "earnings"
earnings_base_path.mkdir(exist_ok=True)

for col in df.columns:
    df[col].to_frame().to_csv(earnings_base_path / f"{TICKER}_earnings_{col}.csv")

In [15]:
# Use TICKER so the data matches the TSLA-named output files written below
df = obb.equity.fundamental.ratios(TICKER).to_df()

In [16]:
# Ratios
ratios_base_path = base_path / "ratios"
ratios_base_path.mkdir(exist_ok=True)

for col in df.columns:
    df[col].to_frame().to_csv(ratios_base_path / f"{TICKER}_ratios_{col}.csv")
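
The three export cells above repeat the same pattern, so the "automate all of this" TODO could be sketched as one helper. `export_columns` is a hypothetical name; it only assumes that `df` behaves like a pandas DataFrame (`.columns`, `df[col].to_frame().to_csv(...)`), which is what the cells above already rely on.

```python
from pathlib import Path


def export_columns(df, base_path, ticker, statement):
    """Write each column of `df` to its own CSV and return the paths written.

    `df` is expected to behave like a pandas DataFrame (`.columns`, `df[col]`).
    """
    out_dir = Path(base_path) / statement
    # parents=True so the helper also works on a fresh checkout
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for col in df.columns:
        path = out_dir / f"{ticker}_{statement}_{col}.csv"
        df[col].to_frame().to_csv(path)
        written.append(path)
    return written
```

Each cell would then shrink to a single call such as `export_columns(df, base_path, TICKER, "ratios")`.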

In [28]:
# TODO: Write out the columns of the dataframe to make an easy "look-up" tool for the agent.
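
A possible shape for that look-up tool, as a rough sketch: a small JSON document that maps each statement to its column names and to the CSV file each column was written to. `build_column_lookup` and the example input are assumptions for illustration; in the notebook the input could be built from `list(df.columns)` for each statement.

```python
import json


def build_column_lookup(columns_by_statement, ticker):
    """Map each statement to its column names and the CSV file each column lives in."""
    lookup = {}
    for statement, columns in columns_by_statement.items():
        lookup[statement] = {
            col: f"{statement}/{ticker}_{statement}_{col}.csv" for col in columns
        }
    return lookup


# Example input using a few column names that appear in this notebook.
example = {
    "cash_statement": ["net_income", "free_cash_flow"],
    "earnings": ["eps", "revenue"],
}
lookup = build_column_lookup(example, "TSLA")

# Serialise so it can be handed to the agent, e.g. as part of a tool description.
lookup_json = json.dumps(lookup, indent=2)
print(lookup_json)
```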

In [207]:
df_ratios.head()

date,symbol,period,current_ratio,quick_ratio,cash_ratio,days_of_sales_outstanding,days_of_inventory_outstanding,operating_cycle,days_of_payables_outstanding,cash_conversion_cycle,...,price_earnings_ratio,price_to_free_cash_flows_ratio,price_to_operating_cash_flows_ratio,price_cash_flow_ratio,price_earnings_to_growth_ratio,price_sales_ratio,dividend_yield,enterprise_value_multiple,price_fair_value,calendarYear
2012-09-29,AAPL,FY,1.495849,1.24075,0.278813,43.592532,3.286604,46.879136,87.982094,-41.102958,...,14.943146,15.043719,12.262512,12.262512,0.253274,3.984603,0.00399,10.473296,5.275546,2012
2013-09-28,AAPL,FY,1.678639,1.401507,0.326607,44.081476,6.039623,50.121098,76.580633,-26.459535,...,12.061028,10.018037,8.323786,8.323786,-1.198565,2.613681,0.023649,8.060214,3.615604,2013
2014-09-27,AAPL,FY,1.080113,0.824234,0.218194,54.350146,6.863787,61.213933,98.180441,-36.966508,...,15.518132,12.287002,10.267804,10.267804,1.167944,3.354147,0.018146,10.266002,5.49653,2014
2015-09-26,AAPL,FY,1.108771,0.892495,0.262002,47.387609,6.120288,53.507897,92.468716,-38.96082,...,12.36047,9.458209,8.121169,8.121169,0.286057,2.823845,0.017517,8.526397,5.529512,2015
2016-09-24,AAPL,FY,1.352669,1.220844,0.259271,49.592768,5.923304,55.516073,103.613369,-48.097297,...,13.496533,11.795396,9.367649,9.367649,-1.361389,2.859483,0.019704,9.686287,4.80796,2016


In [209]:
for col in df_ratios.columns:
    df_ratios[col].to_frame().to_csv(f"financial_data/csv/aapl_ratios_{col}.csv")

In [64]:
from llama_index import ServiceContext
from llama_index.llms import OpenAI

# Let's explore CSV

In [198]:
csv_documents = SimpleDirectoryReader(
    "financial_data/csv/"
).load_data()

In [199]:
index = VectorStoreIndex.from_documents(
    csv_documents,
    show_progress=True
)

Parsing documents into nodes:   0%|          | 0/47 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/47 [00:00<?, ?it/s]

In [200]:
retriever = index.as_retriever(similarity_top_k=5)
nodes = retriever.retrieve("Analyze the earnings across the board.")

for node in nodes:
    print(f"###### {node.metadata['file_path']} :: {node.score} #####")
    print(node.text)
    print("----------")

###### financial_data/csv/aapl_earnings_eps.csv :: 0.7726544831481565 #####
2012-07-24, 0.33
2012-10-25, 0.31
2013-01-23, 0.49
2013-04-23, 0.36
2013-07-23, 0.27
2013-10-28, 0.3
2014-01-27, 0.52
2014-04-23, 0.42
2014-07-22, 0.32
2014-10-20, 0.36
2015-01-27, 0.77
2015-04-27, 0.58
2015-07-21, 0.46
2015-10-27, 0.49
2016-01-26, 0.82
2016-04-26, 0.48
2016-07-26, 0.36
2016-10-25, 0.42
2017-01-31, 0.84
2017-05-02, 0.53
2017-08-01, 0.42
2017-11-02, 0.52
2018-02-01, 0.97
2018-05-01, 0.68
2018-07-31, 0.59
2018-11-01, 0.73
2019-01-29, 1.05
2019-04-30, 0.62
2019-07-30, 0.55
2019-10-30, 0.76
2020-01-28, 1.25
2020-04-30, 0.64
2020-07-30, 0.65
2020-10-29, 0.73
2021-01-27, 1.68
2021-04-28, 1.4
2021-07-27, 1.3
2021-10-28, 1.24
2022-01-27, 2.1
2022-04-28, 1.52
2022-07-28, 1.2
2022-10-27, 1.29
2023-02-02, 1.88
2023-05-04, 1.52
2023-08-03, 1.26
2023-11-02, 1.46
2023-11-02, 1.46
2024-02-01, nan
2024-05-02, nan
2024-08-01, nan
----------
###### financial_data/csv/aapl_earnings_revenue.csv :: 0.77102018668607
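
In the output above, the file path is the only thing that tells us which statement and column a node came from. For the open question about metadata, a first step could be to parse that path into structured fields. This is a sketch under assumptions: `metadata_from_path` is a hypothetical helper, and it assumes the `<ticker>_<statement>_<column>.csv` naming used by the export cells. As far as I know, `SimpleDirectoryReader` accepts a `file_metadata` callable that receives the file path, which would be one way to attach the result, but that should be checked against the installed llama-index version.

```python
from pathlib import Path

# Statement names used in the export cells of this notebook.
STATEMENTS = ("cash_statement", "earnings", "ratios")


def metadata_from_path(file_path):
    """Parse '<ticker>_<statement>_<column>.csv' into a metadata dict."""
    stem = Path(file_path).stem
    ticker, _, rest = stem.partition("_")
    for statement in STATEMENTS:
        prefix = statement + "_"
        if rest.startswith(prefix):
            return {
                "ticker": ticker.upper(),
                "statement": statement,
                "column": rest[len(prefix):],
            }
    # Unknown naming scheme: keep what we can and leave the rest empty.
    return {"ticker": ticker.upper(), "statement": None, "column": rest or None}


print(metadata_from_path("financial_data/csv/aapl_earnings_eps.csv"))
```

Fields like these could also be used to filter out identifier-only columns (for example `symbol` or `final_link`) before retrieval.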

In [201]:
# Let's try a Chat engine
chat_engine = index.as_chat_engine(chat_mode="context", similarity_top_k=6)

Fun things to ask:
- Is aapl's net income growing?
- Is there anything notable about aapl's cash flow?
- Ask questions about what the agent tells you!

In [202]:
chat_engine.chat_repl()

===== Entering Chat REPL =====
Type "exit" to exit.



Human:  Tell me about aapl's revenue.


Assistant: Apple Inc. (AAPL) is a multinational technology company that is known for its consumer electronics, software, and online services. Revenue is a key financial metric that reflects the total amount of money generated by a company's business activities.

The provided data shows Apple's revenue over a period of time. Here are some key points:

- In 2012, Apple's revenue ranged from $35.02 billion to $36.97 billion.
- From 2013 to 2014, Apple's revenue showed consistent growth, reaching $57.59 billion in Q1 2014.
- In 2015, Apple's revenue continued to increase, with Q1 2015 reaching $74.59 billion.
- In 2016, Apple's revenue fluctuated between $42.36 billion and $75.87 billion.
- In 2017, Apple's revenue ranged from $45.41 billion to $88.29 billion.
- In 2018, Apple's revenue ranged from $53.27 billion to $88.29 billion.
- In 2019, Apple's revenue ranged from $53.81 billion to $84.31 billion.
- In 2020, Apple's revenue ranged from $58.31 billion to $91.82 billion.
- In 2021, App

Human:  Tell me about appl's revenue. Don't give any additional explanation.


Assistant: Apple Inc. (AAPL) revenue:

2012-07-24: $35,023,000,000.0
2012-10-25: $35,966,000,000.0
2013-01-23: $54,512,000,000.0
2013-04-23: $43,603,000,000.0
2013-07-23: $35,323,000,000.0
2013-10-28: $37,472,000,000.0
2014-01-27: $57,594,000,000.0
2014-04-23: $45,646,000,000.0
2014-07-22: $37,432,000,000.0
2014-10-20: $42,123,000,000.0
2015-01-27: $74,599,000,000.0
2015-04-27: $58,010,000,000.0
2015-07-21: $49,605,000,000.0
2015-10-27: $51,501,000,000.0
2016-01-26: $75,872,000,000.0
2016-04-26: $50,557,000,000.0
2016-07-26: $42,358,000,000.0
2016-10-25: $46,852,000,000.0
2017-01-31: $78,351,000,000.0
2017-05-02: $52,896,000,000.0
2017-08-01: $45,408,000,000.0
2017-11-02: $52,579,000,000.0
2018-02-01: $88,293,000,000.0
2018-05-01: $61,137,000,000.0
2018-07-31: $53,265,000,000.0
2018-11-01: $62,900,000,000.0
2019-01-29: $84,310,000,000.0
2019-04-30: $58,015,000,000.0
2019-07-30: $53,809,000,000.0
2019-10-30: $64,040,000,000.0
2020-01-28: $91,819,000,000.0
2020-04-30: $58,313,000,000.0
2

Human:  Cite your data file sources.


Assistant: The data file sources for the revenue information of Apple Inc. (AAPL) are as follows:

- File: financial_data/csv/aapl_earnings_symbol.csv
- File: financial_data/csv/aapl_cash_statement_final_link.csv

Please note that the data provided is based on the available information and may not include the most recent financial figures. It is always recommended to refer to the latest financial reports and statements from reliable sources for the most accurate and up-to-date information.



Human:  Does apple look like a financially healthy company?


Assistant: Determining the financial health of a company involves analyzing various financial metrics and indicators. While I can provide some insights based on the available data, it is important to note that a comprehensive evaluation requires a deeper analysis of multiple factors.

Based on the revenue data provided, Apple Inc. (AAPL) has shown consistent growth in its revenue over the years. This indicates that the company has been able to generate increasing amounts of money from its business activities.

However, revenue alone does not provide a complete picture of a company's financial health. Other factors such as profitability, liquidity, debt levels, and cash flow need to be considered.

To assess Apple's financial health more comprehensively, it would be necessary to analyze additional financial statements such as the balance sheet and income statement, as well as key financial ratios such as profitability ratios (e.g., gross margin, net profit margin), liquidity ratios (e.g

Human:  exit


In [17]:
pip freeze

aiodns==3.1.1
aiofiles==22.1.0
aiohttp==3.8.6
aiosignal==1.3.1
aiosqlite==0.19.0
aiostream==0.5.2
alpha-vantage==2.3.1
altair==5.1.2
annotated-types==0.6.0
ansiwrap==0.8.4
anyio==3.7.1
appnope==0.1.3
argon2-cffi==23.1.0
argon2-cffi-bindings==21.2.0
arrow==1.3.0
ascii-magic==1.6
astroid==3.0.1
asttokens==2.4.1
async-lru==2.0.4
async-timeout==4.0.3
attrs==23.1.0
Babel==2.13.1
backcall==0.2.0
backoff==2.2.1
bandit==1.7.5
base58==2.1.1
black==23.11.0
bleach==6.1.0
blinker==1.7.0
bs4==0.0.1
bt==0.2.9
build==1.0.3
CacheControl==0.13.1
cachetools==5.3.2
cattrs==23.1.2
ccxt==4.1.44
certifi==2023.7.22
cffi==1.16.0
cfgv==3.4.0
chardet==5.2.0
charset-normalizer==3.3.2
cleo==2.1.0
click==8.1.7
colorama==0.4.6
comm==0.2.0
commonmark==0.9.1
contourpy==1.1.1
convertdate==2.4.0
coverage==7.3.2
crashtest==0.4.1
cryptography==41.0.5
cssselect==1.2.0
cycler==0.12.1
dataclasses-json==0.5.14
dateparser==1.1.8
DateTime==5.2
debugpy==1.8.0
decorator==5.1.1
defusedxml==0.7.1
degiro-connector==2.0.21
Deprecate