# In this notebook, I will work on the following:
* Import news URLs using the Finnhub API.
* Chat with the news.

In [3]:
from financial_data import FinancialDataFinnHub
import os

In [4]:
# Create Finnhub object
symbol = "NVDA"
api_key = os.environ.get("FINHUB_API_KEY")
nvda = FinancialDataFinnHub(symbol, api_key)
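
`os.environ.get` returns `None` when the variable is missing, and the problem may then only show up later, when the API is called. Below is a minimal sketch of failing early instead. `require_env` is a hypothetical helper introduced here for illustration; it is not part of `financial_data`.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise if it is unset or empty."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Environment variable {name!r} is not set")
    return value
```

It could be used as `api_key = require_env("FINHUB_API_KEY")` before creating the object.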

## Load news, then chat about it:
* Use the `FinancialDataFinnHub` class to load news URLs
* Use langchain `UnstructuredURLLoader` to load the news
* Remove unreachable news from the list
* Create an `OpenAIEmbeddings` model
* Split the text with `RecursiveCharacterTextSplitter`
* Store vectors in `Chroma`
* Create prompt templates with `PromptTemplate`
* Create the LLM chain and chat with the news. See `basics.py` and `index_vectorstore_index_creation.ipynb`


In [9]:
# Use FinancialDataFinnHub to load news URLs
news_urls = nvda.company_news['url']
urls = [news_urls.iloc[i][0] for i in range(len(news_urls))]

In [10]:
# langchain UnstructuredURLLoader to load news
from langchain.document_loaders import UnstructuredURLLoader

loader = UnstructuredURLLoader(urls=urls)
data = loader.load()

Error fetching or processing https://finnhub.io/api/news?id=69af8d0a9004b53912dfc61cf5100e5c313e7da4cfbf550471ef061c50ad13ac, exeption: URL return an error: 404
Error fetching or processing https://finnhub.io/api/news?id=ac10ffbe74f9a93d4e019463841cc7dcbec04effa443610b76d2bc1624cd3b7e, exeption: URL return an error: 404
Error fetching or processing https://finnhub.io/api/news?id=6195fe84efcd9d477a4f5c19ecf48cf4d8f7228decd2ff0f6822c11a8cbbb704, exeption: URL return an error: 500
Error fetching or processing https://finnhub.io/api/news?id=842a0c69f973e4b40922ddf2783ec3db5b7a16469319696f2a3b6ee0f3c9f440, exeption: URL return an error: 500
Error fetching or processing https://finnhub.io/api/news?id=0acbb2b2acb7bcfd5229414864a9981deb34deec0c6c48b8a44554807a282bc1, exeption: URL return an error: 500
Error fetching or processing https://finnhub.io/api/news?id=f159c759cdc5611c75ed35cdc587e2c0ad2eb79a31e5405bb269b725bd63778e, exeption: URL return an error: 500
Error fetching or processing https
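
To see how many URLs failed and with which status, the messages above can be tallied. This is a rough sketch that assumes the messages have already been captured as one text string (how to capture them is left out) and that they follow the format printed above, including its "exeption" spelling. `tally_status_codes` is a hypothetical helper.

```python
import re
from collections import Counter

# Pattern for the loader's message format as it appears in the output above.
ERROR_LINE = re.compile(
    r"Error fetching or processing (?P<url>\S+), "
    r"exeption: URL return an error: (?P<status>\d{3})"
)


def tally_status_codes(log_text: str) -> Counter:
    """Count how often each HTTP status code appears in the captured messages."""
    return Counter(m.group("status") for m in ERROR_LINE.finditer(log_text))
```

Lines that do not match the pattern, such as a truncated message, are ignored.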

In [11]:
data

[Document(page_content='Javascript is Disabled\n\nYour current browser configuration', metadata={'source': 'https://finnhub.io/api/news?id=2468f842ecf674796b71f2b6dfcbddb33c1231900ed4a7008a20ff81fb163fc1'}),
 Document(page_content='Skip Navigation\n\nwatchlive\n\nMarkets\n\nPre-Markets\n\nU.S. Markets\n\nCurrencies\n\nCryptocurrency\n\nFutures & Commodities\n\nBonds\n\nFunds & ETFs\n\nBusiness\n\nEconomy\n\nFinance\n\nHealth & Science\n\nMedia\n\nReal Estate\n\nEnergy\n\nClimate\n\nTransportation\n\nIndustrials\n\nRetail\n\nWealth\n\nLife\n\nSmall Business\n\nInvesting\n\nPersonal Finance\n\nFintech\n\nFinancial Advisors\n\nOptions Action\n\nETF Street\n\nBuffett Archive\n\nEarnings\n\nTrader Talk\n\nTech\n\nCybersecurity\n\nEnterprise\n\nInternet\n\nMedia\n\nMobile\n\nSocial Media\n\nCNBC Disruptor 50\n\nTech Guide\n\nPolitics\n\nWhite House\n\nPolicy\n\nDefense\n\nCongress\n\nEquity and Opportunity\n\nCNBC TV\n\nLive TV\n\nLive Audio\n\nBusiness Day Shows\n\nEntertainment Shows\n\nFu

In [12]:
# Remove news items whose content is not reachable.
# Build a new list instead of calling pop() while iterating, which can skip items.
data = [d for d in data if not d.page_content.startswith('Javascript is Disabled')]
print(f"You got {len(data)} news to read about {symbol}. ")

You got 21 news to read about NVDA. 
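
A note on filtering: calling `pop` on a list while enumerating it shifts the later elements left, so an item directly after a removed one is never examined. The sketch below compares that with building a new list. `Doc` is a stand-in class for illustration, not langchain's `Document`, and the sample contents are made up.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    page_content: str


BLOCKED = "Javascript is Disabled"


def filter_in_place(docs):
    """Pop blocked docs while enumerating; consecutive blocked docs can slip through."""
    docs = list(docs)  # work on a copy so the caller's list is untouched
    for i, d in enumerate(docs):
        if d.page_content.startswith(BLOCKED):
            docs.pop(i)
    return docs


def filter_by_rebuild(docs):
    """Build a new list that keeps only the docs that are not blocked."""
    return [d for d in docs if not d.page_content.startswith(BLOCKED)]


sample = [Doc(BLOCKED), Doc(BLOCKED + " again"), Doc("Nvidia news")]
```

With two blocked items in a row, `filter_in_place(sample)` still contains the second blocked item, while `filter_by_rebuild(sample)` keeps only the readable one.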


In [13]:
# Create `OpenAIEmbeddings` model
from langchain.embeddings import OpenAIEmbeddings
embeddings = OpenAIEmbeddings()

In [14]:
# Split text with RecursiveCharacterTextSplitter
from langchain.text_splitter import RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
docs = text_splitter.split_documents(data)
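
To get a feel for what `chunk_size=1000` and `chunk_overlap=200` mean, here is a simplified sketch using fixed character windows. It is not how `RecursiveCharacterTextSplitter` works internally: the real splitter prefers to break at separators such as paragraph breaks, which this sketch ignores. `chunk_text` is a hypothetical helper.

```python
def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list:
    """Cut text into windows of chunk_size characters that share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
        start += step
    return chunks
```

Each chunk starts 800 characters after the previous one, so the last 200 characters of one chunk are repeated at the start of the next.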

In [84]:
docs

[Document(page_content="Skip Navigation\n\nwatchlive\n\nMarkets\n\nPre-Markets\n\nU.S. Markets\n\nCurrencies\n\nCryptocurrency\n\nFutures & Commodities\n\nBonds\n\nFunds & ETFs\n\nBusiness\n\nEconomy\n\nFinance\n\nHealth & Science\n\nMedia\n\nReal Estate\n\nEnergy\n\nClimate\n\nTransportation\n\nIndustrials\n\nRetail\n\nWealth\n\nLife\n\nSmall Business\n\nInvesting\n\nPersonal Finance\n\nFintech\n\nFinancial Advisors\n\nOptions Action\n\nETF Street\n\nBuffett Archive\n\nEarnings\n\nTrader Talk\n\nTech\n\nCybersecurity\n\nEnterprise\n\nInternet\n\nMedia\n\nMobile\n\nSocial Media\n\nCNBC Disruptor 50\n\nTech Guide\n\nPolitics\n\nWhite House\n\nPolicy\n\nDefense\n\nCongress\n\nEquity and Opportunity\n\nCNBC TV\n\nLive TV\n\nLive Audio\n\nBusiness Day Shows\n\nEntertainment Shows\n\nFull Episodes\n\nLatest Video\n\nTop Video\n\nCEO Interviews\n\nCNBC Documentaries\n\nCNBC Podcasts\n\nCNBC World\n\nDigital Originals\n\nLive TV Schedule\n\nWatchlist\n\nInvesting Club\n\nTrust Portfolio\n\nAn

In [15]:
# Store vectors in Chroma 
from langchain.vectorstores import Chroma
vectorstore = Chroma.from_documents(docs, embeddings)

Using embedded DuckDB without persistence: data will be transient
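
Conceptually, a vector store keeps one embedding vector per chunk and returns the chunks whose vectors are closest to the question's vector. The sketch below illustrates that idea with cosine similarity over toy two-dimensional vectors. The distance metric and index that `Chroma` actually uses are not shown in this notebook, so treat this as an illustration only; `top_k` and the vectors in the usage note are made up.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, indexed, k=2):
    """Return the texts of the k (text, vector) pairs most similar to query_vec."""
    ranked = sorted(
        indexed,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]
```

For example, with `indexed = [("gpu", [1, 0]), ("bank", [0, 1]), ("chip", [0.9, 0.1])]`, the call `top_k([1, 0], indexed)` ranks "gpu" first and "chip" second.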


#### Create prompt template and add to the chain. 

# ConversationalRetrievalChain:
https://python.langchain.com/en/latest/modules/chains/index_examples/chat_vector_db.html?highlight=ConversationalRetrievalChain#conversationalretrievalchain-with-search-distance

https://python.langchain.com/en/latest/reference/modules/chains.html?highlight=ConversationalRetrievalChain#langchain.chains.ConversationalRetrievalChain

See Mayo's `makechain.ts` for reference. The second URL above shows how it works in Python.

In [26]:
# Create prompt templates with PromptTemplate
from langchain.prompts import PromptTemplate
from langchain.chains import ConversationalRetrievalChain
from langchain.chat_models import ChatOpenAI

chat = ChatOpenAI(temperature=0, model_name="gpt-3.5-turbo")
cd_template = """Given the following conversation and a follow up question, rephrase the follow up question to be a standalone
question. 

Chat History:
{chat_history}
Follow Up Input: {question}
standalone question:"""
CONDENSE_PROMPT = PromptTemplate(template=cd_template,
                                 input_variables=["chat_history", "question"])

qa_prompt = """You are an AI Entertainer and brilliant in economics and finance. Please use the following pieces of context
to answer the question at the end. You must reach some kind of conclusion and DO NOT give me an answer saying that YOU DON'T KNOW.
Your answer is NOT advice; the purpose of your opinion is entertainment only.

{context}
Question: {question}
Your Answer: 
"""

QA_PROMPT = PromptTemplate(template=qa_prompt,
                           input_variables=['context', 'question'])

# Create llm chain
chain = ConversationalRetrievalChain.from_llm(llm=chat, 
                                             retriever=vectorstore.as_retriever(),
                                             condense_question_prompt=CONDENSE_PROMPT,
                                             qa_prompt=QA_PROMPT, 
                                             )


In [27]:
# Chat with News
chat_history = []
question = "What do you see the company from the news?"
result = chain({"question": question, "chat_history": chat_history})

In [28]:
result

{'question': 'What do you see the company from the news?',
 'chat_history': [],
 'answer': "Based on the news, I see that several companies have made headlines for their performance in the stock market. Nvidia received a double upgrade from HSBC and could potentially extend its rally even further. Chubb was upgraded to buy from neutral by Citi, while PowerSchool Holdings was upgraded to buy from neutral by Goldman Sachs. Lockheed Martin beat Wall Street's expectations in the first quarter and reaffirmed its full-year guidance. Johnson & Johnson reported strong first quarter sales and raised its 2023 guidance midpoints. Bank of America topped first-quarter expectations on the top and bottom lines, while Sunrun was upgraded by KeyBanc and could potentially rally more than 31% from Monday's close. Overall, it seems that these companies are performing well in their respective industries."}
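
To ask a follow-up question, the previous turn has to be passed back in `chat_history`. The sketch below assumes the chain accepts the history as a list of `(question, answer)` tuples. `ask` is a hypothetical helper, and `fake_chain` is a stand-in so the sketch runs without an API key; it is not the real chain.

```python
def ask(chain, question, chat_history):
    """Send one question and record the (question, answer) turn in chat_history."""
    result = chain({"question": question, "chat_history": chat_history})
    chat_history.append((question, result["answer"]))
    return result["answer"]


def fake_chain(inputs):
    """Stand-in for the real chain: reports how many earlier turns it received."""
    turns = len(inputs["chat_history"])
    return {"answer": f"answer to {inputs['question']!r} after {turns} earlier turn(s)"}
```

With the real chain, `ask(chain, "What about its competitors?", chat_history)` would let the condense prompt rewrite the follow-up into a standalone question using the earlier turns.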