# Leveraging LLMs for Stock Selection

## Problem Statement

**Challenge:** Analyzing vast amounts of unstructured textual data (e.g., news articles, financial reports, social media) to predict stock movements and identify opportunities.

**Opportunity:** By leveraging LLMs, Company X can make data-driven decisions faster and with greater accuracy.

## Install Libraries

In [None]:
from google.colab import drive
drive.mount('/content/drive')


# Connect to Google Drive so the .txt tracking files are stored there.
# Google Colab has no persistent storage, so if the session is lost we can
# reload the .txt files from Drive and continue the multiprocessing run.

import os
os.chdir('/content/drive/MyDrive/Headstarter')
print(os.listdir())

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
['chroma_langchain_db', 'company_tickers.json', 'successful_tickers.txt', 'unsuccessful.txt']


In [None]:
! pip install yfinance langchain_pinecone openai python-dotenv langchain-community sentence_transformers



In [None]:
pip install -qU "langchain-chroma>=0.1.2"

In [None]:
from langchain_pinecone import PineconeVectorStore
from openai import OpenAI
import dotenv
import json
import yfinance as yf
import concurrent.futures
from langchain_community.embeddings import HuggingFaceEmbeddings
from google.colab import userdata
from langchain.schema import Document
from sklearn.metrics.pairwise import cosine_similarity
from sentence_transformers import SentenceTransformer
from pinecone import Pinecone
import numpy as np
import requests
import os
from langchain_chroma import Chroma


## Try Using ChromaDB

In [None]:
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2")


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.



In [None]:
vector_store = Chroma(
    collection_name="example_collection",
    embedding_function=embeddings,
    persist_directory="./chroma_langchain_db",  # Where to save data locally, remove if not necessary
)

In [None]:
from uuid import uuid4

from langchain_core.documents import Document

document_1 = Document(
    page_content="I had chocolate chip pancakes and scrambled eggs for breakfast this morning.",
    metadata={"source": "tweet"},
    id=1,
)

document_2 = Document(
    page_content="The weather forecast for tomorrow is cloudy and overcast, with a high of 62 degrees.",
    metadata={"source": "news"},
    id=2,
)

document_3 = Document(
    page_content="Building an exciting new project with LangChain - come check it out!",
    metadata={"source": "tweet"},
    id=3,
)

In [None]:
documents = [
    document_1,
    document_2,
    document_3]

uuids = [str(uuid4()) for _ in range(len(documents))]

vector_store.add_documents(documents=documents, ids=uuids)

['9e66acea-185d-4ab4-8331-0c010da01500',
 'e28284de-6df8-4c80-aca1-c8b1f2f37d08',
 '568c6323-96ee-4619-b36b-1b729a4cfad0']

In [None]:
documents

[Document(id='1', metadata={'source': 'tweet'}, page_content='I had chocolate chip pancakes and scrambled eggs for breakfast this morning.'),
 Document(id='2', metadata={'source': 'news'}, page_content='The weather forecast for tomorrow is cloudy and overcast, with a high of 62 degrees.'),
 Document(id='3', metadata={'source': 'tweet'}, page_content='Building an exciting new project with LangChain - come check it out!')]

In [None]:
results = vector_store.similarity_search(
    "LangChain provides abstractions to make working with LLMs easy",
    k=2,
    filter={"source": "tweet"},
)
for res in results:
    print(f"* {res.page_content} [{res.metadata}]")

* Building an exciting new project with LangChain - come check it out! [{'source': 'tweet'}]
* Building an exciting new project with LangChain - come check it out! [{'source': 'tweet'}]


In [None]:
results[0].page_content

'Building an exciting new project with LangChain - come check it out!'
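The search above returns the same tweet twice. One plausible cause, given that the store persists to `./chroma_langchain_db`, is that the insert cell ran more than once, each time with fresh random UUIDs. A minimal sketch of one way to make repeated inserts idempotent, assuming the vector store overwrites a record whose ID already exists: derive each ID from the document content with `uuid5`. The helper name `stable_id` and the choice of namespace are illustrative, not part of the original notebook.

```python
import uuid

def stable_id(page_content: str, source: str = "") -> str:
    """Return the same UUID every time for the same content and source."""
    # uuid5 hashes a name inside a namespace, so it is deterministic,
    # unlike uuid4, which is random on every call.
    return str(uuid.uuid5(uuid.NAMESPACE_URL, f"{source}|{page_content}"))

# (source, text) pairs taken from the sample documents above
texts = [
    ("tweet", "Building an exciting new project with LangChain - come check it out!"),
    ("news", "The weather forecast for tomorrow is cloudy and overcast, with a high of 62 degrees."),
]
ids = [stable_id(text, source) for source, text in texts]
```

These IDs could then be passed as `ids=ids` to `vector_store.add_documents` in place of the random UUIDs.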

## Create Function

In [None]:
def get_stock_info(symbol: str) -> dict:
  """
  Retrieves and formats detailed info about stock from Yahoo Finance.

  Args:
    symbol(str):  The stock ticker symbol to look up

  Returns:
    dict: A dictionary containing the formatted stock information.

  """
  data = yf.Ticker(symbol)
  stock_info = data.info

  properties = {
      "Ticker" : stock_info.get('symbol', 'Information not available'),
      "Name" : stock_info.get('longName', 'Information not available'),
      "Business Summary" : stock_info.get('longBusinessSummary', 'Information not available'),
      "City": stock_info.get('city', 'Information not available'),
      "State" : stock_info.get("state", "Information not available"),
      "Country" : stock_info.get('country', 'Information not available'),
      "Industry" : stock_info.get('industry', 'Information not available'),
      "Sector" : stock_info.get('sector', 'Information not available'),
  }

  return properties



In [None]:
data = yf.Ticker("NVDA")
stock_info = data.info

In [None]:
stock_info

{'address1': '2788 San Tomas Expressway',
 'city': 'Santa Clara',
 'state': 'CA',
 'zip': '95051',
 'country': 'United States',
 'phone': '408 486 2000',
 'website': 'https://www.nvidia.com',
 'industry': 'Semiconductors',
 'industryKey': 'semiconductors',
 'industryDisp': 'Semiconductors',
 'sector': 'Technology',
 'sectorKey': 'technology',
 'sectorDisp': 'Technology',
 'longBusinessSummary': "NVIDIA Corporation provides graphics and compute and networking solutions in the United States, Taiwan, China, Hong Kong, and internationally. The Graphics segment offers GeForce GPUs for gaming and PCs, the GeForce NOW game streaming service and related infrastructure, and solutions for gaming platforms; Quadro/NVIDIA RTX GPUs for enterprise workstation graphics; virtual GPU or vGPU software for cloud-based visual and virtual computing; automotive platforms for infotainment systems; and Omniverse software for building and operating metaverse and 3D internet applications. The Compute & Networki

In [None]:
get_stock_info("AAPL")

{'Ticker': 'AAPL',
 'Name': 'Apple Inc.',
 'Business Summary': 'Apple Inc. designs, manufactures, and markets smartphones, personal computers, tablets, wearables, and accessories worldwide. The company offers iPhone, a line of smartphones; Mac, a line of personal computers; iPad, a line of multi-purpose tablets; and wearables, home, and accessories comprising AirPods, Apple TV, Apple Watch, Beats products, and HomePod. It also provides AppleCare support and cloud services; and operates various platforms, including the App Store that allow customers to discover and download applications and digital content, such as books, music, video, games, and podcasts, as well as advertising services include third-party licensing arrangements and its own advertising platforms. In addition, the company offers various subscription-based services, such as Apple Arcade, a game subscription service; Apple Fitness+, a personalized fitness service; Apple Music, which offers users a curated listening experi

In [None]:
_loaded_models = {}  # cache so each model is loaded only once per session

def get_huggingface_embeddings(text, model_name="sentence-transformers/all-mpnet-base-v2"):
    """
    Generates embeddings for the given text using a specified Hugging Face model.

    Args:
        text (str): The input text to generate embeddings for.
        model_name (str): The name of the Hugging Face model to use.
                          Defaults to "sentence-transformers/all-mpnet-base-v2".

    Returns:
        np.ndarray: The generated embeddings as a NumPy array.
    """
    # Loading a SentenceTransformer is slow, so reuse the instance on later calls
    if model_name not in _loaded_models:
        _loaded_models[model_name] = SentenceTransformer(model_name)
    embeddings = _loaded_models[model_name].encode(text)
    return embeddings

In [None]:
def cosine_similarity_between_sentences(sentence1, sentence2):
  """
  Calculates the cosine similarity between two sentences.

  Args:
    sentence1(str): The first sentence to compare.
    sentence2(str): The second sentence to compare.

  Returns:
    float: The cosine similarity between the two sentences.

  Notes:
    Prints the cosine similarity score to the console, formatted to four decimal places.
  """

  # Get embeddings for both sentences
  embedding1 = np.array(get_huggingface_embeddings(sentence1))
  embedding2 = np.array(get_huggingface_embeddings(sentence2))

  # Reshape the embeddings for cosine similarity
  embedding1 = embedding1.reshape(1, -1)
  embedding2 = embedding2.reshape(1, -1)

  # Calculate cosine similarity
  similarity = cosine_similarity(embedding1, embedding2)
  similarity_score = similarity[0][0]
  print(f"Cosine similarity between the two sentences: {similarity_score:.4f}")
  return similarity_score




In [None]:
# Example case

sentence1 = "I like to cook asian food"
sentence2 = "Japanese food is my favorite"

similarity = cosine_similarity_between_sentences(sentence1, sentence2)

Cosine similarity between the two sentences: 0.5717


In [None]:
msft_stock_info = get_stock_info("MSFT")
print(msft_stock_info)

{'Ticker': 'MSFT', 'Name': 'Microsoft Corporation', 'Business Summary': 'Microsoft Corporation develops and supports software, services, devices and solutions worldwide. The Productivity and Business Processes segment offers office, exchange, SharePoint, Microsoft Teams, office 365 Security and Compliance, Microsoft viva, and Microsoft 365 copilot; and office consumer services, such as Microsoft 365 consumer subscriptions, Office licensed on-premises, and other office services. This segment also provides LinkedIn; and dynamics business solutions, including Dynamics 365, a set of intelligent, cloud-based applications across ERP, CRM, power apps, and power automate; and on-premises ERP and CRM applications. The Intelligent Cloud segment offers server products and cloud services, such as azure and other cloud services; SQL and windows server, visual studio, system center, and related client access licenses, as well as nuance and GitHub; and enterprise services including enterprise support

In [None]:
# Another example case

msft_stock_info["Business Summary"]

company_desc = "I want to find company that specialize in cloud computing"

similarity = cosine_similarity_between_sentences(company_desc, msft_stock_info["Business Summary"])

Cosine similarity between the two sentences: 0.3962
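The scores printed above are the cosine of the angle between two embedding vectors: the dot product divided by the product of the vector lengths, giving a value between -1 and 1. As a rough illustration of what `cosine_similarity` computes for a single pair, here is a pure-Python sketch on small made-up vectors; the function name `cosine` and the sample values are assumptions for demonstration only.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors: dot(u, v) / (|u| * |v|)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)

same_direction = cosine([1.0, 2.0], [2.0, 4.0])   # parallel vectors
orthogonal = cosine([1.0, 0.0], [0.0, 1.0])       # unrelated directions
opposite = cosine([1.0, 0.0], [-1.0, 0.0])        # opposite directions
```

Parallel vectors score close to 1, orthogonal vectors score 0, and opposite vectors score -1, which is why a higher score is read as "more similar in meaning".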


## Get all the Stocks in the Stock Market

First, we need the symbols (also known as tickers) of the stocks we want to index. The SEC company tickers file used below lists 9,998 companies.

In [None]:
def get_company_tickers():
  """
  Downloads and parses the stock ticker symbols from a GitHub-hosted copy of the SEC company tickers JSON file.

  Returns:
    dict: Dictionary containing company tickers and related info, or None if the download fails.

  Notes:
    Data is sourced from the official SEC website via this GitHub repo:
    "https://raw.githubusercontent.com/team-headstart/Financial-Analysis-and-Automation-with-LLMs/main/company_tickers.json"
  """

  # URL to fetch the raw JSON file from Github
  url = "https://raw.githubusercontent.com/team-headstart/Financial-Analysis-and-Automation-with-LLMs/main/company_tickers.json"

  # Send GET request to the url
  response = requests.get(url)

  # Check if the request was successful
  if response.status_code == 200:
    # Parse the JSON data
    company_tickers = json.loads(response.content.decode('utf-8'))

    # json.loads() parses a JSON string into a Python object (such as a dictionary or list);
    # json.load() does the same for an open file object.


    # Optional: Save content to local file for future use
    with open("company_tickers.json", "w", encoding="utf-8") as file:
      json.dump(company_tickers, file, indent=4)

    print("File downloaded successfully and saved as 'company_tickers.json'")
    return company_tickers
  else:
    print(f"Failed to download the file. Status code: {response.status_code}")
    return None


In [None]:
company_tickers = get_company_tickers()
company_tickers

File downloaded successfully and saved as 'company_tickers.json'


{'0': {'cik_str': 1045810, 'ticker': 'NVDA', 'title': 'NVIDIA CORP'},
 '1': {'cik_str': 320193, 'ticker': 'AAPL', 'title': 'Apple Inc.'},
 '2': {'cik_str': 789019, 'ticker': 'MSFT', 'title': 'MICROSOFT CORP'},
 '3': {'cik_str': 1018724, 'ticker': 'AMZN', 'title': 'AMAZON COM INC'},
 '4': {'cik_str': 1652044, 'ticker': 'GOOGL', 'title': 'Alphabet Inc.'},
 '5': {'cik_str': 1326801, 'ticker': 'META', 'title': 'Meta Platforms, Inc.'},
 '6': {'cik_str': 1318605, 'ticker': 'TSLA', 'title': 'Tesla, Inc.'},
 '7': {'cik_str': 1067983,
  'ticker': 'BRK-B',
  'title': 'BERKSHIRE HATHAWAY INC'},
 '8': {'cik_str': 1046179,
  'ticker': 'TSM',
  'title': 'TAIWAN SEMICONDUCTOR MANUFACTURING CO LTD'},
 '9': {'cik_str': 1730168, 'ticker': 'AVGO', 'title': 'Broadcom Inc.'},
 '10': {'cik_str': 59478, 'ticker': 'LLY', 'title': 'ELI LILLY & Co'},
 '11': {'cik_str': 19617, 'ticker': 'JPM', 'title': 'JPMORGAN CHASE & CO'},
 '12': {'cik_str': 104169, 'ticker': 'WMT', 'title': 'Walmart Inc.'},
 '13': {'cik_str'

In [None]:
len(company_tickers)

9998

## Inserting Stocks into Pinecone

In [None]:
pinecone_api_key = userdata.get("PINECONE_API_KEY")
os.environ['PINECONE_API_KEY'] = pinecone_api_key

index_name = "stocks2"
namespace = "stock-descriptions"

hf_embeddings = HuggingFaceEmbeddings()
vectorstore = PineconeVectorStore(index_name=index_name, embedding=hf_embeddings)



## Sequential Processing

This approach processes the stocks one by one and takes a long time to finish.

[![](https://mermaid.ink/img/pako:eNqNkl1rgzAUhv9KyMXYwF74cSVjYA0rhY6WKRQWe5FqWqU1cUm8GKX_ffmwa65Gc6Hn5LxvzmM8F1jzhsIUHgUZWlCiigG9MlwoXp_AqpPqdS_esmyzCsBivV7o10fxXgagLFbZDsxmb2COi44dzxRsBK-plDoBZSsoaXbuNPeU4941qWBBv0fKVEfOvud5yejh0NWdLr1U0LnMmts2eYhDMLsZgEHa3TV5aEXbEG-zZZnasiXfLMGnaScVeNKRHDiT1DNunRGF5pMFtUaAiCKe5h4hp84jHHks9mJ8mMjBRBOMrT9G45wommis84bjYThZHuPYwzA_xqeIHUU8UZjyYxDOiOIJwhj_uRKnzhOc3FHsdHgoiUNJJhRTfgzFGVEyoRijj0JZAwPYU9GTrtFjfDHbFVQt7WkFUx02RJzMMF21joyKFz-shqkSIw2g4OOxhemBnKXOxqEhiqKO6DHt_3YHwr44v-XXX7It6B4?type=png)](https://mermaid.live/edit#pako:eNqNkl1rgzAUhv9KyMXYwF74cSVjYA0rhY6WKRQWe5FqWqU1cUm8GKX_ffmwa65Gc6Hn5LxvzmM8F1jzhsIUHgUZWlCiigG9MlwoXp_AqpPqdS_esmyzCsBivV7o10fxXgagLFbZDsxmb2COi44dzxRsBK-plDoBZSsoaXbuNPeU4941qWBBv0fKVEfOvud5yejh0NWdLr1U0LnMmts2eYhDMLsZgEHa3TV5aEXbEG-zZZnasiXfLMGnaScVeNKRHDiT1DNunRGF5pMFtUaAiCKe5h4hp84jHHks9mJ8mMjBRBOMrT9G45wommis84bjYThZHuPYwzA_xqeIHUU8UZjyYxDOiOIJwhj_uRKnzhOc3FHsdHgoiUNJJhRTfgzFGVEyoRijj0JZAwPYU9GTrtFjfDHbFVQt7WkFUx02RJzMMF21joyKFz-shqkSIw2g4OOxhemBnKXOxqEhiqKO6DHt_3YHwr44v-XXX7It6B4)

In [None]:
get_stock_info("AAPL")

{'Ticker': 'AAPL',
 'Name': 'Apple Inc.',
 'Business Summary': 'Apple Inc. designs, manufactures, and markets smartphones, personal computers, tablets, wearables, and accessories worldwide. The company offers iPhone, a line of smartphones; Mac, a line of personal computers; iPad, a line of multi-purpose tablets; and wearables, home, and accessories comprising AirPods, Apple TV, Apple Watch, Beats products, and HomePod. It also provides AppleCare support and cloud services; and operates various platforms, including the App Store that allow customers to discover and download applications and digital content, such as books, music, video, games, and podcasts, as well as advertising services include third-party licensing arrangements and its own advertising platforms. In addition, the company offers various subscription-based services, such as Apple Arcade, a game subscription service; Apple Fitness+, a personalized fitness service; Apple Music, which offers users a curated listening experi

In [None]:
'''

for idx, stock in company_tickers.items():
    stock_ticker = stock['ticker']
    stock_data = get_stock_info(stock_ticker)
    stock_description = stock_data['Business Summary']

    print(f"Processing stock {idx} / {len(company_tickers)} :", stock_ticker)

    vectorstore_from_documents = PineconeVectorStore.from_documents(
        documents=[Document(page_content=stock_description, metadata=stock_data)],
        embedding=hf_embeddings,
        index_name=index_name,
        namespace=namespace
    )

  '''


## Parallelizing

[![](https://mermaid.ink/img/pako:eNqFk0uLgzAQgP_KkMOe7MHXRZaC1baXvsDCwqqHrGarVJMSE9hS-983NnWxULceRmf4Pp2MyQVlLCfIQweOTwXsw4SCuvw4Eiw7wqpsxPsXn_r-bmXAcrtdqts6WuwN2Ecr3wB__blJYTKZwixe45JCwKjgrKoI7zwXdphjlVXwwfiR8CbVH9BxdjPbqKxlJTAlTDbVuYXAjDUNZveSHWcZaRromkj_F61etIbire8Xpt2b9tDslvpCdHrRGYrddF6Ibi-6D4vsBjqcUWDe_NCMl0TAG6gfwwmEWOA7FlgasEYBWwP2KOBowBkFXA24Y4COoW61DRklLcwvQUHUHooEFrK53hHrAbkX7WdF51nRfVLUcX4fs8y6ObawMON-phvyI2CGRVakyEA14TUuc7XnL52ZIFGQmiTIU4855scEJfSqOCwFi840Q57gkhiIM3kokPeNq0Zl8pRjQcISq4NT_1VPmH4y1ufXXwdkCM8?type=png)](https://mermaid.live/edit#pako:eNqFk0uLgzAQgP_KkMOe7MHXRZaC1baXvsDCwqqHrGarVJMSE9hS-983NnWxULceRmf4Pp2MyQVlLCfIQweOTwXsw4SCuvw4Eiw7wqpsxPsXn_r-bmXAcrtdqts6WuwN2Ecr3wB__blJYTKZwixe45JCwKjgrKoI7zwXdphjlVXwwfiR8CbVH9BxdjPbqKxlJTAlTDbVuYXAjDUNZveSHWcZaRromkj_F61etIbire8Xpt2b9tDslvpCdHrRGYrddF6Ibi-6D4vsBjqcUWDe_NCMl0TAG6gfwwmEWOA7FlgasEYBWwP2KOBowBkFXA24Y4COoW61DRklLcwvQUHUHooEFrK53hHrAbkX7WdF51nRfVLUcX4fs8y6ObawMON-phvyI2CGRVakyEA14TUuc7XnL52ZIFGQmiTIU4855scEJfSqOCwFi840Q57gkhiIM3kokPeNq0Zl8pRjQcISq4NT_1VPmH4y1ufXXwdkCM8)

In [None]:
# Initialize tracking lists
successful_tickers = []
unsuccessful_tickers = []

# Load existing successful/unsuccessful tickers
try:
    with open('successful_tickers.txt', 'r') as f:
        successful_tickers = [line.strip() for line in f if line.strip()]
    print(f"Loaded {len(successful_tickers)} successful tickers")
except FileNotFoundError:
    print("No existing successful tickers file found")

try:
    with open('unsuccessful_tickers.txt', 'r') as f:
        unsuccessful_tickers = [line.strip() for line in f if line.strip()]
    print(f"Loaded {len(unsuccessful_tickers)} unsuccessful tickers")
except FileNotFoundError:
    print("No existing unsuccessful tickers file found")

Loaded 9227 successful tickers
No existing unsuccessful tickers file found


In [None]:
len(unsuccessful_tickers)

0

In [None]:
def process_stock(stock_ticker: str) -> str:
  # Skip if already processed
  if stock_ticker in successful_tickers:
    return f"Already processed {stock_ticker}"

  try:
    # Get and store stock data
    stock_data = get_stock_info(stock_ticker)
    stock_description = stock_data["Business Summary"]

    # Store stock description in Pinecone
    vectorstore_from_texts = PineconeVectorStore.from_documents(
        documents =[Document(page_content= stock_description, metadata= stock_data)],
        embedding = hf_embeddings,
        index_name = index_name,
        namespace = namespace
    )

    #Track success
    with open ("successful_tickers.txt", "a") as f:
      f.write(f"{stock_ticker}\n")
    successful_tickers.append(stock_ticker)

    return f"Processed {stock_ticker} successfully"

  except Exception as e:
    # Track failure (same file name that the loading cell reads)
    with open ("unsuccessful_tickers.txt", "a") as f:
      f.write(f"{stock_ticker}\n")
    unsuccessful_tickers.append(stock_ticker)

    return f"Failed to process {stock_ticker}: {e}"
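`process_stock` runs on several threads at once, and each call checks a shared list, appends to it, and writes to a shared file. A minimal sketch of one way to make that bookkeeping safer under concurrency, assuming a lock around the check-and-write step is acceptable: keep the tickers in a set (fast membership tests) and guard it with a `threading.Lock`. The class name `TickerLog` is hypothetical, and the temporary file stands in for the tracking files on Drive.

```python
import os
import tempfile
import threading
from concurrent.futures import ThreadPoolExecutor

class TickerLog:
    """Hypothetical helper: remembers processed tickers and appends them to a file."""

    def __init__(self, path):
        self.path = path
        self._lock = threading.Lock()
        self._seen = set()
        # Reload earlier progress if the file already exists
        if os.path.exists(path):
            with open(path, "r", encoding="utf-8") as f:
                self._seen = {line.strip() for line in f if line.strip()}

    def __contains__(self, ticker):
        with self._lock:
            return ticker in self._seen

    def add(self, ticker):
        """Record a ticker once; return False if it was already recorded."""
        with self._lock:
            if ticker in self._seen:
                return False
            self._seen.add(ticker)
            with open(self.path, "a", encoding="utf-8") as f:
                f.write(f"{ticker}\n")
            return True

# Demo: 100 concurrent writes of only three distinct tickers
log_path = os.path.join(tempfile.mkdtemp(), "successful_tickers.txt")
log = TickerLog(log_path)
sample = ["AAPL", "MSFT", "NVDA", "AAPL"] * 25
with ThreadPoolExecutor(max_workers=5) as pool:
    results = list(pool.map(log.add, sample))
```

Because the membership check and the file write happen under the same lock, each ticker is written exactly once even when several threads report it at the same time.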

In [None]:
import time


In [None]:
def parallel_process_stocks(tickers: list, max_workers: int=10) -> None:
  # Create a thread pool to execute tasks concurrently.
  with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:

    # Walk the list in batches of max_workers; stepping by the batch size
    # also covers the final, possibly smaller, batch.
    for start_index in range(0, len(tickers), max_workers):
      # executor.submit() schedules the function for execution in the thread pool.
      future_to_ticker = {executor.submit(process_stock, ticker): ticker
                          for ticker in tickers[start_index:start_index + max_workers]
                          }

      for future in concurrent.futures.as_completed(future_to_ticker):
        ticker = future_to_ticker[future]

        try:
          result = future.result()
          print(result)


          # Stop on error: process_stock prefixes failures with "Failed to process"
          if result.startswith("Failed to process"):
            print(f"Stopping the program due to error in {ticker}")
            executor.shutdown(wait=False)
            raise SystemExit(1)
        except Exception as exc:
          print(f"{ticker} generated an exception: {exc}")
          print("Stopping the program due to exception")
          executor.shutdown(wait=False)
          raise SystemExit(1)

      print("Sleeping for 5 seconds to avoid rate limiting")
      time.sleep(5)




In [None]:
#Prepare tickers

tickers_to_process = [ company_tickers[num]["ticker"] for num in company_tickers.keys()]
tickers_to_process = [ticker for ticker in tickers_to_process if ticker not in successful_tickers ]

# tickers_to_process = list(set(tickers_to_process) - set(successful_tickers))


# Process them
# Tune max_workers; reduce it if you see "can't start new thread" errors
# (the first attempt used 10 workers)
parallel_process_stocks(tickers_to_process, max_workers=5)

Failed to process LVWR-WT: can't start new thread
Failed to process NOVVR: can't start new thread
Failed to process NOVVW: can't start new thread
Failed to process NOVVU: can't start new thread
Failed to process MOBQW: can't start new thread
Sleeping for 5 seconds to avoid rate limiting


KeyboardInterrupt: 
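The batching step is easy to get subtly wrong: a loop bounded by the integer division of the list length by the batch size skips any leftover items (for example, 7 tickers in batches of 3 would process only 6). A small sketch of a slicing helper that also yields the final partial batch; the name `chunked` is illustrative.

```python
def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements each."""
    if size <= 0:
        raise ValueError("size must be a positive integer")
    # Stepping by `size` reaches the last, possibly shorter, slice as well
    for start in range(0, len(items), size):
        yield items[start:start + size]

batches = list(chunked(["A", "B", "C", "D", "E", "F", "G"], 3))
```

Here `batches` holds three lists, the last of which contains the single leftover item.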

In [None]:
len(successful_tickers)

9227

## Perform RAG

In [None]:
# Initialize pinecone

pc = Pinecone(api_key= userdata.get("PINECONE_API_KEY"))

# Connect to Pinecone index
pinecone_index = pc.Index(index_name)

In [None]:
query = "What are some companies that do consumer goods?"

In [None]:
query2 = "Apple"

In [None]:
raw_query_embed = get_huggingface_embeddings(query2)

In [None]:
raw_query_embed.shape

(768,)

In [None]:
top_matches = pinecone_index.query(
    vector = raw_query_embed.tolist(),
    top_k = 10,
    include_metadata = True,
    namespace = namespace
)
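Conceptually, a `top_k` query ranks every stored vector by its similarity to the query vector and returns the best few. The sketch below does this by brute force on made-up three-dimensional vectors. It is only an illustration of the idea, not how Pinecone works internally, and it assumes a cosine metric, which the notebook does not state for this index. For unit-length vectors, cosine similarity reduces to a plain dot product, so each vector is normalized first. The names `normalize` and `top_k` and the sample records are assumptions.

```python
import heapq
import math

def normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]

def top_k(query_vec, records, k=10):
    """Return up to k (record_id, score) pairs, best match first."""
    q = normalize(query_vec)
    scored = []
    for record_id, vec in records.items():
        # Dot product of unit vectors equals their cosine similarity
        score = sum(a * b for a, b in zip(q, normalize(vec)))
        scored.append((score, record_id))
    return [(record_id, score) for score, record_id in heapq.nlargest(k, scored)]

# Made-up records standing in for stored stock-description embeddings
records = {
    "apple": [0.9, 0.1, 0.0],
    "bank": [0.0, 1.0, 0.0],
    "chip": [0.7, 0.0, 0.7],
}
matches = top_k([1.0, 0.0, 0.0], records, k=2)
```

A real vector index avoids scanning every record, but the shape of the result is the same: a short list of IDs with similarity scores, ordered from most to least similar.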

In [None]:
top_matches

{'matches': [{'id': '6d784a4a-980b-49ac-b55a-26faa63bf38e',
              'metadata': {'Business Summary': 'Apple Inc. designs, '
                                               'manufactures, and markets '
                                               'smartphones, personal '
                                               'computers, tablets, wearables, '
                                               'and accessories worldwide. The '
                                               'company offers iPhone, a line '
                                               'of smartphones; Mac, a line of '
                                               'personal computers; iPad, a '
                                               'line of multi-purpose tablets; '
                                               'and wearables, home, and '
                                               'accessories comprising '
                                               'AirPods, Apple TV, Apple '
                 

In [None]:
contexts = [item["metadata"]["text"] for item in top_matches["matches"]]

In [None]:
augmented_query = "<CONTEXT>\n" + "\n\n ---------- \n\n".join(contexts[:10]) + "\n-------\n</CONTEXT>" + "\n\n\n MY QUESTION:\n" + query2

In [None]:
print(augmented_query)

<CONTEXT>
Apple Inc. designs, manufactures, and markets smartphones, personal computers, tablets, wearables, and accessories worldwide. The company offers iPhone, a line of smartphones; Mac, a line of personal computers; iPad, a line of multi-purpose tablets; and wearables, home, and accessories comprising AirPods, Apple TV, Apple Watch, Beats products, and HomePod. It also provides AppleCare support and cloud services; and operates various platforms, including the App Store that allow customers to discover and download applications and digital content, such as books, music, video, games, and podcasts, as well as advertising services include third-party licensing arrangements and its own advertising platforms. In addition, the company offers various subscription-based services, such as Apple Arcade, a game subscription service; Apple Fitness+, a personalized fitness service; Apple Music, which offers users a curated listening experience with on-demand radio stations; Apple News+, a sub

## Setting up GROQ for RAG

In [None]:
!pip install groq




In [None]:
from groq import Groq
client = Groq(
    api_key=userdata.get("GROQ_API_KEY"),
)

In [None]:
#system_prompt = f"""You are an expert at providing answers about stocks. Please answer my question provided."""


system_prompt = """
You are a financial expert specializing in stocks and the stock market.
Provide clear, accurate, and well-researched answers to any stock-related questions.
If relevant, include key details such as company names, sectors, market capitalization, and recent trends.
Additionally, consider related companies that might be impacted by the question's context, including suppliers, competitors, or companies that have a direct supply-demand effect.
Ensure your responses are concise, actionable, and easy to understand.
"""

In [None]:
chat_completion = client.chat.completions.create(
    model="llama-3.1-70b-versatile",
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": augmented_query}
    ]
)

response = chat_completion.choices[0].message.content

In [None]:
print(response)

Based on the provided context, Apple Inc. is a technology giant that designs, manufactures, and markets a wide range of products and services, including iPhones, Macs, iPads, wearables, and accessories. The company has a strong presence in various markets and offers a range of subscription-based services such as Apple Music, Apple TV+, and Apple Arcade.

Some key aspects of Apple Inc. include:

1. **Diversified product portfolio**: Apple has a wide range of products that cater to different segments of the market, from consumer electronics to software and services.
2. **Strong brand loyalty**: Apple has a strong brand presence and loyalty among its customers, which contributes to its success in the market.
3. **Partnerships and collaborations**: Apple partners with various companies, such as Intel Corporation, to develop and distribute its products and services.
4. **Innovative technologies**: Apple is known for its innovative technologies, such as Face ID, Touch ID, and Siri, which dif

In [None]:
import requests
from datetime import datetime, timedelta


In [None]:

def fetch_top_news(search_term):

    one_month_ago = (datetime.now() - timedelta(days=30)).strftime('%Y-%m-%d')
    url = "https://api.thenewsapi.com/v1/news/top"
    params = {
        'api_token': userdata.get("NEWS_API_KEY"),
        'locale': 'us',
        'limit': 3,
        'search': search_term,
        'language': "en",
        'published_after': one_month_ago,
        'sort': "relevance_score"
    }

    response = requests.get(url, params=params)

    if response.status_code == 200:
        news_data = response.json()
        news_articles = []
        for article in news_data['data']:
            news_articles.append(
                f"title: {article['title']}\n"
                f"source: {article['source']}\n"
                f"date: {article['published_at']}\n"
                f"description: {article['description']}\n"
                f"categories: {', '.join(article['categories'])}\n"
                f"link: {article['url']}\n"
            )

        return news_articles

    else:
        print(f"Failed to fetch news: {response.status_code}")
        return []



In [None]:
# Example usage
news_articles = fetch_top_news("Apple")
print(news_articles)

["title: Apple Quietly Releases a Gorgeous Apple Watch Accessory\nsource: yahoo.com\ndate: 2024-11-13T17:29:55.000000Z\ndescription: First revealed with Series 10's launch, it's only finally available for order.\ncategories: general, business, sports, entertainment\nlink: https://www.yahoo.com/tech/apple-quietly-releases-gorgeous-apple-172955297.html\n", 'title: Black Friday Apple deals 2024: The best Apple sales on iPads, AirPods, Apple Watches and MacBooks\nsource: yahoo.com\ndate: 2024-11-19T16:51:43.000000Z\ndescription: Here are the best Black Friday deals we could find on Apple devices including iPads, AirPods, MacBooks and more.\ncategories: general, business, sports, entertainment\nlink: https://www.yahoo.com/tech/black-friday-apple-deals-2024-the-best-apple-sales-on-ipads-airpods-apple-watches-and-macbooks-165143887.html\n', 'title: Apple Unusual Options Activity For December 06 - Apple (NASDAQ:AAPL)\nsource: benzinga.com\ndate: 2024-12-06T20:54:40.000000Z\ndescription: \ncate
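In the sample output above, the Benzinga article has an empty description. A minimal sketch of a more defensive formatter, assuming that a field may also be missing or `None` in some responses (the notebook does not confirm this): read each field with `.get` and fall back to a placeholder. The function name `format_article`, the `"n/a"` placeholder, and the sample dictionary are illustrative.

```python
def format_article(article: dict) -> str:
    """Format one article dict as labeled lines, tolerating missing fields."""
    categories = article.get("categories") or []
    fields = [
        ("title", article.get("title") or "n/a"),
        ("source", article.get("source") or "n/a"),
        ("date", article.get("published_at") or "n/a"),
        ("description", article.get("description") or "n/a"),
        ("categories", ", ".join(categories) or "n/a"),
        ("link", article.get("url") or "n/a"),
    ]
    return "".join(f"{name}: {value}\n" for name, value in fields)

# Made-up article with an empty description and no categories, date, or link
sample_article = {
    "title": "Example headline",
    "source": "example.com",
    "description": "",
    "categories": None,
}
formatted = format_article(sample_article)
```

Inside `fetch_top_news`, the loop body could call such a helper for each entry instead of indexing the keys directly, so one incomplete article does not raise a `KeyError`.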

In [None]:
augmented_query_with_news = (
    "<CONTEXT>\n" + "\n\n ---------- \n\n".join(contexts[:10]) +
    "\n-------\n</CONTEXT>" + "\n\n\n<RELATED NEWS ARTICLE>\n" +
    "\n-------\n".join(news_articles) + "\n-------\n</RELATED NEWS ARTICLE>" +
    "\n\n\nMY QUESTION:\n" + query2
)

In [None]:
print(augmented_query_with_news)

<CONTEXT>
Apple Inc. designs, manufactures, and markets smartphones, personal computers, tablets, wearables, and accessories worldwide. The company offers iPhone, a line of smartphones; Mac, a line of personal computers; iPad, a line of multi-purpose tablets; and wearables, home, and accessories comprising AirPods, Apple TV, Apple Watch, Beats products, and HomePod. It also provides AppleCare support and cloud services; and operates various platforms, including the App Store that allow customers to discover and download applications and digital content, such as books, music, video, games, and podcasts, as well as advertising services include third-party licensing arrangements and its own advertising platforms. In addition, the company offers various subscription-based services, such as Apple Arcade, a game subscription service; Apple Fitness+, a personalized fitness service; Apple Music, which offers users a curated listening experience with on-demand radio stations; Apple News+, a sub
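The prompt is assembled from three parts: the retrieved company descriptions, the optional news articles, and the user's question. A minimal sketch of a helper that builds it in one place, so opening and closing tags always match; the function name `build_augmented_query`, its parameters, and the shortened sample strings are assumptions for illustration, and the separators are simplified slightly from the cells above.

```python
def build_augmented_query(contexts, question, news_articles=None, max_contexts=10):
    """Combine retrieved contexts, optional news, and the question into one prompt."""
    separator = "\n\n ---------- \n\n"
    parts = [
        "<CONTEXT>\n" + separator.join(contexts[:max_contexts]) + "\n</CONTEXT>"
    ]
    # Only add the news section when there is something to show
    if news_articles:
        parts.append(
            "<RELATED NEWS ARTICLE>\n"
            + "\n-------\n".join(news_articles)
            + "\n</RELATED NEWS ARTICLE>"
        )
    parts.append("MY QUESTION:\n" + question)
    return "\n\n\n".join(parts)

# Placeholder inputs, not real retrieval results
prompt = build_augmented_query(
    ["Apple Inc. designs smartphones.", "Microsoft develops software."],
    "Apple",
    news_articles=["title: Example headline\n"],
)
```

Keeping the assembly in one function also makes it easier to experiment with the prompt layout without editing several cells.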

In [None]:
chat_completion = client.chat.completions.create(
    model="llama-3.1-70b-versatile",
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": augmented_query_with_news}
    ]
)

response = chat_completion.choices[0].message.content

In [None]:
print(response)

Apple Inc. is a multinational technology company that designs, manufactures, and markets consumer electronics, computer software, and online services. As a leading player in the tech industry, Apple's stocks are closely watched by investors and analysts alike.

Recent news articles have highlighted various promotions and releases related to Apple's products. For instance, a recent article on Yahoo! mentioned Apple's quiet release of a gorgeous Apple Watch accessory, which was first revealed with the launch of Series 10. Another article on Yahoo! listed some of the best Black Friday deals on Apple devices, including iPads, AirPods, Apple Watches, and MacBooks.

From an investment perspective, Apple's unusual options activity has been reported by Benzinga, with the company's stock ticker (AAPL) being mentioned in this context.

In terms of market performance, Apple's stock price has been relatively stable over the past year, with some fluctuations due to market trends and investor sentim

## Putting It All Together

In [None]:
def perform_rag(query):
    # Embed the user's query
    raw_query_embedding = get_huggingface_embeddings(query)

    # Retrieve the 10 closest stock descriptions from Pinecone
    top_matches = pinecone_index.query(
        vector=raw_query_embedding.tolist(),
        top_k=10,
        include_metadata=True,
        namespace="stock-descriptions",
    )

    # Get the list of retrieved texts
    contexts = [item['metadata']['text'] for item in top_matches['matches']]

    # Get the top 3 news articles
    news_articles = fetch_top_news(query)

    # Combine the retrieved contexts, the news articles, and the question into one prompt
    augmented_query_with_news = (
        "<CONTEXT>\n" + "\n\n ---------- \n\n".join(contexts[:10]) +
        "\n-------\n</CONTEXT>" + "\n\n\n<RELATED NEWS ARTICLE>\n" +
        "\n-------\n".join(news_articles) + "\n-------\n</RELATED NEWS ARTICLE>" +
        "\n\n\nMY QUESTION:\n" + query
    )

    # Modify the prompt below as needed to improve the response quality
    system_prompt = """
    You are a financial expert specializing in stocks and the stock market.
    Provide clear, accurate, and well-researched answers to any stock-related questions.
    If relevant, include key details such as company names, sectors, market capitalization, and recent trends.
    Additionally, consider related companies that might be impacted by the question's context, including suppliers, competitors, or companies that have a direct supply-demand effect.
    Ensure your responses are concise, actionable, and easy to understand.
    """

    chat_completion = client.chat.completions.create(
        model="llama-3.1-70b-versatile",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": augmented_query_with_news}
        ]
    )

    return chat_completion.choices[0].message.content
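
To make the retrieval step in `perform_rag` less of a black box, the sketch below shows what a `top_k` query does conceptually: score every stored vector against the query vector and keep the best few. It assumes cosine similarity as the metric (the metric configured on the index is not shown in this section) and uses a brute-force scan in plain Python. It is an illustration only, not how Pinecone implements search, and `cosine` and `top_k_by_cosine` are hypothetical names.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_by_cosine(query_vec, doc_vecs, k=10):
    """Return up to k (index, score) pairs, highest similarity first."""
    scored = [(i, cosine(query_vec, vec)) for i, vec in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 2-dimensional vectors standing in for real sentence embeddings
toy_docs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(top_k_by_cosine([1.0, 0.0], toy_docs, k=2))
```

A vector database performs the same ranking with approximate indexes so that it scales beyond a brute-force scan.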

In [None]:
response = perform_rag("News related to apple")

In [None]:
print(response)

There are two recent news articles related to Apple. 

One article from cnet.com dated November 17, 2024, titled "After Using Apple Intelligence for Weeks, This One Feature Stands Out" and another article from the same source dated November 10, 2024, titled "I've Been Using Apple Intelligence for Weeks, and One Feature Stands Out." Both articles discuss the experience of using Apple's new AI feature, specifically the message summaries, which seem to offer a genuinely useful experience but require improvement in terms of reliability.

While the other news article from foxnews.com is not directly related to Apple, it touches upon the aspect of social media, which Apple integrates with in many of its services and devices. In this article, users are advised how to ask others not to post photos of their kids on social media in order to help maintain their safety.

Some relevant companies impacted by this news include:
1. Apple Inc., mentioned in the cnet.com articles and the largest smartph

In [None]:
response = perform_rag("I want to buy consumer goods stock, which stock should i buy and give me related news for the decision to buy it or not, and can you share the link as well")

In [None]:
print(response)

Based on the provided context, I recommend considering Walmart Inc. (WMT) as a potential consumer goods stock to buy. Walmart is a well-established retail giant with a diverse portfolio of consumer goods, including groceries, electronics, home appliances, and more.

Recent News:
Walmart has been making significant investments in its e-commerce platform, with a focus on enhancing the online shopping experience for its customers. According to a recent report by CNBC, Walmart's e-commerce sales grew 6% in the third quarter, driven by investments in its online grocery shopping platform and the expansion of its online services.

Additionally, Walmart has announced a new partnership with JD.com, a leading Chinese e-commerce company, to expand its online presence in China. This partnership will enable Walmart to leverage JD.com's vast logistics and delivery network to reach more customers in China.

As for related news, here's an article from CNBC:

https://www.cnbc.com/2022/12/05/walmart-e- 
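
Because the question asks the model to share links, it can help to check that every URL in a response also appears in the news articles that were passed in as context. The sketch below is one possible check, assuming the articles are plain strings that contain their URLs (as in the `link:` field shown earlier). `extract_urls` and `unverified_links` are hypothetical helper names, and the URL pattern is deliberately simple rather than a full URL parser.

```python
import re

# Simple URL matcher: stops at whitespace and common closing brackets
URL_PATTERN = re.compile(r"https?://[^\s<>)\]]+")


def extract_urls(text):
    """Return the set of URLs found in text, with trailing punctuation stripped."""
    return {url.rstrip(".,;:'\"") for url in URL_PATTERN.findall(text)}


def unverified_links(response, news_articles):
    """Return URLs that appear in the response but in none of the source articles."""
    source_urls = set()
    for article in news_articles:
        source_urls |= extract_urls(article)
    return sorted(extract_urls(response) - source_urls)


# Example with placeholder data instead of a real model response
sample_articles = ["title: Example\nlink: https://example.com/a\n"]
sample_response = "See https://example.com/a and https://example.com/b."
print(unverified_links(sample_response, sample_articles))
```

Any URL this returns was not present in the supplied articles and should be opened and confirmed manually before it is relied on.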