<a href="https://colab.research.google.com/github/parky-sood/ai-financial-analysis/blob/main/Stock_Insider_DB.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Install Libraries

In [2]:
! pip install yfinance langchain_pinecone openai python-dotenv langchain-community sentence_transformers

Collecting langchain_pinecone
  Downloading langchain_pinecone-0.2.0-py3-none-any.whl.metadata (1.7 kB)
Collecting python-dotenv
  Downloading python_dotenv-1.0.1-py3-none-any.whl.metadata (23 kB)
Collecting langchain-community
  Downloading langchain_community-0.3.10-py3-none-any.whl.metadata (2.9 kB)
Collecting aiohttp<3.10,>=3.9.5 (from langchain_pinecone)
  Downloading aiohttp-3.9.5-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (7.5 kB)
Collecting pinecone-client<6.0.0,>=5.0.0 (from langchain_pinecone)
  Downloading pinecone_client-5.0.1-py3-none-any.whl.metadata (19 kB)
Collecting dataclasses-json<0.7,>=0.5.7 (from langchain-community)
  Downloading dataclasses_json-0.6.7-py3-none-any.whl.metadata (25 kB)
Collecting httpx-sse<0.5.0,>=0.4.0 (from langchain-community)
  Downloading httpx_sse-0.4.0-py3-none-any.whl.metadata (9.0 kB)
Collecting langchain<0.4.0,>=0.3.10 (from langchain-community)
  Downloading langchain-0.3.10-py3-none-any.whl.metadata (7.1 kB)
Co

In [3]:
from langchain_pinecone import PineconeVectorStore
from openai import OpenAI
import dotenv
import json
import yfinance as yf
import concurrent.futures
from langchain_community.embeddings import HuggingFaceEmbeddings
from google.colab import userdata
from langchain.schema import Document
from sklearn.metrics.pairwise import cosine_similarity
from sentence_transformers import SentenceTransformer
from pinecone import Pinecone
import numpy as np
import requests
import os

# Get Stock Info

In [4]:
def get_stock_info(symbol: str) -> dict:
  """
  Retrives and formats detailed information about a sotck from Yahoo Finance.

  Args:
    symbol (str): The stock ticker symbol to look up.

  Returns:
    dict: A dictionary containing detailed stock information, including ticker,
          name, business summary, city, state, country, industry, and sector.
  """

  data = yf.Ticker(symbol)
  stock_info = data.info

  properties = {
      "Ticker": stock_info.get('symbol', 'N/A'),
      # "Name": stock_info.get('longName', 'Name not available'),
      "Business Summary": stock_info.get('longBusinessSummary', "N/A"),
      # "City": stock_info.get('city', 'City not available'),
      # "State": stock_info.get('state', 'State not available'),
      # "Country": stock_info.get('country', 'Country not available'),
      # "Industry": stock_info.get('industry', 'Industry not available'),
      # "Sector": stock_info.get('sector', 'Sector not available'),
      # "Debt-To-Equity Ratio": str(stock_info.get('debtToEquity')),
      # "Earnings Per Share": str(stock_info.get('revenuePerShare')),
      # "Price-To-Book Ratio": str(stock_info.get('priceToBook')),
      # "Dividend Yield Ratio": str(stock_info.get('dividendYield')),
      # "Market Capitalization": str(stock_info.get('marketCap')),
      # "Book Value": str(stock_info.get('bookValue')),
      # "Return on Equity": str(stock_info.get('returnOnEquity'))
  }

  return properties

In [5]:
def get_huggingface_embeddings(text, model_name="sentence-transformers/all-mpnet-base-v2"):
  """
  Generates embeddings for the given test using the specified Hugging Face model.

  Args:
    text (str): The input text to convert to embedding.
    model_name (str): The name of the Hugging Face model to use.
                        Defaults to sentence_transformers/all-mpnet-base-v2.

  Returns:
    np.ndarray: The generated embeddings as a NumPy array.
  """

  model = SentenceTransformer(model_name)
  return model.encode(text)

def cosine_similarity_between_sentences(sentence1, sentence2):
  """
  Calculates the cosine similarity between two sentences.

  Args:
    sentence1 (str): The first sentence for similarity comparison.
    sentence2 (str): The second sentence for similarity comparison.

  Returns:
    float: The cosine similarity score between the two sentences, ranging from
            -1 (opposites) to 1 (identical).

  Notes:
    Prints similarity score to console in a formatted string.
  """

  embedding1 = np.array(get_huggingface_embeddings(sentence1))
  embedding2 = np.array(get_huggingface_embeddings(sentence2))

  embedding1 = embedding1.reshape(-1, 1)
  embedding2 = embedding2.reshape(-1, 1)

  similarity = cosine_similarity(embedding1, embedding2)
  similarity_score = similarity[0][0]

  print(f"Cosine similarity between two sentences: {similarity_score:.4f}")
  return similarity_score


# Get all Stocks in NYSE

In [6]:
def get_company_tickers():
  """
  Downloads and parses stock ticker symbols from GitHub-hosted SEC company tickers JSON file.

  Returns:
    dict: A dictionary containing company tickers and related information.

  Notes:
    The data is sourced from official SEC website via this GitHub repo:
    https://raw.githubusercontent.com/parky-sood/ai-financial-analysis/refs/heads/main/company_tickers.json
  """

  url = "https://raw.githubusercontent.com/parky-sood/ai-financial-analysis/refs/heads/main/company_tickers.json"

  response = requests.get(url)

  if response.status_code == 200:
    company_tickers = json.loads(response.content.decode('utf-8'))

    with open("company-tickers.json", "w", encoding="utf-8") as file:
      json.dump(company_tickers, file, indent=4)

      print("File downloaded successfully and saved as 'company-tickers.json'")

      return company_tickers

  else:
    print(f"Failed to download file. Status code: {response.status_code}")
    return None

company_tickers = get_company_tickers()





File downloaded successfully and saved as 'company-tickers.json'


In [7]:
 company_tickers

{'0': {'cik_str': 1045810, 'ticker': 'NVDA', 'title': 'NVIDIA CORP'},
 '1': {'cik_str': 320193, 'ticker': 'AAPL', 'title': 'Apple Inc.'},
 '2': {'cik_str': 789019, 'ticker': 'MSFT', 'title': 'MICROSOFT CORP'},
 '3': {'cik_str': 1018724, 'ticker': 'AMZN', 'title': 'AMAZON COM INC'},
 '4': {'cik_str': 1652044, 'ticker': 'GOOGL', 'title': 'Alphabet Inc.'},
 '5': {'cik_str': 1326801, 'ticker': 'META', 'title': 'Meta Platforms, Inc.'},
 '6': {'cik_str': 1318605, 'ticker': 'TSLA', 'title': 'Tesla, Inc.'},
 '7': {'cik_str': 1067983,
  'ticker': 'BRK-B',
  'title': 'BERKSHIRE HATHAWAY INC'},
 '8': {'cik_str': 1046179,
  'ticker': 'TSM',
  'title': 'TAIWAN SEMICONDUCTOR MANUFACTURING CO LTD'},
 '9': {'cik_str': 1730168, 'ticker': 'AVGO', 'title': 'Broadcom Inc.'},
 '10': {'cik_str': 59478, 'ticker': 'LLY', 'title': 'ELI LILLY & Co'},
 '11': {'cik_str': 19617, 'ticker': 'JPM', 'title': 'JPMORGAN CHASE & CO'},
 '12': {'cik_str': 104169, 'ticker': 'WMT', 'title': 'Walmart Inc.'},
 '13': {'cik_str'

In [8]:
pinecone_api_key = userdata.get("PINECONE_API_KEY")
os.environ['PINECONE_API_KEY'] = pinecone_api_key

index_name = "ai-financial"
namespace = "stock-descriptions"

hf_embeddings = HuggingFaceEmbeddings()
vectorstore = PineconeVectorStore(index_name=index_name, embedding=hf_embeddings)

  hf_embeddings = HuggingFaceEmbeddings()
  hf_embeddings = HuggingFaceEmbeddings()
The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


modules.json:   0%|          | 0.00/349 [00:00<?, ?B/s]

config_sentence_transformers.json:   0%|          | 0.00/116 [00:00<?, ?B/s]

README.md:   0%|          | 0.00/10.6k [00:00<?, ?B/s]

sentence_bert_config.json:   0%|          | 0.00/53.0 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/571 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/438M [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/363 [00:00<?, ?B/s]

vocab.txt:   0%|          | 0.00/232k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/466k [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/239 [00:00<?, ?B/s]

1_Pooling/config.json:   0%|          | 0.00/190 [00:00<?, ?B/s]

In [9]:
unsuccessful_tickers = []
successful_tickers = []
try:
  with open('successful_tickers.txt', 'r') as f:
    successful_tickers = [line.strip() for line in f if line.strip()]

  print(f"Loaded {len(successful_tickers)} successful tickers")
except FileNotFoundError:
  print("No existing successful tickers file found")

try:
  with open('unsuccessful_tickers.txt', 'r') as f:
    unsuccessful_tickers = [line.strip() for line in f if line.strip()]
  print(f"Loaded {len(unsuccessful_tickers)} unsuccessful tickers")

except FileNotFoundError:
  print("No existing unsuccessful tickers file found")

Loaded 7787 successful tickers
Loaded 1311 unsuccessful tickers


In [10]:
def process_stock(stock_ticker: str) -> str:
    if stock_ticker in successful_tickers:
        return f"Already processed {stock_ticker}"

    try:
        stock_data = get_stock_info(stock_ticker)
        stock_description = stock_data['Business Summary']

        vectorstore_from_texts = PineconeVectorStore.from_documents(
            documents=[Document(page_content=stock_description, id=stock_ticker)],
            embedding=hf_embeddings,
            index_name=index_name,
            namespace=namespace
        )

        with open('successful_tickers.txt', 'a') as f:
            f.write(f"{stock_ticker}\n")
        successful_tickers.append(stock_ticker)

        return f"Processed {stock_ticker} successfully"

    except Exception as e:
        with open('unsuccessful_tickers.txt', 'a') as f:
            f.write(f"{stock_ticker}\n")
        unsuccessful_tickers.append(stock_ticker)

        return f"ERROR processing {stock_ticker}: {e}"

def parallel_process_stocks(tickers: list, max_workers: int = 10) -> None:
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        future_to_ticker = {
            executor.submit(process_stock, ticker): ticker
            for ticker in tickers
        }

        for future in concurrent.futures.as_completed(future_to_ticker):
            ticker = future_to_ticker[future]
            try:
                result = future.result()
                print(result)

                # Stop on error
                if result.startswith("ERROR"):
                    print(f"Stopping program due to error in {ticker}")
                    executor.shutdown(wait=False)
                    raise SystemExit(1)

            except Exception as exc:
                print(f'{ticker} generated an exception: {exc}')
                print("Stopping program due to exception")
                executor.shutdown(wait=False)
                raise SystemExit(1)

# Prepare your tickers
tickers_to_process = [company_tickers[num]['ticker'] for num in company_tickers.keys()]

# Process them
parallel_process_stocks(tickers_to_process, max_workers=10)

[1;30;43mStreaming output truncated to the last 5000 lines.[0m
Already processed MXE
Already processed ALDF
Already processed CREX
Already processed LEDS
Already processed KALA
Already processed GVXXF
Already processed INTV
Already processed EGF
Already processed IVHI
Already processed APWC
Already processed FTFT
Already processed OPTX
Already processed CVU
Already processed JPOTF
Already processed FLUX
Already processed RDAC
Already processed HKIT
Already processed WTO
Already processed MATH
Already processed ABLV
Already processed SIGY
Already processed NAAS
Already processed WLAC
Already processed HFBL
Already processed VLDX
Already processed NRT
Already processed BSPK
Already processed RVPH
Already processed INTJ
Already processed WLKP
Already processed MMA
Already processed THQ
Already processed CYRB
Already processed GSM
Already processed CLEU
Already processed BDTX
Already processed HBNC
Already processed ETO
Already processed CSTL
Already processed DDI
Already processed SLQT


ERROR:yfinance:404 Client Error: Not Found for url: https://query2.finance.yahoo.com/v10/finance/quoteSummary/DAIC%20?modules=financialData%2CquoteType%2CdefaultKeyStatistics%2CassetProfile%2CsummaryDetail&corsDomain=finance.yahoo.com&formatted=false&symbol=DAIC+&crumb=zltVLOP2ok6


Processed DAIC  successfully
Processed INN-PE successfully
Processed HL-PB successfully
Processed FMCCH successfully
Processed FMCKP successfully
Processed IIPR-PA successfully
Processed FMCCP successfully
Processed GRAF-UN successfully
Processed FMCCK successfully
Processed PLLTL successfully
Processed BELFB successfully
Processed FMCCM successfully
Processed AHH-PA successfully
Processed IRET successfully
Processed GAM-PB successfully
Processed DRDGF successfully
Processed GTN-A successfully
Processed AGM-PD successfully
Processed PTCHF successfully
Processed CMRE-PB successfully
Processed CMRE-PC successfully
Processed MEOBF successfully
Processed CMRE-PD successfully
Processed BH successfully
Processed KELYB successfully
Processed HLTC successfully
Processed KBSR successfully
Processed SMBMF successfully
Processed GGT-PE successfully
Processed CELJF successfully
Processed DEFTF successfully
Processed GAMI successfully
Processed HVT-A successfully
Processed MPSYF successfully
Proces

ERROR:yfinance:500 Server Error: Internal Server Error for url: https://query2.finance.yahoo.com/v10/finance/quoteSummary/THCPW?modules=financialData%2CquoteType%2CdefaultKeyStatistics%2CassetProfile%2CsummaryDetail&corsDomain=finance.yahoo.com&formatted=false&symbol=THCPW&crumb=zltVLOP2ok6


Processed LFT-PA successfully
Processed KEY-PL successfully
Processed IRAAU successfully
Processed UNOV successfully
Processed ADNWW successfully
Processed THCPU successfully
Processed IRAAW successfully
Processed THCPW successfully
Processed VLYPN successfully
Processed MHNC successfully
Processed ARQQW successfully
Processed MSCF successfully
Processed SCCC successfully
Processed DSAQW successfully
Processed SCCD successfully
Processed SCCE successfully
Processed SCCF successfully
Processed SCCG successfully
Processed SACC successfully
Processed GDL-PC successfully
Processed SACH-PA successfully
Processed BLEUR successfully
Processed BLEUU successfully
Processed OUST-WT successfully
Processed DSAQU successfully
Processed GFAIW successfully
Processed CTLPP successfully
Processed BLEUW successfully
Processed OUST-WTA successfully
Processed TEN-PE successfully
Processed TEN-PF successfully
Processed NMKCP successfully
Processed NMKBP successfully
Processed NMPWP successfully
Processed N

ERROR:yfinance:404 Client Error: Not Found for url: https://query2.finance.yahoo.com/v10/finance/quoteSummary/ADZCF?modules=financialData%2CquoteType%2CdefaultKeyStatistics%2CassetProfile%2CsummaryDetail&corsDomain=finance.yahoo.com&formatted=false&symbol=ADZCF&crumb=zltVLOP2ok6


Processed NXDT-PA successfully
Processed SBXD-WT successfully
Processed LEXXW successfully
Processed DEENF successfully
Processed FRBP successfully
Processed SBXD-UN successfully
Processed ADZCF successfully
Processed CTSWF successfully
Processed OLOXF successfully
Processed DGZ successfully
Processed DZZ successfully
Processed DGP successfully
Processed CTSUF successfully
Processed CAPNU successfully
Processed CAPNR successfully
Processed FRSPF successfully
Processed SAT successfully
Processed SAY successfully
Processed SAZ successfully
Processed SAJ successfully
Processed MKFGW successfully
Processed GRND-WT successfully
Processed BW-PA successfully
Processed RELIW successfully
Processed GMTH successfully
Processed CFR-PB successfully
Processed CNDAW successfully
Processed CNDAU successfully
Processed ADSEW successfully
Processed KHOB successfully
Processed NCPLW successfully
Processed WBS-PG successfully
Processed FFHPF successfully
Processed BLUAW successfully
Processed FAXRF succe

ERROR:yfinance:500 Server Error: Internal Server Error for url: https://query2.finance.yahoo.com/v10/finance/quoteSummary/DMYY-WT?modules=financialData%2CquoteType%2CdefaultKeyStatistics%2CassetProfile%2CsummaryDetail&corsDomain=finance.yahoo.com&formatted=false&symbol=DMYY-WT&crumb=zltVLOP2ok6


Processed FITBO successfully
Processed NVAWW successfully
Processed FVNNU successfully
Processed FVNNR successfully
Processed FITBP successfully
Processed NVAAF successfully
Processed TOIIW successfully
Processed DMYY-WT successfully
Processed CHEB-WT successfully
Processed DMYY-UN successfully
Processed CHEB-UN successfully
Processed ATEK-WT successfully
Processed LUNRW successfully
Processed IONQ-WT successfully
Processed XFOWW successfully
Processed ATEK-UN successfully
Processed AGXRW successfully
Processed TBMCR successfully
Processed NMHIW successfully
Processed BURUW successfully
Processed VAL-WT successfully
Processed MDNC successfully
Processed GDEVW successfully
Processed CRTDW successfully
Processed HSPOR successfully
Processed HSPOU successfully
Processed GLTK successfully
Processed HSPOW successfully
Processed HUDAR successfully
Processed AP-WT successfully
Processed ABLLL successfully
Processed ABLLW successfully
Processed WTFCP successfully
Processed RCFA-WT successfully

ERROR:yfinance:500 Server Error: Internal Server Error for url: https://query2.finance.yahoo.com/v10/finance/quoteSummary/LUCYW?modules=financialData%2CquoteType%2CdefaultKeyStatistics%2CassetProfile%2CsummaryDetail&corsDomain=finance.yahoo.com&formatted=false&symbol=LUCYW&crumb=zltVLOP2ok6


Processed NEMCL successfully
Processed UHGI successfully
Processed DYCQU successfully
Processed DYCQR successfully
Processed ASB-PF successfully
Processed BIPJ successfully
Processed LUCYW successfully
Processed BRIPF successfully
Processed BIPH successfully
Processed MNESP successfully
Processed VFSWW successfully
Processed BIPI successfully
Processed MNQFF successfully
Processed ATHS successfully
Processed JOCM successfully
Processed STRRP successfully
Processed CLDT-PA successfully
Processed HYZNW successfully
Processed MNLCF successfully
Processed BIP-PA successfully
Processed BKKT-WT successfully
Processed MNUFF successfully
Processed BIP-PB successfully
Processed SAIHW successfully
Processed ATH-PB successfully
Processed SIMAW successfully
Processed SIMAU successfully
Processed ATH-PC successfully
Processed ATH-PD successfully
Processed ATH-PE successfully
Processed ATLCL successfully
Processed ALSAR successfully
Processed ALSAU successfully
Processed ICUCW successfully
Processed

ERROR:yfinance:500 Server Error: Internal Server Error for url: https://query2.finance.yahoo.com/v10/finance/quoteSummary/PFTAW?modules=financialData%2CquoteType%2CdefaultKeyStatistics%2CassetProfile%2CsummaryDetail&corsDomain=finance.yahoo.com&formatted=false&symbol=PFTAW&crumb=zltVLOP2ok6


Processed ESGLW successfully
Processed INPAP successfully
Processed COF-PI successfully
Processed PFTAW successfully
Processed PFTAU successfully
Processed COF-PJ successfully
Processed LIFWZ successfully
Processed LIFWW successfully
Processed PSPX successfully
Processed COF-PL successfully
Processed COF-PN successfully
Processed COF-PK successfully
Processed EPDU successfully
Processed SBEV-WT successfully
Processed CRESW successfully
Processed MYPSW successfully
Processed LMMY successfully
Processed PROCW successfully
Processed DHAIW successfully
Processed AMBI-WT successfully
Processed BFRIW successfully
Processed BWVTF successfully
Processed BMTX-WT successfully
Processed ATMP successfully
Processed TFC-PI successfully
Processed EVVAQ successfully
Processed COWTF successfully
Processed TFC-PR successfully
Processed DJP successfully
Processed TFC-PO successfully
Processed VXZ successfully
Processed DTSTW successfully
Processed VXX successfully
Processed PGMFF successfully
Processed 