<a href="https://colab.research.google.com/github/saikishore1903/Finance-Pulse/blob/main/Finance_Pulse_Minor_Project.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Finance Pulse



## Installing Dependencies and Setting API Keys

In [None]:
%%capture
# Installs Unsloth, xformers (memory-efficient attention) and the other required packages
!pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
!pip install --no-deps xformers trl peft accelerate bitsandbytes
!pip install sec_api
!pip install -U langchain
!pip install -U langchain-community
!pip install -U sentence-transformers
!pip install -U faiss-gpu

In [None]:
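import os
from getpass import getpass

# HuggingFace token, required for accessing gated models (like Llama 3 8B Instruct).
# Avoid hard-coding secrets in a shared notebook: read them from the environment or prompt for them.
hf_token = os.environ.get("HF_TOKEN") or getpass("HuggingFace token: ")
# SEC-API key (the SEC_API_KEY variable name is just a convention chosen here)
sec_api_key = os.environ.get("SEC_API_KEY") or getpass("SEC-API key: ")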

In [None]:
# Fine Tuning Related Packages
from unsloth import FastLanguageModel
import torch
from datasets import load_dataset
from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported

# Pipeline & RAG Related Packages
from sec_api import ExtractorApi, QueryApi
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_text_splitters import RecursiveCharacterTextSplitter

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.


---
## **Part 1: Fine-Tuning Llama 3 with Unsloth**



### **Initializing the Pre-Trained Model and Tokenizer**



In [None]:
# Load the model and tokenizer with Unsloth's FastLanguageModel
model, tokenizer = FastLanguageModel.from_pretrained(
    # Pre-trained model to fine-tune (gated on Hugging Face, so an access token is required)
    model_name = "meta-llama/Meta-Llama-3-8B-Instruct",
    # Maximum number of tokens the model processes in a single forward pass
    max_seq_length = 2048,
    # Data type for the weights. None auto-detects: bfloat16 where supported, float16 on GPUs such as the Tesla T4
    dtype = None,
    # Load the weights in 4-bit precision instead of 16 bits, cutting weight memory roughly 4x
    # so that an 8B model fits on a GPU with limited memory
    load_in_4bit = True,
    # Access token for gated models such as Meta-Llama-3-8B-Instruct
    token = hf_token,
)


==((====))==  Unsloth 2024.10.6: Fast Llama patching. Transformers = 4.44.2.
   \\   /|    GPU: Tesla T4. Max memory: 14.748 GB. Platform = Linux.
O^O/ \_/ \    Pytorch: 2.5.0+cu121. CUDA = 7.5. CUDA Toolkit = 12.1.
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.28.post2. FA2 = False]
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!



Unsloth: We fixed a gradient accumulation bug, but it seems like you don't have the latest transformers version!
Please update transformers, TRL and unsloth via:
`pip install --upgrade --no-cache-dir unsloth git+https://github.com/huggingface/transformers.git git+https://github.com/huggingface/trl.git`


**Adding LoRA Adapters for Parameter-Efficient Fine-Tuning**



In [None]:
# Apply LoRA (Low-Rank Adaptation) adapters to the model for parameter-efficient fine-tuning
model = FastLanguageModel.get_peft_model(
    model,
    # Rank of the adaptation matrix. Higher values can capture more complex patterns. Suggested values: 8, 16, 32, 64, 128
    r = 16,
    # Specify the model layers to which LoRA adapters should be applied
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],
    # LoRA scaling numerator: the adapter's output is scaled by lora_alpha / r (here 16 / 16 = 1.0)
    lora_alpha = 16,
    # Dropout applied to the LoRA path. 0 disables dropout; Unsloth's fast path is optimized for 0
    lora_dropout = 0,
    # Which bias terms to train. "none" trains no biases and is Unsloth's optimized setting
    bias = "none",
    # Gradient checkpointing trades extra compute for lower memory; "unsloth" is Unsloth's memory-optimized variant for long contexts
    use_gradient_checkpointing = "unsloth",
    # Seed for random number generation to ensure reproducibility of results
    random_state = 3407,
)

Unsloth 2024.10.6 patched 32 layers with 32 QKV layers, 32 O layers and 32 MLP layers.
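The training log further down reports 41,943,040 trainable parameters. As a sanity check: a LoRA adapter of rank r on a linear layer with input size d_in and output size d_out adds two matrices, A (r x d_in) and B (d_out x r), i.e. r * (d_in + d_out) parameters. The sketch below assumes Llama 3 8B's published shapes (hidden size 4096, MLP size 14336, 8 key/value heads of dimension 128, 32 layers); these come from the model's config, not from this notebook.

```python
# Rough count of LoRA trainable parameters for the adapter configured above.
# Assumed Llama 3 8B shapes (from the model's config.json, not from this notebook).
HIDDEN = 4096          # hidden_size
MLP = 14336            # intermediate_size
KV_DIM = 8 * 128       # num_key_value_heads * head_dim (grouped-query attention)
LAYERS = 32            # num_hidden_layers

# (in_features, out_features) of each projection listed in target_modules
TARGET_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, MLP),
    "up_proj": (HIDDEN, MLP),
    "down_proj": (MLP, HIDDEN),
}

def lora_param_count(r: int) -> int:
    """LoRA adds A (r x d_in) and B (d_out x r) per targeted layer: r * (d_in + d_out)."""
    per_layer = sum(r * (d_in + d_out) for d_in, d_out in TARGET_SHAPES.values())
    return per_layer * LAYERS

print(f"{lora_param_count(16):,}")  # 41,943,040 for r = 16
```

Doubling `r` doubles this count. `lora_alpha` does not change it; it only sets the scaling factor `lora_alpha / r` applied to the adapter's output.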


### **Preparing the Fine-Tuning Dataset**


In [None]:
# Defining the expected prompt
ft_prompt = """<|begin_of_text|><|start_header_id|>system<|end_header_id|>
Below is a user question, paired with retrieved context. Write a response that appropriately answers the question,
include specific details in your response. <|eot_id|>

<|start_header_id|>user<|end_header_id|>

### Question:
{}

### Context:
{}

<|eot_id|>

### Response: <|start_header_id|>assistant<|end_header_id|>
{}"""

# End-of-sequence special token for this tokenizer
EOS_TOKEN = tokenizer.eos_token

# Fill the prompt template with each row of the financial QA dataset.
# With batched=True, `examples` maps each column name to a list of values.
def formatting_prompts_func(examples):
    questions = examples["question"]
    contexts  = examples["context"]
    responses = examples["answer"]
    texts = []
    for question, context, response in zip(questions, contexts, responses):
        # Append EOS_TOKEN, otherwise the model never learns where to stop and generation goes on forever
        text = ft_prompt.format(question, context, response) + EOS_TOKEN
        texts.append(text)
    return {"text": texts}

dataset = load_dataset("virattt/llama-3-8b-financialQA", split = "train")
dataset = dataset.map(formatting_prompts_func, batched = True,)


In [None]:
dataset[0]

{'question': 'What area did NVIDIA initially focus on before expanding to other computationally intensive fields?',
 'answer': 'NVIDIA initially focused on PC graphics.',
 'context': 'Since our original focus on PC graphics, we have expanded to several other large and important computationally intensive fields.',
 'ticker': 'NVDA',
 'filing': '2023_10K',
 'text': '<|begin_of_text|><|start_header_id|>system<|end_header_id|>\nBelow is a user question, paired with retrieved context. Write a response that appropriately answers the question,\ninclude specific details in your response. <|eot_id|>\n\n<|start_header_id|>user<|end_header_id|>\n\n### Question:\nWhat area did NVIDIA initially focus on before expanding to other computationally intensive fields?\n\n### Context:\nSince our original focus on PC graphics, we have expanded to several other large and important computationally intensive fields.\n\n<|eot_id|>\n\n### Response: <|start_header_id|>assistant<|end_header_id|>\nNVIDIA initially
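`dataset.map(..., batched=True)` passes the function a dictionary that maps each column name to a list of values, and expects a dictionary of new columns back. The toy sketch below shows that contract without downloading anything; the shortened `TEMPLATE` and the `"<eos>"` placeholder are hypothetical stand-ins for `ft_prompt` and `tokenizer.eos_token`, not the real values.

```python
# Toy illustration of the batched-map contract used by formatting_prompts_func.
# TEMPLATE and EOS are hypothetical stand-ins for ft_prompt and tokenizer.eos_token.
TEMPLATE = "### Question:\n{}\n\n### Context:\n{}\n\n### Response:\n{}"
EOS = "<eos>"

def format_batch(examples):
    # `examples` is a dict of columns, each a list with one entry per row in the batch
    texts = [
        TEMPLATE.format(q, c, a) + EOS
        for q, c, a in zip(examples["question"], examples["context"], examples["answer"])
    ]
    # Return the new column; map() adds it alongside the existing ones
    return {"text": texts}

batch = {
    "question": [
        "What area did NVIDIA initially focus on?",
        "What drove the fiscal 2023 increase in R&D expense?",
    ],
    "context": [
        "Since our original focus on PC graphics, we have expanded to several other fields.",
        "The increase was primarily driven by increased compensation and employee growth.",
    ],
    "answer": ["PC graphics.", "Increased compensation and employee growth."],
}
out = format_batch(batch)
print(len(out["text"]))               # 2: one formatted string per row
print(out["text"][0].endswith(EOS))   # True: every example ends with the EOS marker
```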

### **Defining the Trainer Arguments**



In [None]:
trainer = SFTTrainer(
    # The model to be fine-tuned
    model = model,
    # The tokenizer associated with the model
    tokenizer = tokenizer,
    # The dataset used for training
    train_dataset = dataset,
    # The field in the dataset containing the text data
    dataset_text_field = "text",
    # Maximum sequence length for the training data
    max_seq_length = 2048,
    # Number of processes to use for data loading
    dataset_num_proc = 2,
    # Whether to use sequence packing, which can speed up training for short sequences
    packing = False,
    args = TrainingArguments(
        # Batch size per device during training
        per_device_train_batch_size = 2,
        # Number of gradient accumulation steps to perform before updating the model parameters
        gradient_accumulation_steps = 4,
        # Number of warmup steps for learning rate scheduler
        warmup_steps = 5,
        # Total number of optimizer steps
        max_steps = 60,
        # Alternatively, train for full epochs instead of max_steps; one epoch here is about 875 steps
        # (7,000 examples / effective batch size of 2 x 4 = 8)
        # num_train_epochs = 1,
        # Learning rate for the optimizer
        learning_rate = 2e-4,
        # Use 16-bit floating point precision for training if bfloat16 is not supported
        fp16 = not is_bfloat16_supported(),
        # Use bfloat16 precision for training if supported
        bf16 = is_bfloat16_supported(),
        # Number of steps between logging events
        logging_steps = 1,
        # Optimizer: AdamW with 8-bit optimizer states (via bitsandbytes) to reduce optimizer memory
        optim = "adamw_8bit",
        # Weight decay to apply to the model parameters
        weight_decay = 0.01,
        # Type of learning rate scheduler to use
        lr_scheduler_type = "linear",
        # Seed for random number generation to ensure reproducibility
        seed = 3407,
        # Directory to save the output models and logs
        output_dir = "outputs",
    ),
)



max_steps is given, it will override any value given in num_train_epochs
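With `per_device_train_batch_size = 2` and `gradient_accumulation_steps = 4` on one GPU, each optimizer step sees 2 x 4 = 8 examples, matching "Total batch size = 8" in the training log. The sketch below works out what that implies for this run and evaluates a linear warmup-then-decay learning rate. The schedule formula is written by hand to mirror what `lr_scheduler_type = "linear"` with `warmup_steps = 5` is expected to do; it may differ in off-by-one details from the library's implementation.

```python
import math

# Values copied from the TrainingArguments above; 7,000 is the dataset size shown in the log
NUM_EXAMPLES = 7_000
BATCH_PER_DEVICE = 2
GRAD_ACCUM = 4
NUM_GPUS = 1
MAX_STEPS = 60
WARMUP_STEPS = 5
PEAK_LR = 2e-4

effective_batch = BATCH_PER_DEVICE * GRAD_ACCUM * NUM_GPUS   # examples per optimizer step
steps_per_epoch = math.ceil(NUM_EXAMPLES / effective_batch)  # steps for one full pass
examples_seen = MAX_STEPS * effective_batch                  # examples used by this run

def linear_lr(step: int) -> float:
    """Linear warmup from 0 to PEAK_LR over WARMUP_STEPS, then linear decay to 0 at MAX_STEPS."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    return PEAK_LR * max(0.0, (MAX_STEPS - step) / (MAX_STEPS - WARMUP_STEPS))

print(effective_batch, steps_per_epoch, examples_seen)        # 8 875 480
print(f"{examples_seen / NUM_EXAMPLES:.1%} of the dataset")    # 6.9% of the dataset
print(linear_lr(0), linear_lr(WARMUP_STEPS), linear_lr(MAX_STEPS))  # 0.0 0.0002 0.0
```

So the 60-step run covers under 7% of the training data; switching to `num_train_epochs = 1` would take about 875 steps.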


## Training the Model




In [None]:
trainer_stats = trainer.train()

**** Unsloth: Please use our fixed gradient_accumulation_steps by updating transformers, TRL and Unsloth!
`pip install --upgrade --no-cache-dir unsloth git+https://github.com/huggingface/transformers.git git+https://github.com/huggingface/trl.git`


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 7,000 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 60
 "-____-"     Number of trainable parameters = 41,943,040
wandb: Using wandb-core as the SDK backend. Please refer to https://wandb.me/wandb-core for more information.
wandb: Logging into wandb.ai. (Learn how to deploy a W&B server locally: https://wandb.me/wandb-server)
wandb: You can find your API key in your browser here: https://wandb.ai/authorize
wandb: Paste an API key from your profile and hit enter, or press ctrl+c to quit:
wandb: Appending key for api.wandb.ai to your netrc file: /root/.netrc


| Step | Training Loss |
|---:|---:|
| 1 | 4.5839 |
| 2 | 4.087 |
| 3 | 4.1819 |
| 4 | 3.8635 |
| 5 | 2.8175 |
| 6 | 2.5199 |
| 7 | 2.0887 |
| 8 | 2.0464 |
| 9 | 1.806 |
| 10 | 1.4477 |


---
### **Saving Your Fine-Tuned Model to Google Drive**

Before saving, mount Google Drive: open the Files panel and click Mount Drive, or run `drive.mount('/content/drive')` from `google.colab`. Then create a folder and replace the path below with your own.

In [None]:
model.save_pretrained("/content/drive/MyDrive/l3_finagent/l3_finagent_step60") # Saves the LoRA adapter weights and config, not the full base model
tokenizer.save_pretrained("/content/drive/MyDrive/l3_finagent/l3_finagent_step60")

('/content/drive/MyDrive/l3_finagent/l3_finagent_step60/tokenizer_config.json',
 '/content/drive/MyDrive/l3_finagent/l3_finagent_step60/special_tokens_map.json',
 '/content/drive/MyDrive/l3_finagent/l3_finagent_step60/tokenizer.json')

### **Function for Loading Your New Model Later**


In [None]:
# Redefine the prompt template so this cell also works in a fresh session without re-running training
ft_prompt = """<|begin_of_text|><|start_header_id|>system<|end_header_id|>
Below is a user question, paired with retrieved context. Write a response that appropriately answers the question,
include specific details in your response. <|eot_id|>

<|start_header_id|>user<|end_header_id|>

### Question:
{}

### Context:
{}

<|eot_id|>

### Response: <|start_header_id|>assistant<|end_header_id|>
{}"""

if True: # Set to False to skip reloading the saved model
    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name = "/content/drive/MyDrive/l3_finagent/l3_finagent_step60", # Path where you saved your model
        max_seq_length = 2048, # Same arguments as when the model was first loaded
        dtype = None,
        load_in_4bit = True,
        token = hf_token, # The adapter points to the gated base model, which may need to be downloaded again
    )
    FastLanguageModel.for_inference(model) # Switch Unsloth into its faster inference mode

==((====))==  Unsloth 2024.10.6: Fast Llama patching. Transformers = 4.44.2.
   \\   /|    GPU: Tesla T4. Max memory: 14.748 GB. Platform = Linux.
O^O/ \_/ \    Pytorch: 2.5.0+cu121. CUDA = 7.5. CUDA Toolkit = 12.1.
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.28.post2. FA2 = False]
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


### **Setting Up Functions for Running Inference**


In [None]:
# Main inference function: formats the prompt, generates tokens, and decodes them
def inference(question, context):
    inputs = tokenizer(
        [
            ft_prompt.format(
                question,
                context,
                "",  # Response: leave blank for the model to generate
            )
        ],
        return_tensors = "pt",
    ).to("cuda")

    # Generate up to 64 new tokens for the prompt.
    # `use_cache` speeds up generation by reusing previously computed key/value states.
    # `pad_token_id` is set explicitly to the EOS token ID so padding is well defined.
    outputs = model.generate(**inputs, max_new_tokens = 64, use_cache = True, pad_token_id = tokenizer.eos_token_id)
    response = tokenizer.batch_decode(outputs)  # Decode token IDs back into text
    return response

In [None]:
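# Function for extracting just the language model's generation from the full decoded output
def extract_response(text):
    # `inference` returns a list with one decoded string per prompt; take the first
    if isinstance(text, list):
        text = text[0]
    start_token = "### Response: <|start_header_id|>assistant<|end_header_id|>"
    end_token = "<|eot_id|>"

    # Locate the response header; check for -1 before adding the header length
    start_index = text.find(start_token)
    if start_index == -1:
        return None
    start_index += len(start_token)

    # If generation hit max_new_tokens before emitting <|eot_id|>, keep everything after the header
    end_index = text.find(end_token, start_index)
    if end_index == -1:
        end_index = len(text)

    return text[start_index:end_index].strip()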

In [None]:
# Testing it out!
context = "The increase in research and development expense for fiscal year 2023 was primarily driven by increased compensation, employee growth, engineering development costs, and data center infrastructure."
question = "What were the primary drivers of the notable increase in research and development expenses for fiscal year 2023?"

resp = inference(question, context)
parsed_response = extract_response(resp)
print(parsed_response)

The notable increase in research and development expenses in fiscal year 2023 was primarily driven by increased compensation, employee growth, engineering development costs, and data center infrastructure.


In [None]:
from google.colab import drive
drive.mount('/content/drive')

MessageError: Error: credential propagation was unsuccessful

In [None]:
print(resp)

['<|begin_of_text|><|begin_of_text|><|start_header_id|>system<|end_header_id|>\nBelow is a user question, paired with retrieved context. Write a response that appropriately answers the question,\ninclude specific details in your response. <|eot_id|>\n\n<|start_header_id|>user<|end_header_id|>\n\n### Question:\nWhat were the primary drivers of the notable increase in research and development expenses for fiscal year 2023?\n\n### Context:\nThe increase in research and development expense for fiscal year 2023 was primarily driven by increased compensation, employee growth, engineering development costs, and data center infrastructure.\n\n<|eot_id|>\n\n### Response: <|start_header_id|>assistant<|end_header_id|>\nThe notable increase in research and development expenses in fiscal year 2023 was primarily driven by increased compensation, employee growth, engineering development costs, and data center infrastructure.<|eot_id|>']


---
# **Part 2: Setting Up SEC 10-K Data Pipeline & Retrieval Functionality**



### **Function For 10-K Retrieval**


In [None]:
# Extract Filings Function: fetch the latest 10-K for a ticker and return Items 1A and 7 as text
def get_filings(ticker):
    # Find the most recent 10-K with QueryApi (sorted newest first, one result)
    queryApi = QueryApi(api_key=sec_api_key)
    query = {
        "query": f'ticker:{ticker} AND formType:"10-K"',
        "from": "0",
        "size": "1",
        "sort": [{"filedAt": {"order": "desc"}}],
    }
    filings = queryApi.get_filings(query)
    if not filings.get("filings"):
        raise ValueError(f"No 10-K filings found for ticker {ticker!r}")

    # Getting the 10-K URL
    filing_url = filings["filings"][0]["linkToFilingDetails"]

    # Extracting text with ExtractorApi
    extractorApi = ExtractorApi(api_key=sec_api_key)
    onea_text = extractorApi.get_section(filing_url, "1A", "text")  # Item 1A: Risk Factors
    seven_text = extractorApi.get_section(filing_url, "7", "text")  # Item 7: Management's Discussion and Analysis of Financial Condition and Results of Operations

    # Joining the two sections into one document
    combined_text = onea_text + "\n\n" + seven_text

    return combined_text

### **Setting Up Embeddings Locally**



In [None]:
# Hugging Face embedding model (runs locally)
modelPath = "BAAI/bge-large-en-v1.5"
# Model configuration: run the embedding model on the GPU (CUDA)
model_kwargs = {'device': 'cuda'}
# Normalize embeddings to unit length so distance comparisons reflect cosine similarity
encode_kwargs = {'normalize_embeddings': True}

# Initialize LangChain's HuggingFaceEmbeddings wrapper with these settings
embeddings = HuggingFaceEmbeddings(
    model_name=modelPath,        # Pre-trained model name or path
    model_kwargs=model_kwargs,   # Model configuration options
    encode_kwargs=encode_kwargs  # Encoding options
)

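`normalize_embeddings = True` makes every embedding a unit vector. For unit vectors a and b, ||a - b||^2 = 2 - 2 * cos(a, b), so ranking chunks by Euclidean distance gives the same order as ranking by cosine similarity. This matters because LangChain's FAISS wrapper uses Euclidean distance by default. The check below is a sketch that uses small random vectors in place of the real 1024-dimensional bge embeddings.

```python
import math
import random

def normalize(v):
    # Scale a vector to unit length, as normalize_embeddings=True does
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine(a, b):
    # For unit vectors, the dot product equals the cosine similarity
    return sum(x * y for x, y in zip(a, b))

def sq_l2(a, b):
    # Squared Euclidean distance
    return sum((x - y) ** 2 for x, y in zip(a, b))

random.seed(0)
dim = 8  # stand-in for the real embedding size
query = normalize([random.gauss(0, 1) for _ in range(dim)])
chunks = [normalize([random.gauss(0, 1) for _ in range(dim)]) for _ in range(5)]

# Most similar first: highest cosine vs. smallest distance
by_cosine = sorted(range(len(chunks)), key=lambda i: -cosine(query, chunks[i]))
by_l2 = sorted(range(len(chunks)), key=lambda i: sq_l2(query, chunks[i]))

# ||a - b||^2 == 2 - 2 * cos(a, b) for unit vectors
assert all(math.isclose(sq_l2(query, c), 2 - 2 * cosine(query, c), abs_tol=1e-12) for c in chunks)
print(by_cosine == by_l2)  # True
```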

### **Processing & Defining the Vector Database**



In [None]:
# Prompt the user for the stock ticker to analyze; normalize whitespace and case (e.g. "aapl " -> "AAPL")
ticker = input("What Ticker Would you Like to Analyze? ex. AAPL: ").strip().upper()

print("-----")
print("Getting Filing Data")
# Retrieve the Item 1A and Item 7 text of the ticker's latest 10-K
filing_data = get_filings(ticker)

print("-----")
print("Initializing Vector Database")
# Initialize a text splitter to divide the filing data into chunks
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size = 1000,         # Target maximum chunk size, in characters (length_function = len)
    chunk_overlap = 500,       # Characters shared between consecutive chunks
    length_function = len,     # Function used to measure chunk length
    is_separator_regex = False # Treat the separators as literal strings, not regex patterns
)
# Split the filing data into smaller, manageable chunks
split_data = text_splitter.create_documents([filing_data])

# Embed each chunk and index it in a FAISS vector database
db = FAISS.from_documents(split_data, embeddings)

# Create a retriever over the database (returns the 4 most similar chunks by default)
retriever = db.as_retriever()

print("-----")
print("Filing Initialized")


What Ticker Would you Like to Analyze? ex. AAPL: AAPL
-----
Getting Filing Data
-----
Initializing Vector Database
-----
Filing Initialized
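`chunk_size` and `chunk_overlap` are measured in characters here because `length_function = len`. `RecursiveCharacterTextSplitter` first tries to break on paragraph, line, and word separators, so its real chunks vary in size. The simplified fixed-window splitter below is not LangChain's algorithm; it is only a sketch of what a 1,000-character window with 500 characters of overlap means: each window starts 500 characters after the previous one, so consecutive chunks share half their text.

```python
def sliding_chunks(text: str, chunk_size: int = 1000, overlap: int = 500) -> list[str]:
    """Fixed-size character windows; consecutive windows share `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # this window already reaches the end
            break
    return chunks

text = "".join(chr(ord("a") + i % 26) for i in range(2500))  # 2,500-character toy document
chunks = sliding_chunks(text)
print(len(chunks))                         # 4 windows: 0-1000, 500-1500, 1000-2000, 1500-2500
print(chunks[0][500:] == chunks[1][:500])  # True: 500 shared characters
```

With 50% overlap, a document yields roughly twice as many chunks (and embeddings) as it would without overlap, in exchange for making it less likely that a relevant passage is split across a chunk boundary.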



### **Retrieval**



In [None]:
# Retrieval function: return the text of the chunks most similar to the query
def retrieve_context(query):
    retrieved_docs = retriever.invoke(query)  # Invoke the retriever to get the most relevant documents
    return [doc.page_content for doc in retrieved_docs]  # Collect the content of each retrieved document
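`retrieve_context` returns a list of strings. Formatting that list directly into `ft_prompt` inserts its Python representation (brackets, quotes, escaped newlines) into the prompt. A small helper like the hypothetical `build_context` below joins the chunks into plain text and stops adding chunks once a character budget is reached, to help keep the prompt within the model's 2,048-token `max_seq_length`. The 6,000-character default is an assumption (about 4 characters per token is a common rule of thumb for English), not a measured limit.

```python
def build_context(chunks, max_chars=6000, sep="\n\n"):
    """Hypothetical helper: join chunks in retrieval order until the character budget is used up."""
    selected = []
    used = 0
    for chunk in chunks:
        chunk = chunk.strip()
        extra = len(chunk) + (len(sep) if selected else 0)
        if used + extra > max_chars:
            break  # keep the highest-ranked chunks and drop the rest
        selected.append(chunk)
        used += extra
    return sep.join(selected)

chunks = ["First chunk.", "Second chunk.", "x" * 10_000]
print(build_context(chunks))
# First chunk.
#
# Second chunk.
```

In the main loop this could be used as `inference(question, build_context(retrieve_context(question)))`.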

In [None]:
context = retrieve_context("How have currency fluctuations impacted the company's net sales and gross margins?")
print(context)

['The Company&#8217;s financial performance is subject to risks associated with changes in the value of the U.S. dollar relative to local currencies. \n\nThe Company&#8217;s primary exposure to movements in foreign exchange rates relates to non&#8211;U.S. dollar&#8211;denominated sales, cost of sales and operating expenses worldwide. Gross margins on the Company&#8217;s products in foreign countries and on products that include components obtained from foreign suppliers have in the past been adversely affected and could in the future be materially adversely affected by foreign exchange rate fluctuations.', 'The weakening of foreign currencies relative to the U.S. dollar adversely affects the U.S. dollar value of the Company&#8217;s foreign currency&#8211;denominated sales and earnings, and generally leads the Company to raise international pricing, potentially reducing demand for the Company&#8217;s products. In some circumstances, for competitive or other reasons, the Company may deci
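The retrieved text contains HTML character references such as `&#8217;` (a right single quotation mark) and `&#8211;` (an en dash), presumably carried over from the filing's HTML. They end up in both the embeddings and the prompt. One option, sketched here as a suggestion rather than something the pipeline above already does, is to decode them with Python's standard `html.unescape` before splitting, e.g. `filing_data = html.unescape(get_filings(ticker))`.

```python
import html

# Sample copied from the retrieval output above
raw = ("The Company&#8217;s primary exposure to movements in foreign exchange rates "
       "relates to non&#8211;U.S. dollar&#8211;denominated sales")
clean = html.unescape(raw)  # decode numeric and named character references
print(clean)
```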

---
# **Main Script: Putting it All Together!**


In [None]:
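while True:
    question = input(f"What would you like to know about {ticker}'s form 10-K? ")
    # Normalize before comparing so "exit " or "Exit" also ends the loop
    if question.strip().lower() == "exit":
        break
    # Context retrieval: join the retrieved chunks into one string so the prompt
    # contains plain text rather than a Python list representation
    context = "\n\n".join(retrieve_context(question))
    resp = inference(question, context)  # Running inference
    parsed_response = extract_response(resp)  # Parsing the response
    print(f"L3 Agent: {parsed_response}")
    print("-----\n")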


What would you like to know about AAPL's form 10-K? What region contributes most to international sales?
L3 Agent: Asia
-----

What would you like to know about AAPL's form 10-K? why apple phones are costlier in india?
L3 Agent: Apple iPhones are costlier in India due to high import duties, taxes, and other levies imposed by the Indian government.
-----

What would you like to know about AAPL's form 10-K? where can i get iphones for cheaper price ?
L3 Agent: Some carriers offering cellular network service for the company's products provide financing, installment payment plans or subsidies for users' purchases of the device.
-----

What would you like to know about AAPL's form 10-K? exit
L3 Agent: The Company may invest in new business strategies or acquisitions, which involve risks and uncertainties, including the potential for distraction of management from current operations, greater-than-expected liabilities and expenses, and economic, political, legal, and regulatory challenges.
--

Sample questions to try:

- What region contributes most to international sales?
- Where is outsourcing currently located?
- Does a weakening US dollar help or hurt the company?
- What significant product announcements were made during fiscal year 2023?
- What were iPhone net sales?
