# Assignment 1: Solving 3x Problems (in the Financial Space)

# Requirements
- `GROQ_API_KEY` stored in Colab secrets
- An internet connection to install the required pip packages (e.g. OpenAI)
- Run this notebook in Google Colab rather than a local IPython session; it depends on packages that Colab pre-installs, such as yfinance and pandas

# About Me

I am currently a Data Scientist Intern at the Monetary Authority of Singapore (MAS), the central bank of Singapore, in the Fintech & Innovation Group. The use cases closest to my work and industry revolve around finance and trading. I have identified three use cases where prompt engineering with LLMs could help:

1.   Automated Analysis of Stock Performance
2.   Financial Data Extraction from Earnings Call Transcript
3.   Sentiment Analysis of FED Chair FOMC on Interest Rate Decision

In [None]:
#@title Pip Install Dependencies & Mounting of Gdrive & Others

In [None]:
! pip install OpenAI

Collecting OpenAI
  Downloading openai-1.35.14-py3-none-any.whl (328 kB)
Collecting httpx<1,>=0.23.0 (from OpenAI)
  Downloading httpx-0.27.0-py3-none-any.whl (75 kB)
Collecting httpcore==1.* (from httpx<1,>=0.23.0->OpenAI)
  Downloading httpcore-1.0.5-py3-none-any.whl (77 kB)
Collecting h11<0.15,>=0.13 (from httpcore==1.*->httpx<1,>=0.23.0->OpenAI)
  Downloading h11-0.14.0-py3-non

In [None]:
import openai
from google.colab import userdata
import torch
import pandas as pd
import yfinance as yf

import json
import re

#check gpu device and assign it to device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(device)

cuda


In [None]:
#@title Get Historical data of Stock Price (e.g. Tesla (TSLA), Apple (AAPL), Nvidia (NVDA)); Each Stock for each Input Sample for Problem #1

list_of_ticker = ['TSLA','AAPL','NVDA']

# Fetch the past year (period="1y") of daily history for each ticker
frames = []
for ticker in list_of_ticker:
  temp_df = yf.Ticker(ticker).history(period="1y")
  temp_df.reset_index(inplace=True)
  temp_df['Ticker'] = ticker
  #temp_df.rename(columns={'Date':'date', 'Open':'open', 'High':'high', 'Low':'low', 'Close':'close', 'Volume':'volume'}, inplace=True)
  frames.append(temp_df)
# Concatenate once instead of growing a DataFrame inside the loop
data = pd.concat(frames)

print('Stock Price extracted: ',data['Ticker'].unique())
data.head()

Stock Price extracted:  ['TSLA' 'AAPL' 'NVDA']


Unnamed: 0,Date,Open,High,Low,Close,Volume,Dividends,Stock Splits,Ticker
0,2023-07-17 00:00:00-04:00,286.630005,292.230011,283.570007,290.380005,131569600,0.0,0.0,TSLA
1,2023-07-18 00:00:00-04:00,290.149994,295.26001,286.01001,293.339996,112434700,0.0,0.0,TSLA
2,2023-07-19 00:00:00-04:00,296.040009,299.290009,289.519989,291.26001,142355400,0.0,0.0,TSLA
3,2023-07-20 00:00:00-04:00,279.559998,280.929993,261.200012,262.899994,175158300,0.0,0.0,TSLA
4,2023-07-21 00:00:00-04:00,268.0,268.0,255.800003,260.019989,161050100,0.0,0.0,TSLA


In [None]:
#@title Initialise and Check Available LLM Model(s)
#_base_url = None

# if using groq
_base_url = "https://api.groq.com/openai/v1"
_model = "llama3-70b-8192"
_api_key = "GROQ_API_KEY"

# if using openai
# _base_url = None
# _model = "gpt-3.5-turbo"
# _api_key = "OPENAI_API_KEY"

client = openai.OpenAI(
    api_key=userdata.get(_api_key),
    base_url=_base_url,
)

print("Available Models:")
for model in client.models.list():
  print(f"- {model.id}")

print()
print("Chosen Model:", _model)

Available Models:
- gemma2-9b-it
- gemma-7b-it
- llama3-70b-8192
- llama3-8b-8192
- llama3-groq-70b-8192-tool-use-preview
- llama3-groq-8b-8192-tool-use-preview
- mixtral-8x7b-32768
- whisper-large-v3

Chosen Model: llama3-70b-8192


None of the three use cases I identified involves speech data, so whisper-large-v3 is not suitable. I reviewed the remaining model families: Gemma by Google, Llama by Meta, and Mixtral by Mistral AI.

Based on that review, I chose Llama 3 70B for all three tasks because of its performance, versatility, and relevance to my use cases. It performs well across multiple benchmarks and ranks highly on the leaderboard against models such as Gemma and Mixtral. As Samir (2023) puts it, ["Llama 3 70B significantly outperforms other models in understanding and generating human language across diverse scenarios"](https://medium.com/@samir20/mistral-7b-vs-llama-3-70b-vs-gemma-2-9b-a-comprehensive-benchmark-showdown-9c3128f24b23#:~:text=Llama%203%2070B%20significantly%20outperforms,human%20language%20across%20diverse%20scenarios.&text=In%20the%20GPQA%20benchmark%2C%20which,70B%20again%20leads%20the%20pack). See also [Hugging Face. (n.d.) Gemma Leaderboard Ranking](https://huggingface.co/blog/gemma).

In [None]:
'''
Helpful utilities
'''
def get_completion(prompt, system = None, model=_model, max_tokens=100, temperature=0.7):
    messages = []
    if system is not None and isinstance(system, str):
      messages.append({"role": "system", "content": system})
    messages.append({"role": "user", "content": prompt})
    response = client.chat.completions.create(messages=messages,
                                              model=model,
                                              max_tokens=max_tokens,
                                              temperature=temperature)
    return response.choices[0].message.content

def get_completion_for_messages(messages, model=_model, max_tokens=100, temperature=0.7):
    response = client.chat.completions.create(messages=messages,
                                              model=model,
                                              max_tokens=max_tokens,
                                              temperature=temperature)
    return response.choices[0].message.content
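
`get_completion_for_messages` takes a pre-built message list, which is one way to pass few-shot examples as earlier user/assistant turns rather than as text inside a single prompt. The following is a minimal sketch, assuming the OpenAI-style chat message format used by the helpers above; `build_few_shot_messages` and its arguments are hypothetical names that are not part of this notebook, and the example values are illustrative.

```python
def build_few_shot_messages(system, examples, query):
    """Build an OpenAI-style chat message list with few-shot examples.

    `examples` is a list of (user_text, assistant_text) pairs; each pair is
    replayed as an earlier turn before the real `query`.
    """
    messages = []
    if isinstance(system, str) and system:
        messages.append({"role": "system", "content": system})
    for user_text, assistant_text in examples:
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": assistant_text})
    # The actual question always comes last
    messages.append({"role": "user", "content": query})
    return messages


# Illustrative values only (hypothetical example turn)
messages = build_few_shot_messages(
    system="You are a quant financial analyst. Return JSON only.",
    examples=[(
        "Open 800, High 820, Close 815",
        '[{"Date": "2023-01-01", "Analysis": "Closed near the high of the day."}]',
    )],
    query="Open 815, High 825, Close 810",
)
print([m["role"] for m in messages])
```

The resulting list could then be passed as the `messages` argument of `get_completion_for_messages`.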

# Problem 1: Automated Analysis of Stock Performance

`Problem Statement`: Traders and analysts spend a significant amount of time analyzing an individual stock's market trend over the past N days, and consolidating that information into a report is manual and time-consuming.

`Proposed Solution`: Instead of analyzing manually, provide the historical trading data together with a few example analyses in the prompt, and have the model generate an analysis of the stock's performance over the past N days.

`Proposed Method`: Few-shot prompting, as discussed in class

`LLM used`: Llama 3 70B

In [None]:
# System message to set the context for the assistant | this is the role assignment
_system_message = "You are a quant financial analyst. Given the historical data of stock prices and trading volume, analyze the price action and volume patterns for the recent days. Return in a JSON format"

def get_promptTemplate_for_problem_one(ticker, system_message=_system_message):
  data_ticker = data[data['Ticker'] == ticker]
  historical_data_str = data_ticker.tail(5).to_string(index=False)

  # Providing examples for inclusion in the prompt
  usecase_one_examples = """
  1. Date: 2023-01-01
    Analysis: On this day, Tesla's stock opened at $800, reached a high of $820, and closed at $815 with a significant increase in volume. The price action suggests strong bullish momentum as the stock managed to close near the high of the day.

  2. Date: 2023-01-02
    Analysis: On this day, Tesla's stock opened at $815, reached a high of $825, but closed lower at $810. The high volume and closing lower than the open indicate potential selling pressure despite the initial bullish move.
  """
  # Create the In-context learning Prompt
  prompt_template = f"""
  ###Historical Data###
  {historical_data_str}

  ###Examples###
  {usecase_one_examples}

  ###Outputs###
  JSON Output only
  """
  print("====== Expected Prompt to input into Model ======")
  print(system_message)
  print(prompt_template)

  return prompt_template

def post_processing_problemOne(response):
  # Use a regular expression to extract the JSON array from the response
  json_match = re.search(r'\[\s*{.*}\s*\]', response, re.DOTALL)

  if json_match:
      json_string = json_match.group(0)
  else:
      print("No valid JSON found in the response.")
      # Fall back to an empty JSON array so that json.loads() downstream does not fail
      json_string = "[]"
  return json_string
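
The regular expression above is greedy, so it can over-capture when a reply contains more than one bracketed block. One alternative is to let the JSON decoder find where the array ends. The following is a sketch under that assumption; `extract_json_array` is a hypothetical helper, not the function used in the cells below.

```python
import json
import re


def extract_json_array(response):
    """Return the first JSON array of objects found in `response`, or [] if none parses."""
    decoder = json.JSONDecoder()
    # Try every '[' as a possible start; raw_decode stops at the end of
    # the first complete JSON value, ignoring any trailing text.
    for match in re.finditer(r"\[", response):
        try:
            value, _end = decoder.raw_decode(response, match.start())
        except json.JSONDecodeError:
            continue
        # Accept only a list whose items are all JSON objects
        if isinstance(value, list) and all(isinstance(item, dict) for item in value):
            return value
    return []


# Illustrative reply text (hypothetical)
sample = (
    "Here is the analysis:\n"
    '[{"Date": "2024-07-10", "Analysis": "Closed near the open."}]\n'
    "Note: figures are in USD [approx]."
)
print(extract_json_array(sample))
```

Because the helper returns a Python list rather than a string, its result could be passed to `pd.DataFrame` directly, without a separate `json.loads` call.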

In [None]:
#@title ==== Problem 1: Automated Analysis of Stock Performance - Input Sample #1 Tesla Historical Data ====
ticker = 'TSLA'
tsla_pt = get_promptTemplate_for_problem_one(ticker)

You are a quant financial analyst. Given the historical data of stock prices and trading volume, analyze the price action and volume patterns for the recent days. Return in a JSON format

  ###Historical Data###
                       Date       Open       High        Low      Close    Volume  Dividends  Stock Splits Ticker
2024-07-10 00:00:00-04:00 262.799988 267.589996 257.859985 263.260010 128519400        0.0           0.0   TSLA
2024-07-11 00:00:00-04:00 263.299988 271.000000 239.649994 241.029999 221707300        0.0           0.0   TSLA
2024-07-12 00:00:00-04:00 235.800003 251.839996 233.089996 248.229996 155694400        0.0           0.0   TSLA
2024-07-15 00:00:00-04:00 255.970001 265.600006 251.729996 252.639999 146912900        0.0           0.0   TSLA
2024-07-16 00:00:00-04:00 255.309998 258.619995 245.800003 256.559998 126135900        0.0           0.0   TSLA

  ###Examples###
  
  1. Date: 2023-01-01
    Analysis: On this day, Tesla's stock opened at $800, reached a high

In [None]:
'''
max_tokens is the hyperparameter
'''
#set temperature to 0 for consistency
generated_analysis  = get_completion(tsla_pt,
                                     system=_system_message,
                                     max_tokens=600,
                                     temperature=0)
#print(generated_analysis)
df_output =  pd.DataFrame(json.loads(post_processing_problemOne(generated_analysis)))
df_output['Ticker'] = ticker
df_output
#df_output.to_csv('./tsla_analysis.csv') #<-----------uncomment if needed to save result into csv

Unnamed: 0,Date,Analysis,Ticker
0,2024-07-10,"On this day, Tesla's stock opened at $262.80, ...",TSLA
1,2024-07-11,"On this day, Tesla's stock opened at $263.30, ...",TSLA
2,2024-07-12,"On this day, Tesla's stock opened at $235.80, ...",TSLA
3,2024-07-15,"On this day, Tesla's stock opened at $255.97, ...",TSLA
4,2024-07-16,"On this day, Tesla's stock opened at $255.31, ...",TSLA


In [None]:
#@title ==== Problem 1: Automated Analysis of Stock Performance - Input Sample #2 Apple Historical Data ====
ticker = 'AAPL'
aapl_pt = get_promptTemplate_for_problem_one(ticker)

You are a quant financial analyst. Given the historical data of stock prices and trading volume, analyze the price action and volume patterns for the recent days. Return in a JSON format

  ###Historical Data###
                       Date       Open       High        Low      Close   Volume  Dividends  Stock Splits Ticker
2024-07-10 00:00:00-04:00 229.300003 233.080002 229.250000 232.979996 62627700        0.0           0.0   AAPL
2024-07-11 00:00:00-04:00 231.389999 232.389999 225.770004 227.570007 64710600        0.0           0.0   AAPL
2024-07-12 00:00:00-04:00 228.919998 232.639999 228.679993 230.539993 53008200        0.0           0.0   AAPL
2024-07-15 00:00:00-04:00 236.479996 237.229996 233.089996 234.399994 62631300        0.0           0.0   AAPL
2024-07-16 00:00:00-04:00 235.000000 236.270004 232.330002 234.820007 43176800        0.0           0.0   AAPL

  ###Examples###
  
  1. Date: 2023-01-01
    Analysis: On this day, Tesla's stock opened at $800, reached a high of $8

In [None]:
'''
max_tokens is the hyperparameter
'''
#set temperature to 0 for consistency
generated_analysis  = get_completion(aapl_pt,
                                     system=_system_message,
                                     max_tokens=600,
                                     temperature=0)
#print(generated_analysis)
df_output = pd.DataFrame(json.loads(post_processing_problemOne(generated_analysis)))
df_output['Ticker'] = ticker
df_output
#df_output.to_csv('./aapl_analysis.csv') #<-----------uncomment if needed to save result into csv

Unnamed: 0,Date,Analysis,Ticker
0,2024-07-10,"On this day, AAPL's stock opened at 229.30, re...",AAPL
1,2024-07-11,"On this day, AAPL's stock opened at 231.39, re...",AAPL
2,2024-07-12,"On this day, AAPL's stock opened at 228.92, re...",AAPL
3,2024-07-15,"On this day, AAPL's stock opened at 236.48, re...",AAPL
4,2024-07-16,"On this day, AAPL's stock opened at 235.00, re...",AAPL


In [None]:
#@title ==== Problem 1: Automated Analysis of Stock Performance - Input Sample #3 Nvidia Historical Data ====
ticker = 'NVDA'
nvda_pt = get_promptTemplate_for_problem_one(ticker)

You are a quant financial analyst. Given the historical data of stock prices and trading volume, analyze the price action and volume patterns for the recent days. Return in a JSON format

  ###Historical Data###
                       Date       Open       High        Low      Close    Volume  Dividends  Stock Splits Ticker
2024-07-10 00:00:00-04:00 134.029999 135.100006 132.419998 134.910004 248978600        0.0           0.0   NVDA
2024-07-11 00:00:00-04:00 135.750000 136.149994 127.050003 127.400002 374782700        0.0           0.0   NVDA
2024-07-12 00:00:00-04:00 128.259995 131.919998 127.220001 129.240005 252103100        0.0           0.0   NVDA
2024-07-15 00:00:00-04:00 130.559998 131.389999 127.180000 128.440002 208326200        0.0           0.0   NVDA
2024-07-16 00:00:00-04:00 128.440002 129.039993 124.580002 126.360001 214057700        0.0           0.0   NVDA

  ###Examples###
  
  1. Date: 2023-01-01
    Analysis: On this day, Tesla's stock opened at $800, reached a high

In [None]:
'''
max_tokens is the hyperparameter
'''
#set temperature to 0 for consistency
generated_analysis  = get_completion(nvda_pt,
                                     system=_system_message,
                                     max_tokens=600,
                                     temperature=0)
#print(generated_analysis)
df_output = pd.DataFrame(json.loads(post_processing_problemOne(generated_analysis)))
df_output['ticker'] = ticker
df_output
#df_output.to_csv('./nvda_analysis.csv') #<-----------uncomment if needed to save result into csv

Unnamed: 0,Date,Analysis,ticker
0,2024-07-10,"On this day, NVDA's stock opened at 134.029999...",NVDA
1,2024-07-11,"On this day, NVDA's stock opened at 135.750000...",NVDA
2,2024-07-12,"On this day, NVDA's stock opened at 128.259995...",NVDA
3,2024-07-15,"On this day, NVDA's stock opened at 130.559998...",NVDA
4,2024-07-16,"On this day, NVDA's stock opened at 128.440002...",NVDA
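
The three input samples above repeat the same cell with a different ticker each time. The sketch below shows how the steps could be wrapped in one loop. The prompt builder and the completion call are passed in as functions so that the example runs without an API key; `analyse_tickers`, `fake_prompt` and `fake_complete` are hypothetical names, and the stand-in reply is made up for illustration.

```python
import json


def analyse_tickers(tickers, build_prompt, complete):
    """Run the same prompt -> completion -> JSON parse steps for each ticker."""
    rows = []
    for ticker in tickers:
        reply = complete(build_prompt(ticker))
        try:
            records = json.loads(reply)
        except json.JSONDecodeError:
            records = []  # keep going if one reply is not valid JSON
        if not isinstance(records, list):
            records = []
        for record in records:
            # Tag every record with the ticker it belongs to
            rows.append({**record, "Ticker": ticker})
    return rows


# Stand-ins so the sketch runs offline. In the notebook these roles would be
# played by get_promptTemplate_for_problem_one and a wrapper around get_completion.
def fake_prompt(ticker):
    return f"Analyse {ticker}"


def fake_complete(prompt):
    return json.dumps([{"Date": "2024-07-16", "Analysis": f"Reply to: {prompt}"}])


rows = analyse_tickers(["TSLA", "AAPL", "NVDA"], fake_prompt, fake_complete)
print(len(rows), rows[0]["Ticker"])
```

A single loop also keeps the column name identical across tickers, and the combined `rows` list could be turned into one DataFrame with `pd.DataFrame(rows)`.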


# Problem 2:  Financial Data Extraction from Earnings Call Transcript

`Problem Statement`: Analysts spend a significant amount of time going through earnings call transcripts to extract financial figures (e.g. current revenue, forecasted outlook) into a report, which is manual and time-consuming. It is tedious because a typical earnings call can last for hours, and reviewing every segment is not easy.

`Proposed Solution`: Instead of analyzing manually, provide the earnings call transcript (e.g. from Nvidia) and prompt for the data in a structured format (e.g. JSON) that the analyst can work on later.

`Methods Explored`: Zero-shot prompting first, then few-shot prompting as the final approach, both as discussed in class

`LLM used`: Llama 3 70B

In [None]:
#@title ===Input Sample 2.1: NVIDIA Earnings Call Transcript | Zero Shot===
earnings_call = """
Simona Jankowski welcomed everyone to NVIDIA's third quarter earnings call Colette Kress, the executive vice president and chief financial officer, reported that revenue was $5.93 billion, down 12% sequentially and 17% year-on-year. Data center revenue was up 1% sequentially and 31% year-on-year, driven by leading U.S. cloud providers and a broadening set of consumer Internet companies. Gaming revenue was down 23% sequentially and 51% year-on-year due to channel inventory corrections and challenging external conditions. The new Ada Lovelace GPU architecture had an exceptional launch, with the first RTX 4090 becoming available in mid-October.

Colette Kress, NVIDIA's CFO, stated that the sell-through rate for their gaming business was relatively solid in Q3 and expected to be stronger in Q4 due to the upcoming holidays and continued adoption of ADA. Jen-Hsun Huang, NVIDIA's CEO, added that their data center business is indexed to two fundamental dynamics: general purpose computing no longer scaling, and accelerated computing being recognized as the path forward.

NVIDIA's dynamic computing environment is driven by two main forces: general purpose computing and AI. General purpose computing is focused on power efficiency and cost efficiency, while AI is focused on productivity. NVIDIA is making excellent progress in its AI enterprise software stack, which is now available in the cloud and can be accessed through either a GPU instance hour or a software license. The company also took an inventory charge of $702 million in the quarter due to expected changes in demand for China data centers.

Jen-Hsun, I wanted to ask a question about your data center business and the growth outlook. You mentioned that you're seeing strong demand for accelerated computing and AI. Can you just talk a little bit more about what you're seeing in terms of the demand environment? And then, as it relates to the inventory charge, can you just talk a little bit more about what drove that decision? We are changing the way we report our data center business to better reflect the complexity of our customers and their needs. We will now be breaking out hyperscale and cloud purchases separately to provide more insight into the demand for our products. Additionally, we will be providing more color on large installations that we are seeing in the hyperscale space. NVIDIA reported strong Q4 2021 earnings, driven by strong demand for its GPUs. The company's CEO, Jen-Hsun Huang, discussed the strong adoption of NVIDIA GPUs among Internet service companies and cloud computing providers. He also noted that blockchain is not expected to be a major part of the company's business in the future. CFO Colette Kress discussed supply constraints and stock-based compensation.
"""

prompt = f"""
{earnings_call}
"""

# System message to set the context for the assistant | this is the role assignment
system_message = "You are a financial analyst. Extract all the financial data from the following earnings call transcript. Return in a json format:"

print("====== Expected Prompt to input into Model ======")
print(system_message)
print(prompt)

You are a financial analyst. Extract all the financial data from the following earnings call transcript. Return in a json format:


Simona Jankowski welcomed everyone to NVIDIA's third quarter earnings call Colette Kress, the executive vice president and chief financial officer, reported that revenue was $5.93 billion, down 12% sequentially and 17% year-on-year. Data center revenue was up 1% sequentially and 31% year-on-year, driven by leading U.S. cloud providers and a broadening set of consumer Internet companies. Gaming revenue was down 23% sequentially and 51% year-on-year due to channel inventory corrections and challenging external conditions. The new Ada Lovelace GPU architecture had an exceptional launch, with the first RTX 4090 becoming available in mid-October.

Colette Kress, NVIDIA's CFO, stated that the sell-through rate for their gaming business was relatively solid in Q3 and expected to be stronger in Q4 due to the upcoming holidays and continued adoption of ADA. Jen-Hsu

In [None]:
# Use Case Two Output
'''
max_tokens is the hyperparameter
'''
generated_analysis  = get_completion(prompt,
                                     system=system_message,
                                     max_tokens=600)
print(generated_analysis )

Here is the financial data extracted from the earnings call transcript in JSON format:

```
{
  "Revenue": {
    "Q3": 5930000000,
    "Sequential Change": -12,
    "Year-over-Year Change": -17
  },
  "Data Center Revenue": {
    "Sequential Change": 1,
    "Year-over-Year Change": 31
  },
  "Gaming Revenue": {
    "Sequential Change": -23,
    "Year-over-Year Change": -51
  },
  "Inventory Charge": 702000000
}
```

Note: The revenue figures are in millions of USD. The sequential and year-over-year changes are in percentages.


Evaluation of zero-shot:

> Why the response is not so good: the JSON returned by the zero-shot prompt is not structured the way a financial analyst would need it. I do not need the Inventory Charge; rather, I need the reasons behind each business segment's performance, which are given in the transcript.

> Why the response is good: surprisingly, the model understood quarter-on-quarter (QoQ) change even though the earnings call used the term 'sequentially' instead.

So I will use few-shot prompting to guide the structure of the response instead.
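
One way to check that a reply follows the requested layout is to compare each record's keys with the template fields. The following is a sketch, assuming the five field names from the few-shot template defined in the next cell; `check_records` is a hypothetical helper and the sample records are illustrative.

```python
EXPECTED_KEYS = {
    "Business Segment",
    "Revenue",
    "Year-on-Year Change",
    "Quarter-on-Quarter Change",
    "Reason",
}


def check_records(records, expected_keys=EXPECTED_KEYS):
    """Return (index, missing_keys, extra_keys) for each record that deviates from the template."""
    problems = []
    for index, record in enumerate(records):
        keys = set(record)
        missing = expected_keys - keys
        extra = keys - expected_keys
        if missing or extra:
            problems.append((index, sorted(missing), sorted(extra)))
    return problems


# Illustrative records: the first follows the template, the second does not
records = [
    {
        "Business Segment": "Gaming",
        "Revenue": "Not Mentioned",
        "Year-on-Year Change": "-51%",
        "Quarter-on-Quarter Change": "-23%",
        "Reason": "Channel inventory corrections",
    },
    {"Business Segment": "Data Center", "Inventory Charge": 702000000},
]
print(check_records(records))
```

An empty result means every record matches the template; anything else points at the record index and the fields that are missing or unexpected.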


In [None]:
#@title ===Input Sample 2.1: NVIDIA Earnings Call Transcript | Few Shot===
usecase_two_examples = """[{"Business Segment":"", "Revenue":"", "Year-on-Year Change":"", "Quarter-on-Quarter Change":"", "Reason":""}]"""

#my second attempt to improve from the previous
prompt = f"""
{earnings_call}

{usecase_two_examples}
"""
# System message to set the context for the assistant | this is the role assignment
system_message = "You are a financial analyst. Extract business segment financial data from the following earnings call transcript, excluding total revenue. If numbers cannot be found, replace with 'Not Mentioned'. Return in a JSON format:"

print("====== Expected Prompt to input into Model ======")
print(system_message)
print(prompt)

You are a financial analyst. Extract business segment financial data from the following earnings call transcript, excluding total revenue. If numbers cannot be found, replace with 'Not Mentioned'. Return in a JSON format:


Simona Jankowski welcomed everyone to NVIDIA's third quarter earnings call Colette Kress, the executive vice president and chief financial officer, reported that revenue was $5.93 billion, down 12% sequentially and 17% year-on-year. Data center revenue was up 1% sequentially and 31% year-on-year, driven by leading U.S. cloud providers and a broadening set of consumer Internet companies. Gaming revenue was down 23% sequentially and 51% year-on-year due to channel inventory corrections and challenging external conditions. The new Ada Lovelace GPU architecture had an exceptional launch, with the first RTX 4090 becoming available in mid-October.

Colette Kress, NVIDIA's CFO, stated that the sell-through rate for their gaming business was relatively solid in Q3 and expec

In [None]:
'''
max_tokens is the hyperparameter
'''
generated_analysis  = get_completion(prompt,
                                     system=system_message,
                                     max_tokens=600, temperature=0)

#print(generated_analysis)
#can leverage the post processing function used from Problem #1 in this Problem #2 as well
df_output =  pd.DataFrame(json.loads(post_processing_problemOne(generated_analysis)))
df_output['Company'] = 'NVDA Earnings Call'
df_output

Unnamed: 0,Business Segment,Revenue,Year-on-Year Change,Quarter-on-Quarter Change,Reason,Company
0,iPhone,3957,3,2,Strong demand for the iPhone 13 series,NVDA Earnings Call
1,Mac,824,-10,-5,Supply chain constraints and the transition to...,NVDA Earnings Call
2,iPad,722,-14,-10,Reduced demand following the surge in sales du...,NVDA Earnings Call
3,"Wearables, Home, and Accessories",876,5,4,Strong sales of the Apple Watch and AirPods,NVDA Earnings Call
4,Services,1960,12,3,Increasing adoption of our subscription services,NVDA Earnings Call


In [None]:
#@title ===Input Sample 2.2: APPLE Earnings Call Transcript | Few Shot===

earnings_call = """Tim Cook:
"Welcome to Apple's Q3 earnings call. We reported a total revenue of $81.43 billion, up 2% year-on-year. Our performance in various business segments is as follows:
Luca Maestri:
iPhone: Revenue was $39.57 billion, up 3% year-on-year and 2% sequentially. This growth was driven by strong demand for the iPhone 13 series.
Mac: Revenue was $8.24 billion, down 10% year-on-year and down 5% sequentially due to supply chain constraints and the transition to new models.
iPad: Revenue was $7.22 billion, down 14% year-on-year and down 10% sequentially, primarily due to reduced demand following the surge in sales during the previous year's remote work and learning periods.
Wearables, Home, and Accessories: Revenue was $8.76 billion, up 5% year-on-year and 4% sequentially, driven by strong sales of the Apple Watch and AirPods.
Services: Revenue was $19.60 billion, up 12% year-on-year and 3% sequentially, reflecting the increasing adoption of our subscription services.
We continue to see strong performance across our Services and Wearables segments, while facing challenges in our Mac and iPad segments due to ongoing supply chain issues."
"""

usecase_two_examples = """[{"Business Segment": "", "Revenue": "", "Year-on-Year Change": "%", "Quarter-on-Quarter Change": "%", "Reason": ""},
{"Business Segment": "", "Revenue": "", "Year-on-Year Change": "%", "Quarter-on-Quarter Change": "%", "Reason": ""}]"""

#my second attempt to improve from the previous
prompt = f"""
{earnings_call}

{usecase_two_examples}
"""
# System message to set the context for the assistant | this is the role assignment
system_message = "You are a financial analyst. Extract business segment financial data from the following earnings call transcript, excluding total revenue. If numbers cannot be found, replace with 'Not Mentioned'. Return in a JSON format:"

print("====== Expected Prompt to input into Model ======")
print(system_message)
print(prompt)


You are a financial analyst. Extract business segment financial data from the following earnings call transcript, excluding total revenue. If numbers cannot be found, replace with 'Not Mentioned'. Return in a JSON format:

Tim Cook:
"Welcome to Apple's Q3 earnings call. We reported a total revenue of $81.43 billion, up 2% year-on-year. Our performance in various business segments is as follows:
Luca Maestri:
iPhone: Revenue was $39.57 billion, up 3% year-on-year and 2% sequentially. This growth was driven by strong demand for the iPhone 13 series.
Mac: Revenue was $8.24 billion, down 10% year-on-year and down 5% sequentially due to supply chain constraints and the transition to new models.
iPad: Revenue was $7.22 billion, down 14% year-on-year and down 10% sequentially, primarily due to reduced demand following the surge in sales during the previous year's remote work and learning periods.
Wearables, Home, and Accessories: Revenue was $8.76 billion, up 5% year-on-year and 4% sequential

In [None]:
'''
max_tokens is the hyperparameter
'''
generated_analysis  = get_completion(prompt,
                                     system=system_message,
                                     max_tokens=600,
                                     temperature=0)
#print(generated_analysis)
#can leverage the post processing function used from Problem #1 in this Problem #2 as well
df_output =  pd.DataFrame(json.loads(post_processing_problemOne(generated_analysis)))
df_output['Company'] = 'Appl Earnings Call'
df_output

Unnamed: 0,Business Segment,Revenue,Year-on-Year Change,Quarter-on-Quarter Change,Reason,Company
0,iPhone,3957,3,2,Strong demand for the iPhone 13 series,Appl Earnings Call
1,Mac,824,-10,-5,Supply chain constraints and the transition to...,Appl Earnings Call
2,iPad,722,-14,-10,Reduced demand following the surge in sales du...,Appl Earnings Call
3,"Wearables, Home, and Accessories",876,5,4,Strong sales of the Apple Watch and AirPods,Appl Earnings Call
4,Services,1960,12,3,Increasing adoption of our subscription services,Appl Earnings Call
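
In the table above, iPhone revenue appears as 3957 although the transcript states $39.57 billion, so the scale of the Revenue column is ambiguous. One option is to normalise amounts deterministically after extraction instead of relying on the model to pick a unit. The following is a sketch of that idea, assuming amounts are written as a dollar sign, a number, and a scale word; `usd_to_millions` is a hypothetical helper. It uses `Decimal` to avoid floating-point rounding.

```python
import re
from decimal import Decimal

# Multipliers that convert each scale word to millions of USD
_UNIT_TO_MILLIONS = {
    "million": Decimal("1"),
    "billion": Decimal("1000"),
    "trillion": Decimal("1000000"),
}


def usd_to_millions(text):
    """Convert a phrase such as '$39.57 billion' to a Decimal amount in millions of USD.

    Returns None when no dollar amount with a scale word is found.
    """
    match = re.search(
        r"\$\s*([\d,]+(?:\.\d+)?)\s*(million|billion|trillion)",
        text,
        re.IGNORECASE,
    )
    if match is None:
        return None
    amount = Decimal(match.group(1).replace(",", ""))
    return amount * _UNIT_TO_MILLIONS[match.group(2).lower()]


print(usd_to_millions("iPhone: Revenue was $39.57 billion"))
print(usd_to_millions("Automotive Leasing revenue totaled $579 million"))
```

With this approach the prompt could ask the model to return the amount exactly as quoted in the transcript, leaving the unit conversion to code.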


In [None]:
#@title ===Input Sample 2.3: TESLA Earnings Call Transcript | Few Shot===

usecase_two_examples = """[{"Business Segment": "", "Revenue": "", "Year-on-Year Change": "%", "Quarter-on-Quarter Change": "%", "Reason": ""},
{"Business Segment": "", "Revenue": "", "Year-on-Year Change": "%", "Quarter-on-Quarter Change": "%", "Reason": ""}]"""

earnings_call = """
"Elon Musk: Welcome to Tesla's Q2 earnings call. We reported a total revenue of $24.93 billion, up 47% compared to the same period last year.
Our performance in various business segments is as follows. Automotive Sales revenue reached $21.13 billion, which is a 50% increase year-over-year and a 10% rise quarter-on-quarter,
driven by robust demand for our Model 3 and Model Y vehicles. Automotive Leasing revenue totaled $579 million, reflecting a 30% year-on-year
growth and a 5% sequential increase, showing the growing popularity of our leasing options.
Energy Generation and Storage revenue stood at $1.53 billion, up 40% year-over-year and 12% quarter-on-quarter,
thanks to higher deployments of our solar and energy storage products. Services and Other revenue amounted to $1.67 billion,
up 20% from the previous year and 8% from the prior quarter, due to increased service revenues and used car sales. We continue to observe strong performance across our automotive and energy segments,
while our services segment is also showing steady growth.".
"""

#my second attempt to improve from the previous
prompt = f"""
{earnings_call}

{usecase_two_examples}
"""

# System message to set the context for the assistant | this is the role assignment
system_message = "You are a financial analyst. Extract all the financial data from the following earnings call transcript. Return in a json format:"

print("====== Expected Prompt to input into Model ======")
print(system_message)
print(prompt)

You are a financial analyst. Extract all the financial data from the following earnings call transcript. Return in a json format:


"Elon Musk: Welcome to Tesla's Q2 earnings call. We reported a total revenue of $24.93 billion, up 47% compared to the same period last year.
Our performance in various business segments is as follows. Automotive Sales revenue reached $21.13 billion, which is a 50% increase year-over-year and a 10% rise quarter-on-quarter,
driven by robust demand for our Model 3 and Model Y vehicles. Automotive Leasing revenue totaled $579 million, reflecting a 30% year-on-year
growth and a 5% sequential increase, showing the growing popularity of our leasing options.
Energy Generation and Storage revenue stood at $1.53 billion, up 40% year-over-year and 12% quarter-on-quarter,
thanks to higher deployments of our solar and energy storage products. Services and Other revenue amounted to $1.67 billion,
up 20% from the previous year and 8% from the prior quarter, due to incre

In [None]:
'''
max_tokens is the hyperparameter
'''
generated_analysis  = get_completion(prompt,
                                     system=system_message,
                                     max_tokens=600, temperature=0)
#print(generated_analysis)
#can leverage the post processing function used from Problem #1 in this Problem #2 as well
df_output =  pd.DataFrame(json.loads(post_processing_problemOne(generated_analysis)))
df_output['Company'] = 'Tesla Earnings Call'
df_output

Unnamed: 0,Business Segment,Revenue,Year-on-Year Change,Quarter-on-Quarter Change,Reason,Company
0,Automotive Sales,21130,50,10.0,Robust demand for Model 3 and Model Y vehicles,Tesla Earnings Call
1,Automotive Leasing,579,30,5.0,Growing popularity of leasing options,Tesla Earnings Call
2,Energy Generation and Storage,1530,40,12.0,Higher deployments of solar and energy storage...,Tesla Earnings Call
3,Services and Other,1670,20,8.0,Increased service revenues and used car sales,Tesla Earnings Call
4,Total Revenue,24930,47,,,Tesla Earnings Call
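
A quick arithmetic check can catch extraction slips before the table is used downstream. The sketch below is not part of the original notebook: `check_segment_total` is a hypothetical helper, it assumes the column names shown in the table above, and the 1% tolerance is an arbitrary choice.

```python
def check_segment_total(records, tolerance=0.01):
    """Compare the sum of segment revenues with the reported total.

    records: list of dicts with 'Business Segment' and 'Revenue' keys
    (assumed shape, e.g. what df_output.to_dict("records") would give).
    Returns (segment_sum, reported_total, within_tolerance).
    """
    total_rows = [r for r in records if r["Business Segment"] == "Total Revenue"]
    if len(total_rows) != 1:
        raise ValueError("expected exactly one 'Total Revenue' row")
    reported_total = total_rows[0]["Revenue"]
    # Sum every row except the total itself
    segment_sum = sum(
        r["Revenue"] for r in records if r["Business Segment"] != "Total Revenue"
    )
    relative_gap = abs(segment_sum - reported_total) / reported_total
    return segment_sum, reported_total, relative_gap <= tolerance


# Values copied from the table above
records = [
    {"Business Segment": "Automotive Sales", "Revenue": 21130},
    {"Business Segment": "Automotive Leasing", "Revenue": 579},
    {"Business Segment": "Energy Generation and Storage", "Revenue": 1530},
    {"Business Segment": "Services and Other", "Revenue": 1670},
    {"Business Segment": "Total Revenue", "Revenue": 24930},
]

segment_sum, reported_total, ok = check_segment_total(records)
print(segment_sum, reported_total, ok)
```

On the values above, the four segments sum to 24,909 against a reported total of 24,930. The gap of 21 passes a 1% tolerance, but it may be worth checking whether it comes from the transcript itself or from the extraction.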


# Problem 3: Sentiment Analysis of Fed Chair Powell's FOMC Remarks on Interest Rate Decisions

`Problem Statement`: Fed Chair Powell's remarks on interest rate decisions can move the global economy. The challenge is to interpret whether a given speech is dovish, hawkish or neutral. Analysts study the transcript of the post-FOMC meeting to gauge the likely direction of stock movements, and this is typically done manually.

`Proposed Solution`: Instead of manual analysis, use an LLM to determine the overall sentiment of each financial transcript, classifying it as dovish, hawkish or neutral based on its content.

`Methods Explored`: Zero-shot prompting, as discussed in class

`LLM used`: Llama3 70B
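
Zero-shot here means the model sees only the instruction and the transcript, with no labelled examples. A minimal sketch of that message layout in the OpenAI-style chat format follows. `build_zero_shot_messages` is a hypothetical helper; how the notebook's own `get_completion` assembles its request is not shown in this section.

```python
def build_zero_shot_messages(system_message, transcript):
    """Return chat messages: one instruction, one input, no worked examples."""
    return [
        {"role": "system", "content": system_message},
        {"role": "user", "content": transcript},
    ]


messages = build_zero_shot_messages(
    "Given the transcript below, analyze if the overall sentiment is Dovish, Hawkish or Neutral.",
    "Inflation remains well above our longer run goal of 2%.",
)
for message in messages:
    print(message["role"], "->", message["content"])
```

A few-shot variant would add example transcript and label pairs as extra user and assistant messages before the final user message.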

In [None]:
#@title ===Input Sample 3.1: Transcript #1 | Zero Shot===

transcript = """Good afternoon. My colleagues and I remain committed to supporting the recovery of the U.S. economy and ensuring that the benefits of growth are broadly shared. Recent economic indicators suggest a steady pace of growth, with real GDP increasing by 2.1% in the fourth quarter. Consumer spending has been resilient, and the labor market continues to strengthen, with the unemployment rate at 3.8% and job gains averaging 250,000 per month over the last three months.
Inflation remains subdued, with the PCE price index rising by 1.8% over the 12 months ending in December, and core inflation at 1.6%. While we remain vigilant in monitoring inflationary pressures, current data suggest that inflation is well-contained and below our longer-term goal of 2%.
Given these conditions, the Federal Open Market Committee (FOMC) has decided to maintain the current target range for the federal funds rate at 1.25% to 1.50%. We believe that the current stance of monetary policy is appropriate to support continued economic growth and job creation while keeping inflation near our symmetric 2% objective.
In addition, we will continue to support the economy through our asset purchase programs, maintaining our holdings of Treasury securities and agency mortgage-backed securities at their current levels. These measures will help ensure that financial conditions remain accommodative, supporting lending and investment.
We recognize that the economic outlook remains uncertain, particularly given the ongoing challenges posed by the global economic environment and trade tensions. As such, we are prepared to use our full range of tools to support the economy as needed. This includes lowering interest rates if warranted by economic conditions and providing forward guidance to ensure that markets and the public understand our policy intentions.
Looking ahead, we expect the economy to continue growing at a moderate pace, supported by strong consumer spending, a robust labor market, and accommodative financial conditions. We remain committed to achieving our dual mandate of maximum employment and stable prices and will continue to adjust our policies as necessary to support these goals.
In conclusion, we are confident that our current policy stance is well-positioned to support continued economic growth and stability. We will remain flexible and responsive to changes in the economic outlook, ready to act as needed to sustain the expansion. Thank you, and I look forward to your questions."""

prompt = f"""{transcript}"""

# System message: states the task for the assistant (no explicit role is assigned here)
#system_message = """Analyze the sentiment (Hawkish or Neutral) of the following transcript sentence by sentence. For each sentence, provide your reasoning. Lastly, provide an overall sentiment for the entire transcript based on your analysis."""
system_message = "Given the transcript below, analyze if the overall sentiment is Dovish, Hawkish or Neutral. Pick out those main sentences which reflects your sentiments."

print("====== Expected Prompt to input into Model ======")
print(system_message)
print(prompt)

Given the transcript below, analyze if the overall sentiment is Dovish, Hawkish or Neutral. Pick out those main sentences which reflects your sentiments.
Good afternoon. My colleagues and I remain committed to supporting the recovery of the U.S. economy and ensuring that the benefits of growth are broadly shared. Recent economic indicators suggest a steady pace of growth, with real GDP increasing by 2.1% in the fourth quarter. Consumer spending has been resilient, and the labor market continues to strengthen, with the unemployment rate at 3.8% and job gains averaging 250,000 per month over the last three months.
Inflation remains subdued, with the PCE price index rising by 1.8% over the 12 months ending in December, and core inflation at 1.6%. While we remain vigilant in monitoring inflationary pressures, current data suggest that inflation is well-contained and below our longer-term goal of 2%.
Given these conditions, the Federal Open Market Committee (FOMC) has decided to maintain th

In [None]:
generated_analysis  = get_completion(prompt,
                                     system=system_message,
                                     max_tokens=2048,
                                     temperature=0)
print(generated_analysis)

I would classify the sentiment of this transcript as Dovish. The main sentences that reflect this sentiment are:

* "We believe that the current stance of monetary policy is appropriate to support continued economic growth and job creation while keeping inflation near our symmetric 2% objective."
* "We will continue to support the economy through our asset purchase programs, maintaining our holdings of Treasury securities and agency mortgage-backed securities at their current levels."
* "We are prepared to use our full range of tools to support the economy as needed. This includes lowering interest rates if warranted by economic conditions..."
* "We remain committed to achieving our dual mandate of maximum employment and stable prices and will continue to adjust our policies as necessary to support these goals."

These sentences suggest that the speaker is more focused on supporting economic growth and job creation, and is willing to maintain an accommodative monetary policy stance to 
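
The model replies in free text, so the label has to be pulled out before results can be tabulated. The sketch below is one simple way to do that, not the notebook's own method. `extract_sentiment_label` is a hypothetical helper that returns the first label mentioned, which would misfire on a reply such as "not Hawkish but Dovish".

```python
import re


def extract_sentiment_label(analysis):
    """Return the first of Dovish/Hawkish/Neutral mentioned in the text, or None."""
    match = re.search(r"\b(dovish|hawkish|neutral)\b", analysis, flags=re.IGNORECASE)
    return match.group(1).capitalize() if match else None


# Opening sentence of the model output shown above
label = extract_sentiment_label(
    "I would classify the sentiment of this transcript as Dovish. The main sentences that reflect this sentiment are:"
)
print(label)
```

Asking the model to return the label in a fixed field, as Problem 2 does with JSON, would make this step less fragile.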

In [None]:
#@title ===Input Sample 3.2: Transcript #2 | Zero Shot===
transcript = """Good afternoon. My colleagues and I are strongly committed to bringing inflation back down to our 2% goal. We have both the tools that we need and the resolve it will take to restore price stability on behalf of American families and businesses. Price stability is the responsibility of the Federal Reserve and serves as the bedrock of our economy. Without price stability, the economy does not work for anyone. In particular, without price stability, we will not achieve a sustained period of strong labor market conditions that benefit all. Today, the FOMC raised our policy interest rate by 75 basis points and we continue to anticipate that ongoing increases will be appropriate. We are moving our policy stance purposefully to a level that will be sufficiently restrictive to return inflation to 2%. In addition, we are continuing the process of significantly reducing the size of our balance sheet. Restoring price stability will likely require maintaining a restrictive stance of policy for some time. I will have more to say about today's monetary policy actions after briefly reviewing economic developments. The U.S. economy has slowed significantly from last year's rapid pace. Real GDP rose at a pace of 2.6% last quarter but is unchanged so far this year. Recent indicators point to modest growth of spending and production this quarter. Growth in consumer spending has slowed from last year's rapid pace in part reflecting lower real disposable income and tighter financial conditions. Activity in the housing sector has weakened significantly, largely reflecting higher mortgage rates. Higher interest rates and slower output growth also appear to be weighing on business fixed investment. Despite the slowdown in growth, the labor market remains extremely tight with the unemployment rate at a 50-year low, job vacancies still very high, and wage growth elevated.
Job gains have been robust with employment rising by an average of 289,000 jobs per month over August and September. Although job vacancies have moved below their highs and the pace of job gains has slowed from earlier in the year, the labor market continues to be out of balance, with demand substantially exceeding the supply of available workers. The labor force participation rate is little changed since the beginning of the year. Inflation remains well above our longer run goal of 2%. Over the 12 months ending in September, total PCE prices rose 6.2%; excluding the volatile food and energy categories, core PCE prices rose 5.1%. The recent inflation data again have come in higher than expected. Price pressures remained evident across a broad range of goods and services. Russia's war against Ukraine has boosted prices for energy and food and has created additional upward pressure on inflation. Despite elevated inflation, longer term inflation expectations appear to remain well anchored, as reflected in a broad range of surveys of households, businesses, and forecasters, as well as measures from financial markets. That is not grounds for complacency. The longer the current bout of high inflation continues, the greater the chance that expectations of higher inflation will become entrenched. The Fed's monetary policy actions are guided by our mandate to promote maximum employment and stable prices for the American people. My colleagues and I are acutely aware that high inflation imposes significant hardship as it erodes purchasing power, especially for those least able to meet the higher costs of essentials like food, housing, and transportation. We are highly attentive to the risks that high inflation poses to both sides of our mandate, and we're strongly committed to returning inflation to our 2% objective.
At today's meeting, the committee raised the target range for the federal funds rate by 75 basis points, and we are continuing the process of significantly reducing the size of our balance sheet, which plays an important role in firming the stance of monetary policy. With today's action, we've raised interest rates by 3 and 3-quarters percentage points this year. We anticipate that ongoing increases in the target range for the federal funds rate will be appropriate in order to attain a stance of monetary policy that is sufficiently restrictive to return inflation to 2% over time. Financial conditions have tightened significantly in response to our policy actions, and we are seeing the effects on demand in the most interest rate sensitive sectors of the economy, such as housing. It will take time, however, for the full effects of monetary restraint to be realized, especially on inflation. That's why we say in our statement that in determining the pace of future increases in the target range, we will take into account the cumulative tightening of monetary policy and the lags with which monetary policy affects economic activity and inflation. At some point, as I've said in the last two press conferences, it will become appropriate to slow the pace of increases, as we approach the level of interest rates that will be sufficiently restrictive to bring inflation down to our 2% goal. There is significant uncertainty around that level of interest rates. Even so, we still have some ways to go, and incoming data since our last meeting suggests that the ultimate level of interest rates will be higher than previously expected. Our decisions will depend on the totality of incoming data and their implications for the outlook for economic activity and inflation. We will continue to make our decisions meeting by meeting and communicate our thinking as clearly as possible. We're taking forceful steps to moderate demand so that it comes into better alignment with supply.
Our overarching focus is using our tools to bring inflation back down to our 2% goal and to keep longer term inflation expectations well anchored. Reducing inflation is likely to require a sustained period of below trend growth and some softening of labor market conditions. Restoring price stability is essential to set the stage for achieving maximum employment and stable prices in the longer run. The historical record cautions strongly against prematurely loosening policy. We will stay the course until the job is done. To conclude, we understand that our actions affect communities, families and businesses across the country. Everything we do is in service to our public mission. We at the Fed will do everything we can to achieve our maximum employment and price stability goals. Thank you and I look forward to your questions."""

prompt = f"""{transcript}"""

# System message: states the task for the assistant (no explicit role is assigned here)
#system_message = """Analyze the sentiment (Hawkish or Neutral) of the following transcript sentence by sentence. For each sentence, provide your reasoning. Lastly, provide an overall sentiment for the entire transcript based on your analysis."""
system_message = "Given the transcript below, analyze if the overall sentiment is Dovish, Hawkish or Neutral. Pick out those main sentences which reflects your sentiments."

print("====== Expected Prompt to input into Model ======")
print(system_message)
print(prompt)

Given the transcript below, analyze if the overall sentiment is Dovish, Hawkish or Neutral. Pick out those main sentences which reflects your sentiments.
Good afternoon. My colleagues and I are strongly committed to bringing inflation back down to our 2% goal. We have both the tools that we need and the resolve it will take to restore price stability on behalf of American families and businesses. Price stability is the responsibility of the Federal Reserve and serves as the bedrock of our economy. Without price stability, the economy does not work for anyone. In particular, without price stability, we will not achieve a sustained period of strong labor market conditions that benefit all. Today, the FOMC raised our policy interest rate by 75 basis points and we continue to anticipate that ongoing increases will be appropriate. We are moving our policy stance purposefully to a level that will be sufficiently restrictive to return inflation to 2%. In addition, we are continuing the proces

In [None]:
generated_analysis  = get_completion(prompt,
                                     system=system_message,
                                     max_tokens=2048,
                                     temperature=0)
print(generated_analysis)

The overall sentiment of this transcript is Hawkish. The main sentences that reflect this sentiment are:

* "We are moving our policy stance purposefully to a level that will be sufficiently restrictive to return inflation to 2%."
* "Restoring price stability will likely require maintaining a restrictive stance of policy for some time."
* "We anticipate that ongoing increases in the target range for the federal funds rate will be appropriate in order to attain a stance of monetary policy that is sufficiently restrictive to return inflation to 2% over time."
* "Reducing inflation is likely to require a sustained period of below trend growth and some softening of labor market conditions."
* "We will stay the course until the job is done."

These sentences indicate a strong commitment to fighting inflation and bringing it back down to the 2% target, even if it requires maintaining a restrictive monetary policy stance for an extended period. The tone is firm and resolute, with a focus on a
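
Each transcript in this problem is sent with the same system message and the same call arguments, so the repeated cells could be folded into one loop. The sketch below assumes a completion function that accepts the keyword arguments used in the cells above. `classify_transcripts` and `fake_complete` are hypothetical names, and `fake_complete` is an offline stand-in rather than a real model call.

```python
def classify_transcripts(transcripts, system_message, complete):
    """Run the same zero-shot prompt over several transcripts.

    transcripts: dict mapping a display name to the transcript text.
    complete: callable taking (prompt, system=..., max_tokens=..., temperature=...),
              mirroring how get_completion is called in this notebook (assumed signature).
    """
    results = {}
    for name, text in transcripts.items():
        results[name] = complete(
            text, system=system_message, max_tokens=2048, temperature=0
        )
    return results


# Offline stand-in so the sketch runs without an API key; it does not call a model
def fake_complete(prompt, system, max_tokens, temperature):
    return f"(model output for {len(prompt)} characters of input)"


demo = classify_transcripts(
    {"Transcript #1": "Good afternoon.", "Transcript #2": "Thank you."},
    "Given the transcript below, analyze if the overall sentiment is Dovish, Hawkish or Neutral.",
    fake_complete,
)
for name, output in demo.items():
    print(name, "->", output)
```

In the notebook, passing `get_completion` in place of `fake_complete` would reproduce the per-transcript cells, provided its signature matches the assumption above.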

In [None]:
#@title ===Input Sample 3.3: Transcript #3 | Zero Shot===

transcript = """"Good afternoon. The Federal Reserve is committed to using its full range of tools to support the U.S. economy in this challenging time, thereby promoting our maximum employment and price stability goals. The COVID-19 pandemic is causing tremendous human and economic hardship across the United States and around the world. Economic activity has contracted sharply, and millions of Americans have lost their jobs. The unprecedented nature of the pandemic requires an unprecedented policy response.
To this end, the Federal Open Market Committee (FOMC) has decided to keep the target range for the federal funds rate at 0 to 0.25 percent. We expect to maintain this target range until we are confident that the economy has weathered recent events and is on track to achieve our maximum employment and price stability goals.
In addition, we are continuing our purchases of Treasury securities and agency mortgage-backed securities in the amounts needed to support smooth market functioning, thereby fostering effective transmission of monetary policy to broader financial conditions. These purchases help ensure that credit flows to households and businesses.
We are also taking a number of additional actions to provide stability and support to the economy. We have established lending programs to support the flow of credit to households, businesses, and municipalities. These programs serve as backstops to key credit markets, thereby enhancing their ability to function effectively in times of stress and bolstering the flow of credit in the economy.
We recognize that these are extraordinary times and the road ahead is highly uncertain. The path of the economy will depend significantly on the course of the virus and the measures undertaken to control its spread. We are committed to using our tools to their fullest until the crisis has passed and the economic recovery is well underway.
Looking ahead, we will closely monitor economic and financial developments and will assess the timing and size of adjustments to our monetary policy stance as appropriate. Our overarching goal is to ensure that the recovery is as robust as possible and that we achieve our maximum employment and price stability mandates.
In conclusion, the Federal Reserve stands ready to do whatever it takes to support the American economy through this difficult period. We will continue to use our full range of tools to ensure that the recovery is strong and sustainable. Thank you, and I look forward to your questions."""

prompt = f"""{transcript}"""

# System message: states the task for the assistant (no explicit role is assigned here)
#system_message = """Analyze the sentiment (Hawkish or Neutral) of the following transcript sentence by sentence. For each sentence, provide your reasoning. Lastly, provide an overall sentiment for the entire transcript based on your analysis."""
system_message = "Given the transcript below, analyze if the overall sentiment is Dovish, Hawkish or Neutral. Pick out those main sentences which reflects your sentiments."

print("====== Expected Prompt to input into Model ======")
print(system_message)
print(prompt)

Given the transcript below, analyze if the overall sentiment is Dovish, Hawkish or Neutral. Pick out those main sentences which reflects your sentiments.
"Good afternoon. The Federal Reserve is committed to using its full range of tools to support the U.S. economy in this challenging time, thereby promoting our maximum employment and price stability goals. The COVID-19 pandemic is causing tremendous human and economic hardship across the United States and around the world. Economic activity has contracted sharply, and millions of Americans have lost their jobs. The unprecedented nature of the pandemic requires an unprecedented policy response.
To this end, the Federal Open Market Committee (FOMC) has decided to keep the target range for the federal funds rate at 0 to 0.25 percent. We expect to maintain this target range until we are confident that the economy has weathered recent events and is on track to achieve our maximum employment and price stability goals.
In addition, we are con

In [None]:
generated_analysis  = get_completion(prompt,
                                     system=system_message,
                                     max_tokens=2048,
                                     temperature=0)
print(generated_analysis)

The overall sentiment of this transcript is Dovish.

The main sentences that reflect this sentiment are:

* "We expect to maintain this target range [of 0 to 0.25 percent] until we are confident that the economy has weathered recent events and is on track to achieve our maximum employment and price stability goals." (This indicates a commitment to keeping interest rates low for an extended period.)
* "We are continuing our purchases of Treasury securities and agency mortgage-backed securities in the amounts needed to support smooth market functioning, thereby fostering effective transmission of monetary policy to broader financial conditions." (This suggests a willingness to maintain accommodative monetary policy to support the economy.)
* "We are committed to using our tools to their fullest until the crisis has passed and the economic recovery is well underway." (This emphasizes the Fed's commitment to providing support to the economy until the recovery is well established.)
* "In co