<a href="https://colab.research.google.com/github/realmistic/PythonInvest-basic-fin-analysis/blob/master/yt_videos_colabs/PythonInvest_com_6_OpenAI's_for_News_Summarization.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

>**Goals:**
* *Task:* Explore ChatGPT programmatically
* *Use-case:* Summarize a massive set of news (5000 articles per week).
* *Why*: Gain insights, understand market sentiment, and make informed investment decisions.

* ***Result***: AI-generated news summaries available at: https://pythoninvest.com/#weekly-fin-news-feed
* ***Full Article***: "Leveraging OpenAI's API for Financial News Summarization
": https://pythoninvest.com/long-read/chatgpt-api-for-financial-news-summarization

* ***How to Support Us:***
  * *Please react:* Like, comment, subscribe, and share your thoughts
  * *Read other blog articles*:  Explore more insights on our website https://pythoninvest.com/blog
  * *Make a Donation*: Support PythonInvest on BuyMeACoffee by becoming a [regular member](https://www.buymeacoffee.com/pythoninvest/membership) or contributing to a specific idea from our [Wishlist](https://www.buymeacoffee.com/pythoninvest/wishlist) : https://www.buymeacoffee.com/pythoninvest  


---


>**Plan:**
  * *Colab Env*: storing secrets in Google Drive
  * *API hygiene:* Rate Limits, Access Keys, other Errors handling
  * *Polygon.io News API:*
    * Ticker: One Stock vs. Market
    * Title vs. Description
    * Chunking
  * *ChatGPT's API*:
    * Model Selection (GPT-3.5 Turbo in most cases, GPT-4 after the first payment). Tokens, prompts, messages, and usage.
    * API call example for one stock
    * Few Shots Prompting for a large set of news
    * Use-case for Summarizer: https://platform.openai.com/examples/default-meeting-notes-summarizer

---

>**Pre-reads:**
  * [7 Jul'23] "Lean how to work with the GPT-35-Turbo and GPT-4 Models": https://learn.microsoft.com/en-us/azure/cognitive-services/openai/how-to/chatgpt?pivots=programming-language-chat-completions  
  * [6 Jul'23] "GPT-4-API-general-availability (8k tokens)" https://openai.com/blog/gpt-4-api-general-availability
  * [July 2023] "But no access to GPT-4-32k tokens model yet" https://help.openai.com/en/articles/7102672-how-can-i-access-gpt-4
  * [14 Jun'23] "OpenAI GPT-3.5 Turbo Model": https://cobusgreyling.medium.com/openai-16k-context-3-5-turbo-model-1ebd979041dc
  * [11 Jun'20] REST APIs usage : https://www.nylas.com/blog/use-python-requests-module-rest-apis/
  * Task Examples on OpenAI: https://platform.openai.com/examples
  * OpenAI docs 3.5 (Chat Completions API): https://platform.openai.com/docs/guides/gpt/chat-completions-api
  * API reference: https://platform.openai.com/docs/api-reference/completions/create
  * Open AI customer stories: https://openai.com/customer-stories

---

#0) Imports and Env Variables for APIs

In [None]:
!pip install openai

Collecting openai
  Downloading openai-0.27.9-py3-none-any.whl (75 kB)
[?25l     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m0.0/75.5 kB[0m [31m?[0m eta [36m-:--:--[0m[2K     [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[91m╸[0m[90m━━[0m [32m71.7/75.5 kB[0m [31m2.2 MB/s[0m eta [36m0:00:01[0m[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m75.5/75.5 kB[0m [31m1.9 MB/s[0m eta [36m0:00:00[0m
Installing collected packages: openai
Successfully installed openai-0.27.9


In [None]:
# use local Google Drive to store/retrive API KEYS
# https://stackoverflow.com/questions/66631333/how-do-i-set-environment-variables-in-google-colab
!pip install colab-env --upgrade


Collecting colab-env
  Downloading colab-env-0.2.0.tar.gz (4.7 kB)
  Preparing metadata (setup.py) ... [?25l[?25hdone
Collecting python-dotenv<1.0,>=0.10.0 (from colab-env)
  Downloading python_dotenv-0.21.1-py3-none-any.whl (19 kB)
Building wheels for collected packages: colab-env
  Building wheel for colab-env (setup.py) ... [?25l[?25hdone
  Created wheel for colab-env: filename=colab_env-0.2.0-py3-none-any.whl size=3805 sha256=bc261702e6b486bd9b979e25a5701073f0c816d8ddbe5dd8dcf1693a9a0e75c6
  Stored in directory: /root/.cache/pip/wheels/ae/36/4f/466c2cd4db5d08f317893a920c4a0f58a81459ee3bdb136d35
Successfully built colab-env
Installing collected packages: python-dotenv, colab-env
Successfully installed colab-env-0.2.0 python-dotenv-0.21.1


In [None]:
import json
import requests
import numpy as np
import pandas as pd

import datetime
from datetime import datetime, timezone #https://stackoverflow.com/questions/796008/cant-subtract-offset-naive-and-offset-aware-datetimes

import random

import os
import openai
import time
import textwrap #for wrapping the output string

In [None]:
# Get Env. Variable for Colab
  # https://github.com/apolitical/colab-env/blob/master/colab_env_testbed.ipynb
import colab_env

print(f'Colab Env. version: {colab_env.__version__}')

Mounted at /content/gdrive
Colab Env. version: 0.2.0


In [None]:
# Test this env. variable
colab_env.envvar_handler.add_env("TEST", "HELLO WORLD!", overwrite=True)
os.getenv("TEST")

'HELLO WORLD!'

In [None]:
# register your keys
# colab_env.envvar_handler.add_env("OPENAI_API_KEY", <your key>, overwrite=True)
# colab_env.envvar_handler.add_env("POLYGON_API_KEY", <your key>, overwrite=True)

In [None]:
# API for openAI, polygonNews
    # Usage: https://platform.openai.com/account/usage
    # Pricing: https://openai.com/pricing

#Import your API keys from the Env. file on Drive/System, or simply type-in it here
POLYGON_API_KEY = os.getenv("POLYGON_API_KEY")
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")

# 1) POLYGON NEWS
Import 5k latest fin. news via 5 calls on the Free API tier: https://polygon.io/docs/stocks/get_v2_reference_news


In [None]:
from datetime import datetime, timezone
datetime.now(timezone.utc).strftime("%Y-%m-%d")

'2023-08-28'

In [None]:
# The exact time of running the Colab
now = datetime.utcnow().isoformat(sep=' ', timespec='milliseconds')
now_right_format = "T".join(now.split(' '))
now_right_format

'2023-08-28T20:54:24.085'

In [None]:
# https://polygon.io/docs/stocks/get_v2_reference_news
# https://polygon.io/blog/api-pagination-patterns/
# API CALL : # https://api.polygon.io/v2/reference/news?order=desc&limit=1000&sort=published_utc&apiKey=<your key> or POLYGON_API_KEY
      # need to get 200 OK status

# retrieve max 1000 news via one API call
def get_one_chunk_of_news_polygon_io(api_key = POLYGON_API_KEY, news_limit=1000, max_date = now_right_format):
  url = f"https://api.polygon.io/v2/reference/news?order=desc&limit={news_limit}&sort=published_utc&published_utc.lt={max_date}&apiKey={api_key}"

  # https://www.nylas.com/blog/use-python-requests-module-rest-apis/ - Python for rest APIs
  # try/catch for HTTP requests: https://stackoverflow.com/questions/16511337/correct-way-to-try-except-using-python-requests-module
  try:
      r = requests.get(url, timeout=3)
      r.raise_for_status()
  except requests.exceptions.HTTPError as errh:
      print ("Http Error:",errh)
  except requests.exceptions.ConnectionError as errc:
      print ("Error Connecting:",errc)
  except requests.exceptions.Timeout as errt:
      print ("Timeout Error:",errt)
  except requests.exceptions.RequestException as err:
      print ("OOps: Something Else",err)

  data = r.json()

  # https://towardsdatascience.com/how-to-convert-json-into-a-pandas-dataframe-100b2ae1e0d8
  df_nested_list = pd.json_normalize(data, record_path =['results'])
  print(f'Retrieved : {len(df_nested_list)} news; min_datetime = {df_nested_list.published_utc.min()}, max_datetime = {df_nested_list.published_utc.max()}')
  return df_nested_list

In [None]:
def get_all_news(api_calls_left = 5, api_key = POLYGON_API_KEY, news_limit=1000, max_date = now_right_format):
  all_news = None
  for i in range(api_calls_left):
    cur = get_one_chunk_of_news_polygon_io(api_key = api_key, news_limit = news_limit, max_date = max_date)
    if all_news is None:
      all_news = cur
    else:
      all_news = pd.concat([all_news,cur], ignore_index=True, axis=0) #stacking dataframes

    max_date = cur.published_utc.min() #update max_date of the news
  return all_news

In [None]:
# 5 calls per minute limit for a free account - all recent news (5000)
all_news = get_all_news()

Retrieved : 1000 news; min_datetime = 2023-08-25T14:35:00Z, max_datetime = 2023-08-28T20:32:00Z
Retrieved : 1000 news; min_datetime = 2023-08-24T11:20:22Z, max_datetime = 2023-08-25T14:33:00Z
Retrieved : 1000 news; min_datetime = 2023-08-22T21:45:14Z, max_datetime = 2023-08-24T11:19:00Z
Retrieved : 1000 news; min_datetime = 2023-08-21T13:00:51Z, max_datetime = 2023-08-22T21:45:13Z
Retrieved : 1000 news; min_datetime = 2023-08-18T10:50:00Z, max_datetime = 2023-08-21T13:00:45Z


In [None]:
all_news.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5000 entries, 0 to 4999
Data columns (total 14 columns):
 #   Column                  Non-Null Count  Dtype 
---  ------                  --------------  ----- 
 0   id                      5000 non-null   object
 1   title                   5000 non-null   object
 2   author                  5000 non-null   object
 3   published_utc           5000 non-null   object
 4   article_url             5000 non-null   object
 5   tickers                 5000 non-null   object
 6   amp_url                 3985 non-null   object
 7   image_url               5000 non-null   object
 8   description             4938 non-null   object
 9   keywords                2431 non-null   object
 10  publisher.name          5000 non-null   object
 11  publisher.homepage_url  5000 non-null   object
 12  publisher.logo_url      5000 non-null   object
 13  publisher.favicon_url   5000 non-null   object
dtypes: object(14)
memory usage: 547.0+ KB


In [None]:
all_news["publisher.name"].value_counts()

Zacks Investment Research    2098
The Motley Fool               866
GlobeNewswire Inc.            835
Benzinga                      723
MarketWatch                   293
Seeking Alpha                 131
Investing.com                  37
PennyStocks                    17
Name: publisher.name, dtype: int64

In [None]:
# Identify news count per stock (ever mentioned)
tickers = []
for el in all_news.tickers:
  tickers.extend(el)

dd= {}
for e in tickers:
  if e in dd.keys():
    dd[e]+=1
  else:
    dd[e]=1

In [None]:
# https://www.freecodecamp.org/news/sort-dictionary-by-value-in-python/
sorted_dd = sorted(dd.items(), key=lambda x:x[1], reverse=True)

In [None]:
sorted_dd[0:10]

[('NVDA', 283),
 ('AMZN', 157),
 ('AAPL', 139),
 ('MSFT', 116),
 ('TSLA', 103),
 ('GOOGL', 92),
 ('SPY', 74),
 ('GOOG', 74),
 ('AMD', 66),
 ('META', 63)]

In [None]:
all_news.head(1)

Unnamed: 0,id,title,author,published_utc,article_url,tickers,amp_url,image_url,description,keywords,publisher.name,publisher.homepage_url,publisher.logo_url,publisher.favicon_url
0,886xm7AiA-jZTkP-R0ren2x-bsHFqq2HpHg_V4f5xuY,Qorvo Statement on the Passing of Board Member...,"Qorvo, Inc.",2023-08-28T20:32:00Z,https://www.globenewswire.com/news-release/202...,[QRVO],https://www.globenewswire.com/news-release/202...,https://ml.globenewswire.com/Resource/Download...,"GREENSBORO, N.C., Aug. 28, 2023 (GLOBE NEWSW...",[Management statements],GlobeNewswire Inc.,https://www.globenewswire.com,https://s3.polygon.io/public/assets/news/logos...,https://s3.polygon.io/public/assets/news/favic...


In [None]:
# DATA CLEANING Step1: GOOGL = GOOG  # https://datagy.io/python-replace-item-in-list/
all_news['tickers_adj'] = all_news.tickers.apply(lambda x:['GOOG' if item == 'GOOGL' else item for item in x])

In [None]:
# DATA CLEANING Step2: Remove duplicates from List: https://www.geeksforgeeks.org/python-ways-to-remove-duplicates-from-list/
all_news['tickers_adj'] = all_news.tickers_adj.apply(lambda x: [*set(x)])

In [None]:
for i in range(2):
  elem = all_news.iloc[i]
  print(elem)
  print('--------')

id                              886xm7AiA-jZTkP-R0ren2x-bsHFqq2HpHg_V4f5xuY
title                     Qorvo Statement on the Passing of Board Member...
author                                                          Qorvo, Inc.
published_utc                                          2023-08-28T20:32:00Z
article_url               https://www.globenewswire.com/news-release/202...
tickers                                                              [QRVO]
amp_url                   https://www.globenewswire.com/news-release/202...
image_url                 https://ml.globenewswire.com/Resource/Download...
description               GREENSBORO, N.C., Aug.  28, 2023  (GLOBE NEWSW...
keywords                                            [Management statements]
publisher.name                                           GlobeNewswire Inc.
publisher.homepage_url                        https://www.globenewswire.com
publisher.logo_url        https://s3.polygon.io/public/assets/news/logos...
publisher.fa

In [None]:
for i in range(2):
  print(f' Title: {all_news.iloc[i].title} \n Descr: {all_news.iloc[i].description} \n Tickers ={all_news.iloc[i].tickers}')
  print('------------------------')

 Title: Qorvo Statement on the Passing of Board Member Jeffery R. Gardner 
 Descr: GREENSBORO, N.C., Aug.  28, 2023  (GLOBE NEWSWIRE) -- Qorvo, Inc. (Nasdaq: QRVO), a leading global provider of connectivity and power solutions, issued a statement today announcing the loss of board member Jeffery R. Gardner, who unexpectedly passed away on Sunday, August 27th. 
 Tickers =['QRVO']
------------------------
 Title: Ross Acquisition Corp II Receives NYSE Notice Regarding Delayed Form 10-Q Filing 
 Descr: PALM BEACH, Fla., Aug.  28, 2023  (GLOBE NEWSWIRE) -- Ross Acquisition Corp II (NYSE:ROSS) (the “Company”) announced today that it received a notice (the “Notice”) on August 22, 2023 from the NYSE Regulation staff of the New York Stock Exchange (the “NYSE”) stating that the Company is not in compliance with Section 802.01E of the NYSE Listed Company Manual (the “Rule”) because it has not timely filed its Quarterly Report on Form 10-Q for the quarter ended June 30, 2023 (the “Form 10-Q”) wit

In [None]:
# generate the dictionary for all descriptions : MARKET DATA (more than 1 ticker)
  # two regimes : only last day AND all market data from 5k news (~7 days)
def get_market_descriptions_concat(df, only_last_day = False):
  all_descs={}
  descs =[]
  used_news = 0
  unused_news = 0
  total_words_tokens = 0

  now = datetime.now(timezone.utc)
  # .date() # example: datetime.date(2023, 7, 11)
  # datetime.now(timezone.utc).strftime("%Y-%m-%d") # string form of today

  for i in range(len(df)):
    elem = df.iloc[i]
    time_hours_to_now = (now-pd.to_datetime(elem.published_utc))/pd.Timedelta(hours=1) #https://stackoverflow.com/questions/22923775/calculate-time-difference-between-two-pandas-columns-in-hours-and-minutes
    if len(elem.tickers_adj)>1:
      if not only_last_day or (only_last_day and time_hours_to_now<=24): # last 24 hours, or all days before
        news_title_desc_str = str(elem.published_utc)+'| '+ str(elem.tickers_adj)+ '| '+str(elem.title)+ '| ' +str(elem.description)
        descs.append(news_title_desc_str)
        used_news += 1
        total_words_tokens += len(news_title_desc_str.split(' '))
    else:
      unused_news += 1

  print(f'Used news for market summary = {used_news}, not used news (individual tickers) = {unused_news}, and with total_words_tokens = {total_words_tokens}')
  new_e = ('multiple_tickers', used_news)
  all_descs[new_e] = descs
  return all_descs

In [None]:
market_summary_desc_last_day = get_market_descriptions_concat(all_news, only_last_day=True);
market_summary_desc_week = get_market_descriptions_concat(all_news, only_last_day=False);

market_summary_key_last_day = list(market_summary_desc_last_day.keys())[0]
print(f'Used tickers for the last day market summary {market_summary_key_last_day}')

market_summary_key_week = list(market_summary_desc_week.keys())[0]
print(f'Used tickers for the last week market summary {market_summary_key_week}')


Used news for market summary = 231, not used news (individual tickers) = 2792, and with total_words_tokens = 12358
Used news for market summary = 2208, not used news (individual tickers) = 2792, and with total_words_tokens = 111115
Used tickers for the last day market summary ('multiple_tickers', 231)
Used tickers for the last week market summary ('multiple_tickers', 2208)


In [None]:
for i in range(min(20,market_summary_key_last_day[1])):
 print(i,': ',market_summary_desc_last_day[market_summary_key_last_day][i])

0 :  2023-08-28T20:27:00Z| ['DJIA', 'DHR', 'MMM', 'RAD', 'HE', 'COMP', 'ABCM']| S&P 500 and Nasdaq end higher, attempting to beat back worst month since December| Stocks see gains Monday as investors continue to weigh cautious comments from Federal Reserve Chairman Jerome Powell and get ready for another big data week.
1 :  2023-08-28T20:07:00Z| ['DJIA', 'COMP']| The stock market is set up for a relief rally. Don't chase the bounce, says technician.| The potential reward from chasing a stock-market bounce isn't attractive, according to Tyler Richey, co-editor at Sevens Report Research.
2 :  2023-08-28T20:01:00Z| ['FIVE', 'DG', 'DLTR']| Five Below And Dollar General Q2 Earnings Preview: Aiming To Emulate Dollar Tree's Beat| Two leading discount retailers are set to report quarterly earnings this week, aiming to emulate a peer company that recently beat analysts’ estimates.
Here’s a look at what investors should know ahead of second quarter financial results from Five Below Inc (NASDAQ: 

In [None]:
#empirical thing - as tokens are counted for input AND output - > we need some number less 16k
#  we can do less MAX_WORDS - to make sure we're below total 16k tokens used
def get_message_chunks_indexes(news_list, MAX_WORDS = 6000):
  rez = [0]
  cur_words = 0
  for i,elem in enumerate(news_list):
    cur_elem_len = len(elem.split(' '))
    if cur_words + cur_elem_len >MAX_WORDS:
      rez.append(i)
      cur_words = cur_elem_len
    else:
      cur_words += cur_elem_len
  return rez

In [None]:
chunks_day = get_message_chunks_indexes(market_summary_desc_last_day[market_summary_key_last_day])
chunks_day_for_gpt4 = get_message_chunks_indexes(market_summary_desc_last_day[market_summary_key_last_day], MAX_WORDS=3000)
chunks_week = get_message_chunks_indexes(market_summary_desc_week[market_summary_key_week])
print(f'Chunks for market summary last day: {chunks_day}')
print(f'Chunks for market summary last day for GPT-4: {chunks_day_for_gpt4}')
print(f'Chunks for market summary last week: {chunks_week}')

Chunks for market summary last day: [0, 96, 222]
Chunks for market summary last day for GPT-4: [0, 38, 96, 155, 222]
Chunks for market summary last week: [0, 96, 222, 367, 483, 608, 681, 809, 931, 1034, 1145, 1292, 1402, 1504, 1625, 1730, 1851, 1988, 2107]


In [None]:
def get_prompts_for_market_data(news_list, chunks):
  rez =[]
  if chunks == [0]: #only one element - return full list
    rez.append(";".join(news_list))
  else:
    for i in range(len(chunks)-1):
      p = news_list[chunks[i]: chunks[i+1]]
      rez.append(";".join(p))
  return rez

In [None]:
market_summary_prompts_last_day = get_prompts_for_market_data(market_summary_desc_last_day[market_summary_key_last_day],chunks_day)
market_summary_prompts_last_day_for_gpt4 = get_prompts_for_market_data(market_summary_desc_last_day[market_summary_key_last_day],chunks_day_for_gpt4)
market_summary_prompts_week = get_prompts_for_market_data(market_summary_desc_week[market_summary_key_week],chunks_week)

In [None]:
# count words in the prompt with the news:
len(market_summary_prompts_last_day[0].split(' '))

5828

In [None]:
len(market_summary_prompts_week[0].split(' '))

5828

In [None]:
# count words in the prompt with the news:
len(market_summary_prompts_last_day_for_gpt4[0].split(' '))

2943

In [None]:
# generate the dictionary for all descriptions
def get_individual_descriptions_concat(df, max_indiv_stocks = 20):
  all_descs={}
  descs =[]
  used_news = 0

  for e in sorted_dd[0:max_indiv_stocks]:
    print(e[0], e[1])
    ticker = e[0]
    descs =[]
    used_news = 0
    for i in range(len(df)):
      elem = df.iloc[i]
      if ticker in elem.tickers_adj and len(elem.tickers_adj)==1:
        descs.append(str(elem.published_utc)+ '| ' + str(elem.tickers_adj) + '| ' + str(elem.title)+ '| ' +str(elem.description))
        used_news+=1
    print(f' Ticker {e[0]}, used news = {used_news}, length news words = {len(";".join(descs).split(" "))}')
    new_e = (e[0], used_news)
    all_descs[new_e] = descs

  return all_descs

In [None]:
indiv_tickers_summary_desc = get_individual_descriptions_concat(all_news)

NVDA 283
 Ticker NVDA, used news = 52, length news words = 2476
AMZN 157
 Ticker AMZN, used news = 23, length news words = 819
AAPL 139
 Ticker AAPL, used news = 18, length news words = 621
MSFT 116
 Ticker MSFT, used news = 4, length news words = 120
TSLA 103
 Ticker TSLA, used news = 23, length news words = 1318
GOOGL 92
 Ticker GOOGL, used news = 0, length news words = 1
SPY 74
 Ticker SPY, used news = 14, length news words = 1543
GOOG 74
 Ticker GOOG, used news = 10, length news words = 364
AMD 66
 Ticker AMD, used news = 10, length news words = 386
META 63
 Ticker META, used news = 7, length news words = 241
PANW 59
 Ticker PANW, used news = 14, length news words = 655
WMT 58
 Ticker WMT, used news = 4, length news words = 245
DJIA 54
 Ticker DJIA, used news = 11, length news words = 418
COMP 52
 Ticker COMP, used news = 3, length news words = 93
AEO 51
 Ticker AEO, used news = 2, length news words = 58
DIS 46
 Ticker DIS, used news = 13, length news words = 578
TGT 46
 Ticker TGT

In [None]:
indiv_tickers_summary_desc.keys()

# 10th Jul: dict_keys([('META', 34), ('NVDA', 24), ('TSLA', 22), ('AMZN', 15), ('AAPL', 15), ('CRM', 13), ('GOOGL', 12), ('MSFT', 12), ('SPY', 11), ('AMD', 9), ('GOOG', 9), ('DJIA', 9), ('AAL', 9), ('QQQ', 8), ('F', 8), ('GM', 8), ('EPM', 8), ('WFC', 7), ('XOM', 7), ('INTC', 7)])


dict_keys([('NVDA', 52), ('AMZN', 23), ('AAPL', 18), ('MSFT', 4), ('TSLA', 23), ('GOOGL', 0), ('SPY', 14), ('GOOG', 10), ('AMD', 10), ('META', 7), ('PANW', 14), ('WMT', 4), ('DJIA', 11), ('COMP', 3), ('AEO', 2), ('DIS', 13), ('TGT', 7), ('QQQ', 1), ('CRM', 4), ('FL', 8)])

In [None]:
random.choice(list(indiv_tickers_summary_desc.keys()))

('PANW', 14)

In [None]:
# one random Ticker to analyse:
random_key = random.choice(list(indiv_tickers_summary_desc.keys()))
# random.choice(list(indiv_tickers_summary_desc.keys()))
random_ticker_news_array = indiv_tickers_summary_desc[random_key]
random_ticker_joined_news = ';'.join(random_ticker_news_array)
print(f'Selected <ticker, news_count>:{random_key}, with total length {len(random_ticker_joined_news.split(" "))}')

Selected <ticker, news_count>:('CRM', 4), with total length 251


In [None]:
# https://stackoverflow.com/questions/11418192/pandas-complex-filter-on-rows-of-dataframe
# Look in the dataframe for the news about the random_ticker
all_news.loc[all_news['tickers'].apply(lambda x: True if len(x)==1 and x[0]==random_key[0] else False)].head(1)

Unnamed: 0,id,title,author,published_utc,article_url,tickers,amp_url,image_url,description,keywords,publisher.name,publisher.homepage_url,publisher.logo_url,publisher.favicon_url,tickers_adj
274,1_GbB0PB-eSRV7_cAy5F5mYLrfD8Y64py1Q3hQbm7jg,Should You Invest in Salesforce.com (CRM) Base...,Zacks Equity Research,2023-08-28T13:30:06Z,https://www.zacks.com/stock/news/2141297/shoul...,[CRM],https://www.zacks.com/amp/stock/news/2141297/s...,https://staticx-tuner.zacks.com/images/default...,According to the average brokerage recommendat...,,Zacks Investment Research,https://www.zacks.com/,https://s3.polygon.io/public/assets/news/logos...,https://s3.polygon.io/public/assets/news/favic...,[CRM]


In [None]:
# Summary of news: ARRAY OF <ticker| news title| news description>
random_ticker_news_array

["2023-08-28T13:30:06Z| ['CRM']| Should You Invest in Salesforce.com (CRM) Based on Bullish Wall Street Views?| According to the average brokerage recommendation (ABR), one should invest in Salesforce.com (CRM). It is debatable whether this highly sought-after metric is effective because Wall Street analysts' recommendations tend to be overly optimistic. Would it be worth investing in the stock?",
 "2023-08-24T18:03:58Z| ['CRM']| What To Expect From Salesforce Q2? Analyst Sees Mixed Business Trends| Oppenheimer\xa0analyst Brian Schwartz reiterated an Outperform rating on\xa0Salesforce, Inc.\xa0(NYSE: CRM) with a\xa0price target of $235.\nThe earnings risk for CRM weighs slightly positive ahead of 2Q results despite his recent field checks pointing to mixed business trends for Salesforce and with investor expectations at higher levels for this earnings report.\xa0\nPositively, the analyst detected improvement in the demand environment and possibly stabilizing trends in the sales operati

# OPENAI.COM

In [None]:
# setting up a key
openai.api_key = OPENAI_API_KEY

# 0) Text completion several models:

In [None]:
# list of the available models - we'll use gpt-3.5-turbo-16k or gpt-4 (8k)
list_models = openai.Model.list()
gpts= [e for e in list_models['data'] if 'gpt' in e.id]
[e.root for e in gpts]

['gpt-3.5-turbo',
 'gpt-3.5-turbo-0613',
 'gpt-3.5-turbo-16k-0613',
 'gpt-4-0314',
 'gpt-4',
 'gpt-3.5-turbo-16k',
 'gpt-4-0613',
 'gpt-3.5-turbo-0301']

In [None]:
messages = [
    {'role': 'user',
     'content': f"Summarize the news: {random_ticker_joined_news}"}
  ]

In [None]:
# NEW ENDPOINT: https://platform.openai.com/docs/guides/gpt/chat-completions-api
# OLD LEGACY endpoint (until-July'23) https://platform.openai.com/docs/guides/gpt/completions-api
# https://stackoverflow.com/questions/75617865/openai-chatgpt-gpt-3-5-api-error-invalidrequesterror-unrecognized-request-a

response_turbo_16k = openai.ChatCompletion.create(
# openai.Completion.create(
  model="gpt-3.5-turbo-16k",
  messages = messages,
  temperature = 0
)

In [None]:
print(f'Command to OpenAI: {messages[0]["content"]}, len words = {len(messages[0]["content"].split(" "))}')


Command to OpenAI: Summarize the news: 2023-08-28T13:30:06Z| ['CRM']| Should You Invest in Salesforce.com (CRM) Based on Bullish Wall Street Views?| According to the average brokerage recommendation (ABR), one should invest in Salesforce.com (CRM). It is debatable whether this highly sought-after metric is effective because Wall Street analysts' recommendations tend to be overly optimistic. Would it be worth investing in the stock?;2023-08-24T18:03:58Z| ['CRM']| What To Expect From Salesforce Q2? Analyst Sees Mixed Business Trends| Oppenheimer analyst Brian Schwartz reiterated an Outperform rating on Salesforce, Inc. (NYSE: CRM) with a price target of $235.
The earnings risk for CRM weighs slightly positive ahead of 2Q results despite his recent field checks pointing to mixed business trends for Salesforce and with investor expectations at higher levels for this earnings report. 
Positively, the analyst detected improvement in the demand environment and possibly stabilizing trends in t

In [None]:
response_turbo_16k

<OpenAIObject chat.completion id=chatcmpl-7sdY0DqzpNzeNe8wMmiGJt6GEYnu8 at 0x7e2f655ce020> JSON: {
  "id": "chatcmpl-7sdY0DqzpNzeNe8wMmiGJt6GEYnu8",
  "object": "chat.completion",
  "created": 1693256088,
  "model": "gpt-3.5-turbo-16k-0613",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Salesforce.com (CRM) is receiving bullish views from Wall Street analysts, with the average brokerage recommendation suggesting that investors should invest in the stock. However, there is debate over the effectiveness of this metric as analysts' recommendations tend to be overly optimistic. Despite mixed business trends, Oppenheimer analyst Brian Schwartz maintains an Outperform rating on CRM with a price target of $235. He notes improvement in the demand environment and possibly stabilizing trends in sales operations for the company. Salesforce.com's stock closed at $206.76, down 1.09% from the previous day. The stock has been receiving atte

In [None]:
response_turbo_16k["choices"][0]["message"]["content"]

"Salesforce.com (CRM) is receiving bullish views from Wall Street analysts, with the average brokerage recommendation suggesting that investors should invest in the stock. However, there is debate over the effectiveness of this metric as analysts' recommendations tend to be overly optimistic. Despite mixed business trends, Oppenheimer analyst Brian Schwartz maintains an Outperform rating on CRM with a price target of $235. He notes improvement in the demand environment and possibly stabilizing trends in sales operations for the company. Salesforce.com's stock closed at $206.76, down 1.09% from the previous day. The stock has been receiving attention from investors, and it is important to be aware of factors that can impact its prospects."

In [None]:
response_turbo_16k_0613 = openai.ChatCompletion.create(
# openai.Completion.create(
  model="gpt-3.5-turbo-16k-0613",
  messages = messages,
  temperature = 0
)


In [None]:
# SAME response actually
response_turbo_16k_0613

<OpenAIObject chat.completion id=chatcmpl-7sdY4FHFzBqBPdzVx6oyxhIQnkMFF at 0x7e2f64f67ce0> JSON: {
  "id": "chatcmpl-7sdY4FHFzBqBPdzVx6oyxhIQnkMFF",
  "object": "chat.completion",
  "created": 1693256092,
  "model": "gpt-3.5-turbo-16k-0613",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Salesforce.com (CRM) is receiving bullish views from Wall Street analysts, with the average brokerage recommendation suggesting that investors should invest in the stock. However, there is debate over the effectiveness of this metric as analysts' recommendations tend to be overly optimistic. Despite mixed business trends, Oppenheimer analyst Brian Schwartz maintains an Outperform rating on CRM with a price target of $235. He notes improvement in the demand environment and possibly stabilizing trends in sales operations for the company. In the latest trading session, CRM closed at $206.76, down 1.09% from the previous day. The stock has been re

# 1) Analyse individual stocks performance

In [None]:
# https://stackoverflow.com/questions/11418192/pandas-complex-filter-on-rows-of-dataframe
all_news.loc[all_news['tickers'].apply(lambda x: True if len(x)==1 and x[0]=='TSLA' else False)].head(2)

Unnamed: 0,id,title,author,published_utc,article_url,tickers,amp_url,image_url,description,keywords,publisher.name,publisher.homepage_url,publisher.logo_url,publisher.favicon_url,tickers_adj
530,0wPdLQkg5_Wm4ivBCcU4aUhpIDjyK3oQWOAIKuw7mvw,Good News for Tesla Stock Investors,newsfeedback@fool.com (Neil Rozenbaum),2023-08-27T14:00:00Z,https://www.fool.com/investing/2023/08/27/good...,[TSLA],,https://g.foolcdn.com/editorial/images/745580/...,Here's everything you need to know about the e...,[investing],The Motley Fool,https://www.fool.com/,https://s3.polygon.io/public/assets/news/logos...,https://s3.polygon.io/public/assets/news/favic...,[TSLA]
1193,vpzWbs4kvaOhxeziIQCV5YNPz7GWOihpARqqBt8oBmo,Doors Open For Elon Musk's Tesla? India Mulls ...,Arpit Nayak,2023-08-25T12:01:46Z,https://www.benzinga.com/government/23/08/3403...,[TSLA],https://www.benzinga.com/amp/content/34038927,https://cdn.benzinga.com/files/images/story/20...,This story was first published by the Benzinga...,"[News, Government, Regulations, Rumors]",Benzinga,https://www.benzinga.com/,https://s3.polygon.io/public/assets/news/logos...,https://s3.polygon.io/public/assets/news/favic...,[TSLA]


In [None]:
# check 1 ticker inputs
for t in indiv_tickers_summary_desc.keys():
  if len(';'.join(indiv_tickers_summary_desc[t]))==0:
    continue
  print(f'---- {t} --------')
  print('\n ;'.join(indiv_tickers_summary_desc[t]))
  print('------------')
  break;

---- ('NVDA', 52) --------
2023-08-28T16:45:07Z| ['NVDA']| 3 Reasons Growth Investors Will Love Nvidia (NVDA)| Nvidia (NVDA) is well positioned to outperform the market, as it exhibits above-average growth in financials.
 ;2023-08-28T14:50:06Z| ['NVDA']| Arm IPOs on Nvidia's Success and AI Hype. Here's Why I'm Not Touching It| Nvidia wanted to acquire the company for $40 billion and failed. Arm is reported to be valued at $60 billion. In my opinion, it's worth much less.
 ;2023-08-28T13:30:06Z| ['NVDA']| Wall Street Bulls Look Optimistic About Nvidia (NVDA): Should You Buy?| Based on the average brokerage recommendation (ABR), Nvidia (NVDA) should be added to one's portfolio. Wall Street analysts' overly optimistic recommendations cast doubt on the effectiveness of this highly sought-after metric. So, is the stock worth buying?
 ;2023-08-28T13:00:11Z| ['NVDA']| Investors Heavily Search NVIDIA Corporation (NVDA): Here is What You Need to Know| Nvidia (NVDA) has received quite a bit of a

In [None]:
# https://callmefred.com/how-to-fix-openai-error-ratelimiterror-the-server-had-an-error/
# Call an OpenGPT's API with the prompt, model, and system_message_adj parameters.
#  Prompt is an input text (info about the stock news in our case), Model is the model to be used (the list of available models is obtained earlier in the code),
#  System_message_adj is what you want to ask from the System
def chat(prompt, model = "gpt-3.5-turbo-16k", system_message_adj=""):

   try:

      response = openai.ChatCompletion.create(
      model = model,
      messages = [ {"role": 'system', "content": f'Summarize the news. {system_message_adj}'},
                   {"role": 'user', "content": f'{prompt}'}]
      )

      answer = response["choices"][0]["message"]["content"]
      usage = response["usage"]["total_tokens"]

      return (answer, usage)


   except openai.error.RateLimitError as e:

      retry_time = e.retry_after if hasattr(e, 'retry_after') else 30
      print(f"Rate limit exceeded. Retrying in {retry_time} seconds...")
      time.sleep(retry_time)
      return chat(prompt)

   except openai.error.ServiceUnavailableError as e:
      retry_time = 10  # Adjust the retry time as needed
      print(f"Service is unavailable. Retrying in {retry_time} seconds...")
      time.sleep(retry_time)
      return chat(prompt)

   except openai.error.APIError as e:
      retry_time = e.retry_after if hasattr(e, 'retry_after') else 30
      print(f"API error occurred. Retrying in {retry_time} seconds...")
      time.sleep(retry_time)
      return chat(prompt)

   except OSError as e:
        retry_time = 5  # Adjust the retry time as needed
        print(f"Connection error occurred: {e}. Retrying in {retry_time} seconds...")
        time.sleep(retry_time)
        return chat(prompt)

In [None]:
indiv_tickers_summary_desc.keys()

dict_keys([('NVDA', 52), ('AMZN', 23), ('AAPL', 18), ('MSFT', 4), ('TSLA', 23), ('GOOGL', 0), ('SPY', 14), ('GOOG', 10), ('AMD', 10), ('META', 7), ('PANW', 14), ('WMT', 4), ('DJIA', 11), ('COMP', 3), ('AEO', 2), ('DIS', 13), ('TGT', 7), ('QQQ', 1), ('CRM', 4), ('FL', 8)])

In [None]:
# https://learn.microsoft.com/en-us/azure/cognitive-services/openai/how-to/chatgpt?pivots=programming-language-chat-completions
# About %%time magic: https://colab.research.google.com/github/jakevdp/PythonDataScienceHandbook/blob/master/notebooks/01.07-Timing-and-Profiling.ipynb#scrollTo=K1vMtils6Q8N
%%time

responses = []
responses_keys = []
usages = []
for t in indiv_tickers_summary_desc.keys():
  ticker, count_news = t
  # SPY, DJIA, QQQ - are market info - we publish the stats about it anyway, as it is precious
  if count_news<10 and ticker not in ('SPY','DJIA','QQQ'):
    continue
  count_words = len(';'.join(indiv_tickers_summary_desc[t]).split(" "))
  print(f' Analyzing <stock, news_count> {t} - total count of words {count_words}')
  prompt = ';'.join(indiv_tickers_summary_desc[t])

  response, usage = chat(prompt=prompt, system_message_adj='Not more than 5 sentences.')

  print(f'.   used tokens {usage}')

  responses.append(response)
  responses_keys.append(t)
  usages.append(usage)
  time.sleep(2)

 Analyzing <stock, news_count> ('NVDA', 52) - total count of words 2476
.   used tokens 4583
 Analyzing <stock, news_count> ('AMZN', 23) - total count of words 819
.   used tokens 1673
 Analyzing <stock, news_count> ('AAPL', 18) - total count of words 621
.   used tokens 1270
 Analyzing <stock, news_count> ('TSLA', 23) - total count of words 1318
.   used tokens 2443
 Analyzing <stock, news_count> ('SPY', 14) - total count of words 1543
.   used tokens 2558
 Analyzing <stock, news_count> ('GOOG', 10) - total count of words 364
.   used tokens 772
 Analyzing <stock, news_count> ('AMD', 10) - total count of words 386
.   used tokens 819
 Analyzing <stock, news_count> ('PANW', 14) - total count of words 655
.   used tokens 1291
 Analyzing <stock, news_count> ('DJIA', 11) - total count of words 418
.   used tokens 893
 Analyzing <stock, news_count> ('DIS', 13) - total count of words 578
.   used tokens 1195
 Analyzing <stock, news_count> ('QQQ', 1) - total count of words 39
.   used tokens

In [None]:
# TOTAL COST is $0.003-$0.004 per 1K tokens https://www.geeky-gadgets.com/chatgpt-update/#:~:text=gpt%2D3.5%2Dturbo%2D16k%20will%20be%20priced%20at%20%240.003,%240.004%20per%201K%20output%20tokens.
print(f'Min-Max cost estimate is : ${np.round(sum(usages)/1000*0.003,3)} - ${np.round(sum(usages)/1000*0.004,3)}')

Min-Max cost estimate is : $0.053 - $0.071


In [None]:
rr = all_news["published_utc"].agg({"min":np.min,
               "max":np.max})
print(f' Dates for the articles min/max: {rr}')


 Dates for the articles min/max: min    2023-08-18T10:50:00Z
max    2023-08-28T20:32:00Z
Name: published_utc, dtype: object


In [None]:
min_date,max_date = pd.to_datetime(rr.to_list())
print(min_date,max_date)

2023-08-18 10:50:00+00:00 2023-08-28 20:32:00+00:00


In [None]:
import yfinance as yf

In [None]:
for (ticker,_) in responses_keys:
  print(ticker)

NVDA
AMZN
AAPL
TSLA
SPY
GOOG
AMD
PANW
DJIA
DIS
QQQ


In [None]:
# TODO: Ivan to change on Adj_Close? (read docs)
# TODO: Ivan: add 7d, 30d, 90d, 365d change
df = None
for (t,_) in responses_keys:
  data_ticker = yf.download(t, start=min_date.date())
  data_ticker['ticker'] = t
  data_ticker['change_close']= data_ticker.Close.pct_change() *100
  data_ticker['date'] = data_ticker.index
  if df is None:
    df = data_ticker
  else:
    df = pd.concat([df,data_ticker], ignore_index =True, axis=0)

[*********************100%%**********************]  1 of 1 completed
[*********************100%%**********************]  1 of 1 completed
[*********************100%%**********************]  1 of 1 completed
[*********************100%%**********************]  1 of 1 completed
[*********************100%%**********************]  1 of 1 completed
[*********************100%%**********************]  1 of 1 completed
[*********************100%%**********************]  1 of 1 completed
[*********************100%%**********************]  1 of 1 completed
[*********************100%%**********************]  1 of 1 completed
[*********************100%%**********************]  1 of 1 completed
[*********************100%%**********************]  1 of 1 completed


In [None]:
data_ticker

Unnamed: 0_level_0,Open,High,Low,Close,Adj Close,Volume,ticker,change_close,date
Date,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1
2023-08-18,355.26001,359.410004,354.709991,358.130005,358.130005,61119500,QQQ,,2023-08-18
2023-08-21,359.619995,364.589996,359.149994,363.899994,363.899994,50696500,QQQ,1.611144,2023-08-21
2023-08-22,366.549988,366.559998,362.679993,363.380005,363.380005,44613500,QQQ,-0.142893,2023-08-22
2023-08-23,364.579987,370.220001,364.359985,369.109985,369.109985,51770600,QQQ,1.576856,2023-08-23
2023-08-24,372.640015,372.73999,361.01001,361.220001,361.220001,66842500,QQQ,-2.13757,2023-08-24
2023-08-25,362.070007,365.73999,358.579987,364.019989,364.019989,69910900,QQQ,0.775147,2023-08-25
2023-08-28,366.98999,367.709991,364.25,366.76001,366.76001,40614469,QQQ,0.752712,2023-08-28


In [None]:
# daily returns
vect = data_ticker.fillna(0).change_close+100
vect

Date
2023-08-18    100.000000
2023-08-21    101.611144
2023-08-22     99.857107
2023-08-23    101.576856
2023-08-24     97.862430
2023-08-25    100.775147
2023-08-28    100.752712
Name: change_close, dtype: float64

In [None]:
import numpy as np
print(f'Real return in the period: {(np.prod(vect/100)-1)*100}')

Real return in the period: 2.4097408106412033


In [None]:
print(f'Approx. return in the period: {data_ticker.fillna(0).change_close.sum()}')


Approx. return in the period: 2.4353957725022246


In [None]:
ddf = df[df.date==df.date.max()]
ddf

Unnamed: 0,Open,High,Low,Close,Adj Close,Volume,ticker,change_close,date
6,470.119995,469.799988,448.880005,468.350006,468.350006,66651624,NVDA,1.775395,2023-08-28
13,133.779999,133.949997,131.850006,133.139999,133.139999,33420369,AMZN,-0.090046,2023-08-28
20,180.089996,180.585007,178.544998,180.190002,180.190002,39522558,AAPL,0.88461,2023-08-28
27,242.580002,244.380005,235.360001,238.820007,238.820007,106838699,TSLA,0.096404,2023-08-28
34,442.23999,443.399994,439.972809,442.76001,442.76001,56650963,SPY,0.634136,2023-08-28
41,132.080002,133.240005,130.850006,131.789993,131.789993,16324635,GOOG,0.841679,2023-08-28
48,103.470001,104.07,100.894402,102.610001,102.610001,53658908,AMD,0.352079,2023-08-28
55,230.770004,233.080002,228.949997,232.419998,232.419998,2012410,PANW,0.719364,2023-08-28
62,21.9,21.989901,21.889999,21.969999,21.969999,10248,DJIA,0.365463,2023-08-28
69,83.830002,84.684998,83.529999,84.160004,84.160004,11921946,DIS,0.959697,2023-08-28


In [None]:
fin_data = {x:np.round(y,2) for x, y in zip(ddf.ticker, ddf.change_close)}
fin_data
# for index, row in ddf.iterrows():
  # print(index, row[ticker], row[change_close])

{'NVDA': 1.78,
 'AMZN': -0.09,
 'AAPL': 0.88,
 'TSLA': 0.1,
 'SPY': 0.63,
 'GOOG': 0.84,
 'AMD': 0.35,
 'PANW': 0.72,
 'DJIA': 0.37,
 'DIS': 0.96,
 'QQQ': 0.75}

In [None]:
min_date = datetime.strptime(rr[0], '%Y-%m-%dT%H:%M:%S%z').strftime('%Y-%m-%d')
max_date = datetime.strptime(rr[1], '%Y-%m-%dT%H:%M:%S%z').strftime('%Y-%m-%d')

print(min_date,max_date)

2023-08-18 2023-08-28


In [None]:
# Use TextWrap for pretty print not wider than 80 characters
# https://stackoverflow.com/questions/16430200/a-good-way-to-make-long-strings-wrap-to-newline
import pprint

daily_indiv_stocks_summary = f' Start date for the articles: {min_date};'
daily_indiv_stocks_summary+= f' End date for the articles: {max_date} \n'
for (k,r) in zip(responses_keys, responses):
  ret_1d = None
  if k[0] in fin_data.keys():
    ret_1d=fin_data[k[0]]
  daily_indiv_stocks_summary+=f'NEWS SUMMARY for {k}, which changed on {ret_1d}% last trading day: \n'
  daily_indiv_stocks_summary+=r +'\n'

pprint.pprint(daily_indiv_stocks_summary.splitlines())
# print(daily_indiv_stocks_summary.splitlines())

# for line in textwrap.wrap(daily_indiv_stocks_summary,80):
#   print(line)


# textwrap.wrap(daily_indiv_stocks_summary,80)
#   print(f'NEWS SUMMARY for {k}, which changed on {fin_data[k[0]]}% last trading day:')
#   print(textwrap.fill(r.replace('\n',''),80))
#   print(' ')

# print()

[' Start date for the articles: 2023-08-18; End date for the articles: '
 '2023-08-28 ',
 "NEWS SUMMARY for ('NVDA', 52), which changed on 1.78% last trading day: ",
 'Nvidia (NVDA) has been receiving positive attention from investors and '
 "analysts. The company's strong financial performance and above-average "
 'growth make it an appealing investment option. Despite failing to acquire '
 "Arm for $40 billion, Nvidia's success and hype around AI have been cited as "
 'reasons to be optimistic about the stock. Wall Street analysts have given '
 'overly optimistic recommendations for NVDA, raising questions about the '
 'effectiveness of these recommendations. Overall, Nvidia is well-positioned '
 'for future growth and is viewed favorably by both investors and analysts.',
 "NEWS SUMMARY for ('AMZN', 23), which changed on -0.09% last trading day: ",
 'Analysts are optimistic about the future performance of Amazon (AMZN) stock, '
 'citing its above-average growth and positive earnings 

In [None]:
print(f'{daily_indiv_stocks_summary}')


 Start date for the articles: 2023-08-18; End date for the articles: 2023-08-28 
NEWS SUMMARY for ('NVDA', 52), which changed on 1.78% last trading day: 
Nvidia (NVDA) has been receiving positive attention from investors and analysts. The company's strong financial performance and above-average growth make it an appealing investment option. Despite failing to acquire Arm for $40 billion, Nvidia's success and hype around AI have been cited as reasons to be optimistic about the stock. Wall Street analysts have given overly optimistic recommendations for NVDA, raising questions about the effectiveness of these recommendations. Overall, Nvidia is well-positioned for future growth and is viewed favorably by both investors and analysts.
NEWS SUMMARY for ('AMZN', 23), which changed on -0.09% last trading day: 
Analysts are optimistic about the future performance of Amazon (AMZN) stock, citing its above-average growth and positive earnings estimate revisions. The stock has seen a strong reboun

In [None]:
r_summary, usage = chat(prompt = daily_indiv_stocks_summary,
                        system_message_adj = 'Get the summary of an overall market, stock indexes, and individual stocks - not more than 10 sentences')
print(f'Usage: {usage}')

Usage: 1662


In [None]:
pprint.pprint('TL;DR: ' + r_summary)

('TL;DR: Nvidia (NVDA) has been receiving positive attention from investors '
 'and analysts due to its strong financial performance and above-average '
 'growth. Wall Street analysts have given overly optimistic recommendations '
 'for NVDA, raising questions about the effectiveness of these '
 'recommendations. Amazon (AMZN) stock is also optimistic, with above-average '
 'growth and positive earnings estimate revisions. Apple (AAPL) is looking to '
 'increase its domestic component production in India, showing a commitment to '
 'manufacturing iPhones in the country. Tesla (TSLA) could benefit from '
 'reduced import taxes for electric vehicles in India. The Euro Stoxx 50 index '
 'introduces zero-day options and there are discussions about inflation '
 'concerns and the possibility of raising interest rates. The Delhi High Court '
 "in India has dismissed petitions to halt Google Pay's operations. AMD "
 'announces the release of two new graphics cards and showcases its enterprise 

In [None]:
print('TL;DR: ' + r_summary)

TL;DR: Nvidia (NVDA) has been receiving positive attention from investors and analysts due to its strong financial performance and above-average growth. Wall Street analysts have given overly optimistic recommendations for NVDA, raising questions about the effectiveness of these recommendations. Amazon (AMZN) stock is also optimistic, with above-average growth and positive earnings estimate revisions. Apple (AAPL) is looking to increase its domestic component production in India, showing a commitment to manufacturing iPhones in the country. Tesla (TSLA) could benefit from reduced import taxes for electric vehicles in India. The Euro Stoxx 50 index introduces zero-day options and there are discussions about inflation concerns and the possibility of raising interest rates. The Delhi High Court in India has dismissed petitions to halt Google Pay's operations. AMD announces the release of two new graphics cards and showcases its enterprise data center momentum. Palo Alto Networks (PANW) re

# 2) Market summary

In [None]:
# https://learn.microsoft.com/en-us/azure/cognitive-services/openai/how-to/chatgpt?pivots=programming-language-chat-completions

def do_market_news_analysis(chunks, prompts, model = "gpt-3.5-turbo-16k"):
  responses_market = []
  used_tokens = []
  print(f'Chunks of news :{chunks}')
  for i,prompt in enumerate(prompts):
    count_words = len(prompt.split(" "))
    if len(chunks)==1:
      print(f'Analyzing MARKET news in 1 chunk, total count of words:{count_words}')
    else:
      print(f' Analyzing MARKET news for chunk {i} (news range: {chunks[i],chunks[i+1]}): total count of words for a current chunk {count_words}')

    if i==0:
      system_adj_message = 'Find market expectations of future events, sentiment, and big trends. Not more than 20 sentences' #specify more details
    else:
      system_adj_message = f'Find big trends, no more than 20 bullet points or sentences, and avoid duplicated messages while prioritising the most popular and discussed news. Improve previous findings: {response} ' #specify more details

    # passing params to the downstream call
    response = chat(prompt = prompt,
                    system_message_adj = system_adj_message,
                    model=model)
    responses_market.append(response)

    time.sleep(1)
  return responses_market, used_tokens

## 1day market summary

In [None]:
# Feel the inputs to the summarization engine - manually check top 5 news: <date, tickers, title, description> for each
market_summary_desc_last_day[market_summary_key_last_day][0:5]

["2023-08-28T20:27:00Z| ['DJIA', 'DHR', 'MMM', 'RAD', 'HE', 'COMP', 'ABCM']| S&P 500 and Nasdaq end higher, attempting to beat back worst month since December| Stocks see gains Monday as investors continue to weigh cautious comments from Federal Reserve Chairman Jerome Powell and get ready for another big data week.",
 "2023-08-28T20:07:00Z| ['DJIA', 'COMP']| The stock market is set up for a relief rally. Don't chase the bounce, says technician.| The potential reward from chasing a stock-market bounce isn't attractive, according to Tyler Richey, co-editor at Sevens Report Research.",
 "2023-08-28T20:01:00Z| ['FIVE', 'DG', 'DLTR']| Five Below And Dollar General Q2 Earnings Preview: Aiming To Emulate Dollar Tree's Beat| Two leading discount retailers are set to report quarterly earnings this week, aiming to emulate a peer company that recently beat\xa0analysts’ estimates.\nHere’s a look at what investors should know ahead of second quarter financial results from Five Below Inc (NASDAQ: F

In [None]:
print(f'Chunks and len prompts: {chunks_day}, {len(market_summary_prompts_last_day[0])}')

Chunks and len prompts: [0, 96, 222], 40390


In [None]:
# how many news?
market_summary_key_last_day

('multiple_tickers', 231)

In [None]:
%%time
summary_market_one_day, used_tokens_market_one_day= do_market_news_analysis(chunks_day, market_summary_prompts_last_day)
print(used_tokens_market_one_day)

Chunks of news :[0, 96, 222]
 Analyzing MARKET news for chunk 0 (news range: (0, 96)): total count of words for a current chunk 5828
 Analyzing MARKET news for chunk 1 (news range: (96, 222)): total count of words for a current chunk 5830
[]
CPU times: user 124 ms, sys: 19.8 ms, total: 144 ms
Wall time: 20.6 s


In [None]:
%%time
summary_market_one_day_gpt4, used_tokens_market_one_day_gpt4= do_market_news_analysis(chunks_day_for_gpt4, market_summary_prompts_last_day_for_gpt4, model='gpt-4')
print(used_tokens_market_one_day_gpt4)

Chunks of news :[0, 38, 96, 155, 222]
 Analyzing MARKET news for chunk 0 (news range: (0, 38)): total count of words for a current chunk 2943
 Analyzing MARKET news for chunk 1 (news range: (38, 96)): total count of words for a current chunk 2886
 Analyzing MARKET news for chunk 2 (news range: (96, 155)): total count of words for a current chunk 2906
 Analyzing MARKET news for chunk 3 (news range: (155, 222)): total count of words for a current chunk 2925
[]
CPU times: user 1.04 s, sys: 124 ms, total: 1.17 s
Wall time: 3min 5s


In [None]:
pprint.pprint(summary_market_one_day)

[('The S&P 500 and Nasdaq ended higher on Monday, attempting to recover from '
  'their worst month since December. Investors were cautious following '
  'comments from Federal Reserve Chairman Jerome Powell and were preparing for '
  'another big week of economic data. Technician Tyler Richey warned against '
  "chasing a stock-market bounce, stating that the potential reward isn't "
  'attractive. Discount retailers Five Below and Dollar General are set to '
  "report quarterly earnings with expectations for Five Below's revenue to be "
  "$758.3 million and Dollar General's revenue to be $9.9 billion. American "
  'Airlines was fined $4.1 million for leaving thousands of passengers '
  'stranded on the tarmac. Uber Eats is preparing to roll out an AI-powered '
  'chatbot to help users decide what to eat for their next meal. Analysts have '
  'recommended three penny stocks to buy, with price targets of up to 1,494%. '
  'Hawaiian Electric and GD Culture Group saw their shares surge,

In [None]:
# last chunk of One day summary (MODEL = gpt-3.5-turbo-16k)
print(f'MARKET NEWS SUMMARY {market_summary_key_last_day} for the last 24 hours before {datetime.now(timezone.utc).strftime("%d/%m/%Y %H:%M")} UTC time:')
print(summary_market_one_day[len(summary_market_one_day)-1][0])

MARKET NEWS SUMMARY ('multiple_tickers', 231) for the last 24 hours from 28/08/2023 20:59 UTC time:
- The stock market is consolidating slightly above support levels, with investors cautious after Jerome Powell's speech at Jackson Hole and focusing on the narrative of no recession.
- Economic data has been strong, postponing the possibility of a recession for now.
- Lower income households are maintaining spending by borrowing while higher income households continue to spend as stock prices and house prices remain high.
- Excessive government borrowing and spending is providing more stimulus to the economy than expected.
- Recession probability is estimated at 45%, soft landing at 35%, and no landing at 20%.
- Nikola stock and Plug Power stock are growing rapidly in terms of revenue but are also seeing increasing losses.
- Federal Reserve Chair Jerome Powell warned of potential interest rate hikes until inflation is within the target range.
- U.S. stocks opened higher on Monday, with t

In [None]:
# last chunk of One day summary (MODEL = gpt-4)
print(f'MARKET NEWS SUMMARY {market_summary_key_last_day} for the last 24 hours before {datetime.now(timezone.utc).strftime("%d/%m/%Y %H:%M")} UTC time:')
print(summary_market_one_day_gpt4[len(summary_market_one_day_gpt4)-1][0])

MARKET NEWS SUMMARY ('multiple_tickers', 231) for the last 24 hours from 28/08/2023 20:59 UTC time:
- Consumer-centric stocks LYV, RCL, DKNG, MAR, PEP continue to show strong potential for 2023.
- Oncology treatment market is experiencing significant growth, with companies REGN, MRK, NVS and AZN taking the lead.
- CHK and LNG are fundamentally sound investments amid uncertainty within the natural gas market.
- Three biotech stocks AGEN, BCRX and NNOX highlighted as attractive buys.
- High demand trends have allegedly boosted Ciena's (CIEN) fiscal Q3 performance.
- Lennar and Quanta Services are part of the Zacks Earnings Preview article.
- Novo Nordisk stock's latest financial report suggests it's only getting started.
- Wall Street delivered mixed performances last week due to rising rates.
- Caterpillar (CAT), Applied Materials (AMAT), Toll Brothers (TOL), Walmart (WMT) and Dr. Reddy's (RDY) are solid choices for investors amid market weakness.
- XPeng (XPEV) plans to accelerate adop

PROMPT TO CHAT GPT : "compare 2 summaries by GPT-3.5-turbo-16k vs. gpt-4"
RESULT:
*
Both summaries cover a range of market events, including stock upgrades and downgrades, corporate actions, macroeconomic developments, and notable earnings reports. GPT-4 seems to provide slightly more specific details about each event and offers a bit more in-depth insights, while GPT-3.5-Turbo-16k tends to be a bit more concise and general in its descriptions.


## 1 week summary - chain of calls

In [None]:
print(f'Chunks and len prompts: {chunks_week}, {len(market_summary_prompts_week[0])}')

Chunks and len prompts: [0, 96, 222, 367, 483, 608, 681, 809, 931, 1034, 1145, 1292, 1402, 1504, 1625, 1730, 1851, 1988, 2107], 40390


In [None]:
# how many news?
print( market_summary_key_week)

('multiple_tickers', 2208)


In [None]:
len(market_summary_prompts_week)

18

In [None]:
# Monday is 0 https://docs.python.org/3/library/datetime.html#datetime.datetime.weekday
# Run only on Monday to save $$ and time
%%time
if datetime.today().weekday()==0:
  summary_market_one_week, used_tokens_market_one_week= do_market_news_analysis(chunks_week, market_summary_prompts_week)
  print(used_tokens_market_one_week)

Chunks of news :[0, 96, 222, 367, 483, 608, 681, 809, 931, 1034, 1145, 1292, 1402, 1504, 1625, 1730, 1851, 1988, 2107]
 Analyzing MARKET news for chunk 0 (news range: (0, 96)): total count of words for a current chunk 5828
 Analyzing MARKET news for chunk 1 (news range: (96, 222)): total count of words for a current chunk 5830
 Analyzing MARKET news for chunk 2 (news range: (222, 367)): total count of words for a current chunk 5819
 Analyzing MARKET news for chunk 3 (news range: (367, 483)): total count of words for a current chunk 5851
 Analyzing MARKET news for chunk 4 (news range: (483, 608)): total count of words for a current chunk 5699
 Analyzing MARKET news for chunk 5 (news range: (608, 681)): total count of words for a current chunk 5827
 Analyzing MARKET news for chunk 6 (news range: (681, 809)): total count of words for a current chunk 5872
 Analyzing MARKET news for chunk 7 (news range: (809, 931)): total count of words for a current chunk 5855
 Analyzing MARKET news for ch

In [None]:
# run only on Monday
if datetime.today().weekday()==0:
  for i,summary in enumerate(summary_market_one_week):
    print(f'Iteration {i}')
    print('--------------------')
    pprint.pprint(summary[0].replace('\n',' '))

Iteration 0
--------------------
('The S&P 500 and Nasdaq ended higher on Monday, attempting to beat back their '
 'worst month since December. Investors continue to weigh cautious comments '
 'from Federal Reserve Chairman Jerome Powell and prepare for another big data '
 'week. Technician Tyler Richey warns against chasing the stock market bounce, '
 "stating that the potential reward isn't attractive. This week, discount "
 'retailers Five Below and Dollar General are set to report their '
 'second-quarter earnings. Five Below is expected to report revenue of $758.3 '
 'million, while Dollar General is expected to report revenue of $9.9 billion. '
 'American Airlines Group Inc. was fined $4.1 million for keeping thousands of '
 'passengers sitting in planes on the tarmac for hours without a chance to '
 'deplane. Uber Eats is preparing to roll out an AI-powered chatbot that will '
 'help users decide what to eat for their next meal. Analysts have identified '
 'three penny stocks to

In [None]:
# CHECK ONLY THE LATEST ITERATION - after the last call
if datetime.today().weekday()==0:
  print(f'MARKET NEWS SUMMARY {market_summary_key_week} \nfor the period {min_date} to {max_date}: \n {summary_market_one_week[len(summary_market_one_week)-1][0]}')


MARKET NEWS SUMMARY ('multiple_tickers', 2208) 
for the period 2023-08-18 to 2023-08-28: 
 - Amphastar Pharma (AMPH) shares are up 89.1% year to date due to the acquisition of Baqsimi from Lilly and FDA approval for Naloxone.
- Capital One confirms another big sale of its office loans as fallout in the sector intensifies in the face of higher interest rates and tumbling property values.
- SpartanNash (SPTN) posts higher sales for the second quarter of 2023 on increased sales across both segments and solid comparable sales.
- Dividend payout and strong liquidity boost Canadian National (CNI).
- The Children's Place (PLCE) second-quarter fiscal 2023 results reflect a year-over-year decline in both top and bottom lines due to a tough macroeconomic environment.
- Truist Securities downgraded Crestwood Equity Partners LP (CEQP) to Hold from Buy, but analysts favor CEQP's acquisition by Energy Transfer LP (ET).
- High labor costs and weak demand-induced volume woes are hurting UPS stock sign

In [None]:
# how many tokens used (only for the last-out-of-20 call)?
if datetime.today().weekday()==0:
  print(f'Used tokens for the last iteration: {summary_market_one_week[len(summary_market_one_week)-1][1]}')


Used tokens for the last iteration: 13920


In [None]:
# Another use-case: I want to generate text description of a market summary to be used for the logo-image-generation
# generate summary to use for text-to-image generation of an Article Logo

if datetime.today().weekday()==0:
  r_summary_for_pic, usage = chat(prompt=summary_market_one_week[len(summary_market_one_week)-1][1], system_message_adj='I have a market summary of financial news in 1 week. Generate a word description of it to feed the image generator for the article head image')
  print(f'Usage: {usage}')
  print(r_summary_for_pic)

Usage: 158
Due to high volatility and economic uncertainties, the financial market experienced a rollercoaster ride in the past week. Global stock markets faced a significant decline due to concerns over inflation, rising interest rates, and potential trade tensions. Investors shifted their focus towards safe-haven assets, resulting in a surge in gold and government bond prices. Cryptocurrencies also faced a downward trend as regulatory concerns continued to weigh on the market. The week ended with a mixed sentiment as optimism regarding COVID-19 vaccinations and fiscal stimulus clashed with the fears of a potential market correction.
