### PIB CoPilot
PIBs are also used to create a pitchbook by assessing a company's strategy, competitive positioning, financial statements, industry dynamics, and industry trends. They typically draw on the following sources:

- News releases: Analysts look at news articles that may affect a company's stock price or growth prospects, particularly within a 6-12 month time horizon.
- SEC filings: Regulations require a company to file Form 10-K and Form 10-Q with the SEC on an ongoing basis. Form 10-K is a financial overview and commentary for the last year and can usually be found on the company's website. Form 10-Q is similar to Form 10-K, but it covers the last quarter instead of the previous year.
- Equity research reports: Analysts look at key forecasts for metrics like revenue, EBITDA, and EPS for the company or competing firms to form a consensus estimate.
- Investor presentations: Companies provide historical information, which serves as an important foundation for forecasts and helps guide key forecasting drivers.
- Press releases: These can be found in the investor relations section of most companies' websites and contain the financial statements that are used in Forms 10-K and 10-Q.
- Conference calls: The same day a company issues its quarterly press release, it will also hold a conference call. On the call, analysts often learn details about management guidance. These conference calls are transcribed by several service providers and can be accessed by subscribers of large financial data providers.
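
As a small illustration of the "consensus estimate" mentioned under equity research reports, the sketch below averages a handful of analyst forecasts. The broker names and EPS values are hypothetical placeholders, and real consensus figures may be weighted or filtered differently.

```python
from statistics import mean, median

# Hypothetical analyst EPS forecasts for one company (illustrative values only).
eps_forecasts = {"Broker A": 2.10, "Broker B": 2.25, "Broker C": 2.40}

def consensus(forecasts):
    """Return the mean and median of a set of analyst forecasts."""
    values = list(forecasts.values())
    return {"mean": round(mean(values), 2), "median": round(median(values), 2)}

print(consensus(eps_forecasts))
```

Reporting the median next to the mean is one simple way to see whether a single outlying forecast is pulling the average.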

In [1]:
import os  
import json  
import openai
from Utilities.envVars import *

# Set the Search index name from environment variables
indexName = SearchIndex

# Set OpenAI API key and endpoint
openai.api_type = "azure"
openai.api_version = OpenAiVersion
openai_api_key = OpenAiKey
assert openai_api_key, "ERROR: Azure OpenAI Key is missing"
openai.api_key = openai_api_key
# Check the service name itself: the formatted endpoint string is never empty
assert OpenAiService, "ERROR: Azure OpenAI Endpoint is missing"
openAiEndPoint = f"https://{OpenAiService}.openai.azure.com"
assert "openai.azure.com" in openAiEndPoint.lower(), "ERROR: Azure OpenAI Endpoint should be in the form: \n\n\t<your unique endpoint identifier>.openai.azure.com"
openai.api_base = openAiEndPoint
davincimodel = OpenAiDavinci


In [2]:
import typing
from Utilities.fmp import *
apikey = FmpKey
symbol: str = "AAPL"
symbols: typing.List[str] = ["AAPL", "CSCO", "QQQQ"]
exchange: str = "NYSE"
exchanges: typing.List[str] = ["NYSE", "NASDAQ"]
query: str = "AA"
limit: int = 3
period: str = "quarter"
download: bool = True
filing_type: str = "10-K"

In [3]:
from datetime import datetime
from pytz import timezone
from dateutil.relativedelta import relativedelta
from datetime import timedelta
from Utilities.cogSearch import createEarningCallIndex, indexDocs, createPressReleaseIndex, createStockNewsIndex

central = timezone('US/Central')
today = datetime.now(central)
currentYear = today.year
historicalDate = today - relativedelta(years=3)
historicalYear = historicalDate.year
historicalDate = historicalDate.strftime("%Y-%m-%d")
totalYears = currentYear - historicalYear
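
To make the date window concrete, here is a minimal sketch of which years and quarters the cell above leads to. It uses only the standard library, and the run date of 2023-06-15 is an assumption chosen for illustration; `years_to_process` and `year_quarter_pairs` are hypothetical helper names.

```python
from datetime import date

def years_to_process(today, years_back=3):
    """Return the calendar years covered, oldest first.

    Mirrors range(totalYears) starting at historicalYear, so the
    current (incomplete) year is excluded.
    """
    historical_year = today.year - years_back
    return [historical_year + i for i in range(today.year - historical_year)]

def year_quarter_pairs(today, years_back=3):
    """Expand each year into its four quarters."""
    return [(year, f"Q{q}")
            for year in years_to_process(today, years_back)
            for q in range(1, 5)]

# Assumed run date, for illustration only.
pairs = year_quarter_pairs(date(2023, 6, 15))
print(pairs[0], pairs[-1], len(pairs))
```

With a three-year window this yields twelve year/quarter pairs and stops at the last complete calendar year.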

#### Get the Earnings Call Transcript for each quarter for last 3 years

In [16]:
# For now we call the API on every run. Before calling it again, check whether the data is already
# persisted in the index repository; if it is, delete it first
earningsData = []
earningIndexName = 'earningcalls'
symbol = 'AMZN'
# Create the index if it does not exist
createEarningCallIndex(SearchService, SearchKey, earningIndexName)
for i in range(totalYears):
    print(f"Processing ticker : {symbol}")
    processYear = historicalYear + i
    Quarters = ['Q1', 'Q2', 'Q3', 'Q4']
    for quarter in Quarters:
        print(f"Processing year and Quarter : {processYear}-{quarter}")
        earningTranscript = earning_call_transcript(apikey=apikey, symbol=symbol, year=str(processYear), quarter=quarter)
        for transcript in earningTranscript:
            # Use separate names so the outer symbol and the loop's quarter are not overwritten
            docSymbol = transcript['symbol']
            docQuarter = transcript['quarter']
            docYear = transcript['year']
            callDate = transcript['date']
            content = transcript['content']
            todayYmd = today.strftime("%Y-%m-%d")
            docId = f"{docSymbol}-{docYear}-{docQuarter}-{todayYmd}"
            earningsData.append({
                "id": docId,
                "symbol": docSymbol,
                "quarter": str(docQuarter),
                "year": str(docYear),
                "calldate": callDate,
                "content": content,
                "inserteddate": datetime.now(central).strftime("%Y-%m-%d"),
            })
# Index the documents in the earning calls index
indexDocs(SearchService, SearchKey, earningIndexName, earningsData)

Search index earningcalls already exists
Processing ticker : AMZN
Processing year and Quarter : 2020-Q1
Processing year and Quarter : 2020-Q2
Processing year and Quarter : 2020-Q3
Processing year and Quarter : 2020-Q4
Processing ticker : AMZN
Processing year and Quarter : 2021-Q1
Processing year and Quarter : 2021-Q2
Processing year and Quarter : 2021-Q3
Processing year and Quarter : 2021-Q4
Processing ticker : AMZN
Processing year and Quarter : 2022-Q1
Processing year and Quarter : 2022-Q2
Processing year and Quarter : 2022-Q3
Processing year and Quarter : 2022-Q4
Total docs: 12
	Indexed 12 sections, 12 succeeded
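
The comment in the cell above notes that previously persisted data should be removed before re-indexing. One alternative, sketched here under assumptions, is to build the document key only from the fields that identify a transcript (without the insertion date), so that a rerun produces the same key. If `indexDocs` uses the upload action, which replaces a document whose key already exists, a rerun would then overwrite rather than duplicate. `earnings_doc_id` and `dedupe_by_id` are hypothetical helpers, not part of the notebook's utilities.

```python
def earnings_doc_id(symbol, year, quarter):
    """Build a stable key from the fields that identify one transcript."""
    return f"{symbol}-{year}-{quarter}"

def dedupe_by_id(docs):
    """Keep the last document seen for each id, preserving first-seen order."""
    return list({doc["id"]: doc for doc in docs}.values())

# Illustrative documents; the content strings are placeholders.
docs = [
    {"id": earnings_doc_id("AMZN", 2022, 4), "content": "first fetch"},
    {"id": earnings_doc_id("AMZN", 2022, 4), "content": "second fetch"},
    {"id": earnings_doc_id("AMZN", 2022, 3), "content": "only fetch"},
]
unique = dedupe_by_id(docs)
print([d["id"] for d in unique])
```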


#### Get the Press Releases - limit to 200, which most likely covers the last 3 years

In [11]:
# For now we call the API on every run. Before calling it again, check whether the data is already
# persisted in the index repository; if it is, delete it first
counter = 0
pressReleases = []
pressReleaseIndexName = 'pressreleases'
#symbol = 'AMZN'
#symbol = 'TSLA'
#symbol = 'AAPL'
symbol = 'MSFT'
# Create the index if it does not exist
createPressReleaseIndex(SearchService, SearchKey, pressReleaseIndexName)
print(f"Processing ticker : {symbol}")
pr = press_releases(apikey=apikey, symbol=symbol, limit=200)
for pressRelease in pr:
    symbol = pressRelease['symbol']
    releasedate = pressRelease['date']
    title = pressRelease['title']
    content = pressRelease['text']
    todayYmd = today.strftime("%Y-%m-%d")
    id = f"{symbol}-{todayYmd}-{counter}"
    pressReleases.append({
        "id": id,
        "symbol": symbol,
        "releasedate": releasedate,
        "title": title,
        "content": content,
        "inserteddate": datetime.now(central).strftime("%Y-%m-%d"),
    })
    counter = counter + 1

# Index the documents in the press releases index
indexDocs(SearchService, SearchKey, pressReleaseIndexName, pressReleases)

Search index pressreleases already exists
Processing ticker : MSFT
Total docs: 200
	Indexed 200 sections, 200 succeeded


#### Get Stock News - limit to 5000, which most likely covers the current year

In [7]:
# For now we call the API on every run. Before calling it again, check whether the data is already
# persisted in the index repository; if it is, delete it first
counter = 0
stocknews = []
stockNewsIndexName = 'stocknews'
#symbol = 'AMZN'
#symbol = 'TSLA'
#symbol = 'AAPL'
symbol = 'MSFT'
# Create the index if it does not exist
createStockNewsIndex(SearchService, SearchKey, stockNewsIndexName)
print(f"Processing ticker : {symbol}")
sn = stock_news(apikey=apikey, tickers=symbol, limit=5000)
for news in sn:
    symbol = news['symbol']
    publisheddate = news['publishedDate']
    title = news['title']
    image = news['image']
    site = news['site']
    content = news['text']
    url = news['url']
    todayYmd = today.strftime("%Y-%m-%d")
    id = f"{symbol}-{todayYmd}-{counter}"
    stocknews.append({
        "id": id,
        "symbol": symbol,
        "publisheddate": publisheddate,
        "title": title,
        "image": image,
        "site": site,
        "content": content,
        "url": url,
        "inserteddate": datetime.now(central).strftime("%Y-%m-%d"),
    })
    counter = counter + 1

# Index the documents in the stock news index
indexDocs(SearchService, SearchKey, stockNewsIndexName, stocknews)

Search index stocknews already exists
Processing ticker : MSFT
Total docs: 5000
	Indexed 1000 sections, 1000 succeeded
	Indexed 1000 sections, 1000 succeeded
	Indexed 1000 sections, 1000 succeeded
	Indexed 1000 sections, 1000 succeeded
	Indexed 1000 sections, 1000 succeeded
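
The output above shows the 5000 documents being indexed in groups of 1000. The `indexDocs` helper is not shown in this notebook, so the following is only a sketch of how such batching could be done with `itertools.islice`; `batched` is a hypothetical helper and the upload call is left as a comment.

```python
from itertools import islice

def batched(items, size=1000):
    """Yield successive lists of at most `size` items."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch

# Placeholder documents standing in for the stock news records.
docs = [{"id": str(n)} for n in range(5000)]
for batch in batched(docs):
    # A real run would upload here, e.g. searchClient.upload_documents(batch)
    print(f"\tIndexed {len(batch)} sections")
```

Keeping each request at 1000 documents or fewer appears to match the per-request limit of the search service's indexing API.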


#### Process the SEC Filings that are stored in JSON Format

In [24]:
from Utilities.azureBlob import upsertMetadata, getBlob, getAllBlobs

def GetAllFiles():
    # Get all files in the container from Azure Blob Storage
    blobList = getAllBlobs(OpenAiDocConnStr, SecDocContainer)
    files = []
    for file in blobList:
        # Blobs without metadata, or without the "embedded" flag, count as not embedded
        metadata = file.metadata or {}
        files.append({
            "filename": file.name,
            "embedded": metadata.get("embedded", "false"),
        })
    print(f"Found {len(files)} files in the container")
    return files

In [25]:
filesData = GetAllFiles()
# Keep only the files that have not been embedded yet
filesData = [{'filename': x['filename']} for x in filesData if x['embedded'] == "false"]
print(f"Found {len(filesData)} files to embed")

Found 101477 files in the container
Found 83575 files to embed
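
The selection logic above can be tried without access to Blob Storage. The sketch below uses `SimpleNamespace` objects as hypothetical stand-ins for the blob items (only `name` and `metadata` are assumed), and `pending_files` is an illustrative helper rather than part of the notebook's utilities.

```python
from types import SimpleNamespace

def pending_files(blobs):
    """Return filenames of blobs whose "embedded" flag is missing or "false"."""
    pending = []
    for blob in blobs:
        metadata = blob.metadata or {}
        if metadata.get("embedded", "false") == "false":
            pending.append({"filename": blob.name})
    return pending

# Hypothetical stand-ins for the blob objects returned by the storage client.
blobs = [
    SimpleNamespace(name="a.json", metadata=None),
    SimpleNamespace(name="b.json", metadata={"embedded": "true"}),
    SimpleNamespace(name="c.json", metadata={"indexName": "secfilings"}),
]
print(pending_files(blobs))
```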


In [16]:
from azure.search.documents.indexes import SearchIndexClient
from azure.search.documents.indexes.models import *
from azure.search.documents import SearchClient
from azure.core.credentials import AzureKeyCredential
from itertools import islice

In [17]:
def createSearchIndex(indexName):
    indexClient = SearchIndexClient(endpoint=f"https://{SearchService}.search.windows.net/",
            credential=AzureKeyCredential(SearchKey))
    if indexName not in indexClient.list_index_names():
        index = SearchIndex(
            name=indexName,
            fields=[
                        SimpleField(name="id", type=SearchFieldDataType.String, key=True),
                        SimpleField(name="cik", type=SearchFieldDataType.String),
                        SimpleField(name="company", type=SearchFieldDataType.String),
                        SimpleField(name="filing_type", type=SearchFieldDataType.String),
                        SimpleField(name="filing_date", type=SearchFieldDataType.String),
                        SimpleField(name="period_of_report", type=SearchFieldDataType.String),
                        SimpleField(name="sic", type=SearchFieldDataType.String),
                        SimpleField(name="state_of_inc", type=SearchFieldDataType.String),
                        SimpleField(name="state_location", type=SearchFieldDataType.String),
                        SimpleField(name="fiscal_year_end", type=SearchFieldDataType.String),
                        SimpleField(name="filing_html_index", type=SearchFieldDataType.String),
                        SimpleField(name="htm_filing_link", type=SearchFieldDataType.String),
                        SimpleField(name="complete_text_filing_link", type=SearchFieldDataType.String),
                        SimpleField(name="filename", type=SearchFieldDataType.String),
                        SimpleField(name="item_1", type=SearchFieldDataType.String),
                        SimpleField(name="item_1A", type=SearchFieldDataType.String),
                        SimpleField(name="item_1B", type=SearchFieldDataType.String),
                        SimpleField(name="item_2", type=SearchFieldDataType.String),
                        SimpleField(name="item_3", type=SearchFieldDataType.String),
                        SimpleField(name="item_4", type=SearchFieldDataType.String),
                        SimpleField(name="item_5", type=SearchFieldDataType.String),
                        SimpleField(name="item_6", type=SearchFieldDataType.String),
                        SimpleField(name="item_7", type=SearchFieldDataType.String),
                        SimpleField(name="item_7A", type=SearchFieldDataType.String),
                        SimpleField(name="item_8", type=SearchFieldDataType.String),
                        SimpleField(name="item_9", type=SearchFieldDataType.String),
                        SimpleField(name="item_9A", type=SearchFieldDataType.String),
                        SimpleField(name="item_9B", type=SearchFieldDataType.String),
                        SimpleField(name="item_10", type=SearchFieldDataType.String),
                        SimpleField(name="item_11", type=SearchFieldDataType.String),
                        SimpleField(name="item_12", type=SearchFieldDataType.String),
                        SimpleField(name="item_13", type=SearchFieldDataType.String),
                        SimpleField(name="item_14", type=SearchFieldDataType.String),
                        SimpleField(name="item_15", type=SearchFieldDataType.String),
                        SimpleField(name="metadata", type=SearchFieldDataType.String),
                        SearchableField(name="content", type=SearchFieldDataType.String,
                                        searchable=True, retrievable=True, analyzer_name="en.microsoft"),
                        # SearchField(name="contentVector", type=SearchFieldDataType.Collection(SearchFieldDataType.Single),
                        #             searchable=True, dimensions=1536, vector_search_configuration="vectorConfig"),
                        SimpleField(name="sourcefile", type="Edm.String", filterable=True, facetable=True),
            ],
            vector_search = VectorSearch(
                algorithm_configurations=[
                    VectorSearchAlgorithmConfiguration(
                        name="vectorConfig",
                        kind="hnsw",
                        hnsw_parameters={
                            "m": 4,
                            "efConstruction": 400,
                            "efSearch": 500,
                            "metric": "cosine"
                        }
                    )
                ]
            ),
            semantic_settings=SemanticSettings(
                configurations=[SemanticConfiguration(
                    name='semanticConfig',
                    prioritized_fields=PrioritizedFields(
                        title_field=None, prioritized_content_fields=[SemanticField(field_name='content')]))])
        )

        try:
            print(f"Creating {indexName} search index")
            indexClient.create_index(index)
        except Exception as e:
            print(e)

In [18]:
def chunkAndEmbed(indexName, secDoc, fullPath):
    fullData = []
    text = secDoc['item_1'] + secDoc['item_1A'] + secDoc['item_7'] + secDoc['item_7A']
    text = text.replace("\n", " ")

    secCommonData = {
            "id": f"{fullPath}".replace(".", "_").replace(" ", "_").replace(":", "_").replace("/", "_").replace(",", "_").replace("&", "_"),
            "cik": secDoc['cik'],
            "company": secDoc['company'],
            "filing_type": secDoc['filing_type'],
            "filing_date": secDoc['filing_date'],
            "period_of_report": secDoc['period_of_report'],
            "sic": secDoc['sic'],
            "state_of_inc": secDoc['state_of_inc'],
            "state_location": secDoc['state_location'],
            "fiscal_year_end": secDoc['fiscal_year_end'],
            "filing_html_index": secDoc['filing_html_index'],
            "htm_filing_link": secDoc['htm_filing_link'],
            "complete_text_filing_link": secDoc['complete_text_filing_link'],
            "filename": secDoc['filename'],
            "item_1": secDoc['item_1'],
            "item_1A": secDoc['item_1A'],
            "item_1B": secDoc['item_1B'],
            "item_2": secDoc['item_2'],
            "item_3": secDoc['item_3'],
            "item_4": secDoc['item_4'],
            "item_5": secDoc['item_5'],
            "item_6": secDoc['item_6'],
            "item_7": secDoc['item_7'],
            "item_7A": secDoc['item_7A'],
            "item_8": secDoc['item_8'],
            "item_9": secDoc['item_9'],
            "item_9A": secDoc['item_9A'],
            "item_9B": secDoc['item_9B'],
            "item_10": secDoc['item_10'],
            "item_11": secDoc['item_11'],
            "item_12": secDoc['item_12'],
            "item_13": secDoc['item_13'],
            "item_14": secDoc['item_14'],
            "item_15": secDoc['item_15'],
            "content": text,
            #"contentVector": [],
            "metadata" : json.dumps({"cik": secDoc['cik'], "source": secDoc['filename'], "filingType": secDoc['filing_type'], "reportDate": secDoc['period_of_report']}),
            "sourcefile": fullPath
        }
    # Embedding generation is commented out for now
    #secCommonData['contentVector'] = generateEmbeddings(embeddingModelType, text)
    fullData.append(secCommonData)

    searchClient = SearchClient(endpoint=f"https://{SearchService}.search.windows.net/",
                                index_name=indexName,
                                credential=AzureKeyCredential(SearchKey))
    results = searchClient.upload_documents(fullData)
    succeeded = sum([1 for r in results if r.succeeded])
    #print(f"\tIndexed {len(results)} sections, {succeeded} succeeded")

    return None
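
The chain of `replace` calls used to build the document `id` covers a fixed set of characters. As far as I know, search document keys may contain only letters, digits, underscore, dash, and equal sign, so a regular expression that replaces everything else could be a more general alternative. This is a sketch: `to_search_key` is a hypothetical helper and the filename is made up for illustration.

```python
import re

def to_search_key(path):
    """Replace every character outside [A-Za-z0-9_=-] with an underscore."""
    return re.sub(r"[^A-Za-z0-9_\-=]", "_", path)

# Hypothetical filename, chosen to include the characters handled above.
print(to_search_key("AT&T Inc, 10-K: 2021.json"))
```

Unlike the explicit chain, this also catches characters that were not anticipated, such as parentheses or quotes in a company name.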

In [19]:
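import asyncio
import functools
import time

def background(f):
    # Run the wrapped blocking function on the event loop's default executor
    @functools.wraps(f)
    def wrapped(*args, **kwargs):
        # run_in_executor() does not accept keyword arguments, so bind them with partial
        return asyncio.get_event_loop().run_in_executor(None, functools.partial(f, *args, **kwargs))

    return wrapped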

In [20]:
@background
def indexDocuments(file):
    fileName = file['filename']
    print(f"Indexing {fileName}")
    readBytes = getBlob(OpenAiDocConnStr, SecDocContainer, fileName)
    secDoc = json.loads(readBytes.decode("utf-8"))           
    #createSearchIndex(indexName)
    chunkAndEmbed(indexName, secDoc, os.path.basename(fileName))
    metadata = {'embedded': 'true', 'indexType': "cogsearchvs", "indexName": indexName}
    upsertMetadata(OpenAiDocConnStr, SecDocContainer, fileName, metadata)

In [21]:
res = filesData[:10000]

In [22]:
indexName = 'secfilings'
# Each call returns a future immediately; the indexing itself runs on background threads
for file in res:
    indexDocuments(file)
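
The loop above starts the work without waiting for it, so failures inside a task are easy to miss. A minimal sketch of collecting the results follows, assuming a decorator variant that binds arguments with `functools.partial`. `index_one` is a stand-in for the real blocking work, and `index_all` and `max_in_flight` are hypothetical names.

```python
import asyncio
import functools

def background(f):
    """Run a blocking function on the event loop's default thread pool."""
    @functools.wraps(f)
    def wrapped(*args, **kwargs):
        loop = asyncio.get_running_loop()
        return loop.run_in_executor(None, functools.partial(f, *args, **kwargs))
    return wrapped

@background
def index_one(file):
    # Stand-in for the blocking download and upload work.
    return f"indexed {file['filename']}"

async def index_all(files, max_in_flight=8):
    """Index files concurrently, returning results (or exceptions) in input order."""
    gate = asyncio.Semaphore(max_in_flight)

    async def guarded(file):
        async with gate:
            return await index_one(file)

    return await asyncio.gather(*(guarded(f) for f in files), return_exceptions=True)

results = asyncio.run(index_all([{"filename": "a.json"}, {"filename": "b.json"}]))
print(results)
```

Inside a notebook an event loop is already running, so `await index_all(files)` would replace the `asyncio.run(...)` call.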

In [8]:
print(f"Company Profile: {company_profile(apikey=apikey, symbol=symbol)=}")

Company Profile: company_profile(apikey=apikey, symbol=symbol)=[{'symbol': 'MSFT', 'price': 337.3225, 'beta': 0.931034, 'volAvg': 29673707, 'mktCap': 2508158005362, 'lastDiv': 2.72, 'range': '213.43-338.55', 'changes': 1.9225, 'companyName': 'Microsoft Corporation', 'currency': 'USD', 'cik': '0000789019', 'isin': 'US5949181045', 'cusip': '594918104', 'exchange': 'NASDAQ Global Select', 'exchangeShortName': 'NASDAQ', 'industry': 'Software—Infrastructure', 'website': 'https://www.microsoft.com', 'description': 'Microsoft Corporation develops, licenses, and supports software, services, devices, and solutions worldwide. The company operates in three segments: Productivity and Business Processes, Intelligent Cloud, and More Personal Computing. The Productivity and Business Processes segment offers Office, Exchange, SharePoint, Microsoft Teams, Office 365 Security and Compliance, Microsoft Viva, and Skype for Business; Skype, Outlook.com, OneDrive, and LinkedIn; and Dynamics 365, a set of cl

In [9]:
print(f"Key Executives: {key_executives(apikey=apikey, symbol=symbol)=}")

Key Executives: key_executives(apikey=apikey, symbol=symbol)=[{'title': 'Pres & Vice Chairman', 'name': 'Mr. Bradford L. Smith LCA', 'pay': 4655274, 'currencyPay': 'USD', 'gender': 'male', 'yearBorn': 1959, 'titleSince': None}, {'title': 'Executive Vice President & Chief Financial Officer', 'name': 'Ms. Amy E. Hood', 'pay': 4637915, 'currencyPay': 'USD', 'gender': 'female', 'yearBorn': 1972, 'titleSince': None}, {'title': 'Gen. Mang. of Investor Relations', 'name': 'Brett  Iversen', 'pay': None, 'currencyPay': 'USD', 'gender': '', 'yearBorn': None, 'titleSince': None}, {'title': 'Corporation Vice President & Chief Accounting Officer', 'name': 'Ms. Alice L. Jolla', 'pay': None, 'currencyPay': 'USD', 'gender': 'female', 'yearBorn': 1967, 'titleSince': None}, {'title': 'Executive Vice President of Bus. Devel., Strategy & Ventures', 'name': 'Mr. Christopher David Young', 'pay': 4588876, 'currencyPay': 'USD', 'gender': 'male', 'yearBorn': 1972, 'titleSince': None}, {'title': 'Executive Vice

In [11]:
print(f"SEC Filings: {sec_filings(apikey=apikey, symbol=symbol, filing_type=filing_type)=}")

SEC Filings: sec_filings(apikey=apikey, symbol=symbol, filing_type=filing_type)=[{'symbol': 'MSFT', 'fillingDate': '2022-07-28 00:00:00', 'acceptedDate': '2022-07-28 16:06:19', 'cik': '0000789019', 'type': '10-K', 'link': 'https://www.sec.gov/Archives/edgar/data/789019/000156459022026876/0001564590-22-026876-index.htm', 'finalLink': 'https://www.sec.gov/Archives/edgar/data/789019/000156459022026876/msft-10k_20220630.htm'}, {'symbol': 'MSFT', 'fillingDate': '2021-07-29 00:00:00', 'acceptedDate': '2021-07-29 16:21:55', 'cik': '0000789019', 'type': '10-K', 'link': 'https://www.sec.gov/Archives/edgar/data/789019/000156459021039151/0001564590-21-039151-index.htm', 'finalLink': 'https://www.sec.gov/Archives/edgar/data/789019/000156459021039151/msft-10k_20210630.htm'}, {'symbol': 'MSFT', 'fillingDate': '2020-07-30 00:00:00', 'acceptedDate': '2020-07-30 20:44:46', 'cik': '0000789019', 'type': '10-K', 'link': 'https://www.sec.gov/Archives/edgar/data/789019/000156459020034944/0001564590-20-03494
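
The filings come back as a list of dictionaries, so picking one out is a matter of filtering and sorting. The sketch below copies a subset of fields from the first two complete entries in the output above (note the API's `fillingDate` spelling); `latest_filing` is a hypothetical helper.

```python
# Subset of fields from the first two entries shown in the output above.
filings = [
    {"symbol": "MSFT", "fillingDate": "2022-07-28 00:00:00", "type": "10-K",
     "finalLink": "https://www.sec.gov/Archives/edgar/data/789019/000156459022026876/msft-10k_20220630.htm"},
    {"symbol": "MSFT", "fillingDate": "2021-07-29 00:00:00", "type": "10-K",
     "finalLink": "https://www.sec.gov/Archives/edgar/data/789019/000156459021039151/msft-10k_20210630.htm"},
]

def latest_filing(filings, filing_type="10-K"):
    """Return the most recent filing of the given type, or None if there is none."""
    matches = [f for f in filings if f["type"] == filing_type]
    # Dates are in "YYYY-MM-DD hh:mm:ss" form, so string order is chronological.
    return max(matches, key=lambda f: f["fillingDate"], default=None)

latest = latest_filing(filings)
print(latest["fillingDate"], latest["finalLink"])
```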