### PIB CoPilot
PIBs are also used to create a pitchbook by assessing a company's strategy, competitive positioning, review of financial statements, industry dynamics, and trends within the industry. 
1. Company Overview and Executive Bio - A brief description of the company and its key executives with biographies.
2. Conference calls: The same day a company issues its quarterly press release, it will also hold a conference call. On the call, analysts often learn details about management guidance. These conference calls are transcribed by several service providers and can be accessed by subscribers of large financial data providers.
3. Press Release: Can be found in the investor relations section of most companies' websites and contains the financial statements which are used in forms 10-K and 10-Q. 
4. News: News articles that may affect a company's stock price or growth prospect would be something that analysts look into, particularly within a 6-12 month time horizon.
5. SEC filings: These regulatory documents require a company to file Form 10-K and Form 10-Q with the SEC on an ongoing basis. Form 10-K is a financial overview and commentary for the last year, usually found on the company's website. Form 10-Q is similar to form 10-K, but it is a report for the last quarter instead of the previous year.
6. Equity research reports: Look into key forecasts for metrics like Revenue, EBITDA, and EPS for the company or competing firms to form a consensus estimate. 
7. Investor Presentations: Companies provide historical information as an important foundation from which forecasts are made to guide key forecasting drivers. 

#### 0 -  Pre-requsite and imports

In [1]:
import os  
import json  
import openai
from Utilities.envVars import *
import uuid
# Set Search Service endpoint, index name, and API key from environment variables
indexName = SearchIndex

# Set OpenAI API key and endpoint
openai.api_type = "azure"
openai.api_version = OpenAiVersion
openai_api_key = OpenAiKey
assert openai_api_key, "ERROR: Azure OpenAI Key is missing"
openai.api_key = openai_api_key
openAiEndPoint = f"https://{OpenAiService}.openai.azure.com"
assert openAiEndPoint, "ERROR: Azure OpenAI Endpoint is missing"
assert "openai.azure.com" in openAiEndPoint.lower(), "ERROR: Azure OpenAI Endpoint should be in the form: \n\n\t<your unique endpoint identifier>.openai.azure.com"
openai.api_base = openAiEndPoint
davincimodel = OpenAiDavinci


In [2]:
# Parameters
embeddingModelType = "azureopenai"
temperature = 0
tokenLength = 1000
symbol = 'AAPL'
apikey = FmpKey
os.environ['BING_SUBSCRIPTION_KEY'] = BingKey
os.environ['BING_SEARCH_URL'] = BingUrl
pibIndexName = 'pibdata'

In [8]:
from langchain.chains.qa_with_sources import load_qa_with_sources_chain
from langchain.llms.openai import AzureOpenAI, OpenAI
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.docstore.document import Document
from langchain.prompts import PromptTemplate
from IPython.display import display, HTML
from langchain.utilities import BingSearchAPIWrapper
from langchain.chains.summarize import load_summarize_chain
from langchain.docstore.document import Document
from langchain.text_splitter import RecursiveCharacterTextSplitter
import pandas as pd
from langchain.prompts import PromptTemplate
from datetime import datetime
from pytz import timezone
from dateutil.relativedelta import relativedelta
from datetime import timedelta
from Utilities.pibCopilot import indexDocs, createPressReleaseIndex, createStockNewsIndex, mergeDocs, createPibIndex, findPibData, findEarningCalls, deletePibData
from Utilities.pibCopilot import indexEarningCallSections, createEarningCallVectorIndex, createEarningCallIndex, performCogSearch, createSecFilingIndex, findSecFiling
import typing
from Utilities.fmp import *
from langchain.chat_models import AzureChatOpenAI, ChatOpenAI

# Flexibility to change the call to OpenAI or Azure OpenAI

if (embeddingModelType == 'azureopenai'):
    openai.api_type = "azure"
    openai.api_key = OpenAiKey
    openai.api_version = OpenAiVersion
    openai.api_base = OpenAiBase

    llm = AzureOpenAI(deployment_name=OpenAiDavinci,
            temperature=temperature,
            openai_api_key=OpenAiKey,
            max_tokens=tokenLength,
            batch_size=10, 
            max_retries=12)
    
    llmChat = AzureChatOpenAI(
                openai_api_base=openai.api_base,
                openai_api_version=OpenAiVersion,
                deployment_name=OpenAiChat,
                temperature=temperature,
                openai_api_key=OpenAiKey,
                openai_api_type="azure",
                max_tokens=tokenLength)
    
    logging.info("LLM Setup done")
    embeddings = OpenAIEmbeddings(deployment=OpenAiEmbedding, chunk_size=1, openai_api_key=OpenAiKey)
elif embeddingModelType == "openai":
    openai.api_type = "open_ai"
    openai.api_base = "https://api.openai.com/v1"
    openai.api_version = '2020-11-07' 
    openai.api_key = OpenAiApiKey
    llm = OpenAI(temperature=temperature,
            openai_api_key=OpenAiApiKey,
            max_tokens=tokenLength)
    embeddings = OpenAIEmbeddings(openai_api_key=OpenAiApiKey)

    llmChat = ChatOpenAI(temperature=temperature,
        openai_api_key=OpenAiApiKey,
        model_name="gpt-3.5-turbo",
        max_tokens=tokenLength)
    
    embeddings = OpenAIEmbeddings(openai_api_key=OpenAiApiKey)

In [4]:
central = timezone('US/Central')
today = datetime.now(central)
currentYear = today.year
historicalDate = today - relativedelta(years=3)
historicalYear = historicalDate.year
historicalDate = historicalDate.strftime("%Y-%m-%d")
totalYears = currentYear - historicalYear

In [5]:
#find CIK based on Symbol
cik = str(int(searchCik(apikey=apikey, ticker=symbol)[0]["companyCik"]))

In [6]:
#symbol: str = "AAPL"
#cik = "320193"
#symbols: typing.List[str] = ["AAPL", "CSCO", "QQQQ"]
#exchange: str = "NYSE"
#exchanges: typing.List[str] = ["NYSE", "NASDAQ"]
#query: str = "AA"
#limit: int = 3
#period: str = "quarter"
#download: bool = True

#### 1. Paid Data - Company Profile and Key Executives

In [7]:
# Get the information about the company and list of all executives.
# Check if we have already created record for Profile
createPibIndex(SearchService, SearchKey, pibIndexName)
step = "1"
s1Data = []
r = findPibData(SearchService, SearchKey, pibIndexName, cik, step, returnFields=['id', 'symbol', 'cik', 'step', 'description', 'insertedDate',
                                                                   'pibData'])
if r.get_count() == 0:
    step1Profile = []
    profile = companyProfile(apikey=apikey, symbol=symbol)
    df = pd.DataFrame.from_dict(pd.json_normalize(profile))
    sData = {
            'id' : str(uuid.uuid4()),
            'symbol': symbol,
            'cik': cik,
            'step': step,
            'description': 'Company Profile',
            'insertedDate': today.strftime("%Y-%m-%d"),
            'pibData' : str(df[['symbol', 'mktCap', 'companyName', 'currency', 'cik', 'isin', 'exchange', 'industry', 'sector', 'address', 'city', 'state', 'zip', 'website', 'description']].to_dict('records'))
    }
    step1Profile.append(sData)
    s1Data.append(sData)
    # Insert data into pibIndex
    mergeDocs(SearchService, SearchKey, pibIndexName, step1Profile)

    # Get the list of all executives and generate biography for each of them
    executives = keyExecutives(apikey=apikey, symbol=symbol)
    df = pd.DataFrame.from_dict(pd.json_normalize(executives),orient='columns')
    df = df.drop_duplicates(subset='name', keep="first")

    step1Biography = []
    tools = []
    topK = 1
    step1Executives = []
    #### With the company profile and key executives, we can ask Bing Search to get the biography of the all Key executives and 
    # ask OpenAI to summarize it - Public Data
    for executive in executives:
        name = executive['name']
        title = executive['title']
        query = f"Give me brief biography of {name} who is {title} at {symbol}. Biography should be restricted to {symbol} and summarize it as 2 paragraphs."
        qaPromptTemplate = """
            Rephrase the following question asked by user to perform intelligent internet search
            {query}
            """
        optimizedPrompt = qaPromptTemplate.format(query=query)
        completion = openai.Completion.create(
                    engine=OpenAiDavinci,
                    prompt=optimizedPrompt,
                    temperature=temperature,
                    max_tokens=100,
                    n=1)
        q = completion.choices[0].text
        bingSearch = BingSearchAPIWrapper(k=25)
        results = bingSearch.run(query=q)
        chain = load_summarize_chain(llm, chain_type="map_reduce")
        docs = [Document(page_content=results)]
        summary = chain.run(docs)
        step1Executives.append({
            "name": name,
            "title": title,
            "biography": summary
        })

    sData = {
            'id' : str(uuid.uuid4()),
            'symbol': symbol,
            'cik': cik,
            'step': step,
            'description': 'Biography of Key Executives',
            'insertedDate': today.strftime("%Y-%m-%d"),
            'pibData' : str(step1Executives)
    }
    step1Biography.append(sData)
    s1Data.append(sData)
    mergeDocs(SearchService, SearchKey, pibIndexName, step1Biography)
else:
    for s in r:
        s1Data.append(
            {
                'id' : s['id'],
                'symbol': s['symbol'],
                'cik': s['cik'],
                'step': s['step'],
                'description': s['description'],
                'insertedDate': s['insertedDate'],
                'pibData' : s['pibData']
            })

print(s1Data)

Search index pibdata already exists
[{'id': 'b7cb44e7-4ec7-4cf3-b715-390e36b29330', 'symbol': 'AAPL', 'cik': '320193', 'step': '1', 'description': 'Biography of Key Executives', 'insertedDate': '2023-07-15', 'pibData': '[{\'name\': "Ms. Deirdre  O\'Brien", \'title\': \'Senior Vice President of People & Retail\', \'biography\': " Deirdre O\'Brien is Apple\'s Senior Vice President of Retail and People, reporting to CEO Tim Cook. She has been with Apple for 30 years and holds a Bachelor\'s Degree in Operations Management from Michigan State University and a Master of Business Administration from San Jose State University. She oversees Apple\'s retail and online teams, and Apple\'s People team, with a focus on creating exceptional experiences for Apple customers and is estimated to have a net worth of $76.99M."}, {\'name\': \'Mr. Timothy D. Cook\', \'title\': \'Chief Executive Officer & Director\', \'biography\': " Tim Cook is the CEO of Apple Inc. since 2011 and was previously the COO und

#### 2. Paid Data -  Get the Earnings Call Transcript for each quarter for last 3 years

In [45]:
# Call the paid data (FMP) API
# Get the earning call transcripts for the last 3 years and merge documents into the index.
i = 0
earningsData = []
step = "2"
earningIndexName = 'earningcalls'
# Create the index if it does not exist
createEarningCallIndex(SearchService, SearchKey, earningIndexName)
for i in range(totalYears + 1):
    print(f"Processing ticker : {symbol}")
    processYear = historicalYear + i
    Quarters = ['Q1', 'Q2', 'Q3', 'Q4']
    for quarter in Quarters:
        print(f"Processing year and Quarter : {processYear}-{quarter}")
        r = findEarningCalls(SearchService, SearchKey, earningIndexName, symbol, quarter.replace('Q', ''), str(processYear), returnFields=['id', 'symbol', 
                    'quarter', 'year', 'callDate', 'content'])
        if r.get_count() == 0:
            insertEarningCall = []
            earningTranscript = earningCallTranscript(apikey=apikey, symbol=symbol, year=str(processYear), quarter=quarter)
            print(earningTranscript)
            for transcript in earningTranscript:
                symbol = transcript['symbol']
                quarter = transcript['quarter']
                year = transcript['year']
                callDate = transcript['date']
                content = transcript['content']
                todayYmd = today.strftime("%Y-%m-%d")
                id = f"{symbol}-{year}-{quarter}"
                earningRecord = {
                    "id": id,
                    "symbol": symbol,
                    "quarter": str(quarter),
                    "year": str(year),
                    "callDate": callDate,
                    "content": content,
                    #"inserteddate": datetime.now(central).strftime("%Y-%m-%d"),
                }
                earningsData.append(earningRecord)
                insertEarningCall.append(earningRecord)
                mergeDocs(SearchService, SearchKey, earningIndexName, insertEarningCall)
        else:
            print(f"Found {r.get_count()} records for {symbol} for {quarter} {year}")
            for s in r:
                earningsData.append(
                    {
                        'id' : s['id'],
                        'symbol': s['symbol'],
                        'quarter': s['quarter'],
                        'year': s['year'],
                        'callDate': s['callDate'],
                        'content': s['content']
                    })

Search index earningcalls already exists
Processing ticker : AAPL
Processing year and Quarter : 2020-Q1
Found 1 records for AAPL for Q1 2023
Processing year and Quarter : 2020-Q2
Found 1 records for AAPL for Q2 2023
Processing year and Quarter : 2020-Q3
Found 1 records for AAPL for Q3 2023
Processing year and Quarter : 2020-Q4
Found 1 records for AAPL for Q4 2023
Processing ticker : AAPL
Processing year and Quarter : 2021-Q1
Found 1 records for AAPL for Q1 2023
Processing year and Quarter : 2021-Q2
Found 1 records for AAPL for Q2 2023
Processing year and Quarter : 2021-Q3
Found 1 records for AAPL for Q3 2023
Processing year and Quarter : 2021-Q4
Found 1 records for AAPL for Q4 2023
Processing ticker : AAPL
Processing year and Quarter : 2022-Q1
Found 1 records for AAPL for Q1 2023
Processing year and Quarter : 2022-Q2
Found 1 records for AAPL for Q2 2023
Processing year and Quarter : 2022-Q3
Found 1 records for AAPL for Q3 2023
Processing year and Quarter : 2022-Q4
Found 1 records for A

#### Split the transcripts as per Split Method, Chunk Size and Overlap

In [47]:
earningsData[-1]['content']

"Operator: Good day, and welcome to the Apple’s Q2 Fiscal Year 2023 Earnings Conference Call. Today's call is being recorded. At this time, for opening remarks and introductions, I would like to turn the call over to Suhasini Chandramouli, Director of Investor Relations. Please go ahead.\nSuhasini Chandramouli : Thank you. Good afternoon and thank you for joining us. Speaking first today is Apple's CEO, Tim Cook; and he'll be followed by CFO, Luca Maestri. After that, we'll open the call to questions from analysts. Please note that some of the information you'll hear during our discussion today will consist of forward-looking statements, including, without limitation, those regarding revenue, gross margin, operating expense, other income and expense, taxes, capital allocation, and future business outlook, including the potential impact of macroeconomic conditions on the company's business and results of operations. These statements involve risks and uncertainties that may cause actual 

In [46]:
# Let's just use the latest earnings call transcript to create the documents that we want to use it for generative AI tasks
splitter = RecursiveCharacterTextSplitter(chunk_size=1500, chunk_overlap=50)

print("Last earning call transcripts was on :", earningsData[-1]['callDate'])
rawDocs = splitter.create_documents([earningsData[-1]['content']])
docs = splitter.split_documents(rawDocs)
print("Number of documents chunks generated from Call transcript : ", len(docs))


Last earning call transcripts was on : 2023-05-04 21:35:32
Number of documents chunks generated from Call transcript :  40


#### Create the vector store embedding data for chunked sections

In [20]:
# Store the last index of the earning call transcript in vector Index
earningVectorIndexName = 'latestearningcalls'
createEarningCallVectorIndex(SearchService, SearchKey, earningVectorIndexName)

indexEarningCallSections(OpenAiService, OpenAiKey, OpenAiVersion, OpenAiApiKey, SearchService, SearchKey,
                         embeddingModelType, OpenAiEmbedding, earningVectorIndexName, docs,
                         earningsData[-1]['callDate'], earningsData[-1]['symbol'], earningsData[-1]['year'],
                         earningsData[-1]['quarter'])

Search index latestearningcalls already exists
Total docs: 40
Found 40 sections for AAPL 2023 Q2
Already indexed 40 sections for AAPL 2023 Q2


In [21]:
# Helper function to find the answer to a question
def findAnswer(chainType, topK, question, indexName):
    # Since we already index our document, we can perform the search on the query to retrieve "TopK" documents
    r = performCogSearch(OpenAiService, OpenAiKey, OpenAiVersion, OpenAiApiKey, SearchService, SearchKey, embeddingModelType, OpenAiEmbedding, question, 
                         indexName, topK, returnFields=['id', 'symbol', 'quarter', 'year', 'callDate', 'content'])

    if r == None:
        docs = [Document(page_content="No results found")]
    else :
        docs = [
            Document(page_content=doc['content'], metadata={"id": doc['id'], "source": ''})
            for doc in r
            ]

    if chainType == "map_reduce":
        # Prompt for MapReduce
        qaTemplate = """Use the following portion of a long document to see if any of the text is relevant to answer the question.
                Return any relevant text.
                {context}
                Question: {question}
                Relevant text, if any :"""

        qaPrompt = PromptTemplate(
            template=qaTemplate, input_variables=["context", "question"]
        )

        combinePromptTemplate = """Given the following extracted parts of a long document and a question, create a final answer.
        If you don't know the answer, just say that you don't know. Don't try to make up an answer.
        If the answer is not contained within the text below, say \"I don't know\".

        QUESTION: {question}
        =========
        {summaries}
        =========
        """
        combinePrompt = PromptTemplate(
            template=combinePromptTemplate, input_variables=["summaries", "question"]
        )

        qaChain = load_qa_with_sources_chain(llm, chain_type=chainType, question_prompt=qaPrompt, 
                                            combine_prompt=combinePrompt, 
                                            return_intermediate_steps=True)
        answer = qaChain({"input_documents": docs, "question": question})
        outputAnswer = answer['output_text']

    elif chainType == "stuff":
    # Prompt for ChainType = Stuff
        template = """
                Given the following extracted parts of a long document and a question, create a final answer. 
                If you don't know the answer, just say that you don't know. Don't try to make up an answer. 
                If the answer is not contained within the text below, say \"I don't know\".

                QUESTION: {question}
                =========
                {summaries}
                =========
                """
        qaPrompt = PromptTemplate(template=template, input_variables=["summaries", "question"])
        qaChain = load_qa_with_sources_chain(llm, chain_type=chainType, prompt=qaPrompt)
        answer = qaChain({"input_documents": docs, "question": question}, return_only_outputs=True)
        outputAnswer = answer['output_text']
    elif chainType == "default":
        # Default Prompt
        qaChain = load_qa_with_sources_chain(llm, chain_type="stuff")
        answer = qaChain({"input_documents": docs, "question": question}, return_only_outputs=True)
        outputAnswer = answer['output_text']

    return outputAnswer

#### Top questions to ask during earning call - Let's see if we can find the answers to these questions in the transcripts
- What are some of the current and looming threats to the business?
- What is the debt level or debt ratio of the company right now?
- How do you feel about the upcoming product launches or new products?
- How are you managing or investing in your human capital?
- How do you track the trends in your industry?
- Are there major slowdowns in the production of goods?
- How will you maintain or surpass this performance in the next few quarters?
- What will your market look like in five years as a result of using your product or service?
- How are you going to address the risks that will affect the long-term growth of the company?
- How is the performance this quarter going to affect the long-term goals of the company?

#### Another specific question to ask
- Revenue: Provide key information about revenue for the quarter
- Profitability: Provide key information about profits and losses (P&L) for the quarter
- Industry Trends: Provide key information about industry trends for the quarter
- Trend: Provide key information about business trends discussed on the call
- Risk: Provide key information about risk discussed on the call
- AI: Provide key information about AI discussed on the call
- M&A: Provide any information about mergers and acquisitions (M&A) discussed on the call.
- Guidance: Provide key information about guidance discussed on the call

#### Since we have the lastest transcripts in the document format, let's summarize the information with following specific summary

In [43]:
r = findPibData(SearchService, SearchKey, pibIndexName, cik, step, returnFields=['id', 'symbol', 'cik', 'step', 'description', 'insertedDate',
                                                                   'pibData'])
s2Data = []
if r.get_count() == 0:

    earningCallQa = []
    commonQuestions = [
        "What are some of the current and looming threats to the business?",
        "What is the debt level or debt ratio of the company right now?",
        "How do you feel about the upcoming product launches or new products?",
        "How are you managing or investing in your human capital?",
        "How do you track the trends in your industry?",
        "Are there major slowdowns in the production of goods?",
        "How will you maintain or surpass this performance in the next few quarters?",
        "What will your market look like in five years as a result of using your product or service?",
        "How are you going to address the risks that will affect the long-term growth of the company?",
        "How is the performance this quarter going to affect the long-term goals of the company?"
    ]

    for question in commonQuestions:
        answer = findAnswer('map_reduce', 3, question, earningVectorIndexName)
        if "I don't know" not in answer:
            earningCallQa.append({"question": question, "answer": answer})

    commonQuestions = [
        "Provide key information about revenue for the quarter",
        "Provide key information about profits and losses (P&L) for the quarter",
        "Provide key information about industry trends for the quarter",
        "Provide key information about business trends discussed on the call",
        "Provide key information about risk discussed on the call",
        "Provide key information about AI discussed on the call",
        "Provide any information about mergers and acquisitions (M&A) discussed on the call.",
        "Provide key information about guidance discussed on the call"
    ]

    for question in commonQuestions:
        answer = findAnswer('map_reduce', 3, question, earningVectorIndexName)
        if "I don't know" not in answer:
            earningCallQa.append({"question": question, "answer": answer})

    # With the data indexed, let's summarize the information
    # While we are using the standard prompt by langchain, you can modify the prompt to suit your needs
    # 1. Financial Results Summary: Please provide a summary of the financial results.
    # 2. Business Highlights: Please provide a summary of the business highlights.
    # 3. Future Outlook: Please provide a summary of the future outlook.
    # 4. Business Risks: Please provide a summary of the business risks.
    # 5. Management Positive Sentiment: Please provide a summary of the what management is confident about.
    # 6. Management Negative Sentiment: Please provide a summary of the what management is concerned about.
    # 7. Future Growth Strategies : Please generate a concise and comprehensive strategies summary that includes the information in  bulleted format.
    # 8. Risk Increase: Please provide a summary of the risks that have increased.
    # 9. Risk Decrease: Please provide a summary of the risks that have decreased.
    # 10. Opportunity Increase: Please provide a summary of the opportunities that have increased.
    # 11. Opportunity Decrease: Please provide a summary of the opportunities that have decreased.
    commonSummary = [
        "Financial Results",
        "Business Highlights",
        "Future Outlook",
        "Business Risks",
        "Management Positive Sentiment",
        "Management Negative Sentiment",
        "Future Growth Strategies"
    ]

    promptTemplate = """You are an AI assistant tasked with summarizing financial information from earning call transcript. 
            Your summary should accurately capture the key information in the document while avoiding the omission of any domain-specific words. 
            Please generate a concise and comprehensive summary on the following topics. 
            {summarize}
            Please remember to use clear language and maintain the integrity of the original information without missing any important details:
            {text}
            """
    for summary in commonSummary:
        customPrompt = PromptTemplate(template=promptTemplate.replace('{summarize}', summary), input_variables=["text"])
        chainType = "map_reduce"
        summaryChain = load_summarize_chain(llm, chain_type=chainType, return_intermediate_steps=False, 
                                    map_prompt=customPrompt, combine_prompt=customPrompt)
        summaryOutput = summaryChain({"input_documents": docs}, return_only_outputs=True)
        outputAnswer = summaryOutput['output_text'].replace('Summary:', '')
        if "I don't know" not in answer and len(outputAnswer) > 0:
            earningCallQa.append({"question": summary, "answer": outputAnswer})

    s2Data.append({
                'id' : str(uuid.uuid4()),
                'symbol': symbol,
                'cik': cik,
                'step': step,
                'description': 'Earning Call Q&A',
                'insertedDate': today.strftime("%Y-%m-%d"),
                'pibData' : str(earningCallQa)
        })
    mergeDocs(SearchService, SearchKey, pibIndexName, s2Data)
else:
    print('Found existing data')
    for s in r:
        s2Data.append(
            {
                'id' : s['id'],
                'symbol': s['symbol'],
                'cik': s['cik'],
                'step': s['step'],
                'description': s['description'],
                'insertedDate': s['insertedDate'],
                'pibData' : s['pibData']
            })
        
print(s2Data)

Found existing data
[{'id': '85405b23-64e3-4067-b505-62cbbc3eab16', 'symbol': 'AAPL', 'cik': '320193', 'step': '2', 'description': 'Earning Call Q&A', 'insertedDate': '2023-07-16', 'pibData': '[{\'question\': \'What are some of the current and looming threats to the business?\', \'answer\': \'\\nThe current and looming threats to the business include changes in global economic and geopolitical conditions, recessionary fears, inflation, interest rates, regional labor market constraints, world events, the rate of growth of the internet, online commerce and cloud services, a slowdown in digital advertising and mobile gaming, and elevated usage during the COVID years.\'}, {\'question\': \'Provide key information about revenue for the quarter\', \'answer\': \'\\nThe total revenue for the March quarter was $94.8 billion, down 3% from last year. iPhone revenue was $51.3 billion, up 2% year-over-year, and Mac revenue was $7.2 billion, down 31% year-over-year. Services revenue was $20.9 billion

#### In case if we wanted to see summary of summary, run code below

In [27]:
# # For the chaintype of MapReduce and Refine, we can also get insight into intermediate steps of the pipeline.
# # This way you can inspect the results from map_reduce chain type, each top similar chunk summary
# intermediateSteps = summary['intermediate_steps']
# for step in intermediateSteps:
#         display(HTML("<b>Chunk Summary:</b> " + step))

#### 3. Paid Data - Press Releases - Get the Press Releases for last year

In [48]:
# For now we are calling API to get data, but otherwise we need to ensure the data is not persisted in our 
# index repository before calling again, if it is persisted then we need to delete it first
counter = 0
pressReleasesList = []
pressReleaseIndexName = 'pressreleases'
# Create the index if it does not exist
createPressReleaseIndex(SearchService, SearchKey, pressReleaseIndexName)
print(f"Processing ticker : {symbol}")
pr = pressReleases(apikey=apikey, symbol=symbol, limit=200)
for pressRelease in pr:
    symbol = pressRelease['symbol']
    releaseDate = pressRelease['date']
    title = pressRelease['title']
    content = pressRelease['text']
    todayYmd = today.strftime("%Y-%m-%d")
    id = f"{symbol}-{counter}"
    pressReleasesList.append({
        "id": id,
        "symbol": symbol,
        "releaseDate": releaseDate,
        "title": title,
        "content": content,
    })
    counter = counter + 1

mergeDocs(SearchService, SearchKey, pressReleaseIndexName, pressReleasesList)

Search index pressreleases already exists
Processing ticker : AAPL
Total docs: 164
	Indexed 164 sections, 164 succeeded


In [49]:
splitter = RecursiveCharacterTextSplitter(chunk_size=1500, chunk_overlap=50)
rawPressReleasesDoc = [Document(page_content=t['content']) for t in pressReleasesList[:25]]
pressReleasesDocs = splitter.split_documents(rawPressReleasesDoc)
print("Number of documents chunks generated from Press releases : ", len(pressReleasesDocs))

Number of documents chunks generated from Press releases :  25


In [50]:
# With the data indexed, let's summarize the information
promptTemplate = """You are an AI assistant tasked with summarizing company's press releases and performing sentiments on those. 
        Your summary should accurately capture the key information in the press-releases while avoiding the omission of any domain-specific words. 
        Please generate a concise and comprehensive summary and sentiment with score with range of 0 to 10. 
        Your response should be in JSON object with following keys.  All JSON properties are required.
        summary: 
        sentiment:
        sentiment score: 
        {text}
        """
customPrompt = PromptTemplate(template=promptTemplate, input_variables=["text"])
chainType = "map_reduce"
summaryChain = load_summarize_chain(llm, chain_type=chainType, return_intermediate_steps=True, 
                                    map_prompt=customPrompt, combine_prompt=customPrompt)
summary = summaryChain({"input_documents": pressReleasesDocs}, return_only_outputs=True)
outputAnswer = summary['output_text']
print(outputAnswer)





In [62]:
# For the chaintype of MapReduce and Refine, we can also get insight into intermediate steps of the pipeline.
# This way you can inspect the results from map_reduce chain type, each top similar chunk summary
pressReleasesPib = []
last25PressReleases = pressReleasesList[:25]
intermediateSteps = summary['intermediate_steps']
i = 0
for step in intermediateSteps:
        jsonStep = json.loads(step)
        pressReleasesPib.append({
                "releaseDate": last25PressReleases[i]['releaseDate'],
                "title": last25PressReleases[i]['title'],
                "summary": jsonStep['summary'],
                "sentiment": jsonStep['sentiment'],
                "sentimentScore": jsonStep['sentiment score']
        })
        i = i + 1
        #display(HTML("<b>Chunk Summary:</b> " + step))

In [None]:
step = "3"
s3Data = []
# We are deleting the data as the Press-releases could be dynamic and we want the latest data
deletePibData(SearchService, SearchKey, pibIndexName, cik, step, returnFields=['id', 'symbol', 'cik', 'step', 'description', 'insertedDate',
                                                                   'pibData'])
s3Data.append({
                'id' : str(uuid.uuid4()),
                'symbol': symbol,
                'cik': cik,
                'step': step,
                'description': 'Press Releases',
                'insertedDate': today.strftime("%Y-%m-%d"),
                'pibData' : str(pressReleasesPib)
        })
mergeDocs(SearchService, SearchKey, pibIndexName, s3Data)

### 4. Paid Data - Get Stock News - Limit it to cover for current year

In [32]:
# For now we are calling API to get data, but otherwise we need to ensure the data is not persisted in our 
# index repository before calling again, if it is persisted then we need to delete it first
counter = 0
stockNewsList = []
stockNewsIndexName = 'stocknews'
# Create the index if it does not exist
createStockNewsIndex(SearchService, SearchKey, stockNewsIndexName)
print(f"Processing ticker : {symbol}")
sn = stockNews(apikey=apikey, tickers=symbol, limit=5000)
for news in sn:
    symbol = news['symbol']
    publishedDate = news['publishedDate']
    title = news['title']
    image = news['image']
    site = news['site']
    content = news['text']
    url = news['url']
    todayYmd = today.strftime("%Y-%m-%d")
    id = f"{symbol}-{todayYmd}-{counter}"
    stockNewsList.append({
        "id": id,
        "symbol": symbol,
        "publishedDate": publishedDate,
        "title": title,
        "image": image,
        "site": site,
        "content": content,
        "url": url,
    })
    counter = counter + 1
mergeDocs(SearchService, SearchKey, stockNewsIndexName, stockNewsList)

Search index stocknews already exists
Processing ticker : AAPL
Total docs: 5000
	Indexed 1000 sections, 1000 succeeded
	Indexed 1000 sections, 1000 succeeded
	Indexed 1000 sections, 1000 succeeded
	Indexed 1000 sections, 1000 succeeded
	Indexed 1000 sections, 1000 succeeded


In [33]:
# Group our news by Date and summarize the content and sentimet per day
stocksDf = pd.DataFrame.from_dict(pd.json_normalize(stockNewsList))
stocksDf['publishedDate'] = pd.to_datetime(stocksDf['publishedDate']).dt.date
stocksNewsDailyDf = stocksDf.sort_values('publishedDate').groupby('publishedDate')['content'].apply('\n'.join).reset_index()
splitter = RecursiveCharacterTextSplitter(chunk_size=1500, chunk_overlap=50)
rawNewsDocs = [Document(page_content=row['content']) for index, row in stocksNewsDailyDf.tail(10).iterrows()]
newsDocs = splitter.split_documents(rawNewsDocs)
print("Number of documents chunks generated from Press releases : ", len(newsDocs))

# With the data indexed, let's summarize the information
promptTemplate = """You are an AI assistant tasked with summarizing news related to company and performing sentiments on those. 
        Your summary should accurately capture the key information in the document while avoiding the omission of any domain-specific words. 
        Please generate a concise and comprehensive summary and sentiment with score with range of 0 to 10. Your response should be in JSON format with following keys.
        summary: 
        sentiment:
        sentiment score:
        Please remember to use clear language and maintain the integrity of the original information without missing any important details.
        {text}
        """
customPrompt = PromptTemplate(template=promptTemplate, input_variables=["text"])
chainType = "map_reduce"
summaryChain = load_summarize_chain(llm, chain_type=chainType, return_intermediate_steps=True, 
                                    map_prompt=customPrompt, combine_prompt=customPrompt)
summary = summaryChain({"input_documents": newsDocs}, return_only_outputs=True)
outputAnswer = summary['output_text']
print(outputAnswer)

Number of documents chunks generated from Press releases :  16



In [34]:
# For the chaintype of MapReduce and Refine, we can also get insight into intermediate steps of the pipeline.
# This way you can inspect the results from map_reduce chain type, each top similar chunk summary
intermediateSteps = summary['intermediate_steps']
for step in intermediateSteps:
        display(HTML("<b>Chunk Summary:</b> " + step))

#### 5. Public Data - Get the SEC Filings - Limit it to cover for last 3 year

In [35]:
filingType = "10-K"
secFilingsList = secFilings(apikey=apikey, symbol=symbol, filing_type=filingType)

In [36]:
latestFilingDateTime = datetime.strptime(secFilingsList[0]['fillingDate'], '%Y-%m-%d %H:%M:%S')
latestFilingDate = latestFilingDateTime.strftime("%Y-%m-%d")
secFilingIndexName = 'secdata'
secFilingList = []
emptyBody = {
        "values": [
            {
                "recordId": 0,
                "data": {
                    "text": ""
                }
            }
        ]
}

secExtractBody = {
    "values": [
        {
            "recordId": 0,
            "data": {
                "text": {
                    "edgar_crawler": {
                        "start_year": int(historicalYear),
                        "end_year": int(currentYear),
                        "quarters": [1,2,3,4],
                        "filing_types": [
                            "10-K"
                        ],
                        "cik_tickers": [cik],
                        "user_agent": "Your name (your email)",
                        "raw_filings_folder": "RAW_FILINGS",
                        "indices_folder": "INDICES",
                        "filings_metadata_file": "FILINGS_METADATA.csv",
                        "skip_present_indices": True
                    },
                    "extract_items": {
                        "raw_filings_folder": "RAW_FILINGS",
                        "extracted_filings_folder": "EXTRACTED_FILINGS",
                        "filings_metadata_file": "FILINGS_METADATA.csv",
                        "items_to_extract": ["1","1A","1B","2","3","4","5","6","7","7A","8","9","9A","9B","10","11","12","13","14","15"],
                        "remove_tables": True,
                        "skip_extracted_filings": True
                    }
                }
            }
        }
    ]
}

# Check if we have already processed the latest filing, if yes then skip
createSecFilingIndex(SearchService, SearchKey, secFilingIndexName)
r = findSecFiling(SearchService, SearchKey, secFilingIndexName, cik, filingType, latestFilingDate, returnFields=['id', 'cik', 'company', 'filingType', 'filingDate',
                                                                                                                 'periodOfReport', 'sic', 'stateOfInc', 'fiscalYearEnd',
                                                                                                                 'filingHtmlIndex', 'htmFilingLink', 'completeTextFilingLink',
                                                                                                                 'item1', 'item1A', 'item1B', 'item2', 'item3', 'item4', 'item5',
                                                                                                                 'item6', 'item7', 'item7A', 'item8', 'item9', 'item9A', 'item9B',
                                                                                                                 'item10', 'item11', 'item12', 'item13', 'item14', 'item15',
                                                                                                                 'sourcefile'])
if r.get_count() == 0:
    # Call Azure Function to perform Web-scraping and store the JSON in our blob
    secExtract = requests.post(SecExtractionUrl, json = secExtractBody)
    # Once the JSON is created, call the function to process the JSON and store the data in our index
    docPersistUrl = SecDocPersistUrl + "&indexType=cogsearchvs&indexName=" + secFilingIndexName + "&embeddingModelType=" + embeddingModelType
    secPersist = requests.post(docPersistUrl, json = emptyBody)
    r = findSecFiling(SearchService, SearchKey, secFilingIndexName, cik, filingType, latestFilingDate, returnFields=['id', 'cik', 'company', 'filingType', 'filingDate',
                                                                                                                 'periodOfReport', 'sic', 'stateOfInc', 'fiscalYearEnd',
                                                                                                                 'filingHtmlIndex', 'htmFilingLink', 'completeTextFilingLink',
                                                                                                                 'item1', 'item1A', 'item1B', 'item2', 'item3', 'item4', 'item5',
                                                                                                                 'item6', 'item7', 'item7A', 'item8', 'item9', 'item9A', 'item9B',
                                                                                                                 'item10', 'item11', 'item12', 'item13', 'item14', 'item15',
                                                                                                                 'sourcefile'])

# Retrieve the latest filing from our index
for filing in r:
    secFilingList.append({
        "id": filing['id'],
        "cik": filing['cik'],
        "company": filing['company'],
        "filingType": filing['filingType'],
        "filingDate": filing['filingDate'],
        "periodOfReport": filing['periodOfReport'],
        "sic": filing['sic'],
        "stateOfInc": filing['stateOfInc'],
        "fiscalYearEnd": filing['fiscalYearEnd'],
        "filingHtmlIndex": filing['filingHtmlIndex'],
        "completeTextFilingLink": filing['completeTextFilingLink'],
        "item1": filing['item1'],
        "item1A": filing['item1A'],
        "item1B": filing['item1B'],
        "item2": filing['item2'],
        "item3": filing['item3'],
        "item4": filing['item4'],
        "item5": filing['item5'],
        "item6": filing['item6'],
        "item7": filing['item7'],
        "item7A": filing['item7A'],
        "item8": filing['item8'],
        "item9": filing['item9'],
        "item9A": filing['item9A'],
        "item9B": filing['item9B'],
        "item10": filing['item10'],
        "item11": filing['item11'],
        "item12": filing['item12'],
        "item13": filing['item13'],
        "item14": filing['item14'],
        "item15": filing['item15'],
        "sourcefile": filing['sourcefile']
    })

In [37]:
def generateSummaries(docs):
    # With the data indexed, let's summarize the information
    promptTemplate = """You are an AI assistant tasked with summarizing financial report related to company. 
            Your summary should accurately capture the key information in the document while avoiding the omission of any domain-specific words. 
            Please generate a concise and comprehensive summary of the following document.
            Please remember to use clear language and maintain the integrity of the original information without missing any important details.
            Summarize it at an average of 10 lines.
            {text}
            """
    customPrompt = PromptTemplate(template=promptTemplate, input_variables=["text"])
    chainType = "map_reduce"
    summaryChain = load_summarize_chain(llm, chain_type=chainType, return_intermediate_steps=True, 
                                        map_prompt=customPrompt, combine_prompt=customPrompt)
    summary = summaryChain({"input_documents": docs}, return_only_outputs=True)
    return summary

In [38]:
# For different section of extracted data, process summarization and generate common answers to questions
splitter = RecursiveCharacterTextSplitter(chunk_size=1500, chunk_overlap=50)

# Item 1 - Describes the business of the company
rawItemDocs = [Document(page_content=secFilingList[0]['item1'])]
itemDocs = splitter.split_documents(rawItemDocs)
print("Number of documents chunks generated from Item1 : ", len(itemDocs))
summary = generateSummaries(itemDocs)
outputAnswer = summary['output_text']
print("Business Description : " + outputAnswer)

# Item 1A - Risk Factors
rawItemDocs = [Document(page_content=secFilingList[0]['item1A'])]
itemDocs = splitter.split_documents(rawItemDocs)
print("Number of documents chunks generated from Item1A : ", len(itemDocs))
summary = generateSummaries(itemDocs)
outputAnswer = summary['output_text']
print("Risk Factors : " + outputAnswer)

# Item 6 - Consolidated Financial Data
rawItemDocs = [Document(page_content=secFilingList[0]['item6'])]
itemDocs = splitter.split_documents(rawItemDocs)
print("Number of documents chunks generated from Item6 : ", len(itemDocs))
summary = generateSummaries(itemDocs)
outputAnswer = summary['output_text']
print("Financial Data : " + outputAnswer)

# Item 7 - Management's Discussion and Analysis of Financial Condition and Results of Operations
rawItemDocs = [Document(page_content=secFilingList[0]['item7'])]
itemDocs = splitter.split_documents(rawItemDocs)
print("Number of documents chunks generated from Item7 : ", len(itemDocs))
summary = generateSummaries(itemDocs)
outputAnswer = summary['output_text']
print("Management Discussion : " + outputAnswer)

# Item 7a - Market risk disclosures
rawItemDocs = [Document(page_content=secFilingList[0]['item7A'])]
itemDocs = splitter.split_documents(rawItemDocs)
print("Number of documents chunks generated from Item7A : ", len(itemDocs))
summary = generateSummaries(itemDocs)
outputAnswer = summary['output_text']
print("Risk Disclosures : " + outputAnswer)

# Item 9 - Disagreements with accountants and changes in accounting
rawItemDocs = [Document(page_content=secFilingList[0]['item9'])]
itemDocs = splitter.split_documents(rawItemDocs)
print("Number of documents chunks generated from Item9 : ", len(itemDocs))
summary = generateSummaries(itemDocs)
outputAnswer = summary['output_text']
print("Accounting Disclosures : " + outputAnswer)

Number of documents chunks generated from Item1 :  12
Business Description : 
Apple Inc. is a technology company that designs, manufactures, and markets smartphones, personal computers, tablets, wearables, and accessories. It offers a range of services, including advertising, AppleCare®, cloud services, digital content, and payment services. The Company sells its products and resells third-party products in most of its major markets directly to consumers, small and mid-sized businesses, and education, enterprise and government customers through its retail and online stores and its direct sales force. It also employs a variety of indirect distribution channels. The main competitive factors for Apple include price, product and service features, relative price and performance, product and service quality and reliability, design innovation, a strong third-party software and accessories ecosystem, marketing and distribution capability, service and support, and corporate reputation. The Comp

#### 6. Private Data - Equity Research Reports

In [5]:
companyRating = rating(apikey=apikey, symbol=symbol)
fScore = financialScore(apikey=apikey, symbol=symbol)
esgScores = esgScore(apikey=apikey, symbol=symbol)
esgRating = esgRatings(apikey=apikey, symbol=symbol)
ugConsensus = upgradeDowngrades(apikey=apikey, symbol=symbol)
priceConsensus = priceTarget(apikey=apikey, symbol=symbol)
#ratingsDf = pd.DataFrame.from_dict(pd.json_normalize(companyRating))
researchReport = []

researchReport.append({
    "symbol": companyRating[0]['symbol'],
    "Overall Recommendation": companyRating[0]['ratingRecommendation'],
    "DCF Recommendation": companyRating[0]['ratingDetailsDCFRecommendation'],
    "ROE Recommendation": companyRating[0]['ratingDetailsROERecommendation'],
    "ROA Recommendation": companyRating[0]['ratingDetailsROARecommendation'],
    "PB Recommendation": companyRating[0]['ratingDetailsPBRecommendation'],
    "PE Recommendation": companyRating[0]['ratingDetailsPERecommendation'],
    "Altman ZScore" : fScore[0]['altmanZScore'],
    "Piotroski Score" : fScore[0]['piotroskiScore'],
    "Environmental Score" : esgScores[0]['environmentalScore'],
    "Social Score" : esgScores[0]['socialScore'],
    "Governance Score" : esgScores[0]['governanceScore'],
    "ESG Score" : esgScores[0]['ESGScore'],
    "ESG Risk Rating": esgRating[0]['ESGRiskRating'],
    "Analyst Consensus Buy": ugConsensus[0]['buy'],
    "Analyst Consensus Sell": ugConsensus[0]['sell'],
    "Analyst Consensus Strong Buy": ugConsensus[0]['strongBuy'],
    "Analyst Consensus Strong Sell": ugConsensus[0]['strongSell'],
    "Analyst Consensus Hold": ugConsensus[0]['hold'],
    "Analyst Consensus": ugConsensus[0]['consensus'],
    "Price Target Consensus": priceConsensus[0]['targetConsensus'],
    "Price Target Median": priceConsensus[0]['targetMedian'],
})
researchReport

[{'symbol': 'AAPL',
  'Overall Recommendation': 'Strong Buy',
  'DCF Recommendation': 'Strong Buy',
  'ROE Recommendation': 'Strong Buy',
  'ROA Recommendation': 'Neutral',
  'PB Recommendation': 'Strong Buy',
  'PE Recommendation': 'Strong Buy',
  'Altman ZScore': 8.750505326729295,
  'Piotroski Score': 7,
  'Environmental Score': 50,
  'Social Score': 50,
  'Governance Score': 63.3,
  'ESG Score': 54.43,
  'ESG Risk Rating': 'B',
  'Analyst Consensus Buy': 27,
  'Analyst Consensus Sell': 1,
  'Analyst Consensus Strong Buy': 0,
  'Analyst Consensus Strong Sell': 0,
  'Analyst Consensus Hold': 6,
  'Analyst Consensus': 'Buy',
  'Price Target Consensus': 182.79,
  'Price Target Median': 182.5}]

#### 7. Paid Data - Investor Presentations - Financial Reports (Balance Sheet, Income Statement and Cash Flow) for last 3 years?