# Earnings Call Summary POC

A proof of concept (POC) for summarizing the transcript of an earnings call.

## Requirements
#### Package Requirements
This notebook was created with the following packages:
- python                    3.11
- llama-index               0.12.25
- ragas                     0.2.14

#### Other Requirements
- Environment variable `OPENAI_API_KEY`. LlamaIndex needs it to use its default GPT-3.5 model when answering a query.
- Environment variable `DEEPINFRA_API_KEY`. It is needed to access the LLMs hosted on DeepInfra through its REST API.

In [29]:
import pandas as pd

## Set up Environment

Set up the environment-specific parameters. Modify these to suit your local environment.

In [30]:
#
# Locations of the data sources
#

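data_root = "../data"         # Root directory of the data
ec_dir = "earning_calls"      # Earnings call transcripts, under data_root
working_dir = "working"       # Cached intermediate results, under data_root
reports_dir = "reports"       # Generated reports, under data_root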


In [31]:
import os

# Keys for LLM access
openai_key = os.environ.get("OPENAI_API_KEY")
# hf_key = os.environ.get("HUGGINGFACEHUB_API_TOKEN")
di_key = os.environ.get("DEEPINFRA_API_KEY")

if not openai_key:
    raise EnvironmentError("OPENAI_API_KEY must be set for this notebook to work. It is needed by LlamaIndex.")

# if not hf_key:
#     raise EnvironmentError(f"Need HuggingFace token for this notebook to work.  Needed for query extension with DeepSeek-R1"  )

if not di_key:
    raise EnvironmentError("DEEPINFRA_API_KEY must be set to run models on DeepInfra.")
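
The key checks above stop at the first missing variable. As a minimal sketch of an alternative, the helper below reports every missing variable in one error. `require_env` and its `environ` parameter are hypothetical names introduced only for this illustration; they are not part of the notebook.

```python
import os


def require_env(names, environ=None):
    """Return {name: value} for all names, or raise listing every missing one."""
    # `environ` is an assumption of this sketch: it lets the check run against
    # an explicit mapping instead of the real process environment.
    environ = os.environ if environ is None else environ
    values = {name: environ.get(name) for name in names}
    missing = [name for name, value in values.items() if not value]
    if missing:
        raise EnvironmentError("Missing environment variables: " + ", ".join(missing))
    return values


# Demonstration with placeholder values rather than real keys
keys = require_env(
    ["OPENAI_API_KEY", "DEEPINFRA_API_KEY"],
    environ={"OPENAI_API_KEY": "sk-demo", "DEEPINFRA_API_KEY": "di-demo"},
)
```

In the notebook itself the call would omit `environ`, so the values come from `os.environ`.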

In [32]:
#
# Tweak these values
#

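# Chunking parameters, passed to SentenceSplitter
chunk_size = 1000       # Maximum number of tokens per chunk
chunk_overlap = 200     # Number of tokens shared by consecutive chunks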

# Type of article
article_type = "transcript of the earnings call"
article_name = "MSFT_EC_2Q25"
article_file = "msft/MSFT_FY2Q25__1__m4a_Good_Tape_2025-03-19.txt"

# Summarization scopes
scope = "Microsoft financial and operational reports"

# Sections that shall be in the report
report_sections = """
1. Conclusion
2. Key Points
3. Contents
3.1. Analysis
3.2. Future Outlook
3.3. Operation Highlights
"""

# LLM models
# llm_model_name = "gpt-4"
# llm_model_name = "gpt-4.5"
llm_model_name = "llama-3"
# llm_model_name = "gemini-2"

# Generation temperature
temperature = 0.4

In [33]:
# Steps of this notebook that can be forced to refresh.
# Many of the steps are time-consuming, so their results are saved in the data directory.
# If saved results exist, they are reloaded instead of being recalculated.
# Setting a step to True forces the code to recalculate the result for that step.
steps = {
    "chunking": False,                       # Input the article and do chunking
    "chunk_summaries": False,                # Per chunk summarization
    "final_summary": True                   # Summarize the chunk summaries
}
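
The reload-or-recompute behaviour described in the comments above can be sketched as a small helper. This is only an illustration: `load_or_compute`, `save_text`, `load_text`, and `expensive_step` are hypothetical names that do not appear in this notebook, which instead writes the same check inline in each cell.

```python
import os
import tempfile


def load_or_compute(path, compute, load, save, force=False):
    """Reuse the result stored at `path` unless `force` is set or the file is missing."""
    if force or not os.path.exists(path):
        result = compute()
        save(result, path)
        return result
    return load(path)


def save_text(text, path):
    with open(path, "w", encoding="utf-8") as fd:
        fd.write(text)


def load_text(path):
    with open(path, "r", encoding="utf-8") as fd:
        return fd.read()


calls = []  # records how many times the expensive step really ran


def expensive_step():
    calls.append(1)
    return "result"


with tempfile.TemporaryDirectory() as tmp:
    cache_path = os.path.join(tmp, "step.txt")
    # First call: no file yet, so the step runs and its result is saved
    first = load_or_compute(cache_path, expensive_step, load_text, save_text)
    # Second call: the saved file is reloaded, the step does not run
    second = load_or_compute(cache_path, expensive_step, load_text, save_text)
    # Forced call: the step runs again even though the file exists
    third = load_or_compute(cache_path, expensive_step, load_text, save_text, force=True)
```

The `force` argument plays the role of an entry in `steps`, for example `force=steps["chunking"]`.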

## Reading and Chunking

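Read the transcript and split it into overlapping chunks with LlamaIndex's `SentenceSplitter`. The chunks are saved to a Parquet file so that later runs can reload them instead of splitting again.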

In [34]:
from llama_index.core.node_parser import SentenceSplitter

article_path = os.path.join(data_root, ec_dir, article_file)
chunk_path = os.path.join(data_root, working_dir, f"{article_name}_chunks.parquet")

if steps["chunking"] or not os.path.exists(chunk_path):

    # Input
    with open(article_path, "r", encoding="utf-8") as tfd:
        transcript_content = tfd.read()

    # Initialize the SentenceSplitter
    sentence_splitter = SentenceSplitter(chunk_size=chunk_size, chunk_overlap=chunk_overlap)

    # Split the text into chunks
    chunks = sentence_splitter.split_text(transcript_content)

    # Put into Pandas
    chunk_ids = [f"{article_name}_{i:04d}" for i in range(len(chunks))]
    chunk_df = pd.DataFrame(zip(chunk_ids, chunks), columns=["chunk_id", "content"])

    # Save the results
    chunk_df.to_parquet(chunk_path)
else:
    chunk_df = pd.read_parquet(chunk_path)
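
To make the roles of `chunk_size` and `chunk_overlap` concrete, here is a simplified sliding-window sketch. It is not how `SentenceSplitter` works internally: that class counts tokens and prefers sentence boundaries, whereas this hypothetical `chunk_words` function counts whitespace-separated words and ignores sentences. The `DEMO` prefix is likewise made up for the example.

```python
def chunk_words(text, chunk_size, chunk_overlap):
    """Split text into word windows of `chunk_size` that share `chunk_overlap` words."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    words = text.split()
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the window already reached the end of the text
    return chunks


sample = " ".join(f"w{i}" for i in range(10))  # "w0 w1 ... w9"
demo_chunks = chunk_words(sample, chunk_size=4, chunk_overlap=1)
# Same zero-padded ID scheme as the notebook's chunk_id column
demo_ids = [f"DEMO_{i:04d}" for i in range(len(demo_chunks))]

for chunk_id, chunk in zip(demo_ids, demo_chunks):
    print(chunk_id, "->", chunk)
```

Each chunk starts with the last word of the previous one, which is the overlap that keeps context from being cut off at chunk boundaries.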

In [35]:
chunk_df

Unnamed: 0,chunk_id,content,summary_gpt-4,summary_gpt-4.5,summary_gemini-2,summary_llama-3
0,MSFT_EC_2Q25_0000,"MSFT_FY2Q25 (1).m4a\n\noperator assistance, pl...",Microsoft's Q2 FY25 earnings call included com...,Microsoft reported strong financial performanc...,Microsoft's FY2Q25 earnings call discussed fin...,Here is a concise summary of the Microsoft fin...
1,MSFT_EC_2Q25_0001,"From now on, it's a more\ncontinuous cycle gov...",Microsoft is enhancing its presence in every l...,Microsoft significantly expanded Azure data ce...,Microsoft is expanding data center capacity to...,Here is a concise summary of the Microsoft fin...
2,MSFT_EC_2Q25_0002,When you look at customers who purchased Copil...,"During the first quarter of availability, cust...",Microsoft reported robust financial and operat...,Microsoft is seeing strong customer adoption a...,Here is a concise summary of the Microsoft ear...
3,MSFT_EC_2Q25_0003,More professionals than ever are engaging in h...,Microsoft reported an increase in LinkedIn eng...,Microsoft reported strong financial and operat...,Key Microsoft financial and operational highli...,Here is a concise summary of the Microsoft fin...
4,MSFT_EC_2Q25_0004,Microsoft Cloud gross margin percentage was 70...,Microsoft reported that Cloud gross margin per...,Microsoft reported a gross margin percentage o...,"Microsoft's Cloud gross margin was 70%, decrea...",Here is a concise summary of the Microsoft fin...
5,MSFT_EC_2Q25_0005,Segment gross margin dollars increased 12% and...,Microsoft reported a 12-13% increase in segmen...,Microsoft's quarterly financial and operationa...,Here's a concise summary of the Microsoft fina...,Here is a concise summary of the transcript se...
6,MSFT_EC_2Q25_0006,We expect consistent execution\nacross our cor...,Microsoft expects consistent performance in it...,Microsoft provided forward financial guidance ...,Microsoft anticipates its Q3 financial perform...,Here is a concise summary of Microsoft's finan...
7,MSFT_EC_2Q25_0007,We expect Xbox content services revenue growth...,Microsoft predicts Xbox content services reven...,Microsoft expects modest growth in Xbox conten...,Microsoft expects:\n\n* **Xbox content and s...,Here is a concise summary of the transcript se...
8,MSFT_EC_2Q25_0008,Please proceed. Thank you guys for taking the ...,Microsoft had a strong quarter in terms of com...,Microsoft reported solid commercial bookings f...,"Microsoft's commercial bookings were solid, bu...",Here is a concise summary of the transcript se...
9,MSFT_EC_2Q25_0009,But what we're seeing is waiting to see just h...,"On the earnings call, Microsoft discussed the ...",Microsoft's AI revenue significantly exceeded ...,Microsoft is seeing strong growth in its AI in...,Here is a concise summary of the section withi...


## Summarizing the Chunks

In [36]:
import bots

llm = bots.of(llm_model_name)
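
The `bots` module is local to this project and is not shown here. The cells below only rely on one call shape: `llm.react(prompt, arguments=..., temperature=...)` returning a dictionary with a `"content"` entry. The stand-in below mirrors that shape for offline experiments. `EchoBot` is a hypothetical class, and filling the placeholders with `str.format` is an assumption about how `bots` treats the prompt, not a description of its real implementation.

```python
class EchoBot:
    """Hypothetical offline stand-in that mirrors how this notebook calls `llm.react`."""

    def __init__(self, max_chars=80):
        self.max_chars = max_chars

    def react(self, prompt, arguments=None, temperature=0.0):
        # Assumption: placeholders such as {text} are filled with str.format.
        filled = prompt.format(**(arguments or {}))
        # No model is called; `temperature` is accepted only to match the call shape.
        return {"content": filled.strip()[: self.max_chars]}


stub_llm = EchoBot()
stub_result = stub_llm.react(
    "Summarize this {article_type}:\n{text}",
    arguments={"article_type": "transcript", "text": "Revenue grew 12%."},
    temperature=0.4,
)
print(stub_result["content"])
```

Swapping `llm` for such a stub makes it possible to check the prompt wiring and the caching logic without spending API calls.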


In [37]:
from tqdm.notebook import tqdm

summary_field = f"summary_{llm_model_name}"

if steps["chunk_summaries"] or summary_field not in chunk_df.columns:


    # Ask LLM to summarize per chunk
    chunk_summary_prompt = """
    Concisely summarize the following section of a {article_type}.
    Only summarize within the scope of {scope}.
    ===
    {text}
    """

    chunk_summaries = []
    for chunk_content in tqdm(chunk_df["content"], desc="Summarize chunks"):
        result = llm.react(
            chunk_summary_prompt,
            arguments={
                "article_type": article_type,
                "text": chunk_content,
                "scope": scope,
            },
            temperature=temperature,
        )
        chunk_summaries.append(result["content"])

    chunk_df[summary_field] = chunk_summaries

    # Save the results
    chunk_df.to_parquet(chunk_path)
else:
    chunk_df = pd.read_parquet(chunk_path)
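
The per-chunk loop above is the first half of a two-stage (map-reduce style) summarization: each chunk is summarized on its own, and a later cell joins those summaries and turns them into one report. The sketch below shows that flow with toy functions in place of the LLM calls. `map_reduce`, `first_sentence`, and `bullet_report` are hypothetical names used only for illustration.

```python
def map_reduce(chunks, summarize_chunk, write_report, separator="\n\n"):
    """Summarize each chunk, then build one report from the joined summaries."""
    partials = [summarize_chunk(chunk) for chunk in chunks]  # map step
    report = write_report(separator.join(partials))          # reduce step
    return partials, report


def first_sentence(text):
    """Toy chunk summarizer: keep only the text up to the first period."""
    return text.split(".")[0].strip() + "."


def bullet_report(text):
    """Toy report writer: one bullet per chunk summary."""
    return "\n".join("- " + part for part in text.split("\n\n") if part)


partials, report = map_reduce(
    ["Cloud revenue grew. Details follow.", "Margins held steady. More details."],
    first_sentence,
    bullet_report,
)
print(report)
```

Keeping the two stages separate is what lets the notebook cache the chunk summaries and rerun only the final report.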

In [38]:
chunk_df

Unnamed: 0,chunk_id,content,summary_gpt-4,summary_gpt-4.5,summary_gemini-2,summary_llama-3
0,MSFT_EC_2Q25_0000,"MSFT_FY2Q25 (1).m4a\n\noperator assistance, pl...",Microsoft's Q2 FY25 earnings call included com...,Microsoft reported strong financial performanc...,Microsoft's FY2Q25 earnings call discussed fin...,Here is a concise summary of the Microsoft fin...
1,MSFT_EC_2Q25_0001,"From now on, it's a more\ncontinuous cycle gov...",Microsoft is enhancing its presence in every l...,Microsoft significantly expanded Azure data ce...,Microsoft is expanding data center capacity to...,Here is a concise summary of the Microsoft fin...
2,MSFT_EC_2Q25_0002,When you look at customers who purchased Copil...,"During the first quarter of availability, cust...",Microsoft reported robust financial and operat...,Microsoft is seeing strong customer adoption a...,Here is a concise summary of the Microsoft ear...
3,MSFT_EC_2Q25_0003,More professionals than ever are engaging in h...,Microsoft reported an increase in LinkedIn eng...,Microsoft reported strong financial and operat...,Key Microsoft financial and operational highli...,Here is a concise summary of the Microsoft fin...
4,MSFT_EC_2Q25_0004,Microsoft Cloud gross margin percentage was 70...,Microsoft reported that Cloud gross margin per...,Microsoft reported a gross margin percentage o...,"Microsoft's Cloud gross margin was 70%, decrea...",Here is a concise summary of the Microsoft fin...
5,MSFT_EC_2Q25_0005,Segment gross margin dollars increased 12% and...,Microsoft reported a 12-13% increase in segmen...,Microsoft's quarterly financial and operationa...,Here's a concise summary of the Microsoft fina...,Here is a concise summary of the transcript se...
6,MSFT_EC_2Q25_0006,We expect consistent execution\nacross our cor...,Microsoft expects consistent performance in it...,Microsoft provided forward financial guidance ...,Microsoft anticipates its Q3 financial perform...,Here is a concise summary of Microsoft's finan...
7,MSFT_EC_2Q25_0007,We expect Xbox content services revenue growth...,Microsoft predicts Xbox content services reven...,Microsoft expects modest growth in Xbox conten...,Microsoft expects:\n\n* **Xbox content and s...,Here is a concise summary of the transcript se...
8,MSFT_EC_2Q25_0008,Please proceed. Thank you guys for taking the ...,Microsoft had a strong quarter in terms of com...,Microsoft reported solid commercial bookings f...,"Microsoft's commercial bookings were solid, bu...",Here is a concise summary of the transcript se...
9,MSFT_EC_2Q25_0009,But what we're seeing is waiting to see just h...,"On the earnings call, Microsoft discussed the ...",Microsoft's AI revenue significantly exceeded ...,Microsoft is seeing strong growth in its AI in...,Here is a concise summary of the section withi...


In [39]:
# Print the chunk summaries in full, since the table view truncates them
for _, row in chunk_df.iterrows():
    print(f"==={row['chunk_id']}===\n{row[summary_field]}")

===MSFT_EC_2Q25_0000===
Here is a concise summary of the Microsoft financial and operational reports:

* Microsoft Cloud revenue surpassed $40 billion for the first time, up 21% year-over-year.
* AI business revenue has surpassed an annual revenue run rate of $13 billion, up 175% year-over-year.
* The company has more than doubled its data center capacity in the last three years and added more capacity last year than any other year in its history.
* Azure continues to expand its infrastructure layer for AI, with investments in data center capacity, networks, and silicon innovation.
===MSFT_EC_2Q25_0001===
Here is a concise summary of the Microsoft financial and operational reports:

* Azure data center capacity has more than doubled in the last three years, with the most capacity added last year.
* Microsoft Fabric has over 19,000 paid customers and is the fastest-growing analytics product in company history.
* Power BI has over 30 million monthly active users, up 40% year-over-year.
*

## Summarize the Summaries

In [40]:
report_en_path = os.path.join(data_root, reports_dir, f"{article_name}_report_en_{llm_model_name}.txt")

if steps["final_summary"] or not os.path.exists(report_en_path):
    final_summary_prompt = """
    You are to generate a report from the {article_type}.
    Organize related points into sections, using the following format:
    {sections}
    Report quantitatively if values are provided.
    ===
    {text}
    """

    chunk_summaries = "\n\n".join(list(chunk_df[summary_field]))

    result = llm.react(
        final_summary_prompt,
        arguments={
            "article_type": article_type,
            "text": chunk_summaries,
            "scope": scope,
            "sections": report_sections,
        },
        temperature=temperature,
    )
    report_en = result["content"]

    with open(report_en_path, "w", encoding="utf-8") as fd:
        fd.write(report_en)

else:
    with open(report_en_path, "r", encoding="utf-8") as fd:
        report_en = fd.read()

print(report_en)

**Conclusion**
Microsoft's earnings call transcript highlights the company's strong financial and operational performance, with significant growth in cloud revenue, AI business, and commercial bookings. The company's investments in AI infrastructure, data center capacity, and software optimizations have driven revenue growth and improved efficiency. Microsoft's focus on innovation, customer adoption, and partnerships has positioned the company for long-term success.

**Key Points**
- Cloud revenue surpassed $40 billion, up 21% year-over-year.
- AI business revenue reached an annual run rate of $13 billion, up 175% year-over-year.
- Commercial bookings grew 67% year-over-year, with a 34% increase in commercial remaining performance obligation.
- Azure data center capacity more than doubled in the last three years.
- Microsoft 365 commercial cloud revenue grew 16% year-over-year.
- LinkedIn revenue increased 9% year-over-year.

**Contents**
### Analysis
Microsoft's financial performance 

In [42]:
report_zh_path = os.path.join(data_root, reports_dir, f"{article_name}_report_zh_{llm_model_name}.txt")

if steps["final_summary"] or not os.path.exists(report_zh_path):
    translation_prompt = """
    Translate the following text into Traditional Chinese.
    Do not translate technical terms.
    ===
    {text}
    """

    result = llm.react(translation_prompt, arguments={
        "text": report_en,
    })
    report_zh = result["content"]

    with open(report_zh_path, "w", encoding="utf-8") as fd:
        fd.write(report_zh)

else:
    with open(report_zh_path, "r", encoding="utf-8") as fd:
        report_zh = fd.read()

print(report_zh)

**結論**
Microsoft 的財報電話會議紀錄凸顯了公司優異的財務和營運表現，尤其是在雲端收入、AI 商業和商業訂單方面取得顯著成長。公司在 AI 基礎設施、數據中心容量和軟體最佳化方面的投資推動了收入增長和效率改善。Microsoft 對創新的重視、客戶採用和合作夥伴關係使公司順利邁向長期成功。

**重點**
- 雲端收入超過 40 億美元，同比增長 21%。
- AI 商業收入達到年增率 13 億美元，同比增長 175%。
- 商業訂單增長 67%，商業剩餘貢獻額增加 34%。
- Azure 數據中心容量在過去三年內增加了一倍以上。
- Microsoft 365 商業雲端收入增長 16%。
- LinkedIn 收入增加 9%。

**內容**
### 分析
Microsoft 的財務表現優異，收入同比增長 12% 至 696 億美元。公司的毛利金額同比增長 13%，營運收入增加 17%。每股收益為 3.23 美元，同比增長 10%。公司的雲端收入增長由 Azure 驅動，Azure 擴展了其基礎設施層以支援 AI，而 Microsoft 365 則加速了客戶採用。

### 未來展望
Microsoft 對 Q3 的預測包括多個分部的收入增長，包括生產力和商業流程、Intelligent Cloud 和個人電腦。公司預期營運利潤率會增加，推動因素是銷售結構向高利潤率商業的轉變。Microsoft 還預期在 FY25 中看到兩位數的收入和營運收入增長，營運利潤率也會同比增加。

### 營運亮點
- Azure 和其他雲端服務收入同比增長 31%，Azure AI 服務增長 157%。
- Microsoft 365 商業雲端收入同比增長 16%，由 E5 和 Microsoft 365 Copilot 驅動。
- LinkedIn 收入增加 9%，Dynamics 365 收入增加 19%。
- GitHub 擁有 1.5 億開發人員，與過去兩年相比增長 50%，GitHub Copilot 在第一周內有超過 100 萬人的註冊。
- Power BI 擁有超過 3000 萬月活躍用戶，同比增長 40%。
