# Earnings Call Summary POC

A proof of concept (POC) for summarizing earnings call transcripts.

## Requirements
#### Package Requirements
This notebook was created with the following packages:
- python                    3.11
- llama-index               0.12.25
- ragas                     0.2.14

#### Other Requirements
- Environment variable `OPENAI_API_KEY`. Needed by LlamaIndex, which uses OpenAI's GPT-3.5 as its default LLM to answer queries.
- Environment variable `DEEPINFRA_API_KEY`. Needed for REST API access to the LLM models hosted on DeepInfra.

In [14]:
import pandas as pd

## Set up Environment

Set up environment-specific parameters. Modify these to suit your local environment.

In [15]:
#
# Locations of the data sources
#

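data_root = "../data"         # Root directory of all data
ec_dir = "earning_calls"      # Earnings call transcripts, relative to data_root
working_dir = "working"       # Intermediate results (chunks and chunk summaries)
reports_dir = "reports"       # Final reports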
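A minimal sketch of building the same output locations with `pathlib` and creating them up front, assuming you prefer `pathlib` to `os.path.join`. `make_dirs` is a hypothetical helper and is not used elsewhere in the notebook.

```python
from pathlib import Path


def make_dirs(data_root: str, working_dir: str, reports_dir: str) -> dict[str, Path]:
    """Hypothetical helper: resolve the output directories and create them if missing."""
    root = Path(data_root)
    dirs = {
        "working": root / working_dir,   # chunk and chunk-summary Parquet files
        "reports": root / reports_dir,   # final English and translated reports
    }
    for path in dirs.values():
        # parents=True also creates data_root; exist_ok=True makes repeated calls harmless
        path.mkdir(parents=True, exist_ok=True)
    return dirs


# Example usage with the values above:
# dirs = make_dirs(data_root, working_dir, reports_dir)
```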


In [16]:
import os

# Keys for LLM access
openai_key = os.environ.get("OPENAI_API_KEY")
# hf_key = os.environ.get("HUGGINGFACEHUB_API_TOKEN")
di_key = os.environ.get("DEEPINFRA_API_KEY")

if not openai_key:
    raise EnvironmentError("OPENAI_API_KEY must be provided for this notebook to work. It is needed by LlamaIndex.")

# if not hf_key:
#     raise EnvironmentError("HUGGINGFACEHUB_API_TOKEN must be provided for this notebook to work. It is needed for query expansion with DeepSeek-R1.")

if not di_key:
    raise EnvironmentError("DEEPINFRA_API_KEY must be provided to run models on DeepInfra.")
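The checks above stop at the first missing key, so a user missing both keys has to rerun the cell to discover the second one. A minimal sketch that reports every missing variable at once; `require_env` is a hypothetical helper, and the variable names in the usage line are the ones listed under Other Requirements.

```python
import os
from collections.abc import Mapping


def require_env(names: list[str], env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Hypothetical helper: return the requested variables, or raise listing all missing ones."""
    # Empty strings count as missing, matching the `if not key` checks above
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise EnvironmentError(f"Missing required environment variables: {', '.join(missing)}")
    return {name: env[name] for name in names}


# Example usage:
# keys = require_env(["OPENAI_API_KEY", "DEEPINFRA_API_KEY"])
```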

In [17]:
#
# Tweak these values
#

# Chunk size and overlap, in tokens (as measured by SentenceSplitter)
chunk_size = 1000
chunk_overlap = 200

# Article to summarize: a description used in the prompts, a short name
# used for output file names, and its path relative to the earnings call directory
article_type = "transcript of the earnings call"
article_name = "MSFT_EC_2Q25"
article_file = "msft/MSFT_FY2Q25__1__m4a_Good_Tape_2025-03-19.txt"

# Summarization scope
scope = "Microsoft financial and operational reports"

# Sections that the report must include
report_sections = [
    "Executive Summary", "Future Outlook",
]

# LLM model (uncomment exactly one)
# llm_model_name = "gpt-4"
# llm_model_name = "gpt-4.5"
# llm_model_name = "llama-3"
llm_model_name = "gemini-2"

In [18]:
# Steps in this notebook whose saved results can be forcibly refreshed.
# Many steps are time-consuming, so their results are saved in the data directory.
# If a saved result exists, it is reloaded instead of being recalculated.
# Setting a step to True forces that step to be recalculated.
steps = {
    "chunking": False,          # Read the article and chunk it
    "chunk_summaries": False,   # Summarize each chunk
    "final_summary": True,      # Summarize the chunk summaries (also controls the translation)
}
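Each step below follows the same pattern: recompute when the step is forced or its saved result is missing, otherwise reload the saved result. A minimal sketch of that pattern as one reusable function, assuming plain-text results; `cached_step` is hypothetical, since the notebook inlines this logic in each cell and uses Parquet for the chunk data.

```python
import os
from typing import Callable


def cached_step(path: str, force: bool, compute: Callable[[], str]) -> str:
    """Hypothetical helper: return the cached text at `path`, recomputing when forced or missing."""
    if force or not os.path.exists(path):
        result = compute()
        # Create the parent directory so the first run does not fail on a missing folder
        os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
        with open(path, "w", encoding="utf-8") as fd:
            fd.write(result)
        return result
    with open(path, "r", encoding="utf-8") as fd:
        return fd.read()
```

Passing the flag and the output path together into one call makes it harder for a step to test one file and write another.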

## Reading and Chunking

Read the transcript and chunk it.

In [19]:
from llama_index.core.node_parser import SentenceSplitter

article_path = os.path.join(data_root, ec_dir, article_file)
chunk_path = os.path.join(data_root, working_dir, f"{article_name}_chunks.parquet")

if steps["chunking"] or not os.path.exists(chunk_path):

    # Read the transcript
    with open(article_path, "r", encoding="utf-8") as tfd:
        transcript_content = tfd.read()

    # Initialize the SentenceSplitter
    sentence_splitter = SentenceSplitter(chunk_size=chunk_size, chunk_overlap=chunk_overlap)

    # Split the text into chunks
    chunks = sentence_splitter.split_text(transcript_content)

    # Put the chunks into a DataFrame with zero-padded chunk IDs
    chunk_ids = [f"{article_name}_{i:04d}" for i in range(len(chunks))]
    chunk_df = pd.DataFrame(list(zip(chunk_ids, chunks)), columns=["chunk_id", "content"])

    # Save the results, creating the working directory if needed
    os.makedirs(os.path.dirname(chunk_path), exist_ok=True)
    chunk_df.to_parquet(chunk_path)
else:
    chunk_df = pd.read_parquet(chunk_path)
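`SentenceSplitter` measures `chunk_size` and `chunk_overlap` in tokens and prefers to break at sentence boundaries. To show only what the overlap does, here is a simplified sketch that counts whitespace-separated words instead of tokens and ignores sentence boundaries; `sliding_window_chunks` and `chunk_ids` are hypothetical helpers, and this is an illustration, not a re-implementation of LlamaIndex.

```python
def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into windows of `chunk_size` words; consecutive windows share `chunk_overlap` words."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    words = text.split()
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reaches the end of the text
    return chunks


def chunk_ids(article_name: str, n: int) -> list[str]:
    # Same zero-padded ID scheme as the cell above, e.g. MSFT_EC_2Q25_0000
    return [f"{article_name}_{i:04d}" for i in range(n)]


text = " ".join(f"w{i}" for i in range(10))
chunks = sliding_window_chunks(text, chunk_size=4, chunk_overlap=1)
# chunks == ["w0 w1 w2 w3", "w3 w4 w5 w6", "w6 w7 w8 w9"]
```

With the notebook's settings (`chunk_size = 1000`, `chunk_overlap = 200`), each window in this simplified scheme would advance by 800 units.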

In [20]:
chunk_df

|   | chunk_id | content | summary | summary_gpt-4 | summary_gpt-4.5 |
|---|---|---|---|---|---|
| 0 | MSFT_EC_2Q25_0000 | MSFT_FY2Q25 (1).m4a\n\noperator assistance, pl... | Here is a concise summary of the section withi... | Microsoft's Q2 FY25 earnings call included com... | Microsoft reported strong financial performanc... |
| 1 | MSFT_EC_2Q25_0001 | From now on, it's a more\ncontinuous cycle gov... | Here is a concise summary of the transcript se... | Microsoft is enhancing its presence in every l... | Microsoft significantly expanded Azure data ce... |
| 2 | MSFT_EC_2Q25_0002 | When you look at customers who purchased Copil... | Here is a concise summary of the Microsoft ear... | During the first quarter of availability, cust... | Microsoft reported robust financial and operat... |
| 3 | MSFT_EC_2Q25_0003 | More professionals than ever are engaging in h... | Here is a concise summary of the section withi... | Microsoft reported an increase in LinkedIn eng... | Microsoft reported strong financial and operat... |
| 4 | MSFT_EC_2Q25_0004 | Microsoft Cloud gross margin percentage was 70... | Here is a concise summary of Microsoft's finan... | Microsoft reported that Cloud gross margin per... | Microsoft reported a gross margin percentage o... |
| 5 | MSFT_EC_2Q25_0005 | Segment gross margin dollars increased 12% and... | Here is a concise summary of the Microsoft fin... | Microsoft reported a 12-13% increase in segmen... | Microsoft's quarterly financial and operationa... |
| 6 | MSFT_EC_2Q25_0006 | We expect consistent execution\nacross our cor... | Here is a concise summary of Microsoft's finan... | Microsoft expects consistent performance in it... | Microsoft provided forward financial guidance ... |
| 7 | MSFT_EC_2Q25_0007 | We expect Xbox content services revenue growth... | Here is a concise summary of the transcript wi... | Microsoft predicts Xbox content services reven... | Microsoft expects modest growth in Xbox conten... |
| 8 | MSFT_EC_2Q25_0008 | Please proceed. Thank you guys for taking the ... | Here is a concise summary of the section withi... | Microsoft had a strong quarter in terms of com... | Microsoft reported solid commercial bookings f... |
| 9 | MSFT_EC_2Q25_0009 | But what we're seeing is waiting to see just h... | Here is a concise summary of the section withi... | On the earnings call, Microsoft discussed the ... | Microsoft's AI revenue significantly exceeded ... |


## Summarizing the Chunks

In [21]:
import bots

llm = bots.of(llm_model_name)


In [22]:
from tqdm.notebook import tqdm

summary_field = f"summary_{llm_model_name}"

if steps["chunk_summaries"] or summary_field not in chunk_df.columns:

    # Ask the LLM to summarize each chunk
    chunk_summary_prompt = """
    Concisely summarize the following section of a {article_type}.
    Only summarize within the scope of {scope}.
    ===
    {text}
    """

    chunk_summaries = []
    for chunk_content in tqdm(chunk_df["content"], desc="Summarize chunks"):
        result = llm.react(
            chunk_summary_prompt,
            arguments={
                "article_type": article_type,
                "text": chunk_content,
                "scope": scope,
            })
        chunk_summaries.append(result["content"])

    chunk_df[summary_field] = chunk_summaries

    # Save the results
    chunk_df.to_parquet(chunk_path)
else:
    chunk_df = pd.read_parquet(chunk_path)


In [23]:
chunk_df

|   | chunk_id | content | summary | summary_gpt-4 | summary_gpt-4.5 | summary_gemini-2 |
|---|---|---|---|---|---|---|
| 0 | MSFT_EC_2Q25_0000 | MSFT_FY2Q25 (1).m4a\n\noperator assistance, pl... | Here is a concise summary of the section withi... | Microsoft's Q2 FY25 earnings call included com... | Microsoft reported strong financial performanc... | Microsoft's FY2Q25 earnings call discussed fin... |
| 1 | MSFT_EC_2Q25_0001 | From now on, it's a more\ncontinuous cycle gov... | Here is a concise summary of the transcript se... | Microsoft is enhancing its presence in every l... | Microsoft significantly expanded Azure data ce... | Microsoft is expanding data center capacity to... |
| 2 | MSFT_EC_2Q25_0002 | When you look at customers who purchased Copil... | Here is a concise summary of the Microsoft ear... | During the first quarter of availability, cust... | Microsoft reported robust financial and operat... | Microsoft is seeing strong customer adoption a... |
| 3 | MSFT_EC_2Q25_0003 | More professionals than ever are engaging in h... | Here is a concise summary of the section withi... | Microsoft reported an increase in LinkedIn eng... | Microsoft reported strong financial and operat... | Key Microsoft financial and operational highli... |
| 4 | MSFT_EC_2Q25_0004 | Microsoft Cloud gross margin percentage was 70... | Here is a concise summary of Microsoft's finan... | Microsoft reported that Cloud gross margin per... | Microsoft reported a gross margin percentage o... | Microsoft's Cloud gross margin was 70%, decrea... |
| 5 | MSFT_EC_2Q25_0005 | Segment gross margin dollars increased 12% and... | Here is a concise summary of the Microsoft fin... | Microsoft reported a 12-13% increase in segmen... | Microsoft's quarterly financial and operationa... | Here's a concise summary of the Microsoft fina... |
| 6 | MSFT_EC_2Q25_0006 | We expect consistent execution\nacross our cor... | Here is a concise summary of Microsoft's finan... | Microsoft expects consistent performance in it... | Microsoft provided forward financial guidance ... | Microsoft anticipates its Q3 financial perform... |
| 7 | MSFT_EC_2Q25_0007 | We expect Xbox content services revenue growth... | Here is a concise summary of the transcript wi... | Microsoft predicts Xbox content services reven... | Microsoft expects modest growth in Xbox conten... | Microsoft expects:\n\n* **Xbox content and s... |
| 8 | MSFT_EC_2Q25_0008 | Please proceed. Thank you guys for taking the ... | Here is a concise summary of the section withi... | Microsoft had a strong quarter in terms of com... | Microsoft reported solid commercial bookings f... | Microsoft's commercial bookings were solid, bu... |
| 9 | MSFT_EC_2Q25_0009 | But what we're seeing is waiting to see just h... | Here is a concise summary of the section withi... | On the earnings call, Microsoft discussed the ... | Microsoft's AI revenue significantly exceeded ... | Microsoft is seeing strong growth in its AI in... |


In [24]:
# Print the chunk summaries for easier reading
for _, row in chunk_df.iterrows():
    print(f"==={row['chunk_id']}===\n{row[summary_field]}")

===MSFT_EC_2Q25_0000===
Microsoft's FY2Q25 earnings call discussed financial results and operational performance. Key points include:

*   Microsoft Cloud revenue surpassed $40 billion, up 21% year-over-year.
*   AI business annual revenue run rate exceeded $13 billion, up 175% year-over-year.
*   Focus on scaling the global data center fleet to balance AI training/inference and geographic distribution.
*   Data center capacity has more than doubled in the last three years, with the largest capacity increase in the most recent year.
*   Azure continues to see cloud migration from customers like UBS.
===MSFT_EC_2Q25_0001===
Microsoft is expanding data center capacity to meet AI demand, having more than doubled capacity in the last three years and adding more capacity this past year than any other. Customers like UBS are migrating workloads to Azure, which remains the preferred cloud for mission-critical apps. Microsoft Fabric is experiencing rapid growth with over 19,000 paid customers 
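The chunk-summary prompt is written as a `str.format`-style template whose placeholders match the keys of `arguments`. How `bots` fills the template in is not shown in this notebook; assuming it behaves like `str.format`, the sketch below renders a template locally and uses `textwrap.dedent` to drop the indentation the triple-quoted string picks up inside the `if` block. `render_prompt` is a hypothetical helper.

```python
import textwrap


def render_prompt(template: str, arguments: dict[str, str]) -> str:
    """Hypothetical helper: dedent a triple-quoted template and fill its placeholders."""
    # Dedent before formatting so the inserted text cannot affect the common margin.
    # str.format raises KeyError for a missing placeholder and silently ignores unused keys.
    return textwrap.dedent(template).strip().format(**arguments)


chunk_summary_prompt = """
    Concisely summarize the following section of a {article_type}.
    Only summarize within the scope of {scope}.
    ===
    {text}
    """

prompt = render_prompt(chunk_summary_prompt, {
    "article_type": "transcript of the earnings call",
    "scope": "Microsoft financial and operational reports",
    "text": "Revenue was $69.6 billion, up 12%.",
})
print(prompt)
```

Under these `str.format` semantics, passing `"scope"` to the final-summary prompt, which has no `{scope}` placeholder, would be ignored rather than raise an error.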

## Summarize the Summaries

In [25]:
report_en_path = os.path.join(data_root, reports_dir, f"{article_name}_report_en_{llm_model_name}.txt")

if steps["final_summary"] or not os.path.exists(report_en_path):
    final_summary_prompt = """
    You are to generate a report from a {article_type}.
    Organize related points in sections.
    The sections shall include, but are not limited to, {sections}.
    ===
    {text}
    """

    # Join the per-chunk summaries, in chunk order, as the input text
    chunk_summaries = "\n\n".join(chunk_df[summary_field])

    result = llm.react(final_summary_prompt, arguments={
        "article_type": article_type,
        "text": chunk_summaries,
        "scope": scope,
        "sections": ", ".join(report_sections),
    })
    report_en = result["content"]

    # Create the reports directory if needed, then save the report
    os.makedirs(os.path.dirname(report_en_path), exist_ok=True)
    with open(report_en_path, "w", encoding="utf-8") as fd:
        fd.write(report_en)

else:
    with open(report_en_path, "r", encoding="utf-8") as fd:
        report_en = fd.read()

print(report_en)

## Microsoft FY2Q25 Earnings Call Report

### Executive Summary

Microsoft's FY2Q25 earnings call highlighted strong financial results, driven by robust growth in its cloud and AI businesses. Microsoft Cloud revenue surpassed $40 billion, while the AI business annual revenue run rate exceeded $13 billion. The company is aggressively expanding its data center capacity to meet burgeoning AI demand. Key growth areas include Azure, Microsoft Fabric, Power BI, and Microsoft 365 Copilot. Financial metrics showcased solid revenue growth, increased commercial bookings, and healthy operating margins. While Azure revenue was at the lower end of guidance due to non-AI related sales execution issues, these were offset by strong AI services performance. The outlook for Q3 and the full fiscal year remains positive, with continued investments in AI infrastructure and a focus on driving efficiency and scalability.

### Financial Performance

*   **Overall Results:** Revenue was \$69.6 billion, up 12%.
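Taken together, the two summarization steps form a map-reduce pipeline: each chunk is summarized independently (map), then the chunk summaries are joined and summarized once more (reduce). A minimal sketch of that flow with the LLM replaced by a plain callable so it runs offline; `summarize_map_reduce` and the stub `fake_llm` are hypothetical and only stand in for `llm.react`, whose actual interface is defined in the `bots` module.

```python
from typing import Callable


def summarize_map_reduce(
    chunks: list[str],
    llm: Callable[[str], str],
    map_template: str,
    reduce_template: str,
) -> tuple[list[str], str]:
    """Summarize each chunk, then summarize the joined chunk summaries."""
    # Map: one LLM call per chunk, in chunk order
    chunk_summaries = [llm(map_template.format(text=chunk)) for chunk in chunks]
    # Reduce: join with blank lines, as the notebook does, and make one final call
    joined = "\n\n".join(chunk_summaries)
    report = llm(reduce_template.format(text=joined))
    return chunk_summaries, report


def fake_llm(prompt: str) -> str:
    # Stub "LLM": echoes the text after the "===" separator, so no API call is needed
    return prompt.split("===\n", 1)[1]


summaries, report = summarize_map_reduce(
    ["Revenue was $69.6 billion.", "Azure and other cloud services revenue grew 31%."],
    fake_llm,
    map_template="Summarize:\n===\n{text}",
    reduce_template="Write a report:\n===\n{text}",
)
```

Because the reduce step sees only the concatenated summaries, its input grows with the number of chunks; for transcripts much longer than this one, the joined summaries could themselves need chunking before the final call.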

In [26]:
report_zh_path = os.path.join(data_root, reports_dir, f"{article_name}_report_zh_{llm_model_name}.txt")

if steps["final_summary"] or not os.path.exists(report_zh_path):
    translation_prompt = """
    Translate the following text into Traditional Chinese.
    Do not translate technical terms.
    ===
    {text}
    """

    result = llm.react(translation_prompt, arguments={
        "text": report_en,
    })
    report_zh = result["content"]

    with open(report_zh_path, "w", encoding="utf-8") as fd:
        fd.write(report_zh)

else:
    with open(report_zh_path, "r", encoding="utf-8") as fd:
        report_zh = fd.read()

print(report_zh)

## 微軟 2025 財年第二季度財報電話會議報告

### 執行摘要

微軟 2025 財年第二季度財報電話會議重點闡述了強勁的財務業績，這得益於其雲端和人工智慧業務的穩健增長。Microsoft Cloud 營收超過 400 億美元，而人工智慧業務的年度營收運營率超過 130 億美元。該公司正在積極擴張其資料中心容量，以滿足不斷增長的人工智慧需求。主要增長領域包括 Azure、Microsoft Fabric、Power BI 和 Microsoft 365 Copilot。財務指標顯示出穩健的營收增長、增加的商業預訂量和健康的營業利潤率。儘管由於非人工智慧相關的銷售執行問題，Azure 營收處於指導範圍的下限，但這些問題被強勁的人工智慧服務表現所抵消。第三季度和整個財年的前景依然樂觀，將持續對人工智慧基礎設施進行投資，並專注於提高效率和可擴展性。

### 財務表現

*   **整體業績：** 營收為 696 億美元，增長 12%。
*   **人工智慧營收：** 人工智慧業務年度營收運營率超過 130 億美元，超出預期（同比增長 175%）。
*   **Microsoft Cloud：** 營收達到 409 億美元，增長 21%。毛利率為 70%，同比下降兩個百分點，原因是擴展人工智慧基礎設施。
*   **商業預訂量：** 超出預期，增長 67%（按固定匯率計算增長 75%），受益於 OpenAI 的 Azure 承諾。
*   **剩餘履約義務 (RPO)：** 增加至 2980 億美元，增長 34%。
*   **毛利率：** 整體毛利率同比略有增長，達到 69%。
*   **營業利潤率：** 同比增長兩個百分點，達到 45%。
*   **營業費用：** 增長 5%。
*   **員工人數：** 同比增長 2%。
*   **生產力和業務流程：** 營收為 294 億美元，增長 14%。Microsoft 365 商業雲營收增長 16%。
*   **智能雲：** 營收為 255 億美元，增長 19%。Azure 和其他雲服務營收增長 31%，其中包括來自人工智慧服務的 13 個百分點（增長 157%）。非人工智慧服務的增長略低於預期。
*   **更多個人運算：** 營收為 147 億美元，相對持平。Windows OEM 營收增長 4%。搜