# Earnings Call Summary POC

A proof of concept (POC) for summarizing the transcript of an earnings call.

## Requirements
#### Package Requirements
This notebook was created with the following packages:
- python                    3.11
- llama-index               0.12.25
- ragas                     0.2.14

#### Other Requirements
- Environment variable `OPENAI_API_KEY`. LlamaIndex needs this to use its default GPT-3.5 model to answer queries.
- Environment variable `DEEPINFRA_API_KEY`. This is needed for REST API access to the LLMs hosted on DeepInfra.

In [1]:
import pandas as pd

## Set up Environment

Set the environment-specific parameters. Modify these to suit your local environment.

In [2]:
#
# Locations of the data sources
#

data_root = "../data"         # Root directory of the data
ec_dir = "earning_calls"
working_dir = "working"
reports_dir = "reports"


In [3]:
import os

# Keys for LLM access
openai_key = os.environ.get("OPENAI_API_KEY")
# hf_key = os.environ.get("HUGGINGFACEHUB_API_TOKEN")
di_key = os.environ.get("DEEPINFRA_API_KEY")

if not openai_key:
    raise EnvironmentError("OPENAI_API_KEY must be provided for this notebook to work. It is needed by LlamaIndex.")

# if not hf_key:
#     raise EnvironmentError(f"Need HuggingFace token for this notebook to work.  Needed for query extension with DeepSeek-R1"  )

if not di_key:
    raise EnvironmentError("DEEPINFRA_API_KEY is needed to run models on DeepInfra.")

In [4]:
#
# Tweak these values
#

# Chunking size
chunk_size = 1000
chunk_overlap = 200

# Type of article
article_type = "transcript of the earnings call"
article_name = "MSFT_EC_2Q25"
article_file = "msft/MSFT_FY2Q25__1__m4a_Good_Tape_2025-03-19.txt"

# Summarization scopes
scope = "Microsoft financial and operational reports"

# Sections that shall be in the report
report_sections = """
1. Conclusion
2. Key Points
3. Contents
3.1. Situation Analysis
3.2. Future Outlook
3.3. Operation Highlights
"""

# LLM models
# llm_model_name = "gpt-4"
# llm_model_name = "gpt-4.5"
llm_model_name = "llama-3"
# llm_model_name = "gemini-2"

In [5]:
# Steps in this notebook that can be forced to refresh.
# Many of the steps are time-consuming, so I save their results in the data directory.
# If the saved results exist, I reload them instead of recalculating them.
# Setting a step to True forces the code to recalculate the result for that step.
steps = {
    "chunking": False,                       # Input the article and do chunking
    "chunk_summaries": False,                # Per chunk summarization
    "final_summary": True,                   # Summarize the chunk summaries
}
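
The reload-or-recompute rule described in the comments above can be sketched as one helper. This is only an illustration: `load_or_compute`, `save_text`, `load_text`, and `expensive_step` are hypothetical names that do not appear in the notebook, and the demo writes to a temporary directory rather than to `data_root`.

```python
import os
import tempfile


def load_or_compute(path, force, compute, load, save):
    """Return the cached result at `path`, or recompute and save it.

    Hypothetical helper: recomputes when `force` is True or the cache is missing.
    """
    if force or not os.path.exists(path):
        result = compute()
        save(result, path)
        return result
    return load(path)


def save_text(text, path):
    with open(path, "w", encoding="utf-8") as fd:
        fd.write(text)


def load_text(path):
    with open(path, "r", encoding="utf-8") as fd:
        return fd.read()


calls = []  # Records each time the expensive step actually runs


def expensive_step():
    calls.append(1)
    return "summary text"


with tempfile.TemporaryDirectory() as tmp_dir:
    cache_path = os.path.join(tmp_dir, "demo_report.txt")
    first = load_or_compute(cache_path, False, expensive_step, load_text, save_text)   # computes
    second = load_or_compute(cache_path, False, expensive_step, load_text, save_text)  # reloads
    third = load_or_compute(cache_path, True, expensive_step, load_text, save_text)    # forced
```

In this demo the expensive step runs twice: once because no cache exists yet, and once because the refresh is forced.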

## Reading and Chunking

Read the transcript and chunk it.
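
To show what `chunk_size` and `chunk_overlap` control, here is a simplified sketch of overlapping chunking. It is not how `SentenceSplitter` works internally: that class measures size in tokens and prefers sentence boundaries, whereas this hypothetical `chunk_words` function simply counts whitespace-separated words.

```python
def chunk_words(text, chunk_size, chunk_overlap):
    """Split text into windows of `chunk_size` words.

    Each window repeats the last `chunk_overlap` words of the previous one.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once a window has reached the end of the text
        if start + chunk_size >= len(words):
            break
    return chunks


demo_text = " ".join(f"w{i}" for i in range(10))
demo_chunks = chunk_words(demo_text, chunk_size=4, chunk_overlap=1)

# Same ID scheme as the notebook: a name prefix plus a zero-padded index
demo_ids = [f"DEMO_{i:04d}" for i in range(len(demo_chunks))]
```

The overlap means a statement that straddles a chunk boundary still appears whole in at least one chunk, at the cost of summarizing some text twice.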

In [6]:
from llama_index.core.node_parser import SentenceSplitter

article_path = os.path.join(data_root, ec_dir, article_file)
chunk_path = os.path.join(data_root, working_dir, f"{article_name}_chunks.parquet")

if steps["chunking"] or not os.path.exists(chunk_path):

    # Input
    with open(article_path, "r", encoding="utf-8") as tfd:
        transcript_content = tfd.read()

    # Initialize the SentenceSplitter
    sentence_splitter = SentenceSplitter(chunk_size=chunk_size, chunk_overlap=chunk_overlap)

    # Split the text into chunks
    chunks = sentence_splitter.split_text(transcript_content)

    # Put into Pandas
    chunk_ids = [f"{article_name}_{i:04d}" for i in range(len(chunks))]
    chunk_df = pd.DataFrame(zip(chunk_ids, chunks), columns=["chunk_id", "content"])

    # Save the results
    chunk_df.to_parquet(chunk_path)
else:
    chunk_df = pd.read_parquet(chunk_path)

In [7]:
chunk_df

Unnamed: 0,chunk_id,content,summary,summary_gpt-4,summary_gpt-4.5,summary_gemini-2
0,MSFT_EC_2Q25_0000,"MSFT_FY2Q25 (1).m4a\n\noperator assistance, pl...",Here is a concise summary of the section withi...,Microsoft's Q2 FY25 earnings call included com...,Microsoft reported strong financial performanc...,Microsoft's FY2Q25 earnings call discussed fin...
1,MSFT_EC_2Q25_0001,"From now on, it's a more\ncontinuous cycle gov...",Here is a concise summary of the transcript se...,Microsoft is enhancing its presence in every l...,Microsoft significantly expanded Azure data ce...,Microsoft is expanding data center capacity to...
2,MSFT_EC_2Q25_0002,When you look at customers who purchased Copil...,Here is a concise summary of the Microsoft ear...,"During the first quarter of availability, cust...",Microsoft reported robust financial and operat...,Microsoft is seeing strong customer adoption a...
3,MSFT_EC_2Q25_0003,More professionals than ever are engaging in h...,Here is a concise summary of the section withi...,Microsoft reported an increase in LinkedIn eng...,Microsoft reported strong financial and operat...,Key Microsoft financial and operational highli...
4,MSFT_EC_2Q25_0004,Microsoft Cloud gross margin percentage was 70...,Here is a concise summary of Microsoft's finan...,Microsoft reported that Cloud gross margin per...,Microsoft reported a gross margin percentage o...,"Microsoft's Cloud gross margin was 70%, decrea..."
5,MSFT_EC_2Q25_0005,Segment gross margin dollars increased 12% and...,Here is a concise summary of the Microsoft fin...,Microsoft reported a 12-13% increase in segmen...,Microsoft's quarterly financial and operationa...,Here's a concise summary of the Microsoft fina...
6,MSFT_EC_2Q25_0006,We expect consistent execution\nacross our cor...,Here is a concise summary of Microsoft's finan...,Microsoft expects consistent performance in it...,Microsoft provided forward financial guidance ...,Microsoft anticipates its Q3 financial perform...
7,MSFT_EC_2Q25_0007,We expect Xbox content services revenue growth...,Here is a concise summary of the transcript wi...,Microsoft predicts Xbox content services reven...,Microsoft expects modest growth in Xbox conten...,Microsoft expects:\n\n* **Xbox content and s...
8,MSFT_EC_2Q25_0008,Please proceed. Thank you guys for taking the ...,Here is a concise summary of the section withi...,Microsoft had a strong quarter in terms of com...,Microsoft reported solid commercial bookings f...,"Microsoft's commercial bookings were solid, bu..."
9,MSFT_EC_2Q25_0009,But what we're seeing is waiting to see just h...,Here is a concise summary of the section withi...,"On the earnings call, Microsoft discussed the ...",Microsoft's AI revenue significantly exceeded ...,Microsoft is seeing strong growth in its AI in...


## Summarizing the Chunks

In [8]:
import bots

llm = bots.of(llm_model_name)
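
The `bots` module is local to this project and is not shown here. The cells below rely only on `react(prompt, arguments=...)` returning a mapping with a `"content"` key. A hypothetical stand-in with that call shape, useful for dry-running the notebook without API keys, might look like this. It assumes placeholders such as `{text}` are filled the way `str.format` fills them, which the real module may not do.

```python
class EchoBot:
    """Hypothetical stand-in with the call shape used in this notebook."""

    def __init__(self, name):
        self.name = name

    def react(self, prompt, arguments=None):
        # Assumption: placeholders are filled like str.format; extra keys are ignored
        filled = prompt.format(**(arguments or {}))
        # A real bot would send `filled` to a model; this one returns it unchanged
        return {"content": filled.strip()}


demo_bot = EchoBot("echo")
demo_result = demo_bot.react(
    "Concisely summarize the following section of a {article_type}.\n===\n{text}",
    arguments={"article_type": "transcript", "text": "Revenue grew.", "scope": "unused"},
)
```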


In [9]:
from tqdm.notebook import tqdm

summary_field = f"summary_{llm_model_name}"

if steps["chunk_summaries"] or summary_field not in chunk_df.columns:


    # Ask LLM to summarize per chunk
    chunk_summary_prompt = """
    Concisely summarize the following section of a {article_type}.
    Only summarize within the scope of {scope}.
    ===
    {text}
    """

    chunk_summaries = []
    for chunk_content in tqdm(chunk_df["content"], desc="Summarize chunks"):
        result = llm.react(
            chunk_summary_prompt,
            arguments={
                "article_type": article_type,
                "text": chunk_content,
                "scope": scope,
            })
        chunk_summaries.append(result["content"])

    chunk_df[summary_field] = chunk_summaries

    # Save the results
    chunk_df.to_parquet(chunk_path)
else:
    chunk_df = pd.read_parquet(chunk_path)

Summarize chunks:   0%|          | 0/14 [00:00<?, ?it/s]

In [10]:
chunk_df

Unnamed: 0,chunk_id,content,summary,summary_gpt-4,summary_gpt-4.5,summary_gemini-2,summary_llama-3
0,MSFT_EC_2Q25_0000,"MSFT_FY2Q25 (1).m4a\n\noperator assistance, pl...",Here is a concise summary of the section withi...,Microsoft's Q2 FY25 earnings call included com...,Microsoft reported strong financial performanc...,Microsoft's FY2Q25 earnings call discussed fin...,Here is a concise summary of the transcript se...
1,MSFT_EC_2Q25_0001,"From now on, it's a more\ncontinuous cycle gov...",Here is a concise summary of the transcript se...,Microsoft is enhancing its presence in every l...,Microsoft significantly expanded Azure data ce...,Microsoft is expanding data center capacity to...,Here is a concise summary of the transcript se...
2,MSFT_EC_2Q25_0002,When you look at customers who purchased Copil...,Here is a concise summary of the Microsoft ear...,"During the first quarter of availability, cust...",Microsoft reported robust financial and operat...,Microsoft is seeing strong customer adoption a...,Here is a concise summary of the earnings call...
3,MSFT_EC_2Q25_0003,More professionals than ever are engaging in h...,Here is a concise summary of the section withi...,Microsoft reported an increase in LinkedIn eng...,Microsoft reported strong financial and operat...,Key Microsoft financial and operational highli...,Here is a concise summary of the Microsoft fin...
4,MSFT_EC_2Q25_0004,Microsoft Cloud gross margin percentage was 70...,Here is a concise summary of Microsoft's finan...,Microsoft reported that Cloud gross margin per...,Microsoft reported a gross margin percentage o...,"Microsoft's Cloud gross margin was 70%, decrea...",Here is a concise summary of Microsoft's finan...
5,MSFT_EC_2Q25_0005,Segment gross margin dollars increased 12% and...,Here is a concise summary of the Microsoft fin...,Microsoft reported a 12-13% increase in segmen...,Microsoft's quarterly financial and operationa...,Here's a concise summary of the Microsoft fina...,Here is a concise summary of the Microsoft fin...
6,MSFT_EC_2Q25_0006,We expect consistent execution\nacross our cor...,Here is a concise summary of Microsoft's finan...,Microsoft expects consistent performance in it...,Microsoft provided forward financial guidance ...,Microsoft anticipates its Q3 financial perform...,Here is a concise summary of Microsoft's finan...
7,MSFT_EC_2Q25_0007,We expect Xbox content services revenue growth...,Here is a concise summary of the transcript wi...,Microsoft predicts Xbox content services reven...,Microsoft expects modest growth in Xbox conten...,Microsoft expects:\n\n* **Xbox content and s...,Here is a concise summary of the Microsoft fin...
8,MSFT_EC_2Q25_0008,Please proceed. Thank you guys for taking the ...,Here is a concise summary of the section withi...,Microsoft had a strong quarter in terms of com...,Microsoft reported solid commercial bookings f...,"Microsoft's commercial bookings were solid, bu...",Here is a concise summary of the section withi...
9,MSFT_EC_2Q25_0009,But what we're seeing is waiting to see just h...,Here is a concise summary of the section withi...,"On the earnings call, Microsoft discussed the ...",Microsoft's AI revenue significantly exceeded ...,Microsoft is seeing strong growth in its AI in...,Here is a concise summary of the section withi...


In [11]:
# Print the chunk summaries in full so they are easier to read
for _, row in chunk_df.iterrows():
    print(f"==={row['chunk_id']}===\n{row[summary_field]}")

===MSFT_EC_2Q25_0000===
Here is a concise summary of the transcript section within the scope of Microsoft's financial and operational reports:

* Microsoft Cloud revenue surpassed $40 billion for the first time, up 21% year-over-year.
* AI business revenue has surpassed an annual run rate of $13 billion, up 175% year-over-year.
* The company has more than doubled its data center capacity in the last three years, with significant additions in the last year.
* Azure continues to expand, with customers like UBS migrating workloads to the platform.
===MSFT_EC_2Q25_0001===
Here is a concise summary of the transcript section within the scope of Microsoft financial and operational reports:

* Azure data center capacity has more than doubled in the last three years, with the most capacity added last year.
* Microsoft Fabric has over 19,000 paid customers and is the fastest-growing analytics product in Microsoft's history.
* Power BI has over 30 million monthly active users, up 40% year-over-ye

## Summarize the Summaries
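
This section joins the per-chunk summaries and asks the model for one final report. The two-stage flow can be sketched with plain functions standing in for the LLM calls. All names below (`summarize_in_two_stages`, `first_sentence`, `as_bullets`) are hypothetical and only illustrate the data flow.

```python
def summarize_in_two_stages(chunks, summarize_chunk, summarize_all):
    """Summarize each chunk, join the results, then summarize the joined text."""
    chunk_summaries = [summarize_chunk(chunk) for chunk in chunks]
    combined = "\n\n".join(chunk_summaries)
    return summarize_all(combined)


def first_sentence(text):
    # Stand-in for the per-chunk LLM call: keep only the first sentence
    return text.split(".")[0].strip() + "."


def as_bullets(text):
    # Stand-in for the final LLM call: turn each chunk summary into a bullet
    return "\n".join(f"* {part}" for part in text.split("\n\n"))


demo_report = summarize_in_two_stages(
    ["Cloud revenue grew. More detail here.", "Capacity doubled. Extra context."],
    summarize_chunk=first_sentence,
    summarize_all=as_bullets,
)
```

Summarizing per chunk first keeps each request small; the final call then sees only the condensed text rather than the whole transcript.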

In [12]:
report_en_path = os.path.join(data_root, reports_dir, f"{article_name}_report_en_{llm_model_name}.txt")

if steps["final_summary"] or not os.path.exists(report_en_path):
    final_summary_prompt = """
    You are to generate a report from {article_type}.
    Organize related points in sections, with the following formats:
    {sections}
    ===
    {text}
    """

    chunk_summaries = "\n\n".join(list(chunk_df[summary_field]))

    result = llm.react(final_summary_prompt, arguments={
        "article_type": article_type,
        "text": chunk_summaries,
        "scope": scope,  # Not referenced by final_summary_prompt
        "sections": report_sections,
    })
    report_en = result["content"]

    with open(report_en_path, "w", encoding="utf-8") as fd:
        fd.write(report_en)

else:
    with open(report_en_path, "r", encoding="utf-8") as fd:
        report_en = fd.read()

print(report_en)

**Conclusion**

Microsoft's financial and operational reports indicate a strong performance, with revenue surpassing $69 billion and a 12% increase year-over-year. The company's cloud revenue, particularly Azure, has shown significant growth, with a 21% increase year-over-year. Microsoft's AI business has also seen substantial growth, with a 175% increase year-over-year. The company's investments in AI infrastructure and capacity expansion are expected to drive growth in the second half of the year.

**Key Points**

* Microsoft Cloud revenue surpassed $40 billion, up 21% year-over-year
* AI business revenue grew 175% year-over-year, with an annual run rate of $13 billion
* Azure continues to expand, with customers migrating workloads to the platform
* Microsoft 365 Copilot has seen accelerated customer adoption, with customers increasing their seats by over 10x in 18 months
* Commercial bookings grew 67%, with a commercial remaining performance obligation of $298 billion, up 34% year-o

In [13]:
report_zh_path = os.path.join(data_root, reports_dir, f"{article_name}_report_zh_{llm_model_name}.txt")

if steps["final_summary"] or not os.path.exists(report_zh_path):
    translation_prompt = """
    Translate the following text into Traditional Chinese.
    Do not translate technical terms.
    ===
    {text}
    """

    result = llm.react(translation_prompt, arguments={
        "text": report_en,
    })
    report_zh = result["content"]

    with open(report_zh_path, "w", encoding="utf-8") as fd:
        fd.write(report_zh)

else:
    with open(report_zh_path, "r", encoding="utf-8") as fd:
        report_zh = fd.read()

print(report_zh)

**結論**

Microsoft 的財務和操作報告表明出色成績，收入超過 690億美元，同比增長 12%。公司的雲端收入，特別是 Azure，顯示出顯著增長，同比增長 21%。Microsoft 的 AI 業務也出現了實質增長，同比增長 175%。公司在 AI 基礎設施和容量擴增方面的投資預計將推動公司在今年下半年的成長。

**關鍵點**

* Microsoft 雲端收入超過 400億美元，同比增長 21%
* AI 業務收入增長 175%，年營收率達 130億美元
* Azure 不斷擴大，客戶將工作負載遷移至該平台
* Microsoft 365 Copilot 客戶採用加速，客戶在 18 個月內將座位數增加了 10 倍以上
* 商業訂單增長 67%，商業剩余履約義務達 2980億美元，同比增長 34%

**內容**

### 情況分析

Microsoft 的財務和操作報告表明出色成績，收入增長由雲端和 AI 業務驅動。公司的雲端收入超過 400億美元，同比增長 21%。Azure 不斷擴大，客戶將工作負載遷移至該平台。Microsoft 的 AI 業務也出現了實質增長，同比增長 175%。

### 未來展望

Microsoft 的未來展望是正面的，預計收入和經營收入增長雙位數。公司在 AI 基礎設施和容量擴增方面的投資預計將推動公司在今年下半年的成長。Microsoft 的下一季度預測包括：

* 生產力和商業流程：11-12% 收入增長（固定匯率）
* Microsoft 365 商業雲端：14-15% 收入增長（固定匯率）
* 智能雲端：19-20% 收入增長（固定匯率）
* Azure：31-32% 收入增長（固定匯率）

### 運營亮點

Microsoft 的運營亮點包括：

* 商業訂單：增長 67%
* 商業剩余履約義務：2980億美元，增長 34%
* 年度混合率：97%
* 營運費用：增長 5%，低於預期
* 營運利潤率：增長 2 個百分點，至 45%
* 員工人數：比一年前增加 2%
* Azure 資料中心容量在過去三年中增長了兩倍以上，在過去一年中有顯著增加。
* Microsoft Fabric 有超過 19,000 名付費客戶，是 Microsoft 歷史上增長最快的分析产品。
* Power BI 有超過 3,00