# Earnings Call Summary POC

A proof of concept (POC) for summarizing the transcript of an earnings call.

## Requirements
#### Package Requirements
This notebook was created with the following packages:
- python                    3.11
- llama-index               0.12.25
- pandas                    2.2.2
- langchain                 0.3.21

#### Other Requirements
- Environment variable `OPENAI_API_KEY`. LlamaIndex needs this to use its default GPT-3.5 model to answer queries.
- Environment variable `DEEPINFRA_API_KEY`. This is needed for REST API access to the LLM models hosted on DeepInfra.

In [43]:
import pandas as pd

## Set up Environment

Set up environment-specific parameters. Modify these to suit your local environment.

In [44]:
#
# Locations of the data sources
#

data_root = "../data"         # Directory to the data
ec_dir = "earning_calls"
working_dir = "working"
reports_dir = "reports"


In [45]:
import os

# Keys for LLM access
openai_key = os.environ.get("OPENAI_API_KEY")
# hf_key = os.environ.get("HUGGINGFACEHUB_API_TOKEN")
di_key = os.environ.get("DEEPINFRA_API_KEY")

if not openai_key:
    raise EnvironmentError("OPENAI_API_KEY must be provided for this notebook to work.  Needed by LlamaIndex.")

# if not hf_key:
#     raise EnvironmentError(f"Need HuggingFace token for this notebook to work.  Needed for query extension with DeepSeek-R1"  )

if not di_key:
    raise EnvironmentError("DEEPINFRA_API_KEY is needed to run models in DeepInfra")

In [69]:
#
# Tweak these values
#

# Chunking size
chunk_size = 1000
chunk_overlap = 200

# Type of article
article_type = "transcript of the earnings call"
article_name = "MSFT_EC_2Q25"
article_file = "msft/MSFT_FY2Q25__1__m4a_Good_Tape_2025-03-19.txt"

# Summarization scopes
scope = "Microsoft financial and operational reports"

# Sections that shall be in the report
report_sections = """
1. Conclusion
2. Key Points
3. Contents
3.1. Financial Performance Analysis
- Include reports of revenue, gross profit margin, operating profit, and EPS in bullet items.
- In each of the above reports, include their seasonal and annual increase/decrease, better/worse than the financial forecast amounts, and major contributing factors if any.
3.2. Future Outlook
3.3. Operation Highlights
"""

# LLM models
# llm_model_name = "gpt-4"
# llm_model_name = "gpt-4.5"
llm_model_name = "llama-3"
# llm_model_name = "gemini-2"

# Generation temperature
temperature = 0.4

In [47]:
# These are the steps in this notebook that we want to force to refresh.
# Many of the steps are time-consuming, so I save their results in the data directory.
# If the saved results exist, I reload them instead of recalculating them.
# Setting a step to True forces the code to recalculate the result for that step.
steps = {
    "chunking": False,                       # Input the article and do chunking
    "chunk_summaries": False,                # Per chunk summarization
    "final_summary": True                   # Summarize the chunk summaries
}

def refresh(what: str) -> bool:
    """Return True if the named step is flagged for recomputation."""
    return steps.get(what, False)
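
The cells that follow all share the same "reload if cached, otherwise recompute and save" pattern. As a minimal sketch of that pattern in isolation, the helper below wraps it in one function. The names `load_or_compute`, `_save_text`, `_load_text`, and `expensive_step` are hypothetical and are not part of this notebook; the `force` argument plays the role of `refresh(...)`.

```python
import os
import tempfile


def load_or_compute(path, compute, load, save, force=False):
    """Return the cached result at `path`, or compute it and cache it.

    Hypothetical helper: `force` corresponds to refresh("<step>") above.
    """
    if force or not os.path.exists(path):
        result = compute()
        save(result, path)
        return result
    return load(path)


def _save_text(text, path):
    with open(path, "w", encoding="utf-8") as fd:
        fd.write(text)


def _load_text(path):
    with open(path, "r", encoding="utf-8") as fd:
        return fd.read()


# Count how often the "expensive" step actually runs
calls = []


def expensive_step():
    calls.append(1)
    return "result"


with tempfile.TemporaryDirectory() as tmp:
    cache_path = os.path.join(tmp, "step.txt")
    first = load_or_compute(cache_path, expensive_step, _load_text, _save_text)   # computes
    second = load_or_compute(cache_path, expensive_step, _load_text, _save_text)  # reloads
    third = load_or_compute(cache_path, expensive_step, _load_text, _save_text, force=True)  # recomputes
```

In this sketch the expensive step runs twice: once on the first call and once when `force=True`. The second call is served from the file on disk.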

## Reading and Chunking

Read the transcript and chunk it.

In [48]:
from llama_index.core.node_parser import SentenceSplitter

article_path = os.path.join(data_root, ec_dir, article_file)
chunk_path = os.path.join(data_root, working_dir, f"{article_name}_chunks.parquet")

if refresh("chunking") or not os.path.exists(chunk_path):

    # Input
    with open(article_path, "r", encoding="utf-8") as tfd:
        transcript_content = tfd.read()

    # Initialize the SentenceSplitter
    sentence_splitter = SentenceSplitter(chunk_size=chunk_size, chunk_overlap=chunk_overlap)

    # Split the text into chunks
    chunks = sentence_splitter.split_text(transcript_content)

    # Put into Pandas
    chunk_ids = [f"{article_name}_{i:04d}" for i in range(len(chunks))]
    chunk_df = pd.DataFrame(zip(chunk_ids, chunks), columns=["chunk_id", "content"])

    # Save the results
    chunk_df.to_parquet(chunk_path)
else:
    chunk_df = pd.read_parquet(chunk_path)
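
To illustrate what `chunk_size` and `chunk_overlap` control, here is a deliberately simplified sliding-window sketch. It is not how `SentenceSplitter` works internally: as far as I understand, the splitter measures sizes in tokens and prefers to break at sentence boundaries, whereas the hypothetical `chunk_words` function below counts whole words and ignores sentence boundaries.

```python
def chunk_words(words, chunk_size, chunk_overlap):
    """Split a list of words into windows of `chunk_size` words,
    where consecutive windows share `chunk_overlap` words.

    Simplified illustration only; not the SentenceSplitter algorithm.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(words[start:start + chunk_size])
        # Stop once the current window reaches the end of the input
        if start + chunk_size >= len(words):
            break
    return chunks


words = [f"w{i}" for i in range(10)]
demo_chunks = chunk_words(words, chunk_size=4, chunk_overlap=1)
for chunk in demo_chunks:
    print(chunk)
```

Each window starts `chunk_size - chunk_overlap` words after the previous one, so the last word of one chunk reappears as the first word of the next. The overlap gives each chunk some of the preceding context, at the cost of summarizing some text twice.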

In [49]:
chunk_df

Unnamed: 0,chunk_id,content,summary_gpt-4,summary_gpt-4.5,summary_gemini-2,summary_llama-3
0,MSFT_EC_2Q25_0000,"MSFT_FY2Q25 (1).m4a\n\noperator assistance, pl...",Microsoft's Q2 FY25 earnings call included com...,Microsoft reported strong financial performanc...,Microsoft's FY2Q25 earnings call discussed fin...,Here is a concise summary of the Microsoft fin...
1,MSFT_EC_2Q25_0001,"From now on, it's a more\ncontinuous cycle gov...",Microsoft is enhancing its presence in every l...,Microsoft significantly expanded Azure data ce...,Microsoft is expanding data center capacity to...,Here is a concise summary of the Microsoft fin...
2,MSFT_EC_2Q25_0002,When you look at customers who purchased Copil...,"During the first quarter of availability, cust...",Microsoft reported robust financial and operat...,Microsoft is seeing strong customer adoption a...,Here is a concise summary of the Microsoft ear...
3,MSFT_EC_2Q25_0003,More professionals than ever are engaging in h...,Microsoft reported an increase in LinkedIn eng...,Microsoft reported strong financial and operat...,Key Microsoft financial and operational highli...,Here is a concise summary of the Microsoft fin...
4,MSFT_EC_2Q25_0004,Microsoft Cloud gross margin percentage was 70...,Microsoft reported that Cloud gross margin per...,Microsoft reported a gross margin percentage o...,"Microsoft's Cloud gross margin was 70%, decrea...",Here is a concise summary of the Microsoft fin...
5,MSFT_EC_2Q25_0005,Segment gross margin dollars increased 12% and...,Microsoft reported a 12-13% increase in segmen...,Microsoft's quarterly financial and operationa...,Here's a concise summary of the Microsoft fina...,Here is a concise summary of the transcript se...
6,MSFT_EC_2Q25_0006,We expect consistent execution\nacross our cor...,Microsoft expects consistent performance in it...,Microsoft provided forward financial guidance ...,Microsoft anticipates its Q3 financial perform...,Here is a concise summary of Microsoft's finan...
7,MSFT_EC_2Q25_0007,We expect Xbox content services revenue growth...,Microsoft predicts Xbox content services reven...,Microsoft expects modest growth in Xbox conten...,Microsoft expects:\n\n* **Xbox content and s...,Here is a concise summary of the transcript se...
8,MSFT_EC_2Q25_0008,Please proceed. Thank you guys for taking the ...,Microsoft had a strong quarter in terms of com...,Microsoft reported solid commercial bookings f...,"Microsoft's commercial bookings were solid, bu...",Here is a concise summary of the transcript se...
9,MSFT_EC_2Q25_0009,But what we're seeing is waiting to see just h...,"On the earnings call, Microsoft discussed the ...",Microsoft's AI revenue significantly exceeded ...,Microsoft is seeing strong growth in its AI in...,Here is a concise summary of the section withi...


## Summarizing the Chunks

In [50]:
import bots

llm = bots.of(llm_model_name)
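
The `bots` module is local to this project and is not shown here. Judging only from how the notebook calls it, the object returned by `bots.of(...)` exposes `react(prompt, arguments=..., temperature=...)` and returns a mapping with a `"content"` key. The stub below is a hypothetical stand-in with that shape, useful for dry-running the cells without API keys. The class name `EchoBot` and the use of `str.format` to fill the placeholders are assumptions, not the actual implementation.

```python
class EchoBot:
    """Hypothetical stand-in for the object returned by bots.of(...)."""

    def react(self, prompt, arguments=None, temperature=None):
        # Assumption: {placeholders} in the prompt are filled from `arguments`.
        # `temperature` is accepted for interface compatibility but unused here.
        filled = prompt.format(**(arguments or {}))
        return {"content": f"[echo] {filled.strip()}"}


stub = EchoBot()
reply = stub.react(
    "Summarize {text} within {scope}.",
    arguments={"text": "the call", "scope": "finance"},
    temperature=0.4,
)
print(reply["content"])
```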


In [51]:
from tqdm.notebook import tqdm

summary_field = f"summary_{llm_model_name}"

if refresh("chunk_summaries") or summary_field not in chunk_df.columns:


    # Ask LLM to summarize per chunk
    chunk_summary_prompt = """
    Concisely summarize the following section of a {article_type}.
    Only summarize within the scope of {scope}.
    ===
    {text}
    """

    chunk_summaries = []
    for chunk_content in tqdm(chunk_df["content"], desc="Summarize chunks"):
        result = llm.react(
            chunk_summary_prompt,
            arguments={
                "article_type": article_type,
                "text": chunk_content,
                "scope": scope,
            },
            temperature=temperature,
        )
        chunk_summaries.append(result["content"])

    chunk_df[summary_field] = chunk_summaries

    # Save the results
    chunk_df.to_parquet(chunk_path)
else:
    chunk_df = pd.read_parquet(chunk_path)

Summarize chunks:   0%|          | 0/14 [00:00<?, ?it/s]

In [52]:
chunk_df

Unnamed: 0,chunk_id,content,summary_gpt-4,summary_gpt-4.5,summary_gemini-2,summary_llama-3
0,MSFT_EC_2Q25_0000,"MSFT_FY2Q25 (1).m4a\n\noperator assistance, pl...",Microsoft held an earnings call where Satya Na...,Microsoft reported strong financial performanc...,Microsoft's FY2Q25 earnings call discussed fin...,Here is a concise summary of the Microsoft fin...
1,MSFT_EC_2Q25_0001,"From now on, it's a more\ncontinuous cycle gov...",Microsoft is making significant progress acros...,Microsoft significantly expanded Azure data ce...,Microsoft is expanding data center capacity to...,Here is a concise summary of the Microsoft fin...
2,MSFT_EC_2Q25_0002,When you look at customers who purchased Copil...,Microsoft reported strong growth in its Copilo...,Microsoft reported robust financial and operat...,Microsoft is seeing strong customer adoption a...,Here is a concise summary of the Microsoft ear...
3,MSFT_EC_2Q25_0003,More professionals than ever are engaging in h...,"During the earnings call, Microsoft reported a...",Microsoft reported strong financial and operat...,Key Microsoft financial and operational highli...,Here is a concise summary of the Microsoft fin...
4,MSFT_EC_2Q25_0004,Microsoft Cloud gross margin percentage was 70...,"Microsoft Cloud gross margin was 70%, down 2 p...",Microsoft reported a gross margin percentage o...,"Microsoft's Cloud gross margin was 70%, decrea...",Here is a concise summary of the Microsoft fin...
5,MSFT_EC_2Q25_0005,Segment gross margin dollars increased 12% and...,Microsoft's segment gross margin dollars saw a...,Microsoft's quarterly financial and operationa...,Here's a concise summary of the Microsoft fina...,Here is a concise summary of the transcript se...
6,MSFT_EC_2Q25_0006,We expect consistent execution\nacross our cor...,Microsoft expects stable execution across core...,Microsoft provided forward financial guidance ...,Microsoft anticipates its Q3 financial perform...,Here is a concise summary of Microsoft's finan...
7,MSFT_EC_2Q25_0007,We expect Xbox content services revenue growth...,Microsoft's Xbox content services revenue grow...,Microsoft expects modest growth in Xbox conten...,Microsoft expects:\n\n* **Xbox content and s...,Here is a concise summary of the transcript se...
8,MSFT_EC_2Q25_0008,Please proceed. Thank you guys for taking the ...,Microsoft's recent quarter saw strong commerci...,Microsoft reported solid commercial bookings f...,"Microsoft's commercial bookings were solid, bu...",Here is a concise summary of the transcript se...
9,MSFT_EC_2Q25_0009,But what we're seeing is waiting to see just h...,Microsoft's AI growth rate is reportedly bette...,Microsoft's AI revenue significantly exceeded ...,Microsoft is seeing strong growth in its AI in...,Here is a concise summary of the section withi...


In [53]:
# Print the chunk summaries to read them better
for _, row in chunk_df.iterrows():
    print(f"==={row['chunk_id']}===\n{row[summary_field]}")

===MSFT_EC_2Q25_0000===
Microsoft held an earnings call where Satya Nadella, Chairman and CEO, noted that the company's cloud services surpassed $40 billion in revenue, a 21% year-over-year increase. This milestone was driven by enterprises scaling their use of AI, pushing the AI business to exceed an annual revenue run rate of $13 billion, a 175% year-over-year increase. Nadella also discussed the company's strategic approach to managing computational resources, emphasizing the need for balance in scaling systems for AI training and inference. The company has doubled its data center capacity in the last three years, ramping up capacity last year more significantly than in any previous year to meet increasing demand for its Azure cloud service. Efforts to optimize AI's efficiency are aimed at boosting demand. The call was attended by company officials including CFO Amy Hood and Alice Jolla, Chief Accounting Officer.
===MSFT_EC_2Q25_0001===
Microsoft is making significant progress acros

## Summarize the Summaries
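
Together with the per-chunk summaries above, this section completes a two-stage flow: summarize each chunk on its own, then join the partial summaries and summarize the result once more. The sketch below shows only that control flow, with toy functions in place of the LLM calls. The names `summarize_in_two_stages`, `first_sentence`, and `bullet_list` are hypothetical.

```python
def summarize_in_two_stages(chunks, summarize_chunk, summarize_all):
    """Summarize each chunk, join the partial summaries, then summarize the join."""
    partials = [summarize_chunk(chunk) for chunk in chunks]  # stage 1: per chunk
    combined = "\n\n".join(partials)                         # same separator as the cell below
    return summarize_all(combined)                           # stage 2: final report


# Toy stand-ins for the LLM calls
def first_sentence(text):
    return text.split(".")[0].strip() + "."


def bullet_list(text):
    return "\n".join(f"- {part}" for part in text.split("\n\n"))


toy_chunks = ["Revenue grew. Details follow.", "Margins fell. More detail."]
toy_report = summarize_in_two_stages(toy_chunks, first_sentence, bullet_list)
print(toy_report)
```

One consequence of this design is that the final stage sees only the partial summaries, never the original transcript, so anything dropped in stage 1 cannot reappear in the report.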

In [70]:
report_en_path = os.path.join(data_root, reports_dir, f"{article_name}_report_en_{llm_model_name}.txt")

if refresh("final_summary") or not os.path.exists(report_en_path):
    final_summary_prompt = """
    You are to generate a report from {article_type}.
    Organize related points in sections, with the following formats:
    {sections}
    Report quantitatively if values are provided.
    ===
    {text}
    """

    chunk_summaries = "\n\n".join(list(chunk_df[summary_field]))

    result = llm.react(
        final_summary_prompt,
        arguments={
            "article_type": article_type,
            "text": chunk_summaries,
            "scope": scope,
            "sections": report_sections,
        },
        temperature=temperature,
    )
    report_en = result["content"]

    with open(report_en_path, "w", encoding="utf-8") as fd:
        fd.write(report_en)

else:
    with open(report_en_path, "r", encoding="utf-8") as fd:
        report_en = fd.read()

print(report_en)

1. Conclusion

Microsoft has produced a strong financial performance with significant increases in revenue from cloud services, AI business, and its commercial side. Microsoft Azure and M365 Copilot are cited as driving forces in the company's positive financial results. The company has also made headway in future growth areas, such as AI infrastructure scaling, the introduction of new product features, and fostering collaborations within and across enterprises. Microsoft expects continued growth while managing expenses and operational hurdles.

2. Key Points

- The company reported a 21% YoY increase in its cloud service revenue, surpassing $40 billion.
- Microsoft's AI business surpassed an annual revenue run rate of $13 billion, a 175% increase YoY.
- Microsoft reported strong growth in its Copilot software, with a 10x increase in usage in the last 18 months.
- LinkedIn revenues exceeded $2 billion for the first time with a nearly 50% subscriber growth over the past two years.
- Rev

In [71]:
report_zh_path = os.path.join(data_root, reports_dir, f"{article_name}_report_zh_{llm_model_name}.txt")

if refresh("final_summary") or not os.path.exists(report_zh_path):
    translation_prompt = """
    Translate the following text into Traditional Chinese.
    Do not translate technical or financial terms.
    ===
    {text}
    """

    result = llm.react(translation_prompt, arguments={
        "text": report_en,
    })
    report_zh = result["content"]

    with open(report_zh_path, "w", encoding="utf-8") as fd:
        fd.write(report_zh)

else:
    with open(report_zh_path, "r", encoding="utf-8") as fd:
        report_zh = fd.read()

print(report_zh)

1. 結論

微軟的財務表現強勁，其雲服務，AI業務以及商業端的營收都有顯著提升。微軟Azure和M365 Copilot被認為是該公司財務成果穩健的驅動力。該公司也在AI基礎設施擴展，新產品特性的引入以及加強企業內外的協作等未來增長領域取得了進展。微軟期待在管理開支和操作障礙的同時繼續增長。

2. 重點內容

- 該公司報告其雲服務收入同比增長21%，超過400億美元。
- 微軟的AI業務實現了年營收運行速度超過130億美元，同比增長175%。
- 微軟報告其Copilot軟件強勁成長，在過去18個月內使用量增加了10倍。
- LinkedIn 收入首次超過20億美元，過去兩年近50%的訂閱者增長。
- 季度收入為696億美元，同比增長12%。

3. 內容

3.1. 財務表現分析

- 微軟的雲服務營收較上年同期增加21%，超過400億美元。
- 其AI業務的年營收運行速度較上年同期增加175%，超過130億美元。
- 營業收入增加了17%。
- 總的季度收入報告為69.6億美元，比去年同期增加了12%。
- 商業訂單增加了67%的極其顯著的幅度。
- LinkedIn收入增加了9%，首次突破了20億美元的年收入。
- 微軟的Copilot軟件第一季度的使用強度增加了60%。

3.2. 未來展望

- 微軟專注於其AI基礎設施擴容，以提高軟件優化在推理上因AI擴展規模而帶來的成功率。
- 公司預期全財政年度的雙位數營收和營業收入增長。
- 預期營收增長包括生產力和業務流程 (11-12%）、微軟365商業雲 (14-15%）、LinkedIn (低至中單位數位) 和Dynamics 365 (中十位數位)。
- 微軟預期雲毛利率將在69%左右，因為其AI基礎設施擴容，使得同比有所下降。
- 公司預期第三季度的有效稅率將約為18%。

3.3. 營運亮點

- 雲平台Azure正在迅速擴展，該公司在過去三年內將其數據中心容量翻倍。
- AI Foundry在推出僅兩個月內就吸引了超過20萬的月活躍用戶。
- 微軟365 Copilot軟件的採用率快速升高，用戶數量季度環比增加一倍以上。
- LinkedIn收入首次突破20億美元，過去兩年訂閱用戶數量增長近50%。
- 微軟報告LinkedIn參與度急劇上升，評論的增長速度是其他帖子格式的兩倍。
- 微軟在C