## Advanced RAG - Data Ingestion Pipeline for PageRAG
### Page-wise Document Processing with Gemini Embeddings and Qdrant

**Learning Objectives:**
- Extract text from PDFs page by page
- Extract metadata from filename
- Store in Qdrant with rich metadata
- Use Gemini embeddings

**Use Cases:**
1. Financial Analysis: Process SEC filings (10-K, 10-Q)
2. Legal: Organize contracts and case documents
3. Research: Index academic papers
4. Enterprise: Searchable document repositories


### Setup and Configuration

In [None]:
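from dotenv import load_dotenv
load_dotenv()  # Load API keys (e.g. GOOGLE_API_KEY) from a local .env file

import hashlib
from pathlib import Path
from typing import List  # used by extract_pdf_pages

from docling.document_converter import DocumentConverter  # PDF -> markdown
from langchain_google_genai import GoogleGenerativeAIEmbeddings
from langchain_qdrant import QdrantVectorStore, RetrievalMode, FastEmbedSparse
from langchain_core.documents import Document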

In [None]:
# Configuration
MARKDOWN_DIR = "data/rag-markdown"
COLLECTION_NAME = "financial_docs"
EMBEDDING_MODEL = "models/gemini-embedding-001"

### Initialize Gemini Embeddings, BM25, and Qdrant

**Hybrid Retrieval**: Combines dense (semantic) vectors from Gemini with sparse (keyword) BM25 vectors, so a query can match on meaning as well as on exact terms such as line-item names.
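
One common way to merge a dense ranking and a sparse ranking is Reciprocal Rank Fusion (RRF), which the search cell later in this notebook names. The sketch below is a plain-Python illustration of the idea only; in practice the fusion runs inside Qdrant. The function name, the page IDs, and `k=60` are assumptions made for the example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one list.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a commonly used default (an assumption
    here, not a value taken from this notebook).
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc_id for determinism
    return sorted(scores, key=lambda d: (-scores[d], d))


dense_hits = ["page_12", "page_3", "page_7"]   # hypothetical semantic ranking
sparse_hits = ["page_3", "page_9", "page_12"]  # hypothetical BM25 ranking
fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
print(fused)
```

Pages that appear in both lists (`page_3`, `page_12`) rise above pages found by only one retriever, which is the behavior hybrid retrieval relies on.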

In [None]:
# Dense embeddings (Gemini)
embeddings = GoogleGenerativeAIEmbeddings(model=EMBEDDING_MODEL)

# Sparse embeddings (BM25)
sparse_embeddings = FastEmbedSparse(model_name="Qdrant/bm25")

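# Initialize vector store with hybrid retrieval.
# Note: force_recreate=True drops and rebuilds the collection on every run,
# so the duplicate check further down only matters within a single session.
# Set it to False to keep previously ingested pages across runs.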
vector_store = QdrantVectorStore.from_documents(
    documents=[],
    embedding=embeddings,
    sparse_embedding=sparse_embeddings,
    collection_name=COLLECTION_NAME,
    url="http://localhost:6333",
    retrieval_mode=RetrievalMode.HYBRID,
    force_recreate=True
)

### Metadata Extraction from Filename

In [None]:
def extract_metadata_from_filename(filename: str) -> dict:
    """
    Extract metadata from filename.
    
    Expected format: {company} {doc_type} [{quarter}] {year}.pdf
    (the quarter is present only for quarterly filings)
    Examples:
    - amazon 10-k 2024.pdf
    - amazon 10-q q1 2024.pdf
    
    Returns:
        dict with company_name, doc_type, fiscal_year, fiscal_quarter
    """
    name = filename.replace('.pdf', '')
    parts = name.split()
    
    metadata = {}
    
    if len(parts) == 4:
        metadata['fiscal_quarter'] = parts[2]
        metadata['fiscal_year'] = int(parts[3])
    else:
        metadata['fiscal_quarter'] = None
        metadata['fiscal_year'] = int(parts[2])
    
    metadata['company_name'] = parts[0]
    metadata['doc_type'] = parts[1]
    
    return metadata

In [None]:
extract_metadata_from_filename('amazon 10-k 2023.pdf')

In [None]:
extract_metadata_from_filename('amazon 10-q q1 2024.pdf')
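
The position-based parsing above assumes every filename follows the convention exactly. As a hedged alternative, the sketch below validates the name with a regular expression and raises a clear error otherwise. `FILENAME_PATTERN` and `parse_filing_name` are hypothetical names, and the pattern (single-word company, `q1` to `q4` quarters, four-digit year) is an assumption based on the two example filenames.

```python
import re

# Hypothetical pattern for names like "amazon 10-q q1 2024.pdf"
FILENAME_PATTERN = re.compile(
    r"^(?P<company>\S+) (?P<doc_type>\S+)"
    r"(?: (?P<quarter>q[1-4]))? (?P<year>\d{4})\.(?:pdf|md)$",
    re.IGNORECASE,
)


def parse_filing_name(filename: str) -> dict:
    """Parse a filing filename, raising ValueError on unexpected formats."""
    match = FILENAME_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"Unexpected filename format: {filename!r}")
    return {
        "company_name": match["company"],
        "doc_type": match["doc_type"],
        "fiscal_quarter": match["quarter"],  # None for annual filings
        "fiscal_year": int(match["year"]),
    }


print(parse_filing_name("amazon 10-k 2023.pdf"))
print(parse_filing_name("amazon 10-q q1 2024.pdf"))
```

Failing early on a malformed name keeps incorrect metadata out of the vector store, where it would be harder to spot.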

### Extract Text from PDF Pages

In [None]:
def extract_pdf_pages(pdf_path: str) -> List[str]:
    """
    Extract text from each page of PDF.
    
    Returns:
        List of page texts
    """
    converter = DocumentConverter()
    result = converter.convert(pdf_path)
    
    page_break = "<!-- page break -->"
    markdown_text = result.document.export_to_markdown(page_break_placeholder=page_break)
    
    pages = markdown_text.split(page_break)
    
    return pages

In [None]:
pages = extract_pdf_pages('data/rag-data/amazon/amazon 10-q q1 2024.pdf')
print(f"Total pages: {len(pages)}")
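
The ingestion cells below read `.md` files, but the notebook does not show how the extracted pages are written to disk. A minimal sketch of that step, assuming each PDF is saved as a markdown file with the same stem and the same page-break marker, might look like this. `save_pages_as_markdown` is a hypothetical helper, and the demo uses stand-in page texts and a temporary directory rather than real filings.

```python
import tempfile
from pathlib import Path


def save_pages_as_markdown(pages, pdf_path, output_dir,
                           page_break="<!-- page break -->"):
    """Join page texts with the page-break marker and write <pdf stem>.md."""
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    md_path = output_dir / (Path(pdf_path).stem + ".md")
    md_path.write_text(page_break.join(pages), encoding="utf-8")
    return md_path


# Demo with stand-in page texts and a temporary directory
demo_pages = ["# Cover page", "## Balance sheet", "## Cash flows"]
with tempfile.TemporaryDirectory() as tmp_dir:
    saved = save_pages_as_markdown(demo_pages, "amazon 10-q q1 2024.pdf", tmp_dir)
    saved_name = saved.name
    # Splitting on the marker recovers the original pages
    round_trip = saved.read_text(encoding="utf-8").split("<!-- page break -->")

print(saved_name, len(round_trip))
```

Keeping the filename stem unchanged means the metadata parser works the same way on the markdown copy as on the original PDF name.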

### Load Extracted Markdown Files

In [None]:
def extract_metadata_from_filename(filename: str) -> dict:
    """Extract metadata from a markdown filename (overrides the .pdf version defined earlier)."""
    name = filename.replace('.md', '')
    parts = name.split()
    
    metadata = {}
    metadata['company_name'] = parts[0]
    metadata['doc_type'] = parts[1]
    
    if len(parts) == 4:
        metadata['fiscal_quarter'] = parts[2]
        metadata['fiscal_year'] = int(parts[3])
    else:
        metadata['fiscal_quarter'] = None
        metadata['fiscal_year'] = int(parts[2])
    
    return metadata

In [None]:
def compute_file_hash(file_path: str) -> str:
    """Compute SHA-256 hash of file content."""
    sha256_hash = hashlib.sha256()
    with open(file_path, 'rb') as f:
        for byte_block in iter(lambda: f.read(4096), b""):
            sha256_hash.update(byte_block)
    return sha256_hash.hexdigest()
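
The reason for hashing content rather than comparing filenames can be shown with a small, self-contained check: two files with identical bytes hash the same regardless of name, while any edit changes the hash. The helper below mirrors the chunked approach above under a different, hypothetical name (`sha256_of_file`), and the file names and contents are made up for the demo.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of_file(file_path, chunk_size=4096):
    """Hash a file in fixed-size chunks so large files are never fully in memory."""
    digest = hashlib.sha256()
    with open(file_path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


with tempfile.TemporaryDirectory() as tmp_dir:
    original = Path(tmp_dir) / "amazon 10-k 2023.md"
    renamed_copy = Path(tmp_dir) / "copy of filing.md"
    edited = Path(tmp_dir) / "edited.md"
    original.write_text("## Income Taxes\n", encoding="utf-8")
    renamed_copy.write_text("## Income Taxes\n", encoding="utf-8")
    edited.write_text("## Income Taxes (restated)\n", encoding="utf-8")

    same_content_matches = sha256_of_file(original) == sha256_of_file(renamed_copy)
    edit_changes_hash = sha256_of_file(original) != sha256_of_file(edited)

print(same_content_matches, edit_changes_hash)
```

This is why a renamed duplicate is skipped by the pipeline, while a revised filing with the same name is ingested again.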

### Track Processed Files

In [None]:
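# Get already processed files from Qdrant.
# scroll() returns (points, next_page_offset); one call reads at most `limit`
# points, so larger collections need paging with the returned offset.
all_points = vector_store.client.scroll(
    collection_name=COLLECTION_NAME,
    limit=10000,
    with_payload=True
)

# QdrantVectorStore nests Document.metadata under the "metadata" payload key
processed_hashes = set()
for point in all_points[0]:
    file_hash = ((point.payload or {}).get('metadata') or {}).get('file_hash')
    if file_hash:
        processed_hashes.add(file_hash)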

print(f"Already processed: {len(processed_hashes)} files")
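
For collections larger than a single `scroll` call can return, the hashes can be collected page by page using the offset that each call returns. The sketch below assumes the payload layout `{"metadata": {"file_hash": ...}}` and uses a small stand-in client (`FakeClient`, purely illustrative) so it runs without a Qdrant server; `collect_file_hashes` is a hypothetical helper name.

```python
from types import SimpleNamespace


def collect_file_hashes(client, collection_name, batch_size=1000):
    """Page through a collection and collect distinct file_hash values."""
    hashes = set()
    offset = None
    while True:
        points, offset = client.scroll(
            collection_name=collection_name,
            limit=batch_size,
            offset=offset,
            with_payload=True,
            with_vectors=False,
        )
        for point in points:
            metadata = (point.payload or {}).get("metadata") or {}
            if metadata.get("file_hash"):
                hashes.add(metadata["file_hash"])
        if offset is None:  # no further pages
            break
    return hashes


class FakeClient:
    """Stand-in for a Qdrant client that serves points in pages (demo only)."""

    def __init__(self, points):
        self._points = points

    def scroll(self, collection_name, limit, offset=None,
               with_payload=True, with_vectors=False):
        start = offset or 0
        page = self._points[start:start + limit]
        has_more = start + limit < len(self._points)
        return page, (start + limit if has_more else None)


demo_points = [
    SimpleNamespace(payload={"metadata": {"file_hash": h}})
    for h in ["a", "a", "b", "c"]
]
found = collect_file_hashes(FakeClient(demo_points), "financial_docs", batch_size=3)
print(sorted(found))
```

With a real store, `vector_store.client` would take the place of `FakeClient`.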

### Document Ingestion Pipeline

In [None]:
def ingest_markdown_to_vectordb(md_path: Path):
    """Ingest markdown file into Qdrant vector store."""
    print(f"Processing: {md_path.name}")
    
    file_hash = compute_file_hash(md_path)
    if file_hash in processed_hashes:
        print(f"[SKIP] Already processed: {md_path.name}")
        return
    
    # Read markdown content
    with open(md_path, 'r', encoding='utf-8') as f:
        markdown_text = f.read()
    
    # Split by page breaks
    page_break = "<!-- page break -->"
    pages = markdown_text.split(page_break)
    
    # Get metadata from filename
    file_metadata = extract_metadata_from_filename(md_path.name)
    
    documents = []
    for page_num, page_text in enumerate(pages, start=1):
        if page_text.strip():  # Skip empty pages
            metadata = file_metadata.copy()
            metadata['page'] = page_num
            metadata['file_hash'] = file_hash
            metadata['source_file'] = md_path.name
            
            doc = Document(page_content=page_text.strip(), metadata=metadata)
            documents.append(doc)
    
    vector_store.add_documents(documents=documents)
    processed_hashes.add(file_hash)
    
    print(f"[DONE] Ingested {len(documents)} pages from {md_path.name}")
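
The page-splitting step inside the pipeline can be isolated and checked on its own. The sketch below, using a hypothetical helper `split_into_pages` and a made-up three-page sample, shows that blank pages are skipped while the remaining pages keep their original page numbers.

```python
def split_into_pages(markdown_text, page_break="<!-- page break -->"):
    """Return (page_number, text) pairs, skipping blank pages but keeping numbering."""
    pages = []
    for page_num, page_text in enumerate(markdown_text.split(page_break), start=1):
        cleaned = page_text.strip()
        if cleaned:
            pages.append((page_num, cleaned))
    return pages


# Page 2 of this sample is blank
sample = (
    "# Cover<!-- page break -->\n\n<!-- page break -->"
    "## Note 3 - Earnings Per Share"
)
numbered_pages = split_into_pages(sample)
print(numbered_pages)
```

Because numbering comes from `enumerate` over all pages, the `page` metadata still points at the right page of the source document even when empty pages are dropped.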

### Process All Markdown Files

In [None]:
# Find all markdown files
markdown_path = Path(MARKDOWN_DIR)
md_files = list(markdown_path.rglob("*.md"))
print(f"Found {len(md_files)} markdown files")

# Process each markdown file
for md_path in md_files:
    ingest_markdown_to_vectordb(md_path)

### Verify Ingestion

In [49]:
collection_info = vector_store.client.get_collection(COLLECTION_NAME)
print(f"Total documents in Qdrant: {collection_info.points_count}")

2025-12-12 16:20:43,717 - INFO - HTTP Request: GET http://localhost:6333/collections/financial_docs "HTTP/1.1 200 OK"


Total documents in Qdrant: 1266


In [50]:
# Hybrid search with RRF (Reciprocal Rank Fusion)
# Alternative queries to try:
# query = "What are Amazon's cash flows?"
# query = "What is Amazon's profit and loss statement?"
query = "asset base and earning"
results = vector_store.similarity_search(query, k=5)

2025-12-12 16:20:45,660 - INFO - HTTP Request: POST http://localhost:6333/collections/financial_docs/points/query "HTTP/1.1 200 OK"
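
The search above runs over every company in the collection, which is why the first result listed below comes from an Apple filing. The metadata stored with each page can be used to narrow a search. The sketch below builds a Qdrant-style filter as a plain dict; `build_metadata_filter` is a hypothetical helper, and the `metadata.` key prefix assumes the vector store nests `Document.metadata` under that payload key. The commented usage is an assumption to verify against the installed `qdrant-client` and `langchain-qdrant` versions.

```python
def build_metadata_filter(**conditions):
    """Build a Qdrant filter (as a plain dict) where every condition must match."""
    return {
        "must": [
            {"key": f"metadata.{field}", "match": {"value": value}}
            for field, value in conditions.items()
        ]
    }


amazon_2024 = build_metadata_filter(company_name="amazon", fiscal_year=2024)
print(amazon_2024)

# Possible usage against the store in this notebook (requires qdrant-client):
# from qdrant_client import models
# results = vector_store.similarity_search(
#     query, k=5, filter=models.Filter(**amazon_2024)
# )
```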


In [51]:
results

[Document(metadata={'fiscal_quarter': 'q4', 'fiscal_year': 2023, 'company_name': 'apple', 'doc_type': '10-q', 'page': 10, 'file_hash': '60258a17806063663ac83b5fb5422d22bec255cfecbec842c5039e08992e6a46', 'source_file': 'apple 10-q q4 2023.pdf', '_id': '9eb69cea-cafa-4382-ac83-94516d4055e7', '_collection_name': 'financial_docs'}, page_content="\n\n## Note 3 - Earnings Per Share\n\nThe following table shows the computation of basic and diluted earnings per share for the three months ended December 30, 2023 and December 31, 2022 (net income in millions and shares in thousands):\n\n|                                           | Three Months Ended   | Three Months Ended   |\n|-------------------------------------------|----------------------|----------------------|\n|                                           | December 30, 2023    | December 31, 2022    |\n| Numerator:                                |                      |                      |\n| Net income                                

In [52]:
from IPython.display import display, Markdown

for res in results:
    display(Markdown(res.page_content))



## Note 3 - Earnings Per Share

The following table shows the computation of basic and diluted earnings per share for the three months ended December 30, 2023 and December 31, 2022 (net income in millions and shares in thousands):

|                                           | Three Months Ended   | Three Months Ended   |
|-------------------------------------------|----------------------|----------------------|
|                                           | December 30, 2023    | December 31, 2022    |
| Numerator:                                |                      |                      |
| Net income                                | $ 33,916             | $ 29,998             |
| Denominator:                              |                      |                      |
| Weighted-average basic shares outstanding | 15,509,763           | 15,892,723           |
| Effect of dilutive share-based awards     | 66,878               | 62,995               |
| Weighted-average diluted shares           | 15,576,641           | 15,955,718           |
| Basic earnings per share                  | $ 2.19               | $ 1.89               |
| Diluted earnings per share                | $ 2.18               | $ 1.88               |

Approximately 89 million restricted stock units ('RSUs') were excluded from the computation of diluted earnings per share for the three months ended December 31, 2022 because their effect would have been antidilutive.

## Note 4 - Financial Instruments

## Cash, Cash Equivalents and Marketable Securities

The following tables show the Company's cash, cash equivalents and marketable securities by significant investment category as of December 30, 2023 and September 30, 2023 (in millions):

|                                           | December 30, 2023   | December 30, 2023   | December 30, 2023   | December 30, 2023   | December 30, 2023         | December 30, 2023             | December 30, 2023                 |
|-------------------------------------------|---------------------|---------------------|---------------------|---------------------|---------------------------|-------------------------------|-----------------------------------|
|                                           | Adjusted Cost       | Unrealized Gains    | Unrealized Losses   | Fair Value          | Cash and Cash Equivalents | Current Marketable Securities | Non-Current Marketable Securities |
| Cash                                      | $ 29,542            | $ -                 | $ -                 | $ 29,542            | $ 29,542                  | $ -                           | $ -                               |
| Level 1:                                  |                     |                     |                     |                     |                           |                               |                                   |
| Money market funds                        | 2,000               | -                   | -                   | 2,000               | 2,000                     | -                             | -                                 |
| Mutual funds                              | 448                 | 35                  | (11)                | 472                 | -                         | 472                           | -                                 |
| Subtotal                                  | 2,448               | 35                  | (11)                | 2,472               | 2,000                     | 472                           | -                                 |
| Level 2 : (1)                             |                     |                     |                     |                     |                           |                               |                                   |
| U.S. Treasury securities                  | 24,041              | 12                  | (920)               | 23,133              | 7,303                     | 4,858                         | 10,972                            |
| U.S. agency securities                    | 5,791               | -                   | (448)               | 5,343               | 243                       | 98                            | 5,002                             |
| Non-U.S. government securities            | 17,326              | 54                  | (675)               | 16,705              | -                         | 11,175                        | 5,530                             |
| Certificates of deposit and time deposits | 1,448               | -                   | -                   | 1,448               | 1,119                     | 329                           | -                                 |
| Commercial paper                          | 1,361               | -                   | -                   | 1,361               | 472                       | 889                           | -                                 |
| Corporate debt securities                 | 75,360              | 112                 | (3,964)             | 71,508              | 81                        | 13,909                        | 57,518                            |
| Municipal securities                      | 562                 | -                   | (14)                | 548                 | -                         | 185                           | 363                               |
| Mortgage- and asset-backed securities     | 22,369              | 53                  | (1,907)             | 20,515              | -                         | 425                           | 20,090                            |
| Subtotal                                  | 148,258             | 231                 | (7,928)             | 140,561             | 9,218                     | 31,868                        | 99,475                            |
| Total (2)                                 | $ 180,248           | $ 266               | $ (7,939)           | $ 172,575           | $ 40,760                  | $ 32,340                      | $ 99,475                          |





We recorded a provision for income taxes in 2023 as compared to an income tax benefit in 2022 primarily due to an increase in pretax income, a decrease in the tax impact of foreign earnings and losses driven by a decline in the favorable effects of corporate restructuring transactions, and an increase in tax shortfalls from stock-based compensation. This was partially offset by an increase in federal research and development credits, which included approximately $600 million of tax benefit recorded in 2023 related to a change in the estimated qualifying expenditures associated with our 2022 U.S. federal R&D credit.

We intend to invest substantially all of our foreign subsidiary earnings, as well as our capital in our foreign subsidiaries, indefinitely outside of the U.S. in those jurisdictions in which we would incur significant, additional costs upon repatriation of such amounts.

Deferred income tax assets and liabilities are as follows (in millions):

|                                                                    | December 31,   | December 31,   |
|--------------------------------------------------------------------|----------------|----------------|
|                                                                    | 2022           | 2023           |
| Deferred tax assets (1):                                           |                |                |
| Loss carryforwards U.S. - Federal/States                           | 386            | 610            |
| Loss carryforwards - Foreign                                       | 2,831          | 2,796          |
| Accrued liabilities, reserves, and other expenses                  | 3,280          | 3,751          |
| Stock-based compensation                                           | 4,295          | 5,279          |
| Depreciation and amortization                                      | 1,009          | 1,114          |
| Operating lease liabilities                                        | 18,285         | 19,922         |
| Capitalized research and development                               | 6,824          | 14,800         |
| Other items                                                        | 1,023          | 745            |
| Tax credits                                                        | 950            | 1,582          |
| Total gross deferred tax assets                                    | 38,883         | 50,599         |
| Less valuation allowances (2)                                      | (4,374)        | (4,811)        |
| Deferred tax assets, net of valuation allowances                   | 34,509         | 45,788         |
| Deferred tax liabilities:                                          |                |                |
| Depreciation and amortization                                      | (9,039)        | (12,454)       |
| Operating lease assets                                             | (17,140)       | (18,648)       |
| Other items                                                        | (817)          | (1,489)        |
| Net deferred tax assets (liabilities), net of valuation allowances | $ 7,513        | $ 13,197       |

\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_

(1) Deferred tax assets are presented after tax effects and net of tax contingencies.

(2) Relates primarily to deferred tax assets that would only be realizable upon the generation of net income in certain foreign taxing jurisdictions or future capital gains, as well as tax credits.

Our valuation allowances primarily relate to foreign deferred tax assets, including substantially all of our foreign net operating loss carryforwards as of December 31, 2023. Our foreign net operating loss carryforwards for income tax purposes as of December 31, 2023 were approximately $10.2 billion before tax effects and certain of these amounts are subject to annual limitations under applicable tax law. If not utilized, a portion of these losses will begin to expire in 2024.

## Income Tax Contingencies

We are subject to income taxes in the U.S. (federal and state) and numerous foreign jurisdictions. Significant judgment is required in evaluating our tax positions and determining our provision for income taxes. During the ordinary course of business, there are many transactions and calculations for which the ultimate tax determination is uncertain. We establish reserves for tax-related uncertainties based on estimates of whether, and the extent to which, additional taxes will be due. These reserves are established when we believe that certain positions might be challenged despite our belief that our tax return positions are fully supportable. We adjust these reserves in light of changing facts and circumstances, such as the outcome of tax audits. The provision for income taxes includes the impact of reserve provisions and changes to reserves that are considered appropriate.





Net sales by groups of similar products and services, which also have similar economic characteristics, is as follows (in millions):

|                                 | Three Months Ended March 31,   | Three Months Ended March 31,   |
|---------------------------------|--------------------------------|--------------------------------|
|                                 | 2024                           | 2025                           |
| Net Sales:                      |                                |                                |
| Online stores (1)               | $ 54,670                       | $ 57,407                       |
| Physical stores (2)             | 5,202                          | 5,533                          |
| Third-party seller services (3) | 34,596                         | 36,512                         |
| Advertising services (4)        | 11,824                         | 13,921                         |
| Subscription services (5)       | 10,722                         | 11,715                         |
| AWS                             | 25,037                         | 29,267                         |
| Other (6)                       | 1,262                          | 1,312                          |
| Consolidated                    | $ 143,313                      | $ 155,667                      |

\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_

(1) Includes product sales and digital media content where we record revenue gross. We leverage our retail infrastructure to offer a wide selection of consumable and durable goods that includes media products available in both a physical and digital format, such as books, videos, games, music, and software. These product sales include digital products sold on a transactional basis. Digital media content subscriptions that provide unlimited viewing or usage rights are included in 'Subscription services.'

(2) Includes product sales where our customers physically select items in a store. Sales to customers who order goods online for delivery or pickup at our physical stores are included in 'Online stores.'

(3) Includes commissions and any related fulfillment and shipping fees, and other third-party seller services.

(4) Includes sales of advertising services to sellers, vendors, publishers, authors, and others, through programs such as sponsored ads, display, and video advertising.

(5) Includes annual and monthly fees associated with Amazon Prime memberships, as well as digital video, audiobook, digital music, e-book, and other non-AWS subscription services.

(6) Includes sales related to various other offerings (such as shipping services, healthcare services, and certain licensing and distribution of video content) and our co-branded credit card agreements.

Total segment assets exclude corporate assets, such as cash and cash equivalents, marketable securities, other long-term investments, corporate facilities, goodwill and other acquired intangible assets, and tax assets. Technology infrastructure assets are allocated among the segments based on usage, with the majority allocated to the AWS segment. Total segment assets reconciled to consolidated amounts are as follows (in millions):

|                   | December 31, 2024   | March 31, 2025   |
|-------------------|---------------------|------------------|
| North America (1) | $ 210,120           | $ 210,198        |
| International (1) | 69,487              | 70,231           |
| AWS (2)           | 155,953             | 179,386          |
| Corporate         | 189,334             | 183,441          |
| Consolidated      | $ 624,894           | $ 643,256        |

\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_

(1) North America and International segment assets primarily consist of property and equipment, operating leases, inventory, accounts receivable, and digital video and music content.

(2) AWS segment assets primarily consist of property and equipment, accounts receivable, and operating leases.

Property and equipment, net by segment is as follows (in millions):

|               | December 31, 2024   | March 31, 2025   |
|---------------|---------------------|------------------|
| North America | $ 103,041           | $ 102,301        |
| International | 25,618              | 26,170           |
| AWS           | 110,683             | 130,919          |
| Corporate     | 13,323              | 13,391           |
| Consolidated  | $ 252,665           | $ 272,781        |





## Other Income (Expense), Net

Other income (expense), net, is as follows (in millions):

|                                                                        | Year Ended December 31,   | Year Ended December 31,   | Year Ended December 31,   |
|------------------------------------------------------------------------|---------------------------|---------------------------|---------------------------|
|                                                                        | 2022                      | 2023                      | 2024                      |
| Marketable equity securities valuation gains (losses)                  | $ (13,870)                | $ 984                     | $ (1,278)                 |
| Equity warrant valuation gains (losses)                                | (2,132)                   | 26                        | (192)                     |
| Upward adjustments relating to equity investments in private companies | 76                        | 40                        | 49                        |
| Foreign currency gains (losses)                                        | (340)                     | 65                        | (408)                     |
| Other, net                                                             | (540)                     | (177)                     | (421)                     |
| Total other income (expense), net                                      | $ (16,806)                | $ 938                     | $ (2,250)                 |

Included in other income (expense), net in 2022, 2023, and 2024 is a marketable equity securities valuation gain (loss) of $(12.7) billion, $797 million, and $(1.6) billion from our equity investment in Rivian Automotive, Inc. ('Rivian'). Our investment in Rivian's preferred stock was accounted for at cost, with adjustments for observable changes in prices or impairments, prior to Rivian's initial public offering in November 2021, which resulted in the conversion of our preferred stock to Class A common stock. As of December 31, 2024, we held 158 million shares of Rivian's Class A common stock, representing an approximate 14% ownership interest, and an approximate 13% voting interest. We determined that we have the ability to exercise significant influence over Rivian through our equity investment, our commercial arrangement for the purchase of electric vehicles and jointly-owned intellectual property, and one of our employees serving on Rivian's board of directors. We elected the fair value option to account for our equity investment in Rivian, which is included in 'Marketable securities' on our consolidated balance sheets, and had a fair value of $3.7 billion and $2.1 billion as of December 31, 2023 and December 31, 2024. The investment was subject to regulatory sales restrictions resulting in a discount for lack of marketability of approximately $800 million as of December 31, 2021, which expired in Q1 2022.

## Income Taxes

Income tax expense includes U.S. (federal and state) and foreign income taxes. Certain foreign subsidiary earnings and losses are subject to current U.S. taxation and the subsequent repatriation of those earnings is not subject to tax in the U.S. We intend to invest substantially all of our foreign subsidiary earnings, as well as our capital in our foreign subsidiaries, indefinitely outside of the U.S. in those jurisdictions in which we would incur significant, additional costs upon repatriation of such amounts.

Deferred income tax balances reflect the effects of temporary differences between the carrying amounts of assets and liabilities and their tax bases, as well as net operating loss and tax credit carryforwards, and are stated at enacted tax rates expected to be in effect when taxes are actually paid or recovered.

Deferred tax assets represent amounts available to reduce income taxes payable in future periods. Deferred tax assets are evaluated for future realization and reduced by a valuation allowance to the extent we believe they will not be realized. We consider many factors when assessing the likelihood of future realization of our deferred tax assets, including recent cumulative loss experience and expectations of future earnings, capital gains and investment in such jurisdiction, the carry-forward periods available to us for tax reporting purposes, and other relevant factors.

We utilize a two-step approach to recognizing and measuring uncertain income tax positions (income tax contingencies). The first step is to evaluate the tax position for recognition by determining if the weight of available evidence indicates it is more likely than not the position will be sustained on audit, including resolution of related appeals or litigation processes. The second step is to measure the tax benefit as the largest amount which is more than 50% likely of being realized upon ultimate settlement. We consider many factors when evaluating our tax positions and estimating our tax benefits, which may require periodic adjustments and which may not accurately forecast actual outcomes. We include interest and penalties related to our income tax contingencies in income tax expense.

## Fair Value of Financial Instruments

Fair value is defined as the price that would be received to sell an asset or paid to transfer a liability in an orderly transaction between market participants at the measurement date. To increase the comparability of fair value measures, the following hierarchy prioritizes the inputs to valuation methodologies used to measure fair value:

Level 1 - Valuations based on quoted prices for identical assets and liabilities in active markets.





Net sales by groups of similar products and services, which also have similar economic characteristics, is as follows (in millions):

|                                 | Three Months Ended June 30,   | Three Months Ended June 30,   | Six Months Ended June 30,   | Six Months Ended June 30,   |
|---------------------------------|-------------------------------|-------------------------------|-----------------------------|-----------------------------|
|                                 | 2024                          | 2025                          | 2024                        | 2025                        |
| Net Sales:                      |                               |                               |                             |                             |
| Online stores (1)               | $ 55,392                      | $ 61,485                      | $ 110,062                   | $ 118,892                   |
| Physical stores (2)             | 5,206                         | 5,595                         | 10,408                      | 11,128                      |
| Third-party seller services (3) | 36,201                        | 40,348                        | 70,797                      | 76,860                      |
| Advertising services (4)        | 12,771                        | 15,694                        | 24,595                      | 29,615                      |
| Subscription services (5)       | 10,866                        | 12,208                        | 21,588                      | 23,923                      |
| AWS                             | 26,281                        | 30,873                        | 51,318                      | 60,140                      |
| Other (6)                       | 1,260                         | 1,499                         | 2,522                       | 2,811                       |
| Consolidated                    | $ 147,977                     | $ 167,702                     | $ 291,290                   | $ 323,369                   |

\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_

(1) Includes product sales and digital media content where we record revenue gross. We leverage our retail infrastructure to offer a wide selection of consumable and durable goods that includes media products available in both a physical and digital format, such as books, videos, games, music, and software. These product sales include digital products sold on a transactional basis. Digital media content subscriptions that provide unlimited viewing or usage rights are included in 'Subscription services.'

(2) Includes product sales where our customers physically select items in a store. Sales to customers who order goods online for delivery or pickup at our physical stores are included in 'Online stores.'

(3) Includes commissions and any related fulfillment and shipping fees, and other third-party seller services.

(4) Includes sales of advertising services to sellers, vendors, publishers, authors, and others, through programs such as sponsored ads, display, and video advertising.

(5) Includes annual and monthly fees associated with Amazon Prime memberships, as well as digital video, audiobook, digital music, e-book, and other non-AWS subscription services.

(6) Includes sales related to various other offerings (such as shipping services, healthcare services, and certain licensing and distribution of video content) and our co-branded credit card agreements.

Total segment assets exclude corporate assets, such as cash and cash equivalents, marketable securities, other long-term investments, corporate facilities, goodwill and other acquired intangible assets, and tax assets. Technology infrastructure assets, which are included in property and equipment, net, net additions, and the depreciation and amortization expense on these assets, are allocated among the segments based on usage, with the majority allocated to the AWS segment. Usage of technology infrastructure assets by the North America and International segments, and the related allocation of total net additions, can fluctuate on a quarter-to-quarter basis, and is affected by seasonality, peak periods, new product or service offerings, and other factors.

Total segment assets reconciled to consolidated amounts are as follows (in millions):

|                   | December 31, 2024   | June 30, 2025   |
|-------------------|---------------------|-----------------|
| North America (1) | $ 210,120           | $ 224,304       |
| International (1) | 69,487              | 78,096          |
| AWS (2)           | 155,953             | 194,295         |
| Corporate         | 189,334             | 185,475         |
| Consolidated      | $ 624,894           | $ 682,170       |

\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_

(1) North America and International segment assets primarily consist of property and equipment, operating leases, inventory, accounts receivable, and digital video and music content.

(2) AWS segment assets primarily consist of property and equipment, accounts receivable, and operating leases.

